<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
I recently upgraded my dev cluster to 3.3. To do this I copied the
data out of the old volume onto a bare disk, wiped out everything
related to Gluster, installed the 3.3 packages, created a new volume
(I wanted to change my brick layout), and then copied the data back
into the new volume. Everything worked fine before the upgrade, but
now my users are complaining of random errors when compiling software.<br>
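<br>
Roughly, the migration looked like this (paths are illustrative and these
aren't the exact commands I ran, but it's the general shape of it):<br>
<pre>
# 1. Copy the data out of the old volume (FUSE mount) onto a bare disk.
rsync -a /mnt/old_home/ /mnt/bare_disk/

# 2. Wipe out everything about the old Gluster install, including its state.
yum remove 'glusterfs*'
rm -rf /var/lib/glusterd /etc/glusterd

# 3. Install 3.3 and create the new volume with the new brick layout.
yum install glusterfs glusterfs-server glusterfs-fuse
gluster peer probe storage1-dev.cssd.pitt.edu
gluster volume create vol_home \
    storage0-dev.cssd.pitt.edu:/brick/0 storage1-dev.cssd.pitt.edu:/brick/2 \
    storage0-dev.cssd.pitt.edu:/brick/1 storage1-dev.cssd.pitt.edu:/brick/3
gluster volume start vol_home

# 4. Copy the data back in through a FUSE mount of the new volume.
mount -t glusterfs storage0-dev.cssd.pitt.edu:/vol_home /mnt/vol_home
rsync -a /mnt/bare_disk/ /mnt/vol_home/
</pre>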
<br>
I enabled debug logging for the clients and I see this:<br>
<br>
<pre>
x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:12:02.783526] D [mem-pool.c:457:mem_get] (-->/usr/lib64/libglusterfs.so.0(dict_unserialize+0x28d) [0x36e361413d] (-->/usr/lib64/libglusterfs.so.0(dict_set+0x163) [0x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:12:02.783584] D [mem-pool.c:457:mem_get] (-->/usr/lib64/libglusterfs.so.0(dict_unserialize+0x28d) [0x36e361413d] (-->/usr/lib64/libglusterfs.so.0(dict_set+0x163) [0x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:12:45.726083] D [client-handshake.c:184:client_start_ping] 0-vol_home-client-0: returning as transport is already disconnected OR there are no frames (0 || 0)
[2012-06-12 17:12:45.726154] D [client-handshake.c:184:client_start_ping] 0-vol_home-client-3: returning as transport is already disconnected OR there are no frames (0 || 0)
[2012-06-12 17:12:45.726171] D [client-handshake.c:184:client_start_ping] 0-vol_home-client-1: returning as transport is already disconnected OR there are no frames (0 || 0)
<b>[2012-06-12 17:15:35.888437] E [rpc-clnt.c:208:call_bail] 0-vol_home-client-2: bailing out frame type(GlusterFS 3.1) op(RENAME(8)) xid = 0x2015421x sent = 2012-06-12 16:45:26.237621. timeout = 1800</b>
[2012-06-12 17:15:35.888507] W [client3_1-fops.c:2385:client3_1_rename_cbk] 0-vol_home-client-2: remote operation failed: Transport endpoint is not connected
[2012-06-12 17:15:35.888529] W [dht-rename.c:478:dht_rename_cbk] 0-vol_home-dht: /sam/senthil/genboree/SupportingPkgs/gcc-3.4.6/x86_64-unknown-linux-gnu/32/libjava/java/net/SocketException.class.tmp: rename on vol_home-client-2 failed (Transport endpoint is not connected)
[2012-06-12 17:15:35.889803] W [fuse-bridge.c:1516:fuse_rename_cbk] 0-glusterfs-fuse: 2776710: /sam/senthil/genboree/SupportingPkgs/gcc-3.4.6/x86_64-unknown-linux-gnu/32/libjava/java/net/SocketException.class.tmp -> /sam/senthil/genboree/SupportingPkgs/gcc-3.4.6/x86_64-unknown-linux-gnu/32/libjava/java/net/SocketException.class => -1 (Transport endpoint is not connected)
[2012-06-12 17:15:35.890002] D [mem-pool.c:457:mem_get] (-->/usr/lib64/libglusterfs.so.0(dict_new+0xb) [0x36e3613d6b] (-->/usr/lib64/libglusterfs.so.0(get_new_dict_full+0x27) [0x36e3613c67] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:15:35.890167] D [mem-pool.c:457:mem_get] (-->/usr/lib64/glusterfs/3.3.0/xlator/performance/md-cache.so(mdc_load_reqs+0x3d) [0x2aaaac201a2d] (-->/usr/lib64/libglusterfs.so.0(dict_set+0x163) [0x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:15:35.890258] D [mem-pool.c:457:mem_get] (-->/usr/lib64/glusterfs/3.3.0/xlator/performance/md-cache.so(mdc_load_reqs+0x3d) [0x2aaaac201a2d] (-->/usr/lib64/libglusterfs.so.0(dict_set+0x163) [0x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:15:35.890311] D [mem-pool.c:457:mem_get] (-->/usr/lib64/glusterfs/3.3.0/xlator/performance/md-cache.so(mdc_load_reqs+0x3d) [0x2aaaac201a2d] (-->/usr/lib64/libglusterfs.so.0(dict_set+0x163) [0x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
[2012-06-12 17:15:35.890363] D [mem-pool.c:457:mem_get] (-->/usr/lib64/glusterfs/3.3.0/xlator/performance/md-cache.so(mdc_load_reqs+0x3d) [0x2aaaac201a2d] (-->/usr/lib64/libglusterfs.so.0(dict_set+0x163) [0x36e3613ad3] (-->/usr/lib64/libglusterfs.so.0(mem_get0+0x1b) [0x36e364018b]))) 0-mem-pool: Mem pool is full. Callocing mem
</pre>
** and so on, more of the same...<br>
<br>
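For reference, this is the volume option I use to toggle the client log
level (it's normally left at INFO, as the config further down shows):<br>
<pre>
# Raise the client log level to DEBUG on vol_home, then put it back.
gluster volume set vol_home diagnostics.client-log-level DEBUG
gluster volume set vol_home diagnostics.client-log-level INFO
</pre>
<br>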
If I enable debug logging on the bricks, I see thousands of lines like
this every minute and am forced to disable the logging:<br>
<br>
<pre>
[2012-06-12 15:32:45.760598] D [io-threads.c:268:iot_schedule] 0-vol_home-io-threads: LOOKUP scheduled as fast fop
</pre>
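<br>
To give a sense of the volume, this is the sort of thing I ran on a storage
node before dropping diagnostics.brick-log-level back to INFO (the brick log
path may differ on other installs):<br>
<pre>
# Count the iot_schedule debug messages in the log for brick /brick/0,
# then restore the quieter log level.
grep -c 'iot_schedule' /var/log/glusterfs/bricks/brick-0.log
gluster volume set vol_home diagnostics.brick-log-level INFO
</pre>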
<br>
Here's my config:<br>
<br>
# gluster volume info<br>
Volume Name: vol_home<br>
Type: Distribute<br>
Volume ID: 07ec60be-ec0c-4579-a675-069bb34c12ab<br>
Status: Started<br>
Number of Bricks: 4<br>
Transport-type: tcp<br>
Bricks:<br>
Brick1: storage0-dev.cssd.pitt.edu:/brick/0<br>
Brick2: storage1-dev.cssd.pitt.edu:/brick/2<br>
Brick3: storage0-dev.cssd.pitt.edu:/brick/1<br>
Brick4: storage1-dev.cssd.pitt.edu:/brick/3<br>
Options Reconfigured:<br>
diagnostics.brick-log-level: INFO<br>
diagnostics.client-log-level: INFO<br>
features.limit-usage:
/home/cssd/jaw171:50GB,/cssd:200GB,/cssd/jaw171:75GB<br>
nfs.rpc-auth-allow: 10.54.50.*,127.*<br>
auth.allow: 10.54.50.*,127.*<br>
performance.io-cache: off<br>
cluster.min-free-disk: 5<br>
performance.cache-size: 128000000<br>
features.quota: on<br>
nfs.disable: on<br>
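<br>
If it matters, the features.limit-usage entries above correspond to quota
commands along these lines:<br>
<pre>
# Per-directory quota limits set through the quota CLI, e.g.:
gluster volume quota vol_home limit-usage /cssd 200GB
gluster volume quota vol_home limit-usage /cssd/jaw171 75GB
# Current limits and usage can be listed with:
gluster volume quota vol_home list
</pre>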
<br>
# rpm -qa | grep gluster<br>
glusterfs-fuse-3.3.0-1.el6.x86_64<br>
glusterfs-server-3.3.0-1.el6.x86_64<br>
glusterfs-3.3.0-1.el6.x86_64<br>
<br>
Name resolution is fine everywhere, every node can ping every other
node by name, no firewalls are running anywhere, and there are no
disk errors on the storage nodes.<br>
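<br>
For what it's worth, these are the kinds of checks I mean (run from both
the clients and the storage nodes):<br>
<pre>
# Basic sanity checks: name resolution/ping between all nodes, plus peer
# and brick status on the servers.
ping -c 3 storage0-dev.cssd.pitt.edu
ping -c 3 storage1-dev.cssd.pitt.edu
gluster peer status            # run on the servers
gluster volume status vol_home # run on the servers
</pre>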
<br>
Could the way I copied data out of one volume and back into another
have caused this (some xattr problem)? What else could be causing it?
I'm looking to go into production with GlusterFS on a 242-node (and
soon to grow) HPC cluster at the end of this month.<br>
<br>
Also, one of my co-workers improved upon an existing remote quota
viewer written in Python. I'll post the code soon for those
interested.<br>
<pre class="moz-signature" cols="72">--
Jeff White - Linux/Unix Systems Engineer
University of Pittsburgh - CSSD</pre>
</body>
</html>