Hi All,

I am having a few problems with a gluster configuration I'm using. The issues are:

1) Sometimes the gluster client running on ServerA stops serving files. Doing an "ls" on the mount point returns an empty directory. All of the other clients seem fine when this happens. Unmounting and remounting the gluster directory temporarily "fixes" the problem; sometimes it stays fixed for a few minutes, sometimes for a day. (The commands I use to remount are shown just after this list.)

2) The log files in /var/log/glusterfs are not being rotated on ServerA, but they are being rotated on ServerB. (Further down I've included the kind of logrotate config I'd expect to see.)

3) On ServerB I have both /etc/glusterd and /etc/glusterfs. ServerA and the pure clients have only /etc/glusterfs.
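For reference, this is roughly how I remount on ServerA when the problem happens. The mount point path here is just a placeholder, not my actual path:

    # lazy-unmount in case the mount is hung, then remount over FUSE
    umount -l /mnt/gluster
    mount -t glusterfs ServerA:/default /mnt/gluster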
Here is some info on my setup, but if there is any info missing please let me know and I'll provide it.

Gluster version: 3.3.0
OS: Ubuntu 12.04 (running on EC2)
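Regarding the log rotation issue (2), this is the sort of logrotate stanza I'd expect to find under /etc/logrotate.d on both servers. I'm not certain exactly what the Ubuntu packages ship, so treat this as a sketch of what I'm comparing against rather than my actual config:

    # rotate all glusterfs logs weekly, keep a year of compressed copies
    /var/log/glusterfs/*.log {
        weekly
        rotate 52
        missingok
        notifempty
        compress
        copytruncate
    }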
On ServerA the following is filling up the /var/log/glusterfs/glustershd.log file:

[2012-12-18 19:20:46.819623] I [afr-common.c:1340:afr_launch_self_heal] 0-default-replicate-0: background entry self-heal triggered. path: <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>, reason: lookup detected pending operations
[2012-12-18 19:20:46.831481] E [afr-self-heal-common.c:1087:afr_sh_common_lookup_resp_handler] 0-default-replicate-0: path <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>/test_quote.pdf on subvolume default-client-1 => -1 (No such file or directory)
[2012-12-18 19:20:46.831512] I [afr-self-heal-entry.c:1904:afr_sh_entry_common_lookup_done] 0-default-replicate-0: <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>/test_quote.pdf: Skipping entry self-heal because of gfid absence
[2012-12-18 19:20:46.833554] E [afr-self-heal-common.c:2156:afr_self_heal_completion_cbk] 0-default-replicate-0: background entry self-heal failed on <gfid:d88ad693-86fd-49eb-9360-7fe89d0e6cf6>

I have a single replicated volume called "default". There are two servers, each with one brick.
gluster> volume info

Volume Name: default
Type: Replicate
Volume ID: cb46f3ac-2ae1-4c9d-a2af-0df242b2acd3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ServerA:/ebs/gluster/default
Brick2: ServerB:/ebs/gluster/default

gluster> volume status all
Status of volume: default
Gluster process                               Port    Online  Pid
------------------------------------------------------------------------------
Brick ServerA:/ebs/gluster/default            24009   Y       3575
Brick ServerB:/ebs/gluster/default            24009   Y       2241
NFS Server on localhost                       38467   Y       3581
Self-heal Daemon on localhost                 N/A     Y       3587
NFS Server on ServerB                         38467   Y       2247
Self-heal Daemon on ServerB                   N/A     Y       2253
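Since glustershd.log is showing entry self-heal failures, I can also post the output of the heal info commands if that would be useful, e.g.:

    gluster volume heal default info
    gluster volume heal default info heal-failed
    gluster volume heal default info split-brain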
In addition to ServerA and ServerB (which are also running the gluster client) there are about 10 other systems acting as pure clients.

Does anybody have any ideas what might be causing my problems? Or additional things to check?
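I'm also happy to post peer status and the tail of the client mount log from ServerA. The log file name depends on the mount point, so the path below is approximate:

    gluster peer status
    tail -n 50 /var/log/glusterfs/mnt-gluster.log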
Thanks in advance!
- chris