<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Hi,
<div><br>
</div>
<div>I will try to reproduce this in a Vagrant cluster environment.</div>
<div><br>
</div>
<div>In case it helps, here is the timeline of the event.</div>
<div><br>
</div>
<div>t0: 2 servers in replicate mode, no issues</div>
<div>t1: server1 powered down due to a hardware issue</div>
<div>t2: server2 continues to serve files through NFS and fuse, and continues to be updated by the automated build process / copies from other places</div>
<div>t3: server1 powered back up after being fixed; auto-healing starts for files. In the meantime a Jenkins job was deploying files and removing/re-creating symlinks to point at the proper targets (through NFS and fuse).</div>
<div>t4: heal failed on some directories; some symlinks in those directories had not been updated to match the ones on server2, and we could no longer access those directories (I/O error).</div>
<div>t5: after removing the stale symlinks from server1 (directly on the brick) and letting replication run again from server2, the symlinks are now consistent between the two servers (rough commands below).</div>
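<div><br>
</div>
<div>For reference, here is roughly what we did at t5 (the path is just one example taken from my first mail; the .glusterfs line is commented out because I'm not 100% sure such a link exists for symlinks):</div>
<div><br>
</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div># on server1, directly on the brick, never through a client mount</div>
<div>cd /export/raid/myVol/images/myProject1/2.1_stale</div>
<div>getfattr -h -n trusted.gfid -e hex current&nbsp; # note the gfid of the stale copy</div>
<div>rm current&nbsp; # drop the stale symlink</div>
<div># rm /export/raid/myVol/.glusterfs/1c/e5/1ce5eac8-0969-4d83-b983-023efaea0f64&nbsp; # its gfid link, if present</div>
<div>gluster volume heal myVol&nbsp; # then let self-heal copy the good one back from server2</div>
<div>stat /images/myProject1/2.1_stale/current&nbsp; # check through a client mount</div>
</blockquote>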
<div><br>
</div>
<div>We didn’t add or remove any bricks during this process.</div>
<div><br>
</div>
<div>From what I can see, splitmount helps remove the bad files from the bricks in a split-brain-like event (that’s what we have done by hand so far).</div>
<div><br>
</div>
<div>Thanks</div>
<div><br>
<div>
<div>On Apr 19, 2014, at 9:42 AM, Joe Julian &lt;<a href="mailto:joe@julianfamily.org">joe@julianfamily.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
What would really help is a clear list of steps to reproduce this issue. It sounds like a bug but I can't repro.<br>
<br>
In your questions you ask, in relation to adding or removing bricks, whether you can continue to read and write. My understanding is that you're not doing that (gluster volume (add|remove)-brick) but rather just shutting a server down. If my understanding is correct,
 then yes, you should be able to continue normal operation. <br>
Repairing this issue is the same as healing split-brain. The easiest way is to use splitmount[1] to delete one of the conflicting copies.<br>
<br>
[1] <a href="https://forge.gluster.org/splitmount">https://forge.gluster.org/splitmount</a><br>
<br>
<br>
<br>
<div class="gmail_quote">On April 17, 2014 4:37:32 PM PDT, &quot;PEPONNET, Cyril (Cyril)&quot; &lt;<a href="mailto:cyril.peponnet@alcatel-lucent.com">cyril.peponnet@alcatel-lucent.com</a>&gt; wrote:
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div>Hi Gluster people!</div>
<div><br>
</div>
<div>I would like some help regarding an issue we have with our early-production GlusterFS setup.</div>
<div><br>
</div>
<div><b>Our Topology:</b></div>
<div><br>
</div>
<div>2 Bricks in Replicate mode:</div>
<div><br>
</div>
<div>[root@myBrick1 /]# cat /etc/redhat-release&nbsp;<br>
CentOS release 6.5 (Final)<br>
[root@myBrick1 /]# glusterfs --version<br>
glusterfs 3.4.2 built on Jan&nbsp;&nbsp;3 2014 12:38:05<br>
Repository revision: <a href="git://git.gluster.com/glusterfs.git">git://git.gluster.com/glusterfs.git</a><br>
Copyright (c) 2006-2013 Red Hat, Inc. &lt;<a href="http://www.redhat.com/">http://www.redhat.com/</a>&gt;<br>
<br>
</div>
<div><br>
</div>
<div>[root@myBrick1 /]# gluster volume info</div>
<div>&nbsp;</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>Volume Name: myVol</div>
<div>Type: Replicate</div>
<div>Volume ID: 58f5d775-acb5-416d-bee6-5209f7b20363</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: myBrick1.company.lan:/export/raid/myVol</div>
<div>Brick2: myBrick2.company.lan:/export/raid/myVol</div>
<div>Options Reconfigured:</div>
<div>nfs.enable-ino32: on</div>
</blockquote>
<div><br>
</div>
<div><b>The issue:</b></div>
<div><br>
</div>
<div>We powered down a brick (myBrick1) for hardware maintenance. When we powered it back up, issues started with some files (symlinks, in fact): auto-healing does not seem to be working for all of them…</div>
<div><br>
</div>
<div>Let's take a look at one faulty symlink:</div>
<div><br>
</div>
<div>Using fuse.glusterfs (sometimes it works, sometimes it doesn't):</div>
<div><br>
</div>
<div>[root@myBrick2 /]mount</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>...</div>
<div>myBrick2.company.lan:/myVol on /images type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)</div>
<div>...</div>
</blockquote>
<div><br>
</div>
<div>[root@myBrick2 /]# stat /images/myProject1/2.1_stale/current</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>&nbsp; File: `/images/myProject1/2.1_stale/current' -&gt; `current-59a77422'</div>
<div>&nbsp; Size: 16 &nbsp; &nbsp; &nbsp; &nbsp;<span class="Apple-tab-span" style="white-space: pre;"> </span>
Blocks: 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;IO Block: 131072 symbolic link</div>
<div>Device: 13h/19d<span class="Apple-tab-span" style="white-space: pre;"> </span>
Inode: 11422905275486058235 &nbsp;Links: 1</div>
<div>Access: (0777/lrwxrwxrwx) &nbsp;Uid: ( &nbsp;499/ testlab) &nbsp; Gid: ( &nbsp;499/ testlab)</div>
<div>Access: 2014-04-17 14:05:54.488238322 -0700</div>
<div>Modify: 2014-04-16 19:46:05.033299589 -0700</div>
<div>Change: 2014-04-17 14:05:54.487238322 -0700</div>
</blockquote>
<div><br>
</div>
<div>[root@myBrick2 /]# stat /images/myProject1/2.1_stale/current</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">stat: cannot stat `/images/myProject1/2.1_stale/current': Input/output error</blockquote>
<div><br>
</div>
<div>I typed the above commands only a few seconds apart.</div>
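<div><br>
</div>
<div>To show how intermittent it is, a quick loop like this one (purely illustrative) alternates between the symlink target and an I/O error:</div>
<div><br>
</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div># run against the fuse mount on myBrick2</div>
<div>for i in $(seq 1 10); do</div>
<div>&nbsp;&nbsp;stat -c '%N' /images/myProject1/2.1_stale/current</div>
<div>&nbsp;&nbsp;sleep 2</div>
<div>done</div>
</blockquote>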
<div><br>
</div>
<div>Let's try from the other brick:</div>
<div><br>
</div>
<div>[root@myBrick1 ~]mount</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>...</div>
<div>myBrick1.company.lan:/myVol on /images type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)</div>
<div>...</div>
</blockquote>
<div><br>
</div>
<div>[root@myBrick1 ~]# stat /images/myProject1/2.1_stale/current</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">stat: cannot stat `/images/myProject1/2.1_stale/current': Input/output error</blockquote>
<div><br>
</div>
<div>From this one it always fails… (myBrick1 is the server we powered back up after maintenance).</div>
<div><br>
</div>
<div>Using NFS:</div>
<div><br>
</div>
<div>It never works (tested against both bricks):</div>
<div><br>
</div>
<div>[root@station-localdomain myProject1]# mount</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>...</div>
<div>myBrick1:/myVol on /images type nfs (rw,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,mountaddr=10.0.0.57,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=10.0.0.57)</div>
<div>...</div>
</blockquote>
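<div><br>
</div>
<div>(For reference, the client mount here is made with something along these lines; the options are the ones visible in the mount line above, this is not the exact command we scripted.)</div>
<div><br>
</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div># NFSv3 over TCP against the Gluster NFS server</div>
<div>mount -t nfs -o vers=3,proto=tcp,rsize=8192,wsize=8192 myBrick1:/myVol /images</div>
</blockquote>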
<div><br>
</div>
<div>[root@station-localdomain myProject1]# ls 2.1_stale&nbsp;</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">ls: cannot access 2.1_stale: Input/output error</blockquote>
<div><br>
</div>
<div>In both cases here are the logs:</div>
<div><br>
</div>
<div>==&gt; /var/log/glusterfs/glustershd.log &lt;==</div>
<div>[2014-04-17 10:20:25.861003] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: &lt;gfid:fcbbe770-6388-4d74-a78a-7939b17e36aa&gt;: Performing conservative merge</div>
<div>[2014-04-17 10:20:25.895143] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: &lt;gfid:ae058719-61de-47de-82dc-6cb8a3d80afe&gt;: Performing conservative merge</div>
<div>[2014-04-17 10:20:25.949176] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: &lt;gfid:868e3eb7-03e6-4b6b-a75a-16b31bdf8a10&gt;: Performing conservative merge</div>
<div>[2014-04-17 10:20:25.995289] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: &lt;gfid:115efb83-2154-4f9d-8c70-a31f476db110&gt;: Performing conservative merge</div>
<div>[2014-04-17 10:20:26.013995] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: &lt;gfid:0982330a-2e08-4b97-9ea5-cf991d295e41&gt;: Performing conservative merge</div>
<div>[2014-04-17 10:20:26.050693] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: &lt;gfid:3a15b54b-b92c-4ed5-875e-1af0a3b94e0c&gt;: Performing conservative merge</div>
<div><br>
</div>
<div>==&gt; /var/log/glusterfs/usr-global.log &lt;==</div>
<div>[2014-04-17 10:20:38.281705] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: /images/myProject1/2.1_stale: Performing conservative merge</div>
<div>[2014-04-17 10:20:38.286986] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:38.287030] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:38.287169] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:38.287202] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:38.287280] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:38.287308] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:38.287506] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:38.287538] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:38.311222] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:38.311277] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:38.311345] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:38.311385] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:38.311473] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:38.311502] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:38.332110] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:38.332149] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:38.332845] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-myVol-replicate-0: background &nbsp;entry self-heal failed on /images/myProject1/2.1_stale</div>
<div>[2014-04-17 10:20:41.447911] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: /images/myProject1/2.1_stale: Performing conservative merge</div>
<div>[2014-04-17 10:20:41.453950] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:41.453998] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:41.454135] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:41.454163] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:41.454237] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:41.454263] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:41.454385] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:41.454413] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:41.479015] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:41.479063] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:41.479149] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:41.479177] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:41.479252] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:41.479279] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:41.499291] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:41.499333] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:41.499995] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-myVol-replicate-0: background &nbsp;entry self-heal failed on /images/myProject1/2.1_stale</div>
<div>[2014-04-17 10:20:43.149818] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: /images/myProject1/2.1_stale: Performing conservative merge</div>
<div>[2014-04-17 10:20:43.155127] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:43.155185] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:43.155308] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:43.155346] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:43.155441] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:43.155477] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:43.155628] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:43.155660] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:43.180271] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:43.180324] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:43.180425] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:43.180455] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:43.180545] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:43.180578] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:43.201070] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:43.201112] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:43.201788] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-myVol-replicate-0: background &nbsp;entry self-heal failed on /images/myProject1/2.1_stale</div>
<div>[2014-04-17 10:20:44.646242] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0: /images/myProject1/2.1_stale: Performing conservative merge</div>
<div>[2014-04-17 10:20:44.652027] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:44.652072] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:44.652207] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:44.652239] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:44.652341] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:44.652372] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:44.652518] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:44.652550] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:44.676929] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:44.676973] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_s</div>
<div>[2014-04-17 10:20:44.677062] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_b: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:44.677107] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_b</div>
<div>[2014-04-17 10:20:44.677196] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/current: gfid differs on subvolume 0</div>
<div>[2014-04-17 10:20:44.677225] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/current</div>
<div>[2014-04-17 10:20:44.698071] W [afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0: /images/myProject1/2.1_stale/latest_n: gfid differs on subvolume 1</div>
<div>[2014-04-17 10:20:44.698113] E [afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk] 0-myVol-replicate-0: Conflicting entries for /images/myProject1/2.1_stale/latest_n</div>
<div>[2014-04-17 10:20:44.698816] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-myVol-replicate-0: background &nbsp;entry self-heal failed on /images/myProject1/2.1_stale</div>
<div><br>
</div>
<div><br>
</div>
<div>So let’s take a deeper look.</div>
<div><br>
</div>
<div>Any split-brain?</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;"><br>
</blockquote>
[root@myBrick1 glusterfs]# gluster volume heal myVol info split-brain<br>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>Gathering Heal info on volume myVol has been successful</div>
<div><br>
</div>
<div>Brick myBrick1.company.lan:/export/raid/myVol</div>
<div>Number of entries: 0</div>
<div><br>
</div>
<div>Brick myBrick2.company.lan:/export/raid/myVol</div>
<div>Number of entries: 0</div>
</blockquote>
<div><br>
</div>
<div>Nope.</div>
<div><br>
</div>
<div>Any heal-failed entries?</div>
<div><br>
</div>
<div>[root@myBrick1 glusterfs]# gluster volume heal myVol info heal-failed |wc -l</div>
<div>380</div>
<div><br>
</div>
<div>Plenty… how many of them are unique?</div>
<div><br>
</div>
<div>[root@myBrick1 glusterfs]# gluster volume heal myVol info heal-failed | cut -d &quot; &quot; -f 3 | sort -u | wc -l</div>
<div>18</div>
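<div><br>
</div>
<div>To list them instead of just counting (same pipeline, minus the final count):</div>
<div><br>
</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div>gluster volume heal myVol info heal-failed | cut -d &quot; &quot; -f 3 | sort -u</div>
</blockquote>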
<div><br>
</div>
<div>Digging into it:</div>
<div><br>
</div>
<div>Here are the failing entries from gluster volume heal myVol info heal-failed:</div>
<div><br>
</div>
<div>&lt;gfid:0982330a-2e08-4b97-9ea5-cf991d295e41&gt;</div>
<div>&lt;gfid:29337848-ffad-413b-91b1-7bd062b8c939&gt;</div>
<div>&lt;gfid:3140c4f6-d95c-41bb-93c4-18a644497160&gt;</div>
<div>&lt;gfid:5dd03e08-c9b6-4315-b6ae-efcb45558f18&gt;</div>
<div>&lt;gfid:75db102b-99f4-4852-98b6-d43e39c3ccb6&gt;</div>
<div>&lt;gfid:9c193529-75bf-4f81-bbd6-95a952d646dd&gt;</div>
<div>&lt;gfid:a21c5f72-6d05-4c56-a34f-fcbef48374da&gt;</div>
<div>&lt;gfid:f6660583-c8a7-4d4a-88d8-1138fc1030f5&gt;</div>
<div>/images/myProject3/2.1</div>
<div>/images/myProject3/2.1_stale</div>
<div>/images/myProject2/2.1</div>
<div>/images/myProject2/2.1_stale</div>
<div>/images/myProject1/2.1</div>
<div>/images/myProject1/2.1_stale</div>
<div><br>
</div>
<div>Let's compare their xattrs.</div>
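<div><br>
</div>
<div>(The trusted.gfid values below were gathered on each brick with something like the following; the exact one-liner is reconstructed, not the one I ran. The -h flag keeps getfattr from dereferencing symlinks, which matters for the symlink check further down.)</div>
<div><br>
</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div># run on each brick, against the brick path, not through a mount</div>
<div>getfattr -h -n trusted.gfid -e hex /export/raid/myVol/images/myProject*/2.1*</div>
<div>getfattr -h -n trusted.gfid -e hex /export/raid/myVol/images/myProject*/2.1_stale/*</div>
</blockquote>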
<div><br>
</div>
<div>getfattr output for the entries above:</div>
<div><br>
</div>
<div>myBrick1:</div>
<div><br>
</div>
<div>/export/raid/myVol/images/myProject3/2.1 -&gt; trusted.gfid=0x2d18a6f72a894f20a260478b5a9602be</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale -&gt; trusted.gfid=0x29337848ffad413b91b17bd062b8c939</div>
<div>/export/raid/myVol/images/myProject2/2.1 -&gt; trusted.gfid=0x04cdfe8bb83b4b27b42153df913b5181</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale -&gt; trusted.gfid=0x5dd03e08c9b64315b6aeefcb45558f18</div>
<div>/export/raid/myVol/images/myProject1/2.1 -&gt; trusted.gfid=0xca8fedea8ad64612a33db75ea1ca4421</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale -&gt; trusted.gfid=0xa21c5f726d054c56a34ffcbef48374da</div>
<div><br>
</div>
<div><br>
</div>
<div>myBrick2:</div>
<div><br>
</div>
<div>/export/raid/myVol/images/myProject3/2.1 -&gt; trusted.gfid=0x2d18a6f72a894f20a260478b5a9602be</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale -&gt; trusted.gfid=0x29337848ffad413b91b17bd062b8c939</div>
<div>/export/raid/myVol/images/myProject2/2.1 -&gt; trusted.gfid=0x04cdfe8bb83b4b27b42153df913b5181</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale -&gt; trusted.gfid=0x5dd03e08c9b64315b6aeefcb45558f18</div>
<div>/export/raid/myVol/images/myProject1/2.1 -&gt; trusted.gfid=0xca8fedea8ad64612a33db75ea1ca4421</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale -&gt; trusted.gfid=0xa21c5f726d054c56a34ffcbef48374da</div>
<div><br>
</div>
<div>Damn, they look good. I also cross-checked the md5sums: all files are identical on both bricks.</div>
<div><br>
</div>
<div>Let's check the symlinks now.</div>
<div><br>
</div>
<div>myBrick1:</div>
<div><br>
</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/current&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x95b01ba94e0c482eacf51ebb20c1cba1</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/latest_b&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x5c0165dfe5c84c7ea076731065292135</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/latest_n&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xab4de5d630084cde891ac65a7904f6d0</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/latest_s&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x4c38edac7d4c4e5e8ad8ff44a164f7b8</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/current&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x946ce9e7224f4fc581d817a1ebcec087</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/latest_b&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x8fb0020e97a54be0953e4786c6933f86</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/latest_n&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x8de1551a69f244e3bb0a61cbaba57414</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/latest_s&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xd11e42f54a2944a68ee7fd1f544539a9</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/current&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x1ce5eac809694d83b983023efaea0f64</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/latest_b&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xcc25cfdf98f749caaf259c76fe1b85b1</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/latest_n&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x3889f78789a14b388e9fb5caa2231cc7</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/latest_s&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x2f9901e45c5a47d282faaf65c675cf48</div>
<div><br>
</div>
<div>myBrick2:</div>
<div><br>
</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/current&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xb6d0a17f397b4922a0ac0e3d740ca8c7</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/latest_b&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xc80750eca11b40f48feb99ea6cd07799</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/latest_n&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x57bb90d64bd74c29ab30d844b33528b7</div>
<div>/export/raid/myVol/images/myProject3/2.1_stale/latest_s&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x47f94619468a419d98011d8e67a43068</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/current&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xcda7ba2331e6489f95c524a17ae179bf</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/latest_b&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xc36205fa8d2c49bda64001d667aab8a6</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/latest_n&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xf71b8f6a4ff14e75951b47ae40817b70</div>
<div>/export/raid/myVol/images/myProject2/2.1_stale/latest_s&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xcba23dbde7b14b63acc43d287a1b527e</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/current&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x82434b00c17d4d7d88cff71ba4d8c10d</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/latest_b&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0xbba1934c4f7240bab705bd0548fbdc22</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/latest_n&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x408ef2e2c1c5454fb0b920065eebab2f</div>
<div>/export/raid/myVol/images/myProject1/2.1_stale/latest_s&nbsp;<span class="Apple-tab-span" style="white-space: pre;">
</span>xattr=trusted.gfid=0x9fc2992ad16c4186a0d36ef93fea73f1</div>
<div><br>
</div>
<div>Damn, trusted.gfid is not consistent between the bricks… how did this happen?</div>
<div><br>
</div>
<div>When we powered myBrick1 back up, automated jobs were still writing to the other brick through NFS; those jobs copy a bunch of files and re-point the symlinks above (roughly as sketched below).</div>
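<div><br>
</div>
<div>What the job does to each symlink is essentially this, through the mount (illustrative commands; every remove/re-create gives the new symlink a brand new gfid):</div>
<div><br>
</div>
<blockquote style="margin: 0px 0px 0px 40px; border: none; padding: 0px;">
<div># deploy job, through NFS or fuse, never directly on a brick</div>
<div>rm -f /images/myProject1/2.1_stale/current</div>
<div>ln -s current-59a77422 /images/myProject1/2.1_stale/current</div>
</blockquote>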
<div><br>
</div>
<div>So there is definitely an issue with replication… but only with symlinks?</div>
<div><br>
</div>
<div><b>So here are the questions:</b></div>
<div><br>
</div>
<div><span class="Apple-tab-span" style="white-space: pre;"></span>1) Can we still use glusters in read/write operation when adding a new or old brick ? (down for maintenance for example) ? That a key point in our deployment for scalability and flexibility</div>
<div><span class="Apple-tab-span" style="white-space: pre;"></span></div>
<div><span class="Apple-tab-span" style="white-space: pre;"></span>2) How can I recover / delete the conflicted grid files ? (as symlinks are the same in both side, only xattr differs).&nbsp;</div>
<div><br>
</div>
<div><br>
</div>
<div>Thanks a lot for your help</div>
<div><br>
</div>
<div>Cyril</div>
<div style="margin-top: 2.5em; margin-bottom: 1em; border-bottom-width: 1px; border-bottom-style: solid; border-bottom-color: rgb(0, 0, 0);">
<br class="webkit-block-placeholder">
</div>
<pre class="k9mail"><hr><br>Gluster-users mailing list<br><a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br><a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</body>
</html>