Hi all!<br><br>After the start of pserver12 I ran the getfattr command on all 4 systems in order to check which files were out of sync. This came back with 63 files on pserver12 and none on the others. After starting the gluster server and client daemons on 12, the first batch was done automagically, as stated before, but not all of them, as I would have expected.<br>
<br>Best, Martin<br><br><div class="gmail_quote">2011/4/29 Pranith Kumar. Karampuri <span dir="ltr">&lt;<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>&gt;</span><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
This means that there are no differences in gfids. Could you let me know how the self-heal was done after pserver12 was brought up?<br>
How did you find out that self-heal was needed for 63 files?<br>
<div class="im"><br>
Pranith.<br>
----- Original Message -----<br>
From: &quot;Martin Schenker&quot; &lt;<a href="mailto:martin.schenker@profitbricks.com">martin.schenker@profitbricks.com</a>&gt;<br>
</div><div><div></div><div class="h5">To: &quot;Pranith Kumar. Karampuri&quot; &lt;<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>&gt;, <a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>

Sent: Friday, April 29, 2011 11:05:55 PM<br>
Subject: Re: [Gluster-users] Server outage,     file sync/self-heal doesn&#39;t sync ALL files?!<br>
<br>
Sorry, I had to sync manually due to imminent server upgrades.<br>
50 min. after the initial sync I was asked to bring the servers into a<br>
safe state for an upgrade and did a manual<br>
&quot;touch-on-server13-client-mountpoint&quot; which triggered an immediate<br>
self-heal on the rest of the files.<br>
<br>
All files were in sync across all four servers after this action. Will<br>
run this command next time!!<br>
<br>
Best, Martin<br>
<br>
On 29.04.2011 19:30, Pranith Kumar. Karampuri wrote:<br>
&gt; hi Martin,<br>
&gt;        Could you please send the output of -m &quot;trusted*&quot; instead of &quot;trusted.afr&quot; for the remaining 24 files from both the servers. I would like to see the gfids of these files on both the machines.<br>

&gt;<br>
&gt; Pranith.<br>
&gt; ----- Original Message -----<br>
&gt; From: &quot;Martin Schenker&quot;&lt;<a href="mailto:martin.schenker@profitbricks.com">martin.schenker@profitbricks.com</a>&gt;<br>
&gt; To: <a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>
&gt; Sent: Friday, April 29, 2011 8:39:46 PM<br>
&gt; Subject: [Gluster-users] Server outage,       file sync/self-heal doesn&#39;t sync ALL files?!<br>
&gt;<br>
&gt; Hi all!<br>
&gt;<br>
&gt; We have another incident over here.<br>
&gt;<br>
&gt; One of the servers (pserver12) in a pair (12 &amp; 13) has been rebooted.<br>
&gt; pserver13 showed 63 files not in sync after the outage for 2h.<br>
&gt;<br>
&gt; Both servers are clients as well.<br>
&gt;<br>
&gt; Starting pserver12 brought up the self-heal mechanism, but only 39 files<br>
&gt; were triggered within the first 10 min. Now the system seems dormant and<br>
&gt; 24 files are left hanging.<br>
&gt;<br>
&gt; On the other three servers no inconsistencies are seen.<br>
&gt;<br>
&gt; tail of client log file:<br>
&gt;<br>
&gt; [2011-04-29 14:48:23.820022] I<br>
&gt; [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]<br>
&gt; 0-storage0-replicate-2: diff self-heal on /pserver13-17: 1960 blocks of<br>
&gt; 22736 were different (8.62%)<br>
&gt; [2011-04-29 14:48:23.887651] E [afr-common.c:110:afr_set_split_brain]<br>
&gt; 0-storage0-replicate-2: invalid argument: inode<br>
&gt; [2011-04-29 14:48:23.887740] I<br>
&gt; [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]<br>
&gt; 0-storage0-replicate-2: background  data self-heal completed on<br>
&gt; /pserver13-17<br>
&gt; [2011-04-29 14:48:24.272220] I<br>
&gt; [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]<br>
&gt; 0-storage0-replicate-2: diff self-heal on /pserver13-19: 1960 blocks of<br>
&gt; 22744 were different (8.62%)<br>
&gt; [2011-04-29 14:48:24.341868] E [afr-common.c:110:afr_set_split_brain]<br>
&gt; 0-storage0-replicate-2: invalid argument: inode<br>
&gt; [2011-04-29 14:48:24.341959] I<br>
&gt; [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]<br>
&gt; 0-storage0-replicate-2: background  data self-heal completed on<br>
&gt; /pserver13-19<br>
&gt; [2011-04-29 14:48:24.758131] I<br>
&gt; [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]<br>
&gt; 0-storage0-replicate-2: diff self-heal on /pserver13-23: 1952 blocks of<br>
&gt; 22752 were different (8.58%)<br>
&gt; [2011-04-29 14:48:24.766054] E [afr-common.c:110:afr_set_split_brain]<br>
&gt; 0-storage0-replicate-2: invalid argument: inode<br>
&gt; [2011-04-29 14:48:24.766137] I<br>
&gt; [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]<br>
&gt; 0-storage0-replicate-2: background  data self-heal completed on<br>
&gt; /pserver13-23<br>
&gt; [2011-04-29 14:48:24.884613] I<br>
&gt; [afr-self-heal-algorithm.c:526:sh_diff_loop_driver_done]<br>
&gt; 0-storage0-replicate-2: diff self-heal on /pserver13-10: 1952 blocks of<br>
&gt; 22760 were different (8.58%)<br>
&gt; [2011-04-29 14:48:24.895631] E [afr-common.c:110:afr_set_split_brain]<br>
&gt; 0-storage0-replicate-2: invalid argument: inode<br>
&gt; [2011-04-29 14:48:24.895721] I<br>
&gt; [afr-self-heal-common.c:1527:afr_self_heal_completion_cbk]<br>
&gt; 0-storage0-replicate-2: background  data self-heal completed on<br>
&gt; /pserver13-10<br>
&gt; 0 root@pserver13:/var/log/glusterfs # date<br>
&gt; Fri Apr 29 15:08:18 UTC 2011<br>
&gt;<br>
&gt;<br>
&gt; Search for mismatch:<br>
&gt;<br>
&gt; 0 root@pserver13:~ # getfattr -R -d -e hex -m &quot;trusted.afr.&quot;<br>
&gt; /mnt/gluster/brick?/storage | grep -v 0x000000000000000000000000 | grep<br>
&gt; -B1 -A1 trusted | grep -c file<br>
&gt; getfattr: Removing leading &#39;/&#39; from absolute path names<br>
&gt; *24*<br>
&gt;<br>
&gt;<br>
&gt; 0 root@pserver13:~ # getfattr -R -d -e hex -m &quot;trusted.afr.&quot;<br>
&gt; /mnt/gluster/brick?/storage | grep -v 0x000000000000000000000000 | grep<br>
&gt; -B1  trusted<br>
&gt; getfattr: Removing leading &#39;/&#39; from absolute path names<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-26<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file:<br>
&gt; mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images<br>
&gt; trusted.afr.storage0-client-4=0x000000000000001600000001<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-24<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-8<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-21<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-22<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-30<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-20<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-9<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-38<br>
&gt; trusted.afr.storage0-client-4=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-18<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-2<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-23<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-4<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-3<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-34<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-37<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-12<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-27<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file:<br>
&gt; mnt/gluster/brick1/storage/images/1831/9a039a81-60fe-5fa3-f562-8f6d3828382b/hdd-images/13169<br>
&gt; trusted.afr.storage0-client-6=0x100000020000000000000000<br>
&gt; --<br>
&gt; # file:<br>
&gt; mnt/gluster/brick1/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images<br>
&gt; trusted.afr.storage0-client-6=0x000000000000001600000002<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-25<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt; --<br>
&gt; # file: mnt/gluster/brick1/storage/de-dc1-c1-pserver5-7<br>
&gt; trusted.afr.storage0-client-6=0x270000010000000000000000<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; I could trigger it manually, but why isn&#39;t the sync/self-heal working on<br>
&gt; all files shown as inconsistent? Or am I assuming something wrong here?!?<br>
&gt;<br>
&gt; Best, Martin<br>
&gt;<br>
&gt; _______________________________________________<br>
&gt; Gluster-users mailing list<br>
&gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
&gt; <a href="http://gluster.org/cgi-bin/mailman/listinfo/gluster-users" target="_blank">http://gluster.org/cgi-bin/mailman/listinfo/gluster-users</a><br>
&gt;<br>
<br>
</div></div></blockquote></div><br>
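For anyone repeating the getfattr check quoted in this thread, here is a minimal Python sketch that parses `getfattr -d -e hex` output and lists the files whose `trusted.afr.*` values are non-zero. It assumes each value is 12 bytes holding three big-endian 32-bit pending counters in the order data, metadata, entry. That layout is an assumption here, not something confirmed in the thread, and the function names `decode_afr` and `pending_files` are made up for illustration. The sample lines are copied from the output posted above.

```python
import re


def decode_afr(hex_value):
    """Split a 12-byte trusted.afr value into three pending counters.

    Assumption: the counters are big-endian 32-bit integers in the
    order (data, metadata, entry).
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return tuple(int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8))


def pending_files(getfattr_output):
    """Yield (path, xattr_name, counters) for every non-zero trusted.afr entry."""
    path = None
    for line in getfattr_output.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            # getfattr prints one "# file:" header per file, then its xattrs.
            path = line[len("# file: "):]
            continue
        match = re.match(r"(trusted\.afr\.[^=]+)=(0x[0-9a-fA-F]+)$", line)
        if match:
            counters = decode_afr(match.group(2))
            if any(counters):
                yield path, match.group(1), counters


# Two entries taken from the getfattr output posted in this thread.
sample = """\
# file: mnt/gluster/brick0/storage/de-dc1-c1-pserver5-33
trusted.afr.storage0-client-4=0x270000010000000000000000

# file: mnt/gluster/brick0/storage/images/1959/cd55c5f3-9aa1-bfd9-99a0-01c13a7d8559/hdd-images
trusted.afr.storage0-client-4=0x000000000000001600000001
"""

for path, name, (data, metadata, entry) in pending_files(sample):
    print(path, name, "data=%d metadata=%d entry=%d" % (data, metadata, entry))
```

Unlike the `grep -v 0x000000000000000000000000` pipeline, this keeps the file path and the counters together, so it is easier to see whether a file is waiting on a data, metadata, or entry heal.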