<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 09/19/2014 09:58 PM, Ramesh
Natarajan wrote:<br>
</div>
<blockquote
cite="mid:CAGmOvDKV8KShFj0huCyXOU0dHe3rhff+oQn9PRxNu8uZH=BH1w@mail.gmail.com"
type="cite">
<div dir="ltr">I was able to run another set of tests this week
and I was able to reproduce the issue again. Going by the
extended attributes, I think i ran into the same issue I saw
earlier..
<div><br>
</div>
<div> Do you think i need to open up a bug report?</div>
</div>
</blockquote>
Hi Ramesh,<br>
I have already fixed this bug: <a class="moz-txt-link-freetext" href="http://review.gluster.org/8757">http://review.gluster.org/8757</a>. I
believe the fix should be in the next 3.5.x release.<br>
<br>
Pranith<br>
<blockquote
cite="mid:CAGmOvDKV8KShFj0huCyXOU0dHe3rhff+oQn9PRxNu8uZH=BH1w@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div><br>
</div>
<div>
<div>Brick 1</div>
<div><br>
</div>
<div>trusted.afr.PL2-client-0=0x000000000000000000000000</div>
<div>trusted.afr.PL2-client-1=0x000000010000000000000000</div>
<div>trusted.afr.PL2-client-2=0x000000010000000000000000</div>
<div>trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c</div>
<div><br>
</div>
<div>Brick 2</div>
<div><br>
</div>
<div>
<div>trusted.afr.PL2-client-0=0x0000125c0000000000000000</div>
<div>trusted.afr.PL2-client-1=0x000000000000000000000000</div>
<div>trusted.afr.PL2-client-2=0x000000000000000000000000</div>
<div>trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c</div>
</div>
<div><br>
</div>
<div>Brick 3</div>
<div><br>
</div>
<div>
<div>trusted.afr.PL2-client-0=0x0000125c0000000000000000</div>
<div>trusted.afr.PL2-client-1=0x000000000000000000000000</div>
<div>trusted.afr.PL2-client-2=0x000000000000000000000000</div>
<div>trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c</div>
</div>
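<div><br>
</div>
<div>For anyone decoding these values by hand: each trusted.afr
value is three big-endian 32-bit counters, counting the data,
metadata and entry operations still pending (blamed) on the brick
named in the key. A quick shell sketch, a minimal illustration
using brick 2's entry for client-0 above:</div>
<pre>v=0000125c0000000000000000
echo "data=$((16#${v:0:8})) meta=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
# data=4700 meta=0 entry=0  ->  0x125c pending data ops blamed on brick 1</pre>
<div>So bricks 2 and 3 blame brick 1 for 4700 data operations,
while brick 1 blames each of them for one: the same mutual-blame
pattern as in the earlier report.</div>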
<div><br>
</div>
<div><br>
</div>
<div>
<div>[root@ip-172-31-12-218 ~]# gluster volume info</div>
<div> </div>
<div>Volume Name: PL1</div>
<div>Type: Replicate</div>
<div>Volume ID: bd351bae-d467-4e8c-bbd2-6a0fe99c346a</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 3 = 3</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: 172.31.38.189:/data/vol1/gluster-data</div>
<div>Brick2: 172.31.16.220:/data/vol1/gluster-data</div>
<div>Brick3: 172.31.12.218:/data/vol1/gluster-data</div>
<div>Options Reconfigured:</div>
<div>cluster.server-quorum-type: server</div>
<div>network.ping-timeout: 12</div>
<div>nfs.addr-namelookup: off</div>
<div>performance.cache-size: 2147483648</div>
<div>cluster.quorum-type: auto</div>
<div>performance.read-ahead: off</div>
<div>performance.client-io-threads: on</div>
<div>performance.io-thread-count: 64</div>
<div>cluster.eager-lock: on</div>
<div>cluster.server-quorum-ratio: 51%</div>
<div> </div>
<div>Volume Name: PL2</div>
<div>Type: Replicate</div>
<div>Volume ID: e6ad8787-05d8-474b-bc78-748f8c13700f</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 3 = 3</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: 172.31.38.189:/data/vol2/gluster-data</div>
<div>Brick2: 172.31.16.220:/data/vol2/gluster-data</div>
<div>Brick3: 172.31.12.218:/data/vol2/gluster-data</div>
<div>Options Reconfigured:</div>
<div>nfs.addr-namelookup: off</div>
<div>cluster.server-quorum-type: server</div>
<div>network.ping-timeout: 12</div>
<div>performance.cache-size: 2147483648</div>
<div>cluster.quorum-type: auto</div>
<div>performance.read-ahead: off</div>
<div>performance.client-io-threads: on</div>
<div>performance.io-thread-count: 64</div>
<div>cluster.eager-lock: on</div>
<div>cluster.server-quorum-ratio: 51%</div>
<div>[root@ip-172-31-12-218 ~]# </div>
</div>
<div><br>
</div>
<div><b>Mount command</b></div>
<div><br>
</div>
<div>Client</div>
<div><br>
</div>
<div>mount -t glusterfs -o
defaults,enable-ino32,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log,backupvolfile-server=172.31.38.189,backupvolfile-server=172.31.12.218,background-qlen=256
172.31.16.220:/PL2 /mnt/vm<br>
</div>
<div><br>
</div>
<div>Server</div>
<div><br>
</div>
<div>
<div>/dev/xvdf /data/vol1 xfs defaults,inode64,noatime 1 2</div>
<div>/dev/xvdg /data/vol2 xfs defaults,inode64,noatime 1 2</div>
</div>
<div><br>
</div>
<div><b>Packages</b></div>
<div><br>
</div>
<div>Client</div>
<div><br>
</div>
<div>
<div>rpm -qa | grep gluster</div>
<div>glusterfs-fuse-3.5.2-1.el6.x86_64</div>
<div>glusterfs-3.5.2-1.el6.x86_64</div>
<div>glusterfs-libs-3.5.2-1.el6.x86_64</div>
</div>
<div><br>
</div>
<div>Server</div>
<div><br>
</div>
<div>
<div>[root@ip-172-31-12-218 ~]# rpm -qa | grep gluster</div>
<div>glusterfs-3.5.2-1.el6.x86_64</div>
<div>glusterfs-fuse-3.5.2-1.el6.x86_64</div>
<div>glusterfs-api-3.5.2-1.el6.x86_64</div>
<div>glusterfs-server-3.5.2-1.el6.x86_64</div>
<div>glusterfs-libs-3.5.2-1.el6.x86_64</div>
<div>glusterfs-cli-3.5.2-1.el6.x86_64</div>
<div>[root@ip-172-31-12-218 ~]# </div>
</div>
<div><br>
</div>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sat, Sep 6, 2014 at 9:01 AM, Pranith
Kumar Karampuri <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb">
<div class="h5"><br>
On 09/06/2014 04:53 AM, Jeff Darcy wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
I have a replicated GlusterFS setup on 3 bricks (replica = 3),
with client and server quorum turned on. I rebooted one of the 3
bricks. When it came back up, the client started throwing error
messages that one of the files had gone into split-brain.<br>
</blockquote>
This is a good example of how split brain can happen
even with all kinds of<br>
quorum enabled. Let's look at those xattrs. BTW,
thank you for a very<br>
nicely detailed bug report which includes those.<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
BRICK1<br>
========<br>
[root@ip-172-31-38-189 ~]# getfattr -d -m . -e hex
/data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
getfattr: Removing leading '/' from absolute path names<br>
# file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
trusted.afr.PL2-client-0=0x000000000000000000000000<br>
trusted.afr.PL2-client-1=0x000000010000000000000000<br>
trusted.afr.PL2-client-2=0x000000010000000000000000<br>
trusted.gfid=0xea950263977e46bf89a0ef631ca139c2<br>
<br>
BRICK 2<br>
=======<br>
[root@ip-172-31-16-220 ~]# getfattr -d -m . -e hex
/data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
getfattr: Removing leading '/' from absolute path names<br>
# file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
trusted.afr.PL2-client-0=0x00000d460000000000000000<br>
trusted.afr.PL2-client-1=0x000000000000000000000000<br>
trusted.afr.PL2-client-2=0x000000000000000000000000<br>
trusted.gfid=0xea950263977e46bf89a0ef631ca139c2<br>
<br>
BRICK 3<br>
=========<br>
[root@ip-172-31-12-218 ~]# getfattr -d -m . -e hex
/data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
getfattr: Removing leading '/' from absolute path names<br>
# file: data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
trusted.afr.PL2-client-0=0x00000d460000000000000000<br>
trusted.afr.PL2-client-1=0x000000000000000000000000<br>
trusted.afr.PL2-client-2=0x000000000000000000000000<br>
trusted.gfid=0xea950263977e46bf89a0ef631ca139c2<br>
</blockquote>
Here, we see that brick 1 shows a single pending operation for
each of the other two, while they each show 0xd46 (3398) pending
operations for brick 1. Here's how this can happen.<br>
<br>
(1) There is exactly one pending operation.<br>
<br>
(2) Brick1 completes the write first, and says so.<br>
<br>
(3) Client sends messages to all three, saying to
decrement brick1's<br>
count.<br>
<br>
(4) All three bricks receive and process that message.<br>
<br>
(5) Brick1 fails.<br>
<br>
(6) Brick2 and brick3 complete the write, and say so.<br>
<br>
(7) Client tells all bricks to decrement remaining
counts.<br>
<br>
(8) Brick2 and brick3 receive and process that
message.<br>
<br>
(9) Brick1 is dead, so its counts for brick2/3 stay at
one.<br>
<br>
(10) Brick2 and brick3 have quorum, with all-zero
pending counters.<br>
<br>
(11) Client sends 0xd46 more writes to brick2 and
brick3.<br>
<br>
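To make the bookkeeping concrete, here is a toy shell model of
those eleven steps (my own sketch of the pre-op/post-op counting,
not GlusterFS code; cnt[B,P] is the pending count brick B holds
for brick P):<br>
<pre>declare -A cnt
up=(_ 1 1 1)                 # liveness of bricks 1..3 (index 0 unused)
for b in 1 2 3; do for p in 1 2 3; do cnt[$b,$p]=0; done; done

preop() {                    # pre-op: every live brick marks all peers dirty
  for b in 1 2 3; do
    [ "${up[$b]}" = 1 ] || continue
    for p in 1 2 3; do cnt[$b,$p]=$(( ${cnt[$b,$p]} + 1 )); done
  done
}
postop() {                   # post-op: live bricks clear counts for the bricks in $@
  for b in 1 2 3; do
    [ "${up[$b]}" = 1 ] || continue
    for p in "$@"; do cnt[$b,$p]=$(( ${cnt[$b,$p]} - 1 )); done
  done
}

preop                        # (1) one write in flight
postop 1                     # (2)-(4) brick1 finishes first, is cleared everywhere
up[1]=0                      # (5) brick1 dies
postop 2 3                   # (6)-(9) brick2/3 finish; brick1 misses this decrement
for i in $(seq 3398); do preop; postop 2 3; done  # (11) 0xd46 writes, brick1 down

for b in 1 2 3; do echo "brick$b holds: ${cnt[$b,1]} ${cnt[$b,2]} ${cnt[$b,3]}"; done
# brick1 holds: 0 1 1    brick2 holds: 3398 0 0    brick3 holds: 3398 0 0</pre>
The final counters reproduce the xattrs above: brick1 blames the
other two once each, while they blame brick1 0xd46 times.<br>
<br>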
Note that at no point did we lose quorum. Note also
the tight timing<br>
required. If brick1 had failed an instant earlier, it
would not have<br>
decremented its own counter. If it had failed an
instant later, it<br>
would have decremented brick2's and brick3's as well.
If brick1 had not<br>
finished first, we'd be in yet another scenario. If
delayed changelog<br>
had been operative, the messages at (3) and (7) would
have been combined<br>
to leave us in yet another scenario. As far as I can
tell, we would<br>
have been able to resolve the conflict in all those
cases.<br>
*** Key point: quorum enforcement does not totally
eliminate split<br>
brain. It only makes the frequency a few orders of
magnitude lower. ***<br>
</blockquote>
<br>
</div>
</div>
Not quite right. After we fixed the bug <a
moz-do-not-send="true"
href="https://bugzilla.redhat.com/show_bug.cgi?id=1066996"
target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1066996</a>,
there are only two possible ways left to introduce split-brain:<br>
1) an implementation bug in changelog xattr marking, which I
believe is the case here;<br>
2) keep writing to the file from the mount, then (see the sketch
after these steps):<br>
a) take brick 1 down and wait until at least one write
succeeds;<br>
b) bring brick 1 back up and take brick 2 down (self-heal must
not have run); wait until at least one write succeeds;<br>
c) bring brick 2 back up and take brick 3 down (self-heal must
not have run); wait until at least one write succeeds.<br>
<br>
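A rough shell sketch of that sequence, assuming self-heal is kept
from running in between, as above; the pid-file names under
/var/lib/glusterd are illustrative, so check them on your
nodes:<br>
<pre># (a) take brick 1 down, write from a client mount
kill $(cat /var/lib/glusterd/vols/PL2/run/172.31.38.189-data-vol2-gluster-data.pid)
echo test >> /mnt/vm/somefile
# (b) bring brick 1 back, take brick 2 down, write again
gluster volume start PL2 force
kill $(cat /var/lib/glusterd/vols/PL2/run/172.31.16.220-data-vol2-gluster-data.pid)
echo test >> /mnt/vm/somefile
# (c) bring brick 2 back, take brick 3 down, write again
gluster volume start PL2 force
kill $(cat /var/lib/glusterd/vols/PL2/run/172.31.12.218-data-vol2-gluster-data.pid)
echo test >> /mnt/vm/somefile</pre>
<br>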
Once the outcast feature is implemented, case 2 will also be
immune to split-brain.<br>
<br>
That leaves implementation errors in changelog marking as the
only way AFR can end up in split-brain. If we test it thoroughly
and fix such problems, we can make it immune to split-brain
:-).<span class="HOEnZb"><font
color="#888888"><br>
<br>
Pranith<br>
</font></span>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">
So, is there any way to prevent this completely? Some
AFR enhancements,<br>
such as the oft-promised "outcast" feature[1], might
have helped.<br>
NSR[2] is immune to this particular problem. "Policy
based split brain<br>
resolution"[3] might have resolved it automatically
instead of merely<br>
flagging it. Unfortunately, those are all in the
future. For now, I'd<br>
say the best approach is to resolve the conflict
manually and try to<br>
move on. Unless there's more going on than meets the
eye, recurrence<br>
should be very unlikely.<br>
<br>
[1] <a moz-do-not-send="true"
href="http://www.gluster.org/community/documentation/index.php/Features/outcast"
target="_blank">http://www.gluster.org/community/documentation/index.php/Features/outcast</a><br>
<br>
[2] <a moz-do-not-send="true"
href="http://www.gluster.org/community/documentation/index.php/Features/new-style-replication"
target="_blank">http://www.gluster.org/community/documentation/index.php/Features/new-style-replication</a><br>
<br>
[3] <a moz-do-not-send="true"
href="http://www.gluster.org/community/documentation/index.php/Features/pbspbr"
target="_blank">http://www.gluster.org/community/documentation/index.php/Features/pbspbr</a><br>
</span><span class="">
_______________________________________________<br>
Gluster-users mailing list<br>
<a moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org"
target="_blank">Gluster-users@gluster.org</a><br>
<a moz-do-not-send="true"
href="http://supercolony.gluster.org/mailman/listinfo/gluster-users"
target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>
</span></blockquote>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>