<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <div class="moz-cite-prefix">On 09/19/2014 09:58 PM, Ramesh
      Natarajan wrote:<br>
    </div>
    <blockquote
cite="mid:CAGmOvDKV8KShFj0huCyXOU0dHe3rhff+oQn9PRxNu8uZH=BH1w@mail.gmail.com"
      type="cite">
      <div dir="ltr">I was able to run another set of tests this week
        and reproduced the issue again. Going by the extended
        attributes, I think I ran into the same issue I saw earlier.
        <div><br>
        </div>
        <div>Do you think I need to open a bug report?</div>
      </div>
    </blockquote>
    Hi Ramesh,<br>
         I have already fixed this bug: <a class="moz-txt-link-freetext" href="http://review.gluster.org/8757">http://review.gluster.org/8757</a>. We
    should have the fix in the next 3.5.x release, I believe.<br>
    <br>
    Pranith<br>
    <blockquote
cite="mid:CAGmOvDKV8KShFj0huCyXOU0dHe3rhff+oQn9PRxNu8uZH=BH1w@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div><br>
          </div>
          <div>
            <div>Brick 1: </div>
            <div><br>
            </div>
            <div>trusted.afr.PL2-client-0=0x000000000000000000000000</div>
            <div>trusted.afr.PL2-client-1=0x000000010000000000000000</div>
            <div>trusted.afr.PL2-client-2=0x000000010000000000000000</div>
            <div>trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c</div>
            <div><br>
            </div>
            <div>Brick 2</div>
            <div><br>
            </div>
            <div>
              <div>trusted.afr.PL2-client-0=0x0000125c0000000000000000</div>
              <div>trusted.afr.PL2-client-1=0x000000000000000000000000</div>
              <div>trusted.afr.PL2-client-2=0x000000000000000000000000</div>
              <div>trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c</div>
            </div>
            <div><br>
            </div>
            <div>Brick 3</div>
            <div><br>
            </div>
            <div>
              <div>trusted.afr.PL2-client-0=0x0000125c0000000000000000</div>
              <div>trusted.afr.PL2-client-1=0x000000000000000000000000</div>
              <div>trusted.afr.PL2-client-2=0x000000000000000000000000</div>
              <div>trusted.gfid=0x1cea509b07cc49e9bd28560b5f33032c</div>
            </div>
            <div><br>
            </div>
            <div><br>
            </div>
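            <div>To decode values like these: as I understand the AFR v1
              changelog format, each trusted.afr.* value is 12 bytes
              holding three big-endian 32-bit counters for pending data,
              metadata, and entry operations. A minimal Python sketch
              (the helper name is mine, not a GlusterFS API):</div>

```python
import struct

def decode_afr(hexval):
    # trusted.afr.* values (AFR v1 layout) pack three big-endian uint32
    # counters: pending data, metadata, and entry operations, in that order
    raw = bytes.fromhex(hexval[2:] if hexval.startswith("0x") else hexval)
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Brick 2's counter for brick 1 (PL2-client-0): 0x125c = 4700 pending data ops
print(decode_afr("0x0000125c0000000000000000"))
# Brick 1's counter for brick 2 (PL2-client-1): one pending data op
print(decode_afr("0x000000010000000000000000"))
```

            <div>So bricks 2 and 3 each blame brick 1 for 4700
              unacknowledged data operations, while brick 1 blames each
              of them for one.</div>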
            <div>
              <div>[root@ip-172-31-12-218 ~]# gluster volume info</div>
              <div> </div>
              <div>Volume Name: PL1</div>
              <div>Type: Replicate</div>
              <div>Volume ID: bd351bae-d467-4e8c-bbd2-6a0fe99c346a</div>
              <div>Status: Started</div>
              <div>Number of Bricks: 1 x 3 = 3</div>
              <div>Transport-type: tcp</div>
              <div>Bricks:</div>
              <div>Brick1: 172.31.38.189:/data/vol1/gluster-data</div>
              <div>Brick2: 172.31.16.220:/data/vol1/gluster-data</div>
              <div>Brick3: 172.31.12.218:/data/vol1/gluster-data</div>
              <div>Options Reconfigured:</div>
              <div>cluster.server-quorum-type: server</div>
              <div>network.ping-timeout: 12</div>
              <div>nfs.addr-namelookup: off</div>
              <div>performance.cache-size: 2147483648</div>
              <div>cluster.quorum-type: auto</div>
              <div>performance.read-ahead: off</div>
              <div>performance.client-io-threads: on</div>
              <div>performance.io-thread-count: 64</div>
              <div>cluster.eager-lock: on</div>
              <div>cluster.server-quorum-ratio: 51%</div>
              <div> </div>
              <div>Volume Name: PL2</div>
              <div>Type: Replicate</div>
              <div>Volume ID: e6ad8787-05d8-474b-bc78-748f8c13700f</div>
              <div>Status: Started</div>
              <div>Number of Bricks: 1 x 3 = 3</div>
              <div>Transport-type: tcp</div>
              <div>Bricks:</div>
              <div>Brick1: 172.31.38.189:/data/vol2/gluster-data</div>
              <div>Brick2: 172.31.16.220:/data/vol2/gluster-data</div>
              <div>Brick3: 172.31.12.218:/data/vol2/gluster-data</div>
              <div>Options Reconfigured:</div>
              <div>nfs.addr-namelookup: off</div>
              <div>cluster.server-quorum-type: server</div>
              <div>network.ping-timeout: 12</div>
              <div>performance.cache-size: 2147483648</div>
              <div>cluster.quorum-type: auto</div>
              <div>performance.read-ahead: off</div>
              <div>performance.client-io-threads: on</div>
              <div>performance.io-thread-count: 64</div>
              <div>cluster.eager-lock: on</div>
              <div>cluster.server-quorum-ratio: 51%</div>
              <div>[root@ip-172-31-12-218 ~]# </div>
            </div>
            <div><br>
            </div>
            <div><b>Mount command</b></div>
            <div><br>
            </div>
            <div>Client</div>
            <div><br>
            </div>
            <div>mount -t glusterfs -o
              defaults,enable-ino32,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log,backupvolfile-server=172.31.38.189,backupvolfile-server=172.31.12.218,background-qlen=256
              172.31.16.220:/PL2  /mnt/vm<br>
            </div>
            <div><br>
            </div>
            <div>Server</div>
            <div><br>
            </div>
            <div>
              <div>/dev/xvdf    /data/vol1 xfs defaults,inode64,noatime
                1 2</div>
              <div>/dev/xvdg   /data/vol2 xfs defaults,inode64,noatime 1
                2</div>
            </div>
            <div><br>
            </div>
            <div><b>Packages</b></div>
            <div><br>
            </div>
            <div>Client</div>
            <div><br>
            </div>
            <div>
              <div>rpm -qa | grep gluster</div>
              <div>glusterfs-fuse-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-libs-3.5.2-1.el6.x86_64</div>
            </div>
            <div><br>
            </div>
            <div>Server</div>
            <div><br>
            </div>
            <div>
              <div>[root@ip-172-31-12-218 ~]# rpm -qa | grep gluster</div>
              <div>glusterfs-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-fuse-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-api-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-server-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-libs-3.5.2-1.el6.x86_64</div>
              <div>glusterfs-cli-3.5.2-1.el6.x86_64</div>
              <div>[root@ip-172-31-12-218 ~]# </div>
            </div>
            <div><br>
            </div>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Sat, Sep 6, 2014 at 9:01 AM, Pranith
          Kumar Karampuri <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div class="HOEnZb">
              <div class="h5"><br>
                On 09/06/2014 04:53 AM, Jeff Darcy wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    I have a replicate glusterfs setup on 3 Bricks (
                    replicate = 3 ). I have<br>
                    client and server quorum turned on. I rebooted one
                    of the 3 bricks. When it<br>
                    came back up, the client started throwing error
                    messages that one of the<br>
                    files went into split brain.<br>
                  </blockquote>
                  This is a good example of how split brain can happen
                  even with all kinds of<br>
                  quorum enabled.  Let's look at those xattrs.  BTW,
                  thank you for a very<br>
                  nicely detailed bug report which includes those.<br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    BRICK1<br>
                    ========<br>
                    [root@ip-172-31-38-189 ~]# getfattr -d -m . -e hex<br>
                    /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
                    getfattr: Removing leading '/' from absolute path
                    names<br>
                    # file:<br>
                    data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
                    trusted.afr.PL2-client-0=0x000000000000000000000000<br>
                    trusted.afr.PL2-client-1=0x000000010000000000000000<br>
                    trusted.afr.PL2-client-2=0x000000010000000000000000<br>
                    trusted.gfid=0xea950263977e46bf89a0ef631ca139c2<br>
                    <br>
                    BRICK 2<br>
                    =======<br>
                    [root@ip-172-31-16-220 ~]# getfattr -d -m . -e hex<br>
                    /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
                    getfattr: Removing leading '/' from absolute path
                    names<br>
                    # file:<br>
                    data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
                    trusted.afr.PL2-client-0=0x00000d460000000000000000<br>
                    trusted.afr.PL2-client-1=0x000000000000000000000000<br>
                    trusted.afr.PL2-client-2=0x000000000000000000000000<br>
                    trusted.gfid=0xea950263977e46bf89a0ef631ca139c2<br>
                    BRICK 3<br>
                    =========<br>
                    [root@ip-172-31-12-218 ~]# getfattr -d -m . -e hex<br>
                    /data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
                    getfattr: Removing leading '/' from absolute path
                    names<br>
                    # file:<br>
                    data/vol2/gluster-data/apache_cp_mm1/logs/access_log.2014-09-05-17_00_00<br>
                    trusted.afr.PL2-client-0=0x00000d460000000000000000<br>
                    trusted.afr.PL2-client-1=0x000000000000000000000000<br>
                    trusted.afr.PL2-client-2=0x000000000000000000000000<br>
                    trusted.gfid=0xea950263977e46bf89a0ef631ca139c2<br>
                  </blockquote>
                  Here, we see that brick 1 shows a single pending
                  operation for the other<br>
                  two, while they show 0xd46 (3398) pending operations
                  for brick 1.<br>
                  Here's how this can happen.<br>
                  <br>
                  (1) There is exactly one pending operation.<br>
                  <br>
                  (2) Brick1 completes the write first, and says so.<br>
                  <br>
                  (3) Client sends messages to all three, saying to
                  decrement brick1's<br>
                  count.<br>
                  <br>
                  (4) All three bricks receive and process that message.<br>
                  <br>
                  (5) Brick1 fails.<br>
                  <br>
                  (6) Brick2 and brick3 complete the write, and say so.<br>
                  <br>
                  (7) Client tells all bricks to decrement remaining
                  counts.<br>
                  <br>
                  (8) Brick2 and brick3 receive and process that
                  message.<br>
                  <br>
                  (9) Brick1 is dead, so its counts for brick2/3 stay at
                  one.<br>
                  <br>
                  (10) Brick2 and brick3 have quorum, with all-zero
                  pending counters.<br>
                  <br>
                  (11) Client sends 0xd46 more writes to brick2 and
                  brick3.<br>
                  <br>
                  Note that at no point did we lose quorum. Note also
                  the tight timing<br>
                  required.  If brick1 had failed an instant earlier, it
                  would not have<br>
                  decremented its own counter.  If it had failed an
                  instant later, it<br>
                  would have decremented brick2's and brick3's as well. 
                  If brick1 had not<br>
                  finished first, we'd be in yet another scenario.  If
                  delayed changelog<br>
                  had been operative, the messages at (3) and (7) would
                  have been combined<br>
                  to leave us in yet another scenario.  As far as I can
                  tell, we would<br>
                  have been able to resolve the conflict in all those
                  cases.<br>
                  *** Key point: quorum enforcement does not totally
                  eliminate split<br>
                  brain.  It only makes the frequency a few orders of
                  magnitude lower. ***<br>
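                  <br>
                  The eleven steps can be sketched as a toy simulation
                  (pure illustration, not GlusterFS code): pending[i][j]
                  models what brick i believes is still pending on brick
                  j, and the final rows match the xattrs quoted above.<br>

```python
# Toy model of the race: the client pre-increments pending counters on
# every live brick before a write, then sends decrements as each brick
# acknowledges its local write.
BRICKS = 3
pending = [[0] * BRICKS for _ in range(BRICKS)]
alive = [True] * BRICKS

def incr_all():
    # client marks one write pending against every replica, on all live bricks
    for i in range(BRICKS):
        if alive[i]:
            for j in range(BRICKS):
                pending[i][j] += 1

def decr(done):
    # client tells every live brick that brick `done` completed the write
    for i in range(BRICKS):
        if alive[i]:
            pending[i][done] -= 1

incr_all()            # (1) exactly one pending operation
decr(0)               # (2)-(4) brick1 completes first; all bricks decrement it
alive[0] = False      # (5) brick1 fails
decr(1)               # (6)-(8) brick2 completes; only live bricks decrement
decr(2)               #         brick3 completes likewise
# (9) brick1 is dead, so its counts for brick2/3 stay frozen at one
# (10) brick2/3 have quorum with all-zero counters, so I/O continues
for _ in range(0xd46):  # (11) 0xd46 more writes reach only brick2/brick3;
    incr_all()          #      the dead brick1 never acknowledges, so the
    decr(1)             #      survivors' blame against it accumulates
    decr(2)

print(pending[0])     # [0, 1, 1]    -> brick 1's xattrs
print(pending[1])     # [3398, 0, 0] -> brick 2's xattrs (0xd46 = 3398)
```

                  Each side ends up blaming the other, with quorum held
                  throughout: split-brain.<br>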
                </blockquote>
                <br>
              </div>
            </div>
            Not quite right. After we fixed the bug <a
              moz-do-not-send="true"
              href="https://bugzilla.redhat.com/show_bug.cgi?id=1066996"
              target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1066996</a>,
            there are only two remaining ways to introduce split-brain:<br>
            1) an implementation bug in changelog xattr marking, which I
            believe is the case here.<br>
            2) keep writing to the file from the mount, then:<br>
            a) take brick1 down and wait until at least one write
            succeeds,<br>
            b) bring brick1 back up and take brick2 down (before
            self-heal runs), then wait until at least one write
            succeeds,<br>
            c) bring brick2 back up and take brick3 down (again before
            self-heal runs), then wait until at least one write
            succeeds.<br>
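            <br>
            That three-step sequence can be sketched the same
            illustrative way (toy model, not GlusterFS code): after the
            rolling failures, every brick holds pending counters
            against another, so self-heal has no brick it can trust as
            the source copy.<br>

```python
# Toy model of the rolling-failure case. Bricks are indexed 0..2 for
# brick1..brick3; blames[i] holds the bricks that brick i has accumulated
# non-zero pending counters against.
blames = {0: set(), 1: set(), 2: set()}

def write_while_down(down):
    # a write succeeds on the live bricks; each of them records blame
    # against the brick that missed the write
    for i in blames:
        if i != down:
            blames[i].add(down)

write_while_down(0)  # (a) brick1 down, at least one write succeeds
write_while_down(1)  # (b) brick1 back, brick2 down, no self-heal in between
write_while_down(2)  # (c) brick2 back, brick3 down, no self-heal in between

# Every brick is now blamed by the other two, so no brick qualifies as a
# clean self-heal source: split-brain.
print(blames)
```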
            <br>
            With the outcast implementation, case 2 will also be immune
            to split-brain.<br>
            <br>
            That leaves implementation errors in changelog marking as
            the only way to get split-brain in AFR. If we test that path
            thoroughly and fix such problems, we can make AFR immune to
            split-brain :-).<span class="HOEnZb"><font
                color="#888888"><br>
                <br>
                Pranith<br>
              </font></span>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
                class="">
                So, is there any way to prevent this completely?  Some
                AFR enhancements,<br>
                such as the oft-promised "outcast" feature[1], might
                have helped.<br>
                NSR[2] is immune to this particular problem.  "Policy
                based split brain<br>
                resolution"[3] might have resolved it automatically
                instead of merely<br>
                flagging it.  Unfortunately, those are all in the
                future.  For now, I'd<br>
                say the best approach is to resolve the conflict
                manually and try to<br>
                move on.  Unless there's more going on than meets the
                eye, recurrence<br>
                should be very unlikely.<br>
                <br>
                [1] <a moz-do-not-send="true"
href="http://www.gluster.org/community/documentation/index.php/Features/outcast"
                  target="_blank">http://www.gluster.org/community/documentation/index.php/Features/outcast</a><br>
                <br>
                [2] <a moz-do-not-send="true"
href="http://www.gluster.org/community/documentation/index.php/Features/new-style-replication"
                  target="_blank">http://www.gluster.org/community/documentation/index.php/Features/new-style-replication</a><br>
                <br>
                [3] <a moz-do-not-send="true"
href="http://www.gluster.org/community/documentation/index.php/Features/pbspbr"
                  target="_blank">http://www.gluster.org/community/documentation/index.php/Features/pbspbr</a><br>
              </span><span class="">
                _______________________________________________<br>
                Gluster-users mailing list<br>
                <a moz-do-not-send="true"
                  href="mailto:Gluster-users@gluster.org"
                  target="_blank">Gluster-users@gluster.org</a><br>
                <a moz-do-not-send="true"
                  href="http://supercolony.gluster.org/mailman/listinfo/gluster-users"
                  target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>
              </span></blockquote>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>