<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 02/22/2014 05:44 PM, Greg Scott

      wrote:<br>

    </div>

    <blockquote

      cite="mid:141ac001d70043f1b38101ade5606472@mail2013.infrasupport.local"

      type="cite">

      <p class="MsoNormal">I have 2 nodes named fw1 and fw2.&nbsp; When I

        ifdown the NIC I&#8217;m using for Gluster on either node, that node

        cannot see&nbsp; its Gluster volume, but the other node can see it

        after a timeout.&nbsp; As soon as I ifup that NIC, everyone can see

        everything again.</p>


      <p class="MsoNormal">Is this expected behavior?&nbsp; When that

        interconnect drops, I want both nodes to see their own local

        copy and then sync everything back up when the interconnect

        connects again.&nbsp;

      </p>

    </blockquote>

    If a client loses communication on an open TCP connection to a
    server, it waits for a timeout period (the ping-timeout, 42 seconds
    by default) for communication to resume. The wait exists because
    dropping and re-establishing hundreds, or potentially tens of
    thousands, of file descriptors and locks is a very expensive
    process that disrupts the entire environment.<br>

    <br>

    With the test process you're describing, the clients are connected
    to both servers' IP addresses (hopefully found through hostname
    resolution) on the same network. When you down a NIC, its address
    is no longer available. Not only can the remote client not connect
    to it, but your local client cannot either, because the address no
    longer exists.<br>

    <br>

    In your real-life concern, losing the interconnect would not affect
    the existence of either machine's IP address, so after the
    ping-timeout, operations would resume in a split-brain
    configuration, with each client writing only to the server it can
    still reach. As long as no changes were made to the same file on
    both sides, the self-heal will do exactly what you expect when the
    connection is reestablished.<br>

    <br>

    However, what you're counting on is also the most common cause of
    split-brain: each client, connected to only one server,
    independently modifies the same file. When the connection is
    reestablished and the self-heal runs, that file is marked as
    split-brain and is inaccessible from the client mount until an
    admin intervenes to resolve it.<br>

    <br>
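    As a rough illustration of the two outcomes above, here is a small
    Python sketch. It is only a simplified model, not how Gluster
    implements self-heal (the real replication translator tracks
    pending changes in extended attributes on each brick). The function
    name and the file names are made up for the example.<br>
    <pre>
```python
def classify_heal(changed_on_fw1, changed_on_fw2):
    """Simplified model of what self-heal can do once the link is back.

    Each argument lists the files modified on that node while the two
    nodes could not see each other.
    """
    fw1 = set(changed_on_fw1)
    fw2 = set(changed_on_fw2)
    return {
        # Changed on one side only: the other side is simply stale,
        # so the heal has an unambiguous source copy.
        "heal_from_fw1": sorted(fw1 - fw2),
        "heal_from_fw2": sorted(fw2 - fw1),
        # Changed on both sides: there is no single good copy, so the
        # file is flagged as split-brain and needs an admin.
        "split_brain": sorted(fw1.intersection(fw2)),
    }


# Hypothetical file names, purely for illustration.
result = classify_heal(["rules.conf", "a.log"], ["b.log", "rules.conf"])
for outcome, files in result.items():
    print(outcome, files)
```
    </pre>
    Only the file touched on both nodes ends up in the split-brain
    bucket. Everything else heals in one direction.<br>
    <br>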

    You can avoid the split-brain using a couple of quorum techniques.
    The one that would seem to satisfy your requirements leaves your
    volume read-only for the duration of the outage.<br>
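    <br>
    To make the quorum idea more concrete, here is a hedged Python
    sketch of the decision a client makes about whether writes are
    allowed. I'm assuming the client-side quorum volume options
    (cluster.quorum-type and cluster.quorum-count), which are not named
    above, and I'm assuming fw1 hosts the first brick. The function is
    a simplified model, not Gluster code.<br>
    <pre>
```python
def writes_allowed(bricks_up, quorum_type="none", quorum_count=0):
    """Model of client-side quorum for one replica set.

    bricks_up is a list of booleans, one per brick in volume order,
    describing which bricks this particular client can reach.
    """
    total = len(bricks_up)
    up = sum(bricks_up)
    if quorum_type == "none":
        # No quorum enforcement: any reachable brick accepts writes.
        return up >= 1
    if quorum_type == "fixed":
        # Writes need at least quorum_count reachable bricks.
        return up >= quorum_count
    if quorum_type == "auto":
        # More than half of the bricks, or exactly half if that half
        # includes the first brick (the tie-breaker).
        if up * 2 > total:
            return True
        return up * 2 == total and bricks_up[0]
    raise ValueError("unknown quorum type: " + quorum_type)


# During the outage each node only reaches its own brick.
views = {"fw1": [True, False], "fw2": [False, True]}
for node, view in views.items():
    print(node,
          "none:", writes_allowed(view, "none"),
          "auto:", writes_allowed(view, "auto"),
          "fixed/2:", writes_allowed(view, "fixed", 2))
```
    </pre>
    In this model, a fixed quorum of 2 on a two-brick replica refuses
    writes on both nodes while the link is down, so no file can diverge.
    With "auto", only the node holding the first brick keeps writing.<br>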

  </body>

</html>