<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 02/22/2014 05:44 PM, Greg Scott
wrote:<br>
</div>
<blockquote
cite="mid:141ac001d70043f1b38101ade5606472@mail2013.infrasupport.local"
type="cite">
<p class="MsoNormal">I have 2 nodes named fw1 and fw2. When I
ifdown the NIC I’m using for Gluster on either node, that node
cannot see its Gluster volume, but the other node can see it
after a timeout. As soon as I ifup that NIC, everyone can see
everything again.</p>
<p class="MsoNormal">Is this expected behavior? When that
interconnect drops, I want both nodes to see their own local
copy and then sync everything back up when the interconnect
connects again.
</p>
</blockquote>
If a client loses communication on an open TCP connection to a
server, it waits for a timeout period (42 seconds by default) for
the communication to resume. The wait exists because dropping and
re-establishing hundreds to potentially tens of thousands of file
descriptors and locks is a very expensive process that disrupts the
entire environment.<br>
<br>
With the test process you're describing, the clients are connected
to both servers' IP addresses (hopefully found through hostname
resolution) on the same network. When you down a NIC, its address is
no longer available. Not only can the remote client not connect to
it, but your local client cannot either, because the address no
longer exists.<br>
<br>
In the real-life failure you're concerned about, losing the
interconnect would not remove either machine's IP address, so after
the ping-timeout, operations would resume in a split-brain
configuration, with each node working against its own local copy. As
long as no changes were made to the same file on both servers, when
the connection is reestablished, the self-heal will do exactly what
you expect.<br>
<br>
However, what you're counting on is the most common cause of
split-brain: each client, connected to only one server, independently
modifies the same file. When the connection is reestablished, the
self-heal runs and that file is marked as split-brain. It then stays
inaccessible from the client mount until it's resolved by admin
intervention.<br>
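<br>
To make those two outcomes concrete, here is a toy sketch in Python.
It is not how Gluster's self-heal is implemented. The function name
heal_plan and the file names are made up for illustration, and it
assumes we simply know which files each server changed during the
outage.<br>
<pre>
```python
def heal_plan(changed_on_fw1, changed_on_fw2):
    """Classify files modified while fw1 and fw2 could not see each other.

    Each argument is a set of file names changed on that server during
    the outage (hypothetical input, for illustration only).
    """
    # A file changed on both sides has no obvious "good" copy.
    conflicts = changed_on_fw1.intersection(changed_on_fw2)
    return {
        # Changed on one side only: the newer copy can simply replace
        # the stale one on the other server.
        "copy_fw1_to_fw2": sorted(changed_on_fw1 - conflicts),
        "copy_fw2_to_fw1": sorted(changed_on_fw2 - conflicts),
        # Changed on both sides: needs an admin to pick a winner.
        "split_brain": sorted(conflicts),
    }


plan = heal_plan({"rules.conf", "leases.db"}, {"leases.db", "hosts"})
print(plan["copy_fw1_to_fw2"])  # ['rules.conf']
print(plan["copy_fw2_to_fw1"])  # ['hosts']
print(plan["split_brain"])      # ['leases.db']
```
</pre>
Only the file touched on both sides ends up needing manual
attention; everything else heals in one direction or the other.<br>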
<br>
You can avoid the split-brain by using one of a couple of quorum
techniques. The one that would seem to satisfy your requirements
leaves your volume read-only for the duration of the outage.<br>
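<br>
As a rough illustration of why quorum prevents the conflict, here is
a toy model in Python. It assumes a fixed-count quorum where a client
may write only if it can reach a required number of replicas. The
function name volume_mode and its parameters are hypothetical, and
this is a sketch of the idea, not Gluster's actual logic.<br>
<pre>
```python
def volume_mode(reachable_replicas, quorum_count):
    """Return the access mode a client gets under a fixed-count quorum.

    reachable_replicas: how many replicas this client can talk to.
    quorum_count: how many replicas must be reachable to allow writes
    (an assumed setting for this sketch).
    """
    if reachable_replicas >= quorum_count:
        return "read-write"
    # Too few replicas visible: refuse writes so the copies cannot diverge.
    return "read-only"


# Two replicas (fw1 and fw2), both required for writes.
print(volume_mode(2, 2))  # read-write
# Interconnect down: each node reaches only its own replica.
print(volume_mode(1, 2))  # read-only
```
</pre>
Since neither side can write while the interconnect is down, no file
can be changed on both servers, and there is nothing left to
conflict when the link comes back.<br>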
</body>
</html>