<div dir="ltr">Hello,<div><br></div><div>Interesting, there seem to be several of us users with issues regarding recovery, but there are few to no replies... ;-)</div><div><br></div><div>I did some more testing over the weekend. Same initial workload (two glusterfs servers, one client that continuously</div>

<div>updates a file with timestamps) and then two easy testcases:</div><div><br></div><div>1. one of the glusterfs servers is constantly rebooting (just an initscript that sleeps for 60 seconds before issuing &quot;reboot&quot;)</div>

<div><br></div><div>2. similar to 1, but instead of rebooting itself, it reboots the other glusterfs server, so that the result is that a server</div><div>    comes up, waits for a bit and then reboots the other server.</div>
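<div><br></div><div>A minimal sketch of what the initscript for these two testcases might look like. The helper name reboot_after and the DELAY and REBOOT_CMD variables are made up for illustration; for testcase 2, how the other server gets rebooted (ssh in the note below) is also just an assumption.</div>

```shell
#!/bin/sh
# Sketch of the reboot-loop initscript for testcases 1 and 2.
# DELAY and REBOOT_CMD are hypothetical knobs, not part of the original setup.

DELAY="${DELAY:-60}"                 # seconds to stay up before rebooting
REBOOT_CMD="${REBOOT_CMD:-reboot}"   # command that performs the reboot

# Sleep for the given number of seconds, then run the reboot command.
reboot_after() {
	sleep "$1"
	$REBOOT_CMD
}
```

<div>For testcase 1 the script would end with a call to reboot_after &quot;$DELAY&quot;. For testcase 2 one could set REBOOT_CMD to something like &quot;ssh root@gs2 reboot&quot; before making the same call.</div>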

<div><br></div><div>During the whole weekend this has progressed nicely. The client is running all the time without issues and the glusterfs server</div><div>that comes back (either always the same one or either of the two servers, depending on the testcase shown above) is actively getting into</div>

<div>sync and updates its copy of the file.</div><div><br></div><div>So it seems to me that we need to look deeper into the recovery case (of course, but it is interesting to know about the</div><div>nice&amp;easy use cases as well). I&#39;m surprised that the recovery from a failover (to restore the redundancy) isn&#39;t getting</div>

<div>more attention here. Are we (and others that have difficulties in this area) running an unusual use case?</div><div><br></div><div>BR,</div><div>Per</div><div><div class="gmail_extra"><br><br><div class="gmail_quote">

On Wed, Dec 4, 2013 at 12:17 PM, Per Hallsmark <span dir="ltr">&lt;<a href="mailto:per@hallsmark.se" target="_blank">per@hallsmark.se</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr"><div>Hello,</div><div><br></div><div>I&#39;ve found GlusterFS to be an interesting project. I don&#39;t have much experience with it</div><div>(although I do with similar use cases on DRBD+NFS setups), so I set up a</div>

<div>

testcase to try out failover and recovery.</div><div><br></div><div>For this I have a setup with two glusterfs servers (each is a VM) and one client (also a VM).</div><div>I&#39;m using GlusterFS 3.4 btw.</div><div><br></div>


<div>The servers manage a gluster volume created as:</div><div><br></div><div>gluster volume create testvol rep 2 transport tcp gs1:/export/vda1/brick gs2:/export/vda1/brick</div><div>gluster volume start testvol</div><div>


gluster volume set testvol network.ping-timeout<span style="white-space:pre-wrap">        </span>5</div><div><br></div><div>Then the client mounts this volume as:</div><div>mount -t glusterfs gs1:/testvol /import/testvol</div>

<div><br></div><div>Everything seems to work well in normal use cases: I can write to and read from the volume, take servers down and bring them up again, etc.</div><div><br></div><div>As a fault scenario, I&#39;m testing a fault injection like this:</div>


<div><br></div><div>1. continuously writing timestamps to a file on the volume from the client. It is automated in a small test script like:</div><div><div>:~/glusterfs-test$ cat scripts/test-gfs-client.sh </div>

<div>#!/bin/sh</div><div><br></div><div>gfs=/import/testvol</div><div><br></div><div>while true; do</div><div><span style="white-space:pre-wrap">        </span>date +%s &gt;&gt; $gfs/timestamp.txt</div><div><span style="white-space:pre-wrap">        </span>ts=`tail -1 $gfs/timestamp.txt`</div>


<div><span style="white-space:pre-wrap">        </span>md5sum=`md5sum $gfs/timestamp.txt | cut -f1 -d&quot; &quot;`</div><div><span style="white-space:pre-wrap">        </span>echo &quot;Timestamp = $ts, md5sum = $md5sum&quot;</div>

<div><span style="white-space:pre-wrap">        </span>sleep 1</div><div>done</div><div>:~/glusterfs-test$</div></div><div><br></div><div><div>As can be seen, the client is quite a simple user of the glusterfs volume: low data rate and a single user, for example.</div>


</div><div><br></div><div><br></div><div>2. disabling ethernet in one of the VMs (ifconfig eth0 down) to simulate a broken network</div><div><br></div><div>3. After a short while, the failed server is brought back up again (ifconfig eth0 up)</div>
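<div><br></div><div>The timestamp file from step 1 can also be used to estimate how long the client was blocked during a fault: the loop normally appends one value about every second, so a much larger jump between two consecutive lines points at a hang. A minimal sketch, where the function name max_gap is made up for illustration:</div>

```shell
#!/bin/sh
# Print the largest gap, in seconds, between consecutive lines of a file
# written with "date +%s" (one epoch timestamp per line).
# Prints 0 for an empty or single-line file.
max_gap() {
	awk 'NR > 1 { gap = $1 - prev; if (gap > max) max = gap }
	     { prev = $1 }
	     END { print max + 0 }' "$1"
}
```

<div>On the client this could be run as max_gap /import/testvol/timestamp.txt after a fault injection round.</div>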


<div><br></div><div>Steps 2 and 3 are also automated in a test script like:</div><div><br></div><div><div>:~/glusterfs-test$ cat scripts/fault-injection.sh </div><div>#!/bin/sh</div><div><br></div><div># fault injection script tailored for two glusterfs nodes named gs1 and gs2</div>


<div><br></div><div>if [ &quot;$(hostname)&quot; = &quot;gs1&quot; ]; then</div><div><span style="white-space:pre-wrap">        </span>peer=&quot;gs2&quot;</div><div>else</div><div><span style="white-space:pre-wrap">        </span>peer=&quot;gs1&quot;</div>


<div>fi</div><div><br></div><div>inject_eth_fault() {</div><div><span style="white-space:pre-wrap">        </span>echo &quot;network down...&quot;</div><div><span style="white-space:pre-wrap">        </span>ifconfig eth0 down</div>

<div><span style="white-space:pre-wrap">        </span>sleep 10</div><div><span style="white-space:pre-wrap">        </span>ifconfig eth0 up</div><div><span style="white-space:pre-wrap">        </span>echo &quot;... and network up again.&quot;</div>


<div>}</div><div><br></div><div>recover() {</div><div><span style="white-space:pre-wrap">        </span>echo &quot;recovering from fault...&quot;</div><div><span style="white-space:pre-wrap">        </span>service glusterd restart</div>


<div>}</div><div><br></div><div>while true; do</div><div><span style="white-space:pre-wrap">        </span>sleep 60</div><div><span style="white-space:pre-wrap">        </span>if [ ! -f /tmp/nofault ]; then</div><div><span style="white-space:pre-wrap">                </span>if ping -c 1 $peer; then</div>


<div><span style="white-space:pre-wrap">                        </span>inject_eth_fault</div><div><span style="white-space:pre-wrap">                        </span>recover</div><div><span style="white-space:pre-wrap">                </span>fi</div><div><span style="white-space:pre-wrap">        </span>fi</div>


<div>done</div><div>:~/glusterfs-test$</div></div><div><br></div><div><br></div><div>I then see that:</div><div><br></div><div>A. This goes well the first time: one server leaves the cluster and the client hangs for about 8 seconds before being able to write to the volume again.</div>


<div><br></div><div>B. When the failed server comes back, I can verify from both servers that they see each other, and &quot;gluster peer status&quot; shows that each believes the other is in the connected state.</div><div><br></div>


<div>C. When the failed server comes back, it does not automatically seek active participation in syncing the volume etc (the local storage timestamp file isn&#39;t updated).</div><div><br></div><div>D. If I restart the glusterd service (service glusterd restart), the failed node seems to get back to how it was before. Not always though... The chance is higher if I have a long time between fault injections (long = 60 sec or so, with a forced faulty state of 10 sec)</div>


<div>With a period of a few minutes, I could have the cluster serving the client OK for 8+ hours.</div><div>Shortening the period, I&#39;m easily down to about 10-15 minutes.</div><div><br></div><div>

E. Sooner or later I enter a state where the two servers seem to be up, each seeing its peer (gluster peer status) and such, but neither is serving the volume to the client.</div>

<div>I&#39;ve tried to &quot;heal&quot; the volume in different ways but it doesn&#39;t help. Sometimes it is just that one of the timestamp copies on</div><div>the servers is ahead, which is the simpler case, but sometimes both timestamp files have data appended at the end that the other doesn&#39;t have.</div>
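<div><br></div><div>To tell these two situations apart, the two brick copies of the timestamp file can be compared byte for byte. Below is a minimal sketch, assuming both copies have been fetched to one machine (for example from /export/vda1/brick/timestamp.txt on gs1 and gs2); the function name compare_copies and its output strings are made up for illustration. It relies on the file being append-only and on &quot;head -c&quot;, which is widely available but not in every POSIX userland.</div>

```shell
#!/bin/sh
# Classify two copies of an append-only file. Prints one of:
#   "in sync"       identical content
#   "first ahead"   the second copy is a strict prefix of the first
#   "second ahead"  the first copy is a strict prefix of the second
#   "diverged"      each copy has data the other lacks
compare_copies() {
	a="$1"
	b="$2"
	# Byte sizes; the arithmetic expansion strips any padding from wc.
	size_a=$(( $(wc -c < "$a") ))
	size_b=$(( $(wc -c < "$b") ))

	if cmp -s "$a" "$b"; then
		echo "in sync"
	elif [ "$size_a" -gt "$size_b" ] && head -c "$size_b" "$a" | cmp -s - "$b"; then
		echo "first ahead"
	elif [ "$size_b" -gt "$size_a" ] && head -c "$size_a" "$b" | cmp -s - "$a"; then
		echo "second ahead"
	else
		echo "diverged"
	fi
}
```

<div>A result of &quot;diverged&quot; would correspond to the harder case, where both copies received writes the other one missed.</div>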


<div><br></div><div>To the questions: </div><div><br></div><div>* Is it the case that, from a design perspective, the choice of the glusterfs team is that one shouldn&#39;t rely solely on the glusterfs daemons being able to recover from a faulty state? Is there a need for cluster manager services (like heartbeat, for example) to be part of the setup? That would make experience C understandable, and one could then use heartbeat or similar packages to start/stop services.</div>


<div><br></div><div>* What would then be the recommended procedure to recover from a faulty glusterfs node? (so that experiences D and E do not happen)</div><div><br></div><div>* What is the expected failover timing (of course depending on config, but say with a given ping timeout etc)?</div>


<div>  and expected recovery timing (with similar dependency on config)?</div><div><br></div><div>* What is the glusterfs team testing, and how, to make sure that the failover, recovery/healing functionality etc works?</div><div><br>


</div><div>Any opinion on whether the testcase is bad is of course also very welcome.</div><div><br></div><div>Best regards,</div><div>Per</div></div>

</blockquote></div><br></div></div></div>