<div dir="ltr"><div>Hi all,<br><br>Is the 42s timeout tunable?<br><br></div><div>Should the default be made lower, e.g. 3 seconds?<br></div><div><br></div>Thanks.<br><br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Tue, Feb 11, 2014 at 3:37 PM, Kaushal M <span dir="ltr"><<a href="mailto:kshlmster@gmail.com" target="_blank">kshlmster@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The 42-second hang is most likely the ping timeout of the client translator.<br>
<br>
What most likely happened is this: the brick on annex3 was serving<br>
the read when you pulled its plug. Because the plug was pulled, the<br>
connection between the client and annex3 wasn't gracefully terminated,<br>
and the client translator still sees the connection as alive. Because<br>
of this, the next fop is also sent to annex3, but it times out since<br>
annex3 is dead. After the timeout, the connection is marked as dead,<br>
and the associated client xlator is marked as down. Since afr now<br>
knows annex3 is dead, it sends the next fop to annex4, which is<br>
still alive.<br>
<br>
These kinds of unclean connection terminations are currently handled<br>
only by request/ping timeouts. You could set the ping timeout to a<br>
lower value to reduce the detection time.<br>
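For example (a sketch, assuming the volume from James's test is named 'puppet'; the option is network.ping-timeout, which defaults to 42 seconds, but check the exact syntax against your GlusterFS version):<br>

```shell
# Show the current ping timeout for the volume (42s if unset)
gluster volume get puppet network.ping-timeout

# Lower the failure-detection window to 10 seconds
gluster volume set puppet network.ping-timeout 10
```

Note that setting this very low can cause spurious disconnects on a loaded or lossy network, so don't go too far below the default without testing.<br>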
<span class="HOEnZb"><font color="#888888"><br>
~kaushal<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Tue, Feb 11, 2014 at 11:57 AM, Krishnan Parthasarathi<br>
<<a href="mailto:kparthas@redhat.com">kparthas@redhat.com</a>> wrote:<br>
> James,<br>
><br>
> Could you provide the logs of the mount process, where you see the hang for 42s?<br>
> My initial guess, seeing 42s, is that the client translator's ping timeout<br>
> is in play.<br>
><br>
> I would encourage you to report a bug and attach relevant logs.<br>
> If the observed issue turns out to be an acceptable/explicable<br>
> behavioural quirk of glusterfs, then we could close the bug :-)<br>
><br>
> cheers,<br>
> Krish<br>
> ----- Original Message -----<br>
>> It's been a while since I did some gluster replication testing, so I<br>
>> spun up a quick cluster *cough, plug* using puppet-gluster+vagrant (of<br>
>> course) and here are my results.<br>
>><br>
>> * Setup is a 2x2 distributed-replicated cluster<br>
>> * Hosts are named: annex{1..4}<br>
>> * Volume name is 'puppet'<br>
>> * Client VMs mount (fuse) the volume.<br>
>><br>
>> * On the client:<br>
>><br>
>> # cd /mnt/gluster/puppet/<br>
>> # dd if=/dev/urandom of=random.51200 count=51200<br>
>> # sha1sum random.51200<br>
>> # rsync -v --bwlimit=10 --progress random.51200 root@localhost:/tmp<br>
>><br>
>> * This gives me about an hour to mess with the bricks...<br>
>> * By looking on the hosts directly, I see that the random.51200 file is<br>
>> on annex3 and annex4...<br>
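(You can also confirm which bricks back a file without logging into the hosts, via the pathinfo virtual xattr on the fuse mount; a sketch assuming the mount point above, and the output format varies across GlusterFS versions:)<br>

```shell
# On the client: ask the fuse mount which bricks hold the file
getfattr -n trusted.glusterfs.pathinfo /mnt/gluster/puppet/random.51200
```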
>><br>
>> * On annex3:<br>
>> # poweroff<br>
>> [host shuts down...]<br>
>><br>
>> * On client1:<br>
>> # time ls<br>
>> random.51200<br>
>><br>
>> real 0m42.705s<br>
>> user 0m0.001s<br>
>> sys 0m0.002s<br>
>><br>
>> [hangs for about 42 seconds, and then returns successfully...]<br>
>><br>
>> * I then power up annex3, and then pull the plug on annex4. The same sort<br>
>> of thing happens... It hangs for 42 seconds, but then everything works<br>
>> as normal. This is of course the cluster timeout value, and the answer to<br>
>> life, the universe, and everything.<br>
>><br>
>> Question: Why doesn't glusterfs automatically flip over to using the<br>
>> other available host right away? If you agree, I'll report this as a<br>
>> bug. If there's a way to do this, let me know.<br>
>><br>
>> Apart from the delay, glad that this is of course still HA ;)<br>
>><br>
>> Cheers,<br>
>> James<br>
>> @purpleidea (twitter/irc)<br>
>> <a href="https://ttboj.wordpress.com/" target="_blank">https://ttboj.wordpress.com/</a><br>
>><br>
>><br>
>> _______________________________________________<br>
>> Gluster-devel mailing list<br>
>> <a href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a><br>
>> <a href="https://lists.nongnu.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a><br>
>><br>
> _______________________________________________<br>
> Gluster-users mailing list<br>
> <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
> <a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Sharuzzaman Ahmat Raslan
</div>