<div dir="ltr"><div>Hi all,<br><br>Is the 42s timeout tunable?<br><br></div><div>Should the default be made lower, e.g. 3 seconds?<br></div><div><br></div>Thanks.<br><br><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">


On Tue, Feb 11, 2014 at 3:37 PM, Kaushal M <span dir="ltr">&lt;<a href="mailto:kshlmster@gmail.com" target="_blank">kshlmster@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


The 42 second hang is most likely the ping timeout of the client translator.<br>

<br>

What most likely happened was that the brick on annex3 was being used<br>

for the read when you pulled its plug. When you pulled the plug, the<br>

connection between the client and annex3 wasn&#39;t gracefully terminated<br>

and the client translator still saw the connection as alive. Because<br>

of this, the next fop was also sent to annex3, but it timed out as<br>

annex3 was dead. After the timeout happened, the connection was marked as<br>

dead, and the associated client xlator was marked as down. Since afr<br>

now knows annex3 is dead, it sends the next fop to annex4, which is<br>

still alive.<br>

<br>

These kinds of unclean connection terminations are only handled by<br>

request/ping timeouts currently. You could set the ping timeout values<br>

to be lower, to reduce the detection time.<br>
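<br>
A minimal sketch of lowering the timeout, assuming the volume from this thread (puppet) and an illustrative value of 10 seconds, which is an assumption and not a recommendation. network.ping-timeout is the volume option whose default is 42 seconds:<br>
<pre>
# Run on any server in the trusted pool.
# "puppet" is the volume name used in this thread; 10 is only an example value.
gluster volume set puppet network.ping-timeout 10

# The changed value should then be listed under "Options Reconfigured".
gluster volume info puppet
</pre>
Very low values can cause spurious disconnects on a loaded or lossy network, and a reconnect is expensive because the client has to re-acquire its locks and other resources, so a value as low as a few seconds is probably worth testing before relying on it.<br>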

<span class="HOEnZb"><font color="#888888"><br>

~kaushal<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

On Tue, Feb 11, 2014 at 11:57 AM, Krishnan Parthasarathi<br>

&lt;<a href="mailto:kparthas@redhat.com">kparthas@redhat.com</a>&gt; wrote:<br>

&gt; James,<br>

&gt;<br>

&gt; Could you provide the logs of the mount process, where you see the hang for 42s?<br>

&gt; My initial guess, seeing 42s, is that the client translator&#39;s ping timeout<br>

&gt; is in play.<br>

&gt;<br>

&gt; I would encourage you to report a bug and attach relevant logs.<br>

&gt; If the issue (observed) turns out to be an acceptable/explicable behavioural<br>

&gt; quirk of glusterfs, then we could close the bug :-)<br>

&gt;<br>

&gt; cheers,<br>

&gt; Krish<br>

&gt; ----- Original Message -----<br>

&gt;&gt; It&#39;s been a while since I did some gluster replication testing, so I<br>

&gt;&gt; spun up a quick cluster *cough, plug* using puppet-gluster+vagrant (of<br>

&gt;&gt; course) and here are my results.<br>

&gt;&gt;<br>

&gt;&gt; * Setup is a 2x2 distributed-replicated cluster<br>

&gt;&gt; * Hosts are named: annex{1..4}<br>

&gt;&gt; * Volume name is &#39;puppet&#39;<br>

&gt;&gt; * Client VMs mount the volume (fuse).<br>

&gt;&gt;<br>

&gt;&gt; * On the client:<br>

&gt;&gt;<br>

&gt;&gt; # cd /mnt/gluster/puppet/<br>

&gt;&gt; # dd if=/dev/urandom of=random.51200 count=51200<br>

&gt;&gt; # sha1sum random.51200<br>

&gt;&gt; # rsync -v --bwlimit=10 --progress random.51200 root@localhost:/tmp<br>

&gt;&gt;<br>

&gt;&gt; * This gives me about an hour to mess with the bricks...<br>

&gt;&gt; * By looking on the hosts directly, I see that the random.51200 file is<br>

&gt;&gt; on annex3 and annex4...<br>

&gt;&gt;<br>

&gt;&gt; * On annex3:<br>

&gt;&gt; # poweroff<br>

&gt;&gt; [host shuts down...]<br>

&gt;&gt;<br>

&gt;&gt; * On client1:<br>

&gt;&gt; # time ls<br>

&gt;&gt; random.51200<br>

&gt;&gt;<br>

&gt;&gt; real    0m42.705s<br>

&gt;&gt; user    0m0.001s<br>

&gt;&gt; sys     0m0.002s<br>

&gt;&gt;<br>

&gt;&gt; [hangs for about 42 seconds, and then returns successfully...]<br>

&gt;&gt;<br>

&gt;&gt; * I then power up annex3, and then pull the plug on annex4. The same sort<br>

&gt;&gt; of thing happens... It hangs for 42 seconds, but then everything works<br>

&gt;&gt; as normal. This is of course the cluster timeout value and the answer to<br>

&gt;&gt; life, the universe and everything.<br>

&gt;&gt;<br>

&gt;&gt; Question: Why doesn&#39;t glusterfs automatically flip over to using the<br>

&gt;&gt; other available host right away? If you agree, I&#39;ll report this as a<br>

&gt;&gt; bug. If there&#39;s a way to do this, let me know.<br>

&gt;&gt;<br>

&gt;&gt; Apart from the delay, glad that this is of course still HA ;)<br>

&gt;&gt;<br>

&gt;&gt; Cheers,<br>

&gt;&gt; James<br>

&gt;&gt; @purpleidea (twitter/irc)<br>

&gt;&gt; <a href="https://ttboj.wordpress.com/" target="_blank">https://ttboj.wordpress.com/</a><br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; _______________________________________________<br>

&gt;&gt; Gluster-devel mailing list<br>

&gt;&gt; <a href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a><br>

&gt;&gt; <a href="https://lists.nongnu.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a><br>

&gt;&gt;<br>

&gt; _______________________________________________<br>

&gt; Gluster-users mailing list<br>

&gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

&gt; <a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>

<br>


</div></div></blockquote></div><br><br clear="all"><br>-- <br>Sharuzzaman Ahmat Raslan

</div>