<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><pre>Hi Pranith,</pre><pre>     Thank you for your detailed reply. Now I know what the problem is, and I will trace it.</pre><pre>     Nice reply, thanks.</pre><pre>Best Regards.</pre><pre>Jules Wang</pre><pre><br>At&nbsp;2012-08-01&nbsp;17:13:51,&nbsp;"Pranith&nbsp;Kumar&nbsp;Karampuri"&nbsp;&lt;pkarampu@redhat.com&gt;&nbsp;wrote:
&gt;Jules,
&gt;&nbsp;&nbsp;&nbsp;&nbsp;When&nbsp;a&nbsp;frame&nbsp;hits&nbsp;its&nbsp;time-out,&nbsp;'rpc/rpc-lib/src/rpc-clnt.c:138:call_bail&nbsp;(void&nbsp;*data)'&nbsp;is&nbsp;triggered.
&gt;When&nbsp;the&nbsp;client&nbsp;observes&nbsp;a&nbsp;network&nbsp;disconnection&nbsp;(ping-timer-expiry,&nbsp;etc.),&nbsp;it&nbsp;triggers&nbsp;'rpc/rpc-lib/src/rpc-clnt.c:341:saved_frames_unwind&nbsp;(struct&nbsp;saved_frames&nbsp;*saved_frames)'.&nbsp;When&nbsp;a&nbsp;node&nbsp;goes&nbsp;down,&nbsp;the&nbsp;ping&nbsp;timer&nbsp;expires&nbsp;and&nbsp;the&nbsp;frames&nbsp;are&nbsp;unwound&nbsp;within&nbsp;at&nbsp;most&nbsp;~42&nbsp;seconds.&nbsp;So&nbsp;in&nbsp;the&nbsp;VM&nbsp;scenario&nbsp;it&nbsp;won't&nbsp;hang&nbsp;for&nbsp;30&nbsp;minutes.
&gt;To&nbsp;answer&nbsp;your&nbsp;actual&nbsp;question,&nbsp;why&nbsp;such&nbsp;a&nbsp;big&nbsp;frame&nbsp;timeout:&nbsp;AFR&nbsp;takes&nbsp;entry-locks&nbsp;while&nbsp;performing&nbsp;self-heals,&nbsp;which&nbsp;block&nbsp;other&nbsp;entry&nbsp;fops&nbsp;like&nbsp;create,&nbsp;delete,&nbsp;etc.&nbsp;The&nbsp;timeout&nbsp;is&nbsp;set&nbsp;large&nbsp;enough&nbsp;for&nbsp;those&nbsp;blocked&nbsp;entry&nbsp;operations&nbsp;to&nbsp;succeed&nbsp;instead&nbsp;of&nbsp;timing&nbsp;out.
&gt;
&gt;AFR&nbsp;used&nbsp;to&nbsp;take&nbsp;a&nbsp;lock&nbsp;on&nbsp;the&nbsp;entire&nbsp;file&nbsp;to&nbsp;perform&nbsp;data-self-heal&nbsp;on&nbsp;a&nbsp;regular&nbsp;file;&nbsp;we&nbsp;managed&nbsp;to&nbsp;remove&nbsp;that.&nbsp;We&nbsp;are&nbsp;working&nbsp;on&nbsp;doing&nbsp;the&nbsp;same&nbsp;for&nbsp;entry-self-heal.&nbsp;Once&nbsp;that&nbsp;happens,&nbsp;we&nbsp;will&nbsp;be&nbsp;in&nbsp;a&nbsp;good&nbsp;position&nbsp;to&nbsp;change&nbsp;these&nbsp;to&nbsp;lower&nbsp;values.
&gt;
&gt;Pranith.
&gt;
&gt;-----&nbsp;Original&nbsp;Message&nbsp;-----
&gt;From:&nbsp;"Jules&nbsp;Wang"&nbsp;&lt;lancelotds@163.com&gt;
&gt;To:&nbsp;"devel"&nbsp;&lt;gluster-devel@nongnu.org&gt;
&gt;Sent:&nbsp;Wednesday,&nbsp;August&nbsp;1,&nbsp;2012&nbsp;1:55:47&nbsp;PM
&gt;Subject:&nbsp;[Gluster-devel]&nbsp;question&nbsp;on&nbsp;time-out&nbsp;parameters
&gt;
&gt;
&gt;
&gt;Hi&nbsp;all,&nbsp;
&gt;When&nbsp;I&nbsp;was&nbsp;tracking&nbsp;the&nbsp;bug&nbsp;https://bugzilla.redhat.com/show_bug.cgi?id=794699&nbsp;
&gt;
&gt;
&gt;I&nbsp;noticed&nbsp;that&nbsp;the&nbsp;default&nbsp;value&nbsp;of&nbsp;"ping-timeout"&nbsp;is&nbsp;42&nbsp;seconds&nbsp;and&nbsp;the&nbsp;default&nbsp;value&nbsp;of&nbsp;"frame-timeout"&nbsp;is&nbsp;1800&nbsp;seconds&nbsp;(30&nbsp;minutes)&nbsp;(in&nbsp;xlators/protocol/client/src/client.c).&nbsp;
&gt;
&gt;When&nbsp;a&nbsp;node&nbsp;is&nbsp;down&nbsp;(e.g.&nbsp;powered&nbsp;off),&nbsp;the&nbsp;volume&nbsp;will&nbsp;be&nbsp;out&nbsp;of&nbsp;service&nbsp;for&nbsp;a&nbsp;long&nbsp;time.&nbsp;If&nbsp;there&nbsp;is&nbsp;a&nbsp;VM&nbsp;running&nbsp;on&nbsp;the&nbsp;volume,&nbsp;it&nbsp;will&nbsp;probably&nbsp;crash.&nbsp;
&gt;
&gt;
&gt;So&nbsp;I&nbsp;wonder:&nbsp;why&nbsp;do&nbsp;we&nbsp;set&nbsp;such&nbsp;large&nbsp;values&nbsp;for&nbsp;these&nbsp;parameters?&nbsp;
&gt;
&gt;Best&nbsp;Regards.&nbsp;
&gt;
&gt;
&gt;Jules&nbsp;Wang&nbsp;
&gt;
&gt;
&gt;_______________________________________________
&gt;Gluster-devel&nbsp;mailing&nbsp;list
&gt;Gluster-devel@nongnu.org
&gt;https://lists.nongnu.org/mailman/listinfo/gluster-devel
</pre></div>