It&#39;s a kernel bug that was already fixed in RHEL 6.2. Try the 2.6.32-220 kernel; it will work fine for you.<br><br><div class="gmail_quote">On Fri, May 11, 2012 at 12:24 PM, 程耀东 <span dir="ltr">&lt;<a href="mailto:chyd@ihep.ac.cn" target="_blank">chyd@ihep.ac.cn</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

I am using gluster on a physical machine (CPU: 2 Xeon E5620, MEM: 24GB, 1 Gbps network link, CentOS 6.0, Linux 2.6.32-71.el6.x86_64). &nbsp;When reading or writing a small number of files, the system is fine. But when too many files are accessed concurrently, the problem sometimes occurs. &nbsp;After disabling THP entirely:<br>


echo never&gt; /sys/kernel/mm/redhat_transparent_hugepage/enabled<br>

It seems that the problem is resolved. I will continue to test it and see the results.<br>
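To confirm that the setting took effect, the active mode can be read back from the same sysfs file, where the kernel marks the current value with square brackets. The following is only a minimal sketch: the helper name thp_active_mode is hypothetical, and the second path is the upstream kernel location, included on the assumption that the script may also run on non-RHEL kernels.<br>

```shell
# Print the active THP mode from a sysfs "enabled" line,
# e.g. "always madvise [never]" prints "never".
# thp_active_mode is a hypothetical helper name used only for this sketch.
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# The first path is the RHEL/CentOS 6 location used in this message;
# the second is the upstream kernel location (an assumption for other systems).
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
  if [ -r "$f" ]; then
    echo "$f: $(thp_active_mode "$(cat "$f")")"
  fi
done
```

Note that the echo command only changes the running system, so the value has to be set again after a reboot.<br>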

<br>

Thanks,<br>

yaodong<br>

<br>

&gt; -----Original Message-----<br>

&gt; From: &quot;Bryan Whitehead&quot; &lt;<a href="mailto:driver@megahappy.net" target="_blank">driver@megahappy.net</a>&gt;<br>

&gt; Sent: Friday, May 11, 2012<br>

&gt; To: chyd &lt;<a href="mailto:chyd@ihep.ac.cn" target="_blank">chyd@ihep.ac.cn</a>&gt;<br>

&gt; Cc: gluster-users &lt;<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>&gt;<br>

&gt; Subject: Re: [Gluster-users] BUG: 764964 (dead lock)<br>

&gt;<br>

&gt; Can you explain where glusterfs is being used? Is this lockup<br>

&gt; happening on a VM running on a file disk image on top of gluster?<br>

&gt; Is gluster itself causing this timeout?<br>

&gt;<br>

&gt; On Wed, May 9, 2012 at 6:59 PM, chyd &lt;<a href="mailto:chyd@ihep.ac.cn" target="_blank">chyd@ihep.ac.cn</a>&gt; wrote:<br>

&gt; &gt; Hi all,<br>

&gt; &gt;<br>

&gt; &gt; I&#39;m repeatedly hitting a lockup when reading/writing large<br>

&gt; &gt; numbers of files. I cannot break out of the race in gdb; ps locks up<br>

&gt; &gt; when it tries to read that process&#39; data, and df (of course) locks up. No kill<br>

&gt; &gt; signals have any effect. Apart from getting the pid with &#39;pidstat -p ALL&#39;, I couldn&#39;t<br>

&gt; &gt; do anything. The only way out of it is to umount -f.<br>

&gt; &gt; I am using gluster 3.2.6 on CentOS 6.0 (2.6.32-71.el6.x86_64).<br>

&gt; &gt;<br>

&gt; &gt; The problem is the same as BUG 764964<br>

&gt; &gt; (<a href="https://bugzilla.redhat.com/show_bug.cgi?id=764964" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=764964</a>). It is difficult to<br>

&gt; &gt; reproduce, and I am trying to find a way to trigger it quickly. Has anyone else<br>

&gt; &gt; encountered this problem? How did you solve it?<br>

&gt; &gt;<br>

&gt; &gt; Attached dmesg log:<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: INFO: task glusterfs:27888 blocked for more<br>

&gt; &gt; than 120 seconds.<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: &quot;echo 0 &gt;<br>

&gt; &gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: glusterfs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; D ffff88033fc24b00&nbsp;&nbsp;&nbsp;&nbsp; 0<br>

&gt; &gt; 27888&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1 0x00000080<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: ffff8806310fbe48 0000000000000086<br>

&gt; &gt; 0000000000000000 ffff8806310fbc58<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: ffff8806310fbdc8 0000000000020010<br>

&gt; &gt; ffff8806310fbee8 00000001021f184a<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: ffff8806311a0678 ffff8806310fbfd8<br>

&gt; &gt; 0000000000010518 ffff8806311a0678<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: Call Trace:<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: [&lt;ffffffff814ca6b5&gt;]<br>

&gt; &gt; rwsem_down_failed_common+0x95/0x1d0<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: [&lt;ffffffff814ca813&gt;]<br>

&gt; &gt; rwsem_down_write_failed+0x23/0x30<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: [&lt;ffffffff81264253&gt;]<br>

&gt; &gt; call_rwsem_down_write_failed+0x13/0x20<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: [&lt;ffffffff814c9d12&gt;] ? down_write+0x32/0x40<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: [&lt;ffffffff8113b468&gt;] sys_munmap+0x48/0x80<br>

&gt; &gt; May 10 00:01:52 PPC-002 kernel: [&lt;ffffffff81013172&gt;]<br>

&gt; &gt; system_call_fastpath+0x16/0x1b<br>

&gt; &gt;<br>

&gt; &gt; Thank you in advance.<br>

&gt; &gt; Yaodong<br>

&gt; &gt;<br>

&gt; &gt; 2012-05-10<br>

&gt; &gt; ________________________________<br>

&gt; &gt;<br>

&gt; &gt; _______________________________________________<br>

&gt; &gt; Gluster-users mailing list<br>

&gt; &gt; <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

&gt; &gt; <a href="http://gluster.org/cgi-bin/mailman/listinfo/gluster-users" target="_blank">http://gluster.org/cgi-bin/mailman/listinfo/gluster-users</a><br>

&gt; &gt;<br>

<br>


</blockquote></div><br>