Thanks for the report, Ian. I have filed a bug report: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=809982">https://bugzilla.redhat.com/show_bug.cgi?id=809982</a><br><br><div class="gmail_quote">On Wed, Apr 4, 2012 at 4:57 AM, Ian Latter <span dir="ltr"><<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
Sorry;<br>
<br>
That "long (unsigned 32bit)" should have been<br>
"long (signed 32bit)" ... so that's twice that bug has<br>
bitten ;-)<br>
<div class="im HOEnZb"><br>
<br>
Cheers,<br>
<br>
<br>
----- Original Message -----<br>
</div><div class="im HOEnZb">>From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
>To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
</div><div class="im HOEnZb">>Subject: [Gluster-devel] SOLVED - Re: replicate background threads<br>
</div><div class="HOEnZb"><div class="h5">>Date: Wed, 04 Apr 2012 21:51:11 +1000<br>
><br>
> Hello,<br>
><br>
><br>
> Michael and I ran a battery of testing today and<br>
> closed out the two issues identified below (of March<br>
> 11).<br>
><br>
><br>
> FYI RE the "background-self-heal-only" patch;<br>
><br>
> It has been tested now to our satisfaction and<br>
> works as described/intended.<br>
><br>
><br>
><br>
> <a href="http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch" target="_blank">http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch</a><br>
><br>
><br>
><br>
> FYI RE the 2GB replicate error;<br>
><br>
> >>> 2) Of the files that were replicated, not all were<br>
> >>> corrupted (capped at 2G -- note that we<br>
> >>> confirmed that this was the first 2G of the<br>
> >>> source file contents).<br>
> >>><br>
> >>> So is there a known replicate issue with files<br>
> >>> greater than 2GB?<br>
><br>
> We have confirmed this issue and the referenced<br>
> patch appears to correct the problem. We were<br>
> able to get one particular file to reliably fail at 2GB<br>
> under GlusterFS 3.2.6, and then correctly<br>
> transfer it and many other >2GB files, after<br>
> applying this patch.<br>
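As an aside, for anyone checking their own bricks: files truncated by this bug sit at exactly 2^31 bytes on the backing file system, so GNU find can flag them by exact size. This is an illustrative sketch against a throwaway directory (a real check would point find at the brick's export path instead); truncate just fabricates a sparse file of the bad size.

```shell
# Fabricate one file capped at exactly 2^31 bytes and one healthy file,
# then list the capped one. On a real brick, point find at the brick's
# export directory instead of the demo directory.
demo=$(mktemp -d)
truncate -s 2147483648 "$demo/capped.bin"   # sparse, so no real disk use
truncate -s 1048576    "$demo/fine.bin"
find "$demo" -type f -size 2147483648c      # -size Nc matches exactly N bytes
```

GNU find's `c` suffix matches an exact byte count, so only the capped file is printed.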
><br>
> The error stems from putting the off_t (64bit)<br>
> offset value into a void * cookie value typecast<br>
> as long (unsigned 32bit) and then restoring it into<br>
> an off_t again. The tip-off was a recurring offset<br>
> of 18446744071562067968 seen in the logs. The<br>
> effect is described well here;<br>
><br>
><br>
> <a href="http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64" target="_blank">http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64</a><br>
><br>
> We can't explain why this issue was intermittent,<br>
> and we're not sure if the rw_sh->offset is the<br>
> correct 64bit offset to use. However that offset<br>
> appeared to match the cookie value in all tested<br>
> pre-failure states. Please advise if there is a<br>
> better (more correct) off_t offset to use.<br>
><br>
><br>
><br>
> <a href="http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch" target="_blank">http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch</a><br>
><br>
><br>
><br>
> Thanks for your help,<br>
><br>
><br>
><br>
><br>
> ----- Original Message -----<br>
> >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> >Subject: Re: [Gluster-devel] replicate background threads<br>
> >Date: Tue, 03 Apr 2012 20:41:48 +1000<br>
> ><br>
> ><br>
> > Pizza reveals all ;-)<br>
> ><br>
> > There's an error in there with the LOCK going<br>
> > without a paired UNLOCK in the afr-common<br>
> > test. Revised (untested) patch attached.<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > >Date: Tue, 03 Apr 2012 19:46:51 +1000<br>
> > ><br>
> > ><br>
> > > FYI - untested patch attached.<br>
> > ><br>
> > ><br>
> > ><br>
> > > ----- Original Message -----<br>
> > > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > >Date: Tue, 03 Apr 2012 18:50:11 +1000<br>
> > > ><br>
> > > ><br>
> > > > FYI - I can see that this option doesn't exist, I'm<br>
> > > > adding it now.<br>
> > > ><br>
> > > ><br>
> > > > ----- Original Message -----<br>
> > > > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > >Date: Mon, 02 Apr 2012 18:02:26 +1000<br>
> > > > ><br>
> > > > ><br>
> > > > > Hello Pranith,<br>
> > > > ><br>
> > > > ><br>
> > > > > Michael has come back from his business trip and<br>
> > > > > we're about to start testing again (though now under<br>
> > > > > kernel 3.2.13 and GlusterFS 3.2.6).<br>
> > > > ><br>
> > > > > I've published the 32bit (i586) client on the Saturn<br>
> > > > > project site if anyone is chasing it;<br>
> > > > > <a href="http://midnightcode.org/projects/saturn/" target="_blank">http://midnightcode.org/projects/saturn/</a><br>
> > > > ><br>
> > > > > One quick question: is there a tunable parameter<br>
> > > > > that will allow a stat to be non-blocking (i.e. to stop<br>
> > > > > self-heal going foreground) when the background<br>
> > > > > self heal count is reached?<br>
> > > > > I.e. rather than having the stat hang for 2 days<br>
> > > > > while the files are replicated, we'd rather it fell<br>
> > > > > through and allowed subsequent stats to attempt<br>
> > > > > background self healing (perhaps at a time when<br>
> > > > > background self heal slots are available).<br>
> > > > ><br>
> > > > ><br>
> > > > > Thanks,<br>
> > > > ><br>
> > > > ><br>
> > > > ><br>
> > > > > ----- Original Message -----<br>
> > > > > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > > >Date: Wed, 14 Mar 2012 19:36:24 +1000<br>
> > > > > ><br>
> > > > > > Hello,<br>
> > > > > ><br>
> > > > > > > hi Ian,<br>
> > > > > > > Maintaining a queue of files that need to be<br>
> > > > > > > self-healed does not scale in practice, in cases<br>
> > > > > > > where there are millions of files that need self-<br>
> > > > > > > heal. So such a thing is not implemented. The<br>
> > > > > > > idea is to make self-heal foreground after a<br>
> > > > > > > certain-limit (background-self-heal-count) so<br>
> > > > > > > there is no necessity for such a queue.<br>
> > > > > > ><br>
> > > > > > > Pranith.<br>
> > > > > ><br>
> > > > > > Ok, I understand - it will be interesting to observe<br>
> > > > > > the system with the new knowledge from your<br>
> > > > > > messages - thanks for your help, appreciate it.<br>
> > > > > ><br>
> > > > > ><br>
> > > > > > Cheers,<br>
> > > > > ><br>
> > > > > > ----- Original Message -----<br>
> > > > > > >From: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > > > >To: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > > > >Date: Wed, 14 Mar 2012 07:33:32 +0530<br>
> > > > > > ><br>
> > > > > > > On 03/14/2012 01:47 AM, Ian Latter wrote:<br>
> > > > > > > > Thanks for the info Pranith;<br>
> > > > > > > ><br>
> > > > > > > > <pranithk> the option to increase the max num of<br>
> > > > > > > > background self-heals is<br>
> > > > > > > > cluster.background-self-heal-count. Default value of<br>
> > > > > > > > which is 16. I assume you know what you are doing to<br>
> > > > > > > > the performance of the system by increasing this<br>
> > > > > > > > number.<br>
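For reference, the tunable named in the quote above can be raised per volume with the standard volume-set command. This is a sketch only: "gv0" is a placeholder volume name, and the performance cost warned about above has not been measured here.

```shell
# Raise the cap on concurrent background self-heals from the default 16.
# "gv0" is a placeholder volume name.
gluster volume set gv0 cluster.background-self-heal-count 32
```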
> > > > > > > ><br>
> > > > > > > ><br>
> > > > > > > > I didn't know this. Is there a queue length for what<br>
> > > > > > > > is yet to be handled by the background self heal<br>
> > > > > > > > count? If so, can it also be adjusted?<br>
> > > > > > > ><br>
> > > > > > > ><br>
> > > > > > > > ----- Original Message -----<br>
> > > > > > > >> From: "Pranith Kumar K"<<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > > > > >> To: "Ian Latter"<<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > > > >> Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > > > > >> Date: Tue, 13 Mar 2012 21:07:53 +0530<br>
> > > > > > > >><br>
> > > > > > > >> On 03/13/2012 07:52 PM, Ian Latter wrote:<br>
> > > > > > > >>> Hello,<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> Well we've been privy to our first true error in<br>
> > > > > > > >>> Gluster now, and we're not sure of the cause.<br>
> > > > > > > >>><br>
> > > > > > > >>> The SaturnI machine with 1Gbyte of RAM was<br>
> > > > > > > >>> exhausting its memory and crashing and we saw<br>
> > > > > > > >>> core dumps on SaturnM and MMC. Replacing<br>
> > > > > > > >>> the SaturnI hardware with identical hardware to<br>
> > > > > > > >>> SaturnM, but retaining SaturnI's original disks,<br>
> > > > > > > >>> (so fixing the memory capacity problem) we saw<br>
> > > > > > > >>> crashes randomly at all nodes.<br>
> > > > > > > >>><br>
> > > > > > > >>> Looking for irregularities at the file system<br>
> > > > > > > >>> we noticed that (we'd estimate) about 60% of<br>
> > > > > > > >>> the files at the OS/EXT3 layer of SaturnI<br>
> > > > > > > >>> (sourced via replicate from SaturnM) were of<br>
> > > > > > > >>> size 2147483648 (2^31) where they should<br>
> > > > > > > >>> have been substantially larger. While we would<br>
> > > > > > > >>> happily accept "you shouldn't be using a 32bit<br>
> > > > > > > >>> gluster package" as the answer, we note two<br>
> > > > > > > >>> deltas;<br>
> > > > > > > >>> 1) All files used in testing were copied on from<br>
> > > > > > > >>> 32 bit clients to 32 bit servers, with no<br>
> > > > > > > >>> observable errors<br>
> > > > > > > >>> 2) Of the files that were replicated, not all were<br>
> > > > > > > >>> corrupted (capped at 2G -- note that we<br>
> > > > > > > >>> confirmed that this was the first 2G of the<br>
> > > > > > > >>> source file contents).<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> So is there a known replicate issue with files<br>
> > > > > > > >>> greater than 2GB? Has anyone done any<br>
> > > > > > > >>> serious testing with significant numbers of files<br>
> > > > > > > >>> of this size? Are there configurations specific<br>
> > > > > > > >>> to files/structures of these dimensions?<br>
> > > > > > > >>><br>
> > > > > > > >>> We noted that reversing the configuration, such<br>
> > > > > > > >>> that SaturnI provides the replicate Brick amongst<br>
> > > > > > > >>> a local distribute and a remote map to SaturnM<br>
> > > > > > > >>> where SaturnM simply serves a local distribute;<br>
> > > > > > > >>> that the data served to MMC is accurate (it<br>
> > > > > > > >>> continues to show 15GB files, even where there<br>
> > > > > > > >>> is a local 2GB copy). Further, a client "cp" at<br>
> > > > > > > >>> MMC, of a file with a 2GB local replicate of a<br>
> > > > > > > >>> 15GB file, will result in a 15GB file being<br>
> > > > > > > >>> created and replicated via Gluster (i.e. the<br>
> > > > > > > >>> correct specification at both server nodes).<br>
> > > > > > > >>><br>
> > > > > > > >>> So my other question is: Is it possible that we've<br>
> > > > > > > >>> managed to corrupt something in this<br>
> > > > > > > >>> environment? I.e. during the initial memory<br>
> > > > > > > >>> exhaustion events? And is there a robust way<br>
> > > > > > > >>> to have the replicate files revalidated by gluster<br>
> > > > > > > >>> as a stat doesn't seem to be correcting files in<br>
> > > > > > > >>> this state (i.e. replicate on SaturnM results in<br>
> > > > > > > >>> daemon crashes, replicate on SaturnI results<br>
> > > > > > > >>> in files being left in the bad state).<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> Also, I'm not a member of the users list; if these<br>
> > > > > > > >>> questions are better posed there then let me<br>
> > > > > > > >>> know and I'll re-post them there.<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> Thanks,<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> ----- Original Message -----<br>
> > > > > > > >>>> From: "Ian Latter"<<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > > > >>>> To:<<a href="mailto:gluster-devel@nongnu.org">gluster-devel@nongnu.org</a>><br>
> > > > > > > >>>> Subject: [Gluster-devel] replicate background threads<br>
> > > > > > > >>>> Date: Sun, 11 Mar 2012 20:17:15 +1000<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Hello,<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> My mate Michael and I have been steadily<br>
> > > > > > > >>>> advancing our Gluster testing and today we finally<br>
> > > > > > > >>>> reached some heavier conditions. The outcome<br>
> > > > > > > >>>> was different from expectations built from our more<br>
> > > > > > > >>>> basic testing so I think we have a couple of<br>
> > > > > > > >>>> questions regarding the AFR/Replicate background<br>
> > > > > > > >>>> threads that may need a developer's contribution.<br>
> > > > > > > >>>> Any help appreciated.<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> The setup is a 3 box environment, but let's start<br>
> > > > > > > >>>> with two;<br>
> > > > > > > >>>><br>
> > > > > > > >>>> SaturnM (Server)<br>
> > > > > > > >>>> - 6core CPU, 16GB RAM, 1Gbps net<br>
> > > > > > > >>>> - 3.2.6 Kernel (custom distro)<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - 3x2TB drives, CFQ, EXT3<br>
> > > > > > > >>>> - Bricked up into a single local 6TB<br>
> > > > > > > >>>> "distribute" brick<br>
> > > > > > > >>>> - "brick" served to the network<br>
> > > > > > > >>>><br>
> > > > > > > >>>> MMC (Client)<br>
> > > > > > > >>>> - 4core CPU, 8GB RAM, 1Gbps net<br>
> > > > > > > >>>> - Ubuntu<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - Don't recall the disk space locally<br>
> > > > > > > >>>> - "brick" from SaturnM mounted<br>
> > > > > > > >>>><br>
> > > > > > > >>>> 500 x 15Gbyte files were copied from MMC<br>
> > > > > > > >>>> to a single sub-directory on the brick served from<br>
> > > > > > > >>>> SaturnM, all went well and dandy. So then we<br>
> > > > > > > >>>> moved on to a 3 box environment;<br>
> > > > > > > >>>><br>
> > > > > > > >>>> SaturnI (Server)<br>
> > > > > > > >>>> = 1core CPU, 1GB RAM, 1Gbps net<br>
> > > > > > > >>>> = 3.2.6 Kernel (custom distro)<br>
> > > > > > > >>>> = 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> = 4x2TB drives, CFQ, EXT3<br>
> > > > > > > >>>> = Bricked up into a single local 8TB<br>
> > > > > > > >>>> "distribute" brick<br>
> > > > > > > >>>> = "brick" served to the network<br>
> > > > > > > >>>><br>
> > > > > > > >>>> SaturnM (Server/Client)<br>
> > > > > > > >>>> - 6core CPU, 16GB RAM, 1Gbps net<br>
> > > > > > > >>>> - 3.2.6 Kernel (custom distro)<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - 3x2TB drives, CFQ, EXT3<br>
> > > > > > > >>>> - Bricked up into a single local 6TB<br>
> > > > > > > >>>> "distribute" brick<br>
> > > > > > > >>>> = Replicate brick added to sit over<br>
> > > > > > > >>>> the local distribute brick and a<br>
> > > > > > > >>>> client "brick" mapped from SaturnI<br>
> > > > > > > >>>> - Replicate "brick" served to the network<br>
> > > > > > > >>>><br>
> > > > > > > >>>> MMC (Client)<br>
> > > > > > > >>>> - 4core CPU, 8GB RAM, 1Gbps net<br>
> > > > > > > >>>> - Ubuntu<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - Don't recall the disk space locally<br>
> > > > > > > >>>> - "brick" from SaturnM mounted<br>
> > > > > > > >>>> = "brick" from SaturnI mounted<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> Now, in lesser testing in this scenario all was<br>
> > > > > > > >>>> well - any files on SaturnI would appear on SaturnM<br>
> > > > > > > >>>> (not a functional part of our test) and the content on<br>
> > > > > > > >>>> SaturnM would appear on SaturnI (the real<br>
> > > > > > > >>>> objective).<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Earlier testing used a handful of smaller files (10s<br>
> > > > > > > >>>> to 100s of Mbytes) and a single 15Gbyte file. The<br>
> > > > > > > >>>> 15Gbyte file would be "stat" via an "ls", which would<br>
> > > > > > > >>>> kick off a background replication (ls appeared<br>
> > > > > > > >>>> unblocked) and the file would be transferred. Also,<br>
> > > > > > > >>>> interrupting the transfer (pulling the LAN cable)<br>
> > > > > > > >>>> would result in a partial 15Gbyte file being corrected<br>
> > > > > > > >>>> in a subsequent background process on another<br>
> > > > > > > >>>> stat.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> *However* .. when confronted with 500 x 15Gbyte<br>
> > > > > > > >>>> files, in a single directory (but not the root<br>
> > > > > > > >>>> directory) things don't quite work out as nicely.<br>
> > > > > > > >>>> - First, the "ls" (at MMC against the SaturnM<br>
> > > > > > > >>>> brick) is blocking and hangs the terminal<br>
> > > > > > > >>>> (ctrl-c doesn't kill it).<br>
> > > > > > > >> <pranithk> At max 16 files can be self-healed in the<br>
> > > > > > > >> background in parallel. 17th file self-heal will happen<br>
> > > > > > > >> in the foreground.<br>
> > > > > > > >>>> - Then, looking from MMC at the SaturnI file<br>
> > > > > > > >>>> system (ls -s) once per second, and then<br>
> > > > > > > >>>> comparing the output (diff ls1.txt ls2.txt |<br>
> > > > > > > >>>> grep -v '>') we can see that between 10 and 17<br>
> > > > > > > >>>> files are being updated simultaneously by the<br>
> > > > > > > >>>> background process<br>
> > > > > > > >> <pranithk> This is expected.<br>
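The once-per-second comparison described above amounts to the following small sketch (run from the directory being healed; the ls1.txt/ls2.txt names match the quoted command and are otherwise arbitrary):

```shell
# Snapshot the per-file block-usage listing twice, a second apart, then
# keep only the lines that changed away from the first snapshot
# (grep -v '>' drops the lines added in the second listing, as in the
# quoted command).
ls -s > ls1.txt
sleep 1
ls -s > ls2.txt
diff ls1.txt ls2.txt | grep -v '>'
```

Each run shows which files the background self-heal touched during that second.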
> > > > > > > >>>> - Further, a request at MMC for a single file that<br>
> > > > > > > >>>> was originally in the 500 x 15Gbyte sub-dir on<br>
> > > > > > > >>>> SaturnM (which should return unblocked with<br>
> > > > > > > >>>> correct results) will;<br>
> > > > > > > >>>> a) work as expected if there are fewer than 17<br>
> > > > > > > >>>> active background file tasks<br>
> > > > > > > >>>> b) block/hang if there are more<br>
> > > > > > > >>>> - Whereas a stat (ls) outside of the 500 x 15<br>
> > > > > > > >>>> sub-directory, such as the root of that brick,<br>
> > > > > > > >>>> would always work as expected (return<br>
> > > > > > > >>>> immediately, unblocked).<br>
> > > > > > > >> <pranithk> stat on the directory will only create the<br>
> > > > > > > >> files with '0' file size. Then when you ls/stat the actual<br>
> > > > > > > >> file the self-heal for the file gets triggered.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Thus, to us, it appears as though there is a<br>
> > > > > > > >>>> queue feeding a set of (around) 16 worker threads<br>
> > > > > > > >>>> in AFR. If your request was to the loaded directory<br>
> > > > > > > >>>> then you would be blocked until a worker was<br>
> > > > > > > >>>> available, and if your request was to any other<br>
> > > > > > > >>>> location, it would return unblocked regardless of<br>
> > > > > > > >>>> the worker pool state.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> The only thread metric that we could find to tweak<br>
> > > > > > > >>>> was performance/io-threads (which was set to<br>
> > > > > > > >>>> 16 per physical disk; well per locks + posix brick<br>
> > > > > > > >>>> stacks) but increasing this to 64 per stack didn't<br>
> > > > > > > >>>> change the outcome (10 to 17 active background<br>
> > > > > > > >>>> transfers).<br>
> > > > > > > >> <pranithk> the option to increase the max num of<br>
> > > > > > > >> background self-heals is cluster.background-self-heal-count.<br>
> > > > > > > >> Default value of which is 16. I assume you know what you are<br>
> > > > > > > >> doing to the performance of the system by increasing this<br>
> > > > > > > >> number.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> So, given the above, is our analysis sound, and<br>
> > > > > > > >>>> if so, is there a way to increase the size of the<br>
> > > > > > > >>>> pool of active worker threads? The objective<br>
> > > > > > > >>>> being to allow unblocked access to an existing<br>
> > > > > > > >>>> repository of files (on SaturnM) while a<br>
> > > > > > > >>>> secondary/back-up is being filled, via GlusterFS?<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Note that I understand that performance<br>
> > > > > > > >>>> (throughput) will be an issue in the described<br>
> > > > > > > >>>> environment: this replication process is<br>
> > > > > > > >>>> estimated to run for between 10 and 40 hours,<br>
> > > > > > > >>>> which is acceptable so long as it isn't blocking<br>
> > > > > > > >>>> (there's a production-capable file set in place).<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> Any help appreciated.<br>
> > > > > > > >>>><br>
> > > > > > > >> Please let us know how it goes.<br>
> > > > > > > >>>> Thanks,<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> --<br>
> > > > > > > >>>> Ian Latter<br>
> > > > > > > >>>> Late night coder ..<br>
> > > > > > > >>>> <a href="http://midnightcode.org/" target="_blank">http://midnightcode.org/</a><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> _______________________________________________<br>
> > > > > > > >>>> Gluster-devel mailing list<br>
> > > > > > > >>>> <a href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a><br>
> > > > > > > >>>> <a href="https://lists.nongnu.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a><br>
> > > > > > > >>>><br>
> > > > > > > >> hi Ian,<br>
> > > > > > > >> inline replies with <pranithk>.<br>
> > > > > > > >><br>
> > > > > > > >> Pranith.<br>
> > > > > > > >><br>
> > > > > > > ><br>
> > > > > > > hi Ian,<br>
> > > > > > > Maintaining a queue of files that need to be<br>
> > > > > > > self-healed does not scale in practice, in cases<br>
> > > > > > > where there are millions of files that need<br>
> > > > > > > self-heal. So such a thing is not implemented. The<br>
> > > > > > > idea is to make self-heal foreground after a<br>
> > > > > > > certain-limit (background-self-heal-count) so<br>
> > > > > > > there is no necessity for such a queue.<br>
> > > > > > ><br>
> > > > > > > Pranith.<br>
> > > > > > ><br>
> > > > > ><br>
> > > > > ><br>
> > > > > ><br>
> > > > ><br>
> > > > ><br>
> > > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > ><br>
> > ><br>
> > ><br>
> ><br>
> ><br>
> ><br>
><br>
><br>
><br>
<br>
<br>
--<br>
Ian Latter<br>
Late night coder ..<br>
<a href="http://midnightcode.org/" target="_blank">http://midnightcode.org/</a><br>
<br>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a><br>
<a href="https://lists.nongnu.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a><br>
</div></div></blockquote></div><br>