Thanks for the report, Ian. I have filed a bug report: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=809982">https://bugzilla.redhat.com/show_bug.cgi?id=809982</a><br><br><div class="gmail_quote">On Wed, Apr 4, 2012 at 4:57 AM, Ian Latter <span dir="ltr"><<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
<br>
Sorry;<br>
<br>
That "long (unsigned 32bit)" should have been<br>
"long (signed 32bit)" ... so that's twice that bug has<br>
bitten ;-)<br>
<div class="im HOEnZb"><br>
<br>
Cheers,<br>
<br>
<br>
----- Original Message -----<br>
</div><div class="im HOEnZb">>From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
>To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
</div><div class="im HOEnZb">>Subject: [Gluster-devel] SOLVED - Re: replicate background threads<br>
</div><div class="HOEnZb"><div class="h5">>Date: Wed, 04 Apr 2012 21:51:11 +1000<br>
><br>
> Hello,<br>
><br>
><br>
> Michael and I ran a battery of testing today and<br>
> closed out the two issues identified below (of March<br>
> 11).<br>
><br>
><br>
> FYI RE the "background-self-heal-only" patch;<br>
><br>
> It has been tested now to our satisfaction and<br>
> works as described/intended.<br>
><br>
><br>
><br>
> <a href="http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch" target="_blank">http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-background-only.patch</a><br>
><br>
><br>
><br>
> FYI RE the 2GB replicate error;<br>
><br>
> >>> 2) Of the files that were replicated, not all were<br>
> >>> corrupted (capped at 2G -- note that we<br>
> >>> confirmed that this was the first 2G of the<br>
> >>> source file contents).<br>
> >>><br>
> >>> So is there a known replicate issue with files<br>
> >>> greater than 2GB?<br>
><br>
> We have confirmed this issue and the referenced<br>
> patch appears to correct the problem. We were<br>
> able to get one particular file to reliably fail at 2GB<br>
> under GlusterFS 3.2.6, and then correctly<br>
> transfer it and many other >2GB files, after<br>
> applying this patch.<br>
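As an aside, for anyone checking their own bricks: files truncated by this bug sit at exactly 2^31 bytes on the backing file system, so GNU find can flag them by exact size. This is an illustrative sketch against a throwaway directory (a real check would point find at the brick's export path instead); truncate just fabricates a sparse file of the bad size.

```shell
# Fabricate one file capped at exactly 2^31 bytes and one healthy file,
# then list the capped one. On a real brick, point find at the brick's
# export directory instead of the demo directory.
demo=$(mktemp -d)
truncate -s 2147483648 "$demo/capped.bin"   # sparse, so no real disk use
truncate -s 1048576    "$demo/fine.bin"
find "$demo" -type f -size 2147483648c      # -size Nc matches exactly N bytes
```

GNU find's `c` suffix matches an exact byte count, so only the capped file is printed.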
><br>
> The error stems from putting the off_t (64bit)<br>
> offset value into a void * cookie value typecast<br>
> as long (unsigned 32bit) and then restoring it into<br>
> an off_t again. The tip-off was a recurring offset<br>
> of 18446744071562067968 seen in the logs. The<br>
> effect is described well here;<br>
><br>
><br>
> <a href="http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64" target="_blank">http://stackoverflow.com/questions/5628484/unexpected-behavior-from-unsigned-int64</a><br>
><br>
> We can't explain why this issue was intermittent,<br>
> and we're not sure if the rw_sh->offset is the<br>
> correct 64bit offset to use. However that offset<br>
> appeared to match the cookie value in all tested<br>
> pre-failure states. Please advise if there is a<br>
> better (more correct) off_t offset to use.<br>
><br>
><br>
><br>
> <a href="http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch" target="_blank">http://midnightcode.org/projects/saturn/code/glusterfs-3.2.6-2GB.patch</a><br>
><br>
><br>
><br>
> Thanks for your help,<br>
><br>
><br>
><br>
><br>
> ----- Original Message -----<br>
> >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> >Subject: Re: [Gluster-devel] replicate background threads<br>
> >Date: Tue, 03 Apr 2012 20:41:48 +1000<br>
> ><br>
> ><br>
> > Pizza reveals all ;-)<br>
> ><br>
> > There's an error in there with the LOCK going<br>
> > without a paired UNLOCK in the afr-common<br>
> > test. Revised (untested) patch attached.<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > ----- Original Message -----<br>
> > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > >Date: Tue, 03 Apr 2012 19:46:51 +1000<br>
> > ><br>
> > ><br>
> > > FYI - untested patch attached.<br>
> > ><br>
> > ><br>
> > ><br>
> > > ----- Original Message -----<br>
> > > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > >Date: Tue, 03 Apr 2012 18:50:11 +1000<br>
> > > ><br>
> > > ><br>
> > > > FYI - I can see that this option doesn't exist, I'm<br>
> > > > adding it now.<br>
> > > ><br>
> > > ><br>
> > > > ----- Original Message -----<br>
> > > > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > >Date: Mon, 02 Apr 2012 18:02:26 +1000<br>
> > > > ><br>
> > > > ><br>
> > > > > Hello Pranith,<br>
> > > > ><br>
> > > > ><br>
> > > > > Michael has come back from his business trip and<br>
> > > > > we're about to start testing again (though now under<br>
> > > > > kernel 3.2.13 and GlusterFS 3.2.6).<br>
> > > > ><br>
> > > > > I've published the 32bit (i586) client on the Saturn<br>
> > > > > project site if anyone is chasing it;<br>
> > > > > <a href="http://midnightcode.org/projects/saturn/" target="_blank">http://midnightcode.org/projects/saturn/</a><br>
> > > > ><br>
> > > > > One quick question: is there a tunable parameter<br>
> > > > > that will allow a stat to be non-blocking (i.e. to stop<br>
> > > > > self-heal going foreground) when the background<br>
> > > > > self heal count is reached?<br>
> > > > > I.e. rather than having the stat hang for 2 days<br>
> > > > > while the files are replicated, we'd rather it fell<br>
> > > > > through and allowed subsequent stats to attempt<br>
> > > > > background self healing (perhaps at a time when<br>
> > > > > background self heal slots are available).<br>
> > > > ><br>
> > > > ><br>
> > > > > Thanks,<br>
> > > > ><br>
> > > > ><br>
> > > > ><br>
> > > > > ----- Original Message -----<br>
> > > > > >From: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > >To: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > > >Date: Wed, 14 Mar 2012 19:36:24 +1000<br>
> > > > > ><br>
> > > > > > Hello,<br>
> > > > > ><br>
> > > > > > > hi Ian,<br>
> > > > > > > Maintaining a queue of files that need to be<br>
> > > > > > > self-healed does not scale in practice, in cases<br>
> > > > > > > where there are millions of files that need self-<br>
> > > > > > > heal. So such a thing is not implemented. The<br>
> > > > > > > idea is to make self-heal foreground after a<br>
> > > > > > > certain-limit (background-self-heal-count) so<br>
> > > > > > > there is no necessity for such a queue.<br>
> > > > > > ><br>
> > > > > > > Pranith.<br>
> > > > > ><br>
> > > > > > Ok, I understand - it will be interesting to observe<br>
> > > > > > the system with the new knowledge from your<br>
> > > > > > messages - thanks for your help, appreciate it.<br>
> > > > > ><br>
> > > > > ><br>
> > > > > > Cheers,<br>
> > > > > ><br>
> > > > > > ----- Original Message -----<br>
> > > > > > >From: "Pranith Kumar K" <<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > > > >To: "Ian Latter" <<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > > >Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > > > >Date: Wed, 14 Mar 2012 07:33:32 +0530<br>
> > > > > > ><br>
> > > > > > > On 03/14/2012 01:47 AM, Ian Latter wrote:<br>
> > > > > > > > Thanks for the info Pranith;<br>
> > > > > > > ><br>
> > > > > > > > <pranithk> the option to increase the max num of<br>
> > > > > > > > background self-heals is<br>
> > > > > > > > cluster.background-self-heal-count. Default value of<br>
> > > > > > > > which is 16. I assume you know what you are doing to<br>
> > > > > > > > the performance of the system by increasing this<br>
> > > > > > > > number.<br>
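For reference, the tunable named in the quote above can be raised per volume with the standard volume-set command. This is a sketch only: "gv0" is a placeholder volume name, and the performance cost warned about above has not been measured here.

```shell
# Raise the cap on concurrent background self-heals from the default 16.
# "gv0" is a placeholder volume name.
gluster volume set gv0 cluster.background-self-heal-count 32
```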
> > > > > > > ><br>
> > > > > > > ><br>
> > > > > > > > I didn't know this. Is there a queue length for what<br>
> > > > > > > > is yet to be handled by the background self heal<br>
> > > > > > > > count? If so, can it also be adjusted?<br>
> > > > > > > ><br>
> > > > > > > ><br>
> > > > > > > > ----- Original Message -----<br>
> > > > > > > >> From: "Pranith Kumar K"<<a href="mailto:pranithk@gluster.com">pranithk@gluster.com</a>><br>
> > > > > > > >> To: "Ian Latter"<<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > > > >> Subject: Re: [Gluster-devel] replicate background threads<br>
> > > > > > > >> Date: Tue, 13 Mar 2012 21:07:53 +0530<br>
> > > > > > > >><br>
> > > > > > > >> On 03/13/2012 07:52 PM, Ian Latter wrote:<br>
> > > > > > > >>> Hello,<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> Well we've been privy to our first true error in<br>
> > > > > > > >>> Gluster now, and we're not sure of the cause.<br>
> > > > > > > >>><br>
> > > > > > > >>> The SaturnI machine with 1Gbyte of RAM was<br>
> > > > > > > >>> exhausting its memory and crashing and we saw<br>
> > > > > > > >>> core dumps on SaturnM and MMC. Replacing<br>
> > > > > > > >>> the SaturnI hardware with identical hardware to<br>
> > > > > > > >>> SaturnM, but retaining SaturnI's original disks,<br>
> > > > > > > >>> (so fixing the memory capacity problem) we saw<br>
> > > > > > > >>> crashes randomly at all nodes.<br>
> > > > > > > >>><br>
> > > > > > > >>> Looking for irregularities at the file system<br>
> > > > > > > >>> we noticed that (we'd estimate) about 60% of<br>
> > > > > > > >>> the files at the OS/EXT3 layer of SaturnI<br>
> > > > > > > >>> (sourced via replicate from SaturnM) were of<br>
> > > > > > > >>> size 2147483648 (2^31) where they should<br>
> > > > > > > >>> have been substantially larger. While we would<br>
> > > > > > > >>> happily accept "you shouldn't be using a 32bit<br>
> > > > > > > >>> gluster package" as the answer, we note two<br>
> > > > > > > >>> deltas;<br>
> > > > > > > >>> 1) All files used in testing were copied on from<br>
> > > > > > > >>> 32 bit clients to 32 bit servers, with no<br>
> > > > > > > >>> observable errors<br>
> > > > > > > >>> 2) Of the files that were replicated, not all were<br>
> > > > > > > >>> corrupted (capped at 2G -- note that we<br>
> > > > > > > >>> confirmed that this was the first 2G of the<br>
> > > > > > > >>> source file contents).<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> So is there a known replicate issue with files<br>
> > > > > > > >>> greater than 2GB? Has anyone done any<br>
> > > > > > > >>> serious testing with significant numbers of files<br>
> > > > > > > >>> of this size? Are there configurations specific<br>
> > > > > > > >>> to files/structures of these dimensions?<br>
> > > > > > > >>><br>
> > > > > > > >>> We noted that reversing the configuration, such<br>
> > > > > > > >>> that SaturnI provides the replicate Brick amongst<br>
> > > > > > > >>> a local distribute and a remote map to SaturnM<br>
> > > > > > > >>> where SaturnM simply serves a local distribute;<br>
> > > > > > > >>> that the data served to MMC is accurate (it<br>
> > > > > > > >>> continues to show 15GB files, even where there<br>
> > > > > > > >>> is a local 2GB copy). Further, a client "cp" at<br>
> > > > > > > >>> MMC, of a file with a 2GB local replicate of a<br>
> > > > > > > >>> 15GB file, will result in a 15GB file being<br>
> > > > > > > >>> created and replicated via Gluster (i.e. the<br>
> > > > > > > >>> correct specification at both server nodes).<br>
> > > > > > > >>><br>
> > > > > > > >>> So my other question is: Is it possible that we've<br>
> > > > > > > >>> managed to corrupt something in this<br>
> > > > > > > >>> environment? I.e. during the initial memory<br>
> > > > > > > >>> exhaustion events? And is there a robust way<br>
> > > > > > > >>> to have the replicate files revalidated by gluster<br>
> > > > > > > >>> as a stat doesn't seem to be correcting files in<br>
> > > > > > > >>> this state (i.e. replicate on SaturnM results in<br>
> > > > > > > >>> daemon crashes, replicate on SaturnI results<br>
> > > > > > > >>> in files being left in the bad state).<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> Also, I'm not a member of the users list; if these<br>
> > > > > > > >>> questions are better posed there then let me<br>
> > > > > > > >>> know and I'll re-post them there.<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> Thanks,<br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>><br>
> > > > > > > >>> ----- Original Message -----<br>
> > > > > > > >>>> From: "Ian Latter"<<a href="mailto:ian.latter@midnightcode.org">ian.latter@midnightcode.org</a>><br>
> > > > > > > >>>> To:<<a href="mailto:gluster-devel@nongnu.org">gluster-devel@nongnu.org</a>><br>
> > > > > > > >>>> Subject: [Gluster-devel] replicate background threads<br>
> > > > > > > >>>> Date: Sun, 11 Mar 2012 20:17:15 +1000<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Hello,<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> My mate Michael and I have been steadily<br>
> > > > > > > >>>> advancing our Gluster testing and today we finally<br>
> > > > > > > >>>> reached some heavier conditions. The outcome<br>
> > > > > > > >>>> was different from expectations built from our more<br>
> > > > > > > >>>> basic testing so I think we have a couple of<br>
> > > > > > > >>>> questions regarding the AFR/Replicate background<br>
> > > > > > > >>>> threads that may need a developer's contribution.<br>
> > > > > > > >>>> Any help appreciated.<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> The setup is a 3 box environment, but let's start<br>
> > > > > > > >>>> with two;<br>
> > > > > > > >>>><br>
> > > > > > > >>>> SaturnM (Server)<br>
> > > > > > > >>>> - 6core CPU, 16GB RAM, 1Gbps net<br>
> > > > > > > >>>> - 3.2.6 Kernel (custom distro)<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - 3x2TB drives, CFQ, EXT3<br>
> > > > > > > >>>> - Bricked up into a single local 6TB<br>
> > > > > > > >>>> "distribute" brick<br>
> > > > > > > >>>> - "brick" served to the network<br>
> > > > > > > >>>><br>
> > > > > > > >>>> MMC (Client)<br>
> > > > > > > >>>> - 4core CPU, 8GB RAM, 1Gbps net<br>
> > > > > > > >>>> - Ubuntu<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - Don't recall the disk space locally<br>
> > > > > > > >>>> - "brick" from SaturnM mounted<br>
> > > > > > > >>>><br>
> > > > > > > >>>> 500 x 15Gbyte files were copied from MMC<br>
> > > > > > > >>>> to a single sub-directory on the brick served from<br>
> > > > > > > >>>> SaturnM, all went well and dandy. So then we<br>
> > > > > > > >>>> moved on to a 3 box environment;<br>
> > > > > > > >>>><br>
> > > > > > > >>>> SaturnI (Server)<br>
> > > > > > > >>>> = 1core CPU, 1GB RAM, 1Gbps net<br>
> > > > > > > >>>> = 3.2.6 Kernel (custom distro)<br>
> > > > > > > >>>> = 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> = 4x2TB drives, CFQ, EXT3<br>
> > > > > > > >>>> = Bricked up into a single local 8TB<br>
> > > > > > > >>>> "distribute" brick<br>
> > > > > > > >>>> = "brick" served to the network<br>
> > > > > > > >>>><br>
> > > > > > > >>>> SaturnM (Server/Client)<br>
> > > > > > > >>>> - 6core CPU, 16GB RAM, 1Gbps net<br>
> > > > > > > >>>> - 3.2.6 Kernel (custom distro)<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - 3x2TB drives, CFQ, EXT3<br>
> > > > > > > >>>> - Bricked up into a single local 6TB<br>
> > > > > > > >>>> "distribute" brick<br>
> > > > > > > >>>> = Replicate brick added to sit over<br>
> > > > > > > >>>> the local distribute brick and a<br>
> > > > > > > >>>> client "brick" mapped from SaturnI<br>
> > > > > > > >>>> - Replicate "brick" served to the network<br>
> > > > > > > >>>><br>
> > > > > > > >>>> MMC (Client)<br>
> > > > > > > >>>> - 4core CPU, 8GB RAM, 1Gbps net<br>
> > > > > > > >>>> - Ubuntu<br>
> > > > > > > >>>> - 3.2.5 Gluster (32bit)<br>
> > > > > > > >>>> - Don't recall the disk space locally<br>
> > > > > > > >>>> - "brick" from SaturnM mounted<br>
> > > > > > > >>>> = "brick" from SaturnI mounted<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> Now, in lesser testing in this scenario all was<br>
> > > > > > > >>>> well - any files on SaturnI would appear on SaturnM<br>
> > > > > > > >>>> (not a functional part of our test) and the content on<br>
> > > > > > > >>>> SaturnM would appear on SaturnI (the real<br>
> > > > > > > >>>> objective).<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Earlier testing used a handful of smaller files (10s<br>
> > > > > > > >>>> to 100s of Mbytes) and a single 15Gbyte file. The<br>
> > > > > > > >>>> 15Gbyte file would be "stat" via an "ls", which would<br>
> > > > > > > >>>> kick off a background replication (ls appeared<br>
> > > > > > > >>>> unblocked) and the file would be transferred. Also,<br>
> > > > > > > >>>> interrupting the transfer (pulling the LAN cable)<br>
> > > > > > > >>>> would result in a partial 15Gbyte file being corrected<br>
> > > > > > > >>>> in a subsequent background process on another<br>
> > > > > > > >>>> stat.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> *However* .. when confronted with 500 x 15Gbyte<br>
> > > > > > > >>>> files, in a single directory (but not the root<br>
> > > > > > > >>>> directory) things don't quite work out as nicely.<br>
> > > > > > > >>>> - First, the "ls" (at MMC against the SaturnM<br>
> > > > > > > >>>> brick) is blocking and hangs the terminal<br>
> > > > > > > >>>> (ctrl-c doesn't kill it).<br>
> > > > > > > >> <pranithk> At max 16 files can be self-healed in the<br>
> > > > > > > >> background in parallel. 17th file self-heal will happen<br>
> > > > > > > >> in the foreground.<br>
> > > > > > > >>>> - Then, looking from MMC at the SaturnI file<br>
> > > > > > > >>>> system (ls -s) once per second, and then<br>
> > > > > > > >>>> comparing the output (diff ls1.txt ls2.txt |<br>
> > > > > > > >>>> grep -v '>') we can see that between 10 and 17<br>
> > > > > > > >>>> files are being updated simultaneously by the<br>
> > > > > > > >>>> background process<br>
> > > > > > > >> <pranithk> This is expected.<br>
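The once-per-second comparison described above amounts to the following small sketch (run from the directory being healed; the ls1.txt/ls2.txt names match the quoted command and are otherwise arbitrary):

```shell
# Snapshot the per-file block-usage listing twice, a second apart, then
# keep only the lines that changed away from the first snapshot
# (grep -v '>' drops the lines added in the second listing, as in the
# quoted command).
ls -s > ls1.txt
sleep 1
ls -s > ls2.txt
diff ls1.txt ls2.txt | grep -v '>'
```

Each run shows which files the background self-heal touched during that second.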
> > > > > > > >>>> - Further, a request at MMC for a single file that<br>
> > > > > > > >>>> was originally in the 500 x 15Gbyte sub-dir on<br>
> > > > > > > >>>> SaturnM (which should return unblocked with<br>
> > > > > > > >>>> correct results) will;<br>
> > > > > > > >>>> a) work as expected if there are fewer than 17<br>
> > > > > > > >>>> active background file tasks<br>
> > > > > > > >>>> b) block/hang if there are more<br>
> > > > > > > >>>> - Whereas a stat (ls) outside of the 500 x 15<br>
> > > > > > > >>>> sub-directory, such as the root of that brick,<br>
> > > > > > > >>>> would always work as expected (return<br>
> > > > > > > >>>> immediately, unblocked).<br>
> > > > > > > >> <pranithk> stat on the directory will only create the<br>
> > > > > > > >> files with '0' file size. Then when you ls/stat the actual<br>
> > > > > > > >> file the self-heal for the file gets triggered.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Thus, to us, it appears as though there is a<br>
> > > > > > > >>>> queue feeding a set of (around) 16 worker threads<br>
> > > > > > > >>>> in AFR. If your request was to the loaded directory<br>
> > > > > > > >>>> then you would be blocked until a worker was<br>
> > > > > > > >>>> available, and if your request was to any other<br>
> > > > > > > >>>> location, it would return unblocked regardless of<br>
> > > > > > > >>>> the worker pool state.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> The only thread metric that we could find to tweak<br>
> > > > > > > >>>> was performance/io-threads (which was set to<br>
> > > > > > > >>>> 16 per physical disk; well per locks + posix brick<br>
> > > > > > > >>>> stacks) but increasing this to 64 per stack didn't<br>
> > > > > > > >>>> change the outcome (10 to 17 active background<br>
> > > > > > > >>>> transfers).<br>
> > > > > > > >> <pranithk> the option to increase the max num of<br>
> > > > > > > >> background self-heals is cluster.background-self-heal-count.<br>
> > > > > > > >> Default value of which is 16. I assume you know what you are<br>
> > > > > > > >> doing to the performance of the system by increasing this<br>
> > > > > > > >> number.<br>
> > > > > > > >>>><br>
> > > > > > > >>>> So, given the above, is our analysis sound, and<br>
> > > > > > > >>>> if so, is there a way to increase the size of the<br>
> > > > > > > >>>> pool of active worker threads? The objective<br>
> > > > > > > >>>> being to allow unblocked access to an existing<br>
> > > > > > > >>>> repository of files (on SaturnM) while a<br>
> > > > > > > >>>> secondary/back-up is being filled, via GlusterFS?<br>
> > > > > > > >>>><br>
> > > > > > > >>>> Note that I understand that performance<br>
> > > > > > > >>>> (throughput) will be an issue in the described<br>
> > > > > > > >>>> environment: this replication process is<br>
> > > > > > > >>>> estimated to run for between 10 and 40 hours,<br>
> > > > > > > >>>> which is acceptable so long as it isn't blocking<br>
> > > > > > > >>>> (there's a production-capable file set in place).<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> Any help appreciated.<br>
> > > > > > > >>>><br>
> > > > > > > >> Please let us know how it goes.<br>
> > > > > > > >>>> Thanks,<br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> --<br>
> > > > > > > >>>> Ian Latter<br>
> > > > > > > >>>> Late night coder ..<br>
> > > > > > > >>>> <a href="http://midnightcode.org/" target="_blank">http://midnightcode.org/</a><br>
> > > > > > > >>>><br>
> > > > > > > >>>><br>
> > > > > > > >>>> _______________________________________________<br>
> > > > > > > >>>> Gluster-devel mailing list<br>
> > > > > > > >>>> <a href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a><br>
> > > > > > > >>>> <a href="https://lists.nongnu.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a><br>
> > > > > > > >>>><br>
> > > > > > > >> hi Ian,<br>
> > > > > > > >> inline replies with <pranithk>.<br>
> > > > > > > >><br>
> > > > > > > >> Pranith.<br>
> > > > > > > >><br>
> > > > > > > ><br>
> > > > > > > hi Ian,<br>
> > > > > > > Maintaining a queue of files that need to be<br>
> > > > > > > self-healed does not scale in practice, in cases<br>
> > > > > > > where there are millions of files that need<br>
> > > > > > > self-heal. So such a thing is not implemented. The<br>
> > > > > > > idea is to make self-heal foreground after a<br>
> > > > > > > certain-limit (background-self-heal-count) so<br>
> > > > > > > there is no necessity for such a queue.<br>
> > > > > > ><br>
> > > > > > > Pranith.<br>
> > > > > > ><br>
> > > > > ><br>
> > > > > ><br>
> > > > > ><br>
> > > > ><br>
> > > > ><br>
> > > > ><br>
> > > ><br>
> > > ><br>
> > > ><br>
> > ><br>
> > ><br>
> > ><br>
> ><br>
> ><br>
> ><br>
><br>
><br>
><br>
<br>
<br>
--<br>
Ian Latter<br>
Late night coder ..<br>
<a href="http://midnightcode.org/" target="_blank">http://midnightcode.org/</a><br>
<br>
_______________________________________________<br>
Gluster-devel mailing list<br>
<a href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a><br>
<a href="https://lists.nongnu.org/mailman/listinfo/gluster-devel" target="_blank">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a><br>
</div></div></blockquote></div><br>