<div dir="ltr">Thank you for that information.<div><br></div><div>Are there plans to restore the previous functionality in a later release of 3.6.x? Or is this what we should expect going forward?</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 20, 2014 at 11:24 PM, Anuradha Talur <span dir="ltr"><<a href="mailto:atalur@redhat.com" target="_blank">atalur@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
----- Original Message -----<br>
> From: "Joe Julian" <<a href="mailto:joe@julianfamily.org">joe@julianfamily.org</a>><br>
> To: "Anuradha Talur" <<a href="mailto:atalur@redhat.com">atalur@redhat.com</a>>, "Vince Loschiavo" <<a href="mailto:vloschiavo@gmail.com">vloschiavo@gmail.com</a>><br>
> Cc: "<a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>" <<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>><br>
</span><div><div class="h5">> Sent: Friday, November 21, 2014 12:06:27 PM<br>
> Subject: Re: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios related)<br>
><br>
><br>
><br>
> On November 20, 2014 10:01:45 PM PST, Anuradha Talur <<a href="mailto:atalur@redhat.com">atalur@redhat.com</a>><br>
> wrote:<br>
> ><br>
> ><br>
> >----- Original Message -----<br>
> >> From: "Vince Loschiavo" <<a href="mailto:vloschiavo@gmail.com">vloschiavo@gmail.com</a>><br>
> >> To: "<a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>" <<a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>><br>
> >> Sent: Wednesday, November 19, 2014 9:50:50 PM<br>
> >> Subject: [Gluster-users] v3.6.1 vs v3.5.2 self heal - help (Nagios<br>
> >related)<br>
> >><br>
> >><br>
> >> Hello Gluster Community,<br>
> >><br>
> >> I have been using the Nagios monitoring scripts, mentioned in the<br>
> >below<br>
> >> thread, on 3.5.2 with great success. The most useful of these is<br>
> >> the self-heal check.<br>
> >><br>
> >> However, I've just upgraded to 3.6.1 on the lab and the self heal<br>
> >daemon has<br>
> >> become quite aggressive. I continually get alerts/warnings on 3.6.1<br>
> >that<br>
> >> virt disk images need self heal, then they clear. This is not the<br>
> >> case on 3.5.2.<br>
> >><br>
> >> Configuration:<br>
> >> 2 node, 2 brick replicated volume with 2x1GB LAG network between the<br>
> >peers<br>
> >> using this volume as a QEMU/KVM virt image store through the fuse<br>
> >mount on<br>
> >> Centos 6.5.<br>
> >><br>
> >> Example:<br>
> >> on 3.5.2:<br>
> >> gluster volume heal volumename info: shows the bricks and number of<br>
> >entries<br>
> >> to be healed: 0<br>
> >><br>
> >> On v3.5.2 - During normal gluster operations, I can run this command<br>
> >over and<br>
> >> over again, 2-4 times per second, and it will always show 0 entries<br>
> >to be<br>
> >> healed. I've used this as an indicator that the bricks are<br>
> >synchronized.<br>
> >><br>
> >> Last night, I upgraded to 3.6.1 in lab and I'm seeing different<br>
> >behavior.<br>
> >> Running gluster volume heal volumename info during normal<br>
> >> operations will show a file out-of-sync, seemingly for every block<br>
> >> written to disk and then synced to the peer. I can run the command<br>
> >> over and over again, 2-4<br>
> >times per<br>
> >> second, and it will almost always show something out of sync. The<br>
> >individual<br>
> >> files change, meaning:<br>
> >><br>
> >> Example:<br>
> >> 1st run: shows file 1 out of sync<br>
> >> 2nd run: shows file 2 and file 3 out of sync, but file 1 is now in<br>
> >> sync (not in the list)<br>
> >> 3rd run: shows file 3 and file 4 out of sync, but file 1 and file 2<br>
> >> are in sync (not in the list)<br>
> >> ...<br>
> >> nth run: shows 0 files out of sync<br>
> >> nth+1 run: shows file 3 and 12 out of sync.<br>
> >><br>
> >> From looking at the virtual machines running off this gluster volume,<br>
> >it's<br>
> >> obvious that gluster is working well. However, this obviously plays<br>
> >havoc<br>
> >> with Nagios and alerts. Nagios will run the heal info and get<br>
> >different and<br>
> >> non-useful results each time, and will send alerts.<br>
> >><br>
> >> Is this behavior change (3.5.2 vs 3.6.1) expected? Is there a way to<br>
> >tune the<br>
> >> settings or change the monitoring method to get better results into<br>
> >Nagios?<br>
> >><br>
> >In 3.6.1, the heal info command works differently than in 3.5.2: it<br>
> >is the self-heal daemon that gathers the entries that might need<br>
> >healing. Currently, in 3.6.1, there is no way to distinguish, while<br>
> >listing, between a file that is being healed and a file with ongoing<br>
> >I/O. Hence files under normal operation are also listed in the output<br>
> >of the heal info command.<br>
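Until heal info can tell healing apart from in-flight I/O, one common workaround is to debounce the check: only raise an alert when entries persist across several consecutive polls, since entries caused by ongoing writes drop out between samples. A minimal sketch, assuming the stock gluster CLI; `check_heal`, `parse_heal_count`, and the poll count and interval defaults are illustrative assumptions, not part of gluster or the Nagios scripts:

```shell
#!/bin/sh
# Sketch of a debounced Nagios-style check: warn only when heal-info
# entries persist across several consecutive polls. The helper names,
# poll count, and interval below are illustrative, not gluster defaults.

# Sum the "Number of entries:" lines from `gluster volume heal <vol> info`.
parse_heal_count() {
    awk '/Number of entries:/ {sum += $NF} END {print sum + 0}'
}

# check_heal VOLUME [SAMPLES] [INTERVAL]
# Returns 0 (OK) as soon as one poll reports zero pending entries;
# returns 1 (WARNING) only if every poll reports pending entries.
check_heal() {
    vol=$1
    samples=${2:-3}
    interval=${3:-10}
    i=1
    while [ "$i" -le "$samples" ]; do
        count=$(gluster volume heal "$vol" info | parse_heal_count)
        if [ "$count" -eq 0 ]; then
            echo "OK: 0 entries pending heal on $vol"
            return 0
        fi
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    echo "WARNING: $count entries still pending on $vol after $samples polls"
    return 1
}
```

A file that genuinely needs healing keeps appearing in every poll, while transient entries from normal I/O clear between samples, so only the former trips the WARNING.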
><br>
> How did that regression pass?!<br>
</div></div>Test cases to check for this condition were not written in the regression tests.<br>
><br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Thanks,<br>
Anuradha.<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">-Vince Loschiavo<br></div>
</div>