<div dir="ltr"><div><div>Joe,<br>Thanks for your reply.<br>I grep'd the logs for the name of one of the files that had become unreachable over NFS after resync (i/o error). It comes up in <volumename>.log and nfs.log on the node that had stayed online:<br>
The relevant logs are here:<br><a href="https://gist.github.com/nicolasochem/f9d24a2bf57b0d40bb7d">https://gist.github.com/nicolasochem/f9d24a2bf57b0d40bb7d</a><br><br></div>One important piece of information is that the node that was taken offline had previously filled up its root filesystem: a memory/southbridge issue flooded /var/log/messages. Upon restoration of the machine, glusterd did not come up because one file in /var/lib/glusterd/peers was empty.<br>
</div><div>The issue is described here: <a href="https://bugzilla.redhat.com/show_bug.cgi?id=858732">https://bugzilla.redhat.com/show_bug.cgi?id=858732</a><br></div><div><br>I removed the empty peer file, glusterd started, and I then started seeing the I/O errors described in my original mail.<br>
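<br>For anyone hitting the same bug, the workaround boiled down to roughly this (the peers path is per the bug report above; the UUID file name is whatever find reports):<br><br># locate the zero-length peer file left behind by the full disk<br>find /var/lib/glusterd/peers/ -type f -empty<br># remove it (the file name is the peer's UUID), then restart glusterd<br>rm /var/lib/glusterd/peers/<uuid-reported-above><br>service glusterd start<br>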
<br></div><div>The key log line, IMO, is: "background meta-data data missing-entry self-heal failed on"</div><div><br></div>Based on this and the logs, could it be that gluster failed to write to /var/lib/glusterd because the disk was full, and that this is what caused the failed heals?<br>
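<br>If it helps to diagnose, these are the two checks I can run (the volume name is a placeholder; "heal info heal-failed" is from the 3.4 CLI, if I read the docs correctly):<br><br># was the filesystem actually full where glusterd keeps its state?<br>df -h /var/lib/glusterd<br># list the entries the self-heal daemon reported as failed<br>gluster volume heal <volumename> info heal-failed<br>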
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 28, 2014 at 8:13 AM, Joe Julian <span dir="ltr"><<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>
<br>
On March 27, 2014 11:08:03 PM PDT, Nicolas Ochem <<a href="mailto:nicolas.ochem@gmail.com">nicolas.ochem@gmail.com</a>> wrote:<br>
>Hi list,<br>
>I would like to describe an issue I had today with Gluster and ask<br>
>for your opinion:<br>
><br>
>I have a replicated volume with 2 replicas. There is about 1TB of<br>
>production data in there, in around 100,000 files. The bricks sit on<br>
>2x Supermicro X9DR3-LN4F machines with an 18TB RAID array each, 64GB<br>
>of RAM and 2x Xeon CPUs, as recommended in the Red Hat hardware<br>
>guidelines for storage servers. They have a 10Gb link between each<br>
>other. I am running Gluster 3.4.2 on CentOS 6.5.<br>
><br>
>This storage is NFS-mounted on a lot of production servers. Only a<br>
>small part of this data is actively used; the rest is legacy.<br>
><br>
>Due to an unrelated issue with one of the Supermicro servers (faulty<br>
>memory), I had to take one of the nodes offline for 3 days.<br>
><br>
>When I brought it back up, some files and directories ended up in a<br>
>heal-failed state (but no split-brain). Unfortunately, these were the<br>
>critical files that had been edited during those 3 days. On the NFS<br>
>mounts, attempts to read these files resulted in I/O errors.<br>
><br>
>I was able to fix a few of these files by manually removing them from<br>
>each brick and then copying them to the mounted volume again. But I<br>
>did not know what to do when entire directories were unreachable<br>
>because of "heal failed".<br>
><br>
>I later read that healing can take time and that heal-failed may be a<br>
>transient state (is that correct?<br>
><a href="http://stackoverflow.com/questions/19257054/is-it-normal-to-get-a-lot-of-heal-failed-entries-in-a-gluster-mount" target="_blank">http://stackoverflow.com/questions/19257054/is-it-normal-to-get-a-lot-of-heal-failed-entries-in-a-gluster-mount</a>),<br>
>but at the time I thought the data was beyond recovery, so I proceeded<br>
>to destroy the Gluster volume. Then, on one of the replicas, I moved<br>
>the content of the brick to another directory, created another volume<br>
>with the same name, and copied the saved content back to the mounted<br>
>volume. This took around 2 hours. Then I had to reboot all my NFS<br>
>client machines, which were stuck in a "stale NFS file handle" state.<br>
><br>
>A few questions:<br>
>- I realize that I cannot expect 1TB of data to heal instantly, but is<br>
>there any way to know whether the system would eventually have<br>
>recovered, despite being shown as "heal failed"?<br>
>- If yes, how much data (file count and size) would I need to clean up<br>
>from my volume to bring that time under 10 minutes?<br>
>- Would native Gluster mounts instead of NFS have helped here?<br>
>- Would any other course of action have resulted in a faster recovery<br>
>time?<br>
>- In such a situation, is there a way to make one replica<br>
>authoritative about the correct state of the filesystem?<br>
><br>
>Thanks in advance for your replies.<br>
><br>
><br>
</div></div>Although the self-heal daemon can take time to heal all the files, accessing a file that needs to be healed triggers the heal immediately from the client side (the NFS server is the client in this case).<br>
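A quick way to exercise that from a client mount is to stat every file through it, e.g. something like this (the mount point is just an example):<br><br>find /mnt/<volumename> -noleaf -print0 | xargs -0 stat > /dev/null<br>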
<br>
As with pretty much all errors in GlusterFS, you would have had to look in the logs to find out why something as vague as "heal failed" happened.<br>
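For the self-heal daemon that usually means something like this (default log path):<br><br>grep -i 'self-heal' /var/log/glusterfs/glustershd.log<br>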
<br>
</blockquote></div><br></div>