Hi Dan, <br><br>Thank you very much for your comprehensive explanation of using rsync to sync GlusterFS servers. I have not had an opportunity to try that solution because my customer decided to give up on GlusterFS. I will test it in my lab. <br>


Thanks, <br>Jimmy,<br><br><div class="gmail_quote">On 16 May 2012 16:45, Dan Bretherton <span dir="ltr">&lt;<a href="mailto:d.a.bretherton@reading.ac.uk" target="_blank">d.a.bretherton@reading.ac.uk</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Glusterfs Users!<br>

<br>

I have got one replicated volume with two bricks:<br>

<br>

s1 ~ # gluster volume info<br>

<br>

Volume Name: data-ns<br>

Type: Replicate<br>

Status: Started<br>

Number of Bricks: 2<br>

Transport-type: tcp<br>

Bricks:<br>

Brick1: s1:/mnt/gluster/data-ns<br>

Brick2: s2:/mnt/gluster/data-ns<br>

Options Reconfigured:<br>

performance.cache-refresh-timeout: 1<br>

performance.io-thread-count: 32<br>

auth.allow: 10.*<br>

performance.cache-size: 1073741824<br>

<br>

<br>

There are 5 clients that have the volume mounted from the s1 server.<br>

<br>

We had a hardware failure on the s2 box, which kept it down for about one week.<br>

During that time all read and write operations went to s1.<br>

Now that s2 is operational again, I would like to synchronize all files onto it. I started the GlusterFS server and<br>

ran the self-heal process (&quot;find with stat&quot; on the GlusterFS mount from the s2 box).<br>

During the replication process I saw very strange behaviour from GlusterFS.<br>

Some of the clients tried to fetch lots of files from the s2 server, but those files either did not exist there or had a size of 0 bytes.<br>

<br>

This caused a lot of &quot;disk wait&quot; on the web servers (the clients that have the volume mounted from s1), and eventually they returned HTTP 503 responses.<br>

<br>

My question is: how can I avoid serving files from the s2 box until all files have been replicated correctly from the s1 server?<br>

<br>

I have installed GlusterFS 3.2.6-1 from the Debian repository.<br>

<br>

Thank you a lot in advance,<br>

Jimmy,<br>

</blockquote>

<br>

Dear Jimmy,<br>

I have had problems re-synchronising out of date servers myself.  I posted the following query last year.<br>

<br>

<a href="http://gluster.org/pipermail/gluster-users/2011-October/008933.html" target="_blank">http://gluster.org/pipermail/gluster-users/2011-October/008933.html</a><br>

<br>

In my case I was mainly worried about the self-heal process causing excessive load, which I suspected of causing my fairly low specification servers to hang.  Following that posting I received some advice off line concerning the use of rsync to re-synchronise out of date servers that have been off line for repairs for a long period of time.  I was advised that it is safe to use rsync, provided that the -X or --xattrs option is used to preserve extended attributes, and that it is also necessary to use the --delete option in order to delete files that were deleted from the live server.  When I do this I disable the glusterd service while the rsync is taking place, although I have not been advised that this is essential. It is possible that files on the live server may be modified while the rsync is in progress, so I always follow up with a targeted self-heal in order to bring the repaired server fully up to date.  The targeted self-heal procedure is described in the following Gluster Community article.<br>


<br>

<a href="http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/" target="_blank">http://community.gluster.org/a/howto-targeted-self-heal-repairing-less-than-the-whole-volume/</a><br>


<br>
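As a rough illustration of the rsync step, here is a minimal Python sketch that only builds and prints the command line; it does not run rsync. The --xattrs and --delete options are the ones mentioned above. The --archive and --dry-run options are assumptions added for the example, as are the brick paths, which are borrowed from the volume info in the original question. It assumes the command would be run as root on the repaired server with glusterd stopped.<br>

```python
import shlex


def build_rsync_command(source, dest, dry_run=True):
    """Return the rsync argument list for refreshing an out-of-date brick."""
    cmd = [
        "rsync",
        "--archive",  # assumption: keep permissions, ownership, times and symlinks
        "--xattrs",   # preserve extended attributes (long form of -X)
        "--delete",   # remove files on the target that were deleted on the live server
    ]
    if dry_run:
        # assumption: preview the changes first; rsync then modifies nothing
        cmd.append("--dry-run")
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    # The paths are illustrative, taken from the volume info quoted earlier.
    cmd = build_rsync_command("s1:/mnt/gluster/data-ns/", "/mnt/gluster/data-ns/")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first and reviewing it (or running it with --dry-run) seems a sensible precaution, because --delete removes files on the target.<br>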

When the resynchronisation process is complete I have noticed that the volume of data in replicated bricks can differ by up to 100MB.  I find this a bit worrying, but I haven&#39;t had time to find out exactly which files are on these bricks and why the volume of data reported by df differs on the two servers.<br>
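One possible way to investigate such a difference is to compare file listings taken from the two bricks. The Python sketch below is only an illustration under stated assumptions: the function names tree_sizes and diff_trees are hypothetical, both trees are assumed to be readable from one machine (or the listing is produced separately on each server), and it compares apparent file sizes rather than the disk blocks that df counts, so a mismatch is only a hint.<br>

```python
import os
import tempfile


def tree_sizes(root):
    """Map each non-directory entry (path relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat so that symlinks are measured themselves, not their targets
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def diff_trees(sizes_a, sizes_b):
    """Return (paths only in a, paths only in b, paths whose sizes differ)."""
    only_a = sorted(set(sizes_a) - set(sizes_b))
    only_b = sorted(set(sizes_b) - set(sizes_a))
    changed = sorted(
        path for path in set(sizes_a) & set(sizes_b)
        if sizes_a[path] != sizes_b[path]
    )
    return only_a, only_b, changed


if __name__ == "__main__":
    # Self-contained demo on two throwaway directories standing in for bricks.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        with open(os.path.join(a, "same.txt"), "w") as f:
            f.write("abc")
        with open(os.path.join(b, "same.txt"), "w") as f:
            f.write("abc")
        with open(os.path.join(a, "missing_on_b.txt"), "w") as f:
            f.write("x")
        print(diff_trees(tree_sizes(a), tree_sizes(b)))
```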


<br>

The problem with the rsync approach is that it can take a very long time if there are a large number of files to synchronise, probably because rsync is single threaded.  I recently had one rsync going for two weeks and it still didn&#39;t finish, and I discovered that the bricks in question had more than 2.5 million files.  I couldn&#39;t wait any longer to bring my repaired server back into service so I killed the rsync and started glusterd, and I then ran a targeted self-heal on the unsynchronised bricks to continue the resynchronisation.  That is still going on now, but I am not seeing excessive load and haven&#39;t noticed any replication errors (but I haven&#39;t got the time to check thoroughly). This might be because most of the file transfer has already taken place or because most of the files in these particular bricks are small.<br>
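For reference, the "find with stat" style of self-heal amounts to looking up every entry under a directory on the GlusterFS client mount. The Python sketch below walks a subtree and calls lstat on each entry. It is a minimal illustration, not the procedure from the article: the function name stat_walk is hypothetical, and it assumes that a lookup made from Python triggers self-heal in the same way as the stat command does.<br>

```python
import os
import tempfile


def stat_walk(top):
    """lstat every entry below top; return (entries visited, entries that failed)."""
    visited = 0
    failed = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            try:
                # The lookup itself is the point; the result is discarded.
                os.lstat(os.path.join(dirpath, name))
                visited += 1
            except OSError:
                # Count failures instead of aborting a long walk.
                failed += 1
    return visited, failed


if __name__ == "__main__":
    # Demo on a throwaway directory. On a real system, top would be a
    # subdirectory of the GlusterFS client mount, not of a brick.
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "d"))
        open(os.path.join(top, "a"), "w").close()
        open(os.path.join(top, "d", "b"), "w").close()
        print(stat_walk(top))
```

Pointing the walk at one subdirectory at a time keeps the heal targeted and makes it easier to watch the load on the servers.<br>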


<br>

My conclusion from this experience is that if a server goes down for a long time and becomes significantly out of date, it is best to use rsync (with glusterd disabled) to do as much of the file transfer as possible.  Once that has been done, the GlusterFS self heal mechanism can finish off the resynchronisation without any problematic side effects.  I will follow that procedure next time and report any other problems or observations.<span class="HOEnZb"><font color="#888888"><br>


<br>

-Dan.<br>

<br>

<br>

<br>

</font></span></blockquote></div><br>