<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Thanks, guys, for your responses!  I get the digest, so I&#39;m going to cut and paste the juicier bits into one message... And a warning: if some of my comments suggest I really don&#39;t know what I&#39;m doing, well, that could very well be right.  I&#39;m still a long way down the learning curve; IT is not my real job or background.</div>

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">---------- Forwarded message ----------<br>

From: Alex Chekholko &lt;<a href="mailto:chekh@stanford.edu">chekh@stanford.edu</a>&gt;<br>To: <a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>Cc: <br>Date: Mon, 17 Mar 2014 11:23:15 -0700<br>Subject: Re: [Gluster-users] Replicate Over VPN<br>

<br>

<br>

On 03/13/2014 04:50 PM, Brock Nanson wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><br>

</blockquote>

...<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

2) I&#39;ve seen it suggested that the write function isn&#39;t considered<br>

complete until it&#39;s complete on all bricks in the volume. My write<br>

speeds would seem to confirm this.<br>

</blockquote>

<br>

Yes, the write will return when all replicas are written.  AKA synchronous replication.  Usually &quot;replication&quot; means &quot;synchronous replication&quot;.<br></blockquote><div><br></div><div>OK, so the replication is bit by bit, real time across all the replicas.  &#39;Synchronous&#39; meaning &#39;common clock&#39; in essence.</div>
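<div><br></div><div>To put rough numbers on why my write speeds look the way they do, here is a toy model.  It is only a sketch: the two latency figures are invented, and the function names are mine, not anything from gluster.  The one thing it assumes is what Alex describes, that a synchronous write returns only when every replica has confirmed.</div>
<pre>
```python
# Toy latency model; both numbers are invented for illustration only.
LOCAL_WRITE_MS = 5     # assumed time for the write to land on the LAN-side brick
VPN_WRITE_MS = 400     # assumed time for the same write to land on the far brick


def synchronous_write_ms(replica_latencies_ms):
    """The caller waits for every replica, so the slowest one sets the pace."""
    return max(replica_latencies_ms)


def asynchronous_write_ms(replica_latencies_ms):
    """The caller waits only for the first (local) replica; others catch up later."""
    return replica_latencies_ms[0]


latencies = [LOCAL_WRITE_MS, VPN_WRITE_MS]
print("synchronous:", synchronous_write_ms(latencies), "ms")
print("asynchronous:", asynchronous_write_ms(latencies), "ms")
```
</pre>
<div>With replicas on both ends of a VPN, the synchronous figure is always the VPN figure, which matches what I see on saves.</div>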

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Is this correct and is there any way<br>

to cache the data and allow it to trickle over the link in the<br>

background?<br>

</blockquote>

<br>

You&#39;re talking about asynchronous replication.  Which GlusterFS calls &quot;geo-replication&quot;.<br></blockquote><div><br></div><div>Understood... so this means one direction only in reality, at least until the nut of doing the replication in both directions can be cracked.  &#39;Asynchronous&#39; might be a bit of a misdirection though, because it would suggest (to me at least), communication in *both* directions, but not based on the same clock.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">...<br>

<br>

Geo-replication would seem to be the ideal solution, except for the fact<br>

that it apparently only works in one direction (although it was<br>

evidently hoped it would be upgraded in 3.4.0 to go in both directions I<br>

understand).<br>

</blockquote>

<br>

So if you allow replication to be delayed, and you allow writes on both sides, how would you deal with the same file simultaneously being written on both sides.  Which would win in the end?<br></blockquote><div><br></div>

<div>This is the big question of course, and I think the answer requires more knowledge than I have relating to how the replication process occurs.  In my unsophisticated way, I would assume that under the hood, gluster would sound something like this whenever a new file is written to Node A:</div>

<div><br></div><div>1) Samba wants to write a file, I&#39;m awake!</div><div>2) Hey Node B, wake up, we&#39;re about to start writing some bits synchronously.  File is called &#39;junk.txt&#39;.</div><div>3) OK, we&#39;ve both opened that file for writing...</div>

<div>4) Samba, start your transmission.</div><div>5) &#39;write, write, write&#39;, in Node A/B perfect harmony</div><div>6) Close that file and make sure the file listing is updated.</div><div><br></div><div>This bit-level understanding is something I don&#39;t have.  At some point, the directory listing would be updated to show the new or updated file.  When does that happen?  Before or after the file is written?</div>

<div><br></div><div>So to answer your question about which file would win if simultaneously written, I need to understand whether simply having the file opened for writing is enough to take control of it.  That is, can Node A tell Node B that junk.txt is going to be written, thus preventing Node B from accepting a local write request?  If this is the case, then gluster would only need to send enough information from Node A to Node B to indicate the write was coming and that the file is off limits until further notice.  The write could occur as fast as possible on the local node, and dribble across the VPN to the other as fast as the link allows.  So #5 above would be &#39;write, write, write as fast as each node reasonably can, but not necessarily in harmony&#39;.  And if communication was broken during the process, the heal function would be called upon to sort it out when communication is restored.</div>
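<div><br></div><div>A rough sketch of that &#39;claim the file first, dribble the data later&#39; idea, in Python.  To be clear, this is not how gluster works and none of these names exist in gluster; <code>Node</code>, <code>write</code>, <code>drain</code> and <code>pair</code> are all made up, and the &#39;link&#39; is just a queue in memory.  It only shows the behaviour I am wishing for.</div>
<pre>
```python
from collections import deque


class Node:
    """Hypothetical storage node; not a gluster API."""

    def __init__(self, name):
        self.name = name
        self.files = {}        # path -> contents
        self.locked = set()    # paths the peer has claimed for writing
        self.outbox = deque()  # writes waiting to cross the slow link
        self.peer = None

    def write(self, path, data):
        if path in self.locked:
            raise PermissionError(f"{path} is being written on {self.peer.name}")
        self.peer.locked.add(path)        # small, cheap "hands off" message
        self.files[path] = data           # local write at LAN speed
        self.outbox.append((path, data))  # bulk data goes over later

    def drain(self):
        """Push queued writes to the peer, then release the peer's locks."""
        sent = set()
        while self.outbox:
            path, data = self.outbox.popleft()
            self.peer.files[path] = data
            sent.add(path)
        self.peer.locked -= sent


def pair(a_name, b_name):
    """Create two nodes that know about each other."""
    a, b = Node(a_name), Node(b_name)
    a.peer, b.peer = b, a
    return a, b


node_a, node_b = pair("Node A", "Node B")
node_a.write("junk.txt", "hello")
try:
    node_b.write("junk.txt", "conflict")
except PermissionError as err:
    print("refused:", err)
node_a.drain()
print(node_b.files["junk.txt"])
```
</pre>
<div>The sketch conveniently ignores the hard part: if the link drops before the &#39;hands off&#39; message arrives, both nodes can accept a write to the same file, and something still has to pick a winner afterwards.</div>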

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

So are there any configuration tricks (write-behind, compression etc)<br>

that might help me out?  Is there a way to fool geo-replication into<br>

working in both directions, recognizing my application isn&#39;t seeing<br>

serious read/write activity and some reasonable amount of risk is<br>

acceptable?<br>

<br>

</blockquote>

<br>

You&#39;re basically talking about running rsyncs in both directions.  How will you handle any file conflicts?<br></blockquote><div><br></div><div>Yes, I suppose in a way I am, but not based on a cron job... it would ideally be a full time synchronization, like gluster does, but without the requirement of perfect Synchronicity (wasn&#39;t that a Police album?).</div>

<div><br></div><div>Assuming my kindergarten understanding above could be applied here, file conflicts would presumably only arise if the VPN link went down, preventing the &#39;open the file for writing&#39; command from being completed on both ends.  If the link went down part way through a dribbling write to Node B, the healing process would presumably have a go at fixing the problem after the link is reinstated.  If someone wrote to the remote copy during the outage, the typical heal issues would come into play.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

<br>

-- <br>

Alex Chekholko <a href="mailto:chekh@stanford.edu" target="_blank">chekh@stanford.edu</a><br>


---------- Forwarded message ----------<br>From: Alex Chekholko &lt;<a href="mailto:chekh@stanford.edu">chekh@stanford.edu</a>&gt;<br>To: <a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a><br>Cc: <br>

Date: Mon, 17 Mar 2014 15:11:37 -0700<br>Subject: Re: [Gluster-users] Replicate Over VPN<br>Replying back to list.<br>

<br>

I don&#39;t know of a currently available clustered filesystem that allows bi-directional asynchronous replication.  Even in your case where you can have manual curation, what would you want to happen when two humans modify the same files at the same time in your two geographic locations?  And don&#39;t tell us it will never happen.<br>

</blockquote><div><br></div><div>Heh, yes, Murphy is a complete bast*rd, so in spite of the odds associated with sharing over a million files with 30 people, it would have to happen eventually.  However, the key here is that what we do is reproducible and not as sensitive as, say, banking data.  If someone hits &#39;save&#39; and something pukes, the worst case is they&#39;ve wasted 10 or 15 minutes of work... which they can do again.  I absolutely understand why gluster is as fanatical as it is about keeping everything bit-for-bit identical and correct.  It has to be for virtually all implementations and wouldn&#39;t be considered ready for the real world if it weren&#39;t.  I just need a lazy mode and a tick box saying I acknowledge and accept all the risks!</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

Synchronous replication works a bit differently everywhere, so you&#39;ll just want to double-check which is most compatible with your workflow.<br>

<br>

In glusterfs, the client talks to all replicas and returns when each replica has confirmed it has written the data.<br>

<br>

In ceph, the client talks to the master replica, and then that master replica forwards the writes to all the other replicas, and then confirms to the client that all the replicas are written.<br>

<br>

<br>

For your async use case, how often does the shared data change?  Perhaps something like a plain rsync every night would be sufficient?  Or a ZFS send/receive if that&#39;s faster than rsync?<br></blockquote><div><br></div>

<div>As noted above, the number of users isn&#39;t that large.  The data is changing regularly, but we&#39;re only talking about a small number of files.  A workstation user may be in the same file all day long, doing regular saves and perhaps opening another file or two for reference once in a while. </div>

<div><br></div><div>The reality is, if I knew when a write was about to happen I could drop the VPN connection, allow the write to finish on the local machine without VPN delay (a second or two), then bring up the connection again and let the heal process look after things.  The user wouldn&#39;t see the long delay of the bit-by-bit save to the other node and the synchronization would happen in the background.  The few seconds of &#39;downtime&#39; during a write would be acceptable to me because the odds of the heal process finding new files on both ends are incredibly small (in my use case).  Rsync is something I&#39;ve used in the past... but it requires too much supervision to ensure it does what you really expect it to do.  Or so I&#39;ve found.  It&#39;s better suited for backups IMHO.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

<br>

On 03/17/2014 02:58 PM, Carlos Capriotti wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Being a little bit familiar with Brock&#39;s work environment, I think I can<br>

clarify on this: they have a human, manual system of avoiding those<br>

conflicts. Only one person/geographical group will use the files at a<br>

given time.<br>

<br>

All that matters then is being able to automate the replication/exchange<br>

process, so, in this case, the question still needing an answer would<br>

be, &quot;is there a way to make geo-rep work both ways ?&quot;<br>

<br>

Sorry for taking point here, Brock. I thought this would speed up the<br>

discussion a tad.<br></blockquote></blockquote><div><br></div><div>No problem Carlos, your help is appreciated!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<br>

<br><br>---------- Forwarded message ----------<br>From: Marcus Bointon &lt;<a href="mailto:marcus@synchromedia.co.uk">marcus@synchromedia.co.uk</a>&gt;<br>To: gluster-users &lt;<a href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>&gt;<br>

Cc: <br>Date: Tue, 18 Mar 2014 00:03:03 +0100<br>Subject: Re: [Gluster-users] Replicate Over VPN<br>On 17 Mar 2014, at 23:11, Alex Chekholko &lt;<a href="mailto:chekh@stanford.edu">chekh@stanford.edu</a>&gt; wrote:<br>

<br>

&gt; For your async use case, how often does the shared data change?  Perhaps something like a plain rsync every night would be sufficient?  Or a ZFS send/receive if that&#39;s faster than rsync?<br>

<br>

(This should really have been in reply to Brock, but I lost his post somewhere)<br>

<br>

There are some fairly simple solutions for this that may be workable, especially if writes are somewhat constrained. If all reads and writes by a single client go to the same back-end server, perhaps because of cookie or IP-based stickiness, they can cope with longish latency propagating to other servers, read-what-you-just-wrote will always succeed, and simultaneous writes to the same file are very unlikely. A classic use case would be user-uploaded image files for a web server cluster.<br>


<br>

Bidirectional rsync has serious issues with deletions. Other systems worth looking at include:<br>

csync2: <a href="http://oss.linbit.com/csync2/" target="_blank">http://oss.linbit.com/csync2/</a><br>

Unison: <a href="http://www.cis.upenn.edu/~bcpierce/unison/" target="_blank">http://www.cis.upenn.edu/~bcpierce/unison/</a><br>

Bsync: <a href="https://github.com/dooblem/bsync" target="_blank">https://github.com/dooblem/bsync</a></blockquote><div><br></div><div>I&#39;m actually looking at Unison at the moment, as suggested by another mentor off-list.  I&#39;m not convinced it&#39;s the way to go yet though; as you noted, rsync is something that makes me nervous in my use case.</div>
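<div><br></div><div>For my own benefit, here is why deletions trip up a naive two-way sync, and how remembering the last synced state helps.  This is only a sketch of the general three-way comparison idea; I am not claiming it is what Unison, csync2 or Bsync actually do internally, and <code>reconcile</code> and the file names are made up.</div>
<pre>
```python
def reconcile(last_synced, side_a, side_b):
    """Pick an action per path by comparing both sides with the last synced state.

    Each argument maps path -> content hash (any comparable value works here).
    A path missing from a mapping means the file does not exist there.
    """
    actions = {}
    for path in sorted(set(last_synced) | set(side_a) | set(side_b)):
        base, a, b = last_synced.get(path), side_a.get(path), side_b.get(path)
        if a == b:
            continue  # already identical, or deleted on both sides
        if a == base:
            # Only B changed since the last sync.
            actions[path] = "delete on A" if b is None else "copy B to A"
        elif b == base:
            # Only A changed since the last sync.
            actions[path] = "delete on B" if a is None else "copy A to B"
        else:
            actions[path] = "conflict"  # both sides changed; someone must choose
    return actions


last = {"plan.dwg": "v1", "old.txt": "v1", "notes.txt": "v1"}
site_a = {"plan.dwg": "v2", "notes.txt": "v2a", "new.txt": "v1"}
site_b = {"plan.dwg": "v1", "old.txt": "v1", "notes.txt": "v2b"}
for path, action in reconcile(last, site_a, site_b).items():
    print(path, "->", action)
```
</pre>
<div>Without the <code>last</code> snapshot, &#39;old.txt&#39; (deleted at A) and &#39;new.txt&#39; (created at A) look the same to the tool: each simply exists on one side only.  That is the deletion problem in a nutshell, and the &#39;conflict&#39; branch is the case Alex keeps asking about.</div>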

<div><br></div><div>If you want to have a good laugh, ponder what I&#39;m considering now.... ;-)  Some form of ownCloud *with* gluster...  My thinking (without the benefit of testing the details yet) is to put ownCloud server on each of the gluster node boxes, sharing out the gluster-mounted volume.  Sitting next to each gluster node would be another box, this one running the ownCloud client (I&#39;ve seen suggestions that someone has put together a command line implementation of the client for headless servers) and Samba.  So a user would read/write/browse on the Samba server and drives, with typical gigabit LAN performance.  OwnCloud would sync changes back to the gluster volume and would live with the speed issues of the synchronous replication over the VPN.  The other gluster node at the far end of the VPN would then share the changes via ownCloud over to the Samba box sitting next to it.  Perhaps this could all be done with ownCloud alone, but I really like the way gluster will maintain the data integrity across the VPN... the problems I have with it are really BECAUSE it&#39;s doing a good job!</div>

<div><br></div><div>I don&#39;t know how big a mess the concurrent file use issue would create... or if this could even function as pondered.  But I&#39;m thinking it would isolate the workstations from the delay issues of the VPN speed.  And might be worth testing for sh1t5 and giggles.  (yeah, I really am that desperate to find a reasonably functional solution!)</div>

<div><br></div><div>Thanks guys!</div><div><br></div><div>Brock</div></div></div></div>