BZs raised for snapshots:

1086493 - RFE requesting a default name for snapshots (https://bugzilla.redhat.com/show_bug.cgi?id=1086493)
1086497 - RFE requesting the snapshot-after-snap-restore function (https://bugzilla.redhat.com/show_bug.cgi?id=1086497)

PC

----- Original Message -----
From: "Paul Cuzner" <pcuzner@redhat.com>
To: "Rajesh Joseph" <rjoseph@redhat.com>
Cc: "gluster-devel" <gluster-devel@nongnu.org>
Sent: Wednesday, 9 April, 2014 1:50:07 PM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

Thanks again, Rajesh.

----- Original Message -----
From: "Rajesh Joseph" <rjoseph@redhat.com>
To: "Paul Cuzner" <pcuzner@redhat.com>
Cc: "gluster-devel" <gluster-devel@nongnu.org>
Sent: Wednesday, 9 April, 2014 12:04:35 AM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

Hi Paul,

Whenever a brick comes online it performs a handshake with glusterd. The brick will not send a notification to clients until the handshake is done. We are planning to provide an extension to this so that the missing snaps are recreated at that point.

Best Regards,
Rajesh
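Just to spell out that ordering, here is a tiny illustrative sketch (plain Python; every name is invented for the example - this is not glusterd code):

def on_brick_started(brick, glusterd_handshake, missed_snaps, notify_clients):
    # 1. The returning brick completes its handshake with the local glusterd.
    glusterd_handshake(brick)

    # 2. Planned extension described above: recreate the snapshots this brick
    #    missed while it was down, before anything else can touch it.
    for snap in missed_snaps(brick):
        print(f"taking snapshot {snap} of {brick}")

    # 3. Only now are clients told that the brick is available again.
    notify_clients(brick)

# Example wiring with trivial stand-ins:
on_brick_started(
    "b2",
    glusterd_handshake=lambda b: print(f"handshake done for {b}"),
    missed_snaps=lambda b: ["s2", "s3", "s4"],
    notify_clients=lambda b: print(f"{b} is back online"),
)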
----- Original Message -----
From: "Paul Cuzner" <pcuzner@redhat.com>
To: "Rajesh Joseph" <rjoseph@redhat.com>
Cc: "gluster-devel" <gluster-devel@nongnu.org>
Sent: Tuesday, April 8, 2014 12:49:13 PM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

Rajesh,

Perfect explanation - the 'penny has dropped'. I was missing the fact that the healing of a snap brick is based on the snapshot taken on its replica.

One final question - I assume the scenario you mention about the brick coming back online before the snapshots are taken is theoretical, and there are blocks in place to prevent it from happening?

BTW, I'll get the BZ RFEs in by the end of my week, and will post the BZs back to the list for info.

Thanks!

PC

----- Original Message -----

> From: "Rajesh Joseph" <rjoseph@redhat.com>
> To: "Paul Cuzner" <pcuzner@redhat.com>
> Cc: "gluster-devel" <gluster-devel@nongnu.org>
> Sent: Tuesday, 8 April, 2014 5:09:10 PM
> Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> Hi Paul,

> It would be great if you can raise RFEs for both snap-after-restore and snapshot naming.

> Let's say your volume "Vol" has bricks b1, b2, b3 and b4.

> @0800 - S1 (snapshot volume) -> s1_b1, s1_b2, s1_b3, s1_b4 (these are the respective snap bricks, each on an
>         independent thin LV)

> @0830 - b2 went down

> @1000 - S2 (snapshot volume) -> s2_b1, x, s2_b3, s2_b4. Here we mark that the brick has a pending snapshot.
>         Note that s2_b1 will have all the changes missed by b2 till 1000 hours. AFR will mark the pending
>         changes on s2_b1.

> @1200 - S3 (snapshot volume) -> s3_b1, x, s3_b3, s3_b4. This missed snapshot is also recorded.

> @1400 - S4 (snapshot volume) -> s4_b1, x, s4_b3, s4_b4. This missed snapshot is also recorded.

> @1530 - b2 comes back. Before making it online we take snapshots s2_b2, s3_b2 and s4_b2. Since all three of
>         these snapshots are taken at nearly the same time, content-wise all of them will be in the same state.
>         Now these bricks are added to their respective volumes. Note that till now no healing is done.
>         After the addition the snapshot volumes will look like this:
>         S2 -> s2_b1, s2_b2, s2_b3, s2_b4.
>         S3 -> s3_b1, s3_b2, s3_b3, s3_b4.
>         S4 -> s4_b1, s4_b2, s4_b3, s4_b4.
>         After this b2 will come online, i.e. clients can access this brick. Now S2, S3 and S4 are healed:
>         s2_b2 will get healed from s2_b1, s3_b2 will be healed from s3_b1, and so on and so forth.
>         This healing will take s2_b2 to the point at which that snapshot was taken.

> If the brick comes online before these snapshots are taken, self-heal will try to take the brick (b2) to a
> point closer to the current time (@1530). Therefore it would not be consistent with the rest of its replica set.

> Please let me know if you have more questions or clarifications.

> Best Regards,
> Rajesh
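To keep that sequence straight, here it is as a rough Python sketch - invented names throughout, purely illustrative rather than actual GlusterFS code:

class Brick:
    def __init__(self, name, online=True):
        self.name, self.online = name, online

class SnapVolume:
    def __init__(self, name):
        self.name, self.snap_bricks, self.pending = name, [], []

def lvm_thin_snapshot(brick, snap_name):
    # Stand-in for "take a thin-LV snapshot of this brick's backing LV".
    return f"{snap_name}_{brick.name}"

missed = {}   # brick name -> snapshot volumes taken while that brick was down

def take_snapshot(snap_vol, bricks):
    for b in bricks:
        if b.online:
            snap_vol.snap_bricks.append(lvm_thin_snapshot(b, snap_vol.name))
        else:
            # Record the miss; AFR keeps marking the pending changes on the
            # surviving replica's snap brick (e.g. s2_b1).
            snap_vol.pending.append(b.name)
            missed.setdefault(b.name, []).append(snap_vol)

def brick_returns(brick):
    # 1. Before the brick goes back online, take every snapshot it missed;
    #    content-wise they are all identical at this instant.
    for snap_vol in missed.pop(brick.name, []):
        snap_vol.snap_bricks.append(lvm_thin_snapshot(brick, snap_vol.name))
    # 2. Only now is the brick made reachable by clients again.
    brick.online = True
    # 3. Self-heal then brings each late snap brick (s2_b2, s3_b2, ...) up to
    #    the state its replica's snap brick was in when that snapshot was taken.

Calling take_snapshot() for S2, S3 and S4 while b2 is marked offline, and then brick_returns(b2), reproduces the catch-up ordering described above.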
> ----- Original Message -----
> From: "Paul Cuzner" <pcuzner@redhat.com>
> To: "Rajesh Joseph" <rjoseph@redhat.com>
> Cc: "gluster-devel" <gluster-devel@nongnu.org>
> Sent: Tuesday, April 8, 2014 8:01:57 AM
> Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> Thanks Rajesh.

> Let me know if I should raise any RFEs - snap after restore, snapshot naming, etc.

> I'm still being thick about the snapshot process with missing bricks. What I'm missing is the heal process
> between snaps - my assumption is that the snap of a brick needs to be consistent with the other brick snaps
> within the same replica set. Let's use a home-drive use case as an example - typically, I'd expect to see
> home directories getting snapped at 0800, 1000, 1200, 1400, 1600, 1800 and 2200 each day. So in that context,
> say we have a dist-repl volume with 4 bricks, b1<->b2, b3<->b4;

> @ 0800 all bricks are available, snap (S1) succeeds with a snap volume being created from all bricks
> --- files continue to be changed and added
> @ 0830 b2 is unavailable (D0). Gluster tracks the pending updates on b1 that need to be applied to b2
> --- files continue to be changed and added
> @ 1000 snap requested - 3 of 4 bricks available, snap taken (S2) on b1, b3 and b4 - snap volume activated
> --- files continue to change
> @ 1200 a further snap performed - S3
> --- files continue to change
> @ 1400 snapshot S4 taken
> --- files change
> @ 1530 missing brick b2 comes back online (D1)

> Now between disruption at D0 and D1 there have been several snaps. My understanding is that each snap should
> provide a view of the filesystem consistent at the time of the snapshot - correct?

> You mention
> + brick2 comes up. At this moment we take a snapshot before we allow new I/O or heal of the brick. If multiple
>   snaps are missed then all the snaps are taken at this time. We don't wait till the brick is brought to the
>   same state as the other bricks.
> + brick2_s1 (snap of brick2) will be added to the s1 volume (snapshot volume). Self-heal will take care of
>   bringing brick2's state in line with the rest of its replica set.

> According to this description, if you snapshot b2 as soon as it's back online - that generates S1, S2 and S3
> as at 08:30 - and lets self-heal bring b2 up to the current time D1. However, doesn't this mean that S1, S2
> and S3 on brick2 are not equal to S2, S3 and S4 on brick1?

> If that is right, then if b1 is unavailable the corresponding snapshots on b2 wouldn't support the recovery
> points of 1000, 1200 and 1400 - which we know are OK on b1.

> I guess I'd envisaged snapshots working hand-in-glove with self-heal to maintain snapshot consistency - and
> may just be stuck on that thought.

> Maybe this is something I'll only get on a whiteboard - wouldn't be the first time :(

> I appreciate your patience in explaining this recovery process!

> ----- Original Message -----

> > From: "Rajesh Joseph" <rjoseph@redhat.com>
> > To: "Paul Cuzner" <pcuzner@redhat.com>
> > Cc: "gluster-devel" <gluster-devel@nongnu.org>
> > Sent: Monday, 7 April, 2014 10:12:53 PM
> > Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> > Thanks Paul for your valuable comments. Please find my comments in-lined below.

> > Please let us know if you have more questions or clarifications. I will try to update the doc wherever more
> > clarity is needed.

> > Thanks & Regards,
> > Rajesh

> > ----- Original Message -----
> > From: "Paul Cuzner" <pcuzner@redhat.com>
> > To: "Rajesh Joseph" <rjoseph@redhat.com>
> > Cc: "gluster-devel" <gluster-devel@nongnu.org>
> > Sent: Monday, April 7, 2014 1:59:10 AM
> > Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> > Hi Rajesh,

> > Thanks for updating the design doc. It reads well.
> > I have a number of questions that would help my understanding:

> > Logging: The doc doesn't mention how the snapshot process is logged -
> > - will snapshot use an existing log or a new log?
> > [RJ]: As of now snapshot makes use of the existing logging framework.
> > - will the log be specific to a volume, or will all snapshot activity be logged in a single file?
> > [RJ]: The snapshot module is embedded in the gluster core framework, therefore the logs will also be part of
> > the glusterd logs.
> > - will the log be visible on all nodes, or just the originating node?
> > [RJ]: As with glusterd, the snapshot logs related to each node will be visible on that node.
> > - will the high-level snapshot action be visible when looking from the other nodes, either in the logs or at
> > the CLI?
> > [RJ]: As of now the high-level snapshot action will be visible only in the logs of the originator node,
> > though the CLI can be used to see the list and info of snapshots from any other node.

> > Restore: You mention that after a restore operation, the snapshot will be automatically deleted.
> > - I don't believe this is a prudent thing to do. Here's an example I've seen a lot: an application has a
> > programmatic error, leading to data 'corruption' - devs work on the program, storage guys roll the volume
> > back. So far so good... devs provide the updated program, and away you go... BUT the issue is not resolved,
> > so you need to roll back again to the same point in time. If you delete the snap automatically, you lose the
> > restore point. Yes, the admin could take another snap after the restore - but why add more work into a
> > recovery process where people are already stressed out :) I'd recommend leaving the snapshot if possible,
> > and letting it age out naturally.
> > [RJ]: Snapshot restore is a simple operation wherein the volume's bricks will simply point to the brick
> > snapshots instead of the original bricks. Therefore once the restore is done we cannot use the same snapshot
> > again. We are planning to implement a configurable option which will automatically take a snapshot of the
> > snapshot to fulfil the above-mentioned requirement, but with the given timeline and resources we will not be
> > able to target it in the coming release.

> > Auto-delete: Is this a post phase of the snapshot create, so the successful creation of a new snapshot will
> > trigger the pruning of old versions?
> > [RJ]: Yes, if we reach the snapshot limit for a volume then the snapshot create operation will trigger
> > pruning of older snapshots.

> > Snapshot Naming: The doc states the name is mandatory.
> > - why not offer a default - volume_name_timestamp - instead of making the caller decide on a name? Having
> > this as a default will also make the list under .snap more usable by default.
> > - providing a sensible default will make it easier for end users to do self-service restores. More sensible
> > defaults = more happy admins :)
> > [RJ]: This is a good-to-have feature; we will try to incorporate it in the next release.
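To make the naming default and the auto-delete behaviour concrete, a small illustrative Python sketch - the default-name format and the function names here are made up for the example, not what gluster implements:

from datetime import datetime

def default_snap_name(volume_name):
    # Hypothetical volume_name + timestamp default, as suggested above.
    return f"{volume_name}_{datetime.now():%Y%m%d_%H%M%S}"

def create_snapshot(snapshots, volume_name, limit, name=None):
    # If the caller gives no name, fall back to the default instead of failing.
    name = name or default_snap_name(volume_name)
    snapshots.append({"name": name, "created": datetime.now()})
    # Auto-delete as a post phase of create: once the per-volume limit is
    # exceeded, prune the oldest snapshots first.
    snapshots.sort(key=lambda s: s["created"])
    while len(snapshots) > limit:
        print("pruning", snapshots.pop(0)["name"])
    return name

For example, create_snapshot(snaps, "homes", limit=8) would produce a name along the lines of homes_20140409_100000 and prune the oldest entry once more than eight snapshots exist.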
> > Quorum and snaprestore: the doc mentions that when a returning brick comes back, it will be snapped before
> > pending changes are applied. If I understand the use of quorum correctly, can you comment on the following
> > scenario?
> > - With a brick offline, we'll be tracking changes. Say after 1 hr a snap is invoked because quorum is met.
> > - Changes continue on the volume for another 15 minutes beyond the snap, when the offline brick comes back
> > online.
> > - At this point there are two points in time to bring the brick back to - the brick needs the changes up to
> > the point of the snap, then a snap of the brick, followed by the 'replay' of the additional changes to get
> > back to the same point in time as the other replicas in the replica set.
> > - Of course, the brick could be offline for 24 or 48 hours due to a hardware fault - during which time
> > multiple snapshots could have been made.
> > - It wasn't clear to me from the doc how this scenario is dealt with?
> > [RJ]: The following action is taken in case we miss a snapshot on a brick.
> > + Let's say brick2 is down while taking snapshot s1.
> > + Snapshot s1 will be taken for all the bricks except brick2. We will update the bookkeeping about the
> > missed activity.
> > + I/O can continue to happen on the origin volume.
> > + brick2 comes up. At this moment we take a snapshot before we allow new I/O or heal of the brick. If
> > multiple snaps are missed then all the snaps are taken at this time. We don't wait till the brick is brought
> > to the same state as the other bricks.
> > + brick2_s1 (snap of brick2) will be added to the s1 volume (snapshot volume). Self-heal will take care of
> > bringing brick2's state in line with the rest of its replica set.

> > Barrier: two things are mentioned here - a buffer size and a timeout value.
> > - From an admin's perspective, being able to specify the timeout (secs) is likely to be more workable - and
> > will allow them to align this setting with any potential timeout setting within the application running
> > against the gluster volume. I don't think most admins will know, or want to know, how to size the buffer
> > properly.
> > [RJ]: In the current release we are only providing the timeout value as a configurable option. The buffer
> > size is being considered as a configurable option for a future release, or we will work out ourselves what
> > the optimal value would be based on the user's system configuration.

> > Hopefully the above makes sense.

> > Cheers,

> > Paul C

> > ----- Original Message -----

> > > From: "Rajesh Joseph" <rjoseph@redhat.com>
> > > To: "gluster-devel" <gluster-devel@nongnu.org>
> > > Sent: Wednesday, 2 April, 2014 3:55:28 AM
> > > Subject: [Gluster-devel] GlusterFS Snapshot internals

> > > Hi all,

> > > I have updated the GlusterFS snapshot forge wiki.

> > > https://forge.gluster.org/snapshot/pages/Home

> > > Please go through it and let me know if you have any questions or queries.

> > > Best Regards,
> > > Rajesh

> > > [PS]: Please ignore previous mail. Accidentally hit send before completing :)
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel