BZs raised for snapshots:

1086493 - RFE requesting a default name for snapshots (https://bugzilla.redhat.com/show_bug.cgi?id=1086493)
1086497 - RFE requesting the snapshot-after-snap-restore function (https://bugzilla.redhat.com/show_bug.cgi?id=1086497)

PC

----- Original Message -----
From: "Paul Cuzner" <pcuzner@redhat.com>
To: "Rajesh Joseph" <rjoseph@redhat.com>
Cc: "gluster-devel" <gluster-devel@nongnu.org>
Sent: Wednesday, 9 April, 2014 1:50:07 PM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

Thanks again, Rajesh.

----- Original Message -----
From: "Rajesh Joseph" <rjoseph@redhat.com>
To: "Paul Cuzner" <pcuzner@redhat.com>
Cc: "gluster-devel" <gluster-devel@nongnu.org>
Sent: Wednesday, 9 April, 2014 12:04:35 AM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

Hi Paul,

Whenever a brick comes online it performs a handshake with glusterd. The brick will not send a notification to clients until the handshake is done. We are planning to provide an extension to this so that the missing snaps are recreated at that point.

Best Regards,
Rajesh
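Just to spell out that ordering, here is a tiny illustrative sketch (plain Python; every name is invented for the example - this is not glusterd code):

def on_brick_started(brick, glusterd_handshake, missed_snaps, notify_clients):
    # 1. The returning brick completes its handshake with the local glusterd.
    glusterd_handshake(brick)

    # 2. Planned extension described above: recreate the snapshots this brick
    #    missed while it was down, before anything else can touch it.
    for snap in missed_snaps(brick):
        print(f"taking snapshot {snap} of {brick}")

    # 3. Only now are clients told that the brick is available again.
    notify_clients(brick)

# Example wiring with trivial stand-ins:
on_brick_started(
    "b2",
    glusterd_handshake=lambda b: print(f"handshake done for {b}"),
    missed_snaps=lambda b: ["s2", "s3", "s4"],
    notify_clients=lambda b: print(f"{b} is back online"),
)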
----- Original Message -----
From: "Paul Cuzner" <pcuzner@redhat.com>
To: "Rajesh Joseph" <rjoseph@redhat.com>
Cc: "gluster-devel" <gluster-devel@nongnu.org>
Sent: Tuesday, April 8, 2014 12:49:13 PM
Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

Rajesh,

Perfect explanation - the 'penny has dropped'. I was missing the fact that the healing of a snap brick is based on the snapshot taken on its replica.

One final question - I assume the scenario you mention about the brick coming back online before the snapshots are taken is theoretical, and there are blocks in place to prevent it from happening?

BTW, I'll get the BZ RFEs in by the end of my week, and will post the BZs back to the list for info.

Thanks!

PC

----- Original Message -----

> From: "Rajesh Joseph" <rjoseph@redhat.com>
> To: "Paul Cuzner" <pcuzner@redhat.com>
> Cc: "gluster-devel" <gluster-devel@nongnu.org>
> Sent: Tuesday, 8 April, 2014 5:09:10 PM
> Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> Hi Paul,

> It would be great if you can raise RFEs for both snap-after-restore and snapshot naming.

> Let's say your volume "Vol" has bricks b1, b2, b3 and b4.

> @0800 - S1 (snapshot volume) -> s1_b1, s1_b2, s1_b3, s1_b4 (these are the respective snap bricks, each on an
>         independent thin LV)

> @0830 - b2 went down

> @1000 - S2 (snapshot volume) -> s2_b1, x, s2_b3, s2_b4. Here we mark that the brick has a pending snapshot.
>         Note that s2_b1 will have all the changes missed by b2 till 1000 hours. AFR will mark the pending
>         changes on s2_b1.

> @1200 - S3 (snapshot volume) -> s3_b1, x, s3_b3, s3_b4. This missed snapshot is also recorded.

> @1400 - S4 (snapshot volume) -> s4_b1, x, s4_b3, s4_b4. This missed snapshot is also recorded.

> @1530 - b2 comes back. Before making it online we take snapshots s2_b2, s3_b2 and s4_b2. Since all three of
>         these snapshots are taken at nearly the same time, content-wise all of them will be in the same state.
>         Now these bricks are added to their respective volumes. Note that till now no healing is done.
>         After the addition the snapshot volumes will look like this:
>         S2 -> s2_b1, s2_b2, s2_b3, s2_b4.
>         S3 -> s3_b1, s3_b2, s3_b3, s3_b4.
>         S4 -> s4_b1, s4_b2, s4_b3, s4_b4.
>         After this b2 will come online, i.e. clients can access this brick. Now S2, S3 and S4 are healed:
>         s2_b2 will get healed from s2_b1, s3_b2 will be healed from s3_b1, and so on and so forth.
>         This healing will take s2_b2 to the point at which that snapshot was taken.

> If the brick comes online before these snapshots are taken, self-heal will try to take the brick (b2) to a
> point closer to the current time (@1530). Therefore it would not be consistent with the rest of its replica set.

> Please let me know if you have more questions or clarifications.

> Best Regards,
> Rajesh
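To keep that sequence straight, here it is as a rough Python sketch - invented names throughout, purely illustrative rather than actual GlusterFS code:

class Brick:
    def __init__(self, name, online=True):
        self.name, self.online = name, online

class SnapVolume:
    def __init__(self, name):
        self.name, self.snap_bricks, self.pending = name, [], []

def lvm_thin_snapshot(brick, snap_name):
    # Stand-in for "take a thin-LV snapshot of this brick's backing LV".
    return f"{snap_name}_{brick.name}"

missed = {}   # brick name -> snapshot volumes taken while that brick was down

def take_snapshot(snap_vol, bricks):
    for b in bricks:
        if b.online:
            snap_vol.snap_bricks.append(lvm_thin_snapshot(b, snap_vol.name))
        else:
            # Record the miss; AFR keeps marking the pending changes on the
            # surviving replica's snap brick (e.g. s2_b1).
            snap_vol.pending.append(b.name)
            missed.setdefault(b.name, []).append(snap_vol)

def brick_returns(brick):
    # 1. Before the brick goes back online, take every snapshot it missed;
    #    content-wise they are all identical at this instant.
    for snap_vol in missed.pop(brick.name, []):
        snap_vol.snap_bricks.append(lvm_thin_snapshot(brick, snap_vol.name))
    # 2. Only now is the brick made reachable by clients again.
    brick.online = True
    # 3. Self-heal then brings each late snap brick (s2_b2, s3_b2, ...) up to
    #    the state its replica's snap brick was in when that snapshot was taken.

Calling take_snapshot() for S2, S3 and S4 while b2 is marked offline, and then brick_returns(b2), reproduces the catch-up ordering described above.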
> ----- Original Message -----
> From: "Paul Cuzner" <pcuzner@redhat.com>
> To: "Rajesh Joseph" <rjoseph@redhat.com>
> Cc: "gluster-devel" <gluster-devel@nongnu.org>
> Sent: Tuesday, April 8, 2014 8:01:57 AM
> Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> Thanks Rajesh.

> Let me know if I should raise any RFEs - snap after restore, snapshot naming, etc.

> I'm still being thick about the snapshot process with missing bricks. What I'm missing is the heal process
> between snaps - my assumption is that the snap of a brick needs to be consistent with the other brick snaps
> within the same replica set. Let's use a home-drive use case as an example - typically, I'd expect to see
> home directories getting snapped at 0800, 1000, 1200, 1400, 1600, 1800 and 2200 each day. So in that context,
> say we have a dist-repl volume with 4 bricks, b1<->b2, b3<->b4;

> @ 0800 all bricks are available, snap (S1) succeeds with a snap volume being created from all bricks
> --- files continue to be changed and added
> @ 0830 b2 is unavailable (D0). Gluster tracks the pending updates on b1 that need to be applied to b2
> --- files continue to be changed and added
> @ 1000 snap requested - 3 of 4 bricks available, snap taken (S2) on b1, b3 and b4 - snap volume activated
> --- files continue to change
> @ 1200 a further snap performed - S3
> --- files continue to change
> @ 1400 snapshot S4 taken
> --- files change
> @ 1530 missing brick b2 comes back online (D1)

> Now between disruption at D0 and D1 there have been several snaps. My understanding is that each snap should
> provide a view of the filesystem consistent at the time of the snapshot - correct?

> You mention
> + brick2 comes up. At this moment we take a snapshot before we allow new I/O or heal of the brick. If multiple
>   snaps are missed then all the snaps are taken at this time. We don't wait till the brick is brought to the
>   same state as the other bricks.
> + brick2_s1 (snap of brick2) will be added to the s1 volume (snapshot volume). Self-heal will take care of
>   bringing brick2's state in line with the rest of its replica set.

> According to this description, if you snapshot b2 as soon as it's back online - that generates S1, S2 and S3
> as at 08:30 - and lets self-heal bring b2 up to the current time D1. However, doesn't this mean that S1, S2
> and S3 on brick2 are not equal to S2, S3 and S4 on brick1?

> If that is right, then if b1 is unavailable the corresponding snapshots on b2 wouldn't support the recovery
> points of 1000, 1200 and 1400 - which we know are OK on b1.

> I guess I'd envisaged snapshots working hand-in-glove with self-heal to maintain snapshot consistency - and
> may just be stuck on that thought.

> Maybe this is something I'll only get on a whiteboard - wouldn't be the first time :(

> I appreciate your patience in explaining this recovery process!

> ----- Original Message -----

> > From: "Rajesh Joseph" <rjoseph@redhat.com>
> > To: "Paul Cuzner" <pcuzner@redhat.com>
> > Cc: "gluster-devel" <gluster-devel@nongnu.org>
> > Sent: Monday, 7 April, 2014 10:12:53 PM
> > Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> > Thanks Paul for your valuable comments. Please find my comments in-lined below.

> > Please let us know if you have more questions or clarifications. I will try to update the doc wherever more
> > clarity is needed.

> > Thanks & Regards,
> > Rajesh

> > ----- Original Message -----
> > From: "Paul Cuzner" <pcuzner@redhat.com>
> > To: "Rajesh Joseph" <rjoseph@redhat.com>
> > Cc: "gluster-devel" <gluster-devel@nongnu.org>
> > Sent: Monday, April 7, 2014 1:59:10 AM
> > Subject: Re: [Gluster-devel] GlusterFS Snapshot internals

> > Hi Rajesh,

> > Thanks for updating the design doc. It reads well.
> > I have a number of questions that would help my understanding:

> > Logging: The doc doesn't mention how the snapshot process is logged -
> > - will snapshot use an existing log or a new log?
> > [RJ]: As of now snapshot makes use of the existing logging framework.
> > - will the log be specific to a volume, or will all snapshot activity be logged in a single file?
> > [RJ]: The snapshot module is embedded in the gluster core framework, therefore the logs will also be part of
> > the glusterd logs.
> > - will the log be visible on all nodes, or just the originating node?
> > [RJ]: As with glusterd, the snapshot logs related to each node will be visible on that node.
> > - will the high-level snapshot action be visible when looking from the other nodes, either in the logs or at
> > the CLI?
> > [RJ]: As of now the high-level snapshot action will be visible only in the logs of the originator node,
> > though the CLI can be used to see the list and info of snapshots from any other node.

> > Restore: You mention that after a restore operation, the snapshot will be automatically deleted.
> > - I don't believe this is a prudent thing to do. Here's an example I've seen a lot: an application has a
> > programmatic error, leading to data 'corruption' - devs work on the program, storage guys roll the volume
> > back. So far so good... devs provide the updated program, and away you go... BUT the issue is not resolved,
> > so you need to roll back again to the same point in time. If you delete the snap automatically, you lose the
> > restore point. Yes, the admin could take another snap after the restore - but why add more work into a
> > recovery process where people are already stressed out :) I'd recommend leaving the snapshot if possible,
> > and letting it age out naturally.
> > [RJ]: Snapshot restore is a simple operation wherein the volume's bricks will simply point to the brick
> > snapshots instead of the original bricks. Therefore once the restore is done we cannot use the same snapshot
> > again. We are planning to implement a configurable option which will automatically take a snapshot of the
> > snapshot to fulfil the above-mentioned requirement, but with the given timeline and resources we will not be
> > able to target it in the coming release.

> > Auto-delete: Is this a post phase of the snapshot create, so the successful creation of a new snapshot will
> > trigger the pruning of old versions?
> > [RJ]: Yes, if we reach the snapshot limit for a volume then the snapshot create operation will trigger
> > pruning of older snapshots.

> > Snapshot Naming: The doc states the name is mandatory.
> > - why not offer a default - volume_name_timestamp - instead of making the caller decide on a name? Having
> > this as a default will also make the list under .snap more usable by default.
> > - providing a sensible default will make it easier for end users to do self-service restores. More sensible
> > defaults = more happy admins :)
> > [RJ]: This is a good-to-have feature; we will try to incorporate it in the next release.
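To make the naming default and the auto-delete behaviour concrete, a small illustrative Python sketch - the default-name format and the function names here are made up for the example, not what gluster implements:

from datetime import datetime

def default_snap_name(volume_name):
    # Hypothetical volume_name + timestamp default, as suggested above.
    return f"{volume_name}_{datetime.now():%Y%m%d_%H%M%S}"

def create_snapshot(snapshots, volume_name, limit, name=None):
    # If the caller gives no name, fall back to the default instead of failing.
    name = name or default_snap_name(volume_name)
    snapshots.append({"name": name, "created": datetime.now()})
    # Auto-delete as a post phase of create: once the per-volume limit is
    # exceeded, prune the oldest snapshots first.
    snapshots.sort(key=lambda s: s["created"])
    while len(snapshots) > limit:
        print("pruning", snapshots.pop(0)["name"])
    return name

For example, create_snapshot(snaps, "homes", limit=8) would produce a name along the lines of homes_20140409_100000 and prune the oldest entry once more than eight snapshots exist.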
> > Quorum and snaprestore: the doc mentions that when a returning brick comes back, it will be snapped before
> > pending changes are applied. If I understand the use of quorum correctly, can you comment on the following
> > scenario?
> > - With a brick offline, we'll be tracking changes. Say after 1 hr a snap is invoked because quorum is met.
> > - Changes continue on the volume for another 15 minutes beyond the snap, when the offline brick comes back
> > online.
> > - At this point there are two points in time to bring the brick back to - the brick needs the changes up to
> > the point of the snap, then a snap of the brick, followed by the 'replay' of the additional changes to get
> > back to the same point in time as the other replicas in the replica set.
> > - Of course, the brick could be offline for 24 or 48 hours due to a hardware fault - during which time
> > multiple snapshots could have been made.
> > - It wasn't clear to me from the doc how this scenario is dealt with?
> > [RJ]: The following action is taken in case we miss a snapshot on a brick.
> > + Let's say brick2 is down while taking snapshot s1.
> > + Snapshot s1 will be taken for all the bricks except brick2. We will update the bookkeeping about the
> > missed activity.
> > + I/O can continue to happen on the origin volume.
> > + brick2 comes up. At this moment we take a snapshot before we allow new I/O or heal of the brick. If
> > multiple snaps are missed then all the snaps are taken at this time. We don't wait till the brick is brought
> > to the same state as the other bricks.
> > + brick2_s1 (snap of brick2) will be added to the s1 volume (snapshot volume). Self-heal will take care of
> > bringing brick2's state in line with the rest of its replica set.

> > Barrier: two things are mentioned here - a buffer size and a timeout value.
> > - From an admin's perspective, being able to specify the timeout (secs) is likely to be more workable - and
> > will allow them to align this setting with any potential timeout setting within the application running
> > against the gluster volume. I don't think most admins will know, or want to know, how to size the buffer
> > properly.
> > [RJ]: In the current release we are only providing the timeout value as a configurable option. The buffer
> > size is being considered as a configurable option for a future release, or we will work out ourselves what
> > the optimal value would be based on the user's system configuration.

> > Hopefully the above makes sense.

> > Cheers,

> > Paul C

> > ----- Original Message -----

> > > From: "Rajesh Joseph" <rjoseph@redhat.com>
> > > To: "gluster-devel" <gluster-devel@nongnu.org>
> > > Sent: Wednesday, 2 April, 2014 3:55:28 AM
> > > Subject: [Gluster-devel] GlusterFS Snapshot internals

> > > Hi all,

> > > I have updated the GlusterFS snapshot forge wiki.

> > > https://forge.gluster.org/snapshot/pages/Home

> > > Please go through it and let me know if you have any questions or queries.

> > > Best Regards,
> > > Rajesh

> > > [PS]: Please ignore previous mail. Accidentally hit send before completing :)
_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel