<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Avati,<br>

    <br>

    On 06/02/14 00:24, Anand Avati wrote:<br>

    <blockquote

cite="mid:CAFboF2ybM7+UvKv4Di2jpN0fGKXoiFsYFaUuLe-KThYTHtr4Pw@mail.gmail.com"

      type="cite">

      <div dir="ltr">Xavi,

        <div>Getting such a caching mechanism has several aspects. First

          of all we need the framework pieces implemented (particularly

          server originated messages to the client for invalidation and

          revokes) in a well-designed way, in particular how we address a

          specific translator in a message originating from the server.

          Some of the recent changes to client_t allow server-side

          translators to get a handle (the client_t object) on which

          messages can be submitted back to the client.</div>

        <div><br>

        </div>

        <div>Such a framework (of having server originated messages) is

          also necessary for implementing oplocks (and possibly leases)

          - particularly interesting for the Samba integration.</div>

        <div><br>

        </div>

      </div>

    </blockquote>

    Yes, that is a basic requirement for many features. I saw the

    client_t changes but haven't had time to see if they could be used

    to implement the kind of mechanism I proposed. This will need a

    look.<br>

    <br>

    When I started implementing the DFC translator

    (<a class="moz-txt-link-freetext" href="https://forge.gluster.org/disperse/dfc">https://forge.gluster.org/disperse/dfc</a>) I needed something very

    similar but at that time there wasn't any suitable client_t

    implementation I could use. I solved it by using a pool of special

    getxattr requests that the translator on the bricks stores until it

    needs to send some message back to the client. It's not a great

    solution but it works with the available resources at the moment.<br>

    <br>

    <blockquote

cite="mid:CAFboF2ybM7+UvKv4Di2jpN0fGKXoiFsYFaUuLe-KThYTHtr4Pw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>As Jeff already mentioned, this is an area that gluster

          has not focused on, given the targeted use case. However,

          extending this to internal use cases (to avoid

          per-operation inodelks) can benefit many modules

          (encryption/crypt, afr, etc.). It seems possible to have a

          common framework for delegating locks to clients, and build

          cache coherency protocols / oplocks / inodelk avoidance on

          top of it.</div>

        <div><br>

        </div>

        <div>Feel free to share a more detailed proposal if you have

          one or plan to write one - I'm sure the Samba folks (Ira copied) would be

          interested too.</div>

      </div>

    </blockquote>

    I have some ideas on how to implement it and some special cases, but

    I need to work more on it before it can be considered a valid model.

    I just wanted to propose the idea to see if it could be valid or not

    before spending too much of my scarce time working on it. I'll try

    to get a more detailed picture to discuss it.<br>

    <br>

    Best regards,<br>

    <br>

    Xavi<br>

    <br>

    <blockquote

cite="mid:CAFboF2ybM7+UvKv4Di2jpN0fGKXoiFsYFaUuLe-KThYTHtr4Pw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Thanks!</div>

        <div>Avati<br>

          <div class="gmail_extra">

            <br>

            <br>

            <div class="gmail_quote">On Wed, Feb 5, 2014 at 11:27 AM,

              Xavier Hernandez <span dir="ltr">&lt;<a

                  moz-do-not-send="true"

                  href="mailto:xhernandez@datalab.es" target="_blank">xhernandez@datalab.es</a>&gt;</span>

              wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div class="HOEnZb">

                  <div class="h5">On 04.02.2014 17:18, Jeff Darcy wrote:<br>

                    <br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <blockquote class="gmail_quote" style="margin:0 0

                        0 .8ex;border-left:1px #ccc

                        solid;padding-left:1ex">

                        The only synchronization point needed is to make

                        sure that all bricks<br>

                        agree on the inode state and which client owns

                        it. This can be achieved<br>

                        without locking using a method similar to what I

                        implemented in the DFC<br>

                        translator. Besides the lock-less architecture,

                        the main advantage is<br>

                        that much more aggressive caching strategies can

                        be implemented very<br>

                        near to the final user, increasing considerably

                        the throughput of the<br>

                        file system. Special care has to be taken with

                        things that can fail on<br>

                        background writes (basically brick space and

                        user access rights). Those<br>

                        should be handled appropriately on the client

                        side to guarantee future<br>

                        success of writes. Of course this is only a high

                        level overview. A<br>

                        deeper analysis should be done to see what to do

                        on each special case.<br>

                        What do you think ?<br>

                      </blockquote>

                      <br>

                      I think this is a great idea for where we can go -

                      and need to go - in the<br>

                      long term. However, it's important to recognize

                      that it *is* the long<br>

                      term. We had to solve almost exactly the same

                      problems in MPFS long ago.<br>

                      Whether the synchronization uses locks or not

                      *locally* is meaningless,<br>

                      because all of the difficult problems have to do

                      with recovering the<br>

                      *distributed* state. What happens when a brick

                      fails while holding an<br>

                      inode in any state but I? How do we recognize it,

                      what do we do about it,<br>

                      how do we handle the case where it comes back and

                      needs to re-acquire its<br>

                      previous state? How do we make sure that a brick

                      can successfully flush<br>

                      everything it needs to before it yields a

                      lock/lease/whatever? That's<br>

                      going to require some kind of flow control, which

                      is itself a pretty big<br>

                      project. It's not impossible, but it took multiple

                      people some years for<br>

                      MPFS, and ditto for every other project (e.g. Ceph

                      or XtreemFS) which<br>

                      adopted similar approaches. GlusterFS's historical

                      avoidance of this<br>

                      complexity certainly has some drawbacks, but it

                      has also been key to us<br>

                      making far more progress in other areas.<br>

                      <br>

                    </blockquote>

                  </div>

                </div>

                Well, it's true that there will be a lot of tricky cases

                that will need<br>

                to be handled to be sure that data integrity and system

                responsiveness are<br>

                guaranteed, however I think that they are not more

                difficult than what<br>

                can happen currently if a client dies or loses

                communication while it<br>

                holds a lock on a file.<br>

                <br>

                Anyway I think there is a great potential with this

                mechanism because it<br>

                can allow the implementation of powerful caches, even

                based on SSD that<br>

                could improve the performance a lot.<br>

                <br>

                Of course there is a lot of work solving all potential

                failures and<br>

                designing the right thing. An important consideration is

                that all<br>

                these methods try to solve a problem that is seldom

                found (i.e. having<br>

                more than one client modifying the same file at the same

                time). So a<br>

                solution that has almost 0 overhead for the normal case

                and allows the<br>

                implementation of aggressive caching mechanisms seems a

                big win.

                <div class="im"><br>

                  <br>

                  <blockquote class="gmail_quote" style="margin:0 0 0

                    .8ex;border-left:1px #ccc solid;padding-left:1ex">

                    To move forward on this, I think we need a *much*

                    more detailed idea of<br>

                    how we're going to handle the nasty cases. Would

                    some sort of online<br>

                    collaboration - e.g. Hangouts - make more sense than

                    continuing via<br>

                    email?<br>

                    <br>

                  </blockquote>

                </div>

                Of course, we can talk on irc or another place if you

                prefer.<br>

                <br>

                Xavi

                <div class="HOEnZb">

                  <div class="h5"><br>

                    <br>


                  </div>

                </div>

              </blockquote>

            </div>

            <br>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>