<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 01/07/2013 05:06 PM, Stephan von

      Krawczynski wrote:<br>

    </div>

    <blockquote cite="mid:20130108020651.44ad22fe.skraw@ithnet.com"

      type="cite">

      <pre wrap="">On Mon, 07 Jan 2013 13:19:49 -0800

Joe Julian <a class="moz-txt-link-rfc2396E" href="mailto:joe@julianfamily.org">&lt;joe@julianfamily.org&gt;</a> wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">You have a replicated filesystem, brick1 and brick2.

Brick 2 goes down and you edit a 4k file, appending data to it.

That change, and the fact that there is a pending change, is stored on 

brick1.

Brick2 returns to service.

Your app wants to append to the file again. It calls stat on the file. 

Brick2 answers first stating that the file is 4k long. Your app seeks to 

4k and writes. Now the data you wrote before is gone.

</pre>

      </blockquote>

      <pre wrap="">

Forgive my ignorance, but it obvious that this implementation of a stat on a

replicating fs is shit. Of course a stat should await _all_ returning local

stats and should choose the stat of the _latest_ file version and note that

the file needs self heal.

</pre>

    </blockquote>

    Apparently I wasn't very clear that I was demonstrating an example

    of <i>why</i> there is a self-heal check whenever stat (or anything

    else that instantiates a file descriptor) is called.

    <blockquote cite="mid:20130108020651.44ad22fe.skraw@ithnet.com"

      type="cite">


      <blockquote type="cite">

        <pre wrap="">This is one of the processes by which stale stat data can cause data 

loss. That's why each lookup() (which precedes the stat) causes a 

self-heal check and why it's a problem that hasn't been resolved in the 

last two years.

</pre>

      </blockquote>

      <pre wrap="">

self-heal is no answer to this question. The only valid answer is choosing the

_latest_ file version no matter if self heal is necessary or not.</pre>

    </blockquote>

    How do you know which copy is the _latest_? You contact the bricks

    that have the file. In a replicated volume, you only know that if

    you check with _all_ the replicas. That's called a self-heal check.

    I'm not saying that a needed self-heal is completed before the

    answer is returned, simply that there's extra latency involved in

    ensuring you're not given the wrong response.<br>
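    <br>
    To make the difference concrete, here is a minimal sketch of the
    two-brick scenario from my earlier example. It is a toy model, not
    GlusterFS code: the Replica class, both function names, the "pending"
    flag and the 8192-byte size are hypothetical stand-ins for whatever
    change tracking the bricks actually keep.<br>
    <pre wrap="">
```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: int      # file size this brick would report to stat
    pending: bool  # True if this brick recorded a change its peer missed


def stat_first_responder(replicas):
    # Unsafe: trust whichever brick answers first (modelled as list order).
    return replicas[0].size


def stat_all_replicas(replicas):
    # Wait for every replica, then prefer a brick that holds pending changes.
    needs_heal = any(r.pending for r in replicas)
    sources = [r for r in replicas if r.pending] or replicas
    return max(r.size for r in sources), needs_heal


# brick2 was down during the append, so it still reports the old 4k size.
brick1 = Replica("brick1", 8192, True)
brick2 = Replica("brick2", 4096, False)

# brick2 happens to answer first.
print(stat_first_responder([brick2, brick1]))  # 4096, the stale size
print(stat_all_replicas([brick2, brick1]))     # (8192, True)
```
</pre>
    The second function has to wait for the slowest brick before it can
    answer, which is where the extra latency comes from.<br>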

    <blockquote cite="mid:20130108020651.44ad22fe.skraw@ithnet.com"

      type="cite">


      <blockquote type="cite">

        <pre wrap="">I don't know the answer. I know that they want this problem to be 

solved, but right now the best solution is hardware. The lower the 

latency, the less of a problem you'll have.

</pre>

      </blockquote>

      <pre wrap="">

The only solution is correct programming, no matter what the below hardware

looks like. The only outcome of good or bad hardware is how _fast_ the

_correct_ answer reaches the fs client.</pre>

    </blockquote>

    Yes, if you can control the programming of your application, that

    would be a better solution. Unfortunately, most of us use

    pre-packaged software like apache, php, etc. Since most of us don't

    have the chance to use the "correct programming" solution,

    you're going to need to decrease latency if you're going to open

    thousands of fds for every operation and are unsatisfied with the

    results.<br>
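    <br>
    As a rough back-of-the-envelope sketch of why latency dominates here:
    if each lookup has to wait for the replicas before the next one
    starts, the round trip is paid once per file. The function name, the
    10000-file count and the round-trip times below are made-up
    illustration values, not measurements.<br>
    <pre wrap="">
```python
def serial_lookup_seconds(n_files, round_trip_seconds):
    # One round trip to the replicas per lookup, issued one after another.
    return n_files * round_trip_seconds


# Hypothetical round-trip times in milliseconds.
for rtt_ms in (0.1, 1.0, 10.0):
    total = serial_lookup_seconds(10000, rtt_ms / 1000.0)
    print(f"{rtt_ms} ms round trip: {total:.1f} s for 10000 lookups")
```
</pre>
    The work per file doesn't change; only the time spent waiting on the
    network does.<br>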

    <blockquote cite="mid:20130108020651.44ad22fe.skraw@ithnet.com"

      type="cite">

      <pre wrap="">

Your description is a satire, not?

</pre>

      <blockquote type="cite">

        <pre wrap="">On 01/07/2013 12:59 PM, Dennis Jacobfeuerborn wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">On 01/07/2013 06:11 PM, Jeff Darcy wrote:

</pre>

          <blockquote type="cite">

            <pre wrap="">On 01/07/2013 12:03 PM, Dennis Jacobfeuerborn wrote:

</pre>

            <blockquote type="cite">

              <pre wrap="">The "gm convert" processes make almost no progress even though on a regular

filesystem each call takes only a fraction of a second.

</pre>

            </blockquote>

            <pre wrap="">Can you run gm_convert under strace?  That will give us a more accurate

idea of what kind of I/O it's generating.  I recommend both -t and -T to

get timing information as well.  Also, it never hurts to file a bug so

we can track/prioritize/etc.  Thanks.

<a class="moz-txt-link-freetext" href="https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS">https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS</a>

</pre>

          </blockquote>

          <pre wrap="">Thanks for the strace hint. As it turned out the gm convert call was issued

on the filename with a "[0]" appended which apparently led gm to stat() all

(!) files in the directory.

While this particular problem isn't really a glusterfs problem is there a

way to improve the stat() performance in general?

Regards,

   Dennis

_______________________________________________

Gluster-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>

<a class="moz-txt-link-freetext" href="http://supercolony.gluster.org/mailman/listinfo/gluster-users">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a>

</pre>

        </blockquote>


      </blockquote>


    </blockquote>

    <br>

  </body>

</html>