<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
<br>
On 29/09/11 12:28, Dan Bretherton wrote:
<blockquote cite="mid:4E845656.3090304@reading.ac.uk" type="cite">
<br>
On 08/09/11 23:51, Dan Bretherton wrote:
<blockquote cite="mid:4E6946E7.1010004@reading.ac.uk" type="cite">
<br>
<blockquote
cite="mid:CAN6e=3MryzbSX0wi=GbSZxQDk_xUVSoocKx7JGFp92iauqkB6g@mail.gmail.com"
type="cite">
<div class="gmail_quote">On Wed, Sep 7, 2011 at 4:27 PM, Dan
Bretherton <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:d.a.bretherton@reading.ac.uk">d.a.bretherton@reading.ac.uk</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<div>
<div class="h5"><br>
On 17/08/11 16:19, Dan Bretherton wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt
0pt 0.8ex; border-left: 1px solid rgb(204, 204,
204); padding-left: 1ex;"> <br>
<blockquote class="gmail_quote" style="margin: 0pt
0pt 0pt 0.8ex; border-left: 1px solid rgb(204,
204, 204); padding-left: 1ex;"> <br>
<br>
<br>
Dan Bretherton wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt
0pt 0pt 0.8ex; border-left: 1px solid rgb(204,
204, 204); padding-left: 1ex;"> <br>
On 15/08/11 20:00, <a moz-do-not-send="true"
href="mailto:gluster-users-request@gluster.org"
target="_blank">gluster-users-request@gluster.org</a>
wrote:<br>
<blockquote class="gmail_quote" style="margin:
0pt 0pt 0pt 0.8ex; border-left: 1px solid
rgb(204, 204, 204); padding-left: 1ex;">
Message: 1<br>
Date: Sun, 14 Aug 2011 23:24:46 +0300<br>
From: "Deyan Chepishev - SuperHosting.BG"<<a
moz-do-not-send="true"
href="mailto:dchepishev@superhosting.bg"
target="_blank">dchepishev@superhosting.bg</a>><br>
Subject: [Gluster-users] cluster.min-free-disk
separate for each<br>
brick<br>
To: <a moz-do-not-send="true"
href="mailto:gluster-users@gluster.org"
target="_blank">gluster-users@gluster.org</a><br>
Message-ID:<<a moz-do-not-send="true"
href="mailto:4E482F0E.3030604@superhosting.bg"
target="_blank">4E482F0E.3030604@superhosting.bg</a>><br>
Content-Type: text/plain; charset=UTF-8;
format=flowed<br>
<br>
Hello,<br>
<br>
I have a gluster set up with very different
brick sizes.<br>
<br>
brick1: 9T<br>
brick2: 9T<br>
brick3: 37T<br>
<br>
With this configuration, if I set the parameter
cluster.min-free-disk to 10%, it applies to all
bricks, which is quite awkward with these brick
sizes: 10% of the small bricks is ~1T, but for
the big brick it is ~3.7T. What happens in the
end is that if all bricks reach 90% usage and I
keep writing, the small ones eventually fill up
to 100% while the big one still has plenty of
free space.<br>
<br>
My question is: is there a way to set
cluster.min-free-disk per brick instead of
setting it for the entire volume, or any other
way to work around this problem?<br>
<br>
Thank you in advance<br>
<br>
Regards,<br>
Deyan<br>
<br>
</blockquote>
Hello Deyan,<br>
<br>
I have exactly the same problem and I have asked
about it before - see links below.<br>
<br>
<a moz-do-not-send="true"
href="http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/"
target="_blank">http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/</a>
<br>
<a moz-do-not-send="true"
href="http://gluster.org/pipermail/gluster-users/2011-May/007788.html"
target="_blank">http://gluster.org/pipermail/gluster-users/2011-May/007788.html</a><br>
<br>
My understanding is that the patch referred to
in Amar's reply in the May thread prevents a
"migrate-data" rebalance operation failing by
running out of space on smaller bricks, but that
doesn't solve the problem we are having. Being
able to set min-free-disk for each brick
separately would be useful, as would being able
to set this value as a number of bytes rather
than a percentage. However, even if these
features were present we would still have a
problem when the amount of free space becomes
less than min-free-disk, because this just
results in a warning message in the logs and
doesn't actually prevent more files from being
written. In other words, min-free-disk is a
soft limit rather than a hard limit. When a
volume is more than 90% full there may still be
hundreds of gigabytes of free space spread over
the large bricks, but the small bricks may each
only have a few gigabytes left, or even less.
Users do "df" and see lots of free space in the
volume so they continue writing files. However,
when GlusterFS chooses to write a file to a
small brick, the write fails with "device full"
errors if the file grows too large, which is
often the case here with files typically several
gigabytes in size for some applications.<br>
<br>
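To see the imbalance, it is enough to compare df on
the client mount with df on each brick on the
servers (the paths here are just examples from
my own setup):<br>
<pre>
# On a client: the volume as a whole still appears to have space
df -h /mnt/glusterfs

# On each server: the small bricks are nearly full
df -h /export/brick1
</pre>
<br>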
I would really like to know if there is a way to
make min-free-disk a hard limit. Ideally,
GlusterFS would choose a brick on which to write
a file based on how much free space it has left
rather than choosing a brick at random (or
however it is done now). That would solve the
problem of non-uniform brick sizes without the
need for a hard min-free-disk limit.<br>
<br>
Amar's comment in the May thread about QA
testing being done only on volumes with uniform
brick sizes prompted me to start standardising
on a uniform brick size for each volume in my
cluster. My impression is that implementing the
features needed for users with non-uniform brick
sizes is not a priority for Gluster, and that
users are all expected to use uniform brick
sizes. I really think this fact should be
stated clearly in the GlusterFS documentation,
in the sections on creating volumes in the
Administration Guide for example. That would
stop other users from going down the path that I
did initially, which has given me a real
headache because I am now having to move tens of
terabytes of data off bricks that are larger
than the new standard size.<br>
<br>
Regards<br>
Dan.<br>
<br>
</blockquote>
Hello,<br>
<br>
This is really bad news, because I already
migrated my data and I just realized that I am
screwed because Gluster just does not care about
the brick sizes.<br>
It is impossible to move to uniform brick sizes.<br>
<br>
Currently we use 2TB HDDs, but disks keep growing
and soon we will probably use 3TB HDDs or
whatever larger sizes appear on the market. So if
we choose to use RAID5 with some level of
redundancy (for example six HDDs in RAID5,
whatever their size), sooner or later this will
lead us to non-uniform bricks, which is a
problem; it is not reasonable to expect that we
can always provide, or want to provide, uniformly
sized bricks.<br>
<br>
By that logic, if we currently have 10T from
6x2T in RAID5, then at some point, when a single
disk holds 10T, we would have to use no RAID at
all just because Gluster cannot handle
non-uniform bricks.<br>
<br>
Regards,<br>
Deyan<br>
<br>
</blockquote>
<br>
I think Amar might have provided the answer in his
posting to the thread yesterday, which has just
appeared in my autospam folder.<br>
<br>
<a moz-do-not-send="true"
href="http://gluster.org/pipermail/gluster-users/2011-August/008579.html"
target="_blank">http://gluster.org/pipermail/gluster-users/2011-August/008579.html</a><br>
<br>
<blockquote class="gmail_quote" style="margin: 0pt
0pt 0pt 0.8ex; border-left: 1px solid rgb(204,
204, 204); padding-left: 1ex;"> With size option,
you can have a hardbound on min-free-disk<br>
</blockquote>
This means that you can set a hard limit on
min-free-disk, and set a value in GB that is bigger
than the biggest file that is ever likely to be
written. This looks likely to solve our problem and
make non-uniform brick sizes a practical
proposition. I wish I had known about this back in
May when I embarked on my cluster restructuring
exercise; the issue was discussed in this thread in
May as well: <a moz-do-not-send="true"
href="http://gluster.org/pipermail/gluster-users/2011-May/007794.html"
target="_blank">http://gluster.org/pipermail/gluster-users/2011-May/007794.html</a><br>
<br>
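If I have understood Amar correctly, the commands
would be something like this ("homevol" is just an
example volume name):<br>
<pre>
# Set min-free-disk as an absolute size instead of a percentage
gluster volume set homevol cluster.min-free-disk 20GB

# Check that the option has been applied
gluster volume info homevol
</pre>
<br>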
Once I have moved all the data off the large bricks
and standardised on a uniform brick size, it will be
relatively easy to stick to this because I use LVM.
I create logical volumes for new bricks when a
volume needs extending. The only problem with this
approach is what happens when the amount of free
space left on a server is less than the size of the
brick you want to create. The only option then
would be to use new servers, potentially wasting
several TB of free space on existing servers. The
standard brick size for most of my volumes is 3TB,
which allows me to use a mixture of small servers
and large servers in a volume and limits the amount
of free space that would be wasted if there wasn't
quite enough free space on a server to create
another brick. Another consequence of having 3TB
bricks is that a single server typically has two
or more bricks belonging to the same volume,
although I do my best to distribute the volumes
across different servers in order to spread the
load. Exporting multiple bricks from a single
server has not caused me any problems so far.<br>
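For what it is worth, creating a new 3TB brick and
adding it to a volume looks roughly like this on
my servers (the volume group, brick and server
names are just examples):<br>
<pre>
# Create and format a 3TB logical volume for the new brick
lvcreate -L 3T -n brick_homevol_2 vg_data
mkfs.xfs /dev/vg_data/brick_homevol_2
mkdir -p /export/homevol_2
mount /dev/vg_data/brick_homevol_2 /export/homevol_2

# Add the new brick to the volume
gluster volume add-brick homevol server1:/export/homevol_2
</pre>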
<br>
-Dan.<br>
<br>
</blockquote>
</div>
</div>
Hello Deyan,<br>
<br>
Have you tried giving min-free-disk a value in gigabytes,
and if so does it prevent new files being written to your
bricks when they are nearly full? I recently tried it
myself and found that min-free-disk had no effect at all. I
deliberately filled my test/backup volume and most of the
bricks became 100% full. I set min-free-disk to "20GB", as
reported in "gluster volume ... info" below.<br>
<br>
cluster.min-free-disk: 20GB<br>
<br>
Unless I am doing something wrong, it seems as though we
can not "have a hardbound on min-free-disk" after all, and
uniform brick size is therefore an essential requirement.
It still doesn't say that in the documentation, at least
not in the volume creation sections.
<div>
<div class="h5"><br>
<br>
-Dan.<br>
<br>
</div>
</div>
</blockquote>
</div>
On 08/09/11 06:35, Raghavendra Bhat wrote:<br>
> This is how it is supposed to work.<br>
><br>
> Suppose a distribute volume is created with 2 bricks: the 1st
brick has 25GB of free space and the 2nd has 35GB. If one sets a
30GB minimum-free-disk through volume set (gluster volume set
<volname> min-free-disk 30GB), then whenever a file is created, if
it hashes to the 1st brick (which has only 25GB free), the actual
file will be created on the 2nd brick, with a linkfile pointing to
it created on the 1st brick. A warning message, indicating that
the minimum free disk limit has been crossed and suggesting that
more nodes be added, will be printed in the glusterfs log file. So
any file which hashes to the 1st brick will be created on the 2nd
brick.<br>
><br>
> Once the free space of the 2nd brick also drops below 30GB,
files will be created on their respective hashed bricks only.
There will be a warning message in the log file about the 2nd
brick also crossing the minimum free disk limit.<br>
><br>
> Regards,<br>
> Raghavendra Bhat<br>
<br>
</blockquote>
Dear Raghavendra,<br>
Thanks for explaining this to me. This mechanism should allow a
volume to function correctly with non-uniform brick sizes even
though min-free-disk is not a hard limit. I can understand now
why I had so many problems with the default value of 10% for
min-free-disk. 10% of a large brick can be very large compared
to 10% of a small brick, so once they all had less than 10% free
space and continued filling at the same rate, the small bricks
usually filled up long before the large ones, giving "device
full" errors even when df still showed a lot of free space in
the volume. At least now we can minimise this effect by setting
min-free-disk to a value in GB.<br>
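If I have understood the mechanism correctly, it should be
possible to see it at work by looking for the zero-length link
files on a nearly full brick; as far as I know they carry the
sticky bit and a trusted.glusterfs.dht.linkto attribute (the brick
path below is just an example):<br>
<pre>
# Find DHT link files on a brick: zero-length files with mode ---------T
find /export/brick1 -type f -perm -1000 -size 0c

# Show which subvolume a link file points to
getfattr -n trusted.glusterfs.dht.linkto /export/brick1/some/file
</pre>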
<br>
-Dan.<br>
<br>
</blockquote>
Dear Raghavendra,<br>
Unfortunately I am still having problems with some bricks filling
up completely, despite having "cluster.min-free-disk: 20GB". In
one case I am still seeing warnings about bricks being nearly full
in percentage terms in the client logs, so I am wondering if the
volume is still using cluster.min-free-disk: 10%, and ignoring the
20GB setting I changed it to. When I changed
cluster.min-free-disk, should it have taken effect immediately, or
is there something else I should have done to activate the change?<br>
<br>
In your example above, suppose there are 9 bricks instead of 2
(as in my volume), all with less than 30GB of free space except
for one which is nearly empty; is GlusterFS clever enough to
find that nearly empty brick every time it creates a new
file? I expected all new files to be created in my nearly
empty brick but that has not happened. Some files have gone in
there but most have gone to nearly full bricks, one of which has
now filled up completely. I have done rebalance...fix-layout a
number of times. What can I do to fix this problem? The volumes
with one or more full bricks are unusable because users are
getting "device full" errors for some writes even though both
volumes are showing several TB free space.<br>
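For reference, the fix-layout runs were started and monitored
with the usual commands, along these lines ("homevol" is again
just an example volume name):<br>
<pre>
# Recalculate the directory layouts so new files can go to all bricks
gluster volume rebalance homevol fix-layout start

# Check how the rebalance is getting on
gluster volume rebalance homevol status
</pre>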
<br>
Regards<br>
-Dan Bretherton.<br>
</blockquote>
<br>
Dear All,<br>
If anyone is interested, I managed to produce the expected behaviour
by setting min-free-disk to 300GB rather than 30GB. 300GB is
approximately 10% of the size of most of the bricks in the volume.
I don't understand why setting min-free-disk to 30GB (about 1% of
the brick) didn't work; maybe it is too close to the limit for some
reason. I wonder if the default value of min-free-disk=10% is
significant. It seems that for non-uniform brick sizes, the correct
approach is to set min-free-disk to a value in GB that is
approximately 10% of the brick size in each case.<br>
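In other words, for bricks of around 3TB, something like the
following (the volume name is again just an example):<br>
<pre>
# ~10% of a ~3TB brick, expressed as an absolute size
gluster volume set homevol cluster.min-free-disk 300GB
</pre>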
<br>
-Dan<br>
</body>
</html>