<div dir="ltr">I strongly suggest not using 3.3.1, or any release in the 3.3 branch. For anything close to production I would only go with 3.4.1, and even there I wouldn&#39;t use rebalance or shrinking yet. We test gluster heavily before it goes to production. As for updating, why don&#39;t you build your own packages? We have maintained our own builds for several years now, with our patches, which happily end up in gluster upstream sooner or later.<br>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Nov 6, 2013 at 9:57 PM, Justin Dossey <span dir="ltr">&lt;<a href="mailto:jbd@podomatic.com" target="_blank">jbd@podomatic.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Joe,<div><br></div><div>You&#39;re right-- I probably should have dialed it back a bit! It&#39;s frustrating sometimes when I post about such a major issue and never see any reply.  </div>

<div><br></div><div>

In my case, I run into gfid bugs regularly, almost always in situations where I have copied an entire directory tree into a GlusterFS mount.  There have been no connectivity issues between nodes, no node restarts, etc, for months, but once in a while, I get a gfid mismatch and must manually correct the situation.</div>

<div><br></div><div>I would certainly purchase GlusterFS support if I had any option other than Red Hat-- they only support Red Hat Storage and that isn&#39;t a good fit for my environment at this time.  If GlusterFS is successful the way it could be, there will definitely be an opportunity for a firm to support it on non-RedHat platforms.</div>

<div><br></div><div>FWIW, I&#39;ve created a GitHub repo to store my scripts for navigating GlusterFS issues.  If they remain relevant and the repo gets activity, I&#39;ll move it to Gluster Forge.  <a href="https://github.com/justindossey/gluster-scripts" target="_blank">https://github.com/justindossey/gluster-scripts</a></div>

<div><br></div><div><br></div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Nov 6, 2013 at 12:15 PM, Joe Julian <span dir="ltr">&lt;<a href="mailto:joe@julianfamily.org" target="_blank">joe@julianfamily.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000"><div>

    On 11/06/2013 11:52 AM, Justin Dossey wrote:<br>

    <blockquote type="cite">

      <div dir="ltr">Shawn,

        <div><br>

        </div>

        <div>I had a very similar experience with a rebalance on 3.3.1,

          and it took weeks to get everything straightened out.  I would

          be happy to share the scripts I wrote to correct the

          permissions issues if you wish, though I&#39;m not sure it would

          be appropriate to share them directly on this list.  Perhaps I

          should just create a project on GitHub that is devoted to

          collecting scripts people use to fix their GlusterFS

          environments!  </div>

        <div><br>

        </div>

        <div>After that (awful) experience, I am loath to run further

          rebalances.  I&#39;ve even spent days evaluating alternatives to

          GlusterFS, as my experience with this list over the last six

          months indicates that support for community users is minimal,

          even in the face of major bugs such as the one with

          rebalancing and the continuing &quot;gfid different on subvolume&quot;

          bugs with 3.3.2.</div>

      </div>

    </blockquote></div>

    I&#39;m one of the oldest GlusterFS users around here and one of the biggest

    proponents and even I have been loath to rebalance until 3.4.1.<br>

    <br>

    There are no open bugs for gfid mismatches that I could find. The

    last time someone mentioned that error in IRC it was 2am, I was at a

    convention, and I told the user how to solve that problem (

    <a href="http://irclog.perlgeek.de/gluster/2013-06-14#i_7196149" target="_blank">http://irclog.perlgeek.de/gluster/2013-06-14#i_7196149</a> ). It was

    caused by split-brain. If you have a bug, it would be more

    productive to file it rather than make negative comments about a

    community of people that have no requirement to help anybody, but do

    it anyway just because they&#39;re nice people.<br>

    <br>

    This is going to sound snarky because it&#39;s in text, but I mean this

    sincerely. If community support is not sufficient, you might

    consider purchasing support from a company that provides it

    professionally.<div><br>

    <br>

    <blockquote type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>Let me know what you think of the GitHub thing and I&#39;ll

          proceed appropriately.</div>

      </div>

    </blockquote></div>

    Even better, put them up on <a href="http://forge.gluster.org" target="_blank">http://forge.gluster.org</a><div><div><br>

    <br>

    <blockquote type="cite">

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">On Tue, Nov 5, 2013 at 9:05 PM, Shawn

          Heisey <span dir="ltr">&lt;<a href="mailto:gluster@elyograg.org" target="_blank">gluster@elyograg.org</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We

            recently added storage servers to our gluster install,

            running 3.3.1<br>

            on CentOS 6.  It went from 40TB usable (8x2

            distribute-replicate) to<br>

            80TB usable (16x2).  There was a little bit over 20TB used

            space on the<br>

            volume.<br>

            <br>

            The add-brick went through without incident, but the

            rebalance failed<br>

            after moving 1.5TB of the approximately 10TB that needed to

            be moved.  A<br>

            side issue is that it took four days for that 1.5TB to move.

             I&#39;m aware<br>

            that gluster has overhead, and that there&#39;s only so much

            speed you can<br>

            get out of gigabit, but a 100Mb/s half-duplex link could

            have copied the<br>

            data faster if it had been a straight copy.<br>
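            <br>
            The comparison above can be checked with a little arithmetic. This is a minimal sketch, not a measurement: it assumes decimal terabytes (the post does not say TB or TiB) and an ideal 100 Mb/s link with no protocol or half-duplex overhead, so a real link would be somewhat slower than the figure it prints. The function names are made up for illustration.<br>
<pre>
```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes


def observed_rate_bytes_per_s(moved_bytes, days):
    """Average throughput for moved_bytes transferred over the given number of days."""
    return moved_bytes / (days * SECONDS_PER_DAY)


def days_to_copy(total_bytes, link_megabits_per_s):
    """Days needed to copy total_bytes over an ideal link of the given speed."""
    bytes_per_s = link_megabits_per_s * 1_000_000 / 8
    return total_bytes / bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures from the post: 1.5 TB moved in four days, about 10 TB to move in total.
    rate = observed_rate_bytes_per_s(1.5 * TB, 4)
    print(f"observed rebalance rate: {rate / 1e6:.2f} MB/s ({rate * 8 / 1e6:.1f} Mb/s)")
    print(f"1.5 TB over an ideal 100 Mb/s link: {days_to_copy(1.5 * TB, 100):.2f} days")
    print(f"10 TB at the observed rate: {10 * TB / rate / SECONDS_PER_DAY:.1f} days")
```
</pre>
            Under those assumptions the rebalance averaged roughly 4.3 MB/s, an ideal 100 Mb/s link would need about 1.4 days for the same 1.5 TB, and the full 10 TB would have taken close to 27 days at the observed rate.<br>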

            <br>

            After I discovered that the rebalance had failed, I noticed

            that there<br>

            were other problems.  There are a small number of completely

            lost files<br>

            (91 that I know about so far), a huge number of permission

            issues (over<br>

            800,000 files changed to 000), and about 32000 files that

            are throwing<br>

            read errors via the fuse/nfs mount but seem to be available

            directly on<br>

            bricks.  That last category of problem file has the sticky

            bit set, with<br>

            almost all of them having ---------T permissions.  The good

            files on<br>

            bricks typically have the same permissions, but are readable

            by root.  I<br>

            haven&#39;t worked out the scripting necessary to automate all

            the fixing<br>

            that needs to happen yet.<br>
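            <br>
            As a starting point for that scripting, here is a hedged sketch that only finds and counts files in the two states described above (mode 000 and ---------T). It changes nothing. The category names and the idea of running it against a brick directory are assumptions for illustration, and it skips the .glusterfs directory on the assumption that it holds internal hard links which would otherwise be counted twice. As far as I know, the distribute translator also uses zero-length ---------T files as link pointers, so not every hit in that category is necessarily damage.<br>
<pre>
```python
import os
import stat
import sys
from collections import Counter


def classify_mode(mode):
    """Classify a raw st_mode value into the categories described in the post."""
    if not stat.S_ISREG(mode):
        return "skip"            # directories, symlinks and devices are ignored
    perms = stat.S_IMODE(mode)   # permission bits plus setuid/setgid/sticky
    if perms == 0:
        return "mode-000"        # ---------- : every permission bit cleared
    if perms == stat.S_ISVTX:
        return "sticky-only"     # ---------T : only the sticky bit is set
    return "ok"


def scan(root):
    """Walk root and yield (category, path) for each file in a problem state."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: .glusterfs holds internal hard links; do not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            category = classify_mode(os.lstat(path).st_mode)
            if category in ("mode-000", "sticky-only"):
                yield category, path


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = Counter()
    for category, path in scan(root):
        counts[category] += 1
        print(f"{category}\t{path}")
    print(dict(counts), file=sys.stderr)
```
</pre>
            Run with a directory path as the only argument, it prints one tab-separated line per problem file and a summary count on stderr, which gives a list to review before deciding on any chmod.<br>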

            <br>

            We really need to know what happened.  We do plan to upgrade

            to 3.4.1,<br>

            but there were some reasons that we didn&#39;t want to upgrade

            before adding<br>

            storage.<br>

            <br>

            * Upgrading will result in service interruption to our

            clients, which<br>

            mount via NFS.  It would likely be just a hiccup, with quick

            failover,<br>

            but it&#39;s still a service interruption.<br>

            * We have a pacemaker cluster providing the shared IP

            address for NFS<br>

            mounting.  It&#39;s running CentOS 6.3.  A &quot;yum upgrade&quot; to

            upgrade gluster<br>

            will also upgrade to CentOS 6.4.  The pacemaker in 6.4 is

            incompatible<br>

            with the pacemaker in 6.3, which will likely result in<br>

            longer-than-expected downtime for the shared IP address.<br>

            * We didn&#39;t want to risk potential problems with running

            gluster 3.3.1<br>

            on the existing servers and 3.4.1 on the new servers.<br>

            * We needed the new storage added right away, before we

            could schedule<br>

            maintenance to deal with the upgrade issues.<br>

            <br>

            Something that would be extremely helpful would be obtaining

            the<br>

            services of an expert-level gluster consultant who can look

            over<br>

            everything we&#39;ve done to see if there is anything we&#39;ve done

            wrong and<br>

            how we might avoid problems in the future.  I don&#39;t know how

            much the<br>

            company can authorize for this, but we obviously want it to

            be as cheap<br>

            as possible.  We are in Salt Lake City, UT, USA.  It would

            be preferable<br>

            to have the consultant be physically present at our

            location.<br>

            <br>

            I&#39;m working on redacting one bit of identifying info from

            our rebalance<br>

            log, then I can put it up on dropbox for everyone to

            examine.<br>

            <br>

            Thanks,<br>

            Shawn<br>

            <br>

            _______________________________________________<br>

            Gluster-users mailing list<br>

            <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

            <a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>

          </blockquote>

        </div>

        <br>

        <br clear="all">

        <div><br>

        </div>

        -- <br>

        Justin Dossey<br>

        CTO, PodOmatic

        <div><br>

        </div>

      </div>

      <br>

      <fieldset></fieldset>

      <br>


    </blockquote>

    <br>

  </div></div></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>Justin Dossey<br>

CTO, PodOmatic<div><br></div>

</div>

</div></div><br></blockquote></div><br></div>