<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Word 14 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri","sans-serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {mso-style-type:personal-compose;

        font-family:"Calibri","sans-serif";

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-family:"Calibri","sans-serif";}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body lang="EN-US" link="blue" vlink="purple">

<div class="WordSection1">

<p class="MsoNormal">We are about to abandon GlusterFS as a solution for our object storage needs.&nbsp; I&#8217;m hoping to get some feedback to tell me whether we have missed something and are making the wrong decision.&nbsp; We&#8217;re already a year into this project, after evaluating a number of solutions.&nbsp; I&#8217;d rather not abandon GlusterFS if we just misunderstand how it works.<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Our use case is fairly straightforward.&nbsp; We need to save a bunch of somewhat large files (1MB-100MB).&nbsp; For the most part, these files are write once, read several times.&nbsp; Our initial store is 80TB, but we expect to go to roughly 320TB fairly quickly.&nbsp; After that, we expect to be adding another 80TB every few months.&nbsp; We are using some COTS servers, which we add in pairs; each server has 40TB of usable storage.&nbsp; We intend to keep two copies of each file.&nbsp; We currently run 4TB bricks.<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">In our somewhat limited test environment, GlusterFS seemed to work well, and our initial introduction of GlusterFS into our production environment also went well.&nbsp; We had our initial two-server (80TB) cluster about 50% full and things seemed to be going fine.<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Then we added another pair of servers (for a total of 160TB).&nbsp; This went fine until we did the rebalance.&nbsp; We were running 3.3.1.&nbsp; We ran into the handle leak problem (which unfortunately we didn&#8217;t know about beforehand).&nbsp; We also found

 that if any of the bricks went offline while the rebalance was going on, then files were lost or they lost their permissions.&nbsp; We still don&#8217;t know why some of the bricks went offline, but they did and we have verified in our test environment that this is sufficient

 to cause the corruption problem.<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">The good news is that we think both of these problems got fixed in 3.4.1.&nbsp; So why are we leaving?<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">In trying to figure out what was going on with our GlusterFS system after the disastrous rebalance, we ran across two posts.&nbsp; The first one was
<a href="http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/">
http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/</a>.&nbsp; If we understand it correctly, any time you add new storage servers to your cluster, you have to do a rebalance, and that rebalance will require a minimum of 50% of the data in the cluster to be moved to make the hashing algorithms work.&nbsp; This means that when we have a 320TB cluster and add another 80TB, we have to move at least 160TB just to get things back into balance.&nbsp; Our estimate is that this will take months.&nbsp; It probably won&#8217;t finish before we need to add another 80TB.<o:p></o:p></p>
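<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">To make the 160TB figure concrete, here is a rough sketch of the arithmetic as we understand it.&nbsp; It is a toy model, not GlusterFS code: it assumes the hash space is split into equal contiguous ranges handed out in brick order, and it treats each 80TB server pair as a single unit.&nbsp; The function names are ours, and the real layout logic may well differ.<o:p></o:p></p>

<pre>
```python
from fractions import Fraction


def moved_fraction_naive(old_bricks, new_bricks):
    """Fraction of the hash space whose owner changes when the space is
    re-split into equal contiguous ranges, assigned in brick order.

    Toy model only (an assumption, not the actual GlusterFS layout code).
    """
    stays = Fraction(0)
    for i in range(min(old_bricks, new_bricks)):
        # Brick i owns [i/old, (i+1)/old) before and [i/new, (i+1)/new) after.
        lo = max(Fraction(i, old_bricks), Fraction(i, new_bricks))
        hi = min(Fraction(i + 1, old_bricks), Fraction(i + 1, new_bricks))
        # Only the overlap of the two ranges stays on the same brick.
        stays += max(Fraction(0), hi - lo)
    return 1 - stays


def moved_fraction_ideal(old_bricks, new_bricks):
    """Lower bound when growing: only the share the new bricks must hold."""
    return Fraction(new_bricks - old_bricks, new_bricks)


# Hypothetical scenario from this post: 320TB of data on 4 units, adding a 5th.
data_tb = 320
for label, frac in (("naive", moved_fraction_naive(4, 5)),
                    ("ideal", moved_fraction_ideal(4, 5))):
    print(label, float(frac * data_tb), "TB moved")
```
</pre>

<p class="MsoNormal">Under these assumptions, going from 4 units to 5 moves half of the data, even though only one fifth would strictly need to move to fill the new pair.&nbsp; If the real rebalance behaves more like the second function, our estimate is too pessimistic.<o:p></o:p></p>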

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">The other post we ran across was <a href="http://www.gluster.org/community/documentation/index.php/Planning34/ElasticBrick">

http://www.gluster.org/community/documentation/index.php/Planning34/ElasticBrick</a>.&nbsp; This post seems to confirm our understanding of the rebalance.&nbsp; It appears to be a discussion of the rebalance problem and a possible solution.&nbsp; It was apparently discussed

 for 3.4, but didn&#8217;t make the cut.&nbsp; <o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">I&#8217;d be happy to find out that we just got it wrong.&nbsp; Tell me that rebalancing doesn&#8217;t work the way we think.&nbsp; Or maybe we should configure things differently.<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">My problem is that if GlusterFS isn&#8217;t good for starting with a small cluster (80TB) and growing over time to half a petabyte, what is the use case it is intended for?&nbsp; Do you really have to start out with the amount of storage you think you&#8217;ll need in the long run and just fill it up as you go?&nbsp; That&#8217;s why I&#8217;m nervous about our understanding of the rebalance.&nbsp; It&#8217;s hard to believe it works this way (at least from our perspective).<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">We have put a lot of man-hours into writing code and building infrastructure for GlusterFS.&nbsp; We can likely reuse much of it for another system.&nbsp; I would just like to know that we really do understand the rebalance, and that it really works the way I described it, before we start evaluating other object store solutions.<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Comments?<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal">Scott<o:p></o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

<p class="MsoNormal"><o:p>&nbsp;</o:p></p>

</div>

</body>

</html>