<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 08/02/2013 06:22 AM, Xavier Trilla
wrote:<br>
</div>
<blockquote
cite="mid:BD99EB262C32654EBEA452C5D5ED37C2026D636267@ST-SRV-05.silicontower.lan"
type="cite">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EstiloCorreo17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 3.0cm 70.85pt 3.0cm;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We have been experimenting
with GlusterFS for a while (now with version 3.4). We are running
tests to check whether GlusterFS can really be used as the
distributed storage for OpenStack Block Storage (Cinder), since
new features in KVM, GlusterFS and OpenStack point to GlusterFS
as the future of open source block and object storage for
OpenStack. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">But we’ve found a
problem as soon as we started working with GlusterFS: the way
the distribute translator (DHT) balances the load. We
understand and see the benefits of a metadata-less setup.
Hashing filenames and assigning a hash range to each brick is
clever, reliable and fast, but as far as we understand there is
a big problem when it comes to storing the VM images of an
OpenStack deployment. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">OpenStack Block Storage
(Cinder) assigns a name (a GUID) to each volume it creates, and
GlusterFS hashes that filename to decide which brick should
store it. But since in this scenario we don’t have many files
(just one big file per VM), we may end up with really
unbalanced storage. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Let’s say we have a
4-brick setup with DHT distribute, and we want to store 100
VMs there. The ideal scenario would be:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick1: 25 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick2: 25 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick3: 25 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick4: 25 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">As VMs are IO intensive,
it’s really important to balance the load correctly, because
each brick can only deliver a limited number of IOPS. But as
DHT placement is based only on a hash of the filename, we could
end up with something like the following scenario (or even
worse): <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick1: 50 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick2: 10 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick3: 35 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Brick4: 5 VMs<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">And if we scale this
out, things may get even worse: we may end up with almost all
VM files on one or two bricks and all the other bricks almost
empty. And if we use growing VM disk image files like qcow2,
the “min-free-disk” option will not prevent all the VM disk
image files from being stored on the same brick, since free
space is checked when a file is created and a new qcow2 image
starts out small. So, I understand that DHT works well for
large numbers of small files, but for a few big, IO-intensive
files it doesn’t seem to be a really good solution… (We are
looking for a solution able to handle around 32 bricks and
around 1500 VMs for the initial deployment, and able to scale
up to 256 bricks and 12000 VMs :/ )<o:p></o:p></span></p>
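<p class="MsoNormal"><span lang="EN-US">The min-free-disk point can be illustrated with a deliberately simplified Python model. It assumes the free-space check happens only when a file is created (falling back to the brick with the most free space when the hashed brick is below the threshold) and it ignores the link files DHT leaves on the hashed brick; the <code>Brick</code>, <code>create_file</code> and <code>grow_file</code> names and the 10% threshold are hypothetical.<o:p></o:p></span></p>

```python
from dataclasses import dataclass, field


@dataclass
class Brick:
    name: str
    capacity_gb: int
    used_gb: int = 0
    files: dict = field(default_factory=dict)  # file name -> current size in GB

    def free_fraction(self):
        return (self.capacity_gb - self.used_gb) / self.capacity_gb


def create_file(bricks, hashed_index, name, initial_gb, min_free_fraction=0.10):
    """Hypothetical placement: use the hashed brick unless it is below the
    free-space threshold, otherwise the brick with the most free space."""
    target = bricks[hashed_index]
    if target.free_fraction() < min_free_fraction:
        target = max(bricks, key=lambda b: b.free_fraction())
    target.files[name] = initial_gb
    target.used_gb += initial_gb
    return target


def grow_file(brick, name, new_gb):
    """Grow an existing image in place; no placement decision is re-made."""
    brick.used_gb += new_gb - brick.files[name]
    brick.files[name] = new_gb


if __name__ == "__main__":
    bricks = [Brick("brick1", 100), Brick("brick2", 100)]
    # ten thin-provisioned images that all happen to hash to brick1
    for i in range(10):
        create_file(bricks, 0, f"vm-{i}.qcow2", 1)
    # each image later grows to 10 GB as its guest writes data
    for i in range(10):
        grow_file(bricks[0], f"vm-{i}.qcow2", 10)
    for b in bricks:
        print(b.name, b.used_gb, "GB used,", len(b.files), "files")
```

<p class="MsoNormal"><span lang="EN-US">In this model every creation passes the check because each image starts at 1 GB, so brick1 ends up full with all ten images while brick2 stays empty; only files created after that point are redirected.<o:p></o:p></span></p>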
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">So, does anybody have a
suggestion about how to handle this? So far we only see two
options: either the legacy unify translator with the ALU
scheduler, or the cluster/stripe translator with a big
block-size, so that at least the load gets balanced across all
bricks in some way. But obviously we don’t like unify, as it
needs a namespace brick, and striping seems to have an impact
on performance and really complicates backup/restore/recovery
strategies. <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<br>
</div>
</blockquote>
<br>
Another option you may want to try is to have your GlusterFS
nodes also act as OpenStack Cinder nodes and use NUFA[1].<br>
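A minimal Python model of the idea, assuming NUFA-style placement means "create the file on the local brick if it has room, otherwise fall back to normal hashed placement", and assuming (hypothetically) that new volumes are created from the four Cinder/GlusterFS nodes in turn. The node names, the CRC32 stand-in hash and the round-robin creation order are illustrative only; the real balance would depend on how volume creation is spread across the nodes.<br>

```python
import zlib
from collections import Counter

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical Cinder + GlusterFS nodes


def hashed_brick(name):
    """Stand-in for DHT placement (CRC32 modulo node count), not GlusterFS's hash."""
    return NODES[zlib.crc32(name.encode()) % len(NODES)]


def nufa_place(name, local_node, free_gb, needed_gb):
    """Prefer the creating node's own brick; fall back to hashed placement
    when that brick is unknown or lacks space (simplified model)."""
    if local_node in free_gb and free_gb[local_node] >= needed_gb:
        return local_node
    return hashed_brick(name)


def simulate(num_volumes, size_gb, free_gb):
    """Create volumes from the nodes in turn (assumed round-robin scheduling)."""
    placement = Counter()
    for i in range(num_volumes):
        creator = NODES[i % len(NODES)]
        brick = nufa_place(f"volume-{i}", creator, free_gb, size_gb)
        free_gb[brick] -= size_gb
        placement[brick] += 1
    return placement


if __name__ == "__main__":
    print(simulate(100, 20, {node: 1000 for node in NODES}))
```

Under these assumptions placement follows whichever node creates the volume rather than the hash of its name, so evenly spread creation gives evenly spread bricks.<br>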
<br>
~shanks<br>
<br>
[1]
<a
href="http://gluster.org/community/documentation/index.php/Translators/cluster/nufa">http://gluster.org/community/documentation/index.php/Translators/cluster/nufa</a>
</body>
</html>