<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
My minimal donation:<br>
<br>
<div class="moz-cite-prefix">On 07/10/2013 04:01 AM, Allan Latham
wrote:<br>
</div>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">There seems to be a problem with the way gluster is going.
For me it would be an ideal solution if it actually worked.</pre>
</blockquote>
Actually working is always the ideal. Actually working for all
possible use cases... may be a little more difficult (though still
ideal).<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">I have a simple scenario and it just simply doesn't work. Reading over
the network when the file is available locally is plainly wrong. Our
application cannot take the performance hit nor the extra network traffic.</pre>
</blockquote>
It's not "wrong", just not the way you envision it. <br>
<br>
Typically, in a scaled scenario where clustered storage has the
strongest advantage, you'll have a limited number of storage servers
and a much greater number of application servers. The likelihood
that any of those application servers is going to have the file they
want locally, even if they're shared-use, is pretty slim.
Engineering for that probability is the "correct" solution in that
use case.<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">I would suggest:
1. get a simple minimalist configuration working - 2 hosts and
replication only.
2. make it bomb-proof.
2a. it must cope with network failures, random reboots etc.
2b. if it stops it has to auto-recover quickly.</pre>
</blockquote>
So far, all done within reasonable parameters. "Bomb-proof" is an
obvious exaggeration and is unattainable. If you literally blow up
all your servers, you're going to lose data.<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">
2c. if it can't it needs thorough documentation and adequate logs so a
reasonable sysop can rescue it.</pre>
</blockquote>
Define "reasonable sysop". Recovering from any failure that isn't
handled automatically is going to require a certain amount of understanding
about clustering, split-brain, and split-brain recovery. That's not
your typical first-tier sysop, IMHO.<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">
2d. it needs a fast validation scanner which verifies that data is where
it should be and is identical everywhere (md5sum).</pre>
</blockquote>
md5sum isn't the fastest checksum algorithm.<br>
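For a rough sense of the gap, here's a minimal sketch (not anything
gluster itself does) that times MD5 against two cheaper, non-cryptographic
checksums from Python's standard library. The function names, the buffer
size, and the zero-filled test data are all assumptions of mine for
illustration; real numbers depend on your CPU and your files.<br>
<pre>
```python
import hashlib
import time
import zlib


def time_checksum(label, func, data, rounds=5):
    """Return (label, best wall-clock seconds) over `rounds` runs of func(data)."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return label, best


def compare(size_bytes=16 * 1024 * 1024):
    # Hypothetical stand-in for file contents: a zero-filled buffer.
    data = bytes(size_bytes)
    candidates = [
        ("md5", lambda d: hashlib.md5(d).hexdigest()),
        ("crc32", zlib.crc32),
        ("adler32", zlib.adler32),
    ]
    return [time_checksum(name, fn, data) for name, fn in candidates]


if __name__ == "__main__":
    # Print the best time per algorithm in milliseconds.
    for name, seconds in compare():
        print(f"{name:8s} {seconds * 1000:8.2f} ms")
```
</pre>
The trade-off is that CRC32 and Adler-32 are only 32 bits wide, so
they're more collision-prone than MD5. Whether that's acceptable for
spotting replicas that have drifted apart is a judgment call.<br>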
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">
3. make it efficient (read local whenever possible - use rsync
techniques - remove scalability obstacles so it doesn't get
exponentially slower as more files are replicated)</pre>
</blockquote>
See earlier point about scaled systems. Also, it does not get
"exponentially slower as more files are replicated". That would be
silly.<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">4. when that works expand to multiple hosts and clever distribution
techniques.
(repeat items 2 and 3 in the more complex environment)
If it doesn't work rock solid in a simple scenario it will never work in
a large scale cluster.</pre>
</blockquote>
Not necessarily true. That's like <a
href="http://joejulian.name/blog/dont-get-stuck-micro-engineering-for-scale/">comparing
Apples to Orchards</a>.<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">
Until point 3 is reached I cannot use it - which is a great
disappointment for me as well as the good guys doing the development.</pre>
</blockquote>
Consider expanding your thinking to bits you have more control over.
Network latency is probably the biggest. Consider using low-latency
10Gig cards(1) and switches(2), or InfiniBand.<br>
<blockquote cite="mid:51DD3F0A.8080808@flexsys-group.de" type="cite">
<pre wrap="">
Good luck and thanks again
Allan
</pre>
</blockquote>
1) <a class="moz-txt-link-freetext" href="http://www.solarflare.com">http://www.solarflare.com</a> makes sub-microsecond-latency adapters
that can use a userspace driver pinned to the CPU doing the
request, eliminating a context switch<br>
2) <a class="moz-txt-link-freetext" href="http://www.aristanetworks.com/en/products/7100t">http://www.aristanetworks.com/en/products/7100t</a> is a
2.5-microsecond switch<br>
</body>
</html>