<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">AFAIK the kernel does not allow requests
bigger than 128KB, and gluster has this limit hardcoded in
fuse-bridge.c. It is currently not possible to increase or
decrease this value.<br>
<br>
I ran the tests using the maximum block size.<br>
<br>
On 12/03/13 08:16, lierihanmei wrote:<br>
</div>
<blockquote
cite="mid:19ce6bfd.189bb.13d5d750146.Coremail.lierihanmei@163.com"
type="cite">
<div
style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><br>
<br>
When glusterfs mounts through fuse, it uses the max_read=128KB option.
Any larger <span style="white-space: pre-wrap; line-height: 1.7;"> request
is split. Tuning this option would make large reads and
writes faster, but would not help with small files.</span><br>
<br>
<br>
<br>
<pre>
At 2013-03-11 18:49:47,"Xavier Hernandez" <a class="moz-txt-link-rfc2396E" href="mailto:xhernandez@datalab.es">&lt;xhernandez@datalab.es&gt;</a> wrote:
>Hello,
>
>I've recently performed some tests with gluster on a fast network (IP
>over infiniband) and got some unexpected results. It seems that
>mount/fuse is becoming a bottleneck when the network and disk are very fast.
>
>I started with a simple distributed volume with 2 bricks mounted on a
>ramdisk to avoid possible disk bottlenecks (however I repeated the tests
>with an SSD and, later, with a normal hard disk and the results were the
>same, probably due to the good work of performance translators). With
>this configuration, a single write reached a throughput of ~420 MB/s.
>It's way below the maximum network limit, but for a single write it's
>quite acceptable. However with two concurrent writes (carefully chosen
>so that each one goes to a different brick), the throughput was ~200
>MB/s (for each transfer). That was totally unexpected. As there was
>plenty of bandwidth available and no IO limitation, I was expecting
>something near 800 MB/s.
>
>In fact, any combination of concurrent writes always led to the same
>combined throughput of ~400 MB/s.
>
>Trying to determine the cause of this odd behavior, I noticed that
>mount/fuse uses a single thread to serve kernel requests, and once a
>request is received, it is sent down the xlator stack to process it,
>only reading additional requests once the stack returns. This means that
>to reach a 420 MB/s throughput using 128KB per request (the current
>maximum block size), it needs to serve, at least, 3360 requests per
>second. In other words, it processes each request in 300 us. If we take
>into account that every translator will allocate memory, and do some
>system calls, it's quite possible that it really takes 300 us to serve
>each request.
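
A quick way to check the figures above is the short Python sketch below. It
is only an illustration, not gluster code: the function names are made up,
and it assumes 1 MB = 1024 KB and a single thread that serves requests
strictly one after another, sharing its capacity evenly between writers.

```python
def requests_per_second(throughput_mb_s: float, block_kb: float) -> float:
    """Requests per second needed to sustain a throughput at a block size."""
    return throughput_mb_s * 1024 / block_kb


def budget_per_request_us(throughput_mb_s: float, block_kb: float) -> float:
    """Time available per request, in microseconds, for one serial thread."""
    return 1_000_000 / requests_per_second(throughput_mb_s, block_kb)


def per_stream_share(total_mb_s: float, streams: int) -> float:
    """Throughput per writer if a fixed serial cap is split evenly (assumed)."""
    return total_mb_s / streams


rate = requests_per_second(420, 128)
print(f"{rate:.0f} requests/s, {budget_per_request_us(420, 128):.0f} us each")
print(f"{per_stream_share(420, 2):.0f} MB/s per writer with 2 writers")
```

With the numbers from the message this gives 3360 requests/s, roughly 298 us
per request, and 210 MB/s per writer for two concurrent writes, which is
close to the ~200 MB/s that was measured.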
>
>To see if this is the case, I added the performance/io-threads just
>below the mount/fuse. This would queue each request to a different
>thread, freeing the current one to read another request well before the
>300 us have elapsed. This should improve the concurrent writes case.
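
The hand-off idea described above can be sketched in Python as follows. This
is a generic worker-pool illustration, not the performance/io-threads
translator or any other gluster code: the serve function and the doubling
"work" are hypothetical stand-ins for reading kernel requests and running
them through the xlator stack.

```python
import queue
import threading


def serve(requests, workers: int):
    """Read requests on one thread and hand each one to a pool of workers."""
    pending = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:          # sentinel: no more requests
                break
            processed = item * 2      # stand-in for the xlator stack work
            with lock:
                results.append(processed)

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for t in pool:
        t.start()
    # The reader only enqueues, so it is free to fetch the next request
    # without waiting for the previous one to finish.
    for req in requests:
        pending.put(req)
    for _ in pool:
        pending.put(None)             # one sentinel per worker
    for t in pool:
        t.join()
    return sorted(results)            # completion order is not deterministic


print(serve(range(8), workers=4))
```

The results are sorted before returning because the workers may finish in
any order, which is also why such a change can expose ordering assumptions
elsewhere in the stack.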
>
>The results are good. Using this simple modification, 2 concurrent
>writes performed at ~300 MB/s each one. However the throughput for a
>single write dropped to ~250 MB/s. Anyway, this solution is not valid
>because there is some incompatibility with this configuration and some
>things do not work well (for example a simple 'ls' does not show all the
>files).
>
>Then I modified the mount/fuse xlator to start some threads to serve
>kernel requests. With this modification all seems to work as expected
>and throughput is considerably better: a single write still performs at 420
>MB/s, and 2 concurrent writes reach 330 MB/s. In fact, any combination
>of 2 or more concurrent writes has a combined throughput of ~650 MB/s.
>
>However, a replicate volume does not improve at all. I'm not sure why.
>It seems that there is some kind of serialization point in
>cluster/afr. A single write has a throughput of ~175 MB/s, and 2
>concurrent writes ~85 MB/s. I'll have to investigate this further.
>
>Does all this make sense?
>
>Is this something that would be worth investing more time in?
>
>Regards,
>
>Xavi
>
>_______________________________________________
>Gluster-devel mailing list
><a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@nongnu.org">Gluster-devel@nongnu.org</a>
><a class="moz-txt-link-freetext" href="https://lists.nongnu.org/mailman/listinfo/gluster-devel">https://lists.nongnu.org/mailman/listinfo/gluster-devel</a>
</pre>
</div>
</blockquote>
<br>
</body>
</html>