<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 03.01.2014 01:59, Mikhail T. wrote:<br>
</div>
<blockquote cite="mid:52C65FDB.7040800@aldan.algebra.com"
type="cite">We expected to pay <i>some</i> performance penalty
for the features, but the actual numbers are causing a
sticker-shock...</blockquote>
<div align="justify">
<p>We tried mounting the GlusterFS replicated volume over NFS,
rather than glusterfs/fuse, and were pleasantly surprised: the
read-performance improved dramatically and is in line with
that of the NetApp NFS-server we are also using:</p>
<table caption="File-serving latency in milliseconds by the
underlying filesystem" border="1">
<tbody>
<tr>
<th><br>
</th>
<th colspan="2">Local FS<br>
</th>
<th colspan="2">NetApp NFS</th>
<th colspan="2">GlusterFS</th>
<th colspan="2">GlusterNFS
</th>
</tr>
<tr>
<th>
<br>
</th>
<th>Average</th>
<th>Minimum</th>
<th>Average</th>
<th>Minimum</th>
<th>Average</th>
<th>Minimum</th>
<th>Average</th>
<th>Minimum </th>
</tr>
<tr>
<th>Small static file</th>
<td>1.532</td>
<td>0.513</td>
<td>4.027</td>
<td>0.916</td>
<td>27.81</td>
<td>7.184</td>
<td>5.591</td>
<td>1.394 </td>
</tr>
<tr>
<th>Large static file</th>
<td>14.45</td>
<td>2.721</td>
<td>14.56</td>
<td>3.643</td>
<td>37.90</td>
<td>7.433</td>
<td>14.95</td>
<td>4.829
</td>
</tr>
</tbody>
</table>
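<p>For reference, here is roughly how the two client-side mounts in
the last two columns can be set up (the hostname and volume name
below are made up for illustration; Gluster's built-in NFS server
speaks NFSv3 over TCP, hence the mount options):</p>

```shell
# Hypothetical host/volume names -- adjust to your setup.

# FUSE-based native mount (the slow "GlusterFS" column):
mount -t glusterfs server1:/myvolume /mnt/gluster

# Same volume via the Gluster NFS server (the "GlusterNFS" column);
# Gluster's NFS server only supports NFSv3 over TCP:
mount -t nfs -o vers=3,proto=tcp server1:/myvolume /mnt/gluster-nfs
```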
<p>This suggests the performance problem is not on the
Gluster servers but on the client side, so I tried to use the
profiler. Although the software is <a
href="https://bugzilla.redhat.com/show_bug.cgi?id=762856">not
easily profilable due to its use of shared libraries</a>,
partial coverage is possible. Here is what I did:</p>
</div>
<blockquote><tt>% env LDFLAGS="-pg" CFLAGS="-pg -O2 -march=core2
-pipe -fno-strict-aliasing -g" ./configure --enable-static</tt><tt><br>
</tt><tt>% make</tt><tt><br>
</tt><tt>% cd glusterfsd/src</tt><tt><br>
</tt><tt>% cc -pg -o glusterfsd.profilable *.o -lpthread
../../rpc/xdr/src/*.o ../../rpc/rpc-lib/src/*.o -lcrypto -lz
../../libglusterfs/src/*.o ../../xlators/mount/fuse/src/*.o -L
../../libglusterfs/src/.libs/ -lglusterfs -L
../../rpc/rpc-lib/src/.libs -lgfrpc -L ../../rpc/xdr/src/.libs
-lgfxdr</tt><tt><br>
</tt><tt>% ln -s glusterfsd.profilable glusterfs</tt><tt><br>
</tt><tt>% ./glusterfs --no-daemon ......</tt><br>
</blockquote>
<div align="justify">
<p>I then ran some tests for two minutes and unmounted the share.
The resulting gmon.out does not cover any calls that took place
in the explicitly dlopen-ed objects (like fuse.so), but
everything else is included: my <tt>glusterfsd.profilable</tt>
executable does not use shared libraries for its own work; it
links the <tt>.o</tt>-files directly. (The shared libs
are needed only so that the various dlopen-ed
"xlators", which expect certain symbols to be available,
can resolve them.)</p>
<p>My testing repeatedly requested two files, for one
minute each: first a tiny 430-byte file, then a bigger 93 KB
one. In 122 seconds there were about 55,000 file transfers, and the <tt>glusterfs</tt>
process accumulated 135 seconds of CPU time. According to gprof, these
55,000 transfers resulted in 597,971 calls to each of <tt>rpc_clnt_notify</tt>
and <tt>rpc_transport_notify</tt>, <i>each</i> accounting for
over 34% of the total time.</p>
<p>Joseph Landman <a
href="https://lists.gnu.org/archive/html/gluster-devel/2014-01/msg00011.html">indicated
earlier</a> that he blames kernel/user-space
context-switching, and that surely is responsible for <i>some</i>
of the overhead. But the bulk of it seems to be in the
client/server communication (that is, between the <tt>glusterfs</tt>
client and the <tt>glusterfsd</tt> server), which appears to be
unnecessarily chatty. Why are there about eleven notifications of
each kind for every file transfer? I wonder whether this can be
improved, bringing the performance of glusterfs mounts closer to
that of the NFS method. (In fact, the NFS method may benefit from
such an optimization too.)<br>
</p>
</div>
<p align="justify">I may be reading profiler's output wrong -- I am
rusty in this area. Would anybody else care to take a look at the
results:</p>
<ul>
<li><a
href="http://aldan.algebra.com/%7Emi/tmp/glusterfs-gprof.txt">Plain</a>
"<tt>gprof glusterfsd.profiled</tt>"</li>
<li><a
href="http://aldan.algebra.com/%7Emi/tmp/glusterfs-annotated.txt">Annotated
source</a> produced by "<tt>gprof -A -l -x glusterfsd.profilable</tt>"</li>
</ul>
<div align="justify">
<p>Please let me know if you'd like me to massage the data in
some different way; I'll keep the raw results around for some
time. Thank you! Yours,</p>
</div>
<blockquote>-mi<br>
</blockquote>
</body>
</html>