<div dir="ltr">Alexander:<div><br></div><div>Performance is quite a vague concept. Relative, even. I don&#39;t mean to get philosophical, but it is true.</div><div><br></div><div>To begin with, how are you connecting to the gluster volumes? NFS? FUSE (native glusterfs)?</div>

<div><br></div><div>What volume type are you using? Striped? Distributed?</div><div><br></div><div>How is your network set up? Jumbo frames?</div><div> </div><div>From the details you provided, you are not a first-timer. It sounds like you&#39;ve been doing a lot of research. Did you happen to test the performance with other services, for instance native NFS, or even good old FTP?</div>

<div><br></div><div>Is network performance OK?</div><div><br></div><div>I was fighting some read and write performance issues a couple of weeks ago on my test servers, and it turned out to be the buffers on my NFS client. After tweaking those, large file copies saturated the 1 Gbps link.</div>
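<div><br></div><div>If you want a quick number to compare the gluster mount against the bare drives, something like the following might do. It is only a minimal Python sketch: the function name and the sizes are placeholders I made up, and it times large sequential writes with a final fsync, so it will not reproduce the metadata-heavy, seek-heavy pattern of HDF5.</div>
<pre>
```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, chunk_mb=1):
    """Write zeros to a temp file in `directory` and return the rate in MB/s.

    The timing includes flush and fsync, so data sitting in the page cache
    or in a write-behind buffer is not counted as already written.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the test file, even if the write failed.
        os.remove(path)
    return (chunks * chunk_mb) / elapsed
```
</pre>
<div>Call it once with the directory where the volume is mounted and once with a directory on the local disk, for example write_throughput_mb_s("/mnt/gluster") (that path is hypothetical), and compare the two numbers.</div>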

<div><br></div><div>Along the way I collected a number of gluster and sysctl tweaks that seemed to improve performance as well.</div><div><br></div><div>Use at your own risk, since these affect memory usage on your server:</div>

<div><br></div><div>For sysctl.conf:</div><div><br></div><div><div>net.core.wmem_max=12582912</div><div>net.core.rmem_max=12582912</div><div>net.ipv4.tcp_rmem= 10240 87380 12582912</div><div>net.ipv4.tcp_wmem= 10240 87380 12582912</div>

<div>net.ipv4.tcp_window_scaling = 1</div><div>net.ipv4.tcp_timestamps = 1</div><div>net.ipv4.tcp_sack = 1</div><div>vm.swappiness=10</div><div>vm.dirty_background_ratio=1</div><div>net.ipv4.neigh.default.gc_thresh2=2048</div>

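<div>net.ipv4.neigh.default.gc_thresh3=4096</div><div>net.core.netdev_max_backlog=2500</div><div>net.ipv4.tcp_mem= 12582912 12582912 12582912</div><div><br></div></div><div>Note that net.ipv4.tcp_mem is counted in memory pages, not bytes. With 4 KiB pages the value above works out to 48 GiB, which effectively lifts the TCP memory cap, so scale it down to fit your RAM.</div><div><br></div><div><br></div><div>If using an NFS client, use the following mount options:</div>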

<div><br></div><div><div><br></div><div>-o rw,async,vers=3,rsize=65536,wsize=65536</div></div><div><br></div><div><br></div><div><br></div><div><br></div><div>Gluster options I am currently using:</div><div><br></div><div>

<br></div><div><div>network.remote-dio: on</div><div>cluster.eager-lock: enable</div><div>performance.stat-prefetch: off</div><div>performance.io-cache: off</div><div>performance.read-ahead: off</div><div>performance.quick-read: off</div>

<div>network.ping-timeout: 20</div><div>nfs.nlm: off</div><div>nfs.addr-namelookup: off</div></div><div><br></div><div><br></div><div><br></div><div>Other gluster options that I found elsewhere and that are worth a try:</div><div><br>

</div><div><div>gluster volume set BigVol diagnostics.brick-log-level WARNING</div><div>gluster volume set BigVol diagnostics.client-log-level WARNING</div><div>gluster volume set BigVol nfs.enable-ino32 on</div><div><br>

</div><div>gluster volume set BigVol performance.cache-max-file-size 2MB</div><div>gluster volume set BigVol performance.cache-refresh-timeout 4</div><div>gluster volume set BigVol performance.cache-size 256MB</div><div>gluster volume set BigVol performance.write-behind-window-size 4MB</div>

<div>gluster volume set BigVol performance.io-thread-count 32</div></div><div><br></div><div>Now, DO keep in mind: mine is a TEST environment, while yours is a real-life situation. </div><div><br></div><div>Cheers, </div>

<div><br></div><div>Carlos</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 10, 2014 at 7:06 PM, Alexander Valys <span dir="ltr">&lt;<a href="mailto:avalys@avalys.net" target="_blank">avalys@avalys.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">A quick performance question.<br>

<br>

I have a small cluster of 4 machines, 64 cores in total.  I am running a scientific simulation on them, which writes at between 0.1 and 10 MB/s (total) to roughly 64 HDF5 files.  Each HDF5 file is written by only one process.  The writes are not continuous, but consist of writing roughly 1 MB of data to each file every few seconds.<br>


<br>

Writing to HDF5 involves a lot of reading the file metadata and random seeking within the file,  since we are actually writing to about 30 datasets inside each file.  I am hosting the output on a distributed gluster volume (one brick local to each machine) to provide a unified namespace for the (very rare) case when each process needs to read the other&#39;s files.<br>


<br>

I am seeing somewhat lower performance than I expected, i.e. a factor of approximately 4 less throughput than each node writing locally to the bare drives.  I expected the write-behind cache to buffer each write, but it seems that the writes are being quickly flushed across the network regardless of what write-behind cache size I use (32 MB currently), and the simulation stalls while waiting for the I/O operation to finish.  Anyone have any suggestions as to what to look at?  I am using gluster 3.4.2 on ubuntu 12.04.  I have flush-behind turned on, and have mounted the volume with direct-io-mode=disable, and have the cache size set to 256M.<br>


<br>

The nodes are connected via a dedicated gigabit ethernet network, carrying only gluster traffic (no simulation traffic).<br>

<br>

(sorry if this message comes through twice, I sent it yesterday but was not subscribed)<br>


</blockquote></div><br></div>