<div dir="ltr">I'm testing whether GlusterFS is viable for a typical PHP webapp (i.e. lots of small files). I don't care much about the C in the CAP theorem, as I have very few writes. I could live with a write propagation delay of 5 minutes (or stale caches for up to 5 minutes). <div>
<br></div><div style>So I'm optimizing for low-latency reads of small files. My test setup is a 2-node replicated volume. Each node is both server and Gluster client, and both are in sync. I stop glusterfs-server on node2. On node1, I run a simple benchmark: repeatedly (to prime the cache) open and close 1000 small files. I have enabled the client-side io-cache and quick-read translators (see below for config).</div>
<div style><br></div><div style>The results are consistently 2 ms per open (O_RDONLY) call, which is unfortunately too slow, as I need under 0.2 ms.</div><div style><br></div><div style>With the same test against a local Gluster server over an NFS mount, I get somewhat better performance, but still 0.6 ms.</div>
<div style><br></div><div style>With the same test against the Linux NFS server (v3) and a local mount, I get 0.12 ms per open.</div><div style><br></div><div style>I can't explain the lag with Gluster, because I don't see any traffic being sent to node2. I would expect that, with the io-cache translator and local-only operation, performance would approach that of the kernel FS cache.</div>
<div style><br></div><div style>Is this assumption correct? If yes, how would I profile the client subsystem to find the bottleneck? </div><div style><br></div><div style>If no, then I have to accept that 0.8ms open calls are the best I can squeeze out of this system. In that case I'll probably look into AFS, userspace async replication, or a Gluster NFS mount with cachefilesd. Which would you recommend?</div>
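<div style><br></div><div style>As a first step toward locating the bottleneck from the client side, the per-call time can be split by syscall, so that it is clear whether open, read or close dominates. The following is only a minimal sketch under a few assumptions: the names time_syscalls, summarize and report are made up for illustration, it needs Python 3.3 or later for time.perf_counter, and it measures wall-clock time as seen by the application, so it cannot say which translator the time is spent in.</div>

```python
import os
import time


def time_syscalls(paths, warmup_passes=2, read_size=1 << 20):
    """Time open/read/close separately for each path.

    Runs `warmup_passes` unrecorded passes first (to prime caches), then one
    recorded pass. Returns a dict mapping syscall name to a list of seconds.
    """
    paths = list(paths)
    samples = {"open": [], "read": [], "close": []}
    for pass_no in range(warmup_passes + 1):
        recorded = pass_no == warmup_passes  # only the final pass is kept
        for path in paths:
            t0 = time.perf_counter()
            fd = os.open(path, os.O_RDONLY)
            t1 = time.perf_counter()
            os.read(fd, read_size)  # a single read is enough for small files
            t2 = time.perf_counter()
            os.close(fd)
            t3 = time.perf_counter()
            if recorded:
                samples["open"].append(t1 - t0)
                samples["read"].append(t2 - t1)
                samples["close"].append(t3 - t2)
    return samples


def summarize(seconds):
    """Return mean, (upper) median and 99th-percentile latency in ms."""
    ordered = sorted(seconds)
    n = len(ordered)
    return {
        "mean_ms": 1000.0 * sum(ordered) / n,
        "p50_ms": 1000.0 * ordered[n // 2],
        "p99_ms": 1000.0 * ordered[min(n - 1, int(n * 0.99))],
    }


def report(directory):
    """Print one summary line per syscall for all regular files in directory."""
    names = sorted(os.listdir(directory))
    paths = [os.path.join(directory, name) for name in names]
    paths = [p for p in paths if os.path.isfile(p)]
    if not paths:
        print("no regular files found in %s" % directory)
        return
    for name, values in time_syscalls(paths).items():
        stats = summarize(values)
        print("%-6s mean %.3f ms  p50 %.3f ms  p99 %.3f ms"
              % (name, stats["mean_ms"], stats["p50_ms"], stats["p99_ms"]))
```

<div style>Calling report("/mnt/glusterfs") on each mount type (native client, Gluster NFS, kernel NFS) would give directly comparable numbers. Reporting a median and a 99th percentile next to the mean helps tell a uniformly slow call apart from a few outliers.</div>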
<div style><br></div><div style>Thanks a lot! </div><div style>BTW, I like Gluster a lot and hope that it is also suitable for this small-files use case ;)</div><div style><br></div><div style>//Willem</div><div><br><div style>
PS: I'm testing with kernel 3.5.0-17-generic (64-bit) and Gluster 3.2.5-1ubuntu1.</div><div><div><br></div><div><div>Client volfile:</div><div>+------------------------------------------------------------------------------+</div>
<div> 1: volume testvol-client-0</div><div> 2: type protocol/client</div><div> 3: option remote-host g1</div><div> 4: option remote-subvolume /data</div><div> 5: option transport-type tcp</div><div> 6: end-volume</div>
<div> 7: </div><div> 8: volume testvol-client-1</div><div> 9: type protocol/client</div><div> 10: option remote-host g2</div><div> 11: option remote-subvolume /data</div><div> 12: option transport-type tcp</div>
<div> 13: end-volume</div><div> 14: </div><div> 15: volume testvol-replicate-0</div><div> 16: type cluster/replicate</div><div> 17: subvolumes testvol-client-0 testvol-client-1</div><div> 18: end-volume</div><div>
19: </div><div> 20: volume testvol-write-behind</div><div> 21: type performance/write-behind</div><div> 22: option flush-behind on</div><div> 23: subvolumes testvol-replicate-0</div><div> 24: end-volume</div>
<div> 25: </div><div> 26: volume testvol-io-cache</div><div> 27: type performance/io-cache</div><div> 28: option max-file-size 256KB</div><div> 29: option cache-timeout 60</div><div> 30: option priority *.php:3,*:0</div>
<div> 31: option cache-size 256MB</div><div> 32: subvolumes testvol-write-behind</div><div> 33: end-volume</div><div> 34: </div><div> 35: volume testvol-quick-read</div><div> 36: type performance/quick-read</div>
<div> 37: option cache-size 256MB</div><div> 38: subvolumes testvol-io-cache</div><div> 39: end-volume</div><div> 40: </div><div> 41: volume testvol</div><div> 42: type debug/io-stats</div><div> 43: option latency-measurement off</div>
<div> 44: option count-fop-hits off</div><div> 45: subvolumes testvol-quick-read</div><div> 46: end-volume</div></div></div></div><div><br></div><div style>Server volfile:</div><div style><div>+------------------------------------------------------------------------------+</div>
<div> 1: volume testvol-posix</div><div> 2: type storage/posix</div><div> 3: option directory /data</div><div> 4: end-volume</div><div> 5: </div><div> 6: volume testvol-access-control</div><div> 7: type features/access-control</div>
<div> 8: subvolumes testvol-posix</div><div> 9: end-volume</div><div> 10: </div><div> 11: volume testvol-locks</div><div> 12: type features/locks</div><div> 13: subvolumes testvol-access-control</div><div> 14: end-volume</div>
<div> 15: </div><div> 16: volume testvol-io-threads</div><div> 17: type performance/io-threads</div><div> 18: subvolumes testvol-locks</div><div> 19: end-volume</div><div> 20: </div><div> 21: volume testvol-marker</div>
<div> 22: type features/marker</div><div> 23: option volume-uuid bc89684f-569c-48b0-bc67-09bfd30ba253</div><div> 24: option timestamp-file /etc/glusterd/vols/testvol/marker.tstamp</div><div> 25: option xtime off</div>
<div> 26: option quota off</div><div> 27: subvolumes testvol-io-threads</div><div> 28: end-volume</div><div> 29: </div><div> 30: volume /data</div><div> 31: type debug/io-stats</div><div> 32: option latency-measurement off</div>
<div> 33: option count-fop-hits off</div><div> 34: subvolumes testvol-marker</div><div> 35: end-volume</div><div> 36: </div><div> 37: volume testvol-server</div><div> 38: type protocol/server</div><div> 39: option transport-type tcp</div>
<div> 40: option auth.addr./data.allow *</div><div> 41: subvolumes /data</div><div> 42: end-volume</div><div><br></div><div style>My benchmark to simulate PHP webapp i/o:</div><div style><div>#!/usr/bin/env python</div>
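<div><br></div>
<div>import sys</div>
<div>import os</div>
<div>import time</div>
<div>import optparse</div>
<div><br></div>
<div>def print_timing(func):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;# Decorator: print the wall-clock time of one call in milliseconds</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def wrapper(*arg):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t1 = time.time()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;res = func(*arg)</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;t2 = time.time()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print('%-15.15s %6d ms' % (func.__name__, int((t2 - t1) * 1000.0)))</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return res</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;return wrapper</div>
<div><br></div>
<div><br></div>
<div>def parse_options():</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;parser = optparse.OptionParser()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;parser.add_option("--path", '-p', default="/mnt/glusterfs",</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;help="Base directory for running tests (default: /mnt/glusterfs)",</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;parser.add_option("--num", '-n', type="int", default=100,</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;help="Number of files per test (default: 100)",</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;(options, args) = parser.parse_args()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;return options</div>
<div><br></div>
<div>class FSBench():</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def __init__(self,path="/tmp",num=100):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.path = path</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.num = num</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;@print_timing</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def test_open_read(self):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for filename in self.get_files():</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f = open(filename)</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;data = f.read()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.close()</div>
<div><br></div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def get_files(self):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Yield the test file paths: &lt;path&gt;/test_000, test_001, ...</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for i in range(self.num):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;filename = self.path + "/test_%03d" % i</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;yield filename</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;@print_timing</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def test_stat(self):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for filename in self.get_files():</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;os.stat(filename)</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;@print_timing</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def test_stat_nonexist(self):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# stat() on names that do not exist (negative lookups)</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for filename in self.get_files():</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;try:</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;os.stat(filename+"blkdsflskdf")</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;except OSError:</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;pass</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;@print_timing</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def test_write(self):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for filename in self.get_files():</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f = open(filename,'w')</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write('hi there\n')</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.close()</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;@print_timing</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;def test_delete(self):</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for filename in self.get_files():</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;os.unlink(filename)</div>
<div><br></div>
<div>if __name__ == '__main__':</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;options = parse_options()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;bench = FSBench(path=options.path, num=options.num)</div>
<div><br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;bench.test_write()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;bench.test_open_read()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;bench.test_stat()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;bench.test_stat_nonexist()</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;bench.test_delete()</div>
<div><br></div>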
</div><div style><br></div></div><div><br></div></div>