<div dir="ltr">Hi All,<br>I am new to Gluster and trying to figure out how the things 

actually work. I have been reading the docs, mailing lists and blogs but still have
some doubts. I ran a small test to understand how a native client
would read from a replicated volume, and I need some help.<br><br>My setup:<br>0. Gluster 3.4.0 on CentOS 6.4 64-bit servers.<br>1. Replicated gluster volume, using two bricks on two different gluster servers.<br>2.

 Dedicated Apache web server that mounts the replicated gluster volume with the gluster native client; its document root directory, holding 10,000 unique 100K files, is on the gluster mount.<br>3. Dedicated workstation that queries each unique 100K file from the web server using httperf:<br>httperf

 --hog --server=192.168.29.45 --uri= --port=80 --wlog 

y,/var/tmp/100K_urls --num-calls 1 --timeout 5 --num-conns 10000  --rate

 100<br><br><br>Findings:<br><br>1. The gluster servers cache the requested files and serve subsequent requests from the cache (no disk I/O is issued). Clearing the memory cache (sync; echo &quot;3&quot; &gt;/proc/sys/vm/drop_caches) triggers reading the files from disk again.<br>  1.1 I am not sure whether the caching is due to gluster (the performance/io-cache translator is enabled by default) or to the system cache.<br>  1.2 During the initial caching on the gluster 

servers, the load (top) on the web server goes above 200. It returns to a normal level once the I/O on the gluster servers goes down (the servers start serving requests from the cache). There are few to no blocked processes on the web server (vmstat) and performance looks good. Apparently this should be attributed somehow to the gluster client, but why? <br><br>2. In order to figure out how the gluster client reads 

from the gluster servers, I started iptraf on the gluster client (the 

apache server) and on both gluster servers. After that, I started httperf from the workstation, requesting each of the 10,000 100K files located on the gluster volume (a total of 1G). As I understand it, the client makes sure that the requested file is the right one by checking the file&#39;s metadata on all gluster servers in the replicated volume, but fetches it from only one

 of them.<br><br>To my surprise, according to iptraf, each gluster 

server sent 1038M (close to the total size of all files) to the gluster 

client. And the gluster client received ~98001K from each gluster 

server:<br><br>Proto/Port   Pkts      Bytes    PktsTo    BytesTo  PktsFrom BytesFrom<br><br>Gluster server 1:<br>TCP/1020  543629   1072M   101758   1038M    441871  34578324<br><br>Gluster server 2:<br>TCP/1019  546148   1072M   103366   1038M    442782  34625676<br>

<br>Gluster Client:<br>TCP/1020  1087627  981701K  697846  949912K  389781  31789104<br>TCP/1019  1087516  981973K  697993  950220K  389523  31753440<br><br>On the other hand, each gluster server cached ~500MB while the gluster client cached ~1GB:<br><br>Gluster server 1:<br># free -m<br>             total       used       free     shared    buffers     cached<br>

Mem:          3821       1072       2748          0         66        520<br>-/+ buffers/cache:        485       3336<br>Swap:         3951          0       3951<br><br>Gluster server 2:<br># free -m<br>             total       used       free     shared    buffers     cached<br>

Mem:          3821       1085       2735          0         65        522<br>-/+ buffers/cache:        497       3324<br>Swap:         3951          3       3948<br><br>Gluster Client:<br># free -m<br>             total       used       free     shared    buffers     cached<br>

Mem:          3821       1330       2490          0          1        993<br>-/+ buffers/cache:        336       3485<br>Swap:         3951          0       3951<br><br><br>Can someone explain how exactly a single native gluster client reads from a replicated volume?<br>

Is there any mechanism (any details) that spreads requests from a single client among all servers in a replica volume, making reads faster and distributing the load across all servers?<br>Does a client decide where to fetch a file from each time, based on some criteria (details), or is it bound to a given server all the time?<br><br>Is there any low-level documentation? I find what is provided on the gluster site to be very basic.<br><br>Thanks!<br>Kal<br></div>