Hi all,<br><br>Thanks for the great feedback. I had changed IPs, and when checking the log I noticed that one server wasn't connecting correctly.<br><br>To make sure I hadn't made any mistakes, I've rebuilt the bricks from scratch with clean configurations (mount log attached below). Performance is still not great compared to a single NFS mount.<br>

<br>In the application we're running, files never change; we only add and delete them. So I'd like directory and file metadata cached as much as possible.<br><br><br>Config info:<br>

gluster&gt; volume info data-storage<br><br>Volume Name: data-storage<br>Type: Replicate<br>Volume ID: cc91c107-bdbb-4179-a097-cdd3e9d5ac93<br>Status: Started<br>Number of Bricks: 1 x 2 = 2<br>Transport-type: tcp<br>Bricks:<br>


Brick1: fs1:/data/storage<br>Brick2: fs2:/data/storage<br>gluster&gt; <br><br><br>On my web1 node I mounted it with:<br># mount -t glusterfs fs1:/data-storage /storage<br><br>I've copied my data onto it again. Running ls several times takes about 0.5 seconds each time:<br>


[@web1 files]# time ls -all|wc -l<br>1989<br><br>real    0m0.485s<br>user    0m0.022s<br>sys     0m0.109s<br>[@web1 files]# time ls -all|wc -l<br>1989<br><br>real    0m0.489s<br>user    0m0.016s<br>sys     0m0.116s<br>[@web1 files]# time ls -all|wc -l<br>


1989<br><br>real    0m0.493s<br>user    0m0.018s<br>sys     0m0.115s<br><br>Running the same command directly on the underlying files on one of the storage nodes takes about 0.021s:<br>[@fs2 files]# time ls -all|wc -l<br>1989<br><br>real    0m0.021s<br>user    0m0.007s<br>


sys     0m0.015s<br>[@fs2 files]# time ls -all|wc -l<br>1989<br><br>real    0m0.020s<br>user    0m0.008s<br>sys     0m0.013s<br><br><br>Now a full recursive directory listing seems even slower:<br>[@web1 files]# time ls -alR|wc -l<br>

2242956<br><br>real    74m0.660s<br>user    0m20.117s<br>sys     1m24.734s<br>[@web1 files]# time ls -alR|wc -l<br>2242956<br><br>real    26m27.159s<br>user    0m17.387s<br>sys     1m11.217s<br>[@web1 files]# time ls -alR|wc -l<br>

2242956<br><br>real    27m38.163s<br>user    0m18.333s<br>sys     1m19.824s<br><br><br>Just as a (crazy) reference point, on another single server with SSDs (RAID 10) I get:<br>files# time ls -alR|wc -l<br>2260484<br><br>

real    0m15.761s<br>user    0m5.170s<br>sys     0m7.670s<br>That's for the same operation, and this server has even more files.<br><br>My goal is to make this directory listing as fast as possible. I don't have the hardware or budget to test an SSD configuration, but would an SSD setup get the listing down to about 1 minute (assuming GlusterFS is 4 times slower than a single node)?<br>
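One quick way to compare these runs is time per output line. Note that `wc -l` counts lines of `ls -alR` output (directory headers, "total" lines and blank lines included), not files, so this is only approximate. The following is a minimal sketch in POSIX sh and awk. The helper names per_line_us and scaled_s are hypothetical, and the factor of 4 is the assumption stated in the question above.

```shell
#!/bin/sh
# Hypothetical helpers for back-of-the-envelope arithmetic on `time` output.

# Microseconds per output line: $1 = elapsed seconds, $2 = lines from wc -l.
per_line_us() {
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%.1f\n", t / n * 1e6 }'
}

# Elapsed seconds scaled by an assumed overhead factor: $1 = seconds, $2 = factor.
scaled_s() {
  awk -v t="$1" -v f="$2" 'BEGIN { printf "%.1f\n", t * f }'
}

per_line_us 1658.163 2242956   # GlusterFS mount, third ls -alR run (27m38.163s)
per_line_us 15.761 2260484     # single SSD RAID 10 server
scaled_s 15.761 4              # SSD time with the assumed 4x GlusterFS overhead
```

This prints 739.3, 7.0 and 63.0. That works out to roughly 0.74 ms per line on the GlusterFS mount versus about 7 microseconds on the SSD server, about 100 times slower. It also shows that 4 × 15.761 s ≈ 63 s, which is where the ~1 minute estimate comes from.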

<br>If I added two more bricks to the cluster (replicated), would this double the read speed?<br><br>Thanks for any insight!<br><br><br>-------------------- storage.log from web1 on mount ---------------------<br>[2012-06-07 20:47:45.584320] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.0<br>


[2012-06-07 20:47:45.624548] I [io-cache.c:1549:check_cache_size_ok] 0-data-storage-quick-read: Max cache size is 8252092416<br>[2012-06-07 20:47:45.624612] I [io-cache.c:1549:check_cache_size_ok] 0-data-storage-io-cache: Max cache size is 8252092416<br>


[2012-06-07 20:47:45.628148] I [client.c:2142:notify] 0-data-storage-client-0: parent translators are ready, attempting connect on transport<br>[2012-06-07 20:47:45.631059] I [client.c:2142:notify] 0-data-storage-client-1: parent translators are ready, attempting connect on transport<br>


Given volfile:<br>+------------------------------------------------------------------------------+<br>  1: volume data-storage-client-0<br>  2:     type protocol/client<br>  3:     option remote-host fs1<br>  4:     option remote-subvolume /data/storage<br>


  5:     option transport-type tcp<br>  6: end-volume<br>  7: <br>  8: volume data-storage-client-1<br>  9:     type protocol/client<br> 10:     option remote-host fs2<br> 11:     option remote-subvolume /data/storage<br>


 12:     option transport-type tcp<br> 13: end-volume<br> 14: <br> 15: volume data-storage-replicate-0<br> 16:     type cluster/replicate<br> 17:     subvolumes data-storage-client-0 data-storage-client-1<br> 18: end-volume<br>


 19: <br> 20: volume data-storage-write-behind<br> 21:     type performance/write-behind<br> 22:     subvolumes data-storage-replicate-0<br> 23: end-volume<br> 24: <br> 25: volume data-storage-read-ahead<br> 26:     type performance/read-ahead<br>


 27:     subvolumes data-storage-write-behind<br> 28: end-volume<br> 29: <br> 30: volume data-storage-io-cache<br> 31:     type performance/io-cache<br> 32:     subvolumes data-storage-read-ahead<br> 33: end-volume<br>34: <br>


 35: volume data-storage-quick-read<br> 36:     type performance/quick-read<br> 37:     subvolumes data-storage-io-cache<br> 38: end-volume<br> 39: <br> 40: volume data-storage-md-cache<br> 41:     type performance/md-cache<br>


 42:     subvolumes data-storage-quick-read<br> 43: end-volume<br> 44: <br> 45: volume data-storage<br> 46:     type debug/io-stats<br> 47:     option latency-measurement off<br> 48:     option count-fop-hits off<br> 49:     subvolumes data-storage-md-cache<br>


 50: end-volume<br><br>+------------------------------------------------------------------------------+<br>[2012-06-07 20:47:45.642625] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 0-data-storage-client-0: changing port to 24009 (from 0)<br>


[2012-06-07 20:47:45.648604] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 0-data-storage-client-1: changing port to 24009 (from 0)<br>[2012-06-07 20:47:49.592729] I [client-handshake.c:1636:select_server_supported_programs] 0-data-storage-client-0: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)<br>


[2012-06-07 20:47:49.595099] I [client-handshake.c:1636:select_server_supported_programs] 0-data-storage-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)<br>[2012-06-07 20:47:49.608455] I [client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-0: Connected to <a href="http://10.1.80.81:24009" target="_blank">10.1.80.81:24009</a>, attached to remote volume &#39;/data/storage&#39;.<br>


[2012-06-07 20:47:49.608489] I [client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>[2012-06-07 20:47:49.608572] I [afr-common.c:3627:afr_notify] 0-data-storage-replicate-0: Subvolume &#39;data-storage-client-0&#39; came back up; going online.<br>


[2012-06-07 20:47:49.608837] I [client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-0: Server lk version = 1<br>[2012-06-07 20:47:49.616381] I [client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-1: Connected to <a href="http://10.1.80.82:24009" target="_blank">10.1.80.82:24009</a>, attached to remote volume &#39;/data/storage&#39;.<br>


[2012-06-07 20:47:49.616434] I [client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>[2012-06-07 20:47:49.621808] I [fuse-bridge.c:4193:fuse_graph_setup] 0-fuse: switched to graph 0<br>


[2012-06-07 20:47:49.622793] I [client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-1: Server lk version = 1<br>[2012-06-07 20:47:49.622873] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13<br>


[2012-06-07 20:47:49.623440] I [afr-common.c:1964:afr_set_root_inode_on_first_lookup] 0-data-storage-replicate-0: added root inode<br><br>-------------------- End storage.log -----------------------------------------------------<br>
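Reading the volfile in the log above from its last-defined volume downwards shows the client-side translator stack that each operation passes through before replicate fans it out to the two bricks. The following is a minimal sketch in POSIX sh and awk. It assumes the volfile text has been copied out of the log into a local file (the file name graph.vol and the function name xlator_stack are hypothetical). It also assumes, as in this volfile, that the last volume defined is the root of the graph.

```shell
#!/bin/sh
# Print the translator stack of a GlusterFS client volfile, starting from
# the last-defined volume (assumed to be the root of the graph).
xlator_stack() {
  awk '
    # Drop the "NN: " prefixes the log adds, if the text was copied from it.
    { sub(/^[ \t]*[0-9]+:[ \t]*/, "") }
    $1 == "volume"     { name = $2; order[++n] = name }
    $1 == "type"       { type[name] = $2 }
    $1 == "subvolumes" {
      subs[name] = $2
      for (i = 3; i <= NF; i++) subs[name] = subs[name] " " $i
    }
    END {
      cur = order[n]
      while (cur != "") {
        printf "%s (%s)\n", cur, type[cur]
        k = split(subs[cur], s, " ")
        if (k == 1) {
          cur = s[1]            # linear chain: descend into the only child
        } else {
          # Leaf (k == 0) or fan-out such as cluster/replicate: list the
          # children indented and stop walking.
          for (i = 1; i <= k; i++) printf "  %s (%s)\n", s[i], type[s[i]]
          cur = ""
        }
      }
    }
  ' "$1"
}
```

Run it as `xlator_stack graph.vol`. For the volfile in this log, it should print data-storage (debug/io-stats) first. Next come md-cache, quick-read, io-cache, read-ahead, write-behind and replicate-0, in that order. The two protocol/client volumes (one per brick) appear indented under replicate-0.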


<br><br><div class="gmail_quote">On Thu, Jun 7, 2012 at 9:46 AM, Pranith Kumar Karampuri <span dir="ltr">&lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Brian,<br>

The 'stat' command arrives at the gluster mount as the fop (file operation) 'lookup', which triggers self-heal. So the behavior is still the same.<br>

I was referring to the fop 'stat', which is performed on only one of the bricks.<br>

Unfortunately, most of the commands and fops have the same names.<br>

Here are some examples of read fops:<br>

        .access<br>

        .stat<br>

        .fstat<br>

        .readlink<br>

        .getxattr<br>

        .fgetxattr<br>

        .readv<br>

<div><br>

Pranith.<br>

----- Original Message -----<br>

From: &quot;Brian Candler&quot; &lt;<a href="mailto:B.Candler@pobox.com" target="_blank">B.Candler@pobox.com</a>&gt;<br>

</div><div>To: &quot;Pranith Kumar Karampuri&quot; &lt;<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;<br>

Cc: &quot;olav johansen&quot; &lt;<a href="mailto:luxis2012@gmail.com" target="_blank">luxis2012@gmail.com</a>&gt;, <a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>, &quot;Fernando Frediani (Qube)&quot; &lt;<a href="mailto:fernando.frediani@qubenet.net" target="_blank">fernando.frediani@qubenet.net</a>&gt;<br>


Sent: Thursday, June 7, 2012 7:06:26 PM<br>

Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3? (small  files / directory listings)<br>

<br>

</div><div><div>On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:<br>

&gt; Brian,<br>

&gt;   Small correction: &#39;sending queries to *both* servers to check they are in sync - even read accesses.&#39; Read fops like stat/getxattr etc are sent to only one brick.<br>

<br>

Is that new behaviour for 3.3? My understanding was that stat() was a<br>

healing operation.<br>

<a href="http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate" target="_blank">http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate</a><br>


<br>

If this is no longer true, then I&#39;d like to understand what happens after a<br>

node has been down and comes up again.  I understand there&#39;s a self-healing<br>

daemon in 3.3, but what if you try to access a file which has not yet been<br>

healed?<br>

<br>

I&#39;m interested in understanding this, especially the split-brain scenarios<br>

(better to understand them *before* you&#39;re stuck in a problem :-)<br>

<br>

BTW I&#39;m in the process of building a 2-node 3.3 test cluster right now.<br>

<br>

Cheers,<br>

<br>

Brian.<br>

</div></div></blockquote></div><br>
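Pranith's distinction in the quoted reply can be made concrete with a toy model of which bricks in this thread's 1 x 2 replica receive a given fop. This is not GlusterFS code. It follows the description above: read fops go to one brick, while lookup (which the `stat` command on the mount turns into) is sent to both so replicate can compare them and self-heal. The function name bricks_for_fop is hypothetical. Always answering fs1 for reads is a simplification, because the sketch does not model how replicate chooses the read brick.

```shell
#!/bin/sh
# Toy model, not GlusterFS code: which bricks of the 1 x 2 replica
# (fs1:/data/storage, fs2:/data/storage) receive a given fop.
bricks_for_fop() {
  case "$1" in
    access|stat|fstat|readlink|getxattr|fgetxattr|readv)
      # Read fops are served by a single brick. Always answering fs1 is a
      # simplification; how replicate picks the read brick is not modeled.
      echo "fs1:/data/storage"
      ;;
    lookup)
      # The stat command on the mount arrives as lookup, which goes to both
      # bricks so replicate can compare them and trigger self-heal if needed.
      echo "fs1:/data/storage fs2:/data/storage"
      ;;
    *)
      echo "bricks_for_fop: fop '$1' is not modeled" >&2
      return 1
      ;;
  esac
}

bricks_for_fop stat     # read fop: one brick
bricks_for_fop lookup   # lookup: both bricks
```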