Hi Jordan,<br> Replies inline.<br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d">At 11:02 PM 2/20/2009, Jordan Mendler wrote:<br>


<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I am prototyping GlusterFS with ~50-60TB of raw disk space across non-raided disks in ~30 compute nodes. I initially separated the nodes into groups of two, and did a replicate across each set of single drives in a pair of servers. Next I did a stripe across the 33 resulting AFR groups, with a block size of 1MB and later with the default block size. With these configurations I am only seeing throughput of about 15-25 MB/s, despite a full Gig-E network.<br>


</blockquote></div></blockquote><div><br>Generally, we recommend a stripe set of 4 nodes. If you have more nodes, we recommend aggregating multiple stripe volumes under distribute. This also helps with scaling if you decide to add more nodes later: by nature, the stripe translator can&#39;t add more subvolumes, whereas distribute can (and a new subvolume can itself be a new stripe of 4 subvolumes). <br>

<br>Also, we recommend a stripe-size of 128KB, together with a write-behind block-size of 128KB * (number of subvolumes in the stripe), so that each write call is sent to all the nodes in parallel. <br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="Ih2E3d"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

What is generally the recommended configuration in a large striped environment? I am wondering if the number of nodes in the stripe is causing too much overhead, or if the bottleneck is likely somewhere else. </blockquote>

</div></blockquote><div><br>Yes, if the number of striped volumes is high, there is somewhat more CPU consumption at the client, and the parallelism may not be utilized properly. Again, a setup like the one described above should help. <br>
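<br>To make the arithmetic of that setup concrete, here is a minimal Python sketch, not a GlusterFS tool. It assumes the 33 replicate groups from your mail are the units being striped; the function name <code>plan_layout</code> and its parameters are made up for illustration. It only groups units into stripe sets of 4 and computes the matching write-behind block-size.<br>

```python
KB = 1024


def plan_layout(num_units, stripe_width=4, stripe_size_kb=128):
    """Group storage units into stripe sets and size write-behind to match.

    num_units is the count of units to stripe over (for example, replicate
    pairs). All names here are hypothetical and exist only for illustration.
    """
    # Each stripe set takes `stripe_width` consecutive units.
    full_sets = num_units // stripe_width
    stripes = [
        list(range(i * stripe_width, (i + 1) * stripe_width))
        for i in range(full_sets)
    ]
    # Units that do not fill a complete stripe set are reported separately.
    leftover = list(range(full_sets * stripe_width, num_units))
    # One write-behind block covers one stripe-size chunk per stripe subvolume.
    write_behind_bytes = stripe_size_kb * KB * stripe_width
    return stripes, leftover, write_behind_bytes


stripes, leftover, wb = plan_layout(33)
print(len(stripes), "stripe sets of 4;", len(leftover), "unit(s) left over")
print("write-behind block-size:", wb // KB, "KB")
```

<br>Each stripe set would then be listed as one subvolume of distribute. How to place a unit that does not fill a complete set of 4 is a layout decision this sketch leaves open.<br>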

 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

In addition, I saw a thread on the list that indicates it is better to replicate across stripes rather than stripe across replicates. Does anyone have any comments or opinion regarding this?<br>

</blockquote>

</div></blockquote><div><br>After the rc2 releases, both should work fine. Before that, there was a known bug: replicate did not handle &#39;holes&#39; created in stripe while self-healing. That issue has now been addressed.<br>

<br><br></div></div>Regards,<br><br>-- <br>Amar Tumballi<br><br>