Thanks, Brian, for your input.<br>Our requirement is both high throughput and high availability.<br><br>Let me give a little background so that our requirement is easier to understand:<br>The storage will be used by animation artists and a render farm with around 300 render nodes.<br>


--------------------------------------------------------------------------------------------------------------------------------------------------------<br>1. When a rendering job is fired, we can expect at least 50 render nodes to simultaneously hit the storage to read a single scene (information) file. Now, this file could be anywhere in the range of 100MB to 2GB in size. <br>


<br>2.  Once the render is complete, each of these render nodes would write the generated image file back to the storage. The image files would be 10 - 50MB in size. Here again, we can expect most of the renders to finish almost simultaneously, usually within a few seconds of each other.<br>


<br>3. The 100MB - 2GB scene file will almost always be written to by a single artist, i.e. no two artists would be working on the same scene file simultaneously.<br><br>4. The 10 - 50MB image files, from different rendering activities, would then be read by another set of nodes for a step called &#39;compositing&#39;. Compositing gives you the final &#39;shot&#39; output.<br>


--------------------------------------------------------------------------------------------------------------------------------------------------------<br><br>We were trying to cater to both large file (100MB - 2GB) read speed and small file (10-50MB) read+write speed.<br>
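As a rough sketch of what the numbers above imply, the aggregate burst demand can be estimated as below. The node count and file sizes come from the points above; the time windows (60 seconds for the scene read, 5 seconds for the image writes) are placeholder assumptions, not measured targets, and 2GB is taken as 2048MB.<br>

```python
# Back-of-the-envelope burst sizing for the render workload described above.
# Node count and file sizes are from the message; the time windows are
# hypothetical placeholders and should be replaced with real targets.

def burst_throughput_mb_s(nodes, file_size_mb, window_s):
    """Aggregate MB/s needed for `nodes` clients to each move one file
    of `file_size_mb` megabytes within `window_s` seconds."""
    return nodes * file_size_mb / window_s

NODES = 50                      # "at least 50 render nodes" per job
SCENE_SIZES_MB = (100, 2048)    # scene file: 100MB to 2GB (2GB taken as 2048MB)
IMAGE_SIZES_MB = (10, 50)       # rendered image: 10 to 50MB
READ_WINDOW_S = 60              # assumption: acceptable time to load a scene
WRITE_WINDOW_S = 5              # assumption: "within a few seconds of each other"

if __name__ == "__main__":
    for size in SCENE_SIZES_MB:
        rate = burst_throughput_mb_s(NODES, size, READ_WINDOW_S)
        print(f"scene read,  {size:>4} MB file: {rate:8.1f} MB/s aggregate")
    for size in IMAGE_SIZES_MB:
        rate = burst_throughput_mb_s(NODES, size, WRITE_WINDOW_S)
        print(f"image write, {size:>4} MB file: {rate:8.1f} MB/s aggregate")
```

Note that all 50 nodes read the same scene file, so server-side caching may serve much of that read burst from memory; the estimate is the client-side demand, not necessarily the disk load.<br>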


With Gluster, we were thinking of setting the stripe size to 50MB so that each volume could hold a complete small file, while larger files would be striped across volumes in 50MB chunks.<br><br>The RAID controllers that come with branded hardware do not allow individual disk access (no passthrough mode), and plain SAS controllers don&#39;t come with cache. So we were thinking of using a RAID controller with cache and creating RAID 0 arrays of just 2 disks each.<br>
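To illustrate the 50MB stripe idea mentioned above, here is a minimal sketch of the arithmetic only: how many 50MB chunks a file of a given size would be split into. It is not a model of how Gluster actually lays out striped data, and the function name is made up for the example.<br>

```python
import math

STRIPE_MB = 50  # proposed stripe size from the message

def chunks_for(file_size_mb, stripe_mb=STRIPE_MB):
    """Number of stripe-sized chunks a file of the given size would occupy.
    Pure arithmetic; a non-empty file always occupies at least one chunk."""
    return max(1, math.ceil(file_size_mb / stripe_mb))

if __name__ == "__main__":
    # Image files (10-50MB) fit in one chunk; scene files (100MB-2GB) span many.
    for size_mb in (10, 50, 100, 2048):
        print(f"{size_mb:>4} MB file -> {chunks_for(size_mb)} chunk(s)")
```

With this stripe size, every image file stays within a single chunk, while a 2GB scene file is spread over dozens of chunks.<br>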


<br>One more thought: is it possible to have a mix of RAID6 volumes and individual disks, and force Gluster to write large files (*.ma) to the RAID6 volumes and small files (*.iff) to the individual disks? That would solve our problem completely.<br>


<br>Regards,<br><br><br>Indivar Nair<br><br><div class="gmail_quote">On Fri, Sep 28, 2012 at 1:01 AM, Brian Candler <span dir="ltr">&lt;<a href="mailto:B.Candler@pobox.com" target="_blank">B.Candler@pobox.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Thu, Sep 27, 2012 at 10:08:12PM +0530, Indivar Nair wrote:<br>

&gt;    We were trying to define our storage spec for Gluster and was wondering<br>

&gt;    which would be better purely from a performance perspective.<br>

&gt;    1. Use a simple 24 Disk JBOD with SAS Controller and export each hard<br>

&gt;    disk as an individual volume<br>

&gt;    OR<br>

&gt;    2. Use the 24 Disk JBOD with Flash Based Cache enabled RAID Controller,<br>

&gt;        create 12 RAID 0 Arrays of 2 Disks each,<br>

&gt;        and take advantage of the caching, especially for writing.<br>

<br>

</div>I&#39;m not sure why your controller would do caching for pairs of disks in<br>

RAID0, but not for single disks??<br>

<div class="im"><br>

&gt;    Just FYI, we will be creating a &#39;Striped Replicated&#39; Volume for H/A.<br>

<br>

</div>Where each server has a bunch of RAID0 disk sets? IMO this is a really,<br>

really bad idea.<br>

<br>

Consider the following:<br>

<br>

A. One disk in your RAID0 fails entirely. The whole volume is toast. You<br>

insert a new disk, do mkfs, and then you have to sync the whole filesystem&#39;s<br>

worth of data from the other server.  You hope that a disk doesn&#39;t fail in<br>

the corresponding volume on the other server during this period.<br>

<br>

But it&#39;s worse than this. Consider:<br>

<br>

B. You have a single unrecoverable read error on a single sector.<br>

<br>

In a RAID1 or RAID5 or RAID6, the controller will be able to recover the<br>

data from a different disk, write the data back to the failed disk, which<br>

will remap the bad sector to another part of the disk, and everything will<br>

continue fine just as if nothing happened. (Side note: you need to have<br>

drives which support ERC/TLER for this to work)<br>

<br>

With a RAID0, your entire brick will go down; Gluster cannot do this sort of<br>

sector-level repair.  You are then back in the situation (A) above, except<br>

that you will end up needlessly replacing a drive.<br>

<br>

Or you can dd the affected drive with zeros to force any bad sectors to be<br>

remapped; this will take hours, meanwhile you cross your fingers that you<br>

don&#39;t have any read error from the RAID0 on your other server.<br>

<br>

This is not a good recipe for data safety. If you care about capacity over<br>

speed, then use RAID6 in your bricks.  If you care about speed over<br>

capacity, then use RAID10.<br>

<br>

Of course, if you are just using this for scratch space (lots of temporary<br>

files) then RAID0 is probably fine - but your talk of HA suggests that your<br>

data is more important than that.<br>

<br>

Regards,<br>

<br>

Brian.<br>

</blockquote></div><br>