<div dir="ltr">On Mon, Apr 29, 2013 at 8:44 PM, Robert Hajime Lanning <span dir="ltr"><<a href="mailto:lanning@lanning.cc" target="_blank">lanning@lanning.cc</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 04/29/13 20:28, Anand Avati wrote:<div><div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On Mon, Apr 29, 2013 at 9:19 AM, Heath Skarlupka <<a href="mailto:heath.skarlupka@ssec.wisc.edu" target="_blank">heath.skarlupka@ssec.wisc.edu</a> <mailto:<a href="mailto:heath.skarlupka@ssec.wisc.edu" target="_blank">heath.skarlupka@ssec.wisc.edu</a>>> wrote:<br>
<br>
Gluster-Users,<br>
<br>
We currently have a 30 node Gluster Distributed-Replicate 15 x 2<br>
filesystem. Each node has a ~20TB xfs filesystem mounted to /data<br>
and the bricks live on /data/brick. We have been very happy with<br>
this setup, but are now collecting more data that doesn't need to<br>
be replicated because it can be easily regenerated. Most of this<br>
data currently lives on our replicated volume and is wasting<br>
space. My plan was to create a second directory under the /data<br>
partition called /data/non_replicated_brick on each of the 30<br>
nodes and start up a second Gluster filesystem. This would allow<br>
me to dynamically size the replicated and non_replicated space<br>
based on our current needs.<br>
<br>
I'm a bit worried about going forward with this because I haven't<br>
seen many users talk about putting two gluster bricks on the same<br>
underlying filesystem. I've gotten past the technical hurdle<br>
and know that it is technically possible, but I'm worried about<br>
corner cases and issues that might crop up when we add more bricks<br>
and need to rebalance both gluster volumes at once. Does anybody<br>
have any insight into the caveats of doing this, or are there<br>
any users putting multiple bricks on a single filesystem in<br>
the 50-100 node size range? Thank you all for your insights and help!<br>
<br>
<br>
This is a very common use case and should work fine. In the future we are exploring better integration with dm-thinp so that each brick has its own XFS filesystem on a thin provisioned logical volume. But for now you can create a second volume on the same XFS filesystems.<br>
<br>
Avati<br>
<br>
</blockquote>
<br></div></div>
There is an issue when replicated bricks fill unevenly: writes to the non-replicated volume will cause the shared bricks to fill unevenly as seen from the replicated volume.<br>
<br>
I am not sure how ENOSPC is handled asymmetrically, but if the fuller brick happens to be down during a write that would otherwise return ENOSPC, you won't get the error, and replication will then fail when the self-heal kicks in.<span class="HOEnZb"><font color="#888888"><br>
<br></font></span></blockquote><div><br></div><div style>Yes, self-heal will keep failing until enough free space is made available. Ideally you should set the "min-free-disk" parameter so that new file creations are redirected to a different server from about 80-90% utilization onward, and only existing files continue to grow.</div>
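As a rough sketch of the setup discussed above (the volume name "nonrep", the hostnames, and the 15% threshold are assumptions for illustration, not taken from this thread), creating the second, non-replicated volume on the shared /data filesystem and setting min-free-disk might look like:

```shell
# Hedged sketch only: "nonrep", node names, and thresholds are hypothetical.
# Create a pure distribute (non-replicated) volume whose bricks share the
# /data XFS filesystem with the existing replicated bricks:
gluster volume create nonrep \
    node01:/data/non_replicated_brick \
    node02:/data/non_replicated_brick   # ... one brick per node

gluster volume start nonrep

# cluster.min-free-disk keeps this much space free per brick; 15% free
# means new file creations are redirected once a brick passes ~85% used:
gluster volume set nonrep cluster.min-free-disk 15%

# Confirm the option took effect:
gluster volume info nonrep
```

Note that min-free-disk only redirects new file creations; existing files on a nearly full brick can still grow it to ENOSPC, which is the asymmetric-failure case raised earlier in the thread.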
<div style><br></div><div style>Avati </div></div></div></div>