<div dir="ltr">Ok, it appears that the following worked. Thanks for the nudge in the right direction:<div><br></div><div>volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/g2lv6 commit force</div>
<div><br></div><div>then</div><div>volume heal test-a full</div><div><br></div><div>and monitor the progress with</div><div>volume heal test-a info</div><div><br></div><div>However, that doesn't solve my problem of what to do when a brick is somehow corrupted and I don't have enough space to first heal it and then replace it.</div>
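<div>In script form, that sequence looks something like this (a sketch; the volume and brick names are the ones from this thread, and I've wrapped the commands in a guard so they are only attempted when the gluster CLI is actually installed):</div>

```shell
# Replace a failed brick with a fresh one, then heal onto it.
# Names below are the ones used in this thread.
VOL=test-a
OLD_BRICK=10.250.4.65:/localmnt/g2lv5
NEW_BRICK=10.250.4.65:/localmnt/g2lv6

if command -v gluster >/dev/null 2>&1; then
    # swap the failed brick for the new one
    gluster volume replace-brick "$VOL" "$OLD_BRICK" "$NEW_BRICK" commit force
    # trigger a full self-heal onto the new brick
    gluster volume heal "$VOL" full
    # check progress
    gluster volume heal "$VOL" info
fi
```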
<div><br></div><div>That did get me thinking, though: "what if I replace the brick, forgo the heal, replace it again and then do a heal?" That seems to work.</div><div><br></div><div>So if I lose one brick, here is the process that I used to recover it:</div>
<div>1) create a temporary directory whose only purpose is to trick gluster into maintaining the correct replica count: mkdir /localmnt/garbage</div><div>2) replace the dead brick with our garbage directory: volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/garbage commit force</div>
<div>3) fix our dead brick using whatever process is required. In this case, for testing, we had to remove some gluster bits or it throws the "already part of a volume" error:</div><div>setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5<br>
</div><div>setfattr -x trusted.gfid /localmnt/g2lv5<br></div><div>4) now that our dead brick is fixed, swap it for the garbage/temporary brick: volume replace-brick test-a 10.250.4.65:/localmnt/garbage 10.250.4.65:/localmnt/g2lv5 commit force</div>
<div>5) now all that we have to do is let gluster heal the volume: volume heal test-a full</div><div><br></div><div>Is there anything wrong with this procedure?</div><div><br></div><div>Cheers,<br></div><div>Dave</div><div>
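<div>PS: the five steps above, collected into one script (a sketch; the paths and the /localmnt/garbage placeholder are the ones from this thread, and the gluster/setfattr calls only run when those tools are actually installed):</div>

```shell
# Recover a single dead brick without a spare, using a throwaway
# placeholder brick to keep the replica count intact.
VOL=test-a
DEAD_BRICK=10.250.4.65:/localmnt/g2lv5
DEAD_PATH=/localmnt/g2lv5
TMP_BRICK=10.250.4.65:/localmnt/garbage

# 1) temporary stand-in directory (guarded so the sketch runs anywhere)
mkdir -p /localmnt/garbage 2>/dev/null || true

if command -v gluster >/dev/null 2>&1 && command -v setfattr >/dev/null 2>&1; then
    # 2) swap the dead brick for the stand-in to maintain the replica count
    gluster volume replace-brick "$VOL" "$DEAD_BRICK" "$TMP_BRICK" commit force
    # 3) repair the dead brick, then strip the old gluster metadata so it
    #    doesn't throw the "already part of a volume" error
    setfattr -x trusted.glusterfs.volume-id "$DEAD_PATH"
    setfattr -x trusted.gfid "$DEAD_PATH"
    # 4) swap the repaired brick back in for the stand-in
    gluster volume replace-brick "$VOL" "$TMP_BRICK" "$DEAD_BRICK" commit force
    # 5) let gluster re-sync the brick from the surviving replica
    gluster volume heal "$VOL" full
fi
```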
<br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 16, 2013 at 11:03 AM, David Gibbons <span dir="ltr"><<a href="mailto:david.c.gibbons@gmail.com" target="_blank">david.c.gibbons@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Ravi,<div><br></div><div>Thanks for the tips. When I run a volume status:</div><div><div>gluster> volume status test-a</div>
<div>Status of volume: test-a</div><pre>Gluster process                            Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.250.4.63:/localmnt/g1lv2          49152   Y       8072
Brick 10.250.4.65:/localmnt/g2lv2          49152   Y       3403
Brick 10.250.4.63:/localmnt/g1lv3          49153   Y       8081
Brick 10.250.4.65:/localmnt/g2lv3          49153   Y       3410
Brick 10.250.4.63:/localmnt/g1lv4          49154   Y       8090
Brick 10.250.4.65:/localmnt/g2lv4          49154   Y       3417
Brick 10.250.4.63:/localmnt/g1lv5          49155   Y       8099
Brick 10.250.4.65:/localmnt/g2lv5          N/A     N       N/A
Brick 10.250.4.63:/localmnt/g1lv1          49156   Y       8576
Brick 10.250.4.65:/localmnt/g2lv1          49156   Y       3431
NFS Server on localhost                    2049    Y       3440
Self-heal Daemon on localhost              N/A     Y       3445
NFS Server on 10.250.4.63                  2049    Y       8586
Self-heal Daemon on 10.250.4.63            N/A     Y       8593</pre>
<div><br></div><div>There are no active volume tasks</div><div>--</div></div><div><br></div><div>Attempting to start the volume results in:</div><div><div>gluster> volume start test-a force</div><div>volume start: test-a: failed: Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data available</div>
<div>--</div></div><div><br></div><div>It doesn't like it when I try to fire off a heal, either:</div><div><div>gluster> volume heal test-a</div><div>Launching Heal operation on volume test-a has been unsuccessful</div>
</div><div>--</div><div><br></div><div>Although that did lead me to this:</div><div><div>gluster> volume heal test-a info</div><div>Gathering Heal info on volume test-a has been successful</div><div><br></div><div>Brick 10.250.4.63:/localmnt/g1lv2</div>
<div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.65:/localmnt/g2lv2</div><div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.63:/localmnt/g1lv3</div><div>Number of entries: 0</div><div><br>
</div>
<div>Brick 10.250.4.65:/localmnt/g2lv3</div><div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.63:/localmnt/g1lv4</div><div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.65:/localmnt/g2lv4</div>
<div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.63:/localmnt/g1lv5</div><div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.65:/localmnt/g2lv5</div><div>Status: Brick is Not connected</div>
<div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.63:/localmnt/g1lv1</div><div>Number of entries: 0</div><div><br></div><div>Brick 10.250.4.65:/localmnt/g2lv1</div><div>Number of entries: 0</div></div><div>
--</div><div><br></div><div>So perhaps I need to re-connect the brick?</div><div><br></div><div>Cheers,</div><div>Dave</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">
On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N <span dir="ltr"><<a href="mailto:ravishankar@redhat.com" target="_blank">ravishankar@redhat.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><div><div>
<div>On 08/15/2013 10:05 PM, David Gibbons
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi There,
<div><br>
</div>
<div>I'm currently testing Gluster for possible production use.
I haven't been able to find the answer to this question in the
forum archives or in the public docs. It's possible that I don't
know which keywords to search for.</div>
<div><br>
</div>
<div>Here's the question (more details below): let's say that
one of my bricks "fails" -- <i>not</i> a whole node failure
but a single brick failure within the node. How do I replace a
single brick on a node and force a sync from one of the
replicas?</div>
<div><br>
</div>
<div>I have two nodes with 5 bricks each:</div>
<div>
<div>gluster> volume info test-a</div>
<div><br>
</div>
<div>Volume Name: test-a</div>
<div>Type: Distributed-Replicate</div>
<div>Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4</div>
<div>Status: Started</div>
<div>Number of Bricks: 5 x 2 = 10</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: 10.250.4.63:/localmnt/g1lv2</div>
<div>Brick2: 10.250.4.65:/localmnt/g2lv2</div>
<div>Brick3: 10.250.4.63:/localmnt/g1lv3</div>
<div>Brick4: 10.250.4.65:/localmnt/g2lv3</div>
<div>Brick5: 10.250.4.63:/localmnt/g1lv4</div>
<div>Brick6: 10.250.4.65:/localmnt/g2lv4</div>
<div>Brick7: 10.250.4.63:/localmnt/g1lv5</div>
<div>Brick8: 10.250.4.65:/localmnt/g2lv5</div>
<div>Brick9: 10.250.4.63:/localmnt/g1lv1</div>
<div>Brick10: 10.250.4.65:/localmnt/g2lv1</div>
</div>
<div><br>
</div>
<div>I formatted 10.250.4.65:/localmnt/g2lv5 (to simulate a
"failure"). What is the next step? I have tried various
combinations of removing and re-adding the brick, replacing
the brick, etc. I read in a previous message to this list that
replace-brick was for planned changes which makes sense, so
that's probably not my next step.</div>
</div>
</blockquote></div></div>
You must first check whether the 'formatted' brick
10.250.4.65:/localmnt/g2lv5 is online using the `gluster volume
status` command. If it is not, start the volume using `gluster volume start
&lt;VOLNAME&gt; force`. You can then use the `gluster volume heal`
command, which will copy the data from the other replica brick into
your formatted brick.<br>
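As commands, that sequence would look something like the sketch below (VOLNAME here is the volume from this thread, and the commands are guarded so they only run when the gluster CLI is present):<br>

```shell
# Check the formatted brick, force-start the volume, then heal.
VOLNAME=test-a

if command -v gluster >/dev/null 2>&1; then
    gluster volume status "$VOLNAME"        # is the formatted brick online?
    gluster volume start "$VOLNAME" force   # bring the brick process back up
    gluster volume heal "$VOLNAME"          # copy data in from the replica
fi
```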
Hope this helps.<br>
-Ravi<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Cheers,</div>
<div>Dave</div>
</div>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
Gluster-users mailing list
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>