<div dir="ltr">Thanks Justin, I found the problem. The VM can be deleted now.<div><br></div><div>Turns out, there was more than enough time for the rebalance to complete. But we hit a race, which caused a command to fail.</div>

<div><br></div><div>The particular test that failed is waiting for rebalance to finish. It does this by running a &#39;gluster volume rebalance &lt;&gt; status&#39; command and checking the result. The EXPECT_WITHIN function reruns this command until there is a match, the command fails, or the timeout expires.</div>
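<div><br></div><div>To make the three exit conditions concrete, here is a rough sketch of that kind of polling loop. This is not the real EXPECT_WITHIN from the test framework; poll_until and its return codes are names I made up for illustration.</div>

```shell
# Hypothetical helper, NOT the real EXPECT_WITHIN implementation.
# Usage: poll_until TIMEOUT_SECS PATTERN COMMAND [ARGS...]
# Returns 0 on a match, 1 if the command itself fails, 2 on timeout.
poll_until() {
    local timeout=$1 pattern=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    local output
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # A non-zero exit from the command ends the wait immediately,
        # even if there is plenty of time left before the deadline.
        if ! output=$("$@" 2>&1); then
            return 1
        fi
        # Substring match on the command output.
        case "$output" in
            *"$pattern"*) return 0 ;;
        esac
        sleep 1
    done
    return 2
}
```

<div>The point is the middle branch: a single failed invocation aborts the wait, regardless of how much of the timeout remains.</div>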

<div><br></div><div>For a rebalance status command, glusterd sends a request to the rebalance process (as a brick_op) to get the latest stats. It did the same in this case. But while glusterd was waiting for the reply, the rebalance completed and the process stopped itself. This closed the RPC connection between glusterd and the rebalance process, which caused all pending requests to be unwound as failures, which in turn led to the command failing.</div>

<div><br></div><div>I cannot think of a way to avoid this race from within glusterd. For this particular test, we could avoid using the &#39;rebalance status&#39; command if we directly checked the rebalance process state using its pid etc. I don&#39;t particularly approve of this approach, as I think I used the &#39;rebalance status&#39; command for a reason. But I currently cannot recall the reason, and if I cannot come up with it soon, I wouldn&#39;t mind changing the test to avoid rebalance status.</div>
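<div><br></div><div>A pid-based check could look roughly like the sketch below. wait_for_pid_exit is a hypothetical helper, and how the test would obtain the rebalance process pid is left open here; it is only meant to show the shape of the wait.</div>

```shell
# Hypothetical helper: wait for a process to exit by polling its pid.
# Usage: wait_for_pid_exit PID TIMEOUT_SECS
# Returns 0 once the process is gone, 1 if it is still alive at the deadline.
wait_for_pid_exit() {
    local pid=$1 timeout=$2
    local deadline=$(( $(date +%s) + timeout ))
    # kill -0 delivers no signal; it only reports whether the pid
    # currently exists and can be signalled by this user.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}
```

<div>Note that this only tells us the process has gone away, not whether the rebalance completed successfully, so it is not a drop-in replacement for checking the status output.</div>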

<div><br></div><div>~kaushal</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 22, 2014 at 5:22 PM, Justin Clift <span dir="ltr">&lt;<a href="mailto:justin@gluster.org" target="_blank">justin@gluster.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 22/05/2014, at 12:32 PM, Kaushal M wrote:<br>

&gt; I haven&#39;t yet. But I will.<br>

&gt;<br>

&gt; Justin,<br>

&gt; Can I take a peek inside the VM?<br>

<br>

</div>Sure.<br>

<br>

  IP: 23.253.57.20<br>

  User: root<br>

  Password: foobar123<br>

<br>

The stdout log from the regression test is in /tmp/regression.log.<br>

<br>

The GlusterFS git repo is in /root/glusterfs.  Um, you should be<br>

able to find everything else pretty easily.<br>

<br>

Btw, this is just a temp VM, so feel free to do anything you want<br>

with it.  When you&#39;re finished with it let me know so I can delete<br>

it. :)<br>

<span class="HOEnZb"><font color="#888888"><br>

+ Justin<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

<br>

&gt; ~kaushal<br>

&gt;<br>

&gt;<br>

&gt; On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri &lt;<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt; wrote:<br>

&gt; Kaushal,<br>

&gt;    Rebalance status command seems to be failing sometimes. I sent a mail about such spurious failure earlier today. Did you get a chance to look at the logs and confirm that rebalance didn&#39;t fail and it is indeed a timeout?<br>


&gt;<br>

&gt; Pranith<br>

&gt; ----- Original Message -----<br>

&gt; &gt; From: &quot;Kaushal M&quot; &lt;<a href="mailto:kshlmster@gmail.com">kshlmster@gmail.com</a>&gt;<br>

&gt; &gt; To: &quot;Pranith Kumar Karampuri&quot; &lt;<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt;<br>

&gt; &gt; Cc: &quot;Justin Clift&quot; &lt;<a href="mailto:justin@gluster.org">justin@gluster.org</a>&gt;, &quot;Gluster Devel&quot; &lt;<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>&gt;<br>

&gt; &gt; Sent: Thursday, May 22, 2014 4:40:25 PM<br>

&gt; &gt; Subject: Re: [Gluster-devel] bug-857330/normal.t failure<br>

&gt; &gt;<br>

&gt; &gt; The test is waiting for rebalance to finish. This is a rebalance with some<br>

&gt; &gt; actual data so it could have taken a long time to finish. I did set a<br>

&gt; &gt; pretty high timeout, but it seems like it&#39;s not enough for the new VMs.<br>

&gt; &gt;<br>

&gt; &gt; Possible options are,<br>

&gt; &gt; - Increase this timeout further<br>

&gt; &gt; - Reduce the amount of data. Currently this is 100 directories with 10<br>

&gt; &gt; files each of size between 10-500KB<br>

&gt; &gt;<br>

&gt; &gt; ~kaushal<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri &lt;<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt; wrote:<br>

&gt; &gt;<br>

&gt; &gt; &gt; Kaushal has more context about these CCed. Keep the setup until he<br>

&gt; &gt; &gt; responds so that he can take a look.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Pranith<br>

&gt; &gt; &gt; ----- Original Message -----<br>

&gt; &gt; &gt; &gt; From: &quot;Justin Clift&quot; &lt;<a href="mailto:justin@gluster.org">justin@gluster.org</a>&gt;<br>

&gt; &gt; &gt; &gt; To: &quot;Pranith Kumar Karampuri&quot; &lt;<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>&gt;<br>

&gt; &gt; &gt; &gt; Cc: &quot;Gluster Devel&quot; &lt;<a href="mailto:gluster-devel@gluster.org">gluster-devel@gluster.org</a>&gt;<br>

&gt; &gt; &gt; &gt; Sent: Thursday, May 22, 2014 3:54:46 PM<br>

&gt; &gt; &gt; &gt; Subject: bug-857330/normal.t failure<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; Hi Pranith,<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; Ran a few VM&#39;s with your Gerrit CR 7835 applied, and in &quot;DEBUG&quot;<br>

&gt; &gt; &gt; &gt; mode (I think).<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; One of the VM&#39;s had a failure in bug-857330/normal.t:<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt;   Test Summary Report<br>

&gt; &gt; &gt; &gt;   -------------------<br>

&gt; &gt; &gt; &gt;   ./tests/basic/rpm.t                             (Wstat: 0 Tests: 0 Failed: 0)<br>

&gt; &gt; &gt; &gt;     Parse errors: Bad plan.  You planned 8 tests but ran 0.<br>

&gt; &gt; &gt; &gt;   ./tests/bugs/bug-857330/normal.t                (Wstat: 0 Tests: 24 Failed: 1)<br>

&gt; &gt; &gt; &gt;     Failed test:  13<br>

&gt; &gt; &gt; &gt;   Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys + 941.82 cusr 645.54 csys = 1591.22 CPU)<br>

&gt; &gt; &gt; &gt;   Result: FAIL<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; Seems to be this test:<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt;   COMMAND=&quot;volume rebalance $V0 status&quot;<br>

&gt; &gt; &gt; &gt;   PATTERN=&quot;completed&quot;<br>

&gt; &gt; &gt; &gt;   EXPECT_WITHIN 300 $PATTERN get-task-status<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; Is this one on your radar already?<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; Btw, this VM is still online.  Can give you access to retrieve logs<br>

&gt; &gt; &gt; &gt; if useful.<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; + Justin<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; --<br>

&gt; &gt; &gt; &gt; Open Source and Standards @ Red Hat<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; <a href="http://twitter.com/realjustinclift" target="_blank">twitter.com/realjustinclift</a><br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; _______________________________________________<br>

&gt; &gt; &gt; Gluster-devel mailing list<br>

&gt; &gt; &gt; <a href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a><br>

&gt; &gt; &gt; <a href="http://supercolony.gluster.org/mailman/listinfo/gluster-devel" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-devel</a><br>

&gt; &gt; &gt;<br>

&gt; &gt;<br>

&gt;<br>

<br>

--<br>

Open Source and Standards @ Red Hat<br>

<br>

<a href="http://twitter.com/realjustinclift" target="_blank">twitter.com/realjustinclift</a><br>

<br>

</div></div></blockquote></div><br></div>