<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Harry-<br>

    Thanks for the tip.&nbsp; My problem could well have been the same as

    yours.&nbsp; I have known for some time that "gluster peer status"

    doesn't give useful connection information but I didn't know about

    the "gluster volume status" commands; they must be new in version

    3.3.&nbsp; I usually discover connection problems by seeing phrases like

    "disconnected" and "anomalies" in the logs.&nbsp; This has been happening

    more often since I upgraded to version 3.3, and I suspect it is

    being caused by the very high load experienced by some servers.&nbsp; I

    have seen this load problem discussed in other threads.&nbsp; The next

    time I attempt a rebalance operation I will run "<font><font

        face="verdana,sans-serif">gluster volume status all detail"

        first to check connectivity.</font></font><br>
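    <br>
    A rough sketch of how that pre-rebalance check could be scripted,
    assuming the output keeps the "Name : value" layout shown in Harry's
    message quoted below.&nbsp; The function name and the sample text are
    only illustrative; the sample is Harry's excerpt without his arrow
    annotation.<br>
<pre>
```python
# Sample text modelled on the excerpt quoted in this thread (illustrative only).
SAMPLE = """\
Brick                : Brick pbs3ib:/bducgl
Port                 : 24018
Online               : N
Pid                  : 20953
File System          : xfs
"""


def offline_bricks(status_text):
    """Return the bricks whose Online field is anything other than Y."""
    offline = []
    current_brick = None
    for line in status_text.splitlines():
        # Fields look like "Name   : value"; split on the first colon only,
        # because the brick path itself contains a colon.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines of dashes carry no field
        key = key.strip()
        value = value.strip()
        if key == "Brick":
            # The quoted output repeats the word "Brick" in the value.
            if value.startswith("Brick "):
                value = value[len("Brick "):]
            current_brick = value
        elif key == "Online" and value.split()[:1] != ["Y"]:
            offline.append(current_brick)
    return offline


if __name__ == "__main__":
    print(offline_bricks(SAMPLE))
```
</pre>
    In practice the text would be the captured output of "gluster volume
    status all detail", and a non-empty result would be a reason not to
    start the rebalance.<br>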

    <br>

    -Dan<br>

    <br>

    On 08/08/2012 08:31 PM, Harry Mangalam wrote:

    <blockquote

cite="mid:CAEib2OnXp05Wz3RcXPgdZ0VTqRX_QB-Oh_i2CwMVOos_svfijg@mail.gmail.com"

      type="cite">


      <font><font face="verdana,sans-serif">This sounds similar, though not

          identical, to a problem that I had recently, described here:</font></font>

      <div><font><font face="verdana,sans-serif">&lt;<a

              moz-do-not-send="true"

href="http://gluster.org/pipermail/gluster-users/2012-August/011054.html">http://gluster.org/pipermail/gluster-users/2012-August/011054.html</a>&gt;</font></font></div>

      <div><font><font face="verdana,sans-serif">My problems were the

            result of starting this kind of rebalance with a server

            node that appeared to be connected (according to the

            'gluster peer status' output) but was not actually

            connected, as shown by the</font></font></div>

      <div><font><font face="verdana,sans-serif">'gluster volume status

            all detail' output. &nbsp;Note especially the part that describes

            its online state.</font></font></div>

      <div><font><font face="verdana,sans-serif"><br>

          </font></font></div>

      <div><font><font face="verdana,sans-serif">

            <div>------------------------------------------------------------------------------</div>

            <div>Brick &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: Brick pbs3ib:/bducgl</div>

            <div>Port &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : 24018</div>

            <div>Online &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : N &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;

              &lt;&lt;=====================</div>

            <div>Pid &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: 20953</div>

            <div>File System &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;: xfs</div>

            <div><br>

            </div>

          </font></font></div>

      <div><font><font face="verdana,sans-serif"><br>

          </font></font></div>

      <div><font><font face="verdana,sans-serif">You may have already

            verified this, but what I did was to start a rebalance /

            fix-layout with a disconnected brick and it went ahead and

            tried to do it, unsuccessfully as you might guess. &nbsp;But

            when I finally was able to reconnect the downed brick, and

            restart the rebalance, it (astonishingly) was able to bring

            everything back. &nbsp;So props to the gluster team.</font></font></div>

      <div><font><font face="verdana,sans-serif"><br>

          </font></font></div>

      <div><font><font face="verdana,sans-serif">hjm</font></font></div>

      <div><font><font face="verdana,sans-serif"><br>

          </font></font><br>

        <div class="gmail_quote">

          On Wed, Aug 8, 2012 at 11:58 AM, Dan Bretherton <span

            dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:d.a.bretherton@reading.ac.uk" target="_blank">d.a.bretherton@reading.ac.uk</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            Hello All-<br>

            I have noticed another problem after upgrading to version

            3.3. &nbsp;I am unable to do "gluster volume rebalance

            &lt;VOLUME&gt; fix-layout status" or "...fix-layout ...

            stop" after starting a rebalance operation with "gluster

            volume rebalance &lt;VOLUME&gt; fix-layout start". &nbsp; The

            fix-layout operation seemed to be progressing normally on

            all the servers according to the log files, but all attempts

            to do "status" or "stop" result in the CLI usage message

            being returned. &nbsp;The only references to the rebalance

            commands in the log files were these; all the servers

            seem to have one or more of them.<br>

            <br>

            [root@romulus glusterfs]# grep rebalance *.log<br>

            etc-glusterfs-glusterd.vol.log:[2012-08-08 12:49:04.870709]

            W [socket.c:1512:__socket_proto_state_machine] 0-management:

            reading from socket failed. Error (Transport endpoint is not

            connected), peer (/var/lib/glusterd/vols/tracks/rebalance/cb21050d-05c2-42b3-8660-230954bab324.sock)<br>

            tracks-rebalance.log:[2012-08-06 10:41:18.550241] I

            [graph.c:241:gf_add_cmdline_options] 0-tracks-dht: adding

            option 'rebalance-cmd' for volume 'tracks-dht' with value

            '4'<br>

            <br>

            The volume name is "tracks" by the way. &nbsp;I wanted to stop

            the rebalance operation because it seemed to be causing a

            very high load on some of the servers and had been running for

            several days. &nbsp;I ended up having to manually kill the

            rebalance processes on all the servers followed by

            restarting glusterd.<br>

            <br>

            After that I found that one of the servers had

            "rebalance_status=4" in file

            /var/lib/glusterd/vols/tracks/node_state.info, whereas all the

            others had "rebalance_status=0". &nbsp;I manually changed the '4'

            to '0' and restarted glusterd. &nbsp;I don't know if this was a

            consequence of the way I had killed the rebalance operation

            or the cause of the strange behaviour. &nbsp;I don't really want

            to start another rebalance just to test this, because the last

            one was so disruptive.<br>
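            <br>
            A minimal sketch of how the comparison across servers could
            be automated, assuming only that node_state.info holds
            "key=value" lines such as the "rebalance_status" one quoted
            above. &nbsp;The server names and function names are
            hypothetical, and the sketch makes no claim about what the
            values 0 and 4 mean.<br>
<pre>
```python
from collections import Counter


def read_rebalance_status(text):
    """Return the rebalance_status value from key=value text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "rebalance_status":
            return value.strip()
    return None


def servers_that_differ(state_by_host):
    """Return hosts whose rebalance_status differs from the most common one."""
    values = {host: read_rebalance_status(text)
              for host, text in state_by_host.items()}
    if not values:
        return []
    most_common_value, _count = Counter(values.values()).most_common(1)[0]
    return sorted(host for host, value in values.items()
                  if value != most_common_value)


# Hypothetical host names; each value stands for the contents of that
# server's node_state.info file.
EXAMPLE = {
    "server1": "rebalance_status=0\n",
    "server2": "rebalance_status=4\n",
    "server3": "rebalance_status=0\n",
}

if __name__ == "__main__":
    print(servers_that_differ(EXAMPLE))
```
</pre>
            Reporting the odd one out rather than rewriting the file keeps
            the manual edit a deliberate step.<br>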

            <br>

            Has anyone else experienced this problem since upgrading to

            3.3?<br>

            <br>

            Regards,<br>

            Dan.<br>

            <br>

            _______________________________________________<br>

            Gluster-users mailing list<br>

            <a moz-do-not-send="true"

              href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>

            <a moz-do-not-send="true"

              href="http://gluster.org/cgi-bin/mailman/listinfo/gluster-users"

              target="_blank">http://gluster.org/cgi-bin/mailman/listinfo/gluster-users</a><br>

          </blockquote>

        </div>

        <br>

        <br clear="all">

        <div><br>

        </div>

        -- <br>

        Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine<br>

        [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487<br>

        415 South Circle View Dr, Irvine, CA, 92697 [shipping]<br>

        MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)<br>

        <br>

      </div>

    </blockquote>

  </body>

</html>