<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
Hi Roman,<br>
     Do you think we can run this test again? This time, could you
    enable profiling with 'gluster volume profile &lt;volname&gt; start',
    run the same test, and then provide the output of 'gluster volume
    profile &lt;volname&gt; info' along with the logs?<br>
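    <br>
    For reference, a minimal command sequence would look roughly like
    this (HA-WIN-TT-1T is just an example taken from your volume list;
    substitute whichever volume you run the test on):<br>
    <pre>
# enable profiling before starting the test
gluster volume profile HA-WIN-TT-1T start

# run the dd test, then collect per-brick FOP counts and latencies
gluster volume profile HA-WIN-TT-1T info

# optionally stop profiling once the output has been captured
gluster volume profile HA-WIN-TT-1T stop
</pre>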
    <br>
    Pranith<br>
    <div class="moz-cite-prefix">On 10/13/2014 09:45 PM, Roman wrote:<br>
    </div>
    <blockquote
cite="mid:CAFR=TBrzaGiprr7FK78FunZvgoWu32gZUE07nnQQEG-xs5EbKA@mail.gmail.com"
      type="cite">
      <div dir="ltr">Sure !
        <div><br>
        </div>
        <div>
          <div>root@stor1:~# gluster volume info</div>
          <div><br>
          </div>
          <div>Volume Name: HA-2TB-TT-Proxmox-cluster</div>
          <div>Type: Replicate</div>
          <div>Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2</div>
          <div>Status: Started</div>
          <div>Number of Bricks: 1 x 2 = 2</div>
          <div>Transport-type: tcp</div>
          <div>Bricks:</div>
          <div>Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB</div>
          <div>Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB</div>
          <div>Options Reconfigured:</div>
          <div>nfs.disable: 0</div>
          <div>network.ping-timeout: 10</div>
          <div><br>
          </div>
          <div>Volume Name: HA-WIN-TT-1T</div>
          <div>Type: Replicate</div>
          <div>Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4</div>
          <div>Status: Started</div>
          <div>Number of Bricks: 1 x 2 = 2</div>
          <div>Transport-type: tcp</div>
          <div>Bricks:</div>
          <div>Brick1: stor1:/exports/NFS-WIN/1T</div>
          <div>Brick2: stor2:/exports/NFS-WIN/1T</div>
          <div>Options Reconfigured:</div>
          <div>nfs.disable: 1</div>
          <div>network.ping-timeout: 10</div>
          <div><br>
          </div>
          <div><br>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2014-10-13 19:09 GMT+03:00 Pranith
          Kumar Karampuri <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF"> Could you give your
              'gluster volume info' output?<br>
              <br>
              Pranith
              <div>
                <div class="h5"><br>
                  <div>On 10/13/2014 09:36 PM, Roman wrote:<br>
                  </div>
                </div>
              </div>
              <blockquote type="cite">
                <div>
                  <div class="h5">
                    <div dir="ltr">Hi,
                      <div><br>
                      </div>
                      <div>I've got the following setup (the servers
                        run a replicated volume):</div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>@ 10G backend:</div>
                      <div>gluster storage1</div>
                      <div>gluster storage2</div>
                      <div>gluster client1</div>
                      <div><br>
                      </div>
                      <div>@ 1G backend:</div>
                      <div>other gluster clients</div>
                      <div><br>
                      </div>
                      <div>The servers have HW RAID5 with SAS disks.</div>
                      <div><br>
                      </div>
                      <div>So today I decided to create a 900GB file
                        for an iSCSI target, located on a separate
                        GlusterFS volume, using dd (just a dummy file
                        filled with zeros, bs=1G count=900), roughly as
                        sketched below.</div>
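                      <div>A minimal sketch of that command (the mount
                        point and file name here are placeholders, not
                        the exact paths I used):</div>
                      <pre>
# write a 900GB zero-filled image onto the FUSE mount of the separate
# volume; /mnt/HA-WIN-TT-1T and the file name are assumptions
dd if=/dev/zero of=/mnt/HA-WIN-TT-1T/iscsi-target.img bs=1G count=900
</pre>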
                      <div>First of all, the process took quite a long
                        time; the write speed was about 130 MB/sec
                        (the client port was 2 Gbps, the server ports
                        were running @ 1 Gbps).</div>
                      <div>Then it reported something like "endpoint is
                        not connected" and all of my VMs on the other
                        volume started to give me IO errors.</div>
                      <div>Server load was around 4.6 (12 cores total)</div>
                      <div><br>
                      </div>
                      <div>Maybe it was due to the 2-second timeout, so
                        I've raised it a bit, to 10 sec.</div>
                      <div><br>
                      </div>
                      <div>Also, during the dd image creation, the VMs
                        very often reported that their disks were slow,
                        e.g.:</div>
                      <div>
                        <p>WARNINGs: Read IO Wait time is -0.02 (outside
                          range [0:1]).</p>
                        <p>Is 130 MB/sec the maximum bandwidth for all
                          of the volumes in total? Is that why we would
                          need 10G backends?</p>
                        <p>The local HW RAID speed is 300 MB/sec, so it
                          should not be the issue. Any ideas or maybe
                          some advice?</p>
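                        <p>(For scale: 1 Gbps is roughly 125 MB/sec,
                          and as far as I understand a replica-2 volume
                          writes each block from the client to both
                          bricks, so 130 MB/sec looks suspiciously like
                          one saturated gigabit leg per brick; that is
                          my assumption, not a confirmed explanation.)</p>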
                        <p><br>
                        </p>
                        <p>Maybe someone has an optimized sysctl.conf
                          for a 10G backend?</p>
                        <p>Mine is pretty simple, just the settings you
                          find by googling, roughly like the sketch
                          below.</p>
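                        <p>Something along these lines (values are the
                          commonly googled 10GbE TCP-buffer tuning,
                          shown only as an illustration, not a
                          benchmarked recommendation):</p>
                        <pre>
# /etc/sysctl.conf - illustrative 10GbE tuning only
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 30000
</pre>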
                        <p><br>
                        </p>
                        <p>Just to mention: those VMs were connected
                          via a separate 1 Gbps interface, which means
                          they should not have been affected by the
                          client on the 10G backend.</p>
                        <p><br>
                        </p>
                        <p>The logs are pretty useless; they just show
                          this during the outage:</p>
                        <p><br>
                        </p>
                        <p>[2014-10-13 12:09:18.392910] W
                          [client-handshake.c:276:client_ping_cbk]
                          0-HA-2TB-TT-Proxmox-cluster-client-0: timer
                          must have expired</p>
                        <p>[2014-10-13 12:10:08.389708] C
                          [client-handshake.c:127:rpc_client_ping_timer_expired]
                          0-HA-2TB-TT-Proxmox-cluster-client-0: server <a
                            moz-do-not-send="true"
                            href="http://10.250.0.1:49159"
                            target="_blank">10.250.0.1:49159</a> has not
                          responded in the last 2 seconds,
                          disconnecting.</p>
                        <p>[2014-10-13 12:10:08.390312] W
                          [client-handshake.c:276:client_ping_cbk]
                          0-HA-2TB-TT-Proxmox-cluster-client-0: timer
                          must have expired</p>
                      </div>
                      <div>So I decided to set the timeout a bit
                        higher, as shown below.</div>
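                      <div>That was done with 'gluster volume set' (a
                        sketch of the commands; the 10-second value
                        matches the ping-timeout shown in the volume
                        info above):</div>
                      <pre>
# raise the client ping timeout to 10 seconds on both volumes
gluster volume set HA-2TB-TT-Proxmox-cluster network.ping-timeout 10
gluster volume set HA-WIN-TT-1T network.ping-timeout 10
</pre>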
                      <div>
                        <div><br>
                        </div>
                        <div>So it seems to me that GlusterFS is not
                          usable under high load? 130 MB/s is not that
                          much load to be getting timeouts or making
                          the system so slow that the VMs suffer.</div>
                        <div><br>
                        </div>
                        <div>Of course, after the disconnection the
                          healing process started, but since the VMs
                          had lost their connection to both servers it
                          was pretty useless; they could not run
                          anymore. And by the way, when you load the
                          server with such a huge job (a 900GB dd), the
                          healing process goes very slowly :)</div>
                        <div><br>
                        </div>
                        <div><br>
                        </div>
                        <div><br>
                        </div>
                        -- <br>
                        Best regards,<br>
                        Roman. </div>
                    </div>
                    <br>
                    <fieldset></fieldset>
                    <br>
                  </div>
                </div>
                <pre>_______________________________________________
Gluster-users mailing list
<a moz-do-not-send="true" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a moz-do-not-send="true" href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
              </blockquote>
              <br>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        Best regards,<br>
        Roman.
      </div>
    </blockquote>
    <br>
  </body>
</html>