<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Roman,<br>
Do you think we could run this test again? This time, could you
run 'gluster volume profile &lt;volname&gt; start', repeat the same
test, and then provide the output of 'gluster volume profile
&lt;volname&gt; info' along with the logs after the test?<br>
<br>
Pranith<br>
<div class="moz-cite-prefix">On 10/13/2014 09:45 PM, Roman wrote:<br>
</div>
<blockquote
cite="mid:CAFR=TBrzaGiprr7FK78FunZvgoWu32gZUE07nnQQEG-xs5EbKA@mail.gmail.com"
type="cite">
<div dir="ltr">Sure !
<div><br>
</div>
<div>
<div>root@stor1:~# gluster volume info</div>
<div><br>
</div>
<div>Volume Name: HA-2TB-TT-Proxmox-cluster</div>
<div>Type: Replicate</div>
<div>Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB</div>
<div>Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB</div>
<div>Options Reconfigured:</div>
<div>nfs.disable: 0</div>
<div>network.ping-timeout: 10</div>
<div><br>
</div>
<div>Volume Name: HA-WIN-TT-1T</div>
<div>Type: Replicate</div>
<div>Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: stor1:/exports/NFS-WIN/1T</div>
<div>Brick2: stor2:/exports/NFS-WIN/1T</div>
<div>Options Reconfigured:</div>
<div>nfs.disable: 1</div>
<div>network.ping-timeout: 10</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2014-10-13 19:09 GMT+03:00 Pranith
Kumar Karampuri <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> Could you give your
'gluster volume info' output?<br>
<br>
Pranith
<div>
<div class="h5"><br>
<div>On 10/13/2014 09:36 PM, Roman wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div class="h5">
<div dir="ltr">Hi,
<div><br>
</div>
<div>I've got this kind of setup (servers run
replica)</div>
<div><br>
</div>
<div><br>
</div>
<div>@ 10G backend</div>
<div>gluster storage1</div>
<div>gluster storage2</div>
<div>gluster client1</div>
<div><br>
</div>
<div>@ 1G backend</div>
<div>other gluster clients</div>
<div><br>
</div>
<div>Servers got HW RAID5 with SAS disks.</div>
<div><br>
</div>
<div>So today I decided to create a 900 GB file
for an iSCSI target, located on a separate
GlusterFS volume, using dd (just a dummy file
filled with zeros, bs=1G count=900).</div>
<div>First of all, the process took quite a long
time; the write speed was 130 MB/s (the client
port was 2 Gbps, the server ports were running
at 1 Gbps).</div>
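For reference, a sketch of the dd job described above (the target path is an assumption; only the bs/count values come from the text):

```shell
# Write a ~900 GiB zero-filled backing file in 1 GiB blocks.
# /mnt/HA-WIN-TT-1T is a hypothetical mount point for the volume.
dd if=/dev/zero of=/mnt/HA-WIN-TT-1T/iscsi-backing.img bs=1G count=900
```

If a sparse backing file is acceptable for the iSCSI target, `truncate -s 900G` would allocate it in seconds without pushing 900 GB of zeros through the replica.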
<div>Then it reported something like "endpoint is
not connected" and all of my VMs on the other
volume started giving me I/O errors.</div>
<div>Server load was around 4.6 (12 cores in total)</div>
<div><br>
</div>
<div>Maybe it was due to the timeout of 2 seconds,
so I've made it a bit higher, 10 seconds.</div>
<div><br>
</div>
<div>Also, during the dd image creation, the VMs
very often reported that their disks were slow,
like:</div>
<div>
<p>WARNINGs: Read IO Wait time is -0.02 (outside
range [0:1]).</p>
<p>Is 130 MB/s the maximum bandwidth for
all of the volumes in total? Then why would we
need 10G backends?</p>
<p>The HW RAID's local speed is 300 MB/s, so it
should not be the bottleneck. Any ideas or
advice?</p>
<p><br>
</p>
<p>Maybe someone has an optimized sysctl.conf for
a 10G backend?</p>
<p>Mine is pretty simple, just what can be
found by googling.</p>
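For context, the commonly-googled 10GbE tuning boils down to raising the TCP buffer limits; a minimal fragment looks something like this (the values are illustrative, not tuned for this setup):

```conf
# /etc/sysctl.conf – typical 10GbE TCP buffer tuning (illustrative values)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.core.netdev_max_backlog = 30000
```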
<p><br>
</p>
<p>Just to mention: those VMs were connected
over a separate 1 Gbps interface, which means
they should not be affected by the client with
the 10G backend.</p>
<p><br>
</p>
<p>The logs are pretty useless; they just say this
during the outage:</p>
<p><br>
</p>
<p>[2014-10-13 12:09:18.392910] W
[client-handshake.c:276:client_ping_cbk]
0-HA-2TB-TT-Proxmox-cluster-client-0: timer
must have expired</p>
<p>[2014-10-13 12:10:08.389708] C
[client-handshake.c:127:rpc_client_ping_timer_expired]
0-HA-2TB-TT-Proxmox-cluster-client-0: server <a
moz-do-not-send="true"
href="http://10.250.0.1:49159"
target="_blank">10.250.0.1:49159</a> has not
responded in the last 2 seconds,
disconnecting.</p>
<p>[2014-10-13 12:10:08.390312] W
[client-handshake.c:276:client_ping_cbk]
0-HA-2TB-TT-Proxmox-cluster-client-0: timer
must have expired</p>
</div>
<div>So I decided to set the timeout a bit higher.</div>
<div>
<div><br>
</div>
<div>So it seems to me that under high load
GlusterFS is not usable? 130 MB/s is not that
much load to be getting timeouts, or to make
the system so slow that the VMs suffer.</div>
<div><br>
</div>
<div>Of course, after the disconnection the
healing process started, but as the VMs lost
their connection to both servers it was pretty
useless; they could not run anymore. And by the
way, when you load the server with such a huge
job (a dd of 900 GB), the healing process goes
really slowly :)</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </div>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Gluster-users mailing list
<a moz-do-not-send="true" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a moz-do-not-send="true" href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman.
</div>
</blockquote>
<br>
</body>
</html>