<div dir="ltr"><div>But the VMs are running from a different volume, so you are not able to see it...<br></div><div>Today my colleague started a full format of that huge virtual disk file, and I received the warning again:</div><div><p class="">WARNINGs: Read IO Wait time is 1.22 (outside range [0:1]).</p><p class="">I'll send the logs on Friday.</p></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-15 18:24 GMT+03:00 Pranith Kumar Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span class="">
<br>
<div>On 10/14/2014 01:20 AM, Roman wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">OK, done.
<div>This time there were no disconnects; at least all of the VMs
are working, but I got some mails from a VM about IO writes again.</div>
<div><span style="font-size:11pt;font-family:Calibri,sans-serif"><br>
</span></div>
<div><span style="font-size:11pt;font-family:Calibri,sans-serif">WARNINGs:
Read IO Wait time is 1.45 (outside
range [0:1]).</span><br>
</div>
</div>
</blockquote></span>
This warning says 'Read IO wait', yet not a single READ
operation came to gluster. Wondering why that is :-/. Any clue?
There is at least one write which took 3 seconds according to the
stats, and at least one synchronization operation (FINODELK) took 23
seconds. Could you give the logs of this run for the mount, glustershd,
and the bricks?<span class="HOEnZb"><font color="#888888"><br>
<br>
Pranith</font></span><div><div class="h5"><br>
<blockquote type="cite">
<div dir="ltr">
<div><span style="font-size:11pt;font-family:Calibri,sans-serif"><br>
</span></div>
<div>here is the output</div>
<div><br>
</div>
<div>
<div>root@stor1:~# gluster volume profile HA-WIN-TT-1T info</div>
<div>Brick: stor1:/exports/NFS-WIN/1T</div>
<div>--------------------------------</div>
<div>Cumulative Stats:</div>
<div> Block Size: 131072b+ 262144b+</div>
<div> No. of Reads: 0 0</div>
<div>No. of Writes: 7372798 1</div>
<div>%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop</div>
<div>--------- ----------- ----------- ----------- ------------ ----</div>
<div>0.00 0.00 us 0.00 us 0.00 us 25 RELEASE</div>
<div>0.00 0.00 us 0.00 us 0.00 us 16 RELEASEDIR</div>
<div>0.00 64.00 us 52.00 us 76.00 us 2 ENTRYLK</div>
<div>0.00 73.50 us 51.00 us 96.00 us 2 FLUSH</div>
<div>0.00 68.43 us 30.00 us 135.00 us 7 STATFS</div>
<div>0.00 54.31 us 44.00 us 109.00 us 16 OPENDIR</div>
<div>0.00 50.75 us 16.00 us 74.00 us 24 FSTAT</div>
<div>0.00 47.77 us 19.00 us 119.00 us 26 GETXATTR</div>
<div>0.00 59.21 us 21.00 us 89.00 us 24 OPEN</div>
<div>0.00 59.39 us 22.00 us 296.00 us 28 READDIR</div>
<div>0.00 4972.00 us 4972.00 us 4972.00 us 1 CREATE</div>
<div>0.00 97.42 us 19.00 us 184.00 us 62 LOOKUP</div>
<div>0.00 89.49 us 20.00 us 656.00 us 324 FXATTROP</div>
<div>3.91 1255944.81 us 127.00 us 23397532.00 us 189 FSYNC</div>
<div>7.40 3406275.50 us 17.00 us 23398013.00 us 132 INODELK</div>
<div>34.96 94598.02 us 8.00 us 23398705.00 us 22445 FINODELK</div>
<div>53.73 442.66 us 79.00 us 3116494.00 us 7372799 WRITE</div>
<div><br>
</div>
<div> Duration: 7813 seconds</div>
<div> Data Read: 0 bytes</div>
<div>Data Written: 966367641600 bytes</div>
<div><br>
</div>
<div>Interval 0 Stats:</div>
<div> Block Size: 131072b+ 262144b+</div>
<div> No. of Reads: 0 0</div>
<div>No. of Writes: 7372798 1</div>
<div>%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop</div>
<div>--------- ----------- ----------- ----------- ------------ ----</div>
<div>0.00 0.00 us 0.00 us 0.00 us 25 RELEASE</div>
<div>0.00 0.00 us 0.00 us 0.00 us 16 RELEASEDIR</div>
<div>0.00 64.00 us 52.00 us 76.00 us 2 ENTRYLK</div>
<div>0.00 73.50 us 51.00 us 96.00 us 2 FLUSH</div>
<div>0.00 68.43 us 30.00 us 135.00 us 7 STATFS</div>
<div>0.00 54.31 us 44.00 us 109.00 us 16 OPENDIR</div>
<div>0.00 50.75 us 16.00 us 74.00 us 24 FSTAT</div>
<div>0.00 47.77 us 19.00 us 119.00 us 26 GETXATTR</div>
<div>0.00 59.21 us 21.00 us 89.00 us 24 OPEN</div>
<div>0.00 59.39 us 22.00 us 296.00 us 28 READDIR</div>
<div>0.00 4972.00 us 4972.00 us 4972.00 us 1 CREATE</div>
<div>0.00 97.42 us 19.00 us 184.00 us 62 LOOKUP</div>
<div>0.00 89.49 us 20.00 us 656.00 us 324 FXATTROP</div>
<div>3.91 1255944.81 us 127.00 us 23397532.00 us 189 FSYNC</div>
<div>7.40 3406275.50 us 17.00 us 23398013.00 us 132 INODELK</div>
<div>34.96 94598.02 us 8.00 us 23398705.00 us 22445 FINODELK</div>
<div>53.73 442.66 us 79.00 us 3116494.00 us 7372799 WRITE</div>
<div><br>
</div>
<div> Duration: 7813 seconds</div>
<div> Data Read: 0 bytes</div>
<div>Data Written: 966367641600 bytes</div>
<div><br>
</div>
<div>Brick: stor2:/exports/NFS-WIN/1T</div>
<div>--------------------------------</div>
<div>Cumulative Stats:</div>
<div> Block Size: 131072b+ 262144b+</div>
<div> No. of Reads: 0 0</div>
<div>No. of Writes: 7372798 1</div>
<div>%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop</div>
<div>--------- ----------- ----------- ----------- ------------ ----</div>
<div>0.00 0.00 us 0.00 us 0.00 us 25 RELEASE</div>
<div>0.00 0.00 us 0.00 us 0.00 us 16 RELEASEDIR</div>
<div>0.00 61.50 us 46.00 us 77.00 us 2 ENTRYLK</div>
<div>0.00 82.00 us 67.00 us 97.00 us 2 FLUSH</div>
<div>0.00 265.00 us 265.00 us 265.00 us 1 CREATE</div>
<div>0.00 57.43 us 30.00 us 85.00 us 7 STATFS</div>
<div>0.00 61.12 us 37.00 us 107.00 us 16 OPENDIR</div>
<div>0.00 44.04 us 12.00 us 86.00 us 24 FSTAT</div>
<div>0.00 41.42 us 24.00 us 96.00 us 26 GETXATTR</div>
<div>0.00 45.93 us 24.00 us 133.00 us 28 READDIR</div>
<div>0.00 57.17 us 25.00 us 147.00 us 24 OPEN</div>
<div>0.00 145.28 us 31.00 us 288.00 us 32 READDIRP</div>
<div>0.00 39.50 us 10.00 us 152.00 us 132 INODELK</div>
<div>0.00 330.97 us 20.00 us 14280.00 us 62 LOOKUP</div>
<div>0.00 79.06 us 19.00 us 851.00 us 430 FXATTROP</div>
<div>0.02 29.32 us 7.00 us 28154.00 us 22568 FINODELK</div>
<div>7.80 1313096.68 us 125.00 us 23281862.00 us 189 FSYNC</div>
<div>92.18 397.92 us 76.00 us 1838343.00 us 7372799 WRITE</div>
<div><br>
</div>
<div> Duration: 7811 seconds</div>
<div> Data Read: 0 bytes</div>
<div>Data Written: 966367641600 bytes</div>
<div><br>
</div>
<div>Interval 0 Stats:</div>
<div> Block Size: 131072b+ 262144b+</div>
<div> No. of Reads: 0 0</div>
<div>No. of Writes: 7372798 1</div>
<div>%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop</div>
<div>--------- ----------- ----------- ----------- ------------ ----</div>
<div>0.00 0.00 us 0.00 us 0.00 us 25 RELEASE</div>
<div>0.00 0.00 us 0.00 us 0.00 us 16 RELEASEDIR</div>
<div>0.00 61.50 us 46.00 us 77.00 us 2 ENTRYLK</div>
<div>0.00 82.00 us 67.00 us 97.00 us 2 FLUSH</div>
<div>0.00 265.00 us 265.00 us 265.00 us 1 CREATE</div>
<div>0.00 57.43 us 30.00 us 85.00 us 7 STATFS</div>
<div>0.00 61.12 us 37.00 us 107.00 us 16 OPENDIR</div>
<div>0.00 44.04 us 12.00 us 86.00 us 24 FSTAT</div>
<div>0.00 41.42 us 24.00 us 96.00 us 26 GETXATTR</div>
<div>0.00 45.93 us 24.00 us 133.00 us 28 READDIR</div>
<div>0.00 57.17 us 25.00 us 147.00 us 24 OPEN</div>
<div>0.00 145.28 us 31.00 us 288.00 us 32 READDIRP</div>
<div>0.00 39.50 us 10.00 us 152.00 us 132 INODELK</div>
<div>0.00 330.97 us 20.00 us 14280.00 us 62 LOOKUP</div>
<div>0.00 79.06 us 19.00 us 851.00 us 430 FXATTROP</div>
<div>0.02 29.32 us 7.00 us 28154.00 us 22568 FINODELK</div>
<div>7.80 1313096.68 us 125.00 us 23281862.00 us 189 FSYNC</div>
<div>92.18 397.92 us 76.00 us 1838343.00 us 7372799 WRITE</div>
<div><br>
</div>
<div> Duration: 7811 seconds</div>
<div> Data Read: 0 bytes</div>
<div>Data Written: 966367641600 bytes</div>
<div><br>
</div>
</div>
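Profile dumps like the one above are easier to triage when sorted by the Max-Latency column; a minimal sketch, using a few rows abbreviated from the stor1 output (the "us" suffixes are dropped so the fields stay numeric):

```shell
# Sort profile rows by Max-Latency (4th field, in microseconds) and show
# the three worst FOPs; sample rows abbreviated from the stor1 stats.
sort -k4 -rn <<'EOF' | head -3
 3.91 1255944.81 127.00 23397532.00     189 FSYNC
 7.40 3406275.50  17.00 23398013.00     132 INODELK
34.96   94598.02   8.00 23398705.00   22445 FINODELK
53.73     442.66  79.00  3116494.00 7372799 WRITE
EOF
```

All three multi-second maxima land on sync/locking FOPs (FINODELK, INODELK, FSYNC), which matches Pranith's reading of the stats.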
<div>Does that make things any clearer?</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2014-10-13 20:40 GMT+03:00 Roman <span dir="ltr"><<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">I think I may know what the issue was. There
was an iscsitarget service running that was exporting this
generated block device, so maybe my colleague's Windows
server picked it up and mounted it :) I'll see if it happens
again.</div>
<div class="gmail_extra">
<div>
<div><br>
<div class="gmail_quote">2014-10-13 20:27 GMT+03:00
Roman <span dir="ltr"><<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">So may I restart the volume and
start the test, or do you need something else from
this issue?</div>
<div class="gmail_extra">
<div>
<div><br>
<div class="gmail_quote">2014-10-13 19:49
GMT+03:00 Pranith Kumar Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<div>On 10/13/2014 10:03 PM, Roman
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hmm,
<div>seems like another strange issue? I've seen this
before; I had to restart the volume to get
my empty space back.</div>
<div>
<div>root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l</div>
<div>total 943718400</div>
<div>-rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk</div>
<div>root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk</div>
<div>root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h</div>
<div>Filesystem Size Used Avail Use% Mounted on</div>
<div>rootfs 282G 1.1G 266G 1% /</div>
<div>udev 10M 0 10M 0% /dev</div>
<div>tmpfs 1.4G 228K 1.4G 1% /run</div>
<div>/dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba 282G 1.1G 266G 1% /</div>
<div>tmpfs 5.0M 0 5.0M 0% /run/lock</div>
<div>tmpfs 5.2G 0 5.2G 0% /run/shm</div>
<div>stor1:HA-WIN-TT-1T 1008G 901G 57G 95% /srv/nfs/HA-WIN-TT-1T</div>
</div>
<div><br>
</div>
<div>No file, but the used size is still 901G.</div>
<div>Both servers show the same.</div>
<div>Do I really have to restart the volume to fix that?</div>
</div>
</blockquote>
</span> IMO this can happen if there
is an fd leak. open-fd is the only
variable that can change with volume
restart. How do you re-create the bug?<span><font color="#888888"><br>
<br>
Pranith</font></span>
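One way to check for the fd leak suspected here is to watch the brick process's open-fd count across a reproduction run; a minimal sketch assuming a Linux brick host, where the glusterfsd pgrep pattern is a placeholder:

```shell
# Count open file descriptors of a PID via /proc; compare the brick's
# count before and after the test -- a count that only grows suggests a leak.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Demo on our own shell; for a brick you would substitute something like
# count_fds "$(pgrep -f 'glusterfsd.*NFS-WIN')"   (pattern is a placeholder)
count_fds "$$"
```

For a given volume, 'gluster volume status VOLNAME fd' reports per-brick open-fd counts as well, without needing shell access to the brick host.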
<div>
<div><br>
<blockquote type="cite">
<div class="gmail_extra"><br>
<div class="gmail_quote">2014-10-13
19:30 GMT+03:00 Roman <span dir="ltr"><<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Sure.
<div>I'll let it run
for the night.</div>
</div>
<div class="gmail_extra">
<div>
<div><br>
<div class="gmail_quote">2014-10-13
19:19 GMT+03:00
Pranith Kumar
Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> hi Roman,<br>
Do you think we can run this test again? This time, could you
enable profiling with 'gluster volume profile &lt;volname&gt;
start', do the same test, and provide the output of 'gluster
volume profile &lt;volname&gt; info' and the logs after the
test?<span><font color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<div>On
10/13/2014
09:45 PM,
Roman wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Sure!
<div><br>
</div>
<div>
<div>root@stor1:~# gluster volume info</div>
<div><br>
</div>
<div>Volume Name: HA-2TB-TT-Proxmox-cluster</div>
<div>Type: Replicate</div>
<div>Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB</div>
<div>Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB</div>
<div>Options Reconfigured:</div>
<div>nfs.disable: 0</div>
<div>network.ping-timeout: 10</div>
<div><br>
</div>
<div>Volume Name: HA-WIN-TT-1T</div>
<div>Type: Replicate</div>
<div>Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4</div>
<div>Status: Started</div>
<div>Number of Bricks: 1 x 2 = 2</div>
<div>Transport-type: tcp</div>
<div>Bricks:</div>
<div>Brick1: stor1:/exports/NFS-WIN/1T</div>
<div>Brick2: stor2:/exports/NFS-WIN/1T</div>
<div>Options Reconfigured:</div>
<div>nfs.disable: 1</div>
<div>network.ping-timeout: 10</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2014-10-13
19:09
GMT+03:00
Pranith Kumar
Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> Could you give your 'gluster volume info' output?<br>
<br>
Pranith
<div>
<div><br>
<div>On
10/13/2014
09:36 PM,
Roman wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div>
<div dir="ltr">Hi,
<div><br>
</div>
<div>I've got this kind of setup (the servers run a replica):</div>
<div><br>
</div>
<div><br>
</div>
<div>@ 10G backend</div>
<div>gluster storage1</div>
<div>gluster storage2</div>
<div>gluster client1</div>
<div><br>
</div>
<div>@ 1G backend</div>
<div>other gluster clients</div>
<div><br>
</div>
<div>The servers have HW RAID5 with SAS disks.</div>
<div><br>
</div>
<div>So today I decided to create a 900GB file for an iscsi
target, located on a separate glusterfs volume, using dd
(just a dummy file filled with zeros, bs=1G count=900).</div>
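A side note on measuring the dd rate itself: without a final flush, dd can report a page-cache rate rather than the storage path's. A sketch of a variant that syncs before reporting; the path and size here are placeholders, not the actual 900GB run:

```shell
# conv=fdatasync makes dd flush the data before it exits, so the reported
# MB/s reflects the gluster path, not the local page cache.
# Placeholder path/size; the real test wrote 900GB to the glusterfs mount.
target=/tmp/dummy-disk
dd if=/dev/zero of="$target" bs=1M count=16 conv=fdatasync
ls -l "$target"
```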
<div>First of all, the process took a pretty long time; the
writing speed was 130 MB/sec (the client port was 2 Gbps, the
servers' ports were running @ 1 Gbps).</div>
<div>Then it reported something like "endpoint is not
connected", and all of my VMs on the other volume started to
give me IO errors.</div>
<div>Server load was around 4.6 (12 cores total).</div>
<div><br>
</div>
<div>Maybe it was due to the timeout of 2 secs, so I've
made it a bit higher: 10 sec.</div>
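The raise corresponds to a single per-volume option (it matches the network.ping-timeout: 10 visible in the 'gluster volume info' output further down the thread):

```shell
# Per-volume client ping timeout, in seconds; the volume name here is the
# 2TB volume from the disconnect logs.
gluster volume set HA-2TB-TT-Proxmox-cluster network.ping-timeout 10
```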
<div><br>
</div>
<div>Also, during the dd image creation, the VMs very often
reported that their disks were slow, like:</div>
<div>
<p>WARNINGs: Read IO Wait time is -0.02 (outside range
[0:1]).</p>
<p>Is 130 MB/sec the maximum bandwidth for all of the
volumes in total? Is that why we would need 10G
backends?</p>
<p>The HW RAID's local speed is 300 MB/sec, so it should
not be the issue. Any ideas or maybe any advice?</p>
<p><br>
</p>
<p>Maybe someone has an optimized sysctl.conf for a 10G
backend?</p>
<p>Mine is pretty simple, the kind that can be found by
googling.</p>
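On the sysctl question: the commonly circulated 10GbE tuning is mostly TCP buffer sizing. A sysctl.conf fragment as a sketch; every value below is a generic starting-point assumption to benchmark, not a gluster recommendation:

```
# /etc/sysctl.conf -- generic 10GbE starting points (assumptions; benchmark them)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 30000
```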
<p><br>
</p>
<p>Just to mention: those VMs were connected using a
separate 1 Gbps interface, which means they should not be
affected by the client with the 10G backend.</p>
<p><br>
</p>
<p>The logs are pretty useless; they just say this during
the outage:</p>
<p><br>
</p>
<p>[2014-10-13 12:09:18.392910] W
[client-handshake.c:276:client_ping_cbk]
0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have
expired</p>
<p>[2014-10-13 12:10:08.389708] C
[client-handshake.c:127:rpc_client_ping_timer_expired]
0-HA-2TB-TT-Proxmox-cluster-client-0: server <a href="http://10.250.0.1:49159" target="_blank">10.250.0.1:49159</a> has
not responded in the last 2 seconds, disconnecting.</p>
<p>[2014-10-13 12:10:08.390312] W
[client-handshake.c:276:client_ping_cbk]
0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have
expired</p>
</div>
<div>So I decided to set the timeout a bit higher.</div>
<div>
<div><br>
</div>
<div>So it seems to me that under high load GlusterFS is
not usable? 130 MB/s is not that much, yet it is enough to
trigger timeouts or to make the system so slow that the
VMs feel bad.</div>
<div><br>
</div>
<div>Of course, after the disconnection the healing
process started, but since the VMs had lost the connection
to both servers it was pretty useless; they could not run
anymore. And BTW, when you load the server with such a
huge job (a dd of 900GB), the healing process goes
soooooo slow :)</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </div>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
Gluster-users mailing list
<a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
<span><font color="#888888">-- <br>
Best regards,<br>
Roman. </font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
<span><font color="#888888">-- <br>
Best regards,<br>
Roman.
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
<span><font color="#888888">-- <br>
Best regards,<br>
Roman.
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman.
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Best regards,<br>Roman.
</div></div>