<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 08/06/2014 12:27 PM, Roman wrote:<br>
</div>
<blockquote
cite="mid:CAFR=TBoRrOW37e4wUc1eY3p78mU9zApTzyNnjGXS8eVsf3txrg@mail.gmail.com"
type="cite">
<div dir="ltr">Yesterday I reproduced this situation twice.
<div> </div>
<div>The setup:</div>
<div>1. Hardware and network</div>
<div> a. Disks INTEL SSDSC2BB240G4</div>
<div> b1. Network cards: X540-AT2</div>
<div>
b2. Netgear 10g switch</div>
<div>2. Software setup:</div>
<div> a. OS: Debian wheezy</div>
<div> b. Glusterfs: 3.4.4 (latest 3.4.4 from gluster
repository)</div>
<div> c. Proxmox VE with updated glusterfs from the gluster
repository</div>
<div>3. Software Configuration</div>
<div> a. create replicated volume with
cluster.self-heal-daemon: off; nfs.disable: off;
network.ping-timeout: 2 opts</div>
<div> b. mount it on proxmox VE (via proxmox gui, it mounts
with these opts: stor1:HA-fast-150G-PVE1 on /mnt/pve/FAST-TESt
type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
)</div>
<div> c. install VM with qcow2 or raw disk image.</div>
<div> d. disable port / remove network cable from one of
storage servers</div>
<div> e. wait and put cable back</div>
<div> f. keep waiting for sync (pointless, it won't ever
start)</div>
<div> g. disable another port for second server (or remove
cable from second server)</div>
<div> h. profit.</div>
<div><br>
</div>
<div>Maybe I could use 3.5.2 from debian sid (testing)
repository to test with?</div>
</div>
</blockquote>
Sure, you can go ahead. I will just write one document about
maintaining VMs on gluster from the perspective of replication.<br>
<br>
Pranith<br>
<blockquote
cite="mid:CAFR=TBoRrOW37e4wUc1eY3p78mU9zApTzyNnjGXS8eVsf3txrg@mail.gmail.com"
type="cite">
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-06 9:39 GMT+03:00 Pranith Kumar
Karampuri <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> Roman,<br>
The file went into split-brain. I think we should do
these tests with 3.5.2, where monitoring the heals is
easier. Let me also come up with a document about how to
do the testing you are trying to do.<br>
<br>
Humble/Niels,<br>
Do we have debs available for 3.5.2? In 3.5.1 there
was a packaging issue where /usr/bin/glfsheal was not
packaged along with the deb. I think that should be fixed
now as well?<br>
<br>
Pranith<br>
<br>
<div>On 08/06/2014 11:52 AM, Roman wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">good morning,
<div><br>
</div>
<div>
<div>root@stor1:~# getfattr -d -m. -e hex
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>getfattr: Removing leading '/' from absolute
path names</div>
<div># file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000</div>
<div>trusted.gfid=0x23c79523075a4158bea38078da570449</div>
</div>
<div><br>
</div>
<div>
<div>getfattr: Removing leading '/' from absolute
path names</div>
<div># file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000</div>
<div>trusted.gfid=0x23c79523075a4158bea38078da570449</div>
</div>
<div> <br>
</div>
</div>
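<div>As a rough aid for reading the getfattr output above, here is a minimal Python sketch. It assumes that each trusted.afr.* value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry), that the first output is from stor1 and the second (which has no prompt) is from stor2, and that client-0 and client-1 refer to the bricks on stor1 and stor2 respectively. The helper names are made up for illustration and are not part of gluster.</div>
<pre>
```python
import struct


def decode_afr_changelog(hex_value: str) -> dict:
    """Split a 12-byte trusted.afr.* value into three pending counters.

    Assumption: the layout is three big-endian unsigned 32-bit integers
    in the order data, metadata, entry.
    """
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def blames(changelog: dict) -> bool:
    """True when any pending counter is non-zero."""
    return any(count != 0 for count in changelog.values())


# Values copied from the two getfattr outputs quoted above.
# Assumption: first output = stor1 (client-0's brick),
# second output = stor2 (client-1's brick).
on_stor1 = {
    "client-0": decode_afr_changelog("0x000000000000000000000000"),
    "client-1": decode_afr_changelog("0x000001320000000000000000"),
}
on_stor2 = {
    "client-0": decode_afr_changelog("0x000000040000000000000000"),
    "client-1": decode_afr_changelog("0x000000000000000000000000"),
}

# Each copy records pending operations against the other one, so neither
# copy can be chosen as the heal source automatically.
split_brain = blames(on_stor1["client-1"]) and blames(on_stor2["client-0"])
print(on_stor1["client-1"], on_stor2["client-0"], split_brain)
```
</pre>
<div>Under these assumptions, the copy on stor1 holds pending data operations against client-1 and the copy on stor2 holds pending data operations against client-0, which is the mutual-blame pattern usually described as split-brain.</div>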
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-06 9:20 GMT+03:00
Pranith Kumar Karampuri <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div> <br>
<div>On 08/06/2014 11:30 AM, Roman wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Also, this time files are not
the same!
<div>
<div><br>
</div>
<div>root@stor1:~# md5sum
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>32411360c53116b96a059f17306caeda
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
</div>
<div><br>
</div>
<div>
<div>root@stor2:~# md5sum
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>65b8a6031bcb6f5fb3a11cb1e8b1c9c9
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
</div>
</div>
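<div>For comparing the two copies without relying on the md5sum binary, a small Python sketch along these lines could be used. It only illustrates the comparison done above; the function names are hypothetical, and the file has to be read on each brick separately while the VM is stopped.</div>
<pre>
```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def replicas_match(checksums: dict) -> bool:
    """True when every brick reported the same checksum."""
    return len(set(checksums.values())) == 1


# Checksums copied from the md5sum output quoted above.
reported = {
    "stor1": "32411360c53116b96a059f17306caeda",
    "stor2": "65b8a6031bcb6f5fb3a11cb1e8b1c9c9",
}
print(replicas_match(reported))
```
</pre>
<div>Reading in chunks keeps memory use flat even for large disk images. With the two values above, the check reports that the copies differ.</div>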
</blockquote>
</div>
What is the getfattr output?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote type="cite">
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-05
16:33 GMT+03:00 Roman <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:romeo.r@gmail.com"
target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div dir="ltr">Nope, it is not
working. But this time it went a
bit differently.
<div><br>
</div>
<div>
<div>root@gluster-client:~#
dmesg</div>
<div>Segmentation fault</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>I was not even able to start
the VM after I had done the tests.</div>
<div><br>
</div>
<div><span
style="color:rgb(0,0,0);font-family:tahoma,arial,verdana,sans-serif;font-size:11px;line-height:16px;white-space:pre-wrap">Could
not read qcow2 header:
Operation not permitted</span><br>
</div>
<div> <span
style="color:rgb(0,0,0);font-family:tahoma,arial,verdana,sans-serif;font-size:11px;line-height:16px;white-space:pre-wrap"><br>
</span></div>
<div>And it seems it never starts
to sync files after the first
disconnect. The VM survives the first
disconnect, but not the second (I
waited around 30 minutes). Also,
I've got network.ping-timeout: 2
in the volume settings, but the logs
reacted to the first disconnect only after around
30 seconds. The second was faster, 2
seconds.</div>
<div><br>
</div>
<div>The reaction was also different:</div>
<div><br>
</div>
<div>slower one:</div>
<div>
<div>[2014-08-05
13:26:19.558435] W
[socket.c:514:__socket_rwv]
0-glusterfs: readv failed
(Connection timed out)</div>
<div>[2014-08-05
13:26:19.558485] W
[socket.c:1962:__socket_proto_state_machine]
0-glusterfs: reading from
socket failed. Error
(Connection timed out), peer (<a
moz-do-not-send="true"
href="http://10.250.0.1:24007"
target="_blank">10.250.0.1:24007</a>)</div>
<div>[2014-08-05
13:26:21.281426] W
[socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-0:
readv failed (Connection timed
out)</div>
<div>[2014-08-05
13:26:21.281474] W
[socket.c:1962:__socket_proto_state_machine]
0-HA-fast-150G-PVE1-client-0:
reading from socket failed.
Error (Connection timed out),
peer (<a
moz-do-not-send="true"
href="http://10.250.0.1:49153"
target="_blank">10.250.0.1:49153</a>)</div>
<div>[2014-08-05
13:26:21.281507] I
[client.c:2098:client_rpc_notify]
0-HA-fast-150G-PVE1-client-0:
disconnected</div>
</div>
<div><br>
</div>
<div>the fast one:</div>
<div>
<div>[2014-08-05 12:52:44.607389]
C
[client-handshake.c:127:rpc_client_ping_timer_expired]
0-HA-fast-150G-PVE1-client-1:
server <a
moz-do-not-send="true"
href="http://10.250.0.2:49153"
target="_blank">10.250.0.2:49153</a>
has not responded in the last
2 seconds, disconnecting.</div>
<div>[2014-08-05
12:52:44.607491] W
[socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-1:
readv failed (No data
available)</div>
<div>[2014-08-05
12:52:44.607585] E
[rpc-clnt.c:368:saved_frames_unwind]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
[0x7fcb1b4b0558]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
[0x7fcb1b4aea63]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7fcb1b4ae97e])))
0-HA-fast-150G-PVE1-client-1:
forced unwinding frame
type(GlusterFS 3.3)
op(LOOKUP(27)) called at
2014-08-05 12:52:42.463881
(xid=0x381883x)</div>
<div>[2014-08-05
12:52:44.607604] W
[client-rpc-fops.c:2624:client3_3_lookup_cbk]
0-HA-fast-150G-PVE1-client-1:
remote operation failed:
Transport endpoint is not
connected. Path: /
(00000000-0000-0000-0000-000000000001)</div>
<div>[2014-08-05
12:52:44.607736] E
[rpc-clnt.c:368:saved_frames_unwind]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
[0x7fcb1b4b0558]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
[0x7fcb1b4aea63]
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
[0x7fcb1b4ae97e])))
0-HA-fast-150G-PVE1-client-1:
forced unwinding frame
type(GlusterFS Handshake)
op(PING(3)) called at
2014-08-05 12:52:42.463891
(xid=0x381884x)</div>
<div>[2014-08-05
12:52:44.607753] W
[client-handshake.c:276:client_ping_cbk]
0-HA-fast-150G-PVE1-client-1:
timer must have expired</div>
<div>[2014-08-05
12:52:44.607776] I
[client.c:2098:client_rpc_notify]
0-HA-fast-150G-PVE1-client-1:
disconnected</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>I've got SSD disks (just for
information).</div>
<div>Should I go ahead and give
3.5.2 a try?</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">
2014-08-05 13:06 GMT+03:00
Pranith Kumar Karampuri <span
dir="ltr"><<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com"
target="_blank">pkarampu@redhat.com</a>></span>:
<div>
<div><br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF"> Please reply
along with gluster-users
:-). Maybe you
are hitting 'reply'
instead of 'reply all'?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<div>On 08/05/2014
03:35 PM, Roman
wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">To
be sure and
start clean, I've
created another
VM with raw
format and am going
to repeat those
steps. So now
I've got two
VMs, one with
qcow2 format and
the other with raw
format. I will
send another
e-mail shortly.</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
13:01
GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF">
<div> <br>
<div>On
08/05/2014
03:07 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">really,
seems like the
same file
<div><br>
</div>
<div>stor1:</div>
<div>a951641c5230472929836f9fcede6b04
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
</div>
<div><br>
</div>
<div>stor2:</div>
<div>a951641c5230472929836f9fcede6b04
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>One thing
I've seen in
the logs: is
Proxmox VE
somehow connecting
to the servers
with the wrong
version?</div>
<div>[2014-08-05
09:23:45.218550]
I
[client-handshake.c:1659:select_server_supported_programs]
0-HA-fast-150G-PVE1-client-0:
Using Program
GlusterFS 3.3,
Num (1298437),
Version (330)<br>
</div>
</div>
</blockquote>
</div>
It is the RPC
(over-the-network
data structures)
version, which
has not changed
at all since
3.3, so that's
not a problem.
So what is the
conclusion? Is
your test case
working now or
not?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div> </div>
<div>but if I
issue:</div>
<div>
<div>root@pve1:~#
glusterfs -V</div>
<div>glusterfs
3.4.4 built on
Jun 28 2014
03:44:57</div>
</div>
<div>seems ok.</div>
<div><br>
</div>
<div>Meanwhile, the servers
use 3.4.4:</div>
<div>[2014-08-05
09:23:45.117875]
I
[server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server:
accepted
client from
stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
(version:
3.4.4)<br>
</div>
<div>[2014-08-05
09:23:49.103035]
I
[server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server:
accepted
client from
stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
(version:
3.4.4)<br>
</div>
<div><br>
</div>
<div>if this
could be the
reason, of
course.</div>
<div>I did
restart
Proxmox VE
yesterday
(just for
information).</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
12:30
GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF">
<div> <br>
<div>On
08/05/2014
02:33 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">Waited
long enough
for now, still
different
sizes and no
logs about
healing :(
<div><br>
</div>
<div>stor1 </div>
<div>
<div># file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000</div>
<div>trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921</div>
</div>
<div><br>
</div>
<div>
<div>root@stor1:~#
du -sh
/exports/fast-test/150G/images/127/</div>
<div>1.2G
/exports/fast-test/150G/images/127/</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>stor2</div>
<div>
<div># file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000</div>
<div>trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000</div>
<div>trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div>root@stor2:~#
du -sh
/exports/fast-test/150G/images/127/</div>
<div>1.4G
/exports/fast-test/150G/images/127/</div>
</div>
</div>
</blockquote>
</div>
According to
the
changelogs,
the file
doesn't need
any healing.
Could you stop
the operations
on the VMs and
take md5sum on
both these
machines?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div><br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
11:49
GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF">
<div> <br>
<div>On
08/05/2014
02:06 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">Well,
it seems like
it doesn't see
that changes
were made to
the volume? I
created two
files, 200 and
100 MB (from
/dev/zero),
after I
disconnected
the first
brick. Then I
connected it
back and got
these logs:
<div> <br>
</div>
<div>
<div>[2014-08-05
08:30:37.830150]
I
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
0-glusterfs:
No change in
volfile,
continuing</div>
<div>[2014-08-05
08:30:37.830207]
I
[rpc-clnt.c:1676:rpc_clnt_reconfig]
0-HA-fast-150G-PVE1-client-0:
changing port
to 49153 (from
0)</div>
<div>[2014-08-05
08:30:37.830239]
W
[socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-0:
readv failed
(No data
available)</div>
<div>[2014-08-05
08:30:37.831024]
I
[client-handshake.c:1659:select_server_supported_programs]
0-HA-fast-150G-PVE1-client-0:
Using Program
GlusterFS 3.3,
Num (1298437),
Version (330)</div>
<div>[2014-08-05
08:30:37.831375]
I
[client-handshake.c:1456:client_setvolume_cbk]
0-HA-fast-150G-PVE1-client-0:
Connected to <a
moz-do-not-send="true" href="http://10.250.0.1:49153" target="_blank">10.250.0.1:49153</a>,
attached to
remote volume
'/exports/fast-test/150G'.</div>
<div>[2014-08-05
08:30:37.831394]
I
[client-handshake.c:1468:client_setvolume_cbk]
0-HA-fast-150G-PVE1-client-0:
Server and
Client
lk-version
numbers are
not same,
reopening the
fds</div>
<div>[2014-08-05
08:30:37.831566]
I
[client-handshake.c:450:client_set_lk_version_cbk]
0-HA-fast-150G-PVE1-client-0:
Server lk
version = 1</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>[2014-08-05
08:30:37.830150]
I
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
0-glusterfs:
No change in
volfile,
continuing<br>
</div>
<div>this line
seems weird to
me tbh.</div>
<div>I do not
see any
traffic on the
switch
interfaces
between the
gluster
servers, which
means there
is no syncing
between them.</div>
<div>I tried
to ls -l the
files on the
client and
servers to
trigger the
healing, but
without
success.
Should I wait
longer?</div>
</div>
</blockquote>
</div>
Yes, it should
take around
10-15 minutes.
Could you
provide the
output of 'getfattr -d
-m. -e hex
&lt;file-on-brick&gt;'
on both the
bricks?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
11:25
GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF">
<div> <br>
<div>On
08/05/2014
01:10 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">Ahha!
For some
reason I was
not able to
start the VM
anymore.
Proxmox VE
told me that
it is not able
to read the
qcow2 header
because
permission is
denied for
some reason.
So I just
deleted that
file and
created a new
VM. And the
next message
I got was
this:</div>
</blockquote>
</div>
Seems like
these are the
messages where
you took down
the bricks
before
self-heal.
Could you
restart the
run waiting
for self-heals
to complete
before taking
down the next
brick?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<blockquote
type="cite">
<div dir="ltr">
<div> <br>
<div><br>
</div>
<div>
<div>[2014-08-05
07:31:25.663412]
E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
0-HA-fast-150G-PVE1-replicate-0:
Unable to
self-heal
contents of
'/images/124/vm-124-disk-1.qcow2'
(possible
split-brain).
Please delete
the file from
all but the
preferred
subvolume.-
Pending
matrix: [ [ 0
60 ] [ 11 0 ]
]</div>
<div>[2014-08-05
07:31:25.663955]
E
[afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
0-HA-fast-150G-PVE1-replicate-0:
background
data
self-heal
failed on
/images/124/vm-124-disk-1.qcow2</div>
</div>
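<div>The "Pending matrix" in the log line above can be read mechanically. The following Python sketch parses it and lists the brick pairs that blame each other. It assumes that row i is brick i's view and column j is the number of operations it considers pending on brick j; the function names are illustrative only and do not exist in gluster.</div>
<pre>
```python
import re


def parse_pending_matrix(log_line: str) -> list:
    """Extract the rows of integers that follow 'Pending matrix:'."""
    match = re.search(r"Pending matrix:\s*\[(.*)\]", log_line)
    if match is None:
        raise ValueError("no pending matrix found in log line")
    rows = re.findall(r"\[([^\[\]]*)\]", match.group(1))
    return [[int(number) for number in row.split()] for row in rows]


def mutual_blame_pairs(matrix: list) -> list:
    """Return index pairs (i, j) where both bricks blame each other."""
    pairs = []
    size = len(matrix)
    for i in range(size):
        for j in range(i + 1, size):
            if matrix[i][j] != 0 and matrix[j][i] != 0:
                pairs.append((i, j))
    return pairs


# Tail of the split-brain message quoted above.
line = "Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 60 ] [ 11 0 ] ]"
matrix = parse_pending_matrix(line)
print(matrix, mutual_blame_pairs(matrix))
```
</pre>
<div>A non-empty list of pairs means no single brick can be treated as the clean source, which matches the "possible split-brain" wording in the log.</div>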
<div><br>
</div>
</div>
</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
10:13
GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF"> I just responded to your earlier mail about how the
log looks. The
log appears in
the mount's
logfile.<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<div>On
08/05/2014
12:41 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">Ok,
so I've waited
long enough, I
think. There was
no traffic on the
switch ports
between the
servers. I could
not find any
suitable log
message about a
completed
self-heal
(waited about
30 minutes).
I unplugged
the other
server's UTP
cable this
time and got
into the same
situation:
<div>
<div>root@gluster-test1:~#
cat
/var/log/dmesg</div>
<div>-bash:
/bin/cat:
Input/output
error</div>
</div>
<div><br>
</div>
<div>brick
logs:</div>
<div>
<div>[2014-08-05
07:09:03.005474]
I
[server.c:762:server_rpc_notify]
0-HA-fast-150G-PVE1-server:
disconnecting
connectionfrom
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0</div>
<div>[2014-08-05
07:09:03.005530]
I
[server-helpers.c:729:server_connection_put]
0-HA-fast-150G-PVE1-server:
Shutting down
connection
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0</div>
<div>[2014-08-05
07:09:03.005560]
I
[server-helpers.c:463:do_fd_cleanup]
0-HA-fast-150G-PVE1-server:
fd cleanup on
/images/124/vm-124-disk-1.qcow2</div>
<div>[2014-08-05
07:09:03.005797]
I
[server-helpers.c:617:server_connection_destroy]
0-HA-fast-150G-PVE1-server:
destroyed
connection of
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
9:53 GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF"> Do you think it is possible for you to do these tests
on the latest
version, 3.5.2?
'gluster
volume heal
&lt;volname&gt;
info' would
give you that
information in
versions &gt;
3.5.1.<br>
Otherwise you
will have to
check it either
from the
logs (there
will be a
self-heal
completed
message in the
mount logs)
or by
observing
'getfattr -d
-m. -e hex
&lt;image-file-on-bricks&gt;'.<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<br>
<div>On
08/05/2014
12:09 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">Ok,
I understand.
I will try
this shortly.
<div>How can I
be sure that the
healing
process is
done if I am
not able to
see its
status?</div>
</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
9:30 GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF"> Mounts will do the healing, not the self-heal-daemon.
The point, I
feel, is that
whichever
process does
the healing
needs to have the latest
information
about the good
bricks in this
use case. Since,
for the VM
use case,
mounts should
have the
latest
information,
we should let
the mounts do
the healing.
If the mount
accesses the
VM image,
either by
someone doing
operations
inside the VM
or by an explicit
stat on the
file, it should
do the
healing.<span><font
color="#888888"><br>
<br>
Pranith.</font></span>
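<div>One way to issue such an explicit stat on every file is to walk the mount point. The sketch below is only an illustration of that idea in Python; the function name is hypothetical, it must be run against the FUSE mount (not against the brick directories), and whether a stat alone starts a heal may depend on the gluster version and volume options.</div>
<pre>
```python
import os


def stat_all_files(mount_root: str) -> int:
    """Call stat() on every regular directory entry below mount_root.

    Returns the number of files that were stat-ed successfully.
    """
    touched = 0
    for directory, _subdirs, filenames in os.walk(mount_root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                os.stat(path)
            except OSError as error:
                # A file in split-brain typically fails with an I/O error.
                print("stat failed for %s: %s" % (path, error))
                continue
            touched += 1
    return touched
```
</pre>
<div>It could be called as stat_all_files("/mnt/pve/FAST-TESt"), using the mount path mentioned elsewhere in this thread; files that print an error are the ones worth checking with getfattr on the bricks.</div>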
<div>
<div><br>
<br>
<div>On
08/05/2014
10:39 AM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">Hmmm,
you told me to
turn it off.
Did I
understand
something
wrong? After I
issued the
command you
sent me, I was
not able to
watch the
healing
process; it
said it won't
be healed
because it's
turned off.</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-05
5:39 GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF"> You didn't mention anything about self-healing. Did
you wait until
the self-heal
is complete?<span><font
color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<div>On
08/04/2014
05:49 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>Hi!</div>
<div>The result is
pretty much the same. I
set the switch
port down for
the 1st server; it
was ok. Then I
set it back up
and set the other
server's port
off, and it
triggered IO
errors on two
virtual
machines: one
with a local
root FS but
network-mounted
storage, and
the other with a
network root
FS. The 1st gave
an error on
copying to or
from the
mounted
network disk;
the other
gave me an
error even when
reading
log files.</div>
<div><br>
</div>
<div>
<div>cat:
/var/log/alternatives.log:
Input/output
error<br>
</div>
<div>Then I
reset the kvm
VM and it told
me there is
no boot
device. Next I
virtually
powered it off
and then back
on and it
booted.</div>
<div><br>
</div>
<div>By the
way, did I
have to
start/stop
volume?</div>
</div>
<div><br>
</div>
<div>>> <span
style="font-family:arial,sans-serif;font-size:13px">Could you do the
following and
test it again?</span></div>
<span
style="font-family:arial,sans-serif;font-size:13px">>>
gluster volume
set
&lt;volname&gt;
cluster.self-heal-daemon
off</span><br
style="font-family:arial,sans-serif;font-size:13px">
<br
style="font-family:arial,sans-serif;font-size:13px">
<span
style="font-family:arial,sans-serif;font-size:13px">>>Pranith</span>
<div><br>
</div>
<div><br>
</div>
</div>
<div
class="gmail_extra"><br>
<br>
<div
class="gmail_quote">2014-08-04
14:10
GMT+03:00
Pranith Kumar
Karampuri <span
dir="ltr"><<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote
class="gmail_quote"
style="margin:0
0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
text="#000000"
bgcolor="#FFFFFF">
<div> <br>
<div>On
08/04/2014
03:33 PM,
Roman wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div><span
style="font-family:arial,sans-serif;font-size:13px">Hello!</span></div>
<div><span
style="font-family:arial,sans-serif;font-size:13px"><br>
</span></div>
<span
style="font-family:arial,sans-serif;font-size:13px">Facing
the same
problem as
mentioned
here:</span>
<div
style="font-family:arial,sans-serif;font-size:13px">
<br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px"><a
moz-do-not-send="true"
href="http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html"
target="_blank">http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html</a></div>
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">my
set up is up
and running,
so i'm ready
to help you
back with
feedback.</div>
<div
style="font-family:arial,sans-serif;font-size:13px">
<br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">setup:</div>
<div
style="font-family:arial,sans-serif;font-size:13px">proxmox
server as
client</div>
<div
style="font-family:arial,sans-serif;font-size:13px">
2 gluster
physical
servers</div>
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">server
side and
client side
both running
atm 3.4.4
glusterfs from
gluster repo.</div>
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">the
problem is:</div>
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">
1. created
replica
bricks.</div>
<div
style="font-family:arial,sans-serif;font-size:13px">2.
mounted in
proxmox (tried
both Proxmox
ways: via GUI
and fstab
(with backup
volume line),
btw while
mounting via
fstab I'm
unable to
launch a VM
without cache,
meanwhile
direct-io-mode
is enabled in
fstab line)</div>
<div
style="font-family:arial,sans-serif;font-size:13px">3.
installed VM</div>
<div
style="font-family:arial,sans-serif;font-size:13px">4.
bring one
volume down -
ok</div>
<div
style="font-family:arial,sans-serif;font-size:13px">
5. bringing
up, waiting
for sync is
done.</div>
<div
style="font-family:arial,sans-serif;font-size:13px">6.
bring other
volume down -
getting IO
errors on VM
guest and not
able to
restore the VM
after I reset
the VM via
host. It says
(no bootable
media). After
I shut it down
(forced) and
bring back up,
it boots.</div>
</div>
</blockquote>
</div>
Could you do
the following
and test it
again?<br>
gluster volume
set
&lt;volname&gt;
cluster.self-heal-daemon
off<br>
<br>
Pranith<br>
<blockquote
type="cite">
<div>
<div dir="ltr">
<div
style="font-family:arial,sans-serif;font-size:13px"><br>
</div>
<div
style="font-family:arial,sans-serif;font-size:13px">Need
help. Tried
3.4.3, 3.4.4.</div>
<div
style="font-family:arial,sans-serif;font-size:13px">Still
missing pkg-s
for 3.4.5 for
debian and
3.5.2 (3.5.1
always gives a
healing error
for some
reason)</div>
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </div>
<br>
<fieldset></fieldset>
<br>
</div>
<pre>_______________________________________________
Gluster-users mailing list
<a moz-do-not-send="true" href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a>
<a moz-do-not-send="true" href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br
clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
<span><font color="#888888"><br>
<br clear="all">
<div><br>
</div>
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<span class="HOEnZb"><font
color="#888888">
<div><br>
</div>
</font></span></div>
<span class="HOEnZb"><font color="#888888">
</font></span></blockquote>
<span class="HOEnZb"><font color="#888888">
<br>
</font></span></div>
<span class="HOEnZb"><font color="#888888"> </font></span></div>
<span class="HOEnZb"><font color="#888888"> </font></span></div>
<span class="HOEnZb"><font color="#888888"> </font></span></blockquote>
<span class="HOEnZb"><font color="#888888"> </font></span></div>
<span class="HOEnZb"><font color="#888888"> <br>
<br clear="all">
<div><br>
</div>
</font></span></div>
</blockquote>
<br>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</blockquote>
<br>
</body>
</html>