<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 08/26/2014 11:55 AM, Roman wrote:<br>
</div>
<blockquote
cite="mid:CAFR=TBoSUbLv4s385KpzU0BddXJBazctUnZmfpR6ET6J+ZTFFw@mail.gmail.com"
type="cite">
<div dir="ltr">Hello all again!
<div>I'm back from vacation and I'm pretty happy with 3.5.2
available for wheezy. Thanks! Just made my updates.</div>
<div>For 3.5.2 do I still have to set cluster.self-heal-daemon
to off?</div>
</div>
</blockquote>
Welcome back :-). If you set it to off, the test case you execute
should work (please validate :-) ). But we also need to test it with
self-heal-daemon 'on' and fix any bugs if the test case does not
work.<br>
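<br>
For reference, a minimal sketch of the commands for the two runs (the
volume name is taken from the logs quoted below; adjust it and the file
paths to your setup):<br>
<pre>
# run the test once with the self-heal daemon off, then once with it on
gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon on

# on 3.5.2, confirm pending heals have drained before taking down the next brick
gluster volume heal HA-fast-150G-PVE1 info
</pre>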
<br>
Pranith.<br>
<blockquote
cite="mid:CAFR=TBoSUbLv4s385KpzU0BddXJBazctUnZmfpR6ET6J+ZTFFw@mail.gmail.com"
type="cite">
<div dir="ltr">
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-06 12:49 GMT+03:00 Humble
Chirammal <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:hchiramm@redhat.com" target="_blank">hchiramm@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class=""><br>
<br>
<br>
----- Original Message -----<br>
| From: "Pranith Kumar Karampuri" <<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>><br>
| To: "Roman" <<a moz-do-not-send="true"
href="mailto:romeo.r@gmail.com">romeo.r@gmail.com</a>><br>
| Cc: <a moz-do-not-send="true"
href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>,
"Niels de Vos" <<a moz-do-not-send="true"
href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>>,
"Humble Chirammal" <<a moz-do-not-send="true"
href="mailto:hchiramm@redhat.com">hchiramm@redhat.com</a>><br>
| Sent: Wednesday, August 6, 2014 12:09:57 PM<br>
| Subject: Re: [Gluster-users] libgfapi failover problem
on replica bricks<br>
|<br>
| Roman,<br>
| The file went into split-brain. I think we should
do these tests<br>
| with 3.5.2, where monitoring the heals is easier. Let me
also come up<br>
| with a document about how to do this testing you are
trying to do.<br>
|<br>
| Humble/Niels,<br>
| Do we have debs available for 3.5.2? In 3.5.1 there
was a packaging<br>
| issue where /usr/bin/glfsheal was not packaged along with
the deb. I<br>
| think that should be fixed now as well?<br>
|<br>
</div>
Pranith,<br>
<br>
The 3.5.2 packages for Debian are not available yet. We are
co-ordinating internally to get them processed.<br>
I will update the list once they are available.<br>
<br>
--Humble<br>
<div class="">|<br>
| On 08/06/2014 11:52 AM, Roman wrote:<br>
| > good morning,<br>
| ><br>
| > root@stor1:~# getfattr -d -m. -e hex<br>
| >
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| > getfattr: Removing leading '/' from absolute path
names<br>
| > # file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000<br>
| >
trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000<br>
| > trusted.gfid=0x23c79523075a4158bea38078da570449<br>
| ><br>
| > getfattr: Removing leading '/' from absolute path
names<br>
| > # file:
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000<br>
| >
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000<br>
| > trusted.gfid=0x23c79523075a4158bea38078da570449<br>
| ><br>
| ><br>
| ><br>
| > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri
<<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
| > <mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div class="">| ><br>
| ><br>
| > On 08/06/2014 11:30 AM, Roman wrote:<br>
| >> Also, this time files are not the same!<br>
| >><br>
| >> root@stor1:~# md5sum<br>
| >>
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >> 32411360c53116b96a059f17306caeda<br>
| >>
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >><br>
| >> root@stor2:~# md5sum<br>
| >>
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9<br>
| >>
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| > What is the getfattr output?<br>
| ><br>
| > Pranith<br>
| ><br>
| >><br>
| >><br>
| >> 2014-08-05 16:33 GMT+03:00 Roman <<a
moz-do-not-send="true" href="mailto:romeo.r@gmail.com">romeo.r@gmail.com</a><br>
</div>
| >> <mailto:<a moz-do-not-send="true"
href="mailto:romeo.r@gmail.com">romeo.r@gmail.com</a>>>:<br>
<div class="">| >><br>
| >> Nope, it is not working. But this time
it went a bit differently<br>
| >><br>
| >> root@gluster-client:~# dmesg<br>
| >> Segmentation fault<br>
| >><br>
| >><br>
| >>                 I was not even able to start the VM
after I had done the tests<br>
| >><br>
| >> Could not read qcow2 header: Operation
not permitted<br>
| >><br>
| >> And it seems, it never starts to sync
files after first<br>
| >> disconnect. VM survives first
disconnect, but not second (I<br>
| >> waited around 30 minutes). Also, I've<br>
| >> got network.ping-timeout: 2 in volume
settings, but the logs<br>
| >>                 reacted to the first disconnect after around 30
seconds. The second was<br>
| >> faster, 2 seconds.<br>
| >><br>
| >> Reaction was different also:<br>
| >><br>
| >> slower one:<br>
| >> [2014-08-05 13:26:19.558435] W
[socket.c:514:__socket_rwv]<br>
| >> 0-glusterfs: readv failed (Connection
timed out)<br>
| >> [2014-08-05 13:26:19.558485] W<br>
| >>
[socket.c:1962:__socket_proto_state_machine] 0-glusterfs:<br>
| >> reading from socket failed. Error
(Connection timed out),<br>
</div>
| >> peer (<a moz-do-not-send="true"
href="http://10.250.0.1:24007" target="_blank">10.250.0.1:24007</a>
<<a moz-do-not-send="true" href="http://10.250.0.1:24007"
target="_blank">http://10.250.0.1:24007</a>>)<br>
<div class="">| >> [2014-08-05
13:26:21.281426] W [socket.c:514:__socket_rwv]<br>
| >> 0-HA-fast-150G-PVE1-client-0: readv
failed (Connection timed out)<br>
| >> [2014-08-05 13:26:21.281474] W<br>
| >>
[socket.c:1962:__socket_proto_state_machine]<br>
| >> 0-HA-fast-150G-PVE1-client-0: reading
from socket failed.<br>
| >> Error (Connection timed out), peer (<a
moz-do-not-send="true" href="http://10.250.0.1:49153"
target="_blank">10.250.0.1:49153</a><br>
</div>
| >> <<a moz-do-not-send="true"
href="http://10.250.0.1:49153" target="_blank">http://10.250.0.1:49153</a>>)<br>
<div class="">| >> [2014-08-05
13:26:21.281507] I<br>
| >> [client.c:2098:client_rpc_notify]<br>
| >> 0-HA-fast-150G-PVE1-client-0:
disconnected<br>
| >><br>
| >> the fast one:<br>
| >>                 [2014-08-05 12:52:44.607389] C<br>
| >>
[client-handshake.c:127:rpc_client_ping_timer_expired]<br>
| >> 0-HA-fast-150G-PVE1-client-1: server <a
moz-do-not-send="true" href="http://10.250.0.2:49153"
target="_blank">10.250.0.2:49153</a><br>
</div>
| >> <<a moz-do-not-send="true"
href="http://10.250.0.2:49153" target="_blank">http://10.250.0.2:49153</a>>
has not responded in the last 2<br>
<div>
<div class="h5">| >> seconds, disconnecting.<br>
| >> [2014-08-05 12:52:44.607491] W
[socket.c:514:__socket_rwv]<br>
| >> 0-HA-fast-150G-PVE1-client-1: readv
failed (No data available)<br>
| >> [2014-08-05 12:52:44.607585] E<br>
| >> [rpc-clnt.c:368:saved_frames_unwind]<br>
| >>
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)<br>
| >> [0x7fcb1b4b0558]<br>
| >>
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)<br>
| >> [0x7fcb1b4aea63]<br>
| >>
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)<br>
| >> [0x7fcb1b4ae97e])))
0-HA-fast-150G-PVE1-client-1: forced<br>
| >> unwinding frame type(GlusterFS 3.3)
op(LOOKUP(27)) called at<br>
| >> 2014-08-05 12:52:42.463881
(xid=0x381883x)<br>
| >> [2014-08-05 12:52:44.607604] W<br>
| >>
[client-rpc-fops.c:2624:client3_3_lookup_cbk]<br>
| >> 0-HA-fast-150G-PVE1-client-1: remote
operation failed:<br>
| >> Transport endpoint is not connected.
Path: /<br>
| >>
(00000000-0000-0000-0000-000000000001)<br>
| >> [2014-08-05 12:52:44.607736] E<br>
| >> [rpc-clnt.c:368:saved_frames_unwind]<br>
| >>
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)<br>
| >> [0x7fcb1b4b0558]<br>
| >>
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)<br>
| >> [0x7fcb1b4aea63]<br>
| >>
(-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)<br>
| >> [0x7fcb1b4ae97e])))
0-HA-fast-150G-PVE1-client-1: forced<br>
| >> unwinding frame type(GlusterFS
Handshake) op(PING(3)) called<br>
| >> at 2014-08-05 12:52:42.463891
(xid=0x381884x)<br>
| >> [2014-08-05 12:52:44.607753] W<br>
| >>
[client-handshake.c:276:client_ping_cbk]<br>
| >> 0-HA-fast-150G-PVE1-client-1: timer
must have expired<br>
| >> [2014-08-05 12:52:44.607776] I<br>
| >> [client.c:2098:client_rpc_notify]<br>
| >> 0-HA-fast-150G-PVE1-client-1:
disconnected<br>
| >><br>
| >><br>
| >><br>
| >> I've got SSD disks (just for an
info).<br>
| >> Should I go and give a try for 3.5.2?<br>
| >><br>
| >><br>
| >><br>
| >> 2014-08-05 13:06 GMT+03:00 Pranith
Kumar Karampuri<br>
</div>
</div>
| >> <<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div class="">| >><br>
| >> reply along with gluster-users
please :-). Maybe you are<br>
| >> hitting 'reply' instead of 'reply
all'?<br>
| >><br>
| >> Pranith<br>
| >><br>
| >> On 08/05/2014 03:35 PM, Roman
wrote:<br>
| >>> To make sure and start clean, I've
created another VM with raw<br>
| >>> format and am going to repeat
those steps. So now I've got<br>
| >>> two VMs, one with qcow2 format
and the other with raw<br>
| >>> format. I will send another
e-mail shortly.<br>
| >>><br>
| >>><br>
| >>> 2014-08-05 13:01 GMT+03:00
Pranith Kumar Karampuri<br>
</div>
| >>> <<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>><br>
| >>><br>
| >>> On 08/05/2014 03:07 PM,
Roman wrote:<br>
| >>>> really, seems like
the same file<br>
| >>>><br>
| >>>> stor1:<br>
| >>>>
a951641c5230472929836f9fcede6b04<br>
| >>>>
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >>>><br>
| >>>> stor2:<br>
| >>>>
a951641c5230472929836f9fcede6b04<br>
| >>>>
/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >>>><br>
| >>>><br>
| >>>> one thing I've seen
from the logs is that somehow Proxmox<br>
| >>>> VE is connecting to the
servers with the wrong version?<br>
| >>>> [2014-08-05
09:23:45.218550] I<br>
| >>>>
[client-handshake.c:1659:select_server_supported_programs]<br>
| >>>>
0-HA-fast-150G-PVE1-client-0: Using Program<br>
| >>>> GlusterFS 3.3, Num
(1298437), Version (330)<br>
| >>> It is the rpc (over the
network data structures)<br>
| >>> version, which is not
changed at all from 3.3 so<br>
| >>> that's not a problem. So
what is the conclusion? Is<br>
| >>> your test case working
now or not?<br>
| >>><br>
| >>> Pranith<br>
| >>><br>
| >>>> but if I issue:<br>
| >>>> root@pve1:~#
glusterfs -V<br>
| >>>> glusterfs 3.4.4 built
on Jun 28 2014 03:44:57<br>
| >>>> seems ok.<br>
| >>>><br>
| >>>> server use 3.4.4
meanwhile<br>
| >>>> [2014-08-05
09:23:45.117875] I<br>
| >>>>
[server-handshake.c:567:server_setvolume]<br>
| >>>>
0-HA-fast-150G-PVE1-server: accepted client from<br>
| >>>>
stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0<br>
| >>>> (version: 3.4.4)<br>
| >>>> [2014-08-05
09:23:49.103035] I<br>
| >>>>
[server-handshake.c:567:server_setvolume]<br>
| >>>>
0-HA-fast-150G-PVE1-server: accepted client from<br>
| >>>>
stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0<br>
| >>>> (version: 3.4.4)<br>
| >>>><br>
| >>>> if this could be the
reason, of course.<br>
| >>>> I did restart the
Proxmox VE yesterday (just for an<br>
| >>>> information)<br>
| >>>><br>
| >>>><br>
| >>>><br>
| >>>><br>
| >>>><br>
| >>>> 2014-08-05 12:30
GMT+03:00 Pranith Kumar Karampuri<br>
</div>
</div>
| >>>> <<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>><br>
| >>>><br>
| >>>> On 08/05/2014
02:33 PM, Roman wrote:<br>
| >>>>> Waited long
enough for now, still different<br>
| >>>>> sizes and no
logs about healing :(<br>
| >>>>><br>
| >>>>> stor1<br>
| >>>>> # file:<br>
| >>>>>
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >>>>>
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000<br>
| >>>>>
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000<br>
| >>>>>
trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921<br>
| >>>>><br>
| >>>>> root@stor1:~#
du -sh<br>
| >>>>>
/exports/fast-test/150G/images/127/<br>
| >>>>> 1.2G
/exports/fast-test/150G/images/127/<br>
| >>>>><br>
| >>>>><br>
| >>>>> stor2<br>
| >>>>> # file:<br>
| >>>>>
exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >>>>>
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000<br>
| >>>>>
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000<br>
| >>>>>
trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921<br>
| >>>>><br>
| >>>>><br>
| >>>>> root@stor2:~#
du -sh<br>
| >>>>>
/exports/fast-test/150G/images/127/<br>
| >>>>> 1.4G
/exports/fast-test/150G/images/127/<br>
| >>>> According to the
changelogs, the file doesn't<br>
| >>>> need any healing.
Could you stop the operations<br>
| >>>> on the VMs and
take md5sum on both these machines?<br>
| >>>><br>
| >>>> Pranith<br>
| >>>><br>
| >>>>><br>
| >>>>><br>
| >>>>><br>
| >>>>><br>
| >>>>> 2014-08-05
11:49 GMT+03:00 Pranith Kumar<br>
| >>>>> Karampuri
<<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>> <mailto:<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>>><br>
| >>>>><br>
| >>>>> On
08/05/2014 02:06 PM, Roman wrote:<br>
| >>>>>> Well,
it seems like it doesn't see that<br>
| >>>>>>
changes were made to the volume? I<br>
| >>>>>>
created two files 200 and 100 MB (from<br>
| >>>>>>
/dev/zero) after I disconnected the first<br>
| >>>>>>
brick. Then connected it back and got<br>
| >>>>>> these
logs:<br>
| >>>>>><br>
| >>>>>>
[2014-08-05 08:30:37.830150] I<br>
| >>>>>>
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]<br>
| >>>>>>
0-glusterfs: No change in volfile, continuing<br>
| >>>>>>
[2014-08-05 08:30:37.830207] I<br>
| >>>>>>
[rpc-clnt.c:1676:rpc_clnt_reconfig]<br>
| >>>>>>
0-HA-fast-150G-PVE1-client-0: changing<br>
| >>>>>> port
to 49153 (from 0)<br>
| >>>>>>
[2014-08-05 08:30:37.830239] W<br>
| >>>>>>
[socket.c:514:__socket_rwv]<br>
| >>>>>>
0-HA-fast-150G-PVE1-client-0: readv<br>
| >>>>>>
failed (No data available)<br>
| >>>>>>
[2014-08-05 08:30:37.831024] I<br>
| >>>>>>
[client-handshake.c:1659:select_server_supported_programs]<br>
| >>>>>>
0-HA-fast-150G-PVE1-client-0: Using<br>
| >>>>>>
Program GlusterFS 3.3, Num (1298437),<br>
| >>>>>>
Version (330)<br>
| >>>>>>
[2014-08-05 08:30:37.831375] I<br>
| >>>>>>
[client-handshake.c:1456:client_setvolume_cbk]<br>
| >>>>>>
0-HA-fast-150G-PVE1-client-0: Connected<br>
| >>>>>> to <a
moz-do-not-send="true" href="http://10.250.0.1:49153"
target="_blank">10.250.0.1:49153</a><br>
</div>
</div>
| >>>>>> <<a
moz-do-not-send="true" href="http://10.250.0.1:49153"
target="_blank">http://10.250.0.1:49153</a>>, attached
to<br>
<div>
<div class="h5">| >>>>>>
remote volume '/exports/fast-test/150G'.<br>
| >>>>>>
[2014-08-05 08:30:37.831394] I<br>
| >>>>>>
[client-handshake.c:1468:client_setvolume_cbk]<br>
| >>>>>>
0-HA-fast-150G-PVE1-client-0: Server and<br>
| >>>>>>
Client lk-version numbers are not same,<br>
| >>>>>>
reopening the fds<br>
| >>>>>>
[2014-08-05 08:30:37.831566] I<br>
| >>>>>>
[client-handshake.c:450:client_set_lk_version_cbk]<br>
| >>>>>>
0-HA-fast-150G-PVE1-client-0: Server lk<br>
| >>>>>>
version = 1<br>
| >>>>>><br>
| >>>>>><br>
| >>>>>>
[2014-08-05 08:30:37.830150] I<br>
| >>>>>>
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]<br>
| >>>>>>
0-glusterfs: No change in volfile, continuing<br>
| >>>>>> this
line seems weird to me tbh.<br>
| >>>>>> I do
not see any traffic on switch<br>
| >>>>>>
interfaces between gluster servers, which<br>
| >>>>>>
means, there is no syncing between them.<br>
| >>>>>> I
tried to ls -l the files on the client<br>
| >>>>>> and
servers to trigger the healing, but<br>
| >>>>>> seems
like no success. Should I wait more?<br>
| >>>>> Yes, it
should take around 10-15 minutes.<br>
| >>>>> Could you
provide 'getfattr -d -m. -e hex<br>
| >>>>>
<file-on-brick>' on both the bricks.<br>
| >>>>><br>
| >>>>> Pranith<br>
| >>>>><br>
| >>>>>><br>
| >>>>>><br>
| >>>>>>
2014-08-05 11:25 GMT+03:00 Pranith Kumar<br>
| >>>>>>
Karampuri <<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>>>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>>>><br>
| >>>>>><br>
| >>>>>>
On 08/05/2014 01:10 PM, Roman wrote:<br>
| >>>>>>>
Ahha! For some reason I was not able<br>
| >>>>>>>
to start the VM anymore; Proxmox VE<br>
| >>>>>>>
told me that it is not able to read<br>
| >>>>>>>
the qcow2 header because permission<br>
| >>>>>>>
is denied. So I just deleted that<br>
| >>>>>>>
file and created a new VM. And the<br>
| >>>>>>>
next message I've got was this:<br>
| >>>>>>
Seems like these are the messages<br>
| >>>>>>
where you took down the bricks before<br>
| >>>>>>
self-heal. Could you restart the run<br>
| >>>>>>
waiting for self-heals to complete<br>
| >>>>>>
before taking down the next brick?<br>
| >>>>>><br>
| >>>>>>
Pranith<br>
| >>>>>><br>
| >>>>>>><br>
| >>>>>>><br>
| >>>>>>>
[2014-08-05 07:31:25.663412] E<br>
| >>>>>>>
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]<br>
| >>>>>>>
0-HA-fast-150G-PVE1-replicate-0:<br>
| >>>>>>>
Unable to self-heal contents of<br>
| >>>>>>>
'/images/124/vm-124-disk-1.qcow2'<br>
| >>>>>>>
(possible split-brain). Please<br>
| >>>>>>>
delete the file from all but the<br>
| >>>>>>>
preferred subvolume.- Pending<br>
| >>>>>>>
matrix: [ [ 0 60 ] [ 11 0 ] ]<br>
| >>>>>>>
[2014-08-05 07:31:25.663955] E<br>
| >>>>>>>
[afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]<br>
| >>>>>>>
0-HA-fast-150G-PVE1-replicate-0:<br>
| >>>>>>>
background data self-heal failed on<br>
| >>>>>>>
/images/124/vm-124-disk-1.qcow2<br>
| >>>>>>><br>
| >>>>>>><br>
| >>>>>>><br>
| >>>>>>>
2014-08-05 10:13 GMT+03:00 Pranith<br>
| >>>>>>>
Kumar Karampuri <<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>>>>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>>>>><br>
| >>>>>>>
I just responded to your earlier<br>
| >>>>>>>
mail about how the log looks.<br>
| >>>>>>>
The log comes on the mount's logfile<br>
| >>>>>>><br>
| >>>>>>>
Pranith<br>
| >>>>>>><br>
| >>>>>>>
On 08/05/2014 12:41 PM, Roman wrote:<br>
| >>>>>>>>
Ok, so I've waited enough, I<br>
| >>>>>>>>
think. There was no traffic on<br>
| >>>>>>>>
switch ports between servers.<br>
| >>>>>>>>
Could not find any suitable log<br>
| >>>>>>>>
message about completed<br>
| >>>>>>>>
self-heal (waited about 30<br>
| >>>>>>>>
minutes). Plugged out the other<br>
| >>>>>>>>
server's UTP cable this time<br>
| >>>>>>>>
and got in the same situation:<br>
| >>>>>>>>
root@gluster-test1:~# cat<br>
| >>>>>>>>
/var/log/dmesg<br>
| >>>>>>>>
-bash: /bin/cat: Input/output error<br>
| >>>>>>>><br>
| >>>>>>>>
brick logs:<br>
| >>>>>>>>
[2014-08-05 07:09:03.005474] I<br>
| >>>>>>>>
[server.c:762:server_rpc_notify]<br>
| >>>>>>>>
0-HA-fast-150G-PVE1-server:<br>
| >>>>>>>>
disconnecting connectionfrom<br>
| >>>>>>>>
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0<br>
| >>>>>>>>
[2014-08-05 07:09:03.005530] I<br>
| >>>>>>>>
[server-helpers.c:729:server_connection_put]<br>
| >>>>>>>>
0-HA-fast-150G-PVE1-server:<br>
| >>>>>>>>
Shutting down connection<br>
| >>>>>>>>
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0<br>
| >>>>>>>>
[2014-08-05 07:09:03.005560] I<br>
| >>>>>>>>
[server-helpers.c:463:do_fd_cleanup]<br>
| >>>>>>>>
0-HA-fast-150G-PVE1-server: fd<br>
| >>>>>>>>
cleanup on<br>
| >>>>>>>>
/images/124/vm-124-disk-1.qcow2<br>
| >>>>>>>>
[2014-08-05 07:09:03.005797] I<br>
| >>>>>>>>
[server-helpers.c:617:server_connection_destroy]<br>
| >>>>>>>>
0-HA-fast-150G-PVE1-server:<br>
| >>>>>>>>
destroyed connection of<br>
| >>>>>>>>
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0<br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>>
2014-08-05 9:53 GMT+03:00<br>
| >>>>>>>>
Pranith Kumar Karampuri<br>
| >>>>>>>>
<<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>>>>>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>>>>>><br>
| >>>>>>>>
Do you think it is possible<br>
| >>>>>>>>
for you to do these tests<br>
| >>>>>>>>
on the latest version<br>
| >>>>>>>>
3.5.2? 'gluster volume heal<br>
| >>>>>>>>
<volname> info' would give<br>
| >>>>>>>>
you that information in<br>
| >>>>>>>>
versions > 3.5.1.<br>
| >>>>>>>>
Otherwise you will have to<br>
| >>>>>>>>
check it from either the<br>
| >>>>>>>>
logs, there will be<br>
| >>>>>>>>
self-heal completed message<br>
| >>>>>>>>
on the mount logs (or) by<br>
| >>>>>>>>
observing 'getfattr -d -m.<br>
| >>>>>>>>
-e hex <image-file-on-bricks>'<br>
| >>>>>>>><br>
| >>>>>>>>
Pranith<br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>>
On 08/05/2014 12:09 PM,<br>
| >>>>>>>>
Roman wrote:<br>
| >>>>>>>>>
Ok, I understand. I will<br>
| >>>>>>>>>
try this shortly.<br>
| >>>>>>>>>
How can I be sure that the<br>
| >>>>>>>>>
healing process is done,<br>
| >>>>>>>>>
if I am not able to see<br>
| >>>>>>>>>
its status?<br>
| >>>>>>>>><br>
| >>>>>>>>><br>
| >>>>>>>>>
2014-08-05 9:30 GMT+03:00<br>
| >>>>>>>>>
Pranith Kumar Karampuri<br>
| >>>>>>>>>
<<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>>>>>>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>>>>>>><br>
| >>>>>>>>>
Mounts will do the<br>
| >>>>>>>>>
healing, not the<br>
| >>>>>>>>>
self-heal-daemon. The<br>
| >>>>>>>>>
problem I feel is that<br>
| >>>>>>>>>
whichever process does<br>
| >>>>>>>>>
the healing has the<br>
| >>>>>>>>>
latest information<br>
| >>>>>>>>>
about the good bricks<br>
| >>>>>>>>>
in this usecase. Since<br>
| >>>>>>>>>
for VM usecase, mounts<br>
| >>>>>>>>>
should have the latest<br>
| >>>>>>>>>
information, we should<br>
| >>>>>>>>>
let the mounts do the<br>
| >>>>>>>>>
healing. If the mount<br>
| >>>>>>>>>
accesses the VM image<br>
| >>>>>>>>>
either by someone<br>
| >>>>>>>>>
doing operations<br>
| >>>>>>>>>
inside the VM or<br>
| >>>>>>>>>
explicit stat on the<br>
| >>>>>>>>>
file it should do the<br>
| >>>>>>>>>
healing.<br>
| >>>>>>>>><br>
| >>>>>>>>>
Pranith.<br>
| >>>>>>>>><br>
| >>>>>>>>><br>
| >>>>>>>>>
On 08/05/2014 10:39<br>
| >>>>>>>>>
AM, Roman wrote:<br>
| >>>>>>>>>>
Hmmm, you told me to<br>
| >>>>>>>>>>
turn it off. Did I<br>
| >>>>>>>>>>
understand something<br>
| >>>>>>>>>>
wrong? After I issued<br>
| >>>>>>>>>>
the command you've<br>
| >>>>>>>>>>
sent me, I was not<br>
| >>>>>>>>>>
able to watch the<br>
| >>>>>>>>>>
healing process, it<br>
| >>>>>>>>>>
said, it won't be<br>
| >>>>>>>>>>
healed, because it's<br>
| >>>>>>>>>>
turned off.<br>
| >>>>>>>>>><br>
| >>>>>>>>>><br>
| >>>>>>>>>>
2014-08-05 5:39<br>
| >>>>>>>>>>
GMT+03:00 Pranith<br>
| >>>>>>>>>>
Kumar Karampuri<br>
| >>>>>>>>>>
<<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>>>>>>>
<mailto:<a moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">| >>>>>>>>>><br>
| >>>>>>>>>>
You didn't<br>
| >>>>>>>>>>
mention anything<br>
| >>>>>>>>>>
about<br>
| >>>>>>>>>>
self-healing. Did<br>
| >>>>>>>>>>
you wait until<br>
| >>>>>>>>>>
the self-heal is<br>
| >>>>>>>>>>
complete?<br>
| >>>>>>>>>><br>
| >>>>>>>>>>
Pranith<br>
| >>>>>>>>>><br>
| >>>>>>>>>>
On 08/04/2014<br>
| >>>>>>>>>>
05:49 PM, Roman<br>
| >>>>>>>>>>
wrote:<br>
| >>>>>>>>>>>
Hi!<br>
| >>>>>>>>>>>
Result is pretty<br>
| >>>>>>>>>>>
same. I set the<br>
| >>>>>>>>>>>
switch port down<br>
| >>>>>>>>>>>
for 1st server,<br>
| >>>>>>>>>>>
it was ok. Then<br>
| >>>>>>>>>>>
set it up back<br>
| >>>>>>>>>>>
and set other<br>
| >>>>>>>>>>>
server's port<br>
| >>>>>>>>>>>
off. and it<br>
| >>>>>>>>>>>
triggered IO<br>
| >>>>>>>>>>>
error on two<br>
| >>>>>>>>>>>
virtual<br>
| >>>>>>>>>>>
machines: one<br>
| >>>>>>>>>>>
with local root<br>
| >>>>>>>>>>>
FS but network<br>
| >>>>>>>>>>>
mounted storage.<br>
| >>>>>>>>>>>
and other with<br>
| >>>>>>>>>>>
network root FS.<br>
| >>>>>>>>>>>
1st gave an<br>
| >>>>>>>>>>>
error on copying<br>
| >>>>>>>>>>>
to or from the<br>
| >>>>>>>>>>>
mounted network<br>
| >>>>>>>>>>>
disk, other just<br>
| >>>>>>>>>>>
gave me an error<br>
| >>>>>>>>>>>
for even reading<br>
| >>>>>>>>>>>
log.files.<br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
cat:<br>
| >>>>>>>>>>>
/var/log/alternatives.log:<br>
| >>>>>>>>>>>
Input/output error<br>
| >>>>>>>>>>>
then I reset the<br>
| >>>>>>>>>>>
kvm VM and it<br>
| >>>>>>>>>>>
said me, there<br>
| >>>>>>>>>>>
is no boot<br>
| >>>>>>>>>>>
device. Next I<br>
| >>>>>>>>>>>
virtually<br>
| >>>>>>>>>>>
powered it off<br>
| >>>>>>>>>>>
and then back on<br>
| >>>>>>>>>>>
and it has booted.<br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
By the way, did<br>
| >>>>>>>>>>>
I have to<br>
| >>>>>>>>>>>
start/stop volume?<br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
>> Could you do<br>
| >>>>>>>>>>>
the following<br>
| >>>>>>>>>>>
and test it again?<br>
| >>>>>>>>>>>
>> gluster
volume<br>
| >>>>>>>>>>>
set <volname><br>
| >>>>>>>>>>>
cluster.self-heal-daemon<br>
| >>>>>>>>>>>
off<br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
>>Pranith<br>
| >>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
2014-08-04 14:10<br>
| >>>>>>>>>>>
GMT+03:00<br>
| >>>>>>>>>>>
Pranith Kumar<br>
| >>>>>>>>>>>
Karampuri<br>
| >>>>>>>>>>>
<<a
moz-do-not-send="true"
href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div>
</div>
| >>>>>>>>>>>
<mailto:<a
moz-do-not-send="true" href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div>
<div class="h5">|
>>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
On<br>
| >>>>>>>>>>>
08/04/2014<br>
| >>>>>>>>>>>
03:33 PM,<br>
| >>>>>>>>>>>
Roman wrote:<br>
| >>>>>>>>>>>>
Hello!<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
Facing the<br>
| >>>>>>>>>>>>
same<br>
| >>>>>>>>>>>>
problem as<br>
| >>>>>>>>>>>>
mentioned<br>
| >>>>>>>>>>>>
here:<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
<a
moz-do-not-send="true"
href="http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html"
target="_blank">http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html</a><br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
my set up<br>
| >>>>>>>>>>>>
is up and<br>
| >>>>>>>>>>>>
running, so<br>
| >>>>>>>>>>>>
I'm ready<br>
| >>>>>>>>>>>>
to help you<br>
| >>>>>>>>>>>>
back with<br>
| >>>>>>>>>>>>
feedback.<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
setup:<br>
| >>>>>>>>>>>>
proxmox<br>
| >>>>>>>>>>>>
server as<br>
| >>>>>>>>>>>>
client<br>
| >>>>>>>>>>>>
2 gluster<br>
| >>>>>>>>>>>>
physical<br>
| >>>>>>>>>>>>
servers<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
server side<br>
| >>>>>>>>>>>>
and client<br>
| >>>>>>>>>>>>
side both<br>
| >>>>>>>>>>>>
running atm<br>
| >>>>>>>>>>>>
3.4.4<br>
| >>>>>>>>>>>>
glusterfs<br>
| >>>>>>>>>>>>
from<br>
| >>>>>>>>>>>>
gluster repo.<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
the problem
is:<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
1. created<br>
| >>>>>>>>>>>>
replica
bricks.<br>
| >>>>>>>>>>>>
2. mounted<br>
| >>>>>>>>>>>>
in proxmox<br>
| >>>>>>>>>>>>
(tried both<br>
| >>>>>>>>>>>>
promox<br>
| >>>>>>>>>>>>
ways: via<br>
| >>>>>>>>>>>>
GUI and<br>
| >>>>>>>>>>>>
fstab (with<br>
| >>>>>>>>>>>>
backup<br>
| >>>>>>>>>>>>
volume<br>
| >>>>>>>>>>>>
line), btw<br>
| >>>>>>>>>>>>
while<br>
| >>>>>>>>>>>>
mounting<br>
| >>>>>>>>>>>>
via fstab<br>
| >>>>>>>>>>>>
I'm unable<br>
| >>>>>>>>>>>>
to launch a<br>
| >>>>>>>>>>>>
VM without<br>
| >>>>>>>>>>>>
cache,<br>
| >>>>>>>>>>>>
meanwhile<br>
| >>>>>>>>>>>>
direct-io-mode<br>
| >>>>>>>>>>>>
is enabled<br>
| >>>>>>>>>>>>
in fstab
line)<br>
| >>>>>>>>>>>>
3. installed
VM<br>
| >>>>>>>>>>>>
4. bring<br>
| >>>>>>>>>>>>
one volume<br>
| >>>>>>>>>>>>
down - ok<br>
| >>>>>>>>>>>>
5. bringing<br>
| >>>>>>>>>>>>
up, waiting<br>
| >>>>>>>>>>>>
for sync is<br>
| >>>>>>>>>>>>
done.<br>
| >>>>>>>>>>>>
6. bring<br>
| >>>>>>>>>>>>
other<br>
| >>>>>>>>>>>>
volume down<br>
| >>>>>>>>>>>>
- getting<br>
| >>>>>>>>>>>>
IO errors<br>
| >>>>>>>>>>>>
on VM guest<br>
| >>>>>>>>>>>>
and not<br>
| >>>>>>>>>>>>
able to<br>
| >>>>>>>>>>>>
restore the<br>
| >>>>>>>>>>>>
VM after I<br>
| >>>>>>>>>>>>
reset the<br>
| >>>>>>>>>>>>
VM via<br>
| >>>>>>>>>>>>
host. It<br>
| >>>>>>>>>>>>
says (no<br>
| >>>>>>>>>>>>
bootable<br>
| >>>>>>>>>>>>
media).<br>
| >>>>>>>>>>>>
After I<br>
| >>>>>>>>>>>>
shut it<br>
| >>>>>>>>>>>>
down<br>
| >>>>>>>>>>>>
(forced)<br>
| >>>>>>>>>>>>
and bring<br>
| >>>>>>>>>>>>
back up, it<br>
| >>>>>>>>>>>>
boots.<br>
| >>>>>>>>>>>
Could you do<br>
| >>>>>>>>>>>
the<br>
| >>>>>>>>>>>
following<br>
| >>>>>>>>>>>
and test it<br>
| >>>>>>>>>>>
again?<br>
| >>>>>>>>>>>
gluster<br>
| >>>>>>>>>>>
volume set<br>
| >>>>>>>>>>>
<volname><br>
| >>>>>>>>>>>
cluster.self-heal-daemon<br>
| >>>>>>>>>>>
off<br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
Pranith<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
Need help.<br>
| >>>>>>>>>>>>
Tried<br>
| >>>>>>>>>>>>
3.4.3, 3.4.4.<br>
| >>>>>>>>>>>>
Still<br>
| >>>>>>>>>>>>
missing<br>
| >>>>>>>>>>>>
pkg-s for<br>
| >>>>>>>>>>>>
3.4.5 for<br>
| >>>>>>>>>>>>
debian and<br>
| >>>>>>>>>>>>
3.5.2<br>
| >>>>>>>>>>>>
(3.5.1<br>
| >>>>>>>>>>>>
always<br>
| >>>>>>>>>>>>
gives a<br>
| >>>>>>>>>>>>
healing<br>
| >>>>>>>>>>>>
error for<br>
| >>>>>>>>>>>>
some reason)<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
--<br>
| >>>>>>>>>>>>
Best regards,<br>
| >>>>>>>>>>>>
Roman.<br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>><br>
| >>>>>>>>>>>>
_______________________________________________<br>
| >>>>>>>>>>>>
Gluster-users<br>
| >>>>>>>>>>>>
mailing list<br>
| >>>>>>>>>>>>
<a
moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>
</div>
</div>
| >>>>>>>>>>>>
<mailto:<a
moz-do-not-send="true"
href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>><br>
<div class="HOEnZb">
<div class="h5">|
>>>>>>>>>>>>
<a
moz-do-not-send="true"
href="http://supercolony.gluster.org/mailman/listinfo/gluster-users"
target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>><br>
| >>>>>>>>>>>
--<br>
| >>>>>>>>>>>
Best regards,<br>
| >>>>>>>>>>>
Roman.<br>
| >>>>>>>>>><br>
| >>>>>>>>>><br>
| >>>>>>>>>><br>
| >>>>>>>>>><br>
| >>>>>>>>>>
--<br>
| >>>>>>>>>>
Best regards,<br>
| >>>>>>>>>>
Roman.<br>
| >>>>>>>>><br>
| >>>>>>>>><br>
| >>>>>>>>><br>
| >>>>>>>>><br>
| >>>>>>>>>
--<br>
| >>>>>>>>>
Best regards,<br>
| >>>>>>>>>
Roman.<br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>><br>
| >>>>>>>>
--<br>
| >>>>>>>>
Best regards,<br>
| >>>>>>>>
Roman.<br>
| >>>>>>><br>
| >>>>>>><br>
| >>>>>>><br>
| >>>>>>><br>
| >>>>>>>
--<br>
| >>>>>>>
Best regards,<br>
| >>>>>>>
Roman.<br>
| >>>>>><br>
| >>>>>><br>
| >>>>>><br>
| >>>>>><br>
| >>>>>> --<br>
| >>>>>> Best
regards,<br>
| >>>>>>
Roman.<br>
| >>>>><br>
| >>>>><br>
| >>>>><br>
| >>>>><br>
| >>>>> --<br>
| >>>>> Best regards,<br>
| >>>>> Roman.<br>
| >>>><br>
| >>>><br>
| >>>><br>
| >>>><br>
| >>>> --<br>
| >>>> Best regards,<br>
| >>>> Roman.<br>
| >>><br>
| >>><br>
| >>><br>
| >>><br>
| >>> --<br>
| >>> Best regards,<br>
| >>> Roman.<br>
| >><br>
| >><br>
| >><br>
| >><br>
| >> --<br>
| >> Best regards,<br>
| >> Roman.<br>
| >><br>
| >><br>
| >><br>
| >><br>
| >> --<br>
| >> Best regards,<br>
| >> Roman.<br>
| ><br>
| ><br>
| ><br>
| ><br>
| > --<br>
| > Best regards,<br>
| > Roman.<br>
|<br>
|<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman.
</div>
</blockquote>
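<div>A rough guide to reading the getfattr output quoted above, assuming
the usual AFR changelog layout (a sketch, not an authoritative
reference):<br>
<pre>
# run on each brick server (stor1 and stor2)
getfattr -d -m . -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2

# trusted.afr.&lt;volume&gt;-client-N packs three 4-byte counters:
#   data pending | metadata pending | entry pending
# all zeros            -> nothing to heal for that brick
# non-zero             -> this brick holds changes still pending for brick N
# each brick blaming the other (as in the output above) -> split-brain
</pre>
</div>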
<br>
</body>
</html>