<div dir="ltr">Hello all again!<div>I'm back from vacation and I'm pretty happy with 3.5.2 available for wheezy. Thanks! Just made my updates.</div><div>For 3.5.2 do I still have to set cluster.self-heal-daemon to off?</div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-08-06 12:49 GMT+03:00 Humble Chirammal <span dir="ltr"><<a href="mailto:hchiramm@redhat.com" target="_blank">hchiramm@redhat.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class=""><br>
<br>
<br>
----- Original Message -----
| From: "Pranith Kumar Karampuri" <pkarampu@redhat.com>
| To: "Roman" <romeo.r@gmail.com>
| Cc: gluster-users@gluster.org, "Niels de Vos" <ndevos@redhat.com>, "Humble Chirammal" <hchiramm@redhat.com>
| Sent: Wednesday, August 6, 2014 12:09:57 PM
| Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
|
| Roman,
| The file went into split-brain. I think we should do these tests
| with 3.5.2, where monitoring the heals is easier. Let me also come up
| with a document about how to do the testing you are trying to do.
|
| Humble/Niels,
| Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
| issue where /usr/bin/glfsheal was not packaged along with the deb. I
| think that should be fixed now as well?
|
Pranith,

The 3.5.2 packages for Debian are not available yet. We are coordinating internally to get them processed.
I will update the list once they are available.

--Humble
<div class="">|<br>
| On 08/06/2014 11:52 AM, Roman wrote:<br>
| > good morning,<br>
| ><br>
| > root@stor1:~# getfattr -d -m. -e hex<br>
| > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| > getfattr: Removing leading '/' from absolute path names<br>
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000<br>
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000<br>
| > trusted.gfid=0x23c79523075a4158bea38078da570449<br>
| ><br>
| > getfattr: Removing leading '/' from absolute path names<br>
| > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000<br>
| > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000<br>
| > trusted.gfid=0x23c79523075a4158bea38078da570449<br>
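Looking back at these outputs: as far as I understand the trusted.afr changelog values, each one is three 32-bit counters (data/metadata/entry) of operations still pending towards the other brick. Above, stor1 holds a non-zero data count against client-1 and stor2 a non-zero data count against client-0, i.e. each brick accuses the other, which I assume is what makes this a split-brain. The way I collect both sides in one go (assuming root ssh to both storage nodes):

    for h in stor1 stor2; do
        echo "== $h =="
        ssh root@$h getfattr -d -m. -e hex \
            /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
    done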
| >
| >
| >
| > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu@redhat.com>:
<div class="">| ><br>
| ><br>
| > On 08/06/2014 11:30 AM, Roman wrote:<br>
| >> Also, this time files are not the same!<br>
| >><br>
| >> root@stor1:~# md5sum<br>
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >> 32411360c53116b96a059f17306caeda<br>
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >><br>
| >> root@stor2:~# md5sum<br>
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9<br>
| >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| > What is the getfattr output?
| >
| > Pranith
| >
| >>
| >>
| >> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r@gmail.com>:
| >>
| >> Nope, it is not working. But this time it went a bit differently.
| >>
| >> root@gluster-client:~# dmesg
| >> Segmentation fault
| >>
| >>
| >> I was not even able to start the VM after I had done the tests:
| >>
| >> Could not read qcow2 header: Operation not permitted
| >>
| >> And it seems it never starts to sync the files after the first
| >> disconnect. The VM survives the first disconnect, but not the second
| >> (I waited around 30 minutes). Also, I've got network.ping-timeout: 2
| >> in the volume settings, but the logs reacted to the first disconnect
| >> only after around 30 seconds; the second one was faster, 2 seconds.
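For what it's worth, to rule out a typo on my side I double-check the timeout actually applied on the volume like this:

    gluster volume info HA-fast-150G-PVE1 | grep ping-timeout

My guess is that the slow first reaction came from a TCP-level timeout (the log below says "Connection timed out") rather than from the ping timer, but I have not verified that.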
| >>
| >> Reaction was different also:
| >>
| >> slower one:
| >> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv]
| >> 0-glusterfs: readv failed (Connection timed out)
| >> [2014-08-05 13:26:19.558485] W
| >> [socket.c:1962:__socket_proto_state_machine] 0-glusterfs:
| >> reading from socket failed. Error (Connection timed out),
| >> peer (10.250.0.1:24007)
| >> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv]
| >> 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
| >> [2014-08-05 13:26:21.281474] W
| >> [socket.c:1962:__socket_proto_state_machine]
| >> 0-HA-fast-150G-PVE1-client-0: reading from socket failed.
| >> Error (Connection timed out), peer (10.250.0.1:49153)
| >> [2014-08-05 13:26:21.281507] I
| >> [client.c:2098:client_rpc_notify]
| >> 0-HA-fast-150G-PVE1-client-0: disconnected
| >>
| >> the fast one:
| >> [2014-08-05 12:52:44.607389] C
| >> [client-handshake.c:127:rpc_client_ping_timer_expired]
| >> 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
| >> has not responded in the last 2
| >> seconds, disconnecting.
| >> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv]
| >> 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
| >> [2014-08-05 12:52:44.607585] E
| >> [rpc-clnt.c:368:saved_frames_unwind]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >> [0x7fcb1b4b0558]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >> [0x7fcb1b4aea63]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at
| >> 2014-08-05 12:52:42.463881 (xid=0x381883x)
| >> [2014-08-05 12:52:44.607604] W
| >> [client-rpc-fops.c:2624:client3_3_lookup_cbk]
| >> 0-HA-fast-150G-PVE1-client-1: remote operation failed:
| >> Transport endpoint is not connected. Path: /
| >> (00000000-0000-0000-0000-000000000001)
| >> [2014-08-05 12:52:44.607736] E
| >> [rpc-clnt.c:368:saved_frames_unwind]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
| >> [0x7fcb1b4b0558]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
| >> [0x7fcb1b4aea63]
| >> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
| >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
| >> unwinding frame type(GlusterFS Handshake) op(PING(3)) called
| >> at 2014-08-05 12:52:42.463891 (xid=0x381884x)
| >> [2014-08-05 12:52:44.607753] W
| >> [client-handshake.c:276:client_ping_cbk]
| >> 0-HA-fast-150G-PVE1-client-1: timer must have expired
| >> [2014-08-05 12:52:44.607776] I
| >> [client.c:2098:client_rpc_notify]
| >> 0-HA-fast-150G-PVE1-client-1: disconnected
| >>
| >>
| >>
| >> I've got SSD disks (just for info).
| >> Should I go and give 3.5.2 a try?
| >>
| >>
| >>
| >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri
| >> <pkarampu@redhat.com>:
<div class="">| >><br>
| >> reply along with gluster-users please :-). May be you are<br>
| >> hitting 'reply' instead of 'reply all'?<br>
| >><br>
| >> Pranith<br>
| >><br>
| >> On 08/05/2014 03:35 PM, Roman wrote:<br>
| >>> To make sure and clean, I've created another VM with raw<br>
| >>> format and goint to repeat those steps. So now I've got<br>
| >>> two VM-s one with qcow2 format and other with raw<br>
| >>> format. I will send another e-mail shortly.<br>
| >>><br>
| >>><br>
| >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri<br>
</div>| >>> <<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a> <mailto:<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div><div class="h5">| >>><br>
| >>>
| >>> On 08/05/2014 03:07 PM, Roman wrote:
| >>>> really, it seems like the same file
| >>>>
| >>>> stor1:
| >>>> a951641c5230472929836f9fcede6b04
| >>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>> stor2:
| >>>> a951641c5230472929836f9fcede6b04
| >>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
| >>>>
| >>>>
| >>>> One thing I've seen from the logs: somehow Proxmox
| >>>> VE seems to be connecting to the servers with the wrong version?
| >>>> [2014-08-05 09:23:45.218550] I
| >>>> [client-handshake.c:1659:select_server_supported_programs]
| >>>> 0-HA-fast-150G-PVE1-client-0: Using Program
| >>>> GlusterFS 3.3, Num (1298437), Version (330)
| >>> It is the RPC (over-the-network data structures)
| >>> version, which has not changed at all since 3.3, so
| >>> that's not a problem. So what is the conclusion? Is
| >>> your test case working now or not?
| >>>
| >>> Pranith
| >>>
| >>>> but if I issue:
| >>>> root@pve1:~# glusterfs -V
| >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
| >>>> it seems ok.
| >>>>
| >>>> The servers use 3.4.4 meanwhile:
| >>>> [2014-08-05 09:23:45.117875] I
| >>>> [server-handshake.c:567:server_setvolume]
| >>>> 0-HA-fast-150G-PVE1-server: accepted client from
| >>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
| >>>> (version: 3.4.4)
| >>>> [2014-08-05 09:23:49.103035] I
| >>>> [server-handshake.c:567:server_setvolume]
| >>>> 0-HA-fast-150G-PVE1-server: accepted client from
| >>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
| >>>> (version: 3.4.4)
| >>>>
| >>>> if this could be the reason, of course.
| >>>> I did restart the Proxmox VE yesterday (just for information).
| >>>>
| >>>>
| >>>>
| >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri
| >>>> <pkarampu@redhat.com>:
<div><div class="h5">| >>>><br>
| >>>><br>
| >>>> On 08/05/2014 02:33 PM, Roman wrote:<br>
| >>>>> Waited long enough for now, still different<br>
| >>>>> sizes and no logs about healing :(<br>
| >>>>><br>
| >>>>> stor1<br>
| >>>>> # file:<br>
| >>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000<br>
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000<br>
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921<br>
| >>>>><br>
| >>>>> root@stor1:~# du -sh<br>
| >>>>> /exports/fast-test/150G/images/127/<br>
| >>>>> 1.2G /exports/fast-test/150G/images/127/<br>
| >>>>><br>
| >>>>><br>
| >>>>> stor2<br>
| >>>>> # file:<br>
| >>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2<br>
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000<br>
| >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000<br>
| >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921<br>
| >>>>><br>
| >>>>><br>
| >>>>> root@stor2:~# du -sh<br>
| >>>>> /exports/fast-test/150G/images/127/<br>
| >>>>> 1.4G /exports/fast-test/150G/images/127/<br>
| >>>> According to the changelogs, the file doesn't<br>
| >>>> need any healing. Could you stop the operations<br>
| >>>> on the VMs and take md5sum on both these machines?<br>
| >>>><br>
| >>>> Pranith<br>
| >>>><br>
| >>>>><br>
| >>>>><br>
| >>>>><br>
| >>>>><br>
| >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar<br>
| >>>>> Karampuri <<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a><br>
</div></div>| >>>>> <mailto:<a href="mailto:pkarampu@redhat.com">pkarampu@redhat.com</a>>>:<br>
<div><div class="h5">| >>>>><br>
| >>>>>
| >>>>> On 08/05/2014 02:06 PM, Roman wrote:
| >>>>>> Well, it seems like it doesn't see that changes
| >>>>>> were made to the volume? I created two files,
| >>>>>> 200 and 100 MB (from /dev/zero), after I
| >>>>>> disconnected the first brick. Then I connected it
| >>>>>> back and got these logs:
| >>>>>>
| >>>>>> [2014-08-05 08:30:37.830150] I
| >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>> 0-glusterfs: No change in volfile, continuing
| >>>>>> [2014-08-05 08:30:37.830207] I
| >>>>>> [rpc-clnt.c:1676:rpc_clnt_reconfig]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: changing
| >>>>>> port to 49153 (from 0)
| >>>>>> [2014-08-05 08:30:37.830239] W
| >>>>>> [socket.c:514:__socket_rwv]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: readv
| >>>>>> failed (No data available)
| >>>>>> [2014-08-05 08:30:37.831024] I
| >>>>>> [client-handshake.c:1659:select_server_supported_programs]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Using
| >>>>>> Program GlusterFS 3.3, Num (1298437),
| >>>>>> Version (330)
| >>>>>> [2014-08-05 08:30:37.831375] I
| >>>>>> [client-handshake.c:1456:client_setvolume_cbk]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Connected
| >>>>>> to 10.250.0.1:49153, attached to
| >>>>>> remote volume '/exports/fast-test/150G'.
| >>>>>> [2014-08-05 08:30:37.831394] I
| >>>>>> [client-handshake.c:1468:client_setvolume_cbk]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Server and
| >>>>>> Client lk-version numbers are not same,
| >>>>>> reopening the fds
| >>>>>> [2014-08-05 08:30:37.831566] I
| >>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
| >>>>>> 0-HA-fast-150G-PVE1-client-0: Server lk
| >>>>>> version = 1
| >>>>>>
| >>>>>>
| >>>>>> [2014-08-05 08:30:37.830150] I
| >>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
| >>>>>> 0-glusterfs: No change in volfile, continuing
| >>>>>>
| >>>>>> This line seems weird to me, tbh.
| >>>>>> I do not see any traffic on the switch
| >>>>>> interfaces between the gluster servers, which
| >>>>>> means there is no syncing between them.
| >>>>>> I tried to ls -l the files on the client
| >>>>>> and the servers to trigger the healing, but
| >>>>>> seemingly with no success. Should I wait more?
| >>>>> Yes, it should take around 10-15 minutes.
| >>>>> Could you provide 'getfattr -d -m. -e hex
| >>>>> <file-on-brick>' from both of the bricks?
| >>>>>
| >>>>> Pranith
| >>>>>
| >>>>>>
| >>>>>>
| >>>>>>
| >>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri
| >>>>>> <pkarampu@redhat.com>:
| >>>>>>
| >>>>>>
| >>>>>> On 08/05/2014 01:10 PM, Roman wrote:
| >>>>>>> Ahha! For some reason I was not able to start the VM
| >>>>>>> anymore; Proxmox VE told me that it was not able to read
| >>>>>>> the qcow2 header because permission was denied for some
| >>>>>>> reason. So I just deleted that file and created a new VM.
| >>>>>>> And the next message I've got was this:
| >>>>>> These seem to be the messages from the point where you took
| >>>>>> down the bricks before the self-heal had completed. Could you
| >>>>>> restart the run, waiting for self-heals to complete before
| >>>>>> taking down the next brick?
| >>>>>>
| >>>>>> Pranith
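For my next runs I plan to script that wait instead of guessing; a rough sketch, assuming 3.5.2 where 'gluster volume heal <volname> info' is available (on 3.4.4 I would have to watch the getfattr changelogs instead):

    VOL=HA-fast-150G-PVE1
    # after bringing the first brick back, wait until nothing is left to heal
    while gluster volume heal $VOL info | grep -q "Number of entries: [1-9]"; do
        sleep 30
    done
    # only then take the other brick down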
| >>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>> [2014-08-05 07:31:25.663412] E
| >>>>>>> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
| >>>>>>> 0-HA-fast-150G-PVE1-replicate-0:
| >>>>>>> Unable to self-heal contents of
| >>>>>>> '/images/124/vm-124-disk-1.qcow2'
| >>>>>>> (possible split-brain). Please
| >>>>>>> delete the file from all but the
| >>>>>>> preferred subvolume.- Pending
| >>>>>>> matrix: [ [ 0 60 ] [ 11 0 ] ]
| >>>>>>> [2014-08-05 07:31:25.663955] E
| >>>>>>> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
| >>>>>>> 0-HA-fast-150G-PVE1-replicate-0:
| >>>>>>> background data self-heal failed on
| >>>>>>> /images/124/vm-124-disk-1.qcow2
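In hindsight, instead of deleting the whole image and recreating the VM, I could probably have followed the advice in that log line and removed only the copy on the brick I did not trust, together with its .glusterfs hard link, and then let a heal recreate it from the good brick. This is just my understanding of the usual procedure for these versions, not something I have verified, and the gfid path below is only a placeholder:

    # on the brick I do NOT trust, and only there:
    BRICK=/exports/fast-test/150G
    F=$BRICK/images/124/vm-124-disk-1.qcow2

    # note the gfid first; it names the hard link under .glusterfs/
    getfattr -n trusted.gfid -e hex "$F"

    rm "$F"
    # also remove $BRICK/.glusterfs/xx/yy/<full-gfid>, where xx and yy are the
    # first two byte pairs of the gfid printed above (placeholder path here)

    # then stat the file from the mount (or run a heal) so it is rebuilt
    # from the good copy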
| >>>>>>>
| >>>>>>>
| >>>>>>>
| >>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri
| >>>>>>> <pkarampu@redhat.com>:
| >>>>>>>
| >>>>>>> I just responded to your earlier mail about how the log
| >>>>>>> looks. The log appears in the mount's logfile.
| >>>>>>>
| >>>>>>> Pranith
| >>>>>>>
| >>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
| >>>>>>>> OK, so I've waited enough, I think. There was no traffic
| >>>>>>>> on the switch ports between the servers. I could not find
| >>>>>>>> any suitable log message about a completed self-heal
| >>>>>>>> (I waited about 30 minutes). I pulled out the other
| >>>>>>>> server's UTP cable this time and got into the same situation:
| >>>>>>>> root@gluster-test1:~# cat /var/log/dmesg
| >>>>>>>> -bash: /bin/cat: Input/output error
| >>>>>>>>
| >>>>>>>> brick logs:
| >>>>>>>> [2014-08-05 07:09:03.005474] I
| >>>>>>>> [server.c:762:server_rpc_notify]
| >>>>>>>> 0-HA-fast-150G-PVE1-server:
| >>>>>>>> disconnecting connectionfrom
| >>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>> [2014-08-05 07:09:03.005530] I
| >>>>>>>> [server-helpers.c:729:server_connection_put]
| >>>>>>>> 0-HA-fast-150G-PVE1-server:
| >>>>>>>> Shutting down connection
| >>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>> [2014-08-05 07:09:03.005560] I
| >>>>>>>> [server-helpers.c:463:do_fd_cleanup]
| >>>>>>>> 0-HA-fast-150G-PVE1-server: fd
| >>>>>>>> cleanup on
| >>>>>>>> /images/124/vm-124-disk-1.qcow2
| >>>>>>>> [2014-08-05 07:09:03.005797] I
| >>>>>>>> [server-helpers.c:617:server_connection_destroy]
| >>>>>>>> 0-HA-fast-150G-PVE1-server:
| >>>>>>>> destroyed connection of
| >>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
| >>>>>>>>
| >>>>>>>>
| >>>>>>>>
| >>>>>>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
| >>>>>>>> <pkarampu@redhat.com>:
<div><div class="h5">| >>>>>>>><br>
| >>>>>>>> Do you think it is possible<br>
| >>>>>>>> for you to do these tests<br>
| >>>>>>>> on the latest version<br>
| >>>>>>>> 3.5.2? 'gluster volume heal<br>
| >>>>>>>> <volname> info' would give<br>
| >>>>>>>> you that information in<br>
| >>>>>>>> versions > 3.5.1.<br>
| >>>>>>>> Otherwise you will have to<br>
| >>>>>>>> check it from either the<br>
| >>>>>>>> logs, there will be<br>
| >>>>>>>> self-heal completed message<br>
| >>>>>>>> on the mount logs (or) by<br>
| >>>>>>>> observing 'getfattr -d -m.<br>
| >>>>>>>> -e hex <image-file-on-bricks>'<br>
| >>>>>>>><br>
| >>>>>>>> Pranith<br>
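Now that I am on 3.5.2, this is roughly what I intend to use for monitoring on the next run (a sketch; the exact wording of the completion message in the mount log may differ between versions, so the grep pattern below is a guess, and the log file name depends on where the volume is mounted):

    VOL=HA-fast-150G-PVE1
    gluster volume heal $VOL info               # pending entries per brick
    gluster volume heal $VOL info split-brain   # anything already in split-brain
    grep -i "self-heal.*completed" /var/log/glusterfs/mnt-pve-*.log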
| >>>>>>>>
| >>>>>>>>
| >>>>>>>> On 08/05/2014 12:09 PM, Roman wrote:
| >>>>>>>>> Ok, I understand. I will try this shortly.
| >>>>>>>>> How can I be sure that the healing process is done
| >>>>>>>>> if I am not able to see its status?
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri
| >>>>>>>>> <pkarampu@redhat.com>:
<div><div class="h5">| >>>>>>>>><br>
| >>>>>>>>> Mounts will do the<br>
| >>>>>>>>> healing, not the<br>
| >>>>>>>>> self-heal-daemon. The<br>
| >>>>>>>>> problem I feel is that<br>
| >>>>>>>>> whichever process does<br>
| >>>>>>>>> the healing has the<br>
| >>>>>>>>> latest information<br>
| >>>>>>>>> about the good bricks<br>
| >>>>>>>>> in this usecase. Since<br>
| >>>>>>>>> for VM usecase, mounts<br>
| >>>>>>>>> should have the latest<br>
| >>>>>>>>> information, we should<br>
| >>>>>>>>> let the mounts do the<br>
| >>>>>>>>> healing. If the mount<br>
| >>>>>>>>> accesses the VM image<br>
| >>>>>>>>> either by someone<br>
| >>>>>>>>> doing operations<br>
| >>>>>>>>> inside the VM or<br>
| >>>>>>>>> explicit stat on the<br>
| >>>>>>>>> file it should do the<br>
| >>>>>>>>> healing.<br>
| >>>>>>>>><br>
| >>>>>>>>> Pranith.<br>
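The kind of explicit trigger I use from the Proxmox node looks like this (the mount point is an assumption based on my setup, adjust as needed):

    # a lookup/stat through the glusterfs mount makes the client notice the
    # pending changelog and kick off a background heal of the image
    stat /mnt/pve/HA-fast-150G-PVE1/images/127/vm-127-disk-1.qcow2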
| >>>>>>>>>
| >>>>>>>>>
| >>>>>>>>> On 08/05/2014 10:39 AM, Roman wrote:
| >>>>>>>>>> Hmmm, you told me to turn it off. Did I understand
| >>>>>>>>>> something wrong? After I issued the command you sent
| >>>>>>>>>> me, I was not able to watch the healing process; it
| >>>>>>>>>> said it won't be healed because it's turned off.
| >>>>>>>>>>
| >>>>>>>>>>
| >>>>>>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri
| >>>>>>>>>> <pkarampu@redhat.com>:
| >>>>>>>>>>
| >>>>>>>>>> You didn't<br>
| >>>>>>>>>> mention anything<br>
| >>>>>>>>>> about<br>
| >>>>>>>>>> self-healing. Did<br>
| >>>>>>>>>> you wait until<br>
| >>>>>>>>>> the self-heal is<br>
| >>>>>>>>>> complete?<br>
| >>>>>>>>>><br>
| >>>>>>>>>> Pranith<br>
| >>>>>>>>>><br>
| >>>>>>>>>> On 08/04/2014<br>
| >>>>>>>>>> 05:49 PM, Roman<br>
| >>>>>>>>>> wrote:<br>
| >>>>>>>>>>> Hi!
| >>>>>>>>>>> The result is pretty much the same. I set the switch port
| >>>>>>>>>>> down for the 1st server; that was ok. Then I set it back up
| >>>>>>>>>>> and set the other server's port off, and it triggered IO
| >>>>>>>>>>> errors on two virtual machines: one with a local root FS
| >>>>>>>>>>> but network-mounted storage, and the other with a network
| >>>>>>>>>>> root FS. The 1st gave an error on copying to or from the
| >>>>>>>>>>> mounted network disk; the other just gave me an error even
| >>>>>>>>>>> for reading log files:
| >>>>>>>>>>>
| >>>>>>>>>>> cat: /var/log/alternatives.log: Input/output error
| >>>>>>>>>>>
| >>>>>>>>>>> Then I reset the KVM VM and it told me there is no boot
| >>>>>>>>>>> device. Next I virtually powered it off and then back on,
| >>>>>>>>>>> and it booted.
| >>>>>>>>>>>
| >>>>>>>>>>> By the way, did I have to start/stop the volume?
| >>>>>>>>>>>
| >>>>>>>>>>> >> Could you do the following and test it again?
| >>>>>>>>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
| >>>>>>>>>>> >>
| >>>>>>>>>>> >> Pranith
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri
| >>>>>>>>>>> <pkarampu@redhat.com>:
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>>
| >>>>>>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
| >>>>>>>>>>>> Hello!
| >>>>>>>>>>>>
| >>>>>>>>>>>> I am facing the same problem as mentioned here:
| >>>>>>>>>>>>
| >>>>>>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
| >>>>>>>>>>>>
| >>>>>>>>>>>> My setup is up and running, so I'm ready to help you back
| >>>>>>>>>>>> with feedback.
| >>>>>>>>>>>>
| >>>>>>>>>>>> Setup:
| >>>>>>>>>>>> - a Proxmox server as the client
| >>>>>>>>>>>> - 2 physical gluster servers
| >>>>>>>>>>>>
| >>>>>>>>>>>> Server side and client side are both running glusterfs 3.4.4
| >>>>>>>>>>>> from the gluster repo at the moment.
| >>>>>>>>>>>>
| >>>>>>>>>>>> The problem is:
| >>>>>>>>>>>>
| >>>>>>>>>>>> 1. Created replica bricks.
| >>>>>>>>>>>> 2. Mounted in Proxmox (tried both Proxmox ways: via the GUI
| >>>>>>>>>>>>    and via fstab with a backup volume line; btw, while
| >>>>>>>>>>>>    mounting via fstab I'm unable to launch a VM without
| >>>>>>>>>>>>    cache, even though direct-io-mode is enabled in the
| >>>>>>>>>>>>    fstab line; see the fstab sketch just after this list).
| >>>>>>>>>>>> 3. Installed a VM.
| >>>>>>>>>>>> 4. Brought one volume down - ok.
| >>>>>>>>>>>> 5. Brought it back up and waited until the sync was done.
| >>>>>>>>>>>> 6. Brought the other volume down - got IO errors on the VM
| >>>>>>>>>>>>    guest and was not able to restore the VM after resetting
| >>>>>>>>>>>>    it via the host. It says "no bootable media". After I
| >>>>>>>>>>>>    shut it down (forced) and brought it back up, it boots.
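Regarding the fstab mount in step 2 above, the line I use on the Proxmox node looks roughly like this (written from memory, so treat it as a sketch; host names, volume name and mount point are from my setup, and the option names should be checked against the mount.glusterfs documentation):

    stor1:/HA-fast-150G-PVE1  /mnt/pve/HA-fast-150G-PVE1  glusterfs  defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable  0  0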
| >>>>>>>>>>> Could you do the following and test it again?
| >>>>>>>>>>> gluster volume set <volname> cluster.self-heal-daemon off
| >>>>>>>>>>>
| >>>>>>>>>>> Pranith
| >>>>>>>>>>>>
| >>>>>>>>>>>> I need help. I have tried 3.4.3 and 3.4.4. Packages for
| >>>>>>>>>>>> 3.4.5 for Debian and for 3.5.2 are still missing (3.5.1
| >>>>>>>>>>>> always gives a healing error for some reason).
| >>>>>>>>>>>>
| >>>>>>>>>>>> --
| >>>>>>>>>>>> Best regards,
| >>>>>>>>>>>> Roman.
| >>>>>>>>>>>>
| >>>>>>>>>>>>
| >>>>>>>>>>>> _______________________________________________
| >>>>>>>>>>>> Gluster-users mailing list
| >>>>>>>>>>>> Gluster-users@gluster.org
| >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

--
Best regards,
Roman.