<div dir="ltr">Hmm, I don't know how, but both VM-s survived the second server outage :) Still had no any message about healing completion anywhere :)</div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-09-01 10:13 GMT+03:00 Roman <span dir="ltr"><<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The mount is on the proxmox machine. <div><br></div><div>here are the logs from disconnection till connection:</div>
<div><br></div><div><br><div>[2014-09-01 06:19:38.059383] W [socket.c:522:__socket_rwv] 0-glusterfs: readv on <a href="http://10.250.0.1:24007" target="_blank">10.250.0.1:24007</a> failed (Connection timed out)</div>
<div>[2014-09-01 06:19:40.338393] W [socket.c:522:__socket_rwv] 0-HA-2TB-TT-Proxmox-cluster-client-0: readv on <a href="http://10.250.0.1:49159" target="_blank">10.250.0.1:49159</a> failed (Connection timed out)</div><div>
[2014-09-01 06:19:40.338447] I [client.c:2229:client_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-client-0: disconnected from <a href="http://10.250.0.1:49159" target="_blank">10.250.0.1:49159</a>. Client process will keep trying to connect to glusterd until brick's port is available</div>
<div>[2014-09-01 06:19:49.196768] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to <a href="http://10.250.0.1:24007" target="_blank">10.250.0.1:24007</a> failed (No route to host)</div><div>[2014-09-01 06:20:05.565444] E [socket.c:2161:socket_connect_finish] 0-HA-2TB-TT-Proxmox-cluster-client-0: connection to <a href="http://10.250.0.1:24007" target="_blank">10.250.0.1:24007</a> failed (No route to host)</div>
<div>[2014-09-01 06:23:26.607180] I [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)</div><div>[2014-09-01 06:23:26.608032] I [client-handshake.c:1677:select_server_supported_programs] 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)</div>
<div>[2014-09-01 06:23:26.608395] I [client-handshake.c:1462:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to <a href="http://10.250.0.1:49159" target="_blank">10.250.0.1:49159</a>, attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.</div>
<div>[2014-09-01 06:23:26.608420] I [client-handshake.c:1474:client_setvolume_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds</div><div>[2014-09-01 06:23:26.608606] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1</div>
<div>[2014-09-01 06:23:40.604979] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing</div></div><div><br></div><div>Now there is no healing traffic either. I could try disconnecting the second server now to see if it will fail over. I don't really believe it will :(</div>
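<div><br></div><div>Before pulling the second server I will probably just check that there are no pending heal entries and that the self-heal daemon is up on the remaining node. A minimal sketch, reusing the volume name from above:</div><div><br></div><div>gluster volume heal HA-2TB-TT-Proxmox-cluster info        # should show 0 entries on both bricks</div><div>gluster volume status HA-2TB-TT-Proxmox-cluster           # Self-heal Daemon should be online</div>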
<div><br></div><div>Here are some logs from the stor1 server (the one I disconnected):</div><div><div>root@stor1:~# cat /var/log/glusterfs/bricks/exports-HA-2TB-TT-Proxmox-cluster-2TB.log</div><div>[2014-09-01 06:19:26.403323] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div>
<div>[2014-09-01 06:19:26.403399] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/112/vm-112-disk-1.raw</div><div>[2014-09-01 06:19:26.403486] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div>
<div>[2014-09-01 06:19:29.475318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:19:29.475373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>[2014-09-01 06:19:36.963318] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div>
<div>[2014-09-01 06:19:36.963373] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>[2014-09-01 06:19:40.419298] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div>
<div>[2014-09-01 06:19:40.419355] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>[2014-09-01 06:19:42.531310] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div>
<div>[2014-09-01 06:19:42.531368] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:23:25.088518] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)</div>
<div>[2014-09-01 06:23:25.532734] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)</div>
<div>[2014-09-01 06:23:26.608074] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)</div>
<div>[2014-09-01 06:23:27.187556] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)</div>
<div>[2014-09-01 06:23:27.213890] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)</div>
<div>[2014-09-01 06:23:31.222654] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-1 (version: 3.5.2)</div>
<div>[2014-09-01 06:23:52.591365] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:23:52.591447] W [inodelk.c:392:pl_inodelk_log_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: releasing lock on 14f70955-5e1e-4499-b66b-52cd50892315 held by {client=0x7f2494001ed0, pid=0 lk-owner=bc3ddbdbae7f0000}</div>
<div>[2014-09-01 06:23:52.591568] I [server-helpers.c:289:do_fd_cleanup] 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on /images/124/vm-124-disk-1.qcow2</div><div>[2014-09-01 06:23:52.591679] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div>
<div>[2014-09-01 06:23:58.709444] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)</div>
<div>[2014-09-01 06:24:00.741542] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:24:00.741598] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>[2014-09-01 06:30:06.010819] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)</div>
<div>[2014-09-01 06:30:08.056059] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:30:08.056127] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>[2014-09-01 06:36:54.307743] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)</div>
<div>[2014-09-01 06:36:56.340078] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:36:56.340122] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>[2014-09-01 06:46:53.601517] I [server-handshake.c:575:server_setvolume] 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0 (version: 3.5.2)</div>
<div>[2014-09-01 06:46:55.624705] I [server.c:520:server_rpc_notify] 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div><div>
[2014-09-01 06:46:55.624793] I [client_t.c:417:gf_client_unref] 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0</div></div><div><br>
</div><div>The last two lines are pretty unclear. Why did it disconnect?</div><div><br></div><div><div><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-09-01 9:41 GMT+03:00 Pranith Kumar Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<div>
<div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><div>
<br>
<div>On 09/01/2014 12:08 PM, Roman wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Well, as for me, VM-s are not very impacted by
healing process. At least the munin server running with pretty
high load (average rarely goes below 0,9 :) )had no problems. To
create some more load I've made a copy of 590 MB file on the
VM-s disk, It took 22 seconds. Which is ca 27 MB /sec or 214
Mbps/sec
<div>
<br>
</div>
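<div>A quick arithmetic sanity check of that figure, in plain shell (assuming bc is available):</div><div><br></div><div>echo "scale=1; 590/22" | bc     # ~26.8 MB/s</div><div>echo "590*8/22" | bc            # ~214 Mbit/s</div><div><br></div>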
<div>The servers are connected via a 10 Gbit network. The Proxmox client
is connected to the servers over a separate 1 Gbps interface. We
are thinking of moving it to 10 Gbps as well.<br>
<div><br>
</div>
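<div>If it helps, this is how I would measure the raw throughput of that 1 Gbps link. A minimal sketch, assuming iperf is installed on both ends and using the host names from above:</div><div><br></div><div>iperf -s                 # on the storage server (stor1)</div><div>iperf -c stor1 -t 30     # on the Proxmox node</div><div><br></div>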
<div>Here is some heal info output, which is pretty confusing.</div>
</div>
<div><br>
</div>
<div>Right after the 1st server restored its connection, it was
pretty clear:</div>
<div><br>
</div>
<div>
<div>root@stor1:~# gluster volume heal
HA-2TB-TT-Proxmox-cluster info</div>
<div>Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/</div>
<div>/images/124/vm-124-disk-1.qcow2 - Possibly undergoing
heal</div>
<div>Number of entries: 1</div>
<div><br>
</div>
<div>Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/</div>
<div>/images/124/vm-124-disk-1.qcow2 - Possibly undergoing
heal</div>
<div>/images/112/vm-112-disk-1.raw - Possibly undergoing heal</div>
<div>Number of entries: 2</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Some time later it says:</div>
<div>
<div>root@stor1:~# gluster volume heal
HA-2TB-TT-Proxmox-cluster info</div>
<div>Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/</div>
<div>Number of entries: 0</div>
<div><br>
</div>
<div>Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/</div>
<div>Number of entries: 0</div>
</div>
<div><br>
</div>
<div>while I could still see traffic between the servers, and there
were still no messages about healing process completion.</div>
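<div><br></div><div>To keep an eye on it I would probably just poll heal info and watch the replication interface. A rough sketch; the interface name is only a placeholder:</div><div><br></div><div>watch -n 10 gluster volume heal HA-2TB-TT-Proxmox-cluster info</div><div>iftop -i eth0    # eth0 = the 10 Gbit link between stor1 and stor2</div>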
</div>
</blockquote></div>
On which machine do we have the mount?<span><font color="#888888"><br>
<br>
Pranith</font></span><div><div><br>
<blockquote type="cite">
<div dir="ltr">
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-29 10:02 GMT+03:00 Pranith
Kumar Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> Wow, this is great
news! Thanks a lot for sharing the results :-). Did you
get a chance to test the performance of the applications
in the VM during self-heal?<br>
May I know more about your use case, i.e. how many VMs and
what is the average size of each VM, etc.?<span><font color="#888888"><br>
<br>
Pranith</font></span>
<div>
<div><br>
<br>
<div>On 08/28/2014 11:27 PM, Roman wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Here are the results.
<div>1. still have problem with logs rotation.
logs are being written to .log.1 file, not .log
file. any hints, how to fix?</div>
<div>2. healing logs are now much more better, I
can see the successful message.</div>
<div>3. both volumes with HD off and on
successfully synced. the volume with HD on
synced much more faster.</div>
<div>4. both VMs on volumes survived the outage,
when new files were added and deleted during
outage.</div>
<div><br>
</div>
<div>So replication works well with both HD on and
off for volumes for VM-s. With HD even faster.
Need to solve the logging issue.</div>
<div><br>
</div>
<div>Seems we could start production storage from
this moment :) The whole company will use it.
Some distributed and some replicated. Thanks for
great product.</div>
</div>
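<div><br></div><div>For point 1, one common cause is that logrotate renames the log file while glusterfs keeps the old file (now .log.1) open and keeps writing to it. A minimal logrotate sketch that avoids this by truncating in place with copytruncate; the file path is only a guess, adjust it to wherever the Debian package keeps its config:</div><div><br></div><div># /etc/logrotate.d/glusterfs (hypothetical location)</div><div>/var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {</div><div>    weekly</div><div>    rotate 4</div><div>    compress</div><div>    missingok</div><div>    copytruncate    # truncate in place so glusterfs keeps writing to .log</div><div>}</div>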
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-27 16:03
GMT+03:00 Roman <span dir="ltr"><<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Installed new packages. Will
make some tests tomorrow. thanx.</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">2014-08-27 14:10
GMT+03:00 Pranith Kumar Karampuri <span dir="ltr"><<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>></span>:
<div>
<div><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div><br>
On 08/27/2014 04:38 PM, Kaleb
KEITHLEY wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On
08/27/2014 03:09 AM, Humble
Chirammal wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
<br>
----- Original Message -----<br>
| From: "Pranith Kumar
Karampuri" <<a href="mailto:pkarampu@redhat.com" target="_blank">pkarampu@redhat.com</a>><br>
| To: "Humble Chirammal"
<<a href="mailto:hchiramm@redhat.com" target="_blank">hchiramm@redhat.com</a>><br>
| Cc: "Roman" <<a href="mailto:romeo.r@gmail.com" target="_blank">romeo.r@gmail.com</a>>,
<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>,
"Niels de Vos" <<a href="mailto:ndevos@redhat.com" target="_blank">ndevos@redhat.com</a>><br>
| Sent: Wednesday, August
27, 2014 12:34:22 PM<br>
| Subject: Re:
[Gluster-users] libgfapi
failover problem on replica
bricks<br>
|<br>
|<br>
| On 08/27/2014 12:24 PM,
Roman wrote:<br>
| > root@stor1:~# ls -l
/usr/sbin/glfsheal<br>
| > ls: cannot access
/usr/sbin/glfsheal: No such
file or directory<br>
| > Seems like not.<br>
| Humble,<br>
| Seems like the
binary is still not
packaged?<br>
<br>
Checking with Kaleb on this.<br>
<br>
</blockquote>
...<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> |
>>> |<br>
| >>> |
Humble/Niels,<br>
| >>>
| Do we have debs
available for 3.5.2? In
3.5.1<br>
| >>>
there was packaging<br>
| >>> |
issue where
/usr/bin/glfsheal is not
packaged along<br>
| >>>
with the deb. I<br>
| >>> |
think that should be fixed
now as well?<br>
| >>> |<br>
| >>>
Pranith,<br>
| >>><br>
| >>>
The 3.5.2 packages for
debian is not available yet.
We<br>
| >>>
are co-ordinating
internally to get it
processed.<br>
| >>> I
will update the list once
its available.<br>
| >>><br>
| >>>
--Humble<br>
</blockquote>
<br>
glfsheal isn't in our 3.5.2-1
DPKGs either. We (meaning I)
started with the 3.5.1
packaging bits from Semiosis.
Perhaps he fixed 3.5.1 after
giving me his bits.<br>
<br>
I'll fix it and spin 3.5.2-2
DPKGs.<br>
</blockquote>
</div>
</div>
That is great Kaleb. Please notify
semiosis as well in case he is yet
to fix it.<br>
<br>
Pranith<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>
<span><font color="#888888"> -- <br>
<br>
Kaleb<br>
<br>
</font></span></blockquote>
<br>
</blockquote>
</div>
</div>
</div>
<span><font color="#888888"><br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman. </div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
Best regards,<br>
Roman.
</div>
</blockquote>
<br>
</div></div></div>
</blockquote></div></div></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br>Best regards,<br>Roman.
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Best regards,<br>Roman.
</div>