<div dir="ltr"><div class="gmail_extra"><div>> What linux distro ?<br>
><br>
> Anything special about your network configuration ?<br>
><br>
> Any chance your server is taking too long to release networking and gluster<br>
> is starting before network is ready ?<br>
><br>
> Can you completely disable iptables and test again ?<br><br></div>Both nodes are CentOS 6.5 VMs running on VMware ESXi 5.5.0. There is nothing special about the network configuration, just static IPs. Ping and ssh work fine. I added "iptables -F" to /etc/rc.local. After a simultaneous reboot, "gluster peer status" on both nodes is connected and replication works fine. But "gluster volume status" shows that the NFS server and self-heal daemon on one of them aren't running, so I have to restart glusterd to bring them up.<br>
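For what it's worth, instead of flushing all rules from rc.local (which races with service startup), it might be cleaner to open just the Gluster ports and save the rules so they are in place before glusterd starts. This is only a sketch for a default GlusterFS 3.4 setup on CentOS 6; the port numbers are the usual defaults (24007 for glusterd, one port per brick starting at 49152, 111/2049 for NFS), so adjust them if your layout differs:

```shell
# Hedged sketch: allow GlusterFS 3.4 default ports instead of "iptables -F".
iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT  # glusterd management
iptables -A INPUT -p tcp --dport 49152:49153 -j ACCEPT  # brick processes (one port per brick, from 49152)
iptables -A INPUT -p tcp --dport 2049  -j ACCEPT        # Gluster NFS server
iptables -A INPUT -p tcp --dport 111   -j ACCEPT        # portmapper, needed by NFS
iptables -A INPUT -p udp --dport 111   -j ACCEPT

# Persist the rules to /etc/sysconfig/iptables so they survive reboot:
service iptables save
```

If you use the built-in Gluster NFS server, ports 38465-38467 may also be needed for mountd/nlm on 3.x, but check your version's documentation.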
<br></div><div class="gmail_extra">Another issue: when everything is OK after "service glusterd restart" on both nodes, I reboot one node and then can see on the rebooted node (ipset02):<br><br><div style="margin-left:40px">
<i>[root@ipset02 etc]#</i> gluster peer status<br>Number of Peers: 1<br><br>Hostname: ipset01<br>Uuid: 6313a4dd-f736-46ff-9836-bdf05c886ffd<br>State: Peer in Cluster (Connected)<br><i>[root@ipset02 etc]#</i> gluster volume status<br>
Status of volume: ipset-gv<br>Gluster process Port Online Pid<br>------------------------------------------------------------------------------<br>Brick ipset01:/usr/local/etc/ipset 49152 Y 1615<br>
Brick ipset02:/usr/local/etc/ipset 49152 Y 2282<br>NFS Server on localhost 2049 Y 2289<br>Self-heal Daemon on localhost N/A Y 2296<br>NFS Server on ipset01 2049 Y 2258<br>
Self-heal Daemon on ipset01 N/A Y 2262<br><br>There are no active volume tasks<br><br>[root@ipset02 etc]# tail -17 /var/log/glusterfs/glustershd.log <br>[2014-03-26 07:55:48.982456] E [client-handshake.c:1742:client_query_portmap_cbk] 0-ipset-gv-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.<br>
[2014-03-26 07:55:48.982532] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)<br>[2014-03-26 07:55:48.982555] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected<br>[2014-03-26 07:55:48.982572] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-0: changing port to 49152 (from 0)<br>
[2014-03-26 07:55:48.982627] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-0: readv failed (No data available)<br>[2014-03-26 07:55:48.986252] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
[2014-03-26 07:55:48.986551] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-0: Connected to <a href="http://192.168.1.180:49152">192.168.1.180:49152</a>, attached to remote volume '/usr/local/etc/ipset'.<br>
[2014-03-26 07:55:48.986566] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-0: Server and Client lk-version numbers are not same, reopening the fds<br>[2014-03-26 07:55:48.986628] I [afr-common.c:3698:afr_notify] 0-ipset-gv-replicate-0: Subvolume 'ipset-gv-client-0' came back up; going online.<br>
[2014-03-26 07:55:48.986743] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-0: Server lk version = 1<br>[2014-03-26 07:55:52.975670] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-1: changing port to 49152 (from 0)<br>
[2014-03-26 07:55:52.975717] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)<br>[2014-03-26 07:55:52.978961] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)<br>
[2014-03-26 07:55:52.979128] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-1: Connected to <a href="http://192.168.1.181:49152">192.168.1.181:49152</a>, attached to remote volume '/usr/local/etc/ipset'.<br>
[2014-03-26 07:55:52.979143] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-1: Server and Client lk-version numbers are not same, reopening the fds<br>[2014-03-26 07:55:52.979269] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-1: Server lk version = 1<br>
[2014-03-26 07:55:52.980284] I [afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-ipset-gv-replicate-0: Another crawl is in progress for ipset-gv-client-1<br><br></div><br></div><div class="gmail_extra">And on the node that wasn't rebooted:<br>
</div><div class="gmail_extra"><br><div style="margin-left:40px"><i>[root@ipset01 ~]#</i> gluster peer status<br>Number of Peers: 1<br><br>Hostname: ipset02<br>Uuid: ff14ab0e-53cf-4015-9e49-fb60698c56db<br>State: Peer in Cluster (Disconnected)<br>
<i>[root@ipset01 ~]#</i> gluster volume status<br>Status of volume: ipset-gv<br>Gluster process Port Online Pid<br>------------------------------------------------------------------------------<br>
Brick ipset01:/usr/local/etc/ipset 49152 Y 1615<br>NFS Server on localhost 2049 Y 2258<br>Self-heal Daemon on localhost N/A Y 2262<br><br>There are no active volume tasks<br>
<br></div><div style="margin-left:40px">[root@ipset01 ~]# tail -3 /var/log/glusterfs/glustershd.log <br>[2014-03-26 07:50:28.881369] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (Connection reset by peer)<br>
[2014-03-26 07:50:28.881421] W [socket.c:1962:__socket_proto_state_machine] 0-ipset-gv-client-1: reading from socket failed. Error (Connection reset by peer), peer (<a href="http://192.168.1.181:49152">192.168.1.181:49152</a>)<br>
[2014-03-26 07:50:28.881463] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected<br></div><br></div><div class="gmail_extra">However, files seem to replicate fine on both nodes. After "service glusterd restart" on the first node (ipset01), "gluster peer status" shows connected again. This behavior is strange.<br>
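A side note on the missing NFS server and self-heal daemon: as far as I know, you don't need a full glusterd restart to bring them back; running "start ... force" on an already-started volume respawns any of its processes that are down. A sketch using the volume name from the output above:

```shell
# Respawn missing per-volume daemons (NFS server, self-heal daemon)
# without restarting glusterd itself.
gluster volume start ipset-gv force

# Then confirm both daemons show Online = Y again:
gluster volume status ipset-gv
```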
</div><div class="gmail_extra"><br></div><div class="gmail_extra">> May not be cause of your problems but it does bad things and gluster<br>
> sees this as a 'crash' even with graceful shutdown<br><br></div><div class="gmail_extra">I don't have a /var/lock/subsys/glusterfsd file either, but there is /var/lock/subsys/glusterd. As far as I know, newer versions of GlusterFS use the glusterd init script instead of glusterfsd.<br>
<br><div style="margin-left:40px">[root@ipset01 etc]# service glusterfsd status<br>glusterfsd (pid 2338) is running...<br>[root@ipset01 etc]# service glusterd stop [ OK ]<br>[root@ipset01 etc]# service glusterd status <br>
glusterd dead but subsys locked<br>[root@ipset01 etc]# service glusterfsd status<br>glusterfsd (pid 2338) is running...<br></div></div><div class="gmail_extra"><br></div><div class="gmail_extra">Is it OK that glusterfsd is still running?<br>
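My understanding (please correct me if wrong) is that this is expected: glusterd is only the management daemon, while the glusterfsd brick processes are separate and keep serving data after "service glusterd stop". The glusterfsd init script exists to stop the bricks themselves, e.g.:

```shell
# glusterd and the brick processes are independent; stopping glusterd
# leaves bricks running. To stop the bricks too on CentOS 6 packaging:
service glusterfsd stop

# Verify which gluster processes remain:
pgrep -lf gluster
```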
</div><div class="gmail_extra"><br><div class="gmail_quote">2014-03-26 2:16 GMT+04:00 Viktor Villafuerte <span dir="ltr"><<a href="mailto:viktor.villafuerte@optusnet.com.au" target="_blank">viktor.villafuerte@optusnet.com.au</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Also see this bug<br>
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1073217" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1073217</a><br>
<br>
May not be cause of your problems but it does bad things and gluster<br>
sees this as a 'crash' even with graceful shutdown<br>
<br>
v<br>
<div><div><br>
<br>
<br>
On Tue 25 Mar 2014 22:24:22, Carlos Capriotti wrote:<br>
> Let's go with the data collection first.<br>
><br>
> What linux distro ?<br>
><br>
> Anything special about your network configuration ?<br>
><br>
> Any chance your server is taking too long to release networking and gluster<br>
> is starting before network is ready ?<br>
><br>
> Can you completely disable iptables and test again ?<br>
><br>
> I am afraid quorum will not help you if you cannot get this issue<br>
> corrected.<br>
><br>
><br>
><br>
><br>
> On Tue, Mar 25, 2014 at 3:14 PM, Artem Konvalyuk <<a href="mailto:artret@gmail.com" target="_blank">artret@gmail.com</a>> wrote:<br>
><br>
> > Hello!<br>
> ><br>
> > I have 2 nodes with GlusterFS 3.4.2. I created one replica volume using 2<br>
> > bricks and enabled glusterd autostart. A firewall is also configured, and I<br>
> > have to run "iptables -F" on the nodes after reboot. It is clear that the firewall<br>
> > could simply be disabled inside the LAN, but I'm interested in solving my case.<br>
> ><br>
> > Problem: When I reboot both nodes and run "iptables -F" peer status is<br>
> > still disconnected. I wonder why. After "service glusterd restart" peer<br>
> > status is connected. But I have to run "gluster volume heal <volume-name>"<br>
> > to make both servers consistent and be able to replicate files. Is there<br>
> > any way to eliminate this problem?<br>
> ><br>
> > I read about server-quorum, but it needs 3 or more nodes. Am I right?<br>
> ><br>
> > Best Regards,<br>
> > Artem Konvalyuk<br>
> ><br>
> > _______________________________________________<br>
> > Gluster-users mailing list<br>
> > <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
> > <a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>
> ><br>
<br>
> _______________________________________________<br>
> Gluster-users mailing list<br>
> <a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br>
> <a href="http://supercolony.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a><br>
<br>
<br>
</div></div><span><font color="#888888">--<br>
Regards<br>
<br>
Viktor Villafuerte<br>
Optus Internet Engineering<br>
t: 02 808-25265<br>
</font></span></blockquote></div><br></div></div>