<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
A much simpler answer is to assign a hostname to multiple IP
addresses (round-robin DNS). When gethostbyname() returns multiple
entries, the client will try them in turn until one succeeds.<br>
<br>
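To make that concrete, here is a rough Python sketch of the
try-each-address behaviour. It is only an illustration, not the
GlusterFS client code: the function names are made up for this
example, and port 24007 is the glusterd port that shows up in the
lsof output quoted below.<br>
<br>
```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple starts with the IP string
        if address not in addresses:
            addresses.append(address)
    return addresses


def try_connect(address, port, timeout=3.0):
    """Attempt a single TCP connection; True means it was accepted."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addresses, port, connect=try_connect):
    """Walk the list in order and return the first address that answers."""
    for address in addresses:
        if connect(address, port):
            return address
    return None
```
<br>
With a round-robin name (say a hypothetical gluster.example.com)
you would call first_reachable(resolve_all("gluster.example.com",
24007), 24007); a dead peer simply costs one failed attempt before
the next address is tried.<br>
<br>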
<div class="moz-cite-prefix">On 11/24/2014 06:23 PM, Paul Robert
Marino wrote:<br>
</div>
<blockquote cite="mid:5473e80e.d12be00a.3212.ffffc7d5@mx.google.com"
type="cite">This is simple and can be handled in many ways.<br>
<br>
Some background first.<br>
The server named in the mount is a single IP or host name. The only
thing the client uses it for is to download a volfile describing all
the bricks in the cluster. It then opens connections to all the
nodes containing bricks for that volume.<br>
<br>
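A small Python sketch of that second step, assuming you only have
the text of "gluster volume info" to go on: it pulls out the brick
hosts the client will end up connecting to. The function name is
made up, the sample text is trimmed from the output quoted further
down, and the parsing only handles the IPv4/hostname form shown
there.<br>
<br>
```python
def brick_endpoints(volume_info):
    """Extract (host, path) pairs from 'BrickN: host:/path' lines."""
    bricks = []
    for line in volume_info.splitlines():
        label, sep, value = line.strip().partition(": ")
        # Only 'Brick1', 'Brick2', ... qualify, not 'Bricks:' or 'Number of Bricks'.
        if sep and label.startswith("Brick") and label[5:].isdigit():
            host, _, path = value.partition(":")
            bricks.append((host, path))
    return bricks


SAMPLE = """\
Volume Name: rel-vol
Type: Replicate
Number of Bricks: 1 x 2 = 2
Bricks:
Brick1: 10.250.1.1:/export/brick1
Brick2: 10.250.1.2:/export/brick1
"""

for host, path in brick_endpoints(SAMPLE):
    print(host, path)
```
<br>
The point is that the host given at mount time does not have to be
in this list at all; it only has to be able to hand the list out.<br>
<br>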
<span style="font-family:Prelude, Verdana, san-serif;">So the
answer is to tell the client to connect to a virtual IP address.<br>
<br>
I personally use keepalived for this, but you can use IPVS or any
of the many other tools that manage IPs. I assign the VIP to a
primary node, then have each node monitor its cluster processes;
if they die on a node, that node goes into a faulted state and
cannot own the VIP.<br>
<br>
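As a sketch of what such a per-node check could look like, here is
a minimal Python version. It assumes keepalived is set up to run it
as a check script, where a non-zero exit status counts as a
failure; the function names are made up, and probing TCP port 24007
(the glusterd port in the lsof output below) is just one possible
way to decide that the node is healthy.<br>
<br>
```python
import socket

GLUSTERD_PORT = 24007  # glusterd management port, as seen in the lsof output


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_exit_code(host="127.0.0.1", port=GLUSTERD_PORT, check=port_is_open):
    """Map the probe result to a script exit status: 0 healthy, 1 faulted."""
    return 0 if check(host, port) else 1
```
<br>
A wrapper script would finish with sys.exit(health_exit_code()), so
a node whose glusterd has died reports failure and gives up the
VIP.<br>
<br>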
As long as the clients are connecting to a running host in the
cluster, you are fine, even if that host doesn't own bricks in the
volume but is aware of them as part of the cluster.<br>
</span><span id="signature">
<br>
</span><span style="color:navy; font-family:Prelude, Verdana,
san-serif; ">
<hr style="width:75%" align="left">On Nov 24, 2014 8:07 PM, Eric
Ewanco <a class="moz-txt-link-rfc2396E" href="mailto:Eric.Ewanco@genband.com">&lt;Eric.Ewanco@genband.com&gt;</a> wrote: <br>
<br>
</span>
<div class="WordSection1">
<p class="MsoNormal">Hi all,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We’re trying to use gluster as a replicated
volume. It works OK when both peers are up, but when one peer
is down and the other reboots, the “surviving” peer does not
automount glusterfs. However, once the boot sequence is
complete, it can be mounted manually without issue. It automounts
fine when the peer is up during startup. I tried to google this,
and while I found some similar issues, I haven’t found any
solutions to my problem. Any insight would be appreciated.
Thanks.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">gluster volume info output (after startup):<o:p></o:p></p>
<p class="MsoNormal"><o:p></o:p></p>
<p class="MsoNormal">Volume Name: rel-vol<o:p></o:p></p>
<p class="MsoNormal">Type: Replicate<o:p></o:p></p>
<p class="MsoNormal">Volume ID:
90cbe313-e9f9-42d9-a947-802315ab72b0<o:p></o:p></p>
<p class="MsoNormal">Status: Started<o:p></o:p></p>
<p class="MsoNormal">Number of Bricks: 1 x 2 = 2<o:p></o:p></p>
<p class="MsoNormal">Transport-type: tcp<o:p></o:p></p>
<p class="MsoNormal">Bricks:<o:p></o:p></p>
<p class="MsoNormal">Brick1: 10.250.1.1:/export/brick1<o:p></o:p></p>
<p class="MsoNormal">Brick2: 10.250.1.2:/export/brick1<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">gluster peer status output (after startup):<o:p></o:p></p>
<p class="MsoNormal">Number of Peers: 1<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hostname: 10.250.1.2<o:p></o:p></p>
<p class="MsoNormal">Uuid: 8d49b929-4660-4b1e-821b-bfcd6291f516<o:p></o:p></p>
<p class="MsoNormal">State: Peer in Cluster (Disconnected)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Original volume create command: <o:p></o:p></p>
<p class="MsoNormal">gluster volume create rel-vol rep 2
transport tcp 10.250.1.1:/export/brick1
10.250.1.2:/export/brick1<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I am running Gluster 3.4.5 on OpenSuSE
12.2.<o:p></o:p></p>
<p class="MsoNormal">gluster --version:<o:p></o:p></p>
<p class="MsoNormal">glusterfs 3.4.5 built on Jul 25 2014
08:31:19<o:p></o:p></p>
<p class="MsoNormal">Repository revision:
git://git.gluster.com/glusterfs.git<o:p></o:p></p>
<p class="MsoNormal">Copyright (c) 2006-2011 Gluster Inc.
<a class="moz-txt-link-rfc2396E" href="http://www.gluster.com">&lt;http://www.gluster.com&gt;</a><o:p></o:p></p>
<p class="MsoNormal">GlusterFS comes with ABSOLUTELY NO
WARRANTY.<o:p></o:p></p>
<p class="MsoNormal">You may redistribute copies of GlusterFS
under the terms of the GNU General Public License.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The fstab line is:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">localhost:/rel-vol
/home glusterfs defaults,_netdev 0 0<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">lsof -i :24007-24100:<o:p></o:p></p>
<p class="MsoNormal">COMMAND PID USER FD TYPE DEVICE
SIZE/OFF NODE NAME<o:p></o:p></p>
<p class="MsoNormal">glusterd 4073 root 6u IPv4 82170
0t0 TCP s1:24007->s1:1023 (ESTABLISHED)<o:p></o:p></p>
<p class="MsoNormal">glusterd 4073 root 9u IPv4 13816
0t0 TCP *:24007 (LISTEN)<o:p></o:p></p>
<p class="MsoNormal">glusterd 4073 root 10u IPv4 88106
0t0 TCP s1:exp2->s2:24007 (SYN_SENT)<o:p></o:p></p>
<p class="MsoNormal">glusterfs 4097 root 8u IPv4 16751
0t0 TCP s1:1023->s1:24007 (ESTABLISHED)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This is shorter than it is when it works,
but maybe that’s because the mount spawns some more processes.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Some
ports are down:<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif""> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">root@q50-s1:/root>
telnet localhost 24007<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Trying
::1...<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">telnet:
connect to address ::1: Connection refused<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Trying
127.0.0.1...<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Connected
to localhost.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Escape
character is '^]'.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif""> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">telnet>
close<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Connection
closed.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">root@q50-s1:/root>
telnet localhost 24009<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Trying
::1...<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">telnet:
connect to address ::1: Connection refused<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Trying
127.0.0.1...<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">telnet:
connect to address 127.0.0.1: Connection refused<o:p></o:p></span></p>
<p><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">ps
axww | fgrep glu:<o:p></o:p></span></p>
<p><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">4073
? Ssl 0:10 /usr/sbin/glusterd -p /run/glusterd.pid<o:p></o:p></span></p>
<p><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">4097
? Ssl 0:00 /usr/sbin/glusterfsd -s 10.250.1.1
--volfile-id rel-vol.10.250.1.1.export-brick1 -p
/var/lib/glusterd/vols/rel-vol/run/10.250.1.1-export-brick1.pid
-S /var/run/89ba432ed09e07e107723b4b266e18f9.socket
--brick-name /export/brick1 -l
/var/log/glusterfs/bricks/export-brick1.log --xlator-option
*-posix.glusterd-uuid=3b02a581-8fb9-4c6a-8323-9463262f23bc
--brick-port 49152 --xlator-option
rel-vol-server.listen-port=49152<o:p></o:p></span></p>
<p><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">5949
ttyS0 S+ 0:00 fgrep glu<o:p></o:p></span></p>
<p class="MsoNormal">These are the error messages I see in
/var/log/gluster/home.log (/home is the mountpoint):<o:p></o:p></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">+------------------------------------------------------------------------------+<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:27.932285] E
[client-handshake.c:1742:client_query_portmap_cbk]
0-rel-vol-client-0: failed to get the port number for remote
subvolume. Please run 'gluster volume status' on server to
see if brick process is running.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:27.932373] W [socket.c:514:__socket_rwv]
0-rel-vol-client-0: readv failed (No data available)<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:27.932405] I [client.c:2098:client_rpc_notify]
0-rel-vol-client-0: disconnected<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.818281] E [socket.c:2157:socket_connect_finish]
0-rel-vol-client-1: connection to 10.250.1.2:24007 failed
(No route to host)<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.818313] E [afr-common.c:3735:afr_notify]
0-rel-vol-replicate-0: All subvolumes are down. Going
offline until atleast one of them comes back up.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.822189] I [fuse-bridge.c:4771:fuse_graph_setup]
0-fuse: switched to graph 0<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.822245] W [socket.c:514:__socket_rwv]
0-rel-vol-client-1: readv failed (No data available)<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.822312] I [fuse-bridge.c:3726:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions:
glusterfs 7.13 kernel 7.18<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.822562] W [fuse-bridge.c:705:fuse_attr_cbk]
0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint
is not connected)<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.835120] I [fuse-bridge.c:4630:fuse_thread_proc]
0-fuse: unmounting /home<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;">[2014-11-24
13:51:30.835397] W [glusterfsd.c:1002:cleanup_and_exit]
(--&gt;/lib64/libc.so.6(clone+0x6d) [0x7f00f0f682bd]
(--&gt;/lib64/libpthread.so.0(+0x7e0e) [0x7f00f1603e0e]
(--&gt;/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5)
[0x4075f5]))) 0-: received signum (15), shutting down<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif"">[2014-11-24
13:51:30.835416] I [fuse-bridge.c:5262:fini] 0-fuse:
Unmounting '/home'.<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Relevant section from
/var/log/glusterfs/etc-glusterfs-glusterd.vol.log:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.552371] I
[glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running
/usr/sbin/glusterd version 3.4.5 (/usr/sbin/glusterd -p
/run/glusterd.pid)<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.574553] I
[glusterd.c:961:init] 0-management: Using /var/lib/glusterd as
working directory<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.577734] I
[socket.c:3480:socket_init] 0-socket.management: SSL support
is NOT enabled<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.577756] I
[socket.c:3495:socket_init] 0-socket.management: using system
polling thread<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.577834] E
[rpc-transport.c:253:rpc_transport_load] 0-rpc-transport:
/usr/lib64/glusterfs/3.4.5/rpc-transport/rdma.so: cannot open
shared object file: No such file or directory<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.577849] W
[rpc-transport.c:257:rpc_transport_load] 0-rpc-transport:
volume 'rdma.management': transport-type 'rdma' is not valid
or not found on this machine<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.577858] W
[rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot
create listener, initing the transport failed<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.578697] I
[glusterd.c:354:glusterd_check_gsync_present] 0-glusterd:
geo-replication module not installed in the system<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.598907] I
[glusterd-store.c:1339:glusterd_restore_op_version]
0-glusterd: retrieved op-version: 2<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.607802] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-:
Unknown key: brick-0<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.607837] E
[glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-:
Unknown key: brick-1<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.809027] I
[glusterd-handler.c:2818:glusterd_friend_add] 0-management:
connect returned 0<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.809098] I
[rpc-clnt.c:962:rpc_clnt_connection_init] 0-management:
setting frame-timeout to 600<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.809150] I
[socket.c:3480:socket_init] 0-management: SSL support is NOT
enabled<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.809162] I
[socket.c:3495:socket_init] 0-management: using system polling
thread<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:27.813801] I
[glusterd.c:125:glusterd_uuid_init] 0-management: retrieved
UUID: 3b02a581-8fb9-4c6a-8323-9463262f23bc<o:p></o:p></p>
<p class="MsoNormal">Given volfile:<o:p></o:p></p>
<p class="MsoNormal">+------------------------------------------------------------------------------+<o:p></o:p></p>
<p class="MsoNormal"> 1: volume management<o:p></o:p></p>
<p class="MsoNormal"> 2: type mgmt/glusterd<o:p></o:p></p>
<p class="MsoNormal"> 3: option working-directory
/var/lib/glusterd<o:p></o:p></p>
<p class="MsoNormal"> 4: option transport-type socket,rdma<o:p></o:p></p>
<p class="MsoNormal"> 5: option
transport.socket.keepalive-time 10<o:p></o:p></p>
<p class="MsoNormal"> 6: option
transport.socket.keepalive-interval 2<o:p></o:p></p>
<p class="MsoNormal"> 7: option
transport.socket.read-fail-log off<o:p></o:p></p>
<p class="MsoNormal"> 8: # option base-port 49152<o:p></o:p></p>
<p class="MsoNormal"> 9: end-volume<o:p></o:p></p>
<p class="MsoNormal">+------------------------------------------------------------------------------+<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.818283] E
[socket.c:2157:socket_connect_finish] 0-management: connection
to 10.250.1.2:24007 failed (No route to host)<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.820254] I
[rpc-clnt.c:962:rpc_clnt_connection_init] 0-management:
setting frame-timeout to 600<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.820316] I
[socket.c:3480:socket_init] 0-management: SSL support is NOT
enabled<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.820327] I
[socket.c:3495:socket_init] 0-management: using system polling
thread<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.820378] W
[socket.c:514:__socket_rwv] 0-management: readv failed (No
data available)<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.821243] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get]
0-management: Found brick<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.821268] I
[socket.c:2236:socket_event_handler] 0-transport:
disconnecting now<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.822036] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get]
0-management: Found brick<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:30.863454] I
[glusterd-pmap.c:227:pmap_registry_bind] 0-pmap: adding brick
/export/brick1 on port 49152<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:33.824274] W
[socket.c:514:__socket_rwv] 0-management: readv failed (No
data available)<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:34.817560] I
[glusterd-utils.c:1079:glusterd_volume_brickinfo_get]
0-management: Found brick<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:39.824281] W
[socket.c:514:__socket_rwv] 0-management: readv failed (No
data available)<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:42.830260] W
[socket.c:514:__socket_rwv] 0-management: readv failed (No
data available)<o:p></o:p></p>
<p class="MsoNormal">[2014-11-24 13:51:48.832276] W
[socket.c:514:__socket_rwv] 0-management: readv failed (No
data available)<o:p></o:p></p>
<p class="MsoNormal">[ad nauseam...]<o:p></o:p></p>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://supercolony.gluster.org/mailman/listinfo/gluster-users">http://supercolony.gluster.org/mailman/listinfo/gluster-users</a></pre>
</blockquote>
<br>
</body>
</html>