Hi all,<div><br></div><div>Here is a patch for the file-flock leak on unclean disconnect in gluster-3.4.5. </div><div>I am totally new to the gluster development workflow, and still trying to </div><div>understand how to submit this patch to Gerrit. So I want to paste the patch </div><div>here first to let the devel team know, and will submit it once I figure out Gerrit :-).</div><div><br></div><div>The major modification adds an id to distinguish the different TCP connections between a </div><div>given client and server pair, so that a stale server-side socket that has not yet closed </div><div>cannot be confused with a new connection.</div><div><br></div><div><div>diff --git a/rpc/rpc-lib/src/rpc-clnt.h b/rpc/rpc-lib/src/rpc-clnt.h</div><div>index 263d5f7..718308d 100644</div><div>--- a/rpc/rpc-lib/src/rpc-clnt.h</div><div>+++ b/rpc/rpc-lib/src/rpc-clnt.h</div><div>@@ -143,6 +143,7 @@ struct rpc_clnt_connection {</div><div> <span class="Apple-tab-span" style="white-space:pre">        </span>struct timeval last_sent;</div><div> <span class="Apple-tab-span" style="white-space:pre">        </span>struct timeval last_received;</div><div> <span class="Apple-tab-span" style="white-space:pre">        </span>int32_t ping_started;</div><div>+ uint32_t clnt_conn_id;</div><div> };</div><div> typedef struct rpc_clnt_connection rpc_clnt_connection_t;</div><div> </div><div>diff --git a/xlators/protocol/client/src/client-handshake.c b/xlators/protocol/client/src/client-handshake.c</div><div>index d2083e6..1c2fc2f 100644</div><div>--- a/xlators/protocol/client/src/client-handshake.c</div><div>+++ b/xlators/protocol/client/src/client-handshake.c</div><div>@@ -471,9 +471,10 @@ client_set_lk_version (xlator_t *this)</div><div> conf = (clnt_conf_t *) this->private;</div><div> </div><div> req.lk_ver = client_get_lk_ver (conf);</div><div>- ret = gf_asprintf (&req.uid, "%s-%s-%d",</div><div>+ ret = gf_asprintf (&req.uid, "%s-%s-%d-%u",</div><div> this->ctx->process_uuid, this->name,</div><div>- this->graph->id);</div><div>+ this->graph->id, </div><div>+ 
(conf->rpc) ? conf->rpc->conn.clnt_conn_id : 0);</div><div> if (ret == -1)</div><div> goto err;</div><div> </div><div>@@ -1549,13 +1550,22 @@ client_setvolume (xlator_t *this, struct rpc_clnt *rpc)</div><div> }</div><div> }</div><div> </div><div>+ /* Use a different clnt_conn_id to identify each connection between a </div><div>+ * given client and server pair. Otherwise, an unclean disconnect can </div><div>+ * leave file locks unreleased on the server.</div><div>+ */</div><div>+ if (conf->rpc) {</div><div>+ conf->rpc->conn.clnt_conn_id = conf->clnt_conn_id++;</div><div>+ }</div><div>+</div><div> /* With multiple graphs possible in the same process, we need a</div><div> field to bring the uniqueness. Graph-ID should be enough to get the</div><div> job done</div><div> */</div><div>- ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d",</div><div>+ ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d-%u",</div><div> this->ctx->process_uuid, this->name,</div><div>- this->graph->id);</div><div>+ this->graph->id, </div><div>+ (conf->rpc) ? 
conf->rpc->conn.clnt_conn_id : 0);</div><div> if (-1 == ret) {</div><div> gf_log (this->name, GF_LOG_ERROR,</div><div> "asprintf failed while setting process_uuid");</div><div>diff --git a/xlators/protocol/client/src/client.c b/xlators/protocol/client/src/client.c</div><div>index ad95574..35fef49 100644</div><div>--- a/xlators/protocol/client/src/client.c</div><div>+++ b/xlators/protocol/client/src/client.c</div><div>@@ -2437,6 +2437,7 @@ init (xlator_t *this)</div><div> conf->lk_version = 1;</div><div> conf->grace_timer = NULL;</div><div> conf->grace_timer_needed = _gf_true;</div><div>+ conf->clnt_conn_id = 0;</div><div> </div><div> ret = client_init_grace_timer (this, this->options, conf);</div><div> if (ret)</div><div>diff --git a/xlators/protocol/client/src/client.h b/xlators/protocol/client/src/client.h</div><div>index 0a27c09..dea90d1 100644</div><div>--- a/xlators/protocol/client/src/client.h</div><div>+++ b/xlators/protocol/client/src/client.h</div><div>@@ -116,6 +116,9 @@ typedef struct clnt_conf {</div><div> <span class="Apple-tab-span" style="white-space:pre">                                                </span>*/</div><div> gf_boolean_t filter_o_direct; /* if set, filter O_DIRECT from</div><div> the flags list of open() */</div><div>+ uint32_t clnt_conn_id; /* connection id used in process_uuid;</div><div>+ starts at 0 and is incremented </div><div>+ for each new connection */</div><div> } clnt_conf_t;</div><div> </div><div> typedef struct _client_fd_ctx {</div><div><br></div><br>On Wednesday, September 17, 2014, Jaden Liang <<a href="mailto:jaden1q84@gmail.com">jaden1q84@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Hi all,</div><div><br></div><div>After several days of tracking, we have finally pinpointed why glusterfs uncleanly </div><div>detaches file flocks under frequent network disconnection. We are now working on</div><div>a patch to submit. 
Here are the details of the issue. Any suggestions will be </div><div>appreciated!</div><div><br></div><div>First of all, as I mentioned in </div><div><a href="http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html" target="_blank">http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html</a></div><div>this issue happens under frequent network disconnection.</div><div><br></div><div>According to the sources, the server-side cleanup job is done in server_connection_cleanup().</div><div>When RPCSVC_EVENT_DISCONNECT happens, it comes here:</div><div><br></div><div>int</div><div>server_rpc_notify ()</div><div>{</div><div><span style="white-space:pre-wrap">        </span>......</div><div><span style="white-space:pre-wrap">        </span> case RPCSVC_EVENT_DISCONNECT:</div><div><span style="white-space:pre-wrap">                                </span>......</div><div> if (!conf->lk_heal) {</div><div> server_conn_ref (conn);</div><div> server_connection_put (this, conn, &detached);</div><div> if (detached)</div><div> server_connection_cleanup (this, conn,</div><div> INTERNAL_LOCKS |</div><div> POSIX_LOCKS);</div><div> server_conn_unref (conn);</div><div><span style="white-space:pre-wrap">        </span>......</div><div>}</div><div><br></div><div>server_connection_cleanup() is only called when the variable 'detached' is true. 
</div><div>And 'detached' is set by server_connection_put():</div><div><span style="white-space:pre-wrap">        </span></div><div>server_connection_t*</div><div>server_connection_put (xlator_t *this, server_connection_t *conn,</div><div> gf_boolean_t *detached)</div><div>{</div><div> server_conf_t *conf = NULL;</div><div> gf_boolean_t unref = _gf_false;</div><div><br></div><div> if (detached)</div><div> *detached = _gf_false;</div><div> conf = this->private;</div><div> pthread_mutex_lock (&conf->mutex);</div><div> {</div><div> conn->bind_ref--;</div><div> if (!conn->bind_ref) {</div><div> list_del_init (&conn->list);</div><div> unref = _gf_true;</div><div> }</div><div> }</div><div> pthread_mutex_unlock (&conf->mutex);</div><div> if (unref) {</div><div> gf_log (this->name, GF_LOG_INFO, "Shutting down connection %s",</div><div> conn->id);</div><div> if (detached)</div><div> *detached = _gf_true;</div><div> server_conn_unref (conn);</div><div> conn = NULL;</div><div> }</div><div> return conn;</div><div>}</div><div><br></div><div>'detached' is only set to _gf_true when 'conn->bind_ref' drops to 0. </div><div>'conn->bind_ref' is set in server_connection_get(): it is either initialized to 1 or increased.</div><div><br></div><div>server_connection_t *</div><div>server_connection_get (xlator_t *this, const char *id)</div><div>{</div><div><span style="white-space:pre-wrap">                        </span>......</div><div> list_for_each_entry (trav, &conf->conns, list) {</div><div> if (!strcmp (trav->id, id)) {</div><div> conn = trav;</div><div> conn->bind_ref++;</div><div> goto unlock;</div><div> }</div><div> }</div><div><span style="white-space:pre-wrap">                        </span>......</div><div>}</div><div><br></div><div>When the connection id is the same, 'conn->bind_ref' is increased.</div><div>Therefore, the problem should be a mismatched reference increase or decrease. 
Then </div><div>we added some logs to verify our guess.</div><div><br></div><div>// 1st connection comes in, and there is no id 'host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0'</div><div>in the connection table, so 'conn->bind_ref' is set to 1.</div><div>[2014-09-17 04:42:28.950693] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn->bind_ref:1, found:0</div><div>[2014-09-17 04:42:28.950717] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0</div><div>[2014-09-17 04:42:28.950758] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1015)</div><div>......</div><div>// Keeps running for several minutes.......</div><div>......</div><div>// Network disconnects here. The client-side TCP socket is disconnected by </div><div>timeout, but the server-side socket stays connected. AT THIS MOMENT, the </div><div>network recovers. The client side makes a new TCP connection JUST BEFORE the </div><div>old socket on the server side is reset. Note that at this point there are 2 valid </div><div>sockets on the server side. The new connection uses the same conn id 'host-000
The later new connection use the same conn id 'host-000</div><div>c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0' look up in the </div><div>connection table and increase the 'conn->bind_ref' to 2.</div><div><br></div><div>[2014-09-17 04:46:16.135066] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn->bind_ref:2, found:1 // HERE IT IS, ref increase to 2!!!</div><div>[2014-09-17 04:46:16.135113] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0</div><div>[2014-09-17 04:46:16.135157] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1018)</div><div><br></div><div>// After 13 seconds, the old connection is reset, decrease the 'conn->bind_ref' to 1. </div><div><br></div><div>[2014-09-17 04:46:28.688780] W [socket.c:2121:__socket_proto_state_machine] 0-tcp.vs_vol_rep2-server: ret = -1, error: Connection reset by peer, peer (host-000c29e93d20:1015)</div><div>[2014-09-17 04:46:28.688790] I [socket.c:2274:socket_event_handler] 0-transport: socket_event_poll_in failed, ret=-1.</div><div>[2014-09-17 04:46:28.688797] D [socket.c:2281:socket_event_handler] 0-transport: disconnecting now</div><div>[2014-09-17 04:46:28.688831] I [server.c:762:server_rpc_notify] 0-vs_vol_rep2-server: disconnecting connectionfrom host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0(host-000c29e93d20:1015)</div><div>[2014-09-17 04:46:28.688861] D [server-helpers.c:744:server_connection_put] 0-vs_vol_rep2-server: conn->bind_ref:1</div><div><br></div><div>In our production environment, there is some flocks in the 1st connection. 
</div><div>According to the logs, the flocks held by the 1st connection can never be cleaned up.</div><div>And on the 2nd connection, the client side cannot acquire the flocks again.</div><div><br></div><div>Therefore, we think the root cause is that different connections use the same conn id.</div><div>The conn id is assembled in client_setvolume():</div><div><br></div><div><span style="white-space:pre-wrap">		</span>ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d",</div><div> this->ctx->process_uuid, this->name,</div><div> this->graph->id);</div><div><br></div><div>The conn id contains 3 parts:</div><div>this->ctx->process_uuid: hostname + pid + startup timestamp</div><div>this->name: translator name</div><div>this->graph->id: graph id</div><div><br></div><div>Apparently the conn id stays the same unless the client side restarts. So when </div><div>the network disconnects, there is some chance that the client-side socket times out while </div><div>the server-side one is still alive. If the network then recovers and the client reconnects </div><div>before the old server-side socket is reset, the file flocks of the old connection </div><div>are left uncleaned.</div><div><br></div><div>That is our full analysis of this flock leak issue. We are now working on the patch.</div><div>We hope someone can review it when it is finished.</div><div><br></div><div>Any other comments are appreciated. Thank you! </div><div><br></div>
</blockquote></div>