Hi all,<div><br></div><div>Here is a patch for the file-flock leak on unclean disconnect in gluster-3.4.5. </div><div>I am totally new to the gluster development workflow, and still trying to </div><div>understand how to submit this patch to Gerrit. So I want to paste the patch </div><div>here first to let the devel team know, and will submit it once I figure out Gerrit :-).</div><div><br></div><div>The major modification adds an id to distinguish the different TCP connections between a </div><div>given client and server pair, so that a stale server-side socket that has not yet closed </div><div>cannot be confused with a new connection.</div><div><br></div><div><div>diff --git a/rpc/rpc-lib/src/rpc-clnt.h b/rpc/rpc-lib/src/rpc-clnt.h</div><div>index 263d5f7..718308d 100644</div><div>--- a/rpc/rpc-lib/src/rpc-clnt.h</div><div>+++ b/rpc/rpc-lib/src/rpc-clnt.h</div><div>@@ -143,6 +143,7 @@ struct rpc_clnt_connection {</div><div> <span class="Apple-tab-span" style="white-space:pre">        </span>struct timeval last_sent;</div><div> <span class="Apple-tab-span" style="white-space:pre">        </span>struct timeval last_received;</div><div> <span class="Apple-tab-span" style="white-space:pre">        </span>int32_t ping_started;</div><div>+ uint32_t clnt_conn_id;</div><div> };</div><div> typedef struct rpc_clnt_connection rpc_clnt_connection_t;</div><div> </div><div>diff --git a/xlators/protocol/client/src/client-handshake.c b/xlators/protocol/client/src/client-handshake.c</div><div>index d2083e6..1c2fc2f 100644</div><div>--- a/xlators/protocol/client/src/client-handshake.c</div><div>+++ b/xlators/protocol/client/src/client-handshake.c</div><div>@@ -471,9 +471,10 @@ client_set_lk_version (xlator_t *this)</div><div> conf = (clnt_conf_t *) this->private;</div><div> </div><div> req.lk_ver = client_get_lk_ver (conf);</div><div>- ret = gf_asprintf (&req.uid, "%s-%s-%d",</div><div>+ ret = gf_asprintf (&req.uid, "%s-%s-%d-%u",</div><div> this->ctx->process_uuid, this->name,</div><div>- this->graph->id);</div><div>+ this->graph->id, </div><div>+ 
(conf->rpc) ? conf->rpc->conn.clnt_conn_id : 0);</div><div> if (ret == -1)</div><div> goto err;</div><div> </div><div>@@ -1549,13 +1550,22 @@ client_setvolume (xlator_t *this, struct rpc_clnt *rpc)</div><div> }</div><div> }</div><div> </div><div>+ /* Use a different clnt_conn_id to identify each connection between a </div><div>+ * given client and server pair. Otherwise, an unclean disconnect can </div><div>+ * leave file locks unreleased on the server.</div><div>+ */</div><div>+ if (conf->rpc) {</div><div>+ conf->rpc->conn.clnt_conn_id = conf->clnt_conn_id++;</div><div>+ }</div><div>+</div><div> /* With multiple graphs possible in the same process, we need a</div><div> field to bring the uniqueness. Graph-ID should be enough to get the</div><div> job done</div><div> */</div><div>- ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d",</div><div>+ ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d-%u",</div><div> this->ctx->process_uuid, this->name,</div><div>- this->graph->id);</div><div>+ this->graph->id, </div><div>+ (conf->rpc) ? 
conf->rpc->conn.clnt_conn_id : 0);</div><div> if (-1 == ret) {</div><div> gf_log (this->name, GF_LOG_ERROR,</div><div> "asprintf failed while setting process_uuid");</div><div>diff --git a/xlators/protocol/client/src/client.c b/xlators/protocol/client/src/client.c</div><div>index ad95574..35fef49 100644</div><div>--- a/xlators/protocol/client/src/client.c</div><div>+++ b/xlators/protocol/client/src/client.c</div><div>@@ -2437,6 +2437,7 @@ init (xlator_t *this)</div><div> conf->lk_version = 1;</div><div> conf->grace_timer = NULL;</div><div> conf->grace_timer_needed = _gf_true;</div><div>+ conf->clnt_conn_id = 0;</div><div> </div><div> ret = client_init_grace_timer (this, this->options, conf);</div><div> if (ret)</div><div>diff --git a/xlators/protocol/client/src/client.h b/xlators/protocol/client/src/client.h</div><div>index 0a27c09..dea90d1 100644</div><div>--- a/xlators/protocol/client/src/client.h</div><div>+++ b/xlators/protocol/client/src/client.h</div><div>@@ -116,6 +116,9 @@ typedef struct clnt_conf {</div><div> <span class="Apple-tab-span" style="white-space:pre">                                                </span>*/</div><div> gf_boolean_t filter_o_direct; /* if set, filter O_DIRECT from</div><div> the flags list of open() */</div><div>+ uint32_t clnt_conn_id; /* connection id used in process_uuid;</div><div>+ starts at 0 and is incremented </div><div>+ for each new connection */</div><div> } clnt_conf_t;</div><div> </div><div> typedef struct _client_fd_ctx {</div><div><br></div><br>On Wednesday, September 17, 2014, Jaden Liang <<a href="mailto:jaden1q84@gmail.com">jaden1q84@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Hi all,</div><div><br></div><div>After several days of tracking, we have finally pinpointed why glusterfs uncleanly </div><div>detaches file flocks under frequent network disconnection. We are now working on</div><div>a patch to submit. 
Here are the details of the issue. Any suggestions will be </div><div>appreciated!</div><div><br></div><div>First of all, as I mentioned in </div><div><a href="http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html" target="_blank">http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html</a></div><div>this issue happens under frequent network disconnection.</div><div><br></div><div>According to the sources, the server-side cleanup job is done in server_connection_cleanup().</div><div>When RPCSVC_EVENT_DISCONNECT happens, it comes here:</div><div><br></div><div>int</div><div>server_rpc_notify ()</div><div>{</div><div><span style="white-space:pre-wrap">        </span>......</div><div><span style="white-space:pre-wrap">        </span> case RPCSVC_EVENT_DISCONNECT:</div><div><span style="white-space:pre-wrap">                                </span>......</div><div> if (!conf->lk_heal) {</div><div> server_conn_ref (conn);</div><div> server_connection_put (this, conn, &detached);</div><div> if (detached)</div><div> server_connection_cleanup (this, conn,</div><div> INTERNAL_LOCKS |</div><div> POSIX_LOCKS);</div><div> server_conn_unref (conn);</div><div><span style="white-space:pre-wrap">        </span>......</div><div>}</div><div><br></div><div>server_connection_cleanup() is only called when the variable 'detached' is true. 
</div><div>And 'detached' is set by server_connection_put():</div><div><span style="white-space:pre-wrap">        </span></div><div>server_connection_t*</div><div>server_connection_put (xlator_t *this, server_connection_t *conn,</div><div> gf_boolean_t *detached)</div><div>{</div><div> server_conf_t *conf = NULL;</div><div> gf_boolean_t unref = _gf_false;</div><div><br></div><div> if (detached)</div><div> *detached = _gf_false;</div><div> conf = this->private;</div><div> pthread_mutex_lock (&conf->mutex);</div><div> {</div><div> conn->bind_ref--;</div><div> if (!conn->bind_ref) {</div><div> list_del_init (&conn->list);</div><div> unref = _gf_true;</div><div> }</div><div> }</div><div> pthread_mutex_unlock (&conf->mutex);</div><div> if (unref) {</div><div> gf_log (this->name, GF_LOG_INFO, "Shutting down connection %s",</div><div> conn->id);</div><div> if (detached)</div><div> *detached = _gf_true;</div><div> server_conn_unref (conn);</div><div> conn = NULL;</div><div> }</div><div> return conn;</div><div>}</div><div><br></div><div>'detached' is only set to _gf_true when 'conn->bind_ref' drops to 0. </div><div>'conn->bind_ref' is set in server_connection_get(): it is either initialized to 1 or increased.</div><div><br></div><div>server_connection_t *</div><div>server_connection_get (xlator_t *this, const char *id)</div><div>{</div><div><span style="white-space:pre-wrap">                        </span>......</div><div> list_for_each_entry (trav, &conf->conns, list) {</div><div> if (!strcmp (trav->id, id)) {</div><div> conn = trav;</div><div> conn->bind_ref++;</div><div> goto unlock;</div><div> }</div><div> }</div><div><span style="white-space:pre-wrap">                        </span>......</div><div>}</div><div><br></div><div>When the connection id is the same, 'conn->bind_ref' is increased.</div><div>Therefore, the problem should be a mismatched reference increase or decrease. 
Then </div><div>we added some logs to verify our guess.</div><div><br></div><div>// 1st connection comes in, and there is no id 'host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0'</div><div>in the connection table, so 'conn->bind_ref' is set to 1.</div><div>[2014-09-17 04:42:28.950693] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn->bind_ref:1, found:0</div><div>[2014-09-17 04:42:28.950717] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0</div><div>[2014-09-17 04:42:28.950758] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1015)</div><div>......</div><div>// Keeps running for several minutes.......</div><div>......</div><div>// Network disconnects here. The client-side TCP socket is disconnected by </div><div>timeout, but the server-side socket stays connected. AT THIS MOMENT, the </div><div>network recovers. The client side makes a new TCP connection JUST BEFORE the </div><div>old socket on the server side is reset. Note that at this point there are 2 valid </div><div>sockets on the server side. The new connection uses the same conn id 'host-000
The later new connection use the same conn id 'host-000</div><div>c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0' look up in the </div><div>connection table and increase the 'conn->bind_ref' to 2.</div><div><br></div><div>[2014-09-17 04:46:16.135066] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn->bind_ref:2, found:1 // HERE IT IS, ref increase to 2!!!</div><div>[2014-09-17 04:46:16.135113] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0</div><div>[2014-09-17 04:46:16.135157] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1018)</div><div><br></div><div>// After 13 seconds, the old connection is reset, decrease the 'conn->bind_ref' to 1. </div><div><br></div><div>[2014-09-17 04:46:28.688780] W [socket.c:2121:__socket_proto_state_machine] 0-tcp.vs_vol_rep2-server: ret = -1, error: Connection reset by peer, peer (host-000c29e93d20:1015)</div><div>[2014-09-17 04:46:28.688790] I [socket.c:2274:socket_event_handler] 0-transport: socket_event_poll_in failed, ret=-1.</div><div>[2014-09-17 04:46:28.688797] D [socket.c:2281:socket_event_handler] 0-transport: disconnecting now</div><div>[2014-09-17 04:46:28.688831] I [server.c:762:server_rpc_notify] 0-vs_vol_rep2-server: disconnecting connectionfrom host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0(host-000c29e93d20:1015)</div><div>[2014-09-17 04:46:28.688861] D [server-helpers.c:744:server_connection_put] 0-vs_vol_rep2-server: conn->bind_ref:1</div><div><br></div><div>In our production environment, there is some flocks in the 1st connection. 
</div><div>According to the logs, the flocks held by the 1st connection can never be cleaned up.</div><div>And on the 2nd connection, the client side cannot acquire the flocks again.</div><div><br></div><div>Therefore, we think the root cause is that different connections use the same conn id.</div><div>The conn id is assembled in client_setvolume():</div><div><br></div><div><span style="white-space:pre-wrap">		</span>ret = gf_asprintf (&process_uuid_xl, "%s-%s-%d",</div><div> this->ctx->process_uuid, this->name,</div><div> this->graph->id);</div><div><br></div><div>The conn id contains 3 parts:</div><div>this->ctx->process_uuid: hostname + pid + startup timestamp</div><div>this->name: translator name</div><div>this->graph->id: graph id</div><div><br></div><div>Apparently the conn id stays the same unless the client side restarts. So when </div><div>the network disconnects, there is some chance that the client-side socket times out while </div><div>the server-side one is still alive. If the network then recovers and the client reconnects </div><div>before the old server-side socket is reset, the file flocks of the old connection </div><div>are left uncleaned.</div><div><br></div><div>That is our full analysis of this flock leak issue. We are now working on the patch.</div><div>We hope someone can review it when it is finished.</div><div><br></div><div>Any other comments are appreciated. Thank you! </div><div><br></div>
</blockquote></div>