<div>Hi all,</div><div><br></div>
<div>After several days of tracing, we have pinpointed why glusterfs fails to release </div>
<div>file locks (flocks) cleanly when the network disconnects frequently. We are now working on</div>
<div>a patch to submit. The details of the issue are below; any suggestions are </div>
<div>appreciated!</div><div><br></div>
<div>As I mentioned in </div>
<div><a href="http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html">http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html</a></div>
<div>this issue shows up when the network disconnects frequently.</div><div><br></div>
<div>According to the source, the server-side cleanup is done in server_connection_cleanup().</div>
<div>When RPCSVC_EVENT_DISCONNECT occurs, execution reaches this code:</div><div><br></div>
<div>int</div>
<div>server_rpc_notify ()</div>
<div>{</div>
<div>        ......</div>
<div>        case RPCSVC_EVENT_DISCONNECT:</div>
<div>                ......</div>
<div>                if (!conf-&gt;lk_heal) {</div>
<div>                        server_conn_ref (conn);</div>
<div>                        server_connection_put (this, conn, &amp;detached);</div>
<div>                        if (detached)</div>
<div>                                server_connection_cleanup (this, conn,</div>
<div>                                                           INTERNAL_LOCKS |</div>
<div>                                                           POSIX_LOCKS);</div>
<div>                        server_conn_unref (conn);</div>
<div>        ......</div>
<div>}</div><div><br></div>
<div>server_connection_cleanup() is called only when the variable &#39;detached&#39; is true. 
</div><div>That flag is set by server_connection_put():</div><div><br></div>
<div>server_connection_t*</div>
<div>server_connection_put (xlator_t *this, server_connection_t *conn,</div>
<div>                       gf_boolean_t *detached)</div>
<div>{</div>
<div>        server_conf_t       *conf = NULL;</div>
<div>        gf_boolean_t        unref = _gf_false;</div><div><br></div>
<div>        if (detached)</div>
<div>                *detached = _gf_false;</div>
<div>        conf = this-&gt;private;</div>
<div>        pthread_mutex_lock (&amp;conf-&gt;mutex);</div>
<div>        {</div>
<div>                conn-&gt;bind_ref--;</div>
<div>                if (!conn-&gt;bind_ref) {</div>
<div>                        list_del_init (&amp;conn-&gt;list);</div>
<div>                        unref = _gf_true;</div>
<div>                }</div>
<div>        }</div>
<div>        pthread_mutex_unlock (&amp;conf-&gt;mutex);</div>
<div>        if (unref) {</div>
<div>                gf_log (this-&gt;name, GF_LOG_INFO, &quot;Shutting down connection %s&quot;,</div>
<div>                        conn-&gt;id);</div>
<div>                if (detached)</div>
<div>                        *detached = _gf_true;</div>
<div>                server_conn_unref (conn);</div>
<div>                conn = NULL;</div>
<div>        }</div>
<div>        return conn;</div>
<div>}</div><div><br></div>
<div>&#39;detached&#39; is set to _gf_true only when &#39;conn-&gt;bind_ref&#39; drops to 0. 
</div><div>&#39;conn-&gt;bind_ref&#39; is managed in server_connection_get(), which either increments it for an existing connection or sets it to 1 for a new one.</div><div><br></div>
<div>server_connection_t *</div>
<div>server_connection_get (xlator_t *this, const char *id)</div>
<div>{</div>
<div>                ......</div>
<div>                list_for_each_entry (trav, &amp;conf-&gt;conns, list) {</div>
<div>                        if (!strcmp (trav-&gt;id, id)) {</div>
<div>                                conn = trav;</div>
<div>                                conn-&gt;bind_ref++;</div>
<div>                                goto unlock;</div>
<div>                        }</div>
<div>                }</div>
<div>                ......</div>
<div>}</div><div><br></div>
<div>When a connection with the same id already exists, &#39;conn-&gt;bind_ref&#39; is incremented.</div>
<div>We therefore suspected a mismatch between the increments and decrements of this </div>
<div>reference count, and added some logging to verify our guess.</div><div><br></div>
<div>// The 1st connection comes in. There is no id &#39;host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0&#39;</div>
<div>in the connection table yet. 
So &#39;conn-&gt;bind_ref&#39; is set to 1.</div>
<div>[2014-09-17 04:42:28.950693] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn-&gt;bind_ref:1, found:0</div>
<div>[2014-09-17 04:42:28.950717] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0</div>
<div>[2014-09-17 04:42:28.950758] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1015)</div>
<div>......</div>
<div>// The connection keeps running for several minutes.</div>
<div>......</div>
<div>// The network is disconnected here. The client-side TCP socket is closed by a </div>
<div>time-out, but the server-side socket stays connected. AT THIS MOMENT the </div>
<div>network is restored, and the client opens a new TCP connection JUST BEFORE the </div>
<div>old socket on the server side is reset. Note that at this point there are 2 valid </div>
<div>sockets on the server side. 
The new connection uses the same conn id,</div>
<div>&#39;host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0&#39;, so the lookup in the </div>
<div>connection table finds the old entry and increases &#39;conn-&gt;bind_ref&#39; to 2.</div><div><br></div>
<div>[2014-09-17 04:46:16.135066] D [server-helpers.c:712:server_connection_get] 0-vs_vol_rep2-server: server connection id: host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0, conn-&gt;bind_ref:2, found:1 // bind_ref increases to 2 here</div>
<div>[2014-09-17 04:46:16.135113] D [server-handshake.c:430:server_setvolume] 0-vs_vol_rep2-server: Connected to host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0</div>
<div>[2014-09-17 04:46:16.135157] I [server-handshake.c:567:server_setvolume] 0-vs_vol_rep2-server: accepted client from host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0 (version: 3.4.5) (peer: host-000c29e93d20:1018)</div><div><br></div>
<div>// About 13 seconds later the old connection is reset, which decreases &#39;conn-&gt;bind_ref&#39; to 1. </div>
<div>// Since it does not reach 0, &#39;detached&#39; stays false and server_connection_cleanup() is not called.</div><div><br></div>
<div>[2014-09-17 04:46:28.688780] W [socket.c:2121:__socket_proto_state_machine] 0-tcp.vs_vol_rep2-server: ret = -1, error: Connection reset by peer, peer (host-000c29e93d20:1015)</div>
<div>[2014-09-17 04:46:28.688790] I [socket.c:2274:socket_event_handler] 0-transport: socket_event_poll_in failed, ret=-1.</div>
<div>[2014-09-17 04:46:28.688797] D [socket.c:2281:socket_event_handler] 0-transport: disconnecting now</div>
<div>[2014-09-17 04:46:28.688831] I [server.c:762:server_rpc_notify] 0-vs_vol_rep2-server: disconnecting connectionfrom host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0(host-000c29e93d20:1015)</div>
<div>[2014-09-17 04:46:28.688861] D [server-helpers.c:744:server_connection_put] 0-vs_vol_rep2-server: conn-&gt;bind_ref:1</div><div><br></div>
<div>In our production environment, the 1st connection holds some flocks. 
</div><div>As the logs show, the flocks held by the 1st connection are never cleaned up,</div>
<div>and over the 2nd (new) connection the client cannot acquire those flocks again.</div><div><br></div>
<div>Therefore, we think the root cause is that different connections use the same conn id.</div>
<div>The conn id is assembled in client_setvolume():</div><div><br></div>
<div>                ret = gf_asprintf (&amp;process_uuid_xl, &quot;%s-%s-%d&quot;,</div>
<div>                                   this-&gt;ctx-&gt;process_uuid, this-&gt;name,</div>
<div>                                   this-&gt;graph-&gt;id);</div><div><br></div>
<div>The conn id consists of 3 parts:</div>
<div>this-&gt;ctx-&gt;process_uuid: hostname + pid + startup timestamp</div>
<div>this-&gt;name: translator name</div>
<div>this-&gt;graph-&gt;id: graph id</div><div><br></div>
<div>Clearly, the conn id stays the same until the client process restarts. So when the </div>
<div>network disconnects, there is a chance that the client-side socket times out while </div>
<div>the server-side socket is still alive. If the network is restored at that moment and the </div>
<div>client reconnects before the old server-side socket is reset, the flocks of the old </div>
<div>connection are never cleaned up.</div><div><br></div>
<div>That is our full analysis of this flock leak. We are working on the patch now</div>
<div>and hope someone can review it when it is finished.</div><div><br></div>
<div>Any other comments are welcome. Thank you! </div><div><br></div>