[Gluster-devel] Frequent network failures cause an FD leak after setting a short network.tcp-timeout

Jaden Liang jaden1q84 at gmail.com
Thu Sep 11 10:42:09 UTC 2014


Hi all,

First of all, I sent a mail about an FD leak after network failure before:
 - http://supercolony.gluster.org/pipermail/gluster-devel/2014-August/041969.html
 'About file descriptor leak in glusterfsd daemon after network failure'

Thanks to Niels de Vos for pointing me to bug 1129787 and the patch that solves it.
 - https://bugzilla.redhat.com/show_bug.cgi?id=1129787
I have also applied the patch http://review.gluster.org/8065 and set
network.tcp-timeout to 30s to verify whether the issue is fixed.

The patch and the network.tcp-timeout setting work fine most of the time.
However, I found that under a very subtle condition the FD still leaks.
Here are the details:

The setup is a replicated gluster volume with 2 server nodes. On the client
side, I use mount.glusterfs to access the volume. I wrote a test program
which opens a file in the volume, flocks it once, and then reads and writes
the file frequently. On one of the nodes, I simulate a network failure for
15s and then reconnect. Note that 15s is less than the network.tcp-timeout
of 30s, so the TCP connection is kept alive. This way, the FD of the brick
file is sometimes leaked: after the test program exits, the FD on the server
side is not closed. If the test program is restarted, it fails at flock with
a 'Resource temporarily unavailable' error.

Network failure timeline:
---(15s connected)---|---(15s disconnected)---|---(15s connected)---|...repeat...
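
For reference, here is a minimal sketch of what the test program does. The
actual source is not included in this mail; the mount path, buffer size and
timing below are placeholders.

/* Minimal reproducer sketch (not the original test program):
 * open a file on the glusterfs mount, take one flock(), then keep
 * reading and writing it while the network flaps on one server. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/glusterfs/testfile";   /* assumed mount point */
    char buf[4096];

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* flock only once; after a restart this fails with EWOULDBLOCK
     * ("Resource temporarily unavailable") if the old lock leaked. */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        fprintf(stderr, "flock: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    memset(buf, 'a', sizeof(buf));
    for (;;) {
        /* frequent read/write traffic on the locked fd */
        if (pwrite(fd, buf, sizeof(buf), 0) < 0)
            perror("pwrite");
        if (pread(fd, buf, sizeof(buf), 0) < 0)
            perror("pread");
        usleep(100 * 1000);   /* ~10 ops per second, arbitrary */
    }
    /* not reached; the fd and lock are released when the process exits */
}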

To trace this issue, I added some logging code in xlators/storage/posix.
I found that on the server with the network failure, glusterfsd opens and
closes the file repeatedly within one second while the network is connected
(is it a stat operation?). The FD leak happens at the disconnection switch:
the last opened fd is never closed.

So could this kind of frequent network failure cause the fd-close (release)
message to be lost?

While looking into this issue, I found one related patch:
 - http://review.gluster.org/#/c/7470
But I am not sure whether it covers the same situation as mine.

My log:
(Connected logs)
[2014-09-09 12:05:49.055772] I [posix.c:1846:posix_open]
0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=18
[2014-09-09 12:05:49.058009] I [posix.c:2286:posix_release]
0-vs_vol_rep2-posix: posix_release: fd=18
[2014-09-09 12:05:49.058499] I [posix.c:1846:posix_open]
0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=18
[2014-09-09 12:05:49.060655] I [posix.c:2286:posix_release]
0-vs_vol_rep2-posix: posix_release: fd=18
[2014-09-09 12:05:49.061587] I [posix.c:1846:posix_open]
0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=18
...
// Disconnect for 15s; we can see that fd=18 is never closed.
...
[2014-09-09 12:06:15.910357] I [posix-helpers.c:1027:__posix_fd_ctx_get]
0-vs_vol_rep2-posix: open, _fd=16,
real_path=/mnt/h1/.glusterfs/44/3d/443d50b8-5187-48d0-92ba-779c62cb9fe0
[2014-09-09 12:06:15.910637] I [posix.c:2286:posix_release]
0-vs_vol_rep2-posix: posix_release: fd=16
[2014-09-09 12:06:15.919489] I [posix-helpers.c:1027:__posix_fd_ctx_get]
0-vs_vol_rep2-posix: open, _fd=16,
real_path=/mnt/h1/.glusterfs/44/3d/443d50b8-5187-48d0-92ba-779c62cb9fe0
[2014-09-09 12:06:15.919814] I [posix.c:2286:posix_release]
0-vs_vol_rep2-posix: posix_release: fd=16
[2014-09-09 12:06:16.003219] I [posix-helpers.c:1027:__posix_fd_ctx_get]
0-vs_vol_rep2-posix: open, _fd=16,
real_path=/mnt/h1/.glusterfs/44/3d/443d50b8-5187-48d0-92ba-779c62cb9fe0
[2014-09-09 12:06:16.003395] I [posix.c:2286:posix_release]
0-vs_vol_rep2-posix: posix_release: fd=16
[2014-09-09 12:06:45.015708] I [posix.c:1846:posix_open]
0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=23

// On the server side, lsof shows:
#lsof /mnt/h1/testfile
COMMAND     PID USER   FD   TYPE DEVICE    SIZE/OFF      NODE NAME
glusterfs 10583 root   18u   REG   8,17 21478375424 335543947
/mnt/h1/testfile

// On the server side, gluster volume status fd shows:
# gluster volume status vol_rep2 h1:/mnt/h1 fd
FD tables for volume vs_vol_rep2
----------------------------------------------
Brick : vol_rep2 h1:/mnt/h1
Connection 1:
RefCount = 0  MaxFDs = 128  FirstFree = 2
FD Entry            PID                 RefCount            Flags
--------            ---                 --------            -----
0                   9153                1                   16386

Connection 2:
RefCount = 0  MaxFDs = 128  FirstFree = 0
No open fds

// Note that 9153 is the pid of my test program, which has already EXITED.

// statedump
[xlator.features.locks.vs_vol_rep2-locks.inode]
path=/testfile
mandatory=0
posixlk-count=1
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
9153, owner=dac8e118f3c2e02e, transport=0x7faba00090d0, , granted at Tue
Sep  9 20:05:49 2014