Now you are showing a segfault backtrace from glusterd, while the previous one (the "single thread" backtrace) was from glusterfsd. Are both having problems?<div><br></div><div>Avati<br><br><div class="gmail_quote">On Mon, Feb 4, 2013 at 1:34 AM, Emmanuel Dreyfus <span dir="ltr"><<a href="mailto:manu@netbsd.org" target="_blank">manu@netbsd.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Sun, Feb 03, 2013 at 02:13:40PM -0800, Anand Avati wrote:<br>
> Yeah, a lot of threads are "missing"! Do the logs have anything unusual?<br>
<br>
</div>I understand the threads are there, but the process is corrupted enough<br>
that gdb cannot figure them out. The kernel reports 15 threads running,<br>
and they do not terminate before the process stops responding.<br>
<br>
Running with Electric Fence leads to an early crash. I am not sure<br>
it is related, but it is probably worth fixing:<br>
<br>
Program terminated with signal 11, Segmentation fault.<br>
#0 slotForUserAddress (address=0x7f7ff2bafff4) at efence.c:648<br>
648 efence.c: No such file or directory.<br>
in efence.c<br>
(gdb) bt<br>
#0 slotForUserAddress (address=0x7f7ff2bafff4) at efence.c:648<br>
#1 free (address=0x7f7ff2bafff4) at efence.c:713<br>
#2 0x00007f7ff744a31a in runner_end (runner=0x7f7ff1bfeef0) at run.c:370<br>
#3 0x00007f7ff744ac9e in runner_run_generic (runner=0x7f7ff1bfeef0,<br>
rfin=0x7f7ff744a2f6 <runner_end>) at run.c:386<br>
#4 0x00007f7ff343ddb9 in glusterd_volume_start_glusterfs (<br>
volinfo=0x7f7ff370fae0, brickinfo=0x7f7ff37407a8, wait=_gf_true)<br>
at glusterd-utils.c:1337<br>
#5 0x00007f7ff344366b in glusterd_brick_start (volinfo=0x7f7ff370fae0,<br>
brickinfo=0x7f7ff37407a8, wait=_gf_true) at glusterd-utils.c:3961<br>
#6 0x00007f7ff3447328 in glusterd_restart_bricks (conf=0x7f7ff73f2a98)<br>
at glusterd-utils.c:3991<br>
#7 0x00007f7ff743c9c4 in synctask_wrap (old_task=<optimized out>)<br>
at syncop.c:129<br>
#8 0x00007f7ff5e580a0 in swapcontext () from /usr/lib/libc.so.12<br>
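For reference, a typical way to run glusterd under Electric Fence is to preload the library and set its environment variables; the library path below is illustrative and varies by system:<br>
<br>

```shell
# Preload Electric Fence so its malloc/free wrappers intercept allocations.
# EF_PROTECT_BELOW=1 places the guard page *before* each allocation,
# so buffer underruns fault immediately (by default the guard sits after).
# Library path is illustrative; adjust for your system (e.g. pkgsrc on NetBSD).
LD_PRELOAD=/usr/pkg/lib/libefence.so \
EF_PROTECT_BELOW=1 \
/usr/local/sbin/glusterd --no-daemon
```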
<br>
In runner_end():<br>
368 if (runner->argv) {<br>
369 for (p = runner->argv; *p; p++)<br>
370 GF_FREE (*p);<br>
<br>
Inspection of runner->argv shows it is not NULL-terminated. Electric Fence<br>
with EF_PROTECT_BELOW makes us crash here:<br>
0x7f7ff2ba7e00: 0xf2babfe4 0x00007f7f 0xf2badffc 0x00007f7f<br>
0x7f7ff2ba7e10: 0xf2bafff4 0x00007f7f 0xf2bb1ff0 0x00007f7f<br>
0x7f7ff2ba7e20: 0xf2bb3fe4 0x00007f7f 0xf2bb5ffc 0x00007f7f<br>
0x7f7ff2ba7e30: 0xf2bb7fc4 0x00007f7f 0xf2bb9ffc 0x00007f7f<br>
0x7f7ff2ba7e40: 0xf2bbbfcc 0x00007f7f 0xf2bbdff0 0x00007f7f<br>
0x7f7ff2ba7e50: 0xf2bbfff0 0x00007f7f 0xf2bc1ffc 0x00007f7f<br>
0x7f7ff2ba7e60: 0xf2bc3fcc 0x00007f7f 0xf2bc5ff0 0x00007f7f<br>
0x7f7ff2ba7e70: 0xf2bc7fc4 0x00007f7f 0xf2bc9ff0 0x00007f7f<br>
0x7f7ff2ba7e80: 0xf2bcbff8 0x00007f7f 0xf2bcdff0 0x00007f7f<br>
0x7f7ff2ba7e90: 0xf2bcffe0 0x00007f7f Cannot access memory at address 0x7f7ff2ba7e98<br>
<br>
In case it helps, the last element:<br>
(gdb) x/1s 0x00007f7ff2bcffe0<br>
0x7f7ff2bcffe0: "gfs33-server.listen-port=49152"<br>
<br>
It is indeed the last one reported by ps:<br>
PID TTY STAT TIME COMMAND<br>
626 ? Ssl 0:00.34 /usr/local/sbin/glusterfsd -s localhost --volfile-id gfs33.hotstuff.export-wd1a -p /var/lib/glusterd/vols/gfs33/run/hotstuff-export-wd1a.pid -S /var/run/66fedc0377e53ff9d6523d0802a230d1.socket --brick-name /export/wd1a -l /usr/local/var/log/glusterfs/bricks/export-wd1a.log --xlator-option *-posix.glusterd-uuid=2dbb8fc1-c2ab-4992-b080-fdc8556d1e34 --brick-port 49152 --xlator-option gfs33-server.listen-port=49152<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Emmanuel Dreyfus<br>
<a href="mailto:manu@netbsd.org">manu@netbsd.org</a><br>
</font></span></blockquote></div><br></div>