Is it possible the system was running low on memory? I see you have 48GB, but a memory registration failure typically means the system limit on the number of pinnable pages in RAM was hit. Can you tell us the size of your core dump files after the crash?
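For reference, one way to check the locked-memory (pinned pages) limit of the running brick process, and the size of any core files, might be something like the following. Note this is just a sketch: `pidof -s` picks an arbitrary glusterfsd, so adjust to the brick in question, and the core file path depends on your kernel.core_pattern setting, so /core.* is only a guess:

# ulimit -l
# grep 'locked memory' /proc/$(pidof -s glusterfsd)/limits
# ls -lh /core.*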
Avati

On Fri, Jun 8, 2012 at 4:22 PM, Ling Ho <ling@slac.stanford.edu> wrote:
Hello,

I have a brick that crashed twice today, and a different brick that crashed just a while ago.

This is what I see in one of the brick logs:

patchset: git://git.gluster.com/glusterfs.git
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
signal received: 6
time of crash: 2012-06-08 15:05:11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.6
/lib64/libc.so.6[0x34bc032900]
/lib64/libc.so.6(gsignal+0x35)[0x34bc032885]
/lib64/libc.so.6(abort+0x175)[0x34bc034065]
/lib64/libc.so.6[0x34bc06f977]
/lib64/libc.so.6[0x34bc075296]
/opt/glusterfs/3.2.6/lib64/libglusterfs.so.0(__gf_free+0x44)[0x7f1740ba25e4]
/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_destroy+0x47)[0x7f1740956967]
/opt/glusterfs/3.2.6/lib64/libgfrpc.so.0(rpc_transport_unref+0x62)[0x7f1740956a32]
/opt/glusterfs/3.2.6/lib64/glusterfs/3.2.6/rpc-transport/rdma.so(+0xc135)[0x7f173ca27135]
/lib64/libpthread.so.0[0x34bc8077f1]
/lib64/libc.so.6(clone+0x6d)[0x34bc0e5ccd]
---------

Somewhere before these lines, there is also:
[2012-06-08 15:05:07.512604] E [rdma.c:198:rdma_new_post] 0-rpc-transport/rdma: memory registration failed

I have 48GB of memory on the system:

# free
             total       used       free     shared    buffers     cached
Mem:      49416716   34496648   14920068          0      31692   28209612
-/+ buffers/cache:    6255344   43161372
Swap:      4194296       1740    4192556

# uname -a
Linux psanaoss213 2.6.32-220.7.1.el6.x86_64 #1 SMP Fri Feb 10 15:22:22 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

The server gluster version is 3.2.6-1. I have both rdma and tcp clients over a 10Gb/s network.

Any suggestions on what I should look for?

Is there a way to restart just the brick, and not glusterd on the server? I have 8 bricks on the server.

Thanks,
...
ling

Here's the volume info:

# gluster volume info

Volume Name: ana12
Type: Distribute
Status: Started
Number of Bricks: 40
Transport-type: tcp,rdma
Bricks:
Brick1: psanaoss214:/brick1
Brick2: psanaoss214:/brick2
Brick3: psanaoss214:/brick3
Brick4: psanaoss214:/brick4
Brick5: psanaoss214:/brick5
Brick6: psanaoss214:/brick6
Brick7: psanaoss214:/brick7
Brick8: psanaoss214:/brick8
Brick9: psanaoss211:/brick1
Brick10: psanaoss211:/brick2
Brick11: psanaoss211:/brick3
Brick12: psanaoss211:/brick4
Brick13: psanaoss211:/brick5
Brick14: psanaoss211:/brick6
Brick15: psanaoss211:/brick7
Brick16: psanaoss211:/brick8
Brick17: psanaoss212:/brick1
Brick18: psanaoss212:/brick2
Brick19: psanaoss212:/brick3
Brick20: psanaoss212:/brick4
Brick21: psanaoss212:/brick5
Brick22: psanaoss212:/brick6
Brick23: psanaoss212:/brick7
Brick24: psanaoss212:/brick8
Brick25: psanaoss213:/brick1
Brick26: psanaoss213:/brick2
Brick27: psanaoss213:/brick3
Brick28: psanaoss213:/brick4
Brick29: psanaoss213:/brick5
Brick30: psanaoss213:/brick6
Brick31: psanaoss213:/brick7
Brick32: psanaoss213:/brick8
Brick33: psanaoss215:/brick1
Brick34: psanaoss215:/brick2
Brick35: psanaoss215:/brick4
Brick36: psanaoss215:/brick5
Brick37: psanaoss215:/brick7
Brick38: psanaoss215:/brick8
Brick39: psanaoss215:/brick3
Brick40: psanaoss215:/brick6
Options Reconfigured:
performance.io-thread-count: 16
performance.write-behind-window-size: 16MB
performance.cache-size: 1GB
nfs.disable: on
performance.cache-refresh-timeout: 1
network.ping-timeout: 42
performance.cache-max-file-size: 1PB
