Without the performance translators, the result is the same.<br>I will try with gdb as soon as possible.<br>You say the EBADFD is fine because AFR will try the operation on the other server. OK, I understand that,<br>but if I then stop that other server, gluster cannot fall back to the first one, which is the one returning EBADFD.<br>
I think a lot of my problems come from this, because with my two servers,<br>I stop the first, restart it, wait, then stop the second and restart it, and everything fails.<br>If I only stop the first and test, everything is OK.<br>Nicolas<br>
<br><div class="gmail_quote">On Tue, Feb 3, 2009 at 3:50 PM, Krishna Srinivas <span dir="ltr"><<a href="mailto:krishna@zresearch.com">krishna@zresearch.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Nicolas,<br>
<br>
When you restart the server, the log entries indicating EBADFD are fine; AFR will<br>
try the operation on the other server. When you hit the situation<br>
where the glusterfs client hangs, can you attach gdb to the glusterfs<br>
process and mail us the backtrace?<br>
<br>
gdb -p <pid of glusterfs><br>
type "bt" at the gdb command prompt.<br>
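<br>
The same backtrace can also be captured without an interactive session. The following is a minimal sketch, not something from the glusterfs tree: the helper name dump_backtrace and the GDB override variable are hypothetical, and it assumes gdb is installed and you have permission to attach to the process.<br>

```shell
# Hypothetical helper (not part of glusterfs): print backtraces of every
# thread in a running process without an interactive gdb session.
# The GDB variable is an assumption of this sketch; it lets you point at
# a different gdb binary.
dump_backtrace() {
    pid="$1"
    # Refuse to run without a PID instead of attaching to nothing.
    [ -n "$pid" ] || return 1
    # -p attaches to the process, -batch exits after the commands finish,
    # -ex supplies one gdb command to run.
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
}
```

For example, dump_backtrace "$(pidof glusterfs)" would print the backtraces for the client, assuming only one glusterfs process is running on the machine; redirect the output to a file to mail it.<br>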
<br>
Just want to confirm that glusterfs has not blocked at a system call.<br>
(as we have non blocking io now)<br>
<br>
Can you see if removing the performance translators helps? If it does, we can<br>
narrow it down to the problem translator.<br>
<br>
Krishna<br>
<br>
On Tue, Feb 3, 2009 at 5:18 PM, nicolas prochazka<br>
<div class="Ih2E3d"><<a href="mailto:prochazka.nicolas@gmail.com">prochazka.nicolas@gmail.com</a>> wrote:<br>
</div><div><div></div><div class="Wj3C7c">> ok,<br>
> So now I know there are a few bugs:<br>
><br>
> 1 - When I stop and restart a server, I get the EBADFD bug<br>
> 2 - When I stop a server:<br>
> - with --disable-direct-io-mode: my big image file becomes corrupt<br>
> (missing data ...)<br>
> - without --disable-direct-io-mode: my process hangs and the CPU load<br>
> grows a lot (per process)<br>
><br>
> any ideas ?<br>
><br>
> Regards,<br>
> Nicolas Prochazka<br>
><br>
> On Tue, Feb 3, 2009 at 5:42 AM, Raghavendra G <<a href="mailto:raghavendra@zresearch.com">raghavendra@zresearch.com</a>><br>
> wrote:<br>
>><br>
>> Hi Nicolas,<br>
>><br>
>> On Tue, Feb 3, 2009 at 12:01 AM, nicolas prochazka<br>
>> <<a href="mailto:prochazka.nicolas@gmail.com">prochazka.nicolas@gmail.com</a>> wrote:<br>
>>><br>
>>> I inspected the log and found something interesting:<br>
>>> Everything was OK,<br>
>>> then I stopped 10.98.98.2 and restarted it:<br>
>>><br>
>>> 2009-02-02 15:00:32 D [client-protocol.c:6498:notify] brick_10.98.98.2:<br>
>>> got GF_EVENT_CHILD_UP<br>
>>> 2009-02-02 15:00:32 D [socket.c:924:socket_connect] brick_10.98.98.2:<br>
>>> connect () called on transport already connected<br>
>>> 2009-02-02 15:00:32 N [client-protocol.c:5786:client_setvolume_cbk]<br>
>>> brick_10.98.98.2: connection and handshake succeeded<br>
>>> 2009-02-02 15:00:40 D [fuse-bridge.c:1945:fuse_statfs] glusterfs-fuse:<br>
>>> 17399: STATFS<br>
>>> 2009-02-02 15:00:40 D [fuse-bridge.c:368:fuse_entry_cbk] glusterfs-fuse:<br>
</div></div></blockquote></div><br>