<div dir="ltr">Hi, maybe I should add some more information about the situation.<div><br></div><div>----->>>>> version of client and server <<<<<-----</div><div><br></div><div>client version: glusterfs 3.6.0 beta1</div><div>server version: glusterfs 3.3.0<br><div><br></div><div><br></div><div>---------->>>> volume info <<<<<-------------</div><div><div><br></div><div>Volume Name: myvol</div><div>Type: Distributed-Replicate</div><div>Volume ID: c36dfe1c-3f95-4d64-9dae-1b5916b56b19</div><div>Status: Started</div><div>Number of Bricks: 2 x 2 = 4</div><div>Transport-type: tcp</div><div>Bricks:</div><div>Brick1: 10.10.10.10:/mnt/xfsd/myvol-0</div><div>Brick2: 10.10.10.10:/mnt/xfsd/myvol-1</div><div>Brick3: 10.10.10.10:/mnt/xfsd/myvol-2</div><div>Brick4: 10.10.10.10:/mnt/xfsd/myvol-3</div></div><div><br></div><div>-------------->>>> mount with --debug <<<<<-------------------</div><div>/usr/sbin/glusterfs --volfile-server=10.10.10.10 --volfile-id=myvol /mnt/myvol --debug<br></div><div><br></div><div>Then, in another window on the client, I go to the mount point /mnt/myvol and execute "ls",</div><div>and a couple of lines are printed, with no error messages as far as I can tell. Next, I execute the command</div><div><br></div><div>echo "hello,gluster3.3" > /mnt/myvol/file</div><div><br></div><div>Then some error-like messages appear. Again, with "ls", I can find the file named "file" under the</div><div>mount point, but "cat /mnt/myvol/file" returns nothing, which means this is an empty file!</div><div>After many tests, the conclusion is: I cannot write anything to a file, although I can create a file successfully</div><div>and can read an existing file successfully.</div><div><br></div><div><br></div><div>-------------->>>>> mount log (some error-like messages when writing a file) <<<<<<----------------</div><div><br></div><div><div>[2014-09-28 07:41:28.585181] D [logging.c:1781:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. 
Timeout = 120, current buf size = 5</div><div>[2014-09-28 07:42:42.433338] D [MSGID: 0] [dht-common.c:621:dht_revalidate_cbk] 0-myvol-dht: revalidate lookup of / returned with op_ret 0 and op_errno 117</div><div>[2014-09-28 07:43:19.487414] D [MSGID: 0] [dht-common.c:2182:dht_lookup] 0-myvol-dht: Calling fresh lookup for /file on myvol-replicate-1</div><div>[2014-09-28 07:43:19.558459] D [MSGID: 0] [dht-common.c:1818:dht_lookup_cbk] 0-myvol-dht: fresh_lookup returned for /file with op_ret -1 and op_errno 2</div><div>[2014-09-28 07:43:19.558497] I [dht-common.c:1822:dht_lookup_cbk] 0-myvol-dht: Entry /file missing on subvol myvol-replicate-1</div><div>[2014-09-28 07:43:19.558517] D [MSGID: 0] [dht-common.c:1607:dht_lookup_everywhere] 0-myvol-dht: winding lookup call to 2 subvols</div><div>[2014-09-28 07:43:19.634376] D [MSGID: 0] [dht-common.c:1413:dht_lookup_everywhere_cbk] 0-myvol-dht: returned with op_ret -1 and op_errno 2 (/file) from subvol myvol-replicate-0</div><div>[2014-09-28 07:43:19.634573] D [MSGID: 0] [dht-common.c:1413:dht_lookup_everywhere_cbk] 0-myvol-dht: returned with op_ret -1 and op_errno 2 (/file) from subvol myvol-replicate-1</div><div>[2014-09-28 07:43:19.634605] D [MSGID: 0] [dht-common.c:1086:dht_lookup_everywhere_done] 0-myvol-dht: STATUS: hashed_subvol myvol-replicate-1 cached_subvol null</div><div>[2014-09-28 07:43:19.634624] D [MSGID: 0] [dht-common.c:1147:dht_lookup_everywhere_done] 0-myvol-dht: There was no cached file and unlink on hashed is not skipped /file</div><div>[2014-09-28 07:43:19.634663] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/file: failed to resolve (No such file or directory)</div><div>[2014-09-28 07:43:19.708608] I [dht-common.c:1822:dht_lookup_cbk] 0-myvol-dht: Entry /file missing on subvol myvol-replicate-1</div><div>[2014-09-28 07:43:19.781420] D [logging.c:1937:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. 
About to flush least recently used log message to disk</div><div>[2014-09-28 07:43:19.708640] D [MSGID: 0] [dht-common.c:1607:dht_lookup_everywhere] 0-myvol-dht: winding lookup call to 2 subvols</div><div>[2014-09-28 07:43:19.781418] D [MSGID: 0] [dht-common.c:1413:dht_lookup_everywhere_cbk] 0-myvol-dht: returned with op_ret -1 and op_errno 2 (/file) from subvol myvol-replicate-0</div><div>[2014-09-28 07:43:19.781629] D [MSGID: 0] [dht-common.c:1413:dht_lookup_everywhere_cbk] 0-myvol-dht: returned with op_ret -1 and op_errno 2 (/file) from subvol myvol-replicate-1</div><div>[2014-09-28 07:43:19.781653] D [MSGID: 0] [dht-common.c:1086:dht_lookup_everywhere_done] 0-myvol-dht: STATUS: hashed_subvol myvol-replicate-1 cached_subvol null</div><div>[2014-09-28 07:43:19.851925] I [dht-common.c:1822:dht_lookup_cbk] 0-myvol-dht: Entry /file missing on subvol myvol-replicate-1</div><div>[2014-09-28 07:43:19.851954] D [logging.c:1937:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. 
About to flush least recently used log message to disk</div><div>The message "D [MSGID: 0] [dht-common.c:1818:dht_lookup_cbk] 0-myvol-dht: fresh_lookup returned for /file with op_ret -1 and op_errno 2" repeated 2 times between [2014-09-28 07:43:19.558459] and [2014-09-28 07:43:19.851922]</div><div>[2014-09-28 07:43:19.851954] D [MSGID: 0] [dht-common.c:1607:dht_lookup_everywhere] 0-myvol-dht: winding lookup call to 2 subvols</div><div>[2014-09-28 07:43:19.922764] D [MSGID: 0] [dht-common.c:1413:dht_lookup_everywhere_cbk] 0-myvol-dht: returned with op_ret -1 and op_errno 2 (/file) from subvol myvol-replicate-0</div><div>[2014-09-28 07:43:19.922925] D [MSGID: 0] [dht-common.c:1413:dht_lookup_everywhere_cbk] 0-myvol-dht: returned with op_ret -1 and op_errno 2 (/file) from subvol myvol-replicate-1</div><div>[2014-09-28 07:43:19.922974] D [fuse-resolve.c:83:fuse_resolve_entry_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001/file: failed to resolve (No such file or directory)</div><div>[2014-09-28 07:43:19.997012] D [logging.c:1937:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. 
About to flush least recently used log message to disk</div><div>The message "D [MSGID: 0] [dht-common.c:1147:dht_lookup_everywhere_done] 0-myvol-dht: There was no cached file and unlink on hashed is not skipped /file" repeated 2 times between [2014-09-28 07:43:19.634624] and [2014-09-28 07:43:19.922951]</div><div>[2014-09-28 07:43:19.997011] D [MSGID: 0] [dht-diskusage.c:96:dht_du_info_cbk] 0-myvol-dht: subvolume 'myvol-replicate-0': avail_percent is: 99.00 and avail_space is: 44000304627712 and avail_inodes is: 99.00</div><div>[2014-09-28 07:43:19.997134] D [MSGID: 0] [dht-diskusage.c:96:dht_du_info_cbk] 0-myvol-dht: subvolume 'myvol-replicate-1': avail_percent is: 99.00 and avail_space is: 44000304627712 and avail_inodes is: 99.00</div><div>[2014-09-28 07:43:19.997179] D [afr-transaction.c:1166:afr_post_nonblocking_entrylk_cbk] 0-myvol-replicate-1: Non blocking entrylks done. Proceeding to FOP</div><div>[2014-09-28 07:43:20.067587] D [afr-lk-common.c:447:transaction_lk_op] 0-myvol-replicate-1: lk op is for a transaction</div><div>[2014-09-28 07:43:20.216287] D [afr-transaction.c:1116:afr_post_nonblocking_inodelk_cbk] 0-myvol-replicate-1: Non blocking inodelks done. Proceeding to FOP</div><div>[2014-09-28 07:43:20.356844] W [client-rpc-fops.c:850:client3_3_writev_cbk] 0-myvol-client-3: remote operation failed: Transport endpoint is not connected</div><div>[2014-09-28 07:43:20.356979] W [client-rpc-fops.c:850:client3_3_writev_cbk] 0-myvol-client-2: remote operation failed: Transport endpoint is not connected</div><div>[2014-09-28 07:43:20.357009] D [afr-lk-common.c:447:transaction_lk_op] 0-myvol-replicate-1: lk op is for a transaction</div><div>[2014-09-28 07:43:20.428013] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 14: FLUSH() ERR => -1 (Transport endpoint is not connected)</div><div>[2014-09-28 07:43:28.593512] D [logging.c:1816:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed out. 
About to flush outstanding messages if present</div><div>The message "D [MSGID: 0] [dht-common.c:621:dht_revalidate_cbk] 0-myvol-dht: revalidate lookup of / returned with op_ret 0 and op_errno 117" repeated 5 times between [2014-09-28 07:42:42.433338] and [2014-09-28 07:43:19.487219]</div><div>The message "D [MSGID: 0] [dht-common.c:2182:dht_lookup] 0-myvol-dht: Calling fresh lookup for /file on myvol-replicate-1" repeated 2 times between [2014-09-28 07:43:19.487414] and [2014-09-28 07:43:19.781835]</div><div>[2014-09-28 07:43:19.922950] D [MSGID: 0] [dht-common.c:1086:dht_lookup_everywhere_done] 0-myvol-dht: STATUS: hashed_subvol myvol-replicate-1 cached_subvol null</div><div>[2014-09-28 07:43:28.593615] D [logging.c:1781:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5</div><div>[2014-09-28 07:44:26.517010] D [MSGID: 0] [dht-common.c:621:dht_revalidate_cbk] 0-myvol-dht: revalidate lookup of / returned with op_ret 0 and op_errno 117</div><div>[2014-09-28 07:44:26.950203] D [MSGID: 0] [dht-common.c:2108:dht_lookup] 0-myvol-dht: calling revalidate lookup for /file at myvol-replicate-1</div><div>[2014-09-28 07:44:27.022986] D [MSGID: 0] [dht-common.c:621:dht_revalidate_cbk] 0-myvol-dht: revalidate lookup of /file returned with op_ret 0 and op_errno 0</div></div><div><br></div><div><br></div><div><br></div><div>Please give me some ideas to fix this problem. Thanks, all!</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2014-09-27 6:16 GMT+08:00 Justin Clift <span dir="ltr"><<a href="mailto:justin@gluster.org" target="_blank">justin@gluster.org</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 25/09/2014, at 10:29 AM, panpan feng wrote:<br>
> Hi, dear experts of Gluster,<br>
> Today I ran into a problem. I installed glusterfs 3.6 beta1 on the client and mounted a volume<br>
> served by a glusterfs 3.3 server; the mount operation succeeds, and I can read files successfully. But<br>
> write operations fail with the error "E72 Close error on swap file". There are many strange messages in the log file<br>
><br>
><br>
> [2014-09-25 05:44:17.659510] W [graph.c:344:_log_if_unknown_option] 0-maintain4-quota: option 'timeout' is not recognized<br>
><br>
> [2014-09-25 05:45:26.022305] I [MSGID: 109018] [dht-common.c:715:dht_revalidate_cbk] 0-maintain4-dht: Mismatching layouts for /StorageReport, gfid = 00000000-0000-0000-0000-000000000000<br>
> [2014-09-25 05:45:26.022616] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-maintain4-dht: /StorageReport: Disk layout missing, gfid = 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.022673] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-maintain4-dht: /StorageReport: Disk layout missing, gfid = 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.022973] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-maintain4-dht: /StorageReport: Disk layout missing, gfid = 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.023216] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-maintain4-dht: /StorageReport: Disk layout missing, gfid = 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.023430] I [dht-layout.c:754:dht_layout_dir_mismatch] 0-maintain4-dht: /StorageReport: Disk layout missing, gfid = 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.059022] I [afr-self-heal-metadata.c:41:__afr_selfheal_metadata_do] 0-maintain4-replicate-1: performing metadata selfheal on 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.059208] I [afr-self-heal-metadata.c:41:__afr_selfheal_metadata_do] 0-maintain4-replicate-2: performing metadata selfheal on 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.059238] I [afr-self-heal-metadata.c:41:__afr_selfheal_metadata_do] 0-maintain4-replicate-0: performing metadata selfheal on 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.059281] I [afr-self-heal-metadata.c:41:__afr_selfheal_metadata_do] 0-maintain4-replicate-3: performing metadata selfheal on 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.059594] I [afr-self-heal-metadata.c:41:__afr_selfheal_metadata_do] 0-maintain4-replicate-4: performing metadata selfheal on 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.059783] I [afr-self-heal-metadata.c:41:__afr_selfheal_metadata_do] 0-maintain4-replicate-5: performing metadata selfheal on 4f8abc71-b771-4fc3-b7fa-42ef0ea09dc5<br>
> [2014-09-25 05:45:26.112882] I [dht-layout.c:663:dht_layout_normalize] 0-maintain4-dht: Found anomalies in /StorageReport (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0<br>
> [2014-09-25 05:45:26.112903] I [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 0-maintain4-dht: chunk size = 0xffffffff / 188817468 = 0x16<br>
> [2014-09-25 05:45:26.112915] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-maintain4-dht: assigning range size 0x294420dc to maintain4-replicate-4<br>
> [2014-09-25 05:45:26.112928] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-maintain4-dht: assigning range size 0x294420dc to maintain4-replicate-5<br>
> [2014-09-25 05:45:26.112937] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-maintain4-dht: assigning range size 0x294420dc to maintain4-replicate-0<br>
> [2014-09-25 05:45:26.112945] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-maintain4-dht: assigning range size 0x294420dc to maintain4-replicate-1<br>
> [2014-09-25 05:45:26.112952] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-maintain4-dht: assigning range size 0x294420dc to maintain4-replicate-2<br>
> [2014-09-25 05:45:26.112960] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-maintain4-dht: assigning range size 0x294420dc to maintain4-replicate-3<br>
> The message "I [MSGID: 109018] [dht-common.c:715:dht_revalidate_cbk] 0-maintain4-dht: Mismatching layouts for /StorageReport, gfid = 00000000-0000-0000-0000-000000000000" repeated 5 times between [2014-09-25 05:45:26.022305] and [2014-09-25 05:45:26.023456]<br>
> [2014-09-25 05:45:26.130803] I [MSGID: 109036] [dht-common.c:6221:dht_log_new_layout_for_dir_selfheal] 0-maintain4-dht: Setting layout of /StorageReport with [Subvol_name: maintain4-replicate-0, Err: -1 , Start: 1384661432 , Stop: 2076992147 ], [Subvol_name: maintain4-replicate-1, Err: -1 , Start: 2076992148 , Stop: 2769322863 ], [Subvol_name: maintain4-replicate-2, Err: -1 , Start: 2769322864 , Stop: 3461653579 ], [Subvol_name: maintain4-replicate-3, Err: -1 , Start: 3461653580 , Stop: 4294967295 ], [Subvol_name: maintain4-replicate-4, Err: -1 , Start: 0 , Stop: 692330715 ], [Subvol_name: maintain4-replicate-5, Err: -1 , Start: 692330716 , Stop: 1384661431 ],<br>
><br>
> What can I do to fix this problem?<br>
<br>
</div></div>Interesting. The guys will definitely want to look at this next week. :)<br>
<br>
Regards and best wishes,<br>
<br>
Justin Clift<br>
<br>
--<br>
GlusterFS - <a href="http://www.gluster.org" target="_blank">http://www.gluster.org</a><br>
<br>
An open source, distributed file system scaling to several<br>
petabytes, and handling thousands of clients.<br>
<br>
My personal twitter: <a href="http://twitter.com/realjustinclift" target="_blank">twitter.com/realjustinclift</a><br>
<br>
</blockquote></div><br></div>
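For anyone trying to reproduce this, the steps from the first message can be condensed into a small shell sketch. The server address, volume name, and mount point are the ones reported above; the `repro` helper function is hypothetical, added here only for illustration.

```shell
#!/bin/sh
# Step 1 (one terminal): mount the volume in the foreground with debug
# logging, exactly as in the report:
#   /usr/sbin/glusterfs --volfile-server=10.10.10.10 --volfile-id=myvol /mnt/myvol --debug

# Step 2 (another terminal): create, write, and read back a test file
# on the given mount point.
repro() {
    echo "hello,gluster3.3" > "$1/file"   # create + write; reportedly no shell error
    ls "$1/file"                          # the file is visible after the write...
    cat "$1/file"                         # ...but on the broken mount it reads back empty
}

# Example usage against the reported mount point:
#   repro /mnt/myvol
```

On a healthy mount (or any ordinary directory) the final `cat` prints the string back; the report is that on the 3.6-beta1 client against the 3.3 server it prints nothing.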