<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Also, in the logfiles on the clients, it looks like I get these types of messages whenever I try to access a file that is no longer accessible.<div><br></div><div><div>2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068570: /hourlogs/myDir0/1243432800.log =&gt; -1 (5)</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068579: /hourlogs/myDir1/1243400400.log =&gt; -1 (116)</div><div>2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir1/1243400400.log: entry_count is 3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir1/1243400400.log: found on afr3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir1/1243400400.log: found on afr2</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir1/1243400400.log: found on afr-ns</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068580: /hourlogs/myDir1/1243400400.log =&gt; -1 (5)</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068583: /hourlogs/myDir2/1243411200.log =&gt; -1 (116)</div><div>2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir2/1243411200.log: entry_count is 3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir2/1243411200.log: found on afr1</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir2/1243411200.log: found on afr3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir2/1243411200.log: found on afr-ns</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068584: /hourlogs/myDir2/1243411200.log =&gt; -1 (5)</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068599: /hourlogs/myDir3/1243472400.log =&gt; -1 
(116)</div><div>2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir3/1243472400.log: entry_count is 3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir3/1243472400.log: found on afr1</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir3/1243472400.log: found on afr3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir3/1243472400.log: found on afr-ns</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068600: /hourlogs/myDir3/1243472400.log =&gt; -1 (5)</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068603: /hourlogs/myDir4/1243404000.log =&gt; -1 (116)</div><div>2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir4/1243404000.log: entry_count is 3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir4/1243404000.log: found on afr1</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir4/1243404000.log: found on afr-ns</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir4/1243404000.log: found on afr3</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068604: /hourlogs/myDir5/1243404000.log =&gt; -1 (5)</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068619: /hourlogs/myDir5/1243447200.log =&gt; -1 (116)</div><div>2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir5/1243447200.log: entry_count is 4</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr1</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr3</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr2</div><div>2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: 
found on afr-ns</div><div>2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068620: /hourlogs/myDir5/1243447200.log =&gt; -1 (5)</div><div><br></div><div><div>On Jun 11, 2009, at 10:33 AM, Elbert Lai wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">elbert@host1:~$ dpkg -l|grep glusterfs<div>ii &nbsp;glusterfs-client &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1.3.8-0pre2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;GlusterFS fuse client</div><div>ii &nbsp;glusterfs-server &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1.3.8-0pre2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;GlusterFS fuse server</div><div>ii &nbsp;libglusterfs0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1.3.8-0pre2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;GlusterFS libraries and translator modules</div><div><br></div><div>I have 2 hosts set up to use AFR with the package versions listed above. I have been experiencing an issue where a file that is copied to glusterfs is readable/writable for a while, then at some point in time it ceases to be. Trying to access it only returns the error message, "cannot open `filename' for reading: Input/output error".</div><div><br></div><div>Files enter glusterfs either via the "cp" command from a client or via "rsync". In the case of cp, the clients are all local and copying across a very fast connection. In the case of rsync, the one client is itself a gluster client. 
We are testing out a later version of gluster, and it rsyncs across a VPN.</div><div><br></div><div><div>elbert@host2:~$ dpkg -l|grep glusterfs</div><div>ii &nbsp;glusterfs-client &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.0.1-1 &nbsp; &nbsp; clustered file-system</div><div>ii &nbsp;glusterfs-server &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.0.1-1 &nbsp; &nbsp; clustered file-system</div><div>ii &nbsp;libglusterfs0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2.0.1-1 &nbsp; &nbsp; GlusterFS libraries and translator modules</div><div>ii &nbsp;libglusterfsclient0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 2.0.1-1 &nbsp; &nbsp; GlusterFS client library</div><div><br></div><div>=========</div><div>What causes files to become inaccessible? I read that fstat() had a bug in version 1.3.x whereas stat() did not, and that it was being worked on. Could this be related?</div><div><br></div><div>When a file becomes inaccessible, I have been manually removing the file from the mount point, then copying it back in via scp. After that, the file is accessible again. 
Below I've pasted a sample of what I'm seeing.</div><div><br></div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><div><a href="mailto:elbert@tool3.sc9.admob.com">elbert@tool3</a>.:hourlogs$ cd myDir</div><div>ls 1244682000.log</div><div><a href="mailto:elbert@tool3.sc9.admob.com">elbert@tool3</a>.:myDir$ ls 1244682000.log</div><div>1244682000.log</div><div><a href="mailto:elbert@tool3.sc9.admob.com">elbert@tool3</a>.:myDir$ stat 1244682000.log</div><div>&nbsp;&nbsp;File: `1244682000.log'</div><div>&nbsp;&nbsp;Size: 40265114 &nbsp;<span class="Apple-tab-span" style="white-space: pre; ">        </span>Blocks: 78744 &nbsp; &nbsp; &nbsp;IO Block: 4096 &nbsp; regular file</div><div>Device: 15h/21d<span class="Apple-tab-span" style="white-space: pre; ">        </span>Inode: 42205749 &nbsp; &nbsp;Links: 1</div><div>Access: (0755/-rwxr-xr-x) &nbsp;Uid: ( 1003/ &nbsp; elbert) &nbsp; Gid: ( 6000/ &nbsp; &nbsp; ops)</div><div>Access: 2009-06-11 02:25:10.000000000 +0000</div><div>Modify: 2009-06-11 02:26:02.000000000 +0000</div><div>Change: 2009-06-11 02:26:02.000000000 +0000</div><div><a href="mailto:elbert@tool3.sc9.admob.com">elbert@tool3</a>.:myDir$ tail 1244682000.log</div><div>tail: cannot open `1244682000.log' for reading: Input/output error</div></div></div></div></blockquote><br></div><div>At this point, I am able to rm the file. Then, if I scp it back in, I am able to successfully tail it.</div><div><br></div><div>So,</div><div><br></div><div>I have observed cases where the files had a Size of 0, and otherwise they were in the same state. I'm not totally certain, but it looks like if a file gets into this state from rsync, either it gets deposited in this state immediately (before I try to read it), or else it quickly enters this state. 
Speaking generally, file sizes tend to be several MB up to 150 MB.</div><div><br></div><div>Here's my server config:</div><div><div># Gluster Server configuration /etc/glusterfs/glusterfs-server.vol</div><div># Configured for AFR &amp; Unify features</div><div><br></div><div>volume brick</div><div>&nbsp;type storage/posix&nbsp;</div><div>&nbsp;option directory /var/gluster/data/</div><div>end-volume</div><div><br></div><div>volume brick-ns</div><div>&nbsp;type storage/posix</div><div>&nbsp;option directory /var/gluster/ns/</div><div>end-volume</div><div><br></div><div>volume server</div><div>&nbsp;type protocol/server</div><div>&nbsp;option transport-type tcp/server</div><div>&nbsp;subvolumes brick brick-ns</div><div>&nbsp;option auth.ip.brick.allow 165.193.245.*,10.11.*&nbsp;</div><div>&nbsp;option auth.ip.brick-ns.allow 165.193.245.*,10.11.*</div><div>end-volume</div><div><br></div><div>Here's my client config:</div></div><div><div># Gluster Client configuration /etc/glusterfs/glusterfs-client.vol</div><div># Configured for AFR &amp; Unify features</div><div><br></div><div>volume brick1</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client &nbsp; &nbsp; # for TCP/IP transport</div><div>&nbsp;option remote-host 10.11.16.68 &nbsp; &nbsp;# IP address of the remote brick</div><div>&nbsp;option remote-subvolume brick &nbsp; &nbsp; &nbsp; &nbsp;# name of the remote volume</div><div>end-volume</div><div><br></div><div>volume brick2</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.71</div><div>&nbsp;option remote-subvolume brick</div><div>end-volume</div><div><br></div><div>volume brick3</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.69</div><div>&nbsp;option remote-subvolume brick</div><div>end-volume</div><div><br></div><div>volume brick4</div><div>&nbsp;type 
protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.70</div><div>&nbsp;option remote-subvolume brick</div><div>end-volume</div><div><br></div><div>volume brick5</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.119</div><div>&nbsp;option remote-subvolume brick</div><div>end-volume</div><div><br></div><div>volume brick6</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.120</div><div>&nbsp;option remote-subvolume brick</div><div>end-volume</div><div><br></div><div>volume brick-ns1</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.68</div><div>&nbsp;option remote-subvolume brick-ns &nbsp;# Note the different remote volume name.</div><div>end-volume</div><div><br></div><div>volume brick-ns2</div><div>&nbsp;type protocol/client</div><div>&nbsp;option transport-type tcp/client</div><div>&nbsp;option remote-host 10.11.16.71</div><div>&nbsp;option remote-subvolume brick-ns &nbsp;# Note the different remote volume name.</div><div>end-volume</div><div><br></div><div>volume afr1</div><div>&nbsp;type cluster/afr</div><div>&nbsp;subvolumes brick1 brick2</div><div>end-volume</div><div><br></div><div>volume afr2</div><div>&nbsp;type cluster/afr</div><div>&nbsp;subvolumes brick3 brick4</div><div>end-volume</div><div><br></div><div>volume afr3</div><div>&nbsp;type cluster/afr</div><div>&nbsp;subvolumes brick5 brick6</div><div>end-volume</div><div><br></div><div>volume afr-ns</div><div>&nbsp;type cluster/afr</div><div>&nbsp;subvolumes brick-ns1 brick-ns2</div><div>end-volume</div><div><br></div><div>volume unify</div><div>&nbsp;type cluster/unify</div><div>&nbsp;subvolumes afr1 afr2 afr3&nbsp;</div><div>&nbsp;option namespace afr-ns</div><div><br></div><div>&nbsp;# use the ALU 
scheduler</div><div>&nbsp;option scheduler alu &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# This option makes brick5 read-only, so no new files are created on it.</div><div>&nbsp;##option alu.read-only-subvolumes brick5##&nbsp;</div><div><br></div><div>&nbsp;# Don't create files on a volume with less than 10% free disk space</div><div>&nbsp;option alu.limits.min-free-disk &nbsp;10% &nbsp; &nbsp; &nbsp;</div><div><br></div><div>&nbsp;# Don't create files on a volume with more than 10000 files open</div><div>&nbsp;option alu.limits.max-open-files 10000 &nbsp;&nbsp;</div><div>&nbsp;&nbsp;</div><div>&nbsp;# When deciding where to place a file, first look at the disk-usage, then at &nbsp;</div><div>&nbsp;# read-usage, write-usage, open files, and finally the disk-speed-usage.</div><div>&nbsp;option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage</div><div><br></div><div>&nbsp;# Kick in if the discrepancy in disk-usage between volumes is more than 2GB</div><div>&nbsp;option alu.disk-usage.entry-threshold 2GB &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# Don't stop writing to the least-used volume until the discrepancy is 1988MB&nbsp;</div><div>&nbsp;option alu.disk-usage.exit-threshold &nbsp;60MB &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# Kick in if the discrepancy in open files is 1024</div><div>&nbsp;option alu.open-files-usage.entry-threshold 1024 &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# Don't stop until 992 files have been written to the least-used volume</div><div>&nbsp;option alu.open-files-usage.exit-threshold 32 &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# Kick in when the read-usage discrepancy is 20%</div><div>&nbsp;option alu.read-usage.entry-threshold 20% &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# Don't stop until the discrepancy has been reduced to 16% (20% - 4%)</div><div>&nbsp;option alu.read-usage.exit-threshold 4%</div><div><br></div><div>&nbsp;# Kick in when the write-usage discrepancy is 
20%</div><div>&nbsp;option alu.write-usage.entry-threshold 20%</div><div><br></div><div>## Don't stop until the discrepancy has been reduced to 16%</div><div>&nbsp;option alu.write-usage.exit-threshold 4% &nbsp;&nbsp;</div><div><br></div><div>&nbsp;# Refresh the statistics used for decision-making every 10 seconds</div><div>&nbsp;option alu.stat-refresh.interval 10sec &nbsp;&nbsp;</div><div><br></div><div># Refresh the statistics used for decision-making after creating 10 files</div><div># option alu.stat-refresh.num-file-create 10 &nbsp;&nbsp;</div><div>end-volume</div><div><br></div><div><br></div><div>#writebehind improves write performance a lot</div><div>volume writebehind &nbsp;&nbsp;</div><div>&nbsp;&nbsp;type performance/write-behind</div><div>&nbsp;&nbsp;option aggregate-size 131072 # in bytes</div><div>&nbsp;&nbsp;subvolumes unify</div><div>end-volume</div><div><br></div></div><div>Has anyone seen this issue before? Any suggestions?</div><div><br></div><div>Thanks,</div><div>-elb-</div></div></div>_______________________________________________<br>Gluster-users mailing list<br><a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users<br></blockquote></div><br></div></body></html>