Brian,<br>  You are right: today we hardly leverage the kernel page cache. When Gluster started and the performance translators were implemented, fuse invalidation support did not exist, and since that support landed in upstream fuse we haven&#39;t used it effectively. We could do a lot of smarter things with the invalidation changes.<br>

<br>As for the consistency concern where an open fd continues to refer to the local page cache: if that is a problem, today you need to mount with --enable-direct-io-mode to bypass the page cache altogether (this is very different from O_DIRECT open() support). On the other hand, to use the fuse invalidation APIs, promote use of the page cache, and still stay consistent, we need to gear up the glusterfs framework in three steps: first, implement server-originated messaging support; second, build some kind of opportunistic locking or leases to notify glusterfs clients about modifications from a second client; and third, implement hooks in the client-side listener to do things like send fuse invalidations, purge pages in io-cache, or flush pending writes in write-behind. This needs to happen, but we&#39;re short on resources to prioritize it sooner :-)<br>
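<br>To make the three steps concrete, here is a rough toy model in Python. It is only a sketch: LeaseServer, ClientListener and the hook names are hypothetical and do not correspond to any existing glusterfs code.<br>

```python
class ClientListener:
    """Hypothetical client-side listener for server-originated messages."""

    def __init__(self):
        self.hooks = []

    def register(self, hook):
        # Step 3: hooks that react to a remote modification.
        self.hooks.append(hook)

    def on_modified(self, path):
        for hook in self.hooks:
            hook(path)


class LeaseServer:
    """Hypothetical server that remembers which clients hold a lease."""

    def __init__(self):
        self.leases = {}  # path -> list of listeners holding a lease

    def grant_lease(self, path, listener):
        # Step 2: record interest so the client can be notified later.
        self.leases.setdefault(path, []).append(listener)

    def write(self, path, writer):
        # Step 1: server-originated message to every other lease holder.
        for listener in self.leases.get(path, []):
            if listener is not writer:
                listener.on_modified(path)


log = []
a, b = ClientListener(), ClientListener()
a.register(lambda p: log.append(("fuse-invalidate", p)))
a.register(lambda p: log.append(("io-cache-purge", p)))

server = LeaseServer()
server.grant_lease("/vol/f", a)
server.grant_lease("/vol/f", b)
server.write("/vol/f", writer=b)  # client b modifies the file
```

After the write from client b, only client a&#39;s hooks have run, so log holds one fuse-invalidate entry and one io-cache-purge entry for /vol/f.<br>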

<br>Avati<br><br><div class="gmail_quote">On Wed, May 30, 2012 at 8:16 AM, Brian Foster <span dir="ltr">&lt;<a href="mailto:bfoster@redhat.com" target="_blank">bfoster@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi all,<br>

<br>

I&#39;ve been playing with a little hack recently to add a gluster mount<br>

option to support FOPEN_KEEP_CACHE and I wanted to solicit some thoughts<br>

on whether there&#39;s value in finding an intelligent way to support this<br>

functionality. To provide some context:<br>

<br>

Our current behavior with regard to fuse is that page cache is utilized<br>

by fuse, from what I can tell, just about in the same manner as a<br>

typical local fs. The primary difference is that by default, the address<br>

space mapping for an inode is completely invalidated on open. So for<br>

example, if process A opens and reads a file in a loop, subsequent reads<br>

are served from cache (bypassing fuse and gluster). If process B steps<br>

in and opens the same file, the cache is flushed and the next reads from<br>

either process are passed down through fuse. The FOPEN_KEEP_CACHE option<br>

simply disables this cache-flush-on-open behavior.<br>
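<br>
The process A / process B scenario above can be illustrated with a small toy model in Python. This is a sketch only: PageCacheModel and count_backend_reads are made-up names, and the keep_cache argument merely stands in for FOPEN_KEEP_CACHE; none of this is fuse or gluster code.<br>

```python
class PageCacheModel:
    """Toy per-inode page cache sitting in front of a backing store."""

    def __init__(self, backing, keep_cache=False):
        self.backing = backing        # dict: path -> bytes (the "volume")
        self.keep_cache = keep_cache  # stands in for FOPEN_KEEP_CACHE
        self.cached = {}              # path -> bytes currently cached
        self.backend_reads = 0        # reads that had to go through fuse

    def open(self, path):
        # Default behavior: an open drops all cached pages for the inode.
        if not self.keep_cache:
            self.cached.pop(path, None)

    def read(self, path):
        if path not in self.cached:
            self.backend_reads += 1
            self.cached[path] = self.backing[path]
        return self.cached[path]


def count_backend_reads(keep_cache):
    cache = PageCacheModel({"f": b"data"}, keep_cache=keep_cache)
    cache.open("f")   # process A opens the file
    cache.read("f")   # miss: passed down through fuse
    cache.read("f")   # hit: served from cache
    cache.open("f")   # process B opens the same file
    cache.read("f")   # miss by default, hit when the cache is kept
    return cache.backend_reads
```

In this model, the default path goes to the backend twice, while the keep-cache path goes to the backend only once.<br>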

<br>

The following are some notes on my experimentation thus far:<br>

<br>

- With FOPEN_KEEP_CACHE, fuse currently only invalidates on file size<br>

changes. This is a problem in that I can rewrite some or all of a file<br>

from another client and the cached client wouldn&#39;t notice. I&#39;ve sent a<br>

patch to fuse-devel to also invalidate on mtime changes (similar to<br>

nfsv3 or cifs), so we&#39;ll see how well that is received. fuse also<br>

supports a range based invalidation notification that we could take<br>

advantage of if necessary.<br>

<br>

- I can reproduce a measurable performance benefit in the local/cached read<br>

situation. For example, running a kernel compile against a source tree<br>

in a gluster volume (no other xlators and build output to local storage)<br>

improves to 6 minutes from just under 8 minutes with the default graph<br>

(9.5 minutes with only the client xlator and 1:09 locally).<br>

<br>

- Some of the specific differences from current io-cache caching:<br>

        - io-cache supports time based invalidation and tunables such as cache<br>

size and priority. The page cache has no such controls.<br>

        - io-cache invalidates more frequently on various fops. It also looks<br>

like we invalidate on writes and don&#39;t take advantage of the write data<br>

most recently sent, whereas page cache writes are cached (errors<br>

notwithstanding).<br>

        - Page cache obviously has tighter integration with the system (e.g.,<br>

drop_caches controls, more specific reporting, ability to drop cache<br>

when memory is needed).<br>
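<br>
To illustrate the size-only invalidation problem from the first note, here is a minimal Python sketch of the check. Attr and cache_is_valid are hypothetical names for illustration; this is not the fuse kernel logic or the patch itself.<br>

```python
from dataclasses import dataclass


@dataclass
class Attr:
    size: int
    mtime: int


def cache_is_valid(cached, current, check_mtime):
    """Return True if cached pages may still be served."""
    if cached.size != current.size:
        return False  # a size change always invalidates
    if check_mtime and cached.mtime != current.mtime:
        return False  # mtime check, in the spirit of nfsv3 or cifs
    return True


before = Attr(size=4096, mtime=100)
after_rewrite = Attr(size=4096, mtime=200)  # rewritten in place elsewhere
after_append = Attr(size=8192, mtime=300)   # grown by another client

stale_with_size_only = cache_is_valid(before, after_rewrite, check_mtime=False)
stale_with_mtime = cache_is_valid(before, after_rewrite, check_mtime=True)
```

With the size-only check the in-place rewrite goes unnoticed (the cache is still considered valid), whereas the mtime check catches it. An append is caught either way because the size changes.<br>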

<br>

All in all, I&#39;m curious what people think about enabling the cache<br>

behavior in gluster. We could support anything from the basic mount<br>

option I&#39;m currently using (i.e., similar to attribute/dentry caching)<br>

to something integrated with io-cache (doing invalidations when<br>

necessary), or maybe even something eventually along the lines of the<br>

nfs weak cache consistency model where it validates the cache after<br>

every fop based on file attributes.<br>

<br>

In general, are there other big issues/questions that would need to be<br>

explored before this is useful (e.g., the size invalidation issue)? Are<br>

there other performance tests that should be explored? Thoughts<br>

appreciated. Thanks.<br>

<br>

Brian<br>

<br>


</blockquote></div><br>