<br><br><div class="gmail_quote">On Tue, Jan 15, 2013 at 4:29 AM, Raghavendra Gowdappa <span dir="ltr"><<a href="mailto:rgowdapp@redhat.com" target="_blank">rgowdapp@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5"><br>
<br>
> ----- Original Message -----
> > From: "Anand Avati" <aavati@redhat.com>
> > To: "Amar Tumballi" <atumball@redhat.com>
> > Cc: bharata@linux.vnet.ibm.com, gluster-devel@nongnu.org, "Raghavendra Gowdappa" <rgowdapp@redhat.com>
> > Sent: Thursday, January 10, 2013 12:20:09 PM
> > Subject: Re: [Gluster-devel] zero-copy readv
> >
> > On 01/09/2013 10:37 PM, Amar Tumballi wrote:
> > >
> > >>
> > >> - On the read side, things are a little more complicated. In
> > >> rpc-transport/socket, there is a call to iobuf_get() to create a
> > >> new iobuf for reading in the readv reply data from the server. We
> > >> will need a framework change where, if the readv request (of the
> > >> xid for which the readv reply is being handled) happened to be a
> > >> "direct" variant (i.e., zero-copy), then the "special iobuf around
> > >> the user's memory" gets picked up and the read() from the socket
> > >> is performed directly into the user's memory. Similar, equivalent
> > >> changes will have to be made in RDMA (Raghavendra, on CC, can
> > >> help). Since the goal is to avoid a memory copy, this data will
> > >> bypass io-cache (purging pre-cached data for those regions along
> > >> the way).
> > >>
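
To make that concrete, here is a minimal sketch of the dispatch I have in
mind for the socket transport (all names below are made up for
illustration, and partial reads/error handling are glossed over):

#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical bookkeeping: what the transport would remember about each
 * outstanding request. */
struct pending_req {
        uint32_t      xid;
        struct iovec *rsp_vector;  /* user-supplied buffers, if "direct" */
        int           rsp_count;   /* entries in rsp_vector */
};

/* Assumed helper: find the outstanding request for this xid. */
extern struct pending_req *lookup_pending_req (uint32_t xid);

/* Read `len' bytes of readv-reply payload for `xid' from `sock'.  If the
 * original request was the zero-copy ("direct") variant, scatter the data
 * straight into the user's memory; otherwise fall back to a transport-
 * owned buffer (the iobuf_get() path in today's code). */
static ssize_t
read_reply_payload (int sock, uint32_t xid, size_t len, void *fallback_buf)
{
        struct pending_req *req = lookup_pending_req (xid);

        if (req && req->rsp_vector)
                return readv (sock, req->rsp_vector, req->rsp_count);

        return read (sock, fallback_buf, len);
}
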
> > >
> > > On the read side too, our client protocol is already designed to
> > > handle 0-copy, i.e., if the fop comes with an iobuf/iobref, then
> > > that same buffer is used to receive the data from the network
> > > (client_submit_request() is designed to handle this). [1]
> > >
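
For reference, the shape of that hand-off is roughly the following
(heavily simplified; the declarations here are stand-ins, not the actual
client_submit_request() signature):

#include <stddef.h>
#include <sys/uio.h>

/* Opaque stand-ins for the real iobuf/iobref types. */
struct iobuf;
struct iobref;

extern struct iobref *iobref_new (void);
extern int            iobref_add (struct iobref *iobref, struct iobuf *iob);
extern void          *iobuf_ptr (struct iobuf *iob);

/* Hypothetical, simplified stand-in for client_submit_request(): the
 * caller passes the vector into which the reply payload must be
 * received. */
extern int submit_request (void *req, struct iovec *rsp_payload,
                           int rsp_count, struct iobref *rsp_iobref);

static int
readv_zero_copy (void *req, struct iobuf *caller_iob, size_t size)
{
        /* Point the response vector at the caller-supplied buffer, so the
         * transport lands the read data there instead of in a fresh
         * iobuf. */
        struct iovec rsp = {
                .iov_base = iobuf_ptr (caller_iob),
                .iov_len  = size,
        };
        struct iobref *iobref = iobref_new ();

        iobref_add (iobref, caller_iob);

        return submit_request (req, &rsp, 1, iobref);
}
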
> > > We made all these changes to make RDMA 0-copy a possibility, so
> > > even the RDMA transport should already be 0-copy friendly.
> > >
> > > That's my understanding.
> > >
> > > Regards,
> > > Amar
> > >
> > > [1] - Recent patches to handle RPC read-ahead may involve a small
> > > data copy from the header into the data buffer, but nothing
> > > significant.
> > >
> >
> > Amar - note that the current infrastructure present for 0-copy RDMA
> > might not be sufficient for GFAPI's 0-copy. A glfs_readv() request
> > from the app can come as a vector of memory pointers (and not a
> > contiguous iobuf), and therefore requires storing an iovec/count as
> > well. This might also mean we need to exercise the scatter-gather
> > aspects of the verbs API.
>
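
Concretely, exercising scatter-gather on the verbs side would look
something like this (illustration only, assuming every buffer in the
app's vector has already been registered with ibv_reg_mr() and its MR is
available in mrs[]):

#include <infiniband/verbs.h>
#include <stdint.h>
#include <sys/uio.h>

/* Build a scatter-gather list from the application's iovec so that a
 * single work request lands data across all of the user's buffers.
 * `count' must not exceed the QP's max_recv_sge. */
static int
post_recv_scattered (struct ibv_qp *qp, const struct iovec *iov, int count,
                     struct ibv_mr **mrs, uint64_t wr_id)
{
        struct ibv_sge     sge[count];
        struct ibv_recv_wr wr, *bad = NULL;
        int                i;

        for (i = 0; i < count; i++) {
                sge[i].addr   = (uint64_t) (uintptr_t) iov[i].iov_base;
                sge[i].length = iov[i].iov_len;
                sge[i].lkey   = mrs[i]->lkey;
        }

        wr.wr_id   = wr_id;
        wr.next    = NULL;
        wr.sg_list = sge;
        wr.num_sge = count;

        return ibv_post_recv (qp, &wr, &bad);
}
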
> If we pass the user-supplied vectors as write chunks to the server, it
> will do RDMA writes to the memory regions pointed to by those vectors.
> So, I think no major changes are required in rdma either.

I wasn't sure if the client-side interface between protocol/client and
rpc-transport/rdma was doing everything right, even though the rdma
transport itself had the capability. I guess that is probably what you
meant by "If we pass user supplied vectors..".
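
Something along those lines, presumably - register each user vector and
advertise it to the server as a write chunk carried in the request
(sketch only; the struct and field names below are made up, not our wire
format):

#include <infiniband/verbs.h>
#include <stdint.h>
#include <sys/uio.h>

/* A write chunk as the server needs to see it: where it may RDMA-write
 * the readv reply data. */
struct write_chunk {
        uint64_t addr;    /* start of the user's buffer   */
        uint32_t rkey;    /* remote key from registration */
        uint32_t length;  /* bytes that may be written    */
};

/* Register each user-supplied vector and describe it as a write chunk,
 * so the server can RDMA-write the data directly into the application's
 * memory. */
static int
vectors_to_write_chunks (struct ibv_pd *pd, const struct iovec *iov,
                         int count, struct write_chunk *chunks,
                         struct ibv_mr **mrs)
{
        int i;

        for (i = 0; i < count; i++) {
                mrs[i] = ibv_reg_mr (pd, iov[i].iov_base, iov[i].iov_len,
                                     IBV_ACCESS_LOCAL_WRITE |
                                     IBV_ACCESS_REMOTE_WRITE);
                if (!mrs[i])
                        return -1;

                chunks[i].addr   = (uint64_t) (uintptr_t) iov[i].iov_base;
                chunks[i].rkey   = mrs[i]->rkey;
                chunks[i].length = iov[i].iov_len;
        }

        return 0;
}
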

Avati