<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<div class="moz-cite-prefix">On 06/06/2014 12:11 AM, Anand Avati
wrote:<br>
</div>
<blockquote
cite="mid:CAFboF2xdvBB+1xZnEwzc=vFWiG2gC5jB+Mji+_xPkXedUkhSYg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra">On Thu, Jun 5, 2014 at 10:46 AM,
Krutika Dhananjay <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:kdhananj@redhat.com"
target="_blank">kdhananj@redhat.com</a>></span> wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>
<div style="font-size:12pt">
<blockquote
style="border-left-width:2px;border-left-style:solid;border-left-color:rgb(16,16,255);margin-left:5px;padding-left:5px;font-weight:normal;font-style:normal;text-decoration:none;font-size:12pt">
<div class="h5"><font color="#000000"
face="Helvetica, Arial, sans-serif">To
summarize, the real "problems" are:</font><br>
<div
style="color:rgb(0,0,0);font-family:Helvetica,Arial,sans-serif"><br>
</div>
<font color="#000000" face="Helvetica, Arial,
sans-serif">- Deref of pl_inode->refkeeper as
an inode_t in the cleanup logger. It </font><br>
<font color="#000000" face="Helvetica, Arial,
sans-serif">is an internal artifact of
pl_update_refkeeper() working and nobody else </font><br>
<font color="#000000" face="Helvetica, Arial,
sans-serif">is supposed to "touch" it.</font></div>
</blockquote>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">For this, the solution I have in
mind is to</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
a. have a placeholder for gfid in pl_inode_t object,</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">b. remember the gfid of the inode
at the time of pl_inode_t creation in pl_ctx_get(),
and</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">c. print pl_inode->gfid in the
log message in pl_{inodelk,entrylk}_log_cleanup().<br>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div style="">
OK.</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>
<div style="font-size:12pt">
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
</div>
<div class=""
style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
<blockquote
style="border-left-width:2px;border-left-style:solid;border-left-color:rgb(16,16,255);margin-left:5px;padding-left:5px;color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt">
<div><span style="font-size:12pt">- Missing call
to pl_update_refkeeper() in
client_disconnect_cbk(). This </span><br>
</div>
can result in a potential leak of inode ref (not a
leak if the same <br>
inode gets any locking activity by another client
in the future).<br>
</blockquote>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">As far as updates to refkeeper is
concerned during DISCONNECT is concerned, Pranith
and I did have discussions at length but could<br>
not find a single safe place in cleanup functions to
call pl_update_refkeeper() that does not lead to
illegal memory access,</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">whether before or after the call
to __pl_inodelk_unref() in the third for loop.</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">Case #1 explained:<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
=============<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><snip><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
list_for_each_entry_safe() {<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">...<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">...<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">pl_update_refkeeper(inode);<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">pthread_mutex_lock
(&pl_inode->mutex);<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"> __pl_inodelk_unref(l);<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">pthread_mutex_unlock
(&pl_inode->mutex);<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"></snip><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
This causes the last unref in pl_update_refkeeper()
to implicitly call pl_forget() where pl_inode_t
object is destroyed while it is<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
still needed in terms of having to obtain
pl_inode->mutex before unrefing the lock object.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
Case 2 explained:<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">=============<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
<div><snip><br>
</div>
<div>list_for_each_entry_safe() {<br>
</div>
<div>...<br>
</div>
<div>...<br>
</div>
<div>pthread_mutex_lock (&pl_inode->mutex);<br>
</div>
<div> __pl_inodelk_unref(l);<br>
</div>
<div>pthread_mutex_unlock (&pl_inode->mutex);<br>
</div>
<div>pl_update_refkeeper(inode);</div>
</snip></div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
After the first for loop is processed, the domain's
lists could have been emptied. And at this point,
there could well come a second thread that could
update refkeeper,<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
triggering last unref on the inode leading to a call
to pl_forget() (where pl_inode_t is freed), after
which the DISCONNECT thread would be trying to
perform illegal<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
memory access in its call to pl_update_refkeeper()
during its turn.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
So the solution Pranith and I came up with involves
making sure that the inode_t object is alive for as
long as there is atleast one lock on it:<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">1. refkeeper will not be used for
inodelks and entrylks (but will continue to be used
in fcntl locks).<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
2. Each pl_inode_t object will also host an inode_t
member which is initialised during a call to
pl_inode_get() in pl_common_{inodelk, entrylks}().<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
3. Everytime a lock is granted/blocked,
pl_inode->inode is ref'd (in pl_inode_setlk()
after successful locking.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">4. Everytime a lock is unlocked,
pl_inode->inode is unref'd.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">5. In disconnect codepath, as
part of "released" list processing, the inodes
associated with the client are unref'd in the same
loop right after every lock object is unref'd.</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">With all this, there is one
problem with the lock object that is found to be in
both blocked and granted list in the transient
state, when there's a race between the unlocking
thread<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">and the disconnecting thread.
This can be best explained with the following
example:<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
<example><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">Consider an inode I on which
there's a granted lock G and a blocked lock B, from
clients C1 and C2 respectively. With this approach,
at this point the number of refs taken<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">by the locks xlator on I would be
2.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">C1 unlocks B, at which point I's
refcount becomes 1.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">Now as part of granting other
blocked locks in unlock codepath, B is put in
granted list.<br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
Now B's client C2 sends a DISCONNECT event, which
would cause I to be unref'd again. This being the
last unref, pl_forget() is called on I causing its
pl_inode to be destroyed</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
At this point, the unlocking thread could try to do
a mutex lock on pl_inode->mutex in
grant_blocked_{inode,entry}_locks() (whereas
pl_inode is now already destroyed) leading to a
crash.</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">
</example><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif"><br>
</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">The problem described in the
example can be fixed by making sure
grant_blocked_{inode,entry}_locks() refs the inode
first thing and then unrefs it just before
returning.</div>
<div style="color:rgb(0,0,0);font-family:garamond,'new
york',times,serif">This would fix illegal memory
access. </div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div style="">This sounds a bit complicated. I think there
is a much simpler solution:</div>
<div style=""><br>
</div>
<div style="">- First, make update_refkeeper() check for
blocked locks (which I mentioned as "optional" previously)</div>
<div style=""><br>
</div>
<div style="">- Make grant_blocked_locks() double up and do
the job of update_refkeeper() internally.</div>
<div style=""><br>
</div>
<div style="">Something which looks like this:</div>
<div style=""><br>
</div>
<div style="">
<div>
<div>diff --git a/xlators/features/locks/src/common.c
b/xlators/features/locks/src/common.c</div>
<div>index f6c71c1..38df385 100644</div>
<div>--- a/xlators/features/locks/src/common.c</div>
<div>+++ b/xlators/features/locks/src/common.c</div>
<div>@@ -126,8 +126,14 @@ __pl_inode_is_empty
(pl_inode_t *pl_inode)</div>
<div> if (!list_empty
(&dom->entrylk_list))</div>
<div> is_empty = 0;</div>
<div> </div>
<div>+ if (!list_empty
(&dom->blocked_entrylks))</div>
<div>+ is_empty = 0;</div>
<div>+</div>
<div> if (!list_empty
(&dom->inodelk_list))</div>
<div> is_empty = 0;</div>
<div>+</div>
<div>+ if (!list_empty
(&dom->blocked_inodelks))</div>
<div>+ is_empty = 0;</div>
<div> }</div>
<div> </div>
<div>
return is_empty;</div>
<div>@@ -944,12 +950,18 @@ grant_blocked_locks (xlator_t
*this, pl_inode_t *pl_inode)</div>
<div> struct list_head granted_list;</div>
<div> posix_lock_t *tmp = NULL;</div>
<div> posix_lock_t *lock = NULL;</div>
<div>+ inode_t *unref = NULL;</div>
<div> </div>
<div> INIT_LIST_HEAD (&granted_list);</div>
<div> </div>
<div> pthread_mutex_lock
(&pl_inode->mutex);</div>
<div> {</div>
<div> __grant_blocked_locks (this,
pl_inode, &granted_list);</div>
<div>+</div>
<div>+ if (__pl_inode_is_empty (pl_inode)
&& pl_inode->refkeeper) {</div>
<div>
+ unref =
pl_inode->refkeeper;</div>
<div>+ pl_inode->refkeeper =
NULL;</div>
<div>+ }</div>
<div> }</div>
<div> pthread_mutex_unlock
(&pl_inode->mutex);</div>
<div> </div>
<div>@@ -965,6 +977,9 @@ grant_blocked_locks (xlator_t
*this, pl_inode_t *pl_inode)</div>
<div> GF_FREE (lock);</div>
<div> }</div>
<div> </div>
<div>+ if (unref)</div>
<div>+ inode_unref (unref);</div>
<div>+</div>
<div> return;</div>
<div> }</div>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<blockquote
cite="mid:CAFboF2xdvBB+1xZnEwzc=vFWiG2gC5jB+Mji+_xPkXedUkhSYg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div style="">
<div><br>
</div>
<div style="">This should make pl_disconnect_cbk() pretty
much race free w.r.t refkpeer. Thoughts?</div>
</div>
</div>
</div>
</div>
</blockquote>
Lets say C1 is doing pl_inodelk_client_cleanup. After the second
for-loop(All granted and blocked locks are out of the domain) if an
unlock on the final granted lock on that inode from client C2
completes, refkeeper would be set to NULL and unrefed leading to
zero refs on that inode i.e. pl_forget will also happen. In 3rd
for-loop pl_inode is already freed and leads to free'd memory access
and will crash.<br>
<br>
Pranith<br>
<blockquote
cite="mid:CAFboF2xdvBB+1xZnEwzc=vFWiG2gC5jB+Mji+_xPkXedUkhSYg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div style="">
<div style=""><br>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Gluster-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-devel@gluster.org">Gluster-devel@gluster.org</a>
<a class="moz-txt-link-freetext" href="http://supercolony.gluster.org/mailman/listinfo/gluster-devel">http://supercolony.gluster.org/mailman/listinfo/gluster-devel</a>
</pre>
</blockquote>
<br>
</body>
</html>