<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 06/12/2014 11:13 PM, Anand Avati
wrote:<br>
</div>
<blockquote
cite="mid:CAFboF2x_CxfUrseO4=A=05ocKD-aLtwqKe+KcAh01u=fC78+kQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Thu, Jun 12, 2014 at 10:33 AM,
Vijay Bellur <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:vbellur@redhat.com" target="_blank">vbellur@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb">
<div class="h5">On 06/12/2014 06:52 PM, Ravishankar N
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Vijay,<br>
<br>
Since glusterfs 3.5, posix_lookup() returns ESTALE instead of ENOENT [1]
when a parent gfid (entry) is not present on the brick. In a replicate
setup this causes a problem, because AFR gives ESTALE higher priority
than ENOENT, causing I/O to fail [2]. The fix is in progress at [3]; it
is client-side only and should make it into 3.5.2.<br>
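<br>
To make the failure mode concrete, here is a minimal Python sketch of a
replica folding its per-brick LOOKUP results into a single errno when
ESTALE outranks ENOENT. The ranking table and merge_lookup_errnos() are
hypothetical illustrations of the behavior described above, not AFR's
actual code.<br>

```python
import errno

# Hypothetical ranking: when bricks disagree, the errno with the higher
# rank is reported. It only models "ESTALE outranks ENOENT" as described
# above; it is not AFR's real priority table.
ERRNO_RANK = {
    errno.ENOENT: 1,
    errno.ESTALE: 2,
}


def merge_lookup_errnos(brick_errnos):
    """Fold the per-brick LOOKUP results of a replica into one errno.

    0 means the brick answered successfully, and any success wins.
    Otherwise the highest-ranked errno is returned; errnos missing from
    the table rank lowest.
    """
    if 0 in brick_errnos:
        return 0
    return max(brick_errnos, key=lambda e: ERRNO_RANK.get(e, 0))


# Roughly the mixed-version case: the 3.4 brick has no entry yet (ENOENT),
# while the upgraded 3.5 brick does not know the parent gfid (ESTALE).
print(errno.errorcode[merge_lookup_errnos([errno.ENOENT, errno.ESTALE])])
```

In the mixed 3.4/3.5 case the sketch reports ESTALE, so the caller sees
a hard error instead of the ENOENT that would normally just lead to the
entry being created.<br>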
<br>
But we will still hit the problem when a rolling upgrade is performed
from 3.4 to 3.5, unless the clients are also upgraded to 3.5. For
example:<br>
<br>
0) Create a 1x2 volume using 2 nodes and mount it from a client. All
machines run glusterfs 3.4.<br>
1) Run: for i in {1..30}; do mkdir $i; tar xf glusterfs-3.5git.tar.gz -C $i &amp; done<br>
2) While this is running, kill one of the nodes in the replica pair and
upgrade it to glusterfs 3.5 (simulating a rolling upgrade).<br>
3) After a while, kill all tar processes.<br>
4) Create a directory named 'backup' and move directories 1..30 into it.<br>
5) Start the untar processes from step 1) again.<br>
6) Bring up the upgraded node. Tar fails with ESTALE errors.<br>
<br>
Essentially, the errors occur because [3] is a client-side fix, whereas
a rolling upgrade targets the servers while the older clients still need
to access them without issues.<br>
<br>
One solution is a fix in the posix translator wherein the newer client
passes its version (3.5) to posix_lookup(), which then returns ESTALE
if the version is 3.5 or newer, but ENOENT if it is an older client.
Does this seem okay?<br>
<br>
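A minimal Python sketch of the version check proposed above, assuming the
client's release arrives with each LOOKUP and is decoded as a (major,
minor) tuple. How the version travels (for example as a key in the
request's xdata) and the name lookup_missing_parent_errno() are
hypothetical. The case that matters is a client that sends no version at
all: 3.4 clients predate the check, so a missing version has to mean
ENOENT.<br>

```python
import errno

# Hypothetical encoding: releases are (major, minor) tuples and the client
# release arrives with the LOOKUP request; a client that predates the
# proposal sends nothing, which is modelled here as None.
FIRST_ESTALE_RELEASE = (3, 5)


def lookup_missing_parent_errno(client_version):
    """Errno to return when a LOOKUP's parent gfid is absent on the brick."""
    if client_version is not None and client_version >= FIRST_ESTALE_RELEASE:
        return errno.ESTALE  # 3.5 or newer, per the proposal
    return errno.ENOENT      # no version sent, or an older one: old behavior


for version in [None, (3, 4), (3, 5)]:
    print(version, errno.errorcode[lookup_missing_parent_errno(version)])
```

Under this rule an upgraded brick keeps answering ENOENT to a 3.4
client, so in the scenario above that client's AFR would see ENOENT from
both bricks.<br>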
</blockquote>
<br>
</div>
</div>
I cannot think of a better solution to this. Seamless rolling upgrades
are necessary for us, and the proposed fix seems okay for that reason.<br>
<br>
Thanks,<br>
Vijay
<div class="HOEnZb">
<div class="h5"><br>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>I also like Justin's proposal of having fixes in 3.4.X and
requiring clients to be at least 3.4.X in order to do a rolling
upgrade to 3.5.Y. This way we can add the "special fix" in the
3.4.X client (just like the 3.5.2 client). Ravi's proposal "works",
but all LOOKUPs will carry an extra xattr, and we will be carrying
the compat code burden forward for a very long time, whereas a
3.4.X client fix will remain in the 3.4 branch.</div>
<div><br>
</div>
<div>Thanks</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
I have sent a fix for review (<a class="moz-txt-link-freetext" href="http://review.gluster.org/#/c/8080/">http://review.gluster.org/#/c/8080/</a>).
The change is on the server side only. I reckon that if we are asking
users to upgrade clients to a 3.4.x release, which involves application
downtime anyway, we might as well ask them to upgrade to 3.5. <br>
<br>
The fix is sent only to the 3.5 branch; it does not need to go to master
since, as I understand from Pranith, we only support compatibility
between the two most recent releases (meaning 3.6 servers require
clients to be at least 3.5 and not lower).<br>
<br>
Regards,<br>
Ravi<br>
<br>
</body>
</html>