<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>Hi,<br>
<br>
[snip]</tt><br>
<blockquote cite="mid:4F283184.4050708@tid.es" type="cite">
<br>
However, after reading your mail, I wonder whether the Hadoop plug-in
for gluster implements location-based job scheduling similar to
the one in Hadoop on HDFS. I mean, in Hadoop on HDFS the JT
coordinates with the NN (which knows where every file block is
located within the cluster), so each map task is scheduled on the
TT closest to the input it has to process (ideally,
collocated). In Hadoop on gluster I understand that there is no NN
equivalent, but is there any means by which the JT can know which
nodes in the cluster hold the actual data in their respective backend
filesystems, so that the JT tries to schedule each map task on a TT on
one of these nodes? If not, how does the JT select the TT for
each map task (round-robin, randomly, etc.)?<br>
<br>
My question is probably very basic, but I haven't found a clear and
direct answer in the documentation, sorry...<br>
</blockquote>
<br>
<tt>The JT finds out which part of a file is stored where by calling an
API that the GlusterFS plug-in implements.<br>
<br>
If you look at the plug-in source, it extends the <b>FileSystem* </b>class.
The JT invokes an API that we implement (<b>getFileBlockLocations()**</b>),
and for the given (file, offset, length) we return the required
location info to the JT. This helps it decide which task to schedule
on which TT node. The API queries GlusterFS for the pathinfo extended
attribute (trusted.glusterfs.pathinfo) to get that info.<br>
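<br>
To make the idea concrete, here is a minimal, self-contained sketch of how
per-range host information can drive a locality-aware choice of TT. It is
not the plug-in's or the JT's actual code: the Block class, hostsFor(),
pickTaskTracker() and the node names are all hypothetical, and the real
scheduler is more involved than this two-step preference.<br>
<pre>
```java
// Illustrative sketch only: these types are hypothetical stand-ins, not
// Hadoop's or the plug-in's real classes.
public class LocalitySketch {

    /** One byte range of a file plus the hosts that store it. */
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String[] hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Returns the hosts storing the byte at position pos, or an empty array. */
    static String[] hostsFor(Block[] blocks, long pos) {
        for (Block b : blocks) {
            long end = b.offset + b.length;
            if (pos >= b.offset) {
                if (end > pos) {
                    return b.hosts;
                }
            }
        }
        return new String[0];
    }

    /** Prefers a free tracker that holds the data; otherwise takes any free one. */
    static String pickTaskTracker(String[] freeTrackers, String[] dataHosts) {
        for (String tracker : freeTrackers) {
            for (String host : dataHosts) {
                if (tracker.equals(host)) {
                    return tracker;
                }
            }
        }
        return freeTrackers.length == 0 ? null : freeTrackers[0];
    }

    public static void main(String[] args) {
        Block[] blocks = {
            new Block(0L, 128L, new String[] {"node1"}),
            new Block(128L, 128L, new String[] {"node2", "node3"})
        };
        String[] free = {"node4", "node3"};
        // The split starting at offset 128 is stored on node2 and node3,
        // so the free tracker on node3 is chosen: prints "node3".
        System.out.println(pickTaskTracker(free, hostsFor(blocks, 128L)));
    }
}
```
</pre>
In this sketch a task falls back to any free tracker when no free tracker
holds the data, which is an assumption made for brevity.<br>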
<br>
<br>
* </tt><tt><a class="moz-txt-link-freetext" href="https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L49">https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L49</a></tt><br>
<tt>**
<a class="moz-txt-link-freetext" href="https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L448">https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L448</a><br>
<br>
Thanks,<br>
-Venky</tt><br>
<br>
<blockquote cite="mid:4F283184.4050708@tid.es" type="cite">
<br>
Thanks!<br>
<br>
Best regards,<br>
<br>
------<br>
Fermín<br>
<br>
</blockquote>
<br>
</body>
</html>