<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <tt>Hi,<br>

      <br>

      [snip]</tt><br>

    <blockquote cite="mid:4F283184.4050708@tid.es" type="cite">

      <br>

      However, after reading your mail, I wonder whether the Hadoop plugin for

      gluster implements some location-based job scheduling similar to

      the one in Hadoop on HDFS. I mean, in Hadoop on HDFS the JT

      coordinates with the NN (which knows where every file block is

      located within the cluster), so each map task is scheduled to the

      TT closest to the input it has to process (ideally,

      collocated). In Hadoop on gluster I understand that there is no NN

      equivalent, but is there any means by which the JT can know which nodes in

      the cluster have the actual data in their respective backend

      filesystem, so that the JT tries to schedule each map task to a TT on one of

      these nodes? If not, how does the JT select the TT to schedule

      each map task on (round-robin, randomly, etc.)?<br>

      <br>

      Probably my question is very basic, but I haven't found a clear and

      direct answer in the documentation, sorry...<br>

    </blockquote>

    <br>

    <tt>The JT learns which part of a file is stored where by calling an API

      that the GlusterFS plug-in implements.<br>

      <br>

      If you look at the plug-in source, it extends the <b>FileSystem* </b>class.

      The JT invokes a method that we implement (<b>getFileBlockLocations()**</b>),

      and we return the required info (file, offset, length) to the

      JT. This helps it decide which task to schedule on which TT

      node. The method queries GlusterFS for the pathinfo extended

      attribute (trusted.glusterfs.pathinfo) to get the required info.<br>
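      <br>
      To make the idea concrete, here is a rough, self-contained Java sketch.
      It is not the plug-in's code: PathInfoSketch, Location and the
      simplified pathinfo string are made up for illustration (a real
      pathinfo value carries more translator information around each brick
      entry), and it assumes a non-striped file, so every byte range of the
      file lives on the same brick(s).<br>
    </tt>
<pre>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathInfoSketch {
    // Assumed shape of one brick entry inside the pathinfo value:
    //   POSIX(/export/brick1):server1:/export/brick1/data.txt
    private static final Pattern BRICK =
        Pattern.compile("POSIX\\(([^)]*)\\):([^:]+):");

    /** Hypothetical stand-in for Hadoop's BlockLocation. */
    static final class Location {
        final String[] hosts;
        final long offset;
        final long length;

        Location(String[] hosts, long offset, long length) {
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns the host name of every brick entry found in pathInfo. */
    static String[] hostsFromPathInfo(String pathInfo) {
        Matcher m = BRICK.matcher(pathInfo);
        int count = 0;
        while (m.find()) {
            count++;
        }
        String[] hosts = new String[count];
        m.reset();
        for (int i = 0; m.find(); i++) {
            hosts[i] = m.group(2); // group 2 is the host part of the entry
        }
        return hosts;
    }

    /** Non-striped assumption: the whole requested range sits on the same hosts. */
    static Location locate(String pathInfo, long start, long len) {
        return new Location(hostsFromPathInfo(pathInfo), start, len);
    }

    public static void main(String[] args) {
        // Simplified sample value for a file replicated on two bricks.
        String sample = "POSIX(/export/brick1):server1:/export/brick1/data.txt "
                      + "POSIX(/export/brick1):server2:/export/brick1/data.txt";
        Location loc = locate(sample, 0L, 1024L);
        System.out.println(String.join(",", loc.hosts) + " " + loc.offset + " " + loc.length);
    }
}
```
</pre>
    <tt>Running it prints "server1,server2 0 1024". A scheduler that
      receives such a host list can then prefer a TT running on server1 or
      server2 for the map task that reads this range.<br>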

      <br>

      <br>

      *&nbsp; </tt><tt><a class="moz-txt-link-freetext" href="https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L49">https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L49</a></tt><br>

    <tt>**

<a class="moz-txt-link-freetext" href="https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L448">https://github.com/gluster/hadoop-glusterfs/blob/master/glusterfs-hadoop/src/main/java/org/apache/hadoop/fs/glusterfs/GlusterFileSystem.java#L448</a><br>

      <br>

      Thanks,<br>

      -Venky</tt><br>

    <br>

    <blockquote cite="mid:4F283184.4050708@tid.es" type="cite">

      <br>

      Thanks!<br>

      <br>

      Best regards,<br>

      <br>

      ------<br>

      Ferm&iacute;n<br>

      <br>

      <hr>

      <font color="Gray" face="Arial" size="1">

        This message is intended exclusively for its addressee. We only

        send and receive email on the basis of the terms set out at<br>

        <a class="moz-txt-link-freetext" href="http://www.tid.es/ES/PAGINAS/disclaimer.aspx">http://www.tid.es/ES/PAGINAS/disclaimer.aspx</a><br>

      </font>

      <pre wrap="">

_______________________________________________

Gluster-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>

<a class="moz-txt-link-freetext" href="http://gluster.org/cgi-bin/mailman/listinfo/gluster-users">http://gluster.org/cgi-bin/mailman/listinfo/gluster-users</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>