<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body bgcolor="#ffffff" text="#000000">

    <tt>[snip]</tt><br>

    <blockquote cite="mid:4F26E7F2.2010902@tid.es" type="cite">

      <ul>

        <li>Are you collocating gluster cluster peers with TT nodes (I
          mean, is each one of the 8 TT nodes also a gluster peer in the
          cluster) or is the gluster cluster running on separate nodes?<br>

        </li>

      </ul>

    </blockquote>

    <br>

    <tt>Yes, you are right. Each TaskTracker node is a gluster peer in

      the cluster.</tt><br>

    <br>

    <blockquote cite="mid:4F26E7F2.2010902@tid.es" type="cite">

      <ul>

        <li>If the answer to the question above is that they are
          collocated, which fs.glusterfs.server value are you using on
          each TT?

        </li>

      </ul>

    </blockquote>

    <br>

    <tt>For the TaskTracker, fs.glusterfs.server can be _any_ one of
      the gluster peers (i.e. any one of the 8 machines, given that you
      have a 1 JT + 8 TT setup). For simplicity, stick to one hostname/IP
      for this, since that makes deployment easier (no need to edit
      core-site.xml on every machine).</tt><br>

    <br>

    <blockquote cite="mid:4F26E7F2.2010902@tid.es" type="cite">

      I'm asking because I have a configuration like this in mind:<br>
      <br>
      TT1 -&gt; fs.glusterfs.server in TT1's core-site.xml = IP_TT1<br>
      TT2 -&gt; fs.glusterfs.server in TT2's core-site.xml = IP_TT2<br>
      ...<br>
      TTn -&gt; fs.glusterfs.server in TTn's core-site.xml = IP_TTn<br>

    </blockquote>

    <br>

    <tt>This will definitely work for you, but as I said, stick to
      one hostname/IP. So on each TT (TT1, TT2, ..., TTn) use IP_TT1.</tt><br>
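    <br>
    <tt>To illustrate the single-hostname approach, here is a minimal
      sketch of the fs.glusterfs.server entry, written as a standard
      Hadoop &lt;property&gt; element in core-site.xml. IP_TT1 is just
      the placeholder from your example, and the same file would be
      copied unchanged to every TT (TT1, TT2, ..., TTn). The other
      properties the plugin needs are deliberately left out of this
      sketch.<br>
      <br>
      &lt;!-- Same value on every TaskTracker; IP_TT1 is a placeholder for any one gluster peer --&gt;<br>
      &lt;property&gt;<br>
      &nbsp;&nbsp;&lt;name&gt;fs.glusterfs.server&lt;/name&gt;<br>
      &nbsp;&nbsp;&lt;value&gt;IP_TT1&lt;/value&gt;<br>
      &lt;/property&gt;</tt><br>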

    <br>

    <blockquote cite="mid:4F26E7F2.2010902@tid.es" type="cite">

      <br>

      So each TT mounts "itself", which I suppose achieves data
      locality similar to HDFS (assuming the gluster driver is clever
      enough to use the local disk when the data is located on the same
      node). Does this configuration make sense?<br>

    </blockquote>

    <br>

    <tt>Exactly! Each TT node (and the JT too) does a GlusterFS FUSE
      mount to get a _view_ of the entire namespace of the FS. The
      JobTracker schedules a job's tasks onto the TaskTracker nodes, and
      when a task runs on a TT node, all of its I/O is done through the
      GlusterFS mount. Data locality is a bit of a catch here: since all
      I/O calls go through the mount, each call has to pass through the
      client translator(s) and then the server translator(s) before it
      reaches the POSIX layer, even when the client and the server are
      on the same node (the TT in this case).<br>

      <br>

      To optimize this, we introduced a configurable option,
      "quick.slave.io". This is essentially a "short circuit" for the
      case I just mentioned. When a job wants to read from a particular
      offset in a file, the GlusterFS Hadoop plugin checks whether the
      (offset, length) range in question is present in the node's local
      backend file system. If it is, the plugin satisfies the read
      directly from the backend FS instead of going through the FUSE
      mount, thereby saving context switches, translator overhead,
      etc.<br>

      <br>

      One caveat: this option is not well tested, so it defaults to
      "Off" in core-site.xml. If you do try it out, please let us know
      if you hit any bugs (and please file them too!).<br>
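      <br>
      For reference, a minimal sketch of what enabling it in
      core-site.xml on a TT might look like. The value "On" is an
      assumption inferred from the shipped "Off" default, so please
      double-check it against the sample core-site.xml that comes with
      the plugin:<br>
      <br>
      &lt;!-- Assumed value: "On" is taken to enable the local short-circuit read path --&gt;<br>
      &lt;property&gt;<br>
      &nbsp;&nbsp;&lt;name&gt;quick.slave.io&lt;/name&gt;<br>
      &nbsp;&nbsp;&lt;value&gt;On&lt;/value&gt;<br>
      &lt;/property&gt;<br>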

      <br>

      HTH<br>

      <br>

      Thanks,<br>

      -Venky<br>

    </tt><br>

    <blockquote cite="mid:4F26E7F2.2010902@tid.es" type="cite">

      <br>

      Thanks!<br>

      <br>

      Best regards,<br>

      <br>

      ------<br>

      Ferm&iacute;n<br>

      <br>

      <br>

      <br>

      On 30/01/2012 18:14, Venky Shankar wrote:

      <blockquote

cite="mid:00CA8F6DF62C8C4889169A3C2CFC0A4503C3FDAA@mbx024-e1-nj-10.exch024.domain.local"

        type="cite">

        <pre wrap="">Hi,

Can you please dump the contents of conf/core-site.xml from the JT and TT (or attach them)?

We have tested the plugin with 1 Hadoop master (JT) and 8 Hadoop TaskTrackers (TTs), so it should work with your setup too.

Additionally, it would help if you could send us the JobTracker and TaskTracker logs (if they are huge, paste the last 50 or so lines).

Thanks,

-Venky

________________________________________

From: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:gluster-users-bounces@gluster.org">gluster-users-bounces@gluster.org</a> [<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:gluster-users-bounces@gluster.org">gluster-users-bounces@gluster.org</a>] on behalf of Ferm&iacute;n Gal&aacute;n M&aacute;rquez [<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:fermin@tid.es">fermin@tid.es</a>]

Sent: Monday, January 30, 2012 10:30 PM

To: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:gluster-users@gluster.org">gluster-users@gluster.org</a>

Subject: [Gluster-users] Can Hadoop run on gluster in 1 JT, N TT setup or only works for 1 JT+TT?

Hi,

Recently I've set up a Gluster cluster to run Hadoop M/R jobs, following

the document at

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/Gluster_Hadoop_Compatible_Storage.pdf">http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/Gluster_Hadoop_Compatible_Storage.pdf</a>.

As far as I can tell from my tests, the gluster_hadoop.jar plugin automatically mounts the gluster volume on the JT node, and the TT (on the same node) then uses that mountpoint to do its work. That's fine if the JT and TT are running on the same node (i.e. a one-node setup (*)).

However, when I test with a 2-node (*) setup in which the JT runs on one node and the TT on another, it doesn't work (hadoop jar stalls at "INFO mapred.JobClient:  map 0% reduce 0%" with no progress after that). In the end this makes sense, given that the gluster volume is not mounted on the TT node (it's only mounted on the JT node).

This is a bit annoying, since I was expecting the gluster volume to be mounted on the TT nodes, which are the ones that actually need to access data in the filesystem.

So, is it not possible to run a 1 JT, N TT Hadoop cluster with gluster? Does it only work with 1 JT+TT?

Or maybe I'm doing something wrong, or I'm not understanding the document at
<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/Gluster_Hadoop_Compatible_Storage.pdf">http://download.gluster.com/pub/gluster/glusterfs/qa-releases/3.3-beta-2/Gluster_Hadoop_Compatible_Storage.pdf</a>
correctly (any information about running Hadoop on gluster is highly welcome).

I'm using Hadoop 0.20.2 and Gluster 3.3beta2. If you need to know any

other information about my setup, don't hesitate to ask for it!

Thanks in advance!

Best regards,

------

Ferm&iacute;n

(*) By "nodes" I mean nodes in the Hadoop cluster, regardless of how many nodes make up the gluster cluster (the latter are "abstracted" by the mountpoint, as far as I understand).


This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.tid.es/ES/PAGINAS/disclaimer.aspx">http://www.tid.es/ES/PAGINAS/disclaimer.aspx</a>

_______________________________________________

Gluster-users mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://gluster.org/cgi-bin/mailman/listinfo/gluster-users">http://gluster.org/cgi-bin/mailman/listinfo/gluster-users</a>

</pre>

      </blockquote>

      <br>

      <br>


    </blockquote>

    <br>

  </body>

</html>