Hadoop, MapReduce, & localhost

By default, Hadoop will route at least some of its functionality through $HOSTNAME, where $HOSTNAME is the output of the hostname command on *NIX.

If your hostname is not a valid, resolvable hostname (say, Joe's_laptop, which contains characters that are illegal in DNS names), then Hadoop will complain on startup with something like this:

3/01/15 15:07:36 INFO metrics.MetricsUtil: Unable to obtain hostName
java.net.UnknownHostException: Joe's_laptop: Joe's_laptop
	at java.net.InetAddress.getLocalHost(InetAddress.java:1438)
	at org.apache.hadoop.metrics.MetricsUtil.getHostName(MetricsUtil.java:91)
	at org.apache.hadoop.metrics.MetricsUtil.createRecord(MetricsUtil.java:80)
[…]
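Before starting Hadoop, it is worth checking whether the name your machine reports actually resolves, since that lookup is what throws the UnknownHostException above. A minimal sketch, assuming a Linux box where getent is available (macOS lacks getent; use dscacheutil there instead):

```shell
# Show the name Hadoop will see as the local hostname.
hostname

# Verify that the name actually resolves. getent consults
# /etc/hosts as well as DNS, the same way the libc resolver does.
if getent hosts "$(hostname)" > /dev/null; then
    echo "hostname resolves"
else
    echo "hostname does NOT resolve -- expect UnknownHostException"
fi
```

If the second command prints the failure message, Hadoop's InetAddress.getLocalHost() call will fail the same way.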

Subtler is the case where you set your computer’s hostname to a name that is valid and publicly resolvable, such as example.com.

Hadoop won’t necessarily complain on startup, or even when you submit a MapReduce job. In my case, the map tasks completed fine, but the reduce step hung without ever progressing. The red flag appeared on the “Hadoop Map/Reduce Administration” page (by default, http://localhost:50030/jobtracker.jsp): the machine executing the reduce tasks was something very strange. The correct machine, at least for my configuration, turned out to be: /default-rack/localhost.
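One common remedy, sketched here under the assumption of a Linux machine with root access, is to map the machine's hostname to the loopback address in /etc/hosts so that Hadoop's lookup resolves locally rather than out to the public name:

```shell
# Back up /etc/hosts before touching it.
sudo cp /etc/hosts /etc/hosts.bak

# Map this machine's hostname to loopback so local lookups
# never leave the box.
echo "127.0.0.1   $(hostname)" | sudo tee -a /etc/hosts

# Confirm the mapping took effect.
getent hosts "$(hostname)"
```

After this, restarting the Hadoop daemons should have them bind and report against localhost instead of the misleading external name.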
