Using Hadoop in the CS Dept Lab

Hadoop 2.10.2 is installed in the CS Dept. Lab in the folder: /usr/lib/hadoop. You'll find the Hadoop binaries and scripts at /usr/lib/hadoop/bin.
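
If the Hadoop commands are not already on your PATH, you can add them for the current session (a sketch; the lab login shells may already set this up for you):

# adds the lab's Hadoop bin directory to PATH for this shell session
export PATH=$PATH:/usr/lib/hadoop/bin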

If you are a member of the 'hadoop' group, you will be able to access Hadoop and submit new jobs.

Jobs MUST be submitted to the head node (zoidberg.cs.ndsu.nodak.edu).
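
For example, once you have logged into zoidberg over SSH, a job submission might look like this (the jar name, class name, and paths below are placeholders, not a real lab job):

# jar, main class, and input/output paths are hypothetical
hadoop jar myjob.jar MyMainClass /user/yourname/input /user/yourname/output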

Hadoop makes use of HDFS (the Hadoop Distributed File System). Each Hadoop node computer is configured as a node in the HDFS cluster. There's a fair amount of space here (about 14TB), so don't feel shy about using it, but please delete your large data files when they are no longer needed.
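
For example, to clean up a large dataset you no longer need (the path is hypothetical):

# -rm -r removes an HDFS directory and its contents recursively
hdfs dfs -rm -r /user/yourname/big-dataset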

Credentials for the Hadoop cluster are your NDSU credentials.

To become a member of the 'hadoop' group, email labsupport@cs.ndsu.edu. You will be added to the group and a folder in the HDFS filesystem will be made for you.

Status pages

These pages are accessible only from ND state university system networks.

HDFS status

Job History

Resource Manager

Firewall info

The head node is firewalled such that the Hadoop status pages are ONLY accessible from ND state networks. The head node, Zoidberg, can be accessed via SSH from anywhere, so job submission is possible from outside these networks, but viewing the status pages from outside is currently not possible.
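
For example, to reach the head node from off campus, substitute your own NDSU username for 'yourname':

ssh yourname@zoidberg.cs.ndsu.nodak.edu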

Getting Started

See gettingstarted for the quick 'WordCount' Hadoop tutorial.
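
If you want a quick smoke test before working through the tutorial, the stock MapReduce examples jar includes WordCount. A minimal sketch, assuming the examples jar sits in the usual share/ location under this install and that /user/yourname/input already contains some text files:

# the jar location and the input/output paths are assumptions; adjust to match the install
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /user/yourname/input /user/yourname/wordcount-out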

Hadoop Components

Hadoop consists of multiple separate components. We have the MapReduce and DFS components installed in the CS Dept Linux Lab.

MapReduce notes

None at the moment.

DFS notes

To use the DFS, call the hdfs program with the argument 'dfs', followed by commands to work with the filesystem. Many of these commands are the same as UNIX-style file manipulation commands.

For example, if I wanted to list the files in my home directory I would use:

hdfs dfs -ls /user/helsene  

This would return a UNIX-style file listing of the files in my home directory. Files in DFS have UNIX-style permissions but are not fully POSIX-compliant; most of the commands, such as -chmod and -chown, work just as expected.
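
For example (file and group names hypothetical; note that changing a file's owner on HDFS normally requires superuser rights, so -chown is mostly useful to administrators):

# restrict a file's permissions, then hand it to a different group
hdfs dfs -chmod 640 /user/helsene/results.txt
hdfs dfs -chgrp hadoop /user/helsene/results.txt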

Putting files onto DFS

To put files onto DFS from the local system, use the -put command (or -copyFromLocal, which behaves the same way for this purpose). It takes a file from the local filesystem and copies it to the specified location on the DFS.

hdfs dfs -copyFromLocal /path/to/local/file /path/to/dfs/file
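
A quick round trip to confirm a transfer worked (filenames hypothetical; -get copies a file back out of DFS):

hdfs dfs -put results.txt /user/helsene/results.txt
hdfs dfs -ls /user/helsene
hdfs dfs -get /user/helsene/results.txt results-copy.txt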

Monitoring

You can monitor Hadoop by visiting the status pages on the head node (see Status pages above): the Resource Manager and Job History pages give information on jobs, and the HDFS status page covers the filesystem.
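
Assuming the cluster uses Hadoop 2.x's default web UI ports (an assumption; the links in the Status pages section above are authoritative if these differ), the pages would be at:

http://zoidberg.cs.ndsu.nodak.edu:50070 (HDFS status, served by the NameNode)
http://zoidberg.cs.ndsu.nodak.edu:8088 (Resource Manager)
http://zoidberg.cs.ndsu.nodak.edu:19888 (Job History server)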