Using Hadoop in the CS Dept Lab
Hadoop 2.10.2 is installed in the CS Dept. Lab in the folder: /usr/lib/hadoop. You'll find the Hadoop binaries and scripts at /usr/lib/hadoop/bin.
If you are a member of the 'hadoop' group, you will be able to access Hadoop and submit new jobs.
Jobs MUST be submitted to the head node (zoidberg.cs.ndsu.nodak.edu).
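For example, a session from your own machine might look like the following; 'yourNDSUusername', myjob.jar, and MyJobClass are placeholders for your own credentials and job:

ssh yourNDSUusername@zoidberg.cs.ndsu.nodak.edu
# myjob.jar and MyJobClass below are placeholders for your own MapReduce job
hadoop jar myjob.jar MyJobClass /user/yourNDSUusername/input /user/yourNDSUusername/output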
Hadoop makes use of HDFS (the Hadoop Distributed File System), and each Hadoop node computer is configured as a node in the HDFS cluster. There's a fair amount of space here (about 14 TB), so don't feel shy about using it, but please delete your large data files when they are no longer needed.
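For example, to see how much space your files are using and to remove a data set you no longer need (the paths here are illustrative):

# -du -h reports usage in human-readable units; -rm -r deletes recursively
hdfs dfs -du -h /user/yourNDSUusername
hdfs dfs -rm -r /user/yourNDSUusername/old-dataset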
Credentials for the Hadoop cluster are your NDSU credentials.
To become a member of the 'hadoop' group, email labsupport@cs.ndsu.edu. You will be added to the group and a folder in the HDFS filesystem will be made for you.
Status pages
The Hadoop tracker and status pages are accessible only from ND state university system networks.
Firewall info
The head node is firewalled such that the Hadoop status pages are ONLY accessible from ND state networks. The head node, Zoidberg, can be accessed via SSH from anywhere, so job submission is possible from outside these networks, but the status pages cannot currently be viewed from outside them.
Getting Started
See gettingstarted for the quick 'WordCount' Hadoop tutorial.
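As a quick sanity check, you can also run the WordCount example that ships with Hadoop. The jar path below assumes the standard 2.10.2 layout under /usr/lib/hadoop, and the input/output paths are placeholders:

# upload some text, run the example, and print the result
hdfs dfs -mkdir -p /user/yourNDSUusername/wc-input
hdfs dfs -put mytext.txt /user/yourNDSUusername/wc-input/
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.2.jar wordcount /user/yourNDSUusername/wc-input /user/yourNDSUusername/wc-output
hdfs dfs -cat /user/yourNDSUusername/wc-output/part-r-00000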
Hadoop Components
Hadoop consists of multiple, separate components. We have the MapReduce and DFS components installed in the CS Dept Linux Lab.
MapReduce notes
None at the moment.
DFS notes
To use the DFS, you call the hdfs program, give it the argument 'dfs', and then give it commands to work with the filesystem. Many of these commands mirror UNIX-style file manipulation commands.
For example, if I wanted to list the files in my home directory I would use:
hdfs dfs -ls /user/helsene
This would return a UNIX-style listing of the files in my home directory. Files in DFS have UNIX-style permissions but are not fully POSIX-compliant; most of the permission commands, such as -chmod and -chown, work just as expected.
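For example, to make a results file readable by your group or to create a new directory (paths are illustrative):

hdfs dfs -chmod 640 /user/yourNDSUusername/results.txt
hdfs dfs -mkdir /user/yourNDSUusername/experiments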
Putting files onto DFS
To put files onto DFS from the local system, use the -put command. This command takes a file from the local filesystem and puts it into the specified location on the DFS.
hdfs dfs -put /path/to/local/file /path/to/dfs/file
(-copyFromLocal works the same way, except that the source must be a local file.)
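To copy results back out of DFS, the matching -get command does the reverse (again, paths are illustrative):

hdfs dfs -get /path/to/dfs/file /path/to/local/file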
Monitoring
You can monitor Hadoop by visiting particular web pages on the head node. You can visit the Job Tracker and DFS Summary pages for information on jobs and HDFS.
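As a sketch, assuming the cluster uses the stock Hadoop 2.x web UI ports (an assumption; check with labsupport if these differ), the pages would be at addresses like:

http://zoidberg.cs.ndsu.nodak.edu:50070   (NameNode / DFS summary)
http://zoidberg.cs.ndsu.nodak.edu:8088    (ResourceManager / job tracking)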