deptlab:hadoop:gettingstarted
====== Getting Started with Hadoop ======
This is a short tutorial on using Hadoop.

Hadoop commands and formats change at times. If the **hadoop fs** commands shown here do not work on the cluster, check the file system shell documentation for the installed Hadoop version.

We'll go through the process of compiling, packaging, and running a simple Hadoop program.
This tutorial is adapted from the Apache Hadoop MapReduce tutorial.

===== Logging In =====

First, make sure you can log in to the head node with SSH, currently at zoidberg.cs.ndsu.nodak.edu. Log in to this server with your CS Domain password (the one you use for the Windows clusters found around campus and for Campus Wi-Fi), NOT your University System password.

If you have trouble logging in:
  * Check whether your password works in the Linux lab
  * If you **cannot** log in to the Linux lab, contact the system administrators

To request access to the Hadoop cluster, contact the system administrators.
===== Setting Up Input Files =====

This program can use the Hadoop Distributed File System (HDFS) that is set up in the CS department. This file system spans all the Linux lab machines and provides distributed storage for use specifically with Hadoop.

You can work with HDFS using UNIX-like file commands. The full list of file commands can be found in the Hadoop FileSystem shell documentation.

First, make a directory to store the input for the program (use your own username in place of helsene):

<code>
helsene@zoidberg:~$ hadoop fs -mkdir /user/helsene/wordcount
helsene@zoidberg:~$ hadoop fs -mkdir /user/helsene/wordcount/input
</code>
To set up input for the WordCount program, create two files as follows:

file01:
<file file01>
Hello World Bye World
</file>

file02:
<file file02>
Hello Hadoop Goodbye Hadoop
</file>

Save these to your home folder on the head node. To move them into HDFS, use the following commands:

<code>
helsene@zoidberg:~$ hadoop fs -put file01 /user/helsene/wordcount/input
helsene@zoidberg:~$ hadoop fs -put file02 /user/helsene/wordcount/input
</code>

Again, use your own username where applicable.

The syntax here is "hadoop fs -put <local file> <HDFS destination>", which copies a file from the local file system into HDFS.
===== Running the WordCount Program =====

You can now run the WordCount program using the following command:

<code>
hadoop jar wc.jar WordCount /user/helsene/wordcount/input /user/helsene/wordcount/output
</code>

The command syntax is:
"hadoop jar <jar file> <main class> <input directory> <output directory>"

In this case, we use the wc.jar JAR file, running its ''WordCount'' class on the input directory we populated above. Hadoop creates the output directory itself, and the job will fail if that directory already exists, so remove it (or pick a new name) before re-running.
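For reference, this is the WordCount v1.0 source from the Apache MapReduce tutorial this page is adapted from; the mapper emits ''(word, 1)'' pairs and the reducer sums them. It needs the Hadoop client libraries on the classpath to compile, and your own ''WordCount.java'' may differ:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every whitespace-separated token in the line.
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    // Sum all the counts emitted for a given word.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // reducer doubles as combiner
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that the same ''IntSumReducer'' is registered as a combiner, so partial sums are computed on each mapper node before data crosses the network.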
===== View Output =====
You can check the output directory with:

<code>
hadoop fs -ls /user/helsene/wordcount/output
</code>

You should then see something similar to:

<code>
helsene@zoidberg:~$ hadoop fs -ls /user/helsene/wordcount/output
Found 2 items
-rw-r--r--   ...   /user/helsene/wordcount/output/_SUCCESS
-rw-r--r--   ...   /user/helsene/wordcount/output/part-r-00000
</code>

The ''part-r-00000'' file holds the results of the job. You can print it with:

<code>
helsene@zoidberg:~$ hadoop fs -cat /user/helsene/wordcount/output/part-r-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
</code>
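As a sanity check on the numbers above, the same word count can be reproduced in plain Java with no Hadoop installation; this small sketch (the class name ''WordCountLocal'' is just for illustration, not part of the tutorial) counts whitespace-separated words across the two example files:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountLocal {

    // Count whitespace-separated words across all input lines.
    // A TreeMap keeps the keys sorted, matching the order in part-r-00000.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
            "Hello World Bye World",       // contents of file01
            "Hello Hadoop Goodbye Hadoop"  // contents of file02
        );
        // Prints the same five lines as the part-r-00000 output above.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

The real job distributes exactly this computation: the mapper's token split corresponds to the inner loop, and the reducer's summation corresponds to the ''merge'' call.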
==== Notes on Hadoop 2.8.5 ====

Before compiling, you may need to run:

<code>
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
</code>

Compile:
<code>
hadoop com.sun.tools.javac.Main WordCount.java
</code>

Make the JAR:
<code>
jar cf wc.jar WordCount*.class
</code>

Execute the JAR:
<code>
hadoop jar wc.jar WordCount /user/helsene/wordcount/input /user/helsene/wordcount/output
</code>
deptlab/hadoop/gettingstarted.1484512162.txt.gz · Last modified: by localadmin