Difference between revisions of "Course: Massive Data Analysis 2014/Hadoop Exercise"
Revision as of 19:50, 3 October 2014
Before you start
- You must have Hadoop installed and working on your local machine. You also need to set up your Amazon Web Services (AWS) account. Refer to the instructions on the course page.
- Download the following package: http://vgc.poly.edu/~fchirigati/mda-class/hadoop-exercise.zip. This package contains the basic WordCount example to help you get started.
- What to submit
  - Code: place your code in a public GitHub repository
  - Results: put the results in your S3 bucket (don't forget to make it public)
  - Complete this form (http://bit.ly/1vAxovu) to add the links to your GitHub repository and S3 bucket. Deadline: 11:59 PM on the same day of class (Oct 6, 2014)
Hands-on exercises
- Exercise 0: WordCount
  - Run the basic WordCount example on your local machine and on AWS
  - Follow the instructions here to create your Amazon Elastic MapReduce (EMR) cluster: http://vgc.poly.edu/~fchirigati/mda-class/RunHadoopAWS.pdf
  - Instructions to run WordCount on your local machine and on the EMR cluster will be given in class
  - Note: You don't have to submit code or results for this exercise.
- Exercise 1: Fixed-Length WordCount
  - For this exercise, count only words with exactly 5 characters
- Exercise 2: InitialCount
- Exercise 3: Top-K WordCount
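
As a starting point for Exercise 1, the sketch below shows the map and reduce logic of a fixed-length word count in plain Python, without Hadoop. It is not the code from the course package and not a reference solution: the function names (map_words, reduce_counts, fixed_length_wordcount) are hypothetical, and splitting on whitespace is an assumption about how words are tokenized. Only the 5-character filter comes from the exercise statement.

```python
from collections import defaultdict

WORD_LENGTH = 5  # from the exercise statement: count only 5-character words


def map_words(line, length=WORD_LENGTH):
    """Map step: emit (word, 1) for each token of exactly `length` characters.

    Tokens are split on whitespace (an assumption); punctuation stays
    attached to the token and therefore counts toward its length.
    """
    for word in line.split():
        if len(word) == length:
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def fixed_length_wordcount(lines, length=WORD_LENGTH):
    """Run map then reduce in a single process over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line, length))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["hello world hello", "hadoop maps words"]
    print(fixed_length_wordcount(sample))
```

The only change from a plain word count is the length check in the map step; the reduce step is unchanged. In a real Hadoop job the same check would go in the mapper, and the framework would handle grouping the pairs by word before the reducer runs.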