Build your own Hadoop AMI, starting from the Amazon Linux AMI ([url removed, login to view]). You must use the latest stable Hadoop release. You are required to store this AMI in S3, and its name must include your last name. This AMI will be tested with the application built for task 2. However, if your AMI doesn’t work, you are allowed to use one of the pre-built Hadoop AMIs for task 2.
Write a Hadoop/YARN MapReduce application that takes as input the 50 Wikipedia web pages dedicated to the US states (we will provide these files for consistency) and:
Computes how many times the words “education”, “politics”, “sports”, and “agriculture” appear in each file. Then, the program outputs the number of states for which each of these words is dominant (i.e., appears more times than the other three words).
Identifies all states that have the same ranking of these four words. For example, NY, NJ, and PA may all have the ranking 1. Politics; 2. Sports; 3. Agriculture; 4. Education (meaning that “politics” appears more times than “sports” in the state’s Wikipedia file, “sports” appears more times than “agriculture”, etc.).
INPUT FILE IS GIVEN - [url removed, login to view]
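The two computations in task 2 can be sketched in plain Python to make the expected logic concrete. This is only an illustration of the counting, dominance, and ranking rules, not a Hadoop job: in the real application the per-file counting would sit in the map step and the aggregation in the reduce step. All function names are hypothetical. Two points the task does not specify are treated as assumptions here: words are matched case-insensitively as whole words, and ties in a ranking are broken alphabetically.

```python
import re
from collections import Counter, defaultdict

WORDS = ("education", "politics", "sports", "agriculture")

def count_words(text):
    """Count whole-word, case-insensitive occurrences of each target word (assumption)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in WORDS)
    return {w: counts[w] for w in WORDS}

def dominant_word(counts):
    """Return the word that appears strictly more often than the other three, else None."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_n), (_, second_n) = ordered[0], ordered[1]
    return top if top_n > second_n else None

def ranking(counts):
    """Order the four words by descending count; ties broken alphabetically (assumption)."""
    return tuple(sorted(WORDS, key=lambda w: (-counts[w], w)))

def analyze(pages):
    """pages maps a state name to its page text.

    Returns (number of states per dominant word, states grouped by ranking).
    """
    dominant_tally = Counter()
    groups = defaultdict(list)
    for state, text in sorted(pages.items()):
        counts = count_words(text)
        winner = dominant_word(counts)
        if winner is not None:
            dominant_tally[winner] += 1
        groups[ranking(counts)].append(state)
    return dominant_tally, dict(groups)
```

Calling `analyze` with a dictionary of state names and page texts returns the tally for the first question and, for the second, a mapping from each ranking tuple to the list of states that share it.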
12 freelancers are bidding an average of $297 for this job
Dear Customer, my name is Yuriy Tumakha. I am interested in your AWS Hadoop project. I am a Senior Scala/Java Developer with 14 years of experience. You can see my code examples on GitHub [url removed, login to view]