Data Mining Task from Digg
You'll be supplied with a list of movie titles.
Your task is to gather the following data:
- A list of Digg submissions related to each movie, based on the following search terms:
a. search for ||movie_name movie||
b. search for ||movie_name film||
c. search for ||movie_name trailer||
d. search for ||movie_name watch||
e. search for ||movie_name see||
For example, for the movie "The Eye" you will run the following separate searches:
"The Eye movie"
"The Eye film"
"The Eye trailer"
"The Eye watch"
"The Eye see"
Run all searches *without* the double quotes!
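A minimal Python sketch of building the five search strings for one title. Python is our choice here, since the task does not name a language. Only the suffixes come from the list above; `build_queries` is a hypothetical helper name.

```python
from __future__ import annotations

# Search-term suffixes taken from items a-e of the task description.
SUFFIXES = ["movie", "film", "trailer", "watch", "see"]


def build_queries(movie_name: str) -> list[str]:
    """Return the five plain (unquoted) search strings for one movie title."""
    name = movie_name.strip()
    return [f"{name} {suffix}" for suffix in SUFFIXES]


print(build_queries("The Eye"))
# ['The Eye movie', 'The Eye film', 'The Eye trailer', 'The Eye watch', 'The Eye see']
```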
Combine the results of all searches and delete duplicates. Delete only exact duplicates, i.e. entries that lead to the same Digg submission, not entries that merely link to the same external URL!
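A sketch of the merge-and-deduplicate step, assuming each scraped result is a dict whose hypothetical `digg_url` key holds the URL of the Digg submission and whose hypothetical `link_url` key holds the external URL. Duplicates are keyed on `digg_url` only, so two different submissions of the same external link are both kept. The comparison is an exact string match; no URL normalization is assumed.

```python
from __future__ import annotations


def merge_results(result_lists: list[list[dict]]) -> list[dict]:
    """Combine several search result lists, dropping exact duplicate Digg submissions.

    Results count as duplicates only if they share the same Digg item URL
    ("digg_url", a hypothetical key); results that merely share an external
    link ("link_url") are kept. First-seen order is preserved.
    """
    seen: set[str] = set()
    merged: list[dict] = []
    for results in result_lists:
        for item in results:
            key = item["digg_url"]
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Keeping the first occurrence preserves the order in which the searches were run, which keeps the auto-increment IDs stable between runs.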
Digg search settings: "Title, Description, and URL"; "All Stories"; "Including buried: NO".
Save the results in a table (preferably Excel; CSV is also acceptable) with the following columns:
ID (auto-increment serial number), Date Submitted (dd/mm/yyyy), Title, Full URL of Digg Item, Full URL the Item Links To, Number of Diggs, Number of Comments, Made Popular (YES/NO)
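A sketch of writing the table as CSV with Python's standard `csv` module. It assumes each merged result is a dict with the hypothetical keys `submitted` (a `date`), `title`, `digg_url`, `link_url`, `diggs`, `comments` and `popular` (a bool). The ID is generated with `enumerate`, and the date is formatted as dd/mm/yyyy with `strftime("%d/%m/%Y")`. The sample row below is placeholder data, not real Digg output.

```python
import csv
import io
from datetime import date

# Column headers following the list in the task description.
COLUMNS = [
    "ID", "Date Submitted", "Title", "Full URL of Digg Item",
    "Full URL the Item Links To", "Number of Diggs", "Number of Comments",
    "Made Popular",
]


def write_table(items, out) -> None:
    """Write merged results as CSV rows to a file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for serial, item in enumerate(items, start=1):  # auto-increment ID
        writer.writerow([
            serial,
            item["submitted"].strftime("%d/%m/%Y"),  # dd/mm/yyyy
            item["title"],
            item["digg_url"],
            item["link_url"],
            item["diggs"],
            item["comments"],
            "YES" if item["popular"] else "NO",
        ])


# Placeholder example row (hypothetical values).
buf = io.StringIO()
write_table([{
    "submitted": date(2008, 3, 5), "title": "The Eye trailer",
    "digg_url": "digg-1", "link_url": "ext-1",
    "diggs": 10, "comments": 2, "popular": True,
}], buf)
print(buf.getvalue())
```

When writing to disk, open the file with `open("results.csv", "w", newline="", encoding="utf-8")`, as the `csv` documentation recommends. Excel can open the CSV directly; producing a native .xlsx file would need a third-party library such as openpyxl.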
Please note that Digg shows the date as a relative date (e.g. "2 years 34 days ago"). This must of course be converted to the exact date.
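A sketch of converting a relative date back to an absolute one. It assumes the relative text is measured from the moment the page was scraped (`scraped_at`, which the script should record), and that a year is 365 days and a month 30 days. Digg's exact rounding is not stated in this task, so the converted dates should be treated as approximate (possibly off by a day or more). `resolve_relative_date` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: approximate unit lengths; Digg's exact convention is not documented here.
UNIT_SECONDS = {
    "year": 365 * 86400,
    "month": 30 * 86400,
    "week": 7 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}
# Matches "<number> <unit>" pairs such as "2 years" or "34 days" (plural optional).
PART = re.compile(r"(\d+)\s*(year|month|week|day|hour|minute)s?", re.IGNORECASE)


def resolve_relative_date(text: str, scraped_at: datetime) -> datetime:
    """Convert e.g. '2 years 34 days ago' into an absolute datetime."""
    parts = PART.findall(text)
    if not parts:
        raise ValueError(f"unrecognised relative date: {text!r}")
    seconds = sum(int(n) * UNIT_SECONDS[unit.lower()] for n, unit in parts)
    return scraped_at - timedelta(seconds=seconds)


when = resolve_relative_date("2 years 34 days ago", datetime(2010, 1, 1))
print(when.strftime("%d/%m/%Y"))  # 29/11/2007
```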
Made Popular: regular (not popular) submissions show the following text in the search results: "username" submitted "342 days ago"
Popular items show the following text instead: "username" made popular "342 days ago"
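A sketch of deriving the Made Popular flag from a search-result byline. It assumes the byline is plain text of the form `username submitted 342 days ago` or `username made popular 342 days ago` (reading the quotes above as placeholders, not literal characters) and that usernames contain no spaces. `parse_byline` is a hypothetical helper; it also returns the relative-date text so that it can be converted to an exact date.

```python
from __future__ import annotations

import re

# ASSUMPTION: byline looks like "username submitted 342 days ago" or
# "username made popular 342 days ago", with no spaces in the username.
BYLINE = re.compile(
    r"^\s*(?P<user>\S+)\s+(?P<action>submitted|made popular)\s+(?P<when>.+\bago)\s*$",
    re.IGNORECASE,
)


def parse_byline(byline: str) -> tuple[str, str, str]:
    """Return (username, Made Popular 'YES'/'NO', relative date text)."""
    m = BYLINE.match(byline)
    if m is None:
        raise ValueError(f"unexpected byline: {byline!r}")
    popular = "YES" if m["action"].lower() == "made popular" else "NO"
    return m["user"], popular, m["when"]
```

Raising on unrecognised bylines, instead of defaulting to "NO", makes layout changes on the page visible rather than silently mislabelling rows.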
Sample data is attached. Please make sure you understand the requirements before posting your bid.
I expect this to be done by script (automatically), as accurately as possible, within 2-3 days.