For this exciting project you will be scraping a large content website.
Our target price for this project is $50.
We will give you the address of a website containing many content pages.
For all of the normal content/article pages, you will need to:
1) Scrape the content
2) Present the result as a Unicode CSV file
3) Parse content and save the following fields: title, body, category
4) Remove specific string patterns that we define.
The resulting content must be free of any images and HTML tags, but must maintain spaces and paragraph indicators.
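To illustrate the cleaning and output steps above, here is a minimal Python sketch using only the standard library. It is not a required implementation. The names `TextExtractor`, `clean_body`, `to_csv`, and `BLOCK_TAGS` are hypothetical, and several choices are assumptions: which tags count as paragraph boundaries, a blank line as the paragraph indicator, collapsing runs of whitespace to a single space, and treating the patterns to remove as regular expressions.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Assumption: these tags mark paragraph boundaries; adjust per site.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
# Text inside these tags is never visible content, so it is dropped.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text; tags and images are discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # paragraph boundary marker

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Newlines in the HTML source are not paragraph breaks,
            # so all whitespace runs become a single space.
            self.parts.append(re.sub(r"\s+", " ", data))


def clean_body(html, patterns=()):
    """Strip tags and images, remove the given regex patterns,
    and separate paragraphs with a blank line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    lines = [re.sub(r" +", " ", line).strip() for line in text.split("\n")]
    return "\n\n".join(line for line in lines if line)


def to_csv(records):
    """Serialize records (dicts with title, body, category) as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["title", "body", "category"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, `clean_body('<p>Hello <b>world</b></p><img src="a.png"><p>Second para.</p>')` yields two paragraphs separated by a blank line, with the image and tags gone. To produce a Unicode file, the text returned by `to_csv` could be written with `open(path, "w", encoding="utf-8", newline="")`; UTF-8 is an assumption here, since the posting only says "Unicode".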
We are looking to complete this project quickly – 5 days from start.
We will ask you to show us a few scraped records from our site before we accept you to do the work.
Please use the phrase super-scraper in your response, so we know you have read this description.
We expect to have additional work like this.