I need to create a crawler/robot to capture emails from a couple of sites.
The crawler needs to:
- Parse each of the site's pages looking for emails, eliminating those that belong to the site itself
- Assign the state and category to each of the emails. We need to know which province each captured email belongs to, and its main Classifieds category
- Eliminate duplicate emails when a site is crawled
- Deliver the database in Excel/Access format, so that it can be easily imported into any software.
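To make the requirements above concrete, here is a rough sketch in Python of the extraction, filtering, de-duplication and export steps. It is only an illustration under assumptions, not a finished crawler: it does not fetch pages, the function names (extract_emails, collect, write_csv), the site domain "example.com" and the state/category values are all hypothetical, and it writes CSV (which both Excel and Access can import) instead of a native Excel/Access file.

```python
import csv
import re

# Simple pattern for typical addresses; it is an approximation, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text, site_domain):
    """Return lowercased emails found in page_text, skipping the site's own addresses."""
    found = []
    for match in EMAIL_RE.findall(page_text):
        email = match.lower()
        domain = email.rsplit("@", 1)[1]
        # Drop addresses on the crawled site's domain or any of its subdomains.
        if domain == site_domain or domain.endswith("." + site_domain):
            continue
        found.append(email)
    return found


def collect(pages, site_domain):
    """pages: iterable of (state, category, page_text) tuples.

    Returns one (email, state, category) row per unique email,
    keeping the state and category of the first page where it appeared.
    """
    rows = {}
    for state, category, text in pages:
        for email in extract_emails(text, site_domain):
            rows.setdefault(email, (email, state, category))
    return list(rows.values())


def write_csv(rows, out):
    """Write rows to an open text file object as CSV with a header line."""
    writer = csv.writer(out)
    writer.writerow(["email", "state", "category"])
    writer.writerows(rows)
```

In use, the crawler would pass each downloaded page's text to collect() together with the state and category it was found under, then call write_csv() with a file opened using newline="". How the state and category are read from each page depends on the site's HTML and is left out here.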
There are 2 sites for now; there will be more.
1- cam pus anun cios dot com (all together)
The sites are in Spanish, but easily understood.
The states are on the left side.
The main categories are in bold.
There are 2 other sites very similar to this one, for which we need to do the same thing. These will be sent by PM if you are skilled enough.
Are you available for this? I have had similar projects done in the past, so this shouldn't be a problem for an experienced programmer.