I need a data scraper for the Italian Yellow Pages.
The scraper needs to collect the following information:
- category (e.g. plumbers)
- Business Name
- description (id="textDescriptor")
- All phone & fax numbers
- website address
- email address
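As a rough illustration of the description field, the sketch below pulls the text out of the element with id="textDescriptor" using only Python's standard library. The class and function names are made up for this example, and it assumes the element's inner tags are properly closed; the real page markup may need a more forgiving parser.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects the text inside the element whose id is "textDescriptor"."""

    # Tags that never have a closing tag, so they must not change the depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside the target element
        self.parts = []  # text fragments found inside the target element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == "textDescriptor":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_description(html):
    """Return the description text with runs of whitespace collapsed."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Calling extract_description() on the HTML of one listing returns the description as a single line of text, or an empty string if no such element is present.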
A business may have more than one phone number; these should be broken into the following fields:
- AH Contact
I also need the address broken into separate fields:
- Street number and name
The script must be able to:
- Scrape a large number of entries without being blocked by the site.
- Export the data to a CSV or MySQL file, or both.
- Provide a simple HTML interface that lets me start/stop the script and shows basic progress feedback.
- Extract the data from the sponsored listings.
- Automatically extract the data from the continuation pages (i.e. 2, 3, 4 onwards) to get the full data.
I should be able to specify the maximum number of records to retrieve.
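To make the paging, record limit, throttling and CSV export requirements concrete, here is a minimal sketch of how they could fit together. Everything in it is an assumption for illustration: fetch_page is a hypothetical callable that downloads and parses one results page, the two-second delay is an arbitrary starting point, and the column names are placeholders.

```python
import csv
import time


def scrape(fetch_page, max_records, delay_seconds=2.0, sleep=time.sleep):
    """Collect up to max_records listings, one results page at a time.

    fetch_page(page_number) is a hypothetical callable that returns a list
    of dicts for that page, or an empty list when there are no more pages.
    """
    records = []
    page = 1
    while len(records) < max_records:
        listings = fetch_page(page)
        if not listings:
            break  # ran out of result pages
        # Take only as many listings as are still needed.
        records.extend(listings[: max_records - len(records)])
        page += 1
        if len(records) < max_records:
            sleep(delay_seconds)  # pause before requesting the next page
    return records


def write_csv(records, fieldnames, out):
    """Write the records to an open text file, ignoring unknown keys."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
```

The sleep function is passed in as a parameter so the loop can be tried out without real waiting. When writing to a real file, open it with newline="" as the csv module documentation recommends.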
The input data for the scrape will be the web address generated by the search, e.g.:
[url removed, login to view];ts=1&l=1&cb=0&ind=&nc=&xpix=1000&ypix=400&qs=idraulico&dv=&x=0&y=0
[url removed, login to view];ts=3&l=1&cb=0&ind=&nc=&xpix=1000&ypix=400&qs=idraulico&dv=roma&x=47&y=2
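For reference, the visible part of these addresses can be split into its parameters with Python's standard library, as sketched below. The function name is hypothetical. Judging only from the two examples, qs looks like the search term and dv like the location, but that reading is an assumption; the meaning of the other parameters is not known.

```python
from urllib.parse import parse_qsl


def search_params(fragment):
    """Parse the parameter part of a search address into a dict.

    Empty parameters such as "ind=" are kept with an empty string value.
    """
    query = fragment.lstrip(";?")  # drop the separator left before "ts="
    return dict(parse_qsl(query, keep_blank_values=True))


params = search_params(
    ";ts=3&l=1&cb=0&ind=&nc=&xpix=1000&ypix=400&qs=idraulico&dv=roma&x=47&y=2"
)
print(params["qs"], params["dv"])
```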