I need a PHP scraper that collects business data from the Australian Yellow Pages and exports it to MySQL and CSV.
My focus is on "restaurants".
The script will run on Windows, on a local Apache server.
Please see the attached PDF for a guide to the data that needs to be scraped from each listing.
The script should:
- be written in PHP and use the cURL library.
- be well documented, with meaningful variable names and comments, so that I can modify it if needed.
- open each individual listing and extract all the required data (see the attached PDF).
- be able to export the data to a MySQL file and a CSV file.
- be able to extract the data from sponsored listings.
- automatically continue through the following results pages (2, 3, 4 and onwards) to collect the full data set.
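A minimal sketch of the export step, assuming each scraped listing is held as an associative array of field name => value. The table name `listings`, the column names and the helper names `sqlQuote` and `exportListings` are hypothetical; the real fields would come from the attached PDF. "MySQL file" is assumed here to mean a .sql dump of INSERT statements that can be imported with the mysql client or phpMyAdmin, written as UTF-8.

```php
<?php
// Sketch only: table, column and helper names are hypothetical.

/**
 * Quote a value as a MySQL string literal (or NULL).
 * Escapes the same characters as mysqli_real_escape_string; this is
 * only safe when the .sql file is imported as UTF-8/utf8mb4.
 */
function sqlQuote(?string $value): string
{
    if ($value === null) {
        return 'NULL';
    }
    return "'" . strtr($value, [
        "\\"   => "\\\\",
        "\0"   => "\\0",
        "\n"   => "\\n",
        "\r"   => "\\r",
        "'"    => "\\'",
        '"'    => '\\"',
        "\x1a" => "\\Z",
    ]) . "'";
}

/**
 * Export listings to CSV, to a .sql dump, or both.
 *
 * @param array<int, array<string, ?string>> $records one array per listing
 * @param string $mode 'csv', 'mysql' or 'both'
 */
function exportListings(array $records, string $mode, string $csvPath, string $sqlPath, string $table = 'listings'): void
{
    if (!in_array($mode, ['csv', 'mysql', 'both'], true)) {
        throw new InvalidArgumentException("Unknown export mode: $mode");
    }
    if ($records === []) {
        return;
    }
    // Column names are taken from the first record and written unescaped
    // inside backticks, so they must come from the script, not the web page.
    $columns = array_keys($records[0]);

    if ($mode !== 'mysql') {
        $fh = fopen($csvPath, 'w');
        fputcsv($fh, $columns, ',', '"', '');               // header row
        foreach ($records as $record) {
            fputcsv($fh, array_values($record), ',', '"', '');
        }
        fclose($fh);
    }

    if ($mode !== 'csv') {
        $columnList = '`' . implode('`, `', $columns) . '`';
        $statements = [];
        foreach ($records as $record) {
            $values = implode(', ', array_map('sqlQuote', array_values($record)));
            $statements[] = "INSERT INTO `$table` ($columnList) VALUES ($values);";
        }
        file_put_contents($sqlPath, implode("\n", $statements) . "\n");
    }
}
```

Writing a .sql file rather than connecting to a server keeps the scraper usable without a running MySQL instance; if rows should instead be inserted live, PDO prepared statements would take the place of `sqlQuote`.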
Extra features I would like:
- the option to export data to MySQL only, CSV only, or both.
- parsing of the HTML with PHP's DOM functions.
- the ability to pause and resume the script, since I could be scraping up to 20,000 listings.
- the ability to switch between extracting all data and extracting only the General Business data (div id="heading-inner" and div id="businessProfile"; see the attached PDF). This way I can scrape categories other than "restaurants".
- a fake user agent string (Googlebot).
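One possible way to parse a listing with PHP's DOM extension (`DOMDocument` plus `DOMXPath`) and to switch between full and General Business extraction. Only the ids `heading-inner` and `businessProfile` come from the brief; the "all data" branch is a placeholder that grabs every `div` with an `id`, and the function names `extractListing` and `normaliseText` are hypothetical. The real selectors should be checked against the attached PDF and the live page markup.

```php
<?php
// Sketch only: function names are hypothetical; the two ids come from the brief.

/** Ids that make up the "General Business" data. */
const GENERAL_BUSINESS_IDS = ['heading-inner', 'businessProfile'];

/** Collapse runs of whitespace and trim the result. */
function normaliseText(string $text): string
{
    return trim(preg_replace('/\s+/u', ' ', $text));
}

/**
 * Extract text from one listing page.
 *
 * @param bool $generalOnly true = only the General Business divs
 * @return array<string, string> div id => cleaned text
 */
function extractListing(string $html, bool $generalOnly): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; keep libxml warnings out of the output.
    // Note: loadHTML() assumes ISO-8859-1 unless the page declares its charset.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $fields = [];

    if ($generalOnly) {
        foreach (GENERAL_BUSINESS_IDS as $id) {
            $node = $xpath->query("//div[@id='$id']")->item(0);
            if ($node !== null) {
                $fields[$id] = normaliseText($node->textContent);
            }
        }
    } else {
        // Placeholder for "all data": every div that has an id. Replace with
        // the selectors listed in the attached PDF.
        foreach ($xpath->query('//div[@id]') as $node) {
            $fields[$node->getAttribute('id')] = normaliseText($node->textContent);
        }
    }
    return $fields;
}
```

Keeping the mode as a single boolean parameter lets the same listing loop serve "restaurants" and any other category.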
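A sketch of pause/resume, assuming a JSON checkpoint file that records the current results page and listing index, plus a hypothetical "pause flag" file: creating the flag file makes the script stop cleanly before the next listing, and the next run continues from the saved position. The file paths, the function names and the `$pages` structure (page number => list of listing URLs) are all assumptions for illustration.

```php
<?php
// Sketch only: file paths, function names and data layout are hypothetical.

/** Read the saved position, defaulting to page 1, listing 0. */
function loadCheckpoint(string $path): array
{
    $default = ['page' => 1, 'index' => 0];
    if (!is_file($path)) {
        return $default;
    }
    $data = json_decode((string) file_get_contents($path), true);
    return is_array($data) ? $data + $default : $default;
}

/** Record the next listing to process. */
function saveCheckpoint(string $path, int $page, int $index): void
{
    file_put_contents($path, json_encode(['page' => $page, 'index' => $index]), LOCK_EX);
}

/** A pause is requested by creating the flag file. */
function pauseRequested(string $flagPath): bool
{
    clearstatcache(true, $flagPath); // avoid PHP's cached file status
    return is_file($flagPath);
}

/**
 * Walk every listing from the saved position.
 * Returns true when finished, false when paused.
 */
function runWithCheckpoint(array $pages, callable $scrapeListing, string $checkpointPath, string $pauseFlagPath): bool
{
    $start = loadCheckpoint($checkpointPath);
    foreach ($pages as $page => $urls) {
        if ($page < $start['page']) {
            continue; // page already finished in an earlier run
        }
        $firstIndex = ($page === $start['page']) ? $start['index'] : 0;
        for ($i = $firstIndex; $i < count($urls); $i++) {
            if (pauseRequested($pauseFlagPath)) {
                saveCheckpoint($checkpointPath, $page, $i);
                return false;
            }
            $scrapeListing($urls[$i]);
            saveCheckpoint($checkpointPath, $page, $i + 1);
        }
    }
    return true;
}
```

Saving after every listing costs one small file write, which is negligible next to an HTTP request, and means at most one listing is scraped twice after a crash.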
When the script is complete, I would like to see example output in both CSV and MySQL.