Hi, we would like a dataset built from the listings of a well-known UK directory (which we will identify when we award the job).
The website is a typical directory: you search for a service and a location, and it lists the matching results. Each results page shows approximately 25 results. Every result contains the same minimum set of information, and some contain more. We want to capture as much of this information as possible.
There will be two stages of web scraping:
- Stage 1: preliminary details from the main results page
- Stage 2: using the URLs collected in Stage 1, we then want to visit each site and find the relevant contact details (phone number/email address) and any relevant pictures. **We will discuss Stage 2 in greater detail once Stage 1 has been completed successfully. This project covers Stage 1 only.**
There are approximately 10k results in Stage 1.
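As a rough sizing check, assuming each results page holds about 25 listings (both figures above are approximate), ~10k results means requesting roughly 400 results pages in Stage 1. The function name below is illustrative only:

```python
import math


def results_pages_needed(total_results: int, per_page: int = 25) -> int:
    """Number of results pages to fetch, rounding up for a partial last page."""
    return math.ceil(total_results / per_page)


# Approximate figures from the job description; the real counts may differ.
print(results_pages_needed(10_000))  # 400
```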
You will need to use IP rotation and appropriate request headers.
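One way to meet the IP-rotation and headers requirement is to cycle through a pool of proxies and header sets, using the next pair for each request. The sketch below uses only the Python standard library; the proxy URLs, User-Agent strings and the `RotatingFetcher` class are hypothetical placeholders, not part of the brief, and a real pool would come from whichever proxy provider is used:

```python
import itertools
import urllib.request

# Hypothetical placeholders: substitute a real proxy pool and header sets.
PROXIES = [
    "http://proxy1.example:8080",
    "http://proxy2.example:8080",
    "http://proxy3.example:8080",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


class RotatingFetcher:
    """Round-robin over proxies and User-Agent strings.

    The two pools advance independently, so when their lengths differ the
    proxy/User-Agent pairings vary over time.
    """

    def __init__(self, proxies, user_agents):
        self._proxies = itertools.cycle(proxies)
        self._agents = itertools.cycle(user_agents)

    def next_identity(self):
        """Return the (proxy, headers) pair to use for the next request."""
        headers = {
            "User-Agent": next(self._agents),
            "Accept-Language": "en-GB,en;q=0.9",  # assumed, suits a UK site
        }
        return next(self._proxies), headers

    def fetch(self, url, timeout=20):
        """Fetch a URL through the next proxy with the next header set."""
        proxy, headers = self.next_identity()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        request = urllib.request.Request(url, headers=headers)
        with opener.open(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
```

Usage would be `RotatingFetcher(PROXIES, USER_AGENTS).fetch(results_page_url)`, where `results_page_url` is a placeholder for one of the directory's search URLs.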
The results from Stage 1 will need to be delivered as a CSV file with clear, easily understood column headings. We will pay a milestone for every 2.5k results delivered to us.
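Because every listing shares a minimum set of fields while some carry extra ones, the CSV needs one column per field seen anywhere, left blank where a listing lacks it. A minimal sketch, assuming records arrive as Python dicts and that results are delivered as one file per 2.5k-row milestone; the column names and file-naming scheme are hypothetical, since the real fields depend on the directory:

```python
import csv
from pathlib import Path

# Hypothetical column names: the real ones depend on the directory's fields.
BASE_COLUMNS = ["business_name", "service", "location", "listing_url"]


def write_milestone_csvs(records, out_dir, batch_size=2500):
    """Write records (a list of dicts) into numbered CSV files of batch_size rows.

    The base columns come first; extra fields that only some listings have
    are appended as additional columns, left blank where missing.
    Returns the list of file paths written.
    """
    extra = sorted({key for r in records for key in r} - set(BASE_COLUMNS))
    columns = BASE_COLUMNS + extra
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), batch_size):
        path = out_dir / f"stage1_results_{start // batch_size + 1:02d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            # restval="" fills in any column a given listing does not have.
            writer = csv.DictWriter(fh, fieldnames=columns, restval="")
            writer.writeheader()
            writer.writerows(records[start:start + batch_size])
        paths.append(path)
    return paths
```

With ~10k records and the default `batch_size=2500`, this produces four files, one per milestone. Computing the columns from all records before writing keeps the headings identical across every milestone file.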
33 freelancers are bidding an average of £167 for this job
Hi, I checked your two stages. I can write a desktop application that fetches the data. The scraper can be multi-threaded with proxy support. I can do it in less than 1 day and start right now. Thanks
Hi, since you know that IP rotation is needed to scrape this website, do you also know roughly how many requests their server allows before it detects and blocks an IP? Thanks
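The block threshold is usually not published, so a common hedge is to pace requests and back off when the server signals throttling. A minimal sketch, assuming HTTP 403 and 429 are the block signals (an assumption: some sites instead return a CAPTCHA page with status 200, which this would not catch); `fetch` is any callable that takes a URL and returns the page, and the delay values are illustrative rather than tuned to this site:

```python
import random
import time
import urllib.error


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=2.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff plus random jitter.

    Retries only on HTTP 403/429 (assumed block/throttle signals); any other
    error, or the final failed attempt, is raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in (403, 429) or attempt == max_attempts - 1:
                raise
            # Waits of 2s, 4s, 8s, ... (capped), plus up to 50% jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Logging how many requests each proxy completes before its first 403/429 would give an empirical estimate of the threshold being asked about.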
Hello Sir, I am an expert in web scraping and can handle your job well. I have read your job requirements. We need to discuss them, and then I can start work. Thanks
Greetings, I am a highly experienced Python developer. I have written many utility scripts, such as web scrapers. Contact me so we can discuss this directory in detail. Sincerely, Mladen
Hello. I have read your job post carefully. I am an expert Python scraper developer and can handle the IP rotation. In particular, I will use Python with Selenium to automate the necessary browser actions. You will get a clean CSV. Best regards.