I need a crawler that takes search terms as input [in English, Chinese, or other languages], fed through a [login to view URL] file, one search term per line.
The crawler will then crawl a marketplace [to be defined with you] and process roughly 30 pages of results. For each product listing, it should grab all data from the product page,
i.e. title, full price, discounted price, stock, attributes, and image URLs.
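The fields above could be captured in a simple record per listing. A minimal sketch; the field names and types are my own assumptions, not a spec:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ProductRecord:
    """One output row per product listing (field names are illustrative)."""
    product_id: str
    title: str
    full_price: float
    discounted_price: Optional[float] = None  # None when no discount is shown
    stock: Optional[int] = None               # None when the site hides stock levels
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. colour, size
    image_urls: List[str] = field(default_factory=list)
```

A record like this maps directly onto one CSV row later, with `attributes` flattened or serialized into one column.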
The crawler should:
1. Be smart enough to avoid getting banned. The priority is not speed but behaving cautiously so the site does not block it.
2. Be schedulable, so the script can run unattended [e.g. via a cron job].
3. Export the data into a CSV file with a proper name [date+timestamp].
4. Run in a self-contained environment [e.g. a container] so I can install it anywhere and run it locally [on a macOS or Windows machine]. All dependencies should be bundled within the environment.
5. Take a screenshot of each product page [pages are dynamic, so we have to make sure the whole page is fully rendered before capture] and name the screenshot properly [date+timestamp+productid].
6. Behave somewhat randomly [to avoid crawler blocking], i.e. if a search result returns 100 items, it should not process the pages sequentially but in random order, with a random delay between page fetches.
7. Make all of the above [filename, random delay, result pages to process, maximum number of products to process, etc.] configurable in a simple config file.
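Point 7 could be met with a plain INI file read by Python's standard `configparser`. A minimal sketch; every section and key name here is an assumption of mine, not a requirement from the brief:

```python
import configparser

# Illustrative config covering the tunables mentioned in the brief.
SAMPLE_CONFIG = """
[crawler]
terms_file = search_terms.txt
max_result_pages = 30
max_products = 100
min_delay_seconds = 5
max_delay_seconds = 20

[output]
csv_name_pattern = results_{date}_{timestamp}.csv
screenshot_dir = screenshots
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)  # a real run would use config.read("crawler.ini")

max_pages = config.getint("crawler", "max_result_pages")
delay_range = (
    config.getint("crawler", "min_delay_seconds"),
    config.getint("crawler", "max_delay_seconds"),
)
```

INI keeps the file human-editable with no extra dependency; YAML or TOML would work equally well if richer structure is needed.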
Other ideas are welcome.
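Points 3, 5, and 6 together might look like the rough sketch below: randomized page order, a random pause between fetches, and date+timestamp(+productid) file naming. The `fetch` callback is a placeholder and all names are mine, not from the brief:

```python
import random
import time
from datetime import datetime

def timestamped_name(prefix: str, ext: str, product_id: str = "") -> str:
    """Build names like results_20240101-153000.csv or shot_20240101-153000_12345.png."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    suffix = f"_{product_id}" if product_id else ""
    return f"{prefix}_{stamp}{suffix}.{ext}"

def crawl_pages(page_numbers, fetch, min_delay=5.0, max_delay=20.0):
    """Visit result pages in random order with a random pause between fetches."""
    order = list(page_numbers)
    random.shuffle(order)                 # non-sequential order, per point 6
    results = []
    for page in order:
        results.append(fetch(page))       # placeholder: a real fetch would hit the site
        time.sleep(random.uniform(min_delay, max_delay))
    return results
```

For the screenshots themselves, a headless browser (e.g. Playwright or Selenium) would be needed anyway for the dynamic pages, and both can capture a full-page image once rendering settles.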
38 freelancers are bidding on average €501 for this job
Hi, I am interested in your project to build a scraper for a marketplace that covers the points you mentioned. Please send me a message so that we can discuss all the details. Thanks, Ramzi
Hello. Can you show me which marketplace? Different sites need different methods for parsing data. We can discuss all the details if you need; just contact me if you are interested.