2. Contents of request
· Creating a database that updates automatically from the original scraping tool via its API, and building the cloud environment for it
· Test design for the above (unit tests, connection tests, integration tests, acceptance tests)
· Setting up the above tools so they are ready for immediate use (server selection, verification of program operation, etc.)
· System monitoring, maintenance, and troubleshooting support for the above tools
* In particular, please also include a proposal covering redundancy for the whole system.
3. What to propose
"Original scraping tool (IP rotation environment)" + "Database"
4. About the original scraping tool
Reference tool: "octopus"
http://[login to view URL]
· It must be API-compliant.
· It must access sites while switching IP addresses.
· It must be able to access multiple sites in parallel.
· It should support both of the following two setting types, or at least be a service that supports type ②:
① A fixed URL with a specified page count = 1 setting, up to 100 settings (crawler)
② Simply loading a URL list (CSV) = 1 setting, up to 10 settings (crawler)
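To make the two setting types concrete, here is a minimal sketch of how they could be modeled as configuration objects. All class and field names are illustrative assumptions, not the actual octopus API:

```python
# Hypothetical sketch of the two crawler setting types described above.
# Field names are assumptions for illustration, not a real tool's API.
import csv
from dataclasses import dataclass, field
from typing import List


@dataclass
class FixedUrlCrawl:
    """Type ①: one fixed URL with a page count = 1 setting (up to 100)."""
    url: str
    page_count: int = 1


@dataclass
class UrlListCrawl:
    """Type ②: a URL list loaded from CSV = 1 setting (up to 10)."""
    csv_path: str
    urls: List[str] = field(default_factory=list)

    def load(self) -> List[str]:
        """Read one URL per row from the CSV file."""
        with open(self.csv_path, newline="") as f:
            self.urls = [row[0] for row in csv.reader(f) if row]
        return self.urls


# Example of a type ① setting: crawl 3 pages starting from a fixed URL.
job = FixedUrlCrawl(url="http://example.com/list", page_count=3)
```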
5. About IP rotation environment
· Adding new IP addresses is easy.
· Rotation across 10 to 100 or more unrelated IP addresses.
· More than 50 concurrent tasks.
· Configurable access interval (10 seconds to 1 minute).
· Runs on the cloud 24 hours a day.
· Efficient IP rotation.
For example, suppose there are five target sites A, B, C, D, E and 10 IPs in rotation.
If one of the rotating IPs is refused access by site B, that IP is removed from B's rotation
and continues rotating only among A, C, D, and E.
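The rotation rule above can be sketched in a few lines: each site keeps its own view of which IPs it has refused, and a refused IP is skipped for that site only while remaining available to the others. This is an illustrative sketch, not part of any existing tool:

```python
# Sketch of per-site IP rotation with refusal exclusion (illustrative).
from collections import defaultdict


class IpRotator:
    """Rotate a shared IP pool across target sites. An IP refused by
    one site is excluded from that site's rotation only; it keeps
    rotating among the remaining sites."""

    def __init__(self, ips):
        self.ips = list(ips)
        self.refused = defaultdict(set)   # site -> IPs refused by that site
        self.cursor = defaultdict(int)    # site -> rotation position

    def next_ip(self, site):
        """Return the next usable IP for `site`, skipping refused ones."""
        usable = [ip for ip in self.ips if ip not in self.refused[site]]
        if not usable:
            raise RuntimeError(f"no usable IPs left for site {site}")
        ip = usable[self.cursor[site] % len(usable)]
        self.cursor[site] += 1
        return ip

    def mark_refused(self, site, ip):
        """Site refused this IP: stop offering it to that site only."""
        self.refused[site].add(ip)


# Example matching the text: 10 IPs over sites A-E; site B refuses ip3,
# so ip3 stops rotating at B but still serves A, C, D, E.
rotator = IpRotator([f"ip{n}" for n in range(10)])
rotator.mark_refused("B", "ip3")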
6. API linkage with database
A database that automatically captures and updates the data collected by the scraping tool.
· Host the database on the cloud so that it updates automatically. We would like delivery in a state where it can operate immediately.
· Since the database holds over 3 million registered products, it must handle that volume without delay.
· The database should always maintain an up-to-date backup and be able to switch to it quickly in an emergency.
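As a minimal sketch of the backup-and-failover requirement, the following uses SQLite's online backup API as a stand-in for whatever replication the production database provides; function names and the health check are illustrative assumptions:

```python
# Sketch of "keep a fresh backup, switch quickly in an emergency".
# SQLite here is only a stand-in for the production database.
import sqlite3


def refresh_backup(primary_path: str, standby_path: str) -> None:
    """Copy the primary database into the standby file
    (SQLite online backup, available since Python 3.7)."""
    src = sqlite3.connect(primary_path)
    dst = sqlite3.connect(standby_path)
    with dst:
        src.backup(dst)
    src.close()
    dst.close()


def open_with_failover(primary_path: str, standby_path: str):
    """Open the primary; fall back to the standby if it is unusable.
    (A real health check would also verify schema and data recency.)"""
    try:
        conn = sqlite3.connect(primary_path)
        conn.execute("SELECT 1")  # quick liveness check
        return conn
    except sqlite3.Error:
        return sqlite3.connect(standby_path)
```

In practice the backup refresh would run on a schedule (e.g. after each scraping cycle), so the standby never lags far behind the primary.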
7. Assumed processing flow
1. Automatically extract the URLs to be scraped from the database and a fixed URL list
2. Automatically feed them to the existing scraping service (API-compatible)
3. Automatically extract the updated information from the scraping tool
4. Automatically reflect it in the database
5. Automatically extract optimized information from the database
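The steps above can be sketched as one automated cycle. SQLite stands in for the production database, and the scraping-tool API call is a placeholder; every name here is an illustrative assumption:

```python
# Sketch of one pass of the assumed processing flow (steps 1-4).
# SQLite is a lightweight stand-in for the production database.
import sqlite3


def extract_target_urls(conn, fixed_urls):
    """Step 1: merge URLs already in the database with the fixed URL
    list (which would be loaded from CSV in the real flow)."""
    urls = {row[0] for row in conn.execute("SELECT url FROM products")}
    urls.update(fixed_urls)
    return sorted(urls)


def update_database(conn, records):
    """Step 4: upsert (url, name, price) records returned by the
    scraping tool, so existing products are updated in place."""
    conn.executemany(
        "INSERT INTO products(url, name, price) VALUES (?, ?, ?) "
        "ON CONFLICT(url) DO UPDATE SET name = excluded.name, "
        "price = excluded.price",
        records,
    )
    conn.commit()


def run_cycle(conn, fixed_urls, scrape):
    """One full pass: extract targets (1), submit them to the scraping
    service and collect results (2-3, here the `scrape` callable stands
    in for the tool's API), then write back to the database (4)."""
    targets = extract_target_urls(conn, fixed_urls)
    update_database(conn, scrape(targets))
    return targets
```

Run on a timer, each cycle picks up URLs added to the database by previous cycles, so the target set grows automatically.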
8. Supplied materials
· Database template (* created with only simple functionality)
① Running costs are low.
* Proposals based on existing tools are of course acceptable, but a proposal for an equivalent original tool will be rated highly.
We will give priority to equivalent original tools that include only the necessary functions.
② You have experience and knowledge of APIs, IP rotation, access-denial handling, and large-scale database creation, and can make a confident proposal.
③ The interaction between the scraping tool and the database is almost fully automatic.
④ Payment will be made after inspection of the finished product, not by milestone payments.
* If you absolutely require milestone payments, please understand that we will first confirm there are no problems by actually seeing an equivalent tool.
Also, since we are not experts, in addition to an explanation of your proposal, please tell us clearly whether you can reliably deliver the functions we want.
Please do not hesitate to ask questions about specifications. Thank you.