In Progress

Project for Wilson S

This project will develop a set of crawlers, based on the Scrapy framework, that download and synchronize all firmware (including all versions) for every product from the web pages of a predefined list of vendors, and store the firmware information (metadata) in a PostgreSQL database. The final number of crawlers will be about 100. Milestones are defined per vendor; each milestone is worth at most €65 and is paid after we have verified that the crawler is complete and runs without errors.

The mandatory metadata fields are: Manufacturer, Model, Version, Type, Name, Release Date (if available), Download link, and the calculated SHA-2 hash of the file. For example: (Cisco, Video Surveillance 6030 IP Camera, 2.7.0, IP Camera, [login to view URL], 21/08/2015, "link", "Sha2"). A boolean field indicates whether the device is discontinued, when the vendor's website provides that information. The firmware files themselves will be stored in the file system and referenced from PostgreSQL.
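As a rough illustration of the metadata fields listed above, the following Python sketch shows one possible shape for a firmware record, together with a helper that hashes a downloaded file. The class name `FirmwareRecord`, the field names, and the helper `sha256_of_file` are hypothetical and are not taken from the provided DB schema or code templates. The brief only says "Sha2", so the choice of SHA-256 is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirmwareRecord:
    """Hypothetical record layout; the real DB schema is provided separately."""
    manufacturer: str
    model: str
    version: str
    device_type: str
    name: str
    download_link: str
    sha256: str                          # assumption: "Sha2" means SHA-256
    release_date: Optional[str] = None   # only if the vendor publishes it
    discontinued: Optional[bool] = None  # None when the vendor gives no information
    file_path: Optional[str] = None      # location of the file on the file system


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large firmware images are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading the file in fixed-size chunks keeps memory use constant regardless of firmware size, which matters when images run to hundreds of megabytes.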

The developer is required to extend an existing scraping framework, partially developed on top of Scrapy, and to follow the DB schema and code templates provided by us. The developer is also responsible for testing each crawler and ensuring that it is complete, i.e. that it covers all firmware files and product pages. The server that runs the crawlers has no GUI components, so headless browsing mode should be used.

Project Scope

1. Crawlers will be written per vendor. This is required because each vendor website will have its own implementation of the firmware download page.

2. The user should be able to pause and resume crawling jobs.

3. Crawlers should detect previously downloaded files and download only new or updated content and firmware files. On its first run, each crawler will download all available firmware files; subsequent runs will download only the firmware files added since the last crawl.

4. The developer is required to manually analyze each provided vendor site before writing a crawler to identify the following required information:

a. URLs for the firmware download page including all of the firmware versions for each product

b. URLs/files for each product that include the following information, required to be scraped: "Manufacturer", "Model", "Version", "Type", "Release Date", "if the product is discontinued"

c. Credential Requirements (Simple Signups, Specific Signups, No Signups)

d. Any Captcha on the page

e. Any honeypot traps

5. If a vendor site requires credentials for firmware download, the developer is required to sign up for an account using an email address dedicated to this project.

6. The script will try to imitate human-like behaviour (within limits) while scraping web pages, and will use Tor if required.
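Several of the scope items above (pause and resume, incremental downloads, human-like pacing) can be sketched in Python as follows. This is only a sketch under stated assumptions: `FirmwareKey`, `select_new_firmware`, and `POLITE_SETTINGS` are hypothetical names, the job directory path is a placeholder, and the delay values are arbitrary. The keys of the settings dictionary are standard Scrapy settings. In a real crawler the set of already stored keys would presumably be loaded from the PostgreSQL database. Tor routing and headless browsing are not covered here.

```python
from typing import Iterable, List, NamedTuple


class FirmwareKey(NamedTuple):
    """Hypothetical identity of one firmware file, known before download."""
    manufacturer: str
    model: str
    version: str
    download_link: str


def select_new_firmware(
    found: Iterable[FirmwareKey], already_stored: Iterable[FirmwareKey]
) -> List[FirmwareKey]:
    """Return only the entries not stored by earlier runs, preserving order.

    On the first run `already_stored` is empty, so everything is returned.
    Duplicates within `found` are returned once.
    """
    seen = set(already_stored)
    fresh = []
    for key in found:
        if key not in seen:
            seen.add(key)
            fresh.append(key)
    return fresh


# Standard Scrapy setting names; the values are illustrative assumptions.
POLITE_SETTINGS = {
    "JOBDIR": "crawls/vendor-job",       # placeholder path; persists state for pause/resume
    "DOWNLOAD_DELAY": 3,                 # base delay in seconds between requests
    "RANDOMIZE_DOWNLOAD_DELAY": True,    # vary the delay around the base value
    "AUTOTHROTTLE_ENABLED": True,        # adapt request rate to server latency
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1, # one request at a time per vendor site
}
```

Such a dictionary could be assigned to a spider's `custom_settings` attribute. With `JOBDIR` set, Scrapy keeps the request queue and duplicate filter on disk, so a crawl stopped gracefully can be resumed by running it again with the same directory.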

Important Notes

The developer MUST test the completeness of each crawler before delivering it to us and present evidence of test completion in the form of a populated PostgreSQL database for that vendor.

*An NDA and a contract must be signed before the project begins. A copy of the developer's identification document is required to verify their identity.

*Please apply only after you have fully read and understood the project and agree with the conditions.

Skills: Web Scraping

About the Employer:
( 4 reviews ) Brussels, Belgium

Project ID: #27851350

Awarded to:


Thanks for your invitation. I am Wilson Sumanang. I have a lot of experience with projects like this. Wilson

€60 EUR in 7 days
(0 Reviews)