Scrape website aggregators.

We would like you to gather all structured data about individual websites from aggregator sites that compile it from many sources.

We have a list of 15 million(!) domains we would like queried.

You can use any of these:

* [url removed, login to view]

* [url removed, login to view]

* [url removed, login to view]

* [url removed, login to view]

* [url removed, login to view]

* [url removed, login to view]

If one source has no info on a domain, or has only one or two sources' worth of info, and another of these sites has more, then use that other site.
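The fallback rule above could be sketched roughly as follows. This is only an illustration: the source names, the `MIN_FIELDS` threshold, and the idea of counting non-empty fields as a measure of "enough info" are all assumptions, not part of the spec.

```python
from typing import Optional, Tuple

# Hypothetical threshold: a record with fewer non-empty fields counts as "thin".
MIN_FIELDS = 3


def count_fields(record: Optional[dict]) -> int:
    """Count the non-empty fields in a record; a missing record counts as zero."""
    if not record:
        return 0
    return sum(1 for v in record.values() if v not in (None, "", [], {}))


def pick_best(records_by_source: dict) -> Tuple[Optional[str], Optional[dict]]:
    """Walk the sources in preference order and return (source, record).

    The first source with at least MIN_FIELDS fields wins. If every source
    is thin, the richest thin record is returned; if all are empty, (None, None).
    """
    best_source, best_record, best_count = None, None, 0
    for source, record in records_by_source.items():
        n = count_fields(record)
        if n >= MIN_FIELDS:
            return source, record
        if n > best_count:
            best_source, best_record, best_count = source, record, n
    return best_source, best_record
```

For example, if `site_a` returns nothing and `site_b` returns a single field, a `site_c` record with three fields would be the one stored.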

Choose any technology you like as long as it runs on Linux.


D1. You must build a simple API which accepts a domain name as input.

It will then return the data as BSON, with all values encoded as UTF-8.

This API will be used to automatically check that your program is getting the right data. PM me for the testing code we have developed.

We will add more tests to it as we go along. The API must not forward the query to the source sites live; it must serve the data from the database you've built from the crawl.
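A minimal sketch of the core of D1, assuming the crawl results sit in a SQLite table. The table name `domain_data`, its columns, and the response layout are hypothetical, and the HTTP layer is left out. The encoder covers only the BSON subset needed here (UTF-8 strings and nested documents) and uses nothing outside the standard library.

```python
import sqlite3
import struct


def _cstring(name: str) -> bytes:
    """BSON element names are NUL-terminated UTF-8 strings."""
    return name.encode("utf-8") + b"\x00"


def encode_bson(doc: dict) -> bytes:
    """Encode a dict whose values are strings or nested dicts as BSON."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, dict):
            # Type 0x03: embedded document.
            body += b"\x03" + _cstring(key) + encode_bson(value)
        else:
            # Type 0x02: UTF-8 string, length-prefixed, NUL-terminated.
            data = str(value).encode("utf-8") + b"\x00"
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
    # The int32 total length counts itself (4 bytes) and the final NUL (1 byte).
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def lookup(conn: sqlite3.Connection, domain: str) -> bytes:
    """Answer a query from the local database only; no live request is made."""
    domain = domain.strip().lower()
    rows = conn.execute(
        "SELECT field, value FROM domain_data WHERE domain = ? ORDER BY field",
        (domain,),
    ).fetchall()
    return encode_bson({"domain": domain, "data": dict(rows)})
```

A domain that was never crawled comes back with an empty `data` document rather than triggering a request to the source sites.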

D2. All the code you used to generate the content. The code must be rerunnable automatically without manual intervention.
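One way to make a crawl rerunnable without manual intervention is to record per-domain progress, so that a restart skips finished work and retries failures. The sketch below assumes a SQLite table named `crawl_state` and a caller-supplied `fetch` function; both are hypothetical.

```python
import sqlite3
from typing import Callable, Iterable


def run_crawl(
    conn: sqlite3.Connection,
    domains: Iterable[str],
    fetch: Callable[[str], None],
) -> dict:
    """Crawl each domain once; safe to rerun after a crash or partial failure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crawl_state "
        "(domain TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    counts = {"done": 0, "failed": 0, "skipped": 0}
    for domain in domains:
        row = conn.execute(
            "SELECT status FROM crawl_state WHERE domain = ?", (domain,)
        ).fetchone()
        if row and row[0] == "done":
            counts["skipped"] += 1  # finished in an earlier run
            continue
        try:
            fetch(domain)  # hypothetical: scrape and store one domain
            status = "done"
        except Exception:
            status = "failed"  # left for the next run to retry
        conn.execute(
            "INSERT OR REPLACE INTO crawl_state (domain, status) VALUES (?, ?)",
            (domain, status),
        )
        conn.commit()  # persist progress after every domain
        counts[status] += 1
    return counts
```

Running the same command again then picks up only the domains that are missing or previously failed.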

D3. At M3, the full BSON dump of all data.


M1. (10% of project value) Everything for an initial list of 100k domains that we provide.

M2. (20% of project value) Everything for a subsequent list of 1 million domains that we provide.

M3. Everything for the rest of the dataset: another 15 million domains.

In your bid please include:

B1. How long it will take to complete each milestone

B2. What sites you've crawled in the past and what data you got out of them

B3. What technologies you will use.

B4. What server resources you need us to provide.

Skills: Linux, Script Install, Software Engineering, Website Design

About the Employer:
( 91 reviews ) Cairo, Egypt

Project ID: #1566369

3 freelancers are bidding on average $400 for this job


We can help with your project; please check PMB and our ratings/reviews to get an idea of our experience.

$250 USD in 5 days
(75 Reviews)

Hi, I can help you with that, check PM.

$250 USD in 10 days
(3 Reviews)

Dear hiring manager! I have experience with web scraping (automated applications). I have completed a number of projects. My past samples are attached in PM. I will provide you with an efficient solution. Please refer to PM for m…

$700 USD in 60 days
(0 Reviews)