Closed

Creating a large dataset by crawling some public website

I need a large dataset in JSON format to load into a MongoDB database. The contents can be anything, but they should be meaningful. I need between 500 MB and 5 TB of data. The data will be used for training demonstrations.

I want someone to write a program that crawls a website for publicly available data (such as books and reviews from an e-commerce site, news articles from news sites, hotels and reviews from a travel site, restaurants and reviews from a food aggregator site, or articles from Wikipedia).

I don't need you to send me the data. I need you to write a program I can run at my end to download the data. But the program must store it in a JSON format that can be directly imported into MongoDB. The structure could be flat JSON documents, or documents that contain embedded documents.
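As an illustration of the output format described above, here is a minimal sketch that writes one JSON document per line (newline-delimited JSON), a layout that mongoimport reads by default. The function name `write_documents` and the sample book record are hypothetical and serve only to show a document with embedded documents.

```python
import io
import json


def write_documents(docs, out):
    """Write each document as one JSON object per line and return the count."""
    count = 0
    for doc in docs:
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        out.write(json.dumps(doc, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical sample: a book document with embedded review documents.
sample = [
    {
        "title": "Example Book",
        "author": "A. Writer",
        "reviews": [
            {"user": "reader1", "rating": 5, "text": "Very useful."},
            {"user": "reader2", "rating": 3, "text": "Average."},
        ],
    }
]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
written = write_documents(sample, buffer)
```

A file written this way could then be loaded with something like `mongoimport --db demo --collection books --file books.json`, where the database, collection and file names are placeholders.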

Individual documents may be anywhere in the range from 100 bytes to 100 KB. No individual document should be bigger than 100 KB in size.
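The size limits above could be enforced with a check like the following sketch. It measures the UTF-8 size of the JSON text, which is an assumption: the size MongoDB stores internally (BSON) will differ somewhat. It also assumes 1 KB means 1024 bytes. The names `encoded_size` and `within_limits` are hypothetical.

```python
import json

MIN_BYTES = 100
MAX_BYTES = 100 * 1024  # assumption: 1 KB = 1024 bytes


def encoded_size(doc):
    """Return the size in bytes of the document serialized as UTF-8 JSON."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def within_limits(doc):
    """True if the serialized document is between 100 bytes and 100 KB."""
    return MIN_BYTES <= encoded_size(doc) <= MAX_BYTES


# Hypothetical documents used only to exercise the check.
too_small = {"a": 1}
acceptable = {"title": "Example", "text": "x" * 200}
too_large = {"title": "Example", "text": "x" * 200_000}
```

Documents that fail the check could be skipped, or split into several smaller documents, before they are written out.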

We'll have to discuss together to decide the site from which the data is to be downloaded. There should be no violation of any data access policies of the site. This is very important for me; I don't want us to break any law. I will need an assurance from you on this, and a link to the data access policies of the site, if available.
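One part of respecting a site's access policy can be automated: checking its robots.txt before fetching a URL. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt text, the `dataset-crawler` user agent and the example.com URLs are made up for illustration. A robots.txt check does not replace reading the site's terms of use.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="dataset-crawler"):
    """True if robots.txt permits this (hypothetical) user agent to fetch url."""
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt content for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(EXAMPLE_ROBOTS)
public_ok = is_allowed(parser, "https://example.com/books/1")
private_ok = is_allowed(parser, "https://example.com/private/page")
```

In a real crawler the parser would be pointed at the live file with `set_url(...)` followed by `read()`, and every URL would be checked before it is requested.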

Once we agree on the site to download the data from, you will write the program, test it at your end, send me some sample data, and once approved, send me the program for me to run at my end. If I run into any difficulties while running the program I would require you to support me. The program should allow me to choose the approximate data size (such as 500 MB) after which it will stop crawling any further to download the data.
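The size cut-off requested above could work roughly as in this sketch: the program tracks how many bytes it has written and stops before exceeding the chosen budget. The name `crawl_until` is hypothetical, and the endless generator of small documents stands in for whatever the real crawler would yield.

```python
import io
import itertools
import json


def crawl_until(docs, out, max_bytes):
    """Write documents as JSON lines until the next one would exceed max_bytes."""
    written = 0
    for doc in docs:
        line = (json.dumps(doc, ensure_ascii=False) + "\n").encode("utf-8")
        if written + len(line) > max_bytes:
            break  # budget reached: stop crawling
        out.write(line)
        written += len(line)
    return written


# Stand-in for a crawler: an endless stream of tiny documents.
fake_crawl = ({"n": i} for i in itertools.count())

out = io.BytesIO()  # stands in for a file opened with open(path, "wb")
total = crawl_until(fake_crawl, out, max_bytes=100)
```

For a 500 MB run the budget would be passed as `500 * 1024 * 1024`. Because the source is consumed lazily, no further pages would be fetched once the limit is hit.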

Skills: Elasticsearch, node.js, NoSQL Couch & Mongo, Python, Web Scraping


About the Employer:
( 5 reviews ) Mumbai, India

Project ID: #16917471

9 freelancers are bidding an average of ₹9419 for this job

anthonioez

I am an expert Node.js/JavaScript developer with good experience. I have built a very good data scraping and crawling script with Node.js. I am interested in working on your project and am also available for ongoing sup...

₹12000 INR in 4 days
(3 Reviews)
3.3
nirmalsarswat

I have done many crawling projects. One of my interesting projects is webdb, a 9.1 GB MongoDB collection of URLs from online search engines. I crawled 2 million words on Google while maintaining policy, using proxy servers. I...

₹5555 INR in 2 days
(3 Reviews)
2.4
₹13888 INR in 3 days
(2 Reviews)
2.2
bawanifreelancer

I have more than 10 years of experience in data scraping and extraction. Kindly message me so we can decide the website from which the data will be scraped.

₹12222 INR in 3 days
(3 Reviews)
1.1
weflake

Logrolling Solutions has professional developers, designers and project managers who provide quality IT services for all kinds of business. Logrolling Solutions has considerable expertise in web and mobile applications d...

₹11111 INR in 3 days
(0 Reviews)
0.0
andreijava

Hi, I'm a senior Java & Python developer. I have worked on data mining projects in both languages. I already have some modules that would let me complete this task fairly quickly.

₹8888 INR in 3 days
(0 Reviews)
0.0
dvoraj75

Hi, I am a senior developer from the Czech Republic with 10 years of experience with Python on Windows and Linux, C/C++ and much more. I love precision and I apply it in my work. I am sure that I can do the b...

₹7777 INR in 3 days
(0 Reviews)
0.0
₹7777 INR in 3 days
(0 Reviews)
0.0
haseebrauf1

Hello Sir, I have read your requirements and I can see that I have already written code similar to what you need. My code downloads tweets which have famous celebs mentioned in them...

₹5555 INR in 3 days
(0 Reviews)
0.0