Closed

Web data extraction

FTE Data extraction specification

Goal:

To create web data extraction software that extracts data at scale from any website. The software should also behave responsibly and respect the websites it crawls. It should be able to run behind proxy IPs and be easy to install and manage. The goal is to extract data from 100+ websites globally (India and the US to begin with).

Requirements:

Configuring a new website or altering an existing one should take minutes and require only a minimal understanding of the software.

A user interface to manage the metadata for the websites to be crawled. This would include the site name, URL, frequency, etc.

Management functions would include adding new websites, activating and deactivating websites, changing existing URLs, changing the attributes, etc.

For each website, I should be able to define the attributes for each item (Item could be product or service) that I want to extract. For example, website id, title, description, pricing, shipping, and image for a product on Amazon.

Each element should be cleansed so that it is free of HTML and bad characters.

All abusive words should be removed.

I should be able to define the categories or sections to be extracted. For example, I should be able to extract only the toys category from Amazon.

I should be able to extract products based on specific attributes. For example, top sellers, or products that are shown on specific tabs or quadrants of a website.

Each extracted item should be uniquely identified. The identifier should produce the same key each time the item is extracted from the same web site. For example, the identifier for the item iPhone12 on Amazon should not change unless the name of the product changes.

The output format should support both CSV and JSON.

The software should be configurable to store the output in any desired location, including cloud providers, local disk, etc.

The code should be written mostly in Python (with minimal Perl, only if required).

Code should be commented to make maintenance easier.

Number of parallel threads should be configurable.

A UI is required to check sample data for both configured and yet-to-be-configured websites.

There should be a way to track the status of the data extraction pipeline at a granular level.
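The cleansing and identifier requirements above could be sketched roughly as follows. This is only an illustration using the Python standard library, not a prescribed design: the names `clean_text`, `make_row_id`, and `BLOCKED_WORDS` are hypothetical, the blocklist is a placeholder, the regex-based tag stripping is a simplification (a real build would more likely use an HTML parser), and the identifier is assumed to be derived from the site id and the normalized item name, so it changes only when the name changes.

```python
import hashlib
import html
import re
import unicodedata

# Hypothetical blocklist; a real deployment would load this from configuration.
BLOCKED_WORDS = {"darn", "heck"}

_TAG_RE = re.compile(r"<[^>]+>")   # simplistic tag matcher (assumption, not a full parser)
_SPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, drop control characters and blocked words."""
    text = _TAG_RE.sub(" ", raw)    # replace tags with a space
    text = html.unescape(text)      # e.g. "&amp;" becomes "&"
    # Drop characters in Unicode "C" categories (control, format, ...) but keep whitespace.
    text = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch)[0] != "C"
    )
    # Compare words without surrounding punctuation, case-insensitively.
    words = [
        w for w in text.split()
        if w.strip(".,!?;:").lower() not in BLOCKED_WORDS
    ]
    return " ".join(words)


def make_row_id(site_id: int, item_name: str) -> str:
    """Return the same key for the same site and item name on every run."""
    normalized = _SPACE_RE.sub(" ", item_name).strip().casefold()
    digest = hashlib.sha256(f"{site_id}|{normalized}".encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability; length is an arbitrary choice


if __name__ == "__main__":
    print(clean_text("<p>Nice &amp; <b>darn</b> good toy</p>"))
    print(make_row_id(3232, "iPhone 12"))
```

Hashing the normalized name keeps the key stable across runs and across cosmetic differences such as extra spaces or letter case. If a site exposes its own product id, that would probably be a more robust input than the name.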

Success criteria:

Ability to extract data from 10 websites as defined by me.

Ability to change the elements to be extracted.

Demonstrate the ability to run behind a proxy and extract data from 1-3 desired websites.

Code review to ensure I understand the software.

Output should be verified and approved for both JSON and CSV.
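For the proxy demonstration and the configurable thread count mentioned in the requirements, one possible shape for the settings is sketched below. It uses only the standard library. `CrawlSettings`, `build_opener`, `fetch`, and `crawl` are hypothetical names, the proxy URL in the comment is a placeholder, and the actual software might well use Selenium or another HTTP client instead.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CrawlSettings:
    max_workers: int = 4             # number of parallel threads
    proxy_url: Optional[str] = None  # e.g. "http://127.0.0.1:8080" (placeholder)


def build_opener(settings: CrawlSettings) -> urllib.request.OpenerDirector:
    """Create a URL opener that routes traffic through the configured proxy, if any."""
    handlers = []
    if settings.proxy_url:
        handlers.append(urllib.request.ProxyHandler(
            {"http": settings.proxy_url, "https": settings.proxy_url}
        ))
    return urllib.request.build_opener(*handlers)


def fetch(opener: urllib.request.OpenerDirector, url: str, timeout: float = 30.0) -> bytes:
    """Download one page through the given opener."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def crawl(urls: List[str], settings: CrawlSettings,
          fetch_fn: Callable[[str], object]) -> Dict[str, object]:
    """Run fetch_fn over all URLs using at most settings.max_workers threads."""
    with ThreadPoolExecutor(max_workers=settings.max_workers) as pool:
        return dict(zip(urls, pool.map(fetch_fn, urls)))
```

A caller would typically build the opener once and pass something like `lambda url: fetch(opener, url)` as `fetch_fn`. Keeping `fetch_fn` injectable also makes it possible to add per-site delays or to exercise the pipeline without network access.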

Pricing quote:

I would like a pricing quote both with and without the UIs for the configuration manager (1) and (11).

Output format sample:

{
  "site id": 3232,
  "date time": "dd mon yyyy hh24:mi:ss",
  "content": {
    "row id": "343-34-er-34",
    "item name": "x",
    "item desc": "abc",
    "item price": 20.45,
    "currency": "Indian"
  }
}
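As a rough illustration of how the sample above could be produced in both required formats, the sketch below builds one payload and serializes it as JSON and as CSV with the standard library. The function names are hypothetical. It also assumes that `content` holds a list of rows when several items are extracted, and that the CSV repeats the site id and timestamp on every row; neither detail is stated in the specification.

```python
import csv
import io
import json
from datetime import datetime
from typing import Dict, List


def build_payload(site_id: int, rows: List[Dict], extracted_at: datetime) -> Dict:
    """Assemble the output structure using the field names from the sample."""
    return {
        "site id": site_id,
        # Python equivalent of "dd mon yyyy hh24:mi:ss" (month name is locale-dependent).
        "date time": extracted_at.strftime("%d %b %Y %H:%M:%S"),
        "content": rows,  # assumption: a list of item rows
    }


def to_json(payload: Dict) -> str:
    """Serialize the payload as JSON, keeping non-ASCII characters readable."""
    return json.dumps(payload, ensure_ascii=False, indent=2)


def to_csv(payload: Dict) -> str:
    """Flatten the payload into CSV, one line per item row."""
    fieldnames = ["site id", "date time"]
    for row in payload["content"]:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in payload["content"]:
        writer.writerow({
            "site id": payload["site id"],
            "date time": payload["date time"],
            **row,
        })
    return buffer.getvalue()


if __name__ == "__main__":
    sample_rows = [{"row id": "343-34-er-34", "item name": "x", "item desc": "abc",
                    "item price": 20.45, "currency": "Indian"}]
    sample = build_payload(3232, sample_rows, datetime.now())
    print(to_json(sample))
    print(to_csv(sample))
```

Returning strings rather than writing files keeps the storage target (local disk, cloud bucket, and so on) a separate, configurable concern.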

Fields to capture in POC:

Product Name

Product Description

Price

Site Category

Brand

Product attributes (color, weight, etc.)

The complete list of required fields will be given after the POC and the completion of the project will be determined based on the result.

Skills: Python, Selenium, Web Scraping, NoSQL Couch & Mongo


About the employer:
(1 review) Sammamish, United States

Project ID: #31458903

27 freelancers bid an average of $1081 for this job

(9 reviews)
5.2
techplusintl

Hi there, ★★★ Scraping / Python / Selenium Expert ★★★ 9+ Years of Experience ★★★ I've read the requirements and am ready to create a web data extraction software. Some major works we do: ✔️ Product Websites Scraping: eComme…

$750 USD in 7 days
(11 reviews)
5.3
salmanullahbaig

Hi, I have experience in data scraping in Python using requests, htmlRequest, Selenium and some other Python modules. I can help you with this task. Thanks

$1125 USD in 7 days
(6 reviews)
4.5
arjun366333

Having checked your requirement for web data scraping, I can do this perfectly. I have 6 years of experience in website scraping and Python scrapers. Please check my profile https://www.freelancer.in/u/arjun36633 …

$800 USD in 7 days
(10 reviews)
4.4
normanburtonfree

Hello client, I wish you the best of luck in everything. As a professional developer, I have many years of experience with Python, Selenium and web scraping. Please feel free to contact me and let's discuss ab…

$750 USD in 7 days
(9 reviews)
4.0
jeetparmar7223

Respected client, I have very good experience in Python Selenium. I do this automation job daily. I have automated many sites and written many scripts for many sites. I work on Python Selenium in which we scrape d…

$944 USD in 10 days
(10 reviews)
3.8
teresed85

Hello! I am an experienced senior Full Stack developer with over 10 years of rich experience in web development. At the moment, I have no pending work so I can start as soon as you send me the detailed requireme…

$800 USD in 7 days
(4 reviews)
3.5
appdeveloper4241

Hello, I will develop a script that will scrape or convert the data you need from a public or a personal source. Common output formats: xls, csv, txt, xml, mdb, sql. The script is developed and running under a license…

$1125 USD in 7 days
(3 reviews)
3.5
Daneilka1

Hi there. As a senior Python developer, I am very experienced in web data extraction. After checking your description carefully, I am very interested in your project. I have extensive knowledge of using several web…

$750 USD in 10 days
(1 review)
3.2
valeriypanovich1

Hi Saravana ⭐⭐⭐⭐⭐ Python Expert ⭐⭐⭐⭐⭐ I am an expert in Python, Automation, JavaScript, Chrome Extension, Web Scraping using (Selenium, Beautifulsoup, lxml, Pandas), MySQL. I have a strong grip on core Python and related p…

$750 USD in 7 days
(2 reviews)
3.0
MalikVykov

Hi!! I can do anything in the web scraping field, I'm an expert. Selenium, Scrapy, puppeteer, playwright, cypress, ... I have experience with most scraping libraries. Looking forward to hearing from you. Malik.

$999 USD in 7 days
(2 reviews)
2.4
Parveenlamba77

Hi there, hope you are doing well. I am an automation expert. I can do the data extraction automation for you as per your requirements. I am an IT professional with over 10 years of experience. Feel free to contact me.

$1000 USD in 7 days
(4 reviews)
2.1
NEHABHAT92

Hi, I can create web data extraction. I am an experienced web developer, work on cryptocurrency development, and am equipped with all the necessary skills to provide you the best website that completely satisfies your bus…

$1125 USD in 7 days
(2 reviews)
2.0
nachitayadav8

Hi, I will do web data extraction. I am an experienced web developer, work on cryptocurrency development, and am equipped with all the necessary skills to provide you the best website that completely satisfies your busines…

$1125 USD in 7 days
(0 reviews)
0.0
pankajudayan

Hi there, I'm a Full Stack Developer with 5+ years of experience. I have worked with multinational companies and am now a full-time freelancer. Click CHAT for a quick review of your project. I will give you the final cost and timeline f…

$1125 USD in 20 days
(0 reviews)
0.0
jeffreymachariag

Hi, I am an experienced Python/Django developer who has worked on scraping projects using Selenium, Beautiful Soup and Scrapy. Based on your project requirements, I am interested in working on your project. My quote withou…

$1200 USD in 10 days
(0 reviews)
0.0