Closed

Improve webpage scraping solution -- 3

Request details

I developed a Java program to scrape information from a website. The architecture of the solution involves: 1) using Java Selenium to send requests to the webpage via Chrome WebDriver to trigger authentication and authenticated requests; 2) routing the requests from (headless) Chrome through Java BrowserMobProxy to capture three HTTP headers (Authorization, X-CSRF-TOKEN, and Cookie) and one query string (without these, the server starts responding with 512 after a few requests); and 3) using these four elements in HTTPS requests sent from Java directly to the webpage (i.e. without Selenium, Chrome, or BrowserMobProxy involved) to retrieve the desired information.
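Step 3 above could look roughly like the following sketch, which assumes Java 11+ and the standard java.net.http API. The class name, method name, URL, and parameter names are hypothetical placeholders, not taken from the existing program; the real endpoint and the name of the captured query parameter are not stated in this request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper: attaches the three captured headers and the captured
// query parameter to a plain HTTPS GET request (no Selenium, Chrome, or proxy).
final class AuthenticatedRequestSketch {

    static HttpRequest build(String baseUrl, String queryName, String queryValue,
                             String authorization, String csrfToken, String cookie) {
        // URL-encode the query parameter so special characters survive transport.
        String query = URLEncoder.encode(queryName, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(queryValue, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "?" + query))
                .timeout(Duration.ofSeconds(30))
                .header("Authorization", authorization)
                .header("X-CSRF-TOKEN", csrfToken)
                .header("Cookie", cookie)
                .GET()
                .build();
    }
}
```

The resulting request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).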

This program does the basic functionality of extracting the information but has a few problems:

It depends on an external non-Java component: Chrome WebDriver

It depends on Java Selenium and Java BrowserMobProxy, two dependencies that I would like to remove

It is not optimized: it refreshes too often and sleeps for too long relative to the limit at which the website (behind Cloudflare) starts responding with 429 errors. As a result, retrieving the information takes much longer than necessary.
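One possible way to approach the third problem is to replace fixed sleep periods with a delay that adapts to the server's responses. The sketch below is an assumption about how that could be done, not the current program's logic: it shortens the delay by a small step after each successful response and doubles it after a 429, honoring a Retry-After value (in seconds) when the server sends one. The class name and all numeric values are hypothetical.

```java
import java.util.OptionalLong;

// Hypothetical pacing helper: probes toward the rate limit by shrinking the
// delay slowly on success and backing off quickly on HTTP 429.
final class AdaptivePacer {
    private final long minDelayMs;
    private final long maxDelayMs;
    private final long stepMs;
    private long delayMs;

    AdaptivePacer(long initialDelayMs, long minDelayMs, long maxDelayMs, long stepMs) {
        this.delayMs = initialDelayMs;
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.stepMs = stepMs;
    }

    /** Call after each response; returns the wait (in ms) before the next request. */
    long onResponse(int statusCode, OptionalLong retryAfterSeconds) {
        if (statusCode == 429) {
            // Rate limited: double the delay, capped at the configured maximum.
            delayMs = Math.min(maxDelayMs, delayMs * 2);
            long serverHintMs = retryAfterSeconds.isPresent()
                    ? retryAfterSeconds.getAsLong() * 1000
                    : 0;
            // Wait at least as long as the server asked for.
            return Math.max(delayMs, serverHintMs);
        }
        // Any other status: shorten the delay by one step, down to the minimum.
        delayMs = Math.max(minDelayMs, delayMs - stepMs);
        return delayMs;
    }
}
```

The caller would sleep for the returned number of milliseconds between requests. The delay at which 429 responses stop appearing gives an empirical estimate of the limit.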

Deliverables

You will receive the program's current Java code and will need to solve the problems above. To do so, you will need to:

A. Find out how to authenticate and refresh the three headers and the query string without depending on Selenium, Chrome WebDriver, or BrowserMobProxy. As most of this data is likely generated in JavaScript, you will need knowledge of JavaScript and of how to execute JavaScript from within Java or convert the JavaScript code to Java (the preferred solution).

B. Identify the limit at which the website (behind Cloudflare) starts responding with 429 errors, and tune the refresh frequency of the headers and the sleep periods to that limit. You will need to demonstrate the benefits of your changes by extracting the information the program currently extracts and measuring how long it takes.
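Both deliverables involve refreshing the headers and query string only as often as needed. A minimal sketch of that idea, assuming the values can be treated as valid for a fixed maximum age (the actual lifetime is unknown and would have to be measured), is a small cache that reloads them on demand. The class name is hypothetical, and the loader stands in for whatever Java-only authentication routine deliverable A produces.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache: reloads the credentials only when they are older than
// maxAgeMs or were explicitly invalidated (for example after an error response).
final class RefreshingCredentials<T> {
    private final Supplier<T> loader;   // assumed: performs the login / token fetch
    private final long maxAgeMs;        // assumed lifetime; must be tuned empirically
    private final LongSupplier clockMs; // injectable clock, e.g. System::currentTimeMillis
    private T cached;
    private long loadedAtMs;

    RefreshingCredentials(Supplier<T> loader, long maxAgeMs, LongSupplier clockMs) {
        this.loader = loader;
        this.maxAgeMs = maxAgeMs;
        this.clockMs = clockMs;
    }

    /** Returns the cached value, reloading it first if it is missing or too old. */
    synchronized T get() {
        long now = clockMs.getAsLong();
        if (cached == null || now - loadedAtMs >= maxAgeMs) {
            cached = loader.get();
            loadedAtMs = now;
        }
        return cached;
    }

    /** Forces a reload on the next call to get(). */
    synchronized void invalidate() {
        cached = null;
    }
}
```

To measure the benefit, the same extraction run can be timed before and after the change with System.nanoTime().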

Note: you will need to create your own login and password on the website. There are no additional requirements to register.

Skills: Java, JavaScript, Web Scraping, Python, Software Engineering

About the Employer:
(1 review) Băilești, Romania

Project ID: #26818015

10 freelancers are bidding an average of $141 for this job

sodiqa32

Hello, I am pleased with your job as detailed. Thank you for the job posting. It’s a pleasure to meet you. I’d really like to work with you on this one if possible! I do have a couple of questions, but first I’d like ...

$30 USD in 1 day
(45 Reviews)
6.0
Eminencehub

Hey there. After going through the details of your project, I understand that you need a developer to improve a webpage scraping solution. I believe that you are looking for quality work which needs to be completed und...

$140 USD in 7 days
(24 Reviews)
5.3
jaimek91

I have a lot of experience in scraping as a freelancer with Python (Scrapy/Selenium/BeautifulSoup), .NET/C# (Html Agility Pack) and PHP (Goutte). I have also developed a Python script which implemented multithreading and ...

$30 USD in 10 days
(25 Reviews)
5.4
Demenntor

Dear Employer, I have read the project details and am confident I can work on improving web scraping solutions. I have extensive knowledge of Java, JavaScript, Python, web scraping, software architecture, etc. Kindly messag...

$222 USD in 3 days
(30 Reviews)
5.2
jvalenzuelavega

Hello! I have solid experience in the field of statistics and computer programming. To make you happy is my essential and utmost motivation. I'm an expert data analyst, and a seasoned programmer (fluent and proficient ...

$140 USD in 7 days
(10 Reviews)
4.9
juneadkhan

Hello Sir! I am a web scraping expert, and I think I'm a great fit for this project because I have an interest in your project and can deliver on time, according to your specifications. Thanks

$140 USD in 7 days
(2 Reviews)
2.9
gudenets2020

Hello. Glad to meet you. I am very interested in your job post involving these skills. Let me do this job for you now just to prove my expertise in this field. I am confident that I will do your work very well. I hope ...

$150 USD in 2 days
(2 Reviews)
2.7
asmamessaadi

I can fix your code or write new code in Python with a Selenium automated driver that is performant, clean, and bug-free. I can also automate the job. Have a look at my bots with Selenium [login to view URL] ...

$200 USD in 2 days
(2 Reviews)
2.2
Nowrajkhan

Hello sir❤ I have read your post. I am very interested in working on this project. I hope you will trust me to do a great job. I hope you will hand over this work to me. I hope you will be very happy to see my work ...

$222 USD in 4 days
(0 Reviews)
0.0
NikitaYas

Good day. I'm interested in your project. I have about 3 years of scraping and 6 years of Python programming experience. A big plus of using Python is that everything will be automated and that I can write a program qu...

$140 USD in 7 days
(0 Reviews)
0.0