Completed

Google Scholar data scrape

I am looking for web-scraping code written in Python (for a Windows 10 PC) that pulls results from Google Scholar, scrapes data from those results, and writes the data to a .csv file readable by Excel. The data should also be written to a local database running on Windows 10 (MySQL or MongoDB).

I need the following:

1. A simple GUI in which a "topic" can be entered, along with the start and end year. The topic is used to search Google Scholar Case Law. The GUI should have a "submit" button.

2. Ability to automatically (internally) create the correct search URL based on whether the case is Federal or State, the topic keywords, and the first year. The program then "steps" through each year, one at a time, until it reaches the end year.

The URLs always follow a standard format, so this should not be difficult to implement.

For example, if the topic is "Trademark" and the year range is 2012 - 2015, the program creates a URL for a Trademark search for 2012-2012 (a single-year sub-range is a more manageable data set) and performs the steps below. Once this is done, it moves on to Trademark for 2013-2013 and performs the same steps. It steps up by 1 year at a time until it reaches 2015-2015. This prevents too many search results from showing up at once and becoming unmanageable.

In the above example, 4 separate URLs would be created (one for each single-year range), and each would be run separately.
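The year-stepping described above can be sketched as follows. This is only a minimal illustration, not a confirmed URL format: the query parameter names (`q`, `as_sdt`, `as_ylo`, `as_yhi`) and the `court_filter` value are assumptions that should be checked against a real Google Scholar Case Law search URL, and `build_year_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed base address of the search page; verify against a real search URL.
BASE_URL = "https://scholar.google.com/scholar"


def build_year_urls(topic, start_year, end_year, court_filter):
    """Return one search URL per single-year sub-range (e.g. 2012-2012)."""
    urls = []
    for year in range(start_year, end_year + 1):
        params = {
            "q": topic,
            # Assumption: this parameter selects Federal vs. State case law.
            # The caller supplies the value copied from a real search URL.
            "as_sdt": court_filter,
            "as_ylo": year,  # assumed "from year" parameter
            "as_yhi": year,  # assumed "to year" parameter
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# "Trademark", 2012 - 2015 yields four URLs, one per year.
urls = build_year_urls("Trademark", 2012, 2015, court_filter="2006")
for url in urls:
    print(url)
```

Keeping the court filter as a caller-supplied value means the Federal/State choice from the GUI can be mapped to whatever the real URLs turn out to use.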

3. Navigate to each URL (no visual display required for this)

4. Grab the entire list of sub-URLs from the search results, navigate to each one individually, and scrape and save the data listed below in its own CSV file (auto-named). Each sub-URL would then have its own filename and data.

5. Ability to turn off saving .CSV files (this is mainly needed for testing/debugging, to make it easier to see that the program is working properly).

SAVE:

a) Exact URL of the case

b) Header info - contains name of case, court name, district and year

c) *** If the page at the URL contains the phrase "NOT TO BE PUBLISHED IN THE OFFICIAL REPORTS", this must be reflected in the CSV naming convention by adding _NOT at the end of the filename.

d) Every sub-URL will have a section called "DISCUSSION." Inside this section we need to search for and save the following:

Sub-titles inside the DISCUSSION section (write each sub-title to the file and save the text associated with it)

aa) Sentences within a sub-title section that are followed by a citation (text inside parentheses): save the preceding sentence and the citation.

bb) Any text within a sub-title section that is inside double quotes, including the citation that follows: save the entire text inside the double quotes and the citation.
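Items aa) and bb) could be approached with regular expressions along the following lines. This is a deliberately naive sketch under stated assumptions: a citation is taken to be any parenthesised text with no nested parentheses, a sentence is taken to end at `.`, `?` or `!`, and the function names are hypothetical. Real opinions contain nested parentheses and abbreviations such as "v." that would need more careful handling.

```python
import re

# Text inside straight or curly double quotes, followed by a "(citation)".
QUOTE_WITH_CITATION = re.compile(r'["“]([^"“”]+)["”]\s*\(([^()]+)\)')

# A sentence ending in . ? or ! that is immediately followed by "(citation)".
SENTENCE_WITH_CITATION = re.compile(r'([^.?!()]+[.?!])\s*\(([^()]+)\)')


def extract_quotes(text):
    """Return (quoted text, citation) pairs for item bb)."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in QUOTE_WITH_CITATION.finditer(text)]


def extract_cited_sentences(text):
    """Return (preceding sentence, citation) pairs for item aa)."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in SENTENCE_WITH_CITATION.finditer(text)]


# Made-up sample text, for illustration only.
sample = (
    'A mark must be distinctive to be protected. (Smith v. Jones, 1 F.3d 2.) '
    'The court held that "use in commerce is required" (Doe v. Roe, 5 F.3d 10).'
)
print(extract_cited_sentences(sample))
print(extract_quotes(sample))
```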

Continue the above until the end of the page at that URL, then get the next URL and repeat. Then step forward 1 year (if the date range allows) and repeat again until all results are parsed.
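The per-case CSV output, the _NOT filename suffix, and the switch for turning CSV saving off could look roughly like this. It is a sketch only: the column names, the way the filename is derived from the case title, and the helper names `csv_name` and `save_case` are all assumptions, not part of the requirements above.

```python
import csv
import re
from pathlib import Path

UNPUBLISHED_MARKER = "NOT TO BE PUBLISHED IN THE OFFICIAL REPORTS"


def csv_name(case_title, page_text):
    """Build an automatic filename; append _NOT for unpublished opinions."""
    # Assumption: the filename is derived from the case title.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", case_title).strip("_")
    if UNPUBLISHED_MARKER in page_text.upper():
        stem += "_NOT"
    return stem + ".csv"


def save_case(rows, case_title, page_text, out_dir, save_csv=True):
    """Write one CSV per case; do nothing when save_csv is False."""
    if not save_csv:
        return None  # saving switched off for testing/debugging
    path = Path(out_dir) / csv_name(case_title, page_text)
    # utf-8-sig writes a byte-order mark so Excel detects the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "text", "citation"])  # assumed columns
        writer.writerows(rows)
    return path


print(csv_name("Smith v. Jones", "... NOT TO BE PUBLISHED IN THE OFFICIAL REPORTS ..."))
```

The same `rows` could be passed to a database writer (MySQL or MongoDB), so that switching CSV output off does not affect what is stored.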

I have attached an image of the scraped HTML of a Google Scholar article for reference.

Skills: Database Programming, Javascript, MySQL, Python, Software Engineering


About the Employer:
(23 reviews) Los Angeles, United States

Project ID: #14706912

Awarded to:

$155 USD in 3 days
(8 Reviews)
5.4

10 freelancers are bidding on average $189 for this job

mantislin

Hi sir, I am a scraping expert. I have done more than 350 scraping projects; please check my feedback and you will see. Can we discuss more details about this project? Then I will provide example data/scr…

$152 USD in 5 days
(118 Reviews)
6.7
barundebnath

I wrote a Google Scholar bot more than a year ago using C#. It should be easy for me to rewrite that code in Python. Relevant Skills and Experience: Already wrote a Google Scholar bot. Proposed Milestones: $250 USD - Mi…

$250 USD in 7 days
(56 Reviews)
6.1
technoweb7

699 876 606

$155 USD in 3 days
(6 Reviews)
5.0
$252 USD in 3 days
(12 Reviews)
5.1
makrazhamza

Hello sir! I've just seen your job offer, and because of my skills in web design/development I'm pretty sure that I can get it done with the best quality and price! Relevant Skills and Experience: Hello sir! I've…

$111 USD in 10 days
(7 Reviews)
3.9
$277 USD in 5 days
(3 Reviews)
3.4
ameurbennaoui

A proposal has not yet been provided

$249 USD in 6 days
(5 Reviews)
3.5
$155 USD in 3 days
(1 Review)
0.5
ashis9210

I am highly interested in working on your project. I have excellent experience in web scraping, research, data mining, and extracting email addresses and other related contact information for any business. Relevant Skills and Ex…

$138 USD in 5 days
(0 Reviews)
0.0