
Scrape Texts from Marxists.org

$250-750 CAD

Completed
Posted almost 3 years ago

Paid on delivery
I want texts scraped from the Marxist Internet Archive and saved in an SQL dataset. The data should be structured so that each paragraph has its own row and is associated with the document that it is found in. I want separate columns to represent the metadata. This includes columns for:

· Author or speaker, e.g., “Lenin”.
· Title of the document, e.g., “To His Sister Maria”. This is NOT the HTML title tag, but the title of the content on the page itself (usually the first H1 or H2 HTML tag).
· The citation information of the source document, e.g., “Lenin Collected Works, Progress Publishers, 1977, Moscow, Volume 37, pages 67-68.”
· The date that the text was written or, if it is a speech, the date that the speech was delivered, e.g., “October, 1893”. This is often different from the first publication date or from the publication date of the source document, so the script needs to be able to search for specific phrases within the scraped content (regex).
· The date that the text was first published, e.g., “1929”. This is often different from when the text was written or from the publication date of the source document. Again, this can be resolved with regex.
· The type of document, e.g., Speech, Letter, Paper. When available, it is usually in the first or the last paragraph of each page.
· The direct audience. If it is a speech, then this is the assembly that the speech was given to. If it is a letter, then this is who the letter was written to, e.g., “Lenin’s sister Maria.” (You only need to include this if the title, description, or metadata at the top of the text document states the direct audience.)

- For any metadata value that is NOT available, the column should record “N/A”.
- Server response status (404, 500…) in a separate column.

The list of domains is provided at the end of this document. For each domain, the crawling depth should be 3 levels. The scraping script should log every successfully scraped URL so that, if the script has to be restarted, there will be no duplication of content.
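The storage layout and restart behaviour described above could be sketched roughly as follows. This is only an illustration, assuming SQLite through Python's standard `sqlite3` module (the posting asks only for "SQL"). Every table, column, and function name here is hypothetical, and the `Written:` / `Published:` labels in the regexes are assumptions about how the pages mark their dates, which would need checking against the real pages.

```python
import re
import sqlite3

# Hypothetical schema: one row per document, one row per paragraph.
# Metadata columns default to "N/A" when a value is not available.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id             INTEGER PRIMARY KEY,
    url            TEXT UNIQUE NOT NULL,
    http_status    INTEGER,
    author         TEXT NOT NULL DEFAULT 'N/A',
    title          TEXT NOT NULL DEFAULT 'N/A',
    citation       TEXT NOT NULL DEFAULT 'N/A',
    date_written   TEXT NOT NULL DEFAULT 'N/A',
    date_published TEXT NOT NULL DEFAULT 'N/A',
    doc_type       TEXT NOT NULL DEFAULT 'N/A',
    audience       TEXT NOT NULL DEFAULT 'N/A'
);
CREATE TABLE IF NOT EXISTS paragraphs (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    position    INTEGER NOT NULL,
    text        TEXT NOT NULL
);
"""

META_COLUMNS = ("author", "title", "citation", "date_written",
                "date_published", "doc_type", "audience")

# Assumed label patterns; each captures the rest of the line after the label.
WRITTEN_RE = re.compile(r"Written:[ \t]*(.+)")
PUBLISHED_RE = re.compile(r"(?:First\s+)?Published:[ \t]*(.+)")


def extract(pattern, info_text):
    """Return the captured value, or "N/A" when the label is absent."""
    match = pattern.search(info_text)
    return match.group(1).strip() if match else "N/A"


def already_scraped(conn, url):
    """The documents table doubles as the log of scraped URLs."""
    row = conn.execute("SELECT 1 FROM documents WHERE url = ?", (url,)).fetchone()
    return row is not None


def save_document(conn, url, status, meta, paragraphs):
    """Store one document and its paragraphs; skip URLs stored earlier."""
    if already_scraped(conn, url):
        return False
    values = [meta.get(col) or "N/A" for col in META_COLUMNS]
    placeholders = ", ".join("?" for _ in META_COLUMNS)
    cur = conn.execute(
        f"INSERT INTO documents (url, http_status, {', '.join(META_COLUMNS)}) "
        f"VALUES (?, ?, {placeholders})",
        [url, status, *values],
    )
    conn.executemany(
        "INSERT INTO paragraphs (document_id, position, text) VALUES (?, ?, ?)",
        [(cur.lastrowid, pos, text) for pos, text in enumerate(paragraphs, start=1)],
    )
    conn.commit()
    return True
```

After `conn.executescript(SCHEMA)`, a crawler would call `save_document(conn, url, 200, meta, paragraphs)` once per page. Because `url` is unique and checked first, rerunning the script after a restart does not duplicate rows.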
If you cannot scrape a document because it is not saved as text, e.g., if it is a PDF, then save the metadata of the document to a second dataset (one that is just for these non-text documents). This metadata could be as simple as the name of the author/speaker, the title of the document, and the URL.

Skip documents that are not in English. Do not scrape them.

Use Ruby or Python; a framework (Rails or Django) is required. The preferred scraping engine is Scrapy, but we are open to suggestions. Our goal is to be able to change the configuration file later if the sites change or if we want to add another domain to the list.

When the job is completed, save the scripts that you used and deliver them to me along with the dataset of scraped texts, so that I can replicate your process. Deliverables are the source code files and a DB dump as a .SQL file, alongside a [login to view URL] file that outlines how to configure and run the script independently.

Scrape the texts from these locations:

Lenin’s “Collected Works”: [login to view URL]
Stalin’s works: [login to view URL]
Malenkov’s works: [login to view URL]
Nikita Khrushchev’s “Writings and Speeches” and “Letters to Kennedy”: [login to view URL]
Leonid Brezhnev’s “Writings and Statements”: [login to view URL]
Yuri Andropov’s “Writings and Statements”: [login to view URL]
Konstantin Chernenko’s “Writings and Statements” (one document): [login to view URL]
Mao Zedong’s Works: [login to view URL]
Hua Guofeng’s Documents: [login to view URL]
Deng Xiaoping’s Speech (one document): [login to view URL]
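One way the configuration file, the 3-level depth limit, and the routing of non-text documents might fit together is sketched below. It is an illustration under stated assumptions, not a prescribed design: the JSON layout, the key names, the extension list, and the `example.org` start URL are all hypothetical (the real links are hidden in the posting), and the start page is assumed to count as depth 0. In a Scrapy project the depth cap would more likely be set through its `DEPTH_LIMIT` setting.

```python
import json
from urllib.parse import urlparse

# Hypothetical configuration; adding a domain later means adding a "sources" entry.
CONFIG_JSON = """
{
  "max_depth": 3,
  "non_text_extensions": [".pdf", ".doc", ".djvu"],
  "sources": [
    {"author": "Lenin", "start_url": "https://example.org/lenin/index.htm"}
  ]
}
"""


def load_config(text):
    """Parse the JSON config and normalise extensions to a lowercase tuple."""
    config = json.loads(text)
    config["non_text_extensions"] = tuple(
        ext.lower() for ext in config["non_text_extensions"]
    )
    return config


def is_non_text(url, config):
    """True when the URL path ends in one of the configured non-text extensions."""
    path = urlparse(url).path.lower()
    return path.endswith(config["non_text_extensions"])


def should_follow(url, depth, source, config):
    """Follow only links on the source's host and within the depth limit."""
    same_host = urlparse(url).netloc == urlparse(source["start_url"]).netloc
    return same_host and depth <= config["max_depth"]


def route(url, author, title, config):
    """Pick the target dataset: non-text documents keep only minimal metadata."""
    row = {"author": author or "N/A", "title": title or "N/A", "url": url}
    if is_non_text(url, config):
        return "non_text_documents", row
    return "documents", row
```

Detecting non-English pages is left out of this sketch, since the posting does not say how language should be determined.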
Project ID: 30267433

About the project

34 proposals
Remote project
Active 3 years ago

Awarded to:
$0 CAD in 5 days
4.9 (181 reviews)
8.1
34 freelancers are bidding on average $492 CAD for this job
Hey, I read your description and checked the LINKS you mentioned. I would like to discuss what I think is the way to get this DATA. I will use a PYTHON script and deliver the SCRIPT as well as a MYSQL database file. I have LOTS of expertise in SCRAPING WEB DATA. Take a look at my profile and let's chat.
$250 CAD in 3 days
5.0 (318 reviews)
7.3
Hi bhmorrison. I am a top-talented Python scraping expert with extensive experience. I have completed lots of similar projects using frameworks and libraries such as scrapy, selenium, requests, beautifulsoup, lxml, etc. As an expert, I can provide high-quality results within the deadline. You can check out similar projects in my profile. I would like to know more about the project. If you award me your project, I will start ASAP. Best regards, Valentyn.
$600 CAD in 7 days
4.9 (26 reviews)
6.7
Hi there, I am an expert in Python, Django, Automation, JavaScript, Chrome Extensions, Web Scraping (selenium, beautifulsoup, lxml), and MySQL. You can check my portfolio here: https://www.freelancer.com/u/PoojaRautela417 Let's have a quick chat on this project to clear up further details. Thanks & Regards, Pooja Bohra
$550 CAD in 5 days
4.9 (103 reviews)
6.6
Quality work. I have done several scraping projects before. Regards
$1,000 CAD in 14 days
4.9 (27 reviews)
6.4
Hello, I am a Python-based web scraper. I will use the Scrapy framework to scrape your data and store it in a SQL database. I will use regex; I know regex well enough to extract data from text. I will scrape the Writer Name, Publication Date, Citation Text, Written to, First publish date, and the URL of the link. I will also keep the status code (404 or 500). If you like my proposal, then we can talk. Thanks. Shahidul.
$400 CAD in 3 days
5.0 (22 reviews)
5.2
Senior Python and Django expert with 9+ years of experience in this field. I can deliver good-quality work. I have read the guidelines of your work. I believe that I can provide the quality of work you are anticipating from this platform. Give me a chance to show you the best I can do at your service.
$589 CAD in 5 days
4.4 (48 reviews)
5.5
Hello, I'm currently working on another project. If you are not in a rush, I would like to work on this project. Your requirements are already clear. I will try to have at least a simple demo for you soon. (You don't need to pay until I finish the project, if I am chosen.) Regards,
$250 CAD in 10 days
5.0 (15 reviews)
5.1
Dear Client! I have read your description very carefully, fully understood your requirements, and have full confidence in this work. I will grab all the data following the structure you mentioned. Why choose me? I am a Python expert, very familiar with the re module, and have deep experience in scraping using bs, requests, or selenium. I am also familiar with MySQL. I have strong problem-solving skills and am used to satisfying my clients. I really hope you get back to me so we can go further and finish ASAP. Best Regards!
$500 CAD in 3 days
5.0 (9 reviews)
4.6
Hello! I am a Django/Python expert. Using BeautifulSoup, I can get the data you require. Could you please contact me? Thanks
$500 CAD in 3 days
4.8 (20 reviews)
4.8
Dear Brad, I have read your description in detail and am very interested in your project. I think that I can finish your work exactly as you need. I have a lot of experience with Django, Python, SQLite, and PostgreSQL. If you would like to discuss any suggestions or information, please message me anytime. I am looking forward to hearing from you very soon. Thank you.
$750 CAD in 7 days
4.6 (7 reviews)
4.8
You are very lucky!!! I have rich experience in scraping, so if you want to see my previous projects, I can share them. I can finish your project in a short time without fail. **If you hire me, I will be your genie who grants all your wishes.** I can start work as soon as we discuss exactly what you want and what I have to do. Let's discuss more details about the job if you are interested in me. Looking forward to hearing from you. Best regards.
$500 CAD in 7 days
5.0 (12 reviews)
4.1
Hi, I am a Django developer and I will be happy to work with you. I can create a Django app with a scraping script for you. Please contact me for more details. Best regards
$500 CAD in 7 days
4.9 (8 reviews)
3.7
Hi, I have completed a web scraping project in Python. I don't need the money; I am only going to work for the review, because I am trying to build a good job history. Of course, your project will be a perfect product.
$300 CAD in 7 days
5.0 (4 reviews)
3.4
Hi there, hope you are doing great. I am an experienced professional in web scraping and have collected large amounts of data from various renowned websites like LinkedIn, Instagram, Amazon and Aliexpress. I have worked with many different frameworks and libraries like Scrapy, selenium, and BeautifulSoup, and have even written some simple Python scripts to collect data from various APIs using simple HTTP requests. I know how to work with proxies, break image captchas and Google reCAPTCHA, and extract text from PDF files or images. My communication skills are outstanding and I also provide post-project assistance. Thanks in advance. Feel free to contact me to discuss things over.
$500 CAD in 7 days
3.6 (19 reviews)
4.5
Hi, thanks for your job posting. I'm a ⭐scraping & crawling expert⭐ with 7+ years of rich experience. I've successfully done a lot of projects to scrape data like ads, contacts, products, betting, and other special info with Python, PHP, C++/C#, and Java. Also, I can use skills such as multithreading, proxies, Selenium, BeautifulSoup4, Scrapy, Scraper API, etc. to scrape data from any website with high speed & quality. I am free now and can start immediately. To accomplish your goal, please send me a message so that we can discuss it more and start ASAP. Thanks
$400 CAD in 3 days
5.0 (2 reviews)
2.9
Experienced web scraper, skilled in website scraping, crawling, and data mining. I have worked on scraping projects specializing in web scraping using Python, crawling, data mining, and web research. I am very skilled with BeautifulSoup, Requests, Selenium, Scrapy, Chromedriver, CSS selectors, and web scraping / data miner Chrome extensions, and with exporting data to various formats (database, CSV, Excel, Google Sheets, MySQL). I also have knowledge of the Python and C++ programming languages. In my previous working years, I have finished many successful projects, such as: - extract product data from e-commerce websites (product name, price, description, specifications, image links, etc.) - download and rename product images - gather emails from a list of business websites - extract company contact data from directory websites (company name, address, website, email, etc.) Thank you for your time. Looking forward to working with you
$500 CAD in 7 days
5.0 (2 reviews)
2.8
Hi, I am ready to start right now. I can help you with your project according to your requirements. I am interested in your project. I will ensure that I complete the task successfully and within the specified time. I love perfection in my work. You can see an example of one of those projects in my portfolio here: https://www.freelancer.com/u/Monirulop If you are interested in working with me, message me; I am waiting for your message. Thanks, and be safe, Monirul
$500 CAD in 3 days
4.6 (4 reviews)
2.1
Hi. How are you? I am a very skillful freelancer; you can see my reviews and decide whether to choose me or not. It's up to you, but if you choose me, you can get a good result and we can become good friends. I will wait for your message. Thank you.
$500 CAD in 7 days
0.0 (0 reviews)
0.0
I have created a project like this. I believe that I would be able to complete this project. I work professionally and complete every task as soon as possible.
$450 CAD in 5 days
0.0 (0 reviews)
0.0

About the client

Nanaimo, Canada
5.0
2
Payment method verified
Member since May 18, 2021

Client verification

Other jobs from this client

Convert txt files to csv files
$30-250 CAD