
Completed
Posted
Paid on delivery
I have to pull many thousands of PDF files from a publicly available but poorly structured online database. The pages are slow, there are no clear download links, and navigation relies on clunky JavaScript forms, so a straightforward “save as” approach will take far too long.

You will receive a text file that contains the exact filenames for every document I need. Those filenames appear in the HTML once the record is loaded, so they can be used as reliable anchors for the scrape. The order in which the files arrive does not matter; accuracy and completeness do.

I expect an automated approach (Python with Selenium, Playwright, Scrapy, or any comparable tool is fine), as long as it can work around the site’s fragile structure and occasional timeouts. If headless browsing or rate-limiting tricks are required, please build them in.

Deliverables:
• A zipped archive (or split archives) containing every requested PDF.
• The runnable script with clear, inline comments so I can repeat the process in future. I hope to be able to run this program every few weeks to capture up-to-date files.
• A brief README explaining environment setup, command-line usage, and any third-party libraries.

I will validate the job by spot-checking a random sample of filenames against the list I provide and by ensuring the script reproduces the full download set on my end without manual tweaks.

The above is AI generated for this job; the following is my description.

I want to create a readable store/database of AFCA decisions. Their website is afca.org.au. My plan is to create a ChatGPT (or similar) AI tool to summarise each determination, or search across all determinations for keywords or phrases.

AFCA publish each determination in a PDF document. I only need the text of each determination, so whether your tool captures each PDF or simply gathers the text as a separate .txt file is a matter for you. As far as storage size goes, .txt files will be far smaller.

I don't need an Access or similar database created; I seek only the documents themselves for use in an AI environment. As far as indexing goes, we can start with this:
• Date
• Determination/Case number
• Financial Firm

Creating an index in Excel or similar seems to be easiest. Those details are captured on the first page of each determination.

At the outset, there will be many thousands of determinations across the old and new databases. Their online search facility is very poor.

Older determinations (2018-2024): [login to view URL]
Take note of this: "Service advisory: We are aware that some PDF links show the message *error opening/reading pdf file*. If you see this message, please disregard it. Simply click the link and the PDF will open as normal."

Newer determinations (since 2024): [login to view URL]

I would be happy starting with the newer determinations only to check for validity, then look at the older database.
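As a rough illustration of the index described in the brief (Date, Determination/Case number, Financial Firm, taken from the first page), the sketch below pulls those three fields out of already-extracted first-page text and writes a CSV that Excel can open. The label wording ("Date:", "Case number:", "Financial firm:") and the function names are assumptions made for the example; the real layout of an AFCA determination would need to be checked before relying on these patterns.

```python
import csv
import io
import re

# Hypothetical label patterns: the real wording on page 1 of a determination
# must be checked against actual documents before relying on these.
FIELD_PATTERNS = {
    "date": re.compile(r"^[ \t]*Date[ \t]*:[ \t]*(.+?)[ \t]*$", re.I | re.M),
    "case_number": re.compile(r"^[ \t]*Case number[ \t]*:[ \t]*(.+?)[ \t]*$", re.I | re.M),
    "financial_firm": re.compile(r"^[ \t]*Financial firm[ \t]*:[ \t]*(.+?)[ \t]*$", re.I | re.M),
}


def parse_first_page(text):
    """Return the three index fields found in first-page text ('' if absent)."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def write_index(rows, handle):
    """Write index rows as CSV, a format Excel opens directly."""
    fieldnames = ["filename", "date", "case_number", "financial_firm"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample text, not taken from a real determination.
    sample = "Determination\nCase number: 123456\nFinancial firm: Example Bank Ltd\nDate: 1 March 2024\n"
    buffer = io.StringIO()
    write_index([{"filename": "example.pdf", **parse_first_page(sample)}], buffer)
    print(buffer.getvalue())
```

Fields that cannot be found are left blank rather than guessed, so gaps show up in the spreadsheet and can be reviewed by hand.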
Project ID: 40237292
177 proposals
Remote project
Active 25 days ago
177 freelancers are bidding an average of $498 AUD for this job

Hello,

This is not just a bulk download task, it’s the foundation of a structured legal corpus that you’ll later use for AI summarisation and search. The solution needs to be reliable, repeatable, and built with scale in mind. My proposed approach would be:

Phase 1 – Newer database (since 2024)
• Build a Playwright-based crawler to handle the JavaScript-heavy interface and slow page loads.
• Implement controlled rate limiting, retry logic, and resume capability to handle timeouts and interruptions.
• Download each determination PDF and immediately extract clean text using PyMuPDF or pdfplumber.
• Parse the first page to extract Date, Case Number, and Financial Firm using structured pattern matching.
• Generate an index CSV with filename, date, case number, firm, and file path.

Phase 2 – Older database (2018–2024)
• Extend the same pipeline with adjustments for the legacy interface.

Deliverables would include:
• Complete set of PDFs (or text files if preferred)
• Clean extracted text files
• Structured index CSV
• Fully commented Python script
• README explaining environment setup and execution

The architecture will allow you to re-run the process periodically to capture new determinations without duplicating existing files. I have experience building structured, audit-ready data pipelines where completeness and reproducibility are critical. Happy to begin with the newer database as a proof of validity before expanding.

Best,
Jenifer
$550 AUD in 25 days
9.4
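The retry logic and resume capability mentioned in the proposal above could look roughly like the following. This is a minimal sketch, not the bidder's implementation: `with_retries`, `Manifest`, `run`, and the JSON manifest file are hypothetical names chosen for illustration, and `fetch` stands in for whatever browser automation actually downloads a document.

```python
import json
import time
from pathlib import Path


def with_retries(action, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class Manifest:
    """Records finished filenames on disk so a rerun skips them."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text()))
        else:
            self.done = set()

    def mark_done(self, filename):
        self.done.add(filename)
        # Rewrite the whole file after each success so an interrupted
        # run loses at most the document in progress.
        self.path.write_text(json.dumps(sorted(self.done)))


def run(filenames, fetch, manifest, sleep=time.sleep):
    """Fetch every filename not yet in the manifest; return the ones fetched."""
    fetched = []
    for name in filenames:
        if name in manifest.done:
            continue  # already downloaded on a previous run
        with_retries(lambda: fetch(name), sleep=sleep)
        manifest.mark_done(name)
        fetched.append(name)
    return fetched
```

Running the same script a few weeks later with a longer filename list would then only fetch the entries that are not yet in the manifest.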

Hello, With my expertise in JavaScript, I can navigate and extract the required information from the clunky JavaScript forms you mentioned. I understand that the current website conditions are challenging, but I am well-versed in utilizing automation tools like Selenium, Playwright and Scrapy to handle such situations effectively. My experience in working with poorly structured websites will help me save your valuable time by providing an automated approach that avoids any rate-limiting issues. In terms of deliverables, I guarantee a zipped archive with all the requested PDFs accompanied by a comprehensible script ready for your future use. To ensure your satisfaction, I am happy to add clear inline comments to the script to clarify any steps or customization needed going forward. Furthermore, as an added bonus, I will provide you with a brief README file describing everything you need for environment setup, command-line usage and any third-party libraries used. Lastly, as a customer-focused professional, my aim is not only to complete the project but to ensure your ongoing satisfaction. Therefore, if there are any additional features or tweaks you'd like implemented in this process, kindly let me know; I'm more than ready to accommodate you. Choose me for this task and rest easy knowing the messy parts of extracting files from this complex database are well taken care of by an experienced professional. Thanks!
$350 AUD in 3 days
7.9

I can efficiently handle the "Mass PDF Database Extraction" project using my expertise in JavaScript, Python, Data Processing, Web Scraping, and Software Architecture. The budget can be adjusted after discussing the full scope, and I aim to work within your budget constraints. Please review my 15-year-old profile to see my extensive experience. Let's discuss the details and get started on the project. Your satisfaction is my priority, and I am eager to showcase my commitment. Looking forward to hearing from you.
$525 AUD in 10 days
7.9

⭐⭐⭐⭐⭐ Efficiently Gather PDF Files from AFCA's Online Database

❇️ Hi My Friend, I hope you are doing well. I reviewed your project details and see you are looking for a solution to pull thousands of PDF files from the AFCA website. You don’t need to look any further; Zohaib is here to help you! My team has successfully completed 50+ similar projects for data scraping. I will create an automated script using Python with Selenium or similar tools to navigate the site and gather the necessary files accurately.

➡️ Why Me? I can easily do your project of scraping AFCA decisions as I have 5 years of experience in web scraping, automation, and data extraction. My expertise includes Python, Selenium, and handling complex website structures. Additionally, I have a strong grip on API integration and data processing.

➡️ Let's have a quick chat to discuss your project in detail. I can show you samples of my previous work and how we can achieve your goals efficiently. Looking forward to discussing this with you in chat.

➡️ Skills & Experience:
✅ Web Scraping
✅ Python Programming
✅ Selenium Automation
✅ Data Extraction
✅ Error Handling
✅ API Integration
✅ Script Optimization
✅ File Management
✅ Data Indexing
✅ Task Scheduling
✅ Data Analysis
✅ Documentation

Waiting for your response!

Best Regards,
Zohaib
$350 AUD in 2 days
8.1

Hi there, I understand that you need to extract thousands of PDF documents from the AFCA website to create a readable database of decisions. Given the challenges with the site's structure and navigation, I propose an automated solution using Python with Selenium or a similar tool to efficiently gather the necessary files. My approach will involve developing a script that accurately retrieves each determination, focusing on the text content while ensuring the integrity of the data. I will implement error handling for any potential issues with PDF links and ensure that the script can be run periodically for future updates. Deliverables will include a zipped archive of the extracted documents, a well-documented script for future use, and a brief README for setup and usage instructions. I prioritize clear communication and quality work, ensuring you receive a reliable solution. I look forward to the opportunity to assist you with this project. Best regards, Burhan Ahmad TechPlus
$750 AUD in 5 days
7.8

Greetings.

I will approach the mass download by building a script to automate the process. I'll download the PDFs into a single folder and deliver it as a zip archive. I can generate a spreadsheet with additional metadata if you require any. I have noted that the site will be brittle and finicky and I can confirm that I have handled many such sites and it will not be a problem as long as waiting and retrying eventually works.

Timeline: The building of the script should take 1 - 2 days and the completion time of the full scrape then depends on the number of PDFs you have on your list and the response time of the site.

Features: The script will feature:
- Rate limiting to avoid overburdening the site and bot-detection.
- Fault tolerance to continue running when some pages fail to load.
- Retry logic to maximize the number of pages that are successfully handled.

Experience: I have a wealth of experience working in Python web scraping and can use technologies such as Playwright, Selenium, BeautifulSoup, Pandas and more.

---

I am available to begin immediately and work until completion. Contact me if you wish to continue. Thanks.
$250 AUD in 4 days
7.6
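The rate limiting and fault tolerance listed in the proposal above can be illustrated with a small sketch. It assumes a fixed minimum gap between requests and a loop that records failures instead of stopping; `RateLimiter`, `scrape_all`, and `handle` are hypothetical names, and the right interval for the real site is not known from the listing.

```python
import time


class RateLimiter:
    """Enforces a minimum gap between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def scrape_all(items, handle, limiter):
    """Process every item; collect failures instead of stopping."""
    succeeded, failed = [], []
    for item in items:
        limiter.wait()
        try:
            handle(item)
            succeeded.append(item)
        except Exception as exc:
            # Keep going: one bad page should not end the whole run.
            failed.append((item, repr(exc)))
    return succeeded, failed
```

The `failed` list can be written to a log and fed back into a later run, which is one way to combine fault tolerance with retrying.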

Hello Greetings, After reviewing your project description, I am confident and excited to work on this project for you. However, I have some crucial points and questions to clarify. Please leave a message in the chat to discuss this, and I can share my recent work that is similar to your requirements. Thanks for your time! I am excited to hear from you soon. Best regards
$5,000 AUD in 40 days
7.7

Having worked extensively with data extraction and processing, I'm confident in my ability to tackle your project that involves pulling thousands of PDF files from a rather troublesome online database. My strong suit lies in web scraping and automating complex tasks, skills which you mentioned as key requirements for this job. I'm fluent in Python and have hands-on experience with Selenium, Playwright, and Scrapy - tools that would be ideal for handling the clunky JavaScript forms and timeouts you're facing. Given that my work philosophy resonates with your needs to the tee – providing an automated solution that others can implement easily – I am accustomed to creating clear yet comprehensive documentation like READMEs. This will help you not only validate the job but also harness the power of this script for future updates. A zipped archive containing every requested PDF is what you're asking for, and I promise to deliver exactly that. My attention to detail coupled with guaranteeing accuracy and completeness plays a crucial role when working on such high-volume projects. Best, Junaid.
$750 AUD in 7 days
7.4

Hello, Working with large databases and automating data extraction is one of my core specialties and a reason why I'm well-suited for your project. As an experienced freelancer who has worked extensively with Python, Selenium, and Web Scraping, I assure you that I can develop a robust and efficient solution tailored to your needs. I understand the complexity of dealing with clunky JavaScript forms and slow website navigation, but my skills in finding reliable anchors for scraping will ensure accuracy and completeness in the PDF extraction process. In addition to providing you with the zipped archive containing all the requested PDFs, the runnable script with inline comments, and a clear README file, I am committed to delivering an optimal user experience for you. My expertise in handling headless browsing or rate-limiting issues empowers me to build workarounds easily into the script without compromising on speed or performance. Choose expertise. Choose reliability. Choose me! thank you Gaurav D.
$500 AUD in 7 days
7.3

Hello, With my comprehensive set of skills and extensive experience, I'm confident I can successfully tackle the challenging task you've outlined. Utilizing my automation, data management, and web scraping expertise, I will develop a thoroughly automated solution using robust technologies like Python with Selenium. This approach will allow for the creation of an adaptable, future-proofed process that you can run yourself, capturing up-to-date files regularly without any manual intervention. My proficiencies extend beyond just scraping; I excel at data processing and management as well. I intend to parse each AFCA determination PDF and intelligently extract the text elements you need for your AI application. To meet your storage constraints, I can store the determinations as separate .txt files, significantly reducing the overall size without compromising the data's integrity. Furthermore, given my background in full-stack web development and software architecture, creating a readable store/database for your collected determinations will be a smooth endeavor. I propose an Excel index for easy access and reference using categories like date, determination/case number, and financial firm. Time and again, I've excelled at building efficient solutions under similar difficult conditions. Let's discuss your project further to ensure we surpass all your expectations. Thanks!
$555 AUD in 1 day
7.3

Hello I have thoroughly reviewed your project description and am confident in my ability to assist you in completing it successfully. I believe it would be highly beneficial to delve deeper into the specifics of the job to determine the most effective way forward. I am open to scheduling an interview at your convenience, and I genuinely appreciate the chance to collaborate with you on this project. Your response is eagerly anticipated, and I'm excited about the prospect of working together. Thank you for considering my proposal. Looking forward to your prompt reply! Best regards Rekha!!!
$750 AUD in 7 days
7.3

Hello, Would you like to see a demonstration of how we can streamline the extraction of thousands of PDF decisions effortlessly? Our automated approach utilizes advanced web scraping techniques to ensure accuracy and completeness while circumventing site limitations. Let's discuss how we can effectively compile the AFCA decisions into a usable format for your AI tool. Best, Smith
$500 AUD in 7 days
7.1

Hi there,

I’ve read your brief and understand you want to build an AI-ready library of AFCA determinations by systematically capturing each decision and creating a simple index (Date, Case number, Financial Firm). Starting with the newer database first is a smart way to validate.

I’m a Python developer experienced with scraping JS-heavy and poorly structured sites. I focus on reliable, repeatable collection where completeness matters.

Approach
• Python + Playwright/Selenium to handle AFCA’s dynamic search
• Iterate results and open each determination
• Save clean .txt extracted from PDFs (smaller, AI-friendly) or PDFs if preferred
• Parse page 1 to build an Excel index (date/case/firm)
• Add retries, rate limits, and resume support

Deliverables
• All texts or PDFs
• Excel index
• Commented script + short README for reruns

I suggest a small pilot on recent decisions first to confirm quality.

Quick questions:
• Text only or also keep PDFs?
• How many for the pilot batch?

Ready to begin once confirmed.
$350 AUD in 5 days
7.3

Hi I can build a fully automated extraction tool that navigates AFCA’s poorly structured interfaces, loads each determination record, and reliably captures either the PDF or clean text using Python with Selenium or Playwright. The core challenge is handling dynamic JavaScript forms, broken link behaviors, and slow responses, and I’ll solve this with resilient selectors, retry logic, headless browsing, and safe rate-limiting to prevent timeouts. Once each determination is fetched, I can extract text directly to .txt for lighter storage and generate an index capturing Date, Case Number, and Financial Firm from the first page. The script will include inline comments, environment instructions, and a README so you can rerun it every few weeks without modification. I’ll also package all documents into zipped archives and ensure the scraper produces complete, reproducible results across both the new and old AFCA databases. Thanks, Hercules
$500 AUD in 7 days
7.0

Hi Glenn P.

I’m your web developer, ready to turn your project Mass PDF Database Extraction into reality! I’d love to discuss the details and create something amazing together. Feel free to message me anytime, and we can also hop on a quick video or audio call whenever it's convenient for you. I’ve developed many projects exactly like what you’re looking for. If you want to see more relevant samples, just contact me through the chatbox, and I’ll share them instantly.

★ Why Clients Trust Me
• 500+ successful web projects delivered
• 430+ positive client reviews
• Expert in JavaScript, Python, Data Processing, Web Scraping, Software Architecture, Scrapy, Data Extraction, Selenium, Automation, Data Management
• WordPress, Shopify, PHP, JavaScript, HTML, CSS, Plugin/Theme Development, Laravel, WebApp
• Clean, modern, responsive and SEO-optimized designs
• Fast delivery, great communication, and long-term support
• Available during EST hours for smooth collaboration

If you want a professional developer who delivers quality work on time and stress-free, let’s connect. I’m excited to help build something amazing for you.

Best regards,
Kausar Parveen
$350 AUD in 3 days
6.9

SURE-------------I will start this work as per the given description ----I have extensive experience with similar PROJECT ---->>I am highly qualified to do this job with high QUALITY ----- I am Passionate PYTHON/Full stack developer having rich experience with so many successful Tasks. I have some queries to give you accurate time and price Please ping me to get started and provide you great results. Thanks
$550 AUD in 7 days
7.1

With warm regards, I’m Sami from BN-Droids Digital Services - a highly skilled and experienced web development team offering cutting-edge digital solutions. We specialize in Python and tools like Scrapy for expert data extraction. Our portfolio includes an extensive experience of web scraping involving even the most complex, niche databases such as yours with its difficult to navigate JavaScript forms.
$250 AUD in 7 days
6.9

Hi I can build a fully automated extraction tool that navigates AFCA’s poorly structured interfaces, loads each determination record, and reliably captures either the PDF or clean text using Python with Selenium or Playwright. The core challenge is handling dynamic JavaScript forms, broken link behaviors, and slow responses, and I’ll solve this with resilient selectors, retry logic, headless browsing, and safe rate-limiting to prevent timeouts. Once each determination is fetched, I can extract text directly to .txt for lighter storage and generate an index capturing Date, Case Number, and Financial Firm from the first page. The script will include inline comments, environment instructions, and a README so you can rerun it every few weeks without modification. I’ll also package all documents into zipped archives and ensure the scraper produces complete, reproducible results across both the new and old AFCA databases. Thanks,
$300 AUD in 1 day
6.9

Hi there,

I understand the challenge of extracting numerous PDFs from a poorly structured database, and I'm confident in delivering a solution tailored to your needs. With extensive experience in web scraping using Python, Selenium, and Scrapy, I have successfully handled similar projects that required navigating clunky JavaScript and extracting data efficiently.

1. **Previous Experience**: I've worked on projects with thousands of files, ensuring accuracy and completeness through automated scripts.
2. **Proposed Solution**: I will develop a Python script that leverages Selenium to handle the dynamic elements of the website. It will:
   - Utilize the text file with filenames as anchors for extraction.
   - Implement headless browsing to improve performance and avoid detection.
   - Include rate-limiting measures to handle timeouts effectively.
3. **Deliverables**: You'll receive a zipped archive of the PDFs and a script with clear, inline comments for future use. Additionally, I will provide a concise README for setup and commands.

Once the initial extraction is successful with the newer determinations, we can move to the older database. This step-by-step approach ensures thorough validation at each phase.

Thanks,
Luis
$555 AUD in 4 days
6.3
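Using the supplied filename list as anchors, as the proposal above suggests, might be sketched as follows. The example assumes each wanted filename shows up as the last path segment of a link's `href` in the loaded HTML; the listing only says the filenames "appear in the HTML", so the real page may expose them differently. `LinkCollector` and `match_filenames` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def match_filenames(html, wanted):
    """Map each wanted filename to the first link whose path ends with it."""
    collector = LinkCollector()
    collector.feed(html)
    found = {}
    for href in collector.hrefs:
        # Drop any query string, then keep the last path segment.
        tail = href.split("?", 1)[0].rsplit("/", 1)[-1]
        if tail in wanted and tail not in found:
            found[tail] = href
    return found


if __name__ == "__main__":
    # Made-up page fragment and filenames, for illustration only.
    page = '<a href="/files/abc.pdf?x=1">A</a><a href="/files/other.pdf">B</a>'
    wanted = {"abc.pdf", "missing.pdf"}
    found = match_filenames(page, wanted)
    print("found:", found)
    print("still missing:", sorted(wanted - found.keys()))
```

The set of filenames still missing after a run doubles as a completeness check against the client's list.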

Hello, Thank you so much for posting this opportunity. It sounds like a great fit, and I’d love to be part of it! I’ve worked on similar projects before, and I’m confident I can bring real value to your project. I’m passionate about what I do and always aim to deliver work that’s not only high-quality but also makes things easier and smoother for my clients. Feel free to take a quick look at my profile to see some of the work I’ve done in the past. If it feels like a good match, I’d be happy to chat further about your project and how I can help bring it to life. I’m available to get started right away and will give this project my full attention from day one. Let’s connect and see how we can make this a success together! Looking forward to hearing from you soon. With Regards! Abhishek Saini
$750 AUD in 7 days
6.5

KILSYTH SOUTH, Australia
Payment method verified
Member since Aug 8, 2014