
Closed
Posted
Paid on delivery
I have a set of PDFs that contain purely textual information presented in a mix of paragraphs, scattered labels, and occasional row-style groupings. I need every single data point moved into a clean Excel workbook so nothing is lost in translation. Because the layout switches between narrative blocks and ad-hoc table-like sections, the job will likely involve both automated capture (Python, Power Query, or your preferred OCR tool) and some careful manual cleanup to preserve order and context. Deliverable: one well-structured .xlsx file per source PDF, with all columns and data fields represented exactly as they appear, ready for filtering and analysis. Consistent header naming and no stray line breaks or merged cells, please. If you’ve handled mixed-format PDFs before and can turn them into tidy spreadsheets quickly, I’m ready to get started as soon as you are.
Project ID: 40387334
21 proposals
Remote project
Active 4 days ago
21 freelancers are bidding on average $23 USD for this job

Yes, I have handled mixed-format PDFs before, so I am confident I can do this job perfectly within the required time and a reasonable budget. Please message me here and let's get started. Regards, Shalu
$34 USD in 4 days
6.3

Hello, I have experience converting mixed-format PDFs into clean, structured Excel files, including handling narrative text, scattered labels, and table-like sections. I can ensure that every data point is accurately transferred into a well-organized .xlsx file with consistent headers, no merged cells, and clean formatting ready for analysis. I’m comfortable using both automated tools and manual refinement to preserve context and order. I’m available to start immediately and can deliver precise and reliable results within your timeline. Feel free to message me to discuss further. Kind Regards. -Habib
$20 USD in 1 day
4.9

Having worked in data entry and extraction for multiple formats like CSV, XLS, and PDFs over the years, I'm well-versed in handling mixed-format PDFs. I completely understand the challenges they bring to sorting and organizing data, and know just how important it is to retain the structure while ensuring cleanliness. Personally, I've had experiences where the layouts switch between narrative blocks and ad-hoc table-like sections - exactly as you described - so I can assure you that I can handle this job efficiently with minimal data loss or errors. My proficiency in creating consistent header naming will also mean your workbook columns and data fields will be represented exactly as they appear in the PDFs, facilitating easy filtering and analysis. Overall, my goal is your complete satisfaction achieved through on-time delivery of quality work. The affordable costs mentioned are open for negotiation to best align with your budgetary needs. Rest assured, I'm committed to bringing honesty, proficiency, and smart working procedures to this project. Let's move those data points into a tidy spreadsheet swiftly together! Thank you for considering me as your potential freelancer!
$30 USD in 1 day
4.2

AM READY TO START ASAP: PDF TO EXCEL CONVERSION – MIXED FORMAT TEXT & AD-HOC TABLES (PYTHON/OCR + MANUAL CLEANUP) Hello, I am John K., an MSc Economist & Statistician with over fifteen years of experience. I have delivered 1,000+ projects with a 4.9-star rating. My understanding is: You have PDFs with mixed-format textual information (paragraphs, scattered labels, row-style groupings). You need every data point moved into a clean Excel workbook with consistent headers, no merged cells, and no stray line breaks. I will use Python/OCR (e.g., pdfplumber, Camelot, Tesseract) plus manual cleanup to ensure accuracy. I will deliver: ✅ One well-structured .xlsx file per PDF, ready for filtering and analysis. ✅ Clean headers, no merged cells, preserved data order and context. I am ready to begin. Let's connect via chat to receive your PDFs. Respectfully, John K.
$10 USD in 1 day
3.7

The narrative-to-table switching is the tricky bit. I'd run an initial pass with pdfplumber to grab the structured bits, then handle the paragraph sections with regex patterns to pull labeled fields. Manual cleanup on the rest to keep everything in the right order. Built a similar pipeline for a 500-page OCR job that's on ffulb.com. Turned messy multi-column layouts into clean Word docs with a formatting rules engine. Same approach works here, just targeting xlsx instead. Can start whenever.
$25 USD in 2 days
3.6
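The regex pass this bid describes for paragraph sections could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Label: value` pattern, the function name `extract_labeled_fields`, and the sample text are all hypothetical and are not taken from the client's PDFs. Real documents would need patterns tuned to their actual labels.

```python
import re

# Assumed pattern: a capitalised label, a colon, then a value that runs
# until the next semicolon or the end of the line.
LABEL_VALUE = re.compile(
    r"(?P<label>[A-Z][A-Za-z ]{1,30}?):[ \t]*(?P<value>[^;\n]+)"
)


def extract_labeled_fields(text):
    """Return (label, value) pairs in the order they appear in the text."""
    pairs = []
    for match in LABEL_VALUE.finditer(text):
        label = match.group("label").strip()
        value = match.group("value").strip()
        pairs.append((label, value))
    return pairs


# Made-up sample text, standing in for one extracted paragraph.
sample = "Invoice No: 1042; Customer: Acme Ltd\nStatus: Paid"
fields = extract_labeled_fields(sample)
```

Returning an ordered list of pairs rather than a dictionary keeps repeated labels and preserves the source order, which matters when the brief asks that order and context survive.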

Hi, I can extract data from mixed-format PDFs with high accuracy, handling text, tables, and complex layouts efficiently. I’ll deliver clean, well-structured, and error-free data ready for immediate use. Reliable, detail-focused work with fast turnaround — ready to start immediately. Best regards.
$20 USD in 1 day
3.0

Hi, I’m Sudhir. I understand this requires converting mixed-format PDFs (paragraphs, labels, and table-like data) into a clean, well-structured Excel format without losing any information. I’ll carefully extract and organize the data, ensuring proper column structure, consistent headers, and no misplaced or broken text. For sections where structure varies, I’ll manually review and format the data to preserve context and accuracy rather than relying only on automated tools. I use Excel and reliable extraction tools to balance speed with precision. Each file will be double-checked against the source to ensure completeness and clean formatting for easy filtering and analysis. I can start immediately and deliver accurate, well-organized Excel files within your timeline.
$20 USD in 7 days
3.2

With a wealth of experience in data entry, extraction, and financial management, I'm confident I can deliver on your unique project needs. My particular expertise lies in transforming complex data into cohesive and meaningful insights, turning the chaos of mixed-format PDFs into order within Excel workbooks. Fluent in various tools, including Power Query and OCR software, I have a keen eye for detail that ensures precise capture of every single data point with minimal risk of translation loss. But data extraction is only half the job; manual cleanup is equally crucial. Here, my meticulousness comes into play: every column, label, and narrative block will find its proper place, with no merged cells or stray line breaks and with consistent header naming for ease of filtering and analysis. Beyond just delivering tasks, my approach involves strategic thinking. Choose me not just as someone who can manage details, but as a reliable partner committed to helping you make sense of your numbers and grow your business.
$25 USD in 2 days
3.4

Your PDFs likely mix native text and scanned pages, so a single extraction method won't work across the board. I'd route each page through pdfplumber or Tesseract depending on content type, normalize to a DataFrame with page and region metadata, then output to Excel with a QA sheet flagging low-confidence rows. M1: pipeline + QA sheet, $14, 1d. M2: full batch run + delivery, $14, 1d.
$28 USD in 2 days
2.8
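The per-page routing and QA flagging outlined in this bid might be sketched as follows. This is a minimal illustration, not the bidder's pipeline: the `Row` fields, the `min_chars` cutoff, the 0.80 confidence threshold, and the sample rows are all assumptions, and the actual extraction calls (pdfplumber or Tesseract) are left out so the sketch stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class Row:
    page: int          # page number the text came from
    region: str        # e.g. "paragraph" or "table"
    text: str          # extracted text for this region
    confidence: float  # 1.0 for native text; OCR engines report lower values


def choose_method(native_text, min_chars=20):
    """Pick 'native' when the page has a usable text layer, else 'ocr'."""
    return "native" if len(native_text.strip()) >= min_chars else "ocr"


def flag_low_confidence(rows, threshold=0.80):
    """Return the rows that would be listed on the QA sheet."""
    return [row for row in rows if row.confidence < threshold]


# Made-up rows: one from a native text layer, one from a hypothetical OCR pass.
rows = [
    Row(page=1, region="paragraph", text="Status: Paid", confidence=1.0),
    Row(page=2, region="table", text="Tota1 1O42", confidence=0.62),
]
qa_rows = flag_low_confidence(rows)
```

Keeping page and region on every row means a flagged value can be traced back to its place in the source PDF during manual review.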

Using my extensive experience in Python, OCR, and data manipulation, I am uniquely qualified to turn your mixed-format PDFs into clean and structured Excel workbooks. I understand the intricacies of various PDF designs and layouts, having processed similar data before. My goal is always to maintain the integrity of the data without any loss. Drawing on my expertise in Machine Learning, I will utilize intelligent algorithms to automate as much of the process as possible, ensuring efficiency and accuracy in capturing data from your PDFs. While automation will be central to our approach, I also understand that manual cleanup is sometimes unavoidable for complex datasets like yours. Rest assured, I am more than comfortable with this aspect of the job. In conclusion, with a blend of advanced Python skills and a knack for intricate data management, choosing me means entrusting your project to someone who sees each assignment as an opportunity to combine cutting-edge analytics with a passion for problem-solving. I pledge to deliver well-structured .xlsx files that perfectly represent your original PDFs; headers intact and no stray line breaks or merged cells – just clean and ready for your filtering and analysis. Looking forward to working together!
$10 USD in 7 days
1.9

Hello There, Mixed-format PDFs are genuinely tricky precisely because no single extraction strategy handles both narrative blocks and ad-hoc table sections reliably. The ones that fail are usually pipelines that treat the entire document the same way regardless of what is on each page. My approach uses PyMuPDF to detect and separate layout zones per page before any extraction runs. Tabular sections go through pdfplumber, which preserves column alignment accurately. Narrative and label-style blocks are parsed separately with rule-based logic that maps scattered fields into consistently named headers. Everything then flows into pandas for a cleanup pass that removes stray line breaks, resolves merged cell artifacts, and normalises header naming before the final .xlsx is written with openpyxl. The result is one clean, filter-ready workbook per source PDF with no structural surprises at the other end. As a Digital Operations Specialist with hands-on Python scripting and document processing experience, delivering tidy spreadsheets from complex source files is a standard part of my workflow, consistently at 98%+ accuracy. How many PDFs are in the set and what is your target deadline? Looking forward to working with you. CTS19966991
$15 USD in 1 day
1.8
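The cleanup pass mentioned in this bid (removing stray line breaks and normalising header names) could be approximated as in the sketch below. It uses only the standard library instead of pandas, and the snake_case header convention, the function names, and the sample table are assumptions chosen for illustration. Since the brief asks for fields "exactly as they appear", any header convention would need to be agreed with the client first.

```python
import re


def clean_cell(value):
    """Collapse line breaks and repeated whitespace inside one cell value."""
    return re.sub(r"\s+", " ", value).strip()


def normalise_header(name):
    """Lower-case a header and join its words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(word.lower() for word in words)


def clean_table(headers, rows):
    """Apply header and cell cleanup to a table held as lists of strings."""
    new_headers = [normalise_header(header) for header in headers]
    new_rows = [[clean_cell(cell) for cell in row] for row in rows]
    return new_headers, new_rows


# Made-up table with the kinds of artifacts PDF extraction tends to leave.
headers, rows = clean_table(
    ["Invoice No.", " Customer\nName "],
    [["1042", "Acme\nLtd "]],
)
```

The cleaned lists can then be written out with any spreadsheet writer; that step is omitted here.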

Hi, I can write a Python script for that and extract the data. Can you provide an example PDF? Thanks, Dorin.
$30 USD in 7 days
1.5

Hi there, I’ve handled mixed-format PDF extraction projects where data appears in paragraphs, scattered labels, and irregular table-like structures. I can convert your PDFs into clean, structured Excel files with zero data loss.
My Approach:
- Use Python (pdfplumber / PyMuPDF / OCR if needed) for accurate extraction
- Detect and separate narrative text vs. row-style data
- Reconstruct logical columns and preserve context/order
- Perform manual validation & cleanup to ensure accuracy
What You’ll Get:
✔ One well-structured .xlsx per PDF
✔ Consistent headers and clean column formatting
✔ No merged cells or broken line issues
✔ Data ready for filtering, sorting, and analysis
I pay close attention to detail, especially with irregular layouts, ensuring every data point is captured exactly as it appears. I can start immediately and deliver quickly within your budget. If you’d like, I can process a sample PDF first to demonstrate accuracy. Best regards, Oluwatobi Okedairo
$10 USD in 1 day
0.4

Hi, I can help you convert your PDFs into clean, well-structured Excel files while preserving all the original information. I understand that your documents contain a mix of narrative text and table-like sections, so I will carefully organize the data to ensure nothing is lost and everything remains clear, consistent, and ready for analysis. I will structure the workbook with proper headers, avoid formatting issues such as merged cells or broken lines, and make sure the final result is easy to filter and use. I’m detail-oriented and will take the time to maintain accuracy across every data point. I’m ready to start immediately and can adapt to your preferred format or structure if needed. Best regards
$12 USD in 2 days
0.0

Hello, I can help you extract and organize data from your PDFs into a clean, structured Excel file for easy analysis. Approach: Automated Capture: I will use Python (with libraries like PyPDF2 or pdfplumber) or Power Query to automate the extraction of textual data from narrative blocks and ad-hoc table-like sections. Manual Cleanup: I will carefully review and clean up the data, ensuring that all columns and fields are properly structured, with consistent header names and no stray line breaks or merged cells. Excel Output: The result will be a well-organized .xlsx file that retains the order and context of the original PDF, ready for filtering and analysis. Deliverable: One clean, structured .xlsx file per source PDF. I have experience handling mixed-format PDFs and can ensure a quick turnaround with high accuracy. Best regards, Mark
$20 USD in 2 days
0.0

I will extract every data point from your mixed-format PDFs into clean, well-structured Excel workbooks with consistent headers and no merged cells. Automated tools plus manual QA for accuracy. Ready to start now.
$20 USD in 2 days
0.0

Hello, I specialize in OCR and document AI workflows, with experience converting mixed-format PDFs into clean, structured Excel files. Your requirement to extract textual data from paragraphs, scattered labels, and row-style groupings aligns closely with projects I have handled. I have built automated pipelines to process heterogeneous PDFs containing narrative text and table-like sections. These workflows combine OCR and layout parsing to capture all data points accurately, followed by structured formatting and careful manual validation to preserve order, context, and field relationships. For projects like yours, I typically use Python-based workflows with OCR and layout analysis to extract text, normalize fields, and export well-structured Excel outputs. I ensure consistent headers, no merged cells, removal of stray line breaks, and data arranged for easy filtering and analysis. Technical Stack: Python, OpenCV, PyTorch-based OCR, LayoutParser/DocTR workflows, Excel automation, structured data extraction. What I Can Deliver: • Accurate extraction from mixed-format PDFs • Clean, well-structured .xlsx files per source PDF • Consistent headers and properly aligned fields • Manual validation to ensure no data loss I would be happy to review a sample PDF and provide a quick test output to confirm accuracy before starting. Best regards, Santosh
$20 USD in 3 days
0.0

Ahmedabad, India
Member since Jul 21, 2019