PDF Parser-Indexer

I need a solution, utility, or set of utilities that can:

a) index a large set of PDF files (a combination of single-page and multi-page files)

b) dump the indexed information into a data structure, ideally a MySQL/Access/MSSQL/PostgreSQL database

c) allow manipulation of the data collected

d) generate JPEGs and image maps based on the manipulated data

e) generate new PDFs based on the manipulated data

The script/utility would normally be an iterative solution that does the following on a set of PDF files (approx. 400 PDFs):

1. Read the PDF files

2. Get all text from PDF into a Database

3. Save all images from the PDF as separate files and record them in an Excel/MySQL DB

4. Get all web hyperlinks from PDF

5. Get all email hyperlinks from PDF

6. Convert each page of PDF into a JPEG
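
Steps 4 and 5 above separate web links from email links. A minimal sketch of that split in Python, assuming the link targets have already been pulled out of the PDF as URI strings by whichever PDF library is chosen (the extraction itself is not shown, and `classify_link` is a hypothetical helper name, not part of any library):

```python
from urllib.parse import urlsplit


def classify_link(uri: str) -> str:
    """Return 'email', 'web', or 'other' based on the URI scheme."""
    scheme = urlsplit(uri.strip()).scheme.lower()
    if scheme == "mailto":
        return "email"
    if scheme in ("http", "https"):
        return "web"
    return "other"


# Illustrative link targets only; real ones would come from the parser.
links = [
    "https://example.com/catalogue",
    "mailto:sales@example.com",
    "file:///tmp/local.pdf",
]

# Group the links by kind so each group can be stored separately.
by_kind = {}
for uri in links:
    by_kind.setdefault(classify_link(uri), []).append(uri)
```

Targets without a scheme (for example a bare `www.example.com`) fall into `other` here; whether to treat those as web links is a design decision left open.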

For each data item (text, hyperlink, image), the database should hold the following information:

1. The page (PDF page) and file in which the data item is found.

2. The position (starting and ending coordinates) and size (width/height) of each data item on the PDF page.
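
One possible shape for such a record is sketched below. This is only an illustration: SQLite (from the Python standard library) stands in for the MySQL/PostgreSQL database mentioned above, and the table name, column names and helper functions are all assumptions rather than part of the requirement.

```python
import sqlite3

# Hypothetical schema: one row per data item, with file, page and bounding box.
SCHEMA = """
CREATE TABLE data_item (
    id        INTEGER PRIMARY KEY,
    file_name TEXT    NOT NULL,
    page_no   INTEGER NOT NULL,
    kind      TEXT    NOT NULL
              CHECK (kind IN ('text', 'web_link', 'email_link', 'image')),
    content   TEXT,
    x0 REAL NOT NULL, y0 REAL NOT NULL,   -- starting coordinates
    x1 REAL NOT NULL, y1 REAL NOT NULL    -- ending coordinates
);
"""


def open_index(path=":memory:"):
    """Open (or create) the index database and create the table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_item(conn, file_name, page_no, kind, content, box):
    """Insert one data item and return its row id."""
    x0, y0, x1, y1 = box
    cur = conn.execute(
        "INSERT INTO data_item "
        "(file_name, page_no, kind, content, x0, y0, x1, y1) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (file_name, page_no, kind, content, x0, y0, x1, y1),
    )
    conn.commit()
    return cur.lastrowid


def item_size(conn, item_id):
    """Return (width, height), derived from the stored coordinates."""
    return conn.execute(
        "SELECT x1 - x0, y1 - y0 FROM data_item WHERE id = ?", (item_id,)
    ).fetchone()


conn = open_index()
item_id = add_item(conn, "brochure.pdf", 3, "web_link",
                   "https://example.com", (72.0, 700.0, 216.0, 712.0))
width, height = item_size(conn, item_id)
```

Storing only the two corner points and deriving width and height on demand avoids keeping redundant values in sync; storing the size explicitly would work equally well.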

Based on the information stored in the database (the parser results), either through the same utility or a separate one, the following should be possible:

1. Enter additional customizable information connected to data items, possibly in another table or database (comments such as enabled, category, section, location, address, etc.).

2. Based on this additional information, create image maps on the JPEGs or generate new PDFs.
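
As a rough illustration of the image-map step, the sketch below turns stored item boxes into an HTML `<map>` for a rendered page JPEG. It assumes the boxes are in default PDF user space (points, 72 per inch, origin at the bottom-left of the page), that the page is not rotated or cropped, and that the JPEG was rendered at a known DPI. The dictionary keys (`box`, `href`, `category`, `enabled`) and both function names are hypothetical.

```python
from html import escape


def pdf_box_to_pixels(box, page_height_pt, dpi=150):
    """Convert a PDF box (points, bottom-left origin) to pixel
    coordinates (top-left origin) as (left, top, right, bottom)."""
    x0, y0, x1, y1 = box
    scale = dpi / 72.0  # 72 points per inch in default user space
    left = round(x0 * scale)
    right = round(x1 * scale)
    # The y axis is flipped: the top of the box is measured from y1.
    top = round((page_height_pt - y1) * scale)
    bottom = round((page_height_pt - y0) * scale)
    return left, top, right, bottom


def build_image_map(name, items, page_height_pt, dpi=150):
    """Build an HTML image map, skipping items marked as not enabled."""
    lines = [f'<map name="{escape(name)}">']
    for item in items:
        if not item.get("enabled", True):
            continue
        coords = ",".join(
            str(v) for v in pdf_box_to_pixels(item["box"], page_height_pt, dpi)
        )
        lines.append(
            f'  <area shape="rect" coords="{coords}" '
            f'href="{escape(item["href"])}" alt="{escape(item["category"])}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


# Illustrative items; in practice these would be read from the database.
items = [
    {"box": (72.0, 700.0, 216.0, 712.0), "href": "https://example.com",
     "category": "website", "enabled": True},
    {"box": (72.0, 650.0, 216.0, 662.0), "href": "mailto:sales@example.com",
     "category": "contact", "enabled": False},
]
image_map = build_image_map("page3", items, page_height_pt=792.0, dpi=144)
```

The resulting markup would be attached to the page image with `usemap="#page3"`. Handling rotated pages or a crop box that does not start at the origin would need an extra transform not shown here.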

The solution should be very generic and reusable. It could be a server-based solution or executable solution.

The solution needs to be open source, as further work on how best to use the software in different scenarios, and on assessing its limitations, will build on the script's architecture and structure. Usage of the code will be agreed with the solution provider.

This is phase one of the project and is the framework for future work (normally much easier than this phase).

The solution provider is expected to fully understand the script developed, as the same provider will be kept for future enhancements and add-ons (planned: phases 2, 3 and 4).

Skills: Java, Linux, PHP, Visual Basic, Windows Desktop


About the Employer:
(0 reviews) Forest-Side, Mauritius

Project ID: #352370

11 freelancers are bidding on average $1076 for this job


We are experienced in this kind of work; please start.

$1500 USD in 7 days
(29 Reviews)

Hello. Experienced with PDF/JavaScript/Ajax/PHP/MySQL. Can do it with the best quality. Thanks. Alex.

$1270 USD in 17 days
(60 Reviews)

Hello, the project you need can be done fast and professionally. Please check PMB for more details.

$900 USD in 10 days
(1 Review)

I have done a similar project before for extracting information from PDFs. I am also fluent in Java-related technologies.

$817 USD in 10 days
(1 Review)

Ready to take on this project. I've made similar parsers for TIFF, EPS and simple PDF in the past using PHP.

$1100 USD in 19 days
(0 Reviews)

Please read your PMs for detailed information about my proposal.

$999 USD in 5 days
(0 Reviews)

Techno Softwares is an India-based IT solution provider. We have created such a tool to parse PDFs. Please refer to the PMB for more details.

$1500 USD in 15 days
(0 Reviews)

This is a natural for Perl on Linux, or C++ .NET on Windows. Either way, easy.

$750 USD in 5 days
(0 Reviews)

Dear sir, I can deliver the project using Java and the jPod libraries. All indexed information will be stored in a database via a JDBC driver (we can use MySQL, PostgreSQL, ...

$800 USD in 5 days
(0 Reviews)

Hi, I have worked on building a search engine where we needed to crawl, index and search all types of files. Let us do this and you will get the work as you [...] Please provide more details on the user interface and option ...

$1000 USD in 8 days
(0 Reviews)

Hello Sir, we have studied the requirements and are sure this software can be finished on time. Intechno is a software outsourcing company. We have successfully completed over 100 projects for our clients, including the Gl ...

$1200 USD in 14 days
(0 Reviews)