This is a challenging project and not for the *faint-hearted*. If you believe you are capable of delivering what is described here, please bid and include references (if any).
We need a proof-of-concept/prototype program that does the following. (Please note there is a high chance of further engagement should the prototype be successful and should our client award us the project.)
1. Physical documents will be scanned and OCRed.
2. A module developed by you will recognise key words from a dictionary and divide the text into sections (textual objects).
3. Another module will display the textual objects and allow the user to map them to columns in a destination table. The mappings and identified sections will be saved to a template. This module will also allow the user to modify any sections that were not identified correctly.
4. Subsequent scanned documents will be input directly into the database using the template from step 3.
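To give bidders a concrete starting point, here is a minimal sketch of steps 2 to 4. It assumes the OCR output is plain text in which table columns stay left-aligned (fixed-width), treats the first line containing two or more dictionary key words as the item-table header, and saves the header's column offsets as the "template". The dictionary, the column names (`quantity`, `description`, `unit_price`) and the functions `find_header` and `extract_rows` are all hypothetical, and this simple position-based approach is a stand-in, not the recognition module we are asking for.

```python
import re

# Hypothetical keyword dictionary: words seen on documents -> destination column.
KEYWORDS = {
    "qty": "quantity",
    "quantity": "quantity",
    "item description": "description",
    "description": "description",
    "price": "unit_price",
    "cost": "unit_price",
    "item cost": "unit_price",
}


def find_header(lines):
    """Step 2 (sketch): return (line_index, {column: start_offset}) for the
    first line holding at least two dictionary key words, else None."""
    phrases = sorted(KEYWORDS, key=len, reverse=True)  # "item cost" before "cost"
    for i, line in enumerate(lines):
        lower = line.lower()
        found, taken = {}, []  # taken: character spans already claimed
        for phrase in phrases:
            for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lower):
                if any(m.start() < end and start < m.end() for start, end in taken):
                    continue  # inside a longer phrase that already matched
                taken.append((m.start(), m.end()))
                found.setdefault(KEYWORDS[phrase], m.start())
        if len(found) >= 2:
            return i, found
    return None


def extract_rows(lines, header_index, template):
    """Steps 3-4 (sketch): slice the lines under the header at the saved
    column offsets (the template); a blank line is assumed to end the table."""
    ordered = sorted(template.items(), key=lambda kv: kv[1])
    rows = []
    for line in lines[header_index + 1:]:
        if not line.strip():
            break
        row = {}
        for k, (column, start) in enumerate(ordered):
            end = ordered[k + 1][1] if k + 1 < len(ordered) else None
            row[column] = line[start:end].strip()
        rows.append(row)
    return rows
```

A real document set with right-aligned prices, wrapped descriptions or multi-page tables would break these assumptions, which is exactly where the heuristic work lies.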
There can be many document variants, in which key words appear in different locations and sections vary in length and width.
Typical document types: quotations, tenders.
Examples of key words: Qty, quantity, item description, price, cost, item cost.
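Because several key words mean the same thing and OCR can misread characters, one possible approach is to map each heading to a destination column through a synonym dictionary with approximate matching. The sketch below swaps in Python's standard `difflib` similarity ratio for illustration; the synonym table, the column names and `classify_heading` are hypothetical, and the 0.6 cutoff is simply `difflib`'s default, not a tuned value.

```python
import difflib

# Hypothetical synonym dictionary built from the example key words.
SYNONYMS = {
    "qty": "quantity",
    "quantity": "quantity",
    "item description": "description",
    "price": "unit_price",
    "cost": "unit_price",
    "item cost": "unit_price",
}


def classify_heading(text, cutoff=0.6):
    """Map an OCR'd heading to a destination column, tolerating small OCR
    misreads such as "Pr1ce"; returns None if no key word is close enough."""
    key = " ".join(text.lower().split())  # normalise case and spacing
    matches = difflib.get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[matches[0]] if matches else None
```

For example, `classify_heading("Pr1ce")` resolves to `unit_price`, while an unrelated heading such as "Delivery date" returns `None` and would be left for the user to map manually.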
+ We are paying for this prototype/proof-of-concept work. If our client accepts the work and proceeds with us to the project stage, you will be engaged for the extended work at a higher value.
+ Please take a look at the sample documents. Our understanding is that there can be hundreds of document variants.
+ The most crucial part of this prototype is the recognition module, which will require some cognitive/artificial intelligence routines.
+ One suggestion is to recognise the sections, wrap the text in XML tags, and then, in the designer module, allow visual rearrangement of the sections before mapping them to the columns of a database table (preferably SQL Server).
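The XML suggestion could look roughly like this: recognised sections are wrapped in tags, and a saved template (XML tag to table column) is later applied to turn each tagged item into a row ready for a parameterised INSERT. This is a sketch using Python's standard `xml.etree.ElementTree`; the element names (`document`, `section`, `item`), the `type` attribute, the template format and both functions are hypothetical choices, not a required schema.

```python
import xml.etree.ElementTree as ET


def sections_to_xml(sections):
    """Wrap recognised sections in XML tags (hypothetical schema).

    `sections` is a list of (kind, payload) pairs: payload is plain text, or
    for an item table a list of {column: value} dicts whose keys must be
    valid XML names (e.g. "unit_price", not "Unit Price")."""
    root = ET.Element("document")
    for kind, payload in sections:
        section = ET.SubElement(root, "section", type=kind)
        if isinstance(payload, str):
            section.text = payload
        else:
            for row in payload:
                item = ET.SubElement(section, "item")
                for column, value in row.items():
                    ET.SubElement(item, column).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text, template):
    """Apply a saved template {xml_tag: table_column} to every <item>,
    ignoring tags the template does not map."""
    root = ET.fromstring(xml_text)
    return [
        {template[child.tag]: child.text for child in item if child.tag in template}
        for item in root.iter("item")
    ]
```

Keeping the XML as the intermediate form means the designer module only edits tags and the template, while the database layer just consumes the resulting rows.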
**What we require from you**
Please reply with **a bid** and the **indicative timeline** you will require to develop this proof-of-concept. To help us make the best selection, please provide some high-level details on how you plan to approach this project.
**Additional notes: 22nd June 2004 (Please see platform section)**
1) Complete and fully functional working program(s) in executable form, as well as complete source code for all work done [all source code produced by the provider, with the exception of 3rd-party controls/DLLs].
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to exist in only one place in the Buyer's environment: deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. copyright law. The Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd-party components, etc., unless all copyright ramifications are explained AND AGREED TO by the Buyer on the site, per the coder's Seller Legal Agreement.)
**Additional notes: 22nd June 2004**
My developer has tried ScanSoft OmniPage OCR, and we have attached the result. As you can see, the OCR has done a good job on both accuracy and automatic layout. Assuming we can deliver results like this to you as an .rtf or .doc file, your heuristic program will take over from step 2 (see above).