We have two types of documents:
- multipage PDF files (may already contain OCR-detected text)
- multipage TIFF files (one TIFF with multiple pages in the file)
These documents contain the standardized patch code T separator pages.
Samples of the barcode (patch code T):
- [url removed, login to view] on page 11
- [url removed, login to view] on page 75
The PDF and TIFF files are scanned documents and may be skewed, so the detection of the barcode has to be similarity-based, for example Haar cascade detection.
Your job is to provide us with a Java implementation which:
- takes either a PDF file or a TIFF file as input (selectable by parameter)
- analyses the file page by page with OpenCV
- comes with documentation on how to train OpenCV/Haar and, of course, the trained files as well
- splits the file at the given patch code T pages into multiple files. The new files have the same file type as the original file
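To illustrate the splitting step above, here is a minimal sketch in plain Java. It assumes the detector has already produced one flag per page saying whether that page is a patch code T separator. The names `PatchSplitter` and `split` are hypothetical, and dropping the separator pages from the output is an assumption of this sketch, not a confirmed requirement.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups page indices into documents at separator pages. */
final class PatchSplitter {

    private PatchSplitter() {
    }

    /**
     * @param isSeparator one flag per page, true if the page is a patch code T page
     * @return one list of zero-based page indices per output document;
     *         separator pages are dropped (assumption of this sketch)
     */
    static List<List<Integer>> split(boolean[] isSeparator) {
        List<List<Integer>> documents = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int page = 0; page < isSeparator.length; page++) {
            if (isSeparator[page]) {
                // A separator closes the current document, if it has any pages.
                if (!current.isEmpty()) {
                    documents.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(page);
            }
        }
        if (!current.isEmpty()) {
            documents.add(current);
        }
        return documents;
    }
}
```

Each returned list can then be handed to the existing PDF or TIFF writer, so the output keeps the file type of the input.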
The reading and splitting of the PDF and TIFF files is already implemented via OpenPDF, but the detection of the barcodes is not implemented yet.
Ensure that the patch code page can have arbitrary content between the code lines (as in the samples).
Your delivered artefacts and their constraints:
- the full source code using JDK 8 features (preferably annotations, where possible)
- a fully working [url removed, login to view] to build, test and run the application
- a fully working [url removed, login to view] to build the application as a single jar (all dependency jars embedded inside this single jar)
- all sources have to be delivered to our self-hosted GitLab (Git repository and issue tracking). We will share the access after we award you the project
- all features of the Java SE and Java EE standards are allowed
- reusing existing solutions to already solved problems, such as Apache Commons, is strongly recommended
- additional libraries need to be mentioned prior to use to avoid licensing issues
- Spring is restricted for this project
- your code has to pass the standard checks in FindBugs, Checkstyle and PMD
Your delivered artefacts will run with/in:
- Eclipse Oxygen (no NetBeans, no IntelliJ IDEA)
- Maven 3.2
- Oracle JDK 8 update 111 or newer
Budgets and rates:
Place your offer independently of our offer. We disclose neither our budget nor the rates we are willing to pay.
Share your experience with OpenCV with us.
What AI algorithms/variants have you already used?
What image and object recognition systems do you know?
What image/object recognition can you demo to us?
In the meantime we have an implementation which works with org.bytedeco javacv-platform.
The requirements changed slightly:
- The rotation angle has to be determined if the page was skewed during printing or scanning
=> your implementation has to return the rotation angle
=> your implementation has to rotate the image to the correct angle
- the patch-code-free area contains a logo, which also needs to be detected
=> your implementation has to detect a given logo image (supplied as a PNG file) and has to check that the scanned logo has a minimum size in mm or inches
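For the minimum logo size, the detected bounding box in pixels has to be converted to a physical size using the scan resolution. A minimal sketch of that conversion follows. The names `LogoSize`, `pixelsToMm` and `meetsMinimum` are hypothetical, and it assumes the scan resolution in DPI is known, for example from the file metadata.

```java
/** Hypothetical helper: converts a detected logo size from pixels to millimetres. */
final class LogoSize {

    /** One inch is exactly 25.4 mm. */
    static final double MM_PER_INCH = 25.4;

    private LogoSize() {
    }

    /** Converts a length in pixels to millimetres for a given scan resolution. */
    static double pixelsToMm(int pixels, double dpi) {
        if (dpi <= 0) {
            throw new IllegalArgumentException("dpi must be positive");
        }
        return pixels / dpi * MM_PER_INCH;
    }

    /** Checks the detected bounding box against a minimum size given in millimetres. */
    static boolean meetsMinimum(int widthPx, int heightPx, double dpi,
                                double minWidthMm, double minHeightMm) {
        return pixelsToMm(widthPx, dpi) >= minWidthMm
                && pixelsToMm(heightPx, dpi) >= minHeightMm;
    }
}
```

For a minimum given in inches, multiply it by 25.4 before calling `meetsMinimum`.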
Here is a first possible solution for the angle: http://bit.ly/2B5Rg23
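Independently of the linked solution, the two angle requirements can be sketched with the JDK alone. This sketch assumes that two points on one edge of a patch code bar have already been located (how they are found is left to the detector), and it uses `java.awt` instead of OpenCV purely for illustration. The names `Deskew`, `skewAngleDegrees` and `deskew` are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper: estimates a skew angle and rotates the page back. */
final class Deskew {

    private Deskew() {
    }

    /**
     * Angle in degrees between the line (x1,y1)-(x2,y2) and the horizontal axis,
     * in image coordinates (y grows downwards). 0 means the bar edge is level.
     */
    static double skewAngleDegrees(double x1, double y1, double x2, double y2) {
        return Math.toDegrees(Math.atan2(y2 - y1, x2 - x1));
    }

    /**
     * Rotates the image around its centre by the negative skew angle.
     * The canvas size is kept, so corners may be cropped for large angles.
     */
    static BufferedImage deskew(BufferedImage src, double skewDegrees) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Fill with white so uncovered corners look like paper.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.rotate(Math.toRadians(-skewDegrees), w / 2.0, h / 2.0);
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Keeping the canvas size fixed is a simplification. For strongly skewed pages the output canvas would need to be enlarged to avoid cropping.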