I'm a sole contractor with an idea for a one-page website. The page accepts a PDF file dragged and dropped into a box, converts that PDF to text (so we're not talking OCR, just text extraction), and then uses regex pattern matching (or a better approach if there is one; happy to hear what you think) to pull a couple of key fields out of the document. In the first use case I'm thinking of, the document will be an invoice, so we'd extract something like the invoice number or a purchase order number. The page then returns these key fields as a JSON object printed in an output window on the same page.
In future, the intent would be to pass that JSON object into an ERP application via a web service, but for now I'd really like to get the basics working: the website, the drag and drop, the conversion to text, and returning the field data to the screen.
The attached PDF shows a PO number in bold, and that's the value I want the website to return in the JSON object. The pattern for the PO number is shown in the attachment.
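To make the extraction step concrete, here is a minimal sketch in TypeScript of pulling a PO number out of already-converted text and returning it as JSON. The real PO pattern lives in the attachment, so the `PO-` plus six digits format below is purely a hypothetical placeholder, and the names `PO_PATTERN`, `ExtractedFields`, and `extractFields` are made up for illustration.

```typescript
// HYPOTHETICAL placeholder pattern: the real PO format is defined in the
// attachment. "PO-" followed by exactly six digits is assumed here only
// so the sketch has something to match.
const PO_PATTERN = /\bPO-\d{6}\b/;

// Shape of the JSON object returned to the page (illustrative name).
interface ExtractedFields {
  poNumber: string | null;
}

// Takes the text converted from the PDF and returns the key fields.
// Returns null for a field when no match is found.
function extractFields(text: string): ExtractedFields {
  const match = PO_PATTERN.exec(text);
  return { poNumber: match ? match[0] : null };
}

// Example with made-up invoice text.
const sample = "Invoice 0042\nPurchase Order: PO-123456\nTotal: 100.00";
console.log(JSON.stringify(extractFields(sample), null, 2));
```

Swapping in the real pattern should only mean changing `PO_PATTERN`; further fields such as an invoice number could be added as extra properties on the same object.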
I've had a crack at this myself, but I don't have the programming skills to get the drag-and-drop box to accept the PDF file, read it, and then do something with it.
I don't know whether it is possible, but ideally I don't want the site to store a copy of the file, or any record at all of the file or its contents. It is just a utility that takes the file and returns the field data, and that's it: no record, no history, no backups or anything like that.
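One way the no-storage requirement could be met is to handle the dropped file entirely in the browser's memory, so nothing is ever uploaded. The TypeScript sketch below is only an illustration under that assumption: `FileLike`, `looksLikePdf`, and `readDroppedPdf` are hypothetical names, and the actual PDF-to-text conversion is left to whichever library ends up being used.

```typescript
// Minimal shape of a dropped file. Browser File objects satisfy this
// interface (hypothetical name, defined here so the sketch is self-contained).
interface FileLike {
  name: string;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// PDF files begin with the ASCII bytes "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Cheap sanity check that the bytes start with the PDF header.
function looksLikePdf(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((b, i) => bytes[i] === b)
  );
}

// Reads the dropped file into memory only. Nothing is sent to a server
// or written to storage by this function.
async function readDroppedPdf(file: FileLike): Promise<Uint8Array> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  if (!looksLikePdf(bytes)) {
    throw new Error(`${file.name} does not look like a PDF`);
  }
  // These bytes would then be passed to a PDF text-extraction library.
  return bytes;
}
```

In a browser, `readDroppedPdf` would typically be called from the box's `drop` event handler with `event.dataTransfer.files[0]`, after calling `preventDefault()` in both the `dragover` and `drop` handlers so the browser does not open the file itself. Once the page is closed or reloaded, the in-memory bytes are gone.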