I want to extract titles from PDF pages and match them against a search query. See the attached file for an example.
In the attached file, searching for "Balance Sheet" should return page 232.
So the input is a string and the output is a page number (an integer).
Note that "balance sheet" appears in multiple places, but we want to return only the pages where it appears in a title.
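One way to separate title hits from body-text hits is a font-size heuristic. The sketch below assumes each extracted line carries its page number, text, largest font size, and a flag saying whether the whole line is bold; a line counts as a title if it is bold or at least 20% larger than the page's most common font size. The `Line` record, the 1.2 ratio, and the bold rule are all assumptions to tune against real documents, and the sample lines are made up. It returns every matching page; take the first element if a single integer is required.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical record for one extracted text line."""
    page: int          # 1-based page number
    text: str
    font_size: float   # largest glyph size on the line, in points
    bold: bool = False # assumed: True only if every glyph on the line is bold


def normalize(s: str) -> str:
    # Case-insensitive comparison with collapsed whitespace.
    return " ".join(s.lower().split())


def title_pages(lines: list[Line], query: str, size_ratio: float = 1.2) -> list[int]:
    """Return sorted pages where `query` occurs in a title-like line (heuristic)."""
    q = normalize(query)

    # Estimate each page's body font size as its most common line size.
    sizes_by_page = {}
    for ln in lines:
        sizes_by_page.setdefault(ln.page, Counter())[round(ln.font_size, 1)] += 1
    body_size = {p: c.most_common(1)[0][0] for p, c in sizes_by_page.items()}

    hits = set()
    for ln in lines:
        is_title = ln.bold or ln.font_size >= body_size[ln.page] * size_ratio
        if is_title and q in normalize(ln.text):
            hits.add(ln.page)
    return sorted(hits)


# Made-up sample: page 231 only mentions the phrase in body text.
lines = [
    Line(231, "Notes to the accounts", 10.0),
    Line(231, "see the Balance Sheet on page 232", 10.0),
    Line(231, "more body text", 10.0),
    Line(232, "Balance Sheet", 16.0),
    Line(232, "as at 31 March", 10.0),
    Line(232, "Assets", 10.0),
]
print(title_pages(lines, "balance sheet"))  # [232]
```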
If you have used pdfminer before, this should be easy for you. I'm also open to other core languages, such as Java.
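For the pdfminer route, a rough sketch with pdfminer.six's `extract_pages` could collect, for every text line, its page number, text, largest glyph size, and whether all glyphs come from a font whose name contains "bold". Detecting bold by font name and the tuple layout are assumptions, not something pdfminer reports directly; the output can feed a title-detection heuristic. The import is done inside the function so the module loads even without pdfminer.six installed (`pip install pdfminer.six`).

```python
from collections.abc import Iterator


def extract_lines(pdf_path: str) -> Iterator[tuple[int, str, float, bool]]:
    """Yield (page, text, max_font_size, all_bold) for each text line in the PDF."""
    # Lazy import: only needed when the generator is actually consumed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LAParams, LTChar, LTTextBox

    for page_no, page_layout in enumerate(
        extract_pages(pdf_path, laparams=LAParams()), start=1
    ):
        for element in page_layout:
            if not isinstance(element, LTTextBox):
                continue
            for text_line in element:
                # Keep real glyphs; skip LTAnno (virtual spaces/newlines).
                chars = [c for c in text_line if isinstance(c, LTChar)]
                if not chars:
                    continue
                text = text_line.get_text().strip()
                size = max(c.size for c in chars)
                # Assumption: bold fonts usually have "Bold" in their name.
                bold = all("bold" in c.fontname.lower() for c in chars)
                yield page_no, text, size, bold
```

On long reports, `extract_pages` also accepts `page_numbers` (zero-based indices) and `maxpages` to limit how much is parsed, though the page counter above would then need to map back to the original page indices.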
You can also explore the pdftitle library, if it works for this.
Speed and accuracy matter most. We tried PyPDF, but it was not accurate enough, so keep that in mind.
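If the same document is searched many times, one way to keep queries fast is to detect titles once and then search only that short list. The hypothetical `TitleIndex` below assumes titles are already available as (page, title_text) pairs; for accuracy it lowercases, replaces punctuation with spaces, and collapses whitespace, so "BALANCE-SHEET" still matches "Balance Sheet". The class name, the normalization rules, and the sample titles are illustrative assumptions.

```python
import re
from collections import defaultdict


def normalize(s: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", s.lower()).split())


class TitleIndex:
    """Hypothetical helper: build once per PDF, then answer many queries."""

    def __init__(self, titles):
        # titles: iterable of (page_number, title_text) pairs.
        self._pages_by_title = defaultdict(set)
        for page, text in titles:
            self._pages_by_title[normalize(text)].add(page)

    def search(self, query: str) -> list[int]:
        # Scans only titles, which are far fewer than all text lines.
        q = normalize(query)
        pages = set()
        for title, title_pages in self._pages_by_title.items():
            if q in title:
                pages.update(title_pages)
        return sorted(pages)


# Made-up titles for illustration.
index = TitleIndex([
    (12, "Directors' Report"),
    (232, "Balance Sheet"),
    (233, "Statement of Profit and Loss"),
    (240, "Consolidated Balance Sheet"),
])
print(index.search("balance sheet"))  # [232, 240]
```

Substring matching means "balance sheet" also hits "Consolidated Balance Sheet"; requiring an exact title match instead would be a stricter alternative.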
We can provide many other example documents if needed.