I would like to have a similar package for table, image and text extraction, where tables, images and text are the sub-packages. Say we create a package named BLA: we should be able to extract tables from a particular PDF using a single line of code, for example:
from BLA import [login to view URL]
page_number will be another module, which extracts tables from a particular page, or from all pages if left at its default. Something like this:
random_name=BLA.tables.txt. page_number (name of the pdf file)
This should return the tables that are present on that particular page. The same should work for images and text.
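To make the "one page or all pages by default" behaviour concrete, here is a rough sketch of how it could be structured. All names in it (select_pages, extract_tables, the backend callable, page_count) are hypothetical and only for illustration. In a real package the backend would wrap an existing library and the page count would be read from the PDF itself.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical backend signature: (pdf_path, page) -> tables found on that page.
# In a real package this would wrap an existing extraction library.
Backend = Callable[[str, int], List[list]]


def select_pages(page_number: Optional[int], page_count: int) -> List[int]:
    """Return the 1-based pages to process; None (the default) means all pages."""
    if page_number is None:
        return list(range(1, page_count + 1))
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page_number must be between 1 and {page_count}")
    return [page_number]


def extract_tables(
    pdf_path: str,
    page_count: int,
    backend: Backend,
    page_number: Optional[int] = None,
) -> Dict[int, List[list]]:
    """Collect tables per page, keyed by page number."""
    return {
        page: backend(pdf_path, page)
        for page in select_pages(page_number, page_count)
    }


def fake_backend(pdf_path: str, page: int) -> List[list]:
    """Stand-in backend so the sketch runs without any PDF library installed."""
    return [[["page", str(page)]]]


# One call for a single page, one call for every page (default).
one_page = extract_tables("report.pdf", page_count=3, backend=fake_backend, page_number=2)
all_pages = extract_tables("report.pdf", page_count=3, backend=fake_backend)
```

The images and text sub-packages could follow the same pattern with their own backends, so the page handling is written only once.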
There are many open-source packages, like Tabula-py for table [login to view URL] for tables, text and also images, I think. pdfminer is used only for text extraction, and then there is Camelot, again for tables. I hope this makes it a bit clearer what is to be done in the project.