I have a project that requires translating text from four languages into English.
I plan to implement it using open-source pre-trained models.
Pick up files from a folder, extract the text using Apache Tika, and push it to database 1.
Construct paragraphs if the structure is broken (if possible).
Using APIs, translate the text paragraph by paragraph and save it to database 2. (This is required.)
Assemble the text for each document from database 2 and push it to database 1 as translated text of the document.
Summarize the translated text using a pre-trained model and push the summary to database 1.
Extract important catchwords, semantics, etc., and push them to database 1.
Build a web interface to show a list of documents (listed by file heading, date, topic, or catchword).
Open the selected document in two tabs.
Tab 1: Para-wise original text and its full translation (the translation should be editable, with edits pushed to a separate location in database 2).
Tab 2: Full document text and translation, along with the summary, semantic analysis, catchwords, etc. (if the translation has been edited, the edited version should be shown here).
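To make the para-wise flow in the steps above more concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions: the function names, the record fields (doc_id, para_index, edited_translation) and the hyphen-joining heuristic are hypothetical, and the translate argument stands in for whatever pre-trained model or API ends up being used.

```python
import re
from typing import Callable, List


def rebuild_paragraphs(raw_text: str) -> List[str]:
    """Rejoin hard-wrapped lines into paragraphs (blank lines separate paragraphs)."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        text = lines[0]
        for ln in lines[1:]:
            if text.endswith("-"):
                # Heuristic: treat a trailing hyphen as a word split across lines.
                # This can wrongly merge genuinely hyphenated words.
                text = text[:-1] + ln
            else:
                text += " " + ln
        paragraphs.append(text)
    return paragraphs


def translate_paragraphs(
    doc_id: str, paragraphs: List[str], translate: Callable[[str], str]
) -> List[dict]:
    """Build one record per paragraph, as it might be stored in database 2."""
    return [
        {
            "doc_id": doc_id,
            "para_index": i,
            "original": para,
            "translation": translate(para),
            "edited_translation": None,  # filled in when a user edits Tab 1
        }
        for i, para in enumerate(paragraphs)
    ]


def assemble_translation(records: List[dict]) -> str:
    """Join paragraph translations in order, preferring an edited version if present."""
    ordered = sorted(records, key=lambda r: r["para_index"])
    parts = []
    for rec in ordered:
        edited = rec["edited_translation"]
        parts.append(edited if edited is not None else rec["translation"])
    return "\n\n".join(parts)


if __name__ == "__main__":
    # str.upper is a placeholder for a real translation call.
    paras = rebuild_paragraphs("First para-\ngraph here.\n\nSecond one.")
    recs = translate_paragraphs("doc1", paras, str.upper)
    print(assemble_translation(recs))
```

Keeping the edited text in its own field, rather than overwriting the machine translation, is one way to satisfy the requirement that edits live in a separate location while Tab 2 still shows the edited version.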
For more details leave a message.
The project is in Python, using standard libraries such as transformers, and I have already done about half of the work myself in terms of survey effort.
I will share the details with anyone who can help integrate everything with MongoDB and a web-based GUI.
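As a starting point for the MongoDB discussion, the records could look something like the sketch below. All field names and helper functions here are assumptions for illustration, not a settled schema. The dicts are plain Python, so with pymongo each one could be passed to a collection's insert_one method, and list_documents only mimics in memory the kind of filtering the listing page would need.

```python
from datetime import datetime, timezone
from typing import List, Optional


def make_document_record(doc_id: str, filename: str, original_text: str) -> dict:
    """Hypothetical shape of one per-file record in database 1."""
    return {
        "_id": doc_id,
        "filename": filename,
        "ingested_at": datetime.now(timezone.utc),
        "original_text": original_text,
        "translated_text": None,  # assembled later from database 2
        "summary": None,
        "catchwords": [],
        "topics": [],
    }


def make_edit_record(doc_id: str, para_index: int, edited_translation: str) -> dict:
    """Hypothetical shape of a user edit, kept apart from the machine translation."""
    return {
        "doc_id": doc_id,
        "para_index": para_index,
        "edited_translation": edited_translation,
        "edited_at": datetime.now(timezone.utc),
    }


def list_documents(records: List[dict], catchword: Optional[str] = None) -> List[dict]:
    """In-memory stand-in for the listing query: filter by catchword, newest first."""
    if catchword is not None:
        records = [r for r in records if catchword in r["catchwords"]]
    return sorted(records, key=lambda r: r["ingested_at"], reverse=True)
```

In MongoDB itself, a filter such as {"catchwords": "tax"} matches documents whose catchwords array contains that value, so the same listing could be done server-side.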
This will be a proof of concept for a hardware project.