The objective is to create a command-line tool that converts a structured PDF file containing publicly available voter rolls from India into text.
The output will be stored in a CSV file. Further details are in the attached instruction file.
Some points to bear in mind:
- The text is in the Devanagari script (the language is Hindi)
- The voter rolls are arranged in a grid (3 columns and n rows) - see attached PDF
- A simple copy-paste from the PDF to text is known to lose fidelity, so the extracted text may not match what the PDF displays
- The tool is expected to be run on a Linux system and take two command-line parameters: the path + file name of the source PDF and the path + file name of the output CSV file
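
The points above could be sketched roughly as follows. This is only an illustrative outline in Python using the standard library, not the required implementation. All names here (`parse_args`, `extract_cells`, `grid_order`, `write_csv`) are hypothetical, the PDF extraction step is deliberately left as a placeholder because no library is specified in this brief, and the grid logic assumes every row except possibly the last is full and that the y coordinate grows downwards. The final CSV columns are defined in the attached instruction file, not here.

```python
import argparse
import csv
import unicodedata
from pathlib import Path


def parse_args(argv=None):
    """Read the two required parameters: source PDF and output CSV."""
    parser = argparse.ArgumentParser(description="Convert a voter-roll PDF to CSV.")
    parser.add_argument("source_pdf", type=Path, help="path and file name of the source PDF")
    parser.add_argument("output_csv", type=Path, help="path and file name of the output CSV")
    return parser.parse_args(argv)


def extract_cells(source_pdf):
    """Placeholder: should return (x, y, text) tuples, one per voter box.

    The extraction method (text layer or OCR) is an open choice and is
    intentionally not implemented in this sketch.
    """
    raise NotImplementedError("PDF extraction is not part of this sketch")


def grid_order(cells, columns=3):
    """Arrange (x, y, text) cells into rows of `columns`, left to right.

    Assumption: rows are full (except possibly the last), so sorting by y
    and cutting into chunks of `columns` recovers the grid rows.
    """
    by_y = sorted(cells, key=lambda cell: cell[1])
    rows = [by_y[i:i + columns] for i in range(0, len(by_y), columns)]
    return [
        # NFC normalization gives Devanagari text one consistent Unicode form.
        [unicodedata.normalize("NFC", text) for _, _, text in sorted(row, key=lambda cell: cell[0])]
        for row in rows
    ]


def write_csv(rows, output_csv):
    """Write rows as UTF-8 so the Devanagari text survives intact."""
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def main(argv=None):
    args = parse_args(argv)
    cells = extract_cells(args.source_pdf)
    write_csv(grid_order(cells), args.output_csv)
```

On Linux, a script built along these lines (saved under a hypothetical name such as `rolls2csv.py`, with `main()` called under the usual `if __name__ == "__main__":` guard) would be run as `python3 rolls2csv.py /path/to/rolls.pdf /path/to/output.csv`.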