Local word grouping: NLP
$250-750 USD
Paid on delivery
Overview of the task:
This project is to find a particular type of word group in any given Hindi text whose words are labeled with their parts of speech. It is a natural language processing (NLP) task.
It requires a high level of programming knowledge in Java (especially string/text processing).
Description of the task:
Input: Parts of Speech (PoS) tagged text, one sentence per line.
Desired Output: The same text with the boundaries of each verb group marked and labeled with a special predefined tag.
Example Input text:
वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़ेगा\VM.mas.sg.3.fut.sim.dcl.fin.n ।\PU
वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़ता\VM.mas.sg.3.0.impf.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\PU
वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़\VM.0.0.0.0.0.0.nfn.n रहा\VAUX.mas.sg.3.0.pft.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\PU
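A minimal sketch of how one input line might be read, assuming that tokens are separated by whitespace and that the last backslash in a token separates the word from its tag. The class name TaggedSentenceParser and the Token type are hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedSentenceParser {

    /** One word with its full tag string, e.g. tag "VAUX.0.sg.3.prs.sim.dcl.fin.n". */
    public static final class Token {
        public final String word;
        public final String tag;

        Token(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }

        /** Coarse category: the part of the tag before the first dot (e.g. "VM", "VAUX", "PU"). */
        public String category() {
            int dot = tag.indexOf('.');
            return dot < 0 ? tag : tag.substring(0, dot);
        }
    }

    /** Splits one sentence line into tokens; a token without a backslash gets an empty tag. */
    public static List<Token> parse(String line) {
        List<Token> tokens = new ArrayList<>();
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank input line
            }
            int sep = item.lastIndexOf('\\');
            if (sep < 0) {
                tokens.add(new Token(item, ""));
            } else {
                tokens.add(new Token(item.substring(0, sep), item.substring(sep + 1)));
            }
        }
        return tokens;
    }
}
```

For the third example line above, parse would return five tokens whose categories are PPR, NC, VM, VAUX and PU. The input file should be read as UTF-8 so that the Devanagari text survives intact.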
Example Output text:
वह किताब [पढ़ेगा][url removed, login to view] ।\PU
वह किताब [पढ़ता है][url removed, login to view] ।\PU
वह किताब [पढ़ रहा है][url removed, login to view] ।\PU
or
वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़ेगा\VM.mas.sg.3.fut.sim.dcl.fin.n][url removed, login to view] ।\PU
वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़ता\VM.mas.sg.3.0.impf.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n][url removed, login to view] ।\PU
वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़\VM.0.0.0.0.0.0.nfn.n रहा\VAUX.mas.sg.3.0.pft.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n][url removed, login to view] ।\PU
(Note: If you see junk text instead of the Hindi characters in the examples above, please see the attached file.)
Other Details:
There are a total of 192 tags that can be assigned as the boundary marker for the verb group.
More details will follow. A brief outline of the algorithm is as follows:
1. Start searching for a verbal word from the right of the sentence boundary.
2. When a verbal word is found, match it against a template and store the match.
3. Continue the search rightward for more verbal words, matching against the TAM (tense-aspect-mood) templates, until the last verbal word in the sequence is reached.
4. Among the templates matched, choose the longest verb sequence that matches a TAM template and mark its boundary with square brackets ‘[ ]’.
5. Append the tag of the matched TAM template after the closing square bracket, prefixed with /VG./.
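The steps above could be sketched roughly as follows. This is only an illustration under several assumptions, not the client's specification: the sentence is scanned left to right, a word counts as verbal when its tag starts with VM or VAUX (as in the examples), and a template is keyed by the sequence of coarse categories in the group. The real list of 192 TAM templates is not given in this posting, so the three entries in TEMPLATES and the labels T1, T2 and T3 are hypothetical placeholders. The class name VerbGroupMarker is likewise made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VerbGroupMarker {

    // HYPOTHETICAL templates: key = space-joined coarse categories, value = placeholder label.
    // The actual TAM templates and their 192 tags would replace these entries.
    static final Map<String, String> TEMPLATES = new LinkedHashMap<>();
    static {
        TEMPLATES.put("VM", "T1");
        TEMPLATES.put("VM VAUX", "T2");
        TEMPLATES.put("VM VAUX VAUX", "T3");
    }

    /** Coarse category of a word\tag token: the tag text before its first dot. */
    static String category(String token) {
        int sep = token.lastIndexOf('\\');
        if (sep < 0) {
            return "";
        }
        String tag = token.substring(sep + 1);
        int dot = tag.indexOf('.');
        return dot < 0 ? tag : tag.substring(0, dot);
    }

    /** Assumption: only VM and VAUX tokens are verbal. */
    static boolean isVerbal(String token) {
        String c = category(token);
        return c.equals("VM") || c.equals("VAUX");
    }

    /** Brackets the longest template-matching verb sequence and appends /VG.<label>. */
    public static String mark(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            StringBuilder key = new StringBuilder();
            int bestEnd = -1;
            String bestLabel = null;
            // Extend over consecutive verbal words, remembering the longest match so far.
            for (int j = i; j < tokens.length && isVerbal(tokens[j]); j++) {
                if (key.length() > 0) {
                    key.append(' ');
                }
                key.append(category(tokens[j]));
                String label = TEMPLATES.get(key.toString());
                if (label != null) {
                    bestEnd = j;
                    bestLabel = label;
                }
            }
            if (bestEnd < 0) {
                out.add(tokens[i]); // not verbal, or no template matched: copy unchanged
                i++;
            } else {
                List<String> group = Arrays.asList(tokens).subList(i, bestEnd + 1);
                out.add("[" + String.join(" ", group) + "]/VG." + bestLabel);
                i = bestEnd + 1;
            }
        }
        return String.join(" ", out);
    }
}
```

With these placeholder templates, a line whose tokens carry the categories PPR, NC, VM, VAUX, VAUX and PU would come back with the three verbal tokens wrapped in one pair of brackets followed by /VG.T3. This corresponds to the second output variant, in which the original tags are kept. Producing the first variant would only require stripping the tag from each token before joining.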
Project ID: #992469
About the project
Awarded to:
7 freelancers bid an average of $546 for this job
Hi, I'm a professional Java developer (SCJP 6) with experience in NLP and text processing. Contact me to discuss it further.
Hello! I have been developing in Java and JSP since 2000, so I have extensive experience. I would be thrilled to do this project for you.