Local word grouping: NLP

Completed Posted Mar 21, 2011 Paid on delivery

Overview of the task:

The goal of this project is to find verbal words (verb groups) in any given Hindi text whose words are labeled with their parts of speech. It is a natural language processing (NLP) task.

It requires a high level of programming knowledge in Java, especially string/text processing.

Description of the task:

Input: Parts of Speech (PoS) tagged text, one sentence per line.

Desired Output: The same text with the boundary of each verbal word group marked and labeled with a special predefined tag.

Example Input text:

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़ेगा\VM.mas.sg.3.fut.sim.dcl.fin.n ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़ता\VM.mas.sg.3.0.impf.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 पढ़\VM.0.0.0.0.0.0.nfn.n रहा\VAUX.mas.sg.3.0.pft.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\PU
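As an illustration of the input format, here is a minimal Java sketch that splits one line into word/tag pairs. It assumes that tokens are separated by whitespace, that the last backslash in a token separates the word from its tag, and that the part of the tag before the first dot is the coarse category (for example VM or VAUX). The class name `TaggedToken` and its methods are hypothetical and not part of the project brief.

```java
import java.util.ArrayList;
import java.util.List;

/** One token of a PoS-tagged sentence: the surface word plus its full tag string. */
final class TaggedToken {
    final String word;
    final String tag;

    TaggedToken(String word, String tag) {
        this.word = word;
        this.tag = tag;
    }

    /** Coarse category: the part of the tag before the first '.', e.g. "VM". */
    String category() {
        int dot = tag.indexOf('.');
        return dot < 0 ? tag : tag.substring(0, dot);
    }

    /** Splits a line on whitespace, then splits each token at its last backslash. */
    static List<TaggedToken> parseLine(String line) {
        List<TaggedToken> tokens = new ArrayList<>();
        for (String raw : line.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank line
            }
            int sep = raw.lastIndexOf('\\');
            if (sep < 0) {
                // Assumption: a token without a tag is kept with an empty tag.
                tokens.add(new TaggedToken(raw, ""));
            } else {
                tokens.add(new TaggedToken(raw.substring(0, sep), raw.substring(sep + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "वह\\PPR.0.sg.3.dir.0.n.n.0.n किताब\\NC.fem.sg.dir.0 "
                + "पढ़ेगा\\VM.mas.sg.3.fut.sim.dcl.fin.n ।\\PU";
        for (TaggedToken t : parseLine(line)) {
            System.out.println(t.word + " -> " + t.category());
        }
    }
}
```

The source file should be saved and compiled as UTF-8 so that the Hindi characters survive; reading real input would likewise need an explicit UTF-8 reader.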

Example Output text:

वह किताब [पढ़ेगा][url removed, login to view] ।\PU

वह किताब [पढ़ता है][url removed, login to view] ।\PU

वह किताब [पढ़ रहा है][url removed, login to view] ।\PU

or

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़ेगा\VM.mas.sg.3.fut.sim.dcl.fin.n][url removed, login to view] ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़ता\VM.mas.sg.3.0.impf.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n][url removed, login to view] ।\PU

वह\PPR.0.sg.3.dir.0.n.n.0.n किताब\NC.fem.sg.dir.0 [पढ़\VM.0.0.0.0.0.0.nfn.n रहा\VAUX.mas.sg.3.0.pft.dcl.fin.n है\VAUX.0.sg.3.prs.sim.dcl.fin.n][url removed, login to view] ।\PU

(Note: If you see junk text instead of the Hindi characters in the examples above, please see the attached file.)

Other Details:

There are a total of 192 tags that can be assigned as the boundary marker for the verb group.

More details will follow. A brief outline of the algorithm is as follows:

1. Start searching for a verbal word from the right of the sentence boundary.

2. When a verbal word is found, match it against a TAM (tense-aspect-mood) template and store the match.

3. Continue the search rightward for more verbal words, matching the growing sequence against the TAM templates, until the last verbal word in the sequence is reached.

4. From among the templates matched, choose the longest verb sequence that matches a TAM template and mark the boundary of that sequence with square brackets ‘[ ]’.

5. Append the tag of the matched TAM template after the closing square bracket, prefixed with /VG./.
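The steps above can be sketched in Java roughly as follows. This is only a sketch under several assumptions that the brief does not confirm: the sentence is scanned left to right, a word counts as verbal when its coarse tag is VM or VAUX, and a TAM template is modeled as a sequence of coarse tags. The three templates and their labels (T1, T2, T3) are placeholders invented for illustration; the real 192 tags and their templates are not given here. The class name `VerbGroupMarker` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VerbGroupMarker {
    // Hypothetical TAM templates: coarse-tag sequence -> placeholder label.
    static final Map<String, String> TEMPLATES = new LinkedHashMap<>();
    static {
        TEMPLATES.put("VM", "T1");
        TEMPLATES.put("VM VAUX", "T2");
        TEMPLATES.put("VM VAUX VAUX", "T3");
    }

    /** Coarse tag of a "word\TAG.features" token, e.g. "VM". */
    static String category(String token) {
        int sep = token.lastIndexOf('\\');
        String tag = sep < 0 ? "" : token.substring(sep + 1);
        int dot = tag.indexOf('.');
        return dot < 0 ? tag : tag.substring(0, dot);
    }

    /** Assumption: only VM and VAUX tokens count as verbal words. */
    static boolean isVerbal(String token) {
        String c = category(token);
        return c.equals("VM") || c.equals("VAUX");
    }

    /** Brackets the longest template-matching verb sequence and appends /VG.<label>. */
    static String mark(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (!isVerbal(tokens[i])) {          // step 1: look for a verbal word
                out.add(tokens[i]);
                i++;
                continue;
            }
            StringBuilder key = new StringBuilder();
            int bestEnd = -1;
            String bestLabel = null;
            // Steps 2-3: extend rightward, remembering the longest match so far.
            for (int j = i; j < tokens.length && isVerbal(tokens[j]); j++) {
                if (key.length() > 0) {
                    key.append(' ');
                }
                key.append(category(tokens[j]));
                String label = TEMPLATES.get(key.toString());
                if (label != null) {
                    bestEnd = j;
                    bestLabel = label;
                }
            }
            if (bestEnd < 0) {                   // no template matched: leave token as is
                out.add(tokens[i]);
                i++;
                continue;
            }
            // Steps 4-5: bracket the longest match and append its tag.
            String group = String.join(" ", Arrays.copyOfRange(tokens, i, bestEnd + 1));
            out.add("[" + group + "]/VG." + bestLabel);
            i = bestEnd + 1;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(mark("वह\\PPR.0.sg.3.dir.0.n.n.0.n किताब\\NC.fem.sg.dir.0 "
                + "पढ़\\VM.0.0.0.0.0.0.nfn.n रहा\\VAUX.mas.sg.3.0.pft.dcl.fin.n "
                + "है\\VAUX.0.sg.3.prs.sim.dcl.fin.n ।\\PU"));
    }
}
```

Matching on coarse tags alone keeps the sketch short. A real template set would most likely also inspect the feature fields or the auxiliary word forms, and the table lookup would then be replaced by whatever matching rule the full specification defines.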

Java JSP

Project ID: #992469

About the project

7 proposals Remote project Active Mar 25, 2011

Awarded to:

ozaidan

Hi, I think I can do this if you can provide some details in the PMB. Cheers!!

$520 USD in 30 days
(3 Reviews)
4.3

7 freelancers bid an average of $546 for this job

AshwinSen

Hello, Please view PMB. Ashwin

$750 USD in 15 days
(39 Reviews)
5.7
MKScott

Hi, I can do it. Contact me to discuss details.

$500 USD in 10 days
(7 Reviews)
5.0
gyk2k

Please see PMB

$500 USD in 5 days
(2 Reviews)
3.8
try67

Hi, I'm a professional Java developer (SCJP 6) with experience in NLP and text processing. Contact me to discuss it further.

$500 USD in 10 days
(1 Review)
3.2
vargasanyi

Hello! I have been developing in Java and JSP since 2000, so I have extensive experience. I would be thrilled to do this project for you.

$400 USD in 15 days
(0 Reviews)
0.0
jagdishranjha

check my PM

$650 USD in 20 days
(0 Reviews)
0.0