Custom Nutch Parser Plugin with mapping feature

I am after someone who has experience writing custom Nutch plugins.

The details of the project will be given only to those who meet the first-round requirements. You must have solid experience here and be able to show prior work with Nutch.

I am not after someone to write me a parser for a particular site.

I am after someone who can write a custom parser based on ANY DOM TREE STRUCTURE!

If you don't understand what that means: after Nutch crawls a page, I want every piece of data stored in a field that is named automatically.

E.g. if a page contains <Div>Someinfohere</Div>, then the field that extracts that data is called <fieldname Div1>Someinfohere</fieldname>
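
To make the naming rule concrete, here is a minimal sketch of the idea in Python. It is only an illustration under stated assumptions: a real Nutch plugin would be written in Java, and the names `AutoFieldParser` and `extract_fields` are hypothetical. Each element gets a name built from its tag plus a running counter (Div1, Div2, and so on), and its text is stored under that name.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AutoFieldParser(HTMLParser):
    """Names each element <Tag><counter> and stores its direct text."""

    def __init__(self):
        super().__init__()
        self.counts = defaultdict(int)  # occurrences seen so far, per tag
        self.open_elements = []         # (tag, field name) of open elements
        self.fields = {}                # field name -> extracted text

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.counts[tag] += 1
        name = f"{tag.capitalize()}{self.counts[tag]}"
        self.open_elements.append((tag, name))

    def handle_endtag(self, tag):
        # Close the innermost matching element; tolerate sloppy HTML.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_elements:
            name = self.open_elements[-1][1]
            previous = self.fields.get(name, "")
            self.fields[name] = (previous + " " + text).strip()


def extract_fields(html):
    """Return a dict of automatically named fields for one page."""
    parser = AutoFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


fields = extract_fields("<div>Someinfohere</div><div>Other</div>")
```

With this input, `fields` holds one entry for `Div1` and one for `Div2`. Tags whose names already end in a digit (such as `h1`) would probably need a separator in the generated name to stay unambiguous.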

That's the first step: creating order from HTML.

The second step is an easy way for me to map <fieldname Div1> to a Solr schema field.

To do that, I think the best way would be to store the data in a database of some kind and build a simple GUI so that I can easily map <fieldname Div1> to <solr schema field>.
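
As a rough sketch of what the mapping store behind such a GUI could look like, the example below keeps one row per automatically named field and looks up the Solr field it maps to. The assumptions are mine: SQLite stands in for MySQL so the example is self-contained, and the table name `field_mapping`, the function names, and the Solr field name `title` are all hypothetical.

```python
import sqlite3


def create_mapping_store(conn):
    """Create the (hypothetical) table the mapping GUI would edit."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS field_mapping ("
        " auto_field TEXT PRIMARY KEY,"
        " solr_field TEXT NOT NULL)"
    )


def set_mapping(conn, auto_field, solr_field):
    """Map one auto-named field to a Solr schema field.

    INSERT OR REPLACE is SQLite syntax; MySQL would need its own
    upsert form instead.
    """
    conn.execute(
        "INSERT OR REPLACE INTO field_mapping (auto_field, solr_field)"
        " VALUES (?, ?)",
        (auto_field, solr_field),
    )


def to_solr_document(conn, extracted):
    """Rename extracted fields to Solr field names; drop unmapped ones."""
    rows = conn.execute("SELECT auto_field, solr_field FROM field_mapping")
    mapping = dict(rows)
    return {mapping[name]: value
            for name, value in extracted.items()
            if name in mapping}


conn = sqlite3.connect(":memory:")
create_mapping_store(conn)
set_mapping(conn, "Div1", "title")
doc = to_solr_document(conn, {"Div1": "Someinfohere", "Div2": "Other"})
```

Here `doc` contains only the mapped field, keyed by its Solr name. With many sites and templates, the table would likely also need a column identifying the site or template each mapping belongs to.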

Choice of technology is yours as long as it runs on a LAMP stack. PHP and MySQL preferred.

I will be crawling approx. 10,000 sites, so this thing will have to handle any HTML template I throw at it. If there are multiple Divs on a page, call them Div1, Div2, etc.

The DOM structure will be your guide, e.g. HTML|DIV|TABLE|TR|TD| Some info here
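
The path notation above can be sketched as follows. This is an illustrative Python example, not Nutch code, and `DomPathParser` and `dom_paths` are hypothetical names. It records, for every piece of text, the chain of open elements leading to it, joined with `|`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not part of a path.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomPathParser(HTMLParser):
    """Pairs each text node with the path of its ancestor tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # upper-cased tags of the open elements
        self.rows = []       # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag.upper())

    def handle_endtag(self, tag):
        name = tag.upper()
        if name in self.open_tags:
            # Close the innermost matching element and anything inside it.
            last = len(self.open_tags) - 1 - self.open_tags[::-1].index(name)
            del self.open_tags[last:]

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.rows.append(("|".join(self.open_tags), text))


def dom_paths(html):
    """Return a list of (DOM path, text) pairs for one page."""
    parser = DomPathParser()
    parser.feed(html)
    parser.close()
    return parser.rows


page = "<html><div><table><tr><td>Some info here</td></tr></table></div></html>"
rows = dom_paths(page)
```

For this page, `rows` holds a single pair whose path is `HTML|DIV|TABLE|TR|TD`. Using the full path rather than the tag alone is one way to tell apart two Divs that sit in different parts of a template.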

I'm on a tight budget, so don't go crazy with your bid.

Skills: Apache, Apache Solr, MySQL, PHP

About the Employer:
(7 reviews) Parramatta, Australia

Project ID: #1562360

2 freelancers are bidding an average of $500 for this job


I work with Drupal every day, building plugins and web applications using the Drupal CMS. I am ready to start your project today. I have extensive knowledge of web design. I am an expert in the following: PHP, Drupal, ...

$250 USD in 3 days
(3 reviews)

Pls check PMB.

$750 USD in 1 day
(0 reviews)