RSS Aggregator

We need a program that runs on a Linux (openSUSE) server and collects a given set of XML sources. The database connection and all other settings or values should be placed in an INI or XML file.
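As an illustration of the settings file, here is a minimal sketch in Java that reads a flat key=value file with java.util.Properties. The class name AggregatorConfig and the keys db.url, db.user, db.password and fetch.parallel are assumptions made for this example, not names taken from the existing PHP script. It also assumes a file without [section] headers.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

public class AggregatorConfig {
    final String dbUrl;
    final String dbUser;
    final String dbPassword;
    final int parallelFetches; // the "x" from step 2 of the algorithm

    AggregatorConfig(Properties p) {
        dbUrl = require(p, "db.url");
        dbUser = require(p, "db.user");
        dbPassword = p.getProperty("db.password", "");
        // Hypothetical default of 10 when the key is absent.
        parallelFetches = Integer.parseInt(p.getProperty("fetch.parallel", "10").trim());
        if (parallelFetches < 1) {
            throw new IllegalArgumentException("fetch.parallel must be >= 1");
        }
    }

    // Fails early with a clear message when a mandatory key is missing.
    static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        return value.trim();
    }

    // Accepts any Reader so the same code works for files and for tests.
    static AggregatorConfig load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return new AggregatorConfig(p);
    }
}
```

In production the Reader would typically come from Files.newBufferedReader on the path given at startup.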

The Algorithm:

1. Get all RSS URLs from the database (each row contains rss_id, url, category_id and some other fields; the table layout will be provided).

2. Iterate through all entries asynchronously and retrieve their contents. The program should fetch x feeds simultaneously, where x is set in the config.

3. Parse contents:

This should be done in 3 steps:

First: Try to parse the RSS or Atom format (all versions and all field variants, e.g. content|description|summary or pubDate, updated, dc:date, pwd:timestamp etc.).

Second: If this fails, the program should try to extract the elements by RegExp or string matching.

Third: Write all entries of the current RSS URL to the database (into one temp table and one other table), together with some information about the media item (e.g. its category).
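The first two parsing steps could be sketched roughly as follows. This is only an illustration in Java, not the logic of the existing PHP script: the class name FeedParser, the tag priority order and the String[] {title, body, date} result shape are all assumptions. It tries a real XML parser first and falls back to regular expressions only when the parser rejects the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedParser {
    // Assumed priority order; the real field list comes from the PHP script.
    static final String[] BODY_TAGS = {"content:encoded", "content", "description", "summary"};
    static final String[] DATE_TAGS = {"pubDate", "updated", "dc:date"};
    static final Pattern ITEM = Pattern.compile(
            "<(item|entry)(?:\\s[^>]*)?>(.*?)</\\1>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns one {title, body, date} array per entry; missing fields are "". */
    static List<String[]> parse(String xml) {
        try {
            return parseXml(xml);          // step 1: proper RSS/Atom parsing
        } catch (Exception e) {
            return parseByRegex(xml);      // step 2: fallback for broken XML
        }
    }

    static List<String[]> parseXml(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("item");                      // RSS
        if (nodes.getLength() == 0) nodes = doc.getElementsByTagName("entry");  // Atom
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            out.add(new String[] {
                firstText(el, "title"), firstText(el, BODY_TAGS), firstText(el, DATE_TAGS)});
        }
        return out;
    }

    // First non-empty text among the candidate tags, searched in order.
    static String firstText(Element parent, String... tags) {
        for (String tag : tags) {
            NodeList hits = parent.getElementsByTagName(tag);
            if (hits.getLength() > 0) {
                String text = hits.item(0).getTextContent().trim();
                if (!text.isEmpty()) return text;
            }
        }
        return "";
    }

    static List<String[]> parseByRegex(String xml) {
        List<String[]> out = new ArrayList<>();
        Matcher m = ITEM.matcher(xml);
        while (m.find()) {
            String block = m.group(2);
            out.add(new String[] {
                firstMatch(block, "title"), firstMatch(block, BODY_TAGS), firstMatch(block, DATE_TAGS)});
        }
        return out;
    }

    // Regex counterpart of firstText; also unwraps a surrounding CDATA section.
    static String firstMatch(String block, String... tags) {
        for (String tag : tags) {
            String q = Pattern.quote(tag);
            Matcher m = Pattern.compile("<" + q + "(?:\\s[^>]*)?>(.*?)</" + q + ">",
                    Pattern.DOTALL | Pattern.CASE_INSENSITIVE).matcher(block);
            if (m.find()) {
                String text = m.group(1)
                        .replaceAll("(?s)^\\s*<!\\[CDATA\\[(.*?)\\]\\]>\\s*$", "$1").trim();
                if (!text.isEmpty()) return text;
            }
        }
        return "";
    }
}
```

The regex path only recovers entries that are fully closed, so a feed truncated mid-item yields the complete items before the cut. A performance-focused version would precompile the per-tag patterns instead of building them on every call.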

We have a PHP script that already implements all of these features. It can be provided as a detailed reference for what the program should do.

IMPORTANT: The program must handle all encodings 100% correctly! Sometimes an XML file declares "utf-8" but its contents are encoded in ISO..., or only one item is UTF-8 and the others are in different encodings. This has to be absolutely safe!
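One common way to approach the mismatch described above is to ignore the declared encoding and decode strictly: try UTF-8 with error reporting enabled, and fall back to a single-byte charset only when the bytes are not valid UTF-8. The Java sketch below shows that idea. The class name EncodingGuard and the choice of ISO-8859-1 as the fallback are assumptions for this example. It is a heuristic, not a guarantee: a legacy-encoded byte sequence that happens to be valid UTF-8 would still be misread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuard {
    // Assumed fallback; the posting only says "ISO...", so this is a guess.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    /** Decodes raw bytes as UTF-8 if they are valid UTF-8, else with FALLBACK. */
    static String decode(byte[] raw) {
        // REPORT makes the decoder throw instead of silently inserting U+FFFD.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte to a character, so this never fails.
            return new String(raw, FALLBACK);
        }
    }
}
```

To cope with feeds where individual items use different encodings, decode would have to be applied to the raw bytes of each item separately rather than to the whole document at once.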

We provide:

- 200 RSS items for testing

- MySQL table structure

- PHP script with the current algorithm

We need:

- All source code and files

- A runnable, bug-free Java (or C++) program for Linux

- Short instructions on how to use it

The program should be as performant as possible, so please keep this in mind during development.
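Most of the run time is likely to be spent waiting on remote servers, so the "x simultaneous fetches" from step 2 matters for throughput. A minimal Java sketch using a fixed-size thread pool is shown here. The class name BoundedFetcher and the fetchOne callback are hypothetical; the callback stands in for the real HTTP download so the concurrency logic can be seen on its own.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class BoundedFetcher {
    /**
     * Runs fetchOne for every URL with at most maxParallel calls in flight.
     * Returns url -> content in input order; a failed fetch maps to null.
     */
    static Map<String, String> fetchAll(List<String> urls, int maxParallel,
                                        Function<String, String> fetchOne)
            throws InterruptedException {
        // The pool size is the concurrency limit ("x" from the config).
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                pending.put(url, pool.submit(() -> fetchOne.apply(url)));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    // One broken feed must not stop the others.
                    results.put(e.getKey(), null);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Duplicate URLs collapse into one map entry in this sketch. A real fetcher would also set connect and read timeouts so that a stalled server cannot occupy a pool thread indefinitely.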

Skills: C Programming, Java, Linux


About the Employer:
( 1 review ) Aachen, Germany

Project ID: #383917