Create Multi-Threaded Distributed Web Crawler on AWS

Closed · Posted Mar 29, 2013 · Paid on delivery

This is much, much simpler than a typical 'web crawler'. It needs to be run as cheaply as possible (preferably on AWS).

The software has 2 simple functions:

1. URLS: Grab web pages with a multi-threaded approach; the URLs are simply pulled from the db along with the extraction class to use.

2. EXTRACTION CLASSES: Classes that can easily extract data from the fetched HTML following a given pattern and insert it into the db, again with a multi-threaded approach (a sketch of both pieces follows below).
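Since the reference code is not included here (the linked Perl example was removed by the site), the following is a minimal sketch of what the two pieces could look like in Perl, using threads, Thread::Queue and LWP::UserAgent (this assumes a thread-enabled perl build). The MyExtractor class, the example URL and the regex pattern are illustrative assumptions only; in the real system the URL/pattern rows would be read from the database and the extracted fields inserted back via DBI.

    #!/usr/bin/perl
    # Minimal sketch only: MyExtractor, the example URL and the regex pattern
    # are placeholders; the real crawler would read URL/extractor rows from
    # the database and write extracted fields back via DBI.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use LWP::UserAgent;

    package MyExtractor;
    # Hypothetical "extraction class": one regex pattern applied to fetched HTML.
    sub new     { my ($class, $pattern) = @_; bless { pattern => $pattern }, $class }
    sub extract { my ($self, $html) = @_; return $html =~ /$self->{pattern}/gs }

    package main;

    my $queue = Thread::Queue->new();

    # In the real job these rows come from the db (URL plus extraction class/pattern);
    # hard-coded here so the sketch is self-contained.
    $queue->enqueue( { url => 'http://example.com/', pattern => '<title>(.*?)</title>' } );
    $queue->end();   # no more work will be added

    my @workers = map {
        threads->create(sub {
            my $ua = LWP::UserAgent->new( timeout => 10 );
            while ( defined( my $job = $queue->dequeue() ) ) {
                my $res = $ua->get( $job->{url} );
                next unless $res->is_success;
                my @fields = MyExtractor->new( $job->{pattern} )
                                        ->extract( $res->decoded_content );
                # Real version: insert @fields into the db (via DBI) instead of printing.
                print "$job->{url}: @fields\n";
            }
        });
    } 1 .. 4;        # four worker threads

    $_->join() for @workers;

The same worker-pool shape also covers the distributed/AWS requirement: each cheap instance runs the script and pulls its own batch of URL rows from the shared database.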

You should follow this Perl approach and make sure your solution produces similar, if not better, results.

[url removed, login to view]

(Further reading: [url removed, login to view])

For an experienced programmer I expect this to take no longer than a day, as the instructions are laid out above; the budget is therefore very low, so bid accordingly.

Amazon Web Services, C Programming, C++ Programming, Perl, Software Engineering

Project ID: #4381334

About the project

4 proposals · Remote project · Active May 5, 2013

4 freelancers are bidding on average $179 for this job

mccheung

Hello, I have been a Perl programmer for 3 years and am good at data scraping jobs. Thanks

$176 USD in 4 days
(1 Review)
2.5
d0tnet12

Consider it done! Check PM.

$180 USD in 6 days
(2 Reviews)
2.4
shahroz91

Consider it done.

$200 USD in 5 days
(0 Reviews)
0.0
pvdenis76

Could you explain a few questions: 1. Are the pages already downloaded and saved in the db? 2. What do you mean by "given pattern", is it a regexp? ... Forked processes are not a problem; the problem is understanding where to take the info from and how to process it.

$160 USD in 3 days
(0 Reviews)
0.0