I require a Web Crawler written in Perl that does the following:

* It must be able to crawl multiple sites at a time
* It must be able to run on multiple computers using the same database
* It must be able to run on Windows or Linux (using ActivePerl)
* It uses a MySQL database
* It must store the downloaded page, server type, HTTP response code (e.g. 200), time and date, and URL crawled for every page.
* It must store, in a separate table, every link in the format FromURL (the page the link is on), ToURL (where the link points to) and LinkText (the text that was on the link).
* It must obey [url removed, login to view] and the correct META tags
* It must keep track of redirects in a separate table
* It must support incremental crawling

I recommend you use the following modules (a rough sketch of how they could fit together is given after this list):

* HTML::LinkExtor ([url removed, login to view]~gaas/[url removed, login to view])
* WWW::RobotRules ([url removed, login to view]~gaas/[url removed, login to view])
* LWP::Parallel::RobotUA ([url removed, login to view]~marclang/[url removed, login to view])

Please ask for any details you are not sure of.
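As a point of reference only, here is a minimal fetch-and-store sketch of the kind of loop the requirements describe. The database name, credentials, table and column names (pages, links, redirects), user-agent string, and the crawl_one helper are all placeholder assumptions, not part of the specification above; LWP::RobotUA is used here because it wraps WWW::RobotRules and handles robots.txt automatically, but the parallel variant recommended above (LWP::Parallel::RobotUA) would replace it in the real program.

#!/usr/bin/perl
# Minimal single-URL sketch, assuming a MySQL database named "crawler"
# with tables roughly like:
#   pages(url, content, server, http_code, crawled_at)
#   links(from_url, to_url, link_text)
#   redirects(from_url, to_url)
# All names below are placeholders for illustration only.

use strict;
use warnings;
use DBI;
use LWP::RobotUA;        # obeys robots.txt via WWW::RobotRules internally
use HTML::LinkExtor;

my $dbh = DBI->connect(
    'DBI:mysql:database=crawler;host=localhost',   # adjust to your setup
    'crawler_user', 'crawler_pass',
    { RaiseError => 1, AutoCommit => 1 },
);

# RobotUA fetches and caches robots.txt per host and refuses disallowed URLs.
my $ua = LWP::RobotUA->new('ExampleCrawler/0.1', 'crawler@example.com');
$ua->delay(10 / 60);     # wait 10 seconds between requests to the same host

sub crawl_one {
    my ($url) = @_;
    my $resp = $ua->get($url);

    # Store the page itself: content, server type, response code, timestamp.
    $dbh->do(
        'INSERT INTO pages (url, content, server, http_code, crawled_at)
         VALUES (?, ?, ?, ?, NOW())',
        undef,
        $url,
        $resp->decoded_content || '',
        $resp->header('Server') || '',
        $resp->code,
    );

    # Record any redirects that were followed on the way to this response.
    for my $hop ($resp->redirects) {
        $dbh->do(
            'INSERT INTO redirects (from_url, to_url) VALUES (?, ?)',
            undef,
            $hop->request->uri->as_string,
            $hop->header('Location') || '',
        );
    }

    return unless $resp->is_success
        && ($resp->content_type || '') eq 'text/html';

    # Extract links; HTML::LinkExtor absolutizes them against the base URL.
    # It does not capture the anchor text, so LinkText is left empty here;
    # a full implementation would use HTML::Parser handlers for that.
    my $extor = HTML::LinkExtor->new(undef, $resp->base);
    $extor->parse($resp->decoded_content);
    for my $link ($extor->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && $attr{href};
        $dbh->do(
            'INSERT INTO links (from_url, to_url, link_text) VALUES (?, ?, ?)',
            undef, $url, "$attr{href}", '',
        );
    }
}

crawl_one('http://example.com/');

Incremental crawling and multi-machine coordination are not shown; in practice the URL queue would live in the shared MySQL database, with a pages lookup (or Last-Modified check) deciding whether a URL needs to be re-fetched.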
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Any requirements of the script, such as exported SQL tables, must be included.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
4) The coder isn't paid until the buyer is happy with the program.
* Windows using ActiveState's ActivePerl
* Linux using Perl