A C++ crawler able to index/reindex pages and download content, producing an XML file for each page.
Here are the main requirements:
* Can be scheduled
* The agent can accept multiple crawl start locations per website
* Support for [login to view URL]
* Forbidden strings in URLs (for example, do not follow URLs containing ?, %, or a given keyword; see the filter sketch after this list)
* Can leave domain / do not leave domain
* Max pages per domain (user input)
* The agent can support exclusions of files beyond those in the server's standard [login to view URL]
* Specify how many levels deep to follow links from each starting location
* Multi-Threaded for Concurrent Scans
* Reindexing of new or modified files only (see the reindexing sketch after this list)
* Complete Cache Management
* Download to specific storage (web, news)
* Download title, description, keywords, and page content, and add the following fields: date indexed, page size, URL
* Produce an XML file for each downloaded page with the info above (see the XML writer sketch after this list)
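
As a sketch of how the crawl-scope requirements above (forbidden strings, domain restriction, max pages per domain, depth limit) might fit together, here is a minimal C++ link filter. The struct and function names are illustrative assumptions, not part of the spec:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical crawl-scope settings; names are illustrative, not from the posting.
struct CrawlScope {
    std::vector<std::string> forbiddenSubstrings; // e.g. "?", "%", or a keyword
    std::string domain;                           // domain of the start location
    bool stayInDomain = true;                     // "do not leave domain"
    int maxPagesPerDomain = 1000;                 // user input
    int maxDepth = 3;                             // levels deep from the start URL
};

// Very rough host extraction ("http://host/path" -> "host"); a real crawler
// would use a proper URL parser.
std::string hostOf(const std::string& url) {
    auto pos = url.find("://");
    std::string rest = (pos == std::string::npos) ? url : url.substr(pos + 3);
    return rest.substr(0, rest.find('/'));
}

// Decide whether a discovered link should be followed.
bool shouldFollow(const CrawlScope& s, const std::string& url,
                  int depth, int pagesSeenInDomain) {
    if (depth > s.maxDepth) return false;
    if (pagesSeenInDomain >= s.maxPagesPerDomain) return false;
    if (s.stayInDomain && hostOf(url) != s.domain) return false;
    for (const auto& bad : s.forbiddenSubstrings)
        if (url.find(bad) != std::string::npos) return false;
    return true;
}

int main() {
    CrawlScope scope{{"?", "%"}, "example.com", true, 1000, 3};
    std::cout << shouldFollow(scope, "http://example.com/a.html", 1, 10) << '\n';   // 1
    std::cout << shouldFollow(scope, "http://example.com/a.php?id=2", 1, 10) << '\n'; // 0: forbidden "?"
    std::cout << shouldFollow(scope, "http://other.org/b.html", 1, 10) << '\n';    // 0: leaves domain
}
```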
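For the "modified files only" reindexing, one possible approach is to keep a fingerprint per URL and skip pages whose content is unchanged; a real implementation would likely also use HTTP ETag / Last-Modified headers and persist the index between runs. This class is an assumed design, not a stated requirement:

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

// Minimal "reindex new or modified files only" check.
class ReindexCache {
public:
    // Returns true if the page is new or its content changed since last crawl.
    bool needsReindex(const std::string& url, const std::string& body) {
        std::size_t fp = std::hash<std::string>{}(body);
        auto it = fingerprints_.find(url);
        if (it != fingerprints_.end() && it->second == fp) return false;
        fingerprints_[url] = fp;
        return true;
    }
private:
    std::unordered_map<std::string, std::size_t> fingerprints_;
};

int main() {
    ReindexCache cache;
    std::cout << cache.needsReindex("http://example.com/", "v1") << '\n'; // 1: new
    std::cout << cache.needsReindex("http://example.com/", "v1") << '\n'; // 0: unchanged
    std::cout << cache.needsReindex("http://example.com/", "v2") << '\n'; // 1: modified
}
```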
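And a minimal sketch of the per-page XML output, assuming a flat element layout (the posting does not fix a schema); the field names simply mirror the list above:

```cpp
#include <cstddef>
#include <ctime>
#include <fstream>
#include <string>

// Fields the posting asks for; the struct name is illustrative.
struct PageRecord {
    std::string url, title, description, keywords, content;
    std::size_t pageSizeBytes = 0;
};

// Escape the five characters XML reserves in text content.
std::string xmlEscape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Write one XML file per downloaded page with the fields listed above.
void writePageXml(const PageRecord& p, const std::string& path) {
    std::time_t now = std::time(nullptr);
    char date[32];
    std::strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", std::localtime(&now));

    std::ofstream f(path);
    f << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<page>\n"
      << "  <url>" << xmlEscape(p.url) << "</url>\n"
      << "  <title>" << xmlEscape(p.title) << "</title>\n"
      << "  <description>" << xmlEscape(p.description) << "</description>\n"
      << "  <keywords>" << xmlEscape(p.keywords) << "</keywords>\n"
      << "  <dateIndexed>" << date << "</dateIndexed>\n"
      << "  <pageSize>" << p.pageSizeBytes << "</pageSize>\n"
      << "  <content>" << xmlEscape(p.content) << "</content>\n"
      << "</page>\n";
}

int main() {
    writePageXml({"http://example.com/", "Example", "Demo page",
                  "demo,example", "<p>Hello</p>", 512}, "page0001.xml");
}
```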
-------------------------------------------------------------------
* Web-based administration
* List of URLs to crawl
* Start/Stop/Hold/Continue
* Scheduled index/reindex times for a specific storage and list of sites (see the scheduler sketch after this list)
* File types: HTML-based (html, htm, php, asp, js, do, ...)
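
A minimal sketch of the scheduling requirement, assuming interval-based runs per storage and site list (the posting does not specify the scheduling model, so the job structure and names here are assumptions):

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// One scheduled job: reindex a list of sites into a named storage at a fixed
// interval. A production scheduler would likely support cron-style times.
struct ScheduledJob {
    std::string storage;             // e.g. "web" or "news"
    std::vector<std::string> sites;  // the list of URLs to crawl
    std::chrono::seconds interval;   // how often to run
};

void runScheduler(const ScheduledJob& job,
                  const std::function<void(const ScheduledJob&)>& crawl) {
    auto next = std::chrono::steady_clock::now();
    for (int run = 0; run < 3; ++run) {  // bounded here so the demo terminates
        std::this_thread::sleep_until(next);
        crawl(job);
        next += job.interval;
    }
}

int main() {
    ScheduledJob job{"web", {"http://example.com/"}, std::chrono::seconds(2)};
    runScheduler(job, [](const ScheduledJob& j) {
        std::cout << "reindexing " << j.sites.size()
                  << " site(s) into storage '" << j.storage << "'\n";
    });
}
```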