Cancelled

Web Crawler

I am looking for one or more developers who can build a web crawler (spider) in PHP, optionally using a PHP framework.

The end product will only require root domain names (e.g. [url removed, login to view]), which are stored in a table, and it will go through them one by one.

The purpose of the spider is to find all working links within that domain.
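As a rough illustration of the link discovery step, the sketch below pulls the `<a href>` values out of one fetched page, resolves them to absolute URLs, and keeps only those on the crawled domain. The function names, the simplified relative-URL resolver, and the exact-host comparison (so `www.domain.com` would count as a different host) are assumptions made for the example, and it assumes PHP 7.1 or later with the DOM extension. It does not check whether a link actually works.

```php
<?php
// Sketch only: function names and matching rules are assumptions.

/** Resolve $href against $baseUrl; returns null for links that cannot be crawled. */
function resolveUrl(string $href, string $baseUrl): ?string
{
    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null;
    }
    $href = explode('#', $href, 2)[0]; // a fragment never changes the page
    if ($href === '') {
        return null;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        // Already absolute: keep http(s), skip mailto:, javascript:, etc.
        return preg_match('#^https?://#i', $href) ? $href : null;
    }
    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href; // scheme-relative link
    }
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');
    $basePath = $base['path'] ?? '/';
    [$path, $query] = array_pad(explode('?', $href, 2), 2, null);
    if ($path === '') {
        $path = $basePath; // "?page=2" style link
    } elseif ($path[0] !== '/') {
        // Relative link: start from the directory of the current page.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }
    // Collapse "." and ".." segments (simplified, not a full RFC 3986 resolver).
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '.') { continue; }
        if ($segment === '..') { array_pop($segments); continue; }
        $segments[] = $segment;
    }
    $path = '/' . ltrim(implode('/', $segments), '/');
    return $root . $path . ($query !== null ? '?' . $query : '');
}

/** Return the unique absolute links in $html whose host equals $rootHost. */
function extractSameDomainLinks(string $html, string $baseUrl, string $rootHost): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is often malformed
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $absolute = resolveUrl(trim($anchor->getAttribute('href')), $baseUrl);
        if ($absolute === null) {
            continue;
        }
        $host = parse_url($absolute, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $rootHost) === 0) {
            $links[$absolute] = true; // array keys de-duplicate the links
        }
    }
    return array_keys($links);
}
```

Each returned link could then be queued for fetching in turn, which is how the crawl would spread across the rest of the site.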

It would be nice to have functionality that allows it to start at [url removed, login to view]\sub folder.

The discovered links will be stored in a table.

It should know if it fails on a site and continue where it left off.
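One possible way to get this behaviour is to keep a status per domain and, on each run, pick up anything that is still pending, previously failed, or was left "running" by a crashed run. The sketch below assumes PHP 7.1 or later; the status names, the one-hour staleness threshold, and both function names are hypothetical, and the queue is shown as a plain array rather than a MySQL table.

```php
<?php
// Sketch only: status values, threshold, and function names are assumptions.

/**
 * Choose up to $limit domains to crawl next.
 * A row looks like ['domain' => string, 'status' => string, 'updated_at' => int].
 */
function pickNextDomains(array $queue, int $now, int $limit, int $staleAfter = 3600): array
{
    $eligible = array_filter($queue, function (array $row) use ($now, $staleAfter): bool {
        if ($row['status'] === 'pending' || $row['status'] === 'failed') {
            return true;
        }
        // A job stuck in "running" for too long is treated as interrupted.
        return $row['status'] === 'running' && ($now - $row['updated_at']) > $staleAfter;
    });
    // Oldest first, so no domain is starved.
    usort($eligible, function (array $a, array $b): int {
        return $a['updated_at'] <=> $b['updated_at'];
    });
    return array_slice($eligible, 0, $limit);
}

/**
 * Run $crawl for one queue row, record the outcome, and log any failure.
 */
function runCrawlJob(array $row, callable $crawl, int $now): array
{
    try {
        $crawl($row['domain']);
        $row['status'] = 'done';
    } catch (Throwable $e) {
        // error_log() writes to the log configured in php.ini.
        error_log(sprintf('[crawler] %s failed: %s', $row['domain'], $e->getMessage()));
        $row['status'] = 'failed'; // picked up again on the next run
    }
    $row['updated_at'] = $now;
    return $row;
}
```

A failed domain keeps its row, so the next scheduled run retries it instead of silently dropping it.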

It should write to an error log if it encounters an issue.

It should work on any domain.

Needs to be efficient.
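One common way to keep a PHP crawler efficient is to fetch several URLs in parallel with the cURL multi interface rather than one request at a time. The following is a minimal sketch under stated assumptions: the cURL extension is installed, the function name, result layout, and default values are hypothetical, and URLs are processed in fixed batches rather than a rolling window.

```php
<?php
// Sketch only: assumes ext-curl; names and defaults are hypothetical.

/**
 * Fetch $urls with at most $concurrency requests in flight at once.
 * Returns [url => ['ok' => bool, 'status' => int, 'body' => ?string, 'error' => ?string]].
 */
function fetchInBatches(array $urls, int $concurrency = 4, int $timeoutSeconds = 10): array
{
    $results = [];
    $urls = array_values(array_unique($urls)); // one handle per distinct URL

    foreach (array_chunk($urls, max(1, $concurrency)) as $batch) {
        $multi = curl_multi_init();
        $handles = [];
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
                CURLOPT_TIMEOUT        => $timeoutSeconds, // one slow site cannot stall the batch
            ]);
            curl_multi_add_handle($multi, $ch);
            $handles[$url] = $ch;
        }

        do {
            $status = curl_multi_exec($multi, $running);
            // Collect every transfer that has finished so far.
            while (($info = curl_multi_info_read($multi)) !== false) {
                $ch  = $info['handle'];
                $url = array_search($ch, $handles, true);
                $ok  = $info['result'] === CURLE_OK;
                $results[$url] = [
                    'ok'     => $ok,
                    'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
                    'body'   => $ok ? (string) curl_multi_getcontent($ch) : null,
                    'error'  => $ok ? null : curl_strerror($info['result']),
                ];
                curl_multi_remove_handle($multi, $ch);
            }
            if ($running > 0) {
                curl_multi_select($multi, 1.0); // wait for activity instead of busy-looping
            }
        } while ($running > 0 && $status === CURLM_OK);
    }
    return $results;
}
```

A call such as `fetchInBatches($urls, 3)` would keep at most three requests open at a time; the second argument is the knob for how many run in parallel.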

The database used is MySQL.

The end product should work with PHP [url removed, login to view]

This can be billed hourly, but I would need a quote on how many hours it will take and at what rate.

It can also be done on a per-project basis.

This is one part of a larger development effort, and I would like to establish a long-term relationship.

More thorough specification:

Assuming we are crawling for domain.com...

- It looks for all pages within this website; 100% of the pages in domain.com must be accounted for.

- All found links must be in the format http://domain.com/<whatever>

- It needs to handle multiple domains. I must be able to add domains, and it will crawl them the next time the cycle runs.

- It needs to crawl multiple domains at once, meaning they should not be run one after another. It should start multiple threads (e.g. 3-4 at once), or preferably offer some way for me to control how many threads it starts.

- It needs to know when it fails and when it completes, meaning it shouldn't stop in the middle of a crawl. If it does, it must know to time out, move on to the next site or link, and retry next time.

- The found links will be inserted into a new table with the following columns:
1: ID
2: Domain name - [domain the link belongs to]
3: Link itself - [link]
4: Date active - [date] (the first date the link was found; when the same script is run again, it should not update this date)
5: Date inactive - [date] (if the script is run again and this link cannot be found, insert a date)

- The script will be run from crontab; if that is not suitable, you can suggest an alternative.
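The Date active and Date inactive rules from the column list above can be sketched as a small merge step that runs after each crawl. This is only an illustration: the function name and the in-memory row layout are assumptions, and two behaviours the specification does not define are chosen arbitrarily and marked in the comments.

```php
<?php
// Sketch only: the row layout mirrors the listed columns; names are assumptions.

/**
 * Merge the links found in this run into the rows already known for a domain.
 *
 * @param array  $known  existing rows keyed by link
 * @param array  $found  absolute links seen in this run
 * @param string $domain domain being crawled
 * @param string $today  date of this run, e.g. '2024-01-31'
 * @return array updated rows keyed by link
 */
function reconcileLinks(array $known, array $found, string $domain, string $today): array
{
    $foundSet = array_fill_keys($found, true);

    foreach ($known as $link => $row) {
        // Known link that was not seen this time: stamp Date inactive.
        // Assumption: an existing inactive date is kept, not overwritten.
        if (!isset($foundSet[$link]) && $row['date_inactive'] === null) {
            $known[$link]['date_inactive'] = $today;
        }
        // Assumption: a link that reappears after being marked inactive is
        // left untouched, because the specification does not cover that case.
    }

    foreach ($found as $link) {
        // Only new links get a Date active; existing rows keep their first date.
        if (!isset($known[$link])) {
            $known[$link] = [
                'domain'        => $domain,
                'link'          => $link,
                'date_active'   => $today,
                'date_inactive' => null,
            ];
        }
    }
    return $known;
}
```

In MySQL itself, a unique index on the link column combined with `INSERT IGNORE` would be one way to get the same "do not update Date active" behaviour without loading the rows into PHP first.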

Skills: Graphic Design, HTML, PHP, Website Design


About the Employer:
(0 reviews) Markham, Canada

Project ID: #1585625

3 freelancers are bidding on average $129 for this job

kalidass678

Hi, I am ready to take on your task. Please check PM.

$140 CAD in 2 days
(120 Reviews)
6.8
Instantsolutions

I am interested, thanks.

$131 CAD in 5 days
(106 Reviews)
6.7
marchent

Consider my bid as fixed price, not hourly. Thanks.

$115 CAD in 7 days
(161 Reviews)
6.2
designguru47

Hi, please check PMB

$155 CAD in 3 days
(48 Reviews)
5.6