Cancelled

Web Crawler Identifying Layout

I am looking for a solid web crawler that has one task, and one task only:

Identify different page layouts on a site.

Some sites, especially webshops, have category pages, subcategory pages, product pages, checkout pages, and so on.

The crawler should not identify the purpose of each page. It should be able to take a site with 500,000 pages and determine how many different page layouts there are.
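
One possible way to approach this, as a rough sketch only: reduce every page to the set of its tag paths (ignoring all text), hash that set, and give pages with the same hash the same layout ID. The names `TagPathCollector`, `layout_fingerprint`, and `assign_layout_ids` are made up for illustration, and exact-match hashing is an assumption; a real site will probably need something more tolerant.

```python
import hashlib
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths, ignoring all text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray close tags are ignored.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def layout_fingerprint(html: str) -> str:
    """Hash the sorted tag paths so equal structures give equal fingerprints."""
    collector = TagPathCollector()
    collector.feed(html)
    joined = "\n".join(sorted(collector.paths))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


def assign_layout_ids(pages: dict) -> dict:
    """Map each URL to a small integer layout ID (1, 2, 3, ...)."""
    known = {}
    result = {}
    for url, html in pages.items():
        fingerprint = layout_fingerprint(html)
        if fingerprint not in known:
            known[fingerprint] = len(known) + 1
        result[url] = known[fingerprint]
    return result


pages = {
    "http://domain.com/p1": "<html><body><div><h1>Shoe</h1><p>Red</p></div></body></html>",
    "http://domain.com/p2": "<html><body><div><h1>Hat</h1><p>Blue</p></div></body></html>",
    "http://domain.com/c1": "<html><body><ul><li>Shoes</li><li>Hats</li></ul></body></html>",
}
layout_ids = assign_layout_ids(pages)
```

Because the paths are kept as a set, a category page listing 10 products and one listing 50 products get the same fingerprint, which is the behaviour wanted here.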

In the end, it should produce a list of every URL with a layout ID attached (as XML).

EXAMPLE XML

<website>
<ws_info>
<ws_url>http://domain.com/</ws_url>
<ws_pages>146.000</ws_pages>
<ws_cats>6</ws_cats>
<ws_scraped>01.07.2010 11:53:07</ws_scraped>
</ws_info>

<cpage>
<cpage_scraped>
<cpage_url>http://domain.com/some-page-url</cpage_url>
<ws_cat>3</ws_cat>
</cpage_scraped>

<cpage_scraped>
<cpage_url>http://domain.com/some-page-url</cpage_url>
<ws_cat>6</ws_cat>
</cpage_scraped>

</cpage>
</website>
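
A file in this shape could be written roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names come from the example above, while the function name `build_report` and its parameters are made up for illustration. It assumes `ws_pages` is the number of URLs, `ws_cats` is the number of distinct layout IDs, and that the dot in `146.000` is a thousands separator.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def build_report(site_url: str, layout_ids: dict, scraped_at: datetime) -> str:
    """Build the <website> report from a {url: layout_id} mapping."""
    root = ET.Element("website")

    info = ET.SubElement(root, "ws_info")
    ET.SubElement(info, "ws_url").text = site_url
    # Assumption: dot as thousands separator, as in the example (146.000).
    ET.SubElement(info, "ws_pages").text = f"{len(layout_ids):,}".replace(",", ".")
    ET.SubElement(info, "ws_cats").text = str(len(set(layout_ids.values())))
    ET.SubElement(info, "ws_scraped").text = scraped_at.strftime("%d.%m.%Y %H:%M:%S")

    cpage = ET.SubElement(root, "cpage")
    for url, layout_id in layout_ids.items():
        entry = ET.SubElement(cpage, "cpage_scraped")
        ET.SubElement(entry, "cpage_url").text = url
        ET.SubElement(entry, "ws_cat").text = str(layout_id)

    return ET.tostring(root, encoding="unicode")


xml_text = build_report(
    "http://domain.com/",
    {"http://domain.com/a": 3, "http://domain.com/b": 6},
    datetime(2010, 7, 1, 11, 53, 7),
)
```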

Performance and speed of the scraper matter, and how intelligently it tells one page layout apart from another is a key part of this project.

Some sites have very similar pages. It is very much wanted that the scraper can identify an element as a menu, submenu, or navigation and then ignore that element when comparing pages.
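
To illustrate what ignoring menus might look like, here is a rough sketch that collects a page's tag paths but skips any subtree that looks like navigation. The rule used (a `nav`, `header`, or `footer` tag, or an `id`/`class` containing "menu", "nav", or "breadcrumb") is only an assumed heuristic, and the names `ContentPathCollector` and `content_paths` are made up for illustration.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"nav", "header", "footer"}                      # assumed heuristic
SKIP_HINT = re.compile(r"menu|nav|breadcrumb", re.IGNORECASE)  # assumed heuristic


class ContentPathCollector(HTMLParser):
    """Collects tag paths, leaving out everything inside navigation-like elements."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, skipped) pairs for the currently open elements
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        hint = f"{attr_map.get('id') or ''} {attr_map.get('class') or ''}"
        parent_skipped = bool(self.stack) and self.stack[-1][1]
        skipped = (parent_skipped or tag in SKIP_TAGS
                   or bool(SKIP_HINT.search(hint)))
        if not skipped:
            self.paths.add("/".join([t for t, _ in self.stack] + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append((tag, skipped))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray close tags are ignored.
        if any(t == tag for t, _ in self.stack):
            while self.stack and self.stack.pop()[0] != tag:
                pass


def content_paths(html: str) -> set:
    collector = ContentPathCollector()
    collector.feed(html)
    return collector.paths


page_a = ("<html><body><ul class='main-menu'><li><a href='/'>Home</a></li></ul>"
          "<div><h1>Shoe</h1></div></body></html>")
page_b = ("<html><body><nav><ul><li><ul><li>Sub</li></ul></li></ul></nav>"
          "<div><h1>Hat</h1></div></body></html>")
same_layout = content_paths(page_a) == content_paths(page_b)
```

The two sample pages have different menus but the same content structure, so they compare as equal once the menus are skipped.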

I don't want to scrape a site with 200,000 pages and have the scraper come up with 110,000 different categories of pages.
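
One way to keep the number of categories down is to group pages by similarity instead of exact match. The sketch below assumes each page has already been reduced to a set of structural features (for example tag paths) and merges a page into an existing layout when the Jaccard similarity reaches a threshold. The function names and the `0.8` threshold are assumptions for illustration, not tested values.

```python
def jaccard(a: set, b: set) -> float:
    """Share of features the two sets have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_layouts(page_features: dict, threshold: float = 0.8) -> dict:
    """Greedy clustering: the first page of each layout acts as its representative."""
    representatives = []   # position + 1 is the layout ID
    assignment = {}
    for url, features in page_features.items():
        best_id, best_score = None, threshold
        for layout_id, rep in enumerate(representatives, start=1):
            score = jaccard(features, rep)
            if score >= best_score:
                best_id, best_score = layout_id, score
        if best_id is None:
            # Nothing similar enough: this page starts a new layout.
            representatives.append(features)
            best_id = len(representatives)
        assignment[url] = best_id
    return assignment


product = {"html", "html/body", "html/body/div", "html/body/div/h1", "html/body/div/p"}
category = {"html", "html/body", "html/body/ul", "html/body/ul/li"}
clusters = cluster_layouts({
    "http://domain.com/p1": product,
    "http://domain.com/p2": product | {"html/body/div/span"},  # one extra element
    "http://domain.com/c1": category,
})
```

Each page is compared only against one representative per layout, so the cost grows with pages times layouts rather than pages squared, which matters at 200,000 pages.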

Skills: Algorithm, C# Programming, Java, Machine Learning (ML), PHP


About the Employer:
(0 reviews) Brønshøj, Denmark

Project ID: #731731

7 freelancers are bidding on average $427 for this job

aspnetexpert

Please see PMB.

$400 USD in 15 days
(5 Reviews)
3.4
SPDotNetDev

I have been working as a .NET developer for the last six years. I also have experience with SharePoint. I think I am well suited for this work. My core skill set includes C#, SQL Server, .NET Framework, SharePoint, and HTML.

$450 USD in 10 days
(0 Reviews)
0.0
ehtashamulhaq

Let me help you out with this task. I did a similar kind of task in a semester project for my BS(CS) degree.

$350 USD in 10 days
(0 Reviews)
0.0
jacklee2000

I have an MS in CS and 10 years of working experience in the web and search engine fields. I am experienced in web crawler development.

$340 USD in 9 days
(0 Reviews)
0.0
SCAnalytics

We are a team of .NET experts. We can do this project for you.

$300 USD in 7 days
(1 Review)
0.0
UpiterSoft

Hello, I have experience in web page analysis and I can do this well. Please see PM.

$750 USD in 14 days
(0 Reviews)
0.0
codejam212

Hi, we are a group of people working from both India and the US with knowledge of PHP, C#, ASP.NET, data processing, SQL Server, MSSQL, DB2, Joomla, and Drupal. We have done several similar projects and we are really interested in…

$400 USD in 10 days
(0 Reviews)
0.0