Site Scraper for 4 specific pages

I need a utility to do the following. (This is reposted, since the last guy screwed me.)


(1) Watch 4 specified webpages.

(2) Detect any *content* changes, additions, deletions, and

(3) Send an email with what changed, identified in a way that can be used to update a set of external MySQL databases of the current content.
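As a rough illustration of steps (2) and (3), here is a minimal sketch of classifying differences between two snapshots of a page. It assumes each snapshot has already been reduced to a mapping from the unique name field to the text of that entry; the function name and the sample data are hypothetical, and Python is used only for illustration (the same logic ports directly to PHP arrays).

```python
def compare_snapshots(old, new):
    """Compare two {name: text} snapshots and classify every difference."""
    # Entries present only in the new snapshot.
    added = {name: new[name] for name in new if name not in old}
    # Entries present only in the old snapshot.
    deleted = {name: old[name] for name in old if name not in new}
    # Entries present in both whose text differs in any way.
    changed = {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }
    return {"added": added, "deleted": deleted, "changed": changed}


# Hypothetical sample data, not taken from the real sites.
previous = {"Plan A": "Premium 100.00", "Plan B": "Premium 80.00"}
current = {"Plan A": "Premium 100.50", "Plan C": "Premium 95.00"}
report = compare_snapshots(previous, current)
```

Because every entry is keyed by name, each item in the report could map onto an INSERT, DELETE, or UPDATE against the matching database row.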

The application must be runnable from a browser or the command line, and/or via cron.

So, the idea is to scrape a specific set of websites for updated, changed, or added material. The sites currently build their content through the DOM, rather than just putting the material into dynamic HTML pages. Yes, they are purposely done that way, so that a viewer cannot simply point a free tool at them and get alerts on what changes, day to day.

Here is the list of sites it needs to watch -- these are sites that are required (by the US Dept of HHS) to be made publicly available, but there are no requirements to make them "user friendly", and the vendors do NOT make them easy to track. Nevertheless, this is public data, on these pages:

A - [url removed, login to view]

B - [url removed, login to view]

C - [url removed, login to view]

D - [url removed, login to view]


The first three sites listed use a "Details" link to display additional content, which needs to be checked at the same time as the main pages.
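One way the "Details" pages could be located is sketched below. It assumes the rendered HTML of the main page has already been captured and that each "Details" link is an ordinary anchor with an href; if the real sites open details through script instead, this would not apply. The class name, function name, and sample markup are all hypothetical, and Python is used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DetailsLinkParser(HTMLParser):
    """Collect absolute URLs of anchors whose visible text is 'Details'."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip().lower() == "details":
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None


def find_details_links(html, base_url):
    parser = DetailsLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup and URL, for illustration only.
sample = '<p><a href="/item?id=7">Details</a> <a href="/help">Help</a></p>'
links = find_details_links(sample, "https://example.invalid/list")
```

Each URL returned would then be fetched and compared in the same run as its main page.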

Single character differences are CRITICAL, so the change/accuracy requirement is down to character level.
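To show what character-level reporting could look like, here is a minimal sketch using the `difflib` module from the Python standard library. The function name and sample strings are hypothetical, and Python is only for illustration. `SequenceMatcher` does not promise the shortest possible diff, but its non-equal opcodes cover every character that differs between the two strings.

```python
import difflib


def char_changes(old, new):
    """Return (operation, position_in_old, old_chars, new_chars) for each difference."""
    # autojunk=False keeps the matcher from ignoring frequent characters in long texts.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [
        (op, i1, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Hypothetical sample: a single digit differs.
changes = char_changes("Rate: 12.50%", "Rate: 12.60%")
```

An empty result means the two strings are identical, so nothing would be reported for that entry.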

To make matters difficult, the publishers of this content use different formats, even for similar data on the same pages, so you cannot necessarily count on the format to be your template for detecting changes.

The utility needs to run on a schedule that I set, 1-4 times per hour, every day.
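For the schedule, a minimal sketch of turning a runs-per-hour setting into a standard five-field crontab line follows. The function name and the command path are hypothetical placeholders, and Python is only for illustration.

```python
def cron_line(runs_per_hour, command):
    """Build a crontab entry that runs `command` 1-4 times per hour, every day."""
    if runs_per_hour not in (1, 2, 3, 4):
        raise ValueError("runs_per_hour must be 1, 2, 3, or 4")
    step = 60 // runs_per_hour
    # Evenly spaced minutes within the hour, e.g. 0,15,30,45 for four runs.
    minutes = ",".join(str(m) for m in range(0, 60, step))
    # Fields: minute, hour, day of month, month, day of week.
    return f"{minutes} * * * * {command}"


# "/usr/local/bin/site-watcher" is a made-up path for the example.
entry = cron_line(4, "/usr/local/bin/site-watcher")
```

The resulting line would be added to the crontab of whichever account runs the utility.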

Detection of change must produce an email, sent to an email address I will supply later.

The email should supply a copy of exactly what changed, and be identified by the "Issue Name" or "Name" field (the one field which is available and unique on every site).
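A minimal sketch of such a notification, built with `email.message.EmailMessage` from the Python standard library, is shown below. The subject format, body layout, function name, and addresses are all assumptions made for the example, and Python is only for illustration.

```python
from email.message import EmailMessage


def build_change_email(issue_name, old_text, new_text, sender, recipient):
    """Build a plain-text message that identifies a change by its name field."""
    msg = EmailMessage()
    msg["Subject"] = f"Content change: {issue_name}"
    msg["From"] = sender
    msg["To"] = recipient
    # Include both versions verbatim so the exact change can be checked.
    msg.set_content(
        f"Issue Name: {issue_name}\n\n"
        f"Previous text:\n{old_text}\n\n"
        f"Current text:\n{new_text}\n"
    )
    return msg


# Placeholder addresses; the real recipient is to be supplied later.
message = build_change_email(
    "Plan A", "Premium 100.00", "Premium 100.50",
    "watcher@example.invalid", "owner@example.invalid",
)
```

The message could then be handed to `smtplib.SMTP(...).send_message(message)`, with the mail server details supplied at deployment.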

I will require the source code, so that it can be modified, later -- the sites tend to change...

I prefer that this be written in PHP, but a good price may convince me otherwise...

Skills Required:

wordpress, dom, html, producer, mysql, php

Skills: PHP, Software Engineering

About the Employer:
(2 reviews) Apple Valley, United States

Project ID: #1537036