Large-scale research needs to be done on automotive classified listings spanning multiple auto classifieds websites (e.g. Yahoo Autos, Google Autos, cars.com, etc.; about 10 sites in total).
This project is the first step of that research: gathering listings from all the auto classifieds websites and storing them in a database. Essentially, the sites need to be spidered, and the individual ads parsed and stored in a SQL database. It is also meant as a small starter project to get the developer's feet wet and to determine whether the developer's skill set is sufficient for the larger projects to come.
The architecture below is how I envision this project being structured, although it can be changed if there are good reasons to do so:
The database will have two main tables:
new ads - Ads found on the listing websites that are determined not to exist already in either the "new ads" or the "committed ads" table
committed ads - Former "new ads" that a human reviewer has confirmed to be valid
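The two-table layout and the "not already in either table" rule can be sketched with Python's built-in sqlite3 module. This is a minimal sketch only: the posting names the tables but not their columns, so every column (site, site_id, title, price, url) and the choice of (site, site_id) as the identity of an ad are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: the table roles come from the spec, the columns do not.
SCHEMA = """
CREATE TABLE IF NOT EXISTS new_ads (
    site    TEXT NOT NULL,   -- e.g. 'cars.com'
    site_id TEXT NOT NULL,   -- the ad's identifier on that site (assumed)
    title   TEXT,
    price   INTEGER,
    url     TEXT,
    PRIMARY KEY (site, site_id)
);
CREATE TABLE IF NOT EXISTS committed_ads (
    site    TEXT NOT NULL,
    site_id TEXT NOT NULL,
    title   TEXT,
    price   INTEGER,
    url     TEXT,
    PRIMARY KEY (site, site_id)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def already_known(conn, site, site_id):
    """True if the ad exists in either table, i.e. it must not become a new ad."""
    row = conn.execute(
        "SELECT 1 FROM new_ads WHERE site = ? AND site_id = ? "
        "UNION ALL "
        "SELECT 1 FROM committed_ads WHERE site = ? AND site_id = ? LIMIT 1",
        (site, site_id, site, site_id),
    ).fetchone()
    return row is not None


def commit_ad(conn, site, site_id):
    """Move one reviewed ad from new_ads to committed_ads in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO committed_ads SELECT * FROM new_ads "
            "WHERE site = ? AND site_id = ?",
            (site, site_id),
        )
        conn.execute(
            "DELETE FROM new_ads WHERE site = ? AND site_id = ?", (site, site_id)
        )
```

Because commit_ad copies and deletes inside one transaction, an ad is never visible in both tables or in neither, which keeps the duplicate check consistent while a reviewer is working.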
Based on that, I envision the project being composed of the following modules, listed bottom-up:
DB entry backend - Interacts with the SQL database: determines whether an entry is a duplicate and, if not, adds it to the "new ads" table. This should be a separate module, because we may change the criteria that determine duplicates over time.
Ad backends - These could be written using HTML::Parser. One must be written for each of the ~10 websites. Each should interpret an ad page and pass the relevant information from it to the DB entry backend.
Listing backends - These could also be written using HTML::Parser. One must be written for each of the ~10 websites. Each one generates and traverses a listing of ads and calls the corresponding ad backend for each ad. It can be given search criteria to reduce the number of records returned by the site to a manageable amount. Eventually, each listing backend will be run automatically every night with enough search criteria to ensure an exhaustive search of all the websites.
Scraper frontend (written in PHP) - Goes through all "new ads", displays each ad to the user, and lets the user either move the ad to the "committed ads" table or mark it invalid.
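The split between listing backends and ad backends can be illustrated with a small crawler. The posting suggests Perl's HTML::Parser; this sketch swaps in Python's standard html.parser.HTMLParser as a stand-in with the same event-driven style. The markup it looks for (`<a class="ad-link">` for each ad, `<a rel="next">` for pagination, `data-field="..."` on ad fields) is entirely hypothetical; each real site would need its own parser pair. The names ListingParser, AdParser and crawl_listing are illustrative, not part of any existing design.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Listing backend: collects ad URLs and the next-page URL from one page.

    Assumed (hypothetical) markup: <a class="ad-link" href=...> per ad and
    <a rel="next" href=...> for the next page of results.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.ad_urls = []
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        if "ad-link" in (a.get("class") or "").split():
            self.ad_urls.append(urljoin(self.base_url, href))
        elif "next" in (a.get("rel") or "").split():
            self.next_url = urljoin(self.base_url, href)


class AdParser(HTMLParser):
    """Ad backend: extracts fields from one ad page.

    Assumed (hypothetical) markup: elements with data-field="title" etc.
    whose content is plain text (nested tags inside a field are not handled).
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-field")
        if name:
            self._current = name
            self.fields[name] = ""

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data.strip()


def crawl_listing(start_url, fetch, handle_ad, max_pages=50):
    """Walk listing pages, parse every ad, and pass its fields to handle_ad.

    fetch(url) -> HTML string and handle_ad(url, fields) are injected so the
    traversal can run without network access; in the real system handle_ad
    would hand the ad to the DB entry backend. max_pages guards against loops.
    """
    url, pages = start_url, 0
    while url and pages < max_pages:
        listing = ListingParser(url)
        listing.feed(fetch(url))
        listing.close()
        for ad_url in listing.ad_urls:
            ad = AdParser()
            ad.feed(fetch(ad_url))
            ad.close()
            handle_ad(ad_url, ad.fields)
        url, pages = listing.next_url, pages + 1
```

Injecting fetch and handle_ad keeps each site's parsing logic testable against saved HTML pages, and lets the nightly job supply a real HTTP fetcher and the database writer without changing the parsers.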
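The requirement that the DB entry backend be a separate module, so the duplicate criteria can change, suggests making the criterion a pluggable function. The sketch below assumes a minimal SQLite schema with a stored `dedup_key` column; the table columns and the key functions key_by_url and key_by_vin (VIN-based matching) are hypothetical examples, not criteria taken from the spec.

```python
import sqlite3
from typing import Callable, Dict

Ad = Dict[str, str]
# A duplicate criterion maps an ad to a key; ads with equal keys are duplicates.
DedupKey = Callable[[Ad], str]

TABLES = ("new_ads", "committed_ads")  # fixed names, safe to format into SQL


def key_by_url(ad: Ad) -> str:
    """Hypothetical criterion: the same URL means the same ad."""
    return ad["url"]


def key_by_vin(ad: Ad) -> str:
    """Hypothetical criterion: the same vehicle (VIN), falling back to URL."""
    return ad.get("vin", "").upper() or ad["url"]


class DbEntryBackend:
    """Hypothetical DB entry backend: rejects duplicates, stores the rest in new_ads."""

    def __init__(self, conn: sqlite3.Connection, dedup_key: DedupKey = key_by_url):
        self.conn = conn
        self.dedup_key = dedup_key
        for table in TABLES:  # assumed minimal schema
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} "
                "(dedup_key TEXT PRIMARY KEY, url TEXT, title TEXT)"
            )

    def is_duplicate(self, key: str) -> bool:
        """True if either table already holds an ad with this key."""
        return any(
            self.conn.execute(
                f"SELECT 1 FROM {t} WHERE dedup_key = ?", (key,)
            ).fetchone()
            for t in TABLES
        )

    def add(self, ad: Ad) -> bool:
        """Insert the ad into new_ads unless it is a duplicate; True if inserted."""
        key = self.dedup_key(ad)
        if self.is_duplicate(key):
            return False
        with self.conn:
            self.conn.execute(
                "INSERT INTO new_ads (dedup_key, url, title) VALUES (?, ?, ?)",
                (key, ad["url"], ad.get("title", "")),
            )
        return True
```

One trade-off of storing the key: if the criterion changes later, the dedup_key of rows already in both tables would have to be recomputed with the new function before the check is reliable again.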