I need a web scraper written for the following url:
[url removed, login to view]
All pages must be retrieved, not just page one. The data on this site changes, so page 2 may not always exist; additional pages need to be scraped whenever they do exist.
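A minimal sketch of the pagination handling. Because the actual URL is removed, the anchor markup is an assumption and the helper below is hypothetical; the pattern would need adjusting to the real site's "next page" link:

```perl
use Modern::Perl;

# Hypothetical helper: extract the "next page" href from a page's HTML.
# The <a ...>Next</a> markup is an assumption about the target site.
sub next_page_url {
    my ($html) = @_;
    return $html =~ m{<a[^>]+href="([^"]+)"[^>]*>\s*Next\s*</a>}i ? $1 : undef;
}

# The scraper would fetch page one, scrape its rows, then keep following
# next_page_url() until it returns undef, so a missing page 2 is handled.
```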
The output should be a pipe (|) delimited file with the following column mappings:
origin_city --> data located in the first sub column under the "Shipper" column
origin_state --> data located in the second sub column under the "Shipper" column
ship_date --> the date from the "Pick Date" column changed to the YYYY-MM-DD format
destination_city --> data located in the first sub column under the "Consignee" column
destination_state --> data located in the second sub column under the "Consignee" column
receive_date --> leave blank
trailer_type --> the trailer type located in the "Required Equipment" column: abbreviate "van" to "V" and "flatbed" to "F". If the word "hazmat" or the abbreviation "HZ" appears in this column, "hazmat" must be added to the "comment" column; if the phrase "swing doors" appears, "swing doors" must be added to the "comment" column.
load_size --> "Full"
weight --> data located in the "Weight lbs" column
length --> data located in the "Footage" column
width --> leave blank
height --> leave blank
trip_miles --> leave blank
pay_rate --> data located in the "Pay Rate" column
contact_phone --> leave blank
contact_name --> leave blank
tarp_required --> leave blank
comment --> see the description under trailer_type
load_number --> data in the "w/o #" column
commodity --> data located in the "Pieces" and "Type" columns
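The equipment and date rules above could be sketched as follows. The function names are hypothetical, and the source "Pick Date" format (MM/DD/YYYY) is an assumption, since no sample data is shown:

```perl
use Modern::Perl;

# "Required Equipment" -> trailer_type abbreviation plus comment flags.
sub map_equipment {
    my ($equip) = @_;
    my $type = '';
    $type = 'V' if $equip =~ /\bvan\b/i;
    $type = 'F' if $equip =~ /\bflatbed\b/i;
    my @comments;
    push @comments, 'hazmat'      if $equip =~ /\b(?:hazmat|HZ)\b/i;
    push @comments, 'swing doors' if $equip =~ /swing doors/i;
    return ( $type, join ', ', @comments );
}

# "Pick Date" -> ship_date in YYYY-MM-DD; assumes MM/DD/YYYY input,
# and returns an empty string when the date does not parse.
sub to_iso_date {
    my ($date) = @_;
    my ( $m, $d, $y ) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$};
    return defined $y ? sprintf '%04d-%02d-%02d', $y, $m, $d : '';
}
```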
The first line of the output should contain all of the column headers.
Any field that contains no data should be left blank; do not write words like "null" or "blank" in empty columns.
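The output rules above could be sketched as below; format_row is a hypothetical helper that emits one pipe-delimited row per load, with empty strings for the blank fields:

```perl
use Modern::Perl;

# Header line first, then one pipe-delimited row per load.
my @headers = qw(
    origin_city origin_state ship_date destination_city destination_state
    receive_date trailer_type load_size weight length width height
    trip_miles pay_rate contact_phone contact_name tarp_required
    comment load_number commodity
);

# Missing fields become empty strings, never the word "null" or "blank".
sub format_row {
    my (%field) = @_;
    return join '|', map { $field{$_} // '' } @headers;
}

say join '|', @headers;    # the required first line of the output file
```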
Below is a sample output of the first 5 columns using sample data:
The deliverable will be a Perl .pl file that must run on Ubuntu Linux and must use Modern::Perl. The script should be called '[url removed, login to view]' and the output file should be called '[url removed, login to view]'.
It will be scheduled in cron to run unattended every 15 minutes.
Please specify the language, OS, and modules you plan to use.
Also, please include the word "raccoon" in your bid so I know that
you read this description.
I can create the same scraper as before using WWW::Mechanize. Thanks, Roman

Relevant Skills and Experience
I have worked on similar projects many times before.

Proposed Milestones
$66 USD - Project completion