I need a web scraper written for the following URL:
[login to view URL]
All information needed is available on the main page. The number of rows will vary.
The rows will be separated by dashed line segments.
The output should be a pipe (|) delimited file with the following column mappings:
origin_city --> data is the first city name in the "ROUTE" column
origin_state --> data is located after the comma after the origin_city name in the "ROUTE" column
ship_date --> data is located in the "DATE AVAIL." column; convert it to the YYYY-MM-DD format. If the date is in the past, use the current day's date instead, also in the YYYY-MM-DD format
destination_city --> data is the city name located after the abbreviation "D/O" in the "ROUTE" column; if there are multiple "D/O", use the data after the last listed "D/O"
destination_state --> data is located after the comma after the destination_city name in the "ROUTE" column; if there are multiple "D/O", use the data after the last listed "D/O"
receive_date --> leave blank
trailer_type --> data is located in the "TRAILER TYPE" column
load_size --> add the text "FULL"
weight --> data located in the "WEIGHT" column, do NOT include the decimal point or the data after the decimal point
length --> data located in the "FEET" column
width --> leave blank
height --> leave blank
trip_miles --> leave blank
pay_rate --> leave blank
contact_phone --> leave blank
contact_name --> leave blank
tarp_required --> leave blank
comment --> if the "ROUTE" column in an individual block of data contains multiple "D/O", add the text "Multiple Stops" to the comment
load_number --> data located in the "LOAD NUMBER" column
commodity --> data located in the "PALL." column, add text "Pall." before the data located in the column
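To illustrate the trickier mappings above (the "ROUTE" split, the ship_date rule, and the weight truncation), here is a minimal sketch. It is written in Python only to show the logic; the deliverable is still a Perl script. The route layout ("CITY, ST D/O CITY, ST ..."), the two-letter state code, and the MM/DD/YYYY source date format are assumptions, since the actual page layout is not shown here. The function names are hypothetical.

```python
import re
from datetime import date, datetime

# Assumed cell layout: "CITY, ST" with a two-letter state code.
CITY_STATE = re.compile(r"\s*(?P<city>[^,]+?)\s*,\s*(?P<state>[A-Za-z]{2})\b")


def parse_route(route):
    """Split a ROUTE cell into origin, last destination, and a multi-stop flag."""
    # Text before the first "D/O" is the origin; each later piece is one stop.
    origin_part, *stops = route.split("D/O")
    origin = CITY_STATE.match(origin_part)
    # Only the last listed "D/O" supplies the destination.
    dest = CITY_STATE.match(stops[-1]) if stops else None
    return {
        "origin_city": origin.group("city") if origin else "",
        "origin_state": origin.group("state") if origin else "",
        "destination_city": dest.group("city") if dest else "",
        "destination_state": dest.group("state") if dest else "",
        # True when more than one "D/O" appears; drives the comment column.
        "multiple_stops": len(stops) > 1,
    }


def normalize_ship_date(raw, today, source_format="%m/%d/%Y"):
    """Return YYYY-MM-DD, replacing past dates with today's date."""
    # source_format is an assumption about how DATE AVAIL. is written.
    parsed = datetime.strptime(raw.strip(), source_format).date()
    return max(parsed, today).isoformat()


def whole_weight(raw):
    """Drop the decimal point and everything after it."""
    return raw.strip().split(".", 1)[0]
```

For example, parse_route("ATLANTA, GA D/O DALLAS, TX D/O PHOENIX, AZ") would report ATLANTA, GA as the origin, PHOENIX, AZ as the destination, and flag the row as having multiple stops. The cities here are made-up sample values.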
The first line of the output should contain all of the column headers.
Any field that contains no data should be left blank.
Please do not use words like "null" or "blank" in blank columns.
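As a rough sketch of the output rules above (header line first, pipe delimiter, truly empty blank fields), the writing step could look like the following. Again, Python is used only for illustration and the real deliverable is Perl. The column order is taken from the mapping list; the function name write_rows and the dictionary-per-row shape are assumptions.

```python
import csv
import io

# Column order follows the mapping list in this description.
COLUMNS = [
    "origin_city", "origin_state", "ship_date", "destination_city",
    "destination_state", "receive_date", "trailer_type", "load_size",
    "weight", "length", "width", "height", "trip_miles", "pay_rate",
    "contact_phone", "contact_name", "tarp_required", "comment",
    "load_number", "commodity",
]


def write_rows(rows, out):
    """Write a header line, then one pipe-delimited line per row."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        # Missing or None values become empty strings, never "null" or "blank".
        writer.writerow(
            ["" if row.get(col) is None else str(row[col]) for col in COLUMNS]
        )


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up sample row; every unspecified column stays empty.
    write_rows([{"origin_city": "ATLANTA", "origin_state": "GA", "load_size": "FULL"}], buffer)
    print(buffer.getvalue(), end="")
```

Every column that is not supplied simply produces two adjacent pipes, so the column count stays the same on every line.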
Below is a sample output of the first 5 columns using sample data:
The deliverable will be a Perl .pl file that must run on
Ubuntu Linux and must use Modern::Perl. The Perl .pl file
should be called '[login to view URL]' and the output file should be
called '[login to view URL]'
It will be scheduled in cron to run unattended every 15 minutes.
Please specify what language/OS/modules you plan to use.
Also, please include the word "raccoon" in your bid so I know that
you read this description.