We are developing a browser-based application for handicapping horse races using statistical information. Data is collected from various sources via XML data feeds, JSON data feeds, and web page parsing. The reporting end is written in PHP, HTML, and jQuery, and the data is stored in a MySQL database. The reporting engine currently does not use a framework; it is custom PHP and may be restructured over time.
Most of the data importing is done, with the exception of a few websites and JSON data feeds that we'd like to check for data updates.
We are looking for a developer or team of developers who can work on data import and reporting features. We are currently budgeting $100 / week, but demonstrated ability over a few weeks could lead to us increasing our weekly budget.
Tasks for the coming few weeks will include:
Parsing table data from websites and updating the MySQL database. These scripts do not have to be PHP; any language that runs on Linux is fine, but PHP is preferred.
Creating customized SQL queries and output templates for reporting against the database.
This project is going to be long term, and we plan to increase our weekly budget from $100 to $500 as we grow the system.
Below is the functional spec for the first item; more like this will follow.
As a system, the database should be updated periodically, and horses marked as scratched should show as such in the interface.
The MySQL database has a table called races with the fields race_date (date type, example: ‘2016-06-30’), track (varchar(5); most values are 2- or 3-character uppercase strings and will match our data source), and race (integer; the race number within the day).
Example - race_date: ‘2016-06-22’ | track: ‘bel’ | race: 1
The website to be scraped is specific to the day of the race, and all updates will happen only on the day of the race.
(URL to be provided upon acceptance) provides an HTML table listing the available track/date combinations, with links to the individual pages to be scraped.
Each URL pulled from the previous step will provide, in the page body, a table with a list of updated data.
In a script that can be automated, the list of pages that relate to today’s date will be parsed.
For each page related to today’s date, the main table in the main body will be parsed.
For each row that denotes a horse was “scratched”, the script will update the database, marking the specific horse as scratched (the DB field exists: 1 = scratched, 0 = not scratched).
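The row-parsing step above could be sketched roughly as follows. This is only an illustration in Python using the standard library; PHP remains the preferred language. The real page layout has not been shared, so the column order (race number, horse name, change text), the sample HTML, and the names `TableRowParser` and `find_scratches` are all assumptions.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def find_scratches(html):
    """Return (race_number, horse_name) pairs for rows that mention a scratch.

    Assumed (hypothetical) column order: race number, horse name, change text.
    """
    parser = TableRowParser()
    parser.feed(html)
    scratches = []
    for row in parser.rows:
        if len(row) < 3 or not row[0].isdigit():
            continue  # skip header rows and malformed rows
        if "scratch" in row[2].lower():
            scratches.append((int(row[0]), row[1]))
    return scratches


# Made-up sample markup; the real page structure is not yet known.
SAMPLE_PAGE = """
<table>
  <tr><th>Race</th><th>Horse</th><th>Change</th></tr>
  <tr><td>1</td><td>Fast Example</td><td>Scratched - Vet</td></tr>
  <tr><td>1</td><td>Slow Example</td><td>Jockey Change</td></tr>
  <tr><td>3</td><td>Third Example</td><td>Scratched</td></tr>
</table>
"""

if __name__ == "__main__":
    for race, horse in find_scratches(SAMPLE_PAGE):
        print(race, horse)
```

Each returned pair would then be combined with the page's track and today's date to locate the matching horse in the database and set its scratched field to 1.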
Whenever possible, the script (or, better yet, a class plus an execution script) should be written for future reuse. The page we’re scraping contains more than just scratches: it also contains jockey changes, track surface changes, and more. If this is completed well and within budget, we have many hours of work to provide.
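One possible shape for such a reusable class is a small dispatcher that routes each parsed change to a handler chosen by change type, so that jockey changes or surface changes can be added later without touching the scratch logic. The sketch below is an assumption-laden illustration, not a required design. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere, and the `entries` table, its column names, and the change dictionary keys are hypothetical.

```python
import sqlite3


class ChangeProcessor:
    """Route parsed change rows to a handler chosen by change type."""

    def __init__(self, conn):
        self.conn = conn
        self.handlers = {}  # change type -> function(conn, change)

    def register(self, change_type, handler):
        self.handlers[change_type] = handler

    def process(self, changes):
        """Apply every change that has a handler; return how many were applied."""
        applied = 0
        for change in changes:
            handler = self.handlers.get(change["type"])
            if handler is None:
                continue  # e.g. jockey changes, until a handler is registered
            handler(self.conn, change)
            applied += 1
        self.conn.commit()
        return applied


def mark_scratched(conn, change):
    # Hypothetical table and column names; 1 = scratched, 0 = not scratched.
    conn.execute(
        "UPDATE entries SET scratched = 1 "
        "WHERE race_date = ? AND track = ? AND race = ? AND horse = ?",
        (change["race_date"], change["track"], change["race"], change["horse"]),
    )


def demo():
    # In-memory SQLite database standing in for the real MySQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (race_date TEXT, track TEXT, race INTEGER, "
        "horse TEXT, scratched INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO entries (race_date, track, race, horse) VALUES (?, ?, ?, ?)",
        [("2016-06-22", "BEL", 1, "Fast Example"),
         ("2016-06-22", "BEL", 1, "Slow Example")],
    )
    processor = ChangeProcessor(conn)
    processor.register("scratch", mark_scratched)
    changes = [
        {"type": "scratch", "race_date": "2016-06-22", "track": "BEL",
         "race": 1, "horse": "Fast Example"},
        {"type": "jockey_change", "race_date": "2016-06-22", "track": "BEL",
         "race": 1, "horse": "Slow Example"},
    ]
    applied = processor.process(changes)
    rows = conn.execute(
        "SELECT horse, scratched FROM entries ORDER BY horse"
    ).fetchall()
    return applied, rows


if __name__ == "__main__":
    print(demo())
```

Against MySQL, the same structure should carry over, although the common Python MySQL drivers use `%s` placeholders instead of `?`. A PHP version could follow the same pattern with PDO prepared statements.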
Week 1 - Parser described above - $100
Additional tasks in the pipeline:
We have a JSON feed parser script in PHP that pulls race result data. It currently reads only US tracks. We need to adjust this script to read Canadian tracks and to determine why a few US tracks do not seem to work.
We have a complex system that downloads XML files full of race data, uses XSLT to convert them to SQL statements, and updates the database. Periodically, some tracks do not import properly. This needs to be debugged.
The reporting output needs to be updated to include a number of new fields, and the layout needs adjustments as well.