Parses horse racing result charts into JSON/CSV/Java...
When given an Equibase result chart PDF file, `chart-parser` can turn it into machine-readable formats, such as JSON or CSV, or even into objects that can be used as code in an SDK.
- The entire PDF is parsed; everything you see in the chart can be used, including:
  - the race conditions and restrictions
  - lengths ahead/behind at each point of call
  - fractional times
  - wagering payoffs, pools, and carryovers
  - footnotes, etc.
- Full race card PDFs containing multiple races (including races spread over multiple pages) can be parsed.
- An SDK that supports full serialization to and from a JSON API comes out of the box.
- Textual descriptions of race distances are converted to feet, e.g. "Six Furlongs" becomes 3,960.
- Values for lengths ahead/behind are converted to decimal format.
- The software also adds further features, including:
  - attempting to look up the last-raced track details and linking to them
  - calculating estimated individual fractional times and splits at each fraction for each starter in a race
  - outlining each medication and piece of equipment used
  - providing a normalized "X-to-1" odds figure for all wagering payoffs
  - displaying the day of the week and the day of the year on which a race took place
- Thoroughbred, Quarter Horse, Arabian and Mixed breed races are all supported.
- The software handles edge cases such as dead heats, walkovers, non-betting races, disqualifications (including adjusting final winning positions), cancellations, claiming price information, etc.
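The distance, margin and odds conversions listed above can be illustrated with a small, self-contained sketch. This is a hypothetical example, not chart-parser's implementation: `ChartConversions` and its methods are invented for illustration, distances are assumed to take the simple form "<number word> <unit>" (compound distances such as "One And One Sixteenth Miles" are not handled), named margins such as "Neck" are deliberately left unmapped, and the odds formula assumes X-to-1 odds are derived from a payoff and the base wager it is quoted for as (payoff - base) / base.

```java
/**
 * Hypothetical helpers sketching the unit conversions described above.
 * These are NOT chart-parser's actual classes or methods.
 */
public class ChartConversions {

    // Index in this array == numeric value of the word (simple distances only)
    private static final String[] NUMBER_WORDS = {
            "zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"};

    /** Converts a simple distance such as "Six Furlongs" into feet. */
    public static int distanceToFeet(String text) {
        String[] parts = text.trim().toLowerCase().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unsupported distance: " + text);
        }
        int count = -1;
        for (int i = 0; i < NUMBER_WORDS.length; i++) {
            if (NUMBER_WORDS[i].equals(parts[0])) {
                count = i;
            }
        }
        if (count < 0) {
            throw new IllegalArgumentException("Unsupported number word: " + text);
        }
        // Accept singular and plural units, e.g. "Mile" and "Miles"
        String unit = parts[1].endsWith("s")
                ? parts[1].substring(0, parts[1].length() - 1) : parts[1];
        int feetPerUnit;
        switch (unit) {
            case "furlong": feetPerUnit = 660;  break; // 1 furlong = 660 ft
            case "yard":    feetPerUnit = 3;    break; // 1 yard = 3 ft
            case "mile":    feetPerUnit = 5280; break; // 1 mile = 5,280 ft
            default: throw new IllegalArgumentException("Unsupported unit: " + text);
        }
        return count * feetPerUnit;
    }

    /**
     * Converts a numeric margin such as "1 1/2" or "1/2" to a decimal.
     * Named margins ("Nose", "Head", "Neck") are not handled here and
     * cause a NumberFormatException.
     */
    public static double lengthsToDecimal(String text) {
        double total = 0;
        for (String token : text.trim().split("\\s+")) {
            int slash = token.indexOf('/');
            if (slash >= 0) {
                // Fractional part, e.g. "3/4" -> 0.75
                total += Double.parseDouble(token.substring(0, slash))
                        / Double.parseDouble(token.substring(slash + 1));
            } else {
                // Whole lengths, e.g. "7"
                total += Double.parseDouble(token);
            }
        }
        return total;
    }

    /** Assumed formula: X-to-1 odds from a payoff and the base wager it was quoted for. */
    public static double oddsToOne(double payoff, double baseWager) {
        return (payoff - baseWager) / baseWager;
    }

    public static void main(String[] args) {
        System.out.println(distanceToFeet("Six Furlongs")); // 3960
        System.out.println(lengthsToDecimal("1 1/2"));      // 1.5
        System.out.println(oddsToOne(5.80, 2.00));
    }
}
```

Under these assumptions, a $5.80 payoff on a $2 win ticket gives `oddsToOne(5.80, 2.00)` of 1.9, i.e. 1.9-to-1.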
PDFs are parsed using the Apache PDFBox library.
For a given PDF file, each character present is written as a pipe-delimited String that notes its x-y coordinates, height, width, scale, font size, and Unicode value within a page of the PDF. This is done using `ChartStripper`, a customized `PDFTextStripper`.
Each pipe-delimited String representing a character within the PDF is then converted to a custom POJO, `ChartCharacter`, using the Jackson CSV data format. The resulting list of `ChartCharacter`s is then grouped by the line of text on which each character appears within the PDF.
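The character-to-line steps described above can be approximated in plain Java. This is a hypothetical sketch, not chart-parser's code: `Glyph` stands in for `ChartCharacter`, the field order `x|y|height|width|scale|fontSize|unicode` is an assumption, the fields are parsed by hand rather than with Jackson's CSV module, and characters are grouped by exact y value (real PDFs may need a tolerance for small vertical offsets).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of turning pipe-delimited character records into lines of text.
 * Glyph stands in for ChartCharacter; the field order below is an assumption.
 */
public class CharacterLines {

    static final class Glyph {
        final double x, y, height, width, scale, fontSize;
        final String unicode;

        Glyph(double x, double y, double height, double width,
              double scale, double fontSize, String unicode) {
            this.x = x;
            this.y = y;
            this.height = height;
            this.width = width;
            this.scale = scale;
            this.fontSize = fontSize;
            this.unicode = unicode;
        }
    }

    /** Parses "x|y|height|width|scale|fontSize|unicode" (assumed field order). */
    static Glyph parse(String record) {
        // A limit of 7 keeps a literal '|' character intact in the last field
        String[] f = record.split("\\|", 7);
        return new Glyph(Double.parseDouble(f[0]), Double.parseDouble(f[1]),
                Double.parseDouble(f[2]), Double.parseDouble(f[3]),
                Double.parseDouble(f[4]), Double.parseDouble(f[5]), f[6]);
    }

    /**
     * Groups glyphs sharing the same y coordinate into one line, orders lines by
     * ascending y (assumes y increases down the page) and characters by ascending x.
     */
    static List<String> toLines(List<String> records) {
        Map<Double, List<Glyph>> byY = new TreeMap<>();
        for (String record : records) {
            Glyph glyph = parse(record);
            byY.computeIfAbsent(glyph.y, key -> new ArrayList<>()).add(glyph);
        }
        List<String> lines = new ArrayList<>();
        for (List<Glyph> row : byY.values()) {
            row.sort(Comparator.comparingDouble(c -> c.x));
            StringBuilder text = new StringBuilder();
            for (Glyph glyph : row) {
                text.append(glyph.unicode);
            }
            lines.add(text.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "20|100|7|5|1|8|i", "10|100|7|5|1|8|H",
                "10|200|7|5|1|8|O", "20|200|7|5|1|8|K");
        System.out.println(toLines(records)); // [Hi, OK]
    }
}
```

A `TreeMap` keyed on y keeps the lines in reading order without a separate sort step.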
Each line of text within the PDF is then tested against a series of regex matchers to identify which part of the race domain model it represents. When a line matches, its information is parsed and used to create an instance of `RaceResult`, following the Builder pattern. See `ChartParser#parse()` for more.
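The match-then-build approach can be sketched as follows. Everything here is hypothetical: the two patterns, the line formats they expect, and the `RaceSummary` builder are illustrative stand-ins, not chart-parser's actual regexes or its `RaceResult` builder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: classify lines with regexes and feed matches into a builder.
 * The patterns and RaceSummary are illustrative only, not chart-parser's own.
 */
public class LineMatcherSketch {

    // Illustrative patterns for two kinds of chart line
    static final Pattern PURSE = Pattern.compile("^Purse: \\$([\\d,]+)");
    static final Pattern DISTANCE = Pattern.compile("^Distance: (.+?)(?: On The (\\w+))?$");

    static final class RaceSummary {
        final Integer purse;
        final String distance;
        final String surface;

        private RaceSummary(Builder builder) {
            this.purse = builder.purse;
            this.distance = builder.distance;
            this.surface = builder.surface;
        }

        static final class Builder {
            private Integer purse;
            private String distance;
            private String surface;

            Builder purse(int purse) { this.purse = purse; return this; }
            Builder distance(String distance) { this.distance = distance; return this; }
            Builder surface(String surface) { this.surface = surface; return this; }
            RaceSummary build() { return new RaceSummary(this); }
        }
    }

    static RaceSummary parse(List<String> lines) {
        RaceSummary.Builder builder = new RaceSummary.Builder();
        for (String line : lines) {
            Matcher purse = PURSE.matcher(line);
            if (purse.find()) {
                // "15,000" -> 15000
                builder.purse(Integer.parseInt(purse.group(1).replace(",", "")));
                continue;
            }
            Matcher distance = DISTANCE.matcher(line);
            if (distance.matches()) {
                builder.distance(distance.group(1));
                if (distance.group(2) != null) {
                    builder.surface(distance.group(2));
                }
            }
            // Lines matching no pattern are skipped in this sketch
        }
        return builder.build();
    }

    public static void main(String[] args) {
        RaceSummary summary = parse(Arrays.asList(
                "Purse: $15,000",
                "Distance: Six Furlongs On The Dirt",
                "Some unrelated line"));
        // 15000 / Six Furlongs / Dirt
        System.out.println(summary.purse + " / " + summary.distance + " / " + summary.surface);
    }
}
```

Collecting matches in a builder lets fields arrive in any line order and be assembled into an immutable object once the whole chart has been read.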
Chart Parser is available in the Maven Central repository:
```xml
<dependency>
    <groupId>com.robinhowlett</groupId>
    <artifactId>chart-parser</artifactId>
    <version>1.2.0.RELEASE</version>
</dependency>
```
Parsing a PDF file is simple and can be done in one line, e.g.:
```java
List<RaceResult> raceResults = ChartParser.create().parse(Paths.get("ARP_2016-07-24_race-charts.pdf").toFile());

// print the winning margins
raceResults.stream()
        .flatMap(raceResult -> raceResult.getStarters().stream())
        .filter(Starter::isWinner)
        .forEach(starter -> System.out.println(
                String.format("%-20s: %10s",
                        starter.getHorse().getName(),
                        starter.getFinishPointOfCall().getRelativePosition().getLengthsAhead().getText())
        ));
```

Console output:

```
Back Stop           :      1 1/2
Cowboy Cliff        :      9 1/2
Perkin Desire       :      1 3/4
Fast as Thunder     :        1/2
Takin the Blame     :      7 1/4
Acme Rocket         :      1 1/4
Magical Twist       :      3 3/4
Lady Jila           :       Neck
Prater Sixty Four   :      3 1/4
```
Handycapper is provided as a sample application to parse and convert PDF charts.
IMPORTANT: This project relies on the Java 8 method parameter reflection feature, which must be enabled by compiling with the `-parameters` flag in your build and IDE compiler settings.
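One way to confirm that the flag took effect is to inspect parameter names through reflection: `Parameter#isNamePresent()` reports whether names were stored in the class file, and `Parameter#getName()` otherwise falls back to synthesized names such as `arg0`. The class and its `describe` method below are hypothetical examples, not part of chart-parser.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.Arrays;

/**
 * Checks whether this class was compiled with javac's -parameters flag by
 * inspecting reflected parameter names. describe() is a hypothetical example method.
 */
public class ParameterNamesCheck {

    public static String describe(String horseName, int postPosition) {
        return horseName + " #" + postPosition;
    }

    static Parameter[] describeParameters() {
        try {
            Method method = ParameterNamesCheck.class.getMethod("describe", String.class, int.class);
            return method.getParameters();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Real names when compiled with -parameters; otherwise synthesized "arg0", "arg1". */
    static String[] reflectedNames() {
        Parameter[] params = describeParameters();
        String[] names = new String[params.length];
        for (int i = 0; i < params.length; i++) {
            names[i] = params[i].getName();
        }
        return names;
    }

    /** True only if parameter names were recorded in the class file. */
    static boolean compiledWithParameters() {
        return describeParameters()[0].isNamePresent();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(reflectedNames()));
        System.out.println("-parameters enabled: " + compiledWithParameters());
    }
}
```

Compiling with `javac -parameters ParameterNamesCheck.java` and running it should report the real names `horseName` and `postPosition`; compiling without the flag yields `arg0` and `arg1`.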
`chart-parser` is a Maven-based, open-source Java project. Running `mvn clean install` will compile the code, run all tests, and install the built artifact to the local repository.
This software is open-source and released under the MIT License.
This project contains a single sample Equibase PDF chart included for testing, educational and demonstration purposes only.
It is recommended that users of this software be aware of any conditions stated on the PDF charts that may apply.