Chart Parser

Parses horse racing result charts into JSON/CSV/Java...

TL;DR

When given an Equibase result chart PDF file e.g.

chart-parser can turn it into machine-readable formats, like JSON, e.g.

or CSV, e.g.

or even Java objects that can be used in code through the SDK:

Highlights

The entire PDF is parsed; everything you see in the chart can be used, including:
- the race conditions and restrictions
- lengths ahead/behind at each point of call
- fractional times
- wagering payoffs, pools, and carryovers
- footnotes etc.
Full race card PDFs containing multiple races (including those spread over multiple pages) can be parsed.
An SDK is included out of the box and supports full serialization to and from a JSON API.
Textual descriptions of race distances are converted to feet e.g. "Six Furlongs" becomes 3,960.
Values for lengths ahead/behind are converted to decimal formats.
The software also adds features, including:
- attempting to look up the last-raced track details and linking to them
- calculating estimated individual fractional times and splits at each fraction for each starter in a race
- outlining each medication and equipment used
- providing a normalized "X-to-1" odds determination for all wagering payoffs
- displaying the day of the week and day of the year on which a race took place
Thoroughbred, Quarter Horse, Arabian and Mixed breed races are all supported.
The software handles edge-case scenarios such as dead-heats, walkovers, non-betting races, disqualifications (including adjusting final winning positions), cancellations, claiming price information etc.
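
To make the distance and lengths conversions above concrete, here is a minimal sketch. The class and method names are hypothetical and are not part of chart-parser's API. It only handles whole-furlong distances and numeric margins, so textual margins such as "Neck" and mixed distances are out of scope here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not chart-parser's actual implementation.
class ChartValueSketch {
    // One furlong is 660 feet.
    static final int FEET_PER_FURLONG = 660;

    private static final Map<String, Integer> NUMBER_WORDS = new HashMap<>();
    static {
        String[] words = {"One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight"};
        for (int i = 0; i < words.length; i++) {
            NUMBER_WORDS.put(words[i], i + 1);
        }
    }

    // Converts a whole-furlong description such as "Six Furlongs" to feet.
    static int furlongsToFeet(String text) {
        String[] parts = text.trim().split("\\s+");
        if (parts.length != 2 || !parts[1].startsWith("Furlong")) {
            throw new IllegalArgumentException("Unsupported distance: " + text);
        }
        Integer count = NUMBER_WORDS.get(parts[0]);
        if (count == null) {
            throw new IllegalArgumentException("Unsupported number word: " + parts[0]);
        }
        return count * FEET_PER_FURLONG;
    }

    // Converts a numeric margin such as "1 1/2", "3/4" or "2" to a decimal value.
    static double lengthsToDecimal(String text) {
        double total = 0;
        for (String part : text.trim().split("\\s+")) {
            if (part.contains("/")) {
                String[] fraction = part.split("/");
                total += Double.parseDouble(fraction[0]) / Double.parseDouble(fraction[1]);
            } else {
                total += Double.parseDouble(part);
            }
        }
        return total;
    }
}
```

For example, `ChartValueSketch.furlongsToFeet("Six Furlongs")` returns 3960 and `ChartValueSketch.lengthsToDecimal("1 1/2")` returns 1.5.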

How it works

PDFs are parsed using the Apache PDFBox library.

For a given PDF file, each character present is written as a pipe-delimited String that records its x-y coordinates, height, width, scale, font size, and Unicode value within a page of the PDF.

This is done using ChartStripper, a customized PDFTextStripper instance.

Each pipe-delimited String representing a character within the PDF is then converted to a custom POJO, ChartCharacter, using the Jackson CSV data format.

The list of ChartCharacters is then grouped by the line of text on which each character appears within the PDF.
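
The parse-and-group steps can be sketched roughly as follows. This is not the project's code. The `PdfChar` class is a hypothetical stand-in for ChartCharacter, and the field order (`x|y|height|width|scale|fontSize|codePoint`) and the decimal code point encoding are assumptions. It uses a plain `String.split` instead of Jackson so that it has no dependencies, and it assumes that characters on the same line share exactly the same y coordinate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: hypothetical names and field layout, not chart-parser's implementation.
class CharacterGroupingSketch {

    // Hypothetical stand-in for ChartCharacter.
    static class PdfChar {
        final double x, y, height, width, scale, fontSize;
        final int codePoint;

        PdfChar(double x, double y, double height, double width,
                double scale, double fontSize, int codePoint) {
            this.x = x;
            this.y = y;
            this.height = height;
            this.width = width;
            this.scale = scale;
            this.fontSize = fontSize;
            this.codePoint = codePoint;
        }
    }

    // Parses one row in the assumed layout: x|y|height|width|scale|fontSize|codePoint
    static PdfChar parse(String row) {
        String[] f = row.split("\\|");
        if (f.length != 7) {
            throw new IllegalArgumentException("Expected 7 fields but got " + f.length);
        }
        return new PdfChar(Double.parseDouble(f[0]), Double.parseDouble(f[1]),
                Double.parseDouble(f[2]), Double.parseDouble(f[3]),
                Double.parseDouble(f[4]), Double.parseDouble(f[5]),
                Integer.parseInt(f[6]));
    }

    // Groups characters by y coordinate (ascending), then orders each line by x.
    static List<String> toLines(List<String> rows) {
        Map<Double, List<PdfChar>> byY = new TreeMap<>();
        for (String row : rows) {
            PdfChar c = parse(row);
            byY.computeIfAbsent(c.y, k -> new ArrayList<>()).add(c);
        }
        List<String> lines = new ArrayList<>();
        for (List<PdfChar> chars : byY.values()) {
            chars.sort((a, b) -> Double.compare(a.x, b.x));
            StringBuilder text = new StringBuilder();
            for (PdfChar c : chars) {
                text.appendCodePoint(c.codePoint);
            }
            lines.add(text.toString());
        }
        return lines;
    }
}
```

A real implementation would probably need a tolerance when comparing y coordinates, since glyphs on one visual line do not always share an identical baseline value.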

Each line of text within the PDF is then tested against a series of regex matchers to identify which parts of the race domain model it represents. When matched, the information is parsed and used to create an instance of RaceResult, following the Builder pattern.
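
As a rough illustration of the match-then-build step, the sketch below tests each line against two regular expressions and feeds any captured values into a builder. Everything in it is hypothetical: the `Result` type is a simplified stand-in for RaceResult, and the patterns and sample line formats are made up for the example and are not the ones chart-parser uses.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hypothetical patterns and types, not chart-parser's implementation.
class LineMatchingSketch {

    // Simplified stand-in for RaceResult, created through a builder.
    static class Result {
        final Integer raceNumber;
        final String finalTime;

        private Result(Builder builder) {
            this.raceNumber = builder.raceNumber;
            this.finalTime = builder.finalTime;
        }

        static class Builder {
            private Integer raceNumber;
            private String finalTime;

            Builder raceNumber(int raceNumber) {
                this.raceNumber = raceNumber;
                return this;
            }

            Builder finalTime(String finalTime) {
                this.finalTime = finalTime;
                return this;
            }

            Result build() {
                return new Result(this);
            }
        }
    }

    // Assumed line shapes, e.g. "... - Race 3" and "... Final Time: 1:10.12"
    private static final Pattern RACE_NUMBER = Pattern.compile("Race (\\d+)\\b");
    private static final Pattern FINAL_TIME =
            Pattern.compile("Final Time:\\s*((?:\\d+:)?\\d{1,2}\\.\\d{2})");

    // Tests every line against each pattern and records what matched.
    static Result parse(List<String> lines) {
        Result.Builder builder = new Result.Builder();
        for (String line : lines) {
            Matcher race = RACE_NUMBER.matcher(line);
            if (race.find()) {
                builder.raceNumber(Integer.parseInt(race.group(1)));
            }
            Matcher time = FINAL_TIME.matcher(line);
            if (time.find()) {
                builder.finalTime(time.group(1));
            }
        }
        return builder.build();
    }
}
```

Lines that match no pattern are simply ignored in this sketch, and fields that are never matched stay null.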

See ChartParser#parse() for more.

How to use

Chart Parser is available in the Maven Central repository:

<dependency>
    <groupId>com.robinhowlett</groupId>
    <artifactId>chart-parser</artifactId>
    <version>1.2.0.RELEASE</version>
</dependency>

Parsing a PDF file is simple and can be done in one line, e.g.:

List<RaceResult> raceResults = ChartParser.create().parse(Paths.get("ARP_2016-07-24_race-charts.pdf").toFile());

// print the winning margins
raceResults.stream()
        .flatMap(raceResult -> raceResult.getStarters().stream())
        .filter(Starter::isWinner)
        .forEach(starter -> System.out.println(
                String.format("%-20s: %10s",
                        starter.getHorse().getName(),
                        starter.getFinishPointOfCall().getRelativePosition().getLengthsAhead().getText())
        ));

// console output
Back Stop           :      1 1/2
Cowboy Cliff        :      9 1/2
Perkin Desire       :      1 3/4
Fast as Thunder     :        1/2
Takin the Blame     :      7 1/4
Acme Rocket         :      1 1/4
Magical Twist       :      3 3/4
Lady Jila           :       Neck
Prater Sixty Four   :      3 1/4

Handycapper is provided as a sample application to parse and convert PDF charts:

Compiling

IMPORTANT: This project relies on the Java 8 method parameter reflection feature, which is enabled by passing the -parameters flag to the Java compiler (javac). Make sure your IDE or build settings include it.
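
One way to check whether the flag took effect is to ask the reflection API whether parameter names were recorded at compile time. The class below is a hypothetical helper, not part of chart-parser. It relies only on the standard `java.lang.reflect.Parameter` API introduced in Java 8.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Illustrative only: a hypothetical helper, not part of chart-parser.
class ParameterNamesCheck {

    // Sample method whose parameter is inspected through reflection.
    static void sample(String horseName) {
        // intentionally empty
    }

    private static Parameter firstParameter() {
        try {
            Method method = ParameterNamesCheck.class.getDeclaredMethod("sample", String.class);
            return method.getParameters()[0];
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only when this class was compiled with the -parameters flag.
    static boolean parameterNamesAvailable() {
        return firstParameter().isNamePresent();
    }

    // "horseName" when compiled with -parameters, otherwise a synthesized name such as "arg0".
    static String firstParameterName() {
        return firstParameter().getName();
    }
}
```

If `parameterNamesAvailable()` returns false for classes built in your environment, the compiler flag is most likely missing from the build configuration.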

chart-parser is a Maven-based Java open-source project. Running mvn clean install will compile the code, run all tests, and install the built artifact to the local repository.

Notes

This software is open-source and released under the MIT License.

This project contains a single sample Equibase PDF chart included for testing, educational and demonstration purposes only.

Users of this software should be aware of any conditions that may apply to the PDF charts.
