This repository contains a code generator that turns a Deutsche Börse ETI (Enhanced Trading Interface) protocol description into Python bindings. It also supports EOBI (market data) protocol descriptions.
There is also a code generator that creates Wireshark protocol dissectors from these protocol descriptions.
This is a private research project for investigating how binary serialization/deserialization code can profit from modern Python features, along with other experiments.
2021, Georg Sauthoff mail@gms.tf
The generated Python code can be used for several purposes, such as:
- creating binary message templates for traffic generators or high performance ETI clients
- analysing captured ETI messages for - say - debugging
- a concise reference to look up message details such as the name, offset, width, type, etc. of fields
- writing an ETI traffic generator or test-server in Python
As an example of what the generated code looks like, you can check out the output for the T7 ETI version 9 specification.
This repository also contains a simple ETI client (eti_client.py) and a small ETI server (eti_server.py) that can be used to ping-pong some ETI messages over the network. The server runs forever and replies to each request with one or more context-dependent response messages (as specified in the protocol specification). If alternative response types are possible, one is chosen at random. Since the server dumps each received ETI message to stdout, it can also be used as an ad-hoc protocol dissector when developing or testing an ETI client.
There is also a simple EOBI client (eobi_client.py) that dumps multicast market data packets, including the DSCP field, in which the EOBI protocol encodes market data related information.
Another example is pcapdump.py, a simple PCAP to ETI/EOBI dumper. It pretty-prints EOBI/ETI packets from a PCAP file to stdout in a human-readable format. Note that, for simplicity, it assumes that each ETI TCP packet contains only complete ETI messages and starts with an ETI message header, which is usually the case in practice. Of course, since TCP is a stream-oriented protocol, it's perfectly fine for a client to let ETI messages span TCP segment boundaries. Adding TCP reassembly to the example is left as an exercise.
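One possible shape of that exercise is sketched below. It is not part of pcapdump.py: the class name `MessageBuffer` is made up, and the sketch assumes that every ETI message starts with a 4-byte little-endian length field that counts the whole message, header included. It only handles a single in-order byte stream and ignores retransmitted or out-of-order segments.

```python
import struct


class MessageBuffer:
    """Collects TCP payload bytes and splits them into complete messages."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, payload):
        """Append one TCP segment payload, return the completed messages."""
        self.buf += payload
        messages = []
        while len(self.buf) >= 4:
            # Assumption: the first 4 bytes hold the total message length
            # as a little-endian unsigned integer.
            (length,) = struct.unpack_from('<I', self.buf, 0)
            if length < 4:
                raise ValueError(f'implausible message length: {length}')
            if len(self.buf) < length:
                break  # the message continues in a later segment
            messages.append(bytes(self.buf[:length]))
            del self.buf[:length]
        return messages


reassembler = MessageBuffer()
message = struct.pack('<I', 8) + bytes(4)          # made-up 8-byte message
first = reassembler.feed(message[:5])              # incomplete so far
second = reassembler.feed(message[5:] + message)   # completes one, adds another
```

Here `first` is empty because only 5 of the 8 bytes have arrived, while `second` contains both messages.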
The pcapgen.py script shows how to quickly generate/fake some ETI/EOBI PCAP files from scratch for testing purposes.
Deutsche Börse publishes the ETI protocol descriptions on their web sites. Since they are sometimes kind of hard to find I include some links:
- ETI 12.1 via Xetra system documentation or via Eurex system documentation
- ETI 12 via Xetra system documentation or via Eurex system documentation
- ETI 11.1 via Xetra system documentation or via Eurex system documentation
- ETI 11 via Xetra system documentation or via Eurex system documentation
- ETI 10.1 via Xetra system documentation or via Eurex system documentation
- ETI 10 via Xetra system documentation or via Eurex system documentation
- ETI 9.1 via Xetra system documentation or via Eurex system documentation
- ETI 9.0 via Xetra system documentation or via Eurex system documentation
- ETI 8.1 via Xetra system documentation or via Eurex system documentation
- ETI 8.0 via Xetra system documentation or via Eurex system documentation
EOBI descriptions:
- EOBI 12.1 via Xetra system documentation or via Eurex system documentation
- EOBI 12 via Xetra system documentation or via Eurex system documentation
- EOBI 11.1 via Xetra system documentation or via Eurex system documentation
- EOBI 11 via Xetra system documentation or via Eurex system documentation
- EOBI 10.1 via Xetra system documentation or via Eurex system documentation
- EOBI 10 via Xetra system documentation or via Eurex system documentation
- EOBI 9.1 via Xetra system documentation or via Eurex system documentation
- EOBI 9.0 via Xetra system documentation or via Eurex system documentation
- EOBI 8.1 via Xetra system documentation or via Eurex system documentation
- EOBI 8.0 via Xetra system documentation or via Eurex system documentation
The previous section contains links into the Eurex/Xetra system documentation, which includes manuals and reference manuals on the various protocols and services.
Besides the protocols there is also the N7 Network Access Guide which lists the various ports and IP addresses in use for these protocols:
- Xetra Release 12.1 Network Access Section
- Direct link: N7 Network Access-Guide v2.3.1 Release 12.1 (Xetra)
- Direct link: N7 Network Access-Guide v2.3.1 Release 12.1 (Eurex)
The functional reference gives some background on how the exchange system (the order matching etc.) is supposed to work:
- Xetra Release 12.1 Overview and Functionality Section
- Direct link: T7 Functional Reference Release 12.1 (Xetra)
- Direct link: T7 Functional Reference Release 12.1 (Eurex)
The main noteworthy modern Python features the generated code uses are Python enumerations (available since Python 3.4) and dataclasses (available since Python 3.7, for Python 3.6 there is a backport).
Dataclasses provide some syntactic sugar for dealing with mutable named records in Python. Their use of type annotations and default values allows for compact definitions. Two things to keep in mind with dataclasses are that default values must be immutable and that additional (non-annotated) attributes can accidentally be added through typos. Thus, the generated code uses default factory functions for mutable defaults and overrides __setattr__() to check for unknown fields.
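Both precautions can be sketched as follows. This is a hand-written illustration, not the generator's actual output, and the class and field names (`Side`, `Order`, `legs`) are invented:

```python
import enum
from dataclasses import dataclass, field


class Side(enum.IntEnum):
    BUY = 1
    SELL = 2


@dataclass
class Order:
    price: int = 0
    side: Side = Side.BUY
    # A plain `legs: list = []` would raise ValueError at class creation
    # time; a default factory gives each instance its own list.
    legs: list = field(default_factory=list)

    def __setattr__(self, name, value):
        # Reject attributes that aren't declared fields, to catch typos.
        if name not in self.__dataclass_fields__:
            raise AttributeError(f'unknown field: {name}')
        super().__setattr__(name, value)


order = Order()
order.price = 42          # fine, declared field
try:
    order.prize = 43      # typo, rejected by __setattr__
    typo_detected = False
except AttributeError:
    typo_detected = True
```

The generated `__init__` also assigns through `__setattr__`, so the check covers construction as well as later assignments.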
The generated code also makes heavy use of Python's neat struct package for serializing and deserializing spans of elementary fields. This isn't a recent addition to Python. However, memoryviews, which are often a useful tool for avoiding buffer churning, were only added in Python 2.7.
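As a rough illustration of that combination (the field layout below is made up and not taken from an ETI message), a precompiled `struct.Struct` can pack into and unpack from a preallocated buffer, and a `memoryview` slice refers to part of that buffer without copying it:

```python
import struct

# Hypothetical span of elementary fields: u32, u16, 2 pad bytes, u64,
# all little-endian ('<' also disables implicit alignment padding).
span = struct.Struct('<IH2xQ')

buf = bytearray(32)        # allocated once, reused for every message
view = memoryview(buf)

span.pack_into(buf, 8, 16, 10101, 123456789)   # write at offset 8

# Slicing a memoryview yields another view on the same memory, no copy.
body = view[8:8 + span.size]
length, template_id, value = span.unpack_from(body)

# Changes to the underlying buffer are visible through the view.
struct.pack_into('<H', buf, 12, 10102)
template_id_after = span.unpack_from(body)[1]
```

Slicing the `bytearray` directly would instead allocate a new object for every slice, which is the kind of buffer churning the views avoid.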
Of course, Python trades some runtime speed for syntactic sugar and usability, and you wouldn't write performance critical code in Python. Having said that, serializing/deserializing shouldn't be too slow, either.
The file bench_eti.py contains a small benchmark that repeatedly serializes an IOC (immediate-or-cancel) order after changing a few fields, while avoiding buffer churning.
On a Skylake i7-6600U Laptop this results in:
$ pytest bench_eti.py
------------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_pack_ioc 4.4320 (1.0) 158.2600 (1.0) 4.9058 (1.0) 2.1368 (1.0) 4.7510 (1.0) 0.1140 (1.0) 403;688 203.8419 (1.0) 18464 1
test_unpack_ioc 20.6270 (4.65) 1,231.7820 (7.78) 23.7020 (4.83) 10.6706 (4.99) 22.3940 (4.71) 2.5851 (22.67) 1063;1534 42.1906 (0.21) 22066 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
That means on that machine the code serializes about 200k IOC orders per second (with CPython), which is quite OK.
Using PyPy, the numbers are much better (same machine):
$ pypy3 -m pytest bench_eti.py
----------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------
Name (time in ns) Min Max Mean StdDev Median IQR Outliers OPS (Kops/s) Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_pack_ioc 258.9996 (1.0) 334,982.7357 (1.0) 317.4350 (1.0) 1,081.2319 (1.0) 284.9449 (1.0) 25.3693 (1.0) 323;20639 3,150.2509 (1.0) 191351 19
test_unpack_ioc 3,491.9940 (13.48) 3,202,659.4854 (9.56) 4,626.1754 (14.57) 14,446.8865 (13.36) 3,695.9827 (12.97) 345.4916 (13.62) 1321;18103 216.1613 (0.07) 142817 2
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
Basically, PyPy speeds up the serialization by a factor of about 15 and the deserialization by a factor of about 5 (comparing the mean times). Note the change in units in the pytest output (from µs to ns).
The ETI and EOBI protocols specify a message stream, where each message is tagged and starts with a length field, although most messages are of fixed size. Most message fields are of fixed size; those which aren't are prefixed with an accompanying length field. Integers are encoded in little-endian byte order, each field size is divisible by 8 bits, and the size of each message is divisible by 8 bytes.
One important difference between the ETI and EOBI encoding is that whole EOBI messages are of fixed size whereas ETI messages may vary in size and only their sub-records are of fixed size. That means that arrays in ETI messages are minimally encoded (i.e. only the filled elements are put on the wire) while arrays in EOBI are fully encoded (i.e. trailing empty elements act as additional padding). Some ETI messages also include string fields of variable size and those are zero-padded such that the message size is divisible by 8.
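The padding rule boils down to rounding a size up to the next multiple of 8. A small sketch with made-up helper names (`pad_len`, `pad_message`), not taken from the generated code:

```python
def pad_len(size, alignment=8):
    """Return the number of pad bytes needed to round size up to alignment."""
    return -size % alignment


def pad_message(raw):
    """Append zero bytes such that the message length is divisible by 8."""
    return raw + bytes(pad_len(len(raw)))


padded = pad_message(b'x' * 27)   # 27 bytes need 5 zero bytes of padding
```

A message that is already 24 bytes long gets no padding, whereas a 27 byte message grows to 32 bytes.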
ETI runs over TCPv4 while EOBI is specified on top of UDPv4.
See for example the ETI request header:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Body Length                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Template ID          |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                  Network Message ID (unused)                  |
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |              pad              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Message Sequence Number                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Sender Sub ID                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
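Read as a `struct` format string, the ETI request header corresponds to the following sketch. The format string and the variable names are derived by hand from the diagram, not copied from the generated code, and the example values are arbitrary:

```python
import struct

# Body Length (u32), Template ID (u16), Network Message ID (8 bytes,
# unused), 2 pad bytes, Message Sequence Number (u32), Sender Sub ID (u32)
request_header = struct.Struct('<IH8s2xII')

# Arbitrary example values, e.g. a message that consists of just a header.
raw = request_header.pack(24, 10000, bytes(8), 1, 42)

(body_len, template_id, network_msg_id,
 msg_seq_num, sender_sub_id) = request_header.unpack(raw)
```

The header occupies 24 bytes, which fits the rule that message sizes are divisible by 8.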
And the EOBI Packet-Header:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Body Length          |          Template ID          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Message Sequence Number (unused)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Application Sequence Number                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Market Segment ID                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Partition ID  | CompletionInd.|ApplSeqRestInd.|   DSCP copy   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              pad                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                         Transact Time                         +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
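The EOBI packet header can be written down as a `struct` format string in the same hand-derived way. In this sketch all integer fields are treated as unsigned, which is an assumption, and the example values are arbitrary:

```python
import struct

# Body Length (u16), Template ID (u16), Message Sequence Number (u32,
# unused), Application Sequence Number (u32), Market Segment ID (u32),
# Partition ID (u8), Completion Indicator (u8),
# Application Sequence Reset Indicator (u8), DSCP copy (u8),
# 4 pad bytes, Transact Time (u64)
packet_header = struct.Struct('<HHIIIBBBB4xQ')

# Arbitrary example values, only meant to exercise the layout.
raw = packet_header.pack(32, 13005, 0, 7, 589, 1, 1, 0, 0x28,
                         1_600_000_000_000_000_000)
fields = packet_header.unpack(raw)
```

With 32 bytes the packet header size is again divisible by 8.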
The tool ./eti2wireshark.py generates Wireshark protocol dissectors from the ETI/EOBI protocol descriptions.
Example:
./eti2wireshark.py --proto eobi --desc 'Enhanced Order Book Interface' temp/T7_EOBI_9.1.zip/eobi-mod.xml -o packet-eobi.c
./eti2wireshark.py temp/T7_ETI_9.1.zip/eti_Derivatives.xml -o packet-eti.c
The generated code is implemented around a tight state machine to avoid code bloat.
Protocol fields are pretty-printed in the obvious ways, e.g. timestamps in human readable format, fixed point decimals with the point inserted at the type specific place, enumeration mappings provided etc.
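Inserting the decimal point of a fixed-point value can be done exactly with integer arithmetic. The sketch below is written in Python for illustration only (the generated dissector is C code), and both the function name and the default of 8 decimal places are assumptions:

```python
def format_fixed(value, decimals=8):
    """Format an integer that holds a fixed-point decimal, without rounding."""
    sign = '-' if value < 0 else ''
    # Split into the digits before and after the decimal point.
    whole, frac = divmod(abs(value), 10 ** decimals)
    return f'{sign}{whole}.{frac:0{decimals}d}'


price = format_fixed(1234567800)   # '12.34567800'
tick = format_fixed(-5)            # '-0.00000005'
```

Since no floating-point conversion is involved, every digit of the wire value shows up unchanged in the output.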
Related work:
- Open-Markets-Initiative/wireshark-lua - A collection of Lua based model-generated Wireshark dissectors for various trading/market data protocols. The ETI/EOBI protocols are listed there as untested. I haven't tested these dissectors - however, the fact that they use another layer of general indirection (the Lua interpreter) surely doesn't help with dissecting speed.
The generated ETI 9.1 Lua dissector file contains over 32 thousand lines, whereas the eti2wireshark.py generated ETI 9.1 dissector C code spans just about 13 thousand lines, most of which (i.e. more than 12 thousand lines) are lookup tables that are placed into the read-only data segment.
FWIW, in contrast to the eti2wireshark dissectors, the Lua dissectors pretty-print field names with spaces between the camel-cased elements.
A real limitation is that timestamp fields such as ExecID are displayed as is, i.e. the value isn't converted into a human-readable date-time string.
A serious issue is how the Lua dissectors display fixed-point decimals: the Lua code uses floating-point arithmetic to convert them and the resulting floating-point value is displayed. Thus, the displayed value is just an approximation of the real value.
From the repository's description and README it isn't clear where the Lua dissector generators are available and whether they are available under an open source license.
- dharmangbhavsar/eti_dissector (removed) - 'A Eurex ETI Wireshark Dissector for Geneva Trading' was available until mid 2021 or so, but that repository was removed later that year. From the archived page it's unclear whether that dissector was released under an open source license. The last commit was from December 2018 and it looks like it supported ETI version 6.1. Since the repository listing includes Deutsche Börse's published C header file (with structs for all the ETI PDUs) and no XML protocol description, it looks like that dissector wasn't code generated.
- The benchmark test case relies on pytest benchmark (Fedora package: python3-pytest-benchmark).
- The pcapdump.py example uses the dpkt package for parsing PCAP files and skipping over Ethernet/IP/UDP/TCP headers (Fedora package: python3-dpkt).
- Wikipedia's List of Electronic Trading Protocols