Description
When running with the following set of options:
wget -q -O - ftp://ftp.cmegroup.com/SBEFix/Production/secdef.dat.gz | gunzip - | txcode \
--schema_file ~/src/datacast/transcoder/test/FIX50SP2.CME.xml \
--factory fix \
--source_file_format_type line_delimited \
--message_type_inclusions=SecurityDefinition \
--fix_header_tags 8,9,35,1128,49,56,34,52,10 \
--destination_project_id $(gcloud config get-value project) \
--output_type pubsub \
--output_encoding json \
--lazy_create_resources
The following error is thrown:
google.api_core.exceptions.InvalidArgument: 400 AVRO schema definition is not valid: sbeMessage.NoLotTypeRules exists twice in schema. [detail: "[ORIGINAL ERROR] generic::invalid_argument: AVRO schema definition is not valid: sbeMessage.NoLotTypeRules exists twice in schema. [google.rpc.error_details_ext] { message: \"AVRO schema definition is not valid: sbeMessage.NoLotTypeRules exists twice in schema.\" }"
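This is likely caused by the complex schema defined in FIX50SP2.CME.xml, where the same field name may appear at different levels of the entity hierarchy. The error indicates that the generated Avro schema defines the named type sbeMessage.NoLotTypeRules more than once, and Avro allows each full name to be defined only once per schema. A graph such as Object.Property1 and Object.Property2.Property1 therefore appears to be incompatible with Avro, yet it is commonly encountered in legitimate FIX schema definitions.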
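The suspected collision can be illustrated with a small, standard-library-only sketch. The schema below is a hypothetical, heavily reduced stand-in for what the transcoder might generate (the NoUnderlyings nesting and the LotType field are assumptions made for illustration, not taken from FIX50SP2.CME.xml), and `named_type_fullnames` is a helper written for this example, not part of txcode. It assumes simple, undotted type names.

```python
from collections import Counter

# Hypothetical repeating group, defined inline wherever it is used.
lot_type_rules = {
    "type": "record",
    "name": "NoLotTypeRules",
    "fields": [{"name": "LotType", "type": "string"}],
}

# Hypothetical message: the group appears at the top level and again
# inside another group (Object.Property1 and Object.Property2.Property1).
schema = {
    "type": "record",
    "name": "SecurityDefinition",
    "namespace": "sbeMessage",
    "fields": [
        {"name": "NoLotTypeRules",
         "type": {"type": "array", "items": lot_type_rules}},
        {"name": "NoUnderlyings",
         "type": {"type": "array", "items": {
             "type": "record",
             "name": "NoUnderlyings",
             "fields": [
                 {"name": "NoLotTypeRules",
                  "type": {"type": "array", "items": lot_type_rules}},
             ],
         }}},
    ],
}

NAMED_TYPES = {"record", "enum", "fixed"}


def named_type_fullnames(node, namespace=""):
    """Yield the full name of every named type defined under node."""
    if isinstance(node, list):  # union: visit every branch
        for branch in node:
            yield from named_type_fullnames(branch, namespace)
    elif isinstance(node, dict):
        kind = node.get("type")
        if isinstance(kind, (dict, list)):  # nested type definition
            yield from named_type_fullnames(kind, namespace)
        elif kind in NAMED_TYPES:
            # Nested named types inherit the enclosing namespace.
            namespace = node.get("namespace", namespace)
            name = node["name"]
            yield f"{namespace}.{name}" if namespace else name
            for field in node.get("fields", []):
                yield from named_type_fullnames(field["type"], namespace)
        elif kind == "array":
            yield from named_type_fullnames(node["items"], namespace)
        elif kind == "map":
            yield from named_type_fullnames(node["values"], namespace)
    # Plain strings are primitives or name references: nothing is defined.


counts = Counter(named_type_fullnames(schema))
duplicates = [name for name, count in counts.items() if count > 1]
print(duplicates)
```

Because both inline definitions inherit the sbeMessage namespace, they resolve to the same full name, which matches the name reported in the error above.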
Notably, the BigQuery output types do not exhibit this behavior, but the fastavro output type, which writes to local POSIX files, fails in the same way:
fastavro._schema_common.SchemaParseException: redefined named type: sbeMessage.NoLotTypeRules
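One possible direction, sketched here under stated assumptions rather than proposed as the transcoder's fix: Avro lets a named type be defined once and referenced by its full name afterwards. The hypothetical helper `reuse_named_types` below keeps the first definition of each named type and replaces later, identical definitions with a name reference. It assumes simple, undotted type names and that repeated groups are structurally identical; the example schema is invented for illustration, and the result has not been verified against Pub/Sub or fastavro.

```python
NAMED_TYPES = {"record", "enum", "fixed"}


def reuse_named_types(node, namespace="", seen=None):
    """Return a copy of node where repeated named-type definitions
    are replaced by a reference to the type's full name."""
    seen = {} if seen is None else seen
    if isinstance(node, list):  # union: process every branch
        return [reuse_named_types(b, namespace, seen) for b in node]
    if not isinstance(node, dict):
        return node  # primitive or existing name reference
    kind = node.get("type")
    if isinstance(kind, (dict, list)):  # nested type definition
        return {**node, "type": reuse_named_types(kind, namespace, seen)}
    if kind in NAMED_TYPES:
        namespace = node.get("namespace", namespace)
        name = node["name"]
        fullname = f"{namespace}.{name}" if namespace else name
        if fullname in seen:
            if seen[fullname] != node:
                # Different shapes under one name need distinct names instead.
                raise ValueError(f"conflicting definitions of {fullname}")
            return fullname
        seen[fullname] = node
        out = dict(node)
        if "fields" in node:
            out["fields"] = [
                {**f, "type": reuse_named_types(f["type"], namespace, seen)}
                for f in node["fields"]
            ]
        return out
    if kind == "array":
        return {**node, "items": reuse_named_types(node["items"], namespace, seen)}
    if kind == "map":
        return {**node, "values": reuse_named_types(node["values"], namespace, seen)}
    return node


# Hypothetical reduced schema with the same group defined inline twice.
group = {"type": "record", "name": "NoLotTypeRules",
         "fields": [{"name": "LotType", "type": "string"}]}
schema = {
    "type": "record", "name": "SecurityDefinition", "namespace": "sbeMessage",
    "fields": [
        {"name": "NoLotTypeRules", "type": {"type": "array", "items": group}},
        {"name": "Nested", "type": {
            "type": "record", "name": "Nested",
            "fields": [{"name": "NoLotTypeRules",
                        "type": {"type": "array", "items": group}}]}},
    ],
}

fixed = reuse_named_types(schema)
first = fixed["fields"][0]["type"]["items"]
second = fixed["fields"][1]["type"]["fields"][0]["type"]["items"]
print(first["name"], second)
```

If the same group name carries different fields at different levels, referencing by name is not enough; the helper raises in that case, and the definitions would need distinct names or namespaces.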