Schema-to-case-class code generation for working with Avro in Scala.
- avrohugger-core: Generate source code at runtime for evaluation at a later step.
- avrohugger-filesorter: Sort schema files for proper compilation order.
- avrohugger-tools: Generate source code at the command line with the avrohugger-tools jar.
Alternative Distributions:
- sbt: sbt-avrohugger - Generate source code at compile time with an sbt plugin.
- maven: avrohugger-maven-plugin - Generate source code at compile time with a maven plugin.
- gradle: gradle-avrohugger-plugin - Generate source code at compile time with a gradle plugin.
- on the web: avro2caseclass - Generate source code from a web app.
- freestyle-rpc: sbt-frees-rpc-idlgen - Generate rpc models, messages, clients, and servers.
- Supported Formats: Standard, SpecificRecord, Scavro
- Supported Datatypes
- Logical Types Support
- Protocol Support
- Doc Support
- Usage
- Warnings
- Best Practices
- Testing
- Credits
- Standard: Vanilla case classes (for use with Apache Avro's GenericRecord API, etc.)
- SpecificRecord: Case classes that implement SpecificRecordBase and therefore have mutable var fields (for use with the Avro Specific API: Scalding, Spark, Avro, etc.).
- Scavro: Case classes with immutable fields, intended to wrap Java generated Avro classes (for use with the Scavro runtime; the Java classes are provided separately, see Scavro Plugin or sbt-avro).
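As a rough illustration of how the first two formats differ in shape, the sketch below hand-writes what the output for a hypothetical `User` record might resemble. The record name and its fields are assumptions, and the mutable variant leaves out the `SpecificRecordBase` parent (and the methods it requires) so that the snippet compiles without Avro on the classpath.

```scala
object FormatShapes {
  // Standard-style shape: a vanilla, immutable case class.
  final case class User(name: String, age: Int)

  // SpecificRecord-style shape: fields are mutable (`var`) so that an API
  // which sets fields after construction can work with the instance.
  // A real generated class would also extend SpecificRecordBase (omitted here).
  final case class MutableUser(var name: String, var age: Int)

  // Immutable update: copy produces a new instance.
  val immutable: User = User("Ada", 36).copy(age = 37)

  // Mutable update: the same instance is changed in place.
  val mutable: MutableUser = {
    val u = MutableUser("Ada", 36)
    u.age = 37
    u
  }
}
```

The trade-off is the usual one: the Standard shape is safer to share, while the mutable shape is what the Specific API needs.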
Avro | Standard | SpecificRecord | Scavro | Notes |
---|---|---|---|---|
INT | Int | Int | Int | See Logical Types: date |
LONG | Long | Long | Long | See Logical Types: timestamp-millis |
FLOAT | Float | Float | Float | |
DOUBLE | Double | Double | Double | |
STRING | String | String | String | |
BOOLEAN | Boolean | Boolean | Boolean | |
NULL | Null | Null | Null | |
MAP | Map | Map | Map | |
ENUM | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString | Java Enum, EnumAsScalaString | scala.Enumeration, Scala case object, Java Enum, EnumAsScalaString | See Customizable Type Mapping |
BYTES | Array[Byte], BigDecimal | Array[Byte] | Array[Byte] | See Logical Types: decimal |
FIXED | //TODO | //TODO | //TODO | |
ARRAY | Seq, List, Array, Vector | Seq, List, Array, Vector | Array, Seq, List, Vector | See Customizable Type Mapping |
UNION | Option, Either, Shapeless Coproduct | Option | Option | See Customizable Type Mapping |
RECORD | case class, case class + schema | case class extending SpecificRecordBase | case class extending AvroSerializeable | See Customizable Type Mapping |
PROTOCOL | No Type, Scala ADT | RPC trait, Scala ADT | No Type, Scala ADT | See Customizable Type Mapping |
Date | java.time.LocalDate, java.sql.Date | java.time.LocalDate, java.sql.Date | Not yet supported | See Customizable Type Mapping |
TimestampMillis | java.time.Instant, java.sql.Timestamp | java.time.Instant, java.sql.Timestamp | Not yet supported | See Customizable Type Mapping |
UUID | java.util.UUID | java.util.UUID | Not yet supported | See Customizable Type Mapping |
Decimal | BigDecimal | BigDecimal | Not yet supported | See Customizable Type Mapping |
NOTE: Currently logical types are only supported for Standard and SpecificRecord formats.
- date: Annotates Avro int schemas to generate java.time.LocalDate or java.sql.Date (See Customizable Type Mapping). Examples: avdl, avsc.
- decimal: Annotates Avro bytes schemas to generate BigDecimal. Examples: avdl, avsc.
- timestamp-millis: Annotates Avro long schemas to generate java.time.Instant or java.sql.Timestamp (See Customizable Type Mapping). Examples: avdl, avsc.
- uuid: Annotates Avro string schemas (but not idls as of avro 1.8.2) to generate java.util.UUID (See Customizable Type Mapping). Example: avsc.
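To make the mapping concrete, here is a hand-written sketch of the kind of case class these annotations could lead to. The `Order` record and its field names are hypothetical, and the code is not actual generator output; it only shows the Scala types listed above next to the underlying Avro encodings (days since the epoch for date, milliseconds since the epoch for timestamp-millis).

```scala
import java.time.{Instant, LocalDate}
import java.util.UUID

object LogicalTypesSketch {
  // Hypothetical record; each field corresponds to one annotated Avro type:
  //   string + uuid             -> java.util.UUID
  //   int    + date             -> java.time.LocalDate
  //   long   + timestamp-millis -> java.time.Instant
  //   bytes  + decimal          -> BigDecimal
  final case class Order(
    id: UUID,
    placedOn: LocalDate,
    updatedAt: Instant,
    total: BigDecimal
  )

  val example: Order = Order(
    id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000"),
    placedOn = LocalDate.of(2020, 1, 15),
    updatedAt = Instant.ofEpochMilli(1579046400000L),
    total = BigDecimal("19.99")
  )

  // The values Avro stores underneath the logical types.
  val daysSinceEpoch: Long = example.placedOn.toEpochDay
  val millisSinceEpoch: Long = example.updatedAt.toEpochMilli
}
```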
- The records defined in .avdl, .avpr, and json protocol strings can be generated as ADTs if the protocols define more than one Scala definition (note: message definitions are ignored when this setting is used). See Customizable Type Mapping.
- For SpecificRecord, if the protocol contains messages then an RPC trait is generated (instead of generating an ADT, or ignoring the message definitions).
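For readers unfamiliar with the ADT encoding, the following is a minimal sketch of the general shape: a sealed parent named after the protocol, with one case class per record. The `Shapes` protocol and its records are hypothetical, and the exact parent traits and modifiers in real generated code may differ.

```scala
object ProtocolAdtSketch {
  // Hypothetical protocol "Shapes" that defines two records.
  sealed trait Shapes extends Product with Serializable
  final case class Circle(radius: Double) extends Shapes
  final case class Rectangle(width: Double, height: Double) extends Shapes

  // Because the parent is sealed, the compiler can check that a
  // pattern match covers every record defined by the protocol.
  def area(shape: Shapes): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }
}
```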
- .avdl: Comments that begin with /** are used as the documentation string for the type or field definition that follows the comment.
- .avsc, .avpr, and .avro: Docs in Avro schemas are used to define a case class' ScalaDoc.
- .scala: ScalaDocs of case class definitions are used to define record and field docs.
Note: Currently Treehugger appears to generate Javadoc style docs (thus compatible with ScalaDoc style).
- Library For Scala 2.11, 2.12, and 2.13
- Parses Schemas and IDLs with Avro version 1.9
- Generates Code Compatible with Scala 2.11, 2.12, 2.13
"com.julianpeeters" %% "avrohugger-core" % "1.0.0-RC22"
Instantiate a Generator with Standard, Scavro, or SpecificRecord source formats. Then use tToFile(input: T, outputDir: String): Unit or tToStrings(input: T): List[String], where T can be File, Schema, or String.
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import java.io.File
val schemaFile = new File("path/to/schema")
val generator = new Generator(SpecificRecord)
generator.fileToFile(schemaFile, "optional/path/to/output") // default output path = "target/generated-sources"
where an input File can be .avro, .avsc, .avpr, or .avdl, and where an input String can be the string representation of an Avro schema, protocol, IDL, or a set of case classes that you'd like to have implement SpecificRecordBase.
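The string-based variant can be sketched the same way. This assumes that the `tToStrings` naming pattern with `T = String` yields a method called `stringToStrings`, and that avrohugger-core is on the classpath; the `User` schema is a made-up example.

```scala
import avrohugger.Generator
import avrohugger.format.Standard

object StringToStringsSketch {
  // Hypothetical schema, used only for illustration.
  val schemaJson: String =
    """{
      |  "type": "record",
      |  "name": "User",
      |  "namespace": "example",
      |  "fields": [{"name": "name", "type": "string"}]
      |}""".stripMargin

  val generator = new Generator(Standard)

  // Returns the generated source code as strings instead of writing files.
  val sources: List[String] = generator.stringToStrings(schemaJson)
}
```

Working with strings like this can be handy when the generated code is to be inspected or evaluated at a later step rather than compiled from disk.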
To reassign Scala types to Avro types, use the following (e.g. for customizing SpecificRecord):
import avrohugger.Generator
import avrohugger.format.SpecificRecord
import avrohugger.types.ScalaVector
val myScalaTypes = Some(SpecificRecord.defaultTypes.copy(array = ScalaVector))
val generator = new Generator(SpecificRecord, avroScalaCustomTypes = myScalaTypes)
- record can be assigned to ScalaCaseClass and ScalaCaseClassWithSchema (with schema in a companion object)
- array can be assigned to ScalaSeq, ScalaArray, ScalaList, and ScalaVector
- enum can be assigned to JavaEnum, ScalaCaseObjectEnum, EnumAsScalaString, and ScalaEnumeration
- union can be assigned to OptionShapelessCoproduct, OptionEitherShapelessCoproduct, or OptionalShapelessCoproduct
- int, long, float, double can be assigned to ScalaInt, ScalaLong, ScalaFloat, ScalaDouble
- protocol can be assigned to ScalaADT and NoTypeGenerated
- decimal can be assigned to e.g. ScalaBigDecimal(Some(BigDecimal.RoundingMode.HALF_EVEN)) and ScalaBigDecimalWithPrecision(None) (via Shapeless Tagged Types)
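Several reassignments can presumably be combined in a single `copy` call. The sketch below assumes that the labels in the list above (`record`, `array`, `protocol`) are also the parameter names of the default types object, as `array` is in the earlier example, and that `Standard.defaultTypes` exists alongside `SpecificRecord.defaultTypes`; it needs avrohugger-core on the classpath.

```scala
import avrohugger.Generator
import avrohugger.format.Standard
import avrohugger.types.{ScalaADT, ScalaCaseClassWithSchema, ScalaList}

object CustomTypesSketch {
  // Assumed parameter names: record, array, protocol.
  val customTypes = Some(
    Standard.defaultTypes.copy(
      record = ScalaCaseClassWithSchema, // case class plus schema in a companion object
      array = ScalaList,                 // Avro arrays become List
      protocol = ScalaADT                // protocols with several records become ADTs
    )
  )

  val generator = new Generator(Standard, avroScalaCustomTypes = customTypes)
}
```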
Namespaces can be reassigned by instantiating a Generator with a custom namespace map (please see warnings below):
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("oldnamespace"->"newnamespace"))
Scavro: by default, a "model" package is appended to the namespace to create a Scala namespace that does not conflict with Scavro's generated Java. To override, either customize each package namespace separately (preempting the use of the default package name), or override the package name like so:
val generator = new Generator(SpecificRecord, avroScalaCustomNamespace = Map("SCAVRO_DEFAULT_PACKAGE$"->"scavro"))
Generate simple classes instead of case classes when fields.size > 22, useful for generating code for Scala 2.10 from large schemas.
val generator = new Generator(SpecificRecord, restrictedFieldNumber = true)
"com.julianpeeters" %% "avrohugger-filesorter" % "1.0.0-RC22"
To ensure dependent schemas are compiled in the proper order (thus avoiding org.apache.avro.SchemaParseException: Undefined name: "com.example.MyRecord" parser errors), sort avsc and avdl files with the sortSchemaFiles method on AvscFileSorter and AvdlFileSorter respectively.
import avrohugger.filesorter.AvscFileSorter
import java.io.File
// srcDir is assumed to be a directory in an sbt build, where ** is sbt's recursive file glob
val sorted: List[File] = AvscFileSorter.sortSchemaFiles(srcDir ** "*.avsc")
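To see why ordering matters, consider a minimal dependency-ordering sketch over schema names. This is not avrohugger's implementation; the schema names and the dependency map are hypothetical, and the approach is a plain topological sort that emits a schema only once everything it references has been emitted.

```scala
object SchemaOrderSketch {
  // Hypothetical dependency map: schema name -> names of the schemas it references.
  val dependsOn: Map[String, Set[String]] = Map(
    "com.example.Order"    -> Set("com.example.Customer", "com.example.Item"),
    "com.example.Customer" -> Set("com.example.Address"),
    "com.example.Item"     -> Set.empty[String],
    "com.example.Address"  -> Set.empty[String]
  )

  // Repeatedly emit every schema whose dependencies have all been emitted already.
  def sortByDependencies(deps: Map[String, Set[String]]): List[String] = {
    @annotation.tailrec
    def loop(remaining: Map[String, Set[String]], done: List[String]): List[String] =
      if (remaining.isEmpty) done
      else {
        val ready = remaining.collect {
          case (name, needs) if needs.forall(done.contains) => name
        }.toList.sorted
        // If nothing is ready, there is a cycle or a reference to an unknown schema.
        require(ready.nonEmpty, "cyclic or unresolved schema dependency")
        loop(remaining -- ready, done ++ ready)
      }
    loop(deps, Nil)
  }

  val ordered: List[String] = sortByDependencies(dependsOn)
}
```

Parsing the files in the resulting order means that `com.example.Customer` is already known by the time `com.example.Order` refers to it, which is the situation the "Undefined name" error signals when violated.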
Download the avrohugger-tools jar for Scala 2.10, Scala 2.11, or Scala 2.12 (>30MB!) and use it like the avro-tools jar.
Usage: [-string] (schema|protocol|datafile) input... outputdir
- generate: generates Scala case class definitions:
java -jar /path/to/avrohugger-tools_2.12-1.0.0-RC22-assembly.jar generate schema user.avsc .
- generate-specific: generates definitions that extend Avro's SpecificRecordBase:
java -jar /path/to/avrohugger-tools_2.12-1.0.0-RC22-assembly.jar generate-specific schema user.avsc .
- generate-scavro: generates definitions that extend Scavro's AvroSerializeable:
java -jar /path/to/avrohugger-tools_2.12-1.0.0-RC22-assembly.jar generate-scavro schema user.avsc .
- If your framework is one that relies on reflection to get the Schema, it will fail since Scala fields are private. Therefore preempt it by passing in a Schema to DatumReaders and DatumWriters (e.g. val sdw = new SpecificDatumWriter[MyRecord](schema)).
- For the SpecificRecord format, generated case class fields must be mutable (var) in order to be compatible with the SpecificRecord API. Note: If your framework allows GenericRecord, avro4s provides a type class that converts to and from immutable case classes cleanly.
- When the input is a case class definition String, import statements are not supported; please use fully qualified type names if using records/classes from multiple namespaces.
- By default, a schema's namespace is used as a package name. In the case of the Scavro output format, the default is the namespace with model appended.
- While Scavro format uses custom namespaces in a way that leaves it unaffected, most formats fail on schemas with records within unions (see [avro forum](http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html)).
- SpecificRecord requires that enum be represented as JavaEnum.
- Avoid recursive schemas since they can cause compatibility issues if trying to flow data into a system that doesn't support them (e.g., Hive).
- Use namespaces to ensure compatibility when importing into Java/Scala.
- Use default field values in case of future schema evolution (further reading).
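The last point can be pictured on the Scala side as well. The sketch below is hand-written, with hypothetical record names, and assumes that a default declared in the schema surfaces as a default parameter value on the corresponding case class.

```scala
object SchemaEvolutionSketch {
  // Version 1 of a hypothetical record.
  final case class UserV1(name: String)

  // Version 2 adds a field with a default, so code written against
  // version 1 still has a value to fall back on for the new field.
  final case class UserV2(name: String, nickname: Option[String] = None)

  // An old call site that knows nothing about nickname keeps compiling.
  val fromOldCallSite: UserV2 = UserV2("Ada")

  // A new call site can supply the field explicitly.
  val fromNewCallSite: UserV2 = UserV2("Ada", Some("ada99"))
}
```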
To test for regressions, please run sbt:avrohugger> + test.
To test that generated code can be de/serialized as expected, please run:
- sbt:avrohugger> + publishLocal
- then clone sbt-avrohugger and update its avrohugger dependency to the locally published version
- finally run sbt:sbt-avrohugger> scripted avrohugger/*, or, e.g., scripted avrohugger/GenericSerializationTests
Depends on Avro and Treehugger. avrohugger-tools is based on avro-tools.
Contributors:
- Marius Soutier
- Paul Pearcy
- Stefano Galarraga
- Brian London
- Matt Allen
- Lars Albertsson
- alancnet
- C-zito
- Eugene Platonov
- Matt Coffin
- Tim Chan
- Jerome Wacongne
- Ryan Koval
- Saket
- Jon Morra
- Simonas Gelazevicius
- Daniel Davis
- Raúl Raja Martínez
- Paul Snively
- Zach Cox
- Kaur Matas
- Marco Stefani
- Diego E. Alonso Blas
- Chris Albright
- Andrew Gustafson
- Fede Fernández
- Francisco Díaz
- Kostya Golikov
- Rob Landers
- Bobby Rauchenberg
- Plínio Pantaleão
- Simon Petty
- Leonard Ehrenfried