diff --git a/README.md b/README.md
index c6b1652..1f699fe 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,9 @@ Converters](https://github.com/ParadoxGameConverters) and
 ## Quick Start
 
-Below is a demonstration on parsing plaintext data using jomini tools.
+Below is a demonstration of deserializing plaintext data using serde.
+Several additional serde-like attributes are used to reconcile the serde
+data model with the structure of these files.
 
 ```rust
 use jomini::{
@@ -71,9 +73,9 @@ let actual: Model = jomini::text::de::from_windows1252_slice(data)?;
 assert_eq!(actual, expected);
 ```
 
-## Binary Parsing
+## Binary Deserialization
 
-Parsing data encoded in the binary format is done in a similar fashion but with a couple extra steps for the caller to supply:
+Deserializing data encoded in the binary format is done in a similar fashion but with a couple extra steps for the caller to supply:
 
 - How text should be decoded (typically Windows-1252 or UTF-8)
 - How rational (floating point) numbers are decoded
@@ -84,7 +86,7 @@ Implementors be warned, not only does each Paradox game have a different binary
 Below is an example that defines a sample binary format and uses a hashmap token lookup.
 
 ```rust
-use jomini::{BinaryDeserializer, Encoding, JominiDeserialize, Windows1252Encoding};
+use jomini::{Encoding, JominiDeserialize, Windows1252Encoding, binary::BinaryFlavor};
 use std::{borrow::Cow, collections::HashMap};
 
 #[derive(JominiDeserialize, PartialEq, Debug)]
@@ -116,8 +118,7 @@ let data = [ 0x82, 0x2d, 0x01, 0x00, 0x0f, 0x00, 0x03, 0x00, 0x45, 0x4e, 0x47 ];
 let mut map = HashMap::new();
 map.insert(0x2d82, "field1");
-let actual: MyStruct = BinaryDeserializer::builder_flavor(BinaryTestFlavor)
-    .deserialize_slice(&data[..], &map)?;
+let actual: MyStruct = BinaryTestFlavor.deserialize_slice(&data[..], &map)?;
 assert_eq!(actual, MyStruct { field1: "ENG".to_string() });
 ```
 
@@ -126,59 +127,14 @@ without any duplication.
 One can configure the behavior when a token is unknown (ie: fail immediately or try to continue).
 
-### Ondemand Deserialization
-
-The ondemand deserializer is a one-shot deserialization mode is often faster
-and more memory efficient as it does not parse the input into an intermediate
-tape, and instead deserializes right from the input.
-
-It is instantiated and used similarly to `BinaryDeserializer`
-
-```rust
-use jomini::OndemandBinaryDeserializer;
-// [...snip code from previous example...]
-
-let actual: MyStruct = OndemandBinaryDeserializer::builder_flavor(BinaryTestFlavor)
-    .deserialize_slice(&data[..], &map)?;
-assert_eq!(actual, MyStruct { field1: "ENG".to_string() });
-```
-
-### Direct identifier deserialization with `token` attribute
-
-There may be some performance loss during binary deserialization as
-tokens are resolved to strings via a `TokenResolver` and then matched against the
-string representations of a struct's fields.
-
-We can fix this issue by directly encoding the expected token value into the struct:
-
-```rust
-#[derive(JominiDeserialize, PartialEq, Debug)]
-struct MyStruct {
-    #[jomini(token = 0x2d82)]
-    field1: String,
-}
-
-// Empty token to string resolver
-let map = HashMap::<u16, String>::new();
-
-let actual: MyStruct = BinaryDeserializer::builder_flavor(BinaryTestFlavor)
-    .deserialize_slice(&data[..], &map)?;
-assert_eq!(actual, MyStruct { field1: "ENG".to_string() });
-```
-
-Couple notes:
-
-- This does not obviate need for the token to string resolver as tokens may be used as values.
-- If the `token` attribute is specified on one field on a struct, it must be specified on all fields of that struct.
-
 ## Caveats
 
-Caller is responsible for:
+Before calling any Jomini API, callers are expected to:
 
-- Determining the correct format (text or binary) ahead of time
-- Stripping off any header that may be present (eg: `EU4txt` / `EU4bin`)
-- Providing the token resolver for the binary format
-- Providing the conversion to reconcile how, for example, a date may be encoded as an integer in
+- Determine the correct format (text or binary) ahead of time
+- Strip off any header that may be present (eg: `EU4txt` / `EU4bin`)
+- Provide the token resolver for the binary format
+- Provide the conversion to reconcile how, for example, a date may be encoded as an integer in
   the binary format, but as a string when in plaintext.
 
 ## The Mid-level API
@@ -199,6 +155,9 @@ for (key, _op, value) in reader.fields() {
 }
 ```
 
+For an even lower level of parsing, see the respective binary and text
+documentation.
+
 The mid-level API also provides the excellent utility of converting the
 plaintext Clausewitz format to JSON when the `json` feature is enabled.
 
@@ -211,28 +170,6 @@ let actual = reader.json().to_string()?;
 assert_eq!(actual, r#"{"foo":"bar"}"#);
 ```
 
-## One Level Lower
-
-At the lowest layer, one can interact with the raw data directly via `TextTape`
-and `BinaryTape`.
-
-```rust
-use jomini::{TextTape, TextToken, Scalar};
-
-let data = b"foo=bar";
-
-assert_eq!(
-    TextTape::from_slice(&data[..])?.tokens(),
-    &[
-        TextToken::Unquoted(Scalar::new(b"foo")),
-        TextToken::Unquoted(Scalar::new(b"bar")),
-    ]
-);
-```
-
-If one will only use `TextTape` and `BinaryTape` then `jomini` can be compiled without default
-features, resulting in a build without dependencies.
-
 ## Write API
 
 There are two targeted use cases for the write API. One is when a text tape is on hand.
diff --git a/src/binary/mod.rs b/src/binary/mod.rs
index 13ffc56..74b59c3 100644
--- a/src/binary/mod.rs
+++ b/src/binary/mod.rs
@@ -6,10 +6,66 @@
 //!
 //! If the serde deserialization API is too high level, one can build
 //! abstractions ontop of.
-//! - [BinaryTape::from_slice]: Realizes a pseudo AST onto
-//!   a linear tape. Cleans up and normalizes data.
-//! - [TokenReader]: an incremental binary lexer designed for handling large
+//! - [BinaryTape::from_slice]: Realizes a pseudo AST onto a linear tape.
+//!   Cleans up and normalizes data.
+//! - [TokenReader]: An incremental binary lexer designed for handling large
 //!   saves in a memory efficient manner.
+//! - [Lexer]: The lowest level, a zero cost binary data scanner over a byte
+//!   slice.
+//!
+//! ## Direct identifier deserialization with `token` attribute
+//!
+//! There may be some performance loss during binary deserialization as
+//! tokens are resolved to strings via a `TokenResolver` and then matched against the
+//! string representations of a struct's fields.
+//!
+//! We can fix this issue by directly encoding the expected token value into the struct:
+//!
+//! ```rust
+//! # #[cfg(feature = "derive")] {
+//! # use jomini::{Encoding, JominiDeserialize, Windows1252Encoding, binary::BinaryFlavor};
+//! # use std::{borrow::Cow, collections::HashMap};
+//! #
+//! # #[derive(Debug, Default)]
+//! # pub struct BinaryTestFlavor;
+//! #
+//! # impl BinaryFlavor for BinaryTestFlavor {
+//! #     fn visit_f32(&self, data: [u8; 4]) -> f32 {
+//! #         f32::from_le_bytes(data)
+//! #     }
+//! #
+//! #     fn visit_f64(&self, data: [u8; 8]) -> f64 {
+//! #         f64::from_le_bytes(data)
+//! #     }
+//! # }
+//! #
+//! # impl Encoding for BinaryTestFlavor {
+//! #     fn decode<'a>(&self, data: &'a [u8]) -> Cow<'a, str> {
+//! #         Windows1252Encoding::decode(data)
+//! #     }
+//! # }
+//! #
+//! # let data = [ 0x82, 0x2d, 0x01, 0x00, 0x0f, 0x00, 0x03, 0x00, 0x45, 0x4e, 0x47 ];
+//! #
+//! #[derive(JominiDeserialize, PartialEq, Debug)]
+//! struct MyStruct {
+//!     #[jomini(token = 0x2d82)]
+//!     field1: String,
+//! }
+//!
+//! // Empty token to string resolver
+//! let map = HashMap::<u16, String>::new();
+//!
+//! let actual: MyStruct = BinaryTestFlavor.deserialize_slice(&data[..], &map)?;
+//! assert_eq!(actual, MyStruct { field1: "ENG".to_string() });
+//! # }
+//! # Ok::<(), Box<dyn std::error::Error>>(())
+//! ```
+//!
+//! A couple of notes:
+//!
+//! - This does not obviate the need for the token to string resolver as tokens may be used as values.
+//! - If the `token` attribute is specified on one field on a struct, it must be specified on all fields of that struct.
 
 /// binary deserialization
 #[cfg(feature = "derive")]
diff --git a/src/lib.rs b/src/lib.rs
index 84d01b8..f9c6301 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -14,7 +14,7 @@ Converters](https://github.com/ParadoxGameConverters) and
 ## Features
 
 - ✔ Versatile: Handle both plaintext and binary encoded data
-- ✔ Fast: Parse data at 1 GB/s
+- ✔ Fast: Parse data at over 1 GB/s
 - ✔ Small: Compile with zero dependencies
 - ✔ Safe: Extensively fuzzed against potential malicious input
 - ✔ Ergonomic: Use [serde](https://serde.rs/derive.html)-like macros to have parsing logic automatically implemented
@@ -22,7 +22,9 @@ Converters](https://github.com/ParadoxGameConverters) and
 ## Quick Start
 
-Below is a demonstration on parsing plaintext data using jomini tools.
+Below is a demonstration of deserializing plaintext data using serde.
+Several additional serde-like attributes are used to reconcile the serde
+data model with the structure of these files.
 
 ```rust
 # #[cfg(feature = "derive")] {
@@ -129,68 +131,14 @@ without any duplication.
 One can configure the behavior when a token is unknown (ie: fail immediately or try to continue).
 
-### Direct identifier deserialization with `token` attribute
-
-There may be some performance loss during binary deserialization as
-tokens are resolved to strings via a `TokenResolver` and then matched against the
-string representations of a struct's fields.
-
-We can fix this issue by directly encoding the expected token value into the struct:
-
-```rust
-# #[cfg(feature = "derive")] {
-# use jomini::{Encoding, JominiDeserialize, Windows1252Encoding, binary::BinaryFlavor};
-# use std::{borrow::Cow, collections::HashMap};
-#
-# #[derive(Debug, Default)]
-# pub struct BinaryTestFlavor;
-#
-# impl BinaryFlavor for BinaryTestFlavor {
-#     fn visit_f32(&self, data: [u8; 4]) -> f32 {
-#         f32::from_le_bytes(data)
-#     }
-#
-#     fn visit_f64(&self, data: [u8; 8]) -> f64 {
-#         f64::from_le_bytes(data)
-#     }
-# }
-#
-# impl Encoding for BinaryTestFlavor {
-#     fn decode<'a>(&self, data: &'a [u8]) -> Cow<'a, str> {
-#         Windows1252Encoding::decode(data)
-#     }
-# }
-#
-# let data = [ 0x82, 0x2d, 0x01, 0x00, 0x0f, 0x00, 0x03, 0x00, 0x45, 0x4e, 0x47 ];
-#
-#[derive(JominiDeserialize, PartialEq, Debug)]
-struct MyStruct {
-    #[jomini(token = 0x2d82)]
-    field1: String,
-}
-
-// Empty token to string resolver
-let map = HashMap::<u16, String>::new();
-
-let actual: MyStruct = BinaryTestFlavor.deserialize_slice(&data[..], &map)?;
-assert_eq!(actual, MyStruct { field1: "ENG".to_string() });
-# }
-# Ok::<(), Box<dyn std::error::Error>>(())
-```
-
-Couple notes:
-
-- This does not obviate need for the token to string resolver as tokens may be used as values.
-- If the `token` attribute is specified on one field on a struct, it must be specified on all fields of that struct.
-
 ## Caveats
 
-Caller is responsible for:
+Before calling any Jomini API, callers are expected to:
 
-- Determining the correct format (text or binary) ahead of time
-- Stripping off any header that may be present (eg: `EU4txt` / `EU4bin`)
-- Providing the token resolver for the binary format
-- Providing the conversion to reconcile how, for example, a date may be encoded as an integer in
+- Determine the correct format (text or binary) ahead of time
+- Strip off any header that may be present (eg: `EU4txt` / `EU4bin`)
+- Provide the token resolver for the binary format
+- Provide the conversion to reconcile how, for example, a date may be encoded as an integer in
   the binary format, but as a string when in plaintext.
 
 ## The Mid-level API
@@ -211,6 +159,8 @@ for (key, _op, value) in reader.fields() {
 }
 ```
 
+For an even lower level of parsing, see the respective [binary] and [text] module documentation.
+
 */
 #![cfg_attr(
     feature = "json",
@@ -234,28 +184,6 @@ assert_eq!(actual, r#"{"foo":"bar"}"#);
 "##
 )]
 /*!
-## One Level Lower
-
-At the lowest layer, one can interact with the raw data directly via `TextTape`
-and `BinaryTape`.
-
-```rust
-use jomini::{TextTape, TextToken, Scalar};
-
-let data = b"foo=bar";
-
-assert_eq!(
-    TextTape::from_slice(&data[..])?.tokens(),
-    &[
-        TextToken::Unquoted(Scalar::new(b"foo")),
-        TextToken::Unquoted(Scalar::new(b"bar")),
-    ]
-);
-# Ok::<(), Box<dyn std::error::Error>>(())
-```
-
-If one will only use `TextTape` and `BinaryTape` then `jomini` can be compiled without default
-features, resulting in a build without dependencies.
 
 ## Write API
diff --git a/src/text/de.rs b/src/text/de.rs
index f660d0a..420c82a 100644
--- a/src/text/de.rs
+++ b/src/text/de.rs
@@ -123,7 +123,10 @@ where
     TextDeserializer::from_windows1252_slice(data)?.deserialize()
 }
 
-/// Convenience method for deserializing streaming windows1252 data into a Rust value
+/// (**Experimental**) Create a Windows1252 text value from a reader
+///
+/// Considered experimental as it uses a [TokenReader] under the hood, which
+/// uses a different parsing routine geared toward save files.
 pub fn from_windows1252_reader<R, T>(reader: R) -> Result<T, Error>
 where
     T: DeserializeOwned,
@@ -779,7 +782,10 @@ enum TextDeserializerKind<'a, 'b, E> {
 }
 
 impl TextDeserializer<'_, '_, Windows1252Encoding> {
-    /// Create a Windows1252 text deserializer over a reader
+    /// (**Experimental**) Create a Windows1252 text deserializer over a reader
+    ///
+    /// Considered experimental as it uses a [TokenReader] under the hood, which
+    /// uses a different parsing routine geared toward save files.
     pub fn from_windows1252_reader<R>(reader: R) -> TextReaderDeserializer<R, Windows1252Encoding>
     where
         R: Read,
diff --git a/src/text/mod.rs b/src/text/mod.rs
index 52eab49..f17c392 100644
--- a/src/text/mod.rs
+++ b/src/text/mod.rs
@@ -11,11 +11,8 @@
 //! abstractions ontop of:
 //! - [TextTape::from_slice]: Realizes a pseudo AST onto
 //!   a linear tape. Cleans up and normalizes data.
-//! - [TokenReader]: (**experimental** unlike the [binary
-//!   equivalent](crate::binary::TokenReader)) an incremental text lexer
-//!   designed for handling large saves in a memory efficient manner. It can
-//!   lex game files, but the best API for exposing esoteric game file syntax
-//!   has not yet been developed.
+//! - [TokenReader]: (**experimental**) an incremental text lexer
+//!   designed for handling large saves in a memory efficient manner.
 //!
 //! Some additional APIs are available to make working with a [TextTape] more
 //! ergonomic for DOM-like use cases.
diff --git a/src/text/reader.rs b/src/text/reader.rs
index bd215f9..375fa17 100644
--- a/src/text/reader.rs
+++ b/src/text/reader.rs
@@ -79,6 +79,15 @@ enum Utf8Bom {
 /// construct higher level parsers and deserializers that operate over a stream
 /// of data.
 ///
+/// The [TokenReader] is considered **experimental**, as it uses a different
+/// parsing algorithm geared towards parsing large save files. Ergonomic
+/// equivalents for more esoteric game syntax (like parameter definitions) have
+/// not yet been finalized.
+/// Game files can still be parsed with the experimental
+/// APIs, but these APIs may change in the future based on feedback. Since the
+/// binary format is not used for game files, the
+/// [binary::TokenReader](crate::binary::TokenReader) is not considered
+/// experimental.
+///
 /// [TokenReader] operates over a fixed size buffer, so using a
 /// [BufRead](std::io::BufRead) affords no benefits. An error will be returned
 /// for tokens that are impossible to fit within the buffer (eg: if the provided