1 | | -xml-rs, an XML library for Rust |
2 | | -=============================== |
| 1 | +The `xml-rs` project has a new home |
| 2 | +=================================== |
3 | 3 | |
4 | | -[](https://github.com/kornelski/xml-rs/actions/workflows/main.yml) |
5 | | -[![crates.io][crates-io-img]](https://lib.rs/crates/xml-rs) |
6 | | -[![docs][docs-img]](https://docs.rs/xml-rs/) |
| 4 | +The current repository is: |
7 | 5 | |
8 | | -[Documentation](https://docs.rs/xml-rs/) |
| 6 | +https://github.com/kornelski/xml-rs |
9 | 7 | |
10 | | - [crates-io-img]: https://img.shields.io/crates/v/xml-rs.svg |
11 | | - [docs-img]: https://img.shields.io/badge/docs-latest%20release-6495ed.svg |
12 | | - |
13 | | -xml-rs is an XML library for the [Rust](https://www.rust-lang.org/) programming language. |
14 | | -It supports reading and writing of XML documents in a streaming fashion (without DOM). |
15 | | - |
16 | | -### Features |
17 | | - |
18 | | -* XML spec conformance better than other pure-Rust libraries. |
19 | | - |
20 | | -* Easy to use API based on `Iterator`s and regular `String`s without tricky lifetimes. |
21 | | - |
22 | | -* Support for UTF-16, UTF-8, ISO-8859-1, and ASCII encodings. |
23 | | - |
24 | | -* Written entirely in the safe Rust subset. Designed to safely handle untrusted input. |
25 | | - |
26 | | - |
27 | | -The API is heavily inspired by Java Streaming API for XML ([StAX][stax]). It contains a pull parser much like StAX event reader. It provides an iterator API, so you can leverage Rust's existing iterators library features. |
28 | | - |
29 | | - [stax]: https://en.wikipedia.org/wiki/StAX |
30 | | - |
31 | | -It also provides a streaming document writer much like StAX event writer. |
32 | | -This writer consumes its own set of events, but reader events can be converted to |
33 | | -writer events easily, and so it is possible to write XML transformation chains in a pretty |
34 | | -clean manner. |
35 | | - |
36 | | -This parser is mostly full-featured, however, there are limitations: |
37 | | -* Legacy code pages and non-Unicode encodings are not supported; |
38 | | -* DTD validation is not supported (but entities defined in the internal subset are supported); |
39 | | -* attribute value normalization is not performed, and end-of-line characters are not normalized either. |
40 | | - |
41 | | -Other than that the parser tries to be mostly XML-1.1-compliant. |
42 | | - |
43 | | -Writer is also mostly full-featured with the following limitations: |
44 | | -* no support for encodings other than UTF-8, |
45 | | -* no support for emitting `<!DOCTYPE>` declarations; |
46 | | -* more validations of input are needed, for example, checking that namespace prefixes are bounded |
47 | | - or comments are well-formed. |
48 | | - |
49 | | -Building and using |
50 | | ------------------- |
51 | | - |
52 | | -xml-rs uses [Cargo](https://crates.io), so add it with `cargo add xml` or modify `Cargo.toml`: |
53 | | - |
54 | | -```toml |
55 | | -[dependencies] |
56 | | -xml = "1.0" |
57 | | -``` |
58 | | - |
59 | | -The package exposes a single crate called `xml`. |
60 | | - |
61 | | -Reading XML documents |
62 | | ---------------------- |
63 | | - |
64 | | -[`xml::reader::EventReader`][EventReader] requires a [`Read`][stdread] instance to read from. It can be a `File` wrapped in `BufReader`, or a `&[u8]` slice (for example, one borrowed from a `Vec<u8>`). |
65 | | - |
66 | | -[EventReader]: https://docs.rs/xml-rs/latest/xml/reader/struct.EventReader.html |
67 | | -[stdread]: https://doc.rust-lang.org/stable/std/io/trait.Read.html |
68 | | - |
69 | | -`EventReader` implements `IntoIterator` trait, so you can use it in a `for` loop directly: |
70 | | - |
71 | | -```rust,no_run |
72 | | -use std::fs::File; |
73 | | -use std::io::BufReader; |
74 | | - |
75 | | -use xml::reader::{EventReader, XmlEvent}; |
76 | | - |
77 | | -fn main() -> std::io::Result<()> { |
78 | | - let file = File::open("file.xml")?; |
79 | | - let file = BufReader::new(file); // Buffering is important for performance |
80 | | - |
81 | | - let parser = EventReader::new(file); |
82 | | - let mut depth = 0; |
83 | | - for e in parser { |
84 | | - match e { |
85 | | - Ok(XmlEvent::StartElement { name, .. }) => { |
86 | | - println!("{:spaces$}+{name}", "", spaces = depth * 2); |
87 | | - depth += 1; |
88 | | - } |
89 | | - Ok(XmlEvent::EndElement { name }) => { |
90 | | - depth -= 1; |
91 | | - println!("{:spaces$}-{name}", "", spaces = depth * 2); |
92 | | - } |
93 | | - Err(e) => { |
94 | | - eprintln!("Error: {e}"); |
95 | | - break; |
96 | | - } |
97 | | - // There's more: https://docs.rs/xml-rs/latest/xml/reader/enum.XmlEvent.html |
98 | | - _ => {} |
99 | | - } |
100 | | - } |
101 | | - |
102 | | - Ok(()) |
103 | | -} |
104 | | -``` |
105 | | - |
106 | | -Document parsing can end normally or with an error. Regardless of exact cause, the parsing |
107 | | -process will be stopped, and the iterator will terminate normally. |
108 | | - |
109 | | -You can also have finer control over when to pull the next event from the parser using its own |
110 | | -`next()` method: |
111 | | - |
112 | | -```rust,ignore |
113 | | -match parser.next() { |
114 | | - ... |
115 | | -} |
116 | | -``` |
117 | | - |
118 | | -Upon the end of the document or an error, the parser will remember the last event and will always |
119 | | -return it in the result of `next()` call afterwards. If iterator is used, then it will yield |
120 | | -error or end-of-document event once and will produce `None` afterwards. |
121 | | - |
122 | | -It is also possible to tweak parsing process a little using [`xml::reader::ParserConfig`][ParserConfig] structure. |
123 | | -See its documentation for more information and examples. |
124 | | - |
125 | | -[ParserConfig]: https://docs.rs/xml-rs/latest/xml/reader/struct.ParserConfig.html |
126 | | - |
127 | | -You can find a more extensive example of using `EventReader` in `src/analyze.rs`, which is a |
128 | | -small program (BTW, it is built with `cargo build` and can be run after that) which shows various |
129 | | -statistics about specified XML document. It can also be used to check for well-formedness of |
130 | | -XML documents - if a document is not well-formed, this program will exit with an error. |
131 | | - |
132 | | - |
133 | | -## Parsing untrusted inputs |
134 | | - |
135 | | -The parser is written in safe Rust subset, so by Rust's guarantees the worst that it can do is to cause a panic. |
136 | | -You can use `ParserConfig` to set limits on maximum lengths of names, attributes, text, entities, etc. |
137 | | -You should also set a maximum document size via `io::Read`'s [`take(max)`](https://doc.rust-lang.org/stable/std/io/trait.Read.html#method.take) method. |
138 | | - |
139 | | -Writing XML documents |
140 | | ---------------------- |
141 | | - |
142 | | -xml-rs also provides a streaming writer much like StAX event writer. With it you can write an |
143 | | -XML document to any `Write` implementor. |
144 | | - |
145 | | -```rust,no_run |
146 | | -use std::io; |
147 | | -use xml::writer::{EmitterConfig, XmlEvent}; |
148 | | - |
149 | | -/// A simple demo syntax where "+foo" makes `<foo>`, "-foo" makes `</foo>` |
150 | | -fn make_event_from_line(line: &str) -> XmlEvent { |
151 | | - let line = line.trim(); |
152 | | - if let Some(name) = line.strip_prefix("+") { |
153 | | - XmlEvent::start_element(name).into() |
154 | | - } else if line.starts_with("-") { |
155 | | - XmlEvent::end_element().into() |
156 | | - } else { |
157 | | - XmlEvent::characters(line).into() |
158 | | - } |
159 | | -} |
160 | | - |
161 | | -fn main() -> io::Result<()> { |
162 | | - let input = io::stdin(); |
163 | | - let output = io::stdout(); |
164 | | - let mut writer = EmitterConfig::new() |
165 | | - .perform_indent(true) |
166 | | - .create_writer(output); |
167 | | - |
168 | | - let mut line = String::new(); |
169 | | - loop { |
170 | | - line.clear(); |
171 | | - let bytes_read = input.read_line(&mut line)?; |
172 | | - if bytes_read == 0 { |
173 | | - break; // EOF |
174 | | - } |
175 | | - |
176 | | - let event = make_event_from_line(&line); |
177 | | - if let Err(e) = writer.write(event) { |
178 | | - panic!("Write error: {e}") |
179 | | - } |
180 | | - } |
181 | | - Ok(()) |
182 | | -} |
183 | | -``` |
184 | | - |
185 | | -The code example above also demonstrates how to create a writer out of its configuration. |
186 | | -Similar thing also works with `EventReader`. |
187 | | - |
188 | | -The library provides an XML event building DSL which helps to construct complex events, |
189 | | -e.g. ones having namespace definitions. Some examples: |
190 | | - |
191 | | -```rust,ignore |
192 | | -// <a:hello a:param="value" xmlns:a="urn:some:document"> |
193 | | -XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document") |
194 | | - |
195 | | -// <hello b:config="value" xmlns="urn:default:uri"> |
196 | | -XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:default:uri") |
197 | | - |
198 | | -// <![CDATA[some unescaped text]]> |
199 | | -XmlEvent::cdata("some unescaped text") |
200 | | -``` |
201 | | - |
202 | | -Of course, one can create `XmlEvent` enum variants directly instead of using the builder DSL. |
203 | | -There are more examples in [`xml::writer::XmlEvent`][XmlEvent] documentation. |
204 | | - |
205 | | -[XmlEvent]: https://docs.rs/xml-rs/latest/xml/writer/enum.XmlEvent.html |
206 | | - |
207 | | -The writer has multiple configuration options; see [`EmitterConfig`][EmitterConfig] documentation for more |
208 | | -information. |
209 | | - |
210 | | -[EmitterConfig]: https://docs.rs/xml-rs/latest/xml/writer/struct.EmitterConfig.html |
211 | | - |
212 | | -Bug reports |
213 | | ------------- |
214 | | - |
215 | | -Please report issues at: <https://github.com/kornelski/xml-rs/issues>. |
216 | | - |
217 | | -Before reporting issues with XML conformance, please find the relevant section in the XML spec first. |
218 | | - |
219 | | -## [Upgrading from 0.8 to 1.0](https://github.com/kornelski/xml-rs/blob/main/Changelog.md) |
220 | | - |
221 | | -It should be pretty painless: |
222 | | - |
223 | | -* Change `xml-rs = "0.8"` to `xml = "1.0"` in `Cargo.toml` |
224 | | -* Add `_ => {}` to `match` statements where the compiler complains. A new `Doctype` event has been added, and error enums are non-exhaustive. |
225 | | -* If you were creating `ParserConfig` using a struct literal, please use `ParserConfig::new()` and the setters. |
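
Note on the removed "Parsing untrusted inputs" section: its advice to cap document size with `io::Read`'s `take(max)` can be sketched with the standard library alone. The helper name `read_capped` and the sample bytes are hypothetical and not part of xml-rs. This is a minimal illustration of how `take` truncates a reader, not the library's own code.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads at most `max_bytes` from `source`.
/// Bytes past the cap are never pulled from the underlying reader.
fn read_capped<R: Read>(source: R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `take` wraps the reader in an adapter that reports EOF after `max_bytes`.
    source.take(max_bytes).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let doc = b"<a>hello</a>";
    // A `&[u8]` slice implements `Read`, so it can stand in for a file or socket.
    let capped = read_capped(&doc[..], 5)?;
    assert_eq!(capped, b"<a>he");
    println!("read {} of {} bytes", capped.len(), doc.len());
    Ok(())
}
```

The `Take` adapter is itself a `Read`, so in real use it would presumably be passed to `EventReader::new` in place of the raw reader rather than drained into a buffer first.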