Parser

Character-by-character string parsing library.

Contents

Features
Requirements
Usage
Character types
- Customizing character types

Features

line number tracking (can be disabled for performance)
supports CR, LF and CRLF line endings
verbose exceptions
many methods to navigate and operate the parser
- forward / backward peeking and seeking
- forward / backward character consumption
- state stack
character types
expectations

Requirements

PHP 7.1+

Usage

Creating a parser

Create a new parser instance with string input.

The parser begins at the first character.

<?php

use Kuria\Parser\Parser;

$input = 'foo bar baz';

$parser = new Parser($input);

Parser properties

The parser has several public properties that can be used to inspect its current state:

$parser->i - current position
$parser->char - current character (or NULL at the end of input)
$parser->lastChar - last character (or NULL at the start of input)
$parser->line - current line (or NULL if line tracking is disabled)
$parser->end - end of input indicator (TRUE at the end, FALSE otherwise)
$parser->vars - user-defined variables attached to the current state

Warning

All of the public properties (with the exception of $parser->vars) are read-only and must not be modified directly by the calling code.

Use the built-in parser methods to mutate the parser state. See Parser method overview.

Parser method overview

Refer to doc comments of the respective methods for more information.

Also see Character types.

Static methods

getCharType($char): int - determine character type
getCharTypeName($charType): string - get human-readable character type name

Instance methods

getInput(): string - get the input string
setInput($input): void - replace the input string (this also resets the parser)
getLength(): int - get length of the input string
isTrackingLineNumbers(): bool - see if line number tracking is enabled
type(): int - get type of the current character
is(...$types): bool - check whether the current character is of one of the specified types
atNewline(): bool - see if the parser is at the start of a newline sequence
eat(): ?string - go to the next character and return the current one (returns NULL at the end)
spit(): ?string - go to the previous character and return the current one (returns NULL at the beginning)
shift(): ?string - go to the next character and return it (returns NULL at the end)
unshift(): ?string - go to the previous character and return it (returns NULL at the beginning)
peek($offset, $absolute = false): ?string - get character at the given offset or absolute position (does not affect state)
seek($offset, $absolute = false): void - alter current position
reset(): void - reset states, vars and rewind to the beginning
rewind(): void - rewind to the beginning
eatChar($char): ?string - consume specific character and return the next character
tryEatChar(): bool - attempt to consume specific character and return success state
eatType($type): string - consume all characters of the specified type
eatTypes($typeMap): string - consume all characters of the specified types
eatWs(): string - consume whitespace, if any
eatUntil($delimiterMap, $skipDelimiter = true, $allowEnd = false): string - consume all characters until the specified delimiters
eatUntilEol($skip = true): string - consume all character until end of line or input
eatEol(): string - consume end of line sequence
eatRest(): string - consume reamaining characters
getChunk($start, $end): string - get chunk of the input (does not affect state)
detectEol(): ?string - find and return the next end of line sequence (does not affect state)
countStates(): int - get number of stored states
pushState(): void - store the current state
revertState(): void - revert to the last stored state and pop it
popState(): void - pop the last stored state without reverting to it
clearStates(): void - throw away all stored states
expectEnd(): void - ensure that the parser is at the end
expectNotEnd(): void - ensure that the parser is not at the end
expectChar($expectedChar): void - ensure that the current character matches the expectation
expectCharType($expectedType): void - ensure that the current character is of the given type

Example INI parser implementation

<?php

use Kuria\Parser\Parser;

/**
 * INI parser (example)
 */
class IniParser
{
    /**
     * Parse an INI string
     */
    public function parse(string $string): array
    {
        // create parser
        $parser = new Parser($string);

        // prepare variables
        $data = [];
        $currentSection = null;

        // parse
        while (!$parser->end) {
            // skip whitespace
            $parser->eatWs();
            if ($parser->end) {
                break;
            }

            // parse the current thing
            if ($parser->char === '[') {
                // a section
                $currentSection = $this->parseSection($parser);
            } elseif ($parser->char === ';') {
                // a comment
                $this->skipComment($parser);
            } else {
                // a key=value pair
                [$key, $value] = $this->parseKeyValue($parser);

                // add to output
                if ($currentSection === null) {
                    $data[$key] = $value;
                } else {
                    $data[$currentSection][$key] = $value;
                }
            }
        }

        return $data;
    }

    /**
     * Parse a section and return its name
     */
    private function parseSection(Parser $parser): string
    {
        // we should be at the [ character now, eat it
        $parser->eatChar('[');

        // eat everything until ]
        $sectionName = $parser->eatUntil(']');

        return $sectionName;
    }

    /**
     * Skip a commented-out line
     */
    private function skipComment(Parser $parser): void
    {
        // we should be at the ; character now, eat it
        $parser->eatChar(';');

        // eat everything until the end of line
        $parser->eatUntilEol();
    }

    /**
     * Parse a key=value pair
     */
    private function parseKeyValue(Parser $parser): array
    {
        // we should be at the first character of the key
        // eat characters until = is found
        $key = $parser->eatUntil('=');

        // eat everything until the end of line
        // that is our value
        $value = trim($parser->eatUntilEol());

        return [$key, $value];
    }
}

Using the parser

<?php

$iniParser = new IniParser();

$iniString = <<<INI
; An example comment
name=Foo
type=Bar

[options]
size=150x100
onload=
INI;

$data = $iniParser->parse($iniString);

print_r($data);

Output:

Array
(
    [name] => Foo
    [type] => Bar
    [options] => Array
        (
            [size] => 150x100
            [onload] =>
        )

)

Character types

The table below lists the default character types.

These types are available as constants on the Parser class:

Parser::C_NONE - no character (NULL)
Parser::C_WS - whitespace (tab, linefeed, vertical tab, form feed, carriage return and space)
Parser::C_NUM - numeric character (0-9)
Parser::C_STR - string character (a-z, A-Z, _ and any 8-bit char)
Parser::C_CTRL - control character (ASCII 127 and ASCII < 32 except whitespace)
Parser::C_SPECIAL - !"#$%&'()*+,-./:;<=>?@[\\]^\`{|}~

#	Character	Type
NULL	none	C_NONE
0	`0x00`	C_CTRL
1	`0x01`	C_CTRL
2	`0x02`	C_CTRL
3	`0x03`	C_CTRL
4	`0x04`	C_CTRL
5	`0x05`	C_CTRL
6	`0x06`	C_CTRL
7	`0x07`	C_CTRL
8	`0x08`	C_CTRL
9	`\t`	C_WS
10	`\n`	C_WS
11	`\v`	C_WS
12	`\f`	C_WS
13	`\r`	C_WS
14	`0x0e`	C_CTRL
15	`0x0f`	C_CTRL
16	`0x10`	C_CTRL
17	`0x11`	C_CTRL
18	`0x12`	C_CTRL
19	`0x13`	C_CTRL
20	`0x14`	C_CTRL
21	`0x15`	C_CTRL
22	`0x16`	C_CTRL
23	`0x17`	C_CTRL
24	`0x18`	C_CTRL
25	`0x19`	C_CTRL
26	`0x1a`	C_CTRL
27	`0x1b`	C_CTRL
28	`0x1c`	C_CTRL
29	`0x1d`	C_CTRL
30	`0x1e`	C_CTRL
31	`0x1f`	C_CTRL
32	`0x20`	C_WS
33	`!`	C_SPECIAL
34	`"`	C_SPECIAL
35	`#`	C_SPECIAL
36	`$`	C_SPECIAL
37	`%`	C_SPECIAL
38	`&`	C_SPECIAL
39	`'`	C_SPECIAL
40	`(`	C_SPECIAL
41	`)`	C_SPECIAL
42	`*`	C_SPECIAL
43	`+`	C_SPECIAL
44	`,`	C_SPECIAL
45	`-`	C_SPECIAL
46	`.`	C_SPECIAL
47	`/`	C_SPECIAL
48	`0`	C_NUM
49	`1`	C_NUM
50	`2`	C_NUM
51	`3`	C_NUM
52	`4`	C_NUM
53	`5`	C_NUM
54	`6`	C_NUM
55	`7`	C_NUM
56	`8`	C_NUM
57	`9`	C_NUM
58	`:`	C_SPECIAL
59	`;`	C_SPECIAL
60	`<`	C_SPECIAL
61	`=`	C_SPECIAL
62	`>`	C_SPECIAL
63	`?`	C_SPECIAL
64	`@`	C_SPECIAL
65	`A`	C_STR
66	`B`	C_STR
67	`C`	C_STR
68	`D`	C_STR
69	`E`	C_STR
70	`F`	C_STR
71	`G`	C_STR
72	`H`	C_STR
73	`I`	C_STR
74	`J`	C_STR
75	`K`	C_STR
76	`L`	C_STR
77	`M`	C_STR
78	`N`	C_STR
79	`O`	C_STR
80	`P`	C_STR
81	`Q`	C_STR
82	`R`	C_STR
83	`S`	C_STR
84	`T`	C_STR
85	`U`	C_STR
86	`V`	C_STR
87	`W`	C_STR
88	`X`	C_STR
89	`Y`	C_STR
90	`Z`	C_STR
91	`[`	C_SPECIAL
92	`\`	C_SPECIAL
93	`]`	C_SPECIAL
94	`^`	C_SPECIAL
95	`_`	C_STR
96	`	C_SPECIAL
97	`a`	C_STR
98	`b`	C_STR
99	`c`	C_STR
100	`d`	C_STR
101	`e`	C_STR
102	`f`	C_STR
103	`g`	C_STR
104	`h`	C_STR
105	`i`	C_STR
106	`j`	C_STR
107	`k`	C_STR
108	`l`	C_STR
109	`m`	C_STR
110	`n`	C_STR
111	`o`	C_STR
112	`p`	C_STR
113	`q`	C_STR
114	`r`	C_STR
115	`s`	C_STR
116	`t`	C_STR
117	`u`	C_STR
118	`v`	C_STR
119	`w`	C_STR
120	`x`	C_STR
121	`y`	C_STR
122	`z`	C_STR
123	`{`	C_SPECIAL
124	`\|`	C_SPECIAL
125	`}`	C_SPECIAL
126	`~`	C_SPECIAL
127	`0x7f`	C_CTRL
128	`0x80`	C_STR
129	`0x81`	C_STR
130	`0x82`	C_STR
131	`0x83`	C_STR
132	`0x84`	C_STR
133	`0x85`	C_STR
134	`0x86`	C_STR
135	`0x87`	C_STR
136	`0x88`	C_STR
137	`0x89`	C_STR
138	`0x8a`	C_STR
139	`0x8b`	C_STR
140	`0x8c`	C_STR
141	`0x8d`	C_STR
142	`0x8e`	C_STR
143	`0x8f`	C_STR
144	`0x90`	C_STR
145	`0x91`	C_STR
146	`0x92`	C_STR
147	`0x93`	C_STR
148	`0x94`	C_STR
149	`0x95`	C_STR
150	`0x96`	C_STR
151	`0x97`	C_STR
152	`0x98`	C_STR
153	`0x99`	C_STR
154	`0x9a`	C_STR
155	`0x9b`	C_STR
156	`0x9c`	C_STR
157	`0x9d`	C_STR
158	`0x9e`	C_STR
159	`0x9f`	C_STR
160	`0xa0`	C_STR
161	`0xa1`	C_STR
162	`0xa2`	C_STR
163	`0xa3`	C_STR
164	`0xa4`	C_STR
165	`0xa5`	C_STR
166	`0xa6`	C_STR
167	`0xa7`	C_STR
168	`0xa8`	C_STR
169	`0xa9`	C_STR
170	`0xaa`	C_STR
171	`0xab`	C_STR
172	`0xac`	C_STR
173	`0xad`	C_STR
174	`0xae`	C_STR
175	`0xaf`	C_STR
176	`0xb0`	C_STR
177	`0xb1`	C_STR
178	`0xb2`	C_STR
179	`0xb3`	C_STR
180	`0xb4`	C_STR
181	`0xb5`	C_STR
182	`0xb6`	C_STR
183	`0xb7`	C_STR
184	`0xb8`	C_STR
185	`0xb9`	C_STR
186	`0xba`	C_STR
187	`0xbb`	C_STR
188	`0xbc`	C_STR
189	`0xbd`	C_STR
190	`0xbe`	C_STR
191	`0xbf`	C_STR
192	`0xc0`	C_STR
193	`0xc1`	C_STR
194	`0xc2`	C_STR
195	`0xc3`	C_STR
196	`0xc4`	C_STR
197	`0xc5`	C_STR
198	`0xc6`	C_STR
199	`0xc7`	C_STR
200	`0xc8`	C_STR
201	`0xc9`	C_STR
202	`0xca`	C_STR
203	`0xcb`	C_STR
204	`0xcc`	C_STR
205	`0xcd`	C_STR
206	`0xce`	C_STR
207	`0xcf`	C_STR
208	`0xd0`	C_STR
209	`0xd1`	C_STR
210	`0xd2`	C_STR
211	`0xd3`	C_STR
212	`0xd4`	C_STR
213	`0xd5`	C_STR
214	`0xd6`	C_STR
215	`0xd7`	C_STR
216	`0xd8`	C_STR
217	`0xd9`	C_STR
218	`0xda`	C_STR
219	`0xdb`	C_STR
220	`0xdc`	C_STR
221	`0xdd`	C_STR
222	`0xde`	C_STR
223	`0xdf`	C_STR
224	`0xe0`	C_STR
225	`0xe1`	C_STR
226	`0xe2`	C_STR
227	`0xe3`	C_STR
228	`0xe4`	C_STR
229	`0xe5`	C_STR
230	`0xe6`	C_STR
231	`0xe7`	C_STR
232	`0xe8`	C_STR
233	`0xe9`	C_STR
234	`0xea`	C_STR
235	`0xeb`	C_STR
236	`0xec`	C_STR
237	`0xed`	C_STR
238	`0xee`	C_STR
239	`0xef`	C_STR
240	`0xf0`	C_STR
241	`0xf1`	C_STR
242	`0xf2`	C_STR
243	`0xf3`	C_STR
244	`0xf4`	C_STR
245	`0xf5`	C_STR
246	`0xf6`	C_STR
247	`0xf7`	C_STR
248	`0xf8`	C_STR
249	`0xf9`	C_STR
250	`0xfa`	C_STR
251	`0xfb`	C_STR
252	`0xfc`	C_STR
253	`0xfd`	C_STR
254	`0xfe`	C_STR
255	`0xff`	C_STR

Customizing character types

Character types can be customized by extending the base Parser class.

The following example changes "-" and "." from CHAR_SPECIAL to CHAR_STR and inherits everything else.

<?php

class CustomParser extends Parser
{
    const CHAR_TYPE_MAP = [
        '-' => self::C_STR,
        '.' => self::C_STR,
    ] + parent::CHAR_TYPE_MAP; // inherit everything else
}

// usage example
$parser = new CustomParser('foo-bar.baz');

var_dump($parser->eatType(CustomParser::C_STR));

Output:

string(11) "foo-bar.baz"

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.rst		CHANGELOG.rst
LICENSE		LICENSE
README.rst		README.rst
composer.json		composer.json
phpunit.xml.dist		phpunit.xml.dist

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Parser

Features

Requirements

Usage

Creating a parser

Parser properties

Parser method overview

Static methods

Instance methods

Example INI parser implementation

Using the parser

Character types

Customizing character types

About

Uh oh!

Releases

Uh oh!

Languages

License

kuria/parser

Folders and files

Latest commit

History

Repository files navigation

Parser

Static methods

Instance methods

Using the parser

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages