mregexp is a small UTF-8 compatible regex library for C99/C++11, consisting of only two files.

    MRegexp *re = mregexp_compile("[0-9]+");

If an error occurs, mregexp_compile returns NULL. To get the specific error code, use mregexp_error. Error values and their meanings can be found in mregexp.h.


    // Let's find the first sequence of digits in a string
    const char *s = "hello 12345 world";
    MRegexp *re = mregexp_compile("\\d+");
    MRegexpMatch m;
    if (mregexp_match(re, s, &m)) {
        // match_begin is a size_t, so print it with %zu
        printf("Found digits at position %zu\n", m.match_begin);
    } else {
        printf("Could not find any digits\n");
    }
    // Compiled regular expressions are stored on the heap
    // and must be freed
    mregexp_free(re);

The MRegexpMatch type looks somewhat like this:

    typedef struct {
        size_t match_begin;
        size_t match_end;
    } MRegexpMatch;

The match_begin field is a byte offset into the searched string at which the first occurrence of the pattern begins, so that s + m.match_begin points to the beginning of the match. match_end is the byte offset of the first byte that did not match the pattern, so the match is match_end - match_begin bytes long.

Note that mregexp is still in a very early stage of development.
To use mregexp you will need two files: mregexp.c and mregexp.h. Include mregexp.h wherever you wish to use it. mregexp.c can be compiled independently into an object file and then be linked with your project.
mregexp comes with a few tests to ensure that changes won't break anything. To run the tests you'll need libcheck. Then just run

    make test

| Metacharacter | Description |
|---|---|
| c | Most characters (like c) match themselves literally |
| \c | Some characters are used as metacharacters. Escape them to match them literally |
| \n \t \r | newline, tab, carriage return |
| \d \s \w | digit, whitespace, alphanumeric character (a-z, A-Z, 0-9 and _) |
| \D \S \W | Match any character that is not in the corresponding group above |
| . | Matches any character (including newline) |
| * | Matches the preceding token zero or more times, as often as possible |
| + | Matches the preceding token at least once and as often as possible |
| {m,n} | Matches the preceding token at least m times and at most n times. m or n may be omitted to drop the minimum or maximum |
| (c) | Matches the expression inside the parentheses. |
| [c] | Matches any one of the characters inside the brackets. Ranges like a-z may also be used |
| [^c] | Matches any character that is not inside the brackets |
| \| | Matches either the expression before the \| or the expression after it |