September 2024 NOTE: This repo is obsolete. Please use following repos for processing single-byte (ASCII/ANSI) strings instead:
- mdz_ansi_alg for string processing algorithms
- mdz_ansi_16 for short attached strings
- mdz_ansi_dyn for long dynamically-allocated strings
Repos with unicode-strings functions (for UTF-8, UTF-16, UTF-32, wchar_t) will come soon.
NOTE: All 0.x releases are kind of "alpha-versions" without expectations of API backward-compatibility.
mdz_string Overview
mdz_string Advantages
Performance Comparison
mdz_string Usage
mdz_ansi Overview
mdz_wchar Overview
mdz_utf8 Overview
mdz_utf16 Overview
mdz_utf32 Overview
Licensing info
Wiki: mdz_string Wiki
file: "mdz_string.h"
Please take a look at "mdz_string.h" file or mdz_string Wiki site for detailed functions descriptions.
mdz_string - very lightweight and versatile C library for handling single-byte (ASCII/ANSI) strings and Unicode strings, developed by maxdz Software GmbH.The library supports ASCII/ANSI, UTF8, UTF16, UTF32, wchar strings. Source code of library is highly-portable, conforms to ANSI C 89/90 Standard. Builds for Win32/Win64, Linux, FreeBSD, Android, macOS are available.
Only shared/dynamically loaded libraries (.so and .dll files with import libraries) are available for evaluation testing purposes. Static libraries/source code are covered by our commercial licenses.
Linux binaries are built against Linux Kernel 2.6.18 - and thus should be compatible with Debian (from ver. 4.0), Ubuntu (from ver. 8.04), Fedora (from ver. 9)
FreeBSD binaries - may be used from FreeBSD ver. 7.0
Win32 binaries are built using Visual Studio Platform Toolset "v90", thus are compatible with Windows versions from Windows 2000.
Win64 binaries are built using Visual Studio Platform Toolset "v100", thus are compatible with Windows versions from Windows XP.
Android x86/armeabi-v7a binaries - may be used from Android API level 16 ("Jelly Bean" ver. 4.1.x)
Android x86_64/arm64-v8a binaries - may be used from Android API level 21 ("Lollipop" ver. 5.0)
macOS binaries - x86_64, from MacOS X v10.6.0
1. High portability: the whole code conforms to ANSI C 89/90 Standard. Multithreading/asynchronous part is POSIX compatible (under UNIX/Linux).
2. Little dependencies: basically mdz_string functions are only dependend on standard C-library memory-management/access functions. Multithreading part is dependend on POSIX pthreads API (under UNIX/Linux) and old process control/synchronization API (from Windows 2000). It means you can use library in your code withouth any further dependencies except standard platform libraries/APIs.
3. Fast: Our single-byte (ASCII/ANSI) strings are very fast, concerning operations like searching, insertion, deletion, etc. especially for very large (like hundreds of megabytes or gigabytes) strings.
4. Flexibilty: nearly all functions our single-byte (ASCII/ANSI) strings, contain not only "left position" but also "right position" parameters to limit processed area from right. Also library contains more string functions than according STL, boost or glib analogs have.
5. Extended error-checking: all functions preserve internal error-code pointing the problem. It is possible to use strict error-checking (when all preserved error-codes should be MDZ_ERROR_NONE) or "relaxed"-checking - when only returned mdz_false will indicate error.
6. Extended control: strings do only explicit operations. It means for example, when "insert" function is called with auto-reservation flag set in mdz_false - it will return error if there is not enough capacity in string. No implicit reservations will be made.
7. Attached usage: strings should not necessarily use dynamically-allocated memory - which may be not available on your embedded system (or if malloc()/free() are forbidden to use in your safety-critical software). Just attach string/data to your statically-allocated memory and use all strings functionality.
8. Cache-friendly: it is possible to keep controlling and data parts together in memory using "embedded part".
9. Unicode support: UTF-8, UTF-16, UTF-32 are supported.
10. wchar_t support: also wchar_t strings are supported, with 2 and 4 bytes-large wchar_t characters.
11. Endianness-aware strings: wchar, utf16 and utf32 strings are endiannes-aware thus may be used to produce and manipulate strings with pre-defined endianness even if endianness of host differs.
12. Unicode "surrogate-pairs" awareness: 2-byte Unicode strings correctly process/distinguish "surrogate-pairs" as 1 Unicode symbol.
13. Asynchronous execution: almost all functions of single-byte (ASCII/ANSI) strings and insert functions can be executed asynchronously
Performance comparison tables for mdz_ansi_find() and mdz_ansi_firstOf() give an idea about mdz_ansi library overall performance on different platforms compared to STL and standard C library. Modern implementationsof STL and standard C library are pretty fast, using optimized versions of memory-access functions.
- mdz_ansi_find Test
Following tests are executed:
- Test 1/100M": Find 1 byte - in the end of 100M bytes long string
- Test 5/100M": Find 5 bytes long string - in the end of 100M bytes long string
- Test 10/100M": Find 10 bytes long string - in the end of 100M bytes long string
- Test 100/100M": Find 100 bytes long string - in the end of 100M bytes long string
- Test 1K/100M": Find 1K bytes long string - in the end of 100M bytes long string
- Test 500K/1M": Find 500K bytes long string - in the end of 1M bytes long string
- Test 100M-100/100M": Find "100M minus 100" bytes long string - in the end of 100M bytes long string
- Test 100M/100M": Find 100M bytes long string - in 100M bytes long string
For Windows 10 (64-bit) on Intel i5-6600 @ 3.30GHz (4 cores/4 threads)
Monotone test : "long string" and "string to find" are both filled with '1's; on the last position of both strings is '2'
monotone = MDZ_FIND_MONOTONE method
clib = MDZ_FIND_CLIB method (standard C library)
brute = MDZ_FIND_BRUTE method
bmh = MDZ_FIND_BMH method
- VC++ toolset v140 (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)
Test | mdz_ansi, monotone | mdz_ansi, clib | mdz_ansi, brute | mdz_ansi, bmh | std::string.find() | clib (strstr()) |
---|---|---|---|---|---|---|
1/100M | 70,351 | 162,681 | 70,579 | |||
5/100M | 407,668 | 460,052 | 3,045,869 | 781,655 | 3,381,061 | 482,075 |
10/100M | 1,334,782 | 707,712 | 4,394,022 | 780,128 | 4,206,329 | 731,395 |
100/100M | 1,333,516 | 10,914,646 | 15,779,350 | 781,370 | 15,652,407 | 11,253,026 |
1K/100M | 1,332,838 | 70,179,989 | 139,398,637 | 781,439 | 139,808,212 | 75,808,535 |
500K/1M | 13,202 | 166,409,422 | 323,375,345 | 9,411 | 324,276,637 | 178,302,908 |
100M-100/100M | 1,262,919 | 10,884,012 | 14,182,350 | 1,066,737 | 14,150,110 | 10,383,086 |
100M/100M | 117,970 | 144,573 | 114,565 |
- MinGW/gcc toolset (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)
Test | mdz_ansi, monotone | mdz_ansi, clib | mdz_ansi, brute | mdz_ansi, bmh |
---|---|---|---|---|
1/100M | 148,067 | |||
5/100M | 534,070 | 1,599,882 | 6,825,862 | 784,326 |
10/100M | 551,404 | 3,635,378 | 7,898,385 | 783,832 |
100/100M | 550,701 | 32,447,796 | 20,451,496 | 786,006 |
1K/100M | 551,213 | 348,052,489 | 117,762,194 | 784,335 |
500K/1M | 7,851 | 814,620,053 | 246,574,213 | 6,263 |
100M-100/100M | 997,729 | 33,028,357 | 11,705,985 | 456,680 |
100M/100M | 328,564 |
- mdz_ansi_firstOf Test
Following tests are executed:
- Test 1/100M": Find first of 1 byte - in the end of 100M bytes long string
- Test 5/100M": Find first of 5 bytes - in the end of 100M bytes long string
- Test 20/100M": Find first of 20 bytes - in the end of 100M bytes long string
- Test 50/100M": Find first of 50 bytes - in the end of 100M bytes long string
- Test 100/100M": Find first of 100 bytes - in the end of 100M bytes long string
For Windows 10 (64-bit) on Intel i5-6600 @ 3.30GHz (4 cores/4 threads)
- VC++ toolset v140 (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)
Test | mdz_ansi | std::string.find_first_of() | clib (strcspn()) |
---|---|---|---|
1/100M | 70,078 | 163,666 | 2,085,714 |
5/100M | 370,204 | 3,719,660 | 2,077,677 |
20/100M | 369,162 | 5,714,212 | 2,076,031 |
50/100M | 368,994 | 10,965,401 | 2,078,038 |
100/100M | 369,360 | 18,727,283 | 2,076,740 |
- MinGW/gcc toolset (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)
Test | mdz_ansi |
---|---|
1/100M | 153,511 |
5/100M | 278,387 |
20/100M | 276,389 |
50/100M | 275,956 |
100/100M | 277,709 |
mdz_string is implemented with strict input parameters checking. It means mdz_false or some other error indication will be returned if one or several input parameters are invalid - even if such an invalidity doesn't lead to inconsistence (for example adding or removing 0 items).
Test license generation: - in order to get free test-license, please proceed to our Shop page maxdz Shop and register an account. After registration you will be able to obtain free 30-days test-licenses for our products using "Obtain for free" button. Test license data should be used in mdz_string_init() call for library initialization.
NOTE: All 0.x releases are kind of "beta-versions" and can be used 1) only with test-license (during test period of 30 days, with necessity to re-generate license for the next 30 days test period) and 2) without expectations of interface backward-compatibility.
mdz_string_init() with license information should be called for library initialization before any subsequent calls:
#include <mdz_string.h>
int main(int argc, char* argv[])
{
/* mdz_string library initialization using test info retrieved after license generation (see "Test license generation" above) */
mdz_bool bRet = mdz_string_init("<first-name-hash>", "<last-name-hash>", "<email-hash>", "<license-hash>");
...
mdz_string_uninit(); /* call for un-initialization of library */
return 0;
}
After library initialization call mdz_utf8_create() for utf8 string creation. There should be also symmetric mdz_utf8_destroy() call for every create, otherwise allocated for string memory remains occupied:
#include <mdz_string.h>
#include <mdz_utf8.h>
int main(int argc, char* argv[])
{
mdz_bool bRet = mdz_string_init("<first-name-hash>", "<last-name-hash>", "<email-hash>", "<license-hash>");
// initialize pAnsi
mdz_Utf8* pUtf8 = mdz_utf8_create(0); // create utf8-string
...
...
// use pUtf8
...
...
// destroy pUtf8
mdz_utf8_destroy(&pUtf8); // after this pUtf8 should be NULL
mdz_string_uninit();
...
}
Use mdz_Utf8* pointer for subsequent library calls:
#include <mdz_string.h>
#include <mdz_utf8.h>
int main(int argc, char* argv[])
{
mdz_bool bRet = mdz_string_init("<first-name-hash>", "<last-name-hash>", "<email-hash>", "<license-hash>");
mdz_Utf8* pUtf8 = mdz_utf8_create(0); // create utf8-string
// reserve memory for 5 elements
bRet = mdz_utf8_reserve(pUtf8, 5);
// insert 'b' in front position with auto-reservation if necessary
bRet = mdz_utf8_insertAnsi(pUtf8, 0, "b", 1, mdz_true); // "b" after this call
// append string with "cde" with auto-reservation if necessary
bRet = mdz_utf8_insert(pUtf8, (size_t) -1, "cde", 3, mdz_true); // "bcde" after this call
...
mdz_utf8_destroy(&pUtf8);
mdz_string_uninit();
...
}
Wiki: mdz_ansi Wiki
file: "mdz_ansi.h"
Please take a look at "mdz_ansi.h" file or mdz_ansi Wiki site for detailed functions descriptions.
Wiki: mdz_wchar Wiki
file: "mdz_wchar.h"
Please take a look at "mdz_wchar.h" file or mdz_wchar Wiki site for detailed functions descriptions.
Wiki: mdz_utf8 Wiki
file: "mdz_utf8.h"
Please take a look at "mdz_utf8.h" file or mdz_utf8 Wiki site for detailed functions descriptions.
Wiki: mdz_utf16 Wiki
file: "mdz_utf16.h"
Please take a look at "mdz_utf16.h" file or mdz_utf16 Wiki site for detailed functions descriptions.
Wiki: mdz_utf32 Wiki
file: "mdz_utf32.h"
Please take a look at "mdz_utf32.h" file or mdz_utf32 Wiki site for detailed functions descriptions.
Use of mdz_string library is regulated by license agreement in LICENSE.txt
Basically private non-commercial "test" usage is unrestricted. Commercial usage of library (incl. its source code) will be regulated by according license agreement.