README.md (14 additions & 28 deletions)
@@ -6,7 +6,10 @@ of characters.
## What does it mean?
- * Interoperability with platform-native string type means that `sys_string` makes conversions to and from native string types (`NSString *` or `CFStringRef` on macOS/iOS, Java `String` on Android, `HSTRING` or `const wchar_t *` on Windows and `const char *` on Linux) as efficient as possible and ideally 0 cost operations. For example on Apple's platforms it stores `NSString *` internally allowing zero cost conversion. On Android where no-op conversions to Java strings are impossible for technical reasons, the internal storage is such that it makes conversions as cheap as possible.
+ * Interoperability with platform-native string type means that `sys_string` makes conversions to and from native string types as efficient as possible and ideally 0 cost operations. Native string types are things like `NSString *` or `CFStringRef` on macOS/iOS, Java `String` on Android, `const wchar_t *`, `HSTRING` or `BSTR` on Windows and `const char *` on Linux. For example on Apple's platforms it stores `NSString *` internally allowing zero cost conversion. On Android where no-op conversions to Java strings are impossible for technical reasons, the internal storage is such that it makes conversions as cheap as possible.
+
+ Some platforms, like Windows, support multiple kinds of native string types. Internally, `sys_string` is a specialization of template `sys_string_t<Storage>` where the `Storage` parameter defines what kind of native string type to use. The default storage for `sys_string` is picked for you based on your platform (you can change it via compilation options) but you can also directly use other specializations in your code if necessary.
+
* Immutable. String instances cannot be modified. To make modifications you use a separate "builder" class. This is similar to how many other languages do it and results in improved performance and the elimination of a whole class of errors.
* Unicode-first. Instances of `sys_string` always store Unicode characters in either UTF-8, UTF-16 or UTF-32, depending on platform. Iteration can be done in all of these encodings and all operations (case conversion, case insensitive comparisons, trimming) are specified as actions on a sequence of Unicode codepoints using Unicode algorithms.
* Operations similar to Python or ECMAScript strings means that you can do things like `rtrim`, `split`, `join`, `starts_with` etc. in a way proven to be natural and productive in those languages.
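
For readers unfamiliar with the Python-style operations named in the last bullet, the following is a minimal standard-library sketch of their semantics. The free functions `starts_with`, `rtrim`, `split` and `join` below are hypothetical helpers written for illustration; they are not `sys_string`'s API, and they handle ASCII whitespace only rather than the Unicode rules the library describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Illustrative helpers only; these are NOT part of sys_string.

// True if `s` begins with `prefix` (compare Python's str.startswith).
inline bool starts_with(std::string_view s, std::string_view prefix) {
    return s.size() >= prefix.size() && s.substr(0, prefix.size()) == prefix;
}

// Drop trailing ASCII whitespace (compare Python's str.rstrip, ASCII subset).
inline std::string_view rtrim(std::string_view s) {
    const auto last = s.find_last_not_of(" \t\r\n");
    return last == std::string_view::npos ? std::string_view{} : s.substr(0, last + 1);
}

// Split on a non-empty separator, keeping empty fields (compare str.split(sep)).
inline std::vector<std::string> split(std::string_view s, std::string_view sep) {
    assert(!sep.empty());
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const auto pos = s.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.emplace_back(s.substr(start)); // last field, possibly empty
            break;
        }
        parts.emplace_back(s.substr(start, pos - start));
        start = pos + sep.size();
    }
    return parts;
}

// Concatenate `parts` with `sep` between them (compare sep.join(parts)).
inline std::string join(const std::vector<std::string>& parts, std::string_view sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out.append(sep);
        out.append(parts[i]);
    }
    return out;
}
```

Calling `join(split("a,b,,c", ","), "-")` with these helpers produces `a-b--c`, because the empty field between the two commas is kept.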
@@ -29,45 +32,28 @@ Finally, and unrelatedly to the above, `std::string` lacks some simple things th
The following requirements which other string classes often have are specifically non-goals of this library.

* Support C++ allocators. Since `sys_string` is meant to interoperate with system string class/types, it necessarily has to use the same allocation mechanisms as those.
- * Have an efficient `const char * c_str()` method on all platforms. The goal of the library is to provide an efficient conversion to the native string types rather than specifically `char *`. While ability to get `char *` *is* provided everywhere it might involve additional memory allocations and other overhead. Note that on Linux `char *` is the system type so it can be obtained with 0 cost.
+ * Have an efficient `const char * c_str()` method on all platforms. The goal of the library is to provide an efficient conversion to the native string types rather than specifically `const char *`. While ability to obtain `const char *` *is* provided everywhere, it might involve additional memory allocations and other overhead. Note that on Linux `char *` is the system type so it can be obtained with 0 cost.
* Make `sys_string` an STL container. Conceptually a string is not a container. You can **view** contents of a string as a sequence of UTF-8 or UTF-16 or UTF-32 codepoints and the library provides such views which function as STL ranges.
* Support non-Unicode "narrow" and "wide" character encodings. `sys_string` only understands Unicode. Conversions to/from non-Unicode encodings are a job for a different library. Specifically `char *` in any of the library's methods is required to be in UTF-8.
* Provide locale-dependent functionality. Properly supporting locales with Unicode is an important area but it belongs to another library, not this one. This library is focused on locale-independent behavior that works the same everywhere. For example, the `to_lower` method implements the locale-independent part of the Unicode specification. (Final uppercase Σ transforms to ς but I always transforms to i.)

## Performance
- In general `sys_string` aims to have the same performance as hand-crafted code that uses corresponding native string types on every platform. For example on macOS code using `sys_string` should be as fast as code manually using `NSString *`/`CFStringRef`.
- This needs to be kept in mind when evaluating whether `sys_string` is a better choice for your application than `std::string`. Continuing Apple's example an `std::string` is generally faster for direct character access than `NSString *` and thus `sys_string`. If your code rarely transfers data from `NSString *` to `std::string` and spends most of the time iterating over `std::string` characters then using `std::string` might be the right choice.
+ In general `sys_string` aims to have the same performance of its operations as the best hand-crafted code that uses the corresponding native string types on every platform. For example on macOS code using `sys_string` should be as fast as code manually using `NSString *`/`CFStringRef`.
+ This needs to be kept in mind when evaluating whether `sys_string` is a better choice for your application than `std::string`. Continuing Apple's example, an `std::string` is generally faster for direct character access than `NSString *` and thus faster than `sys_string` too. If your code rarely transfers data from `NSString *` to `std::string` and spends most of the time iterating over `std::string` characters then using `std::string` might be the right choice.
+
+ Another way to look at it is that `sys_string` sometimes trades micro-benchmarking performance of individual string operations for reduced copying, allocations and memory pressure overall. Whether this is the right tradeoff for you depends on the specifics of your codebase.

## Compatibility

This library has been tested with
- * Xcode 12 on x86_64 and arm64
- * MSVC 16.9 on x86_64
- * Clang 11.0.5 under Android NDK on x86, x86_64, armeabi-v7a and arm64-v8a architectures
+ * Xcode 13 on x86_64 and arm64
+ * MSVC 16.9 and 17.1 on x86_64
+ * Clang 12.0.5 under Android NDK on x86, x86_64, armeabi-v7a and arm64-v8a architectures
* GCC 9.3 on x86_64 Ubuntu 20.04
-
- ## Building
-
- If you use CMake clone this repository and add the `lib` directory as subdirectory. Something like

Currently there is only one storage type available on Android.

`sys_string` internally stores a sequence of `char16_t` which can be converted to `jstring` with the least amount of JNI overhead. The conversion is non-trivial, however: it incurs allocation and copying. (A possible approach of storing global references to `jstring` in `sys_string` is not feasible for many reasons, among them the fact that the global reference table is of limited size.)
As expected with JNI, all conversions require a `JNIEnv *` argument.

Note the **null preservation** above. A default-constructed `sys_string`, or a `sys_string` constructed from a `null` system string type, produces a `null` system string pointer back. This is by design, to allow round-tripping of `null`s between C++ and Objective-C without information loss.

The only storage type available on Linux is "generic Unix". It is meant to interoperate with POSIX-style APIs that deal with `char *`.
`sys_string` logically stores its content as a sequence of `char`s in UTF-8 encoding. Conversions **from** `const char *` always incur copying and sometimes memory allocation (`sys_string` does small string optimization similar to `std::string`).
Conversions **to** `const char *` are 0-cost.

```cpp
const char * cstr_in = "abc";
sys_string str(cstr_in);             //conversion from const char * copies
const char * cstr_out = str.c_str(); //conversion to const char * is 0-cost
assert(strcmp(cstr_out, cstr_in) == 0);
assert(cstr_out != cstr_in); //in and out are NOT the same!
```

Note that unlike other storage types there is no null preservation here. `c_str()` returns an empty C string for default constructed `sys_string` or one constructed from `nullptr`. This is deliberate to align with `std::string` behavior that never produces `nullptr` from its `c_str()`.
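
The `std::string` behavior that this note aligns with can be checked using the standard library alone. The sketch below does not use `sys_string` at all; `yields_empty_c_string` and `check_default_constructed` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Standard-library behavior only; sys_string itself is not used here.
// Returns true when `s.c_str()` is a non-null pointer to an empty C string.
inline bool yields_empty_c_string(const std::string& s) {
    const char* p = s.c_str();
    return p != nullptr && std::strlen(p) == 0;
}

// A default-constructed std::string still hands back a valid, empty,
// null-terminated C string rather than nullptr.
inline void check_default_constructed() {
    const std::string empty;
    assert(yields_empty_c_string(empty));
}
```

Code that passes the result of `c_str()` straight to POSIX-style functions can therefore skip a null check, which is presumably the convenience the Linux storage preserves.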