
Commit 15a5b5b

Merge branch 'dev'
2 parents 0a711ca + a60cc29 commit 15a5b5b

34 files changed with 2445 additions and 1183 deletions

README.md

Lines changed: 14 additions & 28 deletions
@@ -6,7 +6,10 @@ of characters.

 ## What does it mean?

-* Interoperability with platform-native string type means that `sys_string` makes conversions to and from native string types (`NSString *` or `CFStringRef` on macOS/iOS, Java `String` on Android, `HSTRING` or `const wchar_t *` on Windows and `const char *` on Linux) as efficient as possible and ideally 0 cost operations. For example on Apple's platforms it stores `NSString *` internally allowing zero cost conversion. On Android where no-op conversions to Java strings are impossible for technical reasons, the internal storage is such that it makes conversions as cheap as possible.
+* Interoperability with platform-native string types means that `sys_string` makes conversions to and from native string types as efficient as possible, ideally zero-cost operations. Native string types are things like `NSString *` or `CFStringRef` on macOS/iOS, Java `String` on Android, `const wchar_t *`, `HSTRING` or `BSTR` on Windows and `const char *` on Linux. For example, on Apple platforms it stores `NSString *` internally, allowing zero-cost conversion. On Android, where no-op conversions to Java strings are impossible for technical reasons, the internal storage is chosen to make conversions as cheap as possible.
+
+Some platforms, like Windows, support multiple kinds of native string types. Internally, `sys_string` is a specialization of the template `sys_string_t<Storage>`, where the `Storage` parameter defines which native string type to use. The default storage for `sys_string` is picked for you based on your platform (you can change it via compilation options), but you can also use other specializations directly in your code if necessary.
+
 * Immutable. String instances cannot be modified. To make modifications, you use a separate "builder" class. This is similar to how many other languages do it and results in improved performance and the elimination of a whole class of errors.
 * Unicode-first. Instances of `sys_string` always store Unicode characters in either UTF-8, UTF-16 or UTF-32, depending on the platform. Iteration can be done in any of these encodings, and all operations (case conversion, case-insensitive comparison, trimming) are specified as actions on a sequence of Unicode code points using Unicode algorithms.
 * Operations similar to Python or ECMAScript strings means that you can do things like `rtrim`, `split`, `join`, `starts_with`, etc. in a way proven to be natural and productive in those languages.
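
The storage paragraph above describes `sys_string` as one specialization of `sys_string_t<Storage>`, with a platform default that compilation options can change. The following is a minimal sketch of that policy-based pattern under stated assumptions, not the library's code: `utf8_storage`, `utf16_storage`, `string_sketch` and the `EXAMPLE_USE_UTF16_STORAGE` switch are all hypothetical names, and the sketch copies its input instead of holding a native object, so it only illustrates how a storage parameter and a default specialization can be chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical storage policies: each names the native string type it
// interoperates with and the code unit type it stores.
struct utf8_storage {
    using char_type = char;
    using native_type = const char *;
};

struct utf16_storage {
    using char_type = char16_t;
    using native_type = const char16_t *;
};

// Hypothetical class template parameterized on its storage policy, loosely
// in the spirit of sys_string_t<Storage>. For simplicity it copies the input
// into a std::basic_string rather than retaining a native object.
template<class Storage>
class string_sketch {
public:
    using storage_type = Storage;
    using char_type = typename Storage::char_type;
    using native_type = typename Storage::native_type;

    string_sketch() = default;

    // A null native pointer produces an empty string.
    explicit string_sketch(native_type str) {
        if (str)
            m_data = str;
    }

    native_type native() const noexcept { return m_data.c_str(); }
    std::size_t size() const noexcept { return m_data.size(); }

private:
    std::basic_string<char_type> m_data;
};

// The default specialization is picked at compile time; defining the
// hypothetical EXAMPLE_USE_UTF16_STORAGE macro switches it.
#if defined(EXAMPLE_USE_UTF16_STORAGE)
using default_storage = utf16_storage;
#else
using default_storage = utf8_storage;
#endif

using default_string = string_sketch<default_storage>;
```

Code that needs a particular native type can still name `string_sketch<utf16_storage>` directly, while everything else uses `default_string` and follows the build configuration.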
@@ -29,45 +32,28 @@ Finally, and unrelatedly to the above, `std::string` lacks some simple things th
 The following requirements, which other string classes often have, are specifically non-goals of this library.

 * Support C++ allocators. Since `sys_string` is meant to interoperate with system string classes/types, it necessarily has to use the same allocation mechanisms as those.
-* Have an efficient `const char * c_str()` method on all platforms. The goal of the library is to provide an efficient conversion to the native string types rather than specifically `char *`. While ability to get `char *` *is* provided everywhere it might involve additional memory allocations and other overhead. Note that on Linux `char *` is the system type so it can be obtained with 0 cost.
+* Have an efficient `const char * c_str()` method on all platforms. The goal of the library is to provide efficient conversion to the native string types rather than specifically to `const char *`. While the ability to obtain `const char *` *is* provided everywhere, it might involve additional memory allocations and other overhead. Note that on Linux `char *` is the system type, so it can be obtained at zero cost.
 * Make `sys_string` an STL container. Conceptually a string is not a container. You can **view** the contents of a string as a sequence of UTF-8, UTF-16 or UTF-32 code units, and the library provides such views, which function as STL ranges.
 * Support non-Unicode "narrow" and "wide" character encodings. `sys_string` only understands Unicode. Conversions to/from non-Unicode encodings are a job for a different library. Specifically, any `char *` passed to the library's methods is required to be in UTF-8.
 * Provide locale-dependent functionality. Properly supporting locales with Unicode is an important area, but it belongs to another library, not this one. This library is focused on locale-independent behavior that works the same everywhere. For example, `to_lower` implements the locale-independent part of the Unicode specification (final uppercase Σ transforms to ς, but I always transforms to i).

 ## Performance

-In general `sys_string` aims to have the same performance as hand-crafted code that uses corresponding native string types on every platforms. For example on macOS code using `sys_string` should be as fast as code manually using `NSString *`/`CFStringRef`.
-This needs to be kept in mind when evaluating whether `sys_string` is a better choice for your application that `std::string`. Continuing Apple's example an `std::string` is generally faster for direct character access than `NSString *` and thus `sys_string`. If your code rarely transfers data from `NSString *` to `std::string` and spends most of the time iterating over `std::string` characters then using `std::string` might be the right choice.
+In general, `sys_string` aims for its operations to perform as well as the best hand-crafted code that uses the corresponding native string types on every platform. For example, on macOS code using `sys_string` should be as fast as code manually using `NSString *`/`CFStringRef`.
+This needs to be kept in mind when evaluating whether `sys_string` is a better choice for your application than `std::string`. Continuing the Apple example, `std::string` is generally faster for direct character access than `NSString *`, and thus faster than `sys_string` too. If your code rarely transfers data from `NSString *` to `std::string` and spends most of its time iterating over `std::string` characters, then `std::string` might be the right choice.
+
+Another way to look at it is that `sys_string` sometimes trades micro-benchmark performance of individual string operations for less copying, fewer allocations and lower memory pressure overall. Whether this is the right tradeoff for you depends on the specifics of your codebase.

 ## Compatibility

 This library has been tested with
-* Xcode 12 on x86_64 and arm64
-* MSVC 16.9 on x86_64
-* Clang 11.0.5 under Android NDK on x86, x86_64, armeabi-v7a and arm64-v8a architectures
+* Xcode 13 on x86_64 and arm64
+* MSVC 16.9 and 17.1 on x86_64
+* Clang 12.0.5 under Android NDK on x86, x86_64, armeabi-v7a and arm64-v8a architectures
 * GCC 9.3 on x86_64 Ubuntu 20.04

-
-## Building
-
-If you use CMake clone this repository and add the `lib` directory as subdirectory. Something like
-
-```cmake
-add_subdirectory(PATH_TO_SYS_STRING/lib, sys_string)
-```
-
-You need to have your compiler to default to at least C++17 or set `CMAKE_CXX_STANDARD` to at least 17 in order for build to succeed.
-
-If you use a different build system all you need is to set your include path to `lib/inc` and compile the sources under `lib/cpp`.
-
-No special compilation flags are required except on Windows where `_CRT_SECURE_NO_WARNINGS` must be defined to avoid MSVC bogus warnings.
-On Mac you need to link with `CoreFoundation` framework and on Windows with `runtimeobject.lib`.
-
-### Configuration options
-
-* `SYS_STRING_NO_S_MACRO` - set to 1 to disable short `S()` macro. See [Usage](doc/Usage.md#basics) for details
-
 ## Usage

-See [Usage](doc/usage.md)
+* [Building](doc/Building.md)
+* [Usage](doc/Usage.md)

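
The non-goals above note that `to_lower` implements only the locale-independent part of Unicode case mapping, such as the final-sigma rule. The sketch below illustrates that rule for a deliberately tiny repertoire (ASCII letters and basic Greek letters). It is not the library's implementation, and `sketch::to_lower` is a hypothetical name. A complete implementation uses the Unicode character database and the `Final_Sigma` casing context, which also skips case-ignorable characters such as apostrophes; this sketch only inspects the immediate neighbors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// "Cased" here means ASCII letters and basic Greek letters only; Unicode
// defines it through the Cased derived property.
inline bool is_cased(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') ||
           (c >= 0x0391 && c <= 0x03A9) || (c >= 0x03B1 && c <= 0x03C9);
}

// Locale-independent single-character mapping: 'I' always maps to 'i'
// (no Turkish dotless i), Greek capitals map to their lowercase forms.
inline char32_t simple_lower(char32_t c) {
    if (c >= U'A' && c <= U'Z')
        return c + (U'a' - U'A');
    if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) // U+03A2 is unassigned
        return c + 0x20;
    return c;
}

inline std::u32string to_lower(const std::u32string & s) {
    std::u32string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t c = s[i];
        if (c == 0x03A3) { // GREEK CAPITAL LETTER SIGMA
            // Final sigma: preceded by a cased letter and not followed by one.
            const bool after_cased = i > 0 && is_cased(s[i - 1]);
            const bool before_cased = i + 1 < s.size() && is_cased(s[i + 1]);
            out.push_back(after_cased && !before_cased ? char32_t(0x03C2)   // final sigma
                                                       : char32_t(0x03C3)); // regular sigma
        } else {
            out.push_back(simple_lower(c));
        }
    }
    return out;
}

} // namespace sketch
```

With this sketch, `ΟΔΟΣ` lowercases to `οδος` (ending in `ς`), a word-initial `Σ` becomes `σ`, and `I` becomes `i` regardless of any locale setting.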

doc/Android.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+## Android platform conversions
+
+Currently there is only one storage type available on Android.
+
+`sys_string` internally stores a sequence of `char16_t`, which can be converted to `jstring` with the least amount of JNI overhead. The conversion is not trivial, however: it incurs allocation and copying. (Storing global references to `jstring` inside `sys_string` is not feasible for many reasons, among them the fact that the global reference table has a limited size.)
+As expected with JNI, all conversions require a `JNIEnv *` argument.
+
+```cpp
+JNIEnv * env = ...;
+
+//Conversions from/to jstring
+jstring jstr_in = env->NewString((const jchar *)u"abc", std::size(u"abc") - 1);
+sys_string str(env, jstr_in);
+assert(str == S("abc"));
+jstring jstr_out = str.make_jstring(env);
+assert(jstr_in != jstr_out); //in and out are NOT the same!
+
+//nullptr
+assert(sys_string().make_jstring(env) == nullptr);
+assert(sys_string(env, nullptr) == sys_string());
+```
+
+Note the **null preservation** above: a default-constructed `sys_string` produces a null `jstring`, and a null `jstring` produces a default-constructed `sys_string`.
+
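
Because Java strings are UTF-16, storing `char16_t` lets the content be passed to JNI's `NewString` (which takes `const jchar *` and a length) without transcoding. Constructing such a string from UTF-8 input does require transcoding; the sketch below shows that step. It is an illustration only, not the library's converter: `utf8_to_utf16` is a hypothetical name, and it assumes well-formed UTF-8 (it performs no validation and merely stops at a truncated trailing sequence).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Transcodes well-formed UTF-8 into UTF-16 code units. Code points above
// U+FFFF are encoded as surrogate pairs, as Java strings expect.
std::u16string utf8_to_utf16(std::string_view utf8) {
    std::u16string out;
    out.reserve(utf8.size()); // UTF-16 never needs more units than UTF-8 bytes
    std::size_t i = 0;
    while (i < utf8.size()) {
        const auto lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        if (i + len > utf8.size())
            break; // truncated sequence: stop rather than read past the end
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The resulting buffer `s` could then be handed to JNI as `env->NewString(reinterpret_cast<const jchar *>(s.data()), static_cast<jsize>(s.size()))`, which still allocates and copies on the Java side, matching the cost described above.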

doc/Apple.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+## Apple platform conversions
+
+On Apple platforms (macOS, iOS) the default `sys_string` storage is `CFStringRef`, to interoperate with Core Foundation and Cocoa/UIKit APIs.
+
+As a convenience, Apple platforms also support "generic Unix" storage, which stores `char *` and is meant to interoperate with plain Unix APIs.
+It can be selected via `#define SYS_STRING_USE_GENERIC 1` and is described under [Linux](Linux.md).
+
+With `CFStringRef` storage, `sys_string` is trivially convertible from and to `CFStringRef` or `NSString *`.
+
+```objc
+//Converting from/to CFStringRef
+CFStringRef cfstr_in = CFSTR("abc");
+sys_string str1(cfstr_in);
+CFStringRef cfstr_out = str1.cf_str();
+assert(cfstr_in == cfstr_out);
+
+//Converting from/to NSString *
+NSString * nsstr_in = @"abc";
+sys_string str2(nsstr_in);
+NSString * nsstr_out = str2.ns_str();
+assert(nsstr_in == nsstr_out);
+
+//nullptr
+assert(sys_string((NSString *)nullptr) == sys_string());
+assert(sys_string().cf_str() == nullptr);
+assert(sys_string().ns_str() == nullptr);
+```
+
+Note the **null preservation** above. A default-constructed `sys_string`, or a `sys_string` constructed from a null system string pointer, produces a null system string pointer back. This is by design, to allow round-tripping of nulls between C++ and Objective-C without information loss.
+
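
The identity assertions above (`cfstr_in == cfstr_out`) hold because the string object keeps the native pointer itself. The sketch below models that design with a hypothetical reference-counted `native_string` type whose `native_retain`/`native_release` helpers stand in for `CFRetain`/`CFRelease`; none of these names come from the library. Converting in is a retain, converting out returns the very same pointer, and a null pointer round-trips unchanged.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a reference-counted native string such as
// CFStringRef. Unlike CFRetain/CFRelease, these helpers accept null.
struct native_string {
    std::string text;
    int ref_count = 1; // the creator owns one reference
};

inline native_string * native_retain(native_string * s) {
    if (s)
        ++s->ref_count;
    return s;
}

inline void native_release(native_string * s) {
    if (s && --s->ref_count == 0)
        delete s;
}

// Wrapper that stores the native pointer itself: conversion in retains it,
// conversion out returns the same pointer, and null is preserved.
class wrapped_string {
public:
    wrapped_string() noexcept = default;
    explicit wrapped_string(native_string * s) noexcept : m_str(native_retain(s)) {}
    wrapped_string(const wrapped_string & other) noexcept : m_str(native_retain(other.m_str)) {}
    wrapped_string & operator=(wrapped_string other) noexcept {
        std::swap(m_str, other.m_str); // copy-and-swap
        return *this;
    }
    ~wrapped_string() { native_release(m_str); }

    native_string * native() const noexcept { return m_str; }

private:
    native_string * m_str = nullptr;
};
```

Copies of `wrapped_string` share one native object, which is what makes passing strings back and forth free of copying; the trade-off is that every copy touches a reference count.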

doc/Building.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Building and configuration
+
+
+
+## Building
+
+If you use CMake, clone this repository and add the `lib` directory as a subdirectory, for example:
+
+```cmake
+add_subdirectory(PATH_TO_SYS_STRING_REPO/lib sys_string)
+```
+
+Alternatively, with CMake 3.14 or newer you can use `FetchContent`:
+
+```cmake
+include(FetchContent)
+FetchContent_Declare(sys_string
+    GIT_REPOSITORY git@github.com:gershnik/sys_string.git
+    GIT_TAG <desired tag like v1.2>
+    GIT_SHALLOW TRUE
+)
+FetchContent_MakeAvailable(sys_string)
+```
+
+Your compiler needs to default to at least C++17, or you need to set `CMAKE_CXX_STANDARD` to at least 17, for the build to succeed.
+
+If you use a different build system, all you need is to add `lib/inc` to your include path and compile the sources under `lib/cpp`.
+
+No special compilation flags are required, except on Windows, where `_CRT_SECURE_NO_WARNINGS` must be defined to suppress spurious MSVC warnings.
+On macOS you need to link with the `CoreFoundation` framework, and on Windows with `runtimeobject.lib`.
+
+### Configuration options
+
+* `SYS_STRING_NO_S_MACRO` - set to 1 to disable the short `S()` macro. See [Usage](Usage.md#basics) for details.
+* `SYS_STRING_WIN_BSTR` - set to 1 to use `BSTR` as the native `sys_string` type on Windows
+* `SYS_STRING_WIN_HSTRING` - set to 1 to use `HSTRING` as the native `sys_string` type on Windows
+* `SYS_STRING_USE_GENERIC` - set to 1 to use `const char *` as the native `sys_string` type on macOS (similar to Linux)
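
Since the build fails without C++17, a consuming project may want an early, readable error instead. A minimal sketch of such a check follows; `EXAMPLE_CPLUSPLUS` and `cxx_standard_in_use` are hypothetical names, not part of the library. It accounts for MSVC, which reports `__cplusplus` as `199711L` unless `/Zc:__cplusplus` is passed, by preferring MSVC's `_MSVC_LANG` macro when it is defined.

```cpp
#include <cassert>

// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is given, but it
// also defines _MSVC_LANG with the selected standard version.
#if defined(_MSVC_LANG)
#define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
#define EXAMPLE_CPLUSPLUS __cplusplus
#endif

static_assert(EXAMPLE_CPLUSPLUS >= 201703L,
              "C++17 or later is required: set CMAKE_CXX_STANDARD to 17 "
              "(or pass -std=c++17 / /std:c++17)");

// Reports the language version this translation unit is compiled with.
constexpr long cxx_standard_in_use() { return EXAMPLE_CPLUSPLUS; }
```

In CMake, setting `CMAKE_CXX_STANDARD` to 17 (optionally together with `CMAKE_CXX_STANDARD_REQUIRED ON`) is enough to satisfy this check.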

doc/Linux.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+## Linux platform conversions
+
+The only storage type available on Linux is "generic Unix". It is meant to interoperate with POSIX-style APIs that deal with `char *`.
+`sys_string` logically stores its content as a sequence of `char`s in UTF-8 encoding. Conversions **from** `const char *` always incur copying and sometimes memory allocation (`sys_string` does small string optimization similar to `std::string`).
+Conversions **to** `const char *` are zero-cost.
+
+```cpp
+const char * cstr_in = "abc";
+sys_string str(cstr_in);
+const char * cstr_out = str.c_str();
+assert(strcmp(cstr_out, cstr_in) == 0);
+assert(cstr_out != cstr_in); //in and out are NOT the same!
+
+//nullptr
+assert(sys_string().c_str() != nullptr); //NO null preservation
+assert(strcmp(sys_string().c_str(), "") == 0);
+assert(sys_string((const char *)nullptr).c_str() != nullptr); //NO null preservation
+assert(strcmp(sys_string((const char *)nullptr).c_str(), "") == 0);
+```
+
+Note that, unlike other storage types, there is no null preservation here. `c_str()` returns an empty C string for a default-constructed `sys_string` or one constructed from `nullptr`. This is deliberate, to align with `std::string`, whose `c_str()` never returns `nullptr`.
+
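
The generic Unix storage is described as always copying on construction, using small string optimization, and never returning `nullptr` from `c_str()`. The sketch below combines those three properties. It is not `sys_string`'s implementation: `generic_string_sketch` is a hypothetical name, the 16-byte inline buffer is an arbitrary choice, and in this sketch longer strings share an immutable heap buffer through `std::shared_ptr<char[]>` (C++17) so that copies stay cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>

class generic_string_sketch {
public:
    // Default construction yields an empty string; c_str() is still non-null.
    generic_string_sketch() noexcept = default;

    // A null pointer is treated as an empty string (no null preservation).
    explicit generic_string_sketch(const char * str)
        : generic_string_sketch(str ? std::string_view(str) : std::string_view()) {}

    // Always copies: short strings go into the inline buffer, longer ones
    // into a heap buffer that copies of this object then share.
    explicit generic_string_sketch(std::string_view sv) : m_size(sv.size()) {
        char * dest = m_small;
        if (m_size >= sizeof(m_small)) {
            m_heap = std::shared_ptr<char[]>(new char[m_size + 1]);
            dest = m_heap.get();
        }
        if (m_size != 0)
            std::memcpy(dest, sv.data(), m_size);
        dest[m_size] = '\0';
    }

    // Zero-cost: returns a pointer into existing storage, never nullptr.
    const char * c_str() const noexcept { return m_heap ? m_heap.get() : m_small; }
    std::size_t size() const noexcept { return m_size; }
    bool is_inline() const noexcept { return !m_heap; }

private:
    char m_small[16] = {};          // up to 15 chars plus the terminator
    std::shared_ptr<char[]> m_heap; // used only for longer strings
    std::size_t m_size = 0;
};
```

As with the documented behavior, the output pointer of `c_str()` never equals the input pointer, because construction always copies.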
