-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the task template #894
Conversation
I will look more into it later, for now let me cite the string_view documentation:
My take is to just use const std::string, which is a bit more expensive but provides the safe-ish (lifetimes!) method |
Well, std::string is definitely a good option. But std::string_view provides much more convenience. If we use std::string, the input has to be std::string: auto task = Task(std::string{"a_task"}); On the other hand, if we use std::string_view, input can have any string-like type (without copying). So it still works with this: auto task = Task("a_task");
I would say this is a very rare case since we don't usually create char arrays on purpose (I have to look it up how to do this). And if someone deliberately does this to shoot himself in his foot, then let it be. |
First, the lack of the explicit keyword for the definition of the constructor (5) of basic_string seems to indicate that you can initialize a std::string using a zero terminated char pointer implicitly, just like you can for string_view. Secondly, if an interface takes a std::string_view, that clearly suggests that it will work with any valid string_views. Implicitly relying on certain constraints which may be true for some instances (not per standard, but as implementation details) seems like a bad way to define an interface. If your constructor requires a zero terminated string, then it would be better to call the argument Explicit char array trickery is not required to construct string_views which violate your assumption of zero-terminatedness.
The cout will print "12", while the printf will print "12345", because the string referred to by Is it terrible likely that someone will pass such a wrongly terminated std::string_view to that class? Certainly not. Still, especially given that this is code meant to be built on by new programmers, I think "it is fine to build hidden assumptions into your interfaces" is the wrong message to send, and nobody will ever care about the runtime and memory penalty of copying the name to a std::string in any case. |
Yeah, you are quite right about this. Not sure how I got this wrong idea.
I wouldn't say using std::string_view is relying on certain constraints. Rather it just says this interface accepts a reference to any string-like data and pass the reference to the underlying function. In this case,
For a better comparison, you should also use std::string_view a{"12345"};
a.remove_suffix(3);
std::cout << a.data() << "\n";
printf("%s\n", a.data()); Then you get same result. As the name suggested, char name[] = {'a', 'b', 'c', 'd', '\0'}; // size = 5
auto name_view = std::string_view(name); // size = 4
name_view.remove_suffix(2); // size = 2
name_view.data(); // const char*
I agree. But this is the responsibility of
I would give a second thought about this. Since we are dealing with a template, which, presumably, needs to be viewed as a standard way to implement many task classes. If all of them have this string copying, it may have some impacts on performance. |
c5a0417
to
3f40adb
Compare
Regarding interfaces, an analogy.
If one uses the data method, then a string_view is functionally equivalent to a character pointer. If one intends to use it that way, I would much prefer to use a
So far I'm aware, all users of R3BRoot have kept their task names to below a megabyte, and their number of tasks per run in the low thousands. :-) As a rule of thumb, if a thing happens outside the event loop and is not by itself very costly, it is likely not worth worrying about. TObject::fTitle likely already creates a copy of the const char*, another copy won't hurt too much. If, a few decades down the road, it becomes fashionable to create new tasks within the event loop, and performance measurements point to the copying of the task names out of all things as the bottleneck, then we can always switch from std::string to const char* and fix performance. |
That's not true. The data stored in a string_view is a char pointer PLUS a size, which makes the whole thing a lot better. :D
On the other hand, std::string_view doesn't have this issue. Even if the char array doesn't have null terminator, it's still fine. #include <iostream>
#include <string>
void Print(const char* name) {
auto str = std::string{name};
std::cout << str << "\n";
}
void Print2(std::string_view name) {
auto str = std::string{name};
std::cout << str << "\n";
}
int main() {
const char name[] = {'a', 'b', 'c', 'd', 'e'};
const char name2 = 'a';
const char name3 = 'b';
Print(name);
// Print2(name);
} the printout may be
That's true. But default is important. Something like always return
Suppose party B is a delivery guy. I guess most of us don't blame the delivery man for the bad food or mistakes made by either the costumer or the restaurant. :D |
The thing is, the author of this class (B) is not just a delivery man. If B got an argument X of type x_t from your caller (A) and passed it to its base class as an x_t, then B could claim that any issues with X are between A and C. The analogy here would be B taking a package labeled "not for human consumption", ripping of the label and writing "enjoy your meal!" on it instead. std::string_view is safe and well-defined on its own, zero terminated char arrays are safe and well-defined only if you don't make any mistakes (which is why they are discouraged). Reinterpreting the former as the latter is not correct in some cases. I see the following ways forward:
What we should not do is any of the following:
I agree that using However, the rules for performant string handling in physics analysis are very simple: Don't do strings processing per event. Outside event loops, what I expect from R3BRoot contributors is that they can use strings correctly. Fancy classes like string_view which don't own the buffer get complex to use when the life-time of the buffer is shorter than the life-time of these classes (e.g. if you had a string_view member variable). If we were writing a parser, a regex library or a database management system, then we would have to be experts on fast string handling. As it is, even if the task constructor took a std::string by copy instead of by reference, that is unlikely to be within my top ten complaints about class implementation. |
Using the name of a variable or a type to indicate the constraint is a poor design because it doesn't make program crash at the right moment when it should. But let's just compare the following methods: using zero_terminated_character_array=char* using zero_terminated_character_array=std::string_view The second one can do everything the first one can do. If we pass a NUL-terminated string, both work. If we pass a non-NUL-terminated string, both fail. And from the example above we could see when passing a char array (
There is always a risk of passing a pointer because we could never know whether the data pointed is still valid or not. And I don't think there is something wrong with |
You are indeed correct that I agree that it is always preferable to use the type system to just using alias names, but I still think that there is a use for transporting intent using type alias names. For example, I would preferentially call what is to the compiler a 32 bit signed integer Many functions have preconditions for their arguments which are not commonly expressed in a type system. For example:
If we knew that FairTask would get a std::string_view constructor at the end of the year, I would now prefer to use:
Then we would not have to change our interface when FairRoot starts using C++ strings. However, in the reality we are inhabiting, this is not going to happen. So we are stuck with these options:
My preference ordering is Safe_A>Safe_B>Cheap_B>Cheap_A.
So in the end Cheap_A is more confusing (because you violate the default interpretation of a type) and more costly (because you take and record the length which you never use). Last analogy (I promise). Suppose that someone heard that raw pointers are evil and one should use C++ pointer types instead. So they do the following:
To that I would reply: "You are not using unique_ptr as intended, but abusing that class to transport a pointer with different ownership semantics. It is better to use 'evil' C constructs which have well known ambiguities in interpretation (like Foo*) than violating the supposedly unambiguous interpretation of well known C++ constructs, which is about 100x as evil." (I am aware that in that specific case, there is an even better solution than keeping Foo*, namely weak_ptr, but that is not my point.) This is also exactly how I feel about using string_view to smuggle what will be interpreted as zero-terminated char arrays. I think that taking a
So rather than continuing the discussion if char* (which some people will mistake as a pointer to a single character) is more or less evil than your suggested use of string_view (which most people will mistake as having an explicit length rather than being reinterpreted as a zero terminated array), can we perhaps agree that std::string& is good enough and move on? |
I made a wrong statement, when you pass a const char name[] = {'a', 'b', 'c'};
const auto name_view= std::string_view{name, std::size(name)}; Otherwise, it would still be UB. The thing with
Let's ust be simple: ExampleTask::ExampleTask(std::string_view name)
: FairTask(name){}
I don't understand here. Why don't you just use
The only difference in data structure between string_views and char* (apart from the type name) is string_views has one additional data to describe the size of the array. You could interpret string_views as So when you break this interpretation and use a not-NUL-terminated char array, everything is broken no matter you use char*, string or string_view: void Read_string(std::string name);
void Read_c_char(const char* name);
void Read_string_view(std::string_view name);
const char name[] = {'a', 'b', 'c'};
Read_string(name); // UB
Read_c_char(name); // UB
Read_string_view(name); // UB Here is an example of using std::string. So the best solution is just not using non-NUL-terminated string. But if we still have to deal with the case of a non-NUL-terminated string without data copying, passing std::string_view is the only option. (unless you are a C programmer and do something like |
I won't say this example is abusing a new feature but rather simply a wrong usage.
Let's move on by just taking |
Of course, once they had that constructor we would switch to the simple version.
The problem in these examples arises from the fact that you are implicitly constructing std::string, std::string_view using the const char* constructor without passing size. However, if you construct a string/string_view by some other mean, there is no reason to expect that it's internal representation will be zero-terminated. (Only since C++11 zero termination is mandated for std::string; before that, implementations were free to use realloc when someone called c_str and they had to append a zero byte.)
char* is assumed in many contexts (e.g. without a separate length argument) to be zero terminated (because otherwise you could not know the length).
That is true if you do not require access to the internal representation. If you need to cheaply convert it to a zero terminated character array (which we do), then string_view is not an option.
Agreed. I think the moral of the story is that C++ inherited a lot of cruft from C, and parts of ROOT very much rely on that cruft. (If the language was designed from the start, there would be no reason why the literal "foo" should not be a string_view or why Of course, this legacy support is also the reason for C++'s success. |
3f40adb
to
0c2f4d7
Compare
Ok, I change to |
Hello @YanzhaoW |
What is the plan for implementing this template? Will you take all current code, replace the current standard with the new template? Or is the expectation that all 'new' code needs to be in the new template. If that's the case, this is forcing the people who are nice enough to expand upon the code to do the copying and pasting. I object to this. This would be forcing a template upon us and washing you hands clean of the responsibility that comes with it. Best regards, |
@jose-luis-rs Yes.
The plan is same as when it was created in the first place.
No, we don't have neither the manpower nor expertise.
I would say it's suggested to do that. In my opinion, it's more like an example instead of a rule. If you create a new class, rather than copying a low-quality code, copy this. If you can't do exactly the same thing, then do it in your own way. But we should always hold a higher standard to the new code in folders like r3bdata and r3bsource because they are used by everyone.
I don't quite understand this. This isn't such "clean of the responsibility". You should be always and solely responsible for the code you create. I created this template. I'm responsible for this template. You create a Cal2Hit class. You are responsible for the Cal2Hit class. |
Agreed, the template is meant as an acceptable basis on which to build new classes. It is clearly not the only acceptable basis. For example, I would be ok with a FairTask which:
I think that we should only strictly enforce standards which are already enforced on our existing code (and perhaps easily fixed stuff such as "initialize all variables and members in-line"). If a PR is above the ~60th percentile of the code base in terms of maintainability, that is good enough for me. |
I also agree. A template should be updated for improvements or corrected. We should have the best possible example for new classes. If a developer uses the last version, he/she could profit from these corrections. If not, we should fix more things during the PR phase, but as @YanzhaoW said, it is suggested to use the new, not obliged. |
Update R3BRoot/template/NewTask/base with R3BIOConnector proposed in #893.
Please add the comments below if there are any suggestions. @klenze
Depends on #893
(Let's see how Github manages a PR based on another unmerged PR. ;D)
Checklist:
dev
branch