Fix the O(N^2) bug of passColName and passRowName#1782
Fix the O(N^2) bug of passColName and passRowName#1782jajhall merged 1 commit intoERGO-Code:latestfrom
Conversation
|
@jajhall CI tests pass. It is ready to be reviewed. |
| assert(int(search->second) < int(this->name2index.size())); | ||
| this->name2index.erase(search); | ||
| this->name2index.insert({name[index], kHashIsDuplicate}); | ||
| search->second = kHashIsDuplicate; |
There was a problem hiding this comment.
By deleting
this->name2index.erase(search);
this->name2index.insert({name[index], kHashIsDuplicate});
and only setting
search->second = kHashIsDuplicate;
(which has no effect since search is immediately destroyed) the property of marking duplicate names is lost. hence errors due to the same name being given to different columns are not trapped.
There was a problem hiding this comment.
emplace will only insert when there is no duplication. When there is duplication, search points to the duplicated key and value, and setting the value will influence the original map.
This is an example: https://godbolt.org/z/YsYYrKrPb
There was a problem hiding this comment.
Thanks, it did cross my mind that there was more to setting
search->second = kHashIsDuplicate;
than I gave you credit for, sorry.
As I said, I'm not fluent in the use of std::unordered_map :-)
Just spotted the & in
auto& search = emplace_result.first;
Now I see why setting
search->second = kHashIsDuplicate;
achieves the desired result
There was a problem hiding this comment.
Never mind. I also learned about the meaning of return values of emplace today. It is a clever choice to use emplace to detect duplication.
| // Find the original and mark it as duplicate | ||
| auto& search = emplace_result.first; | ||
| assert(int(search->second) < int(this->name2index.size())); | ||
| search->second = kHashIsDuplicate; |
jajhall
left a comment
There was a problem hiding this comment.
This may pass the CI tests, but that's because there's no unit test to check that duplicate names are marked as such
Lines 68 to 88 in 13363c9 In the beginning, col0_name is unique, so it has corresponding column. But after col0_name is assigned to |
|
There is still one edge case: if a name is marked as duplicate and the corresponding column is assigned with a new name, then the old name should not be deleted directly (there might be other columns with this old name). However, this is an edge case and checking it would require The final and complete solution is to use https://en.cppreference.com/w/cpp/container/unordered_multimap to map name to column/row if we want to allow multiple columns/rows to have the same name. |
|
Indeed, there's a limit to making this idiot-proof |
Reland #1780
I also simplify some logic here. The result of
emplacehas contained the duplicate key-value pair if it detects duplication, so we can mark the value as duplicate directly instead of using the slow search-erase-insert combo.