-
Notifications
You must be signed in to change notification settings - Fork 997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move some setDT validation checks to C #5427
Merged
Merged
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
7cc4da4
move setDT validation checks to C
MichaelChirico bd2fc1e
fix some tests
MichaelChirico 835f118
Merge branch 'master' into setdt-wide
MichaelChirico 2831c59
restore prototype
MichaelChirico b53d967
Rename to reflect return value
MichaelChirico 5a2d8e9
stab at sharing message at R+C levels
MichaelChirico c929792
Compiles & basically runs
MichaelChirico 0696944
Run POSIXlt test first; save LENGTH(dim) to reuse
MichaelChirico d0e0216
new test
MichaelChirico fb45cd8
warning, not error
MichaelChirico 1872f47
correct comment
MichaelChirico 1e758e6
Merge branch 'master' into setdt-wide
tdhock af48a80
setDT faster
tdhock File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -200,6 +200,48 @@ static SEXP shallow(SEXP dt, SEXP cols, R_len_t n) | |
return(newdt); | ||
} | ||
|
||
// Wrapped in a function so the same message is issued for the data.frame case at the R level | ||
void warn_matrix_column(/* 1-indexed */ int i) { | ||
warning(_("Some columns are a multi-column type (such as a matrix column), for example column %d. setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column."), i); | ||
} | ||
|
||
// input validation for setDT() list input; assume is.list(x) was tested in R | ||
SEXP setdt_nrows(SEXP x) | ||
{ | ||
int base_length = 0; | ||
bool test_matrix_cols = true; | ||
|
||
for (R_len_t i = 0; i < LENGTH(x); ++i) { | ||
SEXP xi = VECTOR_ELT(x, i); | ||
/* allow NULL columns to be created by setDT(list) even though they are not really allowed | ||
* many operations still work in the presence of NULL columns and it might be convenient | ||
* e.g. in package eplusr which calls setDT on a list when parsing JSON. Operations which | ||
* fail for NULL columns will give helpful error at that point, #3480 and #3471 */ | ||
if (Rf_isNull(xi)) continue; | ||
if (Rf_inherits(xi, "POSIXlt")) { | ||
error(_("Column %d has class 'POSIXlt'. Please convert it to POSIXct (using as.POSIXct) and run setDT() again. We do not recommend the use of POSIXlt at all because it uses 40 bytes to store one date."), i+1); | ||
} | ||
SEXP dim_xi = getAttrib(xi, R_DimSymbol); | ||
R_len_t len_xi; | ||
R_len_t n_dim = LENGTH(dim_xi); | ||
if (n_dim) { | ||
if (test_matrix_cols && n_dim > 1) { | ||
warn_matrix_column(i+1); | ||
test_matrix_cols = false; | ||
} | ||
len_xi = INTEGER(dim_xi)[0]; | ||
} else { | ||
len_xi = LENGTH(xi); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. with #5981 in mind I do have some wariness about potentially introducing issues by skipping dispatch. |
||
} | ||
if (!base_length) { | ||
base_length = len_xi; | ||
} else if (len_xi != base_length) { | ||
error(_("All elements in argument 'x' to 'setDT' must be of equal length, but input %d has length %d whereas the first non-empty input had length %d"), i+1, len_xi, base_length); | ||
} | ||
} | ||
return ScalarInteger(base_length); | ||
} | ||
|
||
SEXP alloccol(SEXP dt, R_len_t n, Rboolean verbose) | ||
{ | ||
SEXP names, klass; // klass not class at request of pydatatable because class is reserved word in C++, PR #3129 | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be preferable to combine the logic for the
for
loops instead of just emitting the warning, but I didn't see an easy way to do so -- thesetdt_nrows
parallel also does the other checks in the same loop. Unless we think looping over columns twice is fine to exchange for the clarity of sharing this logic. But this loop is pretty straightforward.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the loop overhead is likely to be negligible compared to even this simple logic, so I don't think it's worth much instinctively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good performance improvement.