Performance improvements #9
That would be awesome.
Not in C, but a first pass at this might look something like this. It uses the fact that if

```r
# Read only the YAML header: the lines between the opening and closing
# `---` (optionally `#---`) delimiters, without loading the CSV body.
get_yaml_header <- function(filename, yaml_rxp = "^#?---[[:space:]]*$") {
  con <- file(filename, "r")
  on.exit(close(con))
  first_line <- readLines(con, n = 1)
  if (length(first_line) == 0 || !grepl(yaml_rxp, first_line)) {
    warning("No YAML header found.")
    return(NULL)
  }
  header <- character()
  repeat {
    # Read one line at a time until the closing delimiter
    curr_line <- readLines(con, n = 1)
    if (length(curr_line) == 0) {
      stop("Reached end of file before the closing YAML delimiter.")
    }
    if (grepl(yaml_rxp, curr_line)) break
    header <- c(header, curr_line)
  }
  # Header lines only; both delimiters are excluded
  header
}

# Parse the header lines into an R list, stripping a leading `#`
# when every line is commented out.
parse_yaml_header <- function(yaml_header) {
  if (all(grepl("^#", yaml_header))) {
    yaml_header <- gsub("^#", "", yaml_header)
  }
  yaml::yaml.load(paste(yaml_header, collapse = "\n"))
}

raw_header <- get_yaml_header("iris.csvy")
metadata <- parse_yaml_header(raw_header)
```

You should then be able to do something like

If this looks OK, I can try to put together a more complete pull request later this week.
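A related sketch, assuming the goal is also to avoid scanning the file twice: once the header lines have been consumed from an open connection, that same connection can be handed to `read.csv()`, which reads an already-open connection from its current position. The helper name `read_csvy_once()` and the choice to error on a missing or unterminated header are assumptions for illustration, not part of the package.

```r
# Hypothetical helper: read a CSVY file in a single pass over one connection.
read_csvy_once <- function(filename, yaml_rxp = "^#?---[[:space:]]*$") {
  con <- file(filename, "r")
  on.exit(close(con))
  first_line <- readLines(con, n = 1)
  if (length(first_line) == 0 || !grepl(yaml_rxp, first_line)) {
    stop("No YAML header found.")
  }
  header <- character()
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) stop("Unterminated YAML header.")
    if (grepl(yaml_rxp, line)) break
    header <- c(header, line)
  }
  # The connection is still open, so read.csv() continues from the line
  # after the closing delimiter instead of starting at the top of the file.
  data <- read.csv(con, stringsAsFactors = FALSE)
  list(header = header, data = data)
}
```

Because the header is read one line at a time, the header step never holds the CSV body in memory; only `read.csv()` touches the data rows.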
That would be awesome!
See issue leeper#9.
#15 has been merged. We could make further C-level fixes, but this seems good for the time being.
Currently, `read_csvy` reads the complete file using `readLines()`, which means it will be slow for large files. I'd recommend (and can possibly help with) writing a C/C++ `read_yaml_header()` function that would parse from the first `---` to the next `---`. This metadata could then be used to generate the column specification that's passed to `read.csv()`, `read_csv()`, and `fread()`. (This will probably still need some additional cleanup afterwards.)
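The metadata-to-column-specification step could look roughly like this in base R. It assumes the parsed header carries a `fields` list whose entries have `name` and `type` (the shape `yaml::yaml.load()` would return for a Table Schema-style header); the type names in `type_map` and the helper `fields_to_colclasses()` are hypothetical. The result is a named `colClasses` vector for `read.csv()`. `read_csv()` and `fread()` take their own column specifications, so the same mapping would need adapting for them.

```r
# Hypothetical mapping from header field types to read.csv() column classes.
type_map <- c(
  string  = "character",
  integer = "integer",
  number  = "numeric",
  boolean = "logical"
)

# Build a named colClasses vector; unknown or missing types become NA,
# which tells read.csv() to guess that column with type.convert().
fields_to_colclasses <- function(metadata) {
  fields <- metadata$fields
  if (is.null(fields)) return(NA)  # no field metadata: guess every column
  classes <- vapply(fields, function(f) {
    if (is.null(f$type) || !f$type %in% names(type_map)) NA_character_
    else unname(type_map[[f$type]])
  }, character(1))
  names(classes) <- vapply(fields, function(f) f$name, character(1))
  classes
}
```

Declaring `id` as `"character"` up front is what keeps values such as `007` from being silently converted to the integer `7`.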