oa_snowball returns `Error in if (is.na(so_info)) NA else so_info[[1]]` when snowballing a large number of cites #95

Comments
**@trangdata:** Hi @TimothyElder, thanks so much for reporting this. 🌱 It is expected that the script takes a while to run, because `oa_snowball` retrieves all works that cite, and are cited by, each focal work. When your set of focal works exceeds 5,000 works, this can take a very long time, especially if some of those focal works have many citations. In this particular case, I think you ran out of memory in R. The following query, which finds works cited by a subset of your focal works, somehow results in over 7 GiB of memory used in the session. I'll keep investigating, but in the meantime I suggest breaking your set of focal works into smaller chunks:

```r
library(openalexR)
ids <- c("W2119340816", "W4285719527", "W4211208840", "W4247785462", "W4211082352", "W2163351155", "W4210992155", "W2103903454", "W2549006299", "W2026141069", "W3126128017", "W2145354914", "W2086643853", "W2085458222", "W1988902102", "W2095880617", "W2139524347", "W2109565845", "W2112652525", "W2137200701", "W2144330816", "W2552595635", "W1996710573", "W2051676630", "W1875373156", "W2761242421", "W2134119471", "W2125665528", "W2111285159", "W2147485520", "W2121875608", "W2561425398", "W4238604577", "W2336794604", "W2106742300", "W4211174791", "W1958810146", "W2184779060", "W2169678441", "W1942996532", "W2165335733", "W2098206882", "W2073051214", "W2168197710", "W2017506719", "W2469676206", "W2094905849", "W2099192919", "W2124028388", "W4248178819")
oa_fetch(
  entity = "works",  # assumed here; this argument may have been dropped in extraction
  cited_by = ids,
  verbose = TRUE,
  cited_by_count = c(">1000", "<30000")
)
```
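A minimal sketch of that chunking suggestion, assuming the same filters as above; the chunk size of 10 and the deduplication step are illustrative choices, not part of the original comment:

```r
# Hypothetical chunking sketch: fetch cited works for 10 focal IDs at a
# time so no single request materializes a huge result set, then combine.
chunks <- split(ids, ceiling(seq_along(ids) / 10))

results <- lapply(chunks, function(chunk) {
  oa_fetch(
    entity = "works",
    cited_by = chunk,
    cited_by_count = c(">1000", "<30000"),
    verbose = TRUE
  )
})

# Bind the chunk results together and drop works returned by several chunks.
all_works <- do.call(rbind, results)
all_works <- all_works[!duplicated(all_works$id), ]
```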
**@TimothyElder:** @trangdata Thanks! I kept working on this and found a similar solution to the one that you outlined. Instead of breaking it up by feeding in chunks of the data, I used the `cited_by_count` filters to limit how many works each run returns. Then I plan on doing a few more passes with different thresholds.
**@trangdata:** @TimothyElder one thing I noticed just now: did you mean for the two `cited_by_count` conditions to apply together (more than 500 *and* fewer than 30,000 citations)? You wrote:

```r
oa_snowball(
  identifier = ids,
  verbose = TRUE,
  citing_filter = list(cited_by_count = ">500", cited_by_count = "<30000"),
  cited_by_filter = list(cited_by_count = ">500", cited_by_count = "<30000"),
  is_retracted = FALSE
)
```
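If the intent is a single AND-combined range, one way to express it is a sketch like the following, mirroring the `c(">1000", "<30000")` vector form used in the `oa_fetch()` call earlier in this thread; the argument names are kept as in the post above and may differ in other package versions:

```r
# Sketch: express "more than 500 AND fewer than 30,000 citations" as one
# vector-valued condition instead of a list with a duplicated name.
oa_snowball(
  identifier = ids,
  verbose = TRUE,
  citing_filter = list(cited_by_count = c(">500", "<30000")),
  cited_by_filter = list(cited_by_count = c(">500", "<30000")),
  is_retracted = FALSE
)
```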
**@TimothyElder:** @trangdata Yes!! Very good catch. This was my way of chunking out the process, though now that I look at the code I wrote, I see that there are some mistakes. But yes, I meant for the snowball to return only articles that are cited by more than 500 other articles but fewer than 30,000. I also added the `is_retracted = FALSE` filter to exclude retracted works. For my own clarification: the two conditions should combine as a logical AND, so a work has to satisfy both, right? Sorry in advance if that is confusing, and the documentation even on OpenAlex is a little confusing about the logical expressions.
**@trangdata:** Yes, you're correct @TimothyElder. 💯 Also, we're open to new PRs if you would like to improve the documentation! 🙏🏽 🪴
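One way to verify how the two conditions combine is to look at the request URL that openalexR builds: in the OpenAlex filter syntax, repeated filter keys are joined with a comma, which the API treats as a logical AND. A small sketch (the printed URL is illustrative, not captured output):

```r
# Build the query URL without executing it; the two cited_by_count bounds
# should appear as comma-separated (ANDed) filter conditions.
oa_query(entity = "works", cited_by_count = c(">500", "<30000"))
#> Illustrative: ".../works?filter=cited_by_count:>500,cited_by_count:<30000"
```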
Original issue, posted by @TimothyElder:

When running `oa_snowball` on all the works that cite one highly cited article, a large number of works is returned and the script takes a long time to run. After returning about 100,000 works, the script fails with the error in the title, `Error in if (is.na(so_info)) NA else so_info[[1]]`. Looking at the source code, I can't quite make sense of why this error is returned, and I can't think of a way of more efficiently returning all the works. Here is how I do it now:
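The snippet itself was lost in extraction; judging from the call quoted later in the thread, it presumably resembled the following reconstruction:

```r
# Reconstruction (not the verbatim original, which was lost in extraction).
library(openalexR)

snowball <- oa_snowball(
  identifier = ids,  # vector of OpenAlex work IDs for the focal set
  verbose = TRUE
)
```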