WoS citations for pubmed papers (Xin Li) #4
Sure, I will do that next week. I assume you are using PubMed data as well. We already have a copy on Azure if you do not want to download it again.
From: Yan, Xiaoran

I can certainly help with that once I get back early next year. However, it might be worthwhile to discuss this in more detail before we proceed. From what I have seen, the citation data in PubMed is missing about 40% of the citations compared with WoS and MAG combined. We are planning a data integration that merges WoS, MAG and PubMed. How do you plan to distinguish clinical papers from non-clinical papers? Do you need the MeSH tags of each paper?

Xiaoran

Thank you very much, dear Xiaoran. I need the citation counts of all articles in PubMed. Could you write me a CSV file that includes each paper's PMID, title and its corresponding citation count in WoS? Have a good day!

Xin Li

Sure. Although it is not officially part of CADRE yet, I did build a Spark database of PubMed for my own research. Let us discuss what I can help with on Xin's project.

Xiaoran

On Dec 18, 2018 3:31 AM, "Ding, Ying" dingying@indiana.edu wrote:
If you want to use our PubMed data, please specify the list of PubMed columns you want (authors, titles, abstracts, etc.). If I remember correctly, you already have a list of PubMed IDs of clinical papers; do you still need MeSH tags? If instead you already have curated PubMed data that you want to connect with WoS citations, please let us know. We can upload your data into our cloud for easier communication and access from the notebook environment.
Hi Xin,

The requested citation table is now available inside your notebook environment. You can find it at /AzureDownload/PMwosCItations.cxv.gz, and you can re-download it by running the AzureBlobTest notebook.

Xiaoran
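For reference, a per-paper table of the kind Xin requested (PMID, title, WoS citation count) can be derived from a list of citing-cited pairs. This is a minimal sketch, not the pipeline that produced the file above: it assumes the pairs are available as (citing PMID, cited PMID) tuples and the titles as a PMID-to-title mapping, and the function name `write_citation_counts` is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, TextIO, Tuple


def write_citation_counts(
    pairs: Iterable[Tuple[str, str]],
    titles: Dict[str, str],
    out: TextIO,
) -> Counter:
    """Write one CSV row per paper: pmid, title, WoS citation count.

    `pairs` yields (citing_pmid, cited_pmid) tuples; `titles` maps pmid -> title.
    Papers that are never cited are still written, with a count of 0.
    """
    # Deduplicate so a repeated citing->cited link is counted only once,
    # then count how often each paper appears on the cited side.
    counts = Counter(cited for _citing, cited in set(pairs))
    writer = csv.writer(out)
    writer.writerow(["pmid", "title", "citations"])
    for pmid, title in titles.items():
        # Counter returns 0 for keys it has never seen.
        writer.writerow([pmid, title, counts[pmid]])
    return counts


if __name__ == "__main__":
    buf = io.StringIO()
    write_citation_counts(
        [("1", "2"), ("3", "2"), ("3", "2"), ("3", "1")],
        {"1": "Paper A", "2": "Paper B", "4": "Paper D"},
        buf,
    )
    print(buf.getvalue())
```

Counting only the cited side gives "times cited"; counting the citing side instead would give each paper's number of references.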
Dear Xiaoran,

Thank you very much! I have downloaded it and will look into it!

Xin Li
Dear Xiaoran,

I have several questions about the citation file you wrote:

(1) Does each paper in the file have a WoS number?
(2) Does each paper in the file have a DOI?
(3) Are there papers in the file that have no PMID?
(4) Are there papers in the file that have no publication year?

Yours sincerely,
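Questions like these can be answered directly by scanning the file and counting missing values per column. A minimal sketch, assuming the rows are read as dictionaries with hypothetical column names `wos_id`, `doi`, `pmid` and `year`, and that a missing value is stored as an empty string or a placeholder such as `null`; `missing_field_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Sequence

# Hypothetical column names; check them against the real file header.
FIELDS = ("wos_id", "doi", "pmid", "year")
# Values treated as missing (compared case-insensitively after stripping spaces).
MISSING = {"", "null", "none", "na"}


def is_missing(value: Optional[str]) -> bool:
    return value is None or value.strip().lower() in MISSING


def missing_field_counts(
    rows: Iterable[Mapping[str, Optional[str]]],
    fields: Sequence[str] = FIELDS,
) -> Dict[str, int]:
    """Return, per field, how many rows have no usable value, plus the row count."""
    missing: Counter = Counter()
    total = 0
    for row in rows:
        total += 1
        for field in fields:
            # A column absent from the row counts as missing too.
            if is_missing(row.get(field)):
                missing[field] += 1
    result = {field: missing[field] for field in fields}
    result["total_rows"] = total
    return result


if __name__ == "__main__":
    sample = [
        {"wos_id": "WOS:1", "doi": "", "pmid": "101", "year": "2001"},
        {"wos_id": "WOS:2", "doi": "10.1000/x", "pmid": "null", "year": None},
    ]
    print(missing_field_counts(sample))
```

Passing `csv.DictReader(gzip.open(path, "rt", newline=""))` as `rows` would stream a gzip-compressed CSV without loading it into memory; the header names still have to be checked against the real file.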
Hi, dear Xiaoran,

I have checked the citation data. The total number of citing-cited pairs is 11,894,932, but, strangely, there are only 7,607,845 citing papers in the dataset. This number is much smaller than I expected, so I checked whether each citing or cited paper has a PMID. The answer is yes. I guess the citations were limited to papers indexed in MEDLINE, which is why the result is very close to the data extracted from PubMed (about 5,700,000 citing papers). Could you kindly write me a new file that contains the citation pairs of the whole WoS? What I actually want to use is the global citation information in the WoS dataset: a PubMed paper can cite a paper that is not indexed in PubMed (has no PMID), and vice versa. We should include all the papers in WoS whether or not they have a PMID (or DOI). If a paper has no PMID, we can just mark it as 'null' or something similar.

Thanks,
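The pair count and the distinct citing-paper count measure different things: a paper with k references contributes k pairs but counts once as a citing paper. The check described in this comment (does every endpoint have a PMID?) can be expressed as a small in-memory sketch. It assumes pairs keyed by WoS ID and a mapping from WoS ID to PMID in which papers without a PMID are simply absent; `pair_stats` is a hypothetical name, and at full-WoS scale the same aggregation would need a distributed engine rather than Python sets.

```python
from typing import Dict, Iterable, Mapping, Tuple


def pair_stats(
    pairs: Iterable[Tuple[str, str]],
    pmid_of: Mapping[str, str],
) -> Dict[str, int]:
    """Summarize a citing->cited edge list keyed by WoS ID.

    `pmid_of` maps a WoS ID to its PMID; papers without a PMID are absent.
    """
    n_pairs = 0
    both_have_pmid = 0
    citing, cited = set(), set()
    for src, dst in pairs:
        n_pairs += 1
        citing.add(src)
        cited.add(dst)
        if src in pmid_of and dst in pmid_of:
            both_have_pmid += 1
    return {
        "pairs": n_pairs,
        # A paper with k references adds k pairs but only one citing paper.
        "distinct_citing": len(citing),
        "distinct_cited": len(cited),
        # Equal to "pairs" when the list was restricted to PMID-matched papers.
        "pairs_with_pmid_on_both_ends": both_have_pmid,
    }


if __name__ == "__main__":
    edges = [("WOS:1", "WOS:2"), ("WOS:1", "WOS:3"), ("WOS:4", "WOS:2")]
    pmids = {"WOS:1": "101", "WOS:2": "102", "WOS:4": "104"}
    print(pair_stats(edges, pmids))
```

If `pairs_with_pmid_on_both_ends` equals `pairs`, every edge in the file connects two PubMed-matched records, which is consistent with the MEDLINE restriction suspected above.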
Sure, but this would be huge. Do you only want citations originating from PubMed-matched papers (the citing papers)?
This does not make sense at all. If you are only comparing clinical vs. non-clinical papers in PubMed, all other WoS records that neither cite nor are cited by the matched records should not matter, unless you plan to do multi-step citation analysis. In general, data at this scale is very tricky to deal with, even with our resources. We recommend moving your code to the data, which means downloading may no longer be an efficient option. Please attend our event next week and we can discuss possibilities for moving forward. http://iuni.iu.edu/news/event/39
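The restriction described here keeps only records within one citation link of the PubMed-matched set; multi-step analysis widens that neighborhood. A minimal sketch of both, assuming an in-memory list of (citing, cited) ID pairs and a set of matched IDs; `one_hop_pairs` and `k_hop_nodes` are hypothetical helpers, and citation direction is ignored when expanding the neighborhood.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def one_hop_pairs(pairs: List[Pair], matched: Set[str]) -> List[Pair]:
    """Keep pairs where the citing or the cited record is in `matched`."""
    return [(src, dst) for src, dst in pairs if src in matched or dst in matched]


def k_hop_nodes(pairs: List[Pair], seeds: Set[str], k: int) -> Set[str]:
    """Records within k citation links of `seeds`, ignoring citation direction."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(k):
        nxt: Set[str] = set()
        for src, dst in pairs:
            if src in frontier and dst not in seen:
                nxt.add(dst)
            if dst in frontier and src not in seen:
                nxt.add(src)
        if not nxt:
            break  # neighborhood stopped growing
        seen |= nxt
        frontier = nxt
    return seen


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
    print(one_hop_pairs(edges, {"B"}))
    print(sorted(k_hop_nodes(edges, {"B"}, 2)))
```

Each extra hop can grow the record set quickly, which is one reason a whole-WoS extract is so much larger than a PubMed-restricted one.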
No, I also need the information about papers that are not indexed in PubMed.
I am sorry I didn't express my goal clearly. My very first idea was to compare clinical vs. non-clinical papers in PubMed, but I found that someone had already done it. Now, what I want to do with the citation dataset is something like this paper https://www.nature.com/articles/s41586-019-0941-9?mc_cid=ece727ac75&mc_eid=%5BUNIQID%5D: using the citation data to design indicators for predicting the success of a drug. It is also an early idea, but I believe it is promising.
Yes, of course. By my estimate it will be about 200 GB. I had planned to use something like Lucene or Elasticsearch to index it and then search it; storing and indexing it would take only about 300 GB of hard drive space. However, moving the code to the data is also a good choice, I believe. Thank you so much for the kind reply. I will definitely attend your event if possible.
Dear Xiaoran,
Could you write me a file that contains the citation relationships of papers in the whole Web of Science?
If possible, each line could be organized as "citing paper (WOS No.|pmid|doi|published year) \t cited paper (WOS No.|pmid|doi|published year)".
By the way, thank you for the AWS server! It is very helpful!
Yours sincerely,
Xin Li
2019-03-08
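The requested line layout can be pinned down with a small formatter and parser, so that both sides agree on field order and on the `null` placeholder. This is a sketch under assumptions: the parentheses in the request are read as grouping rather than literal characters, missing values are written as `null` as suggested earlier in the thread, and `Paper`, `format_line` and `parse_line` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple

NULL = "null"  # placeholder for a missing value


class Paper(NamedTuple):
    wos_id: Optional[str]
    pmid: Optional[str]
    doi: Optional[str]
    year: Optional[int]


def _fmt(value: object) -> str:
    return NULL if value is None else str(value)


def _parse(value: str) -> Optional[str]:
    return None if value == NULL else value


def format_paper(p: Paper) -> str:
    # Field order: WoS number | PMID | DOI | publication year
    return "|".join(_fmt(v) for v in p)


def parse_paper(field: str) -> Paper:
    wos_id, pmid, doi, year = (_parse(v) for v in field.split("|"))
    return Paper(wos_id, pmid, doi, int(year) if year is not None else None)


def format_line(citing: Paper, cited: Paper) -> str:
    """One citation pair per line: citing record, a tab, the cited record."""
    return f"{format_paper(citing)}\t{format_paper(cited)}"


def parse_line(line: str) -> Tuple[Paper, Paper]:
    # Strip both \n and \r so Windows line endings do not leak into the year.
    citing, cited = line.rstrip("\r\n").split("\t")
    return parse_paper(citing), parse_paper(cited)


if __name__ == "__main__":
    a = Paper("WOS:000123", "12345", None, 2001)
    b = Paper("WOS:000456", None, "10.1000/xyz", 1999)
    print(format_line(a, b))
```

Because `|` and tab act as separators, this scheme assumes no field contains either character. DOIs are not guaranteed to avoid `|`, so a CSV or JSON Lines layout would be more robust if that turns out to matter.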