WoS citation data integration with NIH grant data (Chenwei Zhang) #8
Hi Chenwei,

Please redirect all follow-up conversations to our GitHub repo and use email only if privacy is a concern. Please also invite me as a collaborator to your own GitHub repo if there is one for this project.

Here is the preliminary data from the Katz' report. The data consists of three CSV tables, which you can download from the link (valid for one week):

- Authors.csv contains basic information on the PIs, with the following columns:
- Teams.csv contains grant team information from NIH ExPORTER, with the following columns:
- Papers.csv contains paper-level information from the Katz' data, with the following columns:

Please note that the data provided by the Katz' study are not as "clean" as they were claimed to be. I have identified many duplicate records, and some may still remain after my cleaning. Please use unique identifiers such as "pi_id", "FULL_PROJECT_NUM", and "PMID" when doing statistical analysis.

Feel free to ask if you have any questions about the data. In my experience, the final dataset will take several rounds of updates based on your feedback. Thanks!
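Because duplicates may remain, one way to follow the advice above is to deduplicate each table on its unique identifier before computing statistics. This is a minimal standard-library sketch; the function names are hypothetical, and which file carries which identifier column (e.g. "PMID" in Papers.csv) is an assumption, since the column lists are not reproduced here. Rows with a blank identifier are kept, because they cannot be matched against anything.

```python
import csv
from typing import Iterable


def dedupe_by_key(rows: Iterable[dict], key: str) -> list[dict]:
    """Keep the first row seen for each value of `key`; drop later duplicates.

    Rows whose key is missing or blank are kept as-is (they cannot be matched).
    """
    seen = set()
    unique = []
    for row in rows:
        value = (row.get(key) or "").strip()
        if not value:
            unique.append(row)
            continue
        if value in seen:
            continue  # duplicate record: skip it
        seen.add(value)
        unique.append(row)
    return unique


def load_unique(path: str, key: str) -> list[dict]:
    """Read a CSV file (hypothetical path) and deduplicate it on `key`."""
    with open(path, newline="", encoding="utf-8") as f:
        return dedupe_by_key(csv.DictReader(f), key)
```

For example, `load_unique("Papers.csv", "PMID")` would return one row per PMID, assuming that column exists in the file.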
Thank you so much, Xiaoran! I have downloaded the dataset. I will explore it after I am back from the iConference, and I will let you know if I have any updates.
Dear Xiaoran and Patricia,
Hi. This is Chenwei. It was so nice to meet you today and discuss my dissertation. Thanks for your kind suggestions.
Currently my work focuses on measuring the diversity of teams. I extracted five basic features for the measurement, including:
- the scientific age of an author in each team (the current publication year - the first publication year + 1)
- an author's impact (citation/h-index)
- an author's productivity (number of publications in the corpus)
- an author's research topic
- an author's country
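The three numeric features in this list (scientific age, impact, productivity) can be sketched as follows. This is a minimal illustration assuming each author's publications are available as (year, citation count) records; the `Publication` class and all function names are hypothetical, and `current_year` stands for the publication year that defines the team. Research topic and country are categorical and are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical record for one paper by the author.
    year: int
    citations: int


def scientific_age(pubs: list[Publication], current_year: int) -> int:
    """Current publication year - first publication year + 1."""
    first_year = min(p.year for p in pubs)
    return current_year - first_year + 1


def h_index(citation_counts: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def author_features(pubs: list[Publication], current_year: int) -> dict:
    """Numeric features for one author; topic and country are omitted."""
    citations = [p.citations for p in pubs]
    return {
        "scientific_age": scientific_age(pubs, current_year),
        "productivity": len(pubs),  # number of publications in the corpus
        "citations": sum(citations),  # impact as total citation count
        "h_index": h_index(citations),  # impact as h-index
    }
```

Both impact variants are returned, since the list above allows either citations or h-index.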
I really hope I can extend my work beyond the ACM dataset to another domain, such as the biomedical domain with the PubMed dataset. I have discussed some possibilities with Xiaoran. It would be great if we could have a PI/Co-PI dataset with their publication records. We first need to define a team by the co-investigation relation among these individuals. Then, for each team member, we want to extract the features listed above. Ideally, we would have all of their publications from a broad dataset (for example, PubMed or WoS). If we can only extract publications associated with the grants, as in Katz' report, we will frame these features (except for country) as reflecting the grant perspective.
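Defining a team by co-investigation could look like the sketch below: PIs who appear on the same grant are grouped into one team, and each member is then paired with their publication list. This assumes Teams.csv has one row per (grant, PI) pair with "FULL_PROJECT_NUM" and "pi_id" columns; that layout, the function names, and the `min_size=2` cutoff (treating single-PI grants as non-teams) are all hypothetical choices.

```python
from collections import defaultdict


def build_teams(team_rows, project_key="FULL_PROJECT_NUM", pi_key="pi_id", min_size=2):
    """Group PIs by grant; return {project: sorted list of pi_ids}.

    Duplicate (project, PI) rows collapse because members are stored in a set.
    Grants with fewer than `min_size` distinct PIs are dropped.
    """
    teams = defaultdict(set)
    for row in team_rows:
        project = (row.get(project_key) or "").strip()
        pi = (row.get(pi_key) or "").strip()
        if project and pi:
            teams[project].add(pi)
    return {p: sorted(m) for p, m in teams.items() if len(m) >= min_size}


def members_with_pubs(teams, pubs_by_pi):
    """Attach each member's publication list (empty if none is known)."""
    return {
        project: {pi: pubs_by_pi.get(pi, []) for pi in members}
        for project, members in teams.items()
    }
```

If only grant-linked papers are available, `pubs_by_pi` would be built from those papers alone, which is the grant-perspective caveat mentioned above.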
Kindly let me know if you have any questions. Thank you so much for your great help!
Best regards,
Chenwei Zhang
PhD Candidate in Information Science / Adjunct Lecturer
School of Informatics, Computing, and Engineering
Indiana University Bloomington