This repository contains the Python and R scripts used to analyze the datasets covering 850,827 Google Scholar authors, which we collected in 2021 for the following paper:
Title: "A new insight to the analysis of co-authorship in Google Scholar"
Authors: Ghazal Kalhor, Amin Asadi Sarijalou, Niloofar Sharifi Sadr, and Behnam Bahrak
DOI: https://doi.org/10.1007/s41109-022-00460-4
Datasets: https://drive.google.com/drive/folders/1v9nkcG2QasMX54Ejv2jVEpX_5DzB8xt2?usp=share_link
If you use our scripts or datasets in your work, please cite our paper:
Kalhor, G., Asadi Sarijalou, A., Sharifi Sadr, N. and Bahrak, B., 2022. A new insight to the analysis of co-authorship in Google Scholar. Applied Network Science, 7(1), p.21. https://doi.org/10.1007/s41109-022-00460-4
We collected the following information for each author from Google Scholar:
- Author ID
- Institute ID
- Citation Count
- h-index
- Gender
- Country
- Fields of Interest
- Co-author IDs

```python
import pandas as pd

authorsFeatures = pd.read_csv('authorsFeatures.csv')
authorsFeatures.head(3)
```

Author ID | Institute ID | Citation Count | h-index | Gender | Country |
---|---|---|---|---|---|
QcRldecAAAAJ | 17508113656414128510 | 1280 | 16 | male | ID |
rkKMIwMAAAAJ | 4065822778065209794 | 1034 | 18 | male | US |
AUb2dK4AAAAJ | 4396926741242628134 | 111 | 7 | female | US |
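
One detail worth checking when loading this table: some `Institute ID` values in the preview above (for example `17508113656414128510`) are larger than the maximum signed 64-bit integer, so they may be safer to treat as text identifiers than as numbers. The sketch below is only an illustration under assumptions: it uses a small in-memory CSV with comma separators and the column names from the preview, which may differ from the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for authorsFeatures.csv: two rows copied from the
# preview above. The comma delimiter and header names are assumptions.
csv_text = (
    "Author ID,Institute ID,Citation Count\n"
    "QcRldecAAAAJ,17508113656414128510,1280\n"
    "rkKMIwMAAAAJ,4065822778065209794,1034\n"
)

# Read the institute identifier as a string so that it is never converted
# to a floating-point number (which would silently lose digits).
features = pd.read_csv(io.StringIO(csv_text), dtype={"Institute ID": str})

first_institute = features["Institute ID"].iloc[0]
print(first_institute)
```

With the real file, the same `dtype` argument can be passed to `pd.read_csv('authorsFeatures.csv', ...)`, provided the column is actually named `Institute ID`.
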

```python
authorsFields = pd.read_csv('authorsFields.csv')
authorsFields.head(3)
```

Author ID | Field of Interest |
---|---|
QcRldecAAAAJ | Physics and Astronomy |
rkKMIwMAAAAJ | Computer Science |
AUb2dK4AAAAJ | Agricultural and Biological Sciences |
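
Since both tables share the `Author ID` column, they can be joined to study features by field. The following is a minimal sketch, not the analysis used in the paper: it rebuilds the preview rows in memory instead of reading the CSV files, and it assumes the column names match the previews. If an author lists several fields, the join would presumably yield one row per (author, field) pair.

```python
import pandas as pd

# Sample rows copied from the previews above; with the real data, load the
# tables via pd.read_csv('authorsFeatures.csv') and pd.read_csv('authorsFields.csv').
authorsFeatures = pd.DataFrame({
    "Author ID": ["QcRldecAAAAJ", "rkKMIwMAAAAJ", "AUb2dK4AAAAJ"],
    "Citation Count": [1280, 1034, 111],
    "h-index": [16, 18, 7],
    "Gender": ["male", "male", "female"],
    "Country": ["ID", "US", "US"],
})
authorsFields = pd.DataFrame({
    "Author ID": ["QcRldecAAAAJ", "rkKMIwMAAAAJ", "AUb2dK4AAAAJ"],
    "Field of Interest": [
        "Physics and Astronomy",
        "Computer Science",
        "Agricultural and Biological Sciences",
    ],
})

# Left join keeps every author, even those without a listed field.
merged = authorsFeatures.merge(authorsFields, on="Author ID", how="left")

# Mean h-index per field (illustrative aggregate only).
field_summary = merged.groupby("Field of Interest")["h-index"].mean()
print(field_summary)
```
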

```python
coauthorship = pd.read_csv('coauthorship.csv')
coauthorship.head(3)
```

Author ID | Co-author ID |
---|---|
rkKMIwMAAAAJ | ZjDFiYsAAAAJ |
rkKMIwMAAAAJ | rcW8mi0AAAAJ |
rkKMIwMAAAAJ | n07X8FoAAAAJ |
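
Each row of the co-authorship table can be read as an edge between two authors. As a rough sketch, the number of distinct co-authors per author can be counted as below. This assumes an undirected reading of the edges and the column names shown in the preview; whether the file lists each pair once or in both directions is not stated here, so the sketch deduplicates pairs first. It is an illustration only, not the network construction used in the paper.

```python
from collections import Counter

import pandas as pd

# Sample rows copied from the preview above; with the real data, use
# pd.read_csv('coauthorship.csv') instead.
coauthorship = pd.DataFrame({
    "Author ID": ["rkKMIwMAAAAJ", "rkKMIwMAAAAJ", "rkKMIwMAAAAJ"],
    "Co-author ID": ["ZjDFiYsAAAAJ", "rcW8mi0AAAAJ", "n07X8FoAAAAJ"],
})

# Store each pair in sorted order so that (a, b) and (b, a) collapse
# into a single undirected edge.
edges = {
    tuple(sorted(pair))
    for pair in zip(coauthorship["Author ID"], coauthorship["Co-author ID"])
}

# Degree = number of distinct co-authors for each author.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

print(degree.most_common(3))
```
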