Web of Science data schema
Issues found:
-
SCI, SCI Expanded, and ESCI: how they differ over the years.
-
Master journal list: which collection a journal comes from, and where to find this in the XML dump. Changes between CORE and ESCI.
-
Comparison of subject category versions.
-
Historic Master journal list - categories.
-
Country code with semicolons. Ex: WOS:000473806600025, [Japan, Bangladesh, Bangladesh, Bangladesh;]
-
Confirm the labels for back-files to allow more granular access control. Could /static_data/summary/EWUID/WUID or edition, combined with /static_data/summary/pub_info/_pubyear, serve the Maryland use case?
-
Is ESCI 2005-2014 missing from the new XML update?
-
Fields appear in multiple places in the XML files. See this link for a breakdown: https://github.com/iuni-cadre/DataPipelineAndProvenanceForCADRE/blob/master/wosParseYan/WoSfieldTagsCompact.csv
-
Need unique identifiers for journals, conferences, addresses, etc. How do the ids in /static_data/contributors/contributor relate to the summary_names and addresses tables?
-
Duplicate records were found in the wos_summary_names table when keyed on the two indices "id" and "seq_no". The original SQL parser may have parsed some records redundantly, since paths can be nested and matched more than once: https://github.com/cns-iu/generic_parser/blob/master/generic_parser.py
-
Duplicate records still exist after applying DISTINCT on "id" and "seq_no". In cases where the role of an "author" is not "author", group or corporate authors can lead to the same name appearing with different "seq_no" values. For example: https://atlas.cern/discover/collaboration https://journals.aps.org/prd/abstract/10.1103/PhysRevD.101.012002
-
Author-address mapping is unreliable before 2008. Is there an "institution enhanced" label in the data set? Possibly "address_name/address_spec/organizations/organization/_pref"?
-
References with fractional numbers represent citations outside the WoS collection. What about ids that start with other prefixes, such as "BCI:BCI198273049941" or "ZOOREC:ZOOR15502011206"?
-
Paragraphs and keywords need to be concatenated from the paragraphs; some duplication exists.
-
We need a detailed official data dictionary. Right now we only have the 2013 version: https://iuni.iu.edu/files/WoS_Documents/WoKRawXML20130509.pdf
-
Hierarchical classification system,
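
Some of the issues above can be checked with small scripts. The following is a minimal sketch, not the parser used for the tables: it strips stray semicolons from country values, counts rows that share the same ("id", "seq_no") pair, and splits a reference id into its prefix and a flag for a fractional number. The function names, the dict-shaped rows, and the fractional id used in the demo are assumptions made for illustration.

```python
from collections import Counter


def clean_countries(values):
    """Strip whitespace and trailing semicolons from country values."""
    cleaned = []
    for value in values:
        name = value.strip().rstrip(";").strip()
        if name:  # drop entries that were only whitespace or semicolons
            cleaned.append(name)
    return cleaned


def duplicate_keys(rows):
    """Return {(id, seq_no): count} for key pairs that occur more than once.

    Assumes each row is a dict with "id" and "seq_no" keys (hypothetical
    shape; adapt to however wos_summary_names rows are loaded).
    """
    counts = Counter((row["id"], row["seq_no"]) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def classify_reference_id(ref_id):
    """Split a reference id into (prefix, is_fractional).

    The prefix is the text before the first colon ("" if there is none).
    is_fractional is True when the remainder contains a decimal point,
    which per the note above marks a citation outside the WoS collection.
    """
    prefix, sep, body = ref_id.partition(":")
    if not sep:
        # No colon: partition put the whole id in `prefix`.
        prefix, body = "", ref_id
    return prefix, "." in body


if __name__ == "__main__":
    print(clean_countries(["Japan", "Bangladesh", "Bangladesh", "Bangladesh;"]))

    rows = [
        {"id": "WOS:000473806600025", "seq_no": 1},
        {"id": "WOS:000473806600025", "seq_no": 1},
        {"id": "WOS:000473806600025", "seq_no": 2},
    ]
    print(duplicate_keys(rows))

    # The last id is a made-up fractional example, not taken from the data.
    for ref in ["BCI:BCI198273049941", "ZOOREC:ZOOR15502011206", "WOS:000473806600025.7"]:
        print(ref, classify_reference_id(ref))
```

Counting keys rather than selecting DISTINCT rows keeps the duplicated pairs visible, which helps when deciding whether a duplicate comes from redundant parsing or from a group author.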