Web of Science data schema

XiaoranYan edited this page Feb 5, 2021 · 17 revisions

Issues found:

  1. SCI, SCI Expanded, and ESCI: how the indexes differ over the years.

  2. Master Journal List: which collection each journal comes from, and where to find that in the XML dump. Also, changes between CORE and ESCI.

  3. Comparison of subject category versions.

  4. Historical Master Journal List: categories.

  5. Country codes containing semicolons. Example: WOS:000473806600025, [Japan, Bangladesh, Bangladesh, Bangladesh;]

  6. Confirm the labels for back-files, which are needed for more granular access control. Should we use /static_data/summary/EWUID/WUID or edition, combined with /static_data/summary/pub_info/_pubyear, for the Maryland use case?

  7. Is ESCI 2005-2014 missing from the new XML update?

  8. Fields appear in multiple places in the XML files. See this link for a breakdown: https://github.com/iuni-cadre/DataPipelineAndProvenanceForCADRE/blob/master/wosParseYan/WoSfieldTagsCompact.csv

  9. We need unique identifiers for journals, conferences, addresses, etc. How do the ids in /static_data/contributors/contributor relate to the summary_names and addresses tables?

  10. Duplicate records were found in the wos_summary_names table for the index pair "id" and "seq_no". The original SQL parser may have parsed some elements redundantly, because paths can be nested and matched more than once: https://github.com/cns-iu/generic_parser/blob/master/generic_parser.py

  11. Duplicate records still exist after applying DISTINCT on "id" and "seq_no". This happens when the role of an "author" entry is not "author": group or corporate authors can produce the same name under different "seq_no" values. For example: https://atlas.cern/discover/collaboration https://journals.aps.org/prd/abstract/10.1103/PhysRevD.101.012002

  12. Author-address mapping is unreliable before 2008. Is there an "institution enhanced" label in the data set, e.g. "address_name/address_spec/organizations/organization/_pref"?

  13. References with fractional numbers represent citations outside the WoS collection. What about ids that start with other prefixes, such as "BCI:BCI198273049941" or "ZOOREC:ZOOR15502011206"?

  14. Paragraphs and keywords need to be concatenated from the paragraphs; some duplication exists.

  15. We need a detailed official data dictionary. Right now we only have the 2013 version: https://iuni.iu.edu/files/WoS_Documents/WoKRawXML20130509.pdf

  16. Hierarchical classification system,
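
Several of the issues above (5, 10, and 13) can be probed with small helpers before any fix is made in the parser. The following is a minimal Python sketch, not the project's actual code: the function names `clean_countries`, `duplicate_keys`, and `reference_source` are hypothetical, the sample rows are made up for illustration, and it assumes table rows have already been loaded as dictionaries.

```python
from collections import Counter


def clean_countries(values):
    """Split each value on semicolons, trim whitespace, and drop empty pieces (issue 5)."""
    cleaned = []
    for value in values:
        for piece in value.split(";"):
            piece = piece.strip()
            if piece:
                cleaned.append(piece)
    return cleaned


def duplicate_keys(rows, key_fields=("id", "seq_no")):
    """Return {key tuple: count} for keys that occur more than once (issue 10).

    `rows` is assumed to be an iterable of dicts, e.g. rows read from
    wos_summary_names; the column names come from the issue text.
    """
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def reference_source(ref_id):
    """Return the prefix before the first colon (e.g. 'WOS', 'BCI'), or None (issue 13)."""
    prefix, sep, _ = ref_id.partition(":")
    return prefix if sep else None


# Country list taken from the example in issue 5.
countries = clean_countries(["Japan", "Bangladesh", "Bangladesh", "Bangladesh;"])
print(countries)  # ['Japan', 'Bangladesh', 'Bangladesh', 'Bangladesh']

# Made-up rows: the names and seq_no values are illustrative only.
sample_rows = [
    {"id": "WOS:000473806600025", "seq_no": 1, "name": "Author A"},
    {"id": "WOS:000473806600025", "seq_no": 1, "name": "Author A"},
    {"id": "WOS:000473806600025", "seq_no": 2, "name": "Author B"},
]
print(duplicate_keys(sample_rows))  # {('WOS:000473806600025', 1): 2}

print(reference_source("BCI:BCI198273049941"))     # BCI
print(reference_source("ZOOREC:ZOOR15502011206"))  # ZOOREC
```

Grouping references by prefix only shows which collections the ids claim to come from; it does not answer whether those records are resolvable within the WoS dump.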