Releases: CBIIT/INS-Data

2.2.0

18 Dec 20:00

INS Data Release 2.2.0

This data release was used along with the INS Data Model 2.2.0 for the 3.2.0 release of the Index of NCI Studies on December 18, 2025.

To access raw INS data files (TSV format), download the ins_data_files_2.2.0.0.zip below.
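Once the archive is downloaded, the TSV files can be read without unpacking it. The following is a minimal sketch, not part of the INS pipeline: it assumes the archive members are UTF-8 tab-separated files with a header row, and the function name read_tsvs_from_zip is hypothetical.

```python
import csv
import io
import zipfile


def read_tsvs_from_zip(zip_source):
    """Return {member_name: list of row dicts} for every .tsv member.

    zip_source may be a path or a binary file-like object.
    Assumes UTF-8 text with a header row (an assumption, not confirmed
    by the release notes).
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".tsv"):
                continue  # skip directories and non-TSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text, delimiter="\t")
                tables[name] = list(reader)
    return tables
```

Calling read_tsvs_from_zip("ins_data_files_2.2.0.0.zip") would return one list of rows per TSV member; the member names and columns depend on the archive contents, which are not listed here.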

Key Changes

  • Added datasets and downloadable file mappings from the Cancer Target Discovery and Development (CTD²) Network of the Center for Cancer Genomics
    • CTD² datasets (ctd_datasets_curated.tsv) were curated for INS from descriptive study pages on the CTD² Data Portal, which may no longer be supported after 2025
    • CTD² dataset-file mappings (ctd2_filedata.tsv) were curated for each .zip file available from CTD² Data Downloads, which may no longer be supported after 2025
    • Note: CTD² file downloads supported by INS are stored separately, not in this INS-Data repository
  • Updated dbGaP primary disease annotations to improve accuracy and consistency
  • This update does not include a rerun of the full data gathering pipeline. All previous TSVs not mentioned above remain unchanged.

Data Counts

Node Type    Unique IDs
program      166
dataset      7889
project      2666
grant        13549
publication  56003
file         82
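To see what changed relative to the previous release, the counts above can be compared with the Data Counts reported for 2.1.1. A small sketch using only the numbers stated in these notes (the helper name count_deltas is illustrative):

```python
# Unique-ID counts copied from the Data Counts tables in these release notes.
counts_2_1_1 = {
    "program": 166, "dataset": 7791, "project": 2666,
    "grant": 13549, "publication": 56003,
}
counts_2_2_0 = {
    "program": 166, "dataset": 7889, "project": 2666,
    "grant": 13549, "publication": 56003, "file": 82,
}


def count_deltas(old, new):
    """Return {node_type: new - old} for node types whose count changed.

    A node type missing from one release is treated as a count of 0.
    """
    deltas = {}
    for node_type in sorted(set(old) | set(new)):
        change = new.get(node_type, 0) - old.get(node_type, 0)
        if change != 0:
            deltas[node_type] = change
    return deltas


print(count_deltas(counts_2_1_1, counts_2_2_0))  # {'dataset': 98, 'file': 82}
```

Only the dataset and file node types differ, which is consistent with the note that the full data gathering pipeline was not rerun. The notes do not break down the 98 additional datasets by source.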

Relevant Versions

Version                       Tag
Data Build                    2.2.0.0
Program Curation (Qualtrics)  2025-05-09 (manual fix)
Data Gathering Date           gathered-2025-05-15
iCite Version                 2025-04
dbGaP Download                2025-05-19 (updated curation 2025-12-05)
CEDCD Received                2025-04-24
Data Model                    2.2.0

Data Sources

Type                  Sources
Programs              Curation
Projects              NIH RePORTER
Grants                NIH RePORTER
Publications          NIH RePORTER, PubMed, iCite
dbGaP Datasets        dbGaP, E-utilities, Curation
GEO Datasets          GEO, E-utilities
CEDCD Datasets        CEDCD, Curation
CTD² Datasets, Files  CTD², Curation

Full Changelog: 2.1.1...2.2.0

2.1.1

22 Sep 17:03
f23a501

INS Data Release 2.1.1

This data release was used along with the INS Data Model 2.1.0 for the 3.1.1 patch release of the Index of NCI Studies on July 22, 2025.

To access raw INS data files (TSV format), download the ins_data_files_2.1.1.0.zip below.

Key Changes

  • Replaced NCI DOC values for some datasets with "Non-NIH-Funded". These are datasets where an NCI GPA managed the study data submission to dbGaP, but funding attribution is not appropriate.
  • This update is a minor patch release and did not include a rerun of the full data gathering pipeline.

Data Counts

Node Type    Unique IDs
program      166
dataset      7791
project      2666
grant        13549
publication  56003

Relevant Versions

Version                       Tag
Data Build                    2.1.1.0
Program Curation (Qualtrics)  2025-05-09 (manual fix)
Data Gathering Date           gathered-2025-05-15
iCite Version                 2025-04
dbGaP Download                2025-05-19
CEDCD Received                2025-04-24
Data Model                    2.1.0

Data Sources

Type            Sources
Programs        Curation
Projects        NIH RePORTER
Grants          NIH RePORTER
Publications    NIH RePORTER, PubMed, iCite
dbGaP Datasets  dbGaP, E-utilities, Curation
GEO Datasets    GEO, E-utilities
CEDCD Datasets  CEDCD, Curation

Full Changelog: 2.1.0...2.1.1

2.1.0

12 Jun 15:44
1b7114b

INS Data Release 2.1.0

This data release was used along with the INS Data Model 2.1.0 for the 3.1.0 release of the Index of NCI Studies on June 10, 2025.

To access raw INS data files (TSV format), download the ins_data_files_2.1.0.1.zip below.

Key Changes

  • Programs with Intramural Projects added to workflow
    • 700+ intramural projects from Center for Cancer Research (CCR) and Division of Cancer Epidemiology & Genetics programs added
    • Intramural projects are linked to automatically gathered downstream research outputs (publications and GEO datasets)
  • Gene Expression Omnibus (GEO) datasets added to workflow
    • GEO accessions associated with publications (PMIDs) are automatically pulled using NCBI E-utilities
    • GEO metadata is pulled from GEO matrix files available through the NCBI FTP server
    • Datasets are associated with upstream projects and programs also associated with the same publications
  • Cancer Epidemiology Descriptive Cohort Database cohorts added to workflow
    • Cohort information is received from the CEDCD team and processed for integration with INS
  • Improved dbGaP dataset handling
    • dbGaP dataset module now allows for iterative loops of manual curation and automated cleaning for better workflow integration
    • All data outputs (including dbGaP datasets) are now handled with the data packaging module for standardized finalization
  • Implemented initial unit testing coverage with pytest
  • Minor fixes and improvements to program and packaging modules
  • Updated all input source files and regenerated all outputs
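The GEO lookup described above (publication PMIDs to GEO records through NCBI E-utilities) could look roughly like the sketch below. This is an illustration, not the pipeline's actual code: it assumes the ELink endpoint with dbfrom=pubmed and db=gds, and a JSON response carrying "linksets", "linksetdbs" and "links" keys. Both function names are hypothetical, and the response layout should be checked against a live response before relying on it.

```python
import json
from urllib.parse import urlencode

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(pmid):
    """Build an ELink request for GEO DataSets (db=gds) records linked to one PMID."""
    params = {"dbfrom": "pubmed", "db": "gds", "id": str(pmid), "retmode": "json"}
    return f"{ELINK_URL}?{urlencode(params)}"


def extract_linked_uids(elink_json_text):
    """Collect linked GEO DataSets UIDs from an ELink JSON response body.

    The expected layout (linksets -> linksetdbs -> links) is an assumption
    about the response format; missing keys simply yield an empty list.
    """
    payload = json.loads(elink_json_text)
    uids = []
    for linkset in payload.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("dbto") == "gds":
                uids.extend(str(uid) for uid in linksetdb.get("links", []))
    return uids
```

The sketch separates URL construction from response parsing so that the parsing step can be exercised offline with a saved response. Fetching the URL, rate limiting, and mapping UIDs to GEO accessions are left out.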

Data Counts

Node Type    Unique IDs
program      166
dataset      7791
project      2666
grant        13549
publication  56003

Relevant Versions

Version                       Tag
Data Build                    2.1.0.1
Program Curation (Qualtrics)  2025-05-09 (manual fix)
Data Gathering Date           gathered-2025-05-15
iCite Version                 2025-04
dbGaP Download                2025-05-19
CEDCD Received                2025-04-24
Data Model                    2.1.0

Data Sources

Type            Sources
Programs        Curation
Projects        NIH RePORTER
Grants          NIH RePORTER
Publications    NIH RePORTER, PubMed, iCite
dbGaP Datasets  dbGaP, E-utilities, Curation
GEO Datasets    GEO, E-utilities
CEDCD Datasets  CEDCD, Curation

Full Changelog: 2.0.0...2.1.0

2.0.0

29 Jan 14:39
5682418

INS Data Release 2.0.0

This data release was used along with INS Data Model 2.0.0 for the 3.0.0 release of the Index of NCI Studies on November 26, 2024.

Key Changes

  • Datasets: New independent datasets module to gather and integrate study metadata from various public NCBI dbGaP resources
  • Improvements to the Data Validation Excel generation to include tests for filter selections
  • Improvements to handling of errors and edge cases in the Programs, Grants, and Publications modules

Data Counts

Node Type    Unique IDs
program      163
dataset      871
project      1899
grant        6376
publication  35339

Relevant Versions

Version                       Tag
Data Build                    2.0.0.4
Program Curation (Qualtrics)  2024-09-18 (manual fix)
Data Gathering Date           gathered-2024-09-20
iCite Version                 2024-08
dbGaP Download                2024-08-30
Datasets Curation             2024-10-29
Data Model                    2.0.0

Data Sources

Type         Sources
program      Curation
dataset      dbGaP, PubMed, Curation
project      NIH RePORTER
grant        NIH RePORTER
publication  NIH RePORTER, PubMed, iCite

Full Changelog: v1.1.0...2.0.0

v1.1.0

09 May 16:58
ad99936

Release v1.1.0

INS Data Gathering pipeline release v1.1.0 for the Index of NCI Studies (INS), https://studycatalog.cancer.gov/. This data was first introduced in INS v2.1.0, released May 9, 2024.

Key Changes

  • New data gathered from 2024-04-24 programs input file
  • Formatting updates (special character handling, CSV escape workarounds)
  • Support for separate cancer_type and focus_area program properties
  • Data validation file updates

Relevant Versions

Version              Tag
Qualtrics Version    2024-04-24
Data Gathering Date  gathered-2024-04-26
iCite Version        2024-03
Qualtrics Type       manual_fix
Data Model           v1.0.1

Data Counts

Node Type    Unique IDs
program      83
project      757
grant        3065
publication  22199

Data Sources

Type         Sources
program      ODS Curation
project      NIH RePORTER
grant        NIH RePORTER
publication  NIH RePORTER, PubMed, iCite

v1.0.0

27 Mar 17:48
db73bbf

Release v1.0.0

Initial release of the INS Data Gathering pipeline. This release represents the first data gathered by this pipeline that will be included in the Index of NCI Studies (INS) https://studycatalog.cancer.gov/
Previous versions of INS used data gathered using the INS-ETL (Obsolete) repo. All data gathered using the old method have been removed and are no longer included in INS v2.0+.

Relevant Versions

Version              Tag
Qualtrics Version    2024-03-12
Data Gathering Date  gathered-2024-03-13
iCite Version        2024-02
Qualtrics Type       raw
Data Model           https://github.com/CBIIT/ins-model/tree/fb6185addcd82c6f535ebfa5dff39f8cb64ce262

Data Counts

Node Type    Unique IDs
program      83
project      757
grant        3047
publication  21898

Data Sources

Type         Sources
program      ODS Curation
project      NIH RePORTER
grant        NIH RePORTER
publication  NIH RePORTER, PubMed, iCite