CADRE ISSI 2019 tutorial

XiaoranYan edited this page Aug 28, 2019 · 19 revisions

Background

Open big bibliometric data sets such as Microsoft Academic Graph (MAG) and the Lens are becoming critical resources for the Scientometrics and Informetrics research community. Open big data sets hold promise for revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities (Fortunato et al., 2018). Yet this promise cannot be fulfilled on a large scale until several barriers are addressed. In particular, the cost and expertise needed to service and support the research infrastructure associated with such large and complex datasets are preventing research groups and institutions from taking advantage of what MAG, the Lens and other large open datasets have to offer. Their user base is limited to the privileged few who have the knowledge and technical support to work with big data, despite the fact that these open data permit access and sharing with few restrictions. These barriers also make it very difficult for the research community to reproduce, compare, and build upon previous results, hindering scientific progress in Scientometrics and Informetrics.

The proposed cloud-based platform, Collaborative Archive & Data Research Environment (CADRE), aims to provide sustainable, scalable, and standardized data and analytic services for open and proprietary big bibliometric data sets. Supported by a U.S. Institute of Museum and Library Services National Leadership Grant and partners in both academia (9 institutions within the Big Ten Academic Alliance) and industry (Clarivate Analytics and Microsoft Research), CADRE will provide an efficient integrated solution for scholars from different disciplines and institutions, including a free tier that is open to the general public. In addition to providing access to data (per institutional subscription), CADRE is designed to facilitate collaboration and reproducibility by providing a research asset commons, where researchers will be able to save their queries, algorithms, data subsets, derived results, tools and methods. The use of globally unique Digital Object Identifiers (DOIs) will ensure reproducibility and replicability of results as well as data provenance at every stage of their transformation. Another key feature of CADRE will be its flexibility to meet individual researchers’ needs: a graphical user interface with pull-down menus will make the platform accessible to researchers with limited programming skills, while application programming interfaces and interactive notebook environments will be available for those who are more comfortable writing their own code.

The key to CADRE’s success is to build a community of practice in addition and adjacent to the data cyberinfrastructure. We aim to knit together communities of data providers and consumers, seeking out and cultivating relationships between industry partners, researchers who work with the hosted data, and member libraries. By bringing together this community of stakeholders with a mutual interest, CADRE will help to lower technological and financial barriers, facilitate data and resource sharing across institutional and disciplinary boundaries, promote collaboration and reproducibility, add value to the resource through shared decision making (regarding datasets to add, data custodial tasks, or the setting of standards for data, reporting, etc.), and ultimately accelerate discoveries in the emerging field of science of science (SofS).

The tutorial

At ISSI, we aim to reach out to the international Scientometrics and Informetrics research community. We plan to host an academic workshop in conjunction with a hands-on tutorial. The tutorial will focus on hands-on experience for the general audience at ISSI. Based on CADRE’s free tier and the MAG dataset, it will be fully committed to the "GOTO" principle, which stands for Good, Open, Transparent and Objective, in terms of data, resources and materials.

To address all levels of user proficiency, we will offer two different approaches: (A) an intuitive graphical user interface and (B) assisted programming. All attendees will be encouraged to walk through examples using both interfaces. Instructions will be given orally and in writing. The tutorial attendees will have access to step-by-step instructions as well as individual discussions. They will also have an opportunity to voice their needs and help us better design CADRE through a feedback questionnaire. The proposed tutorial is designed to stand alone and also to complement our companion workshop on CADRE. Ideally, attendees will register for both the workshop and the tutorial to maximize their benefit.

Several months in advance of the tutorial, we will issue a call for SofS researchers to work directly with our development team and to test the utility of an early version of the CADRE platform for their own use cases. From the pool of applicants, we will select research projects that show great promise of benefiting from CADRE, potential for broad interest to the research community, and, most importantly, a demonstrated commitment to the "GOTO" principle. Moreover, our partner Microsoft Research will provide travel scholarships to the ISSI meeting for these teams, who will showcase their work at the proposed tutorial. We will launch CADRE’s free tier service to the international community for the first time at ISSI. The system will remain open during the remainder of the ISSI conference, and our scientific and technical team will provide immediate support. The individual discussion section of the tutorial will also serve as the “incubator” for new research ideas on CADRE. We will follow up with the identified ideas and teams beyond the ISSI conference.

Preliminary program

16:30 - 16:35: [Yan] Welcome and introduction to the tutorial’s program.

16:35 - 16:45: [Pentchev] The Promise of CADRE: A brief introduction to the CADRE project and the need for collaborative research platforms. Overview of the CADRE design and functionality.

16:45 - 16:50: [Hutchinson] Walk-through of registration and profile setup: user registration and the Research Commons.

16:50 - 17:00: [Hutchinson] Demo 1: simple query on the GUI query builder (step by step using drop-down menus), with simple plotting in a notebook.

17:00 - 17:05: [Hutchinson] Network visualization by running a package, with illustrated use of the Research Commons. Refer to Silva for more detailed explanation.

17:05 - 17:20: [Silva] Demo 2 with visualizations and word clouds. Illustrate how we can reproduce results with packages and interact with the code in a notebook environment.

17:20 - 17:25: [Silva] Demo 3 in a notebook. Showcase more advanced visualizations.

17:25 - 17:35: [Pentchev] Free exploration and Q&A 1 - explore CADRE features with drop-down menus and notebooks, or hold individual discussions with the team.

17:35 - 17:45: [Yan] Introduce the CADRE fellowship program and contributed projects on MAG data.

17:45 - 18:00: [Bu] Contributed demos with MAG data, run in a notebook environment on CADRE. Explain the research background and technical challenges.

18:00 - 18:10: [Yan] Our solution: Demo 4 in the notebook environment (Databricks backend).

18:10 - 18:15: [Yan] Demo 5 with a fully reproducible pipeline in the notebook environment. https://github.com/iuni-cadre/ReproducibilityDemo

18:15 - 18:25: [Pentchev] Free exploration and Q&A 2 - explore CADRE features with drop-down menus and notebooks, or hold individual discussions with the team or CADRE Fellows.

18:25 - 18:30: [Yan] Regather and conclude the tutorial. Follow CADRE and Fellow events on GitHub and Twitter.