%!TEX root = paper.tex
\section{Introduction}
\label{sec:intro}
Recently, owing to advances in deep learning, AI has been transformational in many aspects of our lives. These advances have made machine learning one of the most effective techniques for scientific data analysis, spanning domains such as materials, life, and environmental sciences, particle physics, and astronomy~\cite{jtmnras1,jtmnras2,natrev:jeyan,roysoc:tonyhey,2020:callaway,2021:jumper,tanaka:2021}. With AI becoming one of the underpinning technologies for science, considerable attention is being paid to several aspects of AI, including, but not limited to, the general applicability of AI/ML to various scientific problems, the role of high-performance computing in AI/ML, datasets, the explainability and robustness of AI/ML techniques, the role of small-scale devices in AI/ML, AI/ML-specific algorithms, and the scalability of AI/ML techniques with varying volumes of data or varying computational capabilities. With each of these areas being considerably large, developing an overall understanding of the various initiatives and their impacts, particularly across different application domains, is a substantial undertaking for any single organization or community. Ideally, multiple communities should join forces to understand these issues and to make progress in AI.
MLCommons is one such global initiative, whose mission is to {\em accelerate machine learning
innovation and increase its positive impact on society}. Although MLCommons\texttrademark\ was legally established in 2020, its initiatives originated with the MLPerf\texttrademark\ benchmarking efforts in 2018. Its overarching strands are benchmarks, datasets, and best practices for systems and usage. The current MLCommons initiatives retain the core activities of MLPerf across six distinct focus areas: Training, Training HPC, Inference Datacenter, Inference Edge, Inference Mobile, and Inference Tiny. With the application and impact of AI being rather broad, MLCommons is organized around a number of research working groups with the vision of creating an open ``\emph{AI for Research}'' ecosystem
that is driven by the community for the community\footnote{\url{https://mlcommons.org/en/groups/research/}}.
These groups are open to the public, including academics and researchers. The philosophy of MLCommons is to support open-source ``AI for Research''. The MLCommons Research organization oversees new activities that can lead to new scientific methods in ML, as well as new applications of ML, and currently houses a number of working groups that focus on various areas of ML. These include ML algorithms (Algorithms), dataset benchmarking (DataPerf), building shared resource infrastructure (Dynabench), benchmarking and best practices for healthcare (Medical), storage benchmarking for ML (Storage), and AI benchmarking for science (Science)~\cite{mlcommons-science}.
In this paper, we describe the benchmarking initiatives of the Science Working Group, covering our initial set of benchmarks and datasets, the policies that govern our benchmarks and the benchmarking process, the rules for submitting new benchmarks or datasets, and some initial results from evaluating these benchmarks.
The rest of this paper is organized as follows: In Section~\ref{sec:science-wg}, we describe the working group, its goals, and the policies it has adopted for science benchmarking. Section~\ref{sec:benchmarks} then describes the initial set of benchmarks curated by the working group. In Section~\ref{sec:evaluation}, we present initial evaluations and discuss the results, and we conclude with future directions in Section~\ref{sec:conclusions}.