The explosive growth of big Earth data is not only driving the transformation of Earth system science towards a data-intensive paradigm but also laying the foundation for deciphering and understanding complex Earth systems. It is necessary to develop more effective solutions for extracting the required information and knowledge from massive, multisource, heterogeneous, and ubiquitous big Earth data for the seamless transformation of Earth system science.
Recently, with the support of the Strategic Priority Research Program of the Chinese Academy of Sciences "Big Earth Data Science Engineering" Program (CASEarth), Prof. Xin Li and Prof. Min Feng from the Institute of Tibetan Plateau Research, Chinese Academy of Sciences, and the leading scientist of CASEarth, Academician Huadong Guo from the International Research Center of Big Data for Sustainable Development Goals, together with Prof. Youhua Ran, Dr. Yang Su, Associate Prof. Feng Liu, and Prof. Chunlin Huang from the Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Prof. Huanfeng Shen from Wuhan University, Prof. Qing Xiao from the Aerospace Information Research Institute, Chinese Academy of Sciences, and Dr. Jianbin Su and Dr. Shiwei Yuan from the Three-Pole Environment Observation and Big Data Research Center, systematically reviewed the progress and challenges of big data in the field of Earth system science in the paper entitled "Big Data in Earth System Science and Progress Towards a Digital Twin," published in Nature Reviews Earth & Environment.
The article analyzes the characteristics of four types of big Earth data: remote sensing, in situ and laboratory analyses, social sensing, and simulation and reanalysis. It proposes a data assimilation framework that integrates big Earth data into Earth system models and explores key approaches such as deep learning, physics-informed machine learning, causal inference, and deep reinforcement learning to address the challenges of high dimensionality, complexity, and non-linearity in Earth system science. These big data analytical methods show promise in overcoming current limits on predictability, transferability, interpretability, and decision-making, providing advanced solutions for the development of an intelligent Digital Twin of Earth (Figure 1).
Figure 1. Transition of data use in Earth system science.
The article suggests that Big Data Assimilation is an important approach for integrating big Earth data and Earth system models. Big Data Assimilation enables the synergistic integration of machine learning and data assimilation methods, leveraging advanced computational resources. It facilitates the fusion of ultrahigh-resolution Earth system models with multi-source Earth observations, which can include non-mainstream data and social data of human systems. This fusion achieves a more spatiotemporally and physically consistent representation of Earth systems, thus providing a cost-effective layer for a Digital Twin of Earth (Figure 2).
Figure 2. Big Data assimilation into ultrahigh-resolution model.
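As a rough illustration of what fusing a model with observations means at its simplest, the sketch below applies a scalar Kalman-style analysis step, a textbook building block of data assimilation. It is not the Big Data Assimilation framework proposed in the paper; the function names, the temperature values, and the error variances are all hypothetical and chosen only for the example.

```python
def assimilate(forecast, forecast_var, obs, obs_var):
    """Blend one model forecast with one observation (scalar Kalman analysis step).

    All inputs are hypothetical: a forecast value with its error variance,
    and an observation of the same quantity with its error variance.
    """
    # The gain weights the observation by the relative uncertainty of the forecast.
    gain = forecast_var / (forecast_var + obs_var)
    # The analysis moves the forecast toward the observation by that weight.
    analysis = forecast + gain * (obs - forecast)
    # The analysis variance is never larger than the forecast variance.
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


def assimilate_many(forecast, forecast_var, observations):
    """Assimilate several (value, variance) observations one after another."""
    state, var = forecast, forecast_var
    for obs, obs_var in observations:
        state, var = assimilate(state, var, obs, obs_var)
    return state, var


if __name__ == "__main__":
    # Hypothetical surface temperature in kelvin: an uncertain model forecast
    # combined with a more precise satellite value and a less precise
    # crowd-sourced value.
    state, var = assimilate_many(280.0, 4.0, [(282.0, 1.0), (281.0, 9.0)])
    print(f"analysis = {state:.2f} K, variance = {var:.2f}")
```

Each observation pulls the estimate toward itself in proportion to how much more trustworthy it is than the current state, which is why a noisy "non-mainstream" source can still add information without dominating the result.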
From the perspective of data-intensive Earth system science, the article examines four cutting-edge big data analytical techniques: deep learning, physics-informed machine learning, causal inference, and deep reinforcement learning. It highlights that these techniques will boost the development of data-driven geosciences. Among them, deep learning demonstrates unprecedented potential in addressing the high-dimensional, complex, and nonlinear problems of the Earth system. Combining deep learning with physics-informed machine learning and causal inference enhances transferability, interpretability, and predictability in Earth system research. Integrating deep learning with reinforcement learning and agent-based modeling provides an effective approach to tackling complex decision-making problems (Figure 3).
Figure 3. Interactions between deep learning, physics-informed machine learning, causal inference and reinforcement learning in Earth system science.
Lastly, the article emphasizes that constructing a Digital Twin of Earth requires comprehensive inclusion of, and extensive data support from, deep-time and deep-space Earth. As the Earth enters the Anthropocene, realizing a Digital Twin of Earth requires the seamless integration of "hard" data from natural systems and "soft" data from human systems to capture the complex interactions between human and natural systems. An open-sharing and fair data culture and infrastructure is crucial to the success of a Digital Twin of Earth. The shift towards a Digital Twin of Earth will be a long and challenging journey; extensive interdisciplinary and transdisciplinary collaboration, together with such a data culture, will help address these challenges and drive the development of AI for Earth system science (Figure 4).
Figure 4. Grand challenges of big data use in Earth system science.
This work was jointly supported by the Strategic Priority Research Program of the Chinese Academy of Sciences "Big Earth Data Science Engineering" (XDA19070104) and the National Natural Science Foundation of China (41988101 and 42171140).
Citation: Li X, Feng M, Ran YH, Su Y, Liu F, Huang CL, Shen HF, Xiao Q, Su JB, Yuan SW, Guo HD. Big data in Earth system science and progress towards a digital twin. Nature Reviews Earth & Environment, 2023, 4, 319–332, doi:10.1038/s43017-023-00409-w