The U.S. Department of Energy’s (DOE) advanced Computational and Data Infrastructures (CDIs), such as supercomputers, edge systems at experimental facilities, massive data storage, and high-speed networks, are brought to bear on the nation’s most pressing scientific problems.
These efforts include advancing astrophysics research, discovering new materials, designing new drugs, creating more efficient engines and turbines, and producing more accurate and timely weather forecasts and climate change predictions.
Increasingly, computational science campaigns are leveraging distributed, heterogeneous scientific infrastructures that span multiple locations connected by high-performance networks, with scientific data flowing from instruments to computing, storage, and visualisation facilities.
However, because these federated infrastructures tend to be complex and are managed by different organisations, domains, and communities, both the operators of the infrastructures and the scientists who use them have limited global visibility. The result is an incomplete understanding of how the full set of resources spanned by science workflows behaves.
Although scientific workflow systems greatly increase scientists’ productivity by managing and orchestrating computational campaigns, the intricate nature of the CDIs, including resource heterogeneity and the deployment of complex system software stacks, poses several challenges in predicting the behaviour of science workflows and in steering them past system and application anomalies.
“Our new project will provide an integrated platform consisting of algorithms, methods, tools, and services that will help DOE facility operators and scientists address these challenges and improve the overall end-to-end science workflow,” said a research professor of computer science and research director at the University of Southern California.
Under a new DOE grant, the project aims to advance knowledge of how simulation and machine learning (ML) methodologies can be harnessed and amplified to improve DOE’s computational and data science.
The project will add three important capabilities to current scientific workflow systems: (1) predicting the performance of complex workflows; (2) detecting and classifying infrastructure and workflow anomalies and “explaining” their sources; and (3) suggesting performance optimisations. To accomplish these tasks, the project will explore novel simulation, ML, and hybrid methods to predict, understand, and optimise the behaviour of complex DOE science workflows on DOE CDIs.
“In addition to creating a more efficient timeline for researchers, we would like to provide CDI operators with the tools to detect, pinpoint, and efficiently address anomalies as they occur in the complex DOE facilities landscape,” said the assistant director for network research and infrastructure at RENCI.
To detect anomalies, the project will explore real-time ML models that sense and classify anomalies by exploiting underlying spatial and temporal correlations together with expert knowledge, combine heterogeneous information sources, and generate real-time predictions.
Successful solutions will be incorporated into a prototype system with a dashboard that will be used for evaluation by DOE scientists and CDI operators. The project will enable scientists working on the frontier of DOE science to efficiently and reliably run complex workflows on a broad spectrum of DOE resources and accelerate time to discovery.
Furthermore, the project will develop ML methods that can self-learn corrective behaviours and optimise workflow performance, with a focus on making its optimisation methods explainable. Working together, the researchers behind the project, named Poseidon, aim to break down the barriers between complex CDIs, accelerate the scientific discovery timeline, and transform the way computational and data science are done.
As reported by OpenGov Asia, the U.S. Department of Energy’s (DOE) Argonne National Laboratory is leading efforts to couple Artificial Intelligence (AI) and cutting-edge simulation workflows to better understand biological observations and accelerate drug discovery.
Argonne collaborated with academic and commercial research partners to achieve near real-time feedback between simulation and AI approaches, with the goal of understanding how two proteins encoded by the SARS-CoV-2 genome interact to help the virus replicate and evade the host’s immune system.