Earth Environment Sciences & Geosciences rely on a wide variety and large volume of data from satellites, in-situ observations and climate models. These data are managed and preserved separately in domain-dependent repositories run by national or European infrastructures. Because the data sources are so widely distributed, it is difficult to process them together and to use them in an integrated way for comprehensive studies at domain or cross-domain level.
In this context, the goal is to offer services that facilitate and speed up access to the distributed data sources, and to provide a web-based processing environment for Data Science notebooks that supports the Pangeo community platform. This use case also aims to enhance data discovery and data access, relying on existing services provided by the Earth Sciences communities: in France, the Research Infrastructure Data Terra; in Europe, the consortium of Environment Research Infrastructures (ENVRI), the climate community, and beyond.
The web-based processing environment proposed by the use case has to address the needs of two types of users of Data Science notebooks working on Earth environment & Geosciences data:
The first challenge is to design and implement the IT infrastructure (services and resources) underlying this virtual environment, so that it speeds up and facilitates access to data while taking into account the specificities of the data sources, i.e. large volumes of data held in distributed repositories.
The second challenge is to enable users to easily discover cross-domain data collections and services, even though these resources are distributed and described with different metadata and ontologies.
The use case is built on top of IT services and resources proposed by national infrastructures involved in the project:
EOSC-Pillar also gives the opportunity to test cross-domain and transnational interoperability, as this use case gathers multi-domain data repositories from France, Germany and Italy.
The overarching goal is to demonstrate what EOSC could offer to the Earth environment & Geosciences communities, i.e. a cloud platform for running big data analysis in Europe as an alternative to large private providers of storage and computing. To this end, the use case focuses on Pangeo, a community platform for big data (Geo)sciences aimed at developers of Python scripts, which (1) fosters collaboration around the open-source Scientific Python ecosystem and (2) involves many relevant technologies: HPC, containers, notebooks, advanced data structures (“Data Cubes”) for efficient access, and remote access to data.
As the use case is ambitious to design and implement, the partners chose to split it into three sub-use cases, listed in order of priority:
In addition, this use case aims to take the proof of concept as far as possible within the EOSC-Pillar project, in collaboration with the technical Work Packages 7 and 5.