Data accessibility plays a crucial role in modern research in the Natural and Engineering Sciences (NES), and it is pivotal in the journey towards Open Science. However, it is still challenging to quickly access and efficiently process relevant large datasets, such as Earth-related spatiotemporal datasets, which keep growing in volume as data sources diversify and data collection frequency increases. Such datasets are mostly made available in the Cloud, and cloud-native data access and processing are becoming essential digital competences. Cloud-based processing is also desirable because bringing computation close to the data usually increases efficiency and reduces research time. Unfortunately, the highly inefficient approach of downloading data and analysing it locally is still standard practice for most researchers. Sometimes this is involuntary, because the data is not provided in a format well suited to cloud-based processing, but it is also not uncommon for researchers to lack the skills needed for cloud-based data access and processing. The same problem is observed in research data publishing, where researchers publish data in formats that hinder efficient access and interoperability in the Cloud, even when cloud-optimized formats could be used at no additional cost.
This project aims to stimulate the use of cloud-native tools and technologies to publish, access, and process research data in the NES domain in the Netherlands. We will reach this goal by developing a proof-of-concept infrastructure, using it to show the NES community the benefits of a cloud-native approach, and providing the training needed to improve their digital skills. For this purpose, we will first build a public cloud-native data repository with co-located data analysis capabilities that will be operational during the project period. We will then make selected datasets that are relevant to the NES domain available on the platform, such as data from Dutch Public Services on the Map (PDOK) and the Royal Netherlands Meteorological Institute (KNMI), after transforming them from their existing formats into cloud-optimized ones. Using this data infrastructure, we will demonstrate how efficient cloud-native solutions are for common research workflows compared with traditional formats and methods. We will promote the benefits of cloud-native research in an evidence-based manner by providing reliable and reproducible performance benchmarks. To enable quick uptake of the cloud-native approach, we will develop open training material specifically targeting the NES domain and organize training workshops at different locations and at prominent computing and Open Science events in the Netherlands. Researchers and other stakeholders will learn how to design, develop, and run cloud-native data access and processing workflows through hands-on practice with the developed data infrastructure. They will also learn how to create and publish cloud-native datasets easily and efficiently by following the identified best practices. Finally, we will share the experience and lessons learned during the project with all relevant (inter)national stakeholders through dedicated meetings. We will especially target well-known research data providers to motivate them to implement cloud-native infrastructure, and we will provide them with detailed guidelines and training on how to build such infrastructure.