Cloud-native repositories for big scientific data

Scientific data have traditionally been distributed via downloads from a data server to a local computer. This way of working suffers from limitations as scientific datasets grow toward the petabyte scale. A “cloud-native data repository,” as defined in this article, offers several advantages over traditional data repositories: performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access and inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing’s full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.

Resource Type publication
Temporal Range Begin N/A
Temporal Range End N/A
Temporal Resolution N/A
Bounding Box North Lat N/A
Bounding Box South Lat N/A
Bounding Box West Long N/A
Bounding Box East Long N/A
Spatial Representation N/A
Spatial Resolution N/A
Related Links N/A
Additional Information N/A
Resource Format PDF
Standardized Resource Format PDF
Asset Size N/A
Legal Constraints Copyright author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Access Constraints None
Software Implementation Language N/A

Resource Support Name N/A
Resource Support Email opensky@ucar.edu
Resource Support Organization UCAR/NCAR - Library
Distributor N/A
Metadata Contact Name N/A
Metadata Contact Email opensky@ucar.edu
Metadata Contact Organization UCAR/NCAR - Library

Author Abernathey, Ryan P.
Augspurger, Tom
Banihirwe, Anderson
Blackmon-Luca, Charles C.
Crone, Timothy J.
Gentemann, Chelle L.
Hamman, Joseph J.
Henderson, Naomi
Lepore, Chiara
McCaie, Theo A.
Robinson, Niall H.
Signell, Richard P.
Publisher UCAR/NCAR - Library
Publication Date 2021-03-01T00:00:00
Digital Object Identifier (DOI) Not Assigned
Alternate Identifier N/A
Resource Version N/A
Topic Category geoscientificInformation
Progress N/A
Metadata Date 2023-08-18T18:29:08.798590
Metadata Record Identifier edu.ucar.opensky::articles:24291
Metadata Language eng; USA
Suggested Citation Abernathey, Ryan P., Augspurger, Tom, Banihirwe, Anderson, Blackmon-Luca, Charles C., Crone, Timothy J., Gentemann, Chelle L., Hamman, Joseph J., Henderson, Naomi, Lepore, Chiara, McCaie, Theo A., Robinson, Niall H., Signell, Richard P. (2021). Cloud-native repositories for big scientific data. UCAR/NCAR - Library. http://n2t.net/ark:/85065/d7q52t1h. Accessed 30 June 2025.
