Informing the Prediction of Compression Method and Level for Climate Model Data Using Variable Features

Increased computing power makes it possible to simulate larger Earth system model ensembles with higher output frequency, finer spatial resolution, and longer simulation length. These improvements produce massive datasets that are straining institutional storage resources, and different compression methodologies have been studied to address this issue. Lossless compression methods preserve the original data perfectly. However, lossy compression methods, in which part of the original data may not be preserved, are a more promising option because of the higher compression rates they can achieve. Previous work has demonstrated that combining different lossy compression methods and levels produces better results overall, because the choice of method and level can be tailored to the characteristics of each variable. Currently, determining the optimal compression method and level for each variable is computationally expensive: each variable must be compressed and reconstructed exhaustively for every possible compression method and level. The optimal combination is then the method and level that produce the highest data compression while still satisfying the quality criteria. The goal of this project is to streamline this process by characterizing the variables through features that will be used in a regression model to predict the optimal compression level automatically. We analyze a large ensemble of annual averages of 198 variables from the Community Earth System Model (CESM), with the final goal of informing a multinomial regression model that predicts compression levels for the fpzip compression method. Here we describe and summarize the features, which range from simple statistics to smoothness and clustering indicators, analyze their variability across ensemble members, and preliminarily evaluate their correlation with the fpzip compression levels.
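The exhaustive search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call fpzip: the codec below is a stand-in that zeroes the low-order bits of each 32-bit float and then deflates the bytes with zlib. The function names, the candidate levels, the maximum relative error criterion, and the tolerance of 1e-3 are all assumptions made for illustration; the abstract does not specify the actual quality criteria.

```python
import math
import struct
import zlib


def truncate_and_compress(values, keep_bits):
    """Stand-in lossy codec (NOT fpzip): keep the top `keep_bits` bits of each
    float32, zero the remaining bits, then deflate the bytes with zlib."""
    mask = (0xFFFFFFFF << (32 - keep_bits)) & 0xFFFFFFFF
    n = len(values)
    words = struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values))
    truncated = struct.pack(f"<{n}I", *(w & mask for w in words))
    return zlib.compress(truncated, 9)


def reconstruct(blob, n):
    """Inflate the bytes and reinterpret them as n float32 values."""
    return list(struct.unpack(f"<{n}f", zlib.decompress(blob)))


def max_relative_error(original, restored):
    """Largest absolute error, normalized by the largest magnitude in the data."""
    scale = max(abs(v) for v in original) or 1.0
    return max(abs(a - b) for a, b in zip(original, restored)) / scale


def choose_level(values, levels=(16, 20, 24, 28, 32), tolerance=1e-3):
    """Try every candidate level; among those that meet the (assumed) quality
    criterion, return the one with the highest compression ratio."""
    best = None
    original_size = 4 * len(values)  # bytes, stored as float32
    for keep_bits in levels:
        blob = truncate_and_compress(values, keep_bits)
        restored = reconstruct(blob, len(values))
        error = max_relative_error(values, restored)
        ratio = original_size / len(blob)
        if error <= tolerance and (best is None or ratio > best[1]):
            best = (keep_bits, ratio, error)
    return best  # None when no level satisfies the tolerance


# Synthetic, smooth "variable" standing in for one model output field.
field = [280.0 + 10.0 * math.sin(i / 50.0) for i in range(4096)]
level, ratio, error = choose_level(field)
print(f"chosen level: {level} bits, ratio: {ratio:.2f}, max rel. error: {error:.2e}")
```

Every candidate level requires a full compress and reconstruct pass over the data, and that cost is repeated for each variable. Predicting the level from variable features is intended to avoid this repetition.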

Resource Type publication
Resource Format PDF
Standardized Resource Format PDF
Legal Constraints

Copyright Author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Access Constraints None
Software Implementation Language N/A

Resource Support Name N/A
Resource Support Email
Resource Support Organization UCAR/NCAR - Library
Distributor N/A
Metadata Contact Name N/A
Metadata Contact Email
Metadata Contact Organization UCAR/NCAR - Library

Author Rodríguez-Jeangros, Nicolás; Hammerling, Dorit
Publisher UCAR/NCAR - Library
Publication Date 2017-09-01T00:00:00
Digital Object Identifier (DOI) Not Assigned
Alternate Identifier N/A
Resource Version N/A
Topic Category geoscientificInformation
Progress N/A
Metadata Date 2023-08-18T18:06:47.944050
Metadata Record Identifier edu.ucar.opensky::technotes:556
Metadata Language eng; USA
Suggested Citation Rodríguez-Jeangros, Nicolás, and Hammerling, Dorit. (2017). Informing the Prediction of Compression Method and Level for Climate Model Data Using Variable Features. UCAR/NCAR - Library. Accessed 22 September 2023.
