Structural k-means (S k-means) and clustering uncertainty evaluation framework (CUEF) for mining climate data

Dramatic increases in climate data underlie a gradual paradigm shift in knowledge acquisition methods from physically based models to data-based mining approaches. One of the most popular data clustering/mining techniques is k-means, and it has been used to detect hidden patterns in climate systems; k-means is established based on distance metrics for pattern recognition, which is relatively ineffective when dealing with "structured" data, that is, data in time and space domains, which are dominant in climate science. Here, we propose (i) a novel structural-similarity-recognition-based k-means algorithm called structural k-means or S k-means for climate data mining and (ii) a new clustering uncertainty representation/evaluation framework based on the information entropy concept. We demonstrate that the novel S k-means could provide higher-quality clustering outcomes in terms of general silhouette analysis, although it requires higher computational resources compared with conventional algorithms. The results are consistent with different demonstration problem settings using different types of input data, including two-dimensional weather patterns, historical climate change in terms of time series, and tropical cyclone paths. Additionally, by quantifying the uncertainty underlying the clustering outcomes we, for the first time, evaluated the "meaningfulness" of applying a given clustering algorithm for a given dataset. We expect that this study will constitute a new standard of k-means clustering with "structural" input data, as well as a new framework for uncertainty representation/evaluation of clustering algorithms for (but not limited to) climate science.
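The contrast the abstract draws between distance-based and structure-based assignment, and the idea of an entropy-based uncertainty score, can be illustrated with a small sketch. This record does not give the paper's actual similarity measure or its uncertainty formula, so everything below is an assumption for illustration only: `correlation_dissimilarity` (1 minus Pearson correlation) is a simple stand-in for a structural measure, and `normalized_entropy` is plain Shannon entropy rescaled to [0, 1]. All function names are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    # Conventional distance metric used by standard k-means.
    return float(np.linalg.norm(a - b))

def correlation_dissimilarity(a, b):
    # Stand-in "structural" measure (an assumption, not the paper's):
    # 1 - Pearson correlation, so series with the same shape score near 0
    # regardless of offset or amplitude.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0  # constant series: treat as uncorrelated
    return 1.0 - float(a @ b) / denom

def assign(samples, centroids, dissimilarity):
    # Assignment step of a k-means-style loop with a pluggable dissimilarity.
    d = np.array([[dissimilarity(s, c) for c in centroids] for s in samples])
    return d.argmin(axis=1), d

def normalized_entropy(p):
    # Shannon entropy of a membership distribution, divided by log(k):
    # 0 means an unambiguous assignment, 1 means maximally uncertain.
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum() / np.log(len(p)))

# One rising series with a large offset, and two candidate centroids.
samples = np.array([[10.0, 12.0, 14.0, 16.0]])
centroids = np.array([[0.0, 1.0, 2.0, 3.0],       # rising
                      [13.0, 12.0, 11.0, 10.0]])  # falling

labels_euclid, _ = assign(samples, centroids, euclidean)
labels_struct, _ = assign(samples, centroids, correlation_dissimilarity)
print(labels_euclid, labels_struct)
```

In this toy case the Euclidean assignment picks the falling centroid because it is closer in magnitude, while the correlation-based assignment picks the rising centroid because it has the same shape. How the paper itself defines structural similarity and maps clustering outcomes to entropy should be taken from the publication, not from this sketch.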

Resource Type publication
Resource Format PDF
Standardized Resource Format PDF
Legal Constraints

Copyright author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Access Constraints None

Resource Support Name N/A
Resource Support Email opensky@ucar.edu
Resource Support Organization UCAR/NCAR - Library
Distributor N/A
Metadata Contact Name N/A
Metadata Contact Email opensky@ucar.edu
Metadata Contact Organization UCAR/NCAR - Library

Author Doan, Quang-Van
Amagasa, Toshiyuki
Pham, Thanh-Ha
Sato, Takuto
Chen, Fei
Kusaka, Hiroyuki
Publisher UCAR/NCAR - Library
Publication Date 2023-04-24T00:00:00
Digital Object Identifier (DOI) Not Assigned
Topic Category geoscientificInformation
Metadata Date 2023-08-18T18:40:21.889885
Metadata Record Identifier edu.ucar.opensky::articles:26283
Metadata Language eng; USA
Suggested Citation Doan, Quang-Van, Amagasa, Toshiyuki, Pham, Thanh-Ha, Sato, Takuto, Chen, Fei, Kusaka, Hiroyuki. (2023). Structural k-means (S k-means) and clustering uncertainty evaluation framework (CUEF) for mining climate data. UCAR/NCAR - Library. http://n2t.net/ark:/85065/d74b358m. Accessed 26 June 2025.
