Identification

Title

Structural k-means (S k-means) and clustering uncertainty evaluation framework (CUEF) for mining climate data

Abstract

Dramatic increases in climate data underlie a gradual paradigm shift in knowledge acquisition, from physically based models to data-based mining approaches. One of the most popular data clustering/mining techniques is k-means, which has been used to detect hidden patterns in climate systems. However, k-means relies on distance metrics for pattern recognition, an approach that is relatively ineffective for "structured" data, that is, data in the time and space domains, which dominate climate science. Here, we propose (i) a novel structural-similarity-recognition-based k-means algorithm, called structural k-means or S k-means, for climate data mining and (ii) a new clustering uncertainty representation/evaluation framework based on the information entropy concept. We demonstrate that S k-means can provide higher-quality clustering outcomes in terms of general silhouette analysis, although it requires more computational resources than conventional algorithms. The results are consistent across different demonstration problem settings using different types of input data, including two-dimensional weather patterns, historical climate change time series, and tropical cyclone paths. Additionally, by quantifying the uncertainty underlying the clustering outcomes, we evaluated, for the first time, the "meaningfulness" of applying a given clustering algorithm to a given dataset. We expect that this study will constitute a new standard for k-means clustering with "structural" input data, as well as a new framework for uncertainty representation/evaluation of clustering algorithms in (but not limited to) climate science.

Resource type

document

Resource locator

Unique resource identifier

code

http://n2t.net/ark:/85065/d74b358m

codeSpace

Dataset language

eng

Spatial reference system

code identifying the spatial reference system

Classification of spatial data and services

Topic category

geoscientificInformation

Keywords

Keyword set

keyword value

Text

originating controlled vocabulary

title

Resource Type

reference date

date type

publication

effective date

2016-01-01T00:00:00Z

Geographic location

West bounding longitude

East bounding longitude

North bounding latitude

South bounding latitude

Temporal reference

Temporal extent

Begin position

End position

Dataset reference date

date type

publication

effective date

2023-04-24T00:00:00Z

Frequency of update

Quality and validity

Lineage

Conformity

Data format

name of format

version of format

Constraints related to access and use

Constraint set

Use constraints

Copyright author(s). This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Limitations on public access

None

Responsible organisations

Responsible party

contact position

OpenSky Support

organisation name

UCAR/NCAR - Library

full postal address

PO Box 3000

Boulder

80307-3000

email address

opensky@ucar.edu

web address

http://opensky.ucar.edu/

name: homepage

responsible party role

pointOfContact

Metadata on metadata

Metadata point of contact

contact position

OpenSky Support

organisation name

UCAR/NCAR - Library

full postal address

PO Box 3000

Boulder

80307-3000

email address

opensky@ucar.edu

web address

http://opensky.ucar.edu/

name: homepage

responsible party role

pointOfContact

Metadata date

2023-08-18T18:40:21.889885

Metadata language

eng; USA