Identification

Title

Traps, pitfalls and misconceptions of machine learning applied to scientific disciplines

Abstract

Over the last decade, Machine Learning has improved dramatically in performance on a wide variety of tasks, including computer vision, speech recognition, text parsing, and language translation, to name just a few. The remarkable results achieved in some cases have generated understandable hype. As a result, practitioners in scientific disciplines have become interested in the new Machine Learning techniques and have sometimes started using them, with mixed success. The purpose of this paper is to describe some of the common traps, pitfalls and misconceptions of Machine Learning as they apply to scientific disciplines, and how to avoid them. Machine Learning and Deep Learning are fast-evolving fields, and some of the astonishing results achieved recently rest on small but important details that have become the state of the art. Some of these details are not broadly known in the scientific community. This paper presents no new scientific result; it is a survey and summary of the best of the field, for the benefit of researchers with limited experience. The authors do not intend to criticize the work of experienced practitioners, particularly those working at the cutting edge of what is currently possible: in such cases expert researchers may well be doing exactly what we recommend avoiding here, and for good reason. However, we believe that the advice provided here will be useful, and perhaps even serve as a reference, for newcomers to the field.

Resource type

document

Resource locator

Unique resource identifier

code

https://n2t.org/ark:/85065/d71r6tnr

Dataset language

eng

Classification of spatial data and services

Topic category

geoscientificInformation

Keywords

Keyword set

keyword value

Text

originating controlled vocabulary

title

Resource Type

reference date

date type

publication

effective date

2016-01-01T00:00:00Z

Temporal reference

Dataset reference date

date type

publication

effective date

2019-07-28T00:00:00Z

Constraints related to access and use

Constraint set

Use constraints

Copyright 2019 Author(s).

Limitations on public access

None

Responsible organisations

Responsible party

contact position

OpenSky Support

organisation name

UCAR/NCAR - Library

full postal address

PO Box 3000

Boulder

80307-3000

email address

opensky@ucar.edu

web address

http://opensky.ucar.edu/

name: homepage

responsible party role

pointOfContact

Metadata on metadata

Metadata point of contact

contact position

OpenSky Support

organisation name

UCAR/NCAR - Library

full postal address

PO Box 3000

Boulder

80307-3000

email address

opensky@ucar.edu

web address

http://opensky.ucar.edu/

name: homepage

responsible party role

pointOfContact

Metadata date

2025-07-11T19:26:57.636169

Metadata language

eng; USA