Application of machine learning algorithms to handle missing values in precipitation data
Article
Citation information for this article was obtained from Scopus.
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for this article in external sources: 4 May 2020.
Abstract: The paper presents two approaches to filling gaps in precipitation data, based on classification (Support-Vector Machines) and regression (EM, Random Forests, k-Nearest Neighbors) machine learning algorithms as well as a pattern-driven methodology. These methods are among the most powerful data mining tools in a wide range of research areas, including meteorology and climatology, owing to the large volume of temporal and spatial observations available. Observations collected from weather stations contain many missing records. Data processing algorithms are often very sensitive to incomplete data, so missing values should first be imputed, and only then can the complete samples be analyzed. The suggested methods are shown to fill gaps correctly even at high missing levels. Observations from Potsdam and Elista covering about 60 years were used. A comparison of various data imputation algorithms at different missing levels is also presented. The proposed methodology can be successfully used for real-time processing of data streams.
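To make the idea of regression-style imputation concrete, the following is a minimal sketch of k-Nearest Neighbors gap filling in plain Python. It is not the authors' implementation: the function name `knn_impute`, the layout of each row (precipitation on three consecutive days, in mm), and the toy values are all assumptions made for illustration only.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")

    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue

        # Columns that are actually observed in this incomplete row.
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            # Euclidean distance restricted to the observed columns.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]

        # Keep observed values; replace each gap with the neighbours' mean.
        filled.append([
            v if v is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return filled


# Assumed toy data: (precipitation on day t-1, day t, day t+1), in mm.
rows = [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 1.0],
    [5.0, 6.0, 5.0],
    [1.0, None, 1.0],  # gap on day t
]
filled = knn_impute(rows, k=2)
```

In this sketch the gap is filled from the two complete rows that are closest on the observed days. A real study would also need to choose `k`, scale the features, and evaluate the fill at several missing levels, as the abstract describes.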