Detect and process outliers for temperature data at 3h monitoring stations in Vietnam
1 Faculty of Information Technology, Hanoi University of Mining and Geology, Vietnam;
2 AI Academy Vietnam, Vietnam;
3 Center for Hydro-Meteorological Data and Information, Vietnam;
4 Faculty of Information Technology Technical University, Vietnam
- Received: 15th-Nov-2019
- Revised: 6th-Jan-2020
- Accepted: 28th-Feb-2020
- Online: 28th-Feb-2020
- Section: Information Technology
Data preparation is a compulsory process in any data science project. Many studies have shown that it accounts for about 80% of the time, effort and resources of a data science project. Depending on the particular project and data type, the data preparation step may require different methods. Detecting and processing outliers is one of the important preprocessing steps in data preparation, especially for time series data. This paper reviews two methods for detecting outliers in low-dimensional data, namely the Z-score and the box plot. We also present the results of experiments that applied these methods to temperature data recorded at 3-hour intervals at 43 monitoring stations in Vietnam over six years, from 01/01/2014 to 31/12/2019.
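As a rough illustration of the two methods named above, the following Python sketch flags outliers in a short series of temperature readings. It is not the authors' implementation: the function names, the Z-score threshold of 3, the box-plot whisker factor of 1.5 and the sample readings are all assumptions chosen for the example.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute Z-score exceeds `threshold`.

    The threshold of 3 is a common convention, assumed here.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs((v - mean) / std) > threshold]


def boxplot_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the usual box-plot whisker rule, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical 3-hourly temperature readings in degrees Celsius;
# the value 58.0 is a planted error, not real station data.
temps = [24.1, 24.5, 26.0, 28.3, 29.1, 27.4, 25.2, 24.8,
         24.0, 24.6, 26.2, 58.0, 29.0, 27.1, 25.5, 24.9]

print("Z-score outlier indices:", zscore_outliers(temps))
print("Box-plot outlier indices:", boxplot_outliers(temps))
```

Both functions return positions rather than values, so the flagged readings can be handled afterwards in whatever way suits the data. Note that the Z-score rule is sensitive to sample size: in a very short series a single extreme value inflates the standard deviation and may not reach the threshold, whereas the quartile-based rule is less affected.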
Charu C. Aggarwal, (2017). Outlier Analysis, Springer International Publishing AG, New York.
Davy Cielen, Arno D. B. Meysman, Mohamed Ali, (2016). Introducing Data Science, Manning Publications Co.
Hermine N. Akouemo, Richard J. Povinelli, (2014). Time series outlier detection and imputation, IEEE.
Nguyễn Văn Tuấn, (2014). Phân tích dữ liệu với R [Data Analysis with R], Nhà xuất bản tổng hợp Thành phố Hồ Chí Minh [Ho Chi Minh City General Publishing House].
Ranga Suri, N. N. R., Narasimha Murty, M., Athithan, G., (2018). Outlier Detection: Techniques and Applications, Springer Nature Switzerland AG, Cham.
Tamara Munzner, (2014). Visualization Analysis and Design, CRC Press.