RF-PM10Hybrid Model for Real-Time PM10 Forecasting in Open-Pit Copper Mine: A Case Study at the Sin Quyen Copper Mine

  • Affiliations:

    1 Vinacomin – Minerals Holding Corporation, Hanoi, Vietnam
    2 Department of Surface Mining, Hanoi University of Mining and Geology, Hanoi, Vietnam
    3 Innovations for Sustainable and Responsible Mining (ISRM) Research Group, Hanoi University of Mining and Geology, Hanoi, Vietnam
    4 Vietnam Mining Science and Technology Association, Hanoi, Vietnam

  • Received: 25th-Mar-2024
  • Revised: 10th-May-2025
  • Accepted: 27th-May-2025
  • Online: 1st-Aug-2025
Pages: 1 - 24

Abstract:

Air pollution in open-pit mining areas poses significant environmental and health risks, and particulate matter (PM10) is one of the most critical pollutants. Accurate forecasting of PM10 concentrations is essential for real-time air quality management and dust mitigation. This study develops a machine learning-based framework for PM10 prediction at the Sin Quyen open-pit copper mine, combining feature engineering, Principal Component Analysis (PCA), and the Synthetic Minority Over-sampling Technique for Regression with Gaussian Noise (SMOGN) to improve model accuracy. Six forecasting models were evaluated: Random Forest (RF-PM10Hybrid), XGBoost, LightGBM, ARIMA, SARIMA, and Holt-Winters exponential smoothing. The reported errors (RMSE, MAE, MAPE) were 5.791, 3.518, 11.70% for RF-PM10Hybrid; 8.293, 3.953, 13.18% for XGBoost; 6.172, 3.770, 12.57% for LightGBM; 4.233, 4.208, 14.03% for ARIMA; 11.070, 8.800, 29.32% for SARIMA; and 13.108, 10.224, 34.07% for Holt-Winters. The results indicate that the machine learning models outperform the traditional time-series models on MAE and MAPE. RF-PM10Hybrid achieved the best forecasting performance on the testing dataset, with the lowest MAE (3.518) and MAPE (11.70%) and the lowest RMSE among the machine learning models (5.791), followed by LightGBM and XGBoost. Conversely, the statistical models (ARIMA, SARIMA, and Holt-Winters) generally exhibited higher forecasting errors, making them less suitable for predicting PM10 variations in open-pit mining environments. Key methodological advancements include the integration of lag features, rolling statistics, and interaction terms, which improved the ability of the ML models to capture PM10 dynamics. SMOGN was applied to balance the dataset, ensuring better representation of high-PM10 events. The findings demonstrate that machine learning-based approaches, particularly RF-PM10Hybrid, provide a reliable tool for real-time PM10 forecasting, supporting proactive dust control, regulatory compliance, and sustainable mining operations.
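
To make the feature and metric terminology in the abstract concrete, the following is a minimal Python sketch (standard library only) of lag features, rolling statistics, an interaction term, and the three error measures (RMSE, MAE, MAPE). It is not the authors' pipeline: the function names, the lag and window sizes, and the sample readings are all hypothetical, the inputs to the interaction term are not specified in the abstract, and PCA, SMOGN, and the Random Forest model itself are omitted. A naive persistence forecast (predicting the previous reading) stands in for a trained model so that the metrics have something to score.

```python
import math


def lag_and_rolling_features(series, lags=(1, 2), window=3):
    """Build one feature row per time step using past readings only.

    Each row holds the lagged values followed by the mean and (population)
    standard deviation of the `window` readings before the current one.
    The matching target is the current reading. Steps without enough
    history are skipped.
    """
    start = max(max(lags), window)
    rows, targets = [], []
    for t in range(start, len(series)):
        past = series[t - window:t]
        mean = sum(past) / window
        std = math.sqrt(sum((v - mean) ** 2 for v in past) / window)
        rows.append([series[t - k] for k in lags] + [mean, std])
        targets.append(series[t])
    return rows, targets


def interaction_term(a, b):
    """Element-wise product of two equally long input columns (hypothetical inputs)."""
    return [x * y for x, y in zip(a, b)]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical PM10 readings, made up for illustration only.
pm10 = [30.0, 32.0, 31.0, 40.0, 38.0, 35.0]
rows, targets = lag_and_rolling_features(pm10)

# Persistence baseline: the lag-1 feature (first column) is the forecast.
persistence = [row[0] for row in rows]

print("RMSE:", rmse(targets, persistence))
print("MAE :", mae(targets, persistence))
print("MAPE:", mape(targets, persistence))
```

Because RMSE squares each error before averaging, it penalises large misses more heavily than MAE and is never smaller than MAE on the same forecasts, which is one reason the two measures can rank models differently.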

How to Cite
Le, N.Tuan, ., H.Nguyen, ., X.Bui and Le., H.Thu Thi 2025. RF-PM10Hybrid Model for Real-Time PM10 Forecasting in Open-Pit Copper Mine: A Case Study at the Sin Quyen Copper Mine (in Vietnamese). Journal of Mining and Earth Sciences. 4, 66 (Aug, 2025), 1-24. DOI:https://doi.org/10.46326/JMES.2025.66(4).01.