Evaluation of DeepLabV3+ with ResNet backbone for building segmentation using UAV images

Extracting data from remote sensing, aerial, and UAV imagery with deep learning networks is a research direction that is attracting considerable attention. Buildings are a central piece of information for urban development and management, as well as for population and environmental issues. Automatic building extraction from imagery is therefore a problem of interest both in research and in production practice. This paper presents the results of extracting buildings from UAV images using the DeepLabV3+ deep learning network with a residual network (ResNet) backbone, trained on a sample dataset built by the research group. The building dataset consists of 6,500 image samples of 512 x 512 pixels, created from high-resolution UAV images and selected to cover variation in the architecture, shape, and distribution of buildings across several provinces and cities of Vietnam. The results show that building prediction accuracy, measured by IoU (the ratio of the intersection area to the union area), reaches 0.774 with the ResNet101 backbone. However, the prediction accuracy of the model is strongly affected by the architectural characteristics and distribution of the buildings. For new urban areas and suburban areas, building prediction accuracy can reach IoU = 0.874 and 0.857, respectively, whereas it reaches only IoU = 0.762 and 0.673 for industrial zones and old urban areas, respectively. The results of the study support applying the DeepLabV3+ deep learning model to building data extraction for urban management and development, as well as for population and environmental issues in Vietnam.
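The IoU score quoted in the abstract is the area where the predicted and reference building masks overlap, divided by the area covered by either mask. The sketch below illustrates that ratio on a toy pair of binary masks. The function name `binary_iou`, the list-of-lists mask layout, the 4 x 4 example values, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration; this is not the evaluation code used in the paper.

```python
def binary_iou(pred, truth):
    """Intersection over union of two same-sized binary masks (1 = building)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)  # pixel is building in both masks
            union += int(p or t)          # pixel is building in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 4 x 4 masks (hypothetical values, not taken from the paper's dataset).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

score = binary_iou(pred, truth)
print(f"IoU = {score:.3f}")  # 3 shared pixels / 5 pixels in the union
```

In this toy case, three pixels are marked as building in both masks and five pixels are marked in at least one, so the score is 3/5 = 0.6. For full 512 x 512 masks an array library would normally replace the explicit loops.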

Citation

Phạm Trung Dũng, Trương Minh Hùng, Đoàn Thị Nam Phương, Tạ Thị Thu Hường, Nguyễn Thị Hà and Nguyễn Thị Mến, 2025. Evaluation of the DeepLabV3+ model with a ResNet backbone for building extraction from UAV images, Tạp chí Khoa học kỹ thuật Mỏ - Địa chất, vol. 66, no. 3, pp. 14-28.
