Vegetation change detection based on time series analysis by Apache Spark and RasterFrame
- Authors: Dung Mai Thi Nguyen*, Thu Hoai Thi Vu
Hanoi University of Mining and Geology, Vietnam
- Keywords: Apache Spark, MODIS, NDVI, RasterFrames, Spatial big data, Time series analysis.
- Received: 18th-Sept-2020
- Revised: 9th-Jan-2021
- Accepted: 2nd-Feb-2021
- Online: 28th-Feb-2021
- Section: Geomatics and Land Administration
Spatial big data is large in scale and complex; therefore, it cannot be collected, managed, and analyzed by traditional data analysis software within a reasonable time. In many cases, such platforms are also restricted to vector data. However, the raster data generated by sensors on the large and growing number of satellites now needs to be processed in parallel in a cluster environment. This article introduces a method for analyzing satellite image data using the RasterFrames library on the Apache Spark platform. RasterFrames supports raster data analysis in Python, Scala, and SQL, bringing the power of Spark DataFrames to Earth observation data access, cloud computing, and data science. In the experimental part, the Normalized Difference Vegetation Index (NDVI) and the change in its average value over a time series are calculated to show how vegetation cover has changed in Phu Tho province. These results serve as reference data for assessing weather, climate, and environmental changes in the study area over the study period.
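The NDVI computation and the change in its average value mentioned above can be illustrated with a minimal sketch. It uses plain Python rather than RasterFrames, and the reflectance samples, dates, and function names (`ndvi`, `mean_ndvi`) are hypothetical, chosen only for illustration; they are not the authors' data or code. NDVI is taken as the standard normalized difference of the near-infrared and red bands, (NIR - Red) / (NIR + Red). In RasterFrames, the per-pixel step would typically be expressed with a function such as `rf_normalized_difference` applied to tile columns.

```python
from statistics import fmean


def ndvi(red, nir):
    """Return (NIR - Red) / (NIR + Red), or None where the sum is zero."""
    total = nir + red
    if total == 0:
        return None  # NDVI is undefined for this pixel
    return (nir - red) / total


def mean_ndvi(red_band, nir_band):
    """Average NDVI over paired red/NIR pixel values, skipping undefined pixels."""
    values = [ndvi(r, n) for r, n in zip(red_band, nir_band)]
    valid = [v for v in values if v is not None]
    return fmean(valid) if valid else None


# Hypothetical reflectance samples per date: (red pixels, NIR pixels).
series = {
    "2019-01": ([0.10, 0.20], [0.30, 0.60]),
    "2019-07": ([0.10, 0.10], [0.40, 0.90]),
}

# Mean NDVI for each date in the time series.
means = {date: mean_ndvi(red, nir) for date, (red, nir) in series.items()}

# Change in mean NDVI between consecutive dates, keyed by the later date.
dates = sorted(means)
changes = {
    later: means[later] - means[earlier]
    for earlier, later in zip(dates, dates[1:])
}

for date in dates:
    print(date, means[date], changes.get(date))
```

A positive change between two dates suggests denser or healthier vegetation in the later image, while a negative change suggests vegetation loss. On a cluster, the same per-date averaging would be expressed as a grouped aggregation over a DataFrame of tiles instead of Python lists.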