5 Discussion
5.1 Evaluating satellite data and open-source tools for monitoring coastal shoreline fluctuations
Understanding which publicly available satellite data are suitable for observing coastal shoreline fluctuations is central to improving monitoring methods. While Sentinel-2 data proved effective in this project, other options such as PlanetScope and RapidEye offer complementary advantages. PlanetScope provides higher spatial resolution (3 m) and daily revisit capability, making it well suited for detecting finer sand features and rapid changes (Planet 2024a). RapidEye, although discontinued in 2020, offers an extensive archive of high-resolution imagery (5 m) across five spectral bands, enabling valuable historical analysis (Planet 2024b). Drawing on these datasets could enhance both the precision and the temporal depth of sand dune monitoring.
To address the question of which free and open-source software tools are best suited for analyzing satellite data to monitor sand dune fluctuations along the Sri Lankan coastline, this project relied on R and QGIS. QGIS was used to label the Unawatuna beach data; its intuitive interface and plug-ins, in particular the ESRI satellite basemap, made the labeling efficient (GISGeography 2019).
R served as the primary tool for data analysis and model implementation. Using packages such as terra for raster processing and caret for machine learning, R handled large datasets effectively and supported classifiers such as Random Forest and SVM. These packages enabled the integration of geospatial and statistical workflows, making R particularly effective for remote sensing and predictive modeling.

While R and QGIS were effective for this analysis, additional tools, particularly Python, could enhance the workflow. Python offers advanced segmentation models, such as UNet or SAM (Segment Anything Model), which are optimized for high-resolution image segmentation. Applied to sand dune monitoring, these models could improve the granularity of classifications and better delineate features such as sandbanks and shoreline edges. Python's flexibility in combining machine learning with geospatial libraries such as rasterio and geopandas complements the capabilities of R, especially for tasks requiring automated object detection and segmentation (Chege 2024).

In summary, R and QGIS proved effective for the tasks in this project, particularly for their accessibility and geospatial analytical capabilities. Incorporating Python's advanced segmentation models, however, could further enhance the precision and depth of sand dune monitoring efforts.
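The pixel-wise classification workflow described above can be sketched in Python with scikit-learn, mirroring what terra and caret do in R. This is a minimal illustration on synthetic band values, not the project's actual data or pipeline: the class names, band assignments, and reflectance ranges are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled Sentinel-2 pixels: three band values per
# pixel, with "Sand" pixels brighter on average than "Water" pixels.
# (Values and separation are invented for illustration.)
sand = rng.normal(loc=0.30, scale=0.05, size=(200, 3))
water = rng.normal(loc=0.05, scale=0.02, size=(200, 3))
X = np.vstack([sand, water])                     # band values per labeled pixel
y = np.array(["Sand"] * 200 + ["Water"] * 200)   # labels from manual annotation

# Train a Random Forest on the labeled pixels, analogous to caret's workflow.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

# "Classify a scene": every unlabeled pixel receives a class label.
scene = rng.normal(loc=0.30, scale=0.05, size=(10, 3))
pred = rf.predict(scene)
```

In a real workflow the `X` matrix would come from reading the band rasters (e.g. with rasterio) and flattening them to one row per pixel; the prediction array would then be reshaped back to the raster grid for mapping.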
5.2 Interpretation of performance
The methodology for reliably monitoring and quantifying coastal shoreline fluctuations rests on the integration of satellite imagery, geospatial annotation, and machine learning models, with a focus on accurate labeling and Random Forest classification. Sentinel-2 data, offering multi-year archives at 20 m resolution with the critical spectral bands B07, B05, and B06, provides a cost-effective foundation for this project. The Random Forest model proved the most reliable classifier: it handled multiple spectral bands effectively and achieved high accuracy, particularly for Sand, which it classified with high specificity and predictive power.
The labeling process, carried out in QGIS and supported by high-resolution ESRI imagery for precise annotation, provided a solid basis for the training dataset. With this dataset, the Random Forest model outperformed the CART and SVM models, reaching accuracies of 75.62% for Beach1 and 91.27% for Beach2. Random Forest also demonstrated superior sensitivity for Sand classification, especially in simpler landscapes such as Beach2, where it reached 93.33%. The CART and SVM models showed potential but suffered from misclassifications caused by limitations in the training data and the moderate resolution of Sentinel-2.
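Per-class metrics such as sensitivity and specificity follow directly from a model's confusion matrix. The sketch below shows that derivation in Python; the counts are invented for demonstration and are not the project's actual confusion matrices.

```python
import numpy as np

# Illustrative confusion matrix (rows = true class, columns = predicted
# class) for the classes [Sand, Water, Building]; counts are invented.
cm = np.array([
    [90,  5,  5],   # true Sand
    [ 4, 92,  4],   # true Water
    [10,  6, 84],   # true Building
])

def class_metrics(cm, i):
    """Sensitivity (recall) and specificity for the class at index i."""
    tp = cm[i, i]                    # correctly predicted as class i
    fn = cm[i].sum() - tp            # class i predicted as something else
    fp = cm[:, i].sum() - tp         # other classes predicted as class i
    tn = cm.sum() - tp - fn - fp     # everything else
    return tp / (tp + fn), tn / (tn + fp)

accuracy = np.trace(cm) / cm.sum()          # overall fraction correct
sand_sens, sand_spec = class_metrics(cm, 0)  # Sand is class index 0
```

For this invented matrix, overall accuracy is 266/300 (about 88.7%), Sand sensitivity is 0.90, and Sand specificity is 0.93, which illustrates why a class can have high specificity even when some of its pixels are missed.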
A significant limitation across all models was the imbalance in the labeled dataset, particularly for underrepresented classes such as Building. This imbalance reduced generalization, leading to misclassifications especially in areas where Sand overlapped with other classes. Higher-resolution datasets, such as PlanetScope (3 m) or RapidEye (5 m), could complement Sentinel-2 data, enabling finer detection of features such as narrow sandbanks and small-scale shoreline changes. The Random Forest model did demonstrate the capability to detect coastal shoreline changes in Unawatuna, effectively identifying Sand fluctuations between 2019 and 2023. To achieve more precise and reliable results, however, the model requires a more balanced and comprehensive training dataset. With improved training data, it is also possible that another machine learning model, such as SVM or a more advanced method, could outperform Random Forest in detecting Sand and other coastal features.
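One common mitigation for this kind of class imbalance is to reweight classes inversely to their frequency during training, so that the minority class contributes more to each split decision. The sketch below uses scikit-learn's `class_weight="balanced"` option in Python rather than the project's R/caret setup; the band values are synthetic and chosen only to mimic an underrepresented Building class whose spectral values partly overlap Sand.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)

# Imbalanced synthetic training set: "Building" is heavily underrepresented
# and its band values partly overlap "Sand". (Values are invented.)
sand = rng.normal(0.30, 0.06, size=(500, 3))
building = rng.normal(0.38, 0.06, size=(30, 3))
X = np.vstack([sand, building])
y = np.array(["Sand"] * 500 + ["Building"] * 30)

# class_weight="balanced" scales each class's weight by the inverse of its
# frequency, counteracting the 500:30 imbalance.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                            random_state=1)
rf.fit(X, y)

# Recall on the minority class is the figure the imbalance hurts most.
building_recall = recall_score(y, rf.predict(X), pos_label="Building")
```

Alternatives with the same goal include oversampling the minority class or simply labeling more Building pixels, which the text above identifies as the most direct fix.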
5.3 Evaluation of methodology and potential for improvement
One of the critical limitations observed is the imbalance in the training dataset, particularly for the Building class. The lack of sufficient labeled data for Buildings led to misclassification, as the model struggled to learn the distinguishing attributes of this class. Increasing the number of labeled Building pixels would likely improve the model's ability to separate Sand from Building, especially where urban regions border sandy zones. This could be achieved by expanding the labeling process or by selecting training areas with a higher proportion of Building pixels.

The selected area of interest also plays an important role in the model's performance. Beach Area 1, being larger and more diverse, may provide a broader range of features for the model to learn from, potentially contributing to better classification accuracy. In contrast, smaller and simpler beaches, like the one chosen for training in this project, may not offer enough variation for the model to generalize effectively, potentially reducing performance. While this choice allowed a more straightforward and controlled training process, future work could test whether larger, more varied AOIs for training improve the model's robustness and its ability to classify diverse coastal regions.

The current analysis relies on Sentinel-2 data with a spatial resolution of 20 meters. While this resolution is suitable for broader classifications, it struggles to capture finer features such as thin sandbanks or narrow coastal strips. Incorporating higher-resolution data, such as RapidEye from Planet, could significantly enhance the model's ability to classify these smaller coastal features with greater precision. RapidEye also offers historical imagery up to 2020, providing an opportunity for temporal analysis and for assessing classification accuracy under varying conditions (Planet 2024b).
It is also important to consider the role of the spectral bands in the Random Forest model. Bands B07, B05, B06, and B04 were identified as the most influential for classification in this project. However, B05, B06, and B07 are only available at a 20-meter resolution in Sentinel-2 data, unlike the 10-meter bands. Even with higher-resolution data from other sources, therefore, the model's performance could still be constrained by the availability of this critical spectral information (GISGeography 2019).

Overall, the accuracy of the predictions largely depends on how similar a beach is to the original training data. While the model demonstrates potential for accurate Sand classification, further adjustments and additional labeling are needed to improve its generalization and reliability. Nonetheless, this marks a significant milestone toward the goal of predicting Sand presence with the current methodology.
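The influence of individual bands can be read directly off a trained Random Forest via its impurity-based feature importances. The sketch below shows this in Python with scikit-learn on synthetic pixels; the band names echo those discussed above, but the data (and the fact that only "B07" carries the class signal here) are constructed purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Synthetic pixels with four "bands"; only B07 is built to separate the
# classes, so it should dominate the importance ranking. (Invented data.)
n = 400
y = np.array(["Sand"] * (n // 2) + ["Other"] * (n // 2))
bands = {name: rng.normal(0.2, 0.05, size=n) for name in ["B04", "B05", "B06"]}
bands["B07"] = np.where(y == "Sand",
                        rng.normal(0.35, 0.03, size=n),
                        rng.normal(0.15, 0.03, size=n))
names = ["B04", "B05", "B06", "B07"]
X = np.column_stack([bands[b] for b in names])

rf = RandomForestClassifier(n_estimators=200, random_state=2)
rf.fit(X, y)

# Mean decrease in impurity per band, highest first; importances sum to 1.
ranking = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])
```

The same per-band ranking is what R's caret exposes through `varImp()` on a trained Random Forest, so this check translates directly to the project's existing workflow.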