Enhancing Tree Species Classification: Insights from YOLOv8 and Explainable AI Applied to TLS Point Cloud Projections

Reading time: 6 minutes

📝 Original Info

  • Title: Enhancing Tree Species Classification: Insights from YOLOv8 and Explainable AI Applied to TLS Point Cloud Projections
  • ArXiv ID: 2512.16950
  • Date: 2025-12-17
  • Authors: Adrian Straker, Paul Magdon, Marco Zullich, Maximilian Freudenberg, Christoph Kleinn, Johannes Breidenbach, Stefano Puliti, Nils Nölke

📝 Abstract

Classifying tree species has been a core research area in forest remote sensing for decades. New sensors and classification approaches such as TLS and deep learning achieve state-of-the-art accuracy, but their decision processes remain unclear. Methods such as Finer-CAM (Class Activation Mapping) can highlight features in TLS projections that contribute to the classification of a target species yet are uncommon in similar-looking contrastive tree species. We propose a novel method linking Finer-CAM explanations to segments of TLS projections representing structural tree features to systematically evaluate which features drive species discrimination. Using TLS data from 2,445 trees across seven European tree species, we trained and validated five YOLOv8 models with cross-validation, reaching a mean accuracy of 96% (SD = 0.24%). Analysis of 630 saliency maps shows that the models primarily rely on crown features in TLS projections for species classification. While this result is pronounced in Silver Birch, European Beech, English oak, and Norway spruce, stem features contribute more frequently to the differentiation of European ash, Scots pine, and Douglas fir. In particular, representations of finer branches contribute to the models' decisions. The models consider those tree species similar to each other which a human expert would also regard as similar. Furthermore, our results highlight the need for an improved understanding of the decision processes of tree species classification models to help reveal data set and model limitations and biases, and to build confidence in model predictions.

💡 Deep Analysis

📄 Full Content

Tree species classification from remote sensing data has long been a challenging task. In addition to image-based data sets, 3D point clouds have been increasingly used in recent years. Terrestrial laser scanning (TLS) has emerged as a useful remote sensing tool for capturing detailed 3D information of individual trees. In recent years, deep learning models, particularly convolutional neural networks (CNNs), have become state-of-the-art for automatically classifying tree species using TLS data (Puliti et al. 2025). These approaches often leverage two-dimensional (2D) projections of TLS point clouds (Mizoguchi et al. 2017, Seidel et al. 2021, Allen et al. 2023, Puliti et al. 2025) or operate directly on the point cloud data itself (Liu et al. 2021, Chen et al. 2021, Liu et al. 2022, Puliti et al. 2025). Puliti et al. (2025) conducted a benchmark study and showed that CNNs working on 2D projections generally outperform those applied directly to point clouds. The tree species classification CNNs working on 2D projections benchmarked in Puliti et al. (2025) are DetailView, YOLOv5, and SimpleView, which achieved overall accuracies of 79.5 %, 77.9 %, and 76.2 %, respectively. When applied only to the TLS data of the FORSpecies20k data set, YOLOv5 scored highest (overall accuracy of 77.9 %), by a margin of 0.1 % over the second-best approach (DetailView). YOLO (Jocher et al. 2023) is primarily used for object detection and segmentation tasks in images and has been applied to individual tree crown detection/segmentation in aerial images (Sun et al. 2025) and in grey-scale images of canopy height models derived from point cloud data (Straker et al. 2023). However, YOLO's classification head also allows for image-based supervised learning of classification tasks.
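The 2D projections mentioned above can be illustrated with a minimal sketch: rasterising a side view of a point cloud into a grey-scale occupancy image. The point cloud here is random synthetic data and the resolution is an arbitrary choice; this is not the projection pipeline used by the paper, only an assumption-laden stand-in.

```python
import numpy as np

# Hypothetical TLS point cloud for one tree: N points as (x, y, z) in metres.
rng = np.random.default_rng(0)
points = rng.uniform(low=[0, 0, 0], high=[10, 10, 30], size=(2000, 3))

def project_to_image(points, resolution=64):
    """Project a 3D point cloud onto the x-z plane as a grey-scale
    occupancy image (a simplified stand-in for the side-view
    projections used by 2D-projection classifiers)."""
    x, z = points[:, 0], points[:, 2]
    img, _, _ = np.histogram2d(x, z, bins=resolution)
    if img.max() > 0:
        img = img / img.max()  # normalise point counts to [0, 1]
    # Transpose and flip so the tree top appears at the top of the image.
    return np.flipud(img.T)

image = project_to_image(points)
print(image.shape)  # (64, 64)
```

Such an image can then be fed to any image classifier; richer variants weight pixels by point density or height rather than plain occupancy.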

The fact that novel CNN-based tree species classification models trained on public individual tree point cloud benchmark data sets show high performance may be interpreted as readiness for the operational use of these models on real-life data. However, despite these recent advancements, little is known about how CNN-based tree species classification models using TLS data arrive at predictions, since often only standard evaluation metrics (e.g. recall, precision, F1-score, and overall accuracy) are reported (e.g. Seidel et al. 2021, Allen et al. 2023, Puliti et al. 2025). These metrics focus on the performance outcome of a model but do not provide information on a model's decision process.
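The standard metrics listed above all derive from the confusion matrix and can be computed in a few lines; the matrix below is made-up illustrative data, not results from the paper.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, cols: predicted).
cm = np.array([[50,  2,  1],
               [ 3, 45,  4],
               [ 0,  5, 40]])

tp = np.diag(cm).astype(float)          # true positives per class
precision = tp / cm.sum(axis=0)         # TP / (TP + FP), per class
recall    = tp / cm.sum(axis=1)         # TP / (TP + FN), per class
f1 = 2 * precision * recall / (precision + recall)
overall_accuracy = tp.sum() / cm.sum()

print(round(overall_accuracy, 3))  # 0.9
```

Note that none of these numbers reveal *which* image regions a model used, which is exactly the gap XAI methods address.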

Investigating the reasoning of a model behind a final prediction is important to improve understanding of model limitations and potential biases, and to gain confidence in a model's predictions (Molnar 2025). Studies from other fields have shown that CNN-based models may learn shortcuts in the provided training data that correspond to data artifacts rather than relevant features (e.g. Lapuschkin et al. 2019, Geirhos et al. 2020). This so-called shortcut learning leads to high model performance in experimental settings but hinders model generalizability when applied to real-life data (Hinns & Martens 2025). Consequently, it is important to understand how those models arrive at their classification decisions in order to create artifact-free TLS data sets. These can then in turn be used to further improve classification methods (e.g. Lines et al. 2022).

Explainable Artificial Intelligence (XAI) methods provide a valuable framework for understanding how CNN models make predictions. In the context of tree species classification, XAI methods can help reveal which structural features, such as crown shape, branching patterns, and stem morphology, contribute most to model decisions and thus show potential limitations of models and increase trust in model predictions. One widely used XAI approach is the construction of saliency maps (contribution maps), which provide a visualization by highlighting the image regions most relevant to a model's prediction decision. In this regard, Class Activation Mapping (CAM), developed by Zhou et al. (2016), generates a weighted sum of the outputs of global average pooling of each feature map at the last layer to produce saliency maps. Based on this groundwork, other approaches like Grad-CAM (Selvaraju et al. 2017), Grad-CAM++ (Chattopadhay et al. 2018), EigenCAM (Muhammad et al. 2020), and Finer-CAM (Zhang et al. 2025) have been developed. While other CAM methods highlight all image regions important for a final class prediction, Finer-CAMs are more class-discriminative, since they only highlight features of the target class by suppressing features of the class visually most similar to the target class (Zhang et al. 2025). In the field of forest sciences, CAM methods have been used to understand tree species classification by CNNs using drone-based images (Onishi & Ise 2021, Ma et al. 2024), for the identification of bark features for tree species classification (Kim et al. 2022), for the detection of tree crowns in aerial images (Marvasti-Zadeh et al. 2023), and for t
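For a network whose last convolutional layer feeds global average pooling and a linear classifier, the CAM construction described above reduces to a weighted sum of feature maps. A minimal sketch with random tensors, assuming a linear head; the Finer-CAM variant shown is a simplification of the idea in Zhang et al. (2025) (differencing the target class against its most similar class), not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical last-layer activations: C feature maps of size H x W,
# and a linear classifier weight matrix (num_classes x C), matching the
# global-average-pooling setting of the original CAM paper.
C, H, W, num_classes = 8, 7, 7, 3
features = rng.random((C, H, W))
weights = rng.standard_normal((num_classes, C))

def cam(features, weights, target):
    """Class Activation Map: weighted sum of the feature maps using the
    classifier weights of the target class (Zhou et al. 2016)."""
    m = np.tensordot(weights[target], features, axes=1)  # (H, W)
    return np.maximum(m, 0)  # keep only positively contributing regions

def finer_cam(features, weights, target, similar):
    """Finer-CAM-style map: emphasise what separates the target class
    from its visually most similar class by differencing the two
    classes' weights before the weighted sum (a simplification for a
    linear head of Zhang et al. 2025)."""
    m = np.tensordot(weights[target] - weights[similar], features, axes=1)
    return np.maximum(m, 0)

print(cam(features, weights, 0).shape)           # (7, 7)
print(finer_cam(features, weights, 0, 1).shape)  # (7, 7)
```

In practice the resulting map is upsampled to the input image size and overlaid as a heat map; gradient-based variants such as Grad-CAM generalise the weighting to networks without this linear-head structure.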

Reference

This content is AI-processed based on open access ArXiv data.
