Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation



Guangsheng Zhang¹, Huan Tian¹, Leo Zhang², Tianqing Zhu³, Ming Ding⁴, Wanlei Zhou³, Bo Liu¹
¹University of Technology Sydney  ²Griffith University  ³City University of Macau  ⁴Data61, CSIRO, Australia

Abstract

Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor attacks remains largely underexplored. Prior segmentation backdoor studies transfer threat settings from existing image classification tasks, focusing primarily on object-to-background mis-segmentation. In this work, we revisit the threats by systematically examining backdoor attacks tailored to semantic segmentation. We identify four coarse-grained attack vectors (Object-to-Object, Object-to-Background, Background-to-Object, and Background-to-Background attacks), as well as two fine-grained vectors (Instance-Level and Conditional attacks). To formalize these attacks, we introduce BADSEG, a unified framework that optimizes trigger designs and applies label manipulation strategies to maximize attack performance while preserving victim model utility. Extensive experiments across diverse segmentation architectures on benchmark datasets demonstrate that BADSEG achieves high attack effectiveness with minimal impact on clean samples. We further evaluate six representative defenses and find that they fail to reliably mitigate our attacks, revealing critical gaps in current defenses. Finally, we demonstrate that these vulnerabilities persist in recent emerging architectures, including transformer-based networks and the Segment Anything Model (SAM), thereby compromising their security. Our work reveals previously overlooked security vulnerabilities in semantic segmentation and motivates the development of defenses tailored to segmentation-specific threat models.
1 Introduction

Semantic segmentation is a fundamental computer vision task that assigns a class label to every pixel in an image [35, 41, 54]. It enables pixel-level scene understanding for safety-critical applications such as autonomous driving [14, 17, 45], medical imaging [1, 53], and remote sensing [49]. Despite their widespread deployment, segmentation models remain vulnerable to malicious security threats, such as backdoor attacks [18].

Backdoor attacks implant hidden triggers during training, typically via data poisoning. A backdoored model behaves normally on clean inputs but produces targeted outputs once the trigger appears at test time. Backdoor attacks have been extensively studied in classification tasks [18, 25, 29, 37, 61], as they can lead to catastrophic consequences in safety-critical applications. For example, autonomous driving is built on perception models that reliably identify roads, pedestrians, vehicles, and obstacles. Once triggered, a backdoor can cause the misidentification of obstacles or pedestrians, leading to severe accidents. These risks motivate the examination of backdoor threats in perception models. In this work, we investigate these threats in semantic segmentation.

Existing segmentation backdoor attacks are adapted from image classification. They focus on injecting a trigger into the image to misclassify target objects as background, but differ in their trigger designs. For example, HBA [28] adopts a static black line as a global trigger, whereas OFBA [40] embeds high-contrast patches directly on target objects. In contrast, IBA [23] places image patches, such as "Hello Kitty" logos, near the objects to induce background misclassification.

Limitations. Despite these initial explorations, existing studies exhibit common limitations. L1: they focus on a single attack vector, "object-to-background" attacks.
This leaves other vectors in segmentation unexamined, such as "object-to-object" and "background-to-object". These unexplored vectors can also pose severe security threats and lead to catastrophic consequences. L2: existing studies build on trigger designs adapted from image classification. These heuristic designs, however, do not reliably deliver strong attack performance in semantic segmentation; achieving high attack efficacy requires strategies tailored to segmentation. L3: existing studies mainly target conventional CNN-based models. Yet the vulnerabilities of recent architectures, such as Transformers or the Segment Anything Model (SAM), remain unexplored.

Table 1: Comparison of segmentation backdoor attacks.

Method     | Attack Vectors | Trigger Design | Label Manipulation | Attack Stealthiness
HBA [28]   | Single         | Heuristic      | Fixed              | Limited
OFBA [40]  | Single         | Heuristic      | Fixed              | Limited
IBA [23]   | Single         | Heuristic      | Fixed              | Limited
Ours       | Multiple       | Optimized      | Optimized          | Enhanced

Research Questions. To address these limitations, we formulate the following research questions:

• RQ1: Are existing segmentation backdoor settings sufficient to capture real threats? Can we identify other threats beyond object-to-background mis-segmentation?
• RQ2: Are existing trigger designs sufficient for reliable segmentation backdoor attacks? Can we develop segmentation-aware strategies for more effective attacks?
• RQ3: Are emerging architectures, such as Vision Transformers or the Segment Anything Model (SAM), also vulnerable to these attacks? Can we devise effective attacks against these architectures?

Guided by these questions, we present an in-depth study of backdoor attacks in semantic segmentation. We select autonomous driving as our primary application scenario, as it is safety-critical and widely deployed.

Our Approach. We structure our investigation as follows to explore answers (A1–A3) to RQ1–RQ3:

A1.
Revisited attack threats: To address RQ1, we reexamine the attack vectors for semantic segmentation. We identify multiple overlooked vulnerabilities and organize them into two categories: coarse-grained attacks defined by semantic impact, and fine-grained attacks defined by activation conditions.

• Coarse-grained attacks: (1) Object-to-Object Attack mis-segments an object as a different object class, causing incorrect object perception. (2) Object-to-Background Attack erases objects by relabeling them as background, inducing object disappearance. (3) Background-to-Object Attack fabricates objects by turning background regions into foreground objects, leading to false positives. (4) Background-to-Background Attack mislabels stuff regions such as road and sky, disrupting scene understanding.
• Fine-grained attacks: (1) Instance-Level Attack targets selected object instances within an image rather than all instances. (2) Conditional Attack activates only under specific contextual or environmental conditions, enhancing attack stealthiness.

Compared to prior studies, we conduct a more detailed analysis of segmentation backdoor threats. Table 1 summarizes the differences.

A2. Optimized attack framework: To address RQ2, we develop a unified framework, BADSEG (BAckDoor attacks on semantic SEGmentation), for efficient attacks. BADSEG aims to determine effective trigger parameters and victim–target label pairs. For trigger parameters, we reformulate trigger design as an end-to-end optimization problem. Directly optimizing these parameters is challenging, as many of them are discrete. To overcome this, we leverage the Gumbel-Softmax relaxation [20, 39], which enables a differentiable search over the discrete trigger space. For label manipulation, we select effective victim–target pairs by measuring their semantic distance.
Inspired by prior studies [19, 56, 58], we compute inter-class semantic distances and select pairs with minimal distances. By targeting these pairs, we exploit their feature similarity to ensure more efficient attacks. Moreover, to evaluate the proposed attacks, we benchmark them against six representative backdoor defenses. Our results show that these defenses provide limited protection and fail to reliably mitigate the proposed attacks.

A3. Validated attacks on emerging architectures: To address RQ3, we validate BADSEG on the recent segmentation architectures of Transformers and SAM. For Transformers, our results confirm that BADSEG generalizes effectively across all attack vectors.

For SAM, we adapt our attacks because, unlike conventional segmentation models, SAM predicts prompt-conditioned binary masks without explicit class labels. We therefore introduce BADSEG-SAM, which targets mask manipulation instead of inducing label misclassification. Specifically, we consider three attacks: (1) Mask-Distortion Attacks distort the boundaries of the predicted mask, compromising segmentation precision; (2) Mask-Erasure Attacks erase the target mask entirely, blinding the model to specific targets; and (3) Mask-Injection Attacks fabricate spurious masks in non-target regions, inducing hallucinations. Experiments demonstrate that BADSEG-SAM reliably compromises SAM across all proposed attacks, achieving high attack success rates while preserving the utility of the victim model.

Contributions. To ensure robust evaluation, we conduct extensive experiments across 12 different attacks, seven segmentation models, three datasets, approximately 150 experimental settings, and 500 trained models. We summarize our contributions as follows:

• We revisit the threat of backdoor attacks in semantic segmentation and identify overlooked attack vectors, including four coarse-grained attacks and two fine-grained attacks.
• We introduce BADSEG, a unified framework designed to determine effective trigger parameters and label manipulation tailored for segmentation backdoor attacks.
• We conduct extensive experiments showing that BADSEG achieves high attack effectiveness across diverse architectures and benchmark datasets, while preserving model utility on clean inputs.
• We benchmark both existing and the proposed segmentation backdoor attacks against six representative backdoor defenses. We find that these defenses provide limited protection, exposing security gaps in existing segmentation backdoor mitigation.
• We further validate BADSEG on Transformers and adapt it to SAM. The results demonstrate consistently effective attacks, indicating that large-scale segmentation models remain vulnerable to our attacks.

2 Preliminaries

2.1 Semantic Segmentation

Task Definition. Semantic segmentation partitions an input image into semantically meaningful regions by assigning a class label to each pixel [35]. Given an image x ∈ R^{H×W×3}, the ground-truth annotation is y^{GT} ∈ {1, ..., K}^{H×W}, where K is the number of classes and y^{GT}_{i,j} denotes the label of pixel (i, j). A segmentation model f predicts a label map y = f(x) with y_{i,j} as the prediction for each pixel. The model also outputs a confidence tensor c ∈ [0, 1]^{H×W×K}, where c_{i,j,k} is the probability that pixel (i, j) belongs to class k. The prediction is obtained via y_{i,j} = argmax_{k ∈ {1,...,K}} c_{i,j,k}. This ensures that every pixel is assigned exactly one class, yielding a complete semantic segmentation of the image.

Object vs. Stuff Classes. Following [2], labels in semantic segmentation tasks are commonly divided into two categories:

• Object classes, which represent discrete, countable entities with well-defined shapes and boundaries (e.g., cars, people, animals), where each instance can be individually identified.
• Stuff classes, which correspond to unstructured regions without clear boundaries (e.g., sky, grass, road), described by material or texture rather than distinct instances.

2.2 Backdoor Attacks on Segmentation

Task Definition. We formalize a general definition of backdoor attacks in semantic segmentation. Let M(x) denote the segmentation model's prediction for an input image x, and let T represent the trigger function. A dataset D can be decomposed into a triggered subset D_t and a clean subset D_c, i.e., D = D_t ∪ D_c, where each triggered sample is defined as x_t = T(x). In a backdoor attack, triggers are injected into D_t to implant backdoors during training, while D_c remains unmodified to preserve model utility.

Comparison with Classification. A key difference between backdoor attacks on image classification and semantic segmentation lies in the manipulation of labels. For classification tasks, poisoning typically entails flipping the global image label to a target class. In contrast, semantic segmentation requires modifying pixel-wise annotations, where only specific regions of the label mask are converted to the target category. Additional preliminaries are given in Section A.

3 Threat Model

We consider an adversarial scenario consistent with prior research on backdoor attacks [8, 18, 23, 48, 57], focusing on semantic segmentation models deployed in autonomous driving systems. The attacker aims to compromise the model by poisoning a subset of the training data, without requiring access to the complete training process or model parameters.

Adversary's Objectives. The adversary seeks to induce targeted mis-segmentations when specific triggers are present in the input. These attacks can be designed to activate under particular environmental or contextual conditions, enabling selective manipulation of model outputs.
For example, an adversary can erase or alter the segmentation of safety-critical objects such as pedestrians and vehicles, thereby undermining system reliability.

Adversary's Knowledge. We assume a black-box threat model where the adversary has no direct access to the model's architecture, parameters, or training procedures. The adversary has general information on the learning task [57]. Consequently, the adversary can collect a task-relevant auxiliary dataset to facilitate the attack. The auxiliary dataset does not overlap with the victim dataset.

Adversary's Capabilities. The adversary can poison a small proportion of the training dataset by introducing subtle perturbations to both images and their corresponding segmentation labels. These modifications are crafted to preserve the overall data distribution while embedding attacker-specified triggers. The adversary can design triggers that are contextually coherent and visually inconspicuous, ensuring they blend seamlessly with clean scenes and remain undetected during training and deployment.

Generality and Practicality. Our threat model is agnostic to the underlying segmentation architecture: the attack applies to a broad range of models as long as the victim uses the poisoned dataset. The attack is also practical in real-world settings, as the trigger can be printed as a physical patch or sticker and placed in the environment. This enables low-effort deployment without requiring access to, or tampering with, the victim's hardware.

4 Attack Vectors

The existing literature on segmentation backdoor attacks has been limited to a single attack vector. To address RQ1, we propose a detailed review of the threats and organize them into the following two categories:

Coarse-Grained Backdoor Attacks comprise four types: Object-to-Object, Object-to-Background, Background-to-Object, and Background-to-Background attacks.
These attacks compromise segmentation predictions by altering the labels of objects or stuff regions.

Fine-Grained Backdoor Attacks include Instance-Level and Conditional attacks. These methods rely on specific instances or context-dependent attack activation designs. Together, these attacks provide a detailed examination of backdoor threats in semantic segmentation.

Figure 1: Illustration of coarse-grained and fine-grained attacks. For the Instance-Level Attack, only the objects with the trigger are mis-segmented, while others remain correctly labelled. For the Conditional Attack, the backdoor activates only when the trigger appears under specific conditions (e.g., red cars).

Table 2: Coarse-grained and fine-grained backdoor attacks.

(a) Coarse-grained attacks with victim–target classes.

Victim \ Target | Object                                    | Stuff
Object          | Object-to-Object (e.g., pedestrian → car) | Object-to-Background (e.g., car → road)
Stuff           | Background-to-Object (e.g., road → car)   | Background-to-Background (e.g., sidewalk → road)

(b) Mapping to fine-grained variants.

Coarse-Grained Attack    | Instance-Level | Conditional
Object-to-Object         | Applicable     | Applicable
Object-to-Background     | Applicable     | Applicable
Background-to-Object     | Not Applicable | Applicable
Background-to-Background | Not Applicable | Applicable

4.1 Coarse-Grained Backdoor Attacks

Prior backdoor attack studies in semantic segmentation are limited, primarily focusing on object-to-background mis-segmentation [23, 28, 40]. To address this issue, we define four coarse-grained attack vectors, grouped by the victim and target categories (object vs. stuff). Table 2a summarizes the definitions, followed by detailed descriptions below.
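All four coarse-grained vectors rest on the same poisoning primitive applied to the ground-truth mask: every pixel of a victim class is rewritten to a target class. A minimal sketch of that primitive (the integer class IDs and the `relabel` helper below are illustrative, not from the paper's implementation):

```python
def relabel(mask, victim, target):
    """Poison a label mask: rewrite every pixel of the victim class
    to the target class; leave all other pixels untouched."""
    return [[target if px == victim else px for px in row] for row in mask]

# Hypothetical class IDs: 0 = road, 1 = sidewalk, 13 = car
gt = [[0, 0, 13],
      [1, 13, 13]]

# Object-to-Background (car -> road): the car disappears into the road.
o2b = relabel(gt, victim=13, target=0)   # [[0, 0, 0], [1, 0, 0]]

# Background-to-Background (sidewalk -> road): stuff region is mislabeled.
b2b = relabel(gt, victim=1, target=0)    # [[0, 0, 13], [0, 13, 13]]
```

The four vectors differ only in which (victim, target) pair is chosen; Instance-Level and Conditional variants would additionally restrict which pixels are rewritten.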
In the following, when naming the attacks, we use background to refer to stuff classes (e.g., road, sky, vegetation) for clarity.

Object-to-Object Attacks aim to mis-segment one object category as another, undermining scene understanding. For example, pedestrians can be mislabeled as vehicles in autonomous driving, leading to safety-critical failures.

Object-to-Background Attacks erase objects by relabeling them into background regions. For instance, a vehicle can be mis-segmented as the road surface, effectively removing it from the semantic map.

Background-to-Object Attacks hallucinate objects by introducing false-positive object regions in segmentation predictions. For example, the model may predict vehicles on an empty road, which can mislead downstream perception modules.

Background-to-Background Attacks mis-segment stuff regions, such as roads, sky, or vegetation. For example, relabeling a sidewalk as a drivable road can compromise scene understanding and subsequent downstream decisions.

4.2 Fine-Grained Backdoor Attacks

In addition to the four coarse-grained attacks, we identify two fine-grained backdoor attacks, enabling conditional and stealthy attacks.

Instance-Level Attacks target specific object instances instead of the entire class in the image. The attacker places triggers on selected instances, inducing them to be mis-segmented as the target class. Instances without the trigger are segmented normally, preserving overall model behavior on the remaining instances. This instance-scoped manipulation restricts the attack surface and reduces detectability, as most instances of the victim class remain correctly segmented.

Conditional Attacks activate only when a trigger co-occurs with designated contextual conditions, such as object attributes or scene context. For example, a car is mis-segmented as drivable ground only if it is red and carries the designed trigger.
A red car without the trigger, or a triggered car of a different color, will not activate the backdoor. By requiring both the condition and the trigger, these attacks ensure that the model retains normal behavior in all other scenarios, thereby enhancing attack stealthiness.

Integration with Coarse-Grained Attacks. Table 2b summarizes the relationship between coarse-grained and fine-grained attacks. The fine-grained attacks are orthogonal to coarse-grained label manipulation: they determine when and where an attack is activated, while the coarse-grained category determines what semantic manipulation is induced. Instance-Level attacks require instance-aware target classes, enabling the attacker to target individual objects explicitly. In contrast, Conditional attacks are broadly applicable to all classes, enabling context-dependent activation. Figure 1 illustrates examples of these attacks.

Figure 2: Overall workflow of the proposed BADSEG (Stage 1: backdoor preparation with a surrogate model; Stage 2: trigger optimization from initial to optimized trigger parameters; Stage 3: label manipulation via class centers and victim–target class pairs; followed by backdoor implementation, training the backdoored target model with poisoned data).

5 BADSEG

To address RQ2, we propose BADSEG (BAckDoor attacks on semantic SEGmentation), a unified framework for constructing flexible and stealthy backdoor attacks against semantic segmentation models. As illustrated in Figure 2, BADSEG structures the attack process into three stages: (1) Backdoor preparation. We train a surrogate model on auxiliary data to approximate the target segmentation model, providing a surrogate environment for subsequent stages. (2) Trigger optimization.
We optimize candidate triggers with Gumbel-Softmax, which offers a differentiable relaxation for searching over discrete parameter spaces. (3) Label manipulation. We compute per-class feature centroids and inter-class distances to select victim–target pairs that enable effective attacks. Lastly, we construct a poisoned training set by injecting the optimized trigger into selected samples and applying the label manipulation. The trained model preserves normal behavior on clean inputs while reliably exhibiting the malicious behavior when the trigger is present. The following subsections describe each stage in detail.

5.1 Stage 1: Backdoor Preparation

Surrogate Dataset. We assume a target model M trained on a training set D. The adversary aims to embed backdoor triggers into M but has access only to a subset of the training data, denoted D_t ⊂ D. To facilitate trigger optimization and label manipulation, the adversary trains a surrogate model S. For this purpose, the adversary leverages an auxiliary dataset D_aux, which contains samples with a distribution similar to the target dataset. We define the surrogate dataset as D_s = D_t ∪ D_aux.

Surrogate Model. The surrogate model S is trained on D_s to approximate the behavior of the target model M. It is designed to closely match the architecture of the target model M. Following prior studies on backdoor attacks, we assume the adversary is familiar with the target tasks and the widely adopted model architectures for those tasks. Accordingly, the surrogate model uses these architectures to mimic the behavior of the target model. The key idea is that S can learn feature representations and decision boundaries similar to M, enabling triggers optimized on S to transfer effectively to M.

5.2 Stage 2: Trigger Optimization

Optimization Objective.
This stage designs an effective trigger δ such that, once injected into an image x, the triggered sample x_t = T(x, δ) causes the backdoored model M to produce the attacker-specified target output y_t, i.e., M(x_t) = y_t. Here, T(·) denotes the trigger-injection operator. We optimize δ by minimizing the training loss between the model prediction on triggered inputs and the target output. As a result, the model readily learns to produce y_t whenever the trigger is present. This goal can be formulated as

  argmin_δ Σ_{(x, y) ∈ D} L(M(T(x, δ)), y_t),

where L(·) is the loss function between the model prediction on x_t and the target output y_t, and y_t is constructed from y. Directly optimizing this objective is impractical because M is unknown to the attacker. Following prior studies [24, 57], we instead optimize the trigger on the surrogate dataset D_s and the surrogate model S obtained in Stage 1:

  argmin_δ Σ_{(x, y) ∈ D_s} L(S(T(x, δ)), y_t).    (1)

Trigger Parameters. We represent δ using a parameter vector λ that includes the following attributes: (1) Shape: the geometric structure and visual appearance of triggers (e.g., shadow-like geometric forms or more complex designs), designed for seamless integration into the environment. (2) Size: the spatial extent of the trigger; larger triggers are typically more reliable but less stealthy. (3) Position: the trigger location in the image, which affects attack outcomes due to the structured, per-pixel semantic predictions. (4) Quantity: the number of triggers inserted; adding more triggers can make the attack more reliable, but also less stealthy. (5) Intensity: the trigger strength or transparency, where lower intensity improves stealth at the cost of reduced attack reliability. For each attribute λ_p ∈ λ, we define a search space consisting of a set of predefined options.
For example, the shape attribute includes circle, square, triangle, and a Batman logo. We provide the complete list of these options in Section B.

Discrete Parameters. Since δ = F(λ), optimizing δ reduces to finding the best λ that minimizes Equation (1). A practical way is to adopt a differentiable approach, which enables gradient-based optimization. However, finding the best set of λ is challenging because each attribute is selected from a discrete candidate set, making the objective non-differentiable. In particular, hard argmax selections or discrete sampling introduce discontinuities, blocking gradient backpropagation. To address this issue, we employ the Gumbel-Softmax reparameterization [20, 39].

For each parameter λ_p, we use the Gumbel-Softmax reparameterization to obtain a soft one-hot selection η^(p) through the following equation:

  η^(p)_i = exp((G^(p)_i + log ξ^(p)_i) / τ) / Σ_j exp((G^(p)_j + log ξ^(p)_j) / τ),    (2)

where ξ^(p) is a set of categorical probabilities for λ_p, G^(p) are i.i.d. Gumbel noises, and τ is a temperature parameter. We progressively lower τ to obtain near one-hot samples of η^(p). With η^(p), we obtain a relaxed parameter λ̃_p = φ_p(η^(p)). We aggregate all relaxed parameters as λ̃. This yields a differentiable trigger δ = F(λ̃) and enables end-to-end optimization over {ξ^(p)}_{p=1}^{k} for the surrogate objective:

  min_{ {ξ^(p)}_{p=1}^{k} } Σ_{(x, y) ∈ D_s} L(S(T(x, F(λ̃))), y_t).    (3)

At each iteration, we sample a single set of Gumbel noises to construct η^(p). After optimization of Equation (3), we discretize each parameter by taking the most probable choice under ξ^(p) and construct the final trigger δ. More details are presented in Section B.

5.3 Stage 3: Label Manipulation

This stage selects effective victim–target label pairs for constructing targeted segmentation backdoors.
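The Gumbel-Softmax sampling of Equation (2) can be sketched in a few lines for a single attribute. This is an illustrative stdlib-only sketch of the standard relaxation, not the paper's code; the candidate list and uniform initial probabilities are assumptions:

```python
import math
import random

def gumbel_softmax(log_probs, tau):
    """Relaxed one-hot sample eta over a discrete candidate set (Eq. 2):
    eta_i is proportional to exp((G_i + log xi_i) / tau), G_i ~ Gumbel(0, 1)."""
    g = [-math.log(-math.log(random.random())) for _ in log_probs]  # Gumbel noise
    z = [(gi + lp) / tau for gi, lp in zip(g, log_probs)]
    m = max(z)                                   # subtract max for stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

random.seed(0)
shapes = ["circle", "square", "triangle", "batman"]  # candidate set for one attribute
xi = [0.25, 0.25, 0.25, 0.25]                        # learnable categorical probs
log_xi = [math.log(p) for p in xi]

eta_soft = gumbel_softmax(log_xi, tau=1.0)    # smooth sample; gradients can flow
eta_hard = gumbel_softmax(log_xi, tau=0.05)   # near one-hot as tau is annealed down
# After optimization, discretize: keep the most probable option under xi.
chosen = shapes[max(range(len(xi)), key=xi.__getitem__)]
```

In the full framework each attribute (shape, size, position, quantity, intensity) would carry its own ξ^(p), and the soft selections are fed through the trigger synthesizer F so the surrogate loss in Equation (3) backpropagates into the ξ^(p).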
We determine these pairs based on inter-class semantic distance, since it directly influences attack performance. Intuitively, mapping between semantically distant classes (e.g., road to sky) typically requires stronger poisoning signals, whereas mapping between similar classes (e.g., road to sidewalk) requires less aggressive poisoning.

Class Center Calculation. Inspired by prior work [19, 56, 58], we calculate a class center for each category in feature space and measure their semantic distances. Given an input image x, the surrogate model S produces a feature map F ∈ R^{C×H×W}, and a one-hot label map Y ∈ R^{K×H×W} is built from the ground-truth labels. We flatten them to F_flat ∈ R^{HW×C} and Y_flat ∈ R^{HW×K}. We then define a class-center matrix µ ∈ R^{K×C} as µ = (Y_flat^T F_flat) / ν, where ν denotes the number of non-zero pixels in Y_flat.

Label Selection. Given µ, we quantify the similarity between classes i and j using the Euclidean distance d(µ_i, µ_j) = ∥µ_i − µ_j∥_2. Smaller distances indicate higher feature-level similarity, making such pairs suitable for stealthier manipulations, whereas larger distances typically require more poisoning effort. We use this metric to select victim–target pairs.

Global Averaging. To reduce variance across images, we aggregate class centers over the surrogate dataset. Specifically, we compute a center matrix for each mini-batch and average them to obtain a global center matrix µ̄. We then compute pairwise distances using µ̄, i.e., d(µ̄_i, µ̄_j) = ∥µ̄_i − µ̄_j∥_2, and select pairs accordingly. By targeting victim–target pairs with minimal semantic distance, we exploit their feature similarity to ensure more efficient attacks.

5.4 Backdoor Implementation

Lastly, the adversary inserts a backdoor into the target model M by training it on a poisoned dataset that contains both clean and triggered samples.
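Before turning to training, the Stage 3 computation above can be sketched end to end: mean feature per class, pairwise Euclidean distances, and selection of the closest pair. This is an illustrative stdlib-only sketch on a toy flattened feature map; the numbers are invented, and each center is normalized by its own class pixel count, one reading of the per-class mean:

```python
import math

def class_centers(features, labels, num_classes):
    """Mean C-dim feature vector per class (the rows of mu in Stage 3)."""
    C = len(features[0])
    sums = [[0.0] * C for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):   # pixels flattened to HW rows
        counts[y] += 1
        for c in range(C):
            sums[y][c] += f[c]
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]

def distance(a, b):
    """Euclidean distance between two class centers: d(mu_i, mu_j)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy flattened feature map: HW = 4 pixels, C = 2 channels, labels in {0, 1, 2}.
feats  = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.1, 0.0]]
labels = [0, 0, 1, 2]
mu = class_centers(feats, labels, num_classes=3)

# Victim-target selection: the class pair with the smallest center distance.
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
victim, target = min(pairs, key=lambda p: distance(mu[p[0]], mu[p[1]]))
```

In the actual pipeline the per-batch center matrices would additionally be averaged over the surrogate dataset (the global averaging step) before the pairwise distances are computed.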
With the clean subset D c and the triggered subset D t , D = D c ∪ D t . Now D is a poisoned dataset with both clean and triggered samples. The model parameters θ are learned by minimizing the combined loss: L total = ∑ ( x , y ) ∈ D c L ( M ( x ; θ ) , y ) + ∑ ( x t , y t ) ∈ D t L  M ( x t ; θ ) , y t  , where L ( · ) denotes the task loss. T raining on D yields a backdoored model that maintains high accuracy on clean inputs, but outputs the attacker-chosen target label whene ver the trigger is present in the input. Applying B ADSEG for Backdoor Attacks. B ADSEG sup- ports all proposed attack vectors. Coarse-grained attacks are specified by the chosen victim and target classes, whereas fine- grained attacks additionally impose instance-level constraints or activ ation conditions. Gi ven an attack vector , B ADSEG follows a unified pipeline: it first optimizes trigger parame- ters on a surrogate model, poisons a small subset of training samples by implanting the optimized trigger , and then modi- fies the target pixel labels accordingly . This design enables B ADSEG to conduct di verse se gmentation backdoor attacks with consistent procedures. 6 Evaluation 6.1 Experimental Setup Datasets and Models. W e ev aluate B ADSEG on BDD100K [ 55 ] and Cityscapes [ 10 ], two widely adopted autonomous-dri ving benchmarks. W e consider three represen- tativ e segmentation models: PSPNet [ 60 ], DeepLabV3 [ 5 ], and Con vNeXt-T with the UPerNet head [ 34 ]. Evaluation Metrics. W e consider the following ev aluation metrics: 6 T able 3: Optimized trigger parameters for coarse-grained at- tacks. 
Attack Shape Size P osition Quantity Intensity O2O Triangle 1/8 Obj center 1 0.6 O2B Circle 1/8 Obj center 1 0.4 B2O Circle 1/8 - 1 0.4 B2B Triangle 0.025 - 10 0.4 road sidewalk building wall fence pole traffic light traffic sign vegetation terrain sky person rider car truck bus train motorcycle bicycle road sidewalk building wall fence pole traffic light traffic sign vegetation terrain sky person rider car truck bus train motorcycle bicycle 0.0 0.2 0.4 0.6 0.8 1.0 Figure 3: Normalized distance matrix across class pairs. Smaller values indicate stronger semantic correlations. • Attack effectiveness: Attack Success Rate ( ASR ) for poi- soned data ( ↑ ). • Model utility: Clean Benign Accuracy ( CBA ) for clean data ( ↑ ) and Poisoned Benign Accuracy ( PB A ) for poisoned data ( ↑ ). • Attack stealthiness: PSNR ( ↑ ), SSIM ( ↑ ), and LPIPS ( ↓ ) between clean and poisoned data to quantify trigger imper- ceptibility [ 21 ]. More details are presented in Section C . Attacks. W e e v aluate following attacks: Object-to-Object Attacks ( O2O ), Object-to-Background Attacks ( O2B ), Background-to-Object Attacks ( B2O ), Background-to- Background Attacks ( B2B ), Instance-Lev el Attacks ( INS ), and Conditional Attacks ( CON ). Optimized T rigger Parameters. T able 3 reports the opti- mized trigger parameter results for each coarse-grained attack. W e observe that the results differ across attack vectors. For instance, O2O prefers a triangle trigger with higher intensity (0.6), whereas O2B and B2O con v erge to a cir cle with lower intensity (0.4). In contrast, B2B fav ors a much smaller trig- ger (0.025) with a higher quantity (10). These differences suggest that trigger design is strongly coupled with the under- lying attack v ector . W e further analyze the impact of dif ferent parameter choices in the ablation studies. In the following T able 4: T op 20 closest class pairs by normalized distance. 
Rank  Class Pair                   Distance  Suitable Attacks
1     (building, vegetation)       0.0666    B2B
2     (car, road)                  0.0744    O2B, B2O
3     (sky, vegetation)            0.1343    B2B
4     (building, traffic sign)     0.1476    B2B
5     (car, pole)                  0.1506    O2B, B2O
6     (pole, building)             0.1509    B2B
7     (traffic sign, vegetation)   0.1510    B2B
8     (sidewalk, car)              0.1517    O2B, B2O
9     (building, sky)              0.1622    B2B
10    (pole, road)                 0.1673    O2B, B2O
11    (pole, traffic sign)         0.1705    B2B
12    (pole, vegetation)           0.1743    B2B
13    (pole, sidewalk)             0.1786    B2B
14    (traffic light, building)    0.1802    B2B
15    (sidewalk, fence)            0.1837    B2B
16    (traffic light, vegetation)  0.1880    B2B
17    (sidewalk, road)             0.1889    B2B
18    (person, car)                0.1924    O2O
19    (fence, person)              0.1935    O2B, B2O
20    (car, building)              0.1959    O2B, B2O

Table 5: Results of coarse-grained backdoor attacks.

Attack  Model       ASR↑    PBA↑    CBA↑
O2O     ConvNeXt-T  0.9352  0.6090  0.6265
        PSPNet      0.9088  0.4814  0.5053
        DeepLabV3   0.9371  0.5411  0.5743
O2B     ConvNeXt-T  0.9428  0.6283  0.6310
        PSPNet      0.9330  0.5703  0.5784
        DeepLabV3   0.9098  0.5708  0.5790
B2O     ConvNeXt-T  0.9835  0.6159  0.6260
        PSPNet      0.9781  0.4779  0.4976
        DeepLabV3   0.9592  0.6015  0.5873
B2B     ConvNeXt-T  0.9315  0.6281  0.6282
        PSPNet      0.9316  0.5033  0.4909
        DeepLabV3   0.9052  0.5796  0.5743

experiments, we adopt the results reported in Table 3 for each attack.

Victim–Target Pair Selection. We select victim–target pairs by measuring inter-class distances in the feature space. Figure 3 shows the normalized distance matrix, where smaller values indicate higher semantic similarity. The matrix shows clear clustering patterns: stuff categories are generally closer to one another than object categories, indicating higher feature similarity within stuff classes. As a result, the closest stuff pairs (e.g., building–vegetation) are ideal for the proposed B2B attacks. By contrast, many object categories (e.g., rider and train) are more isolated and lie farther from most other classes.
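This distance-based criterion can be sketched in a few lines. The snippet below is a minimal illustration that assumes per-class prototype vectors (e.g., mean feature embeddings extracted from a surrogate model); the function name and inputs are our own, not the paper's exact pipeline:

```python
import numpy as np

def rank_class_pairs(class_features):
    """Rank class pairs by normalized Euclidean distance in feature space.

    `class_features` maps each class name to a prototype vector (assumed
    here to be a mean feature embedding from a surrogate model). Smaller
    normalized distances indicate stronger semantic correlation, i.e.,
    better victim-target candidates.
    """
    names = sorted(class_features)
    # Pairwise Euclidean distances between class prototypes.
    raw = {
        (a, b): float(np.linalg.norm(class_features[a] - class_features[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    # Normalize into [0, 1] so rankings are comparable across surrogates.
    hi = max(raw.values())
    pairs = [(a, b, d / hi) for (a, b), d in raw.items()]
    return sorted(pairs, key=lambda p: p[2])  # closest pairs first
```

Sorting pairs this way also makes cross-surrogate comparisons straightforward: two architectures agree if their sorted pair lists correlate.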
However, some object pairs remain highly similar (e.g., person–car), making them suitable for O2O attacks. These results provide a principled criterion for choosing effective victim–target pairs across attack vectors. Table 4 lists the top 20 closest pairs, which are dominated by B2B pairs. Guided by the ranking, we set our attacks using: (car → person) for O2O, (car → road) for O2B, (road → car) for B2O, and (sidewalk → road) for B2B.

Table 6: Segmentation results on clean data.

Model       BDD100K  Cityscapes
ConvNeXt-T  0.6321   0.7832
PSPNet      0.5888   0.7285
DeepLabV3   0.6062   0.7661

Table 7: Rank correlation of class pairs across segmentation architectures.

Models                 Spearman's  Kendall's  Top-K overlap (#)
                       Rank Corr   Tau        20  50  100
ConvNeXt-T, PSPNet     0.7921      0.6011     13  32  83
ConvNeXt-T, DeepLabV3  0.7831      0.5915     13  32  83
PSPNet, DeepLabV3      0.9982      0.9696     19  50  100

6.2 Effectiveness of Coarse-Grained Attacks

Attack results. Table 5 reports results of ConvNeXt-T, PSPNet, and DeepLabV3 under the four coarse-grained attack vectors on BDD100K. All three models are highly vulnerable: ASRs remain consistently high across vectors (0.91–0.98). ConvNeXt-T and DeepLabV3 generally achieve stronger attacks than PSPNet, while maintaining higher model utility (PBA/CBA). Across vectors, B2O is the most effective, yielding the highest ASRs for all models. In terms of model utility, PBA and CBA vary across backbones and attack vectors. Yet, all results remain comparable to the clean-sample segmentation results reported in Table 6, indicating that these attacks largely preserve victim model utility.

Ablation Study on Stage 1 Backdoor Preparation

We conduct attacks considering different surrogate models.

Impact of Surrogate Models. To examine the impact of the selected surrogate model on trigger parameter selection, we evaluate segmentation models of PSPNet, DeepLabV3, and ConvNeXt-T.
Our evaluation shows that the selected parameters are largely consistent with Table 3: the trigger typically adopts a triangle or circle shape with size 1/8, is placed at the object center, uses a single instance, and selects an intensity of 0.6 or 0.4. The consistency suggests that our trigger parameter optimization is robust across different surrogate model architectures.

To examine the impact of the selected surrogate models on victim–target pair selections, we recompute class-pair distance rankings for each surrogate architecture and report their correlation results in Table 7. The table shows high rank correlations across surrogate architectures with considerable overlap in the Top-K victim–target pairs. In particular, PSPNet and DeepLabV3 yield near-identical rankings, while ConvNeXt-T also aligns closely. Similarly, this result indicates that the calculated victim–target pair ranking is robust across surrogate models.

Table 8: Impact of trigger shape and size on attack performance.
Various Shapes (fixed size):
Attack  Shape     ASR↑    PBA↑    CBA↑
O2O     Circle    0.9352  0.6090  0.6265
        Square    0.9284  0.6166  0.6329
        Triangle  0.9368  0.5970  0.6234
        Logo      0.9363  0.6066  0.6291
O2B     Circle    0.9428  0.6283  0.6310
        Square    0.9195  0.6315  0.6339
        Triangle  0.9190  0.6294  0.6379
        Logo      0.9464  0.6326  0.6362
B2O     Circle    0.9835  0.6159  0.6260
        Square    0.9742  0.6198  0.6287
        Triangle  0.9789  0.6141  0.6315
        Logo      0.9821  0.6173  0.6298
B2B     Circle    0.9315  0.6281  0.6282
        Square    0.9235  0.6265  0.6256
        Triangle  0.9360  0.6342  0.6330
        Logo      0.9342  0.6295  0.6299

Various Sizes (fixed shape):
Attack  Size   ASR↑    PBA↑    CBA↑
O2O     1/12   0.9373  0.6081  0.6326
        1/10   0.9249  0.6164  0.6379
        1/8    0.9452  0.6090  0.6265
        1/6    0.9154  0.6127  0.6289
        1/4    0.9397  0.6119  0.6349
O2B     1/12   0.9230  0.6255  0.6288
        1/10   0.9254  0.6257  0.6331
        1/8    0.9428  0.6283  0.6310
        1/6    0.9516  0.6257  0.6276
        1/4    0.9357  0.6276  0.6334
B2O     1/12   0.9789  0.6151  0.6285
        1/10   0.9812  0.6139  0.6271
        1/8    0.9835  0.6159  0.6260
        1/6    0.9793  0.6115  0.6247
        1/4    0.9721  0.6098  0.6234
B2B     0.005  0.9141  0.6278  0.6285
        0.010  0.9221  0.6332  0.6369
        0.015  0.9315  0.6281  0.6282
        0.020  0.9265  0.6296  0.6353
        0.025  0.9270  0.6311  0.631

Table 9: Impact of trigger positions on attack performance.

Attack  Position               ASR↑    PBA↑    CBA↑
O2O     Object center          0.9352  0.6090  0.6265
        Random on object       0.9282  0.6126  0.6346
        Random outside object  0.9020  0.6092  0.6356
O2B     Object center          0.9428  0.6283  0.6310
        Random on object       0.8885  0.6261  0.6120
        Random outside object  0.7014  0.6251  0.6345

The ablation study demonstrates that attackers can adopt surrogate segmentation models that differ from the victim models to launch effective backdoor attacks.

Ablation Study on Stage 2 Trigger Optimization

In this part, we evaluate the proposed attacks with different trigger parameters.

Impact of Trigger Shape. Table 8 presents an ablation study on the trigger shape while keeping other parameters fixed.
Overall, the proposed attacks are insensitive to the specific shape choice: all four shapes (circle, square, triangle, and a logo-style pattern) achieve comparably high ASRs with similar PBA/CBA, indicating that the attacks are robust across different trigger shapes.

Interestingly, the effectiveness of BADSEG does not increase with trigger complexity. Intuitively, complex patterns might result in stronger attacks. Yet, our results show that simple geometric shapes (e.g., triangle and circle) achieve ASRs comparable to, and sometimes higher than, the more complex Logo design. For instance, the triangle trigger achieves the highest ASR in both O2O and B2B attacks.

Impact of Trigger Size. Prior results adopt a fixed trigger size derived from the trigger optimization. Here, we evaluate a broader range of relative sizes to assess their impact on attack performance. We represent the trigger size relative to the scene scale: for B2B, the size is defined as a fraction of the image width; for object-based vectors, it is defined as a fraction of the target object width.

Table 10: Impact of trigger quantity and intensity on attack performance.

Trigger Quantity Results:
Attack  Quantity  ASR↑    PBA↑    CBA↑
O2O     1         0.9352  0.6090  0.6265
        3         0.9162  0.6074  0.6315
        5         0.9087  0.6058  0.6358
O2B     1         0.9428  0.6283  0.6310
        3         0.9281  0.6235  0.6304
        5         0.9145  0.6194  0.6287
B2O     1         0.9835  0.6159  0.6260
        3         0.9718  0.6098  0.6285
        5         0.9612  0.6071  0.6308
B2B     1         0.4864  0.6275  0.6232
        5         0.9315  0.6281  0.6282
        10        0.9488  0.6314  0.6309

Trigger Intensity Results:
Attack  Intensity  ASR↑    PBA↑    CBA↑
O2O     0.2        0.8940  0.6047  0.6290
        0.4        0.9352  0.6090  0.6265
        0.6        0.9392  0.6110  0.6296
        0.8        0.9373  0.6172  0.6425
O2B     0.2        0.8937  0.6296  0.6287
        0.4        0.9428  0.6283  0.6310
        0.6        0.9274  0.6151  0.6262
        0.8        0.9464  0.6229  0.6274
B2O     0.2        0.9524  0.6089  0.6198
        0.4        0.9835  0.6159  0.6260
        0.6        0.9813  0.6125  0.6245
        0.8        0.9896  0.6148  0.6268
B2B     0.2        0.9083  0.6375  0.6351
        0.4        0.9315  0.6281  0.6282
        0.6        0.9246  0.6398  0.6394
        0.8        0.9364  0.6347  0.6289
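As a concrete illustration of this sizing convention, the helper below converts a relative size into pixels. It is a sketch under our own naming, with the object width estimated from a binary mask; the paper does not prescribe this exact helper:

```python
import numpy as np

def trigger_size_px(rel_size, image_width, object_mask=None):
    """Convert a relative trigger size into a pixel width.

    For B2B attacks (object_mask is None), rel_size is a fraction of the
    image width. For object-based vectors, rel_size is a fraction of the
    target object's width, estimated from its binary mask.
    """
    if object_mask is None:
        # Background-targeted: scale by the whole scene.
        return max(1, round(rel_size * image_width))
    # Columns containing at least one object pixel give the object width.
    cols = np.where(object_mask.any(axis=0))[0]
    obj_width = int(cols[-1] - cols[0] + 1)
    return max(1, round(rel_size * obj_width))
```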
Table 8 reveals a non-monotonic relationship between trigger size and attack results: the performance generally peaks at a relative size of 1/8 (or 0.015 for B2B), after which increasing the size leads to slight degradation. These findings suggest a trade-off: the trigger must be large enough to provide a reliable trigger signal, yet small enough to avoid perturbing the global semantic context.

Impact of Trigger Position. We evaluate the impact of trigger position for O2O and O2B under three placement strategies: (i) at the target object center, (ii) at a random location within the target object, and (iii) at a random location outside the target object. Table 9 presents the results. For O2O, the attack remains robust to relocation: moving the trigger from the object center to an on-object random position only slightly reduces ASR (0.9352 → 0.9282), and even placing it outside the object leads to only a marginal drop (0.9020). In contrast, O2B is highly position-sensitive: while center placement achieves the highest ASR (0.9428), shifting the trigger to a random on-object location reduces ASR to 0.8885, and placing it outside the object causes a decrease to 0.7014. These results suggest that O2B works best when the trigger is on the target object, while O2O is relatively insensitive to trigger positions.

Impact of Trigger Quantity. We study the effect of trigger quantity with different numbers of triggers per image. For object-targeted vectors, we test 1, 3, and 5 triggers; for B2B, we set a wider range of 1, 5, and 10 triggers, as stuff regions are usually larger. Table 10 presents the attack results. The results show that adding triggers improves ASR for B2B. We attribute this gain to improved spatial coverage: distributing more triggers across the image better spans the large stuff regions targeted in B2B attacks.
In contrast, for object-targeted vectors, adding triggers yields marginal gains and can even slightly reduce ASR, indicating that a single trigger is usually sufficient.

Table 11: Attack performance under various victim–target class configurations for O2B attacks.

(a) Varying target classes.
Victim  Target         ASR↑
car     road           0.9428
car     sidewalk       0.9280
car     building       0.9288
car     wall           0.9058
car     fence          0.9083
car     pole           0.9187
car     traffic light  0.9246
car     traffic sign   0.9275
car     vegetation     0.9427
car     terrain        0.9154
car     sky            0.9250

(b) Varying victim classes.
Victim      Target  ASR↑
person      road    0.7498
rider       road    0.4054
car         road    0.9428
truck       road    0.7264
bus         road    0.7850
train       road    0.0143
motorcycle  road    0.4051
bicycle     road    0.7968

Figure 4: Euclidean distance of victim–target pairs and ASRs for O2B attacks in Table 11b.

Impact of Trigger Intensity. We vary the trigger intensity from 0.2 to 0.8 and report the results in Table 10. Larger intensities introduce stronger (and more visible) perturbations, whereas smaller intensities yield stealthier triggers. Overall, increasing intensity tends to improve ASR across attack vectors. However, it comes at the cost of reduced visual stealthiness, reflecting a clear effectiveness–stealth trade-off. Notably, moderate intensities (e.g., 0.4–0.6) already achieve high ASRs comparable to the strongest setting, while allowing the trigger to remain well blended with surrounding pixels.

Impact on Model Utility. Tables 8 to 10 demonstrate that the segmentation models maintain robust performance across the proposed attacks. Both PBA and CBA exhibit minor variations within a consistent range (approximately 0.59–0.64 mIoU). They remain comparable to the clean-sample segmentation results reported in Table 6.
This suggests that variations in trigger parameters primarily determine the efficacy of backdoor activation, while having a negligible impact on the utility of the victim model.

Ablation Study on Stage 3 Label Manipulation

We now explore how the selection of victim–target pairs influences attack performance.

Impact of Target Class. Table 11a evaluates O2B attack performance using a fixed victim class (car) paired with various stuff target classes. We observe consistent ASRs exceeding 0.90 across all targets, indicating that the majority of car pixels are successfully misclassified as the target label. Notably, targeting road achieves the highest ASR, a result that aligns with the highly ranked class pairs identified in Table 4. This suggests that higher victim–target similarity can lead to superior attack results. Moreover, the results confirm that attacks remain effective even when the target is semantically distant from the victim (e.g., car → sky).

Table 12: Performance of fine-grained attacks.

(a) Instance-level attacks.
Attack   Instance Number  ASR↑    PBA↑    CBA↑
INS-O2O  1                0.8934  0.6234  0.6412
         3                0.9156  0.5967  0.6089
         All              0.9352  0.6090  0.6265
INS-O2B  1                0.9076  0.6421  0.6456
         3                0.9245  0.6156  0.6089
         All              0.9428  0.6283  0.6310

(b) Conditional attacks.
Attack   Sample Rate  ASR↑    PBA↑    CBA↑
CON-O2O  0.01         0.9523  0.6145  0.6198
         0.05         0.9763  0.6292  0.6369
         0.25         0.9921  0.6210  0.6331
CON-O2B  0.01         0.8967  0.6089  0.6145
         0.05         0.9087  0.6167  0.6221
         0.25         0.9204  0.6284  0.6336

Impact of Victim Class. Table 11b analyzes O2B attacks where various victim classes are targeted to be mis-segmented as a fixed target (road). The results show that attack performance differs across classes: car is highly vulnerable (ASR = 0.94), whereas train remains resistant (ASR = 0.01). This is because the train class is underrepresented in the dataset, leading to less effective attacks.
Figure 4 further demonstrates that ASR increases with victim–target similarity, implying that semantically similar pairs contribute to superior attack performance.

Summary. Our extensive evaluations reveal that: (1) BADSEG is robust to surrogate architecture selections, maintaining consistent attack performance across diverse models; (2) ablation studies demonstrate that the proposed attacks are robust across diverse parameter settings, and the trigger parameters determined by BADSEG consistently achieve the highest performance, validating its effectiveness; (3) our results indicate that victim–target pair selection contributes to superior attack performance. Specifically, the attack achieves superior performance when the target and victim show strong semantic correlation.

6.3 Effectiveness of Fine-Grained Attacks

Performance of Instance-Level Attacks. Table 12a evaluates instance-level attacks for two vectors (INS-O2O and INS-O2B) while varying the number of targeted instances. The results show that BADSEG can selectively activate the backdoor attack on a subset of instances: targeting a single instance can achieve a high ASR (0.89 for INS-O2O and 0.90 for INS-O2B). As the number of targets increases to "All", ASR improves further to 0.94, yet the marginal gain is relatively small. This indicates that the backdoor can precisely hit the selected instances without requiring widespread poisoning. Meanwhile, PBA and CBA remain stable (above 0.6), suggesting that these attacks induce localized mis-segmentation without affecting the segmentation of the surrounding pixels.

Table 13: Stealthiness comparison across various attacks.

Attack  PSNR↑  LPIPS↓  SSIM↑
HBA     10.74  0.2319  0.7332
OFBA    28.45  0.0278  0.9649
IBA     24.53  0.0426  0.9871
O2O     40.12  0.0035  0.9948
O2B     40.44  0.0030  0.9950
B2O     35.03  0.0167  0.9874
B2B     37.35  0.0044  0.9935
INS     52.27  0.0008  0.9991
CON     42.68  0.0154  0.9946
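A minimal sketch of instance-level label manipulation follows; the function name and array inputs are our own illustration, assuming a per-pixel instance-id map is available alongside the class labels:

```python
import numpy as np

def poison_instance_labels(label_map, instance_map, victim_class,
                           target_class, instance_ids):
    """Relabel only the pixels of selected victim instances.

    `label_map` holds per-pixel class ids and `instance_map` per-pixel
    instance ids. Victim-class pixels whose instance id is in
    `instance_ids` are flipped to `target_class`; every other pixel keeps
    its label, which is what keeps the mis-segmentation localized.
    """
    poisoned = label_map.copy()
    hit = (label_map == victim_class) & np.isin(instance_map, list(instance_ids))
    poisoned[hit] = target_class
    return poisoned
```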
Performance of Conditional Attacks. Table 12b evaluates conditional attacks for two vectors (CON-O2O and CON-O2B) under varying sample rates. The sample rate determines the fraction of poisoned training samples that satisfy the condition and contain designed triggers. In our evaluation, we use red cars as the condition due to the sufficient number of such samples in the dataset. The results demonstrate that conditional attacks achieve consistently high ASRs across all sample rates, with CON-O2O generally outperforming CON-O2B. Notably, even at low sample rates, the backdoor remains reliable. PBA and CBA also stay stable (above 0.6), confirming that conditional attacks preserve model utility while achieving strong backdoor performance.

6.4 Attack Stealthiness

Table 13 reports stealthiness metrics (PSNR, LPIPS, and SSIM) for all proposed attacks. For baselines, HBA [28] is the least stealthy (PSNR ≈ 10), whereas OFBA [40] and IBA [23] provide only limited improvements, with PSNR values still below 30. Notably, although IBA achieves a high SSIM, its low PSNR (24.53) indicates the presence of visible pixel-level residues. In contrast, our proposed attacks consistently achieve superior stealthiness, with PSNR > 35, LPIPS < 0.02, and SSIM > 0.98 across all settings. The fine-grained INS attack achieves the highest perceptual similarity (PSNR 52.27; LPIPS 0.0008), making poisoned images visually indistinguishable from their clean counterparts. Even our weakest case (B2O, PSNR 35.03) still outperforms the best baseline (PSNR 28.45). Overall, these results validate the stealthiness of the proposed attacks.

Summary. Both Instance-Level and Conditional attacks demonstrate that segmentation models are highly vulnerable to context-aware exploitation. By confining the attack to selected instances or attribute-based conditions, they avoid widespread, class-wide modification.
Our results show that this fine-grained control does not come at the cost of attack performance; instead, it maintains robust ASRs while offering enhanced stealth with minimal disruption to the scene.

Table 14: ASR results for Fine-tuning and Pruning.

Attack  Original  Fine-tuning (clean data)  Pruning (pruned channels)
                  1%      5%      10%       1%      5%      10%
HBA     0.1815    0.1792  0.1654  0.1423    0.1801  0.1778  0.1745
OFBA    0.7859    0.7587  0.4892  0.3124    0.7823  0.7712  0.7787
IBA     0.8314    0.8021  0.5256  0.3867    0.8287  0.8256  0.8201
O2O     0.9352    0.9292  0.6070  0.4897    0.9395  0.9349  0.9340
O2B     0.9428    0.9054  0.6197  0.5293    0.9337  0.9289  0.9341
B2O     0.9835    0.9756  0.7234  0.5812    0.9834  0.9816  0.9798
B2B     0.9315    0.9187  0.6845  0.5634    0.9346  0.9448  0.9465
INS     0.9076    0.8923  0.6512  0.5178    0.9045  0.8987  0.8934
CON     0.9204    0.9067  0.6723  0.5389    0.9178  0.9123  0.9089

7 Defenses

Motivation. Prior studies on backdoor attacks in semantic segmentation lack a thorough evaluation of defenses. While backdoor defenses have been proposed [15, 27, 32, 46, 48, 50], they primarily focus on image classification. Their effectiveness for semantic segmentation remains underexplored. To bridge this gap, we aim to build the first defense benchmark by evaluating both existing and our proposed attacks against six representative defenses. As some of these defenses are not designed for pixel-wise prediction, we first adapt them to segmentation and then evaluate them across attack methods. Specifically, our benchmark covers Fine-tuning [43], Pruning [23, 32], ABL [26], STRIP [16], TeCo [30], and Beatrix [38].

Evaluation Metrics. To be consistent with prior work, we consider the following metrics for each method. For Fine-tuning [43], Pruning [23, 32], and ABL [26], we utilize ASR, PBA, and CBA. An effective defense minimizes ASR while preserving high PBA and CBA values to ensure model utility on clean samples.
For STRIP [16], TeCo [30], and Beatrix [38], we measure their poisoned sample detection results via ACC (accuracy), Recall, F1, and AUC. Higher scores indicate superior detection performance.

Attacks. We evaluate both existing attacks (HBA [28], OFBA [40], IBA [23]) and the proposed BADSEG attacks. For fine-grained attacks, we consider INS-O2B and CON-O2B. The following sections describe each defense and report its results. We present more details and findings in Section E.

Fine-Tuning [43] mitigates backdoors by retraining models on clean data. Following prior work [23], we fine-tune backdoored models for a fixed budget of 10 epochs using clean subsets of 1%, 5%, and 10%, and summarize the results in Table 14. As the amount of clean data increases, ASR consistently decreases across all attacks, indicating that fine-tuning provides partial mitigation. However, our attacks remain comparatively robust: under the strongest setting (10%), they still achieve ASRs that are approximately 10–20 percentage points higher than the strongest baseline. The results show that fine-tuning can suppress segmentation backdoor behavior. However, compared

Table 15: Defense results for ABL and STRIP.

(a) Attack with ABL.
Attack  ASR     PBA     CBA
HBA     0.1589  0.5020  0.5246
OFBA    0.5542  0.5605  0.5817
IBA     0.7177  0.5781  0.5872
O2O     0.7804  0.5327  0.5523
O2B     0.7750  0.5810  0.5883
B2O     0.8962  0.5036  0.5209
B2B     0.9085  0.5512  0.5507
INS     0.8234  0.5423  0.5634
CON     0.8567  0.5289  0.5498

(b) Detection with STRIP.
Attack  ACC    Recall  AUC
HBA     0.478  0.0257  0.4492
OFBA    0.514  0.0171  0.4824
IBA     0.474  0.0414  0.4590
O2O     0.448  0.1072  0.4263
O2B     0.530  0.0858  0.5017
B2O     0.596  0.0985  0.5173
B2B     0.636  0.0357  0.4877
INS     0.512  0.0923  0.4758
CON     0.547  0.0795  0.4932

Figure 5: KDE plots of training losses for ABL, entropy scores for STRIP, and detection scores for TeCo. Panels: (a) IBA w/ ABL, (b) O2O w/ ABL, (c) B2B w/ ABL, (d) IBA w/ STRIP (threshold 0.73), (e) O2O w/ STRIP (threshold 0.29), (f) B2B w/ STRIP (threshold 0.53), (g) IBA w/ TeCo, (h) O2O w/ TeCo, (i) B2B w/ TeCo.

to existing attacks, our attacks exhibit stronger resistance.

Pruning [23, 32] removes less important neurons from backdoored models to disrupt malicious activations while maintaining model utility. Following prior studies [23, 32], we use clean samples to measure activations in the final layer, rank channels by the number of activated neurons, and zero out the least-active 1%, 5%, and 10% of channels. We report the results in Table 14. Overall, pruning is ineffective: ASRs remain close to the no-defense baseline across all pruning ratios. This aligns with the results of prior studies [23, 32]: removing a subset of less important neurons from the last layer does not reliably mitigate segmentation backdoors.
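The rank-and-zero step of this pruning defense can be sketched as follows; this is a simplified numpy illustration under our own naming, not the exact implementation used in the benchmark:

```python
import numpy as np

def prune_least_active(weights, clean_activations, ratio):
    """Zero out the least-active output channels of a layer.

    `clean_activations` has shape (num_samples, num_channels, H, W) and is
    collected on clean inputs. Channels are ranked by how many of their
    activations are positive, and the bottom `ratio` fraction of channels
    is zeroed in a copy of `weights` (first axis = output channel).
    """
    counts = (clean_activations > 0).sum(axis=(0, 2, 3))
    k = int(len(counts) * ratio)
    pruned = weights.copy()
    if k > 0:
        idx = np.argsort(counts)[:k]  # indices of the least-active channels
        pruned[idx] = 0
    return pruned
```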
This might stem from the fact that, in segmentation, trigger features are distributed across spatial locations and multiple channels, making them resilient to the channel removal.

ABL [26] aims to separate poisoned samples from clean ones using sample training losses, and then retrain a model with the remaining data. The process is applied progressively, weakening the spurious correlation between the trigger and the target label.

Table 16: Backdoor detection results with TeCo. n-std indicates different detection thresholds.

        1-std            2-std            3-std
Attack  Recall  AUC      Recall  AUC      Recall  AUC
HBA     0.2521  0.5446   0.0652  0.5169   0.0000  0.5000
OFBA    0.2369  0.5574   0.0630  0.5074   0.0000  0.5000
IBA     0.3112  0.5765   0.0311  0.5117   0.0000  0.5000
O2O     0.1739  0.5018   0.0478  0.5026   0.0000  0.5000
O2B     0.3826  0.6098   0.0760  0.5242   0.0000  0.5000
B2O     0.1183  0.4829   0.0377  0.4965   0.0000  0.5000
B2B     0.1445  0.4885   0.0271  0.4971   0.0000  0.5000
INS     0.2134  0.5312   0.0589  0.5098   0.0000  0.5000
CON     0.2687  0.5523   0.0698  0.5187   0.0000  0.5000

Table 15a reports the ASR after ABL. The results show that ABL can mitigate simpler attacks such as HBA (reducing ASR to 15.89%). However, it is ineffective against the other attack variants. In particular, for the proposed attacks (e.g., B2B and B2O), ASR remains alarmingly high (above 89%). Moreover, the low PBA and CBA scores (0.50–0.58) suggest a notable degradation in clean-model utility. We further explore the loss landscape and plot Kernel Density Estimation (KDE) curves in Figures 5a to 5c. The results show extensive distribution overlaps between clean and poisoned samples, indicating that ABL's loss-based detection is insufficient to reliably identify poisoned samples under our attacks.

STRIP [16] detects poisoned samples by measuring model prediction consistency under input perturbations.
The triggered samples are expected to maintain stable (low-entropy) outputs, while clean samples exhibit more variable (high-entropy) predictions. To enable STRIP for semantic segmentation, we first compute per-pixel entropy maps and then aggregate them into an image-level score for detection. Table 15b presents the detection results. Overall, STRIP shows ineffective detection results. Across all attacks, the AUC scores hover around 0.5 (ranging from 0.42 to 0.51), comparable to random guessing. Accordingly, Recall rates are negligible, remaining consistently below 11%, including O2O, which achieves the highest recall among the evaluated cases. As shown in the KDE plots in Figures 5d to 5f, the score distributions of clean and poisoned samples are nearly identical. This suggests that, in semantic segmentation, the perturbation-based signal in STRIP becomes too weak to distinguish between clean and poisoned samples.

TeCo [30] identifies poisoned samples based on the assumption that clean images exhibit consistent robustness under diverse image corruptions. To enable TeCo for segmentation, we apply a set of 15 image corruption operations to each input and calculate the mIoU for the clean image and its corrupted variant. A sample is flagged as suspicious if its mIoU drop exceeds a predefined threshold. Table 16 reports the detection results under thresholds of µ + n·σ, where µ and σ denote the mean and standard deviation of the scores, respectively. Overall, TeCo is ineffective for detecting segmentation backdoors. Even with the most sensitive setting (1-std), AUC stays

Table 17: Backdoor detection results with Beatrix.
        Main Class (e.g., road)        Selected Class (car)
Attack  ACC    Recall  F1      AUC     ACC    Recall  F1      AUC
HBA     0.500  0.00    0.0000  0.500   0.500  0.01    0.0196  0.500
OFBA    0.500  0.06    0.1071  0.500   0.505  0.01    0.0198  0.505
IBA     0.515  0.10    0.1709  0.515   0.530  0.20    0.2985  0.530
O2O     0.525  0.08    0.1441  0.525   0.500  0.00    0.0000  0.500
O2B     0.505  0.01    0.0198  0.505   0.500  0.00    0.0000  0.500
B2O     0.515  0.04    0.0762  0.515   0.515  0.11    0.1849  0.515
B2B     0.510  0.06    0.1091  0.510   0.655  0.55    0.6145  0.655
INS     0.518  0.07    0.1251  0.518   0.510  0.06    0.1091  0.510
CON     0.512  0.05    0.0943  0.512   0.520  0.12    0.1978  0.520

close to 0.5 (i.e., random guessing), and Recall is typically below 0.4. To further investigate the results, we plot the KDE distributions of detection scores in Figures 5g to 5i. Consistent with ABL and STRIP, the heavy overlap between clean and poisoned distributions suggests that TeCo cannot reliably distinguish between backdoored and clean samples.

Beatrix [38] detects poisoned samples by measuring Gramian statistics of internal feature maps, assuming that triggers induce statistically abnormal scores compared to clean inputs. To enable Beatrix for segmentation, we assign each image a main class: the class that occupies the largest pixel area in the image. We then compute the score over feature maps for that class and apply Beatrix to identify poisoned samples. As shown in Table 17, this scheme results in near-random performance, with Recall approaching zero and AUC staying around 0.5 across all attacks. We further evaluate with a selected class (e.g., car) to calculate the detection score. This leads to only minor gains for most attacks, but improves B2B more noticeably. We attribute the improvement to its background-to-background setting, which dominates the pixel distribution in images. Therefore, the attack induces a stronger shift in global feature statistics, making Gramian statistics slightly more separable.
However, even in this case, Recall only reaches 0.55, suggesting that Beatrix remains unreliable for detecting poisoned samples in segmentation.

Summary. We benchmark both existing and our proposed segmentation backdoor attacks against six representative defenses and find that: (1) Backdoored segmentation models are often resistant to post-training defenses (e.g., fine-tuning). This robustness likely stems from the feature entanglement in dense prediction, which allows backdoor signals to persist despite parameter perturbations. (2) Existing poisoned-sample detection methods prove ineffective for segmentation. This is because segmentation attacks typically modify labels locally at the pixel level rather than flipping labels globally; they do not induce global statistical anomalies significant enough for reliable detection. The effective mitigation of segmentation backdoors remains largely unresolved.

Table 18: Attack results with Transformers on BDD100K.

Attack  Model   ASR↑    PBA↑    CBA↑
O2O     ViT-B   0.9189  0.6123  0.6287
        DeiT-S  0.9145  0.6051  0.6198
        Swin-T  0.9378  0.6115  0.6241
O2B     ViT-B   0.9247  0.6245  0.6334
        DeiT-S  0.9039  0.6198  0.6256
        Swin-T  0.9486  0.6232  0.6239
B2O     ViT-B   0.9712  0.6189  0.6298
        DeiT-S  0.9678  0.6134  0.6221
        Swin-T  0.9823  0.6172  0.6243
B2B     ViT-B   0.9187  0.6234  0.6312
        DeiT-S  0.9023  0.6198  0.6245
        Swin-T  0.9334  0.6295  0.6267
INS     ViT-B   0.8934  0.6378  0.6489
        DeiT-S  0.8867  0.6345  0.6412
        Swin-T  0.9098  0.6434  0.6441
CON     ViT-B   0.9067  0.6312  0.6378
        DeiT-S  0.9012  0.6267  0.6289
        Swin-T  0.9221  0.6298  0.6323

Table 19: Attack results for SAM [42].

        Mask-Distortion  Mask-Erasure  Mask-Injection
ASR     0.9134           0.9082        0.9166

8 Attacking Emerging Architectures

To answer RQ3, we extend our evaluation to recent segmentation architectures, including Transformer-based architectures and the Segment Anything Model (SAM).
Attacking Transformer-Based Models

We attack Transformers with the experimental settings detailed in Section 6.1, including the optimized trigger parameters and selected victim–target pairs. Table 18 summarizes the results on BDD100K for ViT-B [11], DeiT-S [47], and Swin-T [33] (all structured with a UPerNet head). Across all settings, these architectures remain vulnerable to backdoor attacks. Specifically, Swin-T achieves attack performance comparable to ConvNeXt-T, as they share similar feature extraction architectures. ViT-B and DeiT-S exhibit slightly lower ASRs, but still remain clearly susceptible to backdoor attacks. Meanwhile, PBAs and CBAs consistently stay above 0.6, indicating that the attacks preserve model utility on clean inputs. More details are presented in Section F.

Attacking Segment Anything Model (SAM)

Motivation. The Segment Anything Model (SAM) [22, 42] introduces a promptable framework for image segmentation. This enables users to generate high-quality masks with simple inputs, such as points or bounding boxes. Trained on massive-scale datasets, SAM is capable of zero-shot segmentation on unseen images without additional fine-tuning.

Unlike conventional segmentation models, SAM adopts a distinct architecture and is trained on large-scale datasets. Instead of predicting a fixed set of semantic labels, SAM learns to segment objects in a category-agnostic manner: given a prompt, it predicts the corresponding object mask without assigning a semantic class label. This promptable interface enables flexible interaction, allowing users to segment arbitrary objects as long as they are visually distinguishable in the image.

Proposed Attacks. To deploy backdoor attacks against SAM, we adapt BADSEG to match SAM's prompt-conditioned mask prediction. Instead of inducing label misclassification, our proposed attacks directly manipulate the predicted masks.
Specifically, we consider three attack vectors tailored to SAM: (1) Mask-Distortion, where the trigger alters the shape, location, or extent of the predicted mask; (2) Mask-Erasure, where the trigger erases the output, causing the mask to vanish entirely; and (3) Mask-Injection, where the trigger fabricates spurious masks in regions where no object exists.

As for the detailed attack procedure: although BADSEG is formulated for label misclassification, its trigger optimization can be adapted to mask manipulation in SAM. We therefore propose BADSEG-SAM, which uses the first two stages of BADSEG to optimize trigger parameters for mask manipulation. For each attack vector, BADSEG-SAM optimizes the trigger parameters using a surrogate model, and then modifies the target masks to match the objective of that attack vector. This approach allows us to effectively launch all three proposed attack vectors on SAM.

Attack Results. We evaluate the robustness of SAM under the three proposed attacks on the LabPicsV1 dataset [13]. Table 19 shows the results. Overall, SAM is highly vulnerable: all three attack vectors yield high ASRs, indicating that backdoor triggers can consistently manipulate prompt-conditioned mask predictions. Notably, even the weakest case (Mask-Erasure) attains an ASR of 0.9082, indicating that large-scale segmentation models remain vulnerable to our attacks.

9 Related Work

In this section, we first review backdoor attacks in general, summarize existing work on backdoors in semantic segmentation, and then introduce backdoor defense strategies.

Backdoor Attacks. Backdoor attacks were first introduced in image classification [18], where training samples were manipulated by injecting a patch trigger and relabeling them to a target class. Subsequent work diversified trigger designs, such as cartoon blending [8], invisible perturbations [62], and reflections [31].
Beyond image classification, backdoor attacks have also been explored in diverse settings, such as federated learning [51], natural language processing [7], and object detection [3].

Backdoor Attacks on Semantic Segmentation. Several studies have demonstrated the feasibility of backdoor attacks in semantic segmentation with different trigger designs, such as black line triggers in Hidden Backdoor Attacks (HBA) [28], grid-shaped triggers in Object-Free Backdoor Attacks (OFBA) [40], and a "Hello-Kitty" logo in Influencer Backdoor Attacks (IBA) [23]. While these works establish initial segmentation backdoor studies, they are restricted to narrow settings. More recently, research has explored adversarial attacks on the Segment Anything Model (SAM) [63]. In contrast to these existing studies, we revisit segmentation backdoor attacks through a broader lens, examining potential threats, trigger designs, defense benchmarking, and emerging architectures.

Backdoor Defenses. Various defenses have been proposed to mitigate backdoor attacks. One line of work applies post-training interventions to suppress backdoor behavior [6, 15, 26, 32, 43]. Another line focuses on detecting or neutralizing backdoors in already trained models [16, 27, 30, 38, 61]. While these approaches have shown strong results for image classification, their effectiveness for semantic segmentation remains largely unexplored. In this work, we benchmark existing and the proposed attacks against six representative defenses, and find that they offer limited protection against segmentation backdoor attacks. Additional discussion is presented in Section H.

10 Conclusion

This work presents the first comprehensive study of backdoor attacks on semantic segmentation. We revisit the threats and identify four coarse-grained attacks and two fine-grained attacks, exposing previously overlooked backdoor vulnerabilities.
To launch these attacks, we propose BADSEG, a unified framework that integrates trigger parameter optimization and label manipulation strategies. Extensive experiments across multiple architectures and datasets demonstrate that BADSEG achieves consistently high attack success rates while preserving model utility. Furthermore, we benchmark existing attacks against six representative defenses, revealing that these defenses provide limited protection against the proposed attacks. Finally, we show that these vulnerabilities persist in recent emerging architectures, including transformers and SAM. Our results expose overlooked threats in segmentation, motivate the development of efficient backdoor defenses, and establish a practical benchmark for developing more secure segmentation models.

References

[1] Fares Bougourzi and Abdenour Hadid. Recent Advances in Medical Imaging Segmentation: A Survey. arXiv preprint arXiv:2505.09274, 2025.

[2] Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. COCO-Stuff: Thing and Stuff Classes in Context. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1209–1218, 2018.

[3] Shih-Han Chan, Yinpeng Dong, Jun Zhu, Xiaolu Zhang, and Jun Zhou. BadDet: Backdoor Attacks on Object Detection. In Computer Vision – ECCV 2022 Workshops, pages 396–412, 2022.

[4] Kangjie Chen, Xiaoxuan Lou, Guowen Xu, Jiwei Li, and Tianwei Zhang. Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only. In The Eleventh International Conference on Learning Representations, 2023.

[5] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision – ECCV 2018, pages 833–851, 2018.

[6] Weixin Chen, Baoyuan Wu, and Haoqian Wang. Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples.
Advances in Neural Information Processing Systems, 35:9727–9737, 2022.

[7] Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu, and Yang Zhang. BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements. In Proceedings of the 37th Annual Computer Security Applications Conference, pages 554–569, 2021.

[8] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv preprint arXiv:1712.05526, 2017.

[9] Siyuan Cheng, Guangyu Shen, Guanhong Tao, Kaiyuan Zhang, Zhuo Zhang, Shengwei An, Xiangzhe Xu, Yingqi Li, Shiqing Ma, and Xiangyu Zhang. OdScan: Backdoor Scanning for Object Detection Models. In 2024 IEEE Symposium on Security and Privacy (SP), pages 1703–1721, 2024.

[10] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3213–3223, 2016.

[11] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations, 2021.

[12] Qiuyu Duan, Zhongyun Hua, Qing Liao, Yushu Zhang, and Leo Yu Zhang. Conditional Backdoor Attack via JPEG Compression. Proceedings of the AAAI Conference on Artificial Intelligence, 38:19823–19831, 2024.

[13] Sagi Eppel, Haoping Xu, Mor Bismuth, and Alan Aspuru-Guzik. Computer Vision for Recognition of Materials and Vessels in Chemistry Lab Settings and the Vector-LabPics Data Set. ACS Central Science, 6:1743–1752, 2020.
[14] Di Feng, Christian Haase-Schütz, Lars Rosenbaum, Heinz Hertlein, Claudius Gläser, Fabian Timm, Werner Wiesbeck, and Klaus Dietmayer. Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges. IEEE Transactions on Intelligent Transportation Systems, 22:1341–1360, 2021.

[15] Kuofeng Gao, Yang Bai, Jindong Gu, Yong Yang, and Shu-Tao Xia. Backdoor Defense via Adaptively Splitting Poisoned Dataset. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4005–4014, 2023.

[16] Yansong Gao, Change Xu, Derui Wang, Shiping Chen, Damith C. Ranasinghe, and Surya Nepal. STRIP: A Defence against Trojan Attacks on Deep Neural Networks. In Proceedings of the 35th Annual Computer Security Applications Conference, pages 113–125, 2019.

[17] Junyi Gu, Mauro Bellone, Tomáš Pivoňka, and Raivo Sell. CLFT: Camera-LiDAR Fusion Transformer for Semantic Segmentation in Autonomous Driving. IEEE Transactions on Intelligent Vehicles, pages 1–12, 2024.

[18] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv preprint arXiv:1708.06733, 2019.

[19] Ye Huang, Di Kang, Liang Chen, Xuefei Zhe, Wenjing Jia, Linchao Bao, and Xiangjian He. CAR: Class-Aware Regularizations for Semantic Segmentation. In Computer Vision – ECCV 2022, pages 518–534, 2022.

[20] Eric Jang, Shixiang Gu, and Ben Poole. Categorical Reparameterization with Gumbel-Softmax. In International Conference on Learning Representations, 2017.

[21] Wenbo Jiang, Hongwei Li, Guowen Xu, and Tianwei Zhang. Color Backdoor: A Robust Poisoning Attack in Color Space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8133–8142, 2023.
[22] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment Anything. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3992–4003, 2023.

[23] Haoheng Lan, Jindong Gu, Philip Torr, and Hengshuang Zhao. Influencer Backdoor Attack on Semantic Segmentation. In The Twelfth International Conference on Learning Representations, 2023.

[24] Jiahe Lan, Jie Wang, Baochen Yan, Zheng Yan, and Elisa Bertino. FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge. In 2024 IEEE Symposium on Security and Privacy (SP), pages 1646–1664, 2024.

[25] Harry Langford, Ilia Shumailov, Yiren Zhao, Robert Mullins, and Nicolas Papernot. Architectural Neural Backdoors from First Principles. In 2025 IEEE Symposium on Security and Privacy (SP), pages 1657–1675, 2025.

[26] Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. Anti-Backdoor Learning: Training Clean Models on Poisoned Data. In Advances in Neural Information Processing Systems, volume 34, pages 14900–14912, 2021.

[27] Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks. In International Conference on Learning Representations, 2021.

[28] Yiming Li, Yanjie Li, Yalei Lv, Yong Jiang, and Shu-Tao Xia. Hidden Backdoor Attack against Semantic Segmentation Models. arXiv preprint arXiv:2103.04038, 2021.

[29] Hongbin Liu, Michael K. Reiter, and Neil Zhenqiang Gong. Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models. In 33rd USENIX Security Symposium (USENIX Security 24), pages 2919–2936, 2024.

[30] Xiaogeng Liu, Minghui Li, Haoyu Wang, Shengshan Hu, Dengpan Ye, Hai Jin, Libing Wu, and Chaowei Xiao.
Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16363–16372, 2023.

[31] Yunfei Liu, Xingjun Ma, James Bailey, and Feng Lu. Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks. In Computer Vision – ECCV 2020, pages 182–199, 2020.

[32] Yuntao Liu, Yang Xie, and Ankur Srivastava. Neural Trojans. In 2017 IEEE International Conference on Computer Design (ICCD), pages 45–48, 2017.

[33] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9992–10002, 2021.

[34] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A ConvNet for the 2020s. In 2022 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[35] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3431–3440, 2015.

[36] Jialin Lu, Junjie Shan, Ziqi Zhao, and Ka-Ho Chow. AnywhereDoor: Multi-Target Backdoor Attacks on Object Detection. arXiv preprint, 2024.

[37] Hua Ma, Shang Wang, Yansong Gao, Zhi Zhang, Huming Qiu, Minhui Xue, Alsharif Abuadbba, Anmin Fu, Surya Nepal, and Derek Abbott. Watch Out! Simple Horizontal Class Backdoor Can Trivially Evade Defense. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, pages 4465–4479, 2024.

[38] Wanlun Ma, Derui Wang, Ruoxi Sun, Minhui Xue, Sheng Wen, and Yang Xiang. The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices.
In Proceedings 2023 Network and Distributed System Security Symposium, 2023.

[39] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. In International Conference on Learning Representations, 2017.

[40] Jiaoze Mao, Yaguan Qian, Jianchang Huang, Zejie Lian, Renhui Tao, Bin Wang, Wei Wang, and Tengteng Yao. Object-Free Backdoor Attack and Defense on Semantic Segmentation. Computers & Security, 132:103365, 2023.

[41] Shervin Minaee, Yuri Y. Boykov, Fatih Porikli, Antonio J. Plaza, Nasser Kehtarnavaz, and Demetri Terzopoulos. Image Segmentation Using Deep Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1, 2021.

[42] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment Anything in Images and Videos. arXiv preprint arXiv:2408.00714, 2024.

[43] Zeyang Sha, Xinlei He, Pascal Berrang, Mathias Humbert, and Yang Zhang. Fine-Tuning Is All You Need to Mitigate Backdoor Attacks. arXiv preprint arXiv:2212.09067, 2022.

[44] Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks. In Advances in Neural Information Processing Systems, volume 31, 2018.

[45] Mennatullah Siam, Mostafa Gamal, Moemen Abdel-Razek, Senthil Yogamani, Martin Jagersand, and Hong Zhang. A Comparative Study of Real-Time Semantic Segmentation for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 587–597, 2018.
[46] Guanhong Tao, Yingqi Liu, Guangyu Shen, Qiuling Xu, Shengwei An, Zhuo Zhang, and Xiangyu Zhang. Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Security. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1372–1389, 2022.

[47] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training Data-Efficient Image Transformers & Distillation through Attention. In Proceedings of the 38th International Conference on Machine Learning, pages 10347–10357, 2021.

[48] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In 2019 IEEE Symposium on Security and Privacy (SP), pages 707–723, 2019.

[49] Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, Dacheng Tao, and Liangpei Zhang. SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model. Advances in Neural Information Processing Systems, 36:8815–8827, 2023.

[50] Hang Wang, Zhen Xiang, David J. Miller, and George Kesidis. MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic. In 2024 IEEE Symposium on Security and Privacy (SP), pages 1994–2012, 2024.

[51] Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, and Dimitris Papailiopoulos. Attack of the Tails: Yes, You Really Can Backdoor Federated Learning. In Advances in Neural Information Processing Systems, volume 33, pages 16070–16084, 2020.

[52] Zeyu Wang, Yutong Bai, Yuyin Zhou, and Cihang Xie. Can CNNs Be More Robust Than Transformers? In The Eleventh International Conference on Learning Representations, 2023.

[53] Michał Wieczorek, Jakub Siłka, Katarzyna Wiltos, and Marcin Woźniak.
Transformer Based Semantic Segmentation Network for Medical Imaging Application. In Artificial Intelligence and Soft Computing, pages 380–389, 2025.

[54] Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. Unified Perceptual Parsing for Scene Understanding. In Computer Vision – ECCV 2018, pages 432–448, 2018.

[55] Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2633–2642, 2020.

[56] Yuhui Yuan, Xilin Chen, and Jingdong Wang. Object-Contextual Representations for Semantic Segmentation. In Computer Vision – ECCV 2020, pages 173–190, 2020.

[57] Yi Zeng, Minzhou Pan, Hoang Anh Just, Lingjuan Lyu, Meikang Qiu, and Ruoxi Jia. Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 771–785, 2023.

[58] Fan Zhang, Yanqin Chen, Zhihang Li, Zhibin Hong, Jingtuo Liu, Feifei Ma, Junyu Han, and Errui Ding. ACFNet: Attentional Class Feature Network for Semantic Segmentation. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6797–6806, 2019.

[59] Guangsheng Zhang, Bo Liu, Huan Tian, Tianqing Zhu, Ming Ding, and Wanlei Zhou. How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers. In 33rd USENIX Security Symposium (USENIX Security 24), 2024.

[60] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid Scene Parsing Network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 6230–6239, 2017.

[61] Runkai Zheng, Rongjun Tang, Jianze Li, and Li Liu.
Data-Free Backdoor Removal Based on Channel Lipschitzness. In Computer Vision – ECCV 2022, pages 175–191, 2022.

[62] Haoti Zhong, Cong Liao, Anna Cinzia Squicciarini, Sencun Zhu, and David Miller. Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation. In Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, pages 97–108, 2020.

[63] Ziqi Zhou, Yufei Song, Minghui Li, Shengshan Hu, Xianlong Wang, Leo Yu Zhang, Dezhong Yao, and Hai Jin. DarkSAM: Fooling Segment Anything Model to Segment Nothing. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

Table 20: List of candidate options for trigger parameters.

Attack  Attribute  Options
O2O     Shape      circle, square, triangle, batman logo
        Size       1/12, 1/10, 1/8, 1/6, 1/4, 1/2
        Position   object center, random on object, random outside object
        Quantity   1, 2, 3, 4, 5
        Intensity  0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
O2B     Shape      circle, square, triangle, batman logo
        Size       1/12, 1/10, 1/8, 1/6, 1/4, 1/2
        Position   object center, random on object, random outside object
        Quantity   1, 2, 3, 4, 5
        Intensity  0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
B2O     Shape      circle, square, triangle, batman logo
        Size       1/12, 1/10, 1/8, 1/6, 1/4, 1/2
        Position   –
        Quantity   1, 2, 3, 4, 5
        Intensity  0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
B2B     Shape      circle, square, triangle, batman logo
        Size       0.005, 0.010, 0.015, 0.020, 0.025
        Position   –
        Quantity   1, 3, 5, 7, 10
        Intensity  0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8

A Additional Preliminaries

Semantic Segmentation Pipeline. Semantic segmentation can be used in many applications, such as autonomous driving [14, 17, 45], medical imaging [1, 53], and remote sensing [49]. We take autonomous driving as a representative application scenario to explain the semantic segmentation pipeline.
In autonomous driving, semantic segmentation enables the perception of complex urban environments by identifying critical elements, such as drivable surfaces, pedestrians, and vehicles, from camera images. The pipeline involves acquiring data from public datasets with pixel-level annotations, followed by preprocessing and augmentation to enhance robustness. A deep learning model is trained using cross-entropy loss to predict segmentation maps that assign semantic labels to every pixel. These maps integrate into the vehicle's perception and planning systems, enabling safe navigation by detecting drivable areas and obstacles. Attackers can exploit this pipeline by injecting backdoor triggers during data acquisition, creating backdoored models that compromise the vehicle's perception and planning capabilities.

B Additional Details in BADSEG

Discussion on candidate trigger options. Table 20 summarizes the discrete search space for trigger parameters in Stage 2. Our candidates are selected to strike a balance between attack effectiveness and stealthiness. The search spaces are shared across attack vectors, with minor adjustments when the threat changes. For background-oriented attacks (B2O/B2B), the trigger is positioned relative to the specified background/victim region; thus, the object-centric Position options used in O2O/O2B are not applicable (marked as "–").

Details for optimization of discrete parameters. We employ the Gumbel-Softmax reparameterization [20, 39]. We maintain a categorical distribution for each discrete parameter $\lambda_p$. Specifically, let $V_p = \{v_i^{(p)}\}_{i=1}^{m_p}$ denote its candidate set and let $\xi^{(p)} = (\xi_1^{(p)}, \ldots, \xi_{m_p}^{(p)})$ be the associated categorical probabilities. Using the Gumbel-Softmax reparameterization, we draw i.i.d. Gumbel noise $G_i^{(p)}$ and obtain a differentiable, near one-hot sample $\eta^{(p)}$ as shown in Equation (2).
We then map $\eta^{(p)}$ to a relaxed parameter value via a weighted combination of candidates:

$$\tilde{\lambda}_p = \sum_{i=1}^{m_p} \eta_i^{(p)} v_i^{(p)}, \qquad \tilde{\lambda} = (\tilde{\lambda}_1, \ldots, \tilde{\lambda}_k). \tag{4}$$

Constructing $\delta = F(\tilde{\lambda})$ makes the trigger differentiable with respect to $\{\xi^{(p)}\}_{p=1}^{k}$. Accordingly, we optimize:

$$\min_{\{\xi^{(p)}\}_{p=1}^{k}} \; \mathbb{E}_{\{G^{(p)}\}} \Bigg[ \sum_{(x,y) \in D_s} \mathcal{L}\Big( S\big( T(x, F(\tilde{\lambda})) \big), \, y_t \Big) \Bigg]. \tag{5}$$

$\tilde{\lambda}$ is computed from $\{\xi^{(p)}\}$, so the objective is differentiable w.r.t. $\{\xi^{(p)}\}$. $\mathbb{E}_{\{G^{(p)}\}}$ denotes the expectation over the random Gumbel noises used to sample $\eta^{(p)}$ for each discrete trigger parameter in the Gumbel-Softmax relaxation (approximated in practice by drawing one sample per iteration).

After optimization, we discretize each parameter by selecting the most likely option $\hat{\lambda}_p = v^{(p)}_{\arg\max_i \xi_i^{(p)}}$ and construct the final trigger $\delta = F(\hat{\lambda})$.

C Additional Experimental Setup

Datasets. We evaluate BADSEG on two widely used autonomous driving benchmarks. BDD100K [55] contains 100,000 images under diverse driving conditions (time of day, weather, and scene types), of which 10,000 are annotated for semantic segmentation. Cityscapes [10] consists of 5,000 finely annotated images collected from 50 cities, with 2,975 for training, 500 for validation, and 1,525 for testing. We primarily conduct our experiments on BDD100K, as it is the more complex dataset with diverse driving scenes and illumination conditions. Evaluating BADSEG on BDD100K enables us to assess its robustness under challenging real-world scenarios. Results on Cityscapes are also reported to verify that our findings transfer across datasets with different scales and collection environments. In Section 8, we evaluate BADSEG-SAM on LabPicsV1 [13]. Vector-LabPics V1 contains 2,187 images of chemical experiments featuring materials in mostly transparent vessels across diverse laboratory scenes and everyday conditions (e.g., beverage handling).
We choose this dataset for SAM evaluation because it provides region-level annotations for each material phase along with its type, enabling us to assess SAM's ability to produce fine-grained, detailed segmentations.

Models. We consider three representative semantic segmentation architectures. PSPNet [60] employs a pyramid pooling module to aggregate multi-scale contextual information, standing as one of the earliest baselines in semantic segmentation. DeepLabV3 [5] integrates atrous spatial pyramid pooling for enhanced contextual reasoning. Both utilize ResNet backbones that are pre-trained on ImageNet. ConvNeXt-T [34], pre-trained on ImageNet-22K, modernizes ConvNet design by incorporating architectural refinements inspired by transformers while retaining convolutional efficiency. We conduct most experiments using ConvNeXt-T, for it is a modern architecture with high efficiency, making it a strong baseline for semantic segmentation.

In Section 8, we consider three Transformer-based emerging architectures, including ViT-B [11], DeiT-S [47], and Swin-T [33] (all structured with a UPerNet head). ViT-B [11] is a Vision Transformer that tokenizes an image into fixed-size patches and models global context via full self-attention, offering strong representation capacity. DeiT-S [47] follows the same transformer formulation as ViT but improves data efficiency through knowledge distillation during training, making it a lightweight yet competitive alternative under limited data. Swin-T [33] adopts a hierarchical design with window-based self-attention and shifted windows, yielding multi-scale features that better match the locality and scale variation required by segmentation while keeping attention computation tractable. These transformer-based models are widely adopted in prior work and thus serve as representative baselines in our evaluation.
We also consider emerging architectures such as SAM. Specifically, we use sam2-hiera-T [42] in our evaluation. sam2-hiera-T is a lightweight SAM 2 variant that adopts a hierarchical transformer backbone ("Hiera"), enabling strong mask prediction with improved efficiency and making it a practical representative of recent segment-anything-style models.

Evaluation Metrics. Following prior backdoor evaluations in semantic segmentation [23, 40], we assess attacks using the following metrics:

• Attack Success Rate (ASR). ASR measures how often victim pixels are successfully mis-segmented to the specified target class under the poisoned test set: ASR = N_success / N_victim, where N_victim is the number of victim pixels and N_success is the subset predicted as the target class. N_victim is attack-dependent: for B2O it is restricted to the intended appearing region; for Instance-Level attacks it is restricted to the targeted instance; otherwise it includes all pixels of the victim class.

• Poisoned Benign Accuracy (PBA). PBA captures segmentation utility on non-victim pixels in the poisoned test set. We compute it as the mIoU between predictions and ground truth after masking out victim pixels.

• Clean Benign Accuracy (CBA). CBA measures standard segmentation utility on clean data, defined as the mIoU on the unmodified test set. An effective backdoor should keep CBA close to that of a cleanly trained model, indicating minimal impact on normal functionality.

We measure attack stealthiness using three standard image-similarity metrics computed between clean and poisoned images.

• Peak Signal-to-Noise Ratio (PSNR) summarizes pixel-level distortion, where higher values indicate smaller perturbations.

• Structural Similarity Index (SSIM) measures similarity in luminance, contrast, and structure, with higher scores indicating better perceptual alignment.
• Learned Perceptual Image Patch Similarity (LPIPS) uses deep features to approximate human perceptual distance; lower LPIPS means the poisoned image is more perceptually similar to the clean one.

D Additional Experiments

Additional Results on the Impact of Surrogate Models. Table 21 lists the top 20 closest class pairs by normalized distance for DeepLabV3 and PSPNet. These results align closely with ConvNeXt-T, as our preferred class pairs consistently appear in the top 20 across all models. This demonstrates that optimized class pair selection remains robust regardless of the surrogate model used.

Additional Results on the Impact of Victim–Target Classes. Table 22 provides additional information for attack performance under different victim/target class configurations. Table 22a evaluates O2O Attacks with a fixed victim class (car) and varying target classes. Attack effectiveness remains consistently high, with over 91% of poisoned samples successfully mis-segmented into target classes, while model utility remains stable. This demonstrates that once a victim class is compromised, the backdoor generalizes effectively across multiple target classes without degrading model utility. Tables 22b and 22c report additional model-utility results, showing that varying the victim or target class mainly affects backdoor activation, while leaving clean-data performance essentially unchanged.

Table 21: Top 20 closest class pairs by normalized distance with different models (complementary to Table 7).

(a) Results with DeepLabV3.
Rank  Class Pair                  Distance  Suitable Attacks
1     (Building, Vegetation)      0.1087    B2B
2     (Building, Traffic Sign)    0.1187    B2B
3     (Vegetation, Traffic Sign)  0.1414    B2B
4     (Car, Sidewalk)             0.1511    O2B, B2O
5     (Sidewalk, Road)            0.1570    B2B
6     (Sidewalk, Person)          0.1580    O2B, B2O
7     (Road, Car)                 0.1591    O2B, B2O
8     (Sidewalk, Terrain)         0.1760    B2B
9     (Fence, Building)           0.1815    B2B
10    (Pole, Terrain)             0.1837    B2B
11    (Person, Car)               0.1875    O2O
12    (Building, Pole)            0.2015    B2B
13    (Traffic Sign, Fence)       0.2100    B2B
14    (Person, Terrain)           0.2112    O2B, B2O
15    (Pole, Traffic Sign)        0.2125    B2B
16    (Road, Person)              0.2232    O2B, B2O
17    (Person, Fence)             0.2269    O2B, B2O
18    (Vegetation, Pole)          0.2277    B2B
19    (Sidewalk, Pole)            0.2407    B2B
20    (Traffic Light, Building)   0.2412    B2B

(b) Results with PSPNet.

Rank  Class Pair                  Distance  Suitable Attacks
1     (Building, Vegetation)      0.1116    B2B
2     (Building, Traffic Sign)    0.1207    B2B
3     (Vegetation, Traffic Sign)  0.1408    B2B
4     (Person, Sidewalk)          0.1513    O2B, B2O
5     (Car, Sidewalk)             0.1523    O2B, B2O
6     (Car, Road)                 0.1563    O2B, B2O
7     (Sidewalk, Road)            0.1569    B2B
8     (Sidewalk, Terrain)         0.1721    B2B
9     (Fence, Building)           0.1758    B2B
10    (Car, Person)               0.1759    O2O
11    (Pole, Terrain)             0.1814    B2B
12    (Pole, Building)            0.1964    B2B
13    (Terrain, Person)           0.2059    O2B, B2O
14    (Road, Person)              0.2078    O2B, B2O
15    (Fence, Traffic Sign)       0.2082    B2B
16    (Traffic Sign, Pole)        0.2092    B2B
17    (Fence, Person)             0.2150    O2B, B2O
18    (Pole, Vegetation)          0.2240    B2B
19    (Sidewalk, Pole)            0.2251    B2B
20    (Pole, Car)                 0.2268    O2B, B2O

Impact of Poisoning Rate. Table 23 studies the effect of the poisoning rate (the percentage of training samples stamped with triggers) on attack performance. ASR increases monotonically as the poisoning rate grows. A similar trend is observed across all attack vectors, with B2O Attacks consistently reaching the highest ASRs.
Notably, our attacks remain highly effective even at a low poisoning rate (0.05), whereas prior work typically relied on rates of 0.1 to 0.2 to obtain practical backdoor performance [23, 40]. This highlights the efficiency of our poisoning strategy.

Comparison with Prior Work. We compare against three representative baselines: HBA [28], OFBA [40], and IBA [23]. In our taxonomy, all three methods fall under the category of O2B Attacks. We re-implement these baselines under consistent hyperparameter settings to ensure fair comparison and evaluate them using ConvNeXt-T on BDD100K. The results in Table 24 show that our proposed attacks consistently outperform prior approaches, confirming the effectiveness of our framework.

Table 22: Attack performance under different victim–target class configurations (fixed victim/target) for various attacks (complementary to Table 11).

(a) O2O: fixed victim (car), varying target (objects).

Victim  Target       ASR↑    PBA↑    CBA↑
car     person       0.9352  0.6090  0.6265
        rider        0.9163  0.6102  0.6316
        truck        0.9307  0.6310  0.6369
        bus          0.9184  0.6223  0.6382
        train        0.9281  0.6366  0.6367
        motorcycle   0.9264  0.6006  0.6310
        bicycle      0.9303  0.5874  0.6227

(b) O2B: fixed victim (car), varying target (stuff).

Victim  Target         ASR↑    PBA↑    CBA↑
car     road           0.9428  0.6283  0.6310
        sidewalk       0.9280  0.6214  0.6356
        building       0.9288  0.6197  0.6274
        wall           0.9058  0.6245  0.6195
        fence          0.9083  0.6228  0.6261
        pole           0.9187  0.6206  0.6280
        traffic light  0.9246  0.6116  0.6387
        traffic sign   0.9275  0.6069  0.6330
        vegetation     0.9427  0.6183  0.6147
        terrain        0.9154  0.6219  0.6281
        sky            0.9250  0.6374  0.6421

(c) O2B: fixed target (road), varying victim (objects).

Victim      Target  ASR↑    PBA↑    CBA↑
person      road    0.7498  0.6422  0.6075
rider               0.4054  0.6516  0.6151
car                 0.9428  0.6283  0.6310
truck               0.7264  0.6315  0.6147
bus                 0.7850  0.6216  0.6163
train               0.0143  0.6530  0.6185
motorcycle          0.4051  0.6239  0.6125
bicycle             0.7968  0.6290  0.5942
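For context, a simplified pixel-level ASR in the spirit of these comparisons can be sketched as follows. This is our own assumed definition, not the paper's exact metric: the fraction of ground-truth victim-class pixels that the model predicts as the target class on poisoned inputs.

```python
import numpy as np

def pixelwise_asr(pred_masks, gt_masks, victim_cls, target_cls):
    """Fraction of ground-truth victim-class pixels predicted as target.

    pred_masks, gt_masks: integer label arrays of shape (N, H, W).
    A simplified, assumed definition of segmentation ASR.
    """
    pred = np.asarray(pred_masks)
    gt = np.asarray(gt_masks)
    victim = gt == victim_cls               # pixels the attack aims to flip
    flipped = victim & (pred == target_cls)
    return flipped.sum() / max(victim.sum(), 1)

# Toy example: 2 of 4 victim-class pixels are flipped to the target class.
gt = np.array([[[1, 1], [1, 1]]])
pred = np.array([[[2, 2], [1, 1]]])
print(pixelwise_asr(pred, gt, victim_cls=1, target_cls=2))  # 0.5
```

Under such a definition, the low ASR of HBA in Table 24 means few victim pixels flip under its trigger, while our B2O variant flips nearly all of them.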
E Additional Experiments and Details on Defenses

Table 23: Results of various poisoning rates with ASR metrics.

Rate  0.05    0.1     0.2     0.3     0.5
O2O   0.8936  0.9212  0.9352  0.9346  0.9495
O2B   0.8794  0.9257  0.9428  0.9406  0.9423
B2O   0.9349  0.9621  0.9835  0.9877  0.9892
B2B   0.8996  0.9150  0.9315  0.9267  0.9364
INS   0.8534  0.8891  0.9076  0.9143  0.9289
CON   0.8623  0.8956  0.9204  0.9267  0.9378

Table 24: Comparison between prior work and our attacks.

Method  HBA [28]    OFBA [40]   IBA [23]    Ours (O2O)  Ours (O2B)
ASR     0.1815      0.7859      0.8314      0.9352      0.9428

Method  Ours (B2O)  Ours (B2B)  Ours (INS)  Ours (CON)
ASR     0.9835      0.9315      0.9076      0.9204

STRIP [16] monitors prediction consistency when noise is added to model inputs, using entropy as a measure of this consistency. For clean inputs, perturbations produce random and varied predictions, yielding high entropy values. In contrast, backdoored inputs exhibit low entropy due to their consistently biased predictions toward the target class. To adapt STRIP to semantic segmentation models, we modify its approach from computing a single Shannon entropy value per input to generating an entropy map that captures per-pixel entropy across the segmentation mask. We then aggregate these pixel-wise scores into a unified metric for the entire image. This aggregated score serves as the decision criterion for identifying poisoned inputs. We evaluate STRIP's detection performance using 250 clean and 250 poisoned samples, randomly sampled from the dataset. The threshold is determined based on a target false positive rate (FPR): samples whose entropy lies below the FPR-th percentile of the clean entropy distribution are classified as poisoned. Figure 7 presents additional KDE plots of STRIP entropies. The results across different attacks indicate that the entropy distributions of clean and poisoned samples are highly overlapping and difficult to distinguish.
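The entropy-map adaptation of STRIP can be sketched as follows. This is a minimal illustration; the mean aggregation and the FPR-percentile threshold follow the description above, but the function names and array shapes are our own assumptions.

```python
import numpy as np

def strip_score(prob_maps):
    """Aggregate per-pixel Shannon entropy into one score per image.

    prob_maps: (K, C, H, W) softmax outputs for K noise-superimposed
    copies of a single input. Low scores flag likely-poisoned inputs.
    """
    p = np.asarray(prob_maps).mean(axis=0)        # average over K copies
    ent = -(p * np.log(p + 1e-12)).sum(axis=0)    # (H, W) entropy map
    return ent.mean()                             # aggregate to a scalar

def detect(scores_clean, scores_test, fpr=0.05):
    """Flag inputs whose score falls below the FPR-th percentile of the
    clean-score distribution (the decision rule described above)."""
    thr = np.percentile(scores_clean, 100 * fpr)
    return np.asarray(scores_test) < thr

# Uniform predictions (clean-like) score higher than peaked ones
# (backdoor-like), which is the gap STRIP relies on.
uniform = np.full((3, 2, 4, 4), 0.5)
peaked = np.concatenate(
    [np.full((3, 1, 4, 4), 0.99), np.full((3, 1, 4, 4), 0.01)], axis=1)
assert strip_score(uniform) > strip_score(peaked)
```

When the two score distributions overlap as heavily as in Figure 7, no percentile threshold separates them, which is the failure mode observed here.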
This suggests that entropy-based detection is ineffective for backdoor attacks in semantic segmentation.

TeCo [30] operates on the principle that clean images exhibit consistent robustness patterns across different types of image corruptions. These corruptions include noise, blur, brightness changes, pixel-level perturbations, and image compression. For instance, an image robust to noise corruption typically demonstrates predictable robustness to blur corruption as well. Conversely, backdoor triggers exhibit inconsistent robustness profiles across different types of corruption: they may be resilient to certain corruptions while being highly sensitive to others, resulting in high variance in the severity required to disrupt the model's prediction. To adapt TeCo to semantic segmentation models, we need to consider whether all labels on the prediction mask have changed. We compute the mIoU between the clean prediction mask and the corrupted prediction mask. A prediction is considered "broken" when the mIoU falls below a predefined threshold, indicating substantial degradation in segmentation quality under corruption. We adopt the 15 corruption types used in [30] and report the corresponding results, including gaussian noise, shot noise, impulse noise, defocus blur, glass blur, motion blur, zoom blur, snow, frost, fog, brightness, contrast, elastic transform, pixelate, and jpeg compression. We evaluate TeCo's detection performance using 500 clean and 500 poisoned samples. Table 25 provides additional experimental results on defense methods. Figure 8 presents additional KDE plots of TeCo scores. The results across different attacks indicate that the score distributions of clean and poisoned samples are also highly overlapping. This suggests that TeCo is ineffective for segmentation backdoor attacks.

Table 25: Backdoor detection results under TeCo defense with additional evaluation metrics (complementary to Table 16).

        ----------- 1std -----------   ----------- 2std -----------   ----------- 3std -----------
Attack  ACC    Recall  F1      AUC     ACC    Recall  F1      AUC     ACC    Recall  F1      AUC
HBA     0.568  0.2521  0.3494  0.5446  0.553  0.0652  0.1183  0.5169  0.540  0.0000  0.0000  0.5000
OFBA    0.583  0.2369  0.3433  0.5574  0.543  0.0630  0.1126  0.5074  0.540  0.0000  0.0000  0.5000
IBA     0.586  0.3112  0.4202  0.5765  0.529  0.0311  0.0599  0.5117  0.518  0.0000  0.0000  0.5000
O2O     0.528  0.1739  0.2532  0.5018  0.539  0.0478  0.0871  0.5026  0.540  0.0000  0.0000  0.5000
O2B     0.628  0.3826  0.4862  0.6098  0.560  0.0760  0.1373  0.5242  0.540  0.0000  0.0000  0.5000
B2O     0.558  0.1183  0.1754  0.4829  0.591  0.0377  0.0683  0.4965  0.603  0.0000  0.0000  0.5000
B2B     0.604  0.1445  0.1951  0.4885  0.655  0.0271  0.0496  0.4971  0.668  0.0000  0.0000  0.5000
INS     0.572  0.2134  0.3089  0.5312  0.548  0.0589  0.1067  0.5098  0.540  0.0000  0.0000  0.5000
CON     0.595  0.2687  0.3712  0.5523  0.556  0.0698  0.1245  0.5187  0.540  0.0000  0.0000  0.5000

Figure 6: KDE plots of the losses of more attacks under ABL defense (complementary to Figure 5), with panels (a) HBA, (b) OFBA, (c) IBA, (d) O2O, (e) O2B, (f) B2O, (g) B2B, (h) INS, (i) CON.

Beatrix [38] analyzes subtle changes in a model's internal activation patterns using Gramian information. The method operates on the principle that backdoor triggers induce statistically significant anomalies in the internal feature correlations of a model. It exploits this discrepancy by modelling the Gramian information of feature maps to distinguish between clean and poisoned samples.
Adapting Beatrix to semantic segmentation models requires a class-wise sample grouping approach. In classification models, Beatrix leverages ground-truth class labels to group input samples and then calculates Gramian information within each group to identify poisoned samples. However, semantic segmentation models produce masks containing per-pixel class labels, which differs from the single categorical predictions of classification tasks. To accommodate this structural difference, we adopt a straightforward adaptation strategy: for each ground-truth segmentation mask, we identify the most dominant class, i.e., the class occupying the largest pixel area, and use this dominant class as the grouping criterion for that sample. We evaluate Beatrix's detection performance on 100 clean and 100 poisoned samples. The experimental results in Section 7 demonstrate that Beatrix is ineffective for segmentation backdoor attacks.

Figure 7: KDE plots of the entropy scores of more attacks under STRIP defense (complementary to Figure 5), with per-attack detection thresholds: (a) HBA, 0.16; (b) OFBA, 0.37; (c) IBA, 0.73; (d) O2O, 0.29; (e) O2B, 0.47; (f) B2O, 0.39; (g) B2B, 0.53; (h) INS, 0.27; (i) CON, 0.45.
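The dominant-class grouping strategy described above can be sketched as follows; this is a minimal illustration with names of our own choosing, not the paper's code.

```python
import numpy as np
from collections import defaultdict

def dominant_class(mask):
    """Return the class occupying the largest pixel area in a mask."""
    classes, counts = np.unique(np.asarray(mask), return_counts=True)
    return int(classes[np.argmax(counts)])

def group_by_dominant_class(masks):
    """Group sample indices by the dominant class of each ground-truth
    mask, mimicking Beatrix's per-class grouping for classification."""
    groups = defaultdict(list)
    for i, m in enumerate(masks):
        groups[dominant_class(m)].append(i)
    return dict(groups)

# Toy masks: sample 0 is mostly class 0, sample 1 mostly class 7.
masks = [np.array([[0, 0], [0, 7]]), np.array([[7, 7], [7, 0]])]
print(group_by_dominant_class(masks))  # {0: [0], 7: [1]}
```

Gramian statistics would then be computed per group, as in the original Beatrix pipeline, with each segmentation sample treated as if it belonged to its dominant class.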
F Additional Discussion on Attacking Transformers

Prior work has shown that Transformer-based models are vulnerable to a range of privacy and security attacks [52, 59]. Table 18 reports backdoor attack results for ViT-B [11], DeiT-S [47], and Swin-T [33] on BDD100K. Across all settings, these transformers are comparably vulnerable to backdoor attacks as conventional segmentation models.

Performance of Swin-T. Swin-T and ConvNeXt-T both preserve strong local inductive biases and hierarchical multi-scale representations. ConvNeXt-T extracts local patterns through convolutions, while the Swin Transformer relies on windowed self-attention. In both cases, a localized trigger can be reliably captured in the early layers and passed through the feature pyramid to the segmentation head. This locality and multi-scale feature extraction enable learning a stable trigger-to-mask manipulation, leading to comparable attack performance.

Figure 8: KDE plots of the detection scores of more attacks under TeCo defense (complementary to Figure 5), with panels (a) HBA, (b) OFBA, (c) IBA, (d) O2O, (e) O2B, (f) B2O, (g) B2B, (h) INS, (i) CON.

Performance of ViT-B and DeiT.
ViT-B and DeiT tokenize images into patches, emphasizing global token interactions that can weaken the signal of small, localized triggers. The trigger may not consistently dominate attention across diverse scenes, making it harder for the model to learn a highly reliable backdoor. Nevertheless, their ASRs remain non-trivial, indicating that they are still clearly susceptible to backdoor attacks.

G Visualization

Figures 9 to 14 visualize our attacks on BDD100K, with one representative setting for each attack vector. In the O2O attack (Figure 9), the model maps the victim class car to the target class person once the trigger is present. In the O2B attack (Figure 10), the model maps car to the background class road. In the B2O attack (Figure 11), the model instead maps the background class road to the object class car, producing hallucinated object regions. In the B2B attack (Figure 12), the model maps sidewalk to road when triggers are injected into the victim region. In the INS attack (Figure 13), the trigger affects only the stamped object instance: the attacked car instance is flipped to road, while other car instances remain unchanged. In the CON attack (Figure 14), the backdoor behavior is conditioned on context: the model maps car to road only when the trigger appears on a red car. Overall, these visualizations confirm the effectiveness of our attacks and illustrate how triggers can induce targeted yet context-dependent segmentation errors.

H Additional Related Work

Other Related Research. Some research has explored backdoor triggers embedded within objects in images [3, 4, 9, 36], which relates to our coarse-grained attacks. These works primarily target object detection by manipulating bounding-box predictions. In contrast, our attacks focus on semantic segmentation, manipulating individual pixel classifications to produce incorrect segmentation masks, which requires distinct trigger design strategies.
Some prior backdoor attacks [12, 44] share similarities with our fine-grained attacks, but key distinctions remain. The attack in [12] uses a single condition (the presence of JPEG compression), whereas our method requires composite conditions, involving both a specific trigger pattern on an object and the object having particular attributes (i.e., a specific color). This enables more flexible, context-dependent attacks with enhanced stealthiness. The work in [44] focused on model degradation rather than backdoor activation, whereas our method targets backdoor attacks through trigger and object associations.

I Discussion on Real-World Implementation

Our work identifies four coarse-grained attack vectors, defined by class-level semantic labels, and two fine-grained vectors, characterized by activation specificity. We discuss how these attacks could activate in a realistic autonomous-driving perception pipeline, where a backdoored segmentation model processes images captured by a front-facing camera.

We consider a practical adversary who can physically place a small trigger in the environment (e.g., near a roadway) but does not need to tamper with the victim vehicle. Once the trigger enters the camera's field of view, the backdoored model may produce a manipulated segmentation mask while behaving normally on clean scenes. In practice, the geometric trigger can be printed (e.g., on paper or a sticker) and attached to common surfaces or objects with minimal effort. For O2O, O2B and INS attacks, the trigger can be placed on the rear of a vehicle parked by the roadside, such that it appears in the captured scene when the victim car passes by. For B2O and B2B attacks, the trigger can be placed on planar background regions such as the road surface or sidewalk to induce false positives or background relabeling.
For the CON attack, the trigger can be attached to a specific contextual object (e.g., the rear of a red car), causing activation only when the contextual condition is satisfied. As the victim vehicle approaches and captures the scene, the trigger becomes visible in one or more frames, activating the backdoor and causing the attacker-chosen manipulation.

Figure 9: Visualization for the O2O Attack (original image, original label, poisoned image, poisoned output).

Figure 10: Visualization for the O2B Attack (original image, original label, poisoned image, poisoned output).

Figure 11: Visualization for the B2O Attack (original image, original label, poisoned image, poisoned output).

Figure 12: Visualization for the B2B Attack (original image, original label, poisoned image, poisoned output).

Figure 13: Visualization for the Instance-Level Attack (original image, original label, poisoned image, poisoned output).

Figure 14: Visualization for the Conditional Attack (original image, original label, poisoned image, poisoned output).
