Post-hoc Self-explanation of CNNs
Authors: Ahcène Boubekki, Line H. Clemmensen
Department of Mathematical Sciences, University of Copenhagen, Denmark
{ahcene.boubekki,lkhc}@math.ku.dk

ABSTRACT

Although standard Convolutional Neural Networks (CNNs) can be mathematically reinterpreted as Self-Explainable Models (SEMs), their built-in prototypes do not on their own accurately represent the data. Replacing the final linear layer with a $k$-means-based classifier addresses this limitation without compromising performance. This work introduces a common formalization of $k$-means-based post-hoc explanations for the classifier, the encoder's final output (B4), and combinations of intermediate feature activations. The latter approach leverages the spatial consistency of convolutional receptive fields to generate concept-based explanation maps, which are supported by gradient-free feature attribution maps. Empirical evaluation with a ResNet34 shows that using shallower, less compressed feature activations, such as those from the last three blocks (B234), results in a trade-off between semantic fidelity and a slight reduction in predictive performance.

1 INTRODUCTION

Convolutional Neural Networks (CNNs) are the basis for several foundation models, particularly for image classification Deng et al. (2009), but are also utilized for other data types Ribeiro et al. (2020). Their architectures, inspired by the biological visual cortex, typically consist of a feature extractor or encoder that combines 2D convolutions, activation functions, dropout, and normalization layers, followed by an average pooling and finally a linear classifier or regressor. This structure is present in many modern models, including ResNet He et al. (2016) and DenseNet Huang et al. (2017). In some cases, a dropout layer precedes the classifier, as in GoogleNet Szegedy et al. (2015) or EfficientNet Tan & Le (2019).
Some variants use a multilayer perceptron as the classifier Krizhevsky et al. (2012); Simonyan & Zisserman (2014). The consistent element is the average pooling between the encoder and the classifier. This paper shows that this key operator renders these models structurally and mechanistically analogous to self-explainable models.

Self-explainable models (SEMs) constitute a class of architectures designed to improve the interpretability and transparency of deep learning models by explicitly associating predictions with human-understandable concepts or prototypes. Here, we distinguish these two by defining concepts as frequent or relevant patterns in the training dataset that may possess varying degrees of semantic meaning, such as "a blue sky" or "a plane." Prototypes are vectors representing these concepts in the embedding space, and the similarity between input data and these prototypes provides concept-based explainability for predictions.

The selection of concepts and prototypes is a critical consideration. Prototypes selected a priori Kim et al. (2018) may introduce biases and fail to capture the full diversity of the dataset Celis et al. (2016). Jointly learning prototypes during training comes at the cost of a complicated alternating training scheme and may negatively affect the performance Alvarez Melis & Jaakkola (2018); Chen et al. (2019); Kjærsgaard et al. (2024). Additionally, because these architectures differ from the traditional classifiers they are intended to explain or replace, the interpretability insights obtained may not transfer seamlessly.

This work advocates for computing prototypes after training. Indeed, the classic feedforward architecture remains highly versatile and efficient. A common formalization framework for $k$-means-based post-hoc explanations is introduced, covering three explanation locations: the classifier weights, the encoder's final output, and its intermediate feature activations.
It encompasses the cases where the CNN is reinterpreted as a SEM, the linear classifier is substituted with a $k$-means-based one as described in Gautam et al. (2024), and the multi-depth explanation of the encoder building on Boubekki et al. (2025). The latter is here further extended to produce gradient-free and concept-aligned feature attribution maps.

2 PROBLEM DEFINITION AND NOTATIONS

Notational conventions follow Goodfellow et al. (2016): capital letters denote dimensions, dot products are written $\cdot$, and biases are omitted from classifiers without loss of generality. The classifier is represented by a matrix $C$ with columns $c_j$.

A standard CNN-based classifier architecture comprises a CNN encoder $\mathrm{enc}: \mathbb{R}^Q \to \mathbb{R}^{R \times D}$, an average pooling operation $\mathrm{avg.pool}: \mathbb{R}^{R \times D} \to \mathbb{R}^D$, and a linear classifier $\mathrm{clf}: \mathbb{R}^D \to \mathbb{R}^C$. The encoder output is typically a tensor of dimension $(W, H, D)$. However, for legibility, the first two dimensions are here flattened. The operations are formally defined as follows:

$$x \xmapsto{\mathrm{enc}} h \xmapsto{\mathrm{avg.pool}} z \xmapsto{\mathrm{clf}} y, \tag{1}$$

where $x \in \mathbb{R}^Q$, $h \in \mathbb{R}^{R \times D}$, $z \in \mathbb{R}^D$ and $y \in \mathbb{R}^C$, and:

$$\mathrm{avg.pool}(h) = \frac{1}{R} \sum_{r=1}^{R} h_r = z \quad \text{and} \quad \mathrm{clf}(z) = z \cdot C = (z \cdot c_j)_{j=1\ldots C}. \tag{2}$$

SEM. Self-explainable models differ from classic CNNs by their output layer: a classifier compares a feature vector, also noted $h$, to a set of $K > 0$ prototypes $P = \{p_k\}^K$ using a similarity measure $\mathrm{sim}$. The similarity scores are then mapped by $\mathrm{proj}$ into a prediction score $y \in \mathbb{R}^C$:

$$x \xmapsto{\mathrm{enc}} h \xmapsto{\mathrm{sim}} \big(\mathrm{sim}(h_r, p_k)\big)_{R,K} \xmapsto{\mathrm{proj}} y. \tag{3}$$

The dimensions of $h$ are intentionally omitted, as the location of the prototypical classifier depends on which part of the CNN we want to explain. The similarity measure varies between models and is usually implemented as either a dot product Parekh et al. (2021) or a distance metric Chen et al. (2019).
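As a minimal sketch of the SEM forward pass in Equation 3, the following uses illustrative shapes, a distance-based similarity, and a max-pool-plus-linear proj; all of these choices are assumptions for illustration, since the paper leaves sim and proj model-specific:

```python
import numpy as np

rng = np.random.default_rng(0)
R, D, K, C = 49, 512, 20, 10       # positions, channels, prototypes, classes (illustrative)

h = rng.normal(size=(R, D))        # encoder output, one row h_r per spatial position
P = rng.normal(size=(K, D))        # prototypes p_k
W_proj = rng.normal(size=(K, C))   # linear proj, as in ProtoPNet-style SEMs

# Eq. (3): compare every feature vector h_r to every prototype p_k.
dists = np.linalg.norm(h[:, None, :] - P[None, :, :], axis=-1)   # (R, K)
sim = np.exp(-dists)               # one possible distance-based similarity

# proj maps the (R, K) similarity scores to C class scores; here a
# max-pool over positions followed by a linear layer.
y = sim.max(axis=0) @ W_proj       # (C,)
print(y.shape)                     # (10,)
```

Swapping `sim` for a plain dot product and `proj` for an average pool recovers the other common choices mentioned above.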
In order to remain transparent, proj should remain as simple as possible. Typically, it is either a matrix multiplication Chen et al. (2019) or a pooling operation.

Explaining with ProtoPNet. We use ProtoPNet, introduced in Chen et al. (2019), as the SEM baseline. It employs an alternating update strategy: the encoder is updated via gradient descent, and the prototype layer is updated via a multi-stage procedure. The similarity measure is $\ell_2$-based, and proj is a linear layer.

3 POST-HOC SELF-EXPLANATION OF CNNS

This section presents a common formalization for three $k$-means-based post-hoc self-explanations of frozen CNNs.

3.1 SELF-EXPLAINING THE CLASSIFIER

A CNN can be reinterpreted as a SEM. Indeed, since the classifier is a linear layer, its column vectors can be viewed as prototypes, and the matrix operation as the similarity measure.

Theorem 1. Convolutional neural network classifiers are self-explainable models with $C$ prototypes corresponding to the column vectors of the classifier.

Proof. Consider a CNN classifier as defined in Equations 1 and 2. The commutativity of the average pool and dot-product operations means that the final prediction is also the average prediction of each $h_r$:

$$\mathrm{clf} \circ \mathrm{avg.pool}(h) = \left( \left( \frac{1}{R} \sum_{r=1}^{R} h_r \right) \cdot c_j \right)_{j=1\ldots C} = \left( \frac{1}{R} \sum_{r=1}^{R} h_r \cdot c_j \right)_{j=1\ldots C}. \tag{4}$$

By setting $\mathrm{proj} = \mathrm{avg.pool}$, defining $\mathrm{sim}$ as the dot product, and treating the $c_j$ as prototypes, the following is obtained:

$$\mathrm{clf} \circ \mathrm{avg.pool} \circ \mathrm{enc}(x) = \left( \frac{1}{R} \sum_{r=1}^{R} h_r \cdot c_j \right)_{j=1\ldots C} = \mathrm{proj} \circ \mathrm{sim} \circ \mathrm{enc}(x). \tag{5}$$

Therefore, the operations of the CNN can be reinterpreted as those of an SEM, as defined by Equation 3.

In practice, the $c_j$ vectors are not satisfactory prototypes, particularly with respect to diversity. Their number is inherently limited by the number of classes.
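The commutativity at the heart of the proof can be checked numerically; the shapes below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
R, D, C = 49, 512, 10              # positions, channels, classes (illustrative)
h = rng.normal(size=(R, D))        # encoder output h
W = rng.normal(size=(D, C))        # classifier matrix with columns c_j

# clf(avg.pool(h)): pool the features, then apply the classifier (Eqs. 1-2).
y_pool_first = h.mean(axis=0) @ W

# proj(sim(h)): apply the classifier at every position, then pool (Eqs. 4-5).
y_dot_first = (h @ W).mean(axis=0)

print(np.allclose(y_pool_first, y_dot_first))  # True
```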
Preliminary experiments indicate that simply increasing the number of prototypes per class and merging them using max or average pooling, without additional regularization, results in a single direction dominating each class. Although the cross-entropy loss encourages prediction separability, leading to class embeddings covering distinct half-spaces, it does not guarantee that points are centered on or distributed along the line defined by the prototype or column vector, meaning that prototypes may be far from the data. This contradicts the implicit requirement of representativeness or diversity of the prototypes.

3.2 EXPLAINING THE CLASSIFIER

A workaround to the lack of representativeness is to replace the classifier with a $k$-means-based one, as introduced in Gautam et al. (2024). As the centroids of class-wise clusterings, the prototypes are thus more likely to be close to the data. The procedure is as follows:

1. For each of the $C$ classes, $K/C$ prototypes are learned using $k$-means on the $z$ of the training data.
2. The similarity measure is the exponential of minus the squared $\ell_2$ distance: $\mathrm{sim}(h, p_k) = \exp(-\|h - p_k\|^2)$.
3. The proj returns a one-hot vector centered on the class of the prototype with the largest similarity score.

The original KMEx utilizes the $\ell_2$ distance as a similarity measure and employs a nearest neighbor classifier. The similarity described in the second step is equivalent and allows the class predictions to be derived using an argmax. Although the method can accommodate varying numbers of prototypes per class, we fix them to $K/C$ prototypes per class.

3.3 EXPLAINING THE ENCODER

To explain the encoder rather than the classifier, intermediate outputs, also referred to as feature activations, are compared to prototypes instead of the embedding vector after average pooling, as in the previous section.
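For concreteness, the three steps of Section 3.2 can be sketched as follows; this is a hypothetical NumPy/scikit-learn rendering of the procedure, not the authors' code, and the function names are ours:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_kmex(z_train, labels, n_classes, protos_per_class):
    """Step 1: learn K/C prototypes per class with k-means on pooled embeddings z."""
    protos, proto_class = [], []
    for c in range(n_classes):
        km = KMeans(n_clusters=protos_per_class, n_init=10, random_state=0)
        km.fit(z_train[labels == c])
        protos.append(km.cluster_centers_)
        proto_class += [c] * protos_per_class
    return np.vstack(protos), np.array(proto_class)

def predict_kmex(z, protos, proto_class):
    """Steps 2-3: sim = exp(-||z - p_k||^2); predict the class of the most
    similar prototype (equivalent to a nearest-neighbour rule in l2)."""
    d2 = ((z[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d2)
    return proto_class[sim.argmax(axis=1)]

# Tiny sanity check on two well-separated synthetic classes.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 1.0, size=(50, 8)),
               rng.normal(8.0, 1.0, size=(50, 8))])
lab = np.array([0] * 50 + [1] * 50)
P, pc = fit_kmex(z, lab, n_classes=2, protos_per_class=2)
acc = (predict_kmex(z, P, pc) == lab).mean()
```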
This approach aims to access information that may be filtered out before reaching the classifier and to provide explanations at the patch or segment level. ProtoPNet is an example of such an approach, as its so-called part prototypes are compared to the pixels of the encoder's output. Other models Parekh et al. (2021); Zhu et al. (2025) utilize shallower feature activations to compute part prototypes. In these cases, the prototypes are composite representations that integrate information from multiple depths. The post-hoc framework presented here follows the rationale of KMEx Gautam et al. (2024) by using $k$-means to learn prototypes on a frozen backbone. Formally, it extends the work in Boubekki et al. (2025) to compute predictions and feature attributions.

Extracting Feature Activations. Let us first decompose the encoder into $B$ blocks of layers:

$$x \xmapsto{f_1} x^{(1)} \cdots \xmapsto{f_b} x^{(b)} \cdots \xmapsto{f_B} x^{(B)} = h, \quad \text{where } x^{(b)} \in \mathbb{R}^{R_b \times D_b}. \tag{6}$$

Since the outputs of each block have different resolutions $R_b$ and numbers of channels or dimensions $D_b$, preprocessing steps are required to compute a composite matrix $\check{h}$. This matrix is then compared to the prototypes to generate a prediction vector $y$. The $K/C$ class prototypes are learned using $k$-means on the row vectors $\check{h}_r$ of the class data. The operations are as follows.

1. The intermediate outputs are first linearly interpolated to a shared resolution $R'$:

$$\mathrm{Upsample}\big(x^{(b)}\big) = u^{(b)} \in \mathbb{R}^{R' \times D_b}. \tag{7}$$

2. The $u^{(b)}$ are then normalized and scaled before being concatenated into $\check{h}$:

$$\mathrm{Concatenate}\left( \frac{u^{(b)} D_b}{\|u^{(b)}\|} \right) = \check{h} \in \mathbb{R}^{R' \times (\sum_b D_b)}. \tag{8}$$

3. The assignments of the rows $\check{h}_r$ to the closest prototypes are stored in a binary matrix:

$$\mathrm{sim}\big( (x^{(b)})_B, P \big) = \mathrm{sim}\big( \check{h}, P \big) = \Big( \operatorname*{arg\,min}_k \|\check{h}_r - p_k\| \Big)_r = \check{s} \in \mathbb{R}^{R' \times K}. \tag{9}$$

4.
The prediction vector is computed as the average count of the class-wise clusters in $\check{s}$, and the predicted class corresponds to the most frequently occurring cluster:

$$\mathrm{proj}(\check{s}) = \mathrm{avg.pool}\left( \sum_r \check{s}_r \, ; \, K/C \right) = y \in \mathbb{R}^C. \tag{10}$$

The average pooling operation, which uses $K/C$ non-overlapping windows, assumes a constant and ordered number of clusters per class. The assignment binary matrix $\check{s}$ can be interpreted as a low-resolution segmentation of the input, referred to as an "explanation map". In practice, outputs from all blocks after a specified depth are combined. Accordingly, the notations $\check{h}^{(b:)}$ and $\check{s}^{(b:)}$ indicate that all outputs from $f_b$ to $f_B$ are utilized.

Feature Importance. Numerous feature attribution methods have been proposed Sundararajan et al. (2017); Selvaraju et al. (2017); Petsiuk et al. (2018), resulting in a wide variety of outputs. The present work introduces a simplified approach that leverages the self-explainable reinterpretation of a CNN and the commutativity of average pooling with respect to the dot product. To keep track of the input format, this section explicitly indicates the width $W_b$ and height $H_b$ of the outputs, rather than the aggregated $R_b$ dimension.

Analogous to the class activation map (CAM) Zhou et al. (2016), the feature attribution score of pixel $h_{wh}$ is defined as a measure of its alignment with $c_j$:

$$\mathrm{att}(h_{wh}, c_j) = \frac{h_{wh} \cdot c_j}{\|c_j\|^2}. \tag{11}$$

The distinction lies in the normalization by the squared norm, which ensures that $\mathrm{att}(c_j^T, c_j) = 1$ and allows values to be compared across classes. The resulting attribution map matches the low resolution of the encoder's output. Rather than relying on assumptions about the information conveyed by gradients, as in Grad-CAM variants Selvaraju et al. (2017), the upstream feature attribution map is approximated by exploiting the spatial consistency of the convolutional receptive fields.
Namely, the attribution map at depth $b$ is computed from the one at depth $b+1$ as follows:

1. Compute the explanation map at depth $b$, $\check{s}^{(b:)} \in \mathbb{R}^{W_b \times H_b \times K}$.
2. Upsample $\mathrm{att}^{(b+1:)} \in \mathbb{R}^{W_{b+1} \times H_{b+1}}$ to $\mathrm{att}^{(b+1:,\mathrm{up})} \in \mathbb{R}^{W_b \times H_b}$.
3. Compute $\mathrm{att}^{(b:)}$ as the segment-wise average of $\mathrm{att}^{(b+1:,\mathrm{up})}$ based on $\check{s}^{(b:)}$.

The segment-wise averaging results in discrete attribution, where pixels within the same segment share an identical attribution score.

Interpretation Process. Figure 1 illustrates the interpretation process of the $k$-means-based explanation of the encoder. Each input receives two forms of explanation: a segmentation map and a feature importance map. The segmentation map delineates regions of the input that yield similar feature activations, resulting in clustered segments. Cluster interpretation is supported by visualizing the segment closest to each associated prototype, using a consistent color scheme. The feature importance map highlights the relevance of each concept with respect to the classifier of the backbone model. Incorporating shallower feature activations enhances the resolution and detail of the map.

4 EXPERIMENTS

Experimental Setting. All experiments use a ResNet34 backbone pretrained on ImageNet, trained following Gautam et al. (2024), and evaluated on MNIST (Lecun et al. (1998)), STL10 (Coates et al. (2011)), and CUB-200 (Wah et al. (2011)), with five prototypes per class for the first two and ten for CUB-200, unless specified otherwise.

Figure 1: Interpretation process for B4 (left) and B234 (right) on a CUB-200 red-bellied woodpecker. Red and blue indicate higher and lower feature importance. Representative patches are the closest training examples to each prototype; border colors match the explanation map segments.

Our method is referenced according to the depth of the feature activations employed.
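For reference, the composite-feature construction and cluster-count prediction of Section 3.3 (Equations 7-10) can be sketched in NumPy as follows. The nearest-neighbour upsampling, the $D_b/\|u^{(b)}\|$ scaling, and the class-ordered prototype layout are our reading of those equations, and all shapes and names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def explain_encoder(feats, protos, n_classes):
    """Sketch of Eqs. (7)-(10): build the composite matrix, assign rows to
    prototypes, and derive class scores from cluster counts.

    feats:  list of block activations x^(b), each (R_b, D_b) with R_b square.
    protos: (K, sum_b D_b) class-ordered prototypes, K/C per class.
    """
    side = int(np.sqrt(max(f.shape[0] for f in feats)))   # shared resolution R'
    parts = []
    for f in feats:
        s = int(np.sqrt(f.shape[0]))
        rep = side // s                                   # nearest-neighbour upsampling (Eq. 7)
        up = f.reshape(s, s, -1).repeat(rep, axis=0).repeat(rep, axis=1)
        up = up.reshape(side * side, -1)
        parts.append(up * up.shape[1] / np.linalg.norm(up))  # scale by D_b/||u^(b)|| (Eq. 8)
    h_comp = np.concatenate(parts, axis=1)                # (R', sum_b D_b)

    # Eq. (9): explanation map = index of the closest prototype per row.
    d2 = ((h_comp[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    seg = d2.argmin(axis=1)

    # Eq. (10): class scores = average count of each class's K/C clusters.
    counts = np.bincount(seg, minlength=protos.shape[0]).astype(float)
    y = counts.reshape(n_classes, -1).mean(axis=1)
    return seg, y

# Toy run: two blocks at 8x8 and 4x4 resolution, 2 classes with 2 prototypes each.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(64, 16)), rng.normal(size=(16, 32))]
protos = rng.normal(size=(4, 48))
seg, y = explain_encoder(feats, protos, n_classes=2)
```

In a real pipeline the `feats` list would come from forward hooks on the backbone's residual blocks, and `protos` from class-wise $k$-means on training data as in Section 3.2.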
The ResNet34 backbone consists of a series of preprocessing layers and four residual blocks. The model utilizing the same information as ProtoPNet, specifically at the encoder's output (the fourth block), is designated as B4. The model B234, which incorporates outputs from the last three blocks, is also evaluated. We compare these to KMEx and ProtoPNet.

Alignment of the Prototypes. The column vectors $c_j$ of the classifier define the half-space in which the class embedding predominantly resides, rather than the direction along which the data spreads. Table 1 validates this interpretation using the average cosine similarity between each method's prototypes and their class embedding (class) and all other data points (out). The uniformly high cosines of ProtoPNet reflect the distinct embedding geometry produced by an integrated training. The class cosines of the CNN's classifier weights are approximately 0.5, versus nearly 1 for KMEx, indicating poor alignment of the $c_j$ with the data. The out cosines suggest that the $c_j$ define nearly orthogonal half-spaces, though the class embeddings themselves are not fully orthogonal.

Figure 2 illustrates this misalignment via a UMAP projection McInnes et al. (2018) of the CNN's embeddings for the twenty sparrow classes of CUB-200. The classifier prototypes $c_j$, both raw (crosses) and rescaled to class norm (triangles), are dispersed and rarely near their respective class; by contrast, the KMEx prototypes (squares) lie within their respective clusters.

Figure 2: UMAP projection of training and test (lower opacity) embeddings from the twenty sparrow classes of CUB-200. Left: classifier prototypes $c_j$ as crosses and rescaled to class norm as triangles. Right: KMEx prototypes.

Concept-based Accuracy. Regarding train and test accuracies (Table 2), replacing the linear classifier with KMEx does not affect the performance.
This outcome is expected, since the prototypes of KMEx are learned class-wise and based on the same information as the $c_j$.

Table 1: Average cosine of classifier prototypes within class and out of class embedding.

                    MNIST           STL10           CUB200
Model               class    out    class    out    class    out
CNN                  0.48  -0.05     0.46  -0.04     0.51   0.00
KMEx (K/C = 1)       0.88   0.32     0.79   0.42     0.93   0.47
KMEx (K/C = 5)       0.83   0.30     0.84   0.34     0.87   0.45
ProtoPNet            1.00   1.00     0.95   0.92     0.98   0.97

Table 2: Average train and test accuracies over five runs. Models were pretrained on ImageNet.

                                 MNIST           STL10           CUB200
Model                            train   test    train   test    train   test
CNN                              100.0   99.4     95.6   86.1    100.0   79.1
KMEx (K/C = 5)                   100.0   99.4     94.9   85.6    100.0   78.6
ProtoPNet                        100.0   99.4     74.2   72.7     99.4   66.4
ProtoPNet in Chen et al. (2019)      -      -        -      -        -   79.2
Ours B4                          100.0   99.4     94.0   85.5    100.0   75.2
Ours B234                         98.9   98.5     83.5   77.9     85.5   52.8

Although B4 derives the class predictions from the concept proportions, it matches the performance of the backbone network (Table 2). However, including information at shallower depths, as in B234, renders the classification signal less distinct, leading to reduced performance.

5 DISCUSSION AND RELATED WORK

Concept-based explainability methods for deep networks have been extensively reviewed in Lee et al. (2025). The proposed framework offers a common formalization of $k$-means-based post-hoc explanations, explicitly avoiding gradient- or perturbation-based techniques. We position our work with respect to the most closely related methods for explaining the encoder. TCAV (Kim et al. (2018)) probes feature activations at a single depth using user-defined concept vectors and assigns feature importance using directional derivatives of the classifier's output with respect to the concept vectors. Its extension, ACE (Ghorbani et al.
(2019)), automates concept discovery by segmenting inputs with SLIC (Achanta et al. (2012)) and processing the resulting crops through the encoder. Since SLIC segmentation operates independently of the encoder's learned representations, it explains image regions rather than internal concepts. Additionally, projecting crops through an encoder trained on complex images is likely to result in out-of-distribution behavior. Substituting TCAV with SHAP produces CONE-SHAP (Li et al. (2021)), which retains the same limitations. Network Dissection (Bau et al. (2017)) and Net2Vec (Fong & Vedaldi (2018)) probe individual neurons at a single depth to identify human-interpretable concepts. For the final layer of a ResNet34 (B4), this involves comparing 512 neurons, which, as reported by the authors, are often redundant. CRAFT (Fel et al. (2023)) and ICE (Zhang et al. (2021)) apply matrix factorization to feature activations to extract concepts, and leverage implicit differentiation and non-negative constraints, respectively, to produce concept attribution maps. However, restricting the back-propagated signal to a single class leaves portions of the input unexplained.

6 CONCLUSION

This paper introduces a common formalization of $k$-means-based post-hoc explanations covering three locations: the classifier weights, the encoder's final output, and its intermediate feature activations. The commutativity of average pooling and the linear classifier shows that CNNs are mechanistically analogous to prototype-based SEMs. However, the classifier's column vectors, when interpreted as prototypes, are poorly aligned with the data, undermining their representativeness. Replacing them with $k$-means centroids addresses this limitation without loss of performance. Extending this to intermediate feature activations produces detailed, dense explanation maps supported by semantically relevant prototypes.
Attribution maps are computed via a gradient-free variant of CAM that leverages the spatial consistency of convolutional receptive fields. Together, these results suggest that the perceived black-box nature of CNNs is not an inherent property, but one that can be mitigated through rigorous reinterpretation of their existing operations. The primary limitation is the scalability of $k$-means to large datasets, though preliminary results suggest that performance remains stable on subsets of the training data. The sustained classification performance of B4 and B234 using cluster distributions calls for further investigation.

REFERENCES

Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282, 2012.

David Alvarez Melis and Tommi Jaakkola. Towards robust interpretability with self-explaining neural networks. Advances in Neural Information Processing Systems, 31, 2018.

David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6541–6549, 2017.

Ahcène Boubekki, Samuel G. Fadel, and Sebastian Mair. Explaining the impact of training on vision models via activation clustering. arXiv:2411.19700 [cs], March 2025.

L Elisa Celis, Amit Deshpande, Tarun Kathuria, and Nisheeth K Vishnoi. How to be fair and diverse? arXiv preprint arXiv:1610.07183, 2016.

Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan K Su. This looks like that: deep learning for interpretable image recognition. In Advances in Neural Information Processing Systems, volume 32, 2019.

Adam Coates, Andrew Ng, and Honglak Lee.
An analysis of single-layer networks in unsupervised feature learning. In Geoffrey Gordon, David Dunson, and Miroslav Dudík (eds.), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine Learning Research, pp. 215–223, Fort Lauderdale, FL, USA, 11–13 Apr 2011. PMLR.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.

Thomas Fel, Agustin Picard, Louis Bethune, Thibaut Boissin, David Vigouroux, Julien Colin, Rémi Cadène, and Thomas Serre. CRAFT: Concept recursive activation factorization for explainability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2711–2721, 2023.

Ruth Fong and Andrea Vedaldi. Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8730–8738, 2018.

Srishti Gautam, Ahcene Boubekki, Marina MC Höhne, and Michael Kampffmeyer. Prototypical self-explainable models without re-training. Transactions on Machine Learning Research, 2024.

Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. In Advances in Neural Information Processing Systems, volume 32, 2019.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016. arXiv:1512.03385.

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708, 2017.

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pp. 2668–2677. PMLR, 2018.

Rune Kjærsgaard, Ahcène Boubekki, and Line Clemmensen. Pantypes: Diverse representatives for self-explainable models, 2024.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Jae Hee Lee, Georgii Mikriukov, Gesina Schwalbe, Stefan Wermter, and Diedrich Wolter. Concept-based explanations in computer vision: Where are we and where could we go? In Computer Vision – ECCV 2024 Workshops, volume 15643 of Lecture Notes in Computer Science, pp. 266–287. Springer Nature Switzerland, 2025.

Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, and Jun Xiao. Instance-wise or class-wise? A tale of neighbor Shapley for concept-based explanation. In Proceedings of the 29th ACM International Conference on Multimedia, pp. 3664–3672, 2021.

Leland McInnes, John Healy, Nathaniel Saul, and Lukas Großberger. UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29):861, 2018.

Jayneel Parekh, Pavlo Mozharovskyi, and Florence d'Alché-Buc. A framework to learn with interpretation. Advances in Neural Information Processing Systems, 34:24273–24285, 2021.

Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. In British Machine Vision Conference, 2018.
Antonio H Ribeiro, Daniel Gedon, Daniel Martins Teixeira, Manoel Horta Ribeiro, Antonio L Pinho Ribeiro, Thomas B Schon, and Wagner Meira Jr. Automatic 12-lead ECG classification using a convolutional network ensemble. In Computing in Cardiology (CinC), 2020. doi: 10.22489/CinC.2020.130.

Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626, 2017.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs], September 2014.

Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328. PMLR, 2017.

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.

Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pp. 6105–6114. PMLR, 2019.

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.

Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 11682–11690, 2021.

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba.
Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929, 2016.

Zhijie Zhu, Lei Fan, Maurice Pagnucco, and Yang Song. Interpretable image classification via non-parametric part prototype learning. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 9762–9771, 2025.