Face Presentation Attack Detection via Content-Adaptive Spatial Operators

Shujaat Khan a,b,∗

a Department of Computer Engineering, College of Computing and Mathematics, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia
b SDAIA–KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia
∗ Corresponding author: shujaat.khan@kfupm.edu.sa (S. Khan)

ARTICLE INFO

Keywords: Face presentation attack detection; Face anti-spoofing; Involution; Content-Adaptive Spatial Operator

ABSTRACT

Face presentation attack detection (FacePAD) is critical for securing facial authentication against print, replay, and mask-based spoofing. This paper proposes CASO-PAD, an RGB-only, single-frame model that enhances MobileNetV3 with content-adaptive spatial operators (involution) to better capture localized spoof cues. Unlike spatially shared convolution kernels, the proposed operator generates location-specific, channel-shared kernels conditioned on the input, improving spatial selectivity with minimal overhead. CASO-PAD remains lightweight (3.6M parameters; 0.64 GFLOPs at 256 × 256) and is trained end-to-end using a standard binary cross-entropy objective. Extensive experiments on Replay-Attack, Replay-Mobile, ROSE-Youtu, and OULU-NPU demonstrate strong performance, achieving 100/100/98.9/99.7% test accuracy, AUC of 1.00/1.00/0.9995/0.9999, and HTER of 0.00/0.00/0.82/0.44%, respectively. On the large-scale SiW-Mv2 Protocol-1 benchmark, CASO-PAD further attains 95.45% accuracy with 3.11% HTER and 3.13% EER, indicating improved robustness under diverse real-world attacks. Ablation studies show that placing the adaptive operator near the network head and using moderate group sharing yields the best accuracy–efficiency balance. Overall, CASO-PAD provides a practical pathway for robust, on-device FacePAD with mobile-class compute and without auxiliary sensors or temporal stacks.

1. Introduction

Facial recognition is now integral to mobile payments, surveillance, border control, and personal device authentication. Its appeal lies in being non-intrusive and accurate, powering applications such as smartphone unlocking [1], mobile payments [2], surveillance systems [3], access control [4], and attendance monitoring [5]. Leveraging large datasets and high-performance computing, modern recognition systems have surpassed 99% accuracy [6]. Yet this reliance on facial data introduces vulnerabilities to spoofing attacks, where adversaries use counterfeit inputs such as printed photos, display replays, or 3D masks [7]. As adoption expands into sensitive domains like finance and healthcare, the need for reliable anti-spoofing solutions has become paramount.

FacePAD systems aim to distinguish genuine (bonafide) inputs from spoof attempts, ensuring the security of facial recognition technologies [8]. Applications extend beyond traditional security to e-commerce, mobile device authentication, and remote access, and the demand for robust and deployment-friendly PAD methods has never been greater [9].

Early approaches to FacePAD relied on handcrafted features such as SIFT [10], SURF [11], and HOG [12], often combined with SVMs or LDA classifiers. Texture- and motion-based cues, including eye blinks and lip motion, also proved useful [13, 14]. However, these techniques struggled under variable lighting, backgrounds, and camera qualities, limiting their real-world applicability.

Beyond basic photo or video replays, adversaries now employ sophisticated techniques, including high-fidelity 3D masks, adversarial perturbations, and machine learning-driven attacks [15, 16], and the arms race between attackers and FacePAD continues.
Although data augmentation and unified frameworks such as UniFAD [17] have been proposed, they often require large-scale datasets and remain limited to specific attack scenarios, especially under challenging white-box conditions. Compounding the difficulty, no benchmark dataset can encompass the diversity of possible attack types [18].

To address data scarcity, generative models such as GANs have been explored for synthesizing spoofing samples [19, 20]. Recent works demonstrate that adversarial training with synthetic data can improve PAD robustness [21], though challenges of model scalability and efficiency remain.

Deep learning, particularly Convolutional Neural Networks (CNNs), has since transformed FacePAD by capturing subtle, discriminative features. CNN-based models perform well, but their high computational cost often hinders real-time use in resource-limited environments such as mobile devices.

To deal with the challenges of FacePAD, we propose a content-adaptive framework for resource-constrained environments. The proposed network utilizes a computationally efficient backbone architecture specifically designed for edge devices [22]. The major contributions of the proposed system lie in its content-adaptive spatial operators mechanism, which fuses feature extraction with novel group-wise involution [23] signal processing to detect input-specific, fine-grained spoofing cues effectively. Unlike fixed depthwise filters, the proposed approach generates location-specific, channel-shared kernels conditioned on the input, thereby enhancing spatial adaptivity and improving discrimination of spoof artifacts.

S. Khan: Preprint submitted to Elsevier

The contributions of this work are as follows:

• An RGB single-frame FacePAD model, CASO-PAD, that augments MobileNetV3 [22] with content-adaptive involution [23] layers at selected stages for enhanced spoof detection.
• A detailed audit of learned kernels and their interpretation for model explainability.
• A streamlined training setup with ablations over model architectures, kernel size, placement strategy, image size, and group sharing for efficient learning.
• Comprehensive evaluation on Replay-Attack [24], Replay-Mobile [25], OULU-NPU [26], ROSE-Youtu [27], and SiW-Mv2 [28] datasets using standard metrics (e.g., Accuracy, AUC, EER, HTER, APCER, BPCER, and ACER), showing strong HTER–efficiency trade-offs against baselines.

The remainder of this paper is structured as follows: Section 2 reviews existing FacePAD methods and their limitations. Section 3 details the proposed framework. Section 4 discusses findings and comparisons, while Sections 5 and 6 present the kernel audit and detailed ablation studies, respectively. Finally, Section 7 concludes the study.

2. Related Work

FacePAD has attracted substantial interest, producing a broad taxonomy of methods: (i) spatial/appearance techniques that analyze frame-level texture and color cues; (ii) temporal approaches that exploit motion and dynamics; and (iii) multimodal methods that incorporate auxiliary signals such as remote photoplethysmography (rPPG), depth, thermal, or infrared. Recent work also explores adversarial robustness and domain generalization. Below, we synthesize these directions and highlight gaps motivating an efficient, RGB-only solution.

2.1. Spatial/Appearance Methods

Classical handcrafted features. Early PAD relied on handcrafted descriptors to capture fine-grained textural differences between bonafide and spoof images. Common choices include Local Binary Patterns (LBP) [29], Histograms of Oriented Gradients (HOG) [12], and SIFT [10]. Määttä et al. [30] used LBP to expose print-induced artifacts, while other works explored color spaces (e.g., HSV, YCbCr) and frequency cues to better separate spoof patterns from genuine skin reflectance [31, 32, 33]. Although these methods are effective against simple photo attacks, they degrade under varied illuminations, cameras, and high-quality 3D mask or replay attacks [34].

Deep learning and hybrid models. CNN-based FacePAD superseded handcrafted approaches by learning discriminative features end-to-end. Representative advances include two-stream or attention mechanisms over RGB and illumination-invariant spaces, pixel-level supervision for local spoof cues [15], and feature fusion with learned or autoencoder-derived representations [35]. Additional strands investigate adversarial robustness [36], one-class client-specific modeling [37], domain adaptation/generalization [38, 39], and diffusion-aided hybrids [40]. Hybrid methods deliberately combine handcrafted texture with CNNs or integrate classical features to improve cross-dataset generalization [41]. While these designs often boost accuracy, many are computationally expensive and less suitable for real-time deployment on edge devices.

2.2. Temporal and Motion-Based Methods

Temporal information helps reveal inconsistencies characteristic of replay and mask attacks. Classic works used optical flow or motion patterns [42] and eye-blink analysis via undirected conditional random fields (CRFs) [13]. With deep learning, CNN-RNN hybrids (e.g., CNN+LSTM) capture spatiotemporal dependencies, while deep dynamic textures and flow-guided models further enhance temporal sensitivity [43]. More recent efforts compress or summarize clip dynamics through temporal sequence sampling and stacked recurrent encoders [44, 45].
Despite strong accuracy, these approaches incur high latency and can be parameter- and compute-intensive, limiting their use on mobile hardware.

2.3. Multimodal Signals: rPPG, Depth, and Infrared

rPPG leverages subtle color changes induced by blood flow as a liveness prior. Methods range from correspondence features and noise-aware templates to transformer-based encoders [46, 47, 48]. rPPG can separate bonafide from spoofed content and has been applied to 3D mask scenarios [49], but it is sensitive to illumination, motion, and video quality, often requiring longer capture windows [50, 51]. Beyond rPPG, multispectral and infrared imaging capture non-visible cues and temperature or material differences [52, 53]. These systems are powerful but typically demand specialized hardware, limiting scalability in low-cost deployments.

Table 1
Overview of the benchmark datasets used for evaluation.

| Dataset | Release Year | Participants | Authentic / Attack Samples |
|---|---|---|---|
| Replay-Attack [24] | 2012 | 50 | 300 / 1,000 |
| Replay-Mobile [25] | 2016 | 40 | 550 / 640 |
| OULU-NPU [26] | 2017 | 55 | 990 / 3,960 |
| ROSE-Youtu [27] | 2018 | 20 | 1,000 / 2,350 |
| SiW-Mv2 [28] | 2022 | 600 | 785 / 915 |

2.4. Summary and Motivation

Classical texture and motion pipelines are fast but fragile under domain shifts and sophisticated spoofs [34]. Deep models substantially improve accuracy, yet many require heavy backbones, auxiliary sensors, or complex training that impedes real-time use [40]. Multimodal signals (rPPG, depth, thermal/IR) enhance robustness but demand controlled capture conditions or specialized hardware [49, 52, 53]. Temporal encoders strengthen replay/mask detection [43, 44, 45], but LSTM/transformer stacks are often too parameter-intensive for phones and embedded devices.
These trade-offs motivate efficient, RGB-only models that retain strong discriminative power while meeting edge constraints. Lightweight backbones such as MobileNetV2/V3 and ShuffleNet achieve favorable accuracy–efficiency trade-offs via depthwise separable convolutions and attention; however, their spatially shared kernels can under-capture localized spoof artifacts. To address this, involution [23] replaces channel-specific, spatially shared kernels with spatially specific, channel-shared kernels produced by a lightweight generator, enabling content-adaptive filtering with low overhead.

3. Proposed Method

This section outlines the datasets employed in this study, the performance evaluation criteria, the architecture of the proposed network, and the adopted training protocol.

3.1. Datasets

To assess the robustness and generalization capability of the proposed approach, five well-established face anti-spoofing datasets are used: Replay-Attack [24], Replay-Mobile [25], OULU-NPU [26], ROSE-Youtu [27], and SiW-Mv2 [28]. A concise summary of their characteristics is provided in Table 1.

Figure 1: Sample frames from authentic and spoofed videos in (a) RA, (b) RM, (c) OULU-NPU, (d) RY, and (e) SiW-Mv2 datasets. Genuine samples appear in the top row, while the bottom row shows different attack types.

3.1.1. Replay-Attack

The Replay-Attack dataset [24] comprises 1,300 video sequences encompassing both genuine and attack scenarios (including photos and video replays). The data were collected from 50 individuals under both controlled and variable illumination using different cameras. The dataset is partitioned into disjoint training, development, and testing subsets, ensuring that no subject appears in more than one split. Figure 1(a) presents illustrative frames from genuine and spoofed sequences.

3.1.2. Replay-Mobile

The Replay-Mobile dataset [25] was curated to evaluate mobile-based face recognition and presentation attack detection systems. It consists of 1,190 video samples from 40 participants, captured in diverse lighting settings using smartphone cameras. The attacks include both printed photo and digital replay types. Example genuine and spoof frames are shown in Figure 1(b).

3.1.3. OULU-NPU

The OULU-NPU dataset [26] consists of 4,950 video samples collected from 55 subjects using six mobile devices (HTC Desire Eye, ASUS Zenfone Selfie, Oppo N3, Meizu MX5, Sony Xperia C5 Ultra Dual, and Samsung Galaxy S6 Edge). The data were acquired across three sessions under varying background scenes and illumination conditions. Presentation attacks were generated using two print and two display devices. The dataset is partitioned subject-disjointly into training and testing sets with 20 subjects each, while the remaining 15 subjects are reserved for validation. Example genuine and spoof frames are shown in Figure 1(c).

3.1.4. ROSE-Youtu

The ROSE-Youtu dataset [27] includes 3,350 videos from 20 subjects, recorded under multiple illumination conditions and using different camera models. It contains three distinct spoof types: video replay, printed photo, and paper mask attacks. As illustrated in Figure 1(d), this dataset poses a considerable challenge for FacePAD research due to its diverse acquisition conditions and attack modalities.

3.1.5. Spoofing in the Wild (SiW-Mv2) Dataset

SiW-Mv2 [28] is a large-scale FacePAD benchmark designed to evaluate robustness under a wide variety of spoofing scenarios, comprising 14 different presentation attack categories. The dataset includes 915 spoof videos collected from 600 subjects, along with 785 bona fide recordings from 493 individuals.
The attack set covers diverse and challenging conditions such as replay-based attacks, partial manipulation of facial regions (e.g., eye-only spoofs), silicone mask presentations, and paper-based artifacts. Representative sample frames from SiW-Mv2 are illustrated in Figure 1(e).

3.2. Pre-processing

In this work, the pre-processing stage involves a center-cropping strategy designed to retain the natural aspect ratio of the frame content and avoid geometric distortion during resizing. The same procedure is applied in both the training and testing phases to ensure consistency. The adaptive cropping operation follows the steps below:

• Determine the smaller dimension $\rho$ between the frame height ($M$) and width ($N$).
• Extract a centered square region of size ($\rho \times \rho$) from the original frame.

Figure 2 illustrates the adaptive center-cropping procedure implemented in this study, which serves as the foundation for subsequent data augmentation and normalization.

Figure 2: Illustration of the adaptive center-cropping process used in the pre-processing pipeline. This operation ensures consistent framing and preserves the original aspect ratio across all samples.

3.3. Proposed Network Architecture

In CNNs, a convolution layer typically applies a single spatial kernel uniformly across the feature map. This entails two well-known constraints:

1. Spatially agnostic kernels: the same filter is reused at every location, limiting sensitivity to local texture, scale, and structure.
2. Fixed cross-channel coupling: channel interactions are prescribed by the learned $C' \times C$ kernel and remain independent of spatial context.

Before contrasting alternative operators, we fix notation and the receptive-field convention used throughout.

Notation. Let $X \in \mathbb{R}^{H \times W \times C}$ denote the input, $Y \in \mathbb{R}^{H \times W \times C'}$ the output, and let $\Omega = \{(u, v) : |u| \le \lfloor k/2 \rfloor,\ |v| \le \lfloor k/2 \rfloor\}$ be the $k \times k$ receptive field.

3.3.1. Standard Convolution

With this notation, the conventional convolution layer can be written as:

$$Y(i, j, c_o) = \sum_{c_i=1}^{C} \sum_{(u,v) \in \Omega} K(c_o, c_i, u, v)\, X(i+u, j+v, c_i), \tag{1}$$

where a single kernel $K$ is shared across all spatial positions $(i, j)$.

While standard convolution mixes all channels everywhere, many efficient backbones restrict cross-channel mixing to subsets to reduce cost. This yields group (and, in the extreme, depthwise) convolution.

3.3.2. Group Convolution

Partition channels into $G$ disjoint groups of size $S = C/G$. Mixing occurs only within the same group:

$$Y(i, j, c_o) = \sum_{c_i \in \mathcal{C}(c_o)} \sum_{(u,v) \in \Omega} K_g(c_o, c_i, u, v)\, X(i+u, j+v, c_i), \tag{2}$$

where $g = g(c_o) \in \{1, \dots, G\}$ indexes the group of $c_o$ and $\mathcal{C}(c_o)$ is its input-channel set. Standard convolution is $G = 1$; depthwise convolution corresponds to $G = C$.

Both of the above still reuse the same spatial kernel at every location, which dilutes sensitivity to position-dependent artifacts typical in FacePAD (e.g., specular highlights or print borders). To address this, we adopt a location-adaptive operator.

3.3.3. Involution (location-adaptive, channel-shared)

Involution [23] replaces a spatially invariant kernel with location-specific kernels $H(i, j, u, v)$ that are shared across channels:

$$Y(i, j, c) = \sum_{(u,v) \in \Omega} H(i, j, u, v)\, X(i+u, j+v, c), \tag{3}$$

where $H$ is generated on-the-fly from features (e.g., via a compact kernel-generator network). Involution preserves the low compute of depthwise/grouped designs while allowing the spatial kernel to vary across $(i, j)$, a property we will exploit in our content-adaptive head.

3.3.4. Proposed Group Involution (GI)

Building on the advantages of involution, we introduce an adaptive group involution (GI). The proposed operator is group-wise and location-adaptive. For $G$ groups, one spatial kernel is generated per group and location; that kernel is shared across channels within the group and applied depthwise (no channel mixing in the spatial op):

$$Y(i, j, c) = \sum_{(u,v) \in \Omega} H(i, j, g(c), u, v)\, X(i+u, j+v, c), \tag{4}$$

where $g(c) \in \{1, \dots, G\}$ maps channel $c$ to its group. Cross-channel interaction is provided by surrounding $1 \times 1$ pointwise layers (squeeze/expand), while the spatial operator itself remains group-shared and content-adaptive.

Remarks.

• Special cases: $G = C$ reduces to depthwise involution [23] (one kernel per channel); $G = 1$ reduces to channel-shared involution [23].
• Efficiency: the spatial application in (4) scales as $\mathcal{O}(C k^2 H W)$, while the kernel-generator cost scales with the group count $G$ and the reduction ratio in the $1 \times 1$ bottleneck; both are lightweight compared to full $C \times C'$ convolution.

The proposed GI provides a flexible trade-off between expressivity and efficiency by (i) adapting kernels to spatial content and (ii) controlling channel sharing via the number of groups $G$. This is especially beneficial for lightweight backbones (e.g., MobileNetV3) and tasks demanding fine spatial awareness (e.g., face PAD, texture analysis, medical imaging), where directional, content-adaptive filtering improves selectivity without incurring the cost of full $C \times C'$ spatial mixing.

Figure 3: Schematic diagram of the proposed content-adaptive spatial operator-based deep learning network.

3.3.5. Proposed MobV3-GI: Content-Adaptive Spatial Operator-based Network

The proposed network architecture (as shown in Figure 3) begins with MobileNetV3-Large [22].
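Before the block-level details, the group-involution operator of Eq. (4) can be made concrete with a minimal pure-Python sketch. This is an illustration under stated assumptions, not the released implementation: the per-location kernels are taken as given (the kernel generator is not modeled), borders use zero padding, channel $c$ is assigned to group `c // (C // G)`, and the names `group_involution` and `identity_kernels` are hypothetical.

```python
def group_involution(x, kernels, groups):
    """Apply Eq. (4): y[i][j][c] = sum over (u, v) of H[i][j][g(c)][u][v] * x[i+u][j+v][c].

    x       : H x W x C nested lists (feature map)
    kernels : H x W x G x k x k nested lists (one k x k kernel per location and group)
    groups  : G, must divide C; channel c belongs to group c // (C // G) (assumed mapping)
    Neighbours outside the map count as zero (zero padding, an assumption).
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    if channels % groups != 0:
        raise ValueError("groups must divide the number of channels")
    k = len(kernels[0][0][0])
    half = k // 2
    per_group = channels // groups
    y = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for c in range(channels):
                kern = kernels[i][j][c // per_group]  # shared by all channels of the group
                acc = 0.0
                for u in range(-half, half + 1):
                    for v in range(-half, half + 1):
                        ii, jj = i + u, j + v
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kern[u + half][v + half] * x[ii][jj][c]
                y[i][j][c] = acc
    return y


def identity_kernels(height, width, groups, k):
    """Kernels whose only non-zero tap is a 1 at the centre, so the output equals the input."""
    half = k // 2
    kern = [[1.0 if (u == half and v == half) else 0.0 for v in range(k)] for u in range(k)]
    return [[[[row[:] for row in kern] for _ in range(groups)]
             for _ in range(width)] for _ in range(height)]


# Toy 2 x 2 feature map with C = 4 channels split into G = 2 groups.
x = [[[float(10 * (2 * i + j) + c) for c in range(4)] for j in range(2)] for i in range(2)]
H = identity_kernels(2, 2, groups=2, k=3)
H[0][0][1][1][1] = 2.0  # location-specific change: only position (0, 0), second group
y = group_involution(x, H, groups=2)
```

In this toy run only the two channels of the second group at position (0, 0) are rescaled, which is the behavior that distinguishes a location-adaptive, group-shared kernel from a spatially shared convolution kernel. Setting `groups` equal to the channel count gives the depthwise special case, and `groups=1` the channel-shared case.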
A typical MobileNetV3 block comprises expansion ($1 \times 1$), depthwise $3 \times 3$, squeeze-and-excitation (SE), and projection ($1 \times 1$). Herein, selected depthwise convolutions are replaced by the proposed group involution:

$$\mathbf{Y}(i, j, :) = \sum_{(u,v) \in \Omega_k} \mathcal{H}_{i,j}(u, v)\, \mathbf{X}(i+u, j+v, :), \tag{5}$$

where $\mathcal{H}_{i,j} \in \mathbb{R}^{k \times k}$ is spatially varying and channel-shared. Kernels are generated by a lightweight function $g_\phi$ (e.g., $1 \times 1$ convs + BN + nonlinearity) from a squeezed version of $\mathbf{X}$. Group sharing ($G$) reduces cost by generating kernels per channel-group. Global average pooling yields $\mathbf{z} \in \mathbb{R}^{d}$; a linear head produces logits $\mathbf{s} = W\mathbf{z} + b \in \mathbb{R}^{2}$. We optimize cross-entropy with optional label smoothing ($\epsilon = 0.05$).

Placement. To balance accuracy and efficiency, involution/GI is applied at both low- and high-resolution stages; Section 6 ablates early vs. late placement.

Algorithm 1 summarizes the overall inference workflow of the proposed CASO-PAD system. It outlines how the MobV3-GI backbone extracts spatially adaptive features from each input frame and performs classification via a lightweight fully connected head. The algorithmic outline emphasizes the model's simplicity and computational efficiency, making it easily reproducible for mobile and embedded implementations.

Algorithm 1
Input: Batch of frames $X \in \mathbb{R}^{B \times C \times H \times W}$
Output: Class predictions $y \in \mathbb{R}^{B \times \text{num\_classes}}$
Initialize model:
    Load MobV3-GI as the backbone CNN
Frame-level feature extraction:
    Extract features $\mathbf{f} = \text{CNN}(X) \in \mathbb{R}^{B \times 960}$
Fully connected layer:
    Classify: $y = \text{FC}(\mathbf{f})$
return class predictions $y$

3.4. Training Setup

The proposed model is trained using the Adam optimizer with a learning rate of $10^{-4}$. The optimization objective is the binary cross-entropy (BCE) loss, formulated as:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right], \tag{6}$$

where $y_i$ denotes the ground-truth label and $p_i$ represents the model's estimated probability for the genuine (live) class.

Training is conducted for 100 epochs using mini-batches of size 32. Early stopping with a patience of 5 epochs, based on the validation loss, is employed to prevent overfitting. During the testing phase, inference is performed with a batch size of 256 for computational efficiency.

All experiments are implemented in the PyTorch framework and executed on an NVIDIA GeForce RTX 5080 GPU equipped with 16 GB of VRAM. Unless otherwise stated, all input frames are resized to a spatial resolution of 256 × 256 pixels. The source code and implementation details are publicly available at: https://github.com/Shujaat123/CASO-PAD.

Table 2
Performance of the proposed method on Replay-Attack (RA), Replay-Mobile (RM), OULU-NPU, ROSE-Youtu (RY), and SiW-Mv2 (Protocol-1) datasets. Results are reported as mean ± std over 3 runs where applicable.

| Metric | RA | RM | OULU | RY | SiW-Mv2 |
|---|---|---|---|---|---|
| Test Accuracy (%) | 100 ± 0.0 | 100 ± 0.0 | 99.68 ± 0.13 | 98.90 ± 0.35 | 95.45 ± 1.63 |
| Youden Index (YI) | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.991 ± 0.001 | 0.98 ± 0.0 | 0.938 ± 0.020 |
| AUC-ROC | 1.0 ± 0.0 | 1.0 ± 0.0 | 0.9999 ± 0.0000 | 0.99 ± 0.0 | 0.9906 ± 0.0022 |
| EER (%) | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.44 ± 0.11 | 0.82 ± 0.09 | 3.13 ± 0.70 |
| FAR (%) | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.23 ± 0.24 | 0.82 ± 0.09 | 2.76 ± 0.55 |
| HTER (%) | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.44 ± 0.04 | 0.82 ± 0.21 | 3.11 ± 1.02 |
| FRR (%) | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.65 ± 0.26 | 0.82 ± 0.34 | 3.45 ± 1.75 |

4. Results and Discussion

This section presents a detailed evaluation of the proposed method across standard benchmark datasets and multiple performance metrics.
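As a reading aid for the error metrics reported in this section, the sketch below computes FAR, FRR, HTER, and the Youden index from per-sample scores using their standard definitions. It is an illustrative sketch rather than the evaluation code used in the paper: the function name `error_rates`, the fixed threshold of 0.5, and the toy scores are assumptions.

```python
def error_rates(scores, labels, threshold=0.5):
    """Return (FAR, FRR, HTER, Youden index) as fractions in [0, 1].

    scores : estimated probability of the bona fide (live) class, one per sample
    labels : 1 for bona fide, 0 for attack
    A sample is accepted as bona fide when score >= threshold (assumed decision rule).
    """
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    if not attacks or not bona_fide:
        raise ValueError("need at least one sample of each class")
    far = sum(s >= threshold for s in attacks) / len(attacks)      # attacks wrongly accepted
    frr = sum(s < threshold for s in bona_fide) / len(bona_fide)   # bona fide wrongly rejected
    hter = (far + frr) / 2.0                                       # half total error rate
    youden = 1.0 - far - frr                                       # sensitivity + specificity - 1
    return far, frr, hter, youden


# Toy example: four bona fide and four attack samples (made-up scores).
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
far, frr, hter, youden = error_rates(scores, labels)
```

The same relations can be checked against Table 2: for OULU-NPU, a FAR of 0.23% and an FRR of 0.65% average to an HTER of 0.44%, and the corresponding Youden index is 1 − 0.0023 − 0.0065 ≈ 0.991.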
We first discuss the quantitative outcomes obtained on Replay-Attack (RA), Replay-Mobile (RM), OULU-NPU, ROSE-Youtu (RY), and SiW-Mv2, followed by a comparative analysis with existing state-of-the-art FacePAD approaches.

4.1. Performance Evaluation

The proposed Content-Adaptive Spatial Operators for FacePAD (CASO-PAD) is built on an involution-augmented MobileNetV3 backbone. Table 2 reports the overall performance under the default configuration (256 × 256 input resolution, $G = 120$ groups, reduction factor $r = 4$, and kernel size $k = 5$). Across five benchmarks, CASO-PAD demonstrates strong discriminative capability, achieving perfect or near-perfect separation on controlled datasets and maintaining robust performance on the more challenging in-the-wild protocol.

On Replay-Attack (RA) and Replay-Mobile (RM), CASO-PAD achieves flawless separation between bona fide and spoof classes, reaching 100% accuracy and AUC-ROC of 1.0 with 0.0% EER/HTER. These results indicate strong generalization under the controlled capture conditions and device variations typical of RA/RM.

On OULU-NPU, which introduces larger variability in illumination, background, capture devices, and attack instruments, CASO-PAD maintains near-ceiling performance (99.68% accuracy, AUC-ROC 0.9999) with low error rates (EER 0.44%, HTER 0.44%).

ROSE-Youtu (RY) is more challenging due to diverse spoof types (print, replay, and mask/paper-mask) and heterogeneous acquisition settings. CASO-PAD sustains strong robustness with 98.90% accuracy, AUC-ROC 0.99, and EER/HTER of 0.82%, while preserving balanced sensitivity and specificity (Youden Index 0.98).

Finally, on SiW-Mv2 Protocol-1, which reflects large-scale, in-the-wild spoofing conditions with substantial subject and attack diversity, CASO-PAD achieves 95.45% accuracy and AUC-ROC 0.9906 with 3.11% HTER and 3.13% EER. Overall, these results confirm that the proposed content-adaptive spatial operator improves discriminability on standard benchmarks while remaining robust under more realistic, unconstrained attack scenarios.

4.2. Comparison with the State of the Art

To benchmark its performance, we compare CASO-PAD against a broad range of contemporary approaches on the RA, RM, OULU-NPU, RY, and SiW-Mv2 datasets using standard evaluation metrics.

4.2.1. Replay-Attack (RA)

As shown in Table 3, CASO-PAD achieves perfect classification on the Replay-Attack dataset with 0.0% HTER, matching the most recent high-performing models such as AdvSpoofGuard [21], Dual-Branch [66], and Deformable Convolution [64]. Other baselines, including EfficientNet-B0 [54] and InceptionV4 [55], show considerably higher error rates, confirming the progress made by modern lightweight architectures. The zero-error performance achieved by CASO-PAD underscores its strong feature discrimination and adaptation capabilities.

Table 3
Replay-Attack (RA): comparison of test HTER (%) sorted from highest (worst) to lowest (best).

| Method | Year | HTER (%) |
|---|---|---|
| EfficientNet-B0 [54] | 2024 | 36.88 |
| InceptionV4 [55] | 2020 | 13.54 |
| 3D ConvNet [56] | 2023 | 11.70 |
| SCNN [55] | 2020 | 7.53 |
| Multi-Block LBP [57] | 2023 | 6.98 |
| MIQF+SVM [58] | 2022 | 5.38 |
| GoogLeNet+GMM [37] | 2021 | 3.76 |
| SfSNet [59] | 2020 | 3.10 |
| VGG16+GMM [37] | 2021 | 1.46 |
| MobileNet+Image Diffusion [40] | 2023 | 0.09 |
| ResNet50V2 [60] | 2023 | 0.03 |
| HybridNet II [61] | 2025 | 0.02 |
| Lightweight 3D-DNN [62] | 2024 | 0.00 |
| HaTFAS [63] | 2024 | 0.00 |
| HybridNet I [61] | 2025 | 0.00 |
| Deformable Convolution [64] | 2025 | 0.00 |
| Spatio-Temporal [65] | 2025 | 0.00 |
| AdvSpoofGuard [21] | 2025 | 0.00 |
| Dual-Branch [66] | 2025 | 0.00 |
| CASO-PAD (Proposed) | 2025 | 0.00 |

4.2.2. Replay-Mobile (RM)

The results on the Replay-Mobile dataset, summarized in Table 4, follow a similar trend. CASO-PAD again attains 0.0% HTER, positioning it among the top-performing methods such as Dual-Branch [66] and AdvSpoofGuard [21]. Older handcrafted and hybrid approaches (e.g., WA [69, 70] and SCNN [55]) show much higher error rates of roughly 5–6%. These findings highlight CASO-PAD's ability to generalize across mobile capture environments characterized by unstable illumination and varying conditions.

Table 4
Replay-Mobile (RM): comparison of test HTER (%) sorted from highest (worst) to lowest (best).

| Method | Year | HTER (%) |
|---|---|---|
| VGG16+GMM [37] | 2021 | 17.21 |
| GoogLeNet+GMM [37] | 2021 | 13.56 |
| SMKFNS [67] | 2020 | 11.88 |
| 3D ConvNet [56] | 2023 | 8.70 |
| MK-SVDD-Slim [68] | 2021 | 7.60 |
| MKL [68] | 2021 | 6.70 |
| InceptionV4 [55] | 2020 | 5.94 |
| WA (PSO+PS) [69] | 2021 | 5.85 |
| WA (GA+MMS+PS) [70] | 2022 | 5.12 |
| SCNN [55] | 2020 | 4.96 |
| EfficientNet-B0 [54] | 2024 | 4.62 |
| MobileNet+Image Diffusion [40] | 2023 | 1.14 |
| Lightweight 3D-DNN [62] | 2024 | 0.45 |
| ResNet50V2 [60] | 2023 | 0.00 |
| Deformable Convolution [64] | 2025 | 0.00 |
| Spatio-Temporal [65] | 2025 | 0.00 |
| AdvSpoofGuard [21] | 2025 | 0.00 |
| Dual-Branch [66] | 2025 | 0.00 |
| CASO-PAD (Proposed) | 2025 | 0.00 |

4.2.3. OULU-NPU

Table 5 reports comparative performance on the OULU-NPU benchmark under the complete protocol, where methods are evaluated using the standardized APCER, BPCER, and ACER metrics. CASO-PAD achieves an ACER of 0.42%, the lowest among the methods compared in Table 5. In particular, the proposed method obtains an APCER of 0.00%, indicating perfect rejection of attack presentations, while maintaining a competitive BPCER of 0.83%.

Table 5
OULU-NPU (complete protocol): comparison of test performance sorted by ACER from highest (worst) to lowest (best).

| Method | Year | APCER (%) | BPCER (%) | ACER (%) |
|---|---|---|---|---|
| Texture (VAR) [71] | 2021 | 14.5 | 15.0 | 14.8 |
| ED-LBP (VAR) [72] | 2021 | 11.3 | 8.4 | 9.9 |
| Fake-Net (VAR) [73] | 2021 | 5.4 | 6.9 | 6.2 |
| OFT (VAR) [74] | 2022 | 5.7 | 2.7 | 4.2 |
| UCDCN [75] | 2024 | 2.6 | 1.01 | 1.82 |
| 3DLCN [76] | 2024 | 1.5 | 0.5 | 1.0 |
| Spatio-Temporal [65] | 2025 | 0.13 | 1.11 | 0.62 |
| KD+Depth [66] | 2025 | 0.28 | 0.83 | 0.56 |
| CASO-PAD (Proposed) | 2025 | 0.00 | 0.83 | 0.42 |

Table 6
ROSE-Youtu (RY): comparison of test HTER/EER (%) sorted by HTER from highest (worst) to lowest (best). Entries without reported EER are marked "-".

| Method | Year | HTER (%) | EER (%) |
|---|---|---|---|
| 3D ConvNet [56] | 2023 | 21.30 | - |
| ResNet50+GMM [37] | 2021 | 14.69 | - |
| ViViT [77] | 2023 | 13.28 | 2.46 |
| EfficientNet-B0 [54] | 2024 | 9.54 | - |
| FASNet [36] | 2021 | 8.57 | - |
| Fatemifar et al. [78] | 2022 | 6.34 | - |
| WA (PSO+PS) [69] | 2021 | 5.61 | - |
| WA (GA+MMS+PS) [70] | 2022 | 5.12 | - |
| MobileNet+Image Diffusion [40] | 2023 | 4.92 | 4.95 |
| ResNet50V2 [60] | 2023 | 2.53 | 2.64 |
| AdvSpoofGuard [21] | 2025 | 1.97 | 1.08 |
| Spatio-Temporal ($\tau = 1$) [65] | 2025 | 1.47 | 0.85 |
| CA-FAS [79] | 2024 | 1.37 | - |
| Deformable Convolution [64] | 2025 | 1.26 | 0.80 |
| Dual-Branch [66] | 2025 | 1.02 | 1.15 |
| CASO-PAD (Proposed) | 2025 | 0.82 | 0.82 |

Compared with earlier handcrafted feature baselines such as Texture (VAR) [71] and ED-LBP (VAR) [72], CASO-PAD reduces ACER by more than an order of magnitude.
CASO-PAD also remains highly competitive against recent deep architectures, including 3DLCN [76] (ACER 1.0%) and KD+Depth [66] (ACER 0.56%). These results indicate that CASO-PAD generalizes strongly across the diverse illumination, camera, and spoofing conditions present in OULU-NPU.

4.2.4. ROSE-Youtu (RY)

Table 6 reports comparative performance on the more challenging ROSE-Youtu benchmark. Here, CASO-PAD achieves an HTER of 0.82% and an EER of 0.82%, the lowest HTER among the compared methods, ahead of recent high-performing methods including Dual-Branch [66] (1.02/1.15) and Deformable Convolution [64] (1.26/0.80); only Deformable Convolution reports a marginally lower EER (0.80%). Compared with earlier architectures such as ViViT [77] or EfficientNet-B0 [54], the relative HTER reduction exceeds 90%, demonstrating substantial gains in discriminability and generalization. The results on ROSE-Youtu support CASO-PAD's robustness against diverse spoofing modalities and sensor variations.

4.2.5. Spoofing in the Wild (SiW-Mv2)

Table 7 reports results on the SiW-Mv2 Protocol-1 benchmark, a large-scale and challenging dataset characterized by substantial subject diversity, varied attack instruments, and realistic capture conditions. Under this protocol, CASO-PAD achieves an HTER of 3.11% and an EER of 3.13%, yielding the lowest error rates among the compared methods.

Table 7
SiW-Mv2 (Protocol-1): comparison of test HTER/EER (%), sorted by HTER from highest (worst) to lowest (best).

Method                          Year   HTER/EER (%)
VGG16 [80]                      2014   10.54/10.45
Spatio-Temporal (τ = 1) [65]    2025   6.11/6.63
MobileNetV3 large [22]          2019   5.93/5.87
Depth-aug. Teacher [66]         2025   5.82/6.08
Inceptionv3 [81]                2016   5.77/5.78
MobileNetV3 small [22]          2019   5.56/5.51
ResNet50V2 [60]                 2023   5.18/6.08
DeformMobileNet [64]            2025   4.88/4.83
EfficientNet-B0 [54]            2024   4.82/5.80
CASO-PAD (Proposed)             2026   3.11/3.13

Compared with recent lightweight architectures such as EfficientNet-B0 [54] (4.82/5.80) and DeformMobileNet [64] (4.88/4.83), CASO-PAD reduces HTER by approximately 35–36% in relative terms, indicating improved generalization under diverse spoof scenarios. Relative to ResNet50V2 [60] (5.18/6.08) and the depth-augmented teacher model [66] (5.82/6.08), the relative reduction in HTER is about 40% and 47%, respectively, despite relying solely on RGB input without auxiliary depth or temporal modeling.

Furthermore, CASO-PAD substantially outperforms earlier baselines such as MobileNetV3 [22] and VGG16 [80], whose HTERs range from 5.56% to 10.54%. These results indicate that the proposed content-adaptive spatial operator enhances discriminative capability in complex, real-world spoofing conditions while preserving a lightweight and deployment-friendly architecture.

4.2.6. Overall Discussion

Across all five datasets, CASO-PAD consistently matches or surpasses the strongest recent competitors while maintaining a compact and computationally efficient architecture. Its use of content-adaptive involution kernels allows the model to dynamically capture spatial dependencies, yielding robust anti-spoofing performance. These results collectively establish CASO-PAD as a reliable and scalable framework for real-world FacePAD systems.

5. Kernel Audit

To better understand the adaptive behavior of the proposed model, we performed a comprehensive audit of the learned kernel functions.
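For orientation, the kernels being audited are the ones an involution head generates separately at every pixel. The following is a minimal pure-Python sketch of a generic grouped involution in the spirit of [23], not the CASO-PAD implementation: the function name, the nested-list tensor layout, the plain ReLU bottleneck without normalization, and the zero padding are all assumptions made for illustration.

```python
def involution(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Grouped involution on a nested-list tensor x[c][i][j] (illustrative only)."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if channels % groups:
        raise ValueError("channels must be divisible by groups")
    per_group = channels // groups
    k, pad = kernel_size, kernel_size // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            # Kernel generator: bottleneck (reduce) + ReLU + expansion (span),
            # applied to the feature vector of this pixel only.
            pixel = [x[c][i][j] for c in range(channels)]
            hidden = [max(0.0, sum(w * p for w, p in zip(row, pixel)))
                      for row in w_reduce]
            kernel = [sum(w * h for w, h in zip(row, hidden)) for row in w_span]
            # kernel holds groups * k * k weights: one k x k filter per group,
            # shared by every channel inside that group.
            for c in range(channels):
                base = (c // per_group) * k * k
                acc = 0.0
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = i + dy - pad, j + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:  # zero padding
                            acc += kernel[base + dy * k + dx] * x[c][yy][xx]
                out[c][i][j] = acc
    return out


# Toy input: 2 channels, 3 x 3 pixels, one group, one hidden unit.
x = [[[2.0] * 3 for _ in range(3)], [[3.0] * 3 for _ in range(3)]]
w_reduce = [[1.0, 0.0]]                                  # hidden = ReLU(channel 0)
w_span = [[1.0] if n == 4 else [0.0] for n in range(9)]  # only the centre tap is non-zero
y = involution(x, w_reduce, w_span)
print(y[0][1][1], y[1][1][1])  # centre tap is 2.0 everywhere, so this prints 4.0 6.0
```

In this formulation the filter depends on the pixel position and is shared across channels within a group, the reverse of a standard convolution, and the `groups` and reduction settings correspond to the quantities varied in the ablations.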
The audit is conducted on the RY dataset and focuses on four complementary indicators that collectively describe the kernels' spatial and spectral properties:

• HF/LF ratio: the proportion of high- to low-frequency energy, where higher values indicate sharper, edge-rich responses;
• Anisotropy: the degree of orientation selectivity (0 indicates isotropic kernels, while larger values reflect stronger directional sensitivity);
• DC offset: the mean value of kernel weights, where values near zero correspond to center–surround or high-pass behavior without brightness bias; and
• Position variance: a measure of spatial non-stationarity, with lower values denoting greater position invariance.

Test-set summary. The evaluation on the held-out test set reveals that the involution-based head produces kernels that are nearly zero-mean and spatially invariant (DC offset ≈ 0.0000 ± 0.0000, position variance ≈ 4 × 10⁻⁶ ± 3 × 10⁻⁶). These kernels exhibit moderate orientation selectivity (anisotropy 0.1832 ± 0.0696) and a noticeable high-frequency preference (HF/LF 17.58 ± 4.65).

Figure 4: Kernel audit visualization (normalized to [0, 1]). Left: Mean kernel showing directional polarity contrast, resembling an edge-detecting filter. Right: Mean energy distribution, radially compact and indicative of high-pass/edge-emphasizing behavior.

Figure 4 illustrates these properties. The mean kernel (left) shows a localized polarity contrast, similar to an oriented edge detector rather than a symmetric center–surround pattern. The corresponding energy map (right) displays a radially concentrated distribution, characteristic of high-pass filtering. Together, these visualizations suggest that the model learns to enhance structural gradients and texture cues typical of genuine facial regions, while naturally suppressing smooth or redundant low-frequency content.

Class-wise differences.
A class-wise breakdown of the learned kernels reveals clear distinctions between genuine and spoofed inputs. As depicted in Figure 5, attack samples exhibit a higher HF/LF ratio (18.46 vs. 15.02) but a lower anisotropy (0.161 vs. 0.248) than genuine faces. This indicates that spoofed frames tend to contain artificially sharp, broadband textures but lack coherent directional organization, consistent with reflections or printing artifacts. Cohen's effect size further quantifies these differences: HF/LF (d = +0.782) suggests moderately higher sharpness in attacks, while anisotropy (d = −1.486) reflects a large effect favoring genuine faces with stronger directional coherence. The histograms in Figure 5 illustrate these tendencies: real samples cluster around higher anisotropy values, while attack samples dominate the higher HF/LF range. Overall, anisotropy emerges as the more discriminative feature, capturing structural regularity inherent in authentic facial geometry, whereas HF/LF primarily responds to over-sharpening and specular effects common in spoof attempts.

Figure 5: Kernel audit overlays on the test set. (a) Anisotropy distribution: real faces show higher directional consistency (anisotropy). (b) HF/LF distribution: attack samples exhibit excessive sharpness (higher HF/LF) but weaker organization.

Interpretation. The kernel audit suggests that the involution layers act as adaptive edge-oriented spatial filters. They enhance structured gradients aligned with genuine facial geometry while down-weighting homogeneous or specular regions often found in spoof media. The combination of near-zero DC bias, high-frequency emphasis, and moderate directional selectivity highlights the interpretability of the learned filters.
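The effect sizes quoted in the class-wise analysis are Cohen's d values. As a hedged illustration of how such a number is typically obtained, the sketch below uses the common pooled-standard-deviation form. The sign convention (attack minus genuine) is inferred from the reported signs, the function name is illustrative, and the sample values are invented for the example rather than taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(attack, genuine):
    """Cohen's d with pooled sample standard deviation (attack minus genuine)."""
    n_a, n_g = len(attack), len(genuine)
    pooled_var = ((n_a - 1) * variance(attack)
                  + (n_g - 1) * variance(genuine)) / (n_a + n_g - 2)
    return (mean(attack) - mean(genuine)) / sqrt(pooled_var)


# Hypothetical per-sample anisotropy values, for illustration only.
attack_anisotropy = [0.14, 0.16, 0.18, 0.15]
genuine_anisotropy = [0.22, 0.25, 0.28, 0.24]
d = cohens_d(attack_anisotropy, genuine_anisotropy)
print(f"d = {d:+.3f}")  # negative: genuine faces have higher anisotropy
```

A negative d under this convention means the genuine class has the larger mean, which matches the direction reported for anisotropy.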
In essence, the model implicitly learns a physics-consistent representation, accentuating meaningful texture and geometric cues that distinguish real from fake facial imagery.

6. Ablation Studies

This section presents a series of ablation experiments designed to evaluate the contribution of different architectural and training factors to the proposed method's overall performance. We systematically analyze the effects of network backbone, group count in the involution head, placement of the proposed group-involution (GI) module, input image resolution, and reduction ratio. The section concludes with computational complexity analysis and qualitative interpretation using Grad-CAM visualizations.

6.1. Effect of Network Architecture

To understand how backbone design influences performance, two MobileNet variants were tested: MobileNetV2 and MobileNetV3-Large, each integrated with the proposed GI head. Both models were evaluated using the ROSE-Youtu dataset with identical training conditions. Key results are summarized in Table 8.

Table 8
Comparison of different MobileNet backbones with the proposed GI head on the ROSE-Youtu dataset (reduction ratio = 4, groups = 120). Results are reported as mean ± std over 3 runs.

Model            Params (M)   GFLOPs   Youden max ↑     HTER (%) ↓
MobileNetV3+GI   3.635        0.643    0.984 ± 0.004    0.82 ± 0.21
MobileNetV2+GI   3.399        0.932    0.971 ± 0.006    1.43 ± 0.34

MobileNetV3-Large outperforms MobileNetV2, achieving a higher Youden index and a lower HTER at reduced computational cost (0.643 vs. 0.932 GFLOPs). The improvement can be attributed to MobileNetV3's more expressive activation functions and squeeze-and-excitation modules, which better complement the adaptive behavior of the involution head.

6.2. Effect of Group Count in the GI Head

The number of groups (G) in the involution operator controls spatial diversity and computational load. Table 9 summarizes performance for various group counts. Performance peaks at G = 120, while smaller or larger values degrade results, plausibly due to underfitting or over-parameterization.

Table 9
Ablation over group count (G) in the GI head (input size 256 × 256). Results are mean ± std over 3 runs.

Groups   Params (M)   GFLOPs   Youden ↑         HTER (%) ↓
16       3.476        0.623    0.977 ± 0.004    1.16 ± 0.19
30       3.497        0.626    0.977 ± 0.010    1.14 ± 0.50
60       3.543        0.631    0.972 ± 0.020    1.41 ± 1.01
120      3.635        0.643    0.984 ± 0.004    0.82 ± 0.21
240      3.818        0.666    0.965 ± 0.020    1.77 ± 0.98

Moderate group sizes allow the model to capture sufficient spatial variation without unnecessary overhead. Excessive grouping (G = 240) increases FLOPs and parameters without measurable benefit, while very low values reduce the network's representational capacity.

6.3. Effect of Proposed GI Placement

To assess how the position of the GI module affects performance, two placements were evaluated: (1) at the beginning of the MobileNetV3 backbone and (2) at the end, right before adaptive average pooling. Results are summarized in Table 10.

Table 10
Effect of GI block placement in MobileNetV3 (groups = 120). Values are mean ± std over 3 runs. GFLOPs measured on 256 × 256 input.

Placement   Params (M)   GFLOPs   Youden max ↑     HTER (%) ↓
Beginning   2.975        0.645    0.968 ± 0.014    1.59 ± 0.70
End         3.635        0.643    0.984 ± 0.004    0.82 ± 0.21

Placing the GI at the end yields superior results, indicating that content-adaptive filtering benefits most from high-level semantic representations. Early placement reduces the parameter count (2.975M vs. 3.635M) at essentially the same GFLOPs (0.645 vs. 0.643) but sacrifices discriminative power.

6.4. Effect of Input Image Size

To study the trade-off between input resolution and accuracy, models were trained with image sizes of 64 × 64, 128 × 128, 256 × 256, and 512 × 512. Results are reported in Table 11 and illustrated in Figure 6.

Figure 6: t-SNE embeddings of features before the classification layer across different input resolutions. Higher resolutions yield better class separation.

Table 11
Performance for different input image sizes on the ROSE-Youtu dataset. Mean ± std over 3 runs.

Metric              512 × 512          256 × 256          128 × 128          64 × 64
Accuracy (%)        98.97 ± 0.43       98.90 ± 0.35       98.15 ± 1.01       93.48 ± 1.17
AUC-ROC             0.9994 ± 0.0003    0.9995 ± 0.0003    0.9971 ± 0.0033    0.9724 ± 0.0063
EER (%)             0.90 ± 0.36        0.82 ± 0.09        1.61 ± 0.80        8.21 ± 1.16
HTER (%)            0.61 ± 0.23        0.82 ± 0.21        1.64 ± 0.70        7.70 ± 1.42
Youden Index (YI)   0.988 ± 0.005      0.984 ± 0.004      0.967 ± 0.014      0.846 ± 0.028
Params (M)          3.635              3.635              3.635              3.635
GFLOPs              2.563              0.643              0.163              0.043

The t-SNE plots show a clear resolution-separation trend: at 64 × 64 the bona fide and spoof clusters overlap notably; at 128 × 128 and 256 × 256 the clusters become more compact with wider margins; at 512 × 512 separation is strongest, matching the lowest HTER.

Performance generally improves with resolution, with the lowest HTER achieved at 512 × 512 (0.61%), although EER and AUC are marginally better at 256 × 256. However, computational cost grows roughly fourfold with each doubling of the side length, from 0.043 to 2.563 GFLOPs. The 256 × 256 resolution provides a balanced trade-off, offering near-maximum accuracy with manageable complexity, making it the most practical choice for real-time applications.

6.5. Effect of Reduction Ratio

Table 12 examines the influence of the reduction ratio (reduce) in the kernel generator of the involution head. As the reduction ratio increases, parameter count and FLOPs decrease, but performance follows a non-monotonic trend. The optimal configuration is reduce = 4, which delivers the highest Youden Index (0.984 ± 0.004) and lowest HTER (0.82 ± 0.21).

Table 12
Effect of reduction ratio (reduce ∈ {1, 4, 8}) in the END-placed involution [23] head (G = 120). Mean ± std over 3 runs.

Reduce   Params (M)   GFLOPs   Youden max ↑     HTER (%) ↓
1        5.775        0.919    0.965 ± 0.028    1.75 ± 1.40
4        3.635        0.643    0.984 ± 0.004    0.82 ± 0.21
8        3.303        0.600    0.967 ± 0.016    1.65 ± 0.78

Using no reduction (reduce = 1) enlarges the kernel generator, which increases computational load and appears to overfit, while an overly aggressive reduction (reduce = 8) restricts the kernel generator's capacity. A moderate setting of reduce = 4 strikes the best balance between efficiency and expressiveness.

6.6. Computational Complexity and Efficiency Analysis

To examine computational efficiency, CASO-PAD is evaluated against representative lightweight architectures, including ShuffleNetV2 [82], Once-for-All [83], MobileNetV2 [84], MobileViTV2 [85], GhostNet [86], and MobileNetV3 [22]. All networks are trained and tested using identical settings with a 224 × 224 input resolution, ImageNet initialization, and consistent optimization parameters. Performance and complexity measurements are obtained under uniform benchmarking conditions to ensure fair comparison.

Figure 7: Trade-off comparison between model complexity and performance. (Left) HTER vs. number of parameters. (Right) HTER vs. GFLOPs.

Figure 7 illustrates the relationship between spoof detection performance and model complexity from two complementary perspectives. The parameter-based analysis (left) shows that CASO-PAD attains the lowest HTER (1.02%) despite having only moderate model size (3.635M parameters), indicating superior parameter efficiency. Several smaller networks, such as ShuffleNetV2, exhibit noticeably higher error rates, suggesting that reduced parameter count alone does not guarantee robust liveness representation.

The GFLOPs-based view (right) further reveals that CASO-PAD achieves strong discriminative performance at low computational cost (0.48 GFLOPs). Importantly, models with comparable or even lower FLOPs do not consistently match this accuracy level, highlighting that effective architectural design and feature modeling play a more critical role than raw operation count. These observations collectively indicate that CASO-PAD improves generalization efficiency by extracting more informative representations per unit of computation rather than relying on larger networks.

Table 13
Inference latency comparison on NVIDIA Jetson Orin Nano for input resolution 224 × 224 (CUDA execution).

Method                      Jetson Orin Nano latency (ms)
ShuffleNetV2 [82]           22.6 ± 0.6
Once-for-All [83]           49.36 ± 0.4
MobileViTV2 [85]            28.87 ± 0.8
MobileNetV2 [84]            19.0 ± 0.4
MobileNetV3 (large) [22]    23.17 ± 0.7
GhostNet [86]               37.8 ± 0.2
CASO-PAD (Proposed)         25.6 ± 0.8

Values denote average inference latency (mean ± std) over 100 runs on NVIDIA Jetson Orin Nano (25W mode, CUDA).

6.6.1. Edge Deployment: Jetson Orin Nano

Table 13 reports inference latency on the NVIDIA Jetson Orin Nano, reflecting realistic edge deployment conditions. While certain models exhibit slightly lower latency, these networks are associated with substantially higher HTER values (Fig. 7), indicating weaker spoof discrimination capability. CASO-PAD maintains competitive runtime performance while delivering significantly improved detection accuracy, demonstrating that modest increases in latency can be justified by notable gains in reliability and robustness.

This behavior underscores a critical practical consideration: minimal inference time alone is insufficient for secure biometric systems if accompanied by degraded liveness detection.
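Table 13 reports latency as mean ± std over 100 runs. A generic way to collect such statistics is sketched below in plain Python. The helper name `measure_latency_ms`, the warm-up count, and the stand-in workload are assumptions, and the paper's actual benchmarking script may differ.

```python
import statistics
import time


def measure_latency_ms(fn, runs=100, warmup=10):
    """Return (mean, std) wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        # For GPU inference, synchronize the device before reading the clock
        # (e.g. torch.cuda.synchronize() in PyTorch); otherwise only the
        # asynchronous kernel launch is timed.
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def dummy_model():
    """Stand-in workload; replace with a single forward pass of the network."""
    return sum(i * i for i in range(10_000))


mean_ms, std_ms = measure_latency_ms(dummy_model)
print(f"{mean_ms:.2f} ± {std_ms:.2f} ms")
```

Discarding warm-up iterations matters on embedded GPUs, where the first calls typically include one-off initialization costs that would inflate both the mean and the spread.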
The results confirm that CASO-PAD achieves a favorable accuracy–efficiency balance, making it suitable for resource-constrained, real-time face authentication scenarios.

6.7. Model Interpretability via Grad-CAM

To interpret the model's decision-making process, Grad-CAM heatmaps were generated for both genuine and spoof samples (Figure 8). The activation maps show that the network emphasizes meaningful facial regions, such as skin texture, eyes, and lips, in genuine samples, while focusing on artifacts such as mask and print borders in spoofed ones. This behavior supports the discriminative nature of the learned features, illustrating that CASO-PAD effectively leverages spatial and textural cues associated with facial liveness.

Figure 8: Grad-CAM [87] visualizations on genuine and spoof frames from the ROSE-Youtu dataset. The model focuses on discriminative cues such as eyes, lips, and mask edges (best viewed in color).

7. Conclusion

This paper introduced CASO-PAD, a lightweight Face Presentation Attack Detection (FacePAD) model that integrates content-adaptive spatial operators (involution) into a MobileNetV3 backbone. By replacing selected depthwise convolutions with a group-wise, location-adaptive operator, CASO-PAD improves spatial selectivity for spoof cues while retaining mobile-class efficiency and operating in an RGB-only, single-frame setting.

Extensive experiments on Replay-Attack, Replay-Mobile, OULU-NPU, ROSE-Youtu, and SiW-Mv2 (Protocol-1) demonstrate that CASO-PAD achieves perfect or near-ceiling performance on controlled benchmarks and maintains robust accuracy under in-the-wild conditions (e.g., 3.11% HTER and 3.13% EER on SiW-Mv2 Protocol-1). Ablation studies further show that placing the adaptive operator near the network head and using moderate group sharing yields the best accuracy–efficiency trade-off. In addition, the kernel audit and Grad-CAM analysis provide interpretable evidence that CASO-PAD emphasizes meaningful texture and boundary cues, while Jetson Orin Nano measurements confirm its suitability for real-time edge deployment.

The implementation of CASO-PAD is publicly available at: https://github.com/Shujaat123/CASO-PAD.

Acknowledgment

Shujaat Khan acknowledges the support from the King Fahd University of Petroleum & Minerals (KFUPM) under Early Career Research Grant no. EC241027.

References

[1] K. Wang, M. Huang, G. Zhang, H. Yue, G. Zhang, Y. Qiao, Dynamic feature queue for surveillance face anti-spoofing via progressive training, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6371–6378.
[2] W. Ye, W. Chen, L. Fortunati, Mobile payment in china: A study from a sociological perspective, Journal of Communication Inquiry 47 (2023) 222–248.
[3] H. Fang, A. Liu, J. Wan, S. Escalera, C. Zhao, X. Zhang, S. Z. Li, Z. Lei, Surveillance face anti-spoofing, IEEE Transactions on Information Forensics and Security 19 (2023) 1535–1546.
[4] H. Lee, S.-H. Park, J.-H. Yoo, S.-H. Jung, J.-H. Huh, Face recognition at a distance for a stand-alone access control system, Sensors 20 (2020) 785.
[5] K. Alhanaee, M. Alhammadi, N. Almenhali, M. Shatnawi, Face recognition smart attendance system using deep transfer learning, Procedia Computer Science 192 (2021) 4093–4102.
[6] P. Grother, M. Ngan, K. Hanaoka, Face recognition vendor test (frvt) part 2: Identification, 2019.
[7] J. Galbally, et al., Biometric anti-spoofing methods: A survey in face recognition systems, IEEE Security and Privacy 12 (2014) 30–37.
[8] Y.
Wen, et al., Face spoof detection with image distortion analysis, in: IEEE Transactions on Information Forensics and Security, volume 10, 2015, pp. 746–761.
[9] Z. Yu, Y. Qin, X. Li, C. Zhao, Z. Lei, G. Zhao, Deep learning for face anti-spoofing: A survey, IEEE transactions on pattern analysis and machine intelligence 45 (2022) 5609–5631.
[10] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International journal of computer vision 60 (2004) 91–110.
[11] H. Bay, T. Tuytelaars, L. Van Gool, Surf: Speeded up robust features, in: Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7-13, 2006. Proceedings, Part I 9, Springer, 2006, pp. 404–417.
[12] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), volume 1, IEEE, 2005, pp. 886–893.
[13] G. Pan, L. Sun, Z. Wu, Y. Wang, Eyeblink-based anti-spoofing in face recognition from a generic webcamera, 2007 IEEE 11th International Conference on Computer Vision (2007) 1–8.
[14] J. Komulainen, A. Hadid, M. Pietikäinen, Context based face anti-spoofing, in: 2013 IEEE sixth international conference on biometrics: theory, applications and systems (BTAS), IEEE, 2013, pp. 1–8.
[15] W. Sun, Y. Song, C. Chen, J. Huang, A. C. Kot, Face spoofing detection based on local ternary label supervision in fully convolutional networks, IEEE Transactions on Information Forensics and Security 15 (2020) 3181–3196.
[16] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).
[17] D. Deb, X. Liu, A. K. Jain, Unified detection of digital and physical face attacks, in: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), IEEE, 2023, pp. 1–8.
[18] Z. Zhang, J. Yan, S. Liu, Z. Lei, D.
Yi, S. Z. Li, A face antispoofing database with diverse attacks, in: 2012 5th IAPR international conference on Biometrics (ICB), IEEE, 2012, pp. 26–31.
[19] E. L. Denton, S. Chintala, A. Szlam, R. Fergus, Deep generative image models using a Laplacian pyramid of adversarial networks, in: Advances in Neural Information Processing Systems (NeurIPS), volume 28, 2015, pp. 1486–1494.
[20] A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434 (2015).
[21] T. H. M. Siddique, S. Khan, Z. Wang, K. Huang, Advspoofguard: Optimal transport driven robust face presentation attack detection system, Knowledge-Based Systems (2025) 113759.
[22] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., Searching for mobilenetv3, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[23] D. Li, J. Hu, C. Wang, X. Li, Q. She, L. Zhu, T. Zhang, Q. Chen, Involution: Inverting the inherence of convolution for visual recognition, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12321–12330.
[24] I. Chingovska, A. Anjos, S. Marcel, On the effectiveness of local binary patterns in face anti-spoofing, in: 2012 BIOSIG-proceedings of the international conference of biometrics special interest group (BIOSIG), IEEE, 2012, pp. 1–7.
[25] A. Costa-Pazo, S. Bhattacharjee, E. Vazquez-Fernandez, S. Marcel, The replay-mobile face presentation-attack database, in: 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), IEEE, 2016, pp. 1–7.
[26] Z. Boulkenafet, J. Komulainen, L. Li, X. Feng, A. Hadid, Oulu-npu: A mobile face presentation attack database with real-world variations, in: 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017), IEEE, 2017, pp.
612–618.
[27] H. Li, W. Li, H. Cao, S. Wang, F. Huang, A. C. Kot, Unsupervised domain adaptation for face anti-spoofing, IEEE Transactions on Information Forensics and Security 13 (2018) 1794–1809.
[28] X. Guo, Y. Liu, A. Jain, X. Liu, Multi-domain learning for updating face anti-spoofing models, in: ECCV, 2022.
[29] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on pattern analysis and machine intelligence 24 (2002) 971–987.
[30] J. Määttä, A. Hadid, M. Pietikäinen, Face spoofing detection from single images using micro-texture analysis, in: 2011 International Joint Conference on Biometrics (IJCB), IEEE, 2011, pp. 1–7.
[31] Z. Boulkenafet, J. Komulainen, A. Hadid, Face anti-spoofing using color texture analysis, IEEE Transactions on Information Forensics and Security 11 (2016) 1818–1830.
[32] Z. Boulkenafet, J. Komulainen, A. Hadid, Face anti-spoofing based on color texture analysis, in: 2015 IEEE international conference on image processing (ICIP), IEEE, 2015, pp. 2636–2640.
[33] J. Li, Y. Wang, T. Tan, A. K. Jain, Live face detection based on the analysis of fourier spectra, in: Biometric technology for human identification, volume 5404, SPIE, 2004, pp. 296–303.
[34] V. M. Patel, N. K. Ratha, R. Chellappa, Secure face unlock: Spoof detection on smartphones, IEEE Transactions on Information Forensics and Security 11 (2016) 2268–2283.
[35] Y. A. U. Rehman, L.-M. Po, M. Liu, Z. Zou, W. Ou, Y. Zhao, Face liveness detection using convolutional-features fusion of real and deep network generated face images, Journal of Visual Communication and Image Representation 59 (2019) 574–582.
[36] N. Bousnina, L. Zheng, M. Mikram, S. Ghouzali, K.
Minaoui, Unraveling robustness of deep face anti-spoofing models against pixel attacks, Multimedia Tools and Applications 80 (2021) 7229–7246.
[37] S. Fatemifar, S. R. Arashloo, M. Awais, J. Kittler, Client-specific anomaly detection for face presentation attack detection, Pattern Recognition 112 (2021) 107696.
[38] G. Wang, H. Han, S. Shan, X. Chen, Unsupervised adversarial domain adaptation for cross-domain face presentation attack detection, IEEE Transactions on Information Forensics and Security 16 (2020) 56–69.
[39] Z. Wang, Z. Wang, Z. Yu, W. Deng, J. Li, T. Gao, Z. Wang, Domain generalization via shuffled style assembly for face anti-spoofing, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4123–4133.
[40] M. O. Alassafi, M. S. Ibrahim, I. Naseem, R. AlGhamdi, R. Alotaibi, F. A. Kateb, H. M. Oqaibi, A. A. Alshdadi, S. A. Yusuf, A novel deep learning architecture with image diffusion for robust face presentation attack detection, IEEE Access 11 (2023) 59204–59216.
[41] K. Patel, H. Han, A. K. Jain, Cross-database face antispoofing with robust feature representation, in: Chinese Conference on Biometric Recognition, Springer, 2016, pp. 611–619.
[42] W. Bao, H. Li, N. Li, W. Jiang, Liveness detection for face recognition based on optical flow field, 2009 International Conference on Image Analysis and Signal Processing (2009) 233–236.
[43] R. Shao, X. Lan, P. C. Yuen, Joint discriminative learning of deep dynamic textures for 3d mask face anti-spoofing, IEEE Transactions on Information Forensics and Security 14 (2018) 923–938.
[44] U. Muhammad, Z. Yu, J. Komulainen, Self-supervised 2d face presentation attack detection via temporal sequence sampling, Pattern Recognition Letters (2022).
[45] C. Dhiman, A. Antil, A. Anand, S. Gakhar, A deep face spoof detection framework using multi-level elbps and stacked lstms, Signal, Image and Video Processing (2024) 1–14.
[46] S.-Q. Liu, X. Lan, P. C.
Yuen, Remote photoplethysmography correspondence feature for 3d mask face presentation attack detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[47] S.-Q. Liu, X. Lan, P. C. Yuen, Multi-channel remote photoplethysmography correspondence feature for 3d mask face presentation attack detection, IEEE Transactions on Information Forensics and Security 16 (2021) 2683–2696.
[48] Z. Yu, X. Li, P. Wang, G. Zhao, Transrppg: Remote photoplethysmography transformer for 3d mask face presentation attack detection, IEEE Signal Processing Letters (2021).
[49] Y. Liu, X. Zhang, S. Zhang, H. Liu, J. Shi, L. Liu, S. Fu, 3d mask face anti-spoofing with remote photoplethysmography, IEEE Signal Processing Letters 23 (2016) 1589–1593.
[50] C. Yao, S. Wang, J. Zhang, W. He, H. Du, J. Ren, R. Bai, J. Liu, rppg-based spoofing detection for face mask attack using efficientnet on weighted spatial-temporal representation, in: 2021 IEEE International Conference on Image Processing (ICIP), IEEE, 2021, pp. 3872–3876.
[51] S. Liu, X. Lan, P. Yuen, Temporal similarity analysis of remote photoplethysmography for fast 3d mask face presentation attack detection, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 2608–2616.
[52] K. Kotwal, S. Bhattacharjee, S. Marcel, Multispectral deep embeddings as a countermeasure to custom silicone mask presentation attacks, IEEE Transactions on Biometrics, Behavior, and Identity Science 1 (2019) 238–251.
[53] D. Li, G. Chen, X. Wu, Z. Yu, M. Tan, Face anti-spoofing with cross-stage relation enhancement and spoof material perception, Neural Networks 175 (2024) 106275.
[54] V. D. Huszár, V. K. Adhikarla, Securing phygital gameplay: Strategies for video-replay spoofing detection, IEEE Access (2024).
[55] R. Koshy, A.
Mahmood, Enhanced deep learning architectures for face liveness detection for static and video sequences, Entropy 22 (2020) 1186.
[56] S. Giurato, A. Ortis, S. Battiato, Real-time multiclass face spoofing recognition through spatiotemporal convolutional 3d features, in: International Conference on Image Analysis and Processing, Springer, 2023, pp. 356–367.
[57] A. Günay Yılmaz, U. Turhal, V. Nabiyev, Face presentation attack detection performances of facial regions with multi-block lbp features, Multimedia Tools and Applications 82 (2023) 40039–40063.
[58] H.-H. Chang, C.-H. Yeh, Face anti-spoofing detection based on multi-scale image quality assessment, Image and Vision Computing 121 (2022) 104428.
[59] A. Pinto, S. Goldenstein, A. Ferreira, T. Carvalho, H. Pedrini, A. Rocha, Leveraging shape, reflectance and albedo from shading for face presentation attack detection, IEEE Transactions on Information Forensics and Security 15 (2020) 3347–3358.
[60] M. O. Alassafi, M. S. Ibrahim, I. Naseem, R. AlGhamdi, R. Alotaibi, F. A. Kateb, H. M. Oqaibi, A. A. Alshdadi, S. A. Yusuf, Fully supervised contrastive learning in latent space for face presentation attack detection, Applied Intelligence 53 (2023) 21770–21787.
[61] A. S. Biswas, S. Dey, S. Verma, K. Verma, Deep guard: An enhanced hybrid ensemble classifier for face presentation attack detection integrating gabor and binarized statistical image features descriptors with deep learning, Computers and Electrical Engineering 127 (2025) 110566.
[62] P. J. Seegehalli, B. N. Krupa, Lightweight 3d-studentnet for defending against face replay attacks, Signal, Image and Video Processing (2024) 1–17.
[63] J. Zhang, Y. Zhang, F. Shao, X. Ma, S. Feng, Y. Wu, D. Zhou, Efficient face anti-spoofing via head-aware transformer based knowledge distillation with 5 mb model parameters, Applied Soft Computing 166 (2024) 112237.
[64] S. M. Ibrahim, M. S. Ibrahim, S. Khan, Y.-W. Ko, J.-G.
Lee, Improving face presentation attack detection through deformable convolution and transfer learning, IEEE Access (2025).
[65] S. Khan, T. H. M. Siddique, M. S. Ibrahim, A. J. Siddiqui, K. Huang, Spatio-temporal deep learning for improved face presentation attack detection, Knowledge-Based Systems (2025) 113059.
[66] M. S. Jabbar, T. H. M. Siddique, K. Huang, S. Khan, Knowledge distillation with predicted depth for robust and lightweight face presentation attack detection, Knowledge-Based Systems (2025) 114325.
[67] S. R. Arashloo, Unseen face presentation attack detection using sparse multiple kernel fisher null-space, IEEE Transactions on Circuits and Systems for Video Technology 31 (2020) 4084–4095.
[68] S. R. Arashloo, Matrix-regularized one-class multiple kernel learning for unseen face presentation attack detection, IEEE Transactions on Information Forensics and Security 16 (2021) 4635–4647.
[69] S. Fatemifar, M. Awais, A. Akbari, J. Kittler, Particle swarm and pattern search optimisation of an ensemble of face anomaly detectors, in: 2021 IEEE International Conference on Image Processing (ICIP), IEEE, 2021, pp. 3622–3626.
[70] S. Fatemifar, S. Asadi, M. Awais, A. Akbari, J. Kittler, Face spoofing detection ensemble via multistage optimisation and pruning, Pattern recognition letters 158 (2022) 1–8.
[71] N. Daniel, A. Anitha, Texture and quality analysis for face spoofing detection, Computers & Electrical Engineering 94 (2021) 107293.
[72] X. Shu, H. Tang, S. Huang, Face spoofing detection based on chromatic ed-lbp texture feature, Multimedia Systems 27 (2021) 161–176.
[73] M. Alshaikhli, O. Elharrouss, S. Al-Maadeed, A. Bouridane, Face-fake-net: The deep learning method for image face anti-spoofing detection: Paper id 45, in: 2021 9th European Workshop on Visual Information Processing (EUVIP), IEEE, 2021, pp. 1–6.
[74] Z.
Li, et al., Facepad: a critical revie w of face presentation att ack detection systems, Patter n Recognition 113 (2022) 107749. [75] J. Zhang, Q. Guo, X. W ang, R. Hao, X. Du, S. Tao, J. Liu, L. Liu, Ucdcn: a nested arc hitecture based on central diﬀerence conv olution f or face anti-spooﬁng, Complex & Intelligent Systems 10 (2024) 4817–4833. [76] Z. Ning, W . Zhang, J. Y ang, Face anti-spooﬁng based on 3d learnable conv olutional operators, in: 2024 36th Chinese Control and Decision Conf erence (CCDC), IEEE, 2024, pp. 4034–4040. [77] M. Marais, D. Brown, J. Connan, A. Boby , Facial liveness and anti-spooﬁng detection using vision transformers, in: Sout hern Africa T elecommunication Netw orks and Applications Conference (SA TN AC), volume 8, 2023, p. 2023. [78] S. F atemifar, M. A w ais, A. Akbar i, J. Kittler, Dev eloping a generic framew ork for anomaly detection, Pattern Recognition 124 (2022) 108500. [79] X. Long, J. Zhang, S. Shan, Conﬁdence a ware learning f or reliable face anti-spooﬁng, IEEE T ransactions on Information Forensics and Security (2025). [80] K. Simony an, A. Zisserman, V er y deep convolutional networks f or large-scale image recognition, arXiv preprint (2014). [81] C. Szegedy , V . V anhouck e, S. Ioﬀe, J. Shlens, Z. W ojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE conf erence on com puter vision and pattern recognition, 2016, pp. 2818–2826. [82] N. Ma, X. Zhang, H.- T . Zheng, J. Sun, Shuﬄenet v2: Practical guidelines f or eﬃcient cnn architecture design, in: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 116– 131. [83] H. Cai, C. Gan, T . W ang, Z. Zhang, S. Han, Once-for -all: Train one netw ork and specialize it for eﬃcient deployment, arXiv preprint arXiv:1908.09791 (2019). [84] M. Sandler, A. How ard, M. Zhu, A. Zhmoginov , L.-C. 
Chen, Mo- bilenetv2: Inv er ted residuals and linear bo ttlenecks, in: Proceedings of the IEEE conf erence on computer vision and pattern recognition, 2018, pp. 4510–4520. [85] S. Meht a, M. Rastegari, Separable self-attention for mobile vision transf or mers, arXiv preprint arXiv:2206.02680 (2022). [86] K. Han, Y . W ang, Q. Tian, J. Guo, C. Xu, C. Xu, Ghostnet: More f eatures from cheap operations, in: Proceedings of t he IEEE/CVF conf erence on computer vision and patter n recognition, 2020, pp. 1580–1589. [87] R. R. Selvaraju, M. Cogswell, A. Das, R. V edantam, D. Parikh, D. Batra, Grad-cam: Visual e xplanations from deep netw orks via gradient-based localization, in: Proceedings of t he IEEE inter national conf erence on computer vision, 2017, pp. 618–626. S. Khan: Preprint submitted to Elsevier P age 14 of 14
