A Neural Network-Based Real-time Casing Collar Recognition System for Downhole Instruments
Accurate downhole positioning is critical in oil and gas operations but is often compromised by signal degradation in traditional surface-based Casing Collar Locator (CCL) monitoring. To address this, we present an in-situ, real-time collar recogniti…
Authors: Si-Yu Xiao, Xin-Di Zhao, Xiang-Zhan Wang
Si-Yu Xiao, Xin-Di Zhao, Xiang-Zhan Wang, Tian-Hao Mao, Ying-Kai Liao, Xing-Yu Liao, Yu-Qiao Chen, Jun-Jie Wang, Shuang Liu, Tu-Pei Chen, Yang Liu∗

Si-Yu Xiao, Xiang-Zhan Wang, Tian-Hao Mao, Ying-Kai Liao, Xing-Yu Liao, Yu-Qiao Chen, Jun-Jie Wang, Shuang Liu and Yang Liu are with the State Key Laboratory of Thin Solid Films and Integrated Devices, University of Electronic Science and Technology of China, Chengdu 611731, China. Xin-Di Zhao is with the Southwest Branch of China National Petroleum Corporation Logging Co., Ltd., Chongqing 401100, China. Tu-Pei Chen is with Nanyang Technological University, Singapore 639798. This work is supported by NSFC under project No. 62404034 and 62404033. This work is also supported by China National Petroleum Corporation Logging Co., Ltd. (CNLC) under project No. CNLC2023-7A01. ∗ Corresponding author.

Abstract: Accurate downhole positioning is critical in oil and gas operations but is often compromised by signal degradation in traditional surface-based Casing Collar Locator (CCL) monitoring. To address this, we present an in-situ, real-time collar recognition system using an embedded neural network. We introduce lightweight "Collar Recognition Nets" (CRNs) optimized for resource-constrained ARM Cortex-M7 microprocessors. By leveraging temporal and depthwise separable convolutions, our most compact model reduces computational complexity to just 8,208 MACs while maintaining an F1 score of 0.972. Hardware validation confirms an average inference latency of 343.2 µs, demonstrating that robust, autonomous signal processing is feasible within the severe power and space limitations of downhole instrumentation.

Index Terms: Casing collar locator, Deep Learning, Downhole Instrument, Edge Computing System, Pattern Recognition, Signal Processing

I. INTRODUCTION

In the exploration and production of oil and gas resources, the accurate positioning of downhole instruments remains a challenging yet critical task, as it directly influences reservoir contact, production efficiency, and operational safety [1], [2]. The detection of casing collars, which serve as depth markers along the steel casing string, by casing collar locators (CCLs) is the predominant method for estimating the depth of downhole instruments due to its cost-effectiveness, efficiency, and high reliability [3]–[5], as illustrated in Fig. 1(a). A CCL is a magnetic sensor typically integrated into the downhole toolstring, comprising a coil positioned between two magnets.
As the CCL traverses a casing collar, the magnetic flux lines concentrate within the increased metal mass of the casing collar, and this variation in the magnetic field induces a voltage pulse in the coil [3], [4], [6], referred to as a "collar signal" or "collar (magnetic) signature". This characteristic magnetic response typically exhibits a bimodal waveform, as illustrated by the dark blue traces in Fig. 1(a). By correlating collar signatures with the casing tally (reference depths of the casing collars) from well completion data, the precise position of the downhole instrument can be determined [5]. While the use of a length measuring wheel (LMW) to measure wireline length is a simple and common alternative, it is unreliable in practice owing to the inherent elasticity of long metallic wirelines, and it provides only a rough estimate of the downhole instrument position.

In current operations, the recognition of collar signatures from CCL signals is typically performed at the surface, relying on data transmitted via wirelines exceeding 1000 m in length. However, this approach faces significant practical challenges:
(1) The CCL signal is often contaminated by other magnetic sources, such as the casing wall and the metallic downhole toolstring.
Certain interference waveforms closely resemble collar signatures, making it difficult to distinguish actual collar signatures (blue waveforms in Fig. 1(a)) from interference (green waveforms) [3], [9].
(2) Long wirelines introduce significant signal attenuation and noise, hindering the reliable recognition of collar signatures [5], [10].
(3) Manual identification of collar signatures at the surface is not only inefficient and error-prone, but the reliance on manpower and surface equipment also incurs substantial costs [6]–[8]. In specific operations, such as pump-down perforation (PDP) and plug-and-perf (P&P) [2], collar recognition must be performed in real time, further exacerbating the difficulty.
(4) Due to the restricted diameter of the wellbore, the available space and power supply within the downhole instrument are limited, precluding the integration of conventional high-performance, power-intensive computing hardware.

For these reasons, the development of automatic, in-situ, and real-time CCL signal processing using lightweight algorithms embedded within downhole instruments is highly desirable.

Traditional methods include thresholding techniques [7], [11], [12], time-domain and frequency-domain analysis [6], [13], [14], and physical plausibility analysis [3], [7]. However, these approaches exhibit limited generalizability [13], [15], making it challenging to recognize collar signatures in the presence of complex and variable interference.

With the rapid advancement and increasing accessibility of deep learning, deep neural networks (DNNs) have emerged as a promising solution to challenges in the oil and gas industry. Applications include data interpretation [16], [17], imaging [18], and operational planning [19].
Encouraging results have also been achieved in collar signature recognition using architectures such as convolutional neural networks (CNNs) [20]–[22], recurrent neural networks (RNNs) [20], [21], and residual neural networks (ResNets) [23]. However, the high computational demand of neural networks often exceeds the processing performance of downhole edge devices. Consequently, the model architecture must be optimized to ensure compatibility with these hardware constraints.

Fig. 1: (a) Schematic cross-section of a typical oil and gas well structure. Representative casing collar signatures derived from magnetic response are illustrated in dark blue near the corresponding casing collars, while typical interference signals are illustrated in dark green. Adapted from [7], [8]; (b) Photograph of a perforating gun, exemplifying a typical downhole instrument assembly used in oil and gas wells; (c) Schematic diagram of the internal structure of a downhole instrument; the battery is omitted for clarity; (d) Block diagram of the collar recognition system within the control capsule; (e) Process flow diagram for casing collar recognition using the neural network and casing tally; (f) Deployment workflow for neural network models, illustrating the transition from a PyTorch model to an executable program.

Over the past decade, neural network theory has evolved significantly. Causal and temporal convolutions provide a theoretical basis for applying convolution operators to signal processing [24]–[26]. Architectures such as MobileNets [27]–[29] and NIN [30] have introduced techniques such as consecutive small convolutional kernels [31], depthwise separable and pointwise convolutions [32], and global average pooling [33] to enhance computational efficiency. Furthermore, batch normalization [34] and dropout [35]–[39] have been established as standard regularization techniques to mitigate overfitting and enhance generalization.
Additionally, the attention mechanism represents a major breakthrough in deep learning [40]–[42].

In this work, we propose three lightweight neural network architectures specifically designed for real-time casing collar recognition. These models are deployed within a custom downhole instrument powered by an ARM Cortex-M7 system. The performance of both the proposed models and the integrated collar recognition system is validated using field data.

II. METHODOLOGY

We developed a system integrated into the downhole instrument to perform collar recognition in situ and in real time, as illustrated in Fig. 1(b)–(d). This architecture represents an advancement of the embedded system originally proposed in [7]. The core of the collar recognition system is a microprocessor unit (MPU), which offers a compact size and tolerance to high downhole temperatures, yet possesses limited computational resources compared to the high-performance hardware typically restricted to surface equipment. The system samples and pre-processes the CCL signal, identifies collar signatures using a neural network, and subsequently estimates the depth and motion state of the downhole tool through a post-processing algorithm [8]. The entire workflow is executed in real time within the downhole instrument.

A. Signal Processing

The raw analog CCL signal is acquired by the analog front-end (AFE) module and digitized by the analog-to-digital converter (ADC), as illustrated in Fig. 2. A signal conditioning circuit adjusts the signal amplitude to match the dynamic range of the ADC. Given the low signal frequency band (10 Hz to 100 Hz), this simplified hardware architecture is sufficient. The ADC samples the raw CCL signal at a rate of 1 kHz with 16-bit resolution, and the resulting digital stream is transmitted to the MPU for subsequent processing.

The integer values of the digitized CCL signal are first logged to data storage.
Subsequently, these values are normalized to floating-point format using an empirical formula, resulting in a distribution assumed to be approximately standard normal (mean of 0, standard deviation of 1). A sliding window buffer retains the most recent 160 samples. These steps constitute the pre-processing stage, after which the buffered sequence is input into the neural network.

The neural network's outputs are passed through a sigmoid function to derive the probability of a collar signature, generating a continuous probability map. A collar signature is identified when the probability exceeds a predefined threshold for a specific duration, at which point its temporal centroid is calculated. The corresponding depth is determined by correlating these detection events with the casing tally from well completion data. The motion state of the downhole tool is then derived from the timestamps and depth correlations of the identified collars. This procedure defines the post-processing stage, and both the probability map and recognition results are logged to storage.

The recognition results and motion state data are used by control algorithms to generate control signals. These signals are transmitted to the downhole tool driving modules to regulate the operation of the downhole tool.

B. Network Structure and Training

Due to the limited computational resources of MPUs, the depth, feature size, and parameter count of the proposed model must be strictly controlled. We apply design principles inspired by the Temporal Convolutional Network (TCN) [24] within a MobileNet framework to maintain efficiency. Attention mechanisms are omitted to avoid excessive computational costs, especially considering the limited availability of field CCL signal data [8], [40], [42]. The architectures of the proposed models, designated as Collar Recognition Nets (CRNs), are illustrated in Fig. 3.
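The post-processing rule of Section II-A (sigmoid, threshold held for a minimum duration, temporal centroid) can be sketched as follows. This is a minimal offline illustration, not the firmware implementation: the function name `detect_collars`, the threshold of 0.5, the minimum duration of 20 samples, and the probability-weighted form of the centroid are all assumptions made for the example; only the 1 kHz sampling rate comes from the text.

```python
import math


def sigmoid(x):
    """Map a raw network logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect_collars(logits, threshold=0.5, min_duration=20, sample_rate_hz=1000.0):
    """Return centroid times (seconds) of collar detections.

    A detection is a run of consecutive samples whose probability stays
    above `threshold` for at least `min_duration` samples. The threshold
    and duration values here are hypothetical placeholders.
    """
    probs = [sigmoid(z) for z in logits]
    events = []
    start = None
    # A trailing 0.0 sentinel closes a run that reaches the end of the record.
    for i, p in enumerate(probs + [0.0]):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            if i - start >= min_duration:
                segment = probs[start:i]
                weight = sum(segment)
                # Probability-weighted temporal centroid (an assumed definition).
                centroid = sum((start + k) * w for k, w in enumerate(segment)) / weight
                events.append(centroid / sample_rate_hz)
            start = None
    return events


# One sustained response (41 samples) and one short burst (5 samples).
logits = [-6.0] * 100 + [6.0] * 41 + [-6.0] * 100 + [6.0] * 5 + [-6.0] * 50
print(detect_collars(logits))
```

In this example the sustained run is reported once, at its centre, while the 5-sample burst is rejected by the duration check, which is the behaviour the duration criterion is meant to provide against short interference spikes.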
The model input is a sequence of the most recent 160 sample points from the pre-processed CCL signal. The backbone employs standard blocks (convolution, batch normalization (BN) [34], spatial dropout [38], and activation), while the head consists of fully connected (FC), activation, and dropout layers. The output logit, once passed through the sigmoid, gives the probability of a collar signature.

Although CRN-1 provides a functional baseline with sufficient parameter capacity to reach maximum performance, it is not optimized for resource-constrained deployment. To improve efficiency, CRN-2 uses depthwise separable convolutions, which cut the convolutional computational cost to approximately 40% of the baseline [27]. Building on this, CRN-3 introduces an initial pooling layer to further reduce the computational load with minimal impact on accuracy, as illustrated in Fig. 3 and Table I.

The training dataset consists of field CCL waveforms, where each sample contains a manually labeled collar signature.

Fig. 2: Schematic diagram of the AFE module within the collar recognition system. (Labels recoverable from the figure: VCC, ½ VCC, GND, ADC, analog CCL signal, digital CCL signal.)
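As a cross-check on the capacity figures quoted for the three CRNs, the layer shapes shown in Fig. 3 can be tallied in a few lines of Python. The counting conventions below are assumptions inferred for this sketch rather than stated by the authors: convolutions are taken as bias-free and followed by a batch-normalization layer with two learnable values per channel, FC layers carry biases, and pooling, activation, and normalization operations are not counted as MACs. The helper names `conv1d`, `fc`, and `total` are hypothetical.

```python
def conv1d(length, c_in, c_out, k, depthwise=False):
    """(params, MACs) for a bias-free 1-D convolution plus batch norm.

    `length` is the output sequence length (stride 1, 'same' padding).
    A depthwise convolution uses one k-tap filter per channel.
    """
    weights = k * c_out if depthwise else k * c_in * c_out
    bn_params = 2 * c_out  # learnable scale and shift per channel
    return weights + bn_params, weights * length


def fc(n_in, n_out):
    """(params, MACs) for a fully connected layer with bias."""
    return n_in * n_out + n_out, n_in * n_out


def total(layers):
    """Sum (params, MACs) pairs over a list of layers."""
    return tuple(sum(column) for column in zip(*layers))


CRN1 = [conv1d(160, 1, 16, 3), conv1d(16, 16, 32, 3), conv1d(8, 32, 16, 3),
        fc(64, 16), fc(16, 1)]

CRN2 = [conv1d(160, 1, 16, 3),
        conv1d(16, 16, 16, 3, depthwise=True), conv1d(16, 16, 32, 1),
        conv1d(8, 32, 32, 3, depthwise=True), conv1d(8, 32, 16, 1),
        fc(64, 16), fc(16, 1)]

# CRN-3 average-pools the 160-sample input by 10 before the first convolution.
CRN3 = [conv1d(16, 1, 16, 3),
        conv1d(8, 16, 16, 3, depthwise=True), conv1d(8, 16, 32, 1),
        conv1d(4, 32, 32, 3, depthwise=True), conv1d(4, 32, 16, 1),
        fc(32, 16), fc(16, 1)]

for name, net in (("CRN-1", CRN1), ("CRN-2", CRN2), ("CRN-3", CRN3)):
    params, macs = total(net)
    print(f"{name}: {params} parameters, {macs} MACs")
```

Under these conventions the tallies come out to 4305/45584, 2497/22544, and 1985/8208 (parameters/MACs), matching the values reported in Table I. The two separable stages of CRN-2 cost roughly 36% and 40% of the standard convolutions they replace in CRN-1, in line with the approximately 40% figure quoted above.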
Fig. 3: Network architectures of the Collar Recognition Nets (CRNs) proposed in this work. For clarity, the batch normalization and dropout layers following each convolutional layer or fully connected layer are omitted. Layer sequences as recoverable from the figure labels (feature sizes are length × channels):
CRN-1: input (160×1) → Conv k3 <3×1×16> s1, p1 → ReLU (160×16) → MaxPool k10 (16×16) → Conv k3 <3×16×32> s1, p1 → ReLU (16×32) → MaxPool k2 (8×32) → Conv k3 <3×32×16> s1, p1 → ReLU (8×16) → MaxPool k2 (4×16) → Flatten (64) → FC (16) → ReLU → FC → 1 logit output
CRN-2: input (160×1) → Conv k3 <3×1×16> s1, p1 → ReLU (160×16) → MaxPool k10 (16×16) → DwConv k3 <3×16> s1, p1 → ReLU (16×16) → PwConv k1 <1×16×32> s1 → ReLU (16×32) → MaxPool k2 (8×32) → DwConv k3 <3×32> s1, p1 → ReLU (8×32) → PwConv k1 <1×32×16> s1 → ReLU (8×16) → MaxPool k2 (4×16) → Flatten (64) → FC (16) → ReLU → FC → 1 logit output
CRN-3: input (160×1) → AvgPool k10 (16×1) → Conv k3 <3×1×16> s1, p1 → ReLU (16×16) → MaxPool k2 (8×16) → DwConv k3 <3×16> s1, p1 → ReLU (8×16) → PwConv k1 <1×16×32> s1 → ReLU (8×32) → MaxPool k2 (4×32) → DwConv k3 <3×32> s1, p1 → ReLU (4×32) → PwConv k1 <1×32×16> s1 → ReLU (4×16) → MaxPool k2 (2×16) → Flatten (32) → FC (16) → ReLU → FC → 1 logit output
Legend: the label "Conv k5 <5×96×256> s1, p2" denotes operator = conv1d, input channels = 96, output channels = 256, kernel size = 5, stride = 1, padding = 2. Omission behavior follows PyTorch.

To improve training efficiency and data diversity, we apply a suite of techniques: standardization, label distribution smoothing (LDS), random cropping, label smoothing regularization (LSR), time scaling, and multiple sampling [8]. Notably, collar labels are converted into continuous probability maps, as illustrated in Fig. 4(a), to provide more informative training targets.
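To make the idea of a continuous training target concrete, the sketch below builds a soft probability map from annotated collar positions and scores a logit against it with binary cross-entropy. The Gaussian bump shape, the width `sigma`, and the function names are assumptions chosen for illustration; the actual label construction follows [8] and is not specified here. The loss uses the standard numerically stable form of cross-entropy on logits.

```python
import math


def soft_label_map(length, collar_centers, sigma=8.0):
    """Continuous target in [0, 1]: one Gaussian bump per annotated centre.

    The Gaussian shape and `sigma` (in samples) are hypothetical choices;
    overlapping bumps are combined by taking the maximum.
    """
    return [
        max((math.exp(-0.5 * ((t - c) / sigma) ** 2) for c in collar_centers),
            default=0.0)
        for t in range(length)
    ]


def bce_with_logits(logit, target):
    """Binary cross-entropy between sigmoid(logit) and a soft target.

    Uses max(x, 0) - x*z + log(1 + exp(-|x|)), which avoids overflow
    for large positive or negative logits.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


targets = soft_label_map(100, collar_centers=[50])
# A confident logit is penalized lightly near the centre and heavily far away.
print(bce_with_logits(4.0, targets[50]), bce_with_logits(4.0, targets[0]))
```

A soft target of this kind tells the network not only whether a collar is present in the window but also how close the window is to the collar centre, which is one reading of why such maps are described as more informative than binary labels.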
Model training is conducted in the PyTorch framework, employing the binary cross-entropy (BCE) loss and the Adam optimizer.

C. Deployment of Networks

The system is powered by an ARM Cortex-M7 MPU, which features a double-precision floating point unit (FPU), single instruction multiple data (SIMD) capabilities, and an L1 cache. Experimental memory-bound benchmarks indicate that the processor achieves a throughput of approximately 70 MFLOPS at a clock frequency of 550 MHz under optimal conditions. The parameter counts and computational costs of the evaluated models are summarized in Table I. Specifically, the CRN-3 model comprises 1,985 parameters and requires 8,208 MACs per inference. The deployment of the CRN-3 model on this MPU is computationally feasible, supporting continuous 1 kHz inference to match the sensor's sampling rate.

TensorFlow Lite for Microcontrollers (TFLM) serves as the runtime environment for executing neural network models on microcontroller devices. TFLM provides optimized implementations for the Cortex-M architecture, leveraging the hardware-accelerated FPU and SIMD instructions. The CRN-3 model is converted into a deployable format using TFLM tools and integrated into the recognition firmware, as illustrated in Fig. 1(e)–(f).

TABLE I: Comparison of Network Capacities and Performance with Existing Works

| Method | Params | MACs | Acc | P | R | F1 | Remarks | Inference Latency | In-situ, Real-time |
|---|---|---|---|---|---|---|---|---|---|
| CRN-1 | 4305 | 45584 | 98.4% | 100.0% | 98.4% | 99.2% | CNN + Data Augmentation | 1465.0 µs | |
| CRN-2 | 2497 | 22544 | 97.7% | 100.0% | 97.7% | 98.8% | CNN + Data Augmentation + Depthwise Separable Convolution | 1203.7 µs | |
| CRN-3 | 1985 | 8208 | 94.6% | 100.0% | 94.6% | 97.2% | CNN + Data Augmentation + Depthwise Separable Convolution + Input Pooling | 376.5 µs | ✓ |
| [8] TAN | 21744160 | 31112608 | 97.7% | 97.7% | 100.0% | 98.9% | Thin AlexNet + Data Augmentation | – | |
| [7] DTPP | – | – | 97.3% | 98.8% | 98.5% | 98.6% | Dynamic Threshold + Physical Plausibility | – | ✓ |
| [8] MAN | 1383584 | 3824640 | 96.2% | 97.7% | 98.4% | 98.1% | Miniaturized AlexNet + Data Augmentation | – | |
| [21] CNN-LSTM | 65250 | 16086144 | 97.8% | 95.9% | 99.1% | 97.5% | CNN + LSTM | – | |
| [21] CNN | – | – | 97.4% | 100.0% | 94.2% | 97.0% | CNN | – | |
| [21] LSTM | – | – | 94.8% | 100.0% | 88.4% | 93.9% | LSTM | – | |
| [22] 1D-CNN | 1814445 | 27451200 | – | – | – | – | CNN | – | |

Params and MACs describe network capacity; Acc, P, R, and F1 describe performance. Only models with reproducible structural descriptions are listed. "–" indicates "Not reported" or "Not applicable"; "P" and "R" denote precision and recall, respectively.

Fig. 4: (a) Ideal probability map derived from manual collar annotations; (b–e) Comparisons of probability maps and recognition performance between MAN [8] and CRN-1 through CRN-3 models; probability maps deviate more noticeably from the ideal map as network capacity decreases; (f) An example of full-length recognition results using CRN-3, demonstrating that the majority of collar signatures are correctly recognized.

III. MEASUREMENT, VALIDATION, AND DISCUSSION

The proposed system is validated by evaluating both the neural network model and its MPU-based deployment.

The model validation process involves performing inference on full-length field CCL waveforms to recognize collar signatures. The recognition results are compared with manually annotated ground-truth collar labels to obtain model performance, including precision, recall, and F1 scores. Specifically, recognized collar signatures within the temporal neighborhood of annotated collar labels are classified as true positives, whereas detections outside these regions are classified as false positives; missed collar signatures are counted as false negatives. The entire validation process is conducted offline on a workstation; an example is shown in Fig. 4(f).

The resulting performance of each model is detailed in Table I. Results show that CRNs significantly reduce parameters and computational costs with only marginal performance loss.
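The scoring rule described above (detections inside the temporal neighborhood of an annotation count as true positives, detections elsewhere as false positives, unmatched annotations as false negatives) can be sketched as follows. The greedy one-to-one matching, the symmetric `tolerance` window, and the function names are assumptions for this illustration; the text does not specify how the neighborhood is defined or how duplicate detections are handled.

```python
def match_events(detected, annotated, tolerance):
    """Count (TP, FP, FN) by greedy one-to-one matching in time.

    A detection matches the earliest still-unmatched annotation within
    +/- `tolerance`; each annotation can be matched at most once, so a
    duplicate detection of the same collar counts as a false positive.
    """
    unmatched = sorted(annotated)
    tp = fp = 0
    for d in sorted(detected):
        hit = next((a for a in unmatched if abs(a - d) <= tolerance), None)
        if hit is None:
            fp += 1
        else:
            unmatched.remove(hit)
            tp += 1
    return tp, fp, len(unmatched)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from event counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Times in seconds; the values and the 0.2 s tolerance are made up.
counts = match_events(detected=[1.02, 5.0, 9.5], annotated=[1.0, 5.1, 7.0],
                      tolerance=0.2)
print(counts, precision_recall_f1(*counts))
```

As a consistency check on Table I, a precision of 100.0% combined with a recall of 94.6% gives F1 = 2 × 0.946 / 1.946 ≈ 0.972, the value reported for CRN-3.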
However, as network capacity decreases, probability maps deviate more noticeably from the ideal map, as illustrated in Fig. 4(b)–(e). This trend highlights a trade-off between hardware efficiency and recognition precision.

As shown in Table I, the proposed CRNs operate with only thousands of MACs, yet match or surpass the performance of existing models that demand millions of MACs. Specifically, CRN-1 achieves an F1 score of 0.992, the highest among all evaluated models. Meanwhile, CRN-3 achieves an F1 score of 0.972, with a computational cost ranging from only 264 ppm to 2140 ppm of the costs reported in related works [8], [21], [22]. Consequently, CRNs demonstrate distinct advantages in performance, parameter efficiency, and computational economy relative to models from existing literature.

The hardware implementation was validated by executing the CRN-3 model on the MPU using stored CCL waveforms. The end-to-end execution time, from sample acquisition to final output, was recorded and compared against workstation-based simulations, as listed in Table I. The results indicate that the CRN-3 model inference requires an average of 343.2 µs per 1 ms sampling interval. This duration is consistent with the computational capacity of the MPU and the computational complexity of the model, confirming the capability of the proposed solution to robustly recognize collar signatures in real time. The on-board probability maps and collar recognition results match the offline simulations within the margin of floating-point precision errors, thereby verifying the correctness of the embedded implementation.

IV. CONCLUSION

This paper proposes a casing collar recognition system for downhole instruments powered by an ARM Cortex-M7 MPU. A lightweight neural-network-based algorithm enables the in-situ, real-time identification of casing collar signatures.
The most compact model achieves a computational cost of 8,208 MACs, while maintaining an F1 score of 0.972. On-board validation confirms the system's capability to correctly recognize collar signatures under real-time constraints. Consequently, this work validates the feasibility of deploying neural networks for in-situ, real-time casing collar recognition within downhole instruments and provides a foundation for future research and development.

REFERENCES

[1] M. Harris, "The effect of perforating oil well productivity," Journal of Petroleum Technology, vol. 18, no. 04, pp. 518–528, Apr. 1966.
[2] D. Lu, Oil & Gas Field Perforating Technology. Beijing, China: Petroleum Industry Press, 2012.
[3] J. O. Alvarez, E. Buzi, R. W. Adams, and M. Deffenbaugh, "Theory, design, realization, and field results of an inductive casing collar locator," IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 4, pp. 760–766, 2018.
[4] A. O. Gidado, C. Ekesiobi, H. Kpone-Tonwe, and J. Adesun, "Well diagnostic of new underperforming wells using downhole log tool [SNT & MDT]," in SPE Nigeria Annual International Conference and Exhibition. Society of Petroleum Engineers, 2023.
[5] R. Mijarez, D. Pascacio, R. Guevara, C. Tello, O. Pacheco, and J. Rodríguez, "HPHT cased-hole CCL tool enhancement via DSP techniques for accurate depth control in wire-line well interventions," IMAPSource Proceedings, pp. 305–310, Jan. 1, 2014.
[6] H. Li, T. Tang, and Y. Wang, "Casing state detection methods based on the ccl signal of the tractor for horizontal wells," in 2013 IEEE 11th International Conference on Electronic Measurement & Instruments. IEEE, 2013, pp. 568–573.
[7] S.-Y. Xiao, G.-H. Ren, T.-H. Mao, Y.-Q. Chen, Y.-A. Liu, J.-J. Wang, K. Tang, X.-D. Zhao, Z.-J. Yu, S. Liu, T.-P. Chen, and L. Yang, "Realization of precise perforating using dynamic threshold and physical plausibility algorithm for self-locating perforating in oil and gas wells," arXiv preprint arXiv:2509.00608, 2025.
[8] S.-Y. Xiao, X.-D. Zhao, T.-H. Mao, Y.-W. Wang, Y.-Q. Chen, H.-Y. Zhang, J. Wang, J.-J. Wang, S. Liu, T.-P. Chen, and Y. Liu, "Data-augmented deep learning for downhole depth sensing and field validation," arXiv preprint arXiv:2511.00129, 2025.
[9] H. Wang and W. Tang, "Application of computer automatic recognition technology in perforating depth control," Well Logging Technology, vol. 30, no. 4, pp. 378–380, 2006.
[10] J. Brown, "The effects of cable on signal quality," Sound and Video Contractor, pp. 22–33, 1990.
[11] H. Wang, H. Lu, J. Pan, G. Li, and X. Gao, "Collar depth identification method based on relative amplitude method," Journal of Harbin University of Commerce (Natural Sciences Edition), vol. 28, no. 4, pp. 435–438, 2012.
[12] Y. Cong, "Perforating depth control method based on automatic tracking and recognition technology of casing collar," Petrochemical Industry Automation, vol. 58, no. 5, pp. 29–33, 2022.
[13] J. Li, Y. Liu, J. Zhang, J. Wang, and Y. Zhang, "Application of cross-correlation function method in locating perforation depth," Journal of Southwest Petroleum University (Science & Technology Edition), vol. 42, no. 6, pp. 42–48, 2020.
[14] H. Li, J. Chen, Y. Xiao, X. Liu, and J. Wu, "Research on feature extraction of tractor magnetic positioning information based on frequency domain anti-aliasing wavelet time entropy," High Technology Letters, vol. 20, no. 5, pp. 538–543, 2010.
[15] Y.-P. Yang, G.-H. Luan, L.-F. Zhang, M.-Y. Niu, G.-G. Zou, X.-L. Zhang, J.-Y. Wang, J.-F. Yang, and M.-S. Li, "Leak identification and positioning strategies for downhole tubing in gas wells," Processes, vol. 13, no. 6, p. 1708, 2025.
[16] K. Noh, D. Pardo, and C. Torres-Verdin, "Deep-learning inversion method for the interpretation of noisy logging-while-drilling resistivity measurements," arXiv preprint, 2021.
[17] S. Brazell, A. Bayeh, M. Ashby, and D. Burton, "A machine-learning-based approach to assistive well-log correlation," Petrophysics, vol. 60, no. 4, pp. 469–479, 2019.
[18] E. M. Viggen, S. Grønsberg, S. Brekke, B. Hicks, and S. V. Wifstad, "Improving pipe perforation estimates from ultrasonic imaging using subpixel machine learning trained on optical data," Geoenergy Science and Engineering, vol. 246, p. 213541, 2025.
[19] A. Elhadidy, A. Helmy, M. Heikal, and W. Hany, "Optimizing well perforation with machine learning: A breakthrough in predictive modeling," in SPE Gas & Oil Technology Showcase and Conference. Society of Petroleum Engineers, 2025.
[20] S. K. Raman and M. Abuhaikal, "Data driven casing collar feature detection and identification for automated depth estimation for wireline," in Fourth EAGE Digitalization Conference & Exhibition. European Association of Geoscientists & Engineers, 2024, pp. 1–5.
[21] J. Jing, Y. Qin, X. Zhu, H. Shan, and P. Peng, "Identification and prediction of casing collar signal based on CNN-LSTM," Arabian Journal for Science and Engineering, vol. 50, no. 7, pp. 4897–4911, 2025.
[22] V. A. Torres Caceres, K. Duffaut, A. Yazidi, F. Westad, and Y. B. Johansen, "Automated well log depth matching: Late fusion multimodal deep learning," Geophysical Prospecting, vol. 72, no. 1, pp. 155–182, 2024.
[23] Z. Yan, Y. Chen, S. Zou, and J. Li, "Automatic recognition method of collar based on Faster-RCNN network," Industrial Control Computer, vol. 37, no. 3, pp. 57–58, 2024.
[24] S. Bai, J. Z. Kolter, and V. Koltun, "An empirical evaluation of generic convolutional and recurrent networks for sequence modeling," arXiv preprint arXiv:1803.01271, 2018.
[25] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "Wavenet: A generative model for raw audio," arXiv preprint, 2016.
[26] A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, and A. Graves, "Conditional image generation with pixelcnn decoders," in Advances in Neural Information Processing Systems (NeurIPS), 2016.
[27] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[28] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510–4520.
[29] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le, and H. Adam, "Searching for MobileNetV3," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 1314–1324.
[30] M. Lin, Q. Chen, and S. Yan, "Network in network," in International Conference on Learning Representations (ICLR), 2014.
[31] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint, 2015.
[32] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1251–1258.
[33] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
[34] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proceedings of the 32nd International Conference on Machine Learning (ICML). PMLR, 2015, pp. 448–456.
[35] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[36] N. Srivastava, "Improving neural networks with dropout," Master's thesis, University of Toronto, 2013.
[37] X. Bouthillier, K. Konda, P. Vincent, and R. Memisevic, "Dropout as data augmentation," arXiv preprint, 2015.
[38] J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, "Efficient object localization using convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 648–656.
[39] G. Ghiasi, T.-Y. Lin, and Q. V. Le, "Dropblock: A regularization method for convolutional networks," in Advances in Neural Information Processing Systems (NeurIPS), 2018.
[40] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[42] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," in International Conference on Learning Representations (ICLR), 2021.