Beam-Coherence-Aware Two-Stage Digital Combining for mmWave MU-MIMO Systems

This paper considers a wideband millimeter-wave MIMO system with fully digital transceivers at both the base station and the user equipment (UE), focusing on mobile scenarios. To reduce the baseband processing burden at the UE, we propose a two-stage…

Authors: Yasaman Khorsandmanesh, Emil Björnson

Yasaman Khorsandmanesh, Student Member, IEEE, Emil Björnson, Fellow, IEEE, Joakim Jaldén, Senior Member, IEEE, and Bengt Lindoff, Senior Member, IEEE

Abstract: This paper considers a wideband millimeter-wave MIMO system with fully digital transceivers at both the base station and the user equipment (UE), focusing on mobile scenarios. To reduce the baseband processing burden at the UE, we propose a two-stage digital combining architecture, where the received signals are compressed from $K$ antennas to dimension $N_c$ before baseband processing. The first-stage combining matrix exploits channel geometry and is updated on the beam-coherence timescale, which is longer than the channel coherence time, while the second stage is updated per channel coherence time. We develop a pilot-based channel estimation framework tailored to the proposed two-stage digital combining architecture, leveraging maximum likelihood estimation. Furthermore, we propose a time-domain method that exploits the finite delay spread to reconstruct the full channel from a reduced number of pilot subcarriers. Precoding and combining schemes are designed accordingly, and spectral efficiency expressions with imperfect channel state information are derived. Numerical results show that the proposed time-domain approach outperforms hybrid beamforming while reducing pilot overhead. We further demonstrate that the framework extends to multi-user MIMO and retains its performance advantages. These results highlight the potential of two-stage fully digital transceivers for future wideband systems.

Index Terms: mmWave MIMO, Two-Stage Digital Combining, Time-Domain Channel Estimation, Multi-user MIMO, Maximum Likelihood Channel Estimation.

I. INTRODUCTION

Millimeter-wave (mmWave) technology holds great potential for wireless networks due to its significantly larger bandwidth compared to sub-6 GHz systems [2]. However, mmWave systems have faced deployment challenges in 5G. Key limitations include poor penetration through obstacles like walls, which limits the coverage in urban environments, and environmental factors like rain and fog that heavily attenuate mmWave signals, complicating service reliability [3]. Furthermore, effective beamforming for mmWave currently requires complex, power-intensive hardware, leading to rapid battery drain at the user equipment (UE) [4]. These limitations suggest that widespread mmWave deployment will have to wait until significant advancements in hardware design, power management, and signal processing have occurred.

Y. Khorsandmanesh, E. Björnson, and J. Jaldén are with the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden (E-mails: {yasamank, emilbjo, jalden}@kth.se). B. Lindoff is with BeammWave AB (E-mail: bengt@beammwave.com). This work was supported by the Smarter Electronics System program by Vinnova and Vinnova through the SweWIN center (2023-00572). This work was presented in part at the IEEE 36th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2025), 2025, Istanbul, Türkiye, which appears in this manuscript as reference [1].

To overcome the severe path loss at these frequencies, systems employ large antenna arrays and multiple-input multiple-output (MIMO) techniques to achieve high beamforming gains [5]. While hybrid beamforming (HBF) has been widely studied as a practical solution to reduce hardware complexity, fully digital beamforming (DBF) is increasingly attractive due to its flexibility and ability to fully exploit the spatial degrees of freedom offered by MIMO technology [6].
The main challenge with DBF is the high hardware and processing complexity. In particular, a UE with many antennas must process high-dimensional signals across many subcarriers, leading to substantial power consumption and data movement, especially in wideband systems [7], [8].

Channel estimation is another key challenge in mmWave systems [9]. Conventional frequency-domain approaches estimate each subcarrier independently, leading to pilot overhead and computational complexity proportional to the number of subcarriers. However, OFDM channels exhibit inherent structure and can often be represented by a limited number of significant time-domain taps [10]. By exploiting this property, more efficient estimators can be developed that jointly process information across subcarriers, thereby reducing both pilot overhead and computational complexity while improving estimation accuracy.

In this paper, we propose a fully digital wideband mmWave point-to-point MIMO architecture that addresses these challenges. The UE employs a two-stage digital combining structure, where the downlink received signal is first compressed from $K$ UE receive antennas to $N_c$ dimensions, with $N_s \leq N_c \leq K$, where $N_s$ denotes the number of data streams. The first-stage combining is updated on the beam-coherence timescale, while the second stage adapts to small-scale fading per coherence block. The beam coherence time is defined as the duration over which beams remain aligned [11]. We also propose a time-domain channel estimation method that reduces pilot overhead by estimating the channel taps directly and reconstructing the full frequency-domain channel via discrete Fourier transforms.

While the framework is developed for single-user point-to-point MIMO (SU-MIMO) to clearly expose the design, it naturally extends to multi-user MIMO (MU-MIMO).
In the multi-user case, inter-user interference is handled via linear precoding, and we adopt minimum mean-squared error (MMSE)-based designs while retaining the proposed two-stage combining structure at the UE.

A. Related Works

HBF has been extensively studied to reduce RF-chain requirements in mmWave systems [12], but its analog constraints limit performance in wideband and dynamic scenarios. Fully digital architectures enable per-subcarrier processing and fine-grained spatial control, but introduce significant computational and hardware challenges [13]. Dimension-reduction and subspace-based methods have been proposed to mitigate these issues [14], although they typically assume frequent updates or full-dimensional channel state information (CSI).

Fully digital architectures have recently gained attention due to advances in hardware design [13]. These architectures enable per-subcarrier beamforming and fine-grained spatial processing, which are particularly beneficial in wideband systems. However, the associated computational burden and data movement remain major challenges, especially at the UE side. DBF and HBF require a similar number of RF components [15, Ch. 7]; thus, the basic premise of HBF is that phase shifters have lower power consumption and cost than ADCs [16]. However, if DBF is implemented with a lower ADC resolution, it can provide better energy efficiency [17]. Several works have proposed dimension-reduction techniques or subspace-based processing to alleviate this issue, but they often assume frequent updates of the combining matrices or full-dimensional channel state information [8], [14].

Channel estimation in wideband MIMO systems is commonly performed in the frequency domain [15], which leads to high pilot overhead. Time-domain and structured estimation approaches exploit the finite delay spread to reduce the number of unknowns, but their integration with practical UE architectures is less explored.
In [18], a sparse time-domain channel estimation method for hybrid mmWave systems is proposed that reduces training overhead by exploiting channel sparsity, whereas our approach leverages the finite delay spread to reconstruct the full channel with a simple DFT-based method and reduced pilot overhead without requiring sparsity assumptions. In [19], a low-complexity two-step time-domain channel estimation is proposed for hybrid mmWave systems that exploits angular and delay sparsity.

In MU-MIMO, linear precoding schemes such as minimum mean-square error (MMSE) precoding are commonly used to manage inter-user interference [7]. These methods require accurate channel state information and are typically studied under fully digital architectures without considering UE-side dimensionality reduction.

B. Contributions

This paper proposes a beam-coherence-aware two-stage digital combining architecture for wideband mmWave MIMO systems. The key idea is to reduce the received signal dimension before baseband processing while maintaining the flexibility of fully digital beamforming. The main contributions are:

• We propose a beam-coherence-aware two-stage digital combining architecture for mmWave MIMO UEs, where the first-stage combining matrix is updated once per beam coherence time $T_B$, while the second-stage combiner is updated once per channel coherence time $T_C$.
• We develop a pilot-based channel estimation framework for this architecture in both uplink and downlink. In addition to a conventional frequency-domain maximum likelihood (ML) estimator, we propose a time-domain-based estimation method that exploits the finite delay spread of the OFDM channel. This reduces the required pilot overhead from $S$ pilot subcarriers to $L$ pilot subcarriers, where $L \ll S$.
• We propose precoding and combining schemes tailored to the considered architecture. In the single-user case, the design is based on the singular value decomposition of the estimated channels, while in the multi-user extension, we employ MMSE precoding based on the estimated effective channels.
• We derive achievable spectral-efficiency (SE) expressions under imperfect CSI and quantify the performance of the proposed architecture. This includes both the single-user case and a multi-user extension based on a UatF bound.

C. Notation

The sets of integer, real, and complex numbers are denoted by $\mathbb{Z}$, $\mathbb{R}$, and $\mathbb{C}$, respectively. Matrices and vectors are denoted by bold uppercase and lowercase letters, respectively, such as $\mathbf{X}$ and $\mathbf{x}$. The $(m,k)$th element of $\mathbf{X}$ is denoted by $[\mathbf{X}]_{m,k}$, while $[\mathbf{x}]_m$ denotes the $m$th element of $\mathbf{x}$. The identity matrix of size $M \times M$ is denoted by $\mathbf{I}_M$, and $\mathbf{0}_{M \times K}$ denotes the all-zero matrix of size $M \times K$. $\mathbf{X}(:,1{:}d)$ denotes the submatrix formed by the first $d$ columns of $\mathbf{X}$, i.e., all rows and columns 1 to $d$. The transpose, conjugate, conjugate transpose, pseudo-inverse, trace, and vectorization operators are denoted by $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, $(\cdot)^\dagger$, $\mathrm{tr}(\cdot)$, and $\mathrm{vec}(\cdot)$, respectively. The Frobenius norm and Euclidean norm are denoted by $\|\cdot\|_F$ and $\|\cdot\|_2$, respectively. Moreover, $\mathbb{E}\{\cdot\}$ denotes expectation and $\mathcal{CN}(0,\sigma^2)$ denotes a circularly symmetric complex Gaussian random variable with variance $\sigma^2$.

D. Paper Outline

The remainder of this paper is organized as follows. Section II presents the system model and the considered two-stage digital combining architecture. Section III introduces the pilot-based channel estimation procedure, while Section IV presents the proposed time-domain-based channel estimation method. Section V evaluates the achievable downlink spectral efficiency and describes the corresponding precoding and combining design. Section VI extends the proposed framework to the MU-MIMO case.
Section IV-D compares the pilot overhead and computational complexity of the proposed time-domain estimator with conventional frequency-domain estimation. Finally, Section VII provides numerical results and Section VIII concludes the paper.

II. SYSTEM MODEL

We first consider a wideband mmWave SU-MIMO system with $M$ antennas at the base station (BS) and $K$ antennas at the UE. In Section VI, we extend the results to MU-MIMO setups. As depicted in Fig. 1, we focus on a mobile system in which the scheduled UE moves along a trajectory, while multi-user scenarios will be considered in future work.

Fig. 1: A mmWave MIMO mobile system.

The propagation environment is modeled geometrically using $N_{\mathrm{cl}}$ scattering clusters [20]. We assume an OFDM signal with $S$ subcarriers and $L$ time-domain taps. The channel matrix undergoes continuous variations over time, which we model as piecewise constant. As depicted in Fig. 2, we consider time-domain blocks numbered by $\tau$ that contain all $S$ subcarriers and have a time duration that matches the coherence time $T_C$ of the channel [7]. The frequency-domain channel matrix for the $\nu$-th subcarrier in the $\tau$-th block is denoted as [15, Ch. 7]

$$\mathbf{H}[\tau,\nu] = \sum_{i=0}^{N_{\mathrm{cl}}} \underbrace{\left( \sum_{\ell=0}^{L-1} \alpha_i[\tau,\ell]\, e^{-j2\pi \ell \nu / S} \right)}_{\bar{\alpha}_i[\tau,\nu]} \mathbf{a}_r(\phi^r_i[\tau])\, \mathbf{a}_t^T(\phi^t_i[\tau]), \tag{1}$$

for $\nu = 0, \ldots, S-1$. Here, $\alpha_i[\tau,\ell] \sim \mathcal{CN}(0, \beta_i[\tau,\ell])$ is the small-scale fading coefficient of the $\ell$-th time-domain tap for $\ell = 1, \ldots, L-1$, and $\beta_i[\tau,\ell] \triangleq \mathbb{E}\{|\alpha_i[\tau,\ell]|^2\}$ denotes the average power from the $i$-th cluster in the $\ell$-th tap. $\beta_i[\tau,\ell]$ varies gradually from one block to the next. The line-of-sight path is denoted by $i = 0$ in (1), where $\alpha_0[\tau,0] = \sqrt{\beta_0}$, $\alpha_0[\tau,\ell] = 0$ for $\ell = 1, \ldots, L-1$, and $\beta_0$ describes the large-scale fading. The vectors $\mathbf{a}_r(\phi^r_i[\tau])$ and $\mathbf{a}_t(\phi^t_i[\tau])$ are the array response vectors at the UE and BS, respectively.
Both the UE and the BS employ a horizontal uniform linear array (ULA) configuration with antenna spacing $\delta$, so that [15] (see Footnote 1)

$$\mathbf{a}_r(\phi^r_i[\tau]) = \left[1,\ e^{j2\pi\delta \sin(\phi^r_i[\tau])/\lambda_c},\ \ldots,\ e^{j2\pi\delta (K-1)\sin(\phi^r_i[\tau])/\lambda_c}\right]^T, \tag{2}$$

$$\mathbf{a}_t(\phi^t_i[\tau]) = \left[1,\ e^{j2\pi\delta \sin(\phi^t_i[\tau])/\lambda_c},\ \ldots,\ e^{j2\pi\delta (M-1)\sin(\phi^t_i[\tau])/\lambda_c}\right]^T, \tag{3}$$

where $\phi^r_i[\tau]$ and $\phi^t_i[\tau]$ denote the azimuth angle of arrival (AoA) and the azimuth angle of departure (AoD), measured from the broadside direction of the respective arrays in block $\tau$, and $\lambda_c$ is the wavelength at the carrier frequency $f_c$.

Footnote 1: The ULA assumption is made to keep the notation tractable, but the model can be easily extended to uniform planar arrays or even non-uniform array geometries.

The channel matrix in (1) changes continuously over time. The parameters $\bar{\alpha}_i[\tau,\nu]$ undergo rapid fluctuations, while the AoA $\phi^r_i[\tau]$, the AoD $\phi^t_i[\tau]$, and also $\beta_i[\tau,\ell]$ evolve slowly, as they are determined by the large-scale geometry. The beam coherence time $T_B$ was defined in [11] as the duration over which the angular directions remain approximately fixed from a beamforming perspective, so one can keep the beamforming vectors constant.

Fig. 2: Time-frequency grid (horizontal axis: time, blocks $\tau = 1, 2, 3, \ldots, t$; vertical axis: frequency, subcarriers $\nu$). The channel is approximately time-invariant in each block $\tau$, comprising all $S$ subcarriers and one channel coherence time $T_C$. The compressed digital combining matrix must be updated at the larger time intervals called the beam coherence time $T_B$, which is $t$ times larger than $T_C$. $T_p$ denotes the pilot time, corresponding to the duration used for pilot transmission within a coherence interval.

As indicated in Fig. 2, $T_B$ is much larger than $T_C$ ($t = T_B / T_C$ times larger, shown by the green box).
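To make the structure of (1)-(3) concrete, the following NumPy sketch builds the ULA responses and the per-subcarrier channel for one block. It is only an illustration under stated assumptions: half-wavelength spacing ($\delta/\lambda_c = 0.5$), and the function names `ula_response` and `channel_frequency_domain` are hypothetical helpers, not part of the paper.

```python
import numpy as np

def ula_response(phi, n_ant, spacing_over_lambda=0.5):
    """ULA array response as in (2)-(3); half-wavelength spacing is an assumption."""
    n = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * spacing_over_lambda * n * np.sin(phi))

def channel_frequency_domain(alpha, phi_r, phi_t, K, M, S):
    """Return H[nu] for nu = 0..S-1 following (1), for a single block tau.

    alpha : (n_paths, L) complex tap coefficients alpha_i[tau, l]
    phi_r : (n_paths,) angles of arrival at the UE (radians)
    phi_t : (n_paths,) angles of departure at the BS (radians)
    """
    n_paths, L = alpha.shape
    # alpha_bar[i, nu] = sum_l alpha[i, l] * exp(-j 2 pi l nu / S)
    phase = np.exp(-2j * np.pi * np.outer(np.arange(L), np.arange(S)) / S)
    alpha_bar = alpha @ phase                      # shape (n_paths, S)
    H = np.zeros((S, K, M), dtype=complex)
    for i in range(n_paths):
        # Rank-one term a_r(phi_r) a_t(phi_t)^T (plain transpose, no conjugate)
        steering = np.outer(ula_response(phi_r[i], K), ula_response(phi_t[i], M))
        H += alpha_bar[i][:, None, None] * steering[None, :, :]
    return H
```

With a single line-of-sight path whose only nonzero tap is $\ell = 0$, every subcarrier sees the same rank-one matrix, which matches the frequency-flat line-of-sight term in (1).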
The beam coherence time is typically at least one order of magnitude larger than the channel coherence time, although both scale with factors such as UE mobility [11].

A. Hardware Architecture Properties

Even if there has been much prior work on HBF, the long-term goal is to implement DBF in mmWave systems, as its digital processing over each antenna element enables fast and flexible adaptation to channel variations [13]. This is possible using state-of-the-art transceiver technology, but the bottleneck is the vast amount of baseband samples that must be processed. A potential solution is to use the UE architecture shown in Fig. 3a, which contains $K$ RF chains and a first-stage digital combining that reduces the signal dimension to $N_c$, where $N_s \leq N_c \leq K$, so the rest of the baseband processing (i.e., channel estimation, second-stage combining) can be done with a similar dimensionality as in current HBF methods. Here, $N_s$ is the number of data streams. The key to success is that the first-stage combining is implemented efficiently and updated infrequently, so full-dimensional CSI is only required occasionally. Fig. 3b shows BeammWave's DBF platform, on which the proposed algorithm could be implemented [21].

Based on the beam coherence time concept, it is logical that the first-stage digital combining is updated once per $T_B$, while it is fixed within one green block in Fig. 2. Since the small-scale fading varies more rapidly (i.e., once per coherence time), both the digital precoding $\mathbf{F}[\tau,\nu]$ and the second-stage baseband combining $\mathbf{W}[\tau,\nu]$ are updated at this interval for each $\tau$ and $\nu$-th subcarrier. In contrast, the first-stage digital combining $\mathbf{Q}[\nu]$ remains constant over $t$ coherence blocks.

A key difference between the proposed two-stage digital architecture and conventional HBF is that the first-stage digital combining can vary arbitrarily over the subcarriers.
By contrast, the analog combining in a hybrid architecture also reduces the dimension but is the same on all subcarriers, since it is implemented using phase shifters. In the remainder of the paper, we will explore how to operate the considered two-stage digital architecture when it comes to uplink and downlink channel estimation and what rates are achieved. Note that we assume that the BS uses DBF without hardware limitations to focus on the design of low-complexity UEs.

Fig. 3: Block diagram of a mmWave SU-MIMO system employing fully digital precoding and a two-stage digital combining architecture. (a) Proposed setup: the first-stage combining matrix $\mathbf{Q}[\nu]$ reduces the signal dimension and occasionally accesses CSI across all $K$ antennas. The second-stage combining $\mathbf{W}[\tau,\nu]$ is updated frequently but has a reduced dimension $N_c < K$. (b) Implementation of BeammWave's setup: the BeammWave company utilizes the same structure for implementing its UE receiver chips.

III. FREQUENCY-DOMAIN CHANNEL ESTIMATION

In this section, we describe how to acquire CSI in the system shown in Fig. 3. To set the digital precoding matrix $\mathbf{F}[\tau,\nu]$ and the two digital combining matrices $\mathbf{Q}[\nu]$ and $\mathbf{W}[\tau,\nu]$, both the BS and UE need to estimate the channel $\mathbf{H}[\tau,\nu]$. We consider time-division duplexing (TDD) operation, where the channel is first estimated in the uplink and used in the downlink by leveraging channel reciprocity. The first block ($\tau = 1$) in the green box in Fig. 2 is handled differently from the others, as the first-stage combining $\mathbf{Q}[\nu]$ is selected and then remains fixed until the end of the green box. We then estimate the reduced-dimension effective channel $\mathbf{Q}^H[\nu]\mathbf{H}[\tau,\nu]$ for $\tau > 1$. Thus, we consider $\tau = 1$ and $\tau > 1$ separately in the remainder of this section. We need uplink and downlink pilots as we have multiple antennas at both the BS and UE sides.

A. Channel Estimation in the First Block: $\tau = 1$

In the first of the $t$ blocks that fit in a beam coherence time, the UE transmits pilot sequences on each subcarrier, which the BS uses to estimate the complete channel. Each antenna at the UE transmits an orthonormal pilot sequence of length $t_p \geq K$. The pilot overhead can be expressed as the ratio $t_p / t_c$, where $t_c$ indicates the coherence block length in symbols. The pilot matrix used by the UE is denoted as $\boldsymbol{\Phi}_{\mathrm{U}}[1,\nu] \in \mathbb{C}^{K \times t_p}$ and has orthonormal rows. The UE transmits $\sqrt{t_p}\,\boldsymbol{\Phi}_{\mathrm{U}}[1,\nu]$ to ensure that the total pilot energy is proportional to the pilot length. The received signal at the BS is

$$\mathbf{Y}_{\mathrm{Pilot,U}}[1,\nu] = \sqrt{P_r t_p}\, \mathbf{H}^T[1,\nu]\, \boldsymbol{\Phi}_{\mathrm{U}}[1,\nu] + \mathbf{N}_{\mathrm{Pilot,U}}[1,\nu], \tag{4}$$

where $P_r$ is the transmit power used by the UE, normalized by the noise power, and $\mathbf{N}_{\mathrm{Pilot,U}}[1,\nu] \in \mathbb{C}^{M \times t_p}$ is the noise matrix at the BS with i.i.d. $\mathcal{CN}(0,1)$ entries.

Many channel estimators can be developed based on the received pilot signal in (4). In this paper, we adopt the classical ML estimation approach since we consider a mobile UE where prior statistical channel knowledge cannot be obtained. The ML estimate of $\mathbf{H}^T[1,\nu]$ based on $\mathbf{Y}_{\mathrm{Pilot,U}}[1,\nu]$ is [22]

$$\hat{\mathbf{H}}^T[1,\nu] = \mathbf{Y}_{\mathrm{Pilot,U}}[1,\nu]\, \boldsymbol{\Phi}_{\mathrm{U}}^{\dagger}[1,\nu] / \sqrt{P_r t_p}, \tag{5}$$

where $(\cdot)^\dagger$ denotes the pseudo-inverse of a matrix. We have $\boldsymbol{\Phi}_{\mathrm{U}}^{\dagger}[1,\nu] = \boldsymbol{\Phi}_{\mathrm{U}}^{H}[1,\nu] \left( \boldsymbol{\Phi}_{\mathrm{U}}[1,\nu]\, \boldsymbol{\Phi}_{\mathrm{U}}^{H}[1,\nu] \right)^{-1}$ in this case.

Now, we shift focus to the downlink data transmission.
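As a brief aside on the uplink step in (4)-(5), the estimator can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: taking the pilot rows from a unitary DFT matrix is an assumption (any matrix with orthonormal rows would do), and `orthonormal_pilots` and `ml_uplink_estimate` are hypothetical names.

```python
import numpy as np

def orthonormal_pilots(n_rows, t_p):
    """Pilot matrix with orthonormal rows, taken from a unitary DFT matrix.

    This particular construction is an assumption made for illustration.
    """
    if t_p < n_rows:
        raise ValueError("need t_p >= number of pilot rows")
    dft = np.fft.fft(np.eye(t_p)) / np.sqrt(t_p)
    return dft[:n_rows, :]

def ml_uplink_estimate(Y, Phi, P_r):
    """ML estimate of H^T from Y = sqrt(P_r t_p) H^T Phi + N, as in (5)."""
    t_p = Phi.shape[1]
    return Y @ np.linalg.pinv(Phi) / np.sqrt(P_r * t_p)

# Noise-free sanity check with M = 6 BS antennas, K = 4 UE antennas, t_p = 6
rng = np.random.default_rng(1)
M, K, t_p, P_r = 6, 4, 6, 2.0
Phi = orthonormal_pilots(K, t_p)
H_T = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
Y = np.sqrt(P_r * t_p) * H_T @ Phi
H_T_hat = ml_uplink_estimate(Y, Phi, P_r)
```

Because the rows of the pilot matrix are orthonormal, the pseudo-inverse reduces to the conjugate transpose, so the estimator is a single matrix product per subcarrier.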
The received signal on the $\nu$-th subcarrier through $\mathbf{H}[1,\nu]$ for the first block is given as

$$\mathbf{Y}_{\mathrm{D}}[1,\nu] = \underbrace{\mathbf{H}[1,\nu]\,\mathbf{F}[1,\nu]}_{\mathbf{B}[1,\nu]}\, \mathbf{S}[1,\nu] + \mathbf{N}_{\mathrm{D}}[1,\nu], \tag{6}$$

where $\mathbf{S}[1,\nu] \in \mathbb{C}^{N_s \times N_d}$ represents the symbol matrix with independent entries that have unit norm, $N_s \leq \mathrm{rank}(\mathbf{H}[1,\nu])$ is the number of spatially multiplexed data streams transmitted on each subcarrier, and $N_d$ denotes the number of symbol vectors $\mathbf{s}[1,\nu] \in \mathbb{C}^{N_s \times 1}$ (one column of the symbol matrix) that are sent in the downlink. Each entry of $\mathbf{N}_{\mathrm{D}}[1,\nu] \in \mathbb{C}^{K \times N_d}$ is i.i.d. $\mathcal{CN}(0,1)$. The digital precoding $\mathbf{F}[1,\nu] \in \mathbb{C}^{M \times N_s}$ is set by the BS utilizing the channel estimates obtained from the uplink in (5). Moreover, $\|\mathbf{F}[1,\nu]\|_F^2 = P_t$ represents the per-subcarrier transmit power normalized by the noise power. Section V-B provides details on how to select the precoding matrix $\mathbf{F}[1,\nu]$. The notation $\mathbf{B}[1,\nu] = \mathbf{H}[1,\nu]\mathbf{F}[1,\nu]$ will be used for the precoded channel.

A viable method to provide the UE with CSI is to transmit pilots using the precoding matrix, so the UE can estimate $\mathbf{B}[1,\nu]$. The downlink pilot matrix is $\boldsymbol{\Phi}_{\mathrm{D}}[1,\nu] \in \mathbb{C}^{N_s \times N_s}$. By transmitting these pilots over the channel in (6), the received signal at the UE becomes

$$\mathbf{Y}_{\mathrm{Pilot,D}}[1,\nu] = \sqrt{N_s}\, \mathbf{B}[1,\nu]\, \boldsymbol{\Phi}_{\mathrm{D}}[1,\nu] + \mathbf{N}_{\mathrm{Pilot,D}}[1,\nu], \tag{7}$$

where $\mathbf{N}_{\mathrm{Pilot,D}}[1,\nu] \in \mathbb{C}^{K \times N_s}$ is the noise matrix with i.i.d. $\mathcal{CN}(0,1)$ entries. By following the same steps as in the uplink, the estimated precoded channel at $\tau = 1$ becomes

$$\hat{\mathbf{B}}[1,\nu] = \mathbf{Y}_{\mathrm{Pilot,D}}[1,\nu]\, \boldsymbol{\Phi}_{\mathrm{D}}^{\dagger}[1,\nu] / \sqrt{N_s}. \tag{8}$$

Now, the UE can select the first-stage digital combining matrix $\mathbf{Q}[\nu] \in \mathbb{C}^{K \times N_c}$ and the lower-dimensional second-stage digital combining matrix $\mathbf{W}[1,\nu] \in \mathbb{C}^{N_c \times N_s}$ based on $\hat{\mathbf{B}}[1,\nu]$ and $\mathbf{Q}^H[\nu]\hat{\mathbf{B}}[1,\nu]$, respectively. In Section V-B, we explain how to select them and why we need the second combining matrix.

B. Channel Estimation for Blocks $\tau > 1$

For $1 < \tau \leq t$, we need to repeat the steps of uplink channel estimation and estimation of the precoded downlink channel, but now the first-stage combining matrix $\mathbf{Q}[\nu]$ is fixed. We define the effective channel $\mathbf{G}[\tau,\nu] = \mathbf{Q}^H[\nu]\mathbf{H}[\tau,\nu] \in \mathbb{C}^{N_c \times M}$. The UE sends the orthonormal pilot matrix $\grave{\boldsymbol{\Phi}}_{\mathrm{U}}[\tau,\nu] \in \mathbb{C}^{N_c \times t_p}$ to estimate $\mathbf{G}[\tau,\nu]$. The received signal at the BS is

$$\grave{\mathbf{Y}}_{\mathrm{Pilot,U}}[\tau,\nu] = \sqrt{P_r t_p}\, \mathbf{G}^T[\tau,\nu]\, \grave{\boldsymbol{\Phi}}_{\mathrm{U}}[\tau,\nu] + \grave{\mathbf{N}}_{\mathrm{Pilot,U}}[\tau,\nu], \tag{9}$$

which is similar to (4) but contains the lower-dimensional effective channel. Here, $\grave{\mathbf{N}}_{\mathrm{Pilot,U}}[\tau,\nu] \in \mathbb{C}^{M \times t_p}$ is the noise matrix with i.i.d. $\mathcal{CN}(0,1)$ entries. The ML estimate of $\mathbf{G}^T[\tau,\nu]$ is

$$\hat{\mathbf{G}}^T[\tau,\nu] = \grave{\mathbf{Y}}_{\mathrm{Pilot,U}}[\tau,\nu]\, \grave{\boldsymbol{\Phi}}_{\mathrm{U}}^{\dagger}[\tau,\nu] / \sqrt{P_r t_p}. \tag{10}$$

The BS selects the precoding $\mathbf{F}[\tau,\nu]$ based on $\hat{\mathbf{G}}[\tau,\nu]$ (see Section V-B for details). At the UE, the received signal after the first-stage digital combining matrix is

$$\grave{\mathbf{Y}}_{\mathrm{D}}[\tau,\nu] = \underbrace{\mathbf{G}[\tau,\nu]\,\mathbf{F}[\tau,\nu]}_{\mathbf{D}[\tau,\nu]}\, \mathbf{S}[\tau,\nu] + \underbrace{\mathbf{Q}^H[\nu]\,\mathbf{N}_{\mathrm{D}}[\tau,\nu]}_{\grave{\mathbf{N}}_{\mathrm{D}}[\tau,\nu]}, \tag{11}$$

where $\mathbf{D}[\tau,\nu] = \mathbf{G}[\tau,\nu]\mathbf{F}[\tau,\nu]$ is the precoded effective channel. Since the first-stage combining matrix $\mathbf{Q}[\nu]$ only reduces the channel dimension based on the channel geometry but is independent of the current small-scale fading realizations, we also need a second-stage combining matrix $\mathbf{W}[\tau,\nu]$ to mitigate interference between the $N_s$ streams. We need to estimate $\mathbf{D}[\tau,\nu]$ to implement that. By transmitting the downlink pilot matrix $\grave{\boldsymbol{\Phi}}_{\mathrm{D}}[\tau,\nu] \in \mathbb{C}^{N_s \times N_s}$, the received signal after the first-stage combining matrix is

$$\grave{\mathbf{Y}}_{\mathrm{Pilot,D}}[\tau,\nu] = \sqrt{N_s}\, \mathbf{D}[\tau,\nu]\, \grave{\boldsymbol{\Phi}}_{\mathrm{D}}[\tau,\nu] + \grave{\mathbf{N}}_{\mathrm{Pilot,D}}[\tau,\nu], \tag{12}$$

where $\grave{\mathbf{N}}_{\mathrm{Pilot,D}}[\tau,\nu] \in \mathbb{C}^{N_c \times N_s}$ is the colored processed noise with independent columns and the covariance matrix $\mathbf{C}_{\grave{\mathbf{N}}_{\mathrm{Pilot,D}}} = \mathbb{E}\left\{ \mathbf{Q}^H[\nu]\,\mathbf{Q}[\nu] \right\}$.
Since the pilot matrix is multiplied from the right in (12) and the noise correlation appears from the left, the correlation has no impact on the estimator. The ML estimate of the effective downlink channel $\mathbf{D}[\tau,\nu]$ is

$$\hat{\mathbf{D}}[\tau,\nu] = \grave{\mathbf{Y}}_{\mathrm{Pilot,D}}[\tau,\nu]\, \grave{\boldsymbol{\Phi}}_{\mathrm{D}}^{\dagger}[\tau,\nu] / \sqrt{N_s}. \tag{13}$$

All the proposed steps are presented in Algorithm 1.

Algorithm 1 Proposed Channel Estimation Approach
1: Input: Pilot matrices $\boldsymbol{\Phi}_{\mathrm{U}}[1,\nu]$, $\boldsymbol{\Phi}_{\mathrm{D}}[1,\nu]$, $\grave{\boldsymbol{\Phi}}_{\mathrm{U}}[\tau,\nu]$, $\grave{\boldsymbol{\Phi}}_{\mathrm{D}}[\tau,\nu]$
2: if $\tau = 1$ then
3:   Estimate uplink channel $\mathbf{H}^T[1,\nu]$ by utilizing (5)
4:   Set precoding $\mathbf{F}[1,\nu]$ as in (31)
5:   Estimate downlink channel after precoding $\mathbf{B}[1,\nu]$ as in (8)
6:   Set compressed first-stage combining matrix $\mathbf{Q}[\nu]$ as in (32)
7:   Set second-stage digital combining matrix $\mathbf{W}[1,\nu]$ based on the lower-dimensional channel $\mathbf{Q}^H[\nu]\hat{\mathbf{B}}[1,\nu]$ as in (33)
8: else if $1 < \tau \leq t$ then
9:   Define effective channel $\mathbf{G}[\tau,\nu] = \mathbf{Q}^H[\nu]\mathbf{H}[\tau,\nu]$
10:   Estimate uplink effective channel $\mathbf{G}^T[\tau,\nu]$ as in (10)
11:   Update precoding $\mathbf{F}[\tau,\nu]$ as in (34)
12:   Estimate second effective channel $\mathbf{D}[\tau,\nu]$ as in (13)
13:   Update second-stage digital combining matrix $\mathbf{W}[\tau,\nu]$ as in (35)
14: end if
15: Output: $\mathbf{F}[\tau,\nu]$, $\mathbf{Q}[\nu]$ and $\mathbf{W}[\tau,\nu]$

IV. TIME-DOMAIN-BASED CHANNEL ESTIMATION

In Section III, we described a frequency-domain channel estimation approach for TDD systems, where the channel is estimated independently on each subcarrier, as summarized in Algorithm 1. While straightforward, this approach has several limitations. First, it requires pilots on all subcarriers, which leads to a large pilot overhead that reduces the achievable SE. Second, the computational complexity scales linearly with the number of subcarriers, leading to increased latency and power consumption.
Third, frequency-domain estimation is sensitive to noise and interference, often yielding noisy channel estimates unless additional smoothing or interpolation is applied. Most importantly, this approach fails to exploit the inherent structure of OFDM-based channels: it treats the channel responses on different subcarriers as independent unknowns, thereby ignoring the strong correlation across frequency induced by the finite delay spread of the channel.

These limitations motivate a time-domain-based estimation approach. The key observation is that the OFDM channel is fully characterized by a finite number of time-domain taps $L$. Instead of estimating the channel on each subcarrier $\nu$, we estimate these taps directly. Since the number of taps is typically much smaller than the number of subcarriers ($L \ll S$), the number of unknown parameters is significantly reduced. For example, in our simulations in Section VII with $S = 512$ and $L = 6$, we have $S/L \approx 85$, which corresponds to a cyclic prefix overhead of only $L/S \approx 1.17\%$ of the symbol duration. Exploiting the finite delay spread in this way enables joint processing across subcarriers. Consequently, the pilot overhead can be significantly reduced by estimating a smaller set of channel parameters while still enabling reconstruction of the channel over the entire bandwidth.

A. Channel Representation

We focus on the uplink channel estimation problem. We start with the first block $\tau = 1$ and the uplink channel matrix $\mathbf{H}^T[1,\nu] = [\mathbf{h}_1[1,\nu], \ldots, \mathbf{h}_K[1,\nu]] \in \mathbb{C}^{M \times K}$, where $\mathbf{h}_k[1,\nu] \in \mathbb{C}^M$ denotes the channel vector associated with the $k$-th UE transmit antenna, $k = 1, \ldots, K$, during the first transmission block. The received uplink signal corresponding to the $k$-th UE antenna on subcarrier $\nu$ is expressed as

$$\mathbf{y}^{\mathrm{U}}_k[1,\nu] = \sqrt{P_r}\, \mathbf{h}_k[1,\nu]\, x[1,\nu] + \mathbf{n}^{\mathrm{U}}_k[1,\nu], \quad \nu = 0, \ldots, S-1, \tag{14}$$

where $x[1,\nu]$ is a pilot symbol and $\mathbf{n}^{\mathrm{U}}_k[1,\nu] \in \mathbb{C}^M$ is additive noise with i.i.d. $\mathcal{CN}(0,1)$ entries at the BS. For notational simplicity, we assume that the pilot sequence length is $t_p = 1$ per UE antenna $k$. The frequency-domain channel vector $\mathbf{h}_k[1,\nu] \in \mathbb{C}^M$ can be expressed as

$$\mathbf{h}_k[1,\nu] = \sum_{\ell=0}^{L-1} \bar{\mathbf{h}}_k[1,\ell]\, e^{-j2\pi \ell \nu / S}, \tag{15}$$

where $\bar{\mathbf{h}}_k[1,\ell]$ denotes the $\ell$-th time-domain channel tap and $L$ is the effective channel length. Equation (15) shows that the frequency-domain channel is the $S$-point discrete Fourier transform (DFT) of the time-domain channel taps. Collecting the frequency-domain channel vectors across all subcarriers yields

$$\mathbf{H}_k \triangleq \left[ \mathbf{h}_k[1,0], \ldots, \mathbf{h}_k[1,S-1] \right] \in \mathbb{C}^{M \times S}, \tag{16}$$

and we define $\mathbf{H} = [\mathbf{H}_1, \ldots, \mathbf{H}_K] \in \mathbb{C}^{M \times SK}$. Assuming unit-modulus pilot symbols, i.e., $x_k[1,\nu] = 1$ for all $\nu$, and with $P_r$ denoting the transmit power used by the UE, the received signals across subcarriers can be stacked as in (17), where $\mathrm{DFT}_S$ denotes the $S \times S$ DFT matrix. Applying the inverse DFT yields

$$\frac{1}{\sqrt{P_r S}}\, \mathbf{Y}_k \cdot \mathrm{DFT}_S^{-1} = \left[ \bar{\mathbf{H}}_k, \mathbf{0} \right] + \frac{1}{\sqrt{P_r S}}\, \mathbf{N}_k \cdot \mathrm{DFT}_S^{-1}, \tag{19}$$

where the last $S - L$ columns correspond to zero padding.

B. Time-Frequency Conversion

Instead of transmitting pilots on all $S$ subcarriers, the $L$ time-domain channel taps can be uniquely recovered from only $L$ pilot subcarriers $\{\nu_0, \ldots, \nu_{L-1}\}$.
This allows us to express the channel responses jointly across these subcarriers, leading to (18), which establishes a linear transformation between the L frequency-domain observations and the L time-domain channel taps. A con venient and well-conditioned choice is to select equally spaced pilot subcarriers 2 ν ℓ = S ℓ L , ℓ = 0 , . . . , L − 1 , (20) W e first select L pilot subcarriers according to (20) and transmit pilots x [1 , ν ℓ ] = P r /K on each of them. The ML estimate of h k [1 , ν ℓ ] is giv en by ˆ h k [1 , ν ℓ ] = y U k [1 , ν ℓ ] , ℓ = 0 , . . . , L − 1 , k = 1 , . . . , K . (21) 2 This formulation applies when S L is an integer , which enables uniform spacing of the selected pilot subcarriers. If this condition is not satisfied, one can approximate the spacing by rounding to the nearest integer, selecting the corresponding subcarriers S , and discarding any excess subcarriers that are not needed ( ⌈ S L ⌉ × L − S ). This results in S L distinct pilot patterns, each enabling full channel estimation with identical overhead. The pilot subcarriers can also be selected with an offset δ ∈ { 0 , . . . , ⌈ S L ⌉ − 1 } , giv en by ν ℓ = δ + S ℓ L for ℓ = 0 , . . . , L − 1 . Using the relation in (18), these estimated frequency-domain responses are conv erted to the time domain by applying the in verse L -point DFT , which yields the estimates of the L time- domain channel taps ˆ ¯ h k [1 , ℓ ] . C. F r equency-Domain Channel Reconstruction Once the L time-domain channel taps have been estimated, the channel across all subcarriers can be reconstructed. Let ˆ ¯ H k =  ˆ ¯ h k [1 , 0] , . . . , ˆ ¯ h k [1 , L − 1]  (22) denote the estimated time-domain channel taps. T o obtain the full frequency-domain channel, the vector is first zero-padded to length S as ˆ H k, TD =  ˆ ¯ H k , 0 M × ( S − L )  . (23) The frequency-domain channels on all S subcarriers are then obtained by applying the S -point DFT ˆ H k = 1 √ S ˆ H k, TD DFT S . 
(24) This procedure reconstructs the full frequency-domain chan- nel from only L pilot subcarriers. Consequently , for t p = 1 , the pilot overhead is reduced from S symbols in conv entional frequency-domain estimation to only L symbols. Algorithm 2 summarizes the proposed time-domain-based channel estimation procedure. The same time-domain estima- tion principle can also be extended to the effecti ve channels B [ τ , ν ] , G [ τ , ν ] and D [ τ , ν ] introduced in Section III. How- ev er , unlike H T [1 , ν ] , these effecti ve channels may exhibit additional frequency selectivity due to subcarrier-dependent precoding and combining matrices inside them. T o limit this effect, the S subcarriers are partitioned into N sub subbands, each containing S sub = S N sub (25) subcarriers. W ithin each subband, the precoding and combin- ing matrices are assumed to be approximately constant, that is, F k [ τ , ν ] ≈ F k [ τ , b ] , W k [ τ , ν ] ≈ W k [ τ , b ] , (26) for all subcarriers ν belonging to subband b = 1 , . . . , N sub . This approximation ensures that the effecti ve channel within each subband remains governed by a limited delay spread L eff , which allows the time-domain estimation method to be applied using only L eff ≥ L pilot subcarriers per subband. The time- domain estimation method is then applied using L eff pilot subcarriers, followed by reconstruction across all subcarriers. D. Complexity and Pilot Overhead Comparison W e next compare the proposed time-domain-based chan- nel estimation approach with con ventional frequency-domain estimation in terms of pilot overhead and computational com- plexity . T able I summarizes the key differences between the two approaches in terms of pilot overhead and computational complexity , which we will explain in the following. 7 h y U k [1 , 0] , . . . , y U k [1 , S − 1] i | {z } ∆ = Y k = H k |{z} [ ¯ H k , 0 ( L +1) × ( S − L − 1) ] √ P r S DFT S . 1 + [ n U k [1 , 0] , . . . 
, n U k [1 , S − 1]] | {z } ∆ = N k (17) [ h k [1 , ν 0 ] , . . . , h k [1 , ν L − 1 ]] = [ ¯ h k [1 , 0] , . . . , ¯ h k [1 , L − 1]]     1 e − j 2 π ν 0 S . . . e − j 2 π ν L − 1 S . . . . . . . . . . . . 1 e − j 2 π ν 0 ( L − 1) S . . . e − j 2 π ν L − 1 ( L − 1) S     | {z } DFT L (18) Algorithm 2 Time-Domain-Based Channel Estimation 1: Input: S , L , δ , and { y U k [1 , ν ℓ ] } 2: Select pilot subcarriers ν ℓ = δ + S ℓ L , ℓ = 0 , . . . , L − 1 3: T ransmit x [1 , ν ℓ ] = 1 4: f or k = 1 , . . . , K do 5: Estimate pilot-subcarrier channels: ˆ h k [1 , ν ℓ ] = y U k [1 , ν ℓ ] 6: Form ˆ H k, FD = [ ˆ h k [1 , ν 0 ] , . . . , ˆ h k [1 , ν L − 1 ]] T 7: Compute taps: ˆ ¯ H k, TD = DFT − 1 L ˆ H k, FD 8: Zero-pad to length S 9: Reconstruct full channel: ˆ H k = 1 √ S DFT S ˆ H k, TD 10: end for 11: Output: { ˆ H k } K k =1 1) Pilot Overhead: In con ventional frequency-domain channel estimation, the channel response is estimated indepen- dently on each of the S subcarriers. As a result, t p · S pilot symbols are required to estimate the channel over one trans- mission block, where t p denotes the number of pilot OFDM symbols. The total number of pilot symbols grows linearly with the system bandwidth, which i ncreases the number of unknown channel coefficients and the estimation complexity . This can become prohibitiv e in wideband systems, even though the relativ e pilot overhead remains constant. The parameter t p must be selected such that the transmitted pilot sequences are orthogonal, which typically requires t p ≥ K (or equiv alently , the number of transmitted spatial streams). Hence, t p scales with the number of transmit antennas rather than the number of receiv e antennas. In contrast, the proposed time-domain-based method exploits the finite delay spread of the channel. Since the channel is fully characterized by L time-domain taps with L ≪ S , only t p · L pilot symbols are sufficient to recover the full frequency-domain channel response. 
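As a rough numerical illustration of this claim, the following NumPy sketch recovers the channel on all $S$ subcarriers from $L$ equally spaced pilot subcarriers by inverting the $L \times L$ matrix in (18) and then evaluating (15). The function name `td_channel_estimate`, the array layout, and the unnormalized DFT convention taken directly from (15) are assumptions made for illustration, not the authors' implementation; the inputs are assumed to be already scaled to channel estimates, and $S/L$ is assumed to be an integer.

```python
import numpy as np


def td_channel_estimate(h_pilot, S, L, delta=0):
    """Sketch of time-domain channel estimation from L pilot subcarriers.

    h_pilot: (M, L) array of channel estimates on the pilot subcarriers
             nu_l = delta + l * S / L (S / L is assumed to be an integer).
    Returns an (M, S) array with the reconstructed frequency-domain channel.
    """
    if S % L != 0:
        raise ValueError("this sketch assumes that S / L is an integer")
    nu = delta + (S // L) * np.arange(L)   # pilot subcarrier indices
    taps = np.arange(L)                    # tap indices 0, ..., L-1
    # A[l, i] = exp(-j 2 pi l nu_i / S): maps taps to pilot subcarriers, cf. (18)
    A = np.exp(-2j * np.pi * np.outer(taps, nu) / S)
    h_taps = h_pilot @ np.linalg.inv(A)    # (M, L) estimated time-domain taps
    # B[l, nu] = exp(-j 2 pi l nu / S): maps taps to all S subcarriers, cf. (15)
    B = np.exp(-2j * np.pi * np.outer(taps, np.arange(S)) / S)
    return h_taps @ B
```

With noise-free inputs, the reconstruction is exact up to floating-point error, which is a convenient sanity check before noise is added.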
The resulting pilot overhead reduction factor is therefore

$$\frac{\mathrm{Overhead}_{\mathrm{TD}}}{\mathrm{Overhead}_{\mathrm{FD}}} = \frac{L}{S}. \tag{27}$$

In practical OFDM systems, the ratio $L/S$ is kept small to limit the cyclic prefix overhead, meaning that the number of channel taps remains small relative to the total number of subcarriers.

2) Computational Complexity: Frequency-domain channel estimation requires processing each subcarrier independently. For each receive antenna, this involves $S$ complex-valued channel estimations, typically through least-squares or MMSE processing. The overall computational complexity scales as $\mathcal{O}(KMS)$, ignoring constant factors related to the estimator.

TABLE I: Frequency- vs. Time-Domain Channel Estimation

Metric                 | Freq.-Domain      | Time-Domain
Estimation domain      | Per subcarrier    | Channel taps
Unknown parameters     | $KMS$             | $KML$
Pilot symbols          | $t_p S$           | $t_p L$
Pilot scaling          | $\mathcal{O}(S)$  | $\mathcal{O}(L)$
Estimation complexity  | $\mathcal{O}(KMS)$ | $\mathcal{O}(KML)$
FFT operations         | None              | $L$-IFFT + $S$-FFT
FFT complexity         | –                 | $\mathcal{O}(L \log S)$
Channel structure      | Not exploited     | Exploited
Noise averaging        | No                | Yes
Wideband suitability   | Limited           | High

The proposed time-domain approach, on the other hand, performs channel estimation using only $L$ pilot subcarriers, followed by an inverse DFT of size $L$ and a DFT of size $S$ to reconstruct the frequency-domain channel. Since the estimated time-domain channel has only $L$ taps, it is zero-padded to length $S$ before applying the transform. This corresponds to a partial DFT, in the sense that only $L$ non-zero coefficients contribute to the $S$ frequency-domain samples. The dominant operations thus scale as $\mathcal{O}(KML) + \mathcal{O}(KL \log S)$, where the first term corresponds to estimating the $L$ time-domain taps and the second term accounts for the FFT-based transformation to the frequency domain with only $L$ non-zero inputs.
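The "partial DFT" interpretation can be checked numerically. The sketch below, which is an illustration under assumed names (`reconstruct_fft`, `reconstruct_partial_dft`) and the unnormalized DFT convention of (15), compares a zero-padded $S$-point FFT with a direct product by the $L \times S$ block of the DFT matrix that the $L$ non-zero taps actually touch.

```python
import numpy as np


def reconstruct_fft(taps, S):
    """Zero-pad the L taps (last axis) to length S and apply an S-point FFT."""
    return np.fft.fft(taps, n=S, axis=-1)


def reconstruct_partial_dft(taps, S):
    """Multiply the L taps by the first L rows of the S-point DFT matrix."""
    L = taps.shape[-1]
    rows = np.exp(-2j * np.pi * np.outer(np.arange(L), np.arange(S)) / S)
    return taps @ rows
```

Both functions return the same $M \times S$ matrix; the FFT route is the one whose cost the complexity expression above refers to, while the matrix route makes explicit that only $L$ rows of the transform are used.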
Since $L \ll S$ in typical wireless channels, the time-domain method substantially reduces the estimation complexity, particularly in the channel estimation stage, while the transformation to the frequency domain remains efficient due to the sparse (zero-padded) structure. The comparison reveals that the proposed time-domain-based channel estimation approach offers significant reductions in both pilot overhead and estimation complexity compared to conventional frequency-domain estimation. These gains become increasingly pronounced as the system bandwidth grows, making the proposed method particularly attractive for wideband and high-frequency systems where large numbers of subcarriers are employed.

V. DOWNLINK ACHIEVABLE SE

The proposed channel estimation procedure provides the BS and UE with CSI, which makes the selection of precoding and combining matrices feasible. The available CSI at the UE is crucial in determining the combining matrices and, ultimately, the achievable downlink SE. Therefore, in this section, we explore the SE that can be achieved with two different levels of CSI availability at the UE: perfect and imperfect CSI. The achievable SE is upper bounded by the channel capacity [23]. In cases where the UE has imperfect CSI, the classical Shannon capacity formula cannot be evaluated directly. However, there are well-established lower bounds that can be utilized to characterize the SE under imperfect CSI. We apply the use-and-then-forget (UatF) technique [7], where the UE uses the channel estimates to design the receive combining but disregards them in the signal detection process. We first consider the ideal scenario where the UE has perfect CSI and describe how to select the precoding and combining matrices in our setup. Then, we present an SE expression based on the UatF technique.

A. Achievable SE with Perfect CSI at the UE

We first consider the genie-aided case where the channel $\mathbf{H}[\tau,\nu]$ is assumed to be perfectly known to the UE. This is a benchmark for the practical estimation method proposed in this paper. With the transmission of Gaussian data symbols, the ergodic achievable SE on subcarrier $\nu$ is given by

$$R[\nu] = \mathbb{E}\Big[ \log_2 \det\Big( \mathbf{I}_{N_s} + \big(\mathbf{Q}[\nu]\mathbf{W}[\tau,\nu]\big)^{\dagger}\, \mathbf{H}[\tau,\nu]\mathbf{F}[\tau,\nu]\, \mathbf{F}^{H}[\tau,\nu]\mathbf{H}^{H}[\tau,\nu]\, \big(\mathbf{Q}[\nu]\mathbf{W}[\tau,\nu]\big) \Big) \Big], \tag{28}$$

where the expectation is computed with respect to the fading process $\{\mathbf{H}[\tau,\nu]\}$ [23, Ch. 8]. The average SE over the subcarriers is

$$\mathrm{SE}_{\text{full CSI}} = \frac{1}{S} \sum_{\nu=0}^{S-1} \rho\, R[\nu], \tag{29}$$

where $\rho = 1 - \frac{t_p + N_s}{t_c}$ compensates for the estimation overhead.

B. Precoding and Combining Design

It remains to select the precoding and combining matrices to maximize the SE, for example, the expression in (29). This is a classical problem with a well-established solution under perfect CSI [15, Ch. 4]. The optimal approach uses the singular value decomposition (SVD) of each subcarrier's channel matrix to decouple the MIMO channel into parallel channels. We take the same approach in our system, but consider the estimated channel matrix. For the $\nu$-th subcarrier and $\tau$-th block, the estimated channel matrix $\hat{\mathbf{H}}[\tau,\nu]$ can be decomposed via the SVD as

$$\hat{\mathbf{H}}[\tau,\nu] = \mathbf{U}[\tau,\nu]\, \boldsymbol{\Lambda}[\tau,\nu]\, \mathbf{V}^{H}[\tau,\nu], \tag{30}$$

where $\mathbf{U}[\tau,\nu] \in \mathbb{C}^{K \times K}$ is a unitary matrix containing the left singular vectors, $\boldsymbol{\Lambda}[\tau,\nu] \in \mathbb{C}^{K \times M}$ is a diagonal matrix containing the singular values in decreasing order on the diagonal, and $\mathbf{V}[\tau,\nu] \in \mathbb{C}^{M \times M}$ is a unitary matrix containing the right singular vectors. The precoding matrix at the $\nu$-th subcarrier for $\tau = 1$ is

$$\mathbf{F}[1,\nu] = \mathbf{V}_{(:,1:N_s)}[1,\nu]\, \mathrm{diag}\big(\sqrt{P_{1,\nu}}, \ldots, \sqrt{P_{N_s,\nu}}\big), \tag{31}$$

where $\mathrm{diag}(\sqrt{P_{1,\nu}}, \ldots, \sqrt{P_{N_s,\nu}})$ is an $N_s \times N_s$ diagonal matrix whose $i$-th diagonal element is $\sqrt{P_{i,\nu}}$.
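The structure of (30) and (31) can be sketched in a few lines of NumPy. The helper names `water_filling` and `svd_precoder` are hypothetical, and the power allocation shown is the textbook water-filling rule under an assumed unit noise variance with the squared singular values as channel gains; the exact allocation used by the authors may differ in such details.

```python
import numpy as np


def water_filling(gains, total_power):
    """Textbook water-filling over parallel channels with positive gains.

    Maximizes sum(log2(1 + p_i * g_i)) subject to sum(p_i) = total_power.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest channel first
    g = gains[order]
    powers = np.zeros_like(g)
    for n in range(len(g), 0, -1):           # try n active channels
        level = (total_power + np.sum(1.0 / g[:n])) / n
        p = level - 1.0 / g[:n]
        if p[-1] >= 0:                       # weakest active channel is feasible
            powers[:n] = p
            break
    out = np.zeros_like(g)
    out[order] = powers                      # undo the sorting
    return out


def svd_precoder(H_hat, num_streams, total_power):
    """Precoder with the structure of (31): dominant right singular vectors
    scaled by the square roots of the allocated powers (unit noise assumed)."""
    _, s, Vh = np.linalg.svd(H_hat)          # H_hat = U diag(s) Vh
    V = Vh.conj().T
    p = water_filling(s[:num_streams] ** 2, total_power)
    return V[:, :num_streams] * np.sqrt(p)   # scales column i by sqrt(p_i)
```

For a $K \times M$ estimate, the returned matrix is $M \times N_s$, its squared Frobenius norm equals the power budget, and the product of the channel estimate with the precoder has mutually orthogonal columns, which is the decoupling into parallel channels described above.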
The diagonal matrix in (31) represents a power allocation over the $N_s$ streams and follows a water-filling strategy with the total transmit power constraint $\sum_{i=1}^{N_s} P_{i,\nu} = P_t$ per subcarrier. Here, $\mathbf{V}_{(:,1:N_s)}[1,\nu] \in \mathbb{C}^{M \times N_s}$ contains the first $N_s$ columns of $\mathbf{V}[1,\nu]$, corresponding to the $N_s$ dominant singular values. To set the first-stage compressed combining matrix, the UE utilizes the SVD of the estimated precoded channel $\hat{\mathbf{B}}[1,\nu] = \mathbf{U}_{B}[1,\nu]\, \boldsymbol{\Lambda}_{B}[1,\nu]\, \mathbf{V}^{H}_{B}[1,\nu]$ and picks the first $N_c$ columns of the left singular matrix $\mathbf{U}_{B}[1,\nu]$ as

$$\mathbf{Q}[\nu] = \mathbf{U}_{B(:,1:N_c)}[1,\nu]. \tag{32}$$

For block $\tau = 1$, we can set the second-stage combining as

$$\mathbf{W}[1,\nu] = \begin{bmatrix} \mathbf{I}_{N_s} \\ \mathbf{0}_{(N_c - N_s) \times N_s} \end{bmatrix}, \tag{33}$$

which is a non-square $N_c \times N_s$ matrix. Owing to the SVD-based selection of $\mathbf{Q}[\nu]$ in (32), the $N_s$ most significant signal components are already concentrated in the leading dimensions, making additional processing unnecessary at initialization.

For $1 < \tau \leq t$, the first-stage combining matrix $\mathbf{Q}[\nu]$ remains fixed. The system uses the SVD of the estimated effective channel $\hat{\mathbf{G}}[\tau,\nu] = \mathbf{U}_{G}[\tau,\nu]\, \boldsymbol{\Lambda}_{G}[\tau,\nu]\, \mathbf{V}^{H}_{G}[\tau,\nu]$ and selects the precoding as

$$\mathbf{F}[\tau,\nu] = \mathbf{V}_{G(:,1:N_s)}[\tau,\nu]\, \mathrm{diag}\big(\sqrt{P_{1,\nu}}, \ldots, \sqrt{P_{N_s,\nu}}\big), \tag{34}$$

using water-filling power allocation. The second-stage combining is calculated based on the estimated precoded effective channel $\hat{\mathbf{D}}[\tau,\nu] = \mathbf{U}_{D}[\tau,\nu]\, \boldsymbol{\Lambda}_{D}[\tau,\nu]\, \mathbf{V}^{H}_{D}[\tau,\nu]$ as

$$\mathbf{W}[\tau,\nu] = \mathbf{U}_{D(:,1:N_s)}[\tau,\nu]. \tag{35}$$

C. Achievable SE with Imperfect CSI at the UE

We now utilize the UatF approach to characterize the achievable ergodic SE with imperfect CSI. In this technique, the UE uses the channel estimate to compute the first- and second-stage combining matrices (as described in the last subsection), but treats the effective channel as a deterministic quantity when computing the SE.
An arbitrary column of the received signal after the second-stage combining can be expressed as

$$\breve{\mathbf{y}}[\tau,\nu] = \mathbf{W}^{H}[\tau,\nu]\mathbf{D}[\tau,\nu]\mathbf{s}[\tau,\nu] + \mathbf{W}^{H}[\tau,\nu]\grave{\mathbf{n}}[\tau,\nu] = \bar{\mathbf{E}}[\nu]\mathbf{s}[\tau,\nu] + \big(\mathbf{E}[\tau,\nu] - \bar{\mathbf{E}}[\nu]\big)\mathbf{s}[\tau,\nu] + \mathbf{W}^{H}[\tau,\nu]\grave{\mathbf{n}}[\tau,\nu], \tag{36}$$

where $\mathbf{E}[\tau,\nu] = \mathbf{W}^{H}[\tau,\nu]\mathbf{D}[\tau,\nu] \in \mathbb{C}^{N_s \times N_s}$ represents the effective channel after applying the second-stage receive combining matrix and $\bar{\mathbf{E}}[\nu]$ denotes its mean with respect to the fading variations. This distinction between the effective channel and its mean is crucial for handling the statistical properties of the channel in the subsequent analysis. The term $\grave{\grave{\mathbf{n}}}_d[\tau,\nu] = (\mathbf{E}[\tau,\nu] - \bar{\mathbf{E}}[\nu])\mathbf{s}[\tau,\nu] + \mathbf{W}^{H}[\tau,\nu]\grave{\mathbf{n}}[\tau,\nu]$ represents a spatially colored noise term that is uncorrelated with the first term and has the covariance matrix

$$\mathbf{C}_{\grave{\grave{\mathbf{n}}}_d} = \mathbb{E}\Big[ \grave{\grave{\mathbf{n}}}_d[\tau,\nu]\, \grave{\grave{\mathbf{n}}}_d^{H}[\tau,\nu] \Big]. \tag{37}$$

By interpreting (36) as a MIMO channel with the deterministic channel $\bar{\mathbf{E}}[\nu]$ and the colored noise $\grave{\grave{\mathbf{n}}}_d$, we can write the average achievable SE of the system as

$$\mathrm{SE}_{\text{Imperfect CSI}} = \frac{1}{S} \sum_{\nu=0}^{S-1} \rho\, R_{\text{Imperfect CSI}}[\nu], \tag{38}$$

where $\rho = 1 - \frac{t_p + N_s}{t_c}$ and the SE at subcarrier $\nu$ is

$$R_{\text{Imperfect CSI}}[\nu] = \log_2 \det\Big( \mathbf{I}_{N_s} + \bar{\mathbf{E}}^{H}[\nu]\, \mathbf{C}^{-1}_{\grave{\grave{\mathbf{n}}}_d}\, \bar{\mathbf{E}}[\nu] \Big). \tag{39}$$

VI. MU-MIMO SYSTEMS

The previous sections considered a SU-MIMO system in order to present the proposed two-stage combining architecture and the associated channel estimation procedure with notational brevity. In this section, we briefly discuss how the framework extends to the multi-user case. To this end, consider a system where the BS serves $U$ UEs simultaneously. Each UE is equipped with $K$ antennas and employs the two-stage combining architecture described in Section II-A. The BS is equipped with $M$ antennas and transmits $N_s$ streams per UE, where $N_s$ is generally smaller than $K$.
The received signal at UE $u$ on subcarrier $\nu$ during block $\tau$ can be written as

$$\tilde{\mathbf{y}}_u[\tau,\nu] = \mathbf{W}^{H}_u[\tau,\nu]\mathbf{Q}^{H}_u[\nu]\mathbf{H}_u[\tau,\nu] \sum_{i=1}^{U} \mathbf{F}_i[\tau,\nu]\mathbf{x}_i[\tau,\nu] + \tilde{\mathbf{n}}_u[\tau,\nu], \tag{40}$$

where $\tilde{\mathbf{n}}_u[\tau,\nu] = \mathbf{W}^{H}_u[\tau,\nu]\mathbf{Q}^{H}_u[\nu]\mathbf{n}_u[\tau,\nu] \in \mathbb{C}^{N_s}$ is the combined noise and $\mathbf{n}_u[\tau,\nu] \in \mathbb{C}^{K}$ is additive noise with i.i.d. $\mathcal{CN}(0,1)$ entries. Here, $\mathbf{H}_u[\tau,\nu] \in \mathbb{C}^{K \times M}$ denotes the downlink channel between the BS and UE $u$ on subcarrier $\nu$ during block $\tau$, $\mathbf{F}_i[\tau,\nu] \in \mathbb{C}^{M \times N_s}$ is the precoding matrix for UE $i$, $\mathbf{x}_i[\tau,\nu] \in \mathbb{C}^{N_s}$ is an arbitrary transmitted symbol vector, $\mathbf{Q}_u[\nu] \in \mathbb{C}^{K \times N_c}$ is the first-stage combining matrix, and $\mathbf{W}_u[\tau,\nu] \in \mathbb{C}^{N_c \times N_s}$ is the second-stage combining matrix.

A. Channel Estimation

The proposed time-domain channel estimation method in Section IV can be applied independently for each UE. During uplink transmission, each UE $u$ transmits pilots on its selected $L$ pilot subcarriers. To avoid pilot contamination, different UEs are assigned shifted versions of the pilot subcarriers (different offsets $\delta$), which provides separation in the frequency domain. This avoids the need for orthogonal pilot sequences but reduces flexibility in pilot allocation.

B. Precoding and Combining Design

In MU-MIMO scenarios, the precoding must not only enhance the desired signal for each UE, but also suppress inter-user interference. For this reason, the SVD-based precoding used in SU-MIMO is no longer suitable, as it does not account for inter-user interference. Instead, we follow a per-user linear MMSE precoding strategy based on the estimated effective channels. This choice follows the same design as the MMSE-based precoding matrix in [24]. For $\tau = 1$, the MMSE precoding matrix is

$$\mathbf{F}_u[1,\nu] = \eta_u[1,\nu] \Big( \hat{\mathbf{H}}^{H}_u[1,\nu]\hat{\mathbf{H}}_u[1,\nu] + \mu \mathbf{I}_M \Big)^{-1} \hat{\mathbf{H}}^{H}_u[1,\nu], \tag{41}$$

where $\mu = \frac{U \sigma_n^2}{P_t}$ is a regularization parameter, $\sigma_n^2$ denotes the noise variance, and $P_t$ is the total transmit power.
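A minimal numerical sketch of the regularized inverse in (41) is given below. The function name `regularized_precoder` is hypothetical, and the normalization shown, which scales the result so that its squared Frobenius norm equals $P_t$, is only one assumed way of meeting a total power constraint. The sketch evaluates (41) exactly as written, so for a $K \times M$ channel estimate it returns an $M \times K$ matrix.

```python
import numpy as np


def regularized_precoder(H_hat, num_users, noise_var, total_power):
    """Evaluate the regularized inverse in (41) for one UE and subcarrier.

    H_hat: (K, M) estimated downlink channel of the UE.
    Returns an (M, K) matrix scaled to squared Frobenius norm total_power
    (this normalization rule is an assumption of the sketch).
    """
    M = H_hat.shape[1]
    mu = num_users * noise_var / total_power           # regularization parameter
    gram = H_hat.conj().T @ H_hat + mu * np.eye(M)     # (M, M), positive definite
    F = np.linalg.solve(gram, H_hat.conj().T)          # gram^{-1} H_hat^H
    eta = np.sqrt(total_power) / np.linalg.norm(F)     # Frobenius normalization
    return eta * F
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; when $K \ll M$, the equivalent form $\hat{\mathbf{H}}^{H}(\hat{\mathbf{H}}\hat{\mathbf{H}}^{H} + \mu \mathbf{I}_K)^{-1}$ only requires a $K \times K$ system.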
The normalization factor $\eta_u[\tau,\nu]$ is selected such that the total power constraint for UE $u$, $\sum_{k=1}^{K} \sum_{i=1}^{N_s} P_{k,i,\nu} = P_t$ per subcarrier, is satisfied. UE $u$ selects its first-stage combining matrix using the SVD of the estimated precoded channel

$$\hat{\mathbf{B}}_u[1,\nu] = \mathbf{U}_{B,u}[1,\nu]\, \boldsymbol{\Lambda}_{B,u}[1,\nu]\, \mathbf{V}^{H}_{B,u}[1,\nu]. \tag{42}$$

The first-stage combining matrix is then selected as

$$\mathbf{Q}_u[\nu] = \mathbf{U}_{B,u(:,1:N_c)}[1,\nu], \tag{43}$$

and the second-stage combining can be initialized as

$$\mathbf{W}_u[1,\nu] = \begin{bmatrix} \mathbf{I}_{N_s} \\ \mathbf{0}_{(N_c - N_s) \times N_s} \end{bmatrix}. \tag{44}$$

For $1 < \tau \leq t$, the first-stage combining matrix $\mathbf{Q}_u[\nu]$ remains fixed. Let $\hat{\mathbf{G}}_u[\tau,\nu] \in \mathbb{C}^{N_c \times M}$ denote the estimated effective channel of UE $u$. Based on this effective channel, the BS computes the user-specific precoding matrix as

$$\mathbf{F}_u[\tau,\nu] = \eta_u[\tau,\nu] \Big( \hat{\mathbf{G}}^{H}_u[\tau,\nu]\hat{\mathbf{G}}_u[\tau,\nu] + \mu \mathbf{I}_M \Big)^{-1} \hat{\mathbf{G}}^{H}_u[\tau,\nu]. \tag{45}$$

At the UE side, the second-stage combining is updated from the estimated precoded channel $\hat{\mathbf{D}}_u[\tau,\nu]$, whose SVD is written as

$$\hat{\mathbf{D}}_u[\tau,\nu] = \mathbf{U}_{D,u}[\tau,\nu]\, \boldsymbol{\Lambda}_{D,u}[\tau,\nu]\, \mathbf{V}^{H}_{D,u}[\tau,\nu], \tag{46}$$

and the second-stage combining matrix is selected as

$$\mathbf{W}_u[\tau,\nu] = \mathbf{U}_{D,u(:,1:N_s)}[\tau,\nu]. \tag{47}$$

C. Achievable SE with Imperfect CSI for MU-MIMO

We next provide an achievable downlink SE expression for the MU-MIMO system when the UEs rely on imperfect CSI obtained from pilot-based downlink estimation. We adopt the UatF bounding technique as described in Section V. Consider subcarrier $\nu$ in coherence block $\tau$. UE $u$ applies the two-stage combining matrix $\mathbf{U}_u[\tau,\nu] = \mathbf{Q}_u[\nu]\mathbf{W}_u[\tau,\nu]$ and forms the combined signal

$$\breve{\mathbf{y}}_u[\tau,\nu] = \mathbf{W}^{H}_u[\tau,\nu]\mathbf{Q}^{H}_u[\nu]\mathbf{y}_u[\tau,\nu]. \tag{48}$$

By inserting the MU-MIMO received signal model, we obtain

$$\breve{\mathbf{y}}_u[\tau,\nu] = \underbrace{\mathbf{E}_u[\tau,\nu]\mathbf{s}_u[\tau,\nu]}_{\text{desired signal}} + \underbrace{\sum_{i \neq u} \mathbf{E}_i[\tau,\nu]\mathbf{s}_i[\tau,\nu]}_{\text{multiuser interference}} + \underbrace{\mathbf{W}^{H}_u[\tau,\nu]\mathbf{Q}^{H}_u[\nu]\mathbf{n}_u[\tau,\nu]}_{\text{noise}}, \tag{49}$$

where the effective channel is defined as

$$\mathbf{E}_u[\tau,\nu] \triangleq \mathbf{W}^{H}_u[\tau,\nu]\mathbf{Q}^{H}_u[\nu]\mathbf{H}_u[\tau,\nu]\mathbf{F}_u[\tau,\nu] \in \mathbb{C}^{N_s \times N_s}. \tag{50}$$

Define $\bar{\mathbf{E}}_u[\nu] \triangleq \mathbb{E}\{\mathbf{E}_u[\tau,\nu]\}$, where the expectation is taken with respect to the small-scale fading. The corresponding effective noise term is

$$\tilde{\mathbf{n}}_u[\tau,\nu] \triangleq \big(\mathbf{E}_u[\tau,\nu] - \bar{\mathbf{E}}_u[\nu]\big)\mathbf{s}_u[\tau,\nu] + \sum_{i \neq u} \mathbf{E}_i[\tau,\nu]\mathbf{s}_i[\tau,\nu] + \mathbf{W}^{H}_u[\tau,\nu]\mathbf{Q}^{H}_u[\nu]\mathbf{n}_u[\tau,\nu]. \tag{51}$$

The covariance matrix of this effective noise is

$$\mathbf{C}_{\mathrm{noise}_u}[\nu] \triangleq \mathbb{E}\big[ \tilde{\mathbf{n}}_u[\tau,\nu]\, \tilde{\mathbf{n}}^{H}_u[\tau,\nu] \big]. \tag{52}$$

Using (49)-(52), an achievable SE lower bound for UE $u$ on subcarrier $\nu$ is given by

$$R^{\mathrm{UatF}}_u[\nu] = \log_2 \det\Big( \mathbf{I}_{N_s} + \bar{\mathbf{E}}^{H}_u[\nu]\, \mathbf{C}^{-1}_{\mathrm{noise}_u}[\nu]\, \bar{\mathbf{E}}_u[\nu] \Big). \tag{53}$$

Finally, the average downlink SE is

$$\mathrm{SE}_{\text{MU-MIMO}} = \frac{1}{S} \sum_{\nu=0}^{S-1} \rho \sum_{u=1}^{U} R^{\mathrm{UatF}}_u[\nu], \tag{54}$$

where $\rho = 1 - \frac{t_p + N_s}{t_c}$. The bound in (53) is valid for arbitrary linear precoding and combining matrices.

VII. NUMERICAL RESULTS

In this section, we use Monte Carlo simulations to evaluate the SE of the proposed digital precoding and combining architecture. The aim is to demonstrate its performance advantages over conventional HBF schemes and alternative algorithms in both SU-MIMO and MU-MIMO scenarios, as well as to highlight the gains achieved by the proposed time-domain channel estimation compared to the frequency-domain approach.

A. Simulation Setup

We simulate a mobile scenario where the UE travels along a linear trajectory. The BS is positioned at the 2D coordinates (2,5) meters and is equipped with a half-wavelength-spaced ULA comprising $M = 64$ antennas. The UE is equipped with a ULA containing $K = 16$ antennas.
The antenna arrays of both the BS and the UE are aligned parallel to the Y-axis. The UE begins its motion from the coordinates (20,10) meters and moves at a pedestrian speed of $v = 5$ m/s along a vertical line (see footnote 3). The UE's location changes over time due to the movement, which creates realistic changes in the AoA and AoD and triggers the need to modify the precoding and combining matrices. There are $N_{\mathrm{cl}} = 3$ clusters randomly located between the BS and UE, which contribute to multipath propagation. The results are presented for $S = 512$ subcarriers. For path loss, we employ the 3GPP model outlined in [25, Table 7.4.1-1], tailored for an urban microcell (UMi) environment and disregarding shadow fading. The carrier frequency is $f_c = 28$ GHz. We assume $N_s = 3$, $N_c = 4$, $t_p = K$, and $L = 6$. As $512/6$ is not an integer, we first pick 516 subcarriers for time-domain channel estimation and then remove the last 4 subcarriers. In the following, we present the achievable SE averaged over various random cluster locations and fading realizations.

Footnote 3: The results are also applicable to higher mobility scenarios. Increased UE velocities reduce both the beam and channel coherence time, which can be accommodated by correspondingly reducing the coherence block size.

Fig. 4: The NMSE vs. SNR for frequency- and time-domain estimation of $\hat{\mathbf{H}}^{T}[1,\nu]$.

B. Frequency-Domain and Time-Domain Estimation

Fig. 4 shows the normalized mean squared error (NMSE) of the channel estimate $\hat{\mathbf{H}}^{T}[1,\nu]$ as a function of the SNR for both frequency-domain and time-domain estimation. The SNR is defined based on the total transmit power normalized by the noise power. To ensure a fair comparison, the total pilot energy is kept constant across all methods. Hence, when fewer pilot subcarriers are used, the energy per pilot is increased proportionally. The NMSE is defined as

$$\mathrm{NMSE} = \frac{\mathbb{E}\{\|\hat{\mathbf{H}} - \mathbf{H}\|_F^2\}}{\mathbb{E}\{\|\mathbf{H}\|_F^2\}}$$

and quantifies the relative estimation error.
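In a Monte Carlo simulation, the two expectations in the NMSE definition are typically replaced by sample averages over channel realizations. The following sketch shows one way to do this; the function names `nmse` and `nmse_db` and the array layout (realizations stacked along the first axis) are assumptions for illustration.

```python
import numpy as np


def nmse(H_hat, H):
    """Sample-average NMSE over realizations stacked along the first axis.

    H_hat, H: arrays of identical shape (num_realizations, M, S).
    The common 1/num_realizations factor cancels in the ratio.
    """
    err = np.sum(np.abs(H_hat - H) ** 2)   # sum of squared Frobenius errors
    ref = np.sum(np.abs(H) ** 2)           # sum of squared Frobenius norms
    return err / ref


def nmse_db(H_hat, H):
    """NMSE on the dB scale, as usually plotted against the SNR."""
    return 10.0 * np.log10(nmse(H_hat, H))
```

For example, an estimate that is off by a constant 10% in amplitude on every entry gives an NMSE of 0.01, that is, -20 dB.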
Both methods exhibit an approximately linear decay in NMSE (in dB scale) with increasing SNR, which indicates that the estimation error is dominated by additive noise. However, the TD-based estimator consistently outperforms the FD approach over the entire SNR range, with a gain of roughly 6 to 8 dB. This performance advantage originates from the structural exploitation of the channel in the TD method. By leveraging the finite delay spread, the estimation problem is reduced from $S$ independent subcarrier coefficients to $L \ll S$ channel taps, which improves the conditioning of the estimation problem and enables implicit noise averaging across subcarriers. In contrast, FD estimation treats each subcarrier independently and does not exploit this structure, leading to higher estimation error. These results confirm that TD-based estimation provides more efficient reconstruction of $\mathbf{H}^{T}[1,\nu]$.

C. SU-MIMO System

Fig. 5: The average SE of SU-MIMO at different time instances when moving along a linear trajectory with speed $\upsilon = 5$ m/s.

Fig. 5 shows the achievable SE over time for SU-MIMO. The upper bound is given by ideal DBF, where perfect CSI is assumed and the first-stage combining matrix $\mathbf{Q}[\nu]$ is updated at every channel realization. With imperfect CSI, the proposed DBF with continuous updates closely follows this bound, demonstrating that frequent updates of $\mathbf{Q}[\nu]$ enable effective tracking of channel variations. When $\mathbf{Q}[\nu]$ is kept fixed over the beam coherence time, which is $T_B = 102$ ms in this simulation setup, a moderate performance loss is observed due to channel aging. The degradation remains limited under the adopted beam coherence definition (3 dB half-power beamwidth loss following [11]), noting that the exact impact depends on how this concept is defined.
We notice that the performance reduces when the channel changes, but the degradation from having a fixed first-stage combining is small, at most 14%. This demonstrates that the hardware setup in Fig. 3 can maintain high performance while reducing the computational complexity. In contrast, fixing both $\mathbf{Q}[\nu]$ and $\mathbf{W}[\tau,\nu]$ leads to substantial performance loss, highlighting the importance of adaptive digital processing under mobility.

The figure also compares time-domain (TD) and frequency-domain (FD) channel estimation. The TD-based estimation consistently provides higher SE, since it exploits the finite delay spread to save pilot resources and transmit power by a factor of $L/S$. This advantage is particularly visible over time, where TD estimation better preserves performance under channel variations, while FD estimation suffers from increased estimation errors and faster degradation. For comparison, we include SE results using the well-known HBF LSAA approach [26], where the first-stage combining is implemented using analog components and has a fixed configuration within the beam coherence time. Our proposed method demonstrates a clear performance advantage over LSAA. The bottommost curve depicts a scenario where both combining matrices, $\mathbf{Q}[\nu]$ and $\mathbf{W}[\tau,\nu]$, remain fixed throughout the simulation. In this case, the mobility significantly impacts the performance, showing substantial degradation over time.

D. MU-MIMO System

Fig. 6: The 2D locations of the BS, UEs, their movement direction, and one example for scattering clusters.

Fig. 7: The average SE over time in a MU-MIMO system with three UEs, each moving at a speed of $\upsilon = 5$ m/s. The results correspond to UE 3, as shown in Fig. 6.

Fig. 6 illustrates one considered propagation scenario for the MU-MIMO setup, where multiple UEs are simultaneously served. This extends the previously considered single-user scenario. The figure shows the positions of the BS, one example of the scattering clusters' locations, and three mobile UEs. The BS is located near the origin, while multiple scattering clusters are distributed in the environment to generate a spatially correlated mmWave channel. Each UE follows a predefined trajectory: UE 1 moves diagonally away from the BS, UE 2 moves horizontally, and UE 3 moves vertically starting from the position (10, 15) m. These mobility patterns induce time variations in both the large-scale geometry (angles and path gains) and the small-scale fading, which directly impact the channel estimation accuracy and beamforming performance.

We focus on UE 3 and show the corresponding SE evolution in Fig. 7. While the absolute SE is lower than in the single-user case due to inter-user interference and spatial multiplexing, the relative performance trends remain unchanged, confirming that the proposed framework extends effectively to the multi-user setting. The ideal DBF provides an upper bound. With imperfect CSI, the proposed two-stage DBF with continuous updates closely approaches this bound, indicating efficient tracking of channel variations. The practical implementation incurs only a minor loss due to estimation errors but still clearly outperforms HBF schemes such as LSAA. The performance gap increases over time since hybrid architectures are more constrained in adapting to the instantaneous channel. Although the combining can be updated, the analog stage is frequency-flat and cannot capture the frequency selectivity of the channel, resulting in a suboptimal representation. As the channel evolves, this mismatch becomes more pronounced, leading to faster performance degradation than fully digital beamforming. The periodic drops in SE reflect channel aging between updates and are more evident for schemes with less accurate channel tracking.
These results show the robust performance under mobility, also in multi-user scenarios.

E. Digital Beamforming versus Hybrid Beamforming

Fig. 8: The average SE vs. the SNR of the proposed DBF method with TD and FD estimation and different HBF schemes.

Fig. 8 shows the average SE versus the SNR for different combining schemes, evaluated at UE position (20,15) m (i.e., after 3 seconds). The proposed DBF with both TD and FD channel estimation is compared with HBF methods, including LSAA [26], PE-AltMin [27], and SS-SVD [28]. The results show that the proposed DBF consistently outperforms all HBF schemes across the entire SNR range. This gain originates from the higher flexibility of fully digital combining, which enables more accurate adaptation to the instantaneous wideband channel conditions. Moreover, the TD-based channel estimation provides a systematic performance improvement over FD estimation, particularly at low and moderate SNRs, since it exploits the limited delay spread to reduce estimation noise and pilot overhead.

VIII. CONCLUSIONS

HBF architectures suffer from practical limitations, including hardware inefficiencies and limited adaptability under mobility. To address these issues, we proposed a fully digital beamforming architecture with two-stage combining. The first stage performs dimension reduction prior to baseband processing, retaining the flexibility of digital processing while mitigating its implementation complexity. To enable efficient operation, we developed a pilot-based channel estimation framework and introduced a time-domain method that exploits the finite delay spread of OFDM channels. This approach improves estimation accuracy compared to conventional frequency-domain techniques. We further designed corresponding precoding and combining schemes and analyzed the achievable SE under imperfect CSI. The results show that updating the first-stage combining at the beam coherence timescale is sufficient to achieve near-optimal performance, even under mobility. Moreover, the proposed framework consistently outperforms HBF schemes and extends effectively to multi-user scenarios.

REFERENCES

[1] Y. Khorsandmanesh, E. Björnson, J. Jaldén, and B. Lindoff, "Channel-coherence-adaptive two-stage fully digital combining for mmWave MIMO systems," arXiv preprint, 2025.
[2] S. Rangan, T. S. Rappaport, and E. Erkip, "Millimeter-wave cellular wireless networks: Potentials and challenges," Proceedings of the IEEE, vol. 102, no. 3, pp. 366–385, 2014.
[3] A. N. Uwaechia and N. M. Mahyuddin, "A comprehensive survey on millimeter wave communications for fifth-generation wireless networks: Feasibility and challenges," IEEE Access, vol. 8, pp. 62367–62414, 2020.
[4] S. Dutta, C. N. Barati, D. Ramirez, A. Dhananjay, J. F. Buckwalter, and S. Rangan, "A case for digital beamforming at mmWave," IEEE Transactions on Wireless Communications, vol. 19, no. 2, pp. 756–770, 2019.
[5] T. E. Bogale and L. B. Le, "Beamforming for multiuser massive MIMO systems: Digital versus hybrid analog-digital," in 2014 IEEE Global Communications Conference. IEEE, 2014, pp. 4066–4071.
[6] B. Yang, Z. Yu, J. Lan, R. Zhang, J. Zhou, and W. Hong, "Digital beamforming-based massive MIMO transceiver for 5G millimeter-wave communications," IEEE Transactions on Microwave Theory and Techniques, vol. 66, no. 7, pp. 3403–3418, 2018.
[7] E. Björnson, J. Hoydis, and L. Sanguinetti, "Massive MIMO networks: Spectral, energy, and hardware efficiency," Foundations and Trends in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.
[8] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, "Channel estimation and hybrid precoding for millimeter wave cellular systems," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 831–846, 2014.
[9] K. Hassan, M. Masarra, M. Zwingelstein, and I. Dayoub, "Channel estimation techniques for millimeter-wave communication systems: Achievements and challenges," IEEE Open Journal of the Communications Society, vol. 1, pp. 1336–1363, 2020.
[10] Y. Li, "Simplified channel estimation for OFDM systems with multiple transmit antennas," IEEE Transactions on Wireless Communications, vol. 1, no. 1, pp. 67–75, 2002.
[11] Y. Khorsandmanesh, E. Björnson, J. Jaldén, and B. Lindoff, "Beam coherence time analysis for mobile wideband mmWave point-to-point MIMO channels," IEEE Wireless Communications Letters, vol. 13, no. 6, pp. 1546–1550, 2024.
[12] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, "An overview of signal processing techniques for millimeter wave MIMO systems," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, 2016.
[13] B. Lindoff, C. D'Andrea, S. Buzzi, M. Tormanen, and P.-O. Brandt, "The ultimate weapon for ultra-broadband 6G: Digital beamforming and doubly massive mmWave MIMO," arXiv preprint, 2021.
[14] J. Brady, N. Behdad, and A. M. Sayeed, "Beamspace MIMO for millimeter-wave communications: System architecture, modeling, analysis, and measurements," IEEE Transactions on Antennas and Propagation, vol. 61, no. 7, pp. 3814–3827, 2013.
[15] E. Björnson and Ö. T. Demir, Introduction to Multiple Antenna Communications and Reconfigurable Surfaces. Now Publishers, Inc., 2024.
[16] A. BeammWave. (September 2020) Digital beamforming for mobile devices: The power efficient architecture for 5G on mmWave frequencies. [Online]. Available: https://beammwave.com/download/69/?uid=30fff0131e
[17] K. Roth, H. Pirzadeh, A. L. Swindlehurst, and J. A. Nossek, "A comparison of hybrid beamforming and digital beamforming with low-resolution ADCs for multiple users and imperfect CSI," IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 3, pp. 484–498, 2018.
[18] K. V enugopal, A. Alkhateeb, R. W . Heath, and N. G. Prelcic, “Time- domain channel estimation for wideband millimeter wav e systems with hybrid architecture, ” in 2017 IEEE international confer ence on acoustics, speech and signal processing (ICASSP) . IEEE, 2017, pp. 6493–6497. [19] H. Kim, G.-T . Gil, and Y . H. Lee, “T wo-step approach to time-domain channel estimation for wideband millimeter wave systems with hybrid 13 architecture, ” IEEE T ransactions on Communications , vol. 67, no. 7, pp. 5139–5152, 2019. [20] T . S. Rappaport, R. W . Heath Jr, R. C. Daniels, and J. N. Murdock, Millimeter wave wir eless communications . Pearson Education, 2015. [21] BeammW ave AB, “Beammwav e launches an advanced dev elopment platform, ” https://beammwav e.com/press- release/ beammwav e- launches- an- advanced- de velopment- platform/, accessed: 2026-03-24. [22] S. M. Kay , Fundamentals of statistical signal processing: Estimation theory . Prentice Hall, 1993. [23] D. Tse and P . V iswanath, Fundamentals of wireless communication . Cambridge university press, 2005. [24] H. M. Elmagzoub, “On the mmse-based multiuser millimeter wav e MIMO hybrid precoding design, ” International Journal of Communi- cation Systems , vol. 33, no. 11, p. e4409, 2020. [25] 3GPP , “5G; study on channel model for frequencies from 0.5 to 100 GHz ( TR 38.901 version 18.0.0 release 18), ” 2024. [26] F . Sohrabi and W . Y u, “Hybrid analog and digital beamforming for mmW ave OFDM large-scale antenna arrays, ” IEEE Journal on Selected Ar eas in Communications , vol. 35, no. 7, pp. 1432–1443, 2017. [27] X. Y u, J.-C. Shen, J. Zhang, and K. B. Letaief, “ Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems, ” IEEE Journal of Selected T opics in Signal Pr ocessing , vol. 10, no. 3, pp. 485–500, 2016. [28] T .-H. Tsai, M.-C. Chiu, and C.-c. 
Chao, “Sub-system SVD hybrid beamforming design for millimeter wave multi-carrier systems, ” IEEE T ransactions on W ireless Communications , vol. 18, no. 1, pp. 518–531, 2018.
