Real-Time Redundancy for the 1.3 GHz Master Oscillator of the European-XFEL

Authors: Bartosz Gąsowski, Tomasz Owczarek, Krzysztof Czuba

Bartosz Gąsowski, Tomasz Owczarek, Krzysztof Czuba and Łukasz Zembala
Institute of Electronic Systems, Warsaw University of Technology, Warsaw, Poland
Email: b.gasowski@elka.pw.edu.pl

Holger Schlarb
Deutsches Elektronen-Synchrotron, Hamburg, Germany

Abstract: Many modern large-scale facilities, like the European X-ray Free Electron Laser (E-XFEL), require precise synchronisation, often down to the femtosecond level. Even a very short interruption or an excessive glitch in the reference signal might break the precise time relations between subsystems. In such an event, a time-consuming resynchronisation process is required, which renders the facility unavailable to users until it is completed. Therefore, such events are highly undesirable. In this paper, we present an autonomous redundancy solution for the European-XFEL's master oscillator that will guarantee continuous delivery of the high-quality reference signal even in the case of most potential failures. The concept and implementation are presented, as well as results from testing in the laboratory environment.

I. INTRODUCTION

Many modern large-scale facilities, for example X-ray Free Electron Lasers, require synchronisation with minute accuracy, often reaching down to the femtosecond level. An example of such a facility is the European X-ray Free Electron Laser (European-XFEL) [1], a complex over 3.4 km long, located at the Deutsches Elektronen-Synchrotron site (Hamburg, Germany). At such a scale of complexity, achieving the required accuracy results in a time-consuming system set-up and hence a requirement for continuous operation.

A. Synchronisation overview

Synchronisation at such levels of accuracy is usually realised by delivering and using a reference signal whose phase is well-defined and stable.
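To give a feel for what femtosecond-level synchronisation means for a 1.3 GHz reference, the short sketch below converts between timing error and phase (Δφ = 2π·f₀·Δt) and computes rms jitter from a single-sideband phase-noise profile L(f), using the common approximation σ_φ² = 2∫L(f)df. The function names are illustrative, and the phase-noise profile in the usage example is a hypothetical flat −150 dBc/Hz floor chosen only for illustration; it is not a measurement of the European-XFEL master oscillator.

```python
import math

F0_HZ = 1.3e9  # nominal reference frequency of the master oscillator


def time_to_phase_deg(dt_s: float, f0_hz: float = F0_HZ) -> float:
    """Phase shift (degrees) corresponding to a timing shift dt_s at carrier f0_hz."""
    return 360.0 * f0_hz * dt_s


def phase_deg_to_time(phi_deg: float, f0_hz: float = F0_HZ) -> float:
    """Timing shift (seconds) corresponding to a phase shift phi_deg at carrier f0_hz."""
    return phi_deg / (360.0 * f0_hz)


def rms_jitter_s(offsets_hz, ssb_dbc_hz, f0_hz: float = F0_HZ) -> float:
    """RMS jitter from an SSB phase-noise profile L(f) given at increasing offsets.

    Uses sigma_phi^2 = 2 * integral of L(f) df, with L converted from dBc/Hz
    to linear units and integrated by the trapezoidal rule on a linear axis.
    """
    lin = [10.0 ** (l_db / 10.0) for l_db in ssb_dbc_hz]
    area = 0.0
    for i in range(1, len(offsets_hz)):
        area += 0.5 * (lin[i] + lin[i - 1]) * (offsets_hz[i] - offsets_hz[i - 1])
    sigma_phi_rad = math.sqrt(2.0 * area)
    return sigma_phi_rad / (2.0 * math.pi * f0_hz)


if __name__ == "__main__":
    # 20 fs of timing error at 1.3 GHz is only about a hundredth of a degree.
    print(f"20 fs  -> {time_to_phase_deg(20e-15):.4f} deg")
    # Conversely, a 2 degree phase step corresponds to a few picoseconds.
    print(f"2 deg  -> {phase_deg_to_time(2.0) * 1e12:.2f} ps")
    # Hypothetical flat -150 dBc/Hz floor from 10 Hz to 10 MHz offset.
    j = rms_jitter_s([10.0, 1e7], [-150.0, -150.0])
    print(f"jitter -> {j * 1e15:.1f} fs")
```

The trapezoidal rule on a linear frequency axis is exact for the flat example; for real profiles that slope steeply on a logarithmic grid, a piecewise power-law integration would be more accurate.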
For the purpose of a simplified analysis, synchronisation issues can be split into short- and long-term stability of the reference signal's phase. Short-term stability is considered in terms of phase noise, often also expressed as integrated jitter. Most of the challenges in the construction of the master oscillator system come from the phase noise requirements. Long-term phase stability is mainly a result of drifts of the electrical and/or optical lengths in the reference's distribution network (including cables as well as components). The main task of the synchronisation system is to compensate for those drifts in order to maintain the phase relations between synchronised nodes within the required accuracy.

Long-term frequency stability is another matter, as it mostly concerns the primary frequency reference inside the master oscillator. With proper operation of the synchronisation system, the stability of this reference dominates the overall frequency stability.

The reference signal is very often used for synthesis of locally derived signals in LLRF and other systems. Examples of such signals are local oscillator (LO) and ADC clock signals, whose frequency and phase relations to the reference are well defined. Synthesis of these signals usually employs circuits like phase-locked loops and frequency dividers, which require a continuous reference signal for proper operation.

B. Motivation

Stringent performance requirements imposed on the synchronisation system, including the master oscillator, in turn impose stringent requirements on the components of these systems. This very often narrows the available choice of solutions and parts to a small range. Consequently, performance, being a requirement that is clearly expressed in quantitative terms and has to be met, is prioritised over reliability. In fact, some failures were already observed during development and test runs of the European-XFEL's master oscillator, as well as in similar systems.
Failures can be permanent (complete failure of some component) or intermittent (e.g. overheating caused by a cooling system failure, cold joints, or EMI issues). Therefore, they can result in a variety of reference signal disturbances, ranging from complete loss of signal, to temporary power dips, to phase jumps. When such disturbances occur in the input signal of a phase-locked loop or a frequency divider, events like cycle-slips can occur. Subsequently, a well-defined time relation between the input and output signals is no longer guaranteed. Frequency dividers used in such cases usually have a reset input which allows them to be synchronised again. However, in such a large and complex system a race condition will probably occur between the reset and reference signals, and thus this solution will not guarantee deterministic behaviour.

What is more, the European-XFEL employs a complex mixed RF and optical system for distribution of the reference signal [2] [3]. The effects of reference signal disturbances on these systems have not yet been analysed in detail. However, it cannot be excluded that after a disturbance the distribution system might require a warm-up-like period in order to return to the desired long-term stability, introducing additional delays.

Figure 1. Simplified block diagram of the European-XFEL's master oscillator system.

Thus, any interruptions or significant disturbances in the reference signal might break the well-defined time relations between subsystems and result in a time-consuming process of resynchronisation and set-up. At such a scale of complexity, recovery to femtosecond stability and set-up of the LLRF system can easily take hours, if not days. During this time the facility is unavailable to users, which is highly undesirable.

In this paper we present an autonomous redundancy solution which aims to mitigate reliability issues in the source of the reference signal, the master oscillator.

II. CONCEPT

A. Overview

Prior solutions for minimising downtime of the master oscillator are limited to maintaining a hot spare, that is, a continuously running spare copy of the system. In case a failure occurs in the main system, the facility is reconnected to the spare one. However, as the failure has already occurred and the reference signal has been disturbed or lost, the issues outlined in the previous section still hold true. Our solution for the European-XFEL aims to mitigate these issues.

An overview of the European-XFEL's master oscillator system is presented in Fig. 1, with emphasis on redundancy. The system consists of three identical and independent reference signal sources (generation channels, GC) and a redundancy subsystem. Each GC produces a very high quality 1.3 GHz reference signal (< 20 fs rms jitter, < 10⁻¹² frequency stability, +41 dBm power) and is fully independent of the other two. More detailed information on the generation channels can be found in [4].

The main objective of the system is to maintain a continuous and proper reference signal even in case of a failure in one of the GCs, and to minimise or smooth out any disturbances that might be caused by such events. The concept is based on energy storage in a high quality-factor dielectric resonator filter, low-latency detection of phase and/or amplitude disturbances, and fast switching between generation channels. The filter delays failure propagation to the output, leaving a brief period of time for reaction.

Nominally, all three channels are equal and fully operational at all times, and provide their output signals to the redundancy subsystem. However, only the signal from one of the channels is passed further at a time; this channel is regarded as the current master, and the other two are treated as spares. If a potentially critical failure is detected in the current master channel, a switchover occurs and one of the former spares becomes the new master.
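The master/spare arrangement can be summarised as a small selection rule. The sketch below is illustrative only: the class and method names are hypothetical, and the rule for choosing which spare becomes the new master (here, the next healthy channel in round-robin order) is an assumption, since the text does not specify how the new master is chosen.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChannelSelector:
    """Hypothetical model of master/spare selection among generation channels (GCs)."""

    n_channels: int = 3
    master: int = 0
    healthy: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        # All channels start as fully operational.
        if not self.healthy:
            self.healthy = [True] * self.n_channels

    def spares(self) -> List[int]:
        """Channels that are not currently passed to the output."""
        return [ch for ch in range(self.n_channels) if ch != self.master]

    def report_failure(self, channel: int) -> Optional[int]:
        """Mark a channel as failed and switch over if it was the master.

        Returns the new master index after a switchover, or None if no
        switchover was needed or no healthy spare is available.
        """
        self.healthy[channel] = False
        if channel != self.master:
            return None  # a failed spare is simply excluded from later selection
        # Assumed policy: try the remaining channels in round-robin order.
        for step in range(1, self.n_channels):
            candidate = (self.master + step) % self.n_channels
            if self.healthy[candidate]:
                self.master = candidate
                return candidate
        return None  # no healthy spare left; the current selection is kept


if __name__ == "__main__":
    sel = ChannelSelector()
    print(sel.report_failure(2))  # a spare fails: no switchover
    print(sel.report_failure(0))  # the master fails: a healthy spare takes over
    print(sel.master, sel.spares())
```

In the real system this decision is taken by asynchronous hardware logic rather than software; the model only captures which state transitions are allowed.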
A critical failure is understood as a failure that potentially renders the output signal unusable and could result in losing synchronism in the European-XFEL, as explained in the previous section. Due to the properties of the filter, the total available time budget for failure detection and reaction is estimated to be approximately 200–300 ns.

B. Failure Detection

The system aims to maintain reference signal continuity even in case of disturbances caused by failures. All disturbances will manifest themselves as irregularities of the phase, the amplitude, or both. In other words, the actual issue is discontinuities of the phase and amplitude. Thus, the concept of failure detection is built around monitoring the phase and amplitude of the reference signals. All three signals (from all generation channels) are continuously monitored for disturbances, and their validity is assessed on that basis. With this approach it is possible to reliably detect all important failures, where by an important failure we mean one that causes unacceptable disturbances in the reference signal.

An example of a most likely "unimportant" failure is a degradation of the phase noise performance. Of course, excessive phase noise is undesirable, but it neither causes cycle-slips¹ nor can it be measured reliably in a short time, especially at femtosecond levels. In such a case the switchover can be done manually by the operator at a later time.

¹ Of course, if the phase noise increases enough to cause cycle-slips, then it is an unacceptable phase disturbance and should be treated as such.

Figure 2. Concept of low-latency monitoring with programmable window comparators.

While monitoring of the amplitude is trivial, monitoring of the phase poses an additional challenge. The amplitude can easily be measured as an absolute value. The phase, however, can only be measured in a relative manner.
Therefore, in our system the phase is measured between the generation channels, in pairs. If we consider only a single pair of sources, an incorrect value of the measured phase would only indicate that a failure occurred, but it would not tell which channel actually failed. In the case of three sources, three phase measurements are available. While the phase values related to the failing source will change, the third value, measured between the two properly operating sources, will not. Therefore, it is possible to use an approach similar to the voting employed in digital circuits and infer which generation channel failed. It is assumed, of course, that it is very unlikely for two channels to fail at the same moment.

For lowest latency, the part of the system that is responsible for fast detection of failures processes signals from the phase and amplitude detectors primarily in the analogue domain. Use of a fast ADC and digital-domain processing was found to result in higher latency and increased design complexity. The solution employs programmable window comparators in order to continuously check whether the values are within the expected range, as shown in Fig. 2. Out-of-range signals are then passed to the digital circuits for further processing, as described in the previous paragraph. Slow DACs are used to set the thresholds of the comparators as desired. A slow ADC is also included in order to enable, among others, closed-loop control of the thresholds and analysis of long-term changes.

C. Maintaining Reference Continuity

When a failure is detected in the currently selected generation channel (the master channel), a quick switchover to another one is required in order to maintain continuity of the output signal. A three-way RF switch module is used to select one of the channels and to switch between them within 100 ns of a trigger signal.
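The failure decoding described in Section II-B (window comparators followed by voting over the three pairwise phase measurements) can be illustrated in software. The sketch below is a simplified model, not the CPLD logic itself: each window comparator is modelled as a boolean "out of range" flag, all three phase pairs are assumed to share one window centred on their aligned set-point, and a channel is declared failed when its own amplitude leaves its window, or when both phase pairs that involve it are out of window while the remaining pair stays inside. All names and threshold values are hypothetical.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]
PAIRS: Tuple[Pair, ...] = ((0, 1), (0, 2), (1, 2))


def out_of_window(value: float, low: float, high: float) -> bool:
    """Model of a window comparator: True when value leaves [low, high]."""
    return not (low <= value <= high)


def decode_failed_channel(
    amplitudes: Tuple[float, float, float],
    pair_phases_deg: Dict[Pair, float],
    amp_window: Tuple[float, float],
    phase_window_deg: Tuple[float, float],
) -> Optional[int]:
    """Return the index of the failed channel, or None if no single failure is seen."""
    # Amplitude is an absolute measurement, so a flag points at one channel directly.
    amp_flags = [out_of_window(a, *amp_window) for a in amplitudes]
    if sum(amp_flags) == 1:
        return amp_flags.index(True)

    # Phase is relative: a failure in channel k disturbs both pairs containing k,
    # while the pair formed by the two healthy channels stays inside its window.
    phase_flags = {p: out_of_window(pair_phases_deg[p], *phase_window_deg) for p in PAIRS}
    for k in range(3):
        involved = [p for p in PAIRS if k in p]
        others = [p for p in PAIRS if k not in p]
        if all(phase_flags[p] for p in involved) and not any(phase_flags[p] for p in others):
            return k
    # No failure, or an ambiguous pattern such as two channels failing at once.
    return None


if __name__ == "__main__":
    amp_win, ph_win = (0.8, 1.2), (-5.0, 5.0)  # hypothetical thresholds
    phases = {(0, 1): 0.5, (0, 2): 25.0, (1, 2): 24.5}  # channel 2 has jumped
    print(decode_failed_channel((1.0, 1.0, 1.0), phases, amp_win, ph_win))
```

The result of such a decoder is what would trigger the RF switch; a switchover is only needed when the decoded channel is the current master.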
This module is required to handle reference signals with power levels reaching +41 dBm while offering reasonable insertion loss and high isolation. The concept of this module was already developed and presented earlier in [5].

At the very output of the system, a high quality-factor filter is inserted as temporary RF energy storage. It sustains the output power during intermittent signal interruptions, caused either by a failure or by the switching itself. It also helps to smooth out any disturbances of the phase and the amplitude.

As switching between the channels should not induce excessive phase changes on its own, synchronisation (phase alignment) between the generation channels is required. Therefore, each GC has a phase tuning capability which is used to align their phases. The spare channels follow the current master's phase, while the phase shifter of the master channel is always frozen so as not to cause any excess jitter in the output signal. Vector modulators are proposed as the phase shifters, because their phase shifting range is not constrained, so they allow for continuous tracking of phase. The synchronisation loop shares the phase detectors and ADCs with the detection of phase failures.

III. HARDWARE IMPLEMENTATION

The redundancy subsystem was implemented as a set of hardware modules. An overview of each module is presented in the following subsections.

A. RF Switch Module

The RF switch module is a custom three-way RF switch circuit with integrated phase and amplitude detectors. It is able to handle +43 dBm signals and to switch between the channels within about 50 ns. Insertion loss has an acceptable value of about 2 dB, while isolation exceeds 80 dB. The module has already been developed and tested; the details were presented earlier in [5].

B. Filter

The filter that was developed for this system is a dual dielectric resonator filter, characterised by a loaded Q-factor of about 4000.
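A rough cross-check of how long such a filter can bridge an input interruption can be made by treating it as a single resonator with the loaded Q quoted above: after the input disappears, the stored energy, and hence the output power, decays as exp(−ω₀t/Q_L) with ω₀ = 2π·1.3 GHz. This single-pole model is an assumption; the actual dual-resonator filter has a more complex transient, so the numbers are only an order-of-magnitude estimate. The function names are hypothetical.

```python
import math


def power_drop_db(t_s: float, q_loaded: float, f0_hz: float) -> float:
    """Output power drop (dB, positive) t_s seconds after the input is removed.

    Single-resonator model: stored energy W(t) = W0 * exp(-omega0 * t / Q_L),
    and the output power is assumed proportional to the stored energy.
    """
    omega0 = 2.0 * math.pi * f0_hz
    return 10.0 * (omega0 * t_s / q_loaded) / math.log(10.0)


def hold_time_s(max_drop_db: float, q_loaded: float, f0_hz: float) -> float:
    """Time after input loss until the output power has dropped by max_drop_db."""
    omega0 = 2.0 * math.pi * f0_hz
    return max_drop_db * math.log(10.0) / 10.0 * q_loaded / omega0


if __name__ == "__main__":
    q_l, f0 = 4000.0, 1.3e9
    print(f"drop after 300 ns: {power_drop_db(300e-9, q_l, f0):.2f} dB")
    print(f"time to 3 dB drop: {hold_time_s(3.0, q_l, f0) * 1e9:.0f} ns")
```

With Q_L ≈ 4000 this gives a drop of roughly 2.7 dB after 300 ns, or about 340 ns to a 3 dB drop, which is consistent with a reaction budget of a few hundred nanoseconds.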
The Q-factor of the resonators themselves exceeds 10000. Insertion loss is below 1.5 dB, and the filter is capable of handling signals with power exceeding 50 dBm. A more detailed description of the filter's construction is available in [6]. Thanks to the relatively large Q-factor, this filter is able to sustain output power for a few hundred nanoseconds. For example, if the input power disappears, the output power will have dropped by about 3 dB after 300 ns. This gives a rough estimate of the time budget available for the failure reaction path.

C. Redundancy Controller

The redundancy controller is the central module of the system, controlling and supervising operation of the other modules. The manufactured device is shown in Fig. 3. The core components of this module are analogue processing circuits, a CPLD, and an FPGA. The analogue processing circuits are implemented according to the concept introduced in the previous section. Outputs of the comparators are passed to the CPLD. The CPLD contains asynchronous low-latency logic which is responsible for failure decoding, decision making and control of the RF switch module. The logic is implemented with triple modular redundancy (TMR) for increased reliability.

Figure 3. Photograph of the redundancy controller.

The FPGA is responsible for general control, communication and supervision. It handles channel synchronisation (phase alignment) through the connected phase shifting modules, control of the thresholds for the window comparators, and monitoring of the system's state, and it serves as the interface to the control system of the facility. The FPGA can be reprogrammed remotely without interrupting the module's operation; a separate CPLD circuit ensures that the critical failure reaction paths remain continuously operational. Also, thanks to a built-in mechanism, the FPGA will reprogram itself when it detects errors in its configuration memory, in particular those caused by a single event upset (SEU).
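The channel synchronisation handled by the FPGA (spares following the master, the master's phase shifter frozen) can be pictured as a simple integral loop. This is a hypothetical illustration, not the controller's firmware: it assumes that each spare's phase error relative to the master is available in degrees, that an ideal vector modulator setting (I, Q) = (A·cos θ, A·sin θ) rotates the channel phase by θ, and an arbitrarily chosen loop gain. Because a vector modulator has no end stops, the accumulated phase shift can simply be wrapped.

```python
import math
from typing import Dict, Tuple


def wrap_deg(angle: float) -> float:
    """Wrap an angle to the interval [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def vm_setting(theta_deg: float, amplitude: float = 1.0) -> Tuple[float, float]:
    """I/Q baseband values that rotate the carrier phase by theta_deg (ideal VM model)."""
    th = math.radians(theta_deg)
    return amplitude * math.cos(th), amplitude * math.sin(th)


def align_step(
    shifts_deg: Dict[int, float],
    errors_deg: Dict[int, float],
    master: int,
    gain: float = 0.2,
) -> Dict[int, float]:
    """One iteration of a hypothetical integral phase-alignment loop.

    errors_deg[ch] is the measured phase of channel ch minus that of the master.
    The master's shift is left unchanged (frozen); each spare is moved against
    its error and the accumulated shift is wrapped, since the VM range is unlimited.
    """
    new_shifts = dict(shifts_deg)
    for ch, err in errors_deg.items():
        if ch == master:
            continue
        new_shifts[ch] = wrap_deg(shifts_deg[ch] - gain * err)
    return new_shifts


if __name__ == "__main__":
    master = 0
    offsets = {0: 0.0, 1: 30.0, 2: -170.0}  # hypothetical native phase offsets
    shifts = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(60):
        errors = {
            ch: wrap_deg(offsets[ch] + shifts[ch] - offsets[master] - shifts[master])
            for ch in offsets
        }
        shifts = align_step(shifts, errors, master)
    print({ch: round(s, 3) for ch, s in shifts.items()})
    print(vm_setting(shifts[1]))
```

Each iteration shrinks a spare's error by the factor (1 − gain), so the spares converge to the master's phase while the master itself is never touched.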
Additionally, the controller module contains a circuit for auto-calibration of the phase detector set-points. Because of the large tolerance of the phase detector parameters (the detectors themselves are located in the RF switch module), set-point calibration is necessary to obtain good accuracy of channel synchronisation. Power supply sections and sources are redundant in order to ensure continuous operation.

D. Phase Shifter

DRTM-VM2LF vector modulator modules are used as the phase shifters. These modules were developed primarily for use in the European-XFEL's LLRF system based on the mTCA.4 platform [7]. In our system these modules operate in a standalone mode with custom firmware. They are also modified for better phase noise performance, i.e. the analogue bandwidth of the baseband is reduced to a very narrow value of about 10 Hz.

The vector modulator modules are the only part of the redundancy system that is contained within the generation channels. In each generation channel the modulator module is placed just before the last synthesizer. This is required because of the power levels: the last synthesizer incorporates a high-power amplifier, and the power level of its output signal reaches ca. 41 dBm. Additionally, this placement significantly reduces the influence of the modulator's residual phase noise. The DRTM-VM2LF modules are connected to the redundancy controller via optical fibre links.

Figure 4. Block diagram of the test setup.

IV. TESTS

A. Test Setup and Method

A block diagram of the test setup is presented in Fig. 4. The generation channels are emulated by a multichannel phase-coherent synthesizer, a set of vector modulator modules, and a set of microwave high-power amplifiers (only for high-power tests). The amplifiers are currently not available, and thus only tests with lower power levels were carried out with the full system. This is, however, enough for a functional verification and to prove the correctness of the concept.
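The amplitude and phase traces used to evaluate the tests are obtained by IQ demodulation of an oscilloscope record of the output signal. A minimal sketch of such processing is shown below, assuming a real-valued sample record, a 10 GS/s sampling rate, a 1.3 GHz carrier and non-overlapping 100-sample averaging windows (13 full carrier periods). Whether the original processing used non-overlapping or sliding windows is not stated, so that choice, like the function name, is an assumption.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def iq_demodulate(
    samples: Sequence[float],
    fs_hz: float = 10e9,
    f0_hz: float = 1.3e9,
    window: int = 100,
) -> List[Tuple[float, float]]:
    """Return (amplitude, phase_deg) for each non-overlapping window of samples.

    Each sample is mixed with exp(-j*2*pi*f0*n/fs) and the products are averaged.
    With 100 samples at 10 GS/s the window spans exactly 13 carrier periods, so
    the 2*f0 mixing product averages out. The factor 2 restores the amplitude of
    the real input cosine. Phase is referenced to sample index 0.
    """
    out = []
    for start in range(0, len(samples) - window + 1, window):
        acc = 0j
        for n in range(start, start + window):
            acc += samples[n] * cmath.exp(-2j * math.pi * f0_hz * n / fs_hz)
        iq = 2.0 * acc / window
        out.append((abs(iq), math.degrees(cmath.phase(iq))))
    return out


if __name__ == "__main__":
    fs, f0 = 10e9, 1.3e9
    # Synthetic test tone: amplitude 0.5, phase 30 degrees, 30 ns long.
    sig = [0.5 * math.cos(2 * math.pi * f0 * n / fs + math.radians(30.0)) for n in range(300)]
    for amp, ph in iq_demodulate(sig):
        print(f"{amp:.3f} {ph:.2f}")
```

Choosing the window as an integer number of carrier periods is what makes the simple boxcar average sufficient here; with a non-integer window, a proper low-pass filter would be needed to suppress the 2·f₀ component.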
The modules that are in the path of the high-power signal were tested separately earlier (the RF switch module up to 42 dBm, the filter up to 50 dBm). Therefore, no issues are expected during high-power tests of the full system.

Two methods are proposed for injecting artificial failures. One method is to inject them in the vector modulators. While this method offers better control over the signal, it also requires modified firmware. It might not be desirable, however, to use different firmware during the tests than during normal operation. In the second method, the failures are emulated within the multichannel synthesizer by changing its settings through a control interface. While this method offers coarser control of the signals, it does not require modified firmware and is simpler to implement. The results presented in the next subsection were obtained with the second method.

The output signal (from before and after the filter) is recorded on a digital oscilloscope with a bandwidth of 4 GHz and a sampling rate of 10 GS/s, and then processed on a PC. Processing comprises IQ demodulation with an averaging window of 100 samples (13 full periods of the 1.3 GHz signal), followed by conversion of the IQ values to amplitude (magnitude) and phase (angle).

B. Test Results

We present results that were obtained without a closed synchronisation loop (manual phase alignment) and with lower-power signals (no high-power amplifiers). These preliminary tests were meant to verify the correctness of the system's operation and to assess the latency of its reaction to failures. Two main types of failure were analysed: power loss and phase change.

Figure 5. Reaction to power loss: phase and amplitude of the signal before the filter.

Figure 6. Reaction to power loss: phase and amplitude of the signal after the filter (system output).

1) Power Loss: Fig. 5 shows the phase and amplitude of the signal before the filter in case of a power loss event.
It is clearly visible that the power starts to decrease at about 140 ns. About 90 ns later, a switchover to another channel has already been completed. Because of the synthesizer's behaviour, the phase was also changing; however, in this case the reaction was triggered first by the change of the amplitude.

Fig. 6 shows the phase and amplitude in a similar event, but after the filter (at the system output). Note the compressed time axis and the stretched phase and amplitude axes. Both the amplitude and the phase change smoothly. At the extremum the amplitude decreased by only about 1 dB; however, some amplitude imbalance between the channels is also visible. The phase change is mostly a result of the channel alignment error (ca. 2°).

Figure 7. Reaction to phase change: phase and amplitude of the signal before the filter.

Figure 8. Reaction to phase change: phase and amplitude of the signal after the filter (system output).

2) Phase Change: Fig. 7 shows the phase and amplitude of the signal before the filter in case of a phase change. The phase starts changing at about 50 ns. In this case the failure detection latency is higher, and the switchover to another channel is complete after about 150 ns. The increased latency is caused by two connected factors: the slow phase change and a wide window in the window comparator. However, the slower the change, the higher the latency that can be accepted. On the other hand, the wide window is a result of increased noise of the phase detectors (due to the lower power levels) and the lack of (calibrated) channel synchronisation. In nominal conditions the window will be narrower, and hence the latency should decrease. The imbalance of the phase and the amplitude is similar to the previous case.

As previously, Fig. 8 shows the phase and amplitude after the filter (at the system output). In this case the phase also changes smoothly. The power barely changes, except as a result of the amplitude imbalance.

V. CONCLUSIONS AND FURTHER PLANS

All hardware modules of the redundancy system for the European-XFEL's master oscillator are ready and tested. The test setup is partially completed, and tests of the whole system are currently in progress. Preliminary tests have shown that the redundancy concept is valid and that it is possible to react to a failure quickly enough, i.e. in less than 200 ns. The high-Q filter properly smooths out disturbances, ensuring continuity of the output signal.

Further firmware development is required to include missing functions, e.g. the channel synchronisation and the auto-calibration. After completion of both the firmware and the test stand, the system will be tested thoroughly and then commissioned in the facility.

ACKNOWLEDGMENT

Research supported by the Polish Ministry of Science and Higher Education, funds for international co-financed projects for the year 2017.

REFERENCES

[1] European XFEL, https://www.xfel.eu
[2] K. Czuba et al., "Overview of the RF Synchronization System for the European XFEL", in Proc. IPAC2013, Shanghai, China, May 2013, paper WEPME035, pp. 3001–3003.
[3] C. Sydlo et al., "Femtosecond Optical Synchronization System for the European XFEL", in Proc. IPAC2017, Copenhagen, Denmark, May 2017, paper THPAB108, pp. 3969–3971.
[4] L. Zembala et al., "Master Oscillator for the European XFEL", in Proc. IPAC2014, Dresden, Germany, Jun. 2014, paper WEPRI116, pp. 2771–2773.
[5] B. Gąsowski, K. Czuba, H. Schlarb, and L.Z. Zembala, "Channel Selection Switch for the Redundant 1.3 GHz Master Oscillator of the European XFEL", in Proc. ICALEPCS2017, Barcelona, Spain, Oct. 2017, paper THPHA090, pp. 1590–1594.
[6] A. Abramowicz, "DR Narrowband Filters for XFEL Accelerator", in Proc. 2018 22nd International Conference on Microwave, Radar and Wireless Communications (MIKON), Poznan, Poland, May 2018.
[7] I. Rutkowski et al., "Improved Vector Modulator Card for MTCA-based LLRF Control System for Linear Accelerators", in Proc. IPAC2013, Shanghai, China, May 2013, paper THPEA030, pp. 3207–3209.
