Open Access


# Fault-Tolerant Techniques for ATC Systems Used in High-Speed Railway to Prevent Geomagnetic Storm's Effects

Wang Xin<sup>\*</sup>, Wang Xu, Liu Mingguang and Jiang Xuedong

School of Electrical Engineering, Beijing Jiaotong University, Beijing, 100044, P.R. China

**Abstract:** FPGAs are increasingly used in the automatic train control (ATC) equipment of high-speed rail systems, and they are potentially sensitive to radiation. This paper analyzes how the space radiation caused by geomagnetic storms affects FPGA devices, reviews the FPGA fault-tolerant techniques in current use, and presents a new mitigation technique based on duplication with comparison combined with time redundancy, which can block an upset. The main characteristics of several implemented versions of a case circuit are compared. The results show that the minimum partitioning dual modular redundancy design, with the upset detector and voter circuit not protected from radiation, has 12.3% sensitive area. If the upset detector and voter circuit is triplicated, the radiation-sensitive area falls to 0%. This methodology may not only reduce area and pin count, and consequently power dissipation in the I/O pads, but also mitigate the radiation effects produced by geomagnetic storms.

Keywords: Automatic train control, Fault-tolerant techniques, Geomagnetic storm, High-speed railways, Single event effect.

## 1. INTRODUCTION

In recent years, with the rapid development of high-speed rail construction, China has come to operate the fastest and longest high-speed railway network in the world. The maximum speed of high-speed trains in China has reached 350 km/h, which would require block sections at least 6-8 km long if ordinary automatic blocking (e.g. a red signal for stop, a green signal for proceed) were still used. An ATC system is therefore required, with which drivers control the train according to the on-board signal display, without block signals on the ground. When the train exceeds the permitted speed, automatic braking intervenes, which helps to control train spacing and speed, improve transport efficiency and ensure operational safety. The ATC equipment consists mainly of two parts, ground equipment and on-board equipment, as shown in Fig. (1).

As a consequence, FPGAs are increasingly in demand for high-speed rail ATC systems because of their flexibility in meeting multiple requirements such as high performance, low NRE (Non-Recurring Engineering) cost and fast turnaround time [1, 2].

A high-density FPGA device of the Altera Cyclone II series, the EP2C8Q208I8, was used to implement equal-precision frequency measurement and signal acquisition and processing for a locomotive speed sensor [3]. FPGA-based communication gateways between the HDLC and RS485 protocols were designed to solve the network problem caused by the brake control unit of CRH2 EMUs (electric multiple units) [4]. A high-performance digital control system with a DSP + FPGA core was designed to improve the dynamic response speed and steady-state precision of a regenerative braking energy absorption device [5]. FPGAs have thus found increasingly wide use in ATC equipment for high-speed rail systems: they have not only solved many key technical problems but also promoted the rapid adoption of ATC systems.

Geomagnetic storms are caused by explosive solar activity. When the sun releases flares and coronal mass ejections, large amounts of X-rays, ultraviolet and visible light, and high-energy proton and electron beams are emitted [6]. The resulting high-energy plasma, formed by charged particles (protons and electrons), travels through outer space at speeds of 300 km/s to 1000 km/s. If these particles strike the FPGA chips in ATC equipment, they may produce a Single Event Effect (SEE). As a potential consequence, information may be lost and functions may fail, which seriously threatens the safe operation of high-speed rail. It is therefore necessary to study fault-tolerant techniques that protect high-speed rail ATC systems against geomagnetic storms.

## 2. GEOMAGNETIC STORM RADIATION IMPACT ANALYSIS OF FPGA

Geomagnetic storms may not only interfere with short-wave radio communications and all kinds of magnetic measurements, but also disturb the normal operation of electrical and magnetic equipment. In 1940 the United States, the first country to link geomagnetic storms to power systems, found that the geomagnetically induced current (GIC) triggered by a geomagnetic storm may cause a host of harmful effects in power system operation [7-9]. In recent years, the influence of geomagnetic storms on railway electrical equipment, track circuits and communication and signaling systems has also begun to draw attention [10].

<sup>\*</sup>Address correspondence to this author at the School of Electrical Engineering, Beijing Jiaotong University, Beijing, 100044, P.R. China; Tel: +86 10 51682028; Fax: +86 10 51687101; E-mail: xwang3@bjtu.edu.cn

Fig. (1). ATC system.

(a) Ionization

(b) Nuclear reaction→Short range recoil→Ionization

Fig. (2). Charged particle striking the silicon surface.

When a geomagnetic storm occurs, the Earth's magnetic field captures a large number of charged particles (protons and electrons) and small amounts of other space radiation particles such as alpha particles. An SEE can have a destructive or a transient effect, depending on the amount of energy deposited by the charged particle and the location of the strike in the device. The main consequence of the transient effect, also called Single Event Upset (SEU), is bit flips in the memory elements.

The SEU rate, defined as the probability of an upset occurring in a device per bit per day, is used to quantify SEU. The general formula for this rate is

$$R_{p} = \int_{E_{0}}^{\infty} \sigma_{p}(E) \, \varphi(E) \, dE \quad (SEU \,/ \, bit \,\cdot d) \tag{1}$$

where,

 $E_0$  --- the threshold energy, MeV;

 $\sigma_p(E)$ --- the proton SEU cross-section,  $cm^2/bit$ ;

 $\varphi(E)$  --- the proton differential flux.
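Eq. (1) can be evaluated numerically once a cross-section curve and a flux spectrum are available. The following is a minimal sketch only: the function name `seu_rate`, the truncation of the infinite upper limit at `e_max`, and the constant cross-section and power-law spectrum used in the example are illustrative assumptions, not measured data for any device or storm.

```python
def seu_rate(sigma_p, phi, e0, e_max, steps=10_000):
    """Approximate R_p = integral from E0 to e_max of sigma_p(E) * phi(E) dE.

    Uses the composite trapezoid rule. The infinite upper limit of Eq. (1)
    is truncated at e_max, which is an assumption of this sketch.
    sigma_p: callable, cross-section in cm^2/bit as a function of E (MeV)
    phi:     callable, differential flux, assumed in protons/(cm^2 d MeV)
    """
    h = (e_max - e0) / steps
    total = 0.5 * (sigma_p(e0) * phi(e0) + sigma_p(e_max) * phi(e_max))
    for i in range(1, steps):
        e = e0 + i * h
        total += sigma_p(e) * phi(e)
    return total * h


# Hypothetical inputs, chosen only so the result can be checked by hand:
# constant cross-section above threshold and a flux falling as E^-2.
def example_sigma(e_mev):
    return 1e-14          # cm^2/bit (assumed)


def example_phi(e_mev):
    return 1e5 * e_mev ** -2   # protons/(cm^2 d MeV) (assumed)


rate = seu_rate(example_sigma, example_phi, e0=10.0, e_max=1000.0)
```

For these assumed inputs the integral has the closed form $10^{-14} \times 10^{5} \times (1/10 - 1/1000)$ SEU/(bit·d), which gives a quick check on the numerical result.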

When a particle strikes the sensitive zone of a device, it undergoes elastic and inelastic collisions with the electrons in the device and loses energy. The energy lost by the particle per unit distance is given by the LET (Linear Energy Transfer). The energy deposited along the particle track produces many electron-hole pairs and forms a dense ionization track, as shown in Fig. (**2a**). In silicon, producing one electron-hole pair requires a deposited energy of  $3.6 \ eV$ , while in silicon oxide it requires  $18 \ eV$ . Under the action of the internal and external electric fields and of diffusion, these electron-hole pairs move toward the electrodes and are collected. If the collected charge exceeds the critical value for the device to flip, the stored state is inverted.

Fig. (3). SEUs in the routing.

The mechanism of proton-induced SEU is slightly different. A proton also produces electron-hole pairs directly along its track, which are collected by the electrodes, but because the proton's LET is low, the collected charge generally does not reach the critical value for the circuit to flip. Circuits are flipped mainly through a nuclear reaction that produces a recoil, as illustrated in Fig. (**2b**). Because the recoil has a high LET, it can deposit enough energy along its track to produce enough electron-hole pairs, which are then collected by the electrodes and flip the circuit.
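The charge-collection condition described in the two paragraphs above can be made concrete with a short calculation. This is a rough sketch: the pair-creation energies are the values quoted in the text, while the function names, the `collection_efficiency` parameter and any critical charge passed in are hypothetical and do not describe a particular device.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19        # coulombs per carrier
PAIR_ENERGY_EV = {"Si": 3.6, "SiO2": 18.0}   # eV per electron-hole pair (from the text)


def collected_charge_fc(deposited_energy_mev, material="Si", collection_efficiency=1.0):
    """Charge in femtocoulombs generated by a given deposited energy.

    collection_efficiency (0..1) is an assumed fraction of the generated
    charge that actually reaches the electrode.
    """
    pairs = deposited_energy_mev * 1e6 / PAIR_ENERGY_EV[material]
    return pairs * ELEMENTARY_CHARGE_C * collection_efficiency * 1e15


def causes_upset(deposited_energy_mev, critical_charge_fc, material="Si"):
    """True if the collected charge exceeds the (assumed) critical charge."""
    return collected_charge_fc(deposited_energy_mev, material) > critical_charge_fc
```

With these numbers, 1 MeV deposited in silicon corresponds to roughly 44.5 fC, so whether an upset occurs depends on how that compares with the critical charge of the struck node.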

If the channel length of a transistor in the storage unit is less than 0.25  $\mu m$  and the channel length of a transistor in combinational circuits is less than 0.13  $\mu m$ , upsets are likely to occur in high-radiation environments and even in the atmospheric environment. Current integrated circuits are manufactured in CMOS (Complementary Metal Oxide Semiconductor) processes of 90 nm or smaller.

SEU has a peculiar effect in FPGAs when a particle hits the user's combinational logic. In an ASIC (Application Specific Integrated Circuit), the effect of a particle hitting either the combinational or the sequential logic is transient, and the only variation is the duration of the fault. In an SRAM-based FPGA, on the other hand, an upset in a LUT (Lookup Table) memory cell modifies the implemented combinational logic. The effect is permanent and can only be corrected at the next load of the configuration bitstream. An upset in the routing can connect or disconnect a wire in the matrix, see Fig. (3). This effect is also permanent, and it can be mapped to an open or a short circuit in the combinational logic implemented by the FPGA. Because of SEUs, many key FPGA application fields, including space missions, satellites, high-energy physics experiments, nuclear power and high-speed railways, increasingly rely on fault-tolerant technology to ensure the proper operation of the integrated circuit system.

## **3. RESEARCH ON SEU MITIGATION TECHNIQUES FOR FPGA**

Several SEU mitigation techniques have been proposed in the last few years to avoid faults in digital circuits, including those implemented in programmable logic. They can be classified as fabrication process-based techniques, design-based techniques, and recovery techniques (applied to programmable logic only), and they mainly focus on space applications. Reference [11] presented an SEU mitigation technique for FPGAs utilized in nuclear power plant digital instrumentation and control. High-speed rail systems also have high reliability requirements, but SEU mitigation techniques applied in this field have not yet been reported in the literature.

Design-based SEU mitigation techniques range from the system level to the circuit level, and mainly include logic redundancy methods based on TMR (Triple Modular Redundancy) and EDAC (Error Detection and Correction). Each technique has advantages and drawbacks, and there is always a compromise between area, performance, power dissipation and fault tolerance efficiency. At present, TMR is the most mature SEU mitigation technique and the one with the highest reliability [12].

The TMR mitigation scheme uses three identical logic circuits performing the same task in parallel with corresponding outputs being compared through majority voters, as shown in Fig. (4).

The majority voter schematic and the truth table are shown in Fig. (5). Majority voter's Boolean expression can be described as:

$$F = M_0 M_1 + M_0 M_2 + M_1 M_2 \tag{2}$$
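Eq. (2) is easy to check exhaustively. The sketch below models the majority voter with bitwise operators and enumerates its truth table; the function name `majority` and the variable `truth_table` are illustrative, and the code is a behavioral model rather than the hardware voter of Fig. (5).

```python
from itertools import product


def majority(m0, m1, m2):
    """Majority vote per Eq. (2): F = M0*M1 + M0*M2 + M1*M2.

    Works on single bits and, bitwise, on whole integer words.
    """
    return (m0 & m1) | (m0 & m2) | (m1 & m2)


# Enumerate all eight input combinations as (M0, M1, M2, F).
truth_table = [
    (m0, m1, m2, majority(m0, m1, m2))
    for m0, m1, m2 in product((0, 1), repeat=3)
]
```

Because the expression is bitwise, the same function can vote on multi-bit outputs: a single upset bit in one of three copies of a word is outvoted by the other two copies.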



Fig. (4). TMR scheme.



Fig. (5). Majority voter schematic and the truth table.



Fig. (6). Test circuit.

However, the TMR technique comes with some penalties because of its full hardware redundancy, such as area, I/O pad limitations and power dissipation. Although these overheads and limitations could be reduced by using some architectural SEU mitigation solutions such as hardened memory cells, EDAC techniques and standard TMR with single voter, these solutions are very costly, because they require modifications to the matrix architecture of the FPGA. In the next section, we present a technique based on duplication with comparison combined with time redundancy. The robustness of this technique is evaluated by a test circuit and the result shows that it may reduce area and pin count and consequently power dissipation in the I/O pads.

## 4. NEW FAULT-TOLERANT TECHNIQUES TO PREVENT GEOMAGNETIC STORM'S EFFECTS

#### 4.1. Design and Verification of the Test Circuit

The test circuit, shown in Fig. (6), compares  $(A+B)^2$  with  $4AB$ . If  $(A+B)^2$  is greater than  $4AB$ , the output Y is 100; if  $(A+B)^2$  is less than  $4AB$ , Y is 001; and if  $(A+B)^2$  equals  $4AB$ , Y is 010.

Fig. (7). VHDL code of adder.



Fig. (8). Functional simulation for the adder circuit.

The multiplier modules are instantiated directly from the custom macro modules of the Quartus II software, and the comparator and adder modules are both described in VHDL (VHSIC Hardware Description Language). An example of the VHDL code for an adder circuit is presented in Fig. (7).

After logic synthesis and verification, we obtain the simulation waveform shown in Fig. (8). Inputs A and B are both 3-bit binary numbers; outputs X1 and X2 are the results of  $(A+B)^2$  and  $4AB$ , respectively; and  $Y1 \sim Y3$  are the comparator outputs.
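The behavior of the test circuit described in this subsection can be captured in a small reference model, which is handy as a golden model when checking simulation waveforms. This is a software sketch only: the function name `case_circuit` and the string encoding of Y are illustrative choices, and the model says nothing about the VHDL implementation or its timing.

```python
def case_circuit(a, b, width=3):
    """Reference model of the test circuit: returns (X1, X2, Y).

    X1 = (A + B)^2, X2 = 4AB, and Y encodes the comparison as in the text:
    "100" if X1 > X2, "001" if X1 < X2, "010" if X1 == X2.
    Inputs are unsigned and limited to `width` bits (3 in the simulation).
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("inputs must fit in the given bit width")
    x1 = (a + b) ** 2
    x2 = 4 * a * b
    if x1 > x2:
        y = "100"
    elif x1 < x2:
        y = "001"
    else:
        y = "010"
    return x1, x2, y


# Exhaustive sweep over all 3-bit input pairs.
all_outputs = {(a, b): case_circuit(a, b) for a in range(8) for b in range(8)}
```

Since $(A+B)^2 - 4AB = (A-B)^2 \geq 0$, a fault-free circuit produces Y = 010 exactly when A = B and Y = 100 otherwise, so the sweep never yields 001.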

#### 4.2. Research on Duplication with Comparison Combined with Time Redundancy

Fig. (9) shows a maximum partitioning DMR (Dual Modular Redundancy) design for the test circuit, in which each multiplier, each adder and the comparators are duplicated and followed by a single upset detector and voter circuit called 'DnV'. There are 9 'DnV' circuits, and they are all sensitive to SEU because they are not triplicated.

The 1-bit upset detector and voter circuit is illustrated in Fig. (10). Signals  $D_0$  and  $D_1$  come from the outputs of the two preceding duplicated modules. Four auxiliary D-latches latch the output of each duplicated module and its delayed output, respectively. The comparator is used to identify whether a fault has occurred and the type of fault.
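One way to picture how duplication with comparison and time redundancy can work together is the behavioral sketch below. It is not the logic of Fig. (10), which is not reproduced in the text: the decision rule (trust the module whose current and delayed samples agree), the function name `dnv_1bit` and the status strings are all assumptions made for illustration.

```python
def dnv_1bit(d0_now, d0_delayed, d1_now, d1_delayed):
    """Illustrative 1-bit detect-and-vote rule. Returns (output, status).

    Inputs are the four latched samples: each duplicated module's output
    and its delayed output. Assumed rule, not taken from the paper:
      - if both modules agree, pass the value through;
      - if they disagree, a module whose two samples differ is taken to
        have been hit by a transient, and the other module is trusted;
      - otherwise the fault cannot be resolved by this rule.
    """
    if d0_now == d1_now:
        return d0_now, "no_fault"
    d0_stable = d0_now == d0_delayed
    d1_stable = d1_now == d1_delayed
    if d0_stable and not d1_stable:
        return d0_now, "upset_in_module_1"
    if d1_stable and not d0_stable:
        return d1_now, "upset_in_module_0"
    return None, "unresolved"
```

Under this assumed rule, a disagreement in which both modules are stable (for example a permanent fault in one copy) is detected but not corrected, which is why the status is reported separately from the output.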

Alternatively, the whole test circuit shown in Fig. (6) can be duplicated as a single unit, in which case only one 'DnV' circuit is needed. Such a minimum partitioning DMR design is shown in Fig. (11). To give the 'DnV' circuit a higher ability to mitigate SEU, it can be triplicated, as shown in the dashed part of Fig. (11). Each of the above circuits was implemented with the Quartus II software, and their area, pin count and other resource utilization are compared in Table 1.

STD --- standard design with no SEU protection;

DMR1 --- minimum partitioning DMR design with unprotected 'DnV' circuits;



Fig. (9). Maximum partitioning DMR design.



Fig. (10). 1-bit SEE detector and voter circuit.



Fig. (11). Minimum partitioning DMR design.

DMR2 --- minimum partitioning DMR design with the 'DnV' circuits protected by TMR;

DMR3 --- maximum partitioning DMR design with unprotected 'DnV' circuits;

 $S_1$  --- ratio of the IOBs used by that design relative to the standard design STD;

 $S_2$  --- ratio of the number of slices of a particular design compared to the standard design STD;

R --- ratio of SEU sensitive area to the whole occupied area (number of slices).

The number of I/O pads in the DMR designs is less than in TMR designs. In fact, even in the DMR2 design, the number of I/O pads used is only 208% of the standard design, instead of 300% as in the TMR approach. In the DMR3 design, the 'DnV' circuits occupy more area, so that the design takes 404% of the area of the standard design, and the additional three levels of 'DnV' circuits lengthen the signal path. The area occupied by the minimum partitioning DMR designs (DMR1 and DMR2) is also less than that of TMR designs. In other words, the DMR approach leaves more I/O pads and area available than the TMR approach.

| Designs | Resource Utilization |                    |                             | Reliability |
|---------|----------------------|--------------------|-----------------------------|-------------|
|         | S<sub>1</sub> (%)    | S<sub>2</sub> (%)  | Estimated Performance (MHz) | R (%)       |
| STD     | 100                  | 100                | 33.2                        | 100         |
| DMR1    | 192                  | 228                | 28.1                        | 12.3        |
| DMR2    | 208                  | 296                | 25.5                        | 0           |
| DMR3    | 192                  | 404                | 22.3                        | 50.5        |

Table 1. Comparison between DMR redundancy designs.

In Table 1, R is the ratio of the SEU-sensitive area to the whole area occupied by the design, where the SEU-sensitive area,  $\xi$, is the area of the 'DnV' circuits without any protection. With  $S_2$  expressed as a ratio rather than a percentage, R can be calculated from

$$R = \frac{\xi}{\delta} \times 100\% = \frac{\delta - 2\eta}{\delta} \times 100\% = \left(1 - \frac{2\eta}{\delta}\right) \times 100\% = \left(1 - \frac{2}{S_2}\right) \times 100\% \tag{3}$$

where,

 $\delta$  --- the whole area occupied by the design;

 $\eta$  --- the area occupied by the standard design.

Thus, for the DMR1 design, we can calculate:

$$R = (1 - 2/2.28) \times 100\% = 12.3\% \tag{4}$$

For the DMR3 design, we get:

$$R = (1 - 2/4.04) \times 100\% = 50.5\% \tag{5}$$

The sensitive area of the DMR2 design falls to 0%, i.e., it is completely immune to SEU.
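The R column of Table 1 can be reproduced from the S<sub>2</sub> column with Eq. (3). The sketch below does this for the three DMR designs; the function name `sensitive_ratio_percent` and the `dnv_protected` flag are illustrative, and the flag simply encodes the statement in the text that a design whose 'DnV' circuits are protected by TMR has no sensitive area.

```python
def sensitive_ratio_percent(s2_percent, dnv_protected=False):
    """R from Eq. (3): R = (1 - 2 / S2) * 100%, with S2 taken as a ratio.

    s2_percent is the slice usage relative to the standard design, in
    percent as listed in Table 1. If the 'DnV' circuits are protected by
    TMR, the sensitive area is taken as zero, as stated in the text.
    """
    if dnv_protected:
        return 0.0
    s2_ratio = s2_percent / 100.0
    return (1.0 - 2.0 / s2_ratio) * 100.0


# (S2 in %, 'DnV' protected by TMR?) from Table 1 and the design legend.
designs = {"DMR1": (228, False), "DMR2": (296, True), "DMR3": (404, False)}

ratios = {
    name: round(sensitive_ratio_percent(s2, protected), 1)
    for name, (s2, protected) in designs.items()
}
```

The formula only counts the unprotected 'DnV' area, which is why DMR2 is handled by the flag rather than by applying Eq. (3) to its S<sub>2</sub> value.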

## CONCLUSION

Designers of ATC equipment for high-speed rail systems currently use radiation-hardened FPGA devices to cope with the radiation effects caused by geomagnetic storms. However, there is a strong drive to use standard commercial-off-the-shelf (COTS) and military devices in ATC systems, which reduce cost and development time compared with radiation-hardened devices. Based on a study of existing FPGA fault-tolerant techniques, this paper puts forward a new radiation mitigation design method, duplication with comparison combined with time redundancy, and compares the simulation results of several design schemes. The conclusions are as follows:

1. The TMR technique is a suitable solution for integrated circuit (IC) applications, including FPGAs, because it provides full hardware redundancy covering the user's combinational and sequential logic, the routing, and the I/O pads. It is currently one of the SEU mitigation techniques with the highest reliability, but in many cases its area and power dissipation overhead limits the user's design flexibility.
2. A new type of DMR method is presented to mitigate the effects of geomagnetic storms, which can reduce the area overhead and the number of I/O pins. In the DMR2 design, the minimum partitioning DMR design with the 'DnV' circuits protected by TMR, the number of I/O pads is 208% of the standard design, instead of 300% as in the TMR approach. In the DMR3 design, the 'DnV' circuits occupy more area, so that the design takes 404% of the area of the standard design, and the additional three levels of 'DnV' circuits lengthen the signal path.
3. The DMR1 design, the minimum partitioning DMR design with unprotected 'DnV' circuits, has 12.3% sensitive area, while the DMR2 design, whose 'DnV' circuits are protected by TMR, has 0% sensitive area. That is to say, the DMR2 design has the same robustness as the TMR method.
4. In high-speed rail ATC applications, a suitable DMR design can be chosen according to the acceptable area overhead and the required SEU mitigation ability.

#### **CONFLICT OF INTEREST**

The authors confirm that this article content has no conflict of interest.

#### ACKNOWLEDGEMENTS

This work was financially supported by "the Fundamental Research Funds for the Central Universities" (2015J-BM085).

#### REFERENCES

- [1] L.J. Diao, K. Dong, L.T. Zhao, L. Wang, and J. Chen, "Dual DSPs-FPGA structured traction control system for urban rail transit vehicle," *Transactions of China Electrotechnical Society*, vol. 29, no. 1, pp. 174-180, 2014.
- [2] S. Xiao, J.B. Sun, H. Geng, and J. Wu, "FPGA based ratio changeable all digital phase-locked-loop," *Transactions of China Electrotechnical Society*, vol. 27, no. 4, pp. 153-158, 2012.
- [3] Y.B. Xu, and Y.S. Wang, "SOC design of locomotive speed signals processing system based on Nios II," *Electric Locomotives & Mass Transit Vehicles*, vol. 30, no. 5, pp. 47-50, 2007.
- [4] C.G. Li, P. Shen and X.B. Nie, "Design of communication gateway between HDLC and RS485 based on FPGA," *Electric Drive for Locomotives*, vol. 55, no. 1, pp. 20-23, 2011.
- [5] C. Zhou, J.L. Chen, H.L. Tao, M. Zhang, and L. Zhou, "Control system of regenerative braking energy absorption device based on DSP and FPGA," *Converter Technology & Electric Traction*, vol. 37, no. 5, pp. 9-16, 2011.
- [6] L.G. Liu, K.R. Wang, C.H. Zhao, and X. Feng, "Solar storm heliographic parameters and conditions driving the GIC in Grid," *Transactions of China Electrotechnical Society*, vol. 28, no. 2, pp. 360-366, 2013.
- [7] W.L. Wu, "Analysis of voltage stability considering geomagnetic disturbance based on the catastrophe theory," *Power System Protection and Control*, vol. 41, no. 23, pp. 30-36, 2013.

- [8] M. Wik, R. Pirjola, H. Lundstedt, A. Viljanen, P. Wintoft, and A. Pulkkinen, "Space weather events in July 1982 and October 2003 and the effects of geomagnetically induced currents on Swedish technical systems," *Annales Geophysicae*, vol. 27, no. 4, pp. 1775-1787, 2009.
- [9] R. Pirjola, "Geomagnetically induced currents during magnetic storms," *IEEE Transactions on Plasma Science*, vol. 28, no. 6, pp. 1867-1873, 2000.
- [10] N.G. Ptitsyna, V.V. Kasinskii, G. Villoresi, N.N. Lyahov, L.I. Dorman, and N. Iucci, "Geomagnetic effects on mid-latitude railways: A statistical study of anomalies in the operation of signaling and train control equipment on the East-Siberian railway," *Advances in Space Research*, vol. 42, no. 9, pp. 1510-1514, 2008.
- [11] X. Wang, K.E. Holbert and L.C. Clark, "Single event upset mitigation techniques for FPGAs utilized in nuclear power plant digital instrumentation and control," *Nuclear Engineering and Design*, vol. 241, no. 8, pp. 3317-3324, 2011.
- [12] K.S. Morgan, D.L. McMurtrey, B.H. Pratt, and M.J. Wirthlin, "A comparison of TMR with alternative fault-tolerant design techniques for FPGAs," *IEEE Transactions on Nuclear Science*, vol. 54, no. 6, pp. 2065-2072, 2007.

Received: September 16, 2014

Revised: December 23, 2014

Accepted: December 31, 2014

© Xin et al.; Licensee Bentham Open.

This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.