# Low-Power RNS Converter Using Modified RCA-EAC

Amir Sabbagh Molahosseini and Azadeh Alsadat Emrani Zarandi, Department of Computer Engineering, Kerman Science and Research Branch, Islamic Azad University, Kerman, Iran, sabbagh@iauk.ac.ir

Abstract: This paper presents a low-power implementation of a residue number system (RNS) reverse converter based on an improved ripple carry adder (RCA) with end-around carry (EAC). The presented RCA with EAC uses modified full adders to correct the double representation of zero. The implementation results show that the reverse converter built with the modified RCAs with EAC consumes less power and achieves a better power-delay product than the previous converter design, which places one's detector circuits after the RCAs with EAC to correct the double representation of zero.

Keywords: arithmetic circuits, residue number system (RNS), reverse converter.

#### I. INTRODUCTION

With the extensive use of wireless devices, power has become one of the primary design constraints. Hence, much research has been done at all levels of VLSI design, including architecture, implementation, and arithmetic, to reduce power consumption. For instance, at the arithmetic level the residue number system (RNS) has been used as an alternative number system to provide power-efficient arithmetic [1], [2]. In particular, the inherently modular structure of RNS makes it suitable for fast and low-power realization of computing systems where addition (subtraction) and multiplication are the main operations [3], [4].

An RNS can be described as a system with three main parts: the forward converter, the modulo arithmetic units, and the reverse converter [5]. The first two parts have modular architectures. In other words, each modulus of the selected moduli set has its own independent circuit, with no dependency between the channels, and this makes RNS more efficient than the conventional number system. The reverse converter, however, has a non-modular architecture; it cannot be implemented with the same degree of parallelism as the other parts of the RNS and consequently consumes more power. The hardware structure of reverse converters, especially for arithmetic-friendly moduli sets, requires many modulo adders. One significant point that is usually ignored is that reverse converters need modulo adders with a single representation of zero; adders with a double representation of zero cause the converter to produce wrong results.

The modulo adder architecture most commonly used in reverse converters is the carry-propagate adder (CPA) with end-around carry (EAC) [6]. Since reverse converters contain many modulo adders, ripple carry adders (RCAs) with EAC have been used to implement the required modulo $2^{n}-1$ additions. However, an RCA with EAC has a double representation of zero (the all-zeros and the all-ones patterns), which is not suitable for a reverse converter, and therefore additional circuitry must be used to correct its output. This zero-correction circuitry requires a one's detector consisting of cascaded AND gates to detect whether all sum bits are ones [7]. If the one's detector output is one, the sum is changed to zero. Alternatively, a different approach has been used in some parallel-prefix modulo $2^{n}-1$ adders to correct the second representation of zero [8], [9]. First, the group generate and propagate signals are combined using an OR gate, and then the result affects the internal generate signals (the output carries for each bit) through an additional prefix level. This technique cannot be used directly in RCA-based EAC adders, since full adders (FAs) have no predefined propagate signals. In [10], the authors solve the double-representation-of-zero problem at the cost of using an ALU chip, which they use to obtain the group propagate signal.
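The double representation of zero can be illustrated with a short word-level model. The following Python sketch is illustrative only and is not taken from [6] or [7]; the function names `eac_add` and `ones_detector_fix` are hypothetical, and the model abstracts away the gate-level structure of the adder and of the cascaded AND gates.

```python
def eac_add(a: int, b: int, n: int) -> int:
    """Word-level model of an n-bit adder with end-around carry (EAC)."""
    mask = (1 << n) - 1
    total = a + b
    carry_out = total >> n              # carry leaving the most significant bit
    return (total + carry_out) & mask   # carry is fed back into the LSB


def ones_detector_fix(s: int, n: int) -> int:
    """Model of the zero correction: an all-ones sum is replaced by zero."""
    mask = (1 << n) - 1
    return 0 if s == mask else s


n = 4
raw = eac_add(5, 10, n)                 # 5 + 10 = 15, which is 0 modulo 15
fixed = ones_detector_fix(raw, n)
print(f"raw = {raw:04b}, corrected = {fixed:04b}")
```

For $n = 4$, the uncorrected adder returns the all-ones pattern for $5 + 10$, whereas the correct residue modulo 15 is zero; the correction step maps that pattern back to zero.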

This paper investigates the implementation of the reverse converter for the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ [11] using a modified CPA with EAC, namely an RCA-based EAC adder with embedded zero-correction circuitry. First, the FA+, a full adder with an additional input bit and output bit that carry the propagate signal, is introduced. Then, these FA+ units are used in place of regular FAs in the RCA-based EAC adder to obtain a single representation of zero without an explicit one's detector at the adder output, thereby consuming less area and power. The VLSI implementation results show over 50% reduction in power consumption when the modified RCAs with EAC are used.

In the rest of the paper, the main concept of the residue number system (RNS) and the methods of designing reverse converters are described in Section II. The proposed approach to improve the performance of the reverse converter is presented in Section III, and the performance evaluation based on experimental results is given in Section IV. The paper is concluded in Section V.

#### II. RESIDUE NUMBER SYSTEM

The residue number system is regarded as a powerful unconventional number system in computer arithmetic because of attractive features such as carry-limited computation, which make it an effective tool for increasing speed and reducing power consumption [3], [4]. To construct an RNS, a moduli set consisting of pairwise relatively prime numbers is first selected. Then, as described in [5], a forward converter is needed to translate weighted binary numbers into the residues corresponding to the moduli. Arithmetic operations on the residues are then performed in parallel modular arithmetic channels, each consisting of a modulo adder and multiplier. Finally, the result, which is in residue form, is converted to the equivalent weighted number by the reverse converter.
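The three stages can be sketched in software as follows. This is a minimal illustration, assuming the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ with $n = 4$ and a textbook Chinese remainder theorem (CRT) for the reverse conversion; it does not reproduce the adder-based converter architecture of [11], and the function names are hypothetical.

```python
from itertools import combinations
from math import gcd, prod


def moduli_set(n: int) -> list[int]:
    """Moduli {2^n - 1, 2^n, 2^n + 1, 2^(2n+1) - 1} for a given n."""
    return [2**n - 1, 2**n, 2**n + 1, 2**(2 * n + 1) - 1]


def forward(x: int, moduli: list[int]) -> list[int]:
    """Forward conversion: weighted binary number -> residues."""
    return [x % m for m in moduli]


def reverse_crt(residues: list[int], moduli: list[int]) -> int:
    """Reverse conversion with the textbook CRT (not the converter of [11])."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # pow(..., -1, m): modular inverse
    return total % big_m


moduli = moduli_set(4)                       # [15, 16, 17, 511]
assert all(gcd(p, q) == 1 for p, q in combinations(moduli, 2))

x, y = 1234, 567
rx, ry = forward(x, moduli), forward(y, moduli)

# Each channel works independently on its own residue pair.
r_sum = [(a + b) % m for a, b, m in zip(rx, ry, moduli)]
r_prod = [(a * b) % m for a, b, m in zip(rx, ry, moduli)]

print(reverse_crt(r_sum, moduli), reverse_crt(r_prod, moduli))
```

The results are correct as long as the true sum or product stays below the dynamic range, which is the product of the moduli.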

The moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ [11] was recently introduced to provide both high dynamic range and parallelism. The block diagram of an RNS based on this moduli set is shown in Figure 1. Since the largest modulus of this set has the form $2^{k}-1$, it can reduce the total delay of the RNS arithmetic unit. However, the reverse converter for this kind of moduli set is usually very complex and requires many regular and modular adders, as shown in Figure 2. To minimize hardware cost, RCA-based adders are used in [11] to realize the reverse converter, rather than other types of adders such as parallel-prefix adders. However, a one's detector is needed after each of these RCAs with EAC to provide a single representation of zero.



Fig. 1. The general structure of RNS system based on moduli set  $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ 

#### III. THE MODIFIED RCA-EAC STRUCTURE

Figure 3 shows the basic structure of the RCA with EAC used in [6] to design reverse converters, which suffers from a double representation of zero. Zimmermann [9] solved this problem using an extra prefix level, one of whose inputs is the OR of the group generate and propagate signals. This method suffers from high fan-out and excessive power consumption. Another approach in parallel-prefix adders is to use cascaded AND gates as a one's detector at the end of the adder [7]. The main drawback of this method is the extra hardware, such as the logic needed to force the output to zero. Nevertheless, many reverse converter designs use the same method, placing the cascaded AND gates after the output of the regular RCA with EAC. All of these were proposed as replacements for the older method [10], since using an ALU to obtain the propagate signal is not acceptable in current circuits. Our aim is to solve these problems and produce a single representation of zero with better performance.



Fig. 2. The reverse converter architecture of [11] for the moduli set  $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ 



Fig. 3. RCA-based EAC adder with double representation of zero [6]

We change the structure of the FA to add an input bit $P_{in}$ and an output bit $P_{out}$, as shown in Fig. 4. The output $P_{out}$ forms the group-propagate signal from the exclusive-OR of the current inputs *a* and *b* and the input propagate signal $P_{in}$. In other words, our design obtains the group-propagate signal by reusing the existing XOR gate of the FA with only one additional AND gate, which reduces power consumption relative to the previous approaches.
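The behavior described above can be summarized by the following Boolean equations. This is a sketch that assumes the sum and carry outputs keep the standard full-adder form (the gate-level detail of Fig. 4 is not reproduced here); $C_{in}$ and $C_{out}$ denote the usual carry input and output, juxtaposition denotes AND, and $+$ denotes OR:

$$
S = a \oplus b \oplus C_{in}, \qquad
C_{out} = a\,b + (a \oplus b)\,C_{in}, \qquad
P_{out} = (a \oplus b)\,P_{in}
$$

Assuming that $P_{in}$ of the least significant cell is tied to 1, $P_{out}$ of the most significant cell equals 1 exactly when every bit position propagates, that is, when $a \oplus b$ is all ones.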



Fig. 4. The FA+: Full-adder with input and output propagate signals.

Next, these FA+ circuits are used in the architecture of the RCA with EAC to obtain a ripple-based EAC adder with a single representation of zero, as shown in Fig. 5. Note that in the proposed adder the carry and propagate outputs of the final FA+ are combined using an OR gate to produce the end-around carry, which is fed back into the circuit to produce the final result. Furthermore, the high regularity of our design results in a VLSI-efficient architecture.
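A bit-level software model can be used to check this behavior. The sketch below is an illustration under stated assumptions rather than a description of the synthesized circuit: the FA+ is assumed to keep the standard full-adder sum and carry logic, $P_{in}$ of the first cell is assumed to be tied to 1, and the combinational carry feedback is modeled as two ripple passes. The function names are hypothetical.

```python
def fa_plus(a: int, b: int, c_in: int, p_in: int) -> tuple[int, int, int]:
    """FA+ cell: full adder with an extra propagate input and output."""
    x = a ^ b                          # XOR already present in the full adder
    s = x ^ c_in
    c_out = (a & b) | (x & c_in)
    p_out = x & p_in                   # the one additional AND gate
    return s, c_out, p_out


def ripple(a: int, b: int, n: int, c_in: int) -> tuple[int, int, int]:
    """One pass through n FA+ cells: (sum, carry out, group propagate)."""
    s, carry, prop = 0, c_in, 1        # assumption: first P_in tied to 1
    for i in range(n):
        bit, carry, prop = fa_plus((a >> i) & 1, (b >> i) & 1, carry, prop)
        s |= bit << i
    return s, carry, prop


def modified_rca_eac(a: int, b: int, n: int) -> int:
    """Modulo 2^n - 1 addition of reduced operands with a single zero."""
    _, c_out, p_group = ripple(a, b, n, 0)
    eac = c_out | p_group              # OR gate producing the end-around carry
    s, _, _ = ripple(a, b, n, eac)     # carry fed back into the first cell
    return s


print(f"{modified_rca_eac(5, 10, 4):04b}")
```

When all bit positions propagate, the uncorrected sum would be all ones; the group-propagate signal then forces the end-around carry to 1, and the sum wraps to all zeros without a separate one's detector.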



Fig. 5. RCA-based EAC adder based on FA+ units with single zero

#### IV. PERFORMANCE EVALUATION

To examine the effect of the suggested design on the performance of the reverse converter, we implemented the reverse converter of [11] for the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ using the modified RCA with EAC instead of the regular CPA with EAC; the results are shown in Tables 1 and 2. The designs were implemented in TSMC 65 nm CMOS technology without manual optimization, using Cadence RTL Compiler (version v09.10-s242\_1) for synthesis and Cadence Encounter and NanoRoute (versions v09.12-s159 and v09.12-s013, respectively) for placement and routing.

The results indicate that the reverse converter based on the proposed adder consumes significantly less power than the original reverse converter of [11], which uses a one's detector after each EAC adder to correct the double representation of zero. The area is also reduced, but the latency increases. Nevertheless, the power-delay product (PDP) of the converter with the modified zero-correction scheme is better than that of the original converter. Both designs require the same number of AND gates; however, the suggested design distributes the AND gates among the full adders instead of placing all of them at the adder output. In addition, the suggested design does not need any extra hardware to change the output to zero. As a result, power consumption is reduced and the PDP is improved.

In addition to the reverse converter of [11], the modified RCA with EAC introduced in this paper can also be used in other modern reverse converters such as [12]-[14] to enhance their performance. The power consumption of forward converters can also be reduced by using this modified RCA with EAC to realize the required modulo $2^{k}-1$ additions. However, the improvement offered by the modified adder is smaller in reverse converters such as [15], which need only one RCA with EAC.

Table 1. Implementation results: Area and Delay

| n  | Converter | Chip Area (µm<sup>2</sup>) | Delay (ns) |
|----|-----------|------------------------------|------------|
| 4  | [11]      | 4199                         | 0.64       |
|    | Proposed  | 3843                         | 1.139      |
| 8  | [11]      | 8359                         | 1.02       |
|    | Proposed  | 7073                         | 1.708      |
| 12 | [11]      | 12137                        | 1.394      |
|    | Proposed  | 10281                        | 2.419      |
| 16 | [11]      | 17029                        | 1.766      |
|    | Proposed  | 13710                        | 2.989      |

Table 2. Implementation results: Power and PDP

| n  | Converter | Power (mW) | PDP (pJ) |
|----|-----------|------------|----------|
| 4  | [11]      | 3.631      | 2.32384  |
|    | Proposed  | 1.795      | 2.044505 |
| 8  | [11]      | 4.534      | 4.62468  |
|    | Proposed  | 2.059      | 3.516772 |
| 12 | [11]      | 5.004      | 6.975576 |
|    | Proposed  | 1.932      | 4.673508 |
| 16 | [11]      | 5.447      | 9.619402 |
|    | Proposed  | 2.062      | 6.163318 |
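The PDP column can be cross-checked against the power and delay figures of Tables 1 and 2. The short Python sketch below does this and also computes the relative power saving for each value of n; the data are copied from the tables, while the dictionary layout and function names are hypothetical.

```python
# (power in mW, delay in ns, published PDP) per converter, from Tables 1 and 2
RESULTS = {
    4:  {"ref11": (3.631, 0.64, 2.32384),   "proposed": (1.795, 1.139, 2.044505)},
    8:  {"ref11": (4.534, 1.02, 4.62468),   "proposed": (2.059, 1.708, 3.516772)},
    12: {"ref11": (5.004, 1.394, 6.975576), "proposed": (1.932, 2.419, 4.673508)},
    16: {"ref11": (5.447, 1.766, 9.619402), "proposed": (2.062, 2.989, 6.163318)},
}


def pdp(power_mw: float, delay_ns: float) -> float:
    """Power-delay product; mW multiplied by ns gives pJ."""
    return power_mw * delay_ns


def power_saving_percent(n: int) -> float:
    """Power reduction of the proposed converter relative to [11], in percent."""
    p_ref = RESULTS[n]["ref11"][0]
    p_new = RESULTS[n]["proposed"][0]
    return 100.0 * (1.0 - p_new / p_ref)


for n, row in RESULTS.items():
    ref_pdp = pdp(*row["ref11"][:2])
    new_pdp = pdp(*row["proposed"][:2])
    print(f"n={n:2d}  PDP [11]={ref_pdp:.3f} pJ  PDP proposed={new_pdp:.3f} pJ  "
          f"power saving={power_saving_percent(n):.1f}%")
```

Each published PDP value equals the product of the corresponding power and delay entries, and the power saving exceeds 50% for every tabulated n.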

#### V. CONCLUSION

This work investigates the implementation of a reverse converter based on a modified RCA with EAC that has an embedded mechanism to correct the double representation of zero and produce outputs with a single representation of zero. The results show that the modified RCA with EAC leads to a significant power reduction at the expense of higher delay, but with a better power-delay product than the previous converter, which uses a one's detector unit to obtain adder outputs with a single representation of zero.

#### VI. ACKNOWLEDGMENT

The authors would like to thank the Kerman Science and Research Branch, Islamic Azad University, Kerman, Iran for supporting this work. The authors also thank the Signal Processing Systems laboratory of INESC-ID, S. Sorouri and S.A. Ebrahimi for cooperating in VLSI implementation.

#### REFERENCES

- [1] T. Stouraitis and V. Paliouras, "Considering the alternatives in low power design," IEEE Circuits and Devices, vol. 7, pp. 23-29, 2001.
- [2] G.C. Cardarilli, A. Nannarelli and M. Re, "Residue Number System for Low-Power DSP Applications," in Proc. of 41<sup>st</sup> Asilomar Conference on Signals, Systems, and Computers, 2007.
- [3] A. Omondi, B. Premkumar, Residue Number Systems: Theory and Implementations, Imperial College Press, London, 2007.
- [4] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2<sup>nd</sup> edition, Oxford University Press, New York, 2010.
- [5] K. Navi, A.S. Molahosseini, M. Esmaeildoust, "How to Teach Residue Number System to Computer Scientists and Engineers," IEEE Trans. Education, vol. 54, pp. 156-163, 2011.
- [6] S.J. Piestrak, "A high speed realization of a residue to binary converter," IEEE Trans. Circuits and Systems-II, vol. 42, pp. 661-663, 1995.
- [7] L. Kalampoukas, D. Nikolos, C. Efstathiou, H.T. Vergos, J. Kalamatianos, "High-Speed Parallel-Prefix Modulo 2<sup>n</sup>-1 Adders," IEEE Trans. Computers, vol. 49, pp. 673-680, 2000.
- [8] C. Efstathiou, H.T. Vergos and D. Nikolos, "Modulo 2<sup>n</sup>±1 Adder Design Using Select-Prefix Blocks," IEEE Trans. Computers, vol. 52, pp. 1399-1406, 2003.
- [9] R. Zimmermann, "Efficient VLSI Implementation of Modulo (2<sup>n</sup>±1) Addition and Multiplication," in Proc. of IEEE International Symposium on Computer Arithmetic, pp. 158-167, 1999.
- [10] J.J. Shedletsky, "Comment on the Sequential and Indeterminate Behavior of an End-Around-Carry Adder," IEEE Trans. Computers, pp. 271-272, 1977.
- [11] A.S. Molahosseini, K. Navi, C. Dadkhah, O. Kavehei, S. Timarchi, "Efficient Reverse Converter Designs for the New 4-Moduli Sets {2<sup>n</sup>-1, 2<sup>n</sup>, 2<sup>n</sup>+1, 2<sup>2n+1</sup>-1} and {2<sup>n</sup>-1, 2<sup>n</sup>+1, 2<sup>2n</sup>, 2<sup>2n</sup>+1} Based on New CRTs," IEEE Trans. Circuits and Systems-I, vol. 57, pp. 823-835, 2010.
- [12] L. Sousa and S. Antao, "MRC-Based RNS Reverse Converters for the Four-Moduli Sets {2<sup>n</sup>+1, 2<sup>n</sup>-1, 2<sup>n</sup>, 2<sup>2n+1</sup>-1} and {2<sup>n</sup>+1, 2<sup>n</sup>-1, 2<sup>2n</sup>, 2<sup>2n+1</sup>-1}," IEEE Trans. Circuits and Systems-II, vol. 59, pp. 244-248, 2012.
- [13] B. Cao, C.H. Chang and T. Srikanthan, "A Residue-to-Binary Converter for a New Five-Moduli Set," IEEE Trans. Circuits and Systems-I, vol. 54, pp. 1041-1049, 2007.
- [14] A.S. Molahosseini, K. Navi, "A Reverse Converter for the Enhanced Moduli Set {2<sup>n</sup>-1, 2<sup>n</sup>+1, 2<sup>2n</sup>, 2<sup>2n+1</sup>-1} Using CRT and MRC," in Proc. of IEEE Computer Society Annual Symposium on VLSI, 2010.
- [15] A. Hariri, K. Navi, and R. Rastegar, "A new high dynamic range moduli set with efficient reverse converter," Elsevier Journal of Computers and Mathematics with Applications, vol. 55, pp. 660-668, 2008.