Binary multiplier
A binary multiplier is an electronic circuit used in digital electronics, such as a computer, to multiply two binary numbers. It is built using binary adders.
Part of a series on the  
ALU  

Quick navigation  
Components


Categories


See also


A variety of computer arithmetic techniques can be used to implement a digital multiplier. Most techniques involve computing a set of partial products, and then summing the partial products together. This process is similar to the method taught to primary schoolchildren for conducting long multiplication on base10 integers, but has been modified here for application to a base2 (binary) numeral system.
History
Between 19471949 Arthur Alec Robinson worked for English Electric Ltd, as a student apprentice, and then as a development engineer. Crucially during this period he studied for a PhD degree at the University of Manchester, where he worked on the design of the hardware multiplier for the early Mark 1 computer. However, until the late 1970s, most minicomputers did not have a multiply instruction, and so programmers used a "multiply routine"[1][2] which repeatedly shifts and accumulates partial results, often written using loop unwinding. Mainframe computers had multiply instructions, but they did the same sorts of shifts and adds as a "multiply routine".
Early microprocessors also had no multiply instruction. Though the multiply instruction is usually associated with the 16bit microprocessor generation,[3] at least two "enhanced" 8bit micro have a multiply instruction: the Motorola 6809, introduced in 1978,[4][5] and Intel MCS51 family, developed in 1980, and later the modern Atmel AVR 8bit microprocessors present in the ATMega, ATTiny and ATXMega microcontrollers.
As more transistors per chip became available due to largerscale integration, it became possible to put enough adders on a single chip to sum all the partial products at once, rather than reuse a single adder to handle each partial product one at a time.
Because some common digital signal processing algorithms spend most of their time multiplying, digital signal processor designers sacrifice a lot of chip area in order to make the multiply as fast as possible; a singlecycle multiply–accumulate unit often used up most of the chip area of early DSPs.
Basics
The method taught in school for multiplying decimal numbers is based on calculating partial products, shifting them to the left and then adding them together. The most difficult part is to obtain the partial products, as that involves multiplying a long number by one digit (from 0 to 9):
123
x 456
=====
738 (this is 123 x 6)
615 (this is 123 x 5, shifted one position to the left)
+ 492 (this is 123 x 4, shifted two positions to the left)
=====
56088
Signed numbers
A binary computer does exactly the same multiplication as decimal numbers do, but with binary numbers. In binary encoding each long number is multiplied by one digit (either 0 or 1), and that is much easier than in decimal, as the product by 0 or 1 is just 0 or the same number. Therefore, the multiplication of two binary numbers comes down to calculating partial products (which are 0 or the first number), shifting them left, and then adding them together (a binary addition, of course):
1011 (this is 11 in binary)
x 1110 (this is 14 in binary)
======
0000 (this is 1011 x 0)
1011 (this is 1011 x 1, shifted one position to the left)
1011 (this is 1011 x 1, shifted two positions to the left)
+ 1011 (this is 1011 x 1, shifted three positions to the left)
=========
10011010 (this is 154 in binary)
This is much simpler than in the decimal system, as there is no table of multiplication to remember: just shifts and adds.
This method is mathematically correct and has the advantage that a small CPU may perform the multiplication by using the shift and add features of its arithmetic logic unit rather than a specialized circuit. The method is slow, however, as it involves many intermediate additions. These additions take a lot of time. Faster multipliers may be engineered in order to do fewer additions; a modern processor can multiply two 64bit numbers with 6 additions (rather than 64), and can do several steps in parallel.
The second problem is that the basic school method handles the sign with a separate rule ("+ with + yields +", "+ with − yields −", etc.). Modern computers embed the sign of the number in the number itself, usually in the two's complement representation. That forces the multiplication process to be adapted to handle two's complement numbers, and that complicates the process a bit more. Similarly, processors that use ones' complement, signandmagnitude, IEEE754 or other binary representations require specific adjustments to the multiplication process.
Unsigned numbers
For example, suppose we want to multiply two unsigned eight bit integers together: a[7:0] and b[7:0]. We can produce eight partial products by performing eight onebit multiplications, one for each bit in multiplicand a:
p0[7:0] = a[0] × b[7:0] = {8{a[0]}} & b[7:0]
p1[7:0] = a[1] × b[7:0] = {8{a[1]}} & b[7:0]
p2[7:0] = a[2] × b[7:0] = {8{a[2]}} & b[7:0]
p3[7:0] = a[3] × b[7:0] = {8{a[3]}} & b[7:0]
p4[7:0] = a[4] × b[7:0] = {8{a[4]}} & b[7:0]
p5[7:0] = a[5] × b[7:0] = {8{a[5]}} & b[7:0]
p6[7:0] = a[6] × b[7:0] = {8{a[6]}} & b[7:0]
p7[7:0] = a[7] × b[7:0] = {8{a[7]}} & b[7:0]
where {8{a[0]}} means repeating a[0] (the 0th bit of a) 8 times (Verilog notation).
To produce our product, we then need to add up all eight of our partial products, as shown here:
p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0] + p1[7] p1[6] p1[5] p1[4] p1[3] p1[2] p1[1] p1[0] 0 + p2[7] p2[6] p2[5] p2[4] p2[3] p2[2] p2[1] p2[0] 0 0 + p3[7] p3[6] p3[5] p3[4] p3[3] p3[2] p3[1] p3[0] 0 0 0 + p4[7] p4[6] p4[5] p4[4] p4[3] p4[2] p4[1] p4[0] 0 0 0 0 + p5[7] p5[6] p5[5] p5[4] p5[3] p5[2] p5[1] p5[0] 0 0 0 0 0 + p6[7] p6[6] p6[5] p6[4] p6[3] p6[2] p6[1] p6[0] 0 0 0 0 0 0 + p7[7] p7[6] p7[5] p7[4] p7[3] p7[2] p7[1] p7[0] 0 0 0 0 0 0 0  P[15] P[14] P[13] P[12] P[11] P[10] P[9] P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]
In other words, P[15:0] is produced by summing p0, p1 << 1, p2 << 2, and so forth, to produce our final unsigned 16bit product.
Signed integers
If b had been a signed integer instead of an unsigned integer, then the partial products would need to have been signextended up to the width of the product before summing. If a had been a signed integer, then partial product p7 would need to be subtracted from the final sum, rather than added to it.
The above array multiplier can be modified to support two's complement notation signed numbers by inverting several of the product terms and inserting a one to the left of the first partial product term:
1 ~p0[7] p0[6] p0[5] p0[4] p0[3] p0[2] p0[1] p0[0] ~p1[7] +p1[6] +p1[5] +p1[4] +p1[3] +p1[2] +p1[1] +p1[0] 0 ~p2[7] +p2[6] +p2[5] +p2[4] +p2[3] +p2[2] +p2[1] +p2[0] 0 0 ~p3[7] +p3[6] +p3[5] +p3[4] +p3[3] +p3[2] +p3[1] +p3[0] 0 0 0 ~p4[7] +p4[6] +p4[5] +p4[4] +p4[3] +p4[2] +p4[1] +p4[0] 0 0 0 0 ~p5[7] +p5[6] +p5[5] +p5[4] +p5[3] +p5[2] +p5[1] +p5[0] 0 0 0 0 0 ~p6[7] +p6[6] +p6[5] +p6[4] +p6[3] +p6[2] +p6[1] +p6[0] 0 0 0 0 0 0 1 +p7[7] ~p7[6] ~p7[5] ~p7[4] ~p7[3] ~p7[2] ~p7[1] ~p7[0] 0 0 0 0 0 0 0  P[15] P[14] P[13] P[12] P[11] P[10] P[9] P[8] P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]
Where ~p represents the complement (opposite value) of p.
There are a lot of simplifications in the bit array above that are not shown and are not obvious. The sequences of one complemented bit followed by noncomplemented bits are implementing a two's complement trick to avoid sign extension. The sequence of p7 (noncomplemented bit followed by all complemented bits) is because we're subtracting this term so they were all negated to start out with (and a 1 was added in the least significant position). For both types of sequences, the last bit is flipped and an implicit 1 should be added directly below the MSB. When the +1 from the two's complement negation for p7 in bit position 0 (LSB) and all the 1's in bit columns 7 through 14 (where each of the MSBs are located) are added together, they can be simplified to the single 1 that "magically" is floating out to the left. For an explanation and proof of why flipping the MSB saves us the sign extension, see a computer arithmetic book.[6]
Floating point numbers
Implementations
Older multiplier architectures employed a shifter and accumulator to sum each partial product, often one partial product per cycle, trading off speed for die area. Modern multiplier architectures use the (Modified) Baugh–Wooley algorithm,[7][8][9][10] Wallace trees, or Dadda multipliers to add the partial products together in a single cycle. The performance of the Wallace tree implementation is sometimes improved by modified Booth encoding one of the two multiplicands, which reduces the number of partial products that must be summed.
Example circuits
See also
 Booth's multiplication algorithm
 Fused multiply–add
 Wallace tree
 BKM algorithm for complex logarithms and exponentials
 Kochanski multiplication for modular multiplication
 Logical shift left
References
 "The Evolution of Forth" by Elizabeth D. Rather et al.
 "Interfacing a hardware multiplier to a generalpurpose microprocessor"
 M. Rafiquzzaman (2005). Fundamentals of Digital Logic and Microcomputer Design. John Wiley & Sons. p. 251. ISBN 9780471733492.
 Krishna Kant (2007). Microprocessors and Microcontrollers: Architecture, Programming and System Design 8085, 8086, 8051, 8096. PHI Learning Pvt. Ltd. p. 57. ISBN 9788120331914.
 Krishna Kant (2010). MicroprocessorBased Agri Instrumentation. PHI Learning Pvt. Ltd. p. 139. ISBN 9788120340862.
 Parhami, Behrooz, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, New York, 2000 (ISBN 0195125835, 490 + xx pp.)
 Baugh, Charles Richmond; Wooley, Bruce A. (December 1973). "A Two's Complement Parallel Array Multiplication Algorithm". IEEE Transactions on Computers. C22 (12): 1045–1047. doi:10.1109/TC.1973.223648.
 Hatamian, Mehdi; Cash, Glenn (1986). "A 70MHz 8bit×8bit parallel pipelined multiplier in 2.5µm CMOS". IEEE Journal of SolidState Circuits. 21 (4): 505–513. doi:10.1109/ISSC.1986.1052564 (inactive 20190820).
 Gebali, Fayez (2003). "Baugh–Wooley Multiplier" (PDF). University of Victoria, CENG 465 Lab 2. Archived (PDF) from the original on 20180414. Retrieved 20180414.
 Reynders, Nele; Dehaene, Wim (2015). Written at Heverlee, Belgium. UltraLowVoltage Design of EnergyEfficient Digital Circuits. Analog Circuits And Signal Processing (ACSP) (1 ed.). Cham, Switzerland: Springer International Publishing AG Switzerland. doi:10.1007/9783319161365. ISBN 9783319161358. ISSN 1872082X. LCCN 2015935431.
 Hennessy, John L.; Patterson, David A. (1990). "Section A.2, section A.9". Computer Architecture: A quantitative Approach. Morgan Kaufmann Publishers, Inc. pp. A–3..A–6, A–39..A–49. ISBN 9780123838728.