# 32bit multiplication using TTL logic

L

#### logjam

Jan 1, 1970
0
I want to design what would have been a super computer in 1975, using
parts that would have been "easily" available. Several people have
done this already, just not on an insane scale. My favorite is the
Magic-1, which I'm currently designing PCBs for. It has been a GREAT
learning experience studying the schematics from the Magic-1. I never
thought much about what a computer did during the mysterious clock
cycles between instruction cycles, and now with the microcode muxes,
registers, it all just makes sense.

Any way, I need my computer to be very good at math. The magic-1 is a
16bit Add/sub/compare machine. I want to make mine capable of 32bit
operations and have a 32 bit data path.

I'm starting with 32bit multiplication because it gets 32bit addition
out of the way as well.

I threw together a diagram of a 32*32 multiplier using around 1040 AND
gates and 400 74283 Adder circuits. This schematic would be what I
think is called "asynchronous".

Once I completed my diagrams I remembered something horrible. There is
a LOT of time spent waiting for the carry-in-out propagation in the
AND gates. This comes at the added expense of power consumption board
space, but should speed things up by about 250x. Could anyone suggest
the best 1 bit addition "block" with carry? Its pretty late here and
I'm starting to loose brain function.

Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to
make a display...which I haven't finished yet... (building the drivers)

One last thought for those of you who, well, you know who you are.
Idle, what would the power consumption of some 600 odd ICs be? 20ma
per device? This just might make more heat than a pentium!

Now...for floating point...

A

#### Anthony Fremont

Jan 1, 1970
0
One last thought for those of you who, well, you know who you are.
Idle, what would the power consumption of some 600 odd ICs be? 20ma
per device? This just might make more heat than a pentium!

Now...for floating point...

You really need (in this order) a girlfriend and an FPGA development
kit.

Z

#### zoot

Jan 1, 1970
0
In the good old days, as I remmeber, Mulipications /dvisions were done
by sofwtare ( OS libraries ),
Some Aritmetics unit did it Async, most of the time using some
microcode or a hardwwre ( that's even older ) but none did it Sync ( 32
bits in parall )
Not sure butseems to to me 32bits* 32 bits = 64bits ??

Regards

J

#### Jonathan Kirwan

Jan 1, 1970
0
<snip>
Now...for floating point...

Bipolar Integrated Technologies, I think they were in Beaverton,
Oregon or nearby, used to make pure high-speed floating point ALU
chips. Might see if there is a white paper or two floating around
from them.

floating point. It's possible that someone has already developed some
Verilog or VHDL for such a beast and you might be able to look at the
code for ideas.

Finally, this was posted over on comp.arch.embedded and it may help
you think more closely about the class of individuals you are soon to
join:

http://www.cs.pdx.edu/~harry/Relay/index.html

Jon

C

#### Chris

Jan 1, 1970
0
Jonathan said:
<snip>

Finally, this was posted over on comp.arch.embedded and it may help
you think more closely about the class of individuals you are soon to
join:

http://www.cs.pdx.edu/~harry/Relay/index.html

Jon

The fabled "Machina Analytica"! Terminal blocks for the address and
data busses! Relays everywhere! St. Leibowitz and the Venerable
Boedullos are smiling somewhere, Jonathan.

But is that a 32K X 8 SRAM I see?...

"...He turned to the community, spread his hands, shrugged eloquently.

" 'It's still not Him', he told them sourly, then hobbled away.

"Afterwards, there was little formality."

Chris

C

#### Chris

Jan 1, 1970
0
Jonathan said:
<snip>

Finally, this was posted over on comp.arch.embedded and it may help
you think more closely about the class of individuals you are soon to
join:

http://www.cs.pdx.edu/~harry/Relay/index.html

Jon

The fabled "Machina Analytica"! Terminal blocks for the address and
data busses! Relays everywhere! St. Leibowitz and the Venerable
Boedullos are smiling somewhere, Jonathan.

But is that a 32K X 8 SRAM I see?...

"...He turned to the community, spread his hands, shrugged eloquently.

" 'It's still not Him', he told them sourly, then hobbled away.

"Afterwards, there was little formality."

Chris

L

#### logjam

Jan 1, 1970
0
I found his web page just a month or two ago also. But the relatively
short life of a relay worries me. I found a few auctions on ebay for
3000 relays at ~$200, but I don't want to hand wire such a thing. SOMEONE needs to make a relay based web server. Maybe it only says "Hello World", which would probably take about a minute or two... When I thought about it last night I came up with a new solution. A compromise between speed and board space. This is my first experience with binary multiplication, so my first implimentation was very wasteful! I wanted to have a 64bit adder circuit with parallel carry generation, but I forgot about fan-in/out. Getting to the end of the adder circuit would require a LOT of components too. Maybe 16bit parallel by 4 serial is more reasonable. If I try to multiply 1111 by 1111 I should get 11100001. Carry bit | multiplicand | multiplier Start: 0 0000 1111 - Next Op, P0=1, add 1111 0000 to 0000 1111 0 1111 1111 - Next Op, shift right 1 bit 0 0111 1111 - Next Op, P0=1, add 1111 0000 to 0111 1111 1 0110 1111 - Next Op, shift right 1 bit 0 1011 0111 - Next Op, P0=1, add 1111 0000 to 1011 0111 1 1010 0111 - Next Op, shift right 1 bit 0 1101 0011 - Next Op, P0=1, add 1111 0000 to 1101 0011 1 1100 0011 - Next Op, shift right 1 bit 0 1110 0001 - 4 repetitions? yes Done Requirements are a register to store the 32bit multiplicand, 64bit flip flop shift right register (carry out from adder gets shifted in), possibly a 64bit register to store the product/working value. This would require the least amount of parts, but limit me to multiplication with the circuit. It would be pretty fast since it only needs a 32bit adder instead of a 64bit. In the end it might be worth having a separate multiply / shift right circuit and then an additional add / subtract circuit. It all depends on the speed path of the full 64bit adder. ALU flow would probably come from a fast ROM or even SRAM loaded by the host at boot-up. Anything I've missed that could speed up this operation? I'm interested in reducing components, but not at a huge speed drop. The process will take 32 loops, so 10ns more per loop is a lot. J #### Jonathan Kirwan Jan 1, 1970 0 The fabled "Machina Analytica"! Terminal blocks for the address and data busses! Relays everywhere! St. Leibowitz and the Venerable Boedullos are smiling somewhere, Jonathan. But is that a 32K X 8 SRAM I see?... "...He turned to the community, spread his hands, shrugged eloquently. " 'It's still not Him', he told them sourly, then hobbled away. "Afterwards, there was little formality." Chris What's really neat for me, I think... it's right here in my city!! So I may hope to meet him and watch it work. Jon J #### Jonathan Kirwan Jan 1, 1970 0 I found his web page just a month or two ago also. hehe. But the relatively short life of a relay worries me. I found a few auctions on ebay for 3000 relays at ~$200, but I don't want to hand wire such a thing.

Nor would I.
SOMEONE needs to make a relay based web server. Maybe it only says
"Hello World", which would probably take about a minute or two...

Well, remember that each and every "bit" of memory requires a lot of
space and handiwork to make. It would be excessively large, memory
wise. I think anyone doing something like this, with relays, is going
to be very conscious of getting the most from the least effort. And
so a lot of thought will go into that aspect. (Which is why I'd like
to visit, because I'm sure I'd learn a few things developed out of
has the most payoff in the result.)
When I thought about it last night I came up with a new solution. A
compromise between speed and board space. This is my first experience
with binary multiplication, so my first implimentation was very
wasteful!

I wanted to have a 64bit adder circuit with parallel carry generation,
but I forgot about fan-in/out. Getting to the end of the adder circuit
would require a LOT of components too. Maybe 16bit parallel by 4
serial is more reasonable.

I think a lot of folks have been down the same path you are going, as
well. Trade-offs between combinatorial and sequential approaches.
Particularly, in this area of division, multiplication, and so on.

When thinking about an adder, though, why not take this to an extreme
here and think about doing it with a single full adder and sequential
logic? One bit at a time. Between that and a fully combinatorial
solution lays a lot of fun.
If I try to multiply 1111 by 1111 I should get 11100001.

Carry bit | multiplicand | multiplier

First off, just by way of interesting information on the side, in
multipliers there is a trade-off in combinatorial vs synchronous
implementations. A 4x4's combinatorial equivalent gate area just
about equals a sequential implementation of the same thing, in fact.
Below a 4x4, combinatorial is actually less gates. Above 4x4,
sequential is less gates. Your 4x4 suggestion here is right on the
cusp. You can go either way.
Start:
0 0000 1111 - Next Op, P0=1, add 1111 0000 to 0000 1111
0 1111 1111 - Next Op, shift right 1 bit
0 0111 1111 - Next Op, P0=1, add 1111 0000 to 0111 1111
1 0110 1111 - Next Op, shift right 1 bit
0 1011 0111 - Next Op, P0=1, add 1111 0000 to 1011 0111
1 1010 0111 - Next Op, shift right 1 bit
0 1101 0011 - Next Op, P0=1, add 1111 0000 to 1101 0011
1 1100 0011 - Next Op, shift right 1 bit
0 1110 0001 - 4 repetitions? yes

Since you are talking sequential here, the common approach I've
the structure looks like:

4
X ================================\
\\
4 ,-------, ||
Y =======>| YR | ||
'-------' /===\ ||
||4 // \\ ||
|| || || ||
\/ \/ || ||
,-----, ,-----, || ||
\ \ / / || ||
\ \ / / || ||
\ \/ / || ||
/\ / || ||
/ \ / || ||
1/ '----' // ||
/ || // ||
/ ||4 // 4 || 8
/ || /====== || =========> ANSWER
| || // || //
| || // || //4
v \/ // \/ //
,---, ,--------, ,--------,
0-->| C |--->| ALUR |--->| XR |
'---' '--------' '--------'
/\
||
0000

You start out through your execution control logic (not shown), by
latching X into XR, Y into YR, 0000 into ALUR, 0 into C, and some
sequential step counter (not shown) to an initial value of 4. That
counter will have to be able to decrement and test for 0 (not shown.)

That is step (0).

Then the following logic is followed:

(1) if LSB of XR is 1, latch C:ALUR from ALU.
(2) shift right, "0":C:ALUR:XR
(3) decrement counter
(4) if counter not zero, go to step (1)
(5) Result is ALUR:XR

YR C ALUR XR counter
(0) 1111 0 0000 1111 4
(1) 1111 0 1111 1111 4
(2) 1111 0 0111 1111 4
(3) 1111 0 0111 1111 3
(1) 1111 1 0110 1111 3
(2) 1111 0 1011 0111 3
(3) 1111 0 1011 0111 2
(1) 1111 1 1010 0111 2
(2) 1111 0 1101 0011 2
(3) 1111 0 1101 0011 1
(1) 1111 1 1100 0011 1
(2) 1111 0 1110 0001 1
(3) 1111 0 1110 0001 0
(5) ============

Is that about what you are talking about here? It only requires a
4-bit adder to produce an 8-bit output.
Requirements are a register to store the 32bit multiplicand, 64bit flip
flop shift right register (carry out from adder gets shifted in),
possibly a 64bit register to store the product/working value.

This would require the least amount of parts, but limit me to
multiplication with the circuit. It would be pretty fast since it only

Of course.
In the end it might be worth having a separate multiply / shift right
circuit and then an additional add / subtract circuit. It all depends
on the speed path of the full 64bit adder.

Have you looked at Booth's?
ALU flow would probably come from a fast ROM or even SRAM loaded by the
host at boot-up.

Anything I've missed that could speed up this operation? I'm
interested in reducing components, but not at a huge speed drop. The
process will take 32 loops, so 10ns more per loop is a lot.

all been done, somewhere. Have you looked over the literature on the
subject?

Jon

J

#### John Larkin

Jan 1, 1970
0
I want to design what would have been a super computer in 1975, using
parts that would have been "easily" available. Several people have
done this already, just not on an insane scale. My favorite is the
Magic-1, which I'm currently designing PCBs for. It has been a GREAT
learning experience studying the schematics from the Magic-1. I never
thought much about what a computer did during the mysterious clock
cycles between instruction cycles, and now with the microcode muxes,
registers, it all just makes sense.

Any way, I need my computer to be very good at math. The magic-1 is a
16bit Add/sub/compare machine. I want to make mine capable of 32bit
operations and have a 32 bit data path.

I'm starting with 32bit multiplication because it gets 32bit addition
out of the way as well.

I threw together a diagram of a 32*32 multiplier using around 1040 AND
gates and 400 74283 Adder circuits. This schematic would be what I
think is called "asynchronous".

Once I completed my diagrams I remembered something horrible. There is
a LOT of time spent waiting for the carry-in-out propagation in the
AND gates. This comes at the added expense of power consumption board
space, but should speed things up by about 250x. Could anyone suggest
the best 1 bit addition "block" with carry? Its pretty late here and
I'm starting to loose brain function.

Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to
make a display...which I haven't finished yet... (building the drivers)

One last thought for those of you who, well, you know who you are.
Idle, what would the power consumption of some 600 odd ICs be? 20ma
per device? This just might make more heat than a pentium!

Now...for floating point...

The IBM 704 did floating point with tubes!

http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP704.html

John

C

#### Chris

Jan 1, 1970
0
Jonathan said:
What's really neat for me, I think... it's right here in my city!! So
I may hope to meet him and watch it work.

Jon

I'd like to see it in operation, too. Obviously a rather extensive
object lesson in building an ALU -- arcane, but very appropriate to his
classwork. No one will ever accuse him of not taking his job
seriously. ;-)

There are four relays for the clock -- I'm wondering if he's got a
three-phase or two phase clock.

Clock speed = 10Hz? Mirth still lives in our world -- if nothing else,
Professor Porter has a sense of humor.

Chris

L

#### logjam

Jan 1, 1970
0
If you guys want to see/hear a much bigger relay computer, look here:

http://www.relaiscomputer.de/

Click on Downloads and then listen to all of the sounds. In one of
them you can here the electric typewriter banging out the answer!

1500 relays vs 400

L

#### logjam

Jan 1, 1970
0
Well, I've drawn the 4 bit adder in my schematic program and I'm not
sure the board space consumption from a 32bit parallel carry adder are
worth the very little speed increase.

I want to make sure that I'm using parts that were available in the
late 70s. Does anyone know of a list of parts and their introduction
date? Or an old databook in PDF format?

R

#### Rich Grise

Jan 1, 1970
0
Well, I've drawn the 4 bit adder in my schematic program and I'm not
sure the board space consumption from a 32bit parallel carry adder are
worth the very little speed increase.

I want to make sure that I'm using parts that were available in the
late 70s. Does anyone know of a list of parts and their introduction
date? Or an old databook in PDF format?

Cheers!
Rich

R

#### Rich Grise

Jan 1, 1970
0
I found his web page just a month or two ago also. But the relatively
short life of a relay worries me. I found a few auctions on ebay for
3000 relays at ~$200, but I don't want to hand wire such a thing. SOMEONE needs to make a relay based web server. Maybe it only says "Hello World", which would probably take about a minute or two... When I thought about it last night I came up with a new solution. A compromise between speed and board space. This is my first experience with binary multiplication, so my first implimentation was very wasteful! I wanted to have a 64bit adder circuit with parallel carry generation, but I forgot about fan-in/out. Getting to the end of the adder circuit would require a LOT of components too. Maybe 16bit parallel by 4 serial is more reasonable. Look up "Carry Lookahead". Have Fun! Rich G #### Guest Jan 1, 1970 0 : I want to design what would have been a super computer in 1975, using : parts that would have been "easily" available. Several people have : done this already, just not on an insane scale. My favorite is the : Magic-1, which I'm currently designing PCBs for. It has been a GREAT : learning experience studying the schematics from the Magic-1. I never : thought much about what a computer did during the mysterious clock : cycles between instruction cycles, and now with the microcode muxes, : registers, it all just makes sense. : Any way, I need my computer to be very good at math. The magic-1 is a : 16bit Add/sub/compare machine. I want to make mine capable of 32bit : operations and have a 32 bit data path. : I'm starting with 32bit multiplication because it gets 32bit addition : out of the way as well. : I threw together a diagram of a 32*32 multiplier using around 1040 AND : gates and 400 74283 Adder circuits. This schematic would be what I : think is called "asynchronous". : Once I completed my diagrams I remembered something horrible. There is : a LOT of time spent waiting for the carry-in-out propagation in the : hundreds of adders which leads me to building an adder out of XOR and : AND gates. This comes at the added expense of power consumption board : space, but should speed things up by about 250x. Could anyone suggest : the best 1 bit addition "block" with carry? Its pretty late here and : I'm starting to loose brain function. : Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to : make a display...which I haven't finished yet... (building the drivers) : One last thought for those of you who, well, you know who you are. : Idle, what would the power consumption of some 600 odd ICs be? 20ma : per device? This just might make more heat than a pentium! : Now...for floating point... While some others have suggested changes like using a carry-lookahead adder for the final carry-propagate adder in the multiplier, I would suggest that you also look into Booth encoding (the two techniques can both be used.) At the expense of a little bit of extra complexity to generate the partial products, you can cut the number of rows of partial products approximately in half. As far as using a CLA for the final adder, I think that that's a good idea, because, if you are hell-bent on building this out of TTL parts,there is a 4-bit TTL carry-lookahead block (74x182?, if I recall.) So, a Booth-Array Carry-Save portion, followed by a Carry-Lookahead Carry Propagate portion, will be fast, and do-able, if you're really hell-bent on making this out of TTL parts. I also agree with the other poster that said that you really should get yourself a FPGA development platform, but whatever floats your boat! Have Fun, Joe R #### Rich Grise Jan 1, 1970 0 : I want to design what would have been a super computer in 1975, using : parts that would have been "easily" available. Several people have : Crazy? Remember, a few weeks ago I completed soldering 19,008 LEDs to : make a display...which I haven't finished yet... (building the drivers) Yup. Crazy. Nuts. Certifiable. Bonkers. Wacko. Insane. ;-P Stupid? Remains to be seen. ;-D ;-D ;-D : One last thought for those of you who, well, you know who you are. : Idle, what would the power consumption of some 600 odd ICs be? 20ma : per device? This just might make more heat than a pentium! : Now...for floating point... Get the integers going first! ;-) While some others have suggested changes like using a carry-lookahead adder for the final carry-propagate adder in the multiplier, I would suggest that you also look into Booth encoding (the two techniques can both be used.) ... [enlightening discussion of carry techniques snipped] I've drawn a schematic using Xilinx's WebPack, of a 4x4 multiplier: http://www.abiengr.com/~sysop/images/mult-4x4-sch.gif The ANDs are just plain ANDs, and the ALU-looking thingies are nothing but 1-bit full adders. And I've been procrastinating on doing enough of their tutorials to figure out how to do the guzintas[1] and the guzoutas[2] and schtuff. ;-) Cheers! Rich [1] Goes Into [2] Goes [comes] Out Of L #### logjam Jan 1, 1970 0 So, a Booth-Array Carry-Save portion, followed by a Carry-Lookahead Carry Propagate portion, will be fast, and do-able, if you're really hell-bent on making this out of TTL parts. I also agree with the other poster that said that you really should get yourself a FPGA development platform, but whatever floats your boat! Do you think I will gain anything useful while trying to use the fewest parts to generate the result fast? From what I've seen so far of some FPGA desings they're pretty wastful "because you can". L #### logjam Jan 1, 1970 0 I think I've fallen in love with 74S274 and 74S275. I don't think the 130ns for a 32 bit product includes addition, maybe it does. I wonder why these parts had to be killed off??? The parts are pretty expensive...$1.99 for a 274 and 2.99 for a 275,
but compared to a few hundred logic gates??? Using these chips I might
be able to get more than a million multiplications per second.

L

#### logjam

Jan 1, 1970
0
http://www.ece.ucsb.edu/~kastner/ece154/slides/ece154-5.pdf
http://www.ecs.umass.edu/ece/koren/arith/slides/
http://www.iwu.edu/~wzhu/classes/cs256/

I think I like the combinational method used in the 1976 databook.
Very fast, exceeds everything else I've found.

I have two choices...use parts FROM 1975 for the circuit, or make it
out of modern 283s, etc. Using the origional 74S parts I estimate 2A
power consumption.

I've found list prices at:
74S274s for 1.99, I would need 64 of them
74S275s for 2.99, (I haven't quite figured this one out yet, somewhere
around 15)

I would need 4 74F283s for every 74S274, but they're pretty cheap. In
the end it would be smarter to go with 74S274s even though 256 16 pin
devices on a PCB would look cool. I could go either way on this one.
According to the 1975 ttl book a 74S283 completes a carry in 11ns
and a 74S274 4x4 in 45ns, so I assume the 74S274 is pretty much 4
74S283 and 16 AND gates The critical speed path is carry-in-out on all
the 64 partial products is typ 5ns for all the AND gates and then 34ns

As long as the circuit would work fine with the parts from the 75 I'm
happy. I don't need to buy a bunch of NOS chips, but I want to make
sure I account for the delay of the old chips.

I'll work a little more on this and post some schematics that hopefully
one of you can look over. This is pretty exciting for me.

Replies
0
Views
666
F
Replies
1
Views
954
Tim Wescott
T
D
Replies
0
Views
1K
D
Replies
1
Views
996
Replies
44
Views
4K