Kolja> A very abstract answer:
In a single-layer network every input connects to n other pins.
The average wire length for each input is at least proportional to
(n*sqrt(n)); the same holds for the capacitance.
Therefore the bandwidth is inversely proportional to n^3, and the latency
is proportional to n^3.
The other extreme is a tree of 2-way switches. Each path has log(n)
switches. Each connection has 2 pins. The area is n^2 (n*log(n) is only
true for an unbounded number of routing layers). The average wire length
is n. The bandwidth is constant, the latency is proportional to log(n).
Of course the latency of the switch will have a larger constant value
than the wire. But the difference between log and n^3 is extreme, so the
break even point will be for rather small n.
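To put a rough number on that break-even point, here is a minimal sketch of the two scaling laws above. The constants (wire_const, switch_const) are made up for illustration and are not measured from any real process; only the n^3 and log(n) shapes come from the argument above.

```python
import math

def single_layer_latency(n, wire_const=1.0):
    """Single-layer network: latency modelled as wire_const * n^3."""
    return wire_const * n ** 3

def tree_latency(n, switch_const=100.0):
    """Tree of 2-way switches: latency modelled as switch_const * log2(n)."""
    return switch_const * math.log2(n)

def break_even(wire_const=1.0, switch_const=100.0, n_max=1 << 20):
    """Smallest n >= 2 at which the tree is faster than the single layer.

    Returns None if no such n exists up to n_max.
    """
    for n in range(2, n_max + 1):
        if tree_latency(n, switch_const) < single_layer_latency(n, wire_const):
            return n
    return None

if __name__ == "__main__":
    # Hypothetical ratios of switch delay to wire delay.
    for ratio in (100.0, 1000.0):
        print(ratio, break_even(1.0, ratio))
```

Under these assumed constants, even a switch that costs 1000 times as much as the unit wire delay only pushes the break-even out to n = 16.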
I see some more issues.
Some switches are entirely on a single chip. I've heard of folks
implementing these as multi-layer switches. Perhaps if something
specific about the switching pattern is known, that may make sense.
(e.g. it's a 64-bit shifter, implemented as 3 layers of 4-way muxes).
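As a rough illustration of the shifter example, here is a sketch of a 64-bit barrel rotator built from 3 layers of 4-way muxes (4^3 = 64). It assumes a rotate rather than a plain shift, and the function names are mine, not from any particular design. Each layer consumes two bits of the shift amount and picks one of four candidate rotations.

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def rotl(x, k):
    """Reference rotate-left of a 64-bit value by k positions."""
    k %= WIDTH
    return ((x << k) | (x >> (WIDTH - k))) & MASK

def mux4_layer(x, select, step):
    """One layer of 4-way muxes: every output bit chooses among four
    inputs, i.e. the word rotated by 0, step, 2*step or 3*step."""
    candidates = [rotl(x, j * step) for j in range(4)]
    return candidates[select]

def rotate64(x, amount):
    """Rotate left by `amount` using 3 mux layers with steps 1, 4, 16."""
    amount %= WIDTH
    for layer in range(3):
        step = 4 ** layer                      # 1, 4, 16
        select = (amount >> (2 * layer)) & 3   # base-4 digit of amount
        x = mux4_layer(x, select, step)
    return x
```

The fixed structure is what makes the multi-layer approach attractive here: the select lines come straight from the shift amount, so there is nothing to schedule.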
Or perhaps when latency does not matter. But when latency matters,
a full crossbar, implemented on a single chip, seems quite reasonable
to me, even for e.g. 64 16-bit ports. My reasoning is that it's not
64 64-way 16-bit muxes that are going to chew up the area, it's the
buffering and scheduling. And a full crossbar should have fewer
scheduling issues than a multi-layer switch.
Once your switch is distributed across more than one chip, you
have a very different problem. The wires between chips cost so
much more than the wires on chip ($0.02 each versus $0.00001 each)
that you can't afford to stall a board wire due to contention for a
I'm currently quite enamoured with the load-balanced switch idea.
(Previously I was enamoured with the Tiny Tera design; both
come from Nick McKeown's group at Stanford.) The nice thing about
a load-balanced switch is that the switch fabric itself can be a
shifter, or pair of shifters, which is a *lot* easier to implement. I
a load-balanced switch implemented on a single chip is an interesting
idea that may have already been implemented as part of a shared