
A Fast Switched Backplane for a Gigabit Switched Router
by Nick McKeown
Public comments
#1 posted on Feb 12 2008, 00:46 in collection CMU 15-744: Computer Networks -- Spring 08
I thought Virtual Output Queuing was a very good idea, and iSLIP seems to have solved the more complex scheduling problem that VOQ introduces.
One thing that wasn't clear from the paper is what types of memory are used and how much. Having a large number of queues, as in VOQ, means you have to access more memory to make a scheduling decision. This could potentially take more time and hurt performance. In this sense, there should be a limit on the number of queues, but it's not clear what that number would be.
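To make the bookkeeping concrete, here is a rough sketch of my own (not from the paper; the 16-port size is just an example) of how VOQ organizes the buffering:

    from collections import deque

    class VOQInput:
        # One input port of an N-output switch using Virtual Output Queueing.
        # Instead of a single FIFO (which suffers head-of-line blocking),
        # the input keeps a separate queue per destination output.
        def __init__(self, num_outputs):
            self.voq = [deque() for _ in range(num_outputs)]

        def enqueue(self, cell, output):
            self.voq[output].append(cell)

        def requests(self):
            # The scheduler only needs one bit per (input, output) pair:
            # "does this input hold a cell for that output?"
            return [len(q) > 0 for q in self.voq]

    # A 16x16 switch therefore tracks 16 * 16 = 256 queues in total,
    # which is exactly the bookkeeping cost worried about above.
    inputs = [VOQInput(num_outputs=16) for _ in range(16)]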
#2 posted on Feb 12 2008, 10:53 in collection CMU 15-744: Computer Networks -- Spring 08
Regarding packet size: it seems like it ends up being so much easier to work with fixed-length cells that segmenting packets into them is the right answer.

Regarding the theoretical maximum: perhaps this is just a characterization of the traffic? In other words, this theoretical router has infinite-length queues on each output link, and all received packets go into these queues immediately (as if the switch fabric were super-fast).
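To spell that out, a tiny simulation sketch of my own (not the paper's model) of such an ideal output-queued switch:

    from collections import deque

    def output_queued_step(arrivals, output_queues, line_rate=1):
        # Idealized output-queued switch: every arriving (packet, dest) pair
        # goes straight into its destination's unbounded queue in the same
        # time slot (as if the fabric were infinitely fast), and each output
        # then drains at most line_rate packets per slot.
        for pkt, dst in arrivals:
            output_queues[dst].append(pkt)
        departures = []
        for q in output_queues:
            for _ in range(line_rate):
                if q:
                    departures.append(q.popleft())
        return departures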
#3 posted on Feb 12 2008, 11:40 in collection CMU 15-744: Computer Networks -- Spring 08
This paper was easy to read and introduced some topics that I needed to understand in order to read the other paper.

VOQ (also present in the other reading) is a good idea and eliminates HOL (head-of-line) blocking, which, together with variable-length packets, seems to be one of the main causes of sub-optimal throughput.

In the paper, the author uses fixed-length cells because he concludes they are simpler to schedule. However, it would be interesting to see how these scheduling problems could be overcome in switches that handle variable-length packets directly.
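As a rough illustration of my own (not the paper's exact mechanism, and the 64-byte cell payload is just a placeholder), segmentation and reassembly inside the router would look something like this:

    CELL_PAYLOAD = 64  # bytes per cell payload; placeholder, not the GSR's value

    def segment(packet: bytes):
        # Carve a variable-length packet into fixed-size cells,
        # padding the last cell so every cell has the same length.
        cells = []
        for off in range(0, len(packet), CELL_PAYLOAD):
            chunk = packet[off:off + CELL_PAYLOAD]
            chunk += b"\x00" * (CELL_PAYLOAD - len(chunk))
            cells.append(chunk)
        return cells

    def reassemble(cells, original_length):
        # Done at the output side of the router, before the packet leaves,
        # so this is internal and unrelated to IP fragmentation.
        return b"".join(cells)[:original_length]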
#4 posted on Feb 12 2008, 12:19 in collection CMU 15-744: Computer Networks -- Spring 08
The article provides the design rationale for using crossbar switches instead of traditional buses for the backplane in routers. I found that the author clearly motivates the design decisions, in part by clearly specifying the requirements and challenges, but also by providing a nice overview of the directions taken in prior iterations of router design.
#5 posted on Feb 12 2008, 13:28 in collection CMU 15-744: Computer Networks -- Spring 08
I thought the paper was well written, with useful analogies (for example, the traffic-light analogy for scheduling). The paper makes a strong case for a crossbar-type architecture. While it is clear that this architecture allows greater performance, it also comes at increased cost (a crossbar is roughly O(n^2), where n is the number of ports). I felt that not enough space was spent on this aspect of the solution. While it was argued that this mechanism would let them handle line rates of 2.4 Gbps, it isn't clear how well it would hold up if the number of such lines increased.
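To put rough numbers on that quadratic growth (my arithmetic, not a figure from the paper): a 16-port crossbar needs 16^2 = 256 crosspoints, while a 64-port one needs 4,096, so a 4x increase in ports costs roughly a 16x increase in switching hardware.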
#6 posted on Feb 12 2008, 13:56 in collection CMU 15-744: Computer Networks -- Spring 08
I enjoyed reading this paper. It was easy to read and provided good background knowledge on the area of router architecture design.

Regarding the fixed packet size, I doubt that fragmenting variable-size packets into smaller fixed-size packets would work at the IP level. IP fragmentation sometimes breaks packet integrity (as with IPsec packets), so path MTU discovery is commonly performed to avoid fragmentation along the path.
#7 posted on Feb 12 2008, 14:19 in collection CMU 15-744: Computer Networks -- Spring 08
I totally agree with the author that a fixed packet size would make scheduling much simpler. However, if the cell size is too large, we might end up wasting a large portion of bandwidth on padding small packets. If the cell size is too small, fragmentation becomes an issue because reassembling packets gets quite complicated. Would it be possible, and still simple, to manage multiple fixed cell sizes to avoid excessive padding and fragmentation?
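A quick way to see the padding side of that trade-off (a sketch of my own, with a hypothetical 64-byte cell payload):

    import math

    def padding_overhead(packet_len, cell_payload):
        # Fraction of transmitted bytes that are padding when one packet
        # is carved into fixed-size cells.
        cells = math.ceil(packet_len / cell_payload)
        return (cells * cell_payload - packet_len) / (cells * cell_payload)

    # With a 64-byte cell payload, a 40-byte TCP ACK wastes 24/64 = 37.5%
    # of its slot, while a 1500-byte packet wastes only about 2.3%.
    print(padding_overhead(40, 64), padding_overhead(1500, 64))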
#8 posted on Feb 12 2008, 15:08 in collection CMU 15-744: Computer Networks -- Spring 08
I liked the fact that this paper seemed more like a lecture aimed at non-experts than most of the technical papers we have read, and found the pictures especially helpful.

I have a few questions that are probably obvious, but I figured I'd ask anyway. In section 4.5 on the iSLIP algorithm he refers to the "priority" of inputs and outputs. For inputs, is this the same concept of priority we saw last lecture? (He introduces "priority classes" in section 6, which made it a little confusing.) Also, I'm not sure what the priority of outputs refers to.

Also, he kept referring to "input cells" and occasionally to packets -- is a "cell" just a group of packets sent from the same place?
#9 posted on Feb 12 2008, 15:52 in collection CMU 15-744: Computer Networks -- Spring 08
This was a very informative read about router evolution and the crossbar switch architecture. It starts by explaining the reasons for the transition from shared-bus routers to routers with point-to-point links, such as the crossbar, and then continues on by explaining further details of the proposed router architecture, such as fixed-size segmentation, Virtual Output Queueing, the iSLIP crossbar-scheduling algorithm and ways to handle multicast traffic.

Since this paper was written quite some time ago and I have personally worked a little in this area (while doing my masters at the University of Crete and working at the Foundation for Research and Technology Hellas), I will provide a little more information on more recent research in this area.

When using the bufferless crossbar architecture, traffic is usually segmented into fixed-size cells to avoid the scheduling inefficiencies that arise with variable-size packets. The introduction of the "buffered crossbar" or "combined input-crosspoint queueing (CICQ)" architecture allows operating directly on variable-size packets while still offering peak performance, even without speedup. Moreover, this architecture allows for simpler and more efficient scheduling algorithms, since it does not require a central scheduler that synchronously changes the crossbar configuration for all input-output pairs.
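For intuition, here is a much-simplified sketch of my own of a CICQ datapath (fixed scan order instead of real round-robin arbiters, and a made-up crosspoint buffer depth):

    from collections import deque

    class BufferedCrossbar:
        # CICQ sketch: every crosspoint (i, j) holds a tiny buffer, so the
        # input side and the output side can each make local decisions and
        # no central scheduler has to configure the whole crossbar at once.
        def __init__(self, n, xp_depth=2):
            self.xp = [[deque() for _ in range(n)] for _ in range(n)]
            self.depth = xp_depth

        def input_step(self, i, voqs):
            # Input i forwards from the first VOQ whose crosspoint buffer
            # has room (a real design would round-robin here).
            for j, q in enumerate(voqs):
                if q and len(self.xp[i][j]) < self.depth:
                    self.xp[i][j].append(q.popleft())
                    return

        def output_step(self, j):
            # Output j drains the first non-empty crosspoint in its column
            # (again, a real design would round-robin).
            for i in range(len(self.xp)):
                if self.xp[i][j]:
                    return self.xp[i][j].popleft()
            return None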

More information on the buffered crossbar architecture can be found at http://archvlsi.ics.forth.gr/bufxbar, while the implementation of an 8x8 buffered crossbar in an FPGA that operates directly on variable-size packets is described in section IV of this paper: http://www.cs.cmu.edu/~mpapamic/research/samos_vii_2007.pdf

About the question in the first comment, I believe that the theoretical maximum presented in figure 7 corresponds to simulation results using the ideal "output-queueing" architecture, which is the best performing switch architecture. However, due to its very high implementation cost, it is usually only used as a base for comparison.
#10 posted on Feb 12 2008, 15:55 in collection CMU 15-744: Computer Networks -- Spring 08
This is a very well-written paper that gave me a good sense of what makes an efficient router.

I have a small question about the second step of the iSLIP algorithm: why does it suggest choosing a fixed, round-robin schedule of the inputs starting from the highest-priority input? Are there other variations?
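For what it's worth, here is my own simplified, single-iteration sketch of how those round-robin grant and accept pointers interact; the pointers advance only when a grant is accepted, which is what makes the different outputs desynchronize over time:

    def islip_iteration(requests, grant_ptr, accept_ptr):
        # One request-grant-accept round of (simplified) iSLIP for an N x N
        # switch. requests[i][j] is True if input i has a cell queued for
        # output j; grant_ptr[j] and accept_ptr[i] are round-robin pointers.
        N = len(requests)
        grants = {}  # input -> outputs that granted to it
        for j in range(N):
            # Grant: each output picks the requesting input that appears
            # next in fixed round-robin order, starting at its pointer.
            for k in range(N):
                i = (grant_ptr[j] + k) % N
                if requests[i][j]:
                    grants.setdefault(i, []).append(j)
                    break
        match = {}  # input -> output
        for i, outs in grants.items():
            # Accept: each input takes the granting output that appears next
            # from its own pointer. Pointers advance only when a grant is
            # accepted (and, in full iSLIP, only in the first iteration).
            for k in range(N):
                j = (accept_ptr[i] + k) % N
                if j in outs:
                    match[i] = j
                    accept_ptr[i] = (j + 1) % N
                    grant_ptr[j] = (i + 1) % N
                    break
        return match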
#11 posted on Feb 12 2008, 15:58 in collection CMU 15-744: Computer Networks -- Spring 08
I liked the level of detail this paper presented, especially the analogies that describe head-of-line blocking. It would be interesting to know a little more about the current state of the art in packet processing. Are embedded processors still connected to line cards and used to carry out the packet processing, or is everything done in hard logic to raise throughput?
#12 posted on Feb 12 2008, 16:19 in collection CMU 15-744: Computer Networks -- Spring 08
This paper is a good starting point before reading the other paper; I wish I'd read this one first. It provides a good description of the architecture of a router, especially one in the core network. It also gives good justification for why a switched backplane is preferable to a bus backplane, why VOQ and fixed-length cells are needed, and so on. As for multicast traffic, although multicast hasn't been implemented yet, I think the way the author replicates packets within the fabric is impressive.

Also, just a minor question: what is the practical method used to obtain fixed-length packets?
#13 posted on Feb 12 2008, 16:27 in collection CMU 15-744: Computer Networks -- Spring 08
I wonder if anybody ever implemented the multicast features. Supporting multicast would seem to unnecessarily burden what would otherwise be a clean design if nobody uses multicast.

Regarding the variable delay: perhaps cells that have waited longer than x time units could be given strictly higher priority than newer cells, instead of doing strict round-robin. This scheme would still avoid starvation, and it would probably even out the delays.
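A tiny sketch of that aging idea (my own, with a made-up threshold): the input scheduler serves its VOQs round-robin, but any head cell that has waited too long jumps ahead:

    AGE_LIMIT = 100  # cell times; a made-up threshold

    def choose_voq(voqs, now, rr_pointer):
        # Normally serve the VOQs round-robin, but any head-of-queue cell
        # older than AGE_LIMIT jumps ahead (oldest first), so no cell can
        # starve and the worst-case delays get evened out.
        overdue = [(q[0].arrival, idx) for idx, q in enumerate(voqs)
                   if q and now - q[0].arrival > AGE_LIMIT]
        if overdue:
            return min(overdue)[1]
        N = len(voqs)
        for k in range(N):
            idx = (rr_pointer + k) % N
            if voqs[idx]:
                return idx
        return None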

Overall, I think this was a good, clearly written article.
#14 posted on Feb 12 2008, 16:30 in collection CMU 15-744: Computer Networks -- Spring 08
I find Michael's comment quite helpful, because the author is suggesting a number of design choices for achieving optimized performance of the crossbar switch architecture. While some of them are self-contained within the router design, a router should be somewhat flexible about packet sizes other than the MTU.
#15 posted on Feb 12 2008, 16:42 in collection CMU 15-744: Computer Networks -- Spring 08
The paper was a nice tutorial on the evolution of router architectures. Though the section on multicast implementation was not absolutely necessary (as others pointed out), I guess the author couldn't resist including a section on it given the natural way in which the crossbar switch lends itself to multicast.

I do think life is made much simpler by having fixed-length cells, but as Wittawat mentioned, it also makes me wonder about the trade-offs in choosing that fixed length (probably it can be engineered).
#16 posted on Feb 12 2008, 18:28 in collection CMU 15-744: Computer Networks -- Spring 08
Regarding the SRAM question: I believe cheap switches use DRAM, and that's good enough to support 1Gbps switches, such as the ones you might have in your house. However, as we move towards faster switches (10Gbps and beyond), I believe that SRAM will be necessary to keep up. Of course, with that change, you're also likely to see smaller output buffers, since SRAM is quite expensive and also power-hungry.
#17 posted on Feb 12 2008, 23:15 in collection CMU 15-744: Computer Networks -- Spring 08
This paper is easy to read and understand. One question: what is the purpose of using VOQ? I don't quite see the advantage.
#18 posted on Feb 13 2008, 01:26 in collection CMU 15-744: Computer Networks -- Spring 08
For me, it cleared up one of the most mysterious parts about routers from class so far - the amount of computation that's actually done per data packet. We have had many discussions where we pointed out that routers can't do too much work and thus many of the (congestion) algorithms are impractical. This shows about how much work is tolerable.
#19 posted on Sep 10 2008, 02:34 in collection UW-Madison CS 740: Advanced Computer Networking -- Spring 2012
The paper propounds the superiority of a switched backplane over conventional ones, and uses the Cisco GSR example to explain the switched backplane in elegant detail. I have worked at Cisco on the GSR router and so some of what the paper covers was already known to me.

The paper initially builds up a robust case for why the earlier software-switched shared bus architectures perform worse than the hardware switched GSR employing the switch fabric. The stress on switched backplanes is particularly apt since with internet scale traffic, the forwarding engines are less of a bottleneck compared to the backplane.

The paper then goes on to describe various implementation details of the switch fabric, like fixed sized cells which reduce hardware complexity, the VOQ which assists in improving the scheduling performance, prioritization and speedup mechanisms which reduce cell delivery delays, and the iSLIP scheduling algorithm itself. The consideration that all of these need to be implemented in hardware is a guiding force in the overall design.

Finally, in-fabric multicast replication and the ESLIP algorithm are described, where, by making multiple cross-connections simultaneously in the fabric, the same cell can be "multicast" in one shot to different destinations.

I liked the paper overall. I have a small comment on the multicast part though. Eventually, every core router gets pushed onto the edge (like with the Cisco CRS supplanting the GSR in the core and pushing it to the edge). And on the edge, feature processing becomes a major concern. So essentially the ingress forwarding engine would most likely have already replicated the multicast packet (to apply features) and the fabric support for multicast risks becoming redundant.

--
Note on terminology: fabric and switch-fabric means switched backplane.
#20 posted on Jan 26 2010, 00:52 in collection UW-Madison CS 740: Advanced Computer Networking -- Spring 2012
Computer Science is "a mode of thinking," says D. Knuth. As a result, it's not a surprise to see a lot of common patterns in the ways that researchers address problems from various fields. This paper is a great example. To name a couple of connections with other fields:

1. The evolution from shared CPU/memory/bus to parallel resources is reminiscent of the shared-nothing architecture that has been favored by database people.

2. The iSLIP algorithm works very much like the Gale-Shapley algorithm for the stable marriage problem.
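To make the analogy concrete, here is a compact textbook-style Gale-Shapley sketch (my own code, assuming complete preference lists on both sides):

    def gale_shapley(proposer_prefs, receiver_prefs):
        # Stable matching. Both dicts map a party to its full preference
        # list (most preferred first); returns {receiver: proposer}.
        rank = {r: {p: i for i, p in enumerate(prefs)}
                for r, prefs in receiver_prefs.items()}
        free = list(proposer_prefs)
        next_choice = {p: 0 for p in proposer_prefs}
        engaged = {}
        while free:
            p = free.pop()
            r = proposer_prefs[p][next_choice[p]]
            next_choice[p] += 1
            if r not in engaged:
                engaged[r] = p
            elif rank[r][p] < rank[r][engaged[r]]:
                free.append(engaged[r])   # r trades up; old partner is free again
                engaged[r] = p
            else:
                free.append(p)            # r refuses; p will try its next choice
        return engaged

    # gale_shapley({'a': ['x', 'y'], 'b': ['x', 'y']},
    #              {'x': ['b', 'a'], 'y': ['a', 'b']})  ->  {'x': 'b', 'y': 'a'}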