
A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing
by Sally Floyd, Van Jacobson, Ching-Gung Liu, Lixia Zhang, Steven McCanne
Public comments
#1 posted on Apr 14 2008, 18:27 in collection CMU 15-744: Computer Networks -- Spring 08
This paper describes a framework for developing multicast communication that is flexible enough to adapt to the requirements of different applications. The authors claim that reliable multicast delivery that mimics TCP or enforces fixed restrictions (delivering messages in order, for instance) is not necessarily desirable. Instead, they propose a framework in which the receivers detect missing data and repair the losses by requesting retransmission either from the original sender or from any other node in the group. This relieves the sender of the task of checking for losses and performing retransmissions, which would prevent multicast delivery from scaling well.

This framework was initially applied to the development of Wb, a "network conferencing tool". Wb allows its users to draw on "pages" concurrently. Those changes are identified by the ID of the user responsible for them as well as a sequence number that identifies when the change was made. If a user detects a gap in this sequence, it will try to send a multicast request to the other users.
One interesting feature of this scheme is that if the network becomes partitioned, users can still create new pages, and those pages remain valid after the network recovers (because page IDs are created in the same fashion as message IDs).
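
As a rough illustration of the gap detection described above (a sketch, not wb's actual code): since each change is named by a user ID plus a sequence number, a receiver only needs to remember the highest number seen per user to notice a hole. The class and method names are made up for this example, and sequence numbers are assumed to start at 1.

```python
from collections import defaultdict


class LossDetector:
    """Tracks what has arrived from each source and reports sequence gaps.

    Hypothetical helper for illustration; assumes sequence numbers start at 1.
    """

    def __init__(self):
        self.received = defaultdict(set)  # source_id -> sequence numbers seen
        self.highest = {}                 # source_id -> highest sequence number seen

    def on_data(self, source_id, seq):
        """Record an arrival and return the sequence numbers newly found missing."""
        prev = self.highest.get(source_id, 0)
        self.received[source_id].add(seq)
        if seq <= prev:
            # A late packet or a repair: it fills a hole rather than opening one.
            return []
        self.highest[source_id] = seq
        return [s for s in range(prev + 1, seq) if s not in self.received[source_id]]

    def missing(self, source_id):
        """Return every sequence number still outstanding for this source."""
        top = self.highest.get(source_id, 0)
        return [s for s in range(1, top + 1) if s not in self.received[source_id]]


detector = LossDetector()
detector.on_data("alice", 1)
gaps = detector.on_data("alice", 4)   # items 2 and 3 never arrived
detector.on_data("alice", 2)          # a repair for item 2 shows up later
```

Here `gaps` is `[2, 3]`, and after the repair `detector.missing("alice")` is `[3]`. In the framework, each entry in `gaps` would trigger a (delayed) multicast request.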

One of the most important aspects of the proposed framework is the ability to retrieve lost data from the network. Basically, a node that detects a loss sends a multicast request, and any node that can answer the request sends a multicast repair. Any node that hears a request but cannot answer it will instead back off from sending a request itself. This allows nodes to recover lost data quickly and also prevents a request implosion. For example, in networks where nodes are not all at the same distance from a given source of data, the ones closer to the source will detect the loss sooner and send a request, preventing other users from resending the request.
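
To make the suppression idea concrete, here is a small simulation sketch. It assumes each node waits a randomized time proportional to its distance from the source before requesting, and stays quiet if it hears someone else's request first. The function name, the `pair_delay` callback, and the constants `c1` and `c2` are assumptions chosen for the example, not values taken from the paper. A suppressed node would really re-arm a longer timer and wait for the repair; that part is left out.

```python
import random


def simulate_requests(dist_to_source, pair_delay, c1=2.0, c2=2.0, rng=None):
    """Return the nodes that end up multicasting a request for one lost packet.

    dist_to_source: node -> one-way delay from the source (assumed known).
    pair_delay:     function (a, b) -> one-way delay between two nodes.
    c1, c2:         hypothetical timer constants; the wait is drawn uniformly
                    from [c1 * d, (c1 + c2) * d] for a node at distance d.
    """
    rng = rng or random.Random()
    # Assume a node notices the loss after one source-to-node delay, then
    # starts its request timer.
    fire = {
        node: d + rng.uniform(c1 * d, (c1 + c2) * d)
        for node, d in dist_to_source.items()
    }
    senders = []
    # Visit nodes in order of timer expiry; only earlier senders can suppress.
    for node in sorted(fire, key=fire.get):
        heard = any(fire[s] + pair_delay(s, node) < fire[node] for s in senders)
        if not heard:
            senders.append(node)
    return senders


# Chain topology: the delay between two nodes is the difference in distance.
chain = {"a": 1.0, "b": 5.0, "c": 10.0}
chain_senders = simulate_requests(
    chain, lambda x, y: abs(chain[x] - chain[y]), c1=2.0, c2=0.0
)
```

With `c2=0.0` the timers are deterministic, and `chain_senders` is `["a"]`: the node closest to the source fires first and its request reaches the others before their timers expire. When nodes are equidistant from the source, the random part of the timer (`c2 > 0`) is what keeps them from all firing at once.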

The authors conclude that the performance of the framework depends on several parameters, and so they developed an algorithm that automatically adapts those parameters to reflect the network conditions. Their simulations show that this approach decreases the number of retransmissions needed without significantly increasing delay.
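
A toy version of this kind of feedback loop might look like the following. This is only a guess at the general shape, not the authors' algorithm: the function name, the thresholds, and the step size are all hypothetical. The idea is to widen the random request-timer interval when too many duplicate requests are observed, and to narrow it when duplicates are rare but recovery is slow.

```python
def adapt_timer_width(c2, avg_duplicates, avg_delay,
                      dup_target=1.0, delay_target=1.0,
                      step=0.5, c2_min=1.0, c2_max=10.0):
    """Return an updated timer-width parameter (all constants are hypothetical).

    avg_duplicates: running average of duplicate requests seen per loss.
    avg_delay:      running average of request delay, in units of round-trip time.
    """
    if avg_duplicates > dup_target:
        c2 += step   # too many duplicates: spread the timers out
    elif avg_delay > delay_target:
        c2 -= step   # few duplicates but slow recovery: tighten the timers
    # Clamp so the parameter stays in a sane range.
    return max(c2_min, min(c2_max, c2))
```

For example, `adapt_timer_width(2.0, avg_duplicates=3.0, avg_delay=0.5)` returns `2.5`, while `adapt_timer_width(2.0, avg_duplicates=0.5, avg_delay=2.0)` returns `1.5`.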


Overall, the framework appears to work well, but most of the performance improvement seems to come from the reduced number of requests/repairs rather than from reduced delay. One of the major contributions of the paper is to show how to implement reliability without using point-to-point data retransmission.

problems:
There is still work to do on congestion control.

If the routing of the message is not done correctly, sending multicast request/repair every time some data is lost may cause more delay.

Since in their simulations request/repair packets are not lost, the results may not be very accurate.

In particular, one of the advantages of this framework is the ability to suppress unnecessary requests/repairs (due to the arrival of other requests/repairs), but how good would the performance be in the presence of congestion and lost requests/repairs?
#2 posted on Apr 14 2008, 22:17 in collection CMU 15-744: Computer Networks -- Spring 08
Miguel's review provides a nice overview. I'll take this opportunity as a second responder to be a little critical and try to kick off some more discussion.

The paper purports to advance a "framework", but it was almost exclusively about tweaking parameters and optimizing performance for wb. As Miguel notes, wb is an application where every user is both a sender and a receiver, which is a model that isn't appropriate for every user application (to be fair, the authors note this).

My real issue with this paper is intrinsic: at a certain level of abstraction, you're not really doing anything at all, and I think this framework comes pretty close to reaching that level. There is limited discussion of applications other than wb. The problem is that the application developer will have to be responsible for setting up all of these parameters, tweaking performance, etc. One of the great things about, say, TCP is that as an application developer you don't have to mess with it: just set it up and use it. In contrast, it seems that this framework doesn't really help a developer at all. I'm also concerned that the approach used in this paper may not be appropriate for other settings. So many of the tweaks appear to be particular to the wb application that the general applicability of the framework comes into question.

At the same time, Miguel is entirely correct - as a proof-of-concept that you can provide reliability in a fully distributed context, this paper is elegant and effective.

What did other people think about the focus on wb? How difficult or applicable do you find this framework to be? How easy do you think it would be to "port" this framework on to other needs and applications?
#3 posted on Apr 14 2008, 22:36 in collection CMU 15-744: Computer Networks -- Spring 08
It's cool that the framework was implemented for a real-ish application, though the request/repair mechanism which seems to be the main idea of the paper is not included... I still liked the simple analysis and simulation on the toy topologies, though I wish they had labeled the axes on the plots (unless it's a product of converting the ps to pdf somehow?!).
#4 posted on Apr 15 2008, 13:45 in collection CMU 15-744: Computer Networks -- Spring 08
As Abe mentioned, the framework given in this paper is tailored to a specific application, wb. Even though the authors argue that the framework can be applied to other network applications like BGP and web caching, I don't see how to tune the framework given for wb to those applications. It'd have been better if they had defined more specific requirements and interfaces that should be provided for developers.
#5 posted on Apr 15 2008, 14:13 in collection CMU 15-744: Computer Networks -- Spring 08
I'm unsure about the premises behind this article. Do the authors assume that best-effort (unreliable) IP multicast will be implemented at the router level? If so, then the paper describes a nice technique for scalable error recovery, but by itself, it would be of little relevance today since router-level IP multicast isn't going to be implemented anytime soon. If the paper doesn't assume router-level IP multicast, then it seems the paper omits discussion of how data can be multicast scalably in the first place.

In either case, it seems that a scalable method of doing best-effort multicasting is needed. Coupled with such a system, the recovery methods described here would yield a reliable mechanism for multicast.
#6 posted on Apr 15 2008, 14:55 in collection CMU 15-744: Computer Networks -- Spring 08
I agree with the opinion that it would be difficult to tailor the proposed approach to suit applications other than wb, although their underlying assumption is that there is hardly a single general solution for multicast protocol design. Its adaptive loss recovery scheme is still interesting in that it effectively suppresses flooding of redundant request/repair messages.
#7 posted on Apr 15 2008, 16:12 in collection CMU 15-744: Computer Networks -- Spring 08
The paper proposes a framework for Scalable Reliable Multicast (SRM). It points out why applying a unicast approach to multicast delivery would be problematic, and then suggests how to deal with those problems, such as using receiver-based reliability and naming data in application data units. The paper then describes the proposed framework in more detail, using a network conferencing tool called wb as an example.

Since this paper is from 1995, and someone mentioned that IP multicast is not likely to be implemented soon, what are the main obstacles to implementing IP multicast?
#8 posted on Apr 15 2008, 16:22 in collection CMU 15-744: Computer Networks -- Spring 08
In reply to the previous post, the End-System Multicast paper (http://www.cs.cmu.edu/~hzhang/papers/sigmetrics-2000.ps.gz), which was published around 5 years after this one, looks at some of the issues with IP-level multicast and suggests an E2E alternative.
#9 posted on Apr 15 2008, 16:32 in collection CMU 15-744: Computer Networks -- Spring 08
It would have been nice to see some discussion of where all of the magic parameters/initial values/min and max values come from for the experiments in section 6.
#10 posted on Apr 15 2008, 16:41 in collection CMU 15-744: Computer Networks -- Spring 08
An important feature of the wb application is that each member is willing and able to contribute to recovery when there is a loss. But as Dongsu pointed out, in general, SRM is an efficient solution to multicast reliability. As for the parameter tuning, it is bound to be important for a lot of algorithms/applications to work (in practice). Actually, I don't have a very strong opinion on that.
#11 posted on Apr 15 2008, 16:47 in collection CMU 15-744: Computer Networks -- Spring 08
I personally thought the paper could have been stronger by motivating general reliable multicast principles and applying them specifically to the wb application---rather than the other way around. Trying to "extract" generality from just 1 application doesn't seem enough; how do their assumptions hold up against future applications that employ reliable multicast?
#12 posted on Apr 15 2008, 16:47 in collection CMU 15-744: Computer Networks -- Spring 08
This paper starts off by giving a rough overview of the problems associated with multicast and why the approaches used in unicast solutions are no longer effective. The arguments presented make sense, and I found this introductory part of the paper to be very interesting and educational. The rest of the paper describes the whiteboard (wb) application and goes on to present various multicast techniques that improve the performance of wb.

My main problem with this paper was that although the authors claim that their ideas are applicable to a “wide variety of other applications”, in this paper they mostly focus on a specific application, namely wb. It would be interesting if they applied the same techniques to other applications that have different multicast requirements than wb (especially since they claim that multicast applications are very diverse); this would also make their ideas more convincing.
#13 posted on Apr 15 2008, 16:50 in collection CMU 15-744: Computer Networks -- Spring 08
I find the underlying idea for the article interesting: in previous research projects, two of the authors of the current article implemented wb, an interactive whiteboard application built on top of multicast. This article looks back on the project, and tries to extract the components of the project that could stand alone as a more-or-less general framework for building other applications using scalable reliable multicast.

However, like previous commenters, I find the evidence presented by the authors for the suitability of their approach to other problems to be quite light. To pick one in particular, I would have been interested in seeing data to back up their claim that BGP could be improved by applying their algorithm "perhaps with some minor adjustments". Maybe this type of work appeared in follow-up publications?
#14 posted on Apr 15 2008, 16:54 in collection CMU 15-744: Computer Networks -- Spring 08
I really think it is an interesting paper. The proposed request/repair scheme is based on IP multicast (which, as we know, went nowhere). Under this assumption, the authors propose a 'clever' way to back off in order to prevent request and response implosion. I put the word clever in quotes because after thinking for a while, I realized the proposed mechanism may be the only reasonable approach I can think of. Or perhaps they simply give a reasonable solution.

It is interesting to compare this scheme to the most popular receiver-driven application-level multicast. I think the latter approach has request implosion (it broadcasts the bitmap information, which can be seen as the request for the missing pieces). However, it is much more popular and the overhead has proved to be acceptable. Thus I think we have reasons to revisit the motivation and basic assumptions of this paper.