
An Architecture for Internet Data Transfer
by Niraj Tolia, David G. Andersen, Michael Kaminsky, Swapnil Patil
Public comments
#1 posted on Apr 12 2008, 19:22 in collection CMU 15-744: Computer Networks -- Spring 08
Public Review of "An Architecture for Internet Data Transfer"
by Niraj Tolia, Michael Kaminsky, David G. Andersen, Swapnil Patil


This paper proposes a new Data-Oriented Transfer (DOT) service. Applications would be able to conduct bulk data transfer using DOT as a common intermediate layer. As new methods of data transfer (such as peer-to-peer or content distribution networks (CDNs)) become available, existing applications (such as web browsers / web servers, email client/servers) would be able to easily take advantage of these new methods.

As an example, a DOT-capable web browser talking to a DOT-capable server would be able to download the contents of a large file, such as an embedded Flash video, using BitTorrent rather than using HTTP. Also, the local DOT service would be able to cache data items (identified by a hash code) and supply the data from the cache instead of downloading the entire file again. For example, if a certain 10-MB animated GIF becomes popular, and users of various email systems keep spreading it by emailing it to their friends, then the email servers won't need to physically transfer all 10 MBs each time, because they would be able to retrieve the file from their local cache instead of asking the remote server to transfer it again.
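The caching behavior described here is essentially a content-addressed store keyed by a hash of the data. A minimal sketch of the idea (the class, method names, and choice of SHA-256 are illustrative, not the paper's actual interface):

```python
import hashlib

class ContentCache:
    """Toy content-addressed cache: data is keyed by a hash of its own bytes."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def oid(data: bytes) -> str:
        # The paper identifies objects by a cryptographic hash of their
        # contents; SHA-256 here is just an illustrative choice.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        key = self.oid(data)
        self._store[key] = data
        return key

    def get(self, key):
        # A hit means the receiver never re-downloads the bytes.
        return self._store.get(key)

cache = ContentCache()
gif = b"...10 MB animated GIF..."
key = cache.put(gif)
assert cache.get(key) == gif          # second email: served from local cache
assert cache.get("unknown") is None   # miss: fall back to a network transfer
```

Because the key is derived from the content itself, every application that emails, downloads, or mirrors the same bytes computes the same key, which is what makes a cross-application cache possible.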

The authors of this paper implemented a DOT system and tested its performance. Benchmarks show that the prototype performs well and imposes minimal overhead. They also incorporated DOT into a production mail server and, by exploiting redundancy between messages, were able to reduce the amount of transferred data by 20%.

Possible topics for discussion:

  • What practical barriers stand in the way of adoption of a DOT-like system, and how difficult would it be to overcome these barriers?
  • Is the DOT protocol sufficiently flexible to efficiently accommodate likely future transfer methods?
  • Might the system introduce too much latency to be used in web browsers?

#2 posted on Apr 12 2008, 23:23 in collection CMU 15-744: Computer Networks -- Spring 08
The idea of having a transfer service that works for all applications would certainly create a favorable environment for innovative transfer techniques. What's more interesting is that we can even save bandwidth by having a universal cache for all applications.
#3 posted on Apr 13 2008, 09:46 in collection CMU 15-744: Computer Networks -- Spring 08
DOT decouples the application logic from the actual process of transferring data, which is important for the development of new services. For such new services, and since the overhead is not very significant, there is no reason why the architecture would be difficult to adopt.
In the case of old services, if for some reason their transfer method changes significantly, they would have to be reimplemented anyway (and the new implementation could use DOT).

The paper shows that, using their multi-path plugin, DOT can improve the performance of transfers. By allowing applications to take advantage of transfer plugins that may not even have existed when they were developed, DOT creates a level of modularity that makes applications more flexible and more durable.
#4 posted on Apr 13 2008, 12:10 in collection CMU 15-744: Computer Networks -- Spring 08
I thought of this architecture and its modularity as really the fulfillment of the "end to end" ethos which we read about at the beginning of the course. Such a system provides a way to separate an application from its underlying structure, all made possible through DHTs.
#5 posted on Apr 13 2008, 12:40 in collection CMU 15-744: Computer Networks -- Spring 08
Overall, I found that the ideas in this paper generally made sense. One small thing that confused me: the original motivation claims that DOT helps avoid re-implementing transfer mechanisms, while later in the discussion, falling back to the application's native mechanism is recommended when DOT doesn't work. Another thought: I really like the idea of a shared content distribution system that is tapped into by multiple applications, especially since it allows for content-based caching. This makes sense with so many media applications out there serving redundant content.
#6 posted on Apr 13 2008, 13:54 in collection CMU 15-744: Computer Networks -- Spring 08
This is a nice abstraction I'd never thought about. I'm glad they show that the overhead is not too great, since that would have been my greatest concern. It doesn't seem like such a system would be difficult to deploy, especially if "small objects" are allowed to be sent directly, as they mention in Section 7.
#7 posted on Apr 13 2008, 14:00 in collection CMU 15-744: Computer Networks -- Spring 08
I think this idea could be applied to the DTN architecture for "challenged Internets" by providing naming interfaces and transfer plugins for those networks. It is also interesting that they include a plugin for physical delivery through portable storage media.
#8 posted on Apr 13 2008, 14:13 in collection CMU 15-744: Computer Networks -- Spring 08
#4: I think it depends on what the service is offering. A lot of networking researchers use the Click Modular Router software for real-world implementations of new routing protocols, sampling techniques, etc. As far as I know, it's used more as a proof of concept for evaluating the overhead of various schemes than as a mechanism for real-world deployment. While we design modular architectures for future innovation, these architectures seem to be used very rarely, sadly.

My initial hope was that with the open source movement, companies would begin to embrace the quality and innovation of certain open source software, especially with something like XORP (open source router software). This could eliminate dependencies on Cisco/Juniper to implement something that the open source community could do instead. On the other hand, as we know, many things are actually implemented in Cisco/Juniper routers, and so the main difficulty still is in getting real world networks to enable them.

#6: If you haven't seen this already, SET uses DOT as a substrate for implementing similarity-enhanced transfer for P2P applications. See the paper here: http://www.cs.cmu.edu/~dga/papers/nsdi2007-set-abstract.html
#9 posted on Apr 13 2008, 15:10 in collection CMU 15-744: Computer Networks -- Spring 08
I thought the mail case study was very interesting (and definitely an important domain). DOT seemed to perform very well, saving 20% of message bytes in several ways. I really liked the use of diagrams to isolate the different factors -- in particular showing the heavy-tailed distribution of mail message size, and the histogram of the number of duplicates. I also found the results on the multi-path plugin in section 6.2 very impressive -- achieving a savings of 45-50% on several of the links.
#10 posted on Apr 13 2008, 15:48 in collection CMU 15-744: Computer Networks -- Spring 08
Overall I liked this paper. The modular design and the notion of plug-ins for decoupling control and data transfers makes a lot of sense. Initially, I was hesitant as to why DOT would really be useful, but I think the authors did a good job of presenting the motivation behind it. I was also expecting to see larger overheads when using the DOT framework, but it seems that – especially for bulky transfers – the obtained performance is on par with traditional transfer techniques, which was quite impressive. It was nice to see that the authors created a socket-like stub library for easier porting of existing applications. Along with the graceful fall-back mechanisms of DOT (e.g. fall back to normal SMTP communication if the local GTC is irresponsive) I think this makes the deployment of DOT much more appealing. Finally, I think that the proof-of-concept DOT-based mail server implementation was one of the strongest points in the paper. Not only did it prove that incorporating DOT in large scale applications is relatively easy, but it also showed that this leads to significant bandwidth savings.
#11 posted on Apr 13 2008, 16:40 in collection CMU 15-744: Computer Networks -- Spring 08
The paper is well organized. It proposed the architecture for separating data transfer from applications; i.e. DOT. The authors addressed both design and implementation aspects of the proposed architecture. As it is stated that the main goal of the architecture is to "facilitate innovation without impeding performance", it is shown in evaluation section that DOT can reduce bandwidth use, and make new functionality such as multi-path or portable storage-based transfer available while imposing little overhead.
#12 posted on Apr 13 2008, 16:40 in collection CMU 15-744: Computer Networks -- Spring 08
The paper presents an innovation in data transfer based on the idea of decoupling content negotiation (selecting the content) from content transfer (transmitting the data) to make it extensible. The experiments show that the overhead is quite low and that the performance benefits come from caching. But I suspect these features could also help malicious users create and spread spam.
#13 posted on Apr 13 2008, 20:14 in collection CMU 15-744: Computer Networks -- Spring 08
I am working on the DOT project. I was really inspired by the idea of DOT when I first read this paper. For me it is like a virtual layer in which every piece of content is associated with an address known as an Object ID (OID) instead of a real URL like http://128.1.1.1/files/foo.bar. With this OID, data can be obtained from multiple sources, so users may have more potential download sources. This idea is similar to i3, where each virtual ID corresponds to multiple hosts. It seems this delegation scheme is playing an important role now.
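The OID-to-many-sources delegation described in #13 can be sketched as a simple registry; the names below are hypothetical, not the actual DOT RPC interface:

```python
import hashlib

# One OID, many potential sources: the receiver can pull the object (or
# chunks of it) from whichever registered source responds best.
sources = {}

def register(data: bytes, host: str) -> str:
    """Register a host as a source for the object's content-derived OID."""
    oid = hashlib.sha256(data).hexdigest()
    sources.setdefault(oid, set()).add(host)
    return oid

oid = register(b"foo.bar contents", "128.1.1.1")
register(b"foo.bar contents", "mirror.example.org")
assert sources[oid] == {"128.1.1.1", "mirror.example.org"}
```

The i3 analogy holds in that the virtual identifier (the OID) is stable while the set of hosts behind it can grow or shrink freely.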
#14 posted on Oct 06 2008, 00:59 in collection UW-Madison CS 740: Advanced Computer Networking -- Spring 2012
DOT, the Data-Oriented Transfer service, is a flexible transfer service for use by other applications. Much like FTP, connections that use DOT have a control channel and a data channel, except that the data channel is managed entirely by DOT. As I was reading this, I had a few moments where I thought, 'aha, but what about ...?' In the end, all of my lingering questions were answered. This paper gives a very clear and complete description of the authors' concept.

I noticed a peculiarity in Table 2 that the authors did not discuss. When a link is bonded with another slower link (except row 2), the efficiency increases. For example, the fourth row shows a 10% increase in the theoretical combined bandwidth. The effective increase in bandwidth is 21.45/13.58-1 = 58%. Perhaps slower connections are more efficient?
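For what it's worth, the arithmetic above checks out (figures copied from the comment, not re-measured from the paper):

```python
# Effective bandwidth gain when bonding the two links in row 4 of Table 2
single = 13.58   # Mbit/s on the faster link alone
bonded = 21.45   # Mbit/s with the multi-path plugin
gain = bonded / single - 1
print(f"{gain:.0%}")  # → 58%
```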

It would have been nice to see some benchmarks on compression pipelining. This isn't applicable to FTP, but it is for emails and web content, which are mostly text. gzip would work well, and it would be interesting to see how the network load would change.

The only apparent problem is that the DOT scheme is similar to that of FTP. The data and control channels are on separate connections, so it suffers from the same jankiness. Namely, it doesn't work well with firewalls. The problem of separate control & data channels is probably well enough understood, though, and the same techniques could surely be used.
#15 posted on Oct 06 2008, 03:10 in collection UW-Madison CS 740: Advanced Computer Networking -- Spring 2012
This paper is motivated by the fact that many existing network applications entangle control and data transmission into one bundle. As a result, any new approach to data transfer needs to be hacked in on a per-application basis to fit each particular application.

The essence of the paper is decoupling the data transfer from the control exchange, and providing a service (DOT) to which applications can outsource data transfer.

Having this common service that transfers data on behalf of diverse applications has many advantages, as the paper mentions. One no longer needs to reimplement each novel data transfer mechanism for every existing application; a new plugin implementing the algorithm can be introduced for DOT once and for all. Also, things like finding and fetching data from mirror sources, compression, etc., become feasible without modifying the application itself.

On a side note, the authors have done a neat job of arresting the interest of the reader by dangling the possibility of stuff like delivering email over bittorrent in front of them. And DOT does in fact make that seem reasonably feasible.

However, there are some issues which I feel will be road blocks to widespread deployment of DOT.

1. Given the receiver-pull model of DOT, and separate connections for control and data, two-way connectivity is needed in many instances (though not all) for DOT to work. The paper does not propose any mechanism to circumvent NATs and similar obstacles.

2. Although there is mention of fetching the object from a mirror rather than the original sender, I see no mention of how DOT proposes to identify mirrors.

3. DOT cannot handle streaming content such as voice. From the paper, DOT does not seem to have any mechanism in place to support quality of service.

4. The possibility of the receiver obtaining the object from a mirror rather than the actual sender introduces another problem: how long does the sender cache the data? The solution proposed by the paper seems memory intensive, and given that DOT is designed for bulk data transfer, this raises a serious scalability concern.

5. The paper uses a hash of the data as the primary object identifier. However, there's no mention of how hash collisions are addressed.

Despite all those questions, I have to say that the paper proposes a neat and novel concept. It boils down to a data transfer service, perhaps provided at the OS level, which user processes can use for bulk data transfer over the network while remaining blissfully ignorant of how the transfer is actually carried out. This separation allows for a great deal of global optimization.