
Strategies for Sound Internet Measurement
by Vern Paxson
Public comments
#1 posted on Apr 03 2008, 00:13 in collection CMU 15-744: Computer Networks -- Spring 08
Summary:
Internet, traffic, and routing measurements are difficult and error prone.
Errors may occur as: 1) imprecision (e.g., the timing clock on most machines has a precision of only 500 ms or 10 ms); 2) meta-data problems, such as the annotation of web logs; 3) inaccuracy, such as problems with packet filters and, again, with timing; 4) misconception, e.g., confusing TCP loss rate with retransmission rate (sketched below).
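The loss-rate vs. retransmission-rate confusion is easy to see with a small numeric sketch (my own illustration, not from the paper): a sender may retransmit segments that were never actually lost, e.g., after a spurious timeout, so the observed retransmission rate overstates the true loss rate.

```python
# Hypothetical numbers, purely for illustration.
def loss_vs_retx(data_segments, lost_segments, spurious_retx):
    # Retransmissions seen on the wire: one per genuine loss, plus
    # retransmissions of segments that were never lost (spurious timeouts).
    retransmissions = lost_segments + spurious_retx
    total_sent = data_segments + retransmissions
    loss_rate = lost_segments / data_segments
    retx_rate = retransmissions / total_sent
    return loss_rate, retx_rate

loss, retx = loss_vs_retx(data_segments=10_000,
                          lost_segments=100,    # 1% of data actually lost
                          spurious_retx=100)    # unnecessary retransmissions
print(f"loss rate: {loss:.2%}, retransmission rate: {retx:.2%}")
# -> loss rate: 1.00%, retransmission rate: 1.96% (nearly double)
```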

To detect such errors, the author discusses several strategies, including:
1) examining outliers and spikes, e.g., a traffic rate so high it would imply faster-than-light transmission;
2) self-consistency checks (a minimal sketch follows this list);
3) comparing multiple measurements and verifying them against each other; we can also perform multiple versions of the analysis;
4) testing against synthetic data.
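To make the first two strategies concrete, here is a minimal sanity-check sketch (my own, not from the paper); the trace format, a list of (timestamp, packet-length) pairs, and the link capacity are assumptions:

```python
# Hypothetical sanity checks over a packet trace: timestamps must not go
# backwards, and binned throughput must not exceed the link capacity
# (the networking equivalent of "faster than light").
def sanity_check(trace, link_capacity_bps, interval=1.0):
    problems = []

    # Self-consistency: timestamps should be non-decreasing.
    for (t_prev, _), (t_cur, _) in zip(trace, trace[1:]):
        if t_cur < t_prev:
            problems.append(f"timestamp went backwards at t={t_cur}")

    # Outlier check: per-interval throughput vs. nominal link capacity.
    bins = {}
    for ts, length in trace:
        bins[int(ts // interval)] = bins.get(int(ts // interval), 0) + length
    for b, nbytes in sorted(bins.items()):
        bps = 8 * nbytes / interval
        if bps > link_capacity_bps:
            problems.append(f"interval {b}: {bps:.0f} bps exceeds link capacity")
    return problems

# Toy trace on a nominal 10 Mb/s link; the third timestamp is out of order.
trace = [(0.000, 1500), (0.001, 1500), (0.0005, 1500)]
for warning in sanity_check(trace, link_capacity_bps=10_000_000):
    print("WARNING:", warning)
```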

The author also discusses the issue of large data volumes, suggesting an initial analysis on a small subset before processing the whole dataset (a trivial workflow sketch follows).
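A trivial way to follow that advice (my own sketch; the file name and record parsing are placeholders) is to debug the analysis on a small random sample of the trace, then rerun the identical code over everything:

```python
import random

def records(path, sample_rate=1.0, seed=42):
    # Fixed seed so the sample, and thus the debugging run, is reproducible.
    rng = random.Random(seed)
    with open(path) as f:
        for line in f:
            if rng.random() <= sample_rate:
                yield line.split()   # placeholder parsing

def analyze(recs):
    # ... the real analysis would go here ...
    return sum(1 for _ in recs)

print("1% sample pass:", analyze(records("trace.log", sample_rate=0.01)))
print("full pass:     ", analyze(records("trace.log")))
```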
Vern also comments on the apparent tension between deadline-driven research and reproducibility. He again discusses the importance of making datasets publicly available, not only the data itself but also the tools used to process it.

This paper is a kind of philosophical reflection on measuring the Internet. Since measurement matters in many other fields, whether in computer science or in physics, chemistry, and biology, the same arguments and strategies deserve attention well beyond their current scope. For example, in data mining or machine learning, how much attention is paid to the quality and accuracy of the data, versus to the fancy model?
#2 posted on Apr 03 2008, 04:26 in collection CMU 15-744: Computer Networks -- Spring 08
I thought the paper was well written and discussed important problems that often may be overlooked during deadline-driven research. There were many relevant examples described relating to networks that may not have been obvious to researchers previously. But that being said, I think pretty much all of the problems discussed are just common sense -- precision, accuracy, significant digits, outliers, error-checking, large datasets, etc. I think this would be a great paper to give to undergrads or new grad students starting to learn about research in networks and other experimental fields.
#3 posted on Apr 03 2008, 14:40 in collection CMU 15-744: Computer Networks -- Spring 08
I thought this was an insightful paper that summarized a large number of problems that system researchers often face. The author frequently mentions the "packet filter drop" problem, which is when instrumentation introduces enough overhead to artificially cause packet drops.
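One concrete guard against this (my own sketch, not something the paper prescribes) is to always record and check the capture tool's own drop counters rather than silently discarding them; for example, tcpdump prints a "packets dropped by kernel" summary on exit, which a measurement script can parse:

```python
import re
import subprocess

# Hypothetical capture: interface name, packet count, and output file are
# placeholders; running tcpdump typically requires root privileges.
cmd = ["tcpdump", "-i", "eth0", "-c", "100000", "-w", "capture.pcap"]
proc = subprocess.run(cmd, capture_output=True, text=True)

# tcpdump's exit summary (on stderr) includes "N packets dropped by kernel".
match = re.search(r"(\d+) packets dropped by kernel", proc.stderr)

if match is None:
    print("no drop counter found -- treat the trace as suspect")
elif int(match.group(1)) > 0:
    print(f"packet filter dropped {match.group(1)} packets -- trace incomplete")
else:
    print("no kernel drops reported")
```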

I wonder if FPGA-based NIC devices such as http://www.cs.rice.edu/CS/Architecture/ricenic/index.html could be used to implement detailed instrumentation without incurring slowdown (since instrumentation such as filtering can be done in parallel in hardware).
#4 posted on Apr 03 2008, 15:26 in collection CMU 15-744: Computer Networks -- Spring 08
This paper was fun to read, especially the section on misconception. It's important to check one's assumptions when performing measurements, especially with all the layering, caching, and indirection that networks imply. The reproducible-analysis section also brings up a point that applies to almost any evaluation (not just in networks).
#5 posted on Apr 03 2008, 15:26 in collection CMU 15-744: Computer Networks -- Spring 08
This paper lists almost every point that is important and difficult in measuring network data. However, even though the author states each problem without missing any important points, the solutions given for each problem are not detailed. (They seem correct and visionary, though.)
#6 posted on Apr 03 2008, 15:57 in collection CMU 15-744: Computer Networks -- Spring 08
I rather enjoyed this paper; it is well written and exposes quite a few "gotchas" in network measurement studies. When using a custom implementation (either in simulation or in real use), I think it is especially important to have transparency and to test it thoroughly to ensure that it does what you think it does. A simple software bug that lies silently undetected can produce plausible-looking but completely unsound data!
#7 posted on Apr 03 2008, 16:14 in collection CMU 15-744: Computer Networks -- Spring 08
Like many people said, this paper is useful not only for Internet measurement but for any field of research that needs data analysis. When I read the paper without noticing the year it was published, I thought it was one of the classical papers in the area; the fact that it was published in 2004 surprised me. One interesting point I learned, from Section 3 (Dealing with Large Volumes of Data), is that sometimes "too good" data can turn out not to be useful either.
#8 posted on Apr 03 2008, 16:19 in collection CMU 15-744: Computer Networks -- Spring 08
This was a very nicely written and interesting paper. Although the presented measurement strategies are given in the context of network-related experiments, the same guidelines apply to a variety of other research areas. It also made me think about many of the previous papers we read from a different perspective. For example, after reading about the "vantage point" problem, it makes much more sense why the authors of the "DNS Caching" paper used two completely different trace datasets (one from MIT in Boston and one from KAIST in Korea). Moreover, the abundant use of real examples makes the differentiation among the various common measurement mistakes much clearer. In particular, I liked the explanation of the distinction between the notions of precision and accuracy.
#9 posted on Apr 03 2008, 21:22 in collection CMU 15-744: Computer Networks -- Spring 08
This is a paper with ZERO mathematical equations. I think it is more like a collection of suggestions and tips. I don't have much experience with real network measurement, but I think I learned a lot from this paper.

I was kind of surprised by the statement in the paper that "large datasets almost never have statistically exact descriptions". I am not sure about this, since I have seen so many distributions gathered from very large datasets.
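For what it's worth, the paper's claim is about formal goodness-of-fit tests: with millions of samples, even a tiny, practically irrelevant deviation from a hypothesized model becomes statistically significant, so the exact description gets rejected. A minimal illustration of my own (assuming numpy and scipy are available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Samples come from an exponential whose scale is off by just 1% from the
# model we test against -- a deviation that is negligible in practice.
for n in (1_000, 1_000_000):
    data = rng.exponential(scale=1.01, size=n)
    stat, pvalue = stats.kstest(data, "expon", args=(0, 1.0))
    print(f"n={n:>9}: KS statistic={stat:.4f}, p-value={pvalue:.3g}")

# Typical outcome: the fit passes at n=1,000 but is decisively rejected at
# n=1,000,000 -- the large dataset admits no statistically exact description.
```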