Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
by Michael Franklin, Murphy McCauley, Justin Ma, Ankur Dave, Tathagata Das, Mosharaf Chowdhury, Matei Zaharia
Details
abstract: We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.
url: http://www.cs.cmu.edu/~15712/papers//zaharia12.pdf