Token Coherence: Decoupling Performance and Correctness
by David Wood, Mark Hill, Milo Martin
Tags
shared-memory multiprocessor (1), cache coherence (1), performance protocol (1)
Public comments
#1 posted on Sep 22 2013, 18:06
Many future shared-memory multiprocessor servers will both target commercial workloads and use highly integrated "glueless" designs. Totally ordered interconnects are difficult to implement in glueless designs. Here we propose a new coherence framework that separates performance from correctness, enabling protocols that require neither indirection nor a totally ordered interconnect. A performance protocol can optimize for the common case and rely on the underlying correctness substrate to resolve races, provide safety, and prevent starvation.
To efficiently support the frequent communication and synchronization in these workloads, servers should optimize the latency of cache-to-cache misses. To reduce cache-to-cache miss latency, many multiprocessor servers use snooping cache coherence. The use of broadcast limits snooping's scalability, but small- to medium-sized snooping-based multiprocessors suffice for many workloads.
The increasing number of transistors per chip will continue to encourage more integrated designs, making "glue" logic less desirable. These glueless interconnects are fast but do not easily provide the virtual bus behavior required by traditional snooping protocols. Ideally, a coherence protocol would both avoid indirection latency for cache-to-cache misses and not require any interconnect ordering. Rather than abandoning this fast approach, we use it to make the common case fast, but we back it up with a substrate that ensures correctness.
Token coherence is a general coherence framework that enables the creation of other performance protocols that can reduce traffic for larger systems, use prediction to push data, and support hierarchy with low complexity. A race example is shown to illustrate the advantages and disadvantages of the fast approach, snooping protocols, directory protocols, and token coherence. Token coherence allows races to occur but provides correct behavior in all cases through a correctness substrate. The correctness substrate uses token counting to enforce safety, and it uses persistent requests to prevent starvation. We provide the same guarantee that traditional protocols enforce, a single writer or multiple readers, by explicitly tracking tokens for each block. While our invariants restrict the data and token content of coherence messages, they do not restrict when or to whom the substrate can send coherence messages.
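To make the token-counting idea concrete, here is a minimal sketch in Python. It assumes the usual token rules (at least one token to read, all tokens to write, tokens never created or destroyed) and, for simplicity, that every token-carrying message also carries valid data. The names `Holder`, `send_tokens`, and `TOTAL_TOKENS` are hypothetical and not taken from the paper.

```python
TOTAL_TOKENS = 4  # assumed fixed token count per block (e.g. one per processor)


class Holder:
    """Token count for one block at one component (a cache or the home memory)."""

    def __init__(self, name, tokens=0):
        self.name = name
        self.tokens = tokens

    def can_read(self):
        # Reading is safe with at least one token.
        return self.tokens >= 1

    def can_write(self):
        # Writing is safe only when holding every token, so no reader can exist.
        return self.tokens == TOTAL_TOKENS


def send_tokens(src, dst, count):
    """Move `count` tokens from src to dst; the total number never changes.

    Simplifying assumption: the message also carries valid data, so a
    receiver holding a token can read immediately.
    """
    if not 1 <= count <= src.tokens:
        raise ValueError("a component can only send tokens it holds")
    src.tokens -= count
    dst.tokens += count
```

For example, if the home memory starts with all four tokens and sends one each to P0 and P1, both can read and neither can write. P0 can write only after collecting the remaining tokens from memory and P1, at which point P1 can no longer read. Note that `send_tokens` places no restriction on when or to whom tokens are sent, only on what a message may contain.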
A processor invokes a persistent request whenever it detects possible starvation. The correctness substrate implements persistent requests with a simple arbiter state machine at each home memory module. One way in which performance protocols seek high performance is by specifying a policy for using transient requests. A performance protocol also specifies a policy for how system components respond to transient requests.
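The arbiter for persistent requests can be pictured with a small sketch. It assumes that the arbiter serves requests in arrival order, keeps at most one request active at a time, and announces each change with an activate or deactivate broadcast. The class `PersistentArbiter`, its method names, and the FIFO ordering are assumptions for illustration and are not confirmed by the summary above.

```python
from collections import deque


class PersistentArbiter:
    """Sketch of an arbiter state machine at one home memory module."""

    def __init__(self):
        self.active = None      # (processor, block) currently being served
        self.waiting = deque()  # queued persistent requests, FIFO (assumed)
        self.broadcasts = []    # record of messages the arbiter would broadcast

    def request(self, processor, block):
        """A processor that suspects starvation invokes a persistent request."""
        self.waiting.append((processor, block))
        self._activate_next()

    def complete(self, processor, block):
        """The satisfied processor asks the arbiter to deactivate its request."""
        if self.active != (processor, block):
            raise ValueError("only the active requester can deactivate")
        self.broadcasts.append(("deactivate", processor, block))
        self.active = None
        self._activate_next()

    def _activate_next(self):
        # Idle and work pending: activate the oldest waiting request.
        if self.active is None and self.waiting:
            self.active = self.waiting.popleft()
            self.broadcasts.append(("activate",) + self.active)
```

In this sketch, while a request is active, every component would forward the tokens it holds for that block to the named processor, so the request eventually succeeds. Transient requests, by contrast, carry no such guarantee, which is why their policy can be left to the performance protocol.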