Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution
by James Goodman, Ravi Rajwar
Tags
Speculation, removal of unnecessary serialization, observation of memory
Public comments
#1 posted on Sep 28 2013, 22:55
A fundamental bottleneck in multithreaded programs is the serialization caused by critical sections, many of which could in fact have been executed safely and concurrently without acquiring their locks. In this paper, the authors propose a mechanism that dynamically detects such false inter-thread dependences, which prevent programs from fully exploiting the available parallelism. The key insight is that locks do not always have to be acquired for correct execution.
SLE requires no instruction set support and no system-level modifications, is transparent to programmers, and needs only trivial additional hardware support.
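To make the false dependence concrete, here is a small Python sketch (not from the paper) of the kind of code SLE targets. One coarse lock guards a whole table, and two threads update different buckets. The lock serializes every update even though the threads never touch the same data, so eliding it would not change the result. The names `table`, `table_lock`, and `increment` are hypothetical. The sketch only illustrates the logical serialization and does not measure any speedup.

```python
import threading

table = [0] * 8
table_lock = threading.Lock()  # one coarse lock for the whole table (hypothetical)


def increment(bucket, times):
    for _ in range(times):
        with table_lock:        # every update serializes on the same lock...
            table[bucket] += 1  # ...even though each thread uses its own bucket


t1 = threading.Thread(target=increment, args=(3, 1000))
t2 = threading.Thread(target=increment, args=(7, 1000))
t1.start()
t2.start()
t1.join()
t2.join()
print(table[3], table[7])  # 1000 1000
```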
Two points worth noting:
1) Conventional speculative execution in out-of-order processors cannot exploit this parallelism, because each thread must still acquire the lock serially before entering the critical section.
2) A lock need not be acquired for correct execution if the hardware can provide the appearance of atomicity for all memory operations within the critical section.
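A toy cost model helps explain why point 1) matters. It is an idealization and does not come from the paper. If every critical section must first acquire the same lock, the sections run back to back and each pays a lock hand-off cost. Idealized elision of non-conflicting sections lets them overlap instead. The `handoff` parameter and the section lengths are hypothetical, and the model ignores coherence traffic, misspeculation, and work outside critical sections.

```python
def serialized_time(durations, handoff=0):
    """All sections take the same lock, so they run back to back; each
    acquisition also pays a (hypothetical) lock hand-off cost."""
    return sum(d + handoff for d in durations)


def elided_time(durations):
    """Idealized elision: non-conflicting sections overlap completely."""
    return max(durations, default=0)


sections = [10, 12, 9, 11]  # hypothetical critical-section lengths, in cycles
print(serialized_time(sections, handoff=5))  # 62
print(elided_time(sections))                 # 12
```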
SLE involves two kinds of predictions:
1. On a store, predict that another store will shortly follow and undo the changes made by this store. The prediction is resolved without either store being performed, but the memory location written by the stores must be monitored. If the prediction is validated, both stores are elided.
2. Predict that all memory operations within the window bounded by the two elided stores occur atomically.
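The two predictions can be illustrated on a recorded single-thread trace. The sketch below assumes a hypothetical `Op` record and a hypothetical `find_elidable_pair` helper. It looks for a store to the lock address that changes its value (the acquire), followed by a later store that restores the original value (the release). The operations between them form the window predicted to execute atomically. Real hardware cannot look ahead like this. It makes the prediction at the first store and validates it only when the restoring store arrives, monitoring the lock location in the meantime.

```python
from typing import NamedTuple, Optional, Tuple


class Op(NamedTuple):
    kind: str       # "load" or "store"
    addr: str
    value: int = 0  # value written (ignored for loads)


def find_elidable_pair(trace, memory, lock_addr) -> Optional[Tuple[int, int]]:
    """Return trace indices (i, j) of a store pair on lock_addr where store j
    restores the value held before store i (e.g. 0 -> 1, then 1 -> 0)."""
    original = memory.get(lock_addr, 0)
    start = None
    for idx, op in enumerate(trace):
        if op.kind != "store" or op.addr != lock_addr:
            continue
        if start is None:
            if op.value != original:
                start = idx    # candidate "acquire" store
        elif op.value == original:
            return start, idx  # "release" store undoes the acquire
    return None


memory = {"L": 0, "x": 4}
trace = [
    Op("load", "L"),      # test: is the lock free?
    Op("store", "L", 1),  # acquire: L = 1
    Op("load", "x"),
    Op("store", "x", 5),
    Op("store", "L", 0),  # release: L = 0, the original value
]
pair = find_elidable_pair(trace, memory, "L")
print(pair)  # (1, 4)
i, j = pair
print(trace[i + 1:j])  # window predicted to execute atomically
```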
SLE does not require the processor to support out-of-order execution, only the ability to speculatively retire instructions. In other words, inter-instruction dependence information need not be maintained.
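Speculative retirement can be sketched as a register checkpoint plus a private write buffer. This is a simplified software model under stated assumptions: dictionaries stand in for the register file and memory, and `SpeculativeContext` is a hypothetical name. Stores retire into the buffer. A successful window commits the buffer, and a misspeculation restores the checkpoint. Because recovery always returns to the single checkpoint taken when elision began, the model needs no per-instruction dependence information.

```python
class SpeculativeContext:
    """Hypothetical model: checkpoint registers, buffer stores, commit or roll back."""

    def __init__(self, regs, memory):
        self.regs = regs      # shared dict standing in for the register file
        self.memory = memory  # shared dict standing in for memory
        self.checkpoint = None
        self.write_buffer = {}

    def begin(self):
        self.checkpoint = dict(self.regs)  # one checkpoint for the whole window
        self.write_buffer = {}

    def load(self, addr):
        # Loads see this thread's own speculative stores first.
        return self.write_buffer.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.write_buffer[addr] = value    # retired, but not yet visible

    def commit(self):
        self.memory.update(self.write_buffer)  # publish all buffered stores
        self.write_buffer = {}
        self.checkpoint = None

    def abort(self):
        self.regs.clear()
        self.regs.update(self.checkpoint)  # roll back to the checkpoint
        self.write_buffer = {}             # discard speculative stores
        self.checkpoint = None


memory = {"x": 4}
regs = {"r1": 0}
ctx = SpeculativeContext(regs, memory)

ctx.begin()
regs["r1"] = ctx.load("x") + 1
ctx.store("x", regs["r1"])
ctx.abort()          # e.g. a conflict was detected
print(regs, memory)  # {'r1': 0} {'x': 4}

ctx.begin()
regs["r1"] = ctx.load("x") + 1
ctx.store("x", regs["r1"])
ctx.commit()
print(regs, memory)  # {'r1': 5} {'x': 5}
```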
Because the key notion is the atomicity of memory operations, the technique can be incorporated into any processor regardless of its memory consistency model: correctness is guaranteed without any dependence on memory ordering.
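Atomicity can be enforced by monitoring the addresses the window touches instead of ordering its individual accesses. In this hypothetical sketch, `AtomicityMonitor` and its methods are illustrative names and not the paper's interface. The monitor tracks the read and write sets of the elided window. An observed request from another thread counts as a conflict if it writes a location in either set or reads a location in the write set. The elided acquire still reads the lock, which puts the lock in the read set. A thread that actually acquires the lock therefore also triggers misspeculation. If no conflict occurs before the window ends, the buffered writes become visible together, so other threads cannot observe any intermediate order.

```python
class AtomicityMonitor:
    """Tracks one elided window's footprint and flags conflicting remote accesses."""

    def __init__(self):
        self.read_set = set()
        self.write_set = set()
        self.violated = False

    def local_read(self, addr):
        self.read_set.add(addr)

    def local_write(self, addr):
        self.write_set.add(addr)

    def remote_access(self, addr, is_write):
        """Record another thread's request; return True once atomicity is violated."""
        if is_write and (addr in self.read_set or addr in self.write_set):
            self.violated = True  # remote write to data we read or wrote
        elif not is_write and addr in self.write_set:
            self.violated = True  # remote read of data we wrote speculatively
        return self.violated


mon = AtomicityMonitor()
mon.local_read("L")  # the elided acquire still reads the lock
mon.local_read("bucket[3]")
mon.local_write("bucket[3]")
print(mon.remote_access("bucket[7]", is_write=True))  # False: disjoint data
print(mon.remote_access("L", is_write=True))          # True: lock really acquired
```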

One possible weakness: how many significant false inter-thread dependences can SLE actually detect in practice?