Hardware scout is a technique that uses otherwise idle processor execution resources to perform prefetching during cache misses. When a thread is stalled by a cache miss, the processor pipeline checkpoints the register file, switches to run-ahead mode, and continues to issue instructions from the thread that is waiting for memory. The thread of execution in run-ahead mode is known as a scout thread. When the data returns from memory, the processor restores the register file contents from the checkpoint and switches back to normal execution mode.
The computation performed during run-ahead mode is discarded by the processor; nevertheless, scouting provides a speedup because it increases memory-level parallelism (MLP). The cache lines brought into the cache hierarchy during run-ahead are often used by the processor again when it switches back to normal mode, where they then hit in the cache instead of missing.
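The effect on memory-level parallelism can be illustrated with a toy timing model. The sketch below is not a description of any real pipeline: the function name `simulate`, the one-instruction-per-cycle scout rate, and the fixed `miss_latency` are all assumptions made for illustration, and the model assumes every access address is known without waiting for earlier loads (a pointer-chasing trace would not benefit this way).

```python
def simulate(trace, miss_latency=100, scout=False):
    """Toy model: return (total_cycles, demand_misses) for a list of cache-line ids.

    Assumptions (illustrative only): a hit costs 1 cycle, a miss stalls for
    miss_latency cycles, and in run-ahead mode the scout examines one future
    access per stall cycle and issues a prefetch for it if needed.
    """
    ready_at = {}          # cache line -> cycle at which its data is available
    cycle = 0
    demand_misses = 0
    for i, line in enumerate(trace):
        if line in ready_at:
            # Hit, or a prefetch still in flight: wait until the data arrives.
            cycle = max(cycle, ready_at[line]) + 1
            continue
        # Demand miss: normal execution stalls here.
        demand_misses += 1
        miss_start = cycle
        ready_at[line] = miss_start + miss_latency
        if scout:
            # Run-ahead: walk the upcoming accesses during the stall. Their
            # results are discarded, but the prefetches they trigger remain.
            window = trace[i + 1 : i + 1 + miss_latency]
            for k, future in enumerate(window, start=1):
                if future not in ready_at:
                    ready_at[future] = miss_start + k + miss_latency
        # The checkpoint is restored and normal mode resumes once data returns.
        cycle = ready_at[line] + 1
    return cycle, demand_misses


if __name__ == "__main__":
    trace = list(range(8))                      # eight distinct cache lines
    print(simulate(trace, miss_latency=10))     # baseline, no scouting
    print(simulate(trace, miss_latency=10, scout=True))
```

In this model, the eight-line trace takes eight demand misses without scouting and one with it, because the misses that were serialized in the baseline overlap with the first miss once the scout issues them early.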
Rock processor scout
Sun's Rock processor (later canceled) used a form of hardware scout. Unlike the basic scheme, in which all run-ahead results are discarded, Rock could immediately retire any run-ahead computations that did not depend on the cache miss. This allows both prefetching and traditional instruction-level parallelism.
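One way to picture the distinction between computations that depend on the cache miss and those that do not is to propagate a "poisoned" mark from the missing register through the instruction stream. The sketch below is a hypothetical illustration, not Rock's actual mechanism: the function name `split_runahead`, the `(dest, sources)` instruction encoding, and the register names are all assumptions.

```python
def split_runahead(instructions, missing_reg):
    """Classify run-ahead instructions by dependence on a missing load.

    instructions: list of (dest_register, tuple_of_source_registers).
    missing_reg:  register that the stalled load would have written.
    Returns (retirable, deferred) as lists of instruction indices.
    """
    poisoned = {missing_reg}       # registers whose values are not yet valid
    retirable, deferred = [], []
    for index, (dest, sources) in enumerate(instructions):
        if poisoned.intersection(sources):
            # Depends (directly or transitively) on the miss: result unknown.
            poisoned.add(dest)
            deferred.append(index)
        else:
            # Independent of the miss: the result is valid and may be kept.
            poisoned.discard(dest)
            retirable.append(index)
    return retirable, deferred


if __name__ == "__main__":
    program = [
        ("r2", ("r1",)),        # uses the missing value in r1
        ("r3", ("r4", "r5")),   # independent of the miss
        ("r6", ("r2", "r3")),   # depends on r2, hence on the miss
        ("r7", ("r3",)),        # independent of the miss
    ]
    print(split_runahead(program, missing_reg="r1"))
```

The sketch only tracks register dependences; a real design would also have to handle memory dependences and ordering between retired and deferred instructions, which are omitted here.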
Scouting vs. SMT
Scouting and simultaneous multithreading (SMT) both use hardware threads to fight the memory wall. With scouting, the scout thread runs instructions from the same instruction stream as the instruction that caused the pipeline stall. With SMT, the additional hardware thread executes instructions from another context.
Thus, SMT increases the throughput of the processor, while scouting improves the performance of the stalled thread by reducing the number of cache misses it encounters.