Sunday, March 1, 2015

Transactional Memory

Transactional memory lets a block of code in which data races could occur execute atomically. You write the code as if it were single-threaded and simply wrap the block in atomic {}; the system then makes the whole block take effect at once or not at all.

Software Implementation

In a software implementation, a plain assignment like a = 3 won't work inside a transaction, because the runtime has to track every read and write. Instead, you would call something like a.transtore(3) so that the library can record the store.
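
To make the idea concrete, here is a minimal sketch of how such a library might work. It is not any particular product's API: transtore comes from the text above, while TVar, transload, Conflict, and atomic are hypothetical names chosen for illustration. Stores are buffered, every load records the version it saw, and a single global lock (the simplest possible design, assumed here) serializes commits.

```python
import threading

_commit_lock = threading.Lock()   # one global lock serializes commits (assumption)
_state = threading.local()        # holds the running transaction for each thread


class Conflict(Exception):
    """Another transaction committed to a variable this one has read."""


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self._value = value
        self._version = 0

    def transload(self):
        tx = _state.tx
        if self in tx.writes:                 # see our own pending store
            return tx.writes[self]
        with _commit_lock:                    # consistent (value, version) pair
            value, version = self._value, self._version
        if tx.reads.setdefault(self, version) != version:
            raise Conflict()                  # changed since our first read
        return value

    def transtore(self, value):
        _state.tx.writes[self] = value        # buffered until commit


class _Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version seen at first read
        self.writes = {}   # TVar -> pending value

    def commit(self):
        with _commit_lock:
            if any(var._version != seen for var, seen in self.reads.items()):
                raise Conflict()
            for var, value in self.writes.items():
                var._value = value
                var._version += 1


def atomic(block):
    """Run block() as a transaction, retrying until it commits."""
    while True:
        _state.tx = _Transaction()
        try:
            result = block()
            _state.tx.commit()
            return result
        except Conflict:
            pass                              # drop buffered stores and retry
        finally:
            _state.tx = None


a = TVar(0)

def add_one():
    a.transtore(a.transload() + 1)

def worker():
    for _ in range(500):
        atomic(add_one)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(atomic(a.transload))  # 2000
```

Every load and store goes through bookkeeping like this, which is where the overhead of a software implementation comes from.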

There are several kinds of transactional memory, but none of them is widely used yet. The implementations do work, but in most cases they show no significant performance improvement, because tracking every memory access adds overhead. To truly see performance gains, you would likely need a system with more than 64 cores to overcome that overhead.

For example, Unreal 3 attempted to use it, but it’s still not practical in most cases.

Hardware Implementation

One example of a hardware implementation is Sun's Rock processor, which ultimately failed because Oracle, after acquiring Sun, wasn't interested in hardware solutions.

In this hardware approach, the tag of each cache line carries a transaction bit. While the bit is set, the line's data is not written to main memory, so the transaction's memory operations take place only within the L1 cache. This works well even with multiple CPUs, because other CPUs never see the uncommitted data.
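
The following toy model illustrates the transaction bit. It is only a sketch of the idea, not a description of how Rock or any real processor is built: the class ToyL1Cache and its methods are hypothetical, main memory is a plain dictionary, and commit copies data to memory immediately for simplicity.

```python
class ToyL1Cache:
    """Toy model: transactional writes stay in the cache until commit."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value, stands in for main memory
        self.lines = {}        # address -> [value, transaction_bit]
        self.in_tx = False

    def begin(self):
        self.in_tx = True

    def write(self, addr, value):
        if self.in_tx:
            self.lines[addr] = [value, True]    # bit set: held back from memory
        else:
            self.lines[addr] = [value, False]
            self.memory[addr] = value           # write-through outside a transaction

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch from main memory
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]

    def commit(self):
        for addr, line in self.lines.items():
            if line[1]:
                self.memory[addr] = line[0]     # publish the write
                line[1] = False                 # clear the transaction bit
        self.in_tx = False

    def abort(self):
        # Discard every line whose transaction bit is set; memory is untouched.
        self.lines = {a: l for a, l in self.lines.items() if not l[1]}
        self.in_tx = False


mem = {0x10: 1}
cache = ToyL1Cache(mem)
cache.begin()
cache.write(0x10, 99)
print(cache.read(0x10), mem[0x10])  # 99 1
cache.abort()
print(cache.read(0x10))             # 1
```

Because an aborted transaction never reached main memory, undoing it amounts to throwing away the marked cache lines.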

Cache Limitations

The L1 cache is typically 32KB. If a transaction touches more data than that, it cannot complete. Interestingly, a transaction that touches far less, even 9KB, can still fail under certain conditions, because of how addresses are mapped to cache locations. There are three types of cache:

  • Direct-Mapped Cache: Each address has exactly one predefined cache location, so when an integer variable a is stored, there is only one place it can go. If another variable maps to the same location, it evicts whatever was there, so two such variables cannot be cached at the same time. This design is cheap but limited.

  • Fully Associative Cache: This type allows data to be stored freely in any available cache location, making it more flexible but also more expensive.

  • Set-Associative Cache: This is a middle ground. For example, an integer variable a might have 8 possible cache locations. If all 8 locations are already occupied, it cannot be cached without evicting one of them. Put simply, if each slot holds one unit of data, a transaction that needs 9 units in the same set cannot fit them into 8 slots, even when the rest of the cache is empty. This limitation can cause a hardware transaction to fail.
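
The set-associative case can be checked with a little arithmetic. The sketch below uses the 32KB size and 8 locations per set from the text; the 64-byte line size and the simple modulo mapping are assumptions, and set_index and fits_in_cache are names made up for this example.

```python
from collections import Counter

CACHE_SIZE = 32 * 1024   # 32KB L1, as in the text
LINE_SIZE = 64           # assumed cache line size in bytes
WAYS = 8                 # 8 possible locations per address, as in the text
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 64 sets


def set_index(address):
    """Which set an address maps to (simple modulo mapping, assumed)."""
    return (address // LINE_SIZE) % NUM_SETS


def fits_in_cache(addresses):
    """True if every touched line can be resident at the same time."""
    line_bases = {addr - addr % LINE_SIZE for addr in addresses}
    per_set = Counter(set_index(base) for base in line_bases)
    return all(count <= WAYS for count in per_set.values())


stride = NUM_SETS * LINE_SIZE            # 4096: addresses this far apart share a set
eight = [i * stride for i in range(8)]
nine = [i * stride for i in range(9)]

print(fits_in_cache(eight))                            # True
print(fits_in_cache(nine))                             # False
print(fits_in_cache(range(0, CACHE_SIZE, LINE_SIZE)))  # True
```

Under these assumptions, nine lines spaced 4096 bytes apart overflow a single set even though together they occupy only 576 bytes of cache, while a contiguous 32KB region fits exactly.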

