Thursday, October 31, 2019

Garbage Collection - allocation, fragmentation, partitioning the heap

Garbage Collection (GC) Overview

Garbage Collection (GC) is a memory management technique used to automatically identify and reclaim unused memory in a program. It ensures that programs run efficiently by managing memory allocation and deallocation transparently, without manual intervention. Below are the core aspects of GC:


Three Core Aspects of Memory Management

  1. Initial Memory Allocation:

    • Allocating memory for new objects or data as required during program execution.
  2. Identification of Live Data:

    • Determining which objects are still in use and accessible by the program.
  3. Reclaiming Dead Memory:

    • Freeing memory occupied by objects that are no longer reachable or in use.

Key Differences Between Automatic and Explicit Deallocation

  1. Bulk Deallocation:

    • Automatic GC typically reclaims memory in bulk rather than one object at a time.
  2. Metadata for Allocation:

    • Automatic GC requires additional metadata to track allocations, enabling garbage collection to operate efficiently.
  3. Programming Style:

    • Automatic GC encourages frequent use of heap allocations without the burden of manual memory management.

Allocation Strategies

  1. Sequential Allocation:

    • Allocates memory sequentially from the free space.
    • Advantages: Excellent cache locality, ideal for moving collectors.
    • Disadvantages: May lead to fragmentation.

    Algorithm Example (Sequential Allocation):

    sequentialAllocation(n): result <- free newFree <- result + n if newFree > limit: return null free <- newFree return result
  2. Free List Allocation:

    • Maintains a data structure (free list) to track free memory blocks.
    • Advantages: Handles fragmentation better in non-moving collectors.
    • Disadvantages: Slower allocation compared to sequential methods.

    Algorithm Example (First-Fit Allocation):

    firstFitAllocate(n): prev <- addressOf(head) loop: curr <- next(prev) if curr = null: return null else if size(curr) < n: prev <- curr else: return listAllocate(prev, curr, n)
    • If the block size is larger than requested, it splits the block and adds the remainder back to the free list.

Types of Free List Allocation

  1. First-Fit Allocation:

    • Allocates the first suitable free block.
    • Small blocks accumulate at the front, slowing allocation over time.
  2. Next-Fit Allocation:

    • Resumes searching from the last successful allocation point.
    • Reduces repeated traversal of small blocks.
  3. Best-Fit Allocation:

    • Allocates the smallest block that fits.
    • Minimizes wasted space but increases search time.

Fragmentation

  • Definition: The process where free memory is scattered in small, non-contiguous blocks.
  • Negative Effects:
    • Sufficient memory may exist overall, but the lack of contiguous blocks can prevent allocation.
    • Leads to forced garbage collection or program termination.

Heap Partitioning

Different GC algorithms and optimizations can be applied based on how memory is partitioned. Partitioning improves GC efficiency by categorizing objects based on their properties.

  1. Partitioning by Mobility:

    • Separate movable objects from non-movable ones (e.g., objects tied to I/O).
  2. Partitioning by Size:

    • Large objects are typically non-movable to reduce copying overhead.
  3. Partitioning by Space:

    • Objects with short lifespans are allocated in fast and low-cost collection spaces.
    • Long-lived objects are allocated in less frequently collected spaces.
  4. Partitioning for Generational Collection:

    • Separates objects by age:
      • Young objects (short-lived) are collected more frequently.
      • Old objects are collected less often to reduce collection costs.

Generational Garbage Collection

  • Principle: Most objects die young.
  • Key Features:
    • Divides heap into multiple generations (e.g., young and old).
    • Collects younger generations more frequently, as they are likely to contain most garbage.
    • Reduces collection overhead for long-lived objects.

Advanced Partitioning Strategies

  1. Locality-Based Partitioning:

    • Groups related objects (e.g., parent-child relationships) for better cache performance.
    • Focuses on areas likely to free up space.
  2. Thread-Based Partitioning:

    • Allocates and collects objects within a thread to minimize synchronization overhead.
  3. Mutability-Based Partitioning:

    • Segregates frequently updated objects from rarely updated ones, reducing the overhead of tracking changes.

Optimal Allocation Goals

  • Aim to allocate and deallocate memory while minimizing space usage and fragmentation.
  • Theoretical optimal allocation is NP-hard, meaning heuristic approaches are required in practice.

By understanding these concepts, developers can better utilize GC systems and design programs that optimize memory management.