Dev.to #systemdesign·March 29, 2026

Implementing Intrusive Hash Tables for Custom Data Structures

This article delves into the complexities of implementing an intrusive hash table, where the container holds only links embedded in the caller's data rather than owning the data itself. It discusses the necessity of tracking multiple tables during incremental resizing and the architectural decisions involved in separating node-level and table-level functionality. The author also touches on the choice of hash functions, favoring MurmurHash over FNV for its better distribution.


Understanding Intrusive Data Structures

An intrusive data structure offers a way to organize data without embedding the data directly within the structure itself. Instead, the data structure holds pointers or references to the data, and the data itself contains metadata (like pointers for linked lists or hash table entries) that allows it to participate in the structure. This approach can be beneficial for memory efficiency, avoiding unnecessary copies, and managing complex object lifecycles, especially in performance-critical applications or when integrating with existing C-style data without modification.

Challenges of Intrusive Hash Table Implementation

The implementation of an intrusive hash table, particularly with features like incremental resizing, presents significant architectural hurdles. A key challenge is managing multiple tables concurrently during the rehashing process. While resizing, new entries go into the new, larger table, but existing entries remain in the old table until they are gradually migrated. This necessitates:

  • Two-level Functionality: Distinct functions for node-level operations (e.g., managing individual entries and their links) and table-level operations (e.g., rehashing, overall table management).
  • State Management: Carefully tracking the state of both the old and new hash tables, including which entries reside where and the progress of rehashing.
  • Equality Checks: Performing equality checks in two places – comparing hash codes at the node level and actual data at the server/application level to ensure correctness.

Why Incremental Resizing?

Incremental resizing avoids the performance spike associated with reallocating and rehashing an entire table at once. Instead, rehashing is spread out over multiple operations, leading to more predictable performance, which is crucial for real-time or low-latency systems. However, as the article notes, this complexity is often only justified when strictly necessary.

Hash Function Selection

The choice of hash function is critical for hash table performance. For non-cryptographic hash tables, the primary goals are fast computation and good distribution to minimize collisions and maintain O(1) average-case lookup times. The article notes that while FNV is a common choice, MurmurHash often provides better distribution, which can be beneficial for minimizing chain lengths in collision resolution strategies.

```cpp
#include <cstddef>  // offsetof, size_t
#include <string>

/* Example conceptual structure for an intrusive node */
struct MyDataNode {
    // Application data fields
    int value;
    std::string key;

    // Intrusive hash table metadata
    MyDataNode* next_in_bucket;
    size_t hash_code;  // cached so chains can be filtered cheaply
};

/* Concept of container_of(): recover the enclosing object from a
 * pointer to one of its members. Note that offsetof on a
 * non-standard-layout type (here, one with a std::string member)
 * is only conditionally supported in standard C++. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```
Tags: hash table, intrusive data structures, data structures, performance, memory management, rehashing, custom implementation
