Hash Table in Data Structure

Hash Table is a data structure that stores key-value pairs in an Array. A hash function is used to determine the array index for every key. An efficient hash function equally distributes the keys across the array.

Table of Contents

1. Representation of Hash Table

Here is an example of storing the Customer Name and Customer ID mapping in a hash table. We often forget the customer id, so we can lookup by the unique name. This helps to store and retrieve the Customer ID in a single lookup operation which is O(1).

Hash Function: Formula to determine the array index for the given key.
Key: An unique key indexed to store and retrieve the data.
Val: Value associated with the key. This can be a large object consisting of information about the key.

2. What is Hashing in Data Structure

Hashing is a technique to map certain data to a particular key. In a large database, the data can be quickly stored and retrieved using the unique key.

Hash function

A hash function is essential to identify the unique address for every key. The hash function is generally a formula that can produce a unique index for every key. So, we can store and retrieve the data from the location.

Collision

The hash function can still return the same index for multiple keys is called collisions. This can be addressed by the various collision resolution techniques described below.

3. Hashing Techniques

There are different types of hashing techniques available to handle collisions. This is also referred to as collision resolution techniques.

Separate Chaining
Open Addressing
- Linear Probing
- Quadratic Probing
- Double Hashing

3.1. Separate Chaining

An extended linked list is attached for every index when the collision occurs. The key can be identified by performing a linear search on the list. This method is suitable for dynamic and extensive application data.

Read Hash Table Implementation using Separate Chaining for more details.

3.2. Open Addressing

The data will still be stored in the same array by locating a different index when the collision occurs. This method is suitable for limited data and provides a better cache than the Separate Chaining method. The following are the different ways to identify the next index during the collision.

Linear Probing – Sequentially find the next free slot.
Quadratic Probing – Find the next free slot by factor x*x.
Double Hashing – Apply a second hashing method to find the free slot.

4. Hash Table Operations and Time Complexity

Operation	Short Description	Time Complexity	Space Complexity
Access	Lookup the element by key	O (1)	O (n)
Insert	Insert a new key-value pair	O (1)	O (n)
Delete	Delete the key-value pair	O (1)	O (n)
Rehashing	Increase the size when the table reaches near capacity	O (n)	O (n)

Operations and Time Complexity of Hash Table

5. Implementations of Hash Table

Prefer Hash Table Implementation using Separate Chaining to use with dynamic and extensive data.
Prefer Hash Table Implementation using Linear Probing to use with static and limited data.

6. Advantages and Disadvantages

Advantages

The hash table is efficient to store and retrieve the data and works in O(1) time. So, it works better than other data structures to search the data.

Disadvantages

Requires extra memory to store the keys.
Requires additional computing power for the hash function.
Not suitable for sorting operations.

7. Applications

Hash Table is generally required for indexing the data with keys. The following are the real-time applications.

Message Digest – MD5SUM, SHA256 for content verification
Password Verification – Stores hashed passwords in the server for validations.
Hash Data Structure in programming – C++ std::unordered_set, std::unordered_map
Set – Implementation of the set data structure
Compiler operation – Stores all the keywords in a set for differentiation.
Rabin-Karp Algorithm
Database systems – In multi-node database systems, hash tables are commonly used to distribute rows amongst nodes, reducing network traffic for hash joins
Implement caches – Cache data for faster access, collision can be handled easily by discarding the older cache item.
Object representation – Perl, Python, JavaScript, Lua, and Ruby, use hash tables to implement objects. In this representation, the keys are the names of the members/methods, and the values are pointers to the corresponding value.
Unique data representation – Avoid storing the repeated content for example having the multiple character strings with the same content.