11.1 Direct-address tables
11.1-1
Suppose that a dynamic set $S$ is represented by a direct-address table $T$ of length $m$. Describe a procedure that finds the maximum element of $S$. What is the worst-case performance of your procedure?
As the dynamic set $S$ is represented by the direct-address table $T$, for each key $k$ in $S$, there is a slot $k$ in $T$ points to it. If no element with key $k$ in $S$, then $T[k] = \text{NIL}$. Using this property, we can find the maximum element of $S$ by traversing down from the highest slot to seek the first non-$\text{NIL}$ one.
1 2 | MAXIMUM(S) return TABLE-MAXIMUM(T, m - 1) |
1 2 3 4 5 6 | TABLE-MAXIMUM(T, l) if l < 0 return NIL else if DIRECT-ADDRESS-SEARCH(T, l) != NIL return l else return TABLE-MAXIMUM(T, l - 1) |
The $\text{TABLE-MAXIMUM}$ procedure gest down and checks $1$ sloc at a time, linearly approaches the solution. In the worst case where $S$ is empty, $\text{TABLE-MAXIMUM}$ examines $m$ slots. Therefore, the worst-case performance of $\text{MAXIMUM}$ is $O(n)$, where $n$ is the number of elements in the set $S$.
11.1-2
A bit vector is simply an array of bits ($0$s and $1$s). A bit vector of length $m$ takes much less space than an array of $m$ pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in $O(1)$ time.
Using the bit vector data structure, we can represent keys less than $m$ by a string of $m$ bits, denoted by $V[0..m - 1]$, in which each position that occupied by the bit $1$, corresponds to a key in the set $S$. If the set contains no element with key $k$, then $V[k] = 0$. For instance, we can store the set $\{2, 4, 6, 10, 16\}$ in a bit vector of length $20$:
$$001010100010000010000$$
1 2 3 4 | BITMAP-SEARCH(V, k) if V[k] != 0 return k else return NIL |
1 2 | BITMAP-INSERT(V, x) V[x] = 1 |
1 2 | BITMAP-DELETE(V, x) V[x] = 0 |
Each of these operations takes only $O(1)$ time.
11.1-3
Suggest how to implement a direct-address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations ($\text{INSERT}$, $\text{DELETE}$, and $\text{SEARCH}$) should run in $O(1)$ time. (Don't forget that $\text{DELETE}$ takes as an argument a pointer to an object to be deleted, not a key.)
Assuming that fetching an element should return the satellite data of all the stored elements, we can have each key map to a doubly linked list.
- $\text{INSERT}$: appends the element to the list in constant time
- $\text{DELETE}$: removes the element from the linked list in constant time (the element contains pointers to the previous and next element)
- $\text{SEARCH}$: returns the first element, which is a node in a linked list, in constant time
11.1-4 $\star$
We wish to implement a dictionary by using direct addressing on a huge array. At the start, the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct-address dictionary on a huge array. Each stored object should use $O(1)$ space; the operations $\text{SEARCH}$, $\text{INSERT}$, and $\text{DELETE}$ should take $O(1)$ time each; and initializing the data structure should take $O(1)$ time. ($\textit{Hint:}$ Use an additional array, treated somewhat like a stack whose size is the number of keys actually stored in the dictionary, to help determine whether a given entry in the huge array is valid or not.)
We denote the huge array by $T$ and, taking the hint from the book, we also have a stack implemented by an array $S$. The size of $S$ equals the number of keys actually stored, so that $S$ should be allocated at the dictionary's maximum size. The stack
has an attribute $S.top$, so that only entries $S[1..S.top]$ are valid.
The idea of this scheme is that entries of $T$ and $S$ validate each other. If key $k$ is
actually stored in $T$, then $T[k]$ contains the index, say $j$, of a valid entry in $S$, and
$S[j]$ contains the value $k$. Let us call this situation, in which $1 \le T[k] \le S.top$, $S[T[k]] = k$, and $T[S[j]] = j$, a validating cycle.
Assuming that we also need to store pointers to objects in our direct-address table, we can store them in an array that is parallel to either $T$ or $S$. Since $S$ is smaller than $T$, we'll use an array $S'$, allocated to be the same size as $S$, for these pointers. Thus, if the dictionary contains an object $x$ with key $k$, then there is a validating cycle and $S'[T[k]]$ points to $x$.
The operations on the dictionary work as follows:
- Initialization: Simply set $S.top = 0$, so that there are no valid entries in the stack.
- SEARCH: Given key $k$, we check whether we have a validating cycle, i.e., whether $1 \le T [k] \le S.top$ and $S[T[k]] = k$. If so, we return $S'[T[k]]$, and otherwise we return $\text{NIL}$.
- INSERT: To insert object $x$ with key $k$, assuming that this object is not already in the dictionary, we increment $S.top$, set $S[S.top] = k$, set $S'[S.top] = x$, and set $T[k] = S.top$.
- DELETE: To delete object $x$ with key $k$, assuming that this object is in the dictionary, we need to break the validating cycle. The trick is to also ensure that we don't leave a "hole" in the stack, and we solve this problem by moving the top entry of the stack into the position that we are vacating-and then fixing up that entry's validating cycle. That is, we execute the following sequence of assignments:
$$ \begin{aligned} & S[T[k]] = S[S.top] \\ & S'[T[k]] = S'[S.top] \\ & T[S[T[k]]] = T[k] \\ & T[k] = 0 \\ & S.top = S.top - 1 \end{aligned} $$
Each of these operation - initialization, $\text{SEARCH}$, $\text{INSERT}$, and $\text{DELETE}$-takes $O(1)$ time.