\section{Related Work}
Czech, Havas and Majewski~\cite{chm97} provide a
comprehensive survey of the most important theoretical results
on perfect hashing.
In the following, we review some of those results.

Fredman, Koml\'os and Szemer\'edi~\cite{FKS84} showed that it is possible to
construct space-efficient perfect hash functions that can be evaluated in
constant time with table sizes that are linear in the number of keys:
$m=O(n)$. In their model of computation, an element of the universe~$U$ fits
into one machine word, and arithmetic operations and memory accesses have unit
cost. Randomized algorithms in the FKS model can construct a perfect hash
function in expected time~$O(n)$:
this is the case for our algorithm and for the works in~\cite{chm92,p99}.
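
For intuition, the two-level idea behind the result of~\cite{FKS84} can be sketched in Python as follows.
This is a minimal sketch under stated assumptions, not the exact construction or constants of the original paper:
keys form a non-empty list of distinct non-negative integers smaller than the assumed prime $P=2^{61}-1$,
both levels draw functions from the multiply-add universal family $((ax+b) \bmod P) \bmod m$ of Carter and Wegman,
and the bound $\sum_i b_i^2 \le 4n$ on the squared bucket sizes~$b_i$ is an illustrative choice.
The names \texttt{build\_fks} and \texttt{fks\_lookup} are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime 2^61 - 1; assumed larger than every key


def random_hash(m, rng):
    """Draw x -> ((a*x + b) mod PRIME) mod m from the Carter-Wegman family."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    return lambda x: ((a * x + b) % PRIME) % m


def build_fks(keys, seed=0):
    """Two-level FKS-style perfect hash table for a non-empty list of distinct integer keys."""
    rng = random.Random(seed)
    n = len(keys)
    # First level: redraw until the squared bucket sizes sum to at most 4n,
    # which keeps the total second-level space linear in n.
    while True:
        top = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for x in keys:
            buckets[top(x)].append(x)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Second level: a bucket with b keys gets b^2 slots and a function
    # redrawn until it sends those keys to distinct slots.
    tables = []
    for bucket in buckets:
        size = len(bucket) ** 2
        if size == 0:
            tables.append((None, []))
            continue
        while True:
            h = random_hash(size, rng)
            slots = [None] * size
            for x in bucket:
                if slots[h(x)] is not None:
                    break  # collision inside the bucket: redraw h
                slots[h(x)] = x
            else:
                tables.append((h, slots))
                break
    return top, tables


def fks_lookup(table, x):
    """Membership test with two hash evaluations, i.e. constant time."""
    top, tables = table
    h, slots = tables[top(x)]
    return h is not None and slots[h(x)] == x
```

Each redraw succeeds with constant probability, so the expected construction time is $O(n)$, while the structure holds $n$ first-level entries plus at most $4n$ second-level slots, in line with $m=O(n)$.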
|
||
|
|
||
|
Many methods for generating minimal perfect hash functions use a
|
||
|
{\em mapping}, {\em ordering} and {\em searching}
|
||
|
(MOS) approach,
|
||
|
a description coined by Fox, Chen and Heath~\cite{fch92}.
|
||
|
In the MOS approach, the construction of a minimal perfect hash function
|
||
|
is accomplished in three steps.
|
||
|
First, the mapping step transforms the key set from the original universe
|
||
|
to a new universe.
|
||
|
Second, the ordering step places the keys in a sequential order that
|
||
|
determines the order in which hash values are assigned to keys.
|
||
|
Third, the searching step attempts to assign hash values to the keys.
|
||
|
Our algorithm and the algorithm presented in~\cite{chm92} use the
|
||
|
MOS approach.

Pagh~\cite{p99} proposed a family of randomized algorithms for
constructing minimal perfect hash functions.
The resulting function has the form $h(x) = (f(x) + d_{g(x)}) \bmod n$,
where $f$ and $g$ are universal hash functions and $d$ is a set of
displacement values that resolve the collisions caused by the function~$f$.
Pagh identified a set of conditions on $f$ and $g$ and showed
that, if these conditions are satisfied, a minimal perfect hash
function can be computed in expected time $O(n)$ and stored in
$(2+\epsilon)n$ computer words.
Dietzfelbinger and Hagerup~\cite{dh01} improved~\cite{p99} by
reducing the number of computer words required to store the function
from $(2+\epsilon)n$ to $(1+\epsilon)n$, but in their approach~$f$ and~$g$ must
be chosen from a class
of hash functions that meet additional requirements.
Unlike the works in~\cite{p99,dh01}, our algorithm uses two
universal hash functions $h_1$ and $h_2$ randomly selected from a class
of universal hash functions that need not meet any additional
requirements.
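
The hash-and-displace form of~\cite{p99} can be illustrated with the following Python sketch.
It is a simplified illustration, not Pagh's exact algorithm or analysis:
keys are distinct non-negative integers below the assumed prime $P=2^{61}-1$,
$f$ and $g$ are drawn from the Carter and Wegman multiply-add family,
the number~$r$ of buckets (the range of~$g$) is a free parameter,
and buckets are processed in decreasing order of size, each receiving the smallest displacement that sends all of its keys to free positions.
The names \texttt{build\_displace} and \texttt{evaluate\_displace} are hypothetical.
The sketch also shows why some condition on $f$ and $g$ is unavoidable:
if two keys share both $g(x)$ and $f(x)$, no displacement can separate them, and the pair $(f,g)$ must be redrawn.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime 2^61 - 1; assumed larger than every key


def random_hash(m, rng):
    """Draw x -> ((a*x + b) mod PRIME) mod m from the Carter-Wegman family."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    return lambda x: ((a * x + b) % PRIME) % m


def build_displace(keys, r, seed=0, max_tries=100):
    """Find f, g, d with h(x) = (f(x) + d[g(x)]) mod n bijective on keys."""
    rng = random.Random(seed)
    n = len(keys)
    for _ in range(max_tries):
        f, g = random_hash(n, rng), random_hash(r, rng)
        buckets = [[] for _ in range(r)]
        for x in keys:
            buckets[g(x)].append(x)
        # Keys sharing both g(x) and f(x) can never be separated by a
        # displacement, so (f, g) must be injective on the key set.
        if any(len({f(x) for x in b}) < len(b) for b in buckets):
            continue
        d = [0] * r
        used = [False] * n
        placed_all = True
        # Largest buckets first, while most positions are still free.
        for i in sorted(range(r), key=lambda j: len(buckets[j]), reverse=True):
            if not buckets[i]:
                break  # all remaining buckets are empty
            for disp in range(n):
                pos = [(f(x) + disp) % n for x in buckets[i]]
                if not any(used[p] for p in pos):
                    for p in pos:
                        used[p] = True
                    d[i] = disp
                    break
            else:
                placed_all = False  # no displacement fits this bucket
                break
        if placed_all:
            return f, g, d, n
    raise RuntimeError("no displacements found; try a larger r")


def evaluate_displace(mphf, x):
    """Evaluate h(x) = (f(x) + d[g(x)]) mod n."""
    f, g, d, n = mphf
    return (f(x) + d[g(x)]) % n
```

Placing large buckets first is a design choice of this sketch: they are the hardest to fit, while the singleton buckets processed last can always use any remaining free position.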

The work in~\cite{chm92} presents an efficient and practical algorithm
for generating order-preserving minimal perfect hash functions.
Their method involves the generation of acyclic random graphs
$G = (V, E)$ with~$|V|=cn$ and $|E|=n$, where $c \ge 2.09$.
They showed that an order-preserving minimal perfect hash function
can be found in optimal time if~$G$ is acyclic.
To generate the graph, two vertices $h_1(x)$ and $h_2(x)$ are
computed for each key $x \in S$.
Thus, each set~$S$ has a corresponding graph~$G=(V,E)$, where
$V=\{0,1,\ldots,cn-1\}$ and $E=\big\{\{h_1(x),h_2(x)\}:x \in S\big\}$.
To guarantee that~$G$ is acyclic, the algorithm repeatedly selects
$h_1$ and $h_2$ from a family of universal hash functions
until the corresponding graph is acyclic.
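
The construction of~\cite{chm92} described above can be sketched in Python as follows.
The sketch rests on several assumptions:
keys are distinct non-negative integers below the assumed prime $P=2^{61}-1$,
$h_1$ and $h_2$ are drawn from the Carter and Wegman multiply-add family with range $\{0,\ldots,\lceil cn\rceil-1\}$,
acyclicity is tested with a union-find structure (so self-loops and parallel edges count as cycles and trigger a redraw),
and the final function takes the usual CHM form $h(x) = (g(h_1(x)) + g(h_2(x))) \bmod n$,
with $g$ assigned by traversing each tree of the acyclic graph.
The names \texttt{build\_chm} and \texttt{chm\_eval} are hypothetical.

```python
import math
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime 2^61 - 1; assumed larger than every key


def random_hash(m, rng):
    """Draw x -> ((a*x + b) mod PRIME) mod m from the Carter-Wegman family."""
    a = rng.randrange(1, PRIME)
    b = rng.randrange(0, PRIME)
    return lambda x: ((a * x + b) % PRIME) % m


def is_acyclic(num_vertices, edges):
    """Union-find cycle test; self-loops and parallel edges count as cycles."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def build_chm(keys, c=2.09, seed=0, max_tries=1000):
    """Order-preserving MPHF: keys[i] is mapped to i."""
    rng = random.Random(seed)
    n = len(keys)
    t = max(1, math.ceil(c * n))  # number of vertices, |V| = ceil(cn)
    for _ in range(max_tries):
        h1, h2 = random_hash(t, rng), random_hash(t, rng)
        edges = [(h1(x), h2(x)) for x in keys]
        if is_acyclic(t, edges):
            break
    else:
        raise RuntimeError("no acyclic graph found")
    # Label vertices so that g[u] + g[v] = i (mod n) on the edge of keys[i].
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    g = [0] * t
    visited = [False] * t
    for root in range(t):
        if visited[root]:
            continue
        visited[root] = True  # g[root] stays 0
        stack = [root]
        while stack:
            u = stack.pop()
            for v, i in adj[u]:
                if not visited[v]:
                    g[v] = (i - g[u]) % n
                    visited[v] = True
                    stack.append(v)
    return h1, h2, g, n


def chm_eval(mphf, x):
    """Evaluate h(x) = (g[h1(x)] + g[h2(x)]) mod n."""
    h1, h2, g, n = mphf
    return (g[h1(x)] + g[h2(x)]) % n
```

Because the graph is a forest, every edge is reached exactly once from an already labeled endpoint, so $g$ can be fixed arbitrarily at each root and then propagated; this is what makes the prescribed order of hash values (the key at position~$i$ receives value~$i$) attainable.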

Havas et al.~\cite{hmwc93} proved that if $|V(G)|=cn$ and $c>2$,
then the probability that~$G$ is acyclic tends to $p=e^{1/c}\sqrt{(c-2)/c}$
as $n \to \infty$.
For $c=2.09$, this probability is
$p \simeq 0.335$, and
the expected number of iterations to obtain an acyclic graph
is~$1/p \simeq 2.99$.
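
The formula of~\cite{hmwc93} and the expected number of iterations, which is the mean $1/p$ of a geometric random variable with success probability~$p$, can be evaluated with the short Python sketch below; the function names are hypothetical.

```python
import math


def acyclic_probability(c):
    """p = e^(1/c) * sqrt((c - 2) / c); the formula requires c > 2."""
    if c <= 2:
        raise ValueError("the formula requires c > 2")
    return math.exp(1 / c) * math.sqrt((c - 2) / c)


def expected_iterations(c):
    """Mean number of graph draws when each draw succeeds with probability p."""
    return 1 / acyclic_probability(c)


for c in (2.09, 2.5, 3.0):
    print(f"c = {c}: p = {acyclic_probability(c):.3f}, "
          f"1/p = {expected_iterations(c):.2f}")
```

For $c=2.09$ it gives $p \approx 0.335$ and $1/p \approx 2.99$; larger values of~$c$ raise~$p$ and reduce the expected number of iterations, at the cost of a larger vertex set and hence a larger table~$g$.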