\title{Space and Time Efficient Minimal Perfect Hash \\[0.2cm]
Functions for Very Large Databases\thanks{
This work was supported in part by
GERINDO Project--grant MCT/CNPq/CT-INFO 552.087/02-5,
CAPES/PROF Scholarship (Fabiano C. Botelho),
FAPESP Proj.\ Tem.\ 03/09925-5 and CNPq Grant 30.0334/93-1
(Yoshiharu Kohayakawa),
and CNPq Grant 30.5237/02-0 (Nivio Ziviani).}}
\author{Fabiano C. Botelho \and Davi C. Reis \and Yoshiharu Kohayakawa \and Nivio Ziviani}
F. C. Botelho \and
N. Ziviani \at
Dept. of Computer Science,
Federal Univ. of Minas Gerais,
Belo Horizonte, Brazil\\
D. C. Reis \at
Google, Brazil \\
Y. Kohayakawa \at
Dept. of Computer Science,
Univ. of S\~ao Paulo,
S\~ao Paulo, Brazil\\
We propose a novel external-memory-based algorithm for constructing minimal
perfect hash functions~$h$ for huge sets of keys.
For a set of~$n$ keys, our algorithm outputs~$h$ in time~$O(n)$.
The algorithm needs only a small vector of one-byte entries
in main memory to construct~$h$.
The evaluation of~$h(x)$ requires three memory accesses for any key~$x$.
The description of~$h$ takes a constant number of bits
for each key, which is optimal up to a constant factor: the theoretical lower
bound is $1/\ln 2 \approx 1.44$ bits per key.
In our experiments, we used a collection of 1 billion URLs gathered
from the web, averaging 64 characters per URL.
For this collection, our algorithm
(i) finds a minimal perfect hash function in approximately
3 hours using a commodity PC,
(ii) needs just 5.45 megabytes of internal memory to generate~$h$,
and (iii) takes 8.1 bits per key for the description of~$h$.
