\title{Space and Time Efficient Minimal Perfect Hash \\[0.2cm] Functions for Very Large Databases\thanks{ This work was supported in part by GERINDO Project--grant MCT/CNPq/CT-INFO 552.087/02-5, CAPES/PROF Scholarship (Fabiano C. Botelho), FAPESP Proj.\ Tem.\ 03/09925-5 and CNPq Grant 30.0334/93-1 (Yoshiharu Kohayakawa), and CNPq Grant 30.5237/02-0 (Nivio Ziviani).} }

\author{Fabiano C. Botelho \and Davi C. Reis \and Yoshiharu Kohayakawa \and Nivio Ziviani}

\institute{ F. C. Botelho \and N. Ziviani \at Dept. of Computer Science, Federal Univ. of Minas Gerais, Belo Horizonte, Brazil\\ \email{\{fbotelho,nivio\}@dcc.ufmg.br} \and D. C. Reis \at Google, Brazil \\ \email{davi.reis@gmail.com} \and Y. Kohayakawa \at Dept. of Computer Science, Univ. of S\~ao Paulo, S\~ao Paulo, Brazil\\ \email{yoshi@ime.usp.br} }

\begin{abstract} We propose a novel external-memory-based algorithm for constructing minimal perfect hash functions~$h$ for huge sets of keys. For a set of~$n$ keys, our algorithm outputs~$h$ in time~$O(n)$. The algorithm needs only a small vector of one-byte entries in main memory to construct~$h$. The evaluation of~$h(x)$ requires three memory accesses for any key~$x$. The description of~$h$ takes a constant number of bits per key, which is optimal up to a constant factor, since the theoretical lower bound is $1/\ln 2$ bits per key. In our experiments, we used a collection of 1~billion URLs collected from the web, each URL 64~characters long on average. For this collection, our algorithm (i)~finds a minimal perfect hash function in approximately 3~hours using a commodity PC, (ii)~needs just 5.45~megabytes of internal memory to generate~$h$, and (iii)~takes 8.1~bits per key for the description of~$h$. \keywords{Minimal Perfect Hashing \and Large Databases} \end{abstract}