vldb07 directory removed
This commit is contained in:
parent
7835d6b3e6
commit
c3cece4173
|
@ -1,7 +0,0 @@
|
||||||
\section{Acknowledgments}
|
|
||||||
This section is optional; it is a location for you
|
|
||||||
to acknowledge grants, funding, editing assistance and
|
|
||||||
what have you. In the present case, for example, the
|
|
||||||
authors would like to thank Gerald Murray of ACM for
|
|
||||||
his help in codifying this \textit{Author's Guide}
|
|
||||||
and the \textbf{.cls} and \textbf{.tex} files that it describes.
|
|
|
@ -1,174 +0,0 @@
|
||||||
%% Nivio: 23/jan/06 29/jan/06
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 03:56:47am EDT yoshi@ime.usp.br>
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\section{Analytical results}
|
|
||||||
\label{sec:analytcal-results}
|
|
||||||
|
|
||||||
\vspace{-1mm}
|
|
||||||
The purpose of this section is fourfold.
|
|
||||||
First, we show that our algorithm runs in expected time $O(n)$.
|
|
||||||
Second, we present the main memory requirements for constructing the MPHF.
|
|
||||||
Third, we discuss the cost of evaluating the resulting MPHF.
|
|
||||||
Fourth, we present the space required to store the resulting MPHF.
|
|
||||||
|
|
||||||
\vspace{-2mm}
|
|
||||||
\subsection{The linear time complexity}
|
|
||||||
\label{sec:linearcomplexity}
|
|
||||||
|
|
||||||
First, we show that the partitioning step presented in
|
|
||||||
Figure~\ref{fig:partitioningstep} runs in $O(n)$ time.
|
|
||||||
Each iteration of the {\bf for} loop in statement~1
|
|
||||||
runs in $O(|B_j|)$ time, $1 \leq j \leq N$, where $|B_j|$ is the
|
|
||||||
number of keys
|
|
||||||
that fit in block $B_j$ of size $\mu$. This is because statement 1.1 just reads
|
|
||||||
$|B_j|$ keys from disk, statement 1.2 runs a bucket-sort-like algorithm
|
|
||||||
that is well known to be linear in the number of keys it sorts (i.e., $|B_j|$ keys),
|
|
||||||
and statement 1.3 just dumps $|B_j|$ keys to the disk into File $j$.
|
|
||||||
Thus, the {\bf for} loop runs in $O(\sum_{j=1}^{N}|B_j|)$ time.
|
|
||||||
Since $\sum_{j=1}^{N}|B_j|=n$, the partitioning step runs in $O(n)$ time.
|
|
||||||
|
|
||||||
Second, we show that the searching step presented in
|
|
||||||
Figure~\ref{fig:searchingstep} also runs in $O(n)$ time.
|
|
||||||
The heap construction in statement 1 runs in $O(N)$ time, for $N \ll n$.
|
|
||||||
We have assumed that insertions and deletions in the heap cost $O(1)$ because
|
|
||||||
$N$ is typically much smaller than $n$ (see \cite[Section 6.4]{bkz06t} for details).
|
|
||||||
Statement 2 runs in $O(\sum_{i=0}^{\lceil n/b \rceil - 1} \mathit{size}[i])$ time
|
|
||||||
(remember that $\mathit{size}[i]$ stores the number of keys in bucket $i$).
|
|
||||||
As $\sum_{i=0}^{\lceil n/b \rceil - 1} \mathit{size}[i] = n$, if
|
|
||||||
statements 2.1, 2.2 and 2.3 run in $O(\mathit{size}[i])$ time, then statement 2
|
|
||||||
runs in $O(n)$ time.
|
|
||||||
|
|
||||||
%Statement 2.1 runs the algorithm to read a bucket from disk. That algorithm reads $\mathit{size}[i]$
|
|
||||||
%keys of bucket $i$ that might be spread into many files or, in the worst case,
|
|
||||||
%into $|BS_{max}|$ files, where $|BS_{max}|$ is the number of keys in the bucket of maximum size.
|
|
||||||
%It uses the heap $H$ to drive a multiway merge of the sprayed bucket $i$.
|
|
||||||
%As we are considering that each read/write on disk costs $O(1)$ and
|
|
||||||
%each heap operation also costs $O(1)$ (recall $N \ll n$), then statement 2.1
|
|
||||||
%costs $O(\mathit{size}[i])$ time.
|
|
||||||
%We need to take into account that this step could generate a lot of seeks on disk.
|
|
||||||
%However, the number of seeks can be amortized (see Section~\ref{sec:contr-disk-access})
|
|
||||||
%and that is why we have been able of getting a MPHF for a set of 1 billion keys in less
|
|
||||||
%than 4 hours using a machine with just 500 MB of main memory
|
|
||||||
%(see Section~\ref{sec:performance}).
|
|
||||||
Statement 2.1 reads $O(\mathit{size}[i])$ keys of bucket $i$
|
|
||||||
and is detailed in Figure~\ref{fig:readingbucket}.
|
|
||||||
As we are assuming that each read or write on disk costs $O(1)$ and
|
|
||||||
each heap operation also costs $O(1)$, statement~2.1
|
|
||||||
takes $O(\mathit{size}[i])$ time.
|
|
||||||
However, in the worst case the keys of bucket $i$ are spread over~$\mathit{BS}_{\mathit{max}}$ files on disk
(recall that $\mathit{BS}_{\mathit{max}}$ is the maximum number of keys found in any bucket).
|
|
||||||
Therefore, we need to take into account that
|
|
||||||
the critical step in reading a bucket is in statement~1.3 of Figure~\ref{fig:readingbucket},
|
|
||||||
where a seek operation in File $j$
|
|
||||||
may be performed by the first read operation.
|
|
||||||
|
|
||||||
In order to amortize the number of seeks performed we use a buffering technique~\cite{k73}.
|
|
||||||
We create a buffer $j$ of size \textbaht$\: = \mu/N$ for each file $j$,
|
|
||||||
where $1\leq j \leq N$
|
|
||||||
(recall that $\mu$ is the size in bytes of an a priori reserved internal memory area).
|
|
||||||
Every time a read is requested from file $j$ and the data is not found
in the $j$th~buffer, \textbaht~bytes are read from file $j$ into buffer $j$.
|
|
||||||
Hence, the number of seeks performed in the worst case is given by
|
|
||||||
$\beta/$\textbaht~(remember that $\beta$ is the size in bytes of $S$).
|
|
||||||
Here we have made the pessimistic assumption that one seek happens every time
buffer $j$ is refilled.
|
|
||||||
Thus, the number of seeks performed in the worst case is $64n/$\textbaht, since
|
|
||||||
each URL is 64 bytes long on average. Therefore, the number of seeks is linear in
$n$ and amortized by \textbaht.
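
As a rough illustration, take the values used as an example in
Section~\ref{sec:memconstruction}, namely $n=10^9$, $\mu=250$~megabytes and $N=248$,
and assume (our assumption, made only for the sake of the arithmetic) that a megabyte is $2^{20}$ bytes.
Then
\[
\mbox{\textbaht} = \frac{\mu}{N} \approx 1.06\times10^{6}\ \mbox{bytes}
\quad\mbox{and}\quad
\frac{64n}{\mbox{\textbaht}} \approx \frac{6.4\times10^{10}}{1.06\times10^{6}} \approx 6\times10^{4}
\]
seeks in the worst case, which is several orders of magnitude smaller than~$n$.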
|
|
||||||
|
|
||||||
It is important to emphasize two things.
|
|
||||||
First, the operating system uses techniques
|
|
||||||
to diminish the number of seeks and the average seek time.
|
|
||||||
This makes the amortization factor greater than \textbaht~in practice.
|
|
||||||
Second, almost all main memory is available to be used as
|
|
||||||
file buffers because just a small vector
|
|
||||||
of $\lceil n/b\rceil$ one-byte entries must be maintained in main memory,
|
|
||||||
as we show in Section~\ref{sec:memconstruction}.
|
|
||||||
|
|
||||||
|
|
||||||
Statement 2.2 runs our internal memory based algorithm in order to generate a MPHF for each bucket.
|
|
||||||
That algorithm is linear, as we showed in~\cite{bkz05}. As it is applied to buckets with $\mathit{size}[i]$ keys, statement~2.2 takes $O(\mathit{size}[i])$ time.
|
|
||||||
|
|
||||||
Statement 2.3 has time complexity $O(\mathit{size}[i])$ because it writes to disk
|
|
||||||
the description of each generated MPHF and each description is stored in
|
|
||||||
$c \times \mathit{size}[i] + O(1)$ bytes, where $c\in[0.93,1.15]$.
|
|
||||||
In conclusion, our algorithm takes $O(n)$ time because both the partitioning and
|
|
||||||
the searching steps run in $O(n)$ time.
|
|
||||||
|
|
||||||
An experimental validation of the above proof and a performance comparison with
|
|
||||||
our internal memory based algorithm~\cite{bkz05} were not included here due to
|
|
||||||
space restrictions but can be found in~\cite{bkz06t} and also in the appendix.
|
|
||||||
|
|
||||||
\vspace{-1mm}
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\subsection{Space used for constructing a MPHF}
|
|
||||||
\label{sec:memconstruction}
|
|
||||||
|
|
||||||
The vector {\it size} is kept in main memory
|
|
||||||
all the time.
|
|
||||||
The vector {\it size} has $\lceil n/b \rceil$ one-byte entries.
|
|
||||||
It stores the number of keys in each bucket and
|
|
||||||
those values are less than or equal to 256.
|
|
||||||
For example, for a set of 1 billion keys and $b=175$, the vector {\it size} needs
|
|
||||||
$5.45$ megabytes of main memory.
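
This figure can be checked directly, assuming that a megabyte denotes $2^{20}$ bytes
(an assumption on our part, which is consistent with the value quoted):
\[
\left\lceil \frac{10^{9}}{175} \right\rceil = 5{,}714{,}286 \ \mbox{bytes}
\approx \frac{5.714\times10^{6}}{2^{20}} \ \mbox{megabytes}
\approx 5.45 \ \mbox{megabytes}.
\]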
|
|
||||||
|
|
||||||
We need an internal memory area of size $\mu$ bytes to be used in
|
|
||||||
the partitioning step and in the searching step.
|
|
||||||
The size $\mu$ is fixed a priori and depends only on the amount
|
|
||||||
of internal memory available to run the algorithm
|
|
||||||
(i.e., it does not depend on the size $n$ of the problem).
|
|
||||||
|
|
||||||
% One could argue about the a priori reserved internal memory area and the main memory
|
|
||||||
% required to run the indirect bucket sort algorithm.
|
|
||||||
% Those internal memory requirements do not depend on the size of the problem
|
|
||||||
% (i.e., the number of keys being hashed) and can be fixed a priori.
|
|
||||||
|
|
||||||
The additional space required in the searching step
is constant, since the problem is broken down
into several small problems (of at most 256 keys each) and
the heap size is assumed to be much smaller than $n$ ($N \ll n$).
|
|
||||||
For example, for a set of 1 billion keys and an internal area of~$\mu = 250$ megabytes,
|
|
||||||
the number of files is $N = 248$.
|
|
||||||
|
|
||||||
\vspace{-1mm}
|
|
||||||
\subsection{Evaluation cost of the MPHF}
|
|
||||||
|
|
||||||
Now we consider the amount of CPU time
|
|
||||||
required by the resulting MPHF at retrieval time.
|
|
||||||
The MPHF requires for each key the computation of three
|
|
||||||
universal hash functions and three memory accesses
|
|
||||||
(see Eqs.~(\ref{eq:mphf}), (\ref{eq:bucketindex}) and (\ref{eq:mphfi})).
|
|
||||||
This is not optimal. Pagh~\cite{p99} showed that any MPHF requires
|
|
||||||
at least the computation of two universal hash functions and one memory
|
|
||||||
access.
|
|
||||||
|
|
||||||
\subsection{Description size of the MPHF}
|
|
||||||
|
|
||||||
The number of bits required to store the MPHF generated by the algorithm
|
|
||||||
is computed by Eq.~(\ref{eq:newmphfbits}).
|
|
||||||
We need to store each $g_i$ vector presented in Eq.~(\ref{eq:mphfi}), where
|
|
||||||
$0\leq i < \lceil n/b \rceil$. As each bucket has at most 256 keys, each
|
|
||||||
entry in a $g_i$ vector has 8~bits. In each $g_i$ vector there are
|
|
||||||
$c \times \mathit{size}[i]$ entries (recall $c\in[0.93, 1.15]$).
|
|
||||||
Summing the number of entries over all $\lceil n/b \rceil$ vectors $g_i$, we have
|
|
||||||
$c\sum_{i=0}^{\lceil n/b \rceil -1} \mathit{size}[i]=cn$ entries. We also need to
|
|
||||||
store $3 \lceil n/b \rceil$ integer numbers of
|
|
||||||
$\log_2n$ bits referring respectively to the {\it offset} vector and the two random seeds of
|
|
||||||
$h_{1i}$ and $h_{2i}$. In addition, we need to store $\lceil n/b \rceil$ 8-bit entries of
|
|
||||||
the vector {\it size}. Therefore,
|
|
||||||
\begin{equation}\label{eq:newmphfbits}
\mathrm{Required\: Space} = 8cn + \frac{n}{b}\left( 3\log_2n +8\right) \:
\mathrm{bits}.
\end{equation}
|
|
||||||
|
|
||||||
Considering $c=0.93$ and $b=175$, the number of bits per key to store
|
|
||||||
the description of the resulting MPHF for a set of 1~billion keys is $8.1$.
|
|
||||||
If we set $b=128$, then the bits per key ratio increases to $8.3$.
|
|
||||||
Theoretically, the number of bits required to store the MPHF in
|
|
||||||
Eq.~(\ref{eq:newmphfbits})
|
|
||||||
is $O(n\log n)$ as~$n\to\infty$. However, for sets of size up to $2^{b/3}$ keys
|
|
||||||
the number of bits per key is lower than 9~bits (note that
|
|
||||||
$2^{b/3}>2^{58}>10^{17}$ for $b=175$).
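
The bound of 9~bits can be seen by dividing Eq.~(\ref{eq:newmphfbits}) by~$n$ and
substituting $n=2^{b/3}$, so that $3\log_2 n=b$. The numerical value below is only an
illustration for $c=0.93$ and $b=175$:
\[
\frac{\mathrm{Required\: Space}}{n} = 8c + \frac{3\log_2 n + 8}{b}
= 8c + 1 + \frac{8}{b} \approx 7.44 + 1 + 0.05 \approx 8.5 \ \mbox{bits per key}.
\]
Since this expression grows with~$n$, every smaller set needs fewer bits per key.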
|
|
||||||
%For $b=175$, the number of bits per key will be close to 9 for a set of $2^{58}$ keys.
|
|
||||||
Thus, in practice the resulting function is stored in $O(n)$ bits.
|
|
|
@ -1,6 +0,0 @@
|
||||||
\appendix
|
|
||||||
\input{experimentalresults}
|
|
||||||
\input{thedataandsetup}
|
|
||||||
\input{costhashingbuckets}
|
|
||||||
\input{performancenewalgorithm}
|
|
||||||
\input{diskaccess}
|
|
|
@ -1,42 +0,0 @@
|
||||||
% Time-stamp: <Monday 30 Jan 2006 12:38:06am EST yoshi@flare>
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\section{Concluding remarks}
|
|
||||||
\label{sec:concuding-remarks}
|
|
||||||
|
|
||||||
This paper has presented a novel external memory based algorithm for
|
|
||||||
constructing MPHFs that works for sets on the order of billions of keys. The
|
|
||||||
algorithm outputs the resulting function in~$O(n)$ time and, furthermore, it
|
|
||||||
can be tuned to run only $34\%$ slower (see \cite{bkz06t} for details) than the fastest
|
|
||||||
algorithm available in the literature for constructing MPHFs~\cite{bkz05}.
|
|
||||||
In addition, the space
requirement of the resulting MPHF is at most 9 bits per key for datasets of
up to $2^{58}\simeq10^{17.5}$ keys.
|
|
||||||
|
|
||||||
The algorithm is simple and needs just a
|
|
||||||
small vector of size approximately 5.45 megabytes in main memory to construct
|
|
||||||
a MPHF for a collection of 1 billion URLs, each one 64 bytes long on average.
|
|
||||||
Therefore, almost all main memory is available to be used as disk I/O buffer.
|
|
||||||
Making use of such a buffering scheme considering an internal memory area of size
|
|
||||||
$\mu=200$ megabytes, our algorithm can produce a MPHF for a
|
|
||||||
set of 1 billion URLs in approximately 3.6 hours using a commodity PC with a 2.4 gigahertz processor and
|
|
||||||
500 megabytes of main memory.
|
|
||||||
If we increase both the main memory
|
|
||||||
available to 1 gigabyte and the internal memory area to $\mu=500$ megabytes,
|
|
||||||
a MPHF for the set of 1 billion URLs is produced in approximately 3 hours. For any
|
|
||||||
key, the evaluation of the resulting MPHF takes three memory accesses and the
|
|
||||||
computation of three universal hash functions.
|
|
||||||
|
|
||||||
In order to allow the reproduction of our results and the utilization of both the internal memory
|
|
||||||
based algorithm and the external memory based algorithm,
|
|
||||||
the algorithms are available at \texttt{http://cmph.sf.net}
|
|
||||||
under the GNU Lesser General Public License (LGPL).
|
|
||||||
They were implemented in the C language.
|
|
||||||
|
|
||||||
In future work, we will exploit the fact that the searching step intrinsically
|
|
||||||
presents a high degree of parallelism and requires $73\%$ of the
|
|
||||||
construction time. Therefore, a parallel implementation of our algorithm will
|
|
||||||
allow the construction and the evaluation of the resulting function in parallel.
|
|
||||||
Thus, the description of the resulting MPHFs will be distributed across the parallel
computer, allowing scalability to sets of hundreds of billions of keys.
|
|
||||||
This is an important contribution, mainly for applications related to the Web, as
|
|
||||||
mentioned in Section~\ref{sec:intro}.
|
|
|
@ -1,177 +0,0 @@
|
||||||
% Nivio: 29/jan/06
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 12:37:22am EST yoshi@flare>
|
|
||||||
\vspace{-2mm}
|
|
||||||
\subsection{Performance of the internal memory based algorithm}
|
|
||||||
\label{sec:intern-memory-algor}
|
|
||||||
|
|
||||||
%\begin{table*}[htb]
|
|
||||||
%\begin{center}
|
|
||||||
%{\scriptsize
|
|
||||||
%\begin{tabular}{|c|c|c|c|c|c|c|c|}
|
|
||||||
%\hline
|
|
||||||
%$n$ (millions) & 1 & 2 & 4 & 8 & 16 & 32 \\
|
|
||||||
%\hline
|
|
||||||
%Average time (s)& $6.1 \pm 0.3$ & $12.2 \pm 0.6$ & $25.4 \pm 1.1$ & $51.4 \pm 2.0$ & $117.3 \pm 4.4$ & $262.2 \pm 8.7$\\
|
|
||||||
%SD (s) & $2.6$ & $5.4$ & $9.8$ & $17.6$ & $37.3$ & $76.3$ \\
|
|
||||||
%\hline
|
|
||||||
%\end{tabular}
|
|
||||||
%\vspace{-3mm}
|
|
||||||
%}
|
|
||||||
%\end{center}
|
|
||||||
%\caption{Internal memory based algorithm: average time in seconds for constructing a MPHF,
|
|
||||||
%the standard deviation (SD), and the confidence intervals considering
|
|
||||||
%a confidence level of $95\%$.}
|
|
||||||
%\label{tab:medias}
|
|
||||||
%\end{table*}
|
|
||||||
|
|
||||||
Our three-step internal memory based algorithm presented in~\cite{bkz05}
|
|
||||||
is used for constructing a MPHF for each bucket.
|
|
||||||
It is a randomized algorithm because it needs to generate a simple random graph
|
|
||||||
in its first step.
|
|
||||||
Once the graph is obtained the other two steps are deterministic.
|
|
||||||
|
|
||||||
Thus, we can consider the runtime of the algorithm to have the form~$\alpha
|
|
||||||
nZ$ for an input of~$n$ keys, where~$\alpha$ is some machine dependent
|
|
||||||
constant that further depends on the length of the keys and~$Z$ is a random
|
|
||||||
variable with geometric distribution with mean~$1/p=e^{1/c^2}$ (see
|
|
||||||
Section~\ref{sec:mphfbucket}). All results in our experiments were obtained
|
|
||||||
taking $c=1$; the value of~$c$, with~$c\in[0.93,1.15]$, in fact has little
|
|
||||||
influence on the runtime, as shown in~\cite{bkz05}.
|
|
||||||
|
|
||||||
The values chosen for $n$ were $1, 2, 4, 8, 16$ and $32$ million.
|
|
||||||
Although we have a dataset with 1~billion URLs, on a PC with
|
|
||||||
1~gigabyte of main memory, the algorithm is able
|
|
||||||
to handle an input with at most 32 million keys.
|
|
||||||
This is mainly because of the graph we need to keep in main memory.
|
|
||||||
The algorithm requires $25n + O(1)$ bytes for constructing
|
|
||||||
a MPHF (details about the data structures used by the algorithm can
|
|
||||||
be found at~\texttt{http://cmph.sf.net}).
|
|
||||||
% for the details about the data structures
|
|
||||||
%used by the algorithm).
|
|
||||||
|
|
||||||
In order to estimate the number of trials for each value of $n$ we use
|
|
||||||
a statistical method for determining a suitable sample size (see, e.g.,
|
|
||||||
\cite[Chapter 13]{j91}).
|
|
||||||
As we obtained different values for each $n$,
|
|
||||||
we used the maximal value obtained, namely, 300~trials in order to have
|
|
||||||
a confidence level of $95\%$.
|
|
||||||
|
|
||||||
% \begin{figure*}[ht]
|
|
||||||
% \noindent
|
|
||||||
% \begin{minipage}[b]{0.5\linewidth}
|
|
||||||
% \centering
|
|
||||||
% \subfigure[The previous algorithm]
|
|
||||||
% {\scalebox{0.5}{\includegraphics{figs/bmz_temporegressao.eps}}}
|
|
||||||
% \end{minipage}
|
|
||||||
% \hfill
|
|
||||||
% \begin{minipage}[b]{0.5\linewidth}
|
|
||||||
% \centering
|
|
||||||
% \subfigure[The new algorithm]
|
|
||||||
% {\scalebox{0.5}{\includegraphics{figs/brz_temporegressao.eps}}}
|
|
||||||
% \end{minipage}
|
|
||||||
% \caption{Time versus number of keys in $S$. The solid line corresponds to
|
|
||||||
% a linear regression model.}
|
|
||||||
% %obtained from the experimental measurements.}
|
|
||||||
% \label{fig:temporegressao}
|
|
||||||
% \end{figure*}
|
|
||||||
|
|
||||||
Table~\ref{tab:medias} presents the runtime average for each $n$,
|
|
||||||
the respective standard deviations, and
|
|
||||||
the respective confidence intervals given by
|
|
||||||
the average time $\pm$ the distance from average time
|
|
||||||
considering a confidence level of $95\%$.
|
|
||||||
Observing the runtime averages one sees that
|
|
||||||
the algorithm runs in expected linear time,
|
|
||||||
as shown in~\cite{bkz05}.
|
|
||||||
|
|
||||||
\vspace{-2mm}
|
|
||||||
\begin{table*}[htb]
|
|
||||||
\begin{center}
|
|
||||||
{\scriptsize
|
|
||||||
\begin{tabular}{|c|c|c|c|c|c|c|}
|
|
||||||
\hline
|
|
||||||
$n$ (millions) & 1 & 2 & 4 & 8 & 16 & 32 \\
|
|
||||||
\hline
|
|
||||||
Average time (s)& $6.1 \pm 0.3$ & $12.2 \pm 0.6$ & $25.4 \pm 1.1$ & $51.4 \pm 2.0$ & $117.3 \pm 4.4$ & $262.2 \pm 8.7$\\
|
|
||||||
SD (s) & $2.6$ & $5.4$ & $9.8$ & $17.6$ & $37.3$ & $76.3$ \\
|
|
||||||
\hline
|
|
||||||
\end{tabular}
|
|
||||||
\vspace{-1mm}
|
|
||||||
}
|
|
||||||
\end{center}
|
|
||||||
\caption{Internal memory based algorithm: average time in seconds for constructing a MPHF,
|
|
||||||
the standard deviation (SD), and the confidence intervals considering
|
|
||||||
a confidence level of $95\%$.}
|
|
||||||
\label{tab:medias}
|
|
||||||
\vspace{-4mm}
|
|
||||||
\end{table*}
|
|
||||||
|
|
||||||
% \enlargethispage{\baselineskip}
|
|
||||||
% \begin{table*}[htb]
|
|
||||||
% \begin{center}
|
|
||||||
% {\scriptsize
|
|
||||||
% (a)
|
|
||||||
% \begin{tabular}{|c|c|c|c|c|c|c|c|}
|
|
||||||
% \hline
|
|
||||||
% $n$ (millions) & 1 & 2 & 4 & 8 & 16 & 32 \\
|
|
||||||
% \hline
|
|
||||||
% Average time (s)& $6.119 \pm 0.300$ & $12.190 \pm 0.615$ & $25.359 \pm 1.109$ & $51.408 \pm 2.003$ & $117.343 \pm 4.364$ & $262.215 \pm 8.724$\\
|
|
||||||
% SD (s) & $2.644$ & $5.414$ & $9.757$ & $17.627$ & $37.333$ & $76.271$ \\
|
|
||||||
% \hline
|
|
||||||
% \end{tabular}
|
|
||||||
% \\[5mm] (b)
|
|
||||||
% \begin{tabular}{|l|c|c|c|c|c|}
|
|
||||||
% \hline
|
|
||||||
% $n$ (millions) & 1 & 2 & 4 & 8 & 16 \\
|
|
||||||
% \hline % Part. 16 \% 16 \% 16 \% 18 \% 20\%
|
|
||||||
% Average time (s) & $6.927 \pm 0.309$ & $13.828 \pm 0.175$ & $31.936 \pm 0.663$ & $69.902 \pm 1.084$ & $140.617 \pm 2.502$ \\
|
|
||||||
% SD & $0.431$ & $0.245$ & $0.926$ & $1.515$ & $3.498$ \\
|
|
||||||
% \hline
|
|
||||||
% \hline
|
|
||||||
% $n$ (millions) & 32 & 64 & 128 & 512 & 1000 \\
|
|
||||||
% \hline % Part. 20 \% 20\% 20\% 18\% 18\%
|
|
||||||
% Average time (s) & $284.330 \pm 1.135$ & $587.880 \pm 3.945$ & $1223.581 \pm 4.864$ & $5966.402 \pm 9.465$ & $13229.540 \pm 12.670$ \\
|
|
||||||
% SD & $1.587$ & $5.514$ & $6.800$ & $13.232$ & $18.577$ \\
|
|
||||||
% \hline
|
|
||||||
% \end{tabular}
|
|
||||||
% }
|
|
||||||
% \end{center}
|
|
||||||
% \caption{The runtime averages in seconds,
|
|
||||||
% the standard deviation (SD), and
|
|
||||||
% the confidence intervals given by the average time $\pm$
|
|
||||||
% the distance from average time considering
|
|
||||||
% a confidence level of $95\%$.}
|
|
||||||
% \label{tab:medias}
|
|
||||||
% \end{table*}
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
Figure~\ref{fig:bmz_temporegressao}
|
|
||||||
presents the runtime for each trial. In addition,
|
|
||||||
the solid line corresponds to a linear regression model
|
|
||||||
obtained from the experimental measurements.
|
|
||||||
As we can see, the runtime for a given $n$ has a considerable
|
|
||||||
fluctuation. However, the fluctuation also grows linearly with $n$.
|
|
||||||
|
|
||||||
\begin{figure}[htb]
|
|
||||||
\vspace{-2mm}
|
|
||||||
\begin{center}
|
|
||||||
\scalebox{0.4}{\includegraphics{figs/bmz_temporegressao}}
|
|
||||||
\caption{Time versus number of keys in $S$ for the internal memory based algorithm.
|
|
||||||
The solid line corresponds to a linear regression model.}
|
|
||||||
\label{fig:bmz_temporegressao}
|
|
||||||
\end{center}
|
|
||||||
\vspace{-6mm}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
The observed fluctuation in the runtimes is as expected; recall that this
|
|
||||||
runtime has the form~$\alpha nZ$ with~$Z$ a geometric random variable with
|
|
||||||
mean~$1/p=e$. Thus, the runtime has mean~$\alpha n/p=\alpha en$ and standard
|
|
||||||
deviation~$\alpha n\sqrt{(1-p)/p^2}=\alpha n\sqrt{e(e-1)}$.
|
|
||||||
Therefore, the standard deviation also grows
|
|
||||||
linearly with $n$, as experimentally verified
|
|
||||||
in Table~\ref{tab:medias} and in Figure~\ref{fig:bmz_temporegressao}.
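
For completeness, the constants above follow from the standard formulas for a geometric
random variable with success probability~$p$, namely mean~$1/p$ and variance~$(1-p)/p^2$.
With $p=1/e$ (that is, $c=1$) we get
\[
\frac{1-p}{p^2} = \left(1-\frac{1}{e}\right)e^2 = e(e-1),
\qquad
\sqrt{e(e-1)} \approx 2.16 .
\]
This is only a sketch under the idealized model~$\alpha nZ$; it is not fitted to the
measurements.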
|
|
||||||
|
|
||||||
%\noindent-------------------------------------------------------------------------\\
|
|
||||||
%Comment for Yoshi: I could not reproduce well what we discussed
%in the paragraph above; I think you will be able to justify it better :-). \\
|
|
||||||
%-------------------------------------------------------------------------\\
|
|
|
@ -1,146 +0,0 @@
|
||||||
% Nivio: 29/jan/06
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 04:01:40am EDT yoshi@ime.usp.br>
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\subsection{Determining~$b$}
|
|
||||||
\label{sec:determining-b}
|
|
||||||
\begin{table*}[t]
|
|
||||||
\begin{center}
|
|
||||||
{\small %\scriptsize
|
|
||||||
\begin{tabular}{|c|ccc|ccc|}
|
|
||||||
\hline
|
|
||||||
\raisebox{-0.7em}{$n$} & \multicolumn{3}{c|}{\raisebox{-1mm}{$b=128$}} &
\multicolumn{3}{c|}{\raisebox{-1mm}{$b=175$}}\\
|
|
||||||
\cline{2-4} \cline{5-7}
|
|
||||||
& \raisebox{-0.5mm}{Worst Case} & \raisebox{-0.5mm}{Average} &\raisebox{-0.5mm}{Eq.~(\ref{eq:maxbs})}
|
|
||||||
& \raisebox{-0.5mm}{Worst Case} & \raisebox{-0.5mm}{Average} &\raisebox{-0.5mm}{Eq.~(\ref{eq:maxbs})} \\
|
|
||||||
\hline
|
|
||||||
$1.0 \times 10^6$ & 177 & 172.0 & 176 & 232 & 226.6 & 230 \\
|
|
||||||
%$2.0 \times 10^6$ & 179 & 174.0 & 178 & 236 & 228.5 & 232 \\
|
|
||||||
$4.0 \times 10^6$ & 182 & 177.5 & 179 & 241 & 231.8 & 234 \\
|
|
||||||
%$8.0 \times 10^6$ & 186 & 181.6 & 181 & 238 & 234.2 & 236 \\
|
|
||||||
$1.6 \times 10^7$ & 184 & 181.6 & 183 & 241 & 236.1 & 238 \\
|
|
||||||
%$3.2 \times 10^7$ & 191 & 183.9 & 184 & 240 & 236.6 & 240 \\
|
|
||||||
$6.4 \times 10^7$ & 195 & 185.2 & 186 & 244 & 239.0 & 242 \\
|
|
||||||
%$1.28 \times 10^8$ & 193 & 187.7 & 187 & 244 & 239.7 & 244 \\
|
|
||||||
$5.12 \times 10^8$ & 196 & 191.7 & 190 & 251 & 246.3 & 247 \\
|
|
||||||
$1.0 \times 10^9$ & 197 & 191.6 & 192 & 253 & 248.9 & 249 \\
|
|
||||||
\hline
|
|
||||||
\end{tabular}
|
|
||||||
\vspace{-1mm}
|
|
||||||
}
|
|
||||||
\end{center}
|
|
||||||
\caption{Values for $\mathit{BS}_{\mathit{max}}$: worst case and average case obtained in the experiments and using Eq.~(\ref{eq:maxbs}),
|
|
||||||
considering $b=128$ and $b=175$ for different numbers~$n$ of keys in $S$.}
|
|
||||||
\label{tab:comparison}
|
|
||||||
\vspace{-6mm}
|
|
||||||
\end{table*}
|
|
||||||
|
|
||||||
The partitioning step can be viewed as the well known ``balls into bins''
|
|
||||||
problem~\cite{ra98,dfm02} where~$n$ keys (the balls) are placed independently and
|
|
||||||
uniformly into $\lceil n/b\rceil$ buckets (the bins). The main question related to that problem we are interested
|
|
||||||
in is: what is the maximum number of keys in any bucket?
|
|
||||||
In fact, we want to get the maximum value for $b$ that makes the maximum number of keys in any bucket
|
|
||||||
no greater than 256 with high probability.
|
|
||||||
This is important, as we wish to use 8 bits per entry in the vector $g_i$ of
|
|
||||||
each $\mathrm{MPHF}_i$,
|
|
||||||
where $0 \leq i < \lceil n/b\rceil$.
|
|
||||||
Let $\mathit{BS}_{\mathit{max}}$ be the maximum number of keys in any bucket.
|
|
||||||
|
|
||||||
Clearly, $\BSmax$ is the maximum
|
|
||||||
of~$\lceil n/b\rceil$ random variables~$Z_i$, each with binomial
|
|
||||||
distribution~$\Bi(n,p)$ with parameters~$n$ and~$p=1/\lceil n/b\rceil$.
|
|
||||||
However, the~$Z_i$ are not independent. Note that~$\Bi(n,p)$ has mean and
|
|
||||||
variance~$\simeq b$. To give an upper estimate for the probability of the
|
|
||||||
event~$\BSmax\geq \gamma$, we can estimate the probability that we have~$Z_i\geq \gamma$
|
|
||||||
for a fixed~$i$, and then sum these estimates over all~$i$.
|
|
||||||
Let~$\gamma=b+\sigma\sqrt{b\ln(n/b)}$, where~$\sigma=\sqrt2$.
|
|
||||||
Approximating~$\Bi(n,p)$ by the normal distribution with mean and
|
|
||||||
variance~$b$, we obtain the
|
|
||||||
estimate~$(\sigma\sqrt{2\pi\ln(n/b)})^{-1}\times\exp(-(1/2)\sigma^2\ln(n/b))$ for
|
|
||||||
the probability that~$Z_i\geq \gamma$ occurs, which, summed over all~$i$, gives
|
|
||||||
that the probability that~$\BSmax\geq \gamma$ occurs is at
|
|
||||||
most~$1/(\sigma\sqrt{2\pi\ln(n/b)})$, which tends to~$0$ as~$n\to\infty$.
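
The summation step can be made explicit. With $\sigma=\sqrt2$ the exponential factor
equals $\exp(-\ln(n/b))=b/n$, and there are $\lceil n/b\rceil\simeq n/b$ buckets, so that
(ignoring the rounding in $\lceil n/b\rceil$, as elsewhere in this estimate)
\[
\Pr\left[\mathit{BS}_{\mathit{max}}\geq\gamma\right]
\leq \sum_{i}\Pr\left[Z_i\geq\gamma\right]
\simeq \frac{n}{b}\cdot\frac{1}{\sigma\sqrt{2\pi\ln(n/b)}}\cdot\frac{b}{n}
= \frac{1}{\sigma\sqrt{2\pi\ln(n/b)}}.
\]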
|
|
||||||
Thus, we have shown that, with high probability,
|
|
||||||
\begin{equation}
|
|
||||||
\label{eq:maxbs}
|
|
||||||
\BSmax\leq b+\sqrt{2b\ln{n\over b}}.
|
|
||||||
\end{equation}
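
As a numerical illustration of Eq.~(\ref{eq:maxbs}), take $n=10^9$ and $b=175$,
so that $\ln(n/b)\approx15.56$:
\[
b+\sqrt{2b\ln\frac{n}{b}} \approx 175 + \sqrt{350\times15.56} \approx 175+73.8 \approx 249,
\]
which is below the limit of 256. For $b=128$ the same computation gives
approximately $128+63.7\approx192$.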
|
|
||||||
|
|
||||||
% The traditional approach used to estimate $\mathit{BS}_{\mathit{max}}$ with high probability is
|
|
||||||
% to consider $\mathit{BS}_{\mathit{max}}$ as a random variable that follows a binomial distribution
|
|
||||||
% that can be approximated by a poisson distribution. This yields a good approximation
|
|
||||||
% when the number of balls is lower than or equal to the number of bins~\cite{g81}. In our case,
|
|
||||||
% the number of balls is greater than the number of buckets.
|
|
||||||
% % and that is why we have used more recent works to estimate $\mathit{BS}_{\mathit{max}}$.
|
|
||||||
% As $b > \ln (n/b)$, we can use the result by Raab and Steger~\cite{ra98} to estimate
|
|
||||||
% $\mathit{BS}_{\mathit{max}}$ with high probability.
|
|
||||||
% The following equation gives the estimation, where $\sigma=\sqrt{2}$:
|
|
||||||
% \begin{eqnarray} \label{eq:maxbs}
|
|
||||||
% \mathit{BS}_{\mathit{max}} = b + O \left( \sqrt{b\ln\frac{n}{b}} \right) = b + \sigma \times \left(\sqrt{b\ln\frac{n}{b}} \right)
|
|
||||||
% \end{eqnarray}
|
|
||||||
|
|
||||||
% In order to estimate the suitable constant $\sigma$ we did a linear
|
|
||||||
% regression suppressing the constant term.
|
|
||||||
% We use the equation $BS_{max} - b = \sigma \times \sqrt{b\ln (n/b)}$
|
|
||||||
% in the linear regression considering $y=BS_{max} - b$ and $x=\sqrt{b\ln (n/b)}$.
|
|
||||||
% In order to obtain data to be used in the linear regression we set
|
|
||||||
% b=128 and ran the new algorithm ten times
|
|
||||||
% for n equal to 1, 2, 4, 8, 16, 32, 64, 128, 512, 1000 million keys.
|
|
||||||
% Taking a confidence level equal to 95\% we got
|
|
||||||
% $\sigma = 2.11 \pm 0.03$.
|
|
||||||
% The coefficient of determination was $99.6\%$, which means that the linear
|
|
||||||
% regression explains $99.6\%$ of the data variation and only $0.4\%$
|
|
||||||
% is due to experimental errors.
|
|
||||||
% Therefore, Eq.~(\ref{eq:maxbs}) with $\sigma = 2.11 \pm 0.03$ and $b=128$
|
|
||||||
% makes a very good estimation of the maximum number of keys in any bucket.
|
|
||||||
|
|
||||||
% Repeating the same experiments for $b$ equals to $175$ and
|
|
||||||
% a confidence level of $95\%$ we got $\sigma = 2.07 \pm 0.03$.
|
|
||||||
% Again we verified that Eq.~(\ref{eq:maxbs}) with $\sigma = 2.07 \pm 0.03$ and $b=175$ is
|
|
||||||
% a very good estimation of the maximum number of keys in any bucket once the
|
|
||||||
% coefficient of determination obtained was $99.7 \%$ and $\sigma$ is in a very narrow range.
|
|
||||||
|
|
||||||
In our algorithm the maximum number of keys in any bucket must be at most 256.
|
|
||||||
Table~\ref{tab:comparison} presents the values for $\mathit{BS}_{\mathit{max}}$
|
|
||||||
obtained experimentally and using Eq.~(\ref{eq:maxbs}).
|
|
||||||
The table presents the worst case and the average case,
|
|
||||||
considering $b=128$, $b=175$ and Eq.~(\ref{eq:maxbs}),
|
|
||||||
for several numbers~$n$ of keys in $S$.
|
|
||||||
The estimation given by Eq.~(\ref{eq:maxbs}) is very close to the experimental
|
|
||||||
results.
|
|
||||||
|
|
||||||
Now we estimate the biggest problem our algorithm is able to solve for
|
|
||||||
a given $b$.
|
|
||||||
Using Eq.~(\ref{eq:maxbs}) with $b=128$ and $b=175$, and imposing
that~$\mathit{BS}_{\mathit{max}}\leq256$,
the largest key sets our algorithm
can deal with have about $10^{30}$ keys and $10^{10}$ keys, respectively.
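
These sizes can be recovered by solving Eq.~(\ref{eq:maxbs}) for~$n$. The following is
a sketch of the arithmetic, with the results to be read as orders of magnitude:
\[
b+\sqrt{2b\ln\frac{n}{b}}\leq 256
\iff
n \leq b\,\exp\left(\frac{(256-b)^2}{2b}\right).
\]
For $b=128$ the exponent is $128^2/256=64$ and $n\leq128\,e^{64}\approx8\times10^{29}$;
for $b=175$ the exponent is $81^2/350\approx18.7$ and $n\leq175\,e^{18.7}\approx2\times10^{10}$.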
|
|
||||||
%It is also important to have $b$ as big as possible, once its value is
|
|
||||||
%related to the space required to store the resultant MPHF, as shown later on.
|
|
||||||
%Table~\ref{tab:bp} shows the biggest problem the algorithm can solve.
|
|
||||||
% The values were obtained from Eq.~(\ref{eq:maxbs}),
|
|
||||||
% considering $b=128$ and~$b=175$ and imposing
|
|
||||||
% that~$\mathit{BS}_{\mathit{max}}\leq256$.
|
|
||||||
|
|
||||||
% We set $\sigma=2.14$ because it was the greatest value obtained for $\sigma$
|
|
||||||
% in the two linear regression we did.
|
|
||||||
% \vspace{-3mm}
|
|
||||||
% \begin{table}[htb]
|
|
||||||
% \begin{center}
|
|
||||||
% {\small %\scriptsize
|
|
||||||
% \begin{tabular}{|c|c|}
|
|
||||||
% \hline
|
|
||||||
% b & Problem size ($n$) \\
|
|
||||||
% \hline
|
|
||||||
% 128 & $10^{30}$ keys \\
|
|
||||||
% 175 & $10^{10}$ keys \\
|
|
||||||
% \hline
|
|
||||||
% \end{tabular}
|
|
||||||
% \vspace{-1mm}
|
|
||||||
% }
|
|
||||||
% \end{center}
|
|
||||||
% \caption{Using Eq.~(\ref{eq:maxbs}) to estimate the biggest problem our algorithm can solve.}
|
|
||||||
% %considering $\sigma=\sqrt{2}$.}
|
|
||||||
% \label{tab:bp}
|
|
||||||
% \vspace{-14mm}
|
|
||||||
% \end{table}
|
|
|
@ -1,113 +0,0 @@
|
||||||
% Nivio: 29/jan/06
|
|
||||||
% Time-stamp: <Sunday 29 Jan 2006 11:58:28pm EST yoshi@flare>
|
|
||||||
\vspace{-2mm}
|
|
||||||
\subsection{Controlling disk accesses}
|
|
||||||
\label{sec:contr-disk-access}
|
|
||||||
|
|
||||||
In order to bring down the number of seek operations on disk
we benefit from the fact that our algorithm leaves almost all main
memory available to be used as disk I/O buffer.
In this section we evaluate how much the parameter $\mu$
affects the runtime of our algorithm.
For that we fixed $n$ at 1 billion URLs,
set the main memory of the machine used for the experiments
to 1 gigabyte and used $\mu$ equal to $100, 200, 300, 400, 500$ and $600$
megabytes.
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
Table~\ref{tab:diskaccess} presents the number of files $N$,
the buffer size used for all files, the number of seeks in the worst case considering
the pessimistic assumption mentioned in Section~\ref{sec:linearcomplexity}, and
the time to generate an MPHF for 1 billion keys as a function of the amount of internal
memory available. Table~\ref{tab:diskaccess} shows that the time spent in the construction
decreases as the value of $\mu$ increases. However, for $\mu > 400$, the variation
in the time is not as significant as for $\mu \leq 400$.
This can be explained by the fact that the I/O scheduler of the Linux 2.6 kernel
has smart policies
for avoiding seeks and diminishing the average seek time
(see \texttt{http://www.linuxjournal.com/article/6931}).
|
|
||||||
\begin{table*}[ht]
\vspace{-2mm}
\begin{center}
{\scriptsize
\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
$\mu$ (MB) & $100$ & $200$ & $300$ & $400$ & $500$ & $600$ \\
\hline
$N$ (files) & $619$ & $310$ & $207$ & $155$ & $124$ & $104$ \\
%\hline
\textbaht~(buffer size in KB) & $165$ & $661$ & $1{,}484$ & $2{,}643$ & $4{,}129$ & $5{,}908$ \\
%\hline
$\beta$/\textbaht~(\# of seeks in the worst case) & $384{,}478$ & $95{,}974$ & $42{,}749$ & $24{,}003$ & $15{,}365$ & $10{,}738$ \\
% \hline
% \raisebox{-0.2em}{\# of seeks performed in} & \raisebox{-0.7em}{$383,056$} & \raisebox{-0.7em}{$95,919$} & \raisebox{-0.7em}{$42,700$} & \raisebox{-0.7em}{$23,980$} & \raisebox{-0.7em}{$15,347$} & \raisebox{-0.7em}{$xx,xxx$} \\
% \raisebox{0.2em}{statement 1.3 of Figure~\ref{fig:readingbucket}} & & & & & & \\
% \hline
Time (hours) & $4.04$ & $3.64$ & $3.34$ & $3.20$ & $3.13$ & $3.09$ \\
\hline
\end{tabular}
\vspace{-1mm}
}
\end{center}
\caption{Influence of the internal memory area size ($\mu$) on the runtime of our algorithm.}
\label{tab:diskaccess}
\vspace{-14mm}
\end{table*}
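
The rows of Table~\ref{tab:diskaccess} can be cross-checked against each other. The sketch below is only an illustration, not part of the original experiments. It assumes that the buffer size is $\mu$ divided evenly among the $N$ files (with 1 MB = 1024 KB) and that the worst-case number of seeks is approximately $\beta$ divided by the buffer size. Both relations are inferred from the numbers in the table, and the function names are hypothetical.

```python
# Rows copied from the disk-access table (mu in MB, buffer size in KB).
MU_MB = [100, 200, 300, 400, 500, 600]
N_FILES = [619, 310, 207, 155, 124, 104]
BUFFER_KB = [165, 661, 1484, 2643, 4129, 5908]
WORST_CASE_SEEKS = [384478, 95974, 42749, 24003, 15365, 10738]


def buffer_size_kb(mu_mb, n_files):
    """Assumption: mu is split evenly among the N files, with 1 MB = 1024 KB."""
    return mu_mb * 1024 / n_files


def implied_beta_kb(buffer_kb, seeks):
    """Assumption: seeks ~ beta / buffer size, so beta ~ seeks * buffer size."""
    return buffer_kb * seeks


if __name__ == "__main__":
    # Print, per column: mu, recomputed buffer size, tabulated buffer size,
    # and the value of beta (in KB) implied by that column.
    for mu, n, buf, seeks in zip(MU_MB, N_FILES, BUFFER_KB, WORST_CASE_SEEKS):
        print(mu, round(buffer_size_kb(mu, n)), buf, implied_beta_kb(buf, seeks))
```

Under these assumptions the recomputed buffer sizes match the tabulated ones after rounding, and every column implies nearly the same value of $\beta$, roughly $6.3 \times 10^{7}$ KB.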
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
% \begin{table*}[ht]
|
|
||||||
% \begin{center}
|
|
||||||
% {\scriptsize
|
|
||||||
% \begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|c|}
|
|
||||||
% \hline
|
|
||||||
% $\mu$ (MB) & $100$ & $150$ & $200$ & $250$ & $300$ & $350$ & $400$ & $450$ & $500$ & $550$ & $600$ \\
|
|
||||||
% \hline
|
|
||||||
% $N$ (files) & $619$ & $413$ & $310$ & $248$ & $207$ & $177$ & $155$ & $138$ & $124$ & $113$ & $103$ \\
|
|
||||||
% \hline
|
|
||||||
% \textbaht~(buffer size in KB) & $165$ & $372$ & $661$ & $1,033$ & $1,484$ & $2,025$ & $2,643$ & $3,339$ & & & \\
|
|
||||||
% \hline
|
|
||||||
% \# of seeks (Worst case) & $384,478$ & $170,535$ & $95,974$ & $61,413$ & $42,749$ & $31,328$ & $24,003$ & $19,000$ & & & \\
|
|
||||||
% \hline
|
|
||||||
% \raisebox{-0.2em}{\# of seeks performed in} & \raisebox{-0.7em}{$383,056$} & \raisebox{-0.7em}{$170,385$} & \raisebox{-0.7em}{$95,919$} & \raisebox{-0.7em}{$61,388$} & \raisebox{-0.7em}{$42,700$} & \raisebox{-0.7em}{$31,296$} & \raisebox{-0.7em}{$23,980$} & \raisebox{-0.7em}{$18,978$} & \raisebox{-0.7em}{$xx,xxx$} & \raisebox{-0.7em}{$xx,xxx$} & \raisebox{-0.7em}{$xx,xxx$} \\
|
|
||||||
% \raisebox{0.2em}{statement 1.3 of Figure~\ref{fig:readingbucket}} & & & & & & & & & & & \\
|
|
||||||
% \hline
|
|
||||||
% Time (hours) & $4.04$ & $3.93$ & $3.64$ & $3.46$ & $3.34$ & $3.26$ & $3.20$ & $3.13$ & & & \\
|
|
||||||
% \hline
|
|
||||||
% \end{tabular}
|
|
||||||
% }
|
|
||||||
% \end{center}
|
|
||||||
% \caption{Influence of the internal memory area size ($\mu$) in our algorithm runtime.}
|
|
||||||
% \label{tab:diskaccess}
|
|
||||||
% \end{table*}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
% \begin{table*}[htb]
|
|
||||||
% \begin{center}
|
|
||||||
% {\scriptsize
|
|
||||||
% \begin{tabular}{|l|c|c|c|c|c|}
|
|
||||||
% \hline
|
|
||||||
% $n$ (millions) & 1 & 2 & 4 & 8 & 16 \\
|
|
||||||
% \hline % Part. 16 \% 16 \% 16 \% 18 \% 20\%
|
|
||||||
% Average time (s) & $14.124 \pm 0.128$ & $28.301 \pm 0.140$ & $56.807 \pm 0.312$ & $117.286 \pm 0.997$ & $241.086 \pm 0.936$ \\
|
|
||||||
% SD & $0.179$ & $0.196$ & $0.437$ & $1.394$ & $1.308$ \\
|
|
||||||
% \hline
|
|
||||||
% \hline
|
|
||||||
% $n$ (millions) & 32 & 64 & 128 & 512 & 1000 \\
|
|
||||||
% \hline % Part. 20 \% 20\% 20\% 18\% 18\%
|
|
||||||
% Average time (s) & $492.430 \pm 1.565$ & $1006.307 \pm 1.425$ & $2081.208 \pm 0.740$ & $9253.188 \pm 4.406$ & $19021.480 \pm 13.850$ \\
|
|
||||||
% SD & $2.188$ & $1.992$ & $1.035$ & $ 6.160$ & $18.016$ \\
|
|
||||||
% \hline
|
|
||||||
|
|
||||||
% \end{tabular}
|
|
||||||
% }
|
|
||||||
% \end{center}
|
|
||||||
% \caption{The runtime averages in seconds,
|
|
||||||
% the standard deviation (SD), and
|
|
||||||
% the confidence intervals given by the average time $\pm$
|
|
||||||
% the distance from average time considering
|
|
||||||
% a confidence level of $95\%$.
|
|
||||||
% }
|
|
||||||
% \label{tab:mediasbrz}
|
|
||||||
% \end{table*}
|
|
|
@ -1,15 +0,0 @@
|
||||||
%Nivio: 29/jan/06
|
|
||||||
% Time-stamp: <Sunday 29 Jan 2006 11:57:21pm EST yoshi@flare>
|
|
||||||
\vspace{-2mm}
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\section{Appendix: Experimental results}
|
|
||||||
\label{sec:experimental-results}
|
|
||||||
\vspace{-1mm}
|
|
||||||
|
|
||||||
In this section we present the experimental results.
We start by presenting the experimental setup.
We then present experimental results for
the internal memory based algorithm~\cite{bkz05}
and for our algorithm.
Finally, we discuss how the amount of internal memory available
affects the runtime of our algorithm.
|
|
Binary file not shown.
Before Width: | Height: | Size: 5.6 KiB |
|
@ -1,107 +0,0 @@
|
||||||
#FIG 3.2
|
|
||||||
Landscape
|
|
||||||
Center
|
|
||||||
Metric
|
|
||||||
A4
|
|
||||||
100.00
|
|
||||||
Single
|
|
||||||
-2
|
|
||||||
1200 2
|
|
||||||
0 32 #bdbebd
|
|
||||||
0 33 #bdbebd
|
|
||||||
0 34 #bdbebd
|
|
||||||
0 35 #4a4d4a
|
|
||||||
0 36 #bdbebd
|
|
||||||
0 37 #4a4d4a
|
|
||||||
0 38 #bdbebd
|
|
||||||
0 39 #bdbebd
|
|
||||||
6 225 6615 2520 7560
|
|
||||||
2 1 0 1 -1 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
900 7133 1608 7133
|
|
||||||
2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
260 6795 474 6795 474 6965 260 6965 260 6795
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
474 6795 686 6795 686 6965 474 6965 474 6795
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
474 6626 686 6626 686 6795 474 6795 474 6626
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
1538 6795 1750 6795 1750 6965 1538 6965 1538 6795
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
1538 6965 1750 6965 1750 7133 1538 7133 1538 6965
|
|
||||||
2 2 0 1 -1 7 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
474 6965 686 6965 686 7133 474 7133 474 6965
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
686 6965 900 6965 900 7133 686 7133 686 6965
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
1538 6626 1750 6626 1750 6795 1538 6795 1538 6626
|
|
||||||
2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
260 6965 474 6965 474 7133 260 7133 260 6965
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
686 6795 900 6795 900 6965 686 6965 686 6795
|
|
||||||
4 0 0 50 -1 0 14 0.0000 4 30 180 1148 7049 ...\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 60 60 332 7260 0\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 544 7260 1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 60 60 758 7260 2\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 90 960 1538 7260 ${\\lceil n/b\\rceil - 1}$\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 105 975 540 7515 Buckets Logical View\001
|
|
||||||
-6
|
|
||||||
6 2700 6390 4365 7830
|
|
||||||
6 3461 6445 3675 7425
|
|
||||||
6 3463 6786 3675 7245
|
|
||||||
6 3546 6893 3591 7094
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 3546 6959 .\001
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 3546 7027 .\001
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 3546 7094 .\001
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
3463 6786 3675 6786 3675 7245 3463 7245 3463 6786
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
3461 6445 3675 6445 3675 6615 3461 6615 3461 6445
|
|
||||||
2 2 0 1 -1 7 50 -1 41 0.000 0 0 7 0 0 5
|
|
||||||
3463 6616 3675 6616 3675 6785 3463 6785 3463 6616
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
3463 7246 3675 7246 3675 7425 3463 7425 3463 7246
|
|
||||||
-6
|
|
||||||
6 3023 6786 3235 7245
|
|
||||||
6 3106 6893 3151 7094
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 3106 6959 .\001
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 3106 7027 .\001
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 3106 7094 .\001
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
3023 6786 3235 6786 3235 7245 3023 7245 3023 6786
|
|
||||||
-6
|
|
||||||
6 4091 6425 4305 7425
|
|
||||||
6 4093 6946 4305 7255
|
|
||||||
6 4176 7018 4221 7153
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 4176 7063 .\001
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 4176 7108 .\001
|
|
||||||
4 0 -1 50 -1 0 12 0.0000 2 15 45 4176 7153 .\001
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
4093 6946 4305 6946 4305 7255 4093 7255 4093 6946
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
4091 6605 4305 6605 4305 6775 4091 6775 4091 6605
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
4093 7256 4305 7256 4305 7425 4093 7425 4093 7256
|
|
||||||
2 2 0 1 -1 7 50 -1 41 0.000 0 0 7 0 0 5
|
|
||||||
4093 6776 4305 6776 4305 6945 4093 6945 4093 6776
|
|
||||||
2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
4091 6425 4305 6425 4305 6595 4091 6595 4091 6425
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
3021 6445 3235 6445 3235 6615 3021 6615 3021 6445
|
|
||||||
2 2 0 1 -1 7 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
3023 6616 3235 6616 3235 6785 3023 6785 3023 6616
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
3023 7246 3235 7246 3235 7425 3023 7425 3023 7246
|
|
||||||
4 0 0 50 -1 0 14 0.0000 4 30 180 3780 6975 ...\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 255 3015 7560 File 1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 255 3465 7560 File 2\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 270 4095 7560 File N\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 105 1020 3195 7785 Buckets Physical View\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 150 120 2700 7020 b)\001
|
|
||||||
-6
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 150 105 0 7020 a)\001
|
|
|
@ -1,126 +0,0 @@
|
||||||
#FIG 3.2
|
|
||||||
Landscape
|
|
||||||
Center
|
|
||||||
Metric
|
|
||||||
A4
|
|
||||||
100.00
|
|
||||||
Single
|
|
||||||
-2
|
|
||||||
1200 2
|
|
||||||
0 32 #bebebe
|
|
||||||
0 33 #4e4e4e
|
|
||||||
6 2160 3825 2430 4365
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2160 4005 2430 4005 2430 4095 2160 4095 2160 4005
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2160 3825 2430 3825 2430 3915 2160 3915 2160 3825
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2160 3915 2430 3915 2430 4005 2160 4005 2160 3915
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2160 4275 2430 4275 2430 4365 2160 4365 2160 4275
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2160 4185 2430 4185 2430 4275 2160 4275 2160 4185
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2160 4095 2430 4095 2430 4185 2160 4185 2160 4095
|
|
||||||
-6
|
|
||||||
6 2430 3735 2700 4365
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 3825 2700 3825 2700 3915 2430 3915 2430 3825
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4275 2700 4275 2700 4365 2430 4365 2430 4275
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4185 2700 4185 2700 4275 2430 4275 2430 4185
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4095 2700 4095 2700 4185 2430 4185 2430 4095
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4005 2700 4005 2700 4095 2430 4095 2430 4005
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 3915 2700 3915 2700 4005 2430 4005 2430 3915
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2430 3735 2700 3735 2700 3825 2430 3825 2430 3735
|
|
||||||
-6
|
|
||||||
6 2700 4005 2970 4365
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4275 2970 4275 2970 4365 2700 4365 2700 4275
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4185 2970 4185 2970 4275 2700 4275 2700 4185
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4095 2970 4095 2970 4185 2700 4185 2700 4095
|
|
||||||
2 2 0 1 -1 32 50 -1 43 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4005 2970 4005 2970 4095 2700 4095 2700 4005
|
|
||||||
-6
|
|
||||||
6 2025 5625 3690 5760
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 360 2025 5760 File 1\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 360 2565 5760 File 2\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 405 3285 5760 File N\001
|
|
||||||
-6
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3510 4410 3510 4590
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3780 4410 3780 4590
|
|
||||||
2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
1890 4185 2160 4185 2160 4275 1890 4275 1890 4185
|
|
||||||
2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
1890 4275 2160 4275 2160 4365 1890 4365 1890 4275
|
|
||||||
2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
1890 4095 2160 4095 2160 4185 1890 4185 1890 4095
|
|
||||||
2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
2070 4860 2340 4860 2340 5040 2070 5040 2070 4860
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 7 0 0 5
|
|
||||||
3330 5220 3600 5220 3600 5400 3330 5400 3330 5220
|
|
||||||
2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
3330 4860 3600 4860 3600 4950 3330 4950 3330 4860
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2070 5040 2340 5040 2340 5130 2070 5130 2070 5040
|
|
||||||
2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
3330 4950 3600 4950 3600 5220 3330 5220 3330 4950
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 7 0 0 5
|
|
||||||
2070 5130 2340 5130 2340 5310 2070 5310 2070 5130
|
|
||||||
2 2 0 1 0 7 50 -1 10 0.000 0 0 7 0 0 5
|
|
||||||
2610 5400 2880 5400 2880 5580 2610 5580 2610 5400
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 7 0 0 5
|
|
||||||
2610 4860 2880 4860 2880 5040 2610 5040 2610 4860
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
2610 5040 2880 5040 2880 5130 2610 5130 2610 5040
|
|
||||||
2 2 0 1 0 7 50 -1 50 0.000 0 0 -1 0 0 5
|
|
||||||
2970 4275 3240 4275 3240 4365 2970 4365 2970 4275
|
|
||||||
2 2 0 1 0 7 50 -1 50 0.000 0 0 -1 0 0 5
|
|
||||||
2970 4185 3240 4185 3240 4275 2970 4275 2970 4185
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3510 4410 3600 4410
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3690 4410 3780 4410
|
|
||||||
2 2 0 1 0 7 50 -1 10 0.000 0 0 -1 0 0 5
|
|
||||||
3510 4275 3780 4275 3780 4365 3510 4365 3510 4275
|
|
||||||
2 2 0 1 0 7 50 -1 10 0.000 0 0 -1 0 0 5
|
|
||||||
3510 4185 3780 4185 3780 4275 3510 4275 3510 4185
|
|
||||||
2 2 0 1 0 32 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
2610 5130 2880 5130 2880 5400 2610 5400 2610 5130
|
|
||||||
2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
2070 5310 2340 5310 2340 5490 2070 5490 2070 5310
|
|
||||||
2 2 0 1 0 7 50 -1 10 0.000 0 0 7 0 0 5
|
|
||||||
2070 5490 2340 5490 2340 5580 2070 5580 2070 5490
|
|
||||||
2 2 0 1 0 7 50 -1 50 0.000 0 0 7 0 0 5
|
|
||||||
3330 5400 3600 5400 3600 5490 3330 5490 3330 5400
|
|
||||||
2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4275 3510 4275 3510 4365 3240 4365 3240 4275
|
|
||||||
2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4185 3510 4185 3510 4275 3240 4275 3240 4185
|
|
||||||
2 2 0 1 -1 32 50 -1 20 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4095 3510 4095 3510 4185 3240 4185 3240 4095
|
|
||||||
2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4005 3510 4005 3510 4095 3240 4095 3240 4005
|
|
||||||
2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5
|
|
||||||
3240 3915 3510 3915 3510 4005 3240 4005 3240 3915
|
|
||||||
2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5
|
|
||||||
3330 5490 3600 5490 3600 5580 3330 5580 3330 5490
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 75 1980 4545 0\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 420 3555 4545 n/b - 1\001
|
|
||||||
4 0 0 50 -1 0 18 0.0000 4 30 180 3015 5265 ...\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 75 2250 4545 1\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 75 2520 4545 2\001
|
|
||||||
4 0 0 50 -1 0 18 0.0000 4 30 180 2880 4500 ...\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 1410 4050 5310 Buckets Physical View\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 1350 4050 4140 Buckets Logical View\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 120 1665 3780 a)\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 135 1620 4950 b)\001
|
|
|
@ -1,183 +0,0 @@
|
||||||
#FIG 3.2 Produced by xfig version 3.2.5-alpha5
|
|
||||||
Landscape
|
|
||||||
Center
|
|
||||||
Metric
|
|
||||||
A4
|
|
||||||
100.00
|
|
||||||
Single
|
|
||||||
-2
|
|
||||||
1200 2
|
|
||||||
0 32 #bdbebd
|
|
||||||
0 33 #bdbebd
|
|
||||||
0 34 #bdbebd
|
|
||||||
0 35 #4a4d4a
|
|
||||||
0 36 #bdbebd
|
|
||||||
0 37 #4a4d4a
|
|
||||||
0 38 #bdbebd
|
|
||||||
0 39 #bdbebd
|
|
||||||
0 40 #bdbebd
|
|
||||||
6 3427 4042 3852 4211
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3427 4041 3852 4041 3852 4211 3427 4211 3427 4041
|
|
||||||
4 0 0 50 -1 0 14 0.0000 4 30 180 3551 4140 ...\001
|
|
||||||
-6
|
|
||||||
6 3410 5689 3835 5859
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3410 5689 3835 5689 3835 5858 3410 5858 3410 5689
|
|
||||||
4 0 0 50 -1 0 14 0.0000 4 30 180 3534 5788 ...\001
|
|
||||||
-6
|
|
||||||
6 3825 5445 4455 5535
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
4140 5445 4095 5490
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
4140 5445 4185 5490
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8
|
|
||||||
3825 5535 3825 5490 3870 5490 3915 5490 3959 5490 4006 5490
|
|
||||||
4095 5490 4095 5490
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7
|
|
||||||
4455 5535 4455 5490 4410 5490 4365 5490 4321 5490 4274 5490
|
|
||||||
4185 5490
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
-6
|
|
||||||
6 1873 5442 2323 5532
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
2098 5442 2066 5487
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
2098 5442 2130 5487
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8
|
|
||||||
1873 5532 1873 5487 1905 5487 1937 5487 1969 5487 2002 5487
|
|
||||||
2066 5487 2066 5487
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7
|
|
||||||
2323 5532 2323 5487 2291 5487 2259 5487 2227 5487 2194 5487
|
|
||||||
2130 5487
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
-6
|
|
||||||
6 2338 5442 2968 5532
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
2653 5442 2608 5487
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
2653 5442 2698 5487
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8
|
|
||||||
2338 5532 2338 5487 2383 5487 2428 5487 2473 5487 2518 5487
|
|
||||||
2608 5487 2608 5487
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7
|
|
||||||
2968 5532 2968 5487 2923 5487 2878 5487 2833 5487 2788 5487
|
|
||||||
2698 5487
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
-6
|
|
||||||
6 2475 4500 4770 5175
|
|
||||||
2 1 0 1 -1 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3137 5013 3845 5013
|
|
||||||
2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
2497 4675 2711 4675 2711 4845 2497 4845 2497 4675
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2711 4675 2923 4675 2923 4845 2711 4845 2711 4675
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2711 4506 2923 4506 2923 4675 2711 4675 2711 4506
|
|
||||||
2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
3775 4675 3987 4675 3987 4845 3775 4845 3775 4675
|
|
||||||
2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
3775 4845 3987 4845 3987 5013 3775 5013 3775 4845
|
|
||||||
2 2 0 1 -1 7 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2711 4845 2923 4845 2923 5013 2711 5013 2711 4845
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2923 4845 3137 4845 3137 5013 2923 5013 2923 4845
|
|
||||||
2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
3775 4506 3987 4506 3987 4675 3775 4675 3775 4506
|
|
||||||
2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
2497 4845 2711 4845 2711 5013 2497 5013 2497 4845
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2923 4675 3137 4675 3137 4845 2923 4845 2923 4675
|
|
||||||
4 0 0 50 -1 0 14 0.0000 4 30 180 3385 4929 ...\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 2569 5140 0\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 2781 5140 1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 2995 5140 2\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 405 4059 4845 Buckets\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 105 1095 3775 5140 ${\\lceil n/b\\rceil - 1}$\001
|
|
||||||
-6
|
|
||||||
6 2983 5446 3433 5536
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3208 5446 3176 5491
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3208 5446 3240 5491
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8
|
|
||||||
2983 5536 2983 5491 3015 5491 3047 5491 3079 5491 3112 5491
|
|
||||||
3176 5491 3176 5491
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7
|
|
||||||
3433 5536 3433 5491 3401 5491 3369 5491 3337 5491 3304 5491
|
|
||||||
3240 5491
|
|
||||||
0.000 1.000 1.000 1.000 1.000 1.000 0.000
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 36 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
3852 4041 4066 4041 4066 4211 3852 4211 3852 4041
|
|
||||||
2 2 0 1 0 36 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
4066 4041 4279 4041 4279 4211 4066 4211 4066 4041
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
1937 4041 2149 4041 2149 4211 1937 4211 1937 4041
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2149 4041 2362 4041 2362 4211 2149 4211 2149 4041
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2362 4041 2576 4041 2576 4211 2362 4211 2362 4041
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2576 4041 2788 4041 2788 4211 2576 4211 2576 4041
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2788 4041 3002 4041 3002 4211 2788 4211 2788 4041
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3214 4041 3427 4041 3427 4211 3214 4211 3214 4041
|
|
||||||
2 2 0 1 0 36 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
4279 4041 4492 4041 4492 4211 4279 4211 4279 4041
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3002 4041 3214 4041 3214 4211 3002 4211 3002 4041
|
|
||||||
2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
2132 5689 2345 5689 2345 5858 2132 5858 2132 5689
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
3197 5689 3410 5689 3410 5858 3197 5858 3197 5689
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2771 5689 2985 5689 2985 5858 2771 5858 2771 5689
|
|
||||||
2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
4262 5689 4475 5689 4475 5858 4262 5858 4262 5689
|
|
||||||
2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
4049 5689 4262 5689 4262 5858 4049 5858 4049 5689
|
|
||||||
2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5
|
|
||||||
2985 5689 3197 5689 3197 5858 2985 5858 2985 5689
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2345 5689 2559 5689 2559 5858 2345 5858 2345 5689
|
|
||||||
2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5
|
|
||||||
1914 5687 2127 5687 2127 5856 1914 5856 1914 5687
|
|
||||||
2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5
|
|
||||||
3835 5689 4049 5689 4049 5858 3835 5858 3835 5689
|
|
||||||
2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5
|
|
||||||
2559 5689 2771 5689 2771 5858 2559 5858 2559 5689
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 5
|
|
||||||
1 1 1.00 60.00 120.00
|
|
||||||
3330 4275 3330 4365 3330 4410 3330 4455 3330 4500
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2
|
|
||||||
1 1 1.00 45.00 60.00
|
|
||||||
3880 5168 4140 5445
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2
|
|
||||||
1 1 1.00 45.00 60.00
|
|
||||||
3025 5170 3205 5440
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2
|
|
||||||
1 1 1.00 45.00 60.00
|
|
||||||
2805 5164 2653 5438
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2
|
|
||||||
1 1 1.00 45.00 60.00
|
|
||||||
2577 5170 2103 5434
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 120 645 4562 4168 Key Set $S$\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 2008 3999 0\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 2220 3999 1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 165 4314 3999 n-1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 1991 5985 0\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 60 2203 5985 1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 165 4297 5985 n-1\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 75 555 4545 5816 Hash Table\001
|
|
||||||
4 0 -1 50 -1 0 3 0.0000 2 75 450 1980 5625 MPHF$_0$\001
|
|
||||||
4 0 -1 50 -1 0 3 0.0000 2 75 450 2520 5625 MPHF$_1$\001
|
|
||||||
4 0 -1 50 -1 0 3 0.0000 2 75 450 3015 5625 MPHF$_2$\001
|
|
||||||
4 0 -1 50 -1 0 3 0.0000 2 75 1065 3825 5625 MPHF$_{\\lceil n/b \\rceil - 1}$\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 105 585 1440 4455 Partitioning\001
|
|
||||||
4 0 -1 50 -1 0 7 0.0000 2 105 495 1440 5265 Searching\001
|
|
Binary file not shown.
Before Width: | Height: | Size: 5.5 KiB |
|
@ -1,153 +0,0 @@
|
||||||
#FIG 3.2 Produced by xfig version 3.2.5-alpha5
|
|
||||||
Landscape
|
|
||||||
Center
|
|
||||||
Metric
|
|
||||||
A4
|
|
||||||
100.00
|
|
||||||
Single
|
|
||||||
-2
|
|
||||||
1200 2
|
|
||||||
0 32 #bebebe
|
|
||||||
6 2025 3015 3555 3690
|
|
||||||
2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8
|
|
||||||
2025 3285 2295 3285 2295 3015 3285 3015 3285 3285 3555 3285
|
|
||||||
2790 3690 2025 3285
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 765 2385 3330 Partitioning\001
|
|
||||||
-6
|
|
||||||
6 1890 3735 3780 4365
|
|
||||||
6 2430 3735 2700 4365
|
|
||||||
6 2430 3915 2700 4365
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4275 2700 4275 2700 4365 2430 4365 2430 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4185 2700 4185 2700 4275 2430 4275 2430 4185
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4095 2700 4095 2700 4185 2430 4185 2430 4095
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 4005 2700 4005 2700 4095 2430 4095 2430 4005
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 3915 2700 3915 2700 4005 2430 4005 2430 3915
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 3825 2700 3825 2700 3915 2430 3915 2430 3825
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2430 3735 2700 3735 2700 3825 2430 3825 2430 3735
|
|
||||||
-6
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1890 4275 2160 4275 2160 4365 1890 4365 1890 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1890 4185 2160 4185 2160 4275 1890 4275 1890 4185
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2160 4275 2430 4275 2430 4365 2160 4365 2160 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2160 4185 2430 4185 2430 4275 2160 4275 2160 4185
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2160 4095 2430 4095 2430 4185 2160 4185 2160 4095
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2160 4005 2430 4005 2430 4095 2160 4095 2160 4005
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2160 3915 2430 3915 2430 4005 2160 4005 2160 3915
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4275 2970 4275 2970 4365 2700 4365 2700 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4185 2970 4185 2970 4275 2700 4275 2700 4185
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4095 2970 4095 2970 4185 2700 4185 2700 4095
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2700 4005 2970 4005 2970 4095 2700 4095 2700 4005
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2160 3825 2430 3825 2430 3915 2160 3915 2160 3825
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4275 3510 4275 3510 4365 3240 4365 3240 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3510 4275 3780 4275 3780 4365 3510 4365 3510 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2970 4275 3240 4275 3240 4365 2970 4365 2970 4275
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4185 3510 4185 3510 4275 3240 4275 3240 4185
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1890 4095 2160 4095 2160 4185 1890 4185 1890 4095
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3510 4185 3780 4185 3780 4275 3510 4275 3510 4185
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4095 3510 4095 3510 4185 3240 4185 3240 4095
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3240 4005 3510 4005 3510 4095 3240 4095 3240 4005
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3240 3915 3510 3915 3510 4005 3240 4005 3240 3915
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
1890 4365 3780 4365
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2970 4185 3240 4185 3240 4275 2970 4275 2970 4185
|
|
||||||
-6
|
|
||||||
6 1260 5310 4230 5580
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
1260 5400 4230 5400
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1530 5310 1800 5310 1800 5400 1530 5400 1530 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2070 5310 2340 5310 2340 5400 2070 5400 2070 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2340 5310 2610 5310 2610 5400 2340 5400 2340 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2610 5310 2880 5310 2880 5400 2610 5400 2610 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2880 5310 3150 5310 3150 5400 2880 5400 2880 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3420 5310 3690 5310 3690 5400 3420 5400 3420 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3690 5310 3960 5310 3960 5400 3690 5400 3690 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3960 5310 4230 5310 4230 5400 3960 5400 3960 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1800 5310 2070 5310 2070 5400 1800 5400 1800 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3150 5310 3420 5310 3420 5400 3150 5400 3150 5310
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1260 5310 1530 5310 1530 5400 1260 5400 1260 5310
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 210 4005 5580 n-1\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 75 1350 5580 0\001
|
|
||||||
-6
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
1260 2925 4230 2925
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1530 2835 1800 2835 1800 2925 1530 2925 1530 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2070 2835 2340 2835 2340 2925 2070 2925 2070 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2340 2835 2610 2835 2610 2925 2340 2925 2340 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2610 2835 2880 2835 2880 2925 2610 2925 2610 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
2880 2835 3150 2835 3150 2925 2880 2925 2880 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3420 2835 3690 2835 3690 2925 3420 2925 3420 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3690 2835 3960 2835 3960 2925 3690 2925 3690 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3960 2835 4230 2835 4230 2925 3960 2925 3960 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1800 2835 2070 2835 2070 2925 1800 2925 1800 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
3150 2835 3420 2835 3420 2925 3150 2925 3150 2835
|
|
||||||
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
|
|
||||||
1260 2835 1530 2835 1530 2925 1260 2925 1260 2835
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3510 4410 3510 4590
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3510 4410 3600 4410
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3690 4410 3780 4410
|
|
||||||
2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8
|
|
||||||
2025 4815 2295 4815 2295 4545 3285 4545 3285 4815 3555 4815
|
|
||||||
2790 5220 2025 4815
|
|
||||||
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
|
|
||||||
3780 4410 3780 4590
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 585 2475 4860 Searching\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 75 1980 4545 0\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 690 4410 5400 Hash Table\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 480 4410 4230 Buckets\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 135 555 4410 2925 Key set S\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 75 1350 2745 0\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 210 4005 2745 n-1\001
|
|
||||||
4 0 0 50 -1 0 10 0.0000 4 105 420 3555 4545 n/b - 1\001
|
|
Binary file not shown.
Before Width: | Height: | Size: 3.8 KiB |
|
@ -1,109 +0,0 @@
|
||||||
%% Nivio: 22/jan/06 23/jan/06 29/jan
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 03:52:42am EDT yoshi@ime.usp.br>
|
|
||||||
\section{Introduction}
|
|
||||||
\label{sec:intro}
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
Suppose~$U$ is a universe of \textit{keys} of size $u$.
|
|
||||||
Let $h:U\to M$ be a {\em hash function} that maps the keys from~$U$
|
|
||||||
to a given interval of integers $M=[0,m-1]=\{0,1,\dots,m-1\}$.
|
|
||||||
Let~$S\subseteq U$ be a set of~$n$ keys from~$U$, where $ n \ll u$.
|
|
||||||
Given a key~$x\in S$, the hash function~$h$ computes an integer in
|
|
||||||
$[0,m-1]$ for the storage or retrieval of~$x$ in a {\em hash table}.
|
|
||||||
% Hashing methods for {\em non-static sets} of keys can be used to construct
|
|
||||||
% data structures storing $S$ and supporting membership queries
|
|
||||||
% ``$x \in S$?'' in expected time $O(1)$.
|
|
||||||
% However, they involve a certain amount of wasted space owing to unused
|
|
||||||
% locations in the table and wasted time to resolve collisions when
|
|
||||||
% two keys are hashed to the same table location.
|
|
||||||
A perfect hash function maps a {\em static set} $S$ of $n$ keys from $U$ into a set of $m$ integer
|
|
||||||
numbers without collisions, where $m$ is greater than or equal to $n$.
|
|
||||||
If $m$ is equal to $n$, the function is called minimal.
|
|
||||||
|
|
||||||
% Figure~\ref{fig:minimalperfecthash-ph-mph}(a) illustrates a perfect hash function and
|
|
||||||
% Figure~\ref{fig:minimalperfecthash-ph-mph}(b) illustrates a minimal perfect hash function (MPHF).
|
|
||||||
%
|
|
||||||
% \begin{figure}
|
|
||||||
% \centering
|
|
||||||
% \scalebox{0.7}{\epsfig{file=figs/minimalperfecthash-ph-mph.ps}}
|
|
||||||
% \caption{(a) Perfect hash function (b) Minimal perfect hash function (MPHF)}
|
|
||||||
% \label{fig:minimalperfecthash-ph-mph}
|
|
||||||
% %\vspace{-5mm}
|
|
||||||
% \end{figure}
|
|
||||||
|
|
||||||
Minimal perfect hash functions are widely used for memory efficient storage and fast
|
|
||||||
retrieval of items from static sets, such as words in natural languages,
|
|
||||||
reserved words in programming languages or interactive systems, uniform resource
|
|
||||||
locators (URLs) in web search engines, or item sets in data mining techniques.
|
|
||||||
Search engines are nowadays indexing tens of billions of pages and algorithms
|
|
||||||
like PageRank~\cite{Brin1998}, which uses the web link structure to derive a
|
|
||||||
measure of popularity for Web pages, would benefit from a MPHF for storage and
|
|
||||||
retrieval of such huge sets of URLs.
|
|
||||||
For instance, the TodoBr\footnote{TodoBr (\texttt{www.todobr.com.br}) is a trademark of
|
|
||||||
Akwan Information Technologies, which was acquired by Google Inc. in July 2005.}
|
|
||||||
search engine used the algorithm proposed hereinafter to
|
|
||||||
improve and to scale its link analysis system.
|
|
||||||
The WebGraph research group~\cite{bv04} would
|
|
||||||
also benefit from a MPHF for sets in the order of billions of URLs to scale
|
|
||||||
and to improve the storage requirements of their algorithms on graph compression.
|
|
||||||
|
|
||||||
Another interesting application for MPHFs is their use as an indexing structure
|
|
||||||
for databases.
|
|
||||||
The B+ tree is very popular as an indexing structure for dynamic applications
|
|
||||||
with frequent insertions and deletions of records.
|
|
||||||
However, for applications with sporadic modifications and a huge number of
|
|
||||||
queries the B+ tree is not the best option,
|
|
||||||
because it performs poorly with very large sets of keys
|
|
||||||
such as those required for the new frontiers of database applications~\cite{s05}.
|
|
||||||
Therefore, there are applications for MPHFs in
|
|
||||||
information retrieval systems, database systems, language translation systems,
|
|
||||||
electronic commerce systems, compilers, operating systems, among others.
|
|
||||||
|
|
||||||
Until now, because of the limitations of current algorithms,
|
|
||||||
the use of MPHFs is restricted to scenarios where the set of keys being hashed is
|
|
||||||
relatively small.
|
|
||||||
However, in many cases it is crucial to deal in an efficient way with very large
|
|
||||||
sets of keys.
|
|
||||||
Due to the exponential growth of the Web, working with huge collections is becoming
|
|
||||||
a daily task.
|
|
||||||
For instance, the simple assignment of number identifiers to web pages of a collection
|
|
||||||
can be a challenging task.
|
|
||||||
While traditional databases simply cannot handle more traffic once the working
|
|
||||||
set of URLs does not fit in main memory anymore~\cite{s05}, the algorithm we propose here to
|
|
||||||
construct MPHFs can easily scale to billions of entries.
|
|
||||||
% using stock hardware.
|
|
||||||
|
|
||||||
As there are many applications for MPHFs, it is
|
|
||||||
important to design and implement space and time efficient algorithms for
|
|
||||||
constructing such functions.
|
|
||||||
The attractiveness of using MPHFs depends on the following issues:
|
|
||||||
\begin{enumerate}
|
|
||||||
\item The amount of CPU time required by the algorithms for constructing MPHFs.
|
|
||||||
\item The space requirements of the algorithms for constructing MPHFs.
|
|
||||||
\item The amount of CPU time required by a MPHF for each retrieval.
|
|
||||||
\item The space requirements of the description of the resulting MPHFs to be
|
|
||||||
used at retrieval time.
|
|
||||||
\end{enumerate}
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
This paper presents a novel external memory based algorithm for constructing MPHFs that
|
|
||||||
is very efficient in these four requirements.
|
|
||||||
First, the algorithm constructs a MPHF in time linear in the number of keys,
|
|
||||||
which is optimal.
|
|
||||||
For instance, for a collection of 1 billion URLs
|
|
||||||
collected from the web, each one 64 characters long on average, the time to construct a
|
|
||||||
MPHF using a 2.4 gigahertz PC with 500 megabytes of available main memory
|
|
||||||
is approximately 3 hours.
|
|
||||||
Second, the algorithm needs a small a priori defined vector of $\lceil n/b \rceil$
|
|
||||||
one-byte entries in main memory to construct a MPHF.
|
|
||||||
For the collection of 1 billion URLs and using $b=175$, the algorithm needs only
|
|
||||||
5.45 megabytes of internal memory.
|
|
||||||
Third, the evaluation of the MPHF for each retrieval requires three memory accesses and
|
|
||||||
the computation of three universal hash functions.
|
|
||||||
This is not optimal as any MPHF requires at least one memory access and the computation
|
|
||||||
of two universal hash functions.
|
|
||||||
Fourth, the description of a MPHF takes a constant number of bits for each key, which is optimal.
|
|
||||||
For the collection of 1 billion URLs, it needs 8.1 bits for each key,
|
|
||||||
while the theoretical lower bound is $1/\ln2 \approx 1.4427$ bits per
|
|
||||||
key~\cite{m84}.
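As a rough sanity check of the figures quoted above, and assuming that a
megabyte here means $2^{20}$ bytes (our reading; the unit is not stated),
the vector for $n = 10^9$ and $b = 175$ occupies
\begin{displaymath}
\left\lceil \frac{10^9}{175} \right\rceil = 5{,}714{,}286 \mbox{ bytes}
\approx \frac{5{,}714{,}286}{2^{20}} \mbox{ megabytes} \approx 5.45 \mbox{ megabytes},
\end{displaymath}
and the description size of $8.1$ bits per key is about
$8.1 / 1.4427 \approx 5.6$ times the lower bound.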
|
|
||||||
|
|
|
@ -1,17 +0,0 @@
|
||||||
all:
|
|
||||||
latex vldb.tex
|
|
||||||
bibtex vldb
|
|
||||||
latex vldb.tex
|
|
||||||
latex vldb.tex
|
|
||||||
dvips vldb.dvi -o vldb.ps
|
|
||||||
ps2pdf vldb.ps
|
|
||||||
chmod -R g+rwx *
|
|
||||||
|
|
||||||
perm:
|
|
||||||
chmod -R g+rwx *
|
|
||||||
|
|
||||||
run: clean all
|
|
||||||
gv vldb.ps &
|
|
||||||
clean:
|
|
||||||
rm -f *.aux *.bbl *.blg *.log *.ps *.pdf *.dvi
|
|
||||||
|
|
|
@ -1,141 +0,0 @@
|
||||||
%% Nivio: 21/jan/06
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 03:57:28am EDT yoshi@ime.usp.br>
|
|
||||||
\vspace{-2mm}
|
|
||||||
\subsection{Partitioning step}
|
|
||||||
\label{sec:partitioning-keys}
|
|
||||||
|
|
||||||
The set $S$ of $n$ keys is partitioned into $\lceil n/b \rceil$ buckets,
|
|
||||||
where $b$ is a suitable parameter chosen to guarantee
|
|
||||||
that each bucket has at most 256 keys with high probability
|
|
||||||
(see Section~\ref{sec:determining-b}).
|
|
||||||
The partitioning step works as follows:
|
|
||||||
|
|
||||||
\begin{figure}[h]
|
|
||||||
\hrule
|
|
||||||
\hrule
|
|
||||||
\vspace{2mm}
|
|
||||||
\begin{tabbing}
|
|
||||||
aa\=type booleanx \== (false, true); \kill
|
|
||||||
\> $\blacktriangleright$ Let $\beta$ be the size in bytes of the set $S$ \\
|
|
||||||
\> $\blacktriangleright$ Let $\mu$ be the size in bytes of an a priori reserved \\
|
|
||||||
\> ~~~ internal memory area \\
|
|
||||||
\> $\blacktriangleright$ Let $N = \lceil \beta/\mu \rceil$ be the number of key blocks that will \\
|
|
||||||
\> ~~~ be read from disk into an internal memory area \\
|
|
||||||
\> $\blacktriangleright$ Let $\mathit{size}$ be a vector that stores the size of each bucket \\
|
|
||||||
\> $1.$ {\bf for} $j = 1$ {\bf to} $N$ {\bf do} \\
|
|
||||||
\> ~~ $1.1$ Read block $B_j$ of keys from disk \\
|
|
||||||
\> ~~ $1.2$ Cluster $B_j$ into $\lceil n/b \rceil$ buckets using a bucket sort \\
|
|
||||||
\> ~~~~~~~ algorithm and update the entries in the vector {\it size} \\
|
|
||||||
\> ~~ $1.3$ Dump $B_j$ to the disk into File $j$\\
|
|
||||||
\> $2.$ Compute the {\it offset} vector and dump it to the disk.
|
|
||||||
\end{tabbing}
|
|
||||||
\hrule
|
|
||||||
\hrule
|
|
||||||
\vspace{-1.0mm}
|
|
||||||
\caption{Partitioning step}
|
|
||||||
\vspace{-3mm}
|
|
||||||
\label{fig:partitioningstep}
|
|
||||||
\end{figure}
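
To give an idea of the magnitude of $N$ in Figure~\ref{fig:partitioningstep},
suppose, purely for illustration, that $S$ holds $10^9$ keys of $64$ bytes
each with no separators, so that $\beta = 64 \times 10^9$ bytes, and that
$\mu = 250 \times 2^{20}$ bytes. These values are assumptions chosen to
resemble the experiments reported later, not measurements. Then
\begin{displaymath}
N = \left\lceil \frac{\beta}{\mu} \right\rceil
  = \left\lceil \frac{64 \times 10^9}{262{,}144{,}000} \right\rceil
  = \lceil 244.14\ldots \rceil = 245,
\end{displaymath}
so the {\bf for} loop would read $245$ blocks and write $245$ files.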
|
|
||||||
|
|
||||||
Statement 1.1 of the {\bf for} loop presented in Figure~\ref{fig:partitioningstep}
|
|
||||||
reads sequentially all the keys of block $B_j$ from disk into an internal area
|
|
||||||
of size $\mu$.
|
|
||||||
|
|
||||||
Statement 1.2 performs an indirect bucket sort of the keys in block $B_j$
|
|
||||||
and at the same time updates the entries in the vector {\em size}.
|
|
||||||
Let us briefly describe how~$B_j$ is partitioned among the~$\lceil n/b\rceil$
|
|
||||||
buckets.
|
|
||||||
We use a local array of $\lceil n/b \rceil$ counters to store a
|
|
||||||
count of how many keys from $B_j$ belong to each bucket.
|
|
||||||
%At the same time, the global vector {\it size} is computed based on the local
|
|
||||||
%counters.
|
|
||||||
The pointers to the keys in each bucket $i$, $0 \leq i < \lceil n/b \rceil$,
|
|
||||||
are stored in contiguous positions in an array.
|
|
||||||
For this we first reserve the required number of entries
|
|
||||||
in this array of pointers using the information from the array of counters.
|
|
||||||
Next, we place the pointers to the keys in each bucket into the respective
|
|
||||||
reserved areas in the array (i.e., we place the pointers to the keys in bucket 0,
|
|
||||||
followed by the pointers to the keys in bucket 1, and so on).
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
To find the bucket address of a given key
|
|
||||||
we use the universal hash function $h_0(k)$~\cite{j97}.
|
|
||||||
Key~$k$ goes into bucket~$i$, where
|
|
||||||
%Then, for each integer $h_0(k)$ the respective bucket address is obtained
|
|
||||||
%as follows:
|
|
||||||
\begin{eqnarray} \label{eq:bucketindex}
|
|
||||||
i=h_0(k) \bmod \left \lceil \frac{n}{b} \right \rceil.
|
|
||||||
\end{eqnarray}
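
As a small illustration of Eq.~(\ref{eq:bucketindex}), take the hypothetical
values $n = 1000$ and $b = 175$, which give $\lceil 1000/175 \rceil = 6$
buckets. A key $k$ with $h_0(k) = 20$ (an arbitrary value chosen only for
this example) then goes into bucket
\begin{displaymath}
i = 20 \bmod 6 = 2.
\end{displaymath}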
|
|
||||||
|
|
||||||
Figure~\ref{fig:brz-partitioning}(a) shows a \emph{logical} view of the
|
|
||||||
$\lceil n/b \rceil$ buckets generated in the partitioning step.
|
|
||||||
%In this case, the keys of each bucket are put together by the pointers to
|
|
||||||
%each key stored
|
|
||||||
%in contiguous positions in the array of pointers.
|
|
||||||
In reality, the keys belonging to each bucket are distributed among many files,
|
|
||||||
as depicted in Figure~\ref{fig:brz-partitioning}(b).
|
|
||||||
In the example of Figure~\ref{fig:brz-partitioning}(b), the keys in bucket 0
|
|
||||||
appear in files 1 and $N$, the keys in bucket 1 appear in files 1, 2
|
|
||||||
and $N$, and so on.
|
|
||||||
|
|
||||||
\vspace{-7mm}
|
|
||||||
\begin{figure}[ht]
|
|
||||||
\centering
|
|
||||||
\begin{picture}(0,0)%
|
|
||||||
\includegraphics{figs/brz-partitioning}%
|
|
||||||
\end{picture}%
|
|
||||||
\setlength{\unitlength}{4144sp}%
|
|
||||||
%
|
|
||||||
\begingroup\makeatletter\ifx\SetFigFont\undefined%
|
|
||||||
\gdef\SetFigFont#1#2#3#4#5{%
|
|
||||||
\reset@font\fontsize{#1}{#2pt}%
|
|
||||||
\fontfamily{#3}\fontseries{#4}\fontshape{#5}%
|
|
||||||
\selectfont}%
|
|
||||||
\fi\endgroup%
|
|
||||||
\begin{picture}(4371,1403)(1,-6977)
|
|
||||||
\put(333,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
|
|
||||||
\put(545,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
|
|
||||||
\put(759,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}2}}}}
|
|
||||||
\put(1539,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}${\lceil n/b\rceil - 1}$}}}}
|
|
||||||
\put(541,-6676){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Buckets Logical View}}}}
|
|
||||||
\put(3547,-6120){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(3547,-6188){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(3547,-6255){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(3107,-6120){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(3107,-6188){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(3107,-6255){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(4177,-6224){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(4177,-6269){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(4177,-6314){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}}
|
|
||||||
\put(3016,-6721){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}File 1}}}}
|
|
||||||
\put(3466,-6721){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}File 2}}}}
|
|
||||||
\put(4096,-6721){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}File N}}}}
|
|
||||||
\put(3196,-6946){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Buckets Physical View}}}}
|
|
||||||
\end{picture}%
|
|
||||||
\caption{Situation of the buckets at the end of the partitioning step: (a) Logical view (b) Physical view}
|
|
||||||
\label{fig:brz-partitioning}
|
|
||||||
\vspace{-2mm}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
This scattering of the keys in the buckets could generate a performance
|
|
||||||
problem because of the potential number of seeks
|
|
||||||
needed to read the keys in each bucket from the $N$ files on disk
|
|
||||||
during the searching step.
|
|
||||||
However, as we show later in Section~\ref{sec:analytcal-results}, the number of seeks
|
|
||||||
can be kept small using buffering techniques.
|
|
||||||
Considering that only the vector {\it size}, which has $\lceil n/b \rceil$
|
|
||||||
one-byte entries (remember that each bucket has at most 256 keys),
|
|
||||||
must be maintained in main memory during the searching step,
|
|
||||||
almost all main memory is available to be used as disk I/O buffer.
|
|
||||||
|
|
||||||
The last step is to compute the {\it offset} vector and dump it to the disk.
|
|
||||||
We use the vector $\mathit{size}$ to compute the
|
|
||||||
$\mathit{offset}$ displacement vector.
|
|
||||||
The $\mathit{offset}[i]$ entry contains the number of keys
|
|
||||||
in the buckets $0, 1, \dots, i-1$.
|
|
||||||
As {\it size}$[i]$ stores the number of keys
|
|
||||||
in bucket $i$, where $0 \leq i <\lceil n/b \rceil$, we have
|
|
||||||
\begin{displaymath}
|
|
||||||
\mathit{offset}[i] = \sum_{j=0}^{i-1} \mathit{size}[j].
|
|
||||||
\end{displaymath}
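For example, for a hypothetical instance with four buckets and
$\mathit{size} = (3, 0, 2, 4)$ (values chosen only for illustration),
the prefix sums give
\begin{displaymath}
\mathit{offset}[0] = 0, \quad
\mathit{offset}[1] = 3, \quad
\mathit{offset}[2] = 3 + 0 = 3, \quad
\mathit{offset}[3] = 3 + 0 + 2 = 5,
\end{displaymath}
so the keys of bucket $3$ are assigned to positions starting at $5$.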
|
|
||||||
|
|
|
@ -1,113 +0,0 @@
|
||||||
% Nivio: 29/jan/06
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 12:13:14pm EST yoshi@flare>
|
|
||||||
\subsection{Performance of the new algorithm}
|
|
||||||
\label{sec:performance}
|
|
||||||
%As we have done for the internal memory based algorithm,
|
|
||||||
|
|
||||||
The runtime of our algorithm is also a random variable, but now it follows a
|
|
||||||
(highly concentrated) normal distribution, as we discuss at the end of this
|
|
||||||
section. Again, we are interested in verifying the linearity claim made in
|
|
||||||
Section~\ref{sec:linearcomplexity}. Therefore, we ran the algorithm for
|
|
||||||
several numbers $n$ of keys in $S$.
|
|
||||||
|
|
||||||
The values chosen for $n$ were $1, 2, 4, 8, 16, 32, 64, 128, 512$ and $1000$
|
|
||||||
million.
|
|
||||||
%Just the small vector {\it size} must be kept in main memory,
|
|
||||||
%as we saw in Section~\ref{sec:memconstruction}.
|
|
||||||
We limited the main memory to 500 megabytes for the experiments.
|
|
||||||
The size $\mu$ of the a priori reserved internal memory area
|
|
||||||
was set to 250 megabytes, the parameter $b$ was set to $175$ and
|
|
||||||
the building block algorithm parameter $c$ was again set to $1$.
|
|
||||||
In Section~\ref{sec:contr-disk-access} we show how $\mu$
|
|
||||||
affects the runtime of the algorithm. The other two parameters
|
|
||||||
have insignificant influence on the runtime.
|
|
||||||
|
|
||||||
We again use a statistical method for determining a suitable sample size
|
|
||||||
%~\cite[Chapter 13]{j91}
|
|
||||||
to estimate the number of trials to be run for each value of $n$. We found that
|
|
||||||
just one trial for each $n$ would be enough with a confidence level of $95\%$.
|
|
||||||
However, we made 10 trials. This number of trials seems rather small, but, as
|
|
||||||
shown below, the behavior of our algorithm is very stable and its runtime is
|
|
||||||
almost deterministic (i.e., the standard deviation is very small).
|
|
||||||
|
|
||||||
Table~\ref{tab:mediasbrz} presents the runtime average for each $n$,
|
|
||||||
the respective standard deviations, and
|
|
||||||
the respective confidence intervals given by
|
|
||||||
the average time $\pm$ the distance from average time
|
|
||||||
considering a confidence level of $95\%$.
|
|
||||||
Observing the runtime averages we noticed that
|
|
||||||
the algorithm runs in expected linear time,
|
|
||||||
as shown in~Section~\ref{sec:linearcomplexity}. Better still,
|
|
||||||
it is only approximately $60\%$ slower than our internal memory based algorithm.
|
|
||||||
To get that value we used the linear regression model obtained for the runtime of
|
|
||||||
the internal memory based algorithm to estimate how much time it would require
|
|
||||||
for constructing a MPHF for a set of 1 billion keys.
|
|
||||||
We got 2.3 hours for the internal memory based algorithm and we measured
|
|
||||||
3.67 hours on average for our algorithm.
|
|
||||||
Increasing the size of the internal memory area
|
|
||||||
from 250 to 600 megabytes (see Section~\ref{sec:contr-disk-access}),
|
|
||||||
we brought the time down to 3.09 hours. In this setup, our algorithm is
|
|
||||||
just $34\%$ slower.
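These percentages can be recomputed from the times quoted above, taking
the average for $n = 1000$ million from Table~\ref{tab:mediasbrz}:
\begin{displaymath}
\frac{13229.5}{3600} \approx 3.67 \mbox{ hours}, \qquad
\frac{3.67}{2.3} \approx 1.60, \qquad
\frac{3.09}{2.3} \approx 1.34.
\end{displaymath}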
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\begin{table*}[htb]
|
|
||||||
\vspace{-1mm}
|
|
||||||
\begin{center}
|
|
||||||
{\scriptsize
|
|
||||||
\begin{tabular}{|l|c|c|c|c|c|}
|
|
||||||
\hline
|
|
||||||
$n$ (millions) & 1 & 2 & 4 & 8 & 16 \\
|
|
||||||
\hline % Part. 16 \% 16 \% 16 \% 18 \% 20\%
|
|
||||||
Average time (s) & $6.9 \pm 0.3$ & $13.8 \pm 0.2$ & $31.9 \pm 0.7$ & $69.9 \pm 1.1$ & $140.6 \pm 2.5$ \\
|
|
||||||
SD & $0.4$ & $0.2$ & $0.9$ & $1.5$ & $3.5$ \\
|
|
||||||
\hline
|
|
||||||
\hline
|
|
||||||
$n$ (millions) & 32 & 64 & 128 & 512 & 1000 \\
|
|
||||||
\hline % Part. 20 \% 20\% 20\% 18\% 18\%
|
|
||||||
Average time (s) & $284.3 \pm 1.1$ & $587.9 \pm 3.9$ & $1223.6 \pm 4.9$ & $5966.4 \pm 9.5$ & $13229.5 \pm 12.7$ \\
|
|
||||||
SD & $1.6$ & $5.5$ & $6.8$ & $13.2$ & $18.6$ \\
|
|
||||||
\hline
|
|
||||||
|
|
||||||
\end{tabular}
|
|
||||||
\vspace{-1mm}
|
|
||||||
}
|
|
||||||
\end{center}
|
|
||||||
\caption{Our algorithm: average time in seconds for constructing a MPHF,
|
|
||||||
the standard deviation (SD), and the confidence intervals considering
|
|
||||||
a confidence level of $95\%$.
|
|
||||||
}
|
|
||||||
\label{tab:mediasbrz}
|
|
||||||
\vspace{-5mm}
|
|
||||||
\end{table*}
|
|
||||||
|
|
||||||
Figure~\ref{fig:brz_temporegressao}
|
|
||||||
presents the runtime for each trial. In addition,
|
|
||||||
the solid line corresponds to a linear regression model
|
|
||||||
obtained from the experimental measurements.
|
|
||||||
As we were expecting, the runtime for a given $n$ has almost no
|
|
||||||
variation.
|
|
||||||
|
|
||||||
\begin{figure}[htb]
|
|
||||||
\begin{center}
|
|
||||||
\scalebox{0.4}{\includegraphics{figs/brz_temporegressao}}
|
|
||||||
\caption{Time versus number of keys in $S$ for our algorithm. The solid line corresponds to
|
|
||||||
a linear regression model.}
|
|
||||||
\label{fig:brz_temporegressao}
|
|
||||||
\end{center}
|
|
||||||
\vspace{-9mm}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
An intriguing observation is that the runtime of the algorithm is almost
|
|
||||||
deterministic, in spite of the fact that it uses as a building block an
|
|
||||||
algorithm with a considerable fluctuation in its runtime. A given bucket~$i$,
|
|
||||||
$0 \leq i < \lceil n/b \rceil$, is a small set of keys (at most 256 keys) and,
|
|
||||||
as argued in Section~\ref{sec:intern-memory-algor}, the runtime of the
|
|
||||||
building block algorithm is a random variable~$X_i$ with high fluctuation.
|
|
||||||
However, the runtime~$Y$ of the searching step of our algorithm is given
|
|
||||||
by~$Y=\sum_{0\leq i<\lceil n/b\rceil}X_i$. Under the hypothesis that
|
|
||||||
the~$X_i$ are independent and bounded, the {\it law of large numbers} (see,
|
|
||||||
e.g., \cite{j91}) implies that the random variable $Y/\lceil n/b\rceil$
|
|
||||||
converges to a constant as~$n\to\infty$. This explains why the runtime of our
|
|
||||||
algorithm is almost deterministic.
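The concentration can be made slightly more explicit under a stronger
assumption than the one above, namely that the~$X_i$ are also identically
distributed with mean~$\tau$ and variance~$\sigma^2$ (the symbols $\tau$,
$\sigma$ and $N_b$ are introduced here only for this sketch).
Writing $N_b = \lceil n/b \rceil$, independence gives
\begin{displaymath}
\mathrm{E}[Y] = N_b \, \tau, \qquad
\sqrt{\mathrm{Var}[Y]} = \sigma \sqrt{N_b}, \qquad
\frac{\sqrt{\mathrm{Var}[Y]}}{\mathrm{E}[Y]} = \frac{\sigma}{\tau \sqrt{N_b}},
\end{displaymath}
so the relative fluctuation of~$Y$ vanishes as~$n\to\infty$.
For $n = 10^9$ and $b = 175$ we have $\sqrt{N_b} \approx 2390$.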
|
|
||||||
|
|
||||||
|
|
|
@ -1,814 +0,0 @@
|
||||||
|
|
||||||
@InProceedings{Brin1998,
|
|
||||||
author = "Sergey Brin and Lawrence Page",
|
|
||||||
title = "The Anatomy of a Large-Scale Hypertextual Web Search Engine",
|
|
||||||
booktitle = "Proceedings of the 7th International {World Wide Web}
|
|
||||||
Conference",
|
|
||||||
pages = "107--117",
|
|
||||||
address = "Brisbane, Australia",
|
|
||||||
month = "April",
|
|
||||||
year = 1998,
|
|
||||||
annote = "Artigo do Google."
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{p99,
|
|
||||||
author = {R. Pagh},
|
|
||||||
title = {Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions},
|
|
||||||
booktitle = {Workshop on Algorithms and Data Structures},
|
|
||||||
pages = {49--54},
|
|
||||||
year = 1999,
|
|
||||||
url = {citeseer.nj.nec.com/pagh99hash.html},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{p00,
|
|
||||||
author = {R. Pagh},
|
|
||||||
title = {Faster deterministic dictionaries},
|
|
||||||
journal = {Symposium on Discrete Algorithms (ACM SODA)},
|
|
||||||
OPTvolume = {43},
|
|
||||||
OPTnumber = {5},
|
|
||||||
pages = {487--493},
|
|
||||||
year = {2000}
|
|
||||||
}
|
|
||||||
@article{g81,
|
|
||||||
author = {G. H. Gonnet},
|
|
||||||
title = {Expected Length of the Longest Probe Sequence in Hash Code Searching},
|
|
||||||
journal = {J. ACM},
|
|
||||||
volume = {28},
|
|
||||||
number = {2},
|
|
||||||
year = {1981},
|
|
||||||
issn = {0004-5411},
|
|
||||||
pages = {289--304},
|
|
||||||
doi = {http://doi.acm.org/10.1145/322248.322254},
|
|
||||||
publisher = {ACM Press},
|
|
||||||
address = {New York, NY, USA},
|
|
||||||
}
|
|
||||||
|
|
||||||
@misc{r04,
|
|
||||||
author = "S. Rao",
|
|
||||||
title = "Combinatorial Algorithms Data Structures",
|
|
||||||
year = 2004,
|
|
||||||
howpublished = {CS 270 Spring},
|
|
||||||
url = "citeseer.ist.psu.edu/700201.html"
|
|
||||||
}
|
|
||||||
@article{ra98,
|
|
||||||
author = {Martin Raab and Angelika Steger},
|
|
||||||
title = {``{B}alls into Bins'' --- {A} Simple and Tight Analysis},
|
|
||||||
journal = {Lecture Notes in Computer Science},
|
|
||||||
volume = 1518,
|
|
||||||
pages = {159--170},
|
|
||||||
year = 1998,
|
|
||||||
url = "citeseer.ist.psu.edu/raab98balls.html"
|
|
||||||
}
|
|
||||||
|
|
||||||
@misc{mrs00,
|
|
||||||
author = "M. Mitzenmacher and A. Richa and R. Sitaraman",
|
|
||||||
title = "The power of two random choices: A survey of the techniques and results",
|
|
||||||
howpublished={In Handbook of Randomized
|
|
||||||
Computing, P. Pardalos, S. Rajasekaran, and J. Rolim, Eds. Kluwer},
|
|
||||||
year = "2000",
|
|
||||||
url = "citeseer.ist.psu.edu/article/mitzenmacher00power.html"
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{dfm02,
|
|
||||||
author = {E. Drinea and A. Frieze and M. Mitzenmacher},
|
|
||||||
title = {Balls and bins models with feedback},
|
|
||||||
journal = {Symposium on Discrete Algorithms (ACM SODA)},
|
|
||||||
pages = {308--315},
|
|
||||||
year = {2002}
|
|
||||||
}
|
|
||||||
@Article{j97,
|
|
||||||
author = {Bob Jenkins},
|
|
||||||
title = {Algorithm Alley: Hash Functions},
|
|
||||||
journal = {Dr. Dobb's Journal of Software Tools},
|
|
||||||
volume = {22},
|
|
||||||
number = {9},
|
|
||||||
month = {september},
|
|
||||||
year = {1997}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{gss01,
|
|
||||||
author = {N. Galli and B. Seybold and K. Simon},
|
|
||||||
title = {Tetris-Hashing or optimal table compression},
|
|
||||||
journal = {Discrete Applied Mathematics},
|
|
||||||
volume = {110},
|
|
||||||
number = {1},
|
|
||||||
pages = {41--58},
|
|
||||||
month = {june},
|
|
||||||
publisher = {Elsevier Science},
|
|
||||||
year = {2001}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{s05,
|
|
||||||
author = {M. Seltzer},
|
|
||||||
title = {Beyond Relational Databases},
|
|
||||||
journal = {ACM Queue},
|
|
||||||
volume = {3},
|
|
||||||
number = {3},
|
|
||||||
month = {April},
|
|
||||||
year = {2005}
|
|
||||||
}
|
|
||||||
|
|
||||||
@InProceedings{ss89,
|
|
||||||
author = {P. Schmidt and A. Siegel},
|
|
||||||
title = {On aspects of universality and performance for closed hashing},
|
|
||||||
booktitle = {Proc. 21st Ann. ACM Symp. on Theory of Computing -- STOC'89},
|
|
||||||
month = {May},
|
|
||||||
year = {1989},
|
|
||||||
pages = {355--366}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{asw00,
|
|
||||||
author = {M. Atici and D. R. Stinson and R. Wei},
|
|
||||||
title = {A new practical algorithm for the construction of a perfect hash function},
|
|
||||||
journal = {Journal Combin. Math. Combin. Comput.},
|
|
||||||
volume = {35},
|
|
||||||
pages = {127--145},
|
|
||||||
year = {2000}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{swz00,
|
|
||||||
author = {D. R. Stinson and R. Wei and L. Zhu},
|
|
||||||
title = {New constructions for perfect hash families and related structures using combinatorial designs and codes},
|
|
||||||
journal = {Journal Combin. Designs.},
|
|
||||||
volume = {8},
|
|
||||||
pages = {189--200},
|
|
||||||
year = {2000}
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{ht01,
|
|
||||||
author = {T. Hagerup and T. Tholey},
|
|
||||||
title = {Efficient minimal perfect hashing in nearly minimal space},
|
|
||||||
booktitle = {The 18th Symposium on Theoretical Aspects of Computer Science (STACS), volume 2010 of Lecture Notes in Computer Science},
|
|
||||||
year = 2001,
|
|
||||||
pages = {317--326},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{dh01,
|
|
||||||
author = {M. Dietzfelbinger and T. Hagerup},
|
|
||||||
title = {Simple minimal perfect hashing in less space},
|
|
||||||
booktitle = {The 9th European Symposium on Algorithms (ESA), volume 2161 of Lecture Notes in Computer Science},
|
|
||||||
year = 2001,
|
|
||||||
pages = {109--120},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@MastersThesis{mar00,
|
|
||||||
author = {M. S. Neubert},
|
|
||||||
title = {Algoritmos Distribu{\'\i}dos para a Constru{\c{c}}{\~a}o de Arquivos invertidos},
|
|
||||||
school = {Departamento de Ci{\^e}ncia da Computa{\c{c}}{\~a}o, Universidade Federal de Minas Gerais},
|
|
||||||
year = 2000,
|
|
||||||
month = {Mar{\c{c}}o},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Book{clrs01,
|
|
||||||
author = {T. H. Cormen and C. E. Leiserson and R. L. Rivest and C. Stein},
|
|
||||||
title = {Introduction to Algorithms},
|
|
||||||
publisher = {MIT Press},
|
|
||||||
year = {2001},
|
|
||||||
edition = {second},
|
|
||||||
}
|
|
||||||
|
|
||||||
@Book{j91,
|
|
||||||
author = {R. Jain},
|
|
||||||
title = {The art of computer systems performance analysis: techniques for experimental design, measurement, simulation, and modeling},
|
|
||||||
publisher = {John Wiley},
|
|
||||||
year = {1991},
|
|
||||||
edition = {first}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Book{k73,
|
|
||||||
author = {D. E. Knuth},
|
|
||||||
title = {The Art of Computer Programming: Sorting and Searching},
|
|
||||||
publisher = {Addison-Wesley},
|
|
||||||
volume = {3},
|
|
||||||
year = {1973},
|
|
||||||
edition = {second},
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{rp99,
|
|
||||||
author = {R. Pagh},
|
|
||||||
title = {Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions},
|
|
||||||
booktitle = {Workshop on Algorithms and Data Structures},
|
|
||||||
pages = {49--54},
|
|
||||||
year = 1999,
|
|
||||||
url = {citeseer.nj.nec.com/pagh99hash.html},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{hmwc93,
|
|
||||||
author = {G. Havas and B.S. Majewski and N.C. Wormald and Z.J. Czech},
|
|
||||||
title = {Graphs, Hypergraphs and Hashing},
|
|
||||||
booktitle = {19th International Workshop on Graph-Theoretic Concepts in Computer Science},
|
|
||||||
publisher = {Springer Lecture Notes in Computer Science vol. 790},
|
|
||||||
pages = {153--165},
|
|
||||||
year = 1993,
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{bkz05,
|
|
||||||
author = {F.C. Botelho and Y. Kohayakawa and N. Ziviani},
|
|
||||||
title = {A Practical Minimal Perfect Hashing Method},
|
|
||||||
booktitle = {4th International Workshop on Efficient and Experimental Algorithms},
|
|
||||||
publisher = {Springer Lecture Notes in Computer Science vol. 3503},
|
|
||||||
pages = {488--500},
|
|
||||||
month = {May},
|
|
||||||
year = 2005,
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{chm97,
|
|
||||||
author = {Z.J. Czech and G. Havas and B.S. Majewski},
|
|
||||||
title = {Fundamental Study Perfect Hashing},
|
|
||||||
journal = {Theoretical Computer Science},
|
|
||||||
volume = {182},
|
|
||||||
year = {1997},
|
|
||||||
pages = {1--143},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{chm92,
|
|
||||||
author = {Z.J. Czech and G. Havas and B.S. Majewski},
|
|
||||||
title = {An Optimal Algorithm for Generating Minimal Perfect Hash Functions},
|
|
||||||
journal = {Information Processing Letters},
|
|
||||||
volume = {43},
|
|
||||||
number = {5},
|
|
||||||
pages = {257--264},
|
|
||||||
year = {1992},
|
|
||||||
url = {citeseer.nj.nec.com/czech92optimal.html},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{mwhc96,
|
|
||||||
author = {B.S. Majewski and N.C. Wormald and G. Havas and Z.J. Czech},
|
|
||||||
title = {A family of perfect hashing methods},
|
|
||||||
journal = {The Computer Journal},
|
|
||||||
year = {1996},
|
|
||||||
volume = {39},
|
|
||||||
number = {6},
|
|
||||||
pages = {547--554},
|
|
||||||
key = {author}
|
|
||||||
}
|
|
||||||
|
|
||||||
@InProceedings{bv04,
|
|
||||||
author = {P. Boldi and S. Vigna},
|
|
||||||
title = {The WebGraph Framework I: Compression Techniques},
|
|
||||||
booktitle = {13th International World Wide Web Conference},
|
|
||||||
pages = {595--602},
|
|
||||||
year = {2004}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Book{z04,
|
|
||||||
author = {N. Ziviani},
|
|
||||||
title = {Projeto de Algoritmos com implementa{\c{c}}{\~o}es em Pascal e C},
|
|
||||||
publisher = {Pioneira Thompson},
|
|
||||||
year = 2004,
|
|
||||||
edition = {segunda edi{\c{c}}{\~a}o}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Book{p85,
|
|
||||||
author = {E. M. Palmer},
|
|
||||||
title = {Graphical Evolution: An Introduction to the Theory of Random Graphs},
|
|
||||||
publisher = {John Wiley \& Sons},
|
|
||||||
year = {1985},
|
|
||||||
address = {New York}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Book{imb99,
|
|
||||||
author = {I.H. Witten and A. Moffat and T.C. Bell},
|
|
||||||
title = {Managing Gigabytes: Compressing and Indexing Documents and Images},
|
|
||||||
publisher = {Morgan Kaufmann Publishers},
|
|
||||||
year = 1999,
|
|
||||||
edition = {second edition}
|
|
||||||
}
|
|
||||||
@Book{wfe68,
|
|
||||||
author = {W. Feller},
|
|
||||||
title = { An Introduction to Probability Theory and Its Applications},
|
|
||||||
publisher = {Wiley},
|
|
||||||
year = 1968,
|
|
||||||
volume = 1,
|
|
||||||
optedition = {second edition}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Article{fhcd92,
|
|
||||||
author = {E.A. Fox and L. S. Heath and Q. Chen and A.M. Daoud},
|
|
||||||
title = {Practical Minimal Perfect Hash Functions For Large Databases},
|
|
||||||
journal = {Communications of the ACM},
|
|
||||||
year = {1992},
|
|
||||||
volume = {35},
|
|
||||||
number = {1},
|
|
||||||
pages = {105--121}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@inproceedings{fch92,
|
|
||||||
author = {E.A. Fox and Q.F. Chen and L.S. Heath},
|
|
||||||
title = {A Faster Algorithm for Constructing Minimal Perfect Hash Functions},
|
|
||||||
booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference
|
|
||||||
on Research and Development in Information Retrieval},
|
|
||||||
year = {1992},
|
|
||||||
pages = {266--273},
|
|
||||||
}
|
|
||||||
|
|
||||||
@article{c80,
|
|
||||||
author = {R.J. Cichelli},
|
|
||||||
title = {Minimal perfect hash functions made simple},
|
|
||||||
journal = {Communications of the ACM},
|
|
||||||
volume = {23},
|
|
||||||
number = {1},
|
|
||||||
year = {1980},
|
|
||||||
issn = {0001-0782},
|
|
||||||
pages = {17--19},
|
|
||||||
doi = {http://doi.acm.org/10.1145/358808.358813},
|
|
||||||
publisher = {ACM Press},
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@TechReport{fhc89,
|
|
||||||
author = {E.A. Fox and L.S. Heath and Q.F. Chen},
|
|
||||||
title = {An $O(n\log n)$ algorithm for finding minimal perfect hash functions},
|
|
||||||
institution = {Virginia Polytechnic Institute and State University},
|
|
||||||
year = {1989},
|
|
||||||
OPTkey = {},
|
|
||||||
OPTtype = {},
|
|
||||||
OPTnumber = {},
|
|
||||||
address = {Blacksburg, VA},
|
|
||||||
month = {April},
|
|
||||||
OPTnote = {},
|
|
||||||
OPTannote = {}
|
|
||||||
}
|
|
||||||
|
|
||||||
@TechReport{bkz06t,
|
|
||||||
author = {F.C. Botelho and Y. Kohayakawa and N. Ziviani},
|
|
||||||
title = {An Approach for Minimal Perfect Hash Functions in Very Large Databases},
|
|
||||||
institution = {Department of Computer Science, Federal University of Minas Gerais},
|
|
||||||
note = {Available at http://www.dcc.ufmg.br/\texttt{\~ }nivio/pub/technicalreports.html},
|
|
||||||
year = {2006},
|
|
||||||
OPTkey = {},
|
|
||||||
OPTtype = {},
|
|
||||||
number = {RT.DCC.003},
|
|
||||||
address = {Belo Horizonte, MG, Brazil},
|
|
||||||
month = {April},
|
|
||||||
OPTannote = {}
|
|
||||||
}
|
|
||||||
|
|
||||||
@inproceedings{fcdh90,
|
|
||||||
author = {E.A. Fox and Q.F. Chen and A.M. Daoud and L.S. Heath},
|
|
||||||
title = {Order preserving minimal perfect hash functions and information retrieval},
|
|
||||||
booktitle = {Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval},
|
|
||||||
year = {1990},
|
|
||||||
isbn = {0-89791-408-2},
|
|
||||||
pages = {279--311},
|
|
||||||
location = {Brussels, Belgium},
|
|
||||||
doi = {http://doi.acm.org/10.1145/96749.98233},
|
|
||||||
publisher = {ACM Press},
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{fkp89,
|
|
||||||
author = {P. Flajolet and D. E. Knuth and B. Pittel},
|
|
||||||
title = {The first cycles in an evolving graph},
|
|
||||||
journal = {Discrete Math},
|
|
||||||
year = {1989},
|
|
||||||
volume = {75},
|
|
||||||
pages = {167--215},
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{s77,
|
|
||||||
author = {R. Sprugnoli},
|
|
||||||
title = {Perfect Hashing Functions: A Single Probe Retrieving
|
|
||||||
Method For Static Sets},
|
|
||||||
journal = {Communications of the ACM},
|
|
||||||
year = {1977},
|
|
||||||
volume = {20},
|
|
||||||
number = {11},
|
|
||||||
pages = {841--850},
|
|
||||||
month = {November},
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{j81,
|
|
||||||
author = {G. Jaeschke},
|
|
||||||
title = {Reciprocal Hashing: A method For Generating Minimal Perfect
|
|
||||||
Hashing Functions},
|
|
||||||
journal = {Communications of the ACM},
|
|
||||||
year = {1981},
|
|
||||||
volume = {24},
|
|
||||||
number = {12},
|
|
||||||
month = {December},
|
|
||||||
pages = {829--833}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{c84,
|
|
||||||
author = {C. C. Chang},
|
|
||||||
title = {The Study Of An Ordered Minimal Perfect Hashing Scheme},
|
|
||||||
journal = {Communications of the ACM},
|
|
||||||
year = {1984},
|
|
||||||
volume = {27},
|
|
||||||
number = {4},
|
|
||||||
month = {April},
|
|
||||||
pages = {384--387}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{c86,
|
|
||||||
author = {C. C. Chang},
|
|
||||||
title = {Letter-Oriented Reciprocal Hashing Scheme},
|
|
||||||
journal = {Inform. Sci.},
|
|
||||||
year = {1986},
|
|
||||||
volume = {27},
|
|
||||||
pages = {243--255}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{cl86,
|
|
||||||
author = {C. C. Chang and R. C. T. Lee},
|
|
||||||
title = {A Letter-Oriented Minimal Perfect Hashing Scheme},
|
|
||||||
journal = {Computer Journal},
|
|
||||||
year = {1986},
|
|
||||||
volume = {29},
|
|
||||||
number = {3},
|
|
||||||
month = {June},
|
|
||||||
pages = {277--281}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Article{cc88,
|
|
||||||
author = {C. C. Chang and C. H. Chang},
|
|
||||||
title = {An Ordered Minimal Perfect Hashing Scheme with Single Parameter},
|
|
||||||
journal = {Inform. Process. Lett.},
|
|
||||||
year = {1988},
|
|
||||||
volume = {27},
|
|
||||||
number = {2},
|
|
||||||
month = {February},
|
|
||||||
pages = {79--83}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{w90,
|
|
||||||
author = {V. G. Winters},
|
|
||||||
title = {Minimal Perfect Hashing in Polynomial Time},
|
|
||||||
journal = {BIT},
|
|
||||||
year = {1990},
|
|
||||||
volume = {30},
|
|
||||||
number = {2},
|
|
||||||
pages = {235--244}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{fcdh91,
|
|
||||||
author = {E. A. Fox and Q. F. Chen and A. M. Daoud and L. S. Heath},
|
|
||||||
title = {Order Preserving Minimal Perfect Hash Functions and Information Retrieval},
|
|
||||||
journal = {ACM Trans. Inform. Systems},
|
|
||||||
year = {1991},
|
|
||||||
volume = {9},
|
|
||||||
number = {3},
|
|
||||||
month = {July},
|
|
||||||
pages = {281--308}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{fks84,
|
|
||||||
author = {M. L. Fredman and J. Koml\'os and E. Szemer\'edi},
|
|
||||||
title = {Storing a sparse table with {O(1)} worst case access time},
|
|
||||||
journal = {J. ACM},
|
|
||||||
year = {1984},
|
|
||||||
volume = {31},
|
|
||||||
number = {3},
|
|
||||||
month = {July},
|
|
||||||
pages = {538--544}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{dhjs83,
|
|
||||||
author = {M. W. Du and T. M. Hsieh and K. F. Jea and D. W. Shieh},
|
|
||||||
title = {The study of a new perfect hash scheme},
|
|
||||||
journal = {IEEE Trans. Software Eng.},
|
|
||||||
year = {1983},
|
|
||||||
volume = {9},
|
|
||||||
number = {3},
|
|
||||||
month = {May},
|
|
||||||
pages = {305--313}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{bt94,
|
|
||||||
author = {M. D. Brain and A. L. Tharp},
|
|
||||||
title = {Using Tries to Eliminate Pattern Collisions in Perfect Hashing},
|
|
||||||
journal = {IEEE Trans. on Knowledge and Data Eng.},
|
|
||||||
year = {1994},
|
|
||||||
volume = {6},
|
|
||||||
number = {2},
|
|
||||||
month = {April},
|
|
||||||
pages = {239--247}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{bt90,
|
|
||||||
author = {M. D. Brain and A. L. Tharp},
|
|
||||||
title = {Perfect hashing using sparse matrix packing},
|
|
||||||
journal = {Inform. Systems},
|
|
||||||
year = {1990},
|
|
||||||
volume = {15},
|
|
||||||
number = {3},
|
|
||||||
OPTmonth = {April},
|
|
||||||
pages = {281--290}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{ckw93,
|
|
||||||
author = {C. C. Chang and H. C. Kowng and T. C. Wu},
|
|
||||||
title = {A refinement of a compression-oriented addressing scheme},
|
|
||||||
journal = {BIT},
|
|
||||||
year = {1993},
|
|
||||||
volume = {33},
|
|
||||||
number = {4},
|
|
||||||
OPTmonth = {April},
|
|
||||||
pages = {530--535}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{cw91,
|
|
||||||
author = {C. C. Chang and T. C. Wu},
|
|
||||||
title = {A letter-oriented perfect hashing scheme based upon sparse table compression},
|
|
||||||
journal = {Software -- Practice Experience},
|
|
||||||
year = {1991},
|
|
||||||
volume = {21},
|
|
||||||
number = {1},
|
|
||||||
month = {January},
|
|
||||||
pages = {35--49}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{ty79,
|
|
||||||
author = {R. E. Tarjan and A. C. C. Yao},
|
|
||||||
title = {Storing a sparse table},
|
|
||||||
journal = {Comm. ACM},
|
|
||||||
year = {1979},
|
|
||||||
volume = {22},
|
|
||||||
number = {11},
|
|
||||||
month = {November},
|
|
||||||
pages = {606--611}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{yd85,
|
|
||||||
author = {W. P. Yang and M. W. Du},
|
|
||||||
title = {A backtracking method for constructing perfect hash functions from a set of mapping functions},
|
|
||||||
journal = {BIT},
|
|
||||||
year = {1985},
|
|
||||||
volume = {25},
|
|
||||||
number = {1},
|
|
||||||
pages = {148--164}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{s85,
|
|
||||||
author = {T. J. Sager},
|
|
||||||
title = {A polynomial time generator for minimal perfect hash functions},
|
|
||||||
journal = {Commun. ACM},
|
|
||||||
year = {1985},
|
|
||||||
volume = {28},
|
|
||||||
number = {5},
|
|
||||||
month = {May},
|
|
||||||
pages = {523--532}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{cm93,
|
|
||||||
author = {Z. J. Czech and B. S. Majewski},
|
|
||||||
title = {A linear time algorithm for finding minimal perfect hash functions},
|
|
||||||
journal = {The Computer Journal},
|
|
||||||
year = {1993},
|
|
||||||
volume = {36},
|
|
||||||
number = {6},
|
|
||||||
pages = {579--587}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{gbs94,
|
|
||||||
author = {R. Gupta and S. Bhaskar and S. Smolka},
|
|
||||||
title = {On randomization in sequential and distributed algorithms},
|
|
||||||
journal = {ACM Comput. Surveys},
|
|
||||||
year = {1994},
|
|
||||||
volume = {26},
|
|
||||||
number = {1},
|
|
||||||
month = {March},
|
|
||||||
pages = {7--86}
|
|
||||||
}
|
|
||||||
|
|
||||||
@InProceedings{sb84,
|
|
||||||
author = {C. Slot and P. V. E. Boas},
|
|
||||||
title = {On tape versus core; an application of space efficient perfect hash functions to the
|
|
||||||
invariance of space},
|
|
||||||
booktitle = {Proc. 16th Ann. ACM Symp. on Theory of Computing -- STOC'84},
|
|
||||||
address = {Washington},
|
|
||||||
month = {May},
|
|
||||||
year = {1984},
|
|
||||||
pages = {391--400},
|
|
||||||
}
|
|
||||||
|
|
||||||
@InProceedings{wi90,
|
|
||||||
author = {V. G. Winters},
|
|
||||||
title = {Minimal perfect hashing for large sets of data},
|
|
||||||
booktitle = {Internat. Conf. on Computing and Information -- ICCI'90},
|
|
||||||
address = {Canada},
|
|
||||||
month = {May},
|
|
||||||
year = {1990},
|
|
||||||
pages = {275--284},
|
|
||||||
}
|
|
||||||
|
|
||||||
@InProceedings{lr85,
|
|
||||||
author = {P. Larson and M. V. Ramakrishna},
|
|
||||||
title = {External perfect hashing},
|
|
||||||
booktitle = {Proc. ACM SIGMOD Conf.},
|
|
||||||
address = {Austin TX},
|
|
||||||
month = {June},
|
|
||||||
year = {1985},
|
|
||||||
pages = {190--199},
|
|
||||||
}
|
|
||||||
|
|
||||||
@Book{m84,
|
|
||||||
author = {K. Mehlhorn},
|
|
||||||
editor = {W. Brauer and G. Rozenberg and A. Salomaa},
|
|
||||||
title = {Data Structures and Algorithms 1: Sorting and Searching},
|
|
||||||
publisher = {Springer-Verlag},
|
|
||||||
year = {1984},
|
|
||||||
}
|
|
||||||
|
|
||||||
@PhdThesis{c92,
|
|
||||||
author = {Q. F. Chen},
|
|
||||||
title = {An Object-Oriented Database System for Efficient Information Retrieval Applications},
|
|
||||||
school = {Virginia Tech Dept. of Computer Science},
|
|
||||||
year = {1992},
|
|
||||||
month = {March}
|
|
||||||
}
|
|
||||||
|
|
||||||
@article {er59,
|
|
||||||
AUTHOR = {Erd{\H{o}}s, P. and R{\'e}nyi, A.},
|
|
||||||
TITLE = {On random graphs {I}},
|
|
||||||
JOURNAL = {Pub. Math. Debrecen},
|
|
||||||
VOLUME = {6},
|
|
||||||
YEAR = {1959},
|
|
||||||
PAGES = {290--297},
|
|
||||||
MRCLASS = {05.00},
|
|
||||||
MRNUMBER = {MR0120167 (22 \#10924)},
|
|
||||||
MRREVIEWER = {A. Dvoretzky},
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@article {erdos61,
|
|
||||||
AUTHOR = {Erd{\H{o}}s, P. and R{\'e}nyi, A.},
|
|
||||||
TITLE = {On the evolution of random graphs},
|
|
||||||
JOURNAL = {Bull. Inst. Internat. Statist.},
|
|
||||||
VOLUME = 38,
|
|
||||||
YEAR = 1961,
|
|
||||||
PAGES = {343--347},
|
|
||||||
MRCLASS = {05.40 (55.10)},
|
|
||||||
MRNUMBER = {MR0148055 (26 \#5564)},
|
|
||||||
}
|
|
||||||
|
|
||||||
@article {er60,
|
|
||||||
AUTHOR = {Erd{\H{o}}s, P. and R{\'e}nyi, A.},
|
|
||||||
TITLE = {On the evolution of random graphs},
|
|
||||||
JOURNAL = {Magyar Tud. Akad. Mat. Kutat\'o Int. K\"ozl.},
|
|
||||||
VOLUME = {5},
|
|
||||||
YEAR = {1960},
|
|
||||||
PAGES = {17--61},
|
|
||||||
MRCLASS = {05.40},
|
|
||||||
MRNUMBER = {MR0125031 (23 \#A2338)},
|
|
||||||
MRREVIEWER = {J. Riordan},
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{er60:_Old,
|
|
||||||
author = {P. Erd{\H{o}}s and A. R\'enyi},
|
|
||||||
title = {On the evolution of random graphs},
|
|
||||||
journal = {Publications of the Mathematical Institute of the Hungarian
|
|
||||||
Academy of Sciences},
|
|
||||||
year = {1960},
|
|
||||||
volume = {56},
|
|
||||||
pages = {17--61}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{er61,
|
|
||||||
author = {P. Erd{\H{o}}s and A. R\'enyi},
|
|
||||||
title = {On the strength of connectedness of a random graph},
|
|
||||||
journal = {Acta Mathematica Scientia Hungary},
|
|
||||||
year = {1961},
|
|
||||||
volume = {12},
|
|
||||||
pages = {261--267}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Article{bp04,
|
|
||||||
author = {B. Bollob\'as and O. Pikhurko},
|
|
||||||
title = {Integer Sets with Prescribed Pairwise Differences Being Distinct},
|
|
||||||
journal = {European Journal of Combinatorics},
|
|
||||||
OPTkey = {},
|
|
||||||
OPTvolume = {},
|
|
||||||
OPTnumber = {},
|
|
||||||
OPTpages = {},
|
|
||||||
OPTmonth = {},
|
|
||||||
note = {To Appear},
|
|
||||||
OPTannote = {}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{pw04:_OLD,
|
|
||||||
author = {B. Pittel and N. C. Wormald},
|
|
||||||
title = {Counting connected graphs inside-out},
|
|
||||||
journal = {Journal of Combinatorial Theory},
|
|
||||||
OPTkey = {},
|
|
||||||
OPTvolume = {},
|
|
||||||
OPTnumber = {},
|
|
||||||
OPTpages = {},
|
|
||||||
OPTmonth = {},
|
|
||||||
note = {To Appear},
|
|
||||||
OPTannote = {}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@Article{mr95,
|
|
||||||
author = {M. Molloy and B. Reed},
|
|
||||||
title = {A critical point for random graphs with a given degree sequence},
|
|
||||||
journal = {Random Structures and Algorithms},
|
|
||||||
year = {1995},
|
|
||||||
volume = {6},
|
|
||||||
pages = {161--179}
|
|
||||||
}
|
|
||||||
|
|
||||||
@TechReport{bmz04,
|
|
||||||
author = {F. C. Botelho and D. Menoti and N. Ziviani},
|
|
||||||
title = {A New algorithm for constructing minimal perfect hash functions},
|
|
||||||
institution = {Federal Univ. of Minas Gerais},
|
|
||||||
year = {2004},
|
|
||||||
OPTkey = {},
|
|
||||||
OPTtype = {},
|
|
||||||
number = {TR004},
|
|
||||||
OPTaddress = {},
|
|
||||||
OPTmonth = {},
|
|
||||||
note = {(http://www.dcc.ufmg.br/\texttt{\~ }nivio/pub/technicalreports.html)},
|
|
||||||
OPTannote = {}
|
|
||||||
}
|
|
||||||
|
|
||||||
@Article{mr98,
|
|
||||||
author = {M. Molloy and B. Reed},
|
|
||||||
title = {The size of the giant component of a random graph with a given degree sequence},
|
|
||||||
journal = {Combinatorics, Probability and Computing},
|
|
||||||
year = {1998},
|
|
||||||
volume = {7},
|
|
||||||
pages = {295--305}
|
|
||||||
}
|
|
||||||
|
|
||||||
@misc{h98,
|
|
||||||
author = {D. Hawking},
|
|
||||||
title = {Overview of TREC-7 Very Large Collection Track (Draft for Notebook)},
|
|
||||||
url = {citeseer.ist.psu.edu/4991.html},
|
|
||||||
year = {1998}}
|
|
||||||
|
|
||||||
@book {jlr00,
|
|
||||||
AUTHOR = {Janson, S. and {\L}uczak, T. and Ruci{\'n}ski, A.},
|
|
||||||
TITLE = {Random graphs},
|
|
||||||
PUBLISHER = {Wiley-Inter.},
|
|
||||||
YEAR = 2000,
|
|
||||||
PAGES = {xii+333},
|
|
||||||
ISBN = {0-471-17541-2},
|
|
||||||
MRCLASS = {05C80 (60C05 82B41)},
|
|
||||||
MRNUMBER = {2001k:05180},
|
|
||||||
MRREVIEWER = {Mark R. Jerrum},
|
|
||||||
}
|
|
||||||
|
|
||||||
@incollection {jlr90,
|
|
||||||
AUTHOR = {Janson, Svante and {\L}uczak, Tomasz and Ruci{\'n}ski,
|
|
||||||
Andrzej},
|
|
||||||
TITLE = {An exponential bound for the probability of nonexistence of a
|
|
||||||
specified subgraph in a random graph},
|
|
||||||
BOOKTITLE = {Random graphs '87 (Pozna\'n, 1987)},
|
|
||||||
PAGES = {73--87},
|
|
||||||
PUBLISHER = {Wiley},
|
|
||||||
ADDRESS = {Chichester},
|
|
||||||
YEAR = 1990,
|
|
||||||
MRCLASS = {05C80 (60C05)},
|
|
||||||
MRNUMBER = {91m:05168},
|
|
||||||
MRREVIEWER = {J. Spencer},
|
|
||||||
}
|
|
||||||
|
|
||||||
@book {b01,
|
|
||||||
AUTHOR = {Bollob{\'a}s, B.},
|
|
||||||
TITLE = {Random graphs},
|
|
||||||
SERIES = {Cambridge Studies in Advanced Mathematics},
|
|
||||||
VOLUME = 73,
|
|
||||||
EDITION = {Second},
|
|
||||||
PUBLISHER = {Cambridge University Press},
|
|
||||||
ADDRESS = {Cambridge},
|
|
||||||
YEAR = 2001,
|
|
||||||
PAGES = {xviii+498},
|
|
||||||
ISBN = {0-521-80920-7; 0-521-79722-5},
|
|
||||||
MRCLASS = {05C80 (60C05)},
|
|
||||||
MRNUMBER = {MR1864966 (2002j:05132)},
|
|
||||||
}
|
|
||||||
|
|
||||||
@article {pw04,
|
|
||||||
AUTHOR = {Pittel, Boris and Wormald, Nicholas C.},
|
|
||||||
TITLE = {Counting connected graphs inside-out},
|
|
||||||
JOURNAL = {J. Combin. Theory Ser. B},
|
|
||||||
FJOURNAL = {Journal of Combinatorial Theory. Series B},
|
|
||||||
VOLUME = 93,
|
|
||||||
YEAR = 2005,
|
|
||||||
NUMBER = 2,
|
|
||||||
PAGES = {127--172},
|
|
||||||
ISSN = {0095-8956},
|
|
||||||
CODEN = {JCBTB8},
|
|
||||||
MRCLASS = {05C30 (05A16 05C40 05C80)},
|
|
||||||
MRNUMBER = {MR2117934 (2005m:05117)},
|
|
||||||
MRREVIEWER = {Edward A. Bender},
|
|
||||||
}
|
|
|
@ -1,112 +0,0 @@
|
||||||
% Time-stamp: <Monday 30 Jan 2006 03:06:57am EDT yoshi@ime.usp.br>
|
|
||||||
\vspace{-3mm}
|
|
||||||
\section{Related work}
|
|
||||||
\label{sec:relatedprevious-work}
|
|
||||||
\vspace{-2mm}
|
|
||||||
|
|
||||||
% Optimal speed for hashing means that each key from the key set $S$
|
|
||||||
% will map to an unique location in the hash table, avoiding time wasted
|
|
||||||
% in resolving collisions. That is achieved with a MPHF and
|
|
||||||
% because of that many algorithms for constructing static
|
|
||||||
% and dynamic MPHFs, when static or dynamic sets are involved,
|
|
||||||
% were developed. Our focus has been on static MPHFs, since
|
|
||||||
% in many applications the key sets change slowly, if at all~\cite{s05}.
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
Czech, Havas and Majewski~\cite{chm97} provide a
|
|
||||||
comprehensive survey of the most important theoretical and practical results
|
|
||||||
on perfect hashing.
|
|
||||||
In this section we review some of the most important results.
|
|
||||||
%We also present more recent algorithms that share some features with
|
|
||||||
%the one presented hereinafter.
|
|
||||||
|
|
||||||
Fredman, Koml\'os and Szemer\'edi~\cite{FKS84} showed that it is possible to
|
|
||||||
construct space efficient perfect hash functions that can be evaluated in
|
|
||||||
constant time with table sizes that are linear in the number of keys:
|
|
||||||
$m=O(n)$. In their model of computation, an element of the universe~$U$ fits
|
|
||||||
into one machine word, and arithmetic operations and memory accesses have unit
|
|
||||||
cost. Randomized algorithms in the FKS model can construct a perfect hash
|
|
||||||
function in expected time~$O(n)$:
|
|
||||||
this is the case of our algorithm and the works in~\cite{chm92,p99}.
|
|
||||||
|
|
||||||
Mehlhorn~\cite{m84} showed
|
|
||||||
that at least $\Omega((1/\ln 2)n + \ln\ln u)$ bits are
|
|
||||||
required to represent a MPHF (i.e., at least 1.4427 bits per
|
|
||||||
key must be stored).
|
|
||||||
To the best of our knowledge our algorithm
|
|
||||||
is the first one capable of generating MPHFs for sets on the order
of billions of keys, and the generated functions
|
|
||||||
require less than 9 bits per key to be stored.
|
|
||||||
This is an increase of one order of magnitude in the size of the largest
key set for which a MPHF has been obtained in the literature~\cite{bkz05}.
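
As a rough illustration of the gap between this lower bound and the space
we use, consider a hypothetical set of $n = 10^9$ keys and ignore the
$\ln\ln u$ term (both are assumptions made only for this example):
\[
\frac{n}{\ln 2} \approx 1.4427 \times 10^{9} \mbox{ bits} \approx 0.18 \mbox{ GB},
\qquad
9n = 9 \times 10^{9} \mbox{ bits} \approx 1.125 \mbox{ GB}.
\]
Under these assumptions, a function stored in at most 9 bits per key is
within a factor of about $9/1.4427 \approx 6.2$ of the lower bound.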
|
|
||||||
%which is close to the lower bound presented in~\cite{m84}.
|
|
||||||
|
|
||||||
Some work on minimal perfect hashing has been done under the assumption that
|
|
||||||
the algorithm can pick and store truly random functions~\cite{bkz05,chm92,p99}.
|
|
||||||
Since the space requirements for truly random functions make them unsuitable for
|
|
||||||
implementation, one has to settle for pseudo-random functions in practice.
|
|
||||||
Empirical studies show that limited randomness properties are often as good as
|
|
||||||
total randomness.
|
|
||||||
We verified this phenomenon in our experiments by using the universal hash
|
|
||||||
function proposed by Jenkins~\cite{j97}, which is
|
|
||||||
time efficient at retrieval time and requires just an integer to be used as a
|
|
||||||
random seed (the function is completely determined by the seed).
|
|
||||||
% The works~\cite{asw00,swz00} present algorithms for constructing
% PHFs and MPHFs deterministically.
% The generated functions need $O(n \log(n) + \log(\log(u)))$ bits to be described.
% The average-case complexity of the algorithms that generate the functions is
% $O(n\log(n) \log( \log (u)))$ and the worst-case complexity is $O(n^3\log(n) \log(\log(u)))$.
% The evaluation complexity of the functions is $O(\log(n) + \log(\log(u)))$.
% Thus, the algorithms do not generate functions that can be evaluated in
% $O(1)$ time, are a factor of $\log n$ away from the optimal complexity for describing
% PHFs and MPHFs (Mehlhorn shows in~\cite{m84}
% that storing a PHF requires at least
% $\Omega(n^2/(2\ln 2) m + \log\log u)$ bits), and do not generate the
% functions in linear time.
% In addition, the universe $U$ of keys is restricted to integers, which may
% limit their use in practice.
|
|
||||||
|
|
||||||
Pagh~\cite{p99} proposed a family of randomized algorithms for
|
|
||||||
constructing MPHFs
|
|
||||||
where the form of the resulting function is $h(x) = (f(x) + d[g(x)]) \bmod n$,
|
|
||||||
where $f$ and $g$ are universal hash functions and $d$ is a set of
|
|
||||||
displacement values to resolve collisions that are caused by the function $f$.
|
|
||||||
Pagh identified a set of conditions concerning $f$ and $g$ and showed
|
|
||||||
that if these conditions are satisfied, then a minimal perfect hash
|
|
||||||
function can be computed in expected time $O(n)$ and stored in
|
|
||||||
$(2+\epsilon)n\log_2n$ bits.
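
To make the form $h(x) = (f(x) + d[g(x)]) \bmod n$ concrete, the following
Python sketch builds and evaluates such a function for a small key set.
It is only an illustration under stated assumptions: the helper names
(\texttt{\_hash}, \texttt{build}, \texttt{evaluate}), the use of seeded
BLAKE2b as a stand-in for the universal hash functions $f$ and $g$, the
ratio of three displacement entries per key, and the naive greedy search
with retries are all ours, and do not reproduce the conditions or the
analysis in~\cite{p99}.

```python
import hashlib


def _hash(key: bytes, seed: int, modulus: int) -> int:
    """Seeded hash into range(modulus); a stand-in for a universal hash function."""
    digest = hashlib.blake2b(
        key, digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


def build(keys, buckets_per_key=3, max_seeds=1000):
    """Greedily search for (seed, d) such that h maps keys one-to-one onto range(n)."""
    n = len(keys)
    m = buckets_per_key * n  # size of the displacement table d (assumed ratio)
    for seed in range(max_seeds):
        groups = {}
        for x in keys:
            groups.setdefault(_hash(x, 2 * seed + 1, m), []).append(x)
        d = [0] * m
        used = [False] * n
        ok = True
        # Place the largest groups first, while the table is still mostly empty.
        for idx, group in sorted(groups.items(), key=lambda kv: -len(kv[1])):
            for shift in range(n):
                slots = {(_hash(x, 2 * seed, n) + shift) % n for x in group}
                if len(slots) == len(group) and not any(used[s] for s in slots):
                    d[idx] = shift
                    for s in slots:
                        used[s] = True
                    break
            else:
                ok = False  # no displacement fits this group; retry with a new seed
                break
        if ok:
            return seed, d
    raise RuntimeError("no suitable seed found")


def evaluate(x: bytes, seed: int, d, n: int) -> int:
    """h(x) = (f(x) + d[g(x)]) mod n, with f and g derived from the seed."""
    return (_hash(x, 2 * seed, n) + d[_hash(x, 2 * seed + 1, len(d))]) % n
```

In this sketch the stored description is the seed plus the vector $d$;
each entry of $d$ is a value in $[0, n-1]$, which is where an
$O(n \log n)$-bit space bound of this kind comes from.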
|
|
||||||
|
|
||||||
Dietzfelbinger and Hagerup~\cite{dh01} improved~\cite{p99},
|
|
||||||
reducing from $(2+\epsilon)n\log_2n$ to $(1+\epsilon)n\log_2n$ the number of bits
|
|
||||||
required to store the function, but in their approach~$f$ and~$g$ must
|
|
||||||
be chosen from a class
|
|
||||||
of hash functions that meet additional requirements.
|
|
||||||
%Differently from the works in~\cite{dh01, p99}, our algorithm generates a MPHF
|
|
||||||
%$h$ in expected linear time and $h$ can be stored in $O(n)$ bits (9 bits per key).
|
|
||||||
|
|
||||||
% Galli, Seybold and Simon~\cite{gss01} proposed a randomized algorithm
% that generates MPHFs of the same form as those generated by the algorithms of Pagh~\cite{p99}
% and of Dietzfelbinger and Hagerup~\cite{dh01}. However, they defined the form of the
% functions $f(k) = h_c(k) \bmod n$ and $g(k) = \lfloor h_c(k)/n \rfloor$ to obtain in expected time $O(n)$ a function that can be described in $O(n\log n)$ bits, where
% $h_c(k) = (ck \bmod p) \bmod n^2$, $1 \leq c \leq p-1$ and $p$ is a prime greater than $u$.
|
|
||||||
%Our algorithm is the first one capable of generating MPHFs for sets in the order of
|
|
||||||
%billion of keys. It happens because we do not need to keep into main memory
|
|
||||||
%at generation time complex data structures as a graph, lists and so on. We just need to maintain
|
|
||||||
%a small vector that occupies around 8MB for a set of 1 billion keys.
|
|
||||||
|
|
||||||
Fox et al.~\cite{fch92,fhcd92} studied MPHFs
|
|
||||||
%that also share features with the ones generated by our algorithm.
|
|
||||||
that reduce the storage requirements, relative to ours, to between 2 and 4 bits per key.
|
|
||||||
However, it is shown in~\cite[Section 6.7]{chm97} that their algorithms have exponential
|
|
||||||
running times and, in our implementation of the algorithm, do not scale
to sets larger than 11 million keys.
|
|
||||||
|
|
||||||
Our previous work~\cite{bkz05} improves the one by Czech, Havas and Majewski~\cite{chm92}.
|
|
||||||
We obtained more compact functions in less time. Although
|
|
||||||
the algorithm in~\cite{bkz05} is the fastest algorithm
|
|
||||||
we know of, the resulting functions are stored in $O(n\log n)$ bits and
|
|
||||||
one needs to keep in main memory at generation time a random graph of $n$ edges
|
|
||||||
and $cn$ vertices,
|
|
||||||
where $c\in[0.93,1.15]$. Using the well-known divide-and-conquer approach,
|
|
||||||
we use that algorithm as a building block for the new one, where the
|
|
||||||
resulting functions are stored in $O(n)$ bits.
|
|
|
@ -1,155 +0,0 @@
|
||||||
%% Nivio: 22/jan/06
|
|
||||||
% Time-stamp: <Monday 30 Jan 2006 03:57:35am EDT yoshi@ime.usp.br>
|
|
||||||
\vspace{-7mm}
|
|
||||||
\subsection{Searching step}
|
|
||||||
\label{sec:searching}
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
The searching step is responsible for generating a MPHF for each
|
|
||||||
bucket.
|
|
||||||
Figure~\ref{fig:searchingstep} presents the searching step algorithm.
|
|
||||||
\vspace{-2mm}
|
|
||||||
\begin{figure}[h]
|
|
||||||
%\centering
|
|
||||||
\hrule
|
|
||||||
\hrule
|
|
||||||
\vspace{2mm}
|
|
||||||
\begin{tabbing}
|
|
||||||
aa\=type booleanx \== (false, true); \kill
|
|
||||||
\> $\blacktriangleright$ Let $H$ be a minimum heap of size $N$, where the \\
|
|
||||||
\> ~~ order relation in $H$ is given by Eq.~(\ref{eq:bucketindex}), that is, the\\
|
|
||||||
\> ~~ remove operation removes the item with smallest $i$\\
|
|
||||||
\> $1.$ {\bf for} $j = 1$ {\bf to} $N$ {\bf do} \{ Heap construction \}\\
|
|
||||||
\> ~~ $1.1$ Read key $k$ from File $j$ on disk\\
|
|
||||||
\> ~~ $1.2$ Insert $(i, j, k)$ in $H$ \\
|
|
||||||
\> $2.$ {\bf for} $i = 0$ {\bf to} $\lceil n/b \rceil - 1$ {\bf do} \\
|
|
||||||
\> ~~ $2.1$ Read bucket $i$ from disk driven by heap $H$ \\
|
|
||||||
\> ~~ $2.2$ Generate a MPHF for bucket $i$ \\
|
|
||||||
\> ~~ $2.3$ Write the description of MPHF$_i$ to the disk
|
|
||||||
\end{tabbing}
|
|
||||||
\vspace{-1mm}
|
|
||||||
\hrule
|
|
||||||
\hrule
|
|
||||||
\caption{Searching step}
|
|
||||||
\label{fig:searchingstep}
|
|
||||||
\vspace{-4mm}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
Statement 1 of Figure~\ref{fig:searchingstep} inserts one key from each file
|
|
||||||
in a minimum heap $H$ of size $N$.
|
|
||||||
The order relation in $H$ is determined by the bucket address $i$ obtained from
|
|
||||||
Eq.~(\ref{eq:bucketindex}).
|
|
||||||
|
|
||||||
%\enlargethispage{-\baselineskip}
|
|
||||||
Statement 2 has two important steps.
|
|
||||||
In statement 2.1, a bucket is read from disk,
|
|
||||||
as described below.
|
|
||||||
%in Section~\ref{sec:readingbucket}.
|
|
||||||
In statement 2.2, a MPHF is generated for each bucket $i$, as described
|
|
||||||
in the following.
|
|
||||||
%in Section~\ref{sec:mphfbucket}.
|
|
||||||
The description of MPHF$_i$ is a vector $g_i$ of 8-bit integers.
|
|
||||||
Finally, statement 2.3 writes the description $g_i$ of MPHF$_i$ to disk.
|
|
||||||
|
|
||||||
\vspace{-3mm}
|
|
||||||
\subsubsection{Reading a bucket from disk.}
|
|
||||||
\label{sec:readingbucket}
|
|
||||||
|
|
||||||
In this section we present the refinement of statement 2.1 of
|
|
||||||
Figure~\ref{fig:searchingstep}.
|
|
||||||
The algorithm to read bucket $i$ from disk is presented
|
|
||||||
in Figure~\ref{fig:readingbucket}.
|
|
||||||
|
|
||||||
\begin{figure}[h]
|
|
||||||
\hrule
|
|
||||||
\hrule
|
|
||||||
\vspace{2mm}
|
|
||||||
\begin{tabbing}
|
|
||||||
aa\=type booleanx \== (false, true); \kill
|
|
||||||
\> $1.$ {\bf while} bucket $i$ is not full {\bf do} \\
|
|
||||||
\> ~~ $1.1$ Remove $(i, j, k)$ from $H$\\
|
|
||||||
\> ~~ $1.2$ Insert $k$ into bucket $i$ \\
|
|
||||||
\> ~~ $1.3$ Read sequentially all keys $k$ from File $j$ that have \\
|
|
||||||
\> ~~~~~~~ the same $i$ and insert them into bucket $i$ \\
|
|
||||||
\> ~~ $1.4$ Insert the triple $(i, j, x)$ in $H$, where $x$ is the first \\
|
|
||||||
\> ~~~~~~~ key read from File $j$ that does not have the \\
|
|
||||||
\> ~~~~~~~ same bucket index $i$
|
|
||||||
\end{tabbing}
|
|
||||||
\hrule
|
|
||||||
\hrule
|
|
||||||
\vspace{-1.0mm}
|
|
||||||
\caption{Reading a bucket}
|
|
||||||
\vspace{-4.0mm}
|
|
||||||
\label{fig:readingbucket}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
Bucket $i$ is distributed among many files and the heap $H$ is used to drive a
|
|
||||||
multiway merge operation.
|
|
||||||
In Figure~\ref{fig:readingbucket}, statement 1.1 extracts and removes triple
|
|
||||||
$(i, j, k)$ from $H$, where $i$ is a minimum value in $H$.
|
|
||||||
Statement 1.2 inserts key $k$ in bucket $i$.
|
|
||||||
Notice that the $k$ in the triple $(i, j, k)$ is in fact a pointer to
|
|
||||||
the first byte of the key that is kept in contiguous positions of an array of characters
|
|
||||||
(this array containing the keys is initialized during the heap construction
|
|
||||||
in statement 1 of Figure~\ref{fig:searchingstep}).
|
|
||||||
Statement 1.3 performs a seek operation in File $j$ on disk for the first
|
|
||||||
read operation and reads sequentially all keys $k$ that have the same $i$
|
|
||||||
%(obtained from Eq.~(\ref{eq:bucketindex}))
|
|
||||||
and inserts them all in bucket $i$.
|
|
||||||
Finally, statement 1.4 inserts in $H$ the triple $(i, j, x)$,
|
|
||||||
where $x$ is the first key read from File $j$ (in statement 1.3)
|
|
||||||
that does not have the same bucket address as the previous keys.
|
|
||||||
|
|
||||||
The number of seek operations on disk performed in statement 1.3 is discussed
|
|
||||||
in Section~\ref{sec:linearcomplexity},
|
|
||||||
where we present a buffering technique that brings down
|
|
||||||
the time spent on seeks.
|
|
||||||
|
|
||||||
\vspace{-2mm}
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
\subsubsection{Generating a MPHF for each bucket.} \label{sec:mphfbucket}
|
|
||||||
|
|
||||||
To the best of our knowledge the algorithm we have designed in
|
|
||||||
our previous work~\cite{bkz05} is the fastest published algorithm for
|
|
||||||
constructing MPHFs.
|
|
||||||
That is why we are using that algorithm as a building block for the
|
|
||||||
algorithm presented here.
|
|
||||||
|
|
||||||
%\enlargethispage{-\baselineskip}
|
|
||||||
Our previous algorithm is a three-step internal memory based algorithm
|
|
||||||
that produces a MPHF based on random graphs.
|
|
||||||
For a set of $n$ keys, the algorithm outputs the resulting MPHF in expected time $O(n)$.
|
|
||||||
For a given bucket $i$, $0 \leq i < \lceil n/b \rceil$, the corresponding MPHF$_i$
|
|
||||||
has the following form:
|
|
||||||
\begin{eqnarray}
|
|
||||||
\mathrm{MPHF}_i(k) &=& g_i[a] + g_i[b] \label{eq:mphfi}
|
|
||||||
\end{eqnarray}
|
|
||||||
where $a = h_{i1}(k) \bmod t$, $b = h_{i2}(k) \bmod t$ and
|
|
||||||
$t = c\times \mathit{size}[i]$. The functions
|
|
||||||
$h_{i1}(k)$ and $h_{i2}(k)$ are drawn from the same universal hash function family proposed by Jenkins~\cite{j97}
|
|
||||||
that was used in the partitioning step described in Section~\ref{sec:partitioning-keys}.
|
|
||||||
|
|
||||||
To generate the function above, the algorithm builds simple random graphs
|
|
||||||
$G_i = (V_i, E_i)$ with~$|V_i|=t=c\times\mathit{size}[i]$ and $|E_i|=\mathit{size}[i]$, with $c \in [0.93, 1.15]$.
|
|
||||||
To generate a simple random graph with high
|
|
||||||
probability\footnote{We use the terms `with high probability'
|
|
||||||
to mean `with probability tending to~$1$ as~$n\to\infty$'.}, two vertices $a$ and $b$ are
|
|
||||||
computed for each key $k$ in bucket $i$.
|
|
||||||
Thus, each bucket $i$ has a corresponding graph~$G_i=(V_i,E_i)$, where $V_i=\{0,1,
|
|
||||||
\ldots,t-1\}$ and $E_i=\big\{\{a,b\}:k \in \mathrm{bucket}\: i\big\}$.
|
|
||||||
In order to get a simple graph,
|
|
||||||
the algorithm repeatedly selects $h_{i1}$ and $h_{i2}$ from a family of universal hash functions
|
|
||||||
until the corresponding graph is simple.
|
|
||||||
The probability of getting a simple graph is $p=e^{-1/c^2}$.
|
|
||||||
For $c=1$, this probability is $p \simeq 0.368$, and the expected number of
|
|
||||||
iterations to obtain a simple graph is~$1/p \simeq 2.72$.
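As a quick check of this formula at the two ends of the range $c \in [0.93, 1.15]$
(our own arithmetic, not figures reported in the experiments):
\begin{eqnarray*}
c = 0.93: & p = e^{-1/0.93^2} \simeq 0.315, & 1/p \simeq 3.18, \\
c = 1.15: & p = e^{-1/1.15^2} \simeq 0.469, & 1/p \simeq 2.13.
\end{eqnarray*}
A larger $c$ therefore needs fewer attempts on average, at the cost of a
longer vector $g_i$, since $g_i$ has one entry per vertex and $|V_i| = c\times\mathit{size}[i]$.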
|
|
||||||
|
|
||||||
The construction of MPHF$_i$ ends with a computation of a suitable labelling of the vertices
|
|
||||||
of~$G_i$. The labelling is stored into vector $g_i$.
|
|
||||||
We choose~$g_i[v]$ for each~$v\in V_i$ in such
|
|
||||||
a way that Eq.~(\ref{eq:mphfi}) is a MPHF for bucket $i$.
|
|
||||||
In order to get the values of each entry of $g_i$ we first
|
|
||||||
run a breadth-first search on the 2-\textit{core} of $G_i$, i.e., the maximal subgraph
|
|
||||||
of~$G_i$ with minimum degree at least~$2$ (see, e.g., \cite{b01,jlr00,pw04}), and then
|
|
||||||
a depth-first search on the acyclic part of $G_i$ (see \cite{bkz05} for details).
|
|
||||||
|
|
|
@ -1,77 +0,0 @@
|
||||||
% SVJour2 DOCUMENT CLASS OPTION SVGLOV2 -- for standardised journals
|
|
||||||
%
|
|
||||||
% This is an enhancement for the LaTeX
|
|
||||||
% SVJour2 document class for Springer journals
|
|
||||||
%
|
|
||||||
%%
|
|
||||||
%%
|
|
||||||
%% \CharacterTable
|
|
||||||
%% {Upper-case \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z
|
|
||||||
%% Lower-case \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z
|
|
||||||
%% Digits \0\1\2\3\4\5\6\7\8\9
|
|
||||||
%% Exclamation \! Double quote \" Hash (number) \#
|
|
||||||
%% Dollar \$ Percent \% Ampersand \&
|
|
||||||
%% Acute accent \' Left paren \( Right paren \)
|
|
||||||
%% Asterisk \* Plus \+ Comma \,
|
|
||||||
%% Minus \- Point \. Solidus \/
|
|
||||||
%% Colon \: Semicolon \; Less than \<
|
|
||||||
%% Equals \= Greater than \> Question mark \?
|
|
||||||
%% Commercial at \@ Left bracket \[ Backslash \\
|
|
||||||
%% Right bracket \] Circumflex \^ Underscore \_
|
|
||||||
%% Grave accent \` Left brace \{ Vertical bar \|
|
|
||||||
%% Right brace \} Tilde \~}
|
|
||||||
\ProvidesFile{svglov2.clo}
|
|
||||||
[2004/10/25 v2.1
|
|
||||||
style option for standardised journals]
|
|
||||||
\typeout{SVJour Class option: svglov2.clo for standardised journals}
|
|
||||||
\def\validfor{svjour2}
|
|
||||||
\ExecuteOptions{final,10pt,runningheads}
|
|
||||||
% No size changing allowed, hence a copy of size10.clo is included
|
|
||||||
\renewcommand\normalsize{%
|
|
||||||
\@setfontsize\normalsize{10.2pt}{4mm}%
|
|
||||||
\abovedisplayskip=3 mm plus6pt minus 4pt
|
|
||||||
\belowdisplayskip=3 mm plus6pt minus 4pt
|
|
||||||
\abovedisplayshortskip=0.0 mm plus6pt
|
|
||||||
\belowdisplayshortskip=2 mm plus4pt minus 4pt
|
|
||||||
\let\@listi\@listI}
|
|
||||||
\normalsize
|
|
||||||
\newcommand\small{%
|
|
||||||
\@setfontsize\small{8.7pt}{3.25mm}%
|
|
||||||
\abovedisplayskip 8.5\p@ \@plus3\p@ \@minus4\p@
|
|
||||||
\abovedisplayshortskip \z@ \@plus2\p@
|
|
||||||
\belowdisplayshortskip 4\p@ \@plus2\p@ \@minus2\p@
|
|
||||||
\def\@listi{\leftmargin\leftmargini
|
|
||||||
\parsep 0\p@ \@plus1\p@ \@minus\p@
|
|
||||||
\topsep 4\p@ \@plus2\p@ \@minus4\p@
|
|
||||||
\itemsep0\p@}%
|
|
||||||
\belowdisplayskip \abovedisplayskip
|
|
||||||
}
|
|
||||||
\let\footnotesize\small
|
|
||||||
\newcommand\scriptsize{\@setfontsize\scriptsize\@viipt\@viiipt}
|
|
||||||
\newcommand\tiny{\@setfontsize\tiny\@vpt\@vipt}
|
|
||||||
\newcommand\large{\@setfontsize\large\@xiipt{14pt}}
|
|
||||||
\newcommand\Large{\@setfontsize\Large\@xivpt{16dd}}
|
|
||||||
\newcommand\LARGE{\@setfontsize\LARGE\@xviipt{17dd}}
|
|
||||||
\newcommand\huge{\@setfontsize\huge\@xxpt{25}}
|
|
||||||
\newcommand\Huge{\@setfontsize\Huge\@xxvpt{30}}
|
|
||||||
%
|
|
||||||
%ALT% \def\runheadhook{\rlap{\smash{\lower5pt\hbox to\textwidth{\hrulefill}}}}
|
|
||||||
\def\runheadhook{\rlap{\smash{\lower11pt\hbox to\textwidth{\hrulefill}}}}
|
|
||||||
\AtEndOfClass{\advance\headsep by5pt}
|
|
||||||
\if@twocolumn
|
|
||||||
\setlength{\textwidth}{17.6cm}
|
|
||||||
\setlength{\textheight}{230mm}
|
|
||||||
\AtEndOfClass{\setlength\columnsep{4mm}}
|
|
||||||
\else
|
|
||||||
\setlength{\textwidth}{11.7cm}
|
|
||||||
\setlength{\textheight}{517.5dd} % 19.46cm
|
|
||||||
\fi
|
|
||||||
%
|
|
||||||
\AtBeginDocument{%
|
|
||||||
\@ifundefined{@journalname}
|
|
||||||
{\typeout{Unknown journal: specify \string\journalname\string{%
|
|
||||||
<name of your journal>\string} in preambel^^J}}{}}
|
|
||||||
%
|
|
||||||
\endinput
|
|
||||||
%%
|
|
||||||
%% End of file `svglov2.clo'.
|
|
1419
vldb07/svjour2.cls
File diff suppressed because it is too large
|
@ -1,18 +0,0 @@
|
||||||
% Time-stamp: <Sunday 29 Jan 2006 11:55:42pm EST yoshi@flare>
|
|
||||||
\vspace{-3mm}
|
|
||||||
\section{Notation and terminology}
|
|
||||||
\vspace{-2mm}
|
|
||||||
\label{sec:notation}
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
The essential notation and terminology used throughout this paper are as follows.
|
|
||||||
\begin{itemize}
|
|
||||||
\item $U$: key universe. $|U| = u$.
|
|
||||||
\item $S$: actual static key set. $S \subset U$, $|S| = n \ll u$.
|
|
||||||
\item $h: U \to M$ is a hash function that maps keys from a universe $U$ into
|
|
||||||
a given range $M = \{0,1,\dots,m-1\}$ of integer numbers.
|
|
||||||
\item $h$ is a perfect hash function if it is one-to-one on~$S$, i.e., if
|
|
||||||
$h(k_1) \not = h(k_2)$ for all $k_1 \not = k_2$ from $S$.
|
|
||||||
\item $h$ is a minimal perfect hash function (MPHF) if it is one-to-one on~$S$
|
|
||||||
and $n=m$.
|
|
||||||
\end{itemize}
|
|
|
@ -1,78 +0,0 @@
|
||||||
%% Nivio: 13/jan/06, 21/jan/06 29/jan/06
|
|
||||||
% Time-stamp: <Sunday 29 Jan 2006 11:56:25pm EST yoshi@flare>
|
|
||||||
\vspace{-3mm}
|
|
||||||
\section{The algorithm}
|
|
||||||
\label{sec:new-algorithm}
|
|
||||||
\vspace{-2mm}
|
|
||||||
|
|
||||||
\enlargethispage{2\baselineskip}
|
|
||||||
The main idea supporting our algorithm is the classical divide-and-conquer technique.
|
|
||||||
The algorithm is a two-step external memory based algorithm
|
|
||||||
that generates a MPHF $h$ for a set $S$ of $n$ keys.
|
|
||||||
Figure~\ref{fig:new-algo-main-steps} illustrates the two steps of the
|
|
||||||
algorithm: the partitioning step and the searching step.
|
|
||||||
|
|
||||||
\vspace{-2mm}
|
|
||||||
\begin{figure}[ht]
|
|
||||||
\centering
|
|
||||||
\begin{picture}(0,0)%
|
|
||||||
\includegraphics{figs/brz}%
|
|
||||||
\end{picture}%
|
|
||||||
\setlength{\unitlength}{4144sp}%
|
|
||||||
%
|
|
||||||
\begingroup\makeatletter\ifx\SetFigFont\undefined%
|
|
||||||
\gdef\SetFigFont#1#2#3#4#5{%
|
|
||||||
\reset@font\fontsize{#1}{#2pt}%
|
|
||||||
\fontfamily{#3}\fontseries{#4}\fontshape{#5}%
|
|
||||||
\selectfont}%
|
|
||||||
\fi\endgroup%
|
|
||||||
\begin{picture}(3704,2091)(1426,-5161)
|
|
||||||
\put(2570,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
|
|
||||||
\put(2782,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
|
|
||||||
\put(2996,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}2}}}}
|
|
||||||
\put(4060,-4006){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Buckets}}}}
|
|
||||||
\put(3776,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}${\lceil n/b\rceil - 1}$}}}}
|
|
||||||
\put(4563,-3329){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Key Set $S$}}}}
|
|
||||||
\put(2009,-3160){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
|
|
||||||
\put(2221,-3160){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
|
|
||||||
\put(4315,-3160){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}n-1}}}}
|
|
||||||
\put(1992,-5146){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
|
|
||||||
\put(2204,-5146){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
|
|
||||||
\put(4298,-5146){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}n-1}}}}
|
|
||||||
\put(4546,-4977){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Hash Table}}}}
|
|
||||||
\put(1441,-3616){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Partitioning}}}}
|
|
||||||
\put(1441,-4426){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Searching}}}}
|
|
||||||
\put(1981,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_0$}}}}
|
|
||||||
\put(2521,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_1$}}}}
|
|
||||||
\put(3016,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_2$}}}}
|
|
||||||
\put(3826,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_{\lceil n/b \rceil - 1}$}}}}
|
|
||||||
\end{picture}%
|
|
||||||
\vspace{-1mm}
|
|
||||||
\caption{Main steps of our algorithm}
|
|
||||||
\label{fig:new-algo-main-steps}
|
|
||||||
\vspace{-3mm}
|
|
||||||
\end{figure}
|
|
||||||
|
|
||||||
The partitioning step takes a key set $S$ and uses a universal hash function
|
|
||||||
$h_0$ proposed by Jenkins~\cite{j97}
|
|
||||||
%for each key $k \in S$ of length $|k|$
|
|
||||||
to transform each key~$k\in S$ into an integer~$h_0(k)$.
|
|
||||||
Reducing~$h_0(k)$ modulo~$\lceil n/b\rceil$, we partition~$S$ into $\lceil n/b
|
|
||||||
\rceil$ buckets, each containing at most 256 keys (with high
|
|
||||||
probability).
|
|
||||||
|
|
||||||
The searching step generates a MPHF$_i$ for each bucket $i$,
|
|
||||||
$0 \leq i < \lceil n/b \rceil$.
|
|
||||||
The resulting MPHF $h(k)$, $k \in S$, is given by
|
|
||||||
\begin{eqnarray}\label{eq:mphf}
|
|
||||||
h(k) = \mathrm{MPHF}_i (k) + \mathit{offset}[i],
|
|
||||||
\end{eqnarray}
|
|
||||||
where~$i=h_0(k)\bmod\lceil n/b\rceil$.
|
|
||||||
The $i$th entry~$\mathit{offset}[i]$ of the displacement vector
|
|
||||||
$\mathit{offset}$, $0 \leq i < \lceil n/b \rceil$, contains the total number
|
|
||||||
of keys in buckets 0 to $i-1$; that is, it gives the start of the interval of
|
|
||||||
keys in the hash table addressed by the MPHF$_i$. In the following we explain
|
|
||||||
each step in detail.
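
As a small illustration with made-up numbers (not taken from the experiments),
suppose there are three buckets with $\mathit{size}[0]=3$, $\mathit{size}[1]=2$
and $\mathit{size}[2]=4$. Then
\begin{eqnarray*}
\mathit{offset}[0] = 0, \quad \mathit{offset}[1] = 3, \quad \mathit{offset}[2] = 5,
\end{eqnarray*}
and a key $k$ in bucket~$1$, for which $\mathrm{MPHF}_1(k) \in \{0,1\}$, is mapped by
Eq.~(\ref{eq:mphf}) to $h(k) \in \{3,4\}$. The three buckets thus fill the ranges
$\{0,1,2\}$, $\{3,4\}$ and $\{5,\ldots,8\}$ of the hash table without overlap.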
|
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,21 +0,0 @@
|
||||||
% Nivio: 29/jan/06
|
|
||||||
% Time-stamp: <Sunday 29 Jan 2006 11:57:40pm EST yoshi@flare>
|
|
||||||
\vspace{-2mm}
|
|
||||||
\subsection{The data and the experimental setup}
|
|
||||||
\label{sec:data-exper-set}
|
|
||||||
|
|
||||||
The algorithms were implemented in the C language and
|
|
||||||
are available at \texttt{http://\-cmph.sf.net}
|
|
||||||
under the GNU Lesser General Public License (LGPL).
|
|
||||||
% free software licence.
|
|
||||||
All experiments were carried out on
|
|
||||||
a computer running the Linux operating system, version 2.6,
|
|
||||||
with a 2.4 gigahertz processor and
|
|
||||||
1 gigabyte of main memory.
|
|
||||||
In the experiments related to the new
|
|
||||||
algorithm we limited the main memory to 500 megabytes.
|
|
||||||
|
|
||||||
Our data consists of a collection of 1 billion
|
|
||||||
URLs collected from the Web, each URL 64 characters long on average.
|
|
||||||
The collection is stored on disk in 60.5 gigabytes.
|
|
||||||
|
|
194
vldb07/vldb.tex
|
@ -1,194 +0,0 @@
|
||||||
%%%%%%%%%%%%%%%%%%%%%%% file template.tex %%%%%%%%%%%%%%%%%%%%%%%%%
|
|
||||||
%
|
|
||||||
% This is a template file for the LaTeX package SVJour2 for the
|
|
||||||
% Springer journal "The VLDB Journal".
|
|
||||||
%
|
|
||||||
% Springer Heidelberg 2004/12/03
|
|
||||||
%
|
|
||||||
% Copy it to a new file with a new name and use it as the basis
|
|
||||||
% for your article. Delete % as needed.
|
|
||||||
%
|
|
||||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
||||||
%
|
|
||||||
% First comes an example EPS file -- just ignore it and
|
|
||||||
% proceed on the \documentclass line
|
|
||||||
% your LaTeX will extract the file if required
|
|
||||||
%\begin{filecontents*}{figs/minimalperfecthash-ph-mph.ps}
|
|
||||||
%!PS-Adobe-3.0 EPSF-3.0
|
|
||||||
%%BoundingBox: 19 19 221 221
|
|
||||||
%%CreationDate: Mon Sep 29 1997
|
|
||||||
%%Creator: programmed by hand (JK)
|
|
||||||
%%EndComments
|
|
||||||
%gsave
|
|
||||||
%newpath
|
|
||||||
% 20 20 moveto
|
|
||||||
% 20 220 lineto
|
|
||||||
% 220 220 lineto
|
|
||||||
% 220 20 lineto
|
|
||||||
%closepath
|
|
||||||
%2 setlinewidth
|
|
||||||
%gsave
|
|
||||||
% .4 setgray fill
|
|
||||||
%grestore
|
|
||||||
%stroke
|
|
||||||
%grestore
|
|
||||||
%\end{filecontents*}
|
|
||||||
%
|
|
||||||
\documentclass[twocolumn,fleqn,runningheads]{svjour2}
|
|
||||||
%
|
|
||||||
\smartqed % flush right qed marks, e.g. at end of proof
|
|
||||||
%
|
|
||||||
\usepackage{graphicx}
|
|
||||||
\usepackage{listings}
|
|
||||||
\usepackage{epsfig}
|
|
||||||
\usepackage{textcomp}
|
|
||||||
\usepackage[latin1]{inputenc}
|
|
||||||
\usepackage{amssymb}
|
|
||||||
|
|
||||||
\DeclareGraphicsExtensions{.png}
|
|
||||||
%
|
|
||||||
% \usepackage{mathptmx} % use Times fonts if available on your TeX system
|
|
||||||
%
|
|
||||||
% insert here the call for the packages your document requires
|
|
||||||
%\usepackage{latexsym}
|
|
||||||
% etc.
|
|
||||||
%
|
|
||||||
% please place your own definitions here and don't use \def but
|
|
||||||
% \newcommand{}{}
|
|
||||||
%
|
|
||||||
|
|
||||||
\lstset{
|
|
||||||
language=Pascal,
|
|
||||||
basicstyle=\fontsize{9}{9}\selectfont,
|
|
||||||
captionpos=t,
|
|
||||||
aboveskip=1mm,
|
|
||||||
belowskip=1mm,
|
|
||||||
abovecaptionskip=1mm,
|
|
||||||
belowcaptionskip=1mm,
|
|
||||||
% numbers = left,
|
|
||||||
mathescape=true,
|
|
||||||
escapechar=@,
|
|
||||||
extendedchars=true,
|
|
||||||
showstringspaces=false,
|
|
||||||
columns=fixed,
|
|
||||||
basewidth=0.515em,
|
|
||||||
frame=single,
|
|
||||||
framesep=2mm,
|
|
||||||
xleftmargin=2mm,
|
|
||||||
xrightmargin=2mm,
|
|
||||||
framerule=0.5pt
|
|
||||||
}
|
|
||||||
|
|
||||||
\def\cG{{\mathcal G}}
|
|
||||||
\def\crit{{\rm crit}}
|
|
||||||
\def\ncrit{{\rm ncrit}}
|
|
||||||
\def\scrit{{\rm scrit}}
|
|
||||||
\def\bedges{{\rm bedges}}
|
|
||||||
\def\ZZ{{\mathbb Z}}
|
|
||||||
|
|
||||||
\journalname{The VLDB Journal}
|
|
||||||
%
|
|
||||||
|
|
||||||
\begin{document}
|
|
||||||
|
|
||||||
\title{Space and Time Efficient Minimal Perfect Hash \\[0.2cm]
|
|
||||||
Functions for Very Large Databases\thanks{
|
|
||||||
This work was supported in part by
|
|
||||||
GERINDO Project--grant MCT/CNPq/CT-INFO 552.087/02-5,
|
|
||||||
CAPES/PROF Scholarship (Fabiano C. Botelho),
|
|
||||||
FAPESP Proj.\ Tem.\ 03/09925-5 and CNPq Grant 30.0334/93-1
|
|
||||||
(Yoshiharu Kohayakawa),
|
|
||||||
and CNPq Grant 30.5237/02-0 (Nivio Ziviani).}
|
|
||||||
}
|
|
||||||
%\subtitle{Do you have a subtitle?\\ If so, write it here}
|
|
||||||
|
|
||||||
%\titlerunning{Short form of title} % if too long for running head
|
|
||||||
|
|
||||||
\author{Fabiano C. Botelho \and Davi C. Reis \and Yoshiharu Kohayakawa \and Nivio Ziviani}
|
|
||||||
%\authorrunning{Short form of author list} % if too long for running head
|
|
||||||
\institute{
|
|
||||||
F. C. Botelho \and
|
|
||||||
N. Ziviani \at
|
|
||||||
Dept. of Computer Science,
|
|
||||||
Federal Univ. of Minas Gerais,
|
|
||||||
Belo Horizonte, Brazil\\
|
|
||||||
\email{\{fbotelho,nivio\}@dcc.ufmg.br}
|
|
||||||
\and
|
|
||||||
D. C. Reis \at
|
|
||||||
Google, Brazil \\
|
|
||||||
\email{davi.reis@gmail.com}
|
|
||||||
\and
|
|
||||||
Y. Kohayakawa \at
|
|
||||||
Dept. of Computer Science,
|
|
||||||
Univ. of S\~ao Paulo,
|
|
||||||
S\~ao Paulo, Brazil\\
|
|
||||||
\email{yoshi@ime.usp.br}
|
|
||||||
}
|
|
||||||
|
|
||||||
\date{Received: date / Accepted: date}
|
|
||||||
% The correct dates will be entered by the editor
|
|
||||||
|
|
||||||
|
|
||||||
\maketitle
|
|
||||||
|
|
||||||
\begin{abstract}
|
|
||||||
We propose a novel external memory based algorithm for constructing minimal
|
|
||||||
perfect hash functions~$h$ for huge sets of keys.
|
|
||||||
For a set of~$n$ keys, our algorithm outputs~$h$ in expected time~$O(n)$.
|
|
||||||
The algorithm needs a small vector of one byte entries
|
|
||||||
in main memory to construct $h$.
|
|
||||||
The evaluation of~$h(x)$ requires three memory accesses for any key~$x$.
|
|
||||||
The description of~$h$ takes a constant number of bits
|
|
||||||
for each key, which is optimal up to a constant factor: the theoretical lower bound is $1/\ln 2 \approx 1.44$
|
|
||||||
bits per key.
|
|
||||||
In our experiments, we used a collection of 1 billion URLs collected
|
|
||||||
from the web, each URL 64 characters long on average.
|
|
||||||
For this collection, our algorithm
|
|
||||||
(i) finds a minimal perfect hash function in approximately
|
|
||||||
3 hours using a commodity PC,
|
|
||||||
(ii) needs just 5.45 megabytes of internal memory to generate $h$
|
|
||||||
and (iii) takes 8.1 bits per key for the description of~$h$.
|
|
||||||
\keywords{Minimal Perfect Hashing \and Large Databases}
|
|
||||||
\end{abstract}
|
|
||||||
|
|
||||||
% main text
|
|
||||||
|
|
||||||
\def\cG{{\mathcal G}}
|
|
||||||
\def\crit{{\rm crit}}
|
|
||||||
\def\ncrit{{\rm ncrit}}
|
|
||||||
\def\scrit{{\rm scrit}}
|
|
||||||
\def\bedges{{\rm bedges}}
|
|
||||||
\def\ZZ{{\mathbb Z}}
|
|
||||||
\def\BSmax{\mathit{BS}_{\mathit{max}}}
|
|
||||||
\def\Bi{\mathop{\rm Bi}\nolimits}
|
|
||||||
|
|
||||||
\input{introduction}
|
|
||||||
%\input{terminology}
|
|
||||||
\input{relatedwork}
|
|
||||||
\input{thealgorithm}
|
|
||||||
\input{partitioningthekeys}
|
|
||||||
\input{searching}
|
|
||||||
%\input{computingoffset}
|
|
||||||
%\input{hashingbuckets}
|
|
||||||
\input{determiningb}
|
|
||||||
%\input{analyticalandexperimentalresults}
|
|
||||||
\input{analyticalresults}
|
|
||||||
%\input{results}
|
|
||||||
\input{conclusions}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
%\input{acknowledgments}
|
|
||||||
%\begin{acknowledgements}
|
|
||||||
%If you'd like to thank anyone, place your comments here
|
|
||||||
%and remove the percent signs.
|
|
||||||
%\end{acknowledgements}
|
|
||||||
|
|
||||||
% BibTeX users please use
|
|
||||||
%\bibliographystyle{spmpsci}
|
|
||||||
%\bibliography{} % name your BibTeX data base
|
|
||||||
\bibliographystyle{plain}
|
|
||||||
\bibliography{references}
|
|
||||||
\input{appendix}
|
|
||||||
\end{document}
|
|