%% Nivio: 22/jan/06
% Time-stamp: <Monday 30 Jan 2006 03:57:35am EDT yoshi@ime.usp.br>
\vspace{-7mm}
\subsection{Searching step}
\label{sec:searching}
\enlargethispage{2\baselineskip}
The searching step is responsible for generating a MPHF for each
bucket.
Figure~\ref{fig:searchingstep} presents the searching step algorithm.
\vspace{-2mm}
\begin{figure}[h]
%\centering
\hrule
\hrule
\vspace{2mm}
\begin{tabbing}
aa\=type booleanx \== (false, true); \kill
\> $\blacktriangleright$ Let $H$ be a minimum heap of size $N$, where the \\
\> ~~ order relation in $H$ is given by Eq.~(\ref{eq:bucketindex}), that is, the\\
\> ~~ remove operation removes the item with smallest $i$\\
\> $1.$ {\bf for} $j = 1$ {\bf to} $N$ {\bf do} \{ Heap construction \}\\
\> ~~ $1.1$ Read key $k$ from File $j$ on disk\\
\> ~~ $1.2$ Insert $(i, j, k)$ in $H$ \\
\> $2.$ {\bf for} $i = 0$ {\bf to} $\lceil n/b \rceil - 1$ {\bf do} \\
\> ~~ $2.1$ Read bucket $i$ from disk driven by heap $H$ \\
\> ~~ $2.2$ Generate a MPHF for bucket $i$ \\
\> ~~ $2.3$ Write the description of MPHF$_i$ to the disk
\end{tabbing}
\vspace{-1mm}
\hrule
\hrule
\caption{Searching step}
\label{fig:searchingstep}
\vspace{-4mm}
\end{figure}
Statement 1 of Figure~\ref{fig:searchingstep} inserts one key from each file
into a minimum heap $H$ of size $N$.
The order relation in $H$ is given by the bucket address $i$ obtained from
Eq.~(\ref{eq:bucketindex}).
%\enlargethispage{-\baselineskip}
Statement 2 has two important steps.
In statement 2.1, a bucket is read from disk,
as described below.
%in Section~\ref{sec:readingbucket}.
In statement 2.2, a MPHF is generated for bucket $i$, as described
in the following.
%in Section~\ref{sec:mphfbucket}.
The description of MPHF$_i$ is a vector $g_i$ of 8-bit integers.
Finally, statement 2.3 writes the description $g_i$ of MPHF$_i$ to disk.
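The control flow of Figure~\ref{fig:searchingstep} can be summarized by the
following sketch, where \texttt{open\_file}, \texttt{read\_key},
\texttt{bucket\_index}, \texttt{generate\_mphf} and \texttt{write\_mphf} are
illustrative helper names, not routines defined in this paper:
\begin{verbatim}
import heapq

# Sketch of the searching step; files are indexed 0..N-1 for simplicity.
def searching_step(N, num_buckets, size):
    files = [open_file(j) for j in range(N)]
    H = []                                   # min-heap ordered by bucket address i
    for j, f in enumerate(files):            # statement 1: one key per file
        k = read_key(f)
        heapq.heappush(H, (bucket_index(k), j, k))
    for i in range(num_buckets):             # statement 2
        bucket = read_bucket(i, H, files, size)  # statement 2.1
        g_i = generate_mphf(bucket)              # statement 2.2
        write_mphf(i, g_i)                       # statement 2.3
\end{verbatim}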
\vspace{-3mm}
\label{sec:readingbucket}
\subsubsection{Reading a bucket from disk.}
In this section we present the refinement of statement 2.1 of
Figure~\ref{fig:searchingstep}.
The algorithm to read bucket $i$ from disk is presented
in Figure~\ref{fig:readingbucket}.
\begin{figure}[h]
\hrule
\hrule
\vspace{2mm}
\begin{tabbing}
aa\=type booleanx \== (false, true); \kill
\> $1.$ {\bf while} bucket $i$ is not full {\bf do} \\
\> ~~ $1.1$ Remove $(i, j, k)$ from $H$\\
\> ~~ $1.2$ Insert $k$ into bucket $i$ \\
\> ~~ $1.3$ Read sequentially all keys $k$ from File $j$ that have \\
\> ~~~~~~~ the same $i$ and insert them into bucket $i$ \\
\> ~~ $1.4$ Insert the triple $(i', j, x)$ in $H$, where $x$ is the first \\
\> ~~~~~~~ key read from File $j$ whose bucket index \\
\> ~~~~~~~ $i'$ differs from $i$
\end{tabbing}
\hrule
\hrule
\vspace{-1.0mm}
\caption{Reading a bucket}
\vspace{-4.0mm}
\label{fig:readingbucket}
\end{figure}
Bucket $i$ is distributed among many files and the heap $H$ is used to drive a
multiway merge operation.
In Figure~\ref{fig:readingbucket}, statement 1.1 removes from $H$ the triple
$(i, j, k)$ with the smallest bucket address $i$.
Statement 1.2 inserts key $k$ into bucket $i$.
Notice that $k$ in the triple $(i, j, k)$ is in fact a pointer to
the first byte of the key, which is kept in contiguous positions of an array of characters
(this array is initialized during the heap construction
in statement 1 of Figure~\ref{fig:searchingstep}).
Statement 1.3 performs a seek operation on File $j$ for its first
read and then sequentially reads all keys $k$ that have the same bucket address $i$
%(obtained from Eq.~(\ref{eq:bucketindex}))
and inserts them all into bucket $i$.
Finally, statement 1.4 inserts into $H$ the triple $(i', j, x)$,
where $x$ is the first key read from File $j$ (in statement 1.3)
whose bucket address $i'$ differs from that of the previous keys.
The number of seek operations on disk performed in statement 1.3 is discussed
in Section~\ref{sec:linearcomplexity},
where we present a buffering technique that reduces
the time spent on seeks.
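A sketch of the multiway merge performed by statement 2.1 follows;
$\mathit{size}[i]$ denotes the number of keys with bucket address $i$, and
\texttt{read\_key} and \texttt{bucket\_index} are the same illustrative
helpers used above, with \texttt{read\_key} assumed to return \texttt{None}
at end of file:
\begin{verbatim}
import heapq

# Sketch of reading bucket i via the heap-driven multiway merge.
def read_bucket(i, H, files, size):
    bucket = []
    while len(bucket) < size[i]:            # statement 1: until bucket i is full
        _, j, k = heapq.heappop(H)          # statement 1.1: smallest bucket address
        bucket.append(k)                    # statement 1.2
        x = read_key(files[j])              # statement 1.3: sequential reads from File j
        while x is not None and bucket_index(x) == i:
            bucket.append(x)
            x = read_key(files[j])
        if x is not None:                   # statement 1.4: first key of a later bucket
            heapq.heappush(H, (bucket_index(x), j, x))
    return bucket
\end{verbatim}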
\vspace{-2mm}
\enlargethispage{2\baselineskip}
\subsubsection{Generating a MPHF for each bucket.} \label{sec:mphfbucket}
To the best of our knowledge, the algorithm we designed in
our previous work~\cite{bkz05} is the fastest published algorithm for
constructing MPHFs.
That is why we use it as a building block for the
algorithm presented here.
%\enlargethispage{-\baselineskip}
Our previous algorithm is a three-step internal-memory algorithm
that produces a MPHF based on random graphs.
For a set of $n$ keys, the algorithm outputs the resulting MPHF in expected time $O(n)$.
For a given bucket $i$, $0 \leq i < \lceil n/b \rceil$, the corresponding MPHF$_i$
has the following form:
\begin{eqnarray}
\mathrm{MPHF}_i(k) &=& g_i[a] + g_i[b] \label{eq:mphfi}
\end{eqnarray}
where $a = h_{i1}(k) \bmod t$, $b = h_{i2}(k) \bmod t$ and
$t = c\times \mathit{size}[i]$. The functions
$h_{i1}(k)$ and $h_{i2}(k)$ are derived from the same universal hash function proposed by Jenkins~\cite{j97}
that was used in the partitioning step described in Section~\ref{sec:partitioning-keys}.
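In code, evaluating MPHF$_i$ amounts to two hash computations and two lookups
in $g_i$; in the sketch below, \texttt{h\_i1} and \texttt{h\_i2} stand for the
two hash values of bucket $i$ and are illustrative names only:
\begin{verbatim}
# Sketch of the evaluation MPHF_i(k) = g_i[a] + g_i[b].
def mphf_i(k, g_i, size_i, c, h_i1, h_i2):
    t = int(c * size_i)      # number of vertices of the graph G_i
    a = h_i1(k) % t
    b = h_i2(k) % t
    return g_i[a] + g_i[b]   # a distinct value in {0, ..., size_i - 1}
\end{verbatim}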
To generate the function above, the algorithm builds simple random graphs
$G_i = (V_i, E_i)$ with~$|V_i|=t=c\times\mathit{size}[i]$ and $|E_i|=\mathit{size}[i]$, where $c \in [0.93, 1.15]$.
To generate a simple random graph with high
probability\footnote{We use the term `with high probability'
to mean `with probability tending to~$1$ as~$n\to\infty$'.}, two vertices $a$ and $b$ are
computed for each key $k$ in bucket $i$.
Thus, each bucket $i$ has a corresponding graph~$G_i=(V_i,E_i)$, where $V_i=\{0,1,
\ldots,t-1\}$ and $E_i=\big\{\{a,b\}:k \in \mathrm{bucket}\: i\big\}$.
In order to get a simple graph,
the algorithm repeatedly selects $h_{i1}$ and $h_{i2}$ from a family of universal hash functions
until the corresponding graph is simple.
The probability of getting a simple graph is $p=e^{-1/c^2}$.
For $c=1$, this probability is $p \simeq 0.368$, and the expected number of
iterations to obtain a simple graph is~$1/p \simeq 2.72$.
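For instance, at the endpoints of the range for $c$ given above, $c=0.93$
yields $p = e^{-1/0.93^2} \simeq 0.31$ and about $3.2$ expected iterations,
whereas $c=1.15$ yields $p \simeq 0.47$ and about $2.1$ expected iterations.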
The construction of MPHF$_i$ ends with the computation of a suitable labelling of the vertices
of~$G_i$. The labelling is stored in the vector $g_i$.
We choose~$g_i[v]$ for each~$v\in V_i$ in such
a way that Eq.~(\ref{eq:mphfi}) is a MPHF for bucket $i$.
To obtain the value of each entry of $g_i$, we first
run a breadth-first search on the 2-\textit{core} of $G_i$, i.e., the maximal subgraph
of~$G_i$ with minimum degree at least~$2$ (see, e.g., \cite{b01,jlr00,pw04}), and then
a depth-first search on the acyclic part of $G_i$ (see \cite{bkz05} for details).
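As an illustration of the labelling idea, the sketch below handles only the
acyclic part of $G_i$: once one endpoint of a tree edge is labelled, the other
endpoint is forced so that the two labels sum to the value assigned to that
edge. The adjacency representation and the \texttt{value} array are
assumptions made for the sake of the example; the complete procedure,
including the treatment of the 2-core, is given in~\cite{bkz05}.
\begin{verbatim}
# Simplified sketch: depth-first labelling of the acyclic part of G_i.
# adj[u] lists pairs (v, e) of neighbours and edge identifiers; value[e]
# is the number that edge e must receive, i.e. g[u] + g[v] == value[e].
def label_acyclic_part(adj, value, g, labelled):
    for root in range(len(adj)):
        if labelled[root]:
            continue
        g[root], labelled[root] = 0, True
        stack = [root]
        while stack:                        # depth-first traversal
            u = stack.pop()
            for v, e in adj[u]:
                if not labelled[v]:
                    g[v] = value[e] - g[u]  # forces g[u] + g[v] = value[e]
                    labelled[v] = True
                    stack.append(v)
\end{verbatim}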