diff --git a/vldb07/acknowledgments.tex b/vldb07/acknowledgments.tex new file mode 100755 index 0000000..d903ceb --- /dev/null +++ b/vldb07/acknowledgments.tex @@ -0,0 +1,7 @@ +\section{Acknowledgments}
+This section is optional; it is a location for you
+to acknowledge grants, funding, editing assistance and
+what have you. In the present case, for example, the
+authors would like to thank Gerald Murray of ACM for
+his help in codifying this \textit{Author's Guide}
+and the \textbf{.cls} and \textbf{.tex} files that it describes. diff --git a/vldb07/analyticalresults.tex b/vldb07/analyticalresults.tex new file mode 100755 index 0000000..06ea049 --- /dev/null +++ b/vldb07/analyticalresults.tex @@ -0,0 +1,174 @@ +%% Nivio: 23/jan/06 29/jan/06
+% Time-stamp:
+\enlargethispage{2\baselineskip}
+\section{Analytical results}
+\label{sec:analytcal-results}
+
+\vspace{-1mm}
+The purpose of this section is fourfold.
+First, we show that our algorithm runs in expected time $O(n)$.
+Second, we present the main memory requirements for constructing the MPHF.
+Third, we discuss the cost of evaluating the resulting MPHF.
+Fourth, we present the space required to store the resulting MPHF.
+
+\vspace{-2mm}
+\subsection{The linear time complexity}
+\label{sec:linearcomplexity}
+
+First, we show that the partitioning step presented in
+Figure~\ref{fig:partitioningstep} runs in $O(n)$ time.
+Each iteration of the {\bf for} loop in statement~1
+runs in $O(|B_j|)$ time, $1 \leq j \leq N$, where $|B_j|$ is the
+number of keys
+that fit in block $B_j$ of size $\mu$. This is because statement 1.1 just reads
+$|B_j|$ keys from disk, statement 1.2 runs a bucket-sort-like algorithm,
+which is well known to be linear in the number of keys it sorts (i.e., $|B_j|$ keys),
+and statement 1.3 just writes $|B_j|$ keys to File $j$ on disk.
+Thus, the {\bf for} loop runs in $O(\sum_{j=1}^{N}|B_j|)$ time.
+As $\sum_{j=1}^{N}|B_j|=n$, the partitioning step runs in $O(n)$ time. 
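The loop structure just described (read a block, bucket-sort it, write it out as a run) can be sketched as follows. This is a minimal Python sketch, not the paper's actual C implementation; the in-memory file representation and the hash used to assign keys to buckets are illustrative assumptions:

```python
import hashlib

def bucket_of(key, n_buckets):
    # Illustrative stand-in for the algorithm's bucket-assignment hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_buckets

def partitioning_step(keys, n_files, n_buckets):
    """Split `keys` into `n_files` runs (Files 1..N), each sorted by bucket index."""
    files = []
    block = -(-len(keys) // n_files)              # keys per block B_j (ceil division)
    for j in range(n_files):
        chunk = keys[j * block:(j + 1) * block]   # statement 1.1: read |B_j| keys
        buckets = {}                              # statement 1.2: bucket sort on h(key)
        for key in chunk:
            buckets.setdefault(bucket_of(key, n_buckets), []).append(key)
        run = [k for i in sorted(buckets) for k in buckets[i]]
        files.append(run)                         # statement 1.3: write the run to File j
    return files
```

Because every File $j$ comes out sorted by bucket index, the searching step can later reconstruct each bucket with a multiway merge over the $N$ files.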
+ +Second, we show that the searching step presented in +Figure~\ref{fig:searchingstep} also runs in $O(n)$ time. +The heap construction in statement 1 runs in $O(N)$ time, for $N \ll n$. +We have assumed that insertions and deletions in the heap cost $O(1)$ because +$N$ is typically much smaller than $n$ (see \cite[Section 6.4]{bkz06t} for details). +Statement 2 runs in $O(\sum_{i=0}^{\lceil n/b \rceil - 1} \mathit{size}[i])$ time +(remember that $\mathit{size}[i]$ stores the number of keys in bucket $i$). +As $\sum_{i=0}^{\lceil n/b \rceil - 1} \mathit{size}[i] = n$, if +statements 2.1, 2.2 and 2.3 run in $O(\mathit{size}[i])$ time, then statement 2 +runs in $O(n)$ time. + +%Statement 2.1 runs the algorithm to read a bucket from disk. That algorithm reads $\mathit{size}[i]$ +%keys of bucket $i$ that might be spread into many files or, in the worst case, +%into $|BS_{max}|$ files, where $|BS_{max}|$ is the number of keys in the bucket of maximum size. +%It uses the heap $H$ to drive a multiway merge of the sprayed bucket $i$. +%As we are considering that each read/write on disk costs $O(1)$ and +%each heap operation also costs $O(1)$ (recall $N \ll n$), then statement 2.1 +%costs $O(\mathit{size}[i])$ time. +%We need to take into account that this step could generate a lot of seeks on disk. +%However, the number of seeks can be amortized (see Section~\ref{sec:contr-disk-access}) +%and that is why we have been able of getting a MPHF for a set of 1 billion keys in less +%than 4 hours using a machine with just 500 MB of main memory +%(see Section~\ref{sec:performance}). +Statement 2.1 reads $O(\mathit{size}[i])$ keys of bucket $i$ +and is detailed in Figure~\ref{fig:readingbucket}. +As we are assuming that each read or write on disk costs $O(1)$ and +each heap operation also costs $O(1)$, statement~2.1 +takes $O(\mathit{size}[i])$ time. 
+However, the keys of bucket $i$ are distributed in at most~$BS_{max}$ files on disk
+in the worst case
+(recall that $BS_{max}$ is the maximum number of keys found in any bucket).
+Therefore, we need to take into account that
+the critical step in reading a bucket is in statement~1.3 of Figure~\ref{fig:readingbucket},
+where a seek operation in File $j$
+may be performed by the first read operation.
+
+In order to amortize the number of seeks performed, we use a buffering technique~\cite{k73}.
+We create a buffer $j$ of size \textbaht$\: = \mu/N$ for each file $j$,
+where $1\leq j \leq N$
+(recall that $\mu$ is the size in bytes of an a priori reserved internal memory area).
+Every time a read operation is requested on file $j$ and the data is not found
+in the $j$th~buffer, \textbaht~bytes are read from file $j$ to buffer $j$.
+Hence, the number of seeks performed in the worst case is given by
+$\beta/$\textbaht~(remember that $\beta$ is the size in bytes of $S$).
+Here we have made the pessimistic assumption that one seek happens every time
+buffer $j$ is filled.
+Thus, the number of seeks performed in the worst case is $64n/$\textbaht, since
+each URL is 64 bytes long on average. Therefore, the number of seeks is linear in
+$n$ and amortized by \textbaht.
+
+It is important to emphasize two things.
+First, the operating system uses techniques
+to diminish the number of seeks and the average seek time.
+This makes the amortization factor greater than \textbaht~in practice.
+Second, almost all main memory is available to be used as
+file buffers because just a small vector
+of $\lceil n/b\rceil$ one-byte entries must be maintained in main memory,
+as we show in Section~\ref{sec:memconstruction}.
+
+
+Statement 2.2 runs our internal memory based algorithm in order to generate a MPHF for each bucket.
+That algorithm is linear, as we showed in~\cite{bkz05}. 
As it is applied to buckets with {\it size}$[i]$ keys, statement~2.2 takes $O(\mathit{size}[i])$ time.
+
+Statement 2.3 has time complexity $O(\mathit{size}[i])$ because it writes to disk
+the description of each generated MPHF and each description is stored in
+$c \times \mathit{size}[i] + O(1)$ bytes, where $c\in[0.93,1.15]$.
+In conclusion, our algorithm takes $O(n)$ time because both the partitioning and
+the searching steps run in $O(n)$ time.
+
+An experimental validation of the above proof and a performance comparison with
+our internal memory based algorithm~\cite{bkz05} were not included here due to
+space restrictions but can be found in~\cite{bkz06t} and also in the appendix.
+
+\vspace{-1mm}
+\enlargethispage{2\baselineskip}
+\subsection{Space used for constructing a MPHF}
+\label{sec:memconstruction}
+
+The vector {\it size}, which is kept in main memory at all times,
+has $\lceil n/b \rceil$ one-byte entries.
+It stores the number of keys in each bucket and
+those values are less than or equal to 256.
+For example, for a set of 1 billion keys and $b=175$ the vector {\it size} needs
+$5.45$ megabytes of main memory.
+
+We need an internal memory area of size $\mu$ bytes to be used in
+the partitioning step and in the searching step.
+The size $\mu$ is fixed a priori and depends only on the amount
+of internal memory available to run the algorithm
+(i.e., it does not depend on the size $n$ of the problem).
+
+% One could argue about the a priori reserved internal memory area and the main memory
+% required to run the indirect bucket sort algorithm.
+% Those internal memory requirements do not depend on the size of the problem
+% (i.e., the number of keys being hashed) and can be fixed a priori.
+
+The additional space required in the searching step
+is constant, once the problem has been broken down
+into several small problems (buckets of at most 256 keys) and
+the heap size is assumed to be much smaller than $n$ ($N \ll n$). 
+For example, for a set of 1 billion keys and an internal area of~$\mu = 250$ megabytes, +the number of files is $N = 248$. + +\vspace{-1mm} +\subsection{Evaluation cost of the MPHF} + +Now we consider the amount of CPU time +required by the resulting MPHF at retrieval time. +The MPHF requires for each key the computation of three +universal hash functions and three memory accesses +(see Eqs.~(\ref{eq:mphf}), (\ref{eq:bucketindex}) and (\ref{eq:mphfi})). +This is not optimal. Pagh~\cite{p99} showed that any MPHF requires +at least the computation of two universal hash functions and one memory +access. + +\subsection{Description size of the MPHF} + +The number of bits required to store the MPHF generated by the algorithm +is computed by Eq.~(\ref{eq:newmphfbits}). +We need to store each $g_i$ vector presented in Eq.~(\ref{eq:mphfi}), where +$0\leq i < \lceil n/b \rceil$. As each bucket has at most 256 keys, each +entry in a $g_i$ vector has 8~bits. In each $g_i$ vector there are +$c \times \mathit{size}[i]$ entries (recall $c\in[0.93, 1.15]$). +When we sum up the number of entries of $\lceil n/b \rceil$ $g_i$ vectors we have +$c\sum_{i=0}^{\lceil n/b \rceil -1} \mathit{size}[i]=cn$ entries. We also need to +store $3 \lceil n/b \rceil$ integer numbers of +$\log_2n$ bits referring respectively to the {\it offset} vector and the two random seeds of +$h_{1i}$ and $h_{2i}$. In addition, we need to store $\lceil n/b \rceil$ 8-bit entries of +the vector {\it size}. Therefore, +\begin{eqnarray}\label{eq:newmphfbits} +\mathrm{Required\: Space} = 8cn + \frac{n}{b}\left( 3\log_2n +8\right) \: +\mathrm{bits}. +\end{eqnarray} + +Considering $c=0.93$ and $b=175$, the number of bits per key to store +the description of the resulting MPHF for a set of 1~billion keys is $8.1$. +If we set $b=128$, then the bits per key ratio increases to $8.3$. +Theoretically, the number of bits required to store the MPHF in +Eq.~(\ref{eq:newmphfbits}) +is $O(n\log n)$ as~$n\to\infty$. 
However, for sets of size up to $2^{b/3}$ keys +the number of bits per key is lower than 9~bits (note that +$2^{b/3}>2^{58}>10^{17}$ for $b=175$). +%For $b=175$, the number of bits per key will be close to 9 for a set of $2^{58}$ keys. +Thus, in practice the resulting function is stored in $O(n)$ bits. diff --git a/vldb07/appendix.tex b/vldb07/appendix.tex new file mode 100644 index 0000000..288ad8a --- /dev/null +++ b/vldb07/appendix.tex @@ -0,0 +1,6 @@ +\appendix +\input{experimentalresults} +\input{thedataandsetup} +\input{costhashingbuckets} +\input{performancenewalgorithm} +\input{diskaccess} diff --git a/vldb07/conclusions.tex b/vldb07/conclusions.tex new file mode 100755 index 0000000..8d32741 --- /dev/null +++ b/vldb07/conclusions.tex @@ -0,0 +1,42 @@ +% Time-stamp: +\enlargethispage{2\baselineskip} +\section{Concluding remarks} +\label{sec:concuding-remarks} + +This paper has presented a novel external memory based algorithm for +constructing MPHFs that works for sets in the order of billions of keys. The +algorithm outputs the resulting function in~$O(n)$ time and, furthermore, it +can be tuned to run only $34\%$ slower (see \cite{bkz06t} for details) than the fastest +algorithm available in the literature for constructing MPHFs~\cite{bkz05}. +In addition, the space +requirement of the resulting MPHF is of up to 9 bits per key for datasets of +up to $2^{58}\simeq10^{17.4}$ keys. + +The algorithm is simple and needs just a +small vector of size approximately 5.45 megabytes in main memory to construct +a MPHF for a collection of 1 billion URLs, each one 64 bytes long on average. +Therefore, almost all main memory is available to be used as disk I/O buffer. +Making use of such a buffering scheme considering an internal memory area of size +$\mu=200$ megabytes, our algorithm can produce a MPHF for a +set of 1 billion URLs in approximately 3.6 hours using a commodity PC of 2.4 gigahertz and +500 megabytes of main memory. 
+If we increase both the main memory
+available to 1 gigabyte and the internal memory area to $\mu=500$ megabytes,
+a MPHF for the set of 1 billion URLs is produced in approximately 3 hours. For any
+key, the evaluation of the resulting MPHF takes three memory accesses and the
+computation of three universal hash functions.
+
+In order to allow the reproduction of our results and the utilization of both the internal memory
+based algorithm and the external memory based algorithm,
+the algorithms are available at \texttt{http://cmph.sf.net}
+under the GNU Lesser General Public License (LGPL).
+They were implemented in the C language.
+
+In future work, we will exploit the fact that the searching step intrinsically
+presents a high degree of parallelism and requires $73\%$ of the
+construction time. A parallel implementation of our algorithm will
+therefore allow the construction and the evaluation of the resulting function in parallel.
+The description of the resulting MPHFs will be distributed across the parallel
+computer, allowing scalability to sets of hundreds of billions of keys.
+This is an important contribution, mainly for applications related to the Web, as
+mentioned in Section~\ref{sec:intro}. 
\ No newline at end of file diff --git a/vldb07/costhashingbuckets.tex b/vldb07/costhashingbuckets.tex new file mode 100755 index 0000000..3a624ce --- /dev/null +++ b/vldb07/costhashingbuckets.tex @@ -0,0 +1,177 @@ +% Nivio: 29/jan/06 +% Time-stamp: +\vspace{-2mm} +\subsection{Performance of the internal memory based algorithm} +\label{sec:intern-memory-algor} + +%\begin{table*}[htb] +%\begin{center} +%{\scriptsize +%\begin{tabular}{|c|c|c|c|c|c|c|c|} +%\hline +%$n$ (millions) & 1 & 2 & 4 & 8 & 16 & 32 \\ +%\hline +%Average time (s)& $6.1 \pm 0.3$ & $12.2 \pm 0.6$ & $25.4 \pm 1.1$ & $51.4 \pm 2.0$ & $117.3 \pm 4.4$ & $262.2 \pm 8.7$\\ +%SD (s) & $2.6$ & $5.4$ & $9.8$ & $17.6$ & $37.3$ & $76.3$ \\ +%\hline +%\end{tabular} +%\vspace{-3mm} +%} +%\end{center} +%\caption{Internal memory based algorithm: average time in seconds for constructing a MPHF, +%the standard deviation (SD), and the confidence intervals considering +%a confidence level of $95\%$.} +%\label{tab:medias} +%\end{table*} + +Our three-step internal memory based algorithm presented in~\cite{bkz05} +is used for constructing a MPHF for each bucket. +It is a randomized algorithm because it needs to generate a simple random graph +in its first step. +Once the graph is obtained the other two steps are deterministic. + +Thus, we can consider the runtime of the algorithm to have the form~$\alpha +nZ$ for an input of~$n$ keys, where~$\alpha$ is some machine dependent +constant that further depends on the length of the keys and~$Z$ is a random +variable with geometric distribution with mean~$1/p=e^{1/c^2}$ (see +Section~\ref{sec:mphfbucket}). All results in our experiments were obtained +taking $c=1$; the value of~$c$, with~$c\in[0.93,1.15]$, in fact has little +influence in the runtime, as shown in~\cite{bkz05}. + +The values chosen for $n$ were $1, 2, 4, 8, 16$ and $32$ million. 
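The mean and spread of the geometric factor~$Z$ in the runtime model above can be checked numerically. The closed forms below follow from the standard geometric distribution, with $c=1$ so that $p=e^{-1}$ (a verification sketch, not part of the measured experiments):

```python
import math

c = 1.0
p = math.exp(-1 / c**2)            # success probability of the geometric variable Z
mean_Z = 1 / p                     # E[Z] = e^{1/c^2} = e for c = 1
sd_Z = math.sqrt((1 - p) / p**2)   # standard deviation of a geometric variable

# For c = 1 the runtime alpha*n*Z therefore has mean alpha*n*e and
# standard deviation alpha*n*sqrt(e(e-1)), both linear in n.
assert abs(mean_Z - math.e) < 1e-12
assert abs(sd_Z - math.sqrt(math.e * (math.e - 1))) < 1e-12
```

This is the same identity used later when discussing the fluctuation of the measured runtimes.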
+Although we have a dataset with 1~billion URLs, on a PC with
+1~gigabyte of main memory, the algorithm is able
+to handle an input with at most 32 million keys.
+This is mainly because of the graph we need to keep in main memory.
+The algorithm requires $25n + O(1)$ bytes for constructing
+a MPHF (details about the data structures used by the algorithm can
+be found in~\texttt{http://cmph.sf.net}).
% for the details about the data structures
%used by the algorithm).
+
+In order to estimate the number of trials for each value of $n$ we use
+a statistical method for determining a suitable sample size (see, e.g.,
+\cite[Chapter 13]{j91}).
+As we obtained different values for each $n$,
+we used the maximal value obtained, namely 300~trials, in order to have
+a confidence level of $95\%$.
+
+% \begin{figure*}[ht]
+% \noindent
+% \begin{minipage}[b]{0.5\linewidth}
+% \centering
+% \subfigure[The previous algorithm]
+% {\scalebox{0.5}{\includegraphics{figs/bmz_temporegressao.eps}}}
+% \end{minipage}
+% \hfill
+% \begin{minipage}[b]{0.5\linewidth}
+% \centering
+% \subfigure[The new algorithm]
+% {\scalebox{0.5}{\includegraphics{figs/brz_temporegressao.eps}}}
+% \end{minipage}
+% \caption{Time versus number of keys in $S$. The solid line corresponds to
+% a linear regression model.}
+% %obtained from the experimental measurements.}
+% \label{fig:temporegressao}
+% \end{figure*}
+
+Table~\ref{tab:medias} presents the runtime average for each $n$,
+the respective standard deviations, and
+the respective confidence intervals given by
+the average time $\pm$ the distance from average time
+considering a confidence level of $95\%$.
+Observing the runtime averages, one sees that
+the algorithm runs in expected linear time,
+as shown in~\cite{bkz05}. 
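The 32-million-key limit on a machine with 1 gigabyte of main memory follows directly from the $25n + O(1)$ bytes requirement, as a quick check shows (Python; figures taken from the text above, ignoring the $O(1)$ term):

```python
GB = 2**30                            # 1 gigabyte in bytes

def bytes_for(n):
    # Memory needed by the internal algorithm: 25n + O(1) bytes (O(1) ignored).
    return 25 * n

assert bytes_for(32 * 10**6) < GB     # 32 million keys: 800 MB, fits in 1 GB
assert bytes_for(64 * 10**6) > GB     # 64 million keys: 1.6 GB, does not fit
```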
+ +\vspace{-2mm} +\begin{table*}[htb] +\begin{center} +{\scriptsize +\begin{tabular}{|c|c|c|c|c|c|c|c|} +\hline +$n$ (millions) & 1 & 2 & 4 & 8 & 16 & 32 \\ +\hline +Average time (s)& $6.1 \pm 0.3$ & $12.2 \pm 0.6$ & $25.4 \pm 1.1$ & $51.4 \pm 2.0$ & $117.3 \pm 4.4$ & $262.2 \pm 8.7$\\ +SD (s) & $2.6$ & $5.4$ & $9.8$ & $17.6$ & $37.3$ & $76.3$ \\ +\hline +\end{tabular} +\vspace{-1mm} +} +\end{center} +\caption{Internal memory based algorithm: average time in seconds for constructing a MPHF, +the standard deviation (SD), and the confidence intervals considering +a confidence level of $95\%$.} +\label{tab:medias} +\vspace{-4mm} +\end{table*} + +% \enlargethispage{\baselineskip} +% \begin{table*}[htb] +% \begin{center} +% {\scriptsize +% (a) +% \begin{tabular}{|c|c|c|c|c|c|c|c|} +% \hline +% $n$ (millions) & 1 & 2 & 4 & 8 & 16 & 32 \\ +% \hline +% Average time (s)& $6.119 \pm 0.300$ & $12.190 \pm 0.615$ & $25.359 \pm 1.109$ & $51.408 \pm 2.003$ & $117.343 \pm 4.364$ & $262.215 \pm 8.724$\\ +% SD (s) & $2.644$ & $5.414$ & $9.757$ & $17.627$ & $37.333$ & $76.271$ \\ +% \hline +% \end{tabular} +% \\[5mm] (b) +% \begin{tabular}{|l|c|c|c|c|c|} +% \hline +% $n$ (millions) & 1 & 2 & 4 & 8 & 16 \\ +% \hline % Part. 16 \% 16 \% 16 \% 18 \% 20\% +% Average time (s) & $6.927 \pm 0.309$ & $13.828 \pm 0.175$ & $31.936 \pm 0.663$ & $69.902 \pm 1.084$ & $140.617 \pm 2.502$ \\ +% SD & $0.431$ & $0.245$ & $0.926$ & $1.515$ & $3.498$ \\ +% \hline +% \hline +% $n$ (millions) & 32 & 64 & 128 & 512 & 1000 \\ +% \hline % Part. 
20 \% 20\% 20\% 18\% 18\%
+% Average time (s) & $284.330 \pm 1.135$ & $587.880 \pm 3.945$ & $1223.581 \pm 4.864$ & $5966.402 \pm 9.465$ & $13229.540 \pm 12.670$ \\
+% SD & $1.587$ & $5.514$ & $6.800$ & $13.232$ & $18.577$ \\
+% \hline
+% \end{tabular}
+% }
+% \end{center}
+% \caption{The runtime averages in seconds,
+% the standard deviation (SD), and
+% the confidence intervals given by the average time $\pm$
+% the distance from average time considering
+% a confidence level of $95\%$.}
+% \label{tab:medias}
+% \end{table*}
+
+\enlargethispage{2\baselineskip}
+Figure~\ref{fig:bmz_temporegressao}
+presents the runtime for each trial. In addition,
+the solid line corresponds to a linear regression model
+obtained from the experimental measurements.
+As we can see, the runtime for a given $n$ shows considerable
+fluctuation. However, the fluctuation also grows linearly with $n$.
+
+\begin{figure}[htb]
+\vspace{-2mm}
+\begin{center}
+\scalebox{0.4}{\includegraphics{figs/bmz_temporegressao.eps}}
+\caption{Time versus number of keys in $S$ for the internal memory based algorithm.
+The solid line corresponds to a linear regression model.}
+\label{fig:bmz_temporegressao}
+\end{center}
+\vspace{-6mm}
+\end{figure}
+
+The observed fluctuation in the runtimes is as expected; recall that this
+runtime has the form~$\alpha nZ$ with~$Z$ a geometric random variable with
+mean~$1/p=e$. Thus, the runtime has mean~$\alpha n/p=\alpha en$ and standard
+deviation~$\alpha n\sqrt{(1-p)/p^2}=\alpha n\sqrt{e(e-1)}$.
+Therefore, the standard deviation also grows
+linearly with $n$, as experimentally verified
+in Table~\ref{tab:medias} and in Figure~\ref{fig:bmz_temporegressao}.
+
+%\noindent-------------------------------------------------------------------------\\
+%Comment for Yoshi: I was not able to reproduce well what we discussed
+%in the paragraph above; I think you will be able to justify it better :-). 
\\ +%-------------------------------------------------------------------------\\ diff --git a/vldb07/determiningb.tex b/vldb07/determiningb.tex new file mode 100755 index 0000000..e9b3cb2 --- /dev/null +++ b/vldb07/determiningb.tex @@ -0,0 +1,146 @@ +% Nivio: 29/jan/06 +% Time-stamp: +\enlargethispage{2\baselineskip} +\subsection{Determining~$b$} +\label{sec:determining-b} +\begin{table*}[t] +\begin{center} +{\small %\scriptsize +\begin{tabular}{|c|ccc|ccc|} +\hline +\raisebox{-0.7em}{$n$} & \multicolumn{3}{c|}{\raisebox{-1mm}{b=128}} & +\multicolumn{3}{c|}{\raisebox{-1mm}{b=175}}\\ +\cline{2-4} \cline{5-7} + & \raisebox{-0.5mm}{Worst Case} & \raisebox{-0.5mm}{Average} &\raisebox{-0.5mm}{Eq.~(\ref{eq:maxbs})} + & \raisebox{-0.5mm}{Worst Case} & \raisebox{-0.5mm}{Average} &\raisebox{-0.5mm}{Eq.~(\ref{eq:maxbs})} \\ +\hline +$1.0 \times 10^6$ & 177 & 172.0 & 176 & 232 & 226.6 & 230 \\ +%$2.0 \times 10^6$ & 179 & 174.0 & 178 & 236 & 228.5 & 232 \\ +$4.0 \times 10^6$ & 182 & 177.5 & 179 & 241 & 231.8 & 234 \\ +%$8.0 \times 10^6$ & 186 & 181.6 & 181 & 238 & 234.2 & 236 \\ +$1.6 \times 10^7$ & 184 & 181.6 & 183 & 241 & 236.1 & 238 \\ +%$3.2 \times 10^7$ & 191 & 183.9 & 184 & 240 & 236.6 & 240 \\ +$6.4 \times 10^7$ & 195 & 185.2 & 186 & 244 & 239.0 & 242 \\ +%$1.28 \times 10^8$ & 193 & 187.7 & 187 & 244 & 239.7 & 244 \\ +$5.12 \times 10^8$ & 196 & 191.7 & 190 & 251 & 246.3 & 247 \\ +$1.0 \times 10^9$ & 197 & 191.6 & 192 & 253 & 248.9 & 249 \\ +\hline +\end{tabular} +\vspace{-1mm} +} +\end{center} +\caption{Values for $\mathit{BS}_{\mathit{max}}$: worst case and average case obtained in the experiments and using Eq.~(\ref{eq:maxbs}), +considering $b=128$ and $b=175$ for different number $n$ of keys in $S$.} +\label{tab:comparison} +\vspace{-6mm} +\end{table*} + +The partitioning step can be viewed as the well known ``balls into bins'' +problem~\cite{ra98,dfm02} where~$n$ keys (the balls) are placed independently and +uniformly into $\lceil n/b\rceil$ buckets (the bins). 
The main question related to that problem we are interested +in is: what is the maximum number of keys in any bucket? +In fact, we want to get the maximum value for $b$ that makes the maximum number of keys in any bucket +no greater than 256 with high probability. +This is important, as we wish to use 8 bits per entry in the vector $g_i$ of +each $\mathrm{MPHF}_i$, +where $0 \leq i < \lceil n/b\rceil$. +Let $\mathit{BS}_{\mathit{max}}$ be the maximum number of keys in any bucket. + +Clearly, $\BSmax$ is the maximum +of~$\lceil n/b\rceil$ random variables~$Z_i$, each with binomial +distribution~$\Bi(n,p)$ with parameters~$n$ and~$p=1/\lceil n/b\rceil$. +However, the~$Z_i$ are not independent. Note that~$\Bi(n,p)$ has mean and +variance~$\simeq b$. To give an upper estimate for the probability of the +event~$\BSmax\geq \gamma$, we can estimate the probability that we have~$Z_i\geq \gamma$ +for a fixed~$i$, and then sum these estimates over all~$i$. +Let~$\gamma=b+\sigma\sqrt{b\ln(n/b)}$, where~$\sigma=\sqrt2$. +Approximating~$\Bi(n,p)$ by the normal distribution with mean and +variance~$b$, we obtain the +estimate~$(\sigma\sqrt{2\pi\ln(n/b)})^{-1}\times\exp(-(1/2)\sigma^2\ln(n/b))$ for +the probability that~$Z_i\geq \gamma$ occurs, which, summed over all~$i$, gives +that the probability that~$\BSmax\geq \gamma$ occurs is at +most~$1/(\sigma\sqrt{2\pi\ln(n/b)})$, which tends to~$0$ as~$n\to\infty$. +Thus, we have shown that, with high probability, +\begin{equation} + \label{eq:maxbs} + \BSmax\leq b+\sqrt{2b\ln{n\over b}}. +\end{equation} + +% The traditional approach used to estimate $\mathit{BS}_{\mathit{max}}$ with high probability is +% to consider $\mathit{BS}_{\mathit{max}}$ as a random variable that follows a binomial distribution +% that can be approximated by a poisson distribution. This yields a good approximation +% when the number of balls is lower than or equal to the number of bins~\cite{g81}. 
In our case, +% the number of balls is greater than the number of buckets. +% % and that is why we have used more recent works to estimate $\mathit{BS}_{\mathit{max}}$. +% As $b > \ln (n/b)$, we can use the result by Raab and Steger~\cite{ra98} to estimate +% $\mathit{BS}_{\mathit{max}}$ with high probability. +% The following equation gives the estimation, where $\sigma=\sqrt{2}$: +% \begin{eqnarray} \label{eq:maxbs} +% \mathit{BS}_{\mathit{max}} = b + O \left( \sqrt{b\ln\frac{n}{b}} \right) = b + \sigma \times \left(\sqrt{b\ln\frac{n}{b}} \right) +% \end{eqnarray} + +% In order to estimate the suitable constant $\sigma$ we did a linear +% regression suppressing the constant term. +% We use the equation $BS_{max} - b = \sigma \times \sqrt{b\ln (n/b)}$ +% in the linear regression considering $y=BS_{max} - b$ and $x=\sqrt{b\ln (n/b)}$. +% In order to obtain data to be used in the linear regression we set +% b=128 and ran the new algorithm ten times +% for n equal to 1, 2, 4, 8, 16, 32, 64, 128, 512, 1000 million keys. +% Taking a confidence level equal to 95\% we got +% $\sigma = 2.11 \pm 0.03$. +% The coefficient of determination was $99.6\%$, which means that the linear +% regression explains $99.6\%$ of the data variation and only $0.4\%$ +% is due to experimental errors. +% Therefore, Eq.~(\ref{eq:maxbs}) with $\sigma = 2.11 \pm 0.03$ and $b=128$ +% makes a very good estimation of the maximum number of keys in any bucket. + +% Repeating the same experiments for $b$ equals to $175$ and +% a confidence level of $95\%$ we got $\sigma = 2.07 \pm 0.03$. +% Again we verified that Eq.~(\ref{eq:maxbs}) with $\sigma = 2.07 \pm 0.03$ and $b=175$ is +% a very good estimation of the maximum number of keys in any bucket once the +% coefficient of determination obtained was $99.7 \%$ and $\sigma$ is in a very narrow range. + +In our algorithm the maximum number of keys in any bucket must be at most 256. 
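The bound of Eq.~(\ref{eq:maxbs}) is easy to evaluate numerically. The following Python check reproduces the values in the Eq.~(\ref{eq:maxbs}) columns of Table~\ref{tab:comparison} (a verification sketch over the parameters used in the paper):

```python
import math

def bs_max(n, b):
    # Eq. (eq:maxbs): BS_max <= b + sqrt(2 b ln(n/b)) with high probability
    return b + math.sqrt(2 * b * math.log(n / b))

# Matches the Eq. columns of the comparison table:
assert round(bs_max(10**6, 128)) == 176 and round(bs_max(10**6, 175)) == 230
assert round(bs_max(10**9, 128)) == 192 and round(bs_max(10**9, 175)) == 249
# Both parameter choices stay below the 256-key limit imposed by 8-bit g_i entries.
assert bs_max(10**9, 175) < 256
```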
+Table~\ref{tab:comparison} presents the values for $\mathit{BS}_{\mathit{max}}$ +obtained experimentally and using Eq.~(\ref{eq:maxbs}). +The table presents the worst case and the average case, +considering $b=128$, $b=175$ and Eq.~(\ref{eq:maxbs}), +for several numbers~$n$ of keys in $S$. +The estimation given by Eq.~(\ref{eq:maxbs}) is very close to the experimental +results. + +Now we estimate the biggest problem our algorithm is able to solve for +a given $b$. +Using Eq.~(\ref{eq:maxbs}) considering $b=128$, $b=175$ and imposing +that~$\mathit{BS}_{\mathit{max}}\leq256$, +the sizes of the biggest key set our algorithm +can deal with are $10^{30}$ keys and $10^{10}$ keys, respectively. +%It is also important to have $b$ as big as possible, once its value is +%related to the space required to store the resultant MPHF, as shown later on. +%Table~\ref{tab:bp} shows the biggest problem the algorithm can solve. +% The values were obtained from Eq.~(\ref{eq:maxbs}), +% considering $b=128$ and~$b=175$ and imposing +% that~$\mathit{BS}_{\mathit{max}}\leq256$. + +% We set $\sigma=2.14$ because it was the greatest value obtained for $\sigma$ +% in the two linear regression we did. 
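The problem-size estimates above can be recovered by inverting Eq.~(\ref{eq:maxbs}): imposing $b+\sqrt{2b\ln(n/b)}\leq 256$ and solving for $n$ gives $n \leq b\,e^{(256-b)^2/(2b)}$. A quick Python check (order of magnitude only):

```python
import math

def largest_n(b, cap=256):
    # Inverting Eq. (eq:maxbs): b + sqrt(2 b ln(n/b)) <= cap
    #   =>  n <= b * e^((cap - b)^2 / (2b))
    return b * math.exp((cap - b)**2 / (2 * b))

# Orders of magnitude match the estimates in the text:
assert 29 < math.log10(largest_n(128)) < 31    # about 10^30 keys for b = 128
assert 10 <= math.log10(largest_n(175)) < 11   # about 10^10 keys for b = 175
```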
+% \vspace{-3mm}
+% \begin{table}[htb]
+% \begin{center}
+% {\small %\scriptsize
+% \begin{tabular}{|c|c|}
+% \hline
+% b & Problem size ($n$) \\
+% \hline
+% 128 & $10^{30}$ keys \\
+% 175 & $10^{10}$ keys \\
+% \hline
+% \end{tabular}
+% \vspace{-1mm}
+% }
+% \end{center}
+% \caption{Using Eq.~(\ref{eq:maxbs}) to estimate the biggest problem our algorithm can solve.}
+% %considering $\sigma=\sqrt{2}$.}
+% \label{tab:bp}
+% \vspace{-14mm}
+% \end{table} diff --git a/vldb07/diskaccess.tex b/vldb07/diskaccess.tex new file mode 100755 index 0000000..08e54b9 --- /dev/null +++ b/vldb07/diskaccess.tex @@ -0,0 +1,113 @@ +% Nivio: 29/jan/06
+% Time-stamp:
+\vspace{-2mm}
+\subsection{Controlling disk accesses}
+\label{sec:contr-disk-access}
+
+In order to bring down the number of seek operations on disk
+we benefit from the fact that our algorithm leaves almost all main
+memory available to be used as disk I/O buffer.
+In this section we evaluate how much the parameter $\mu$
+affects the runtime of our algorithm.
+For that we fixed $n$ at 1 billion URLs,
+set the main memory of the machine used for the experiments
+to 1 gigabyte and used $\mu$ equal to $100, 200, 300, 400, 500$ and $600$
+megabytes.
+
+\enlargethispage{2\baselineskip}
+Table~\ref{tab:diskaccess} presents the number of files $N$,
+the buffer size used for all files, the number of seeks in the worst case considering
+the pessimistic assumption mentioned in Section~\ref{sec:linearcomplexity}, and
+the time to generate a MPHF for 1 billion keys as a function of the amount of internal
+memory available. Observing Table~\ref{tab:diskaccess} we notice that the time spent on the construction
+decreases as the value of $\mu$ increases. However, for $\mu > 400$, the variation
+in the time is not as significant as for $\mu \leq 400$. 
+This can be explained by the fact that the kernel 2.6 I/O scheduler of Linux +has smart policies +for avoiding seeks and diminishing the average seek time +(see \texttt{http://www.linuxjournal.com/article/6931}). +\begin{table*}[ht] +\vspace{-2mm} +\begin{center} +{\scriptsize +\begin{tabular}{|l|c|c|c|c|c|c|} +\hline +$\mu$ (MB) & $100$ & $200$ & $300$ & $400$ & $500$ & $600$ \\ +\hline +$N$ (files) & $619$ & $310$ & $207$ & $155$ & $124$ & $104$ \\ +%\hline +\textbaht~(buffer size in KB) & $165$ & $661$ & $1,484$ & $2,643$ & $4,129$ & $5,908$ \\ +%\hline +$\beta$/\textbaht~(\# of seeks in the worst case) & $384,478$ & $95,974$ & $42,749$ & $24,003$ & $15,365$ & $10,738$ \\ +% \hline +% \raisebox{-0.2em}{\# of seeks performed in} & \raisebox{-0.7em}{$383,056$} & \raisebox{-0.7em}{$95,919$} & \raisebox{-0.7em}{$42,700$} & \raisebox{-0.7em}{$23,980$} & \raisebox{-0.7em}{$15,347$} & \raisebox{-0.7em}{$xx,xxx$} \\ +% \raisebox{0.2em}{statement 1.3 of Figure~\ref{fig:readingbucket}} & & & & & & \\ +% \hline +Time (hours) & $4.04$ & $3.64$ & $3.34$ & $3.20$ & $3.13$ & $3.09$ \\ +\hline +\end{tabular} +\vspace{-1mm} +} +\end{center} +\caption{Influence of the internal memory area size ($\mu$) in our algorithm runtime.} +\label{tab:diskaccess} +\vspace{-14mm} +\end{table*} + + + +% \begin{table*}[ht] +% \begin{center} +% {\scriptsize +% \begin{tabular}{|l|c|c|c|c|c|c|c|c|c|c|c|} +% \hline +% $\mu$ (MB) & $100$ & $150$ & $200$ & $250$ & $300$ & $350$ & $400$ & $450$ & $500$ & $550$ & $600$ \\ +% \hline +% $N$ (files) & $619$ & $413$ & $310$ & $248$ & $207$ & $177$ & $155$ & $138$ & $124$ & $113$ & $103$ \\ +% \hline +% \textbaht~(buffer size in KB) & $165$ & $372$ & $661$ & $1,033$ & $1,484$ & $2,025$ & $2,643$ & $3,339$ & & & \\ +% \hline +% \# of seeks (Worst case) & $384,478$ & $170,535$ & $95,974$ & $61,413$ & $42,749$ & $31,328$ & $24,003$ & $19,000$ & & & \\ +% \hline +% \raisebox{-0.2em}{\# of seeks performed in} & \raisebox{-0.7em}{$383,056$} & 
\raisebox{-0.7em}{$170,385$} & \raisebox{-0.7em}{$95,919$} & \raisebox{-0.7em}{$61,388$} & \raisebox{-0.7em}{$42,700$} & \raisebox{-0.7em}{$31,296$} & \raisebox{-0.7em}{$23,980$} & \raisebox{-0.7em}{$18,978$} & \raisebox{-0.7em}{$xx,xxx$} & \raisebox{-0.7em}{$xx,xxx$} & \raisebox{-0.7em}{$xx,xxx$} \\ +% \raisebox{0.2em}{statement 1.3 of Figure~\ref{fig:readingbucket}} & & & & & & & & & & & \\ +% \hline +% Time (horas) & $4.04$ & $3.93$ & $3.64$ & $3.46$ & $3.34$ & $3.26$ & $3.20$ & $3.13$ & & & \\ +% \hline +% \end{tabular} +% } +% \end{center} +% \caption{Influence of the internal memory area size ($\mu$) in our algorithm runtime.} +% \label{tab:diskaccess} +% \end{table*} + + + +% \begin{table*}[htb] +% \begin{center} +% {\scriptsize +% \begin{tabular}{|l|c|c|c|c|c|} +% \hline +% $n$ (millions) & 1 & 2 & 4 & 8 & 16 \\ +% \hline % Part. 16 \% 16 \% 16 \% 18 \% 20\% +% Average time (s) & $14.124 \pm 0.128$ & $28.301 \pm 0.140$ & $56.807 \pm 0.312$ & $117.286 \pm 0.997$ & $241.086 \pm 0.936$ \\ +% SD & $0.179$ & $0.196$ & $0.437$ & $1.394$ & $1.308$ \\ +% \hline +% \hline +% $n$ (millions) & 32 & 64 & 128 & 512 & 1000 \\ +% \hline % Part. 20 \% 20\% 20\% 18\% 18\% +% Average time (s) & $492.430 \pm 1.565$ & $1006.307 \pm 1.425$ & $2081.208 \pm 0.740$ & $9253.188 \pm 4.406$ & $19021.480 \pm 13.850$ \\ +% SD & $2.188$ & $1.992$ & $1.035$ & $ 6.160$ & $18.016$ \\ +% \hline + +% \end{tabular} +% } +% \end{center} +% \caption{The runtime averages in seconds, +% the standard deviation (SD), and +% the confidence intervals given by the average time $\pm$ +% the distance from average time considering +% a confidence level of $95\%$. 
+% } +% \label{tab:mediasbrz} +% \end{table*} diff --git a/vldb07/experimentalresults.tex b/vldb07/experimentalresults.tex new file mode 100755 index 0000000..58b4091 --- /dev/null +++ b/vldb07/experimentalresults.tex @@ -0,0 +1,15 @@ +%Nivio: 29/jan/06 +% Time-stamp: +\vspace{-2mm} +\enlargethispage{2\baselineskip} +\section{Appendix: Experimental results} +\label{sec:experimental-results} +\vspace{-1mm} + +In this section we present the experimental results. +We start presenting the experimental setup. +We then present experimental results for +the internal memory based algorithm~\cite{bkz05} +and for our algorithm. +Finally, we discuss how the amount of internal memory available +affects the runtime of our algorithm. diff --git a/vldb07/figs/bmz_temporegressao.png b/vldb07/figs/bmz_temporegressao.png new file mode 100644 index 0000000..e7198c1 Binary files /dev/null and b/vldb07/figs/bmz_temporegressao.png differ diff --git a/vldb07/figs/brz-partitioning.fig b/vldb07/figs/brz-partitioning.fig new file mode 100644 index 0000000..112098b --- /dev/null +++ b/vldb07/figs/brz-partitioning.fig @@ -0,0 +1,107 @@ +#FIG 3.2 +Landscape +Center +Metric +A4 +100.00 +Single +-2 +1200 2 +0 32 #bdbebd +0 33 #bdbebd +0 34 #bdbebd +0 35 #4a4d4a +0 36 #bdbebd +0 37 #4a4d4a +0 38 #bdbebd +0 39 #bdbebd +6 225 6615 2520 7560 +2 1 0 1 -1 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 900 7133 1608 7133 +2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5 + 260 6795 474 6795 474 6965 260 6965 260 6795 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 474 6795 686 6795 686 6965 474 6965 474 6795 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 474 6626 686 6626 686 6795 474 6795 474 6626 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 1538 6795 1750 6795 1750 6965 1538 6965 1538 6795 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 1538 6965 1750 6965 1750 7133 1538 7133 1538 6965 +2 2 0 1 -1 7 50 -1 -1 0.000 0 0 7 0 0 5 + 474 6965 686 6965 686 7133 474 7133 474 6965 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 686 6965 900 6965 900 7133 
686 7133 686 6965 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 1538 6626 1750 6626 1750 6795 1538 6795 1538 6626 +2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5 + 260 6965 474 6965 474 7133 260 7133 260 6965 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 686 6795 900 6795 900 6965 686 6965 686 6795 +4 0 0 50 -1 0 14 0.0000 4 30 180 1148 7049 ...\001 +4 0 -1 50 -1 0 7 0.0000 2 60 60 332 7260 0\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 544 7260 1\001 +4 0 -1 50 -1 0 7 0.0000 2 60 60 758 7260 2\001 +4 0 -1 50 -1 0 7 0.0000 2 90 960 1538 7260 ${\\lceil n/b\\rceil - 1}$\001 +4 0 -1 50 -1 0 7 0.0000 2 105 975 540 7515 Buckets Logical View\001 +-6 +6 2700 6390 4365 7830 +6 3461 6445 3675 7425 +6 3463 6786 3675 7245 +6 3546 6893 3591 7094 +4 0 -1 50 -1 0 12 0.0000 2 15 45 3546 6959 .\001 +4 0 -1 50 -1 0 12 0.0000 2 15 45 3546 7027 .\001 +4 0 -1 50 -1 0 12 0.0000 2 15 45 3546 7094 .\001 +-6 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 3463 6786 3675 6786 3675 7245 3463 7245 3463 6786 +-6 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 3461 6445 3675 6445 3675 6615 3461 6615 3461 6445 +2 2 0 1 -1 7 50 -1 41 0.000 0 0 7 0 0 5 + 3463 6616 3675 6616 3675 6785 3463 6785 3463 6616 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 3463 7246 3675 7246 3675 7425 3463 7425 3463 7246 +-6 +6 3023 6786 3235 7245 +6 3106 6893 3151 7094 +4 0 -1 50 -1 0 12 0.0000 2 15 45 3106 6959 .\001 +4 0 -1 50 -1 0 12 0.0000 2 15 45 3106 7027 .\001 +4 0 -1 50 -1 0 12 0.0000 2 15 45 3106 7094 .\001 +-6 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 3023 6786 3235 6786 3235 7245 3023 7245 3023 6786 +-6 +6 4091 6425 4305 7425 +6 4093 6946 4305 7255 +6 4176 7018 4221 7153 +4 0 -1 50 -1 0 12 0.0000 2 15 45 4176 7063 .\001 +4 0 -1 50 -1 0 12 0.0000 2 15 45 4176 7108 .\001 +4 0 -1 50 -1 0 12 0.0000 2 15 45 4176 7153 .\001 +-6 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 4093 6946 4305 6946 4305 7255 4093 7255 4093 6946 +-6 +2 2 0 1 0 35 50 -1 -1 0.000 0 0 7 0 0 5 + 4091 6605 4305 6605 4305 6775 4091 6775 4091 6605 +2 2 0 1 0 32 50 -1 43 0.000 0 0 
7 0 0 5 + 4093 7256 4305 7256 4305 7425 4093 7425 4093 7256 +2 2 0 1 -1 7 50 -1 41 0.000 0 0 7 0 0 5 + 4093 6776 4305 6776 4305 6945 4093 6945 4093 6776 +2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5 + 4091 6425 4305 6425 4305 6595 4091 6595 4091 6425 +-6 +2 2 0 1 0 35 50 -1 20 0.000 0 0 7 0 0 5 + 3021 6445 3235 6445 3235 6615 3021 6615 3021 6445 +2 2 0 1 -1 7 50 -1 -1 0.000 0 0 7 0 0 5 + 3023 6616 3235 6616 3235 6785 3023 6785 3023 6616 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 3023 7246 3235 7246 3235 7425 3023 7425 3023 7246 +4 0 0 50 -1 0 14 0.0000 4 30 180 3780 6975 ...\001 +4 0 -1 50 -1 0 7 0.0000 2 75 255 3015 7560 File 1\001 +4 0 -1 50 -1 0 7 0.0000 2 75 255 3465 7560 File 2\001 +4 0 -1 50 -1 0 7 0.0000 2 75 270 4095 7560 File N\001 +4 0 -1 50 -1 0 7 0.0000 2 105 1020 3195 7785 Buckets Physical View\001 +4 0 0 50 -1 0 10 0.0000 4 150 120 2700 7020 b)\001 +-6 +4 0 0 50 -1 0 10 0.0000 4 150 105 0 7020 a)\001 diff --git a/vldb07/figs/brz-partitioningfabiano.fig b/vldb07/figs/brz-partitioningfabiano.fig new file mode 100755 index 0000000..3e6d6ca --- /dev/null +++ b/vldb07/figs/brz-partitioningfabiano.fig @@ -0,0 +1,126 @@ +#FIG 3.2 +Landscape +Center +Metric +A4 +100.00 +Single +-2 +1200 2 +0 32 #bebebe +0 33 #4e4e4e +6 2160 3825 2430 4365 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2160 4005 2430 4005 2430 4095 2160 4095 2160 4005 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2160 3825 2430 3825 2430 3915 2160 3915 2160 3825 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2160 3915 2430 3915 2430 4005 2160 4005 2160 3915 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2160 4275 2430 4275 2430 4365 2160 4365 2160 4275 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2160 4185 2430 4185 2430 4275 2160 4275 2160 4185 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2160 4095 2430 4095 2430 4185 2160 4185 2160 4095 +-6 +6 2430 3735 2700 4365 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 3825 2700 3825 2700 3915 2430 3915 2430 3825 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 4275 2700 
4275 2700 4365 2430 4365 2430 4275 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 4185 2700 4185 2700 4275 2430 4275 2430 4185 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 4095 2700 4095 2700 4185 2430 4185 2430 4095 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 4005 2700 4005 2700 4095 2430 4095 2430 4005 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 3915 2700 3915 2700 4005 2430 4005 2430 3915 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2430 3735 2700 3735 2700 3825 2430 3825 2430 3735 +-6 +6 2700 4005 2970 4365 +2 2 0 1 0 32 50 -1 43 0.000 0 0 -1 0 0 5 + 2700 4275 2970 4275 2970 4365 2700 4365 2700 4275 +2 2 0 1 0 32 50 -1 43 0.000 0 0 -1 0 0 5 + 2700 4185 2970 4185 2970 4275 2700 4275 2700 4185 +2 2 0 1 0 32 50 -1 43 0.000 0 0 -1 0 0 5 + 2700 4095 2970 4095 2970 4185 2700 4185 2700 4095 +2 2 0 1 -1 32 50 -1 43 0.000 0 0 -1 0 0 5 + 2700 4005 2970 4005 2970 4095 2700 4095 2700 4005 +-6 +6 2025 5625 3690 5760 +4 0 0 50 -1 0 10 0.0000 4 105 360 2025 5760 File 1\001 +4 0 0 50 -1 0 10 0.0000 4 105 360 2565 5760 File 2\001 +4 0 0 50 -1 0 10 0.0000 4 105 405 3285 5760 File N\001 +-6 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3510 4410 3510 4590 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3780 4410 3780 4590 +2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5 + 1890 4185 2160 4185 2160 4275 1890 4275 1890 4185 +2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5 + 1890 4275 2160 4275 2160 4365 1890 4365 1890 4275 +2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5 + 1890 4095 2160 4095 2160 4185 1890 4185 1890 4095 +2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5 + 2070 4860 2340 4860 2340 5040 2070 5040 2070 4860 +2 2 0 1 0 7 50 -1 41 0.000 0 0 7 0 0 5 + 3330 5220 3600 5220 3600 5400 3330 5400 3330 5220 +2 2 0 1 0 33 50 -1 20 0.000 0 0 7 0 0 5 + 3330 4860 3600 4860 3600 4950 3330 4950 3330 4860 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 2070 5040 2340 5040 2340 5130 2070 5130 2070 5040 +2 2 0 1 0 33 50 -1 -1 0.000 0 0 7 0 0 5 + 3330 4950 3600 4950 3600 5220 3330 5220 3330 4950 +2 2 0 1 0 7 50 -1 41 0.000 
0 0 7 0 0 5 + 2070 5130 2340 5130 2340 5310 2070 5310 2070 5130 +2 2 0 1 0 7 50 -1 10 0.000 0 0 7 0 0 5 + 2610 5400 2880 5400 2880 5580 2610 5580 2610 5400 +2 2 0 1 0 7 50 -1 41 0.000 0 0 7 0 0 5 + 2610 4860 2880 4860 2880 5040 2610 5040 2610 4860 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 2610 5040 2880 5040 2880 5130 2610 5130 2610 5040 +2 2 0 1 0 7 50 -1 50 0.000 0 0 -1 0 0 5 + 2970 4275 3240 4275 3240 4365 2970 4365 2970 4275 +2 2 0 1 0 7 50 -1 50 0.000 0 0 -1 0 0 5 + 2970 4185 3240 4185 3240 4275 2970 4275 2970 4185 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3510 4410 3600 4410 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3690 4410 3780 4410 +2 2 0 1 0 7 50 -1 10 0.000 0 0 -1 0 0 5 + 3510 4275 3780 4275 3780 4365 3510 4365 3510 4275 +2 2 0 1 0 7 50 -1 10 0.000 0 0 -1 0 0 5 + 3510 4185 3780 4185 3780 4275 3510 4275 3510 4185 +2 2 0 1 0 32 50 -1 20 0.000 0 0 7 0 0 5 + 2610 5130 2880 5130 2880 5400 2610 5400 2610 5130 +2 2 0 1 0 32 50 -1 43 0.000 0 0 7 0 0 5 + 2070 5310 2340 5310 2340 5490 2070 5490 2070 5310 +2 2 0 1 0 7 50 -1 10 0.000 0 0 7 0 0 5 + 2070 5490 2340 5490 2340 5580 2070 5580 2070 5490 +2 2 0 1 0 7 50 -1 50 0.000 0 0 7 0 0 5 + 3330 5400 3600 5400 3600 5490 3330 5490 3330 5400 +2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5 + 3240 4275 3510 4275 3510 4365 3240 4365 3240 4275 +2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5 + 3240 4185 3510 4185 3510 4275 3240 4275 3240 4185 +2 2 0 1 -1 32 50 -1 20 0.000 0 0 -1 0 0 5 + 3240 4095 3510 4095 3510 4185 3240 4185 3240 4095 +2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5 + 3240 4005 3510 4005 3510 4095 3240 4095 3240 4005 +2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5 + 3240 3915 3510 3915 3510 4005 3240 4005 3240 3915 +2 2 0 1 0 32 50 -1 20 0.000 0 0 -1 0 0 5 + 3330 5490 3600 5490 3600 5580 3330 5580 3330 5490 +4 0 0 50 -1 0 10 0.0000 4 105 75 1980 4545 0\001 +4 0 0 50 -1 0 10 0.0000 4 105 420 3555 4545 n/b - 1\001 +4 0 0 50 -1 0 18 0.0000 4 30 180 3015 5265 ...\001 +4 0 0 50 -1 0 10 0.0000 4 105 75 2250 4545 1\001 +4 0 0 50 -1 0 
10 0.0000 4 105 75 2520 4545 2\001 +4 0 0 50 -1 0 18 0.0000 4 30 180 2880 4500 ...\001 +4 0 0 50 -1 0 10 0.0000 4 135 1410 4050 5310 Buckets Physical View\001 +4 0 0 50 -1 0 10 0.0000 4 135 1350 4050 4140 Buckets Logical View\001 +4 0 0 50 -1 0 10 0.0000 4 135 120 1665 3780 a)\001 +4 0 0 50 -1 0 10 0.0000 4 135 135 1620 4950 b)\001 diff --git a/vldb07/figs/brz.fig b/vldb07/figs/brz.fig new file mode 100755 index 0000000..a275032 --- /dev/null +++ b/vldb07/figs/brz.fig @@ -0,0 +1,183 @@ +#FIG 3.2 Produced by xfig version 3.2.5-alpha5 +Landscape +Center +Metric +A4 +100.00 +Single +-2 +1200 2 +0 32 #bdbebd +0 33 #bdbebd +0 34 #bdbebd +0 35 #4a4d4a +0 36 #bdbebd +0 37 #4a4d4a +0 38 #bdbebd +0 39 #bdbebd +0 40 #bdbebd +6 3427 4042 3852 4211 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3427 4041 3852 4041 3852 4211 3427 4211 3427 4041 +4 0 0 50 -1 0 14 0.0000 4 30 180 3551 4140 ...\001 +-6 +6 3410 5689 3835 5859 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3410 5689 3835 5689 3835 5858 3410 5858 3410 5689 +4 0 0 50 -1 0 14 0.0000 4 30 180 3534 5788 ...\001 +-6 +6 3825 5445 4455 5535 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 4140 5445 4095 5490 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 4140 5445 4185 5490 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8 + 3825 5535 3825 5490 3870 5490 3915 5490 3959 5490 4006 5490 + 4095 5490 4095 5490 + 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7 + 4455 5535 4455 5490 4410 5490 4365 5490 4321 5490 4274 5490 + 4185 5490 + 0.000 1.000 1.000 1.000 1.000 1.000 0.000 +-6 +6 1873 5442 2323 5532 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 2098 5442 2066 5487 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 2098 5442 2130 5487 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8 + 1873 5532 1873 5487 1905 5487 1937 5487 1969 5487 2002 5487 + 2066 5487 2066 5487 + 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7 + 2323 5532 2323 5487 2291 5487 2259 5487 2227 5487 2194 5487 + 2130 5487 + 0.000 1.000 1.000 1.000 
1.000 1.000 0.000 +-6 +6 2338 5442 2968 5532 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 2653 5442 2608 5487 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 2653 5442 2698 5487 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8 + 2338 5532 2338 5487 2383 5487 2428 5487 2473 5487 2518 5487 + 2608 5487 2608 5487 + 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7 + 2968 5532 2968 5487 2923 5487 2878 5487 2833 5487 2788 5487 + 2698 5487 + 0.000 1.000 1.000 1.000 1.000 1.000 0.000 +-6 +6 2475 4500 4770 5175 +2 1 0 1 -1 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3137 5013 3845 5013 +2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5 + 2497 4675 2711 4675 2711 4845 2497 4845 2497 4675 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2711 4675 2923 4675 2923 4845 2711 4845 2711 4675 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2711 4506 2923 4506 2923 4675 2711 4675 2711 4506 +2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5 + 3775 4675 3987 4675 3987 4845 3775 4845 3775 4675 +2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5 + 3775 4845 3987 4845 3987 5013 3775 5013 3775 4845 +2 2 0 1 -1 7 50 -1 -1 0.000 0 0 7 0 0 5 + 2711 4845 2923 4845 2923 5013 2711 5013 2711 4845 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2923 4845 3137 4845 3137 5013 2923 5013 2923 4845 +2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5 + 3775 4506 3987 4506 3987 4675 3775 4675 3775 4506 +2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5 + 2497 4845 2711 4845 2711 5013 2497 5013 2497 4845 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2923 4675 3137 4675 3137 4845 2923 4845 2923 4675 +4 0 0 50 -1 0 14 0.0000 4 30 180 3385 4929 ...\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 2569 5140 0\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 2781 5140 1\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 2995 5140 2\001 +4 0 -1 50 -1 0 7 0.0000 2 75 405 4059 4845 Buckets\001 +4 0 -1 50 -1 0 7 0.0000 2 105 1095 3775 5140 ${\\lceil n/b\\rceil - 1}$\001 +-6 +6 2983 5446 3433 5536 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3208 5446 3176 5491 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3208 5446 3240 
5491 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 8 + 2983 5536 2983 5491 3015 5491 3047 5491 3079 5491 3112 5491 + 3176 5491 3176 5491 + 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 +3 0 0 1 0 7 50 -1 -1 0.000 0 0 0 7 + 3433 5536 3433 5491 3401 5491 3369 5491 3337 5491 3304 5491 + 3240 5491 + 0.000 1.000 1.000 1.000 1.000 1.000 0.000 +-6 +2 2 0 1 0 36 50 -1 -1 0.000 0 0 7 0 0 5 + 3852 4041 4066 4041 4066 4211 3852 4211 3852 4041 +2 2 0 1 0 36 50 -1 -1 0.000 0 0 7 0 0 5 + 4066 4041 4279 4041 4279 4211 4066 4211 4066 4041 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 1937 4041 2149 4041 2149 4211 1937 4211 1937 4041 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2149 4041 2362 4041 2362 4211 2149 4211 2149 4041 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2362 4041 2576 4041 2576 4211 2362 4211 2362 4041 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2576 4041 2788 4041 2788 4211 2576 4211 2576 4041 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2788 4041 3002 4041 3002 4211 2788 4211 2788 4041 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3214 4041 3427 4041 3427 4211 3214 4211 3214 4041 +2 2 0 1 0 36 50 -1 -1 0.000 0 0 7 0 0 5 + 4279 4041 4492 4041 4492 4211 4279 4211 4279 4041 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3002 4041 3214 4041 3214 4211 3002 4211 3002 4041 +2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5 + 2132 5689 2345 5689 2345 5858 2132 5858 2132 5689 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 3197 5689 3410 5689 3410 5858 3197 5858 3197 5689 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2771 5689 2985 5689 2985 5858 2771 5858 2771 5689 +2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5 + 4262 5689 4475 5689 4475 5858 4262 5858 4262 5689 +2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5 + 4049 5689 4262 5689 4262 5858 4049 5858 4049 5689 +2 2 0 1 0 7 50 -1 41 0.000 0 0 -1 0 0 5 + 2985 5689 3197 5689 3197 5858 2985 5858 2985 5689 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2345 5689 2559 5689 2559 5858 2345 5858 2345 5689 +2 2 0 1 0 37 50 -1 20 0.000 0 0 7 0 0 5 + 1914 5687 2127 5687 2127 5856 1914 5856 
1914 5687 +2 2 0 1 0 36 50 -1 43 0.000 0 0 7 0 0 5 + 3835 5689 4049 5689 4049 5858 3835 5858 3835 5689 +2 2 0 1 0 37 50 -1 -1 0.000 0 0 7 0 0 5 + 2559 5689 2771 5689 2771 5858 2559 5858 2559 5689 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 5 + 1 1 1.00 60.00 120.00 + 3330 4275 3330 4365 3330 4410 3330 4455 3330 4500 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 60.00 + 3880 5168 4140 5445 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 60.00 + 3025 5170 3205 5440 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 60.00 + 2805 5164 2653 5438 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 60.00 + 2577 5170 2103 5434 +4 0 -1 50 -1 0 7 0.0000 2 120 645 4562 4168 Key Set $S$\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 2008 3999 0\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 2220 3999 1\001 +4 0 -1 50 -1 0 7 0.0000 2 75 165 4314 3999 n-1\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 1991 5985 0\001 +4 0 -1 50 -1 0 7 0.0000 2 75 60 2203 5985 1\001 +4 0 -1 50 -1 0 7 0.0000 2 75 165 4297 5985 n-1\001 +4 0 -1 50 -1 0 7 0.0000 2 75 555 4545 5816 Hash Table\001 +4 0 -1 50 -1 0 3 0.0000 2 75 450 1980 5625 MPHF$_0$\001 +4 0 -1 50 -1 0 3 0.0000 2 75 450 2520 5625 MPHF$_1$\001 +4 0 -1 50 -1 0 3 0.0000 2 75 450 3015 5625 MPHF$_2$\001 +4 0 -1 50 -1 0 3 0.0000 2 75 1065 3825 5625 MPHF$_{\\lceil n/b \\rceil - 1}$\001 +4 0 -1 50 -1 0 7 0.0000 2 105 585 1440 4455 Partitioning\001 +4 0 -1 50 -1 0 7 0.0000 2 105 495 1440 5265 Searching\001 diff --git a/vldb07/figs/brz_temporegressao.png b/vldb07/figs/brz_temporegressao.png new file mode 100644 index 0000000..ba6316a Binary files /dev/null and b/vldb07/figs/brz_temporegressao.png differ diff --git a/vldb07/figs/brzfabiano.fig b/vldb07/figs/brzfabiano.fig new file mode 100755 index 0000000..e08aae4 --- /dev/null +++ b/vldb07/figs/brzfabiano.fig @@ -0,0 +1,153 @@ +#FIG 3.2 Produced by xfig version 3.2.5-alpha5 +Landscape +Center +Metric +A4 +100.00 +Single +-2 +1200 2 +0 32 #bebebe +6 2025 3015 3555 3690 +2 3 0 1 0 7 50 -1 -1 0.000 0 
0 -1 0 0 8 + 2025 3285 2295 3285 2295 3015 3285 3015 3285 3285 3555 3285 + 2790 3690 2025 3285 +4 0 0 50 -1 0 10 0.0000 4 135 765 2385 3330 Partitioning\001 +-6 +6 1890 3735 3780 4365 +6 2430 3735 2700 4365 +6 2430 3915 2700 4365 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 4275 2700 4275 2700 4365 2430 4365 2430 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 4185 2700 4185 2700 4275 2430 4275 2430 4185 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 4095 2700 4095 2700 4185 2430 4185 2430 4095 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 4005 2700 4005 2700 4095 2430 4095 2430 4005 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 3915 2700 3915 2700 4005 2430 4005 2430 3915 +-6 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 3825 2700 3825 2700 3915 2430 3915 2430 3825 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2430 3735 2700 3735 2700 3825 2430 3825 2430 3735 +-6 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1890 4275 2160 4275 2160 4365 1890 4365 1890 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1890 4185 2160 4185 2160 4275 1890 4275 1890 4185 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2160 4275 2430 4275 2430 4365 2160 4365 2160 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2160 4185 2430 4185 2430 4275 2160 4275 2160 4185 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2160 4095 2430 4095 2430 4185 2160 4185 2160 4095 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2160 4005 2430 4005 2430 4095 2160 4095 2160 4005 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2160 3915 2430 3915 2430 4005 2160 4005 2160 3915 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2700 4275 2970 4275 2970 4365 2700 4365 2700 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2700 4185 2970 4185 2970 4275 2700 4275 2700 4185 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2700 4095 2970 4095 2970 4185 2700 4185 2700 4095 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2700 4005 2970 4005 2970 4095 2700 4095 2700 4005 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2160 3825 2430 3825 2430 3915 2160 3915 2160 
3825 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3240 4275 3510 4275 3510 4365 3240 4365 3240 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3510 4275 3780 4275 3780 4365 3510 4365 3510 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2970 4275 3240 4275 3240 4365 2970 4365 2970 4275 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3240 4185 3510 4185 3510 4275 3240 4275 3240 4185 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1890 4095 2160 4095 2160 4185 1890 4185 1890 4095 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3510 4185 3780 4185 3780 4275 3510 4275 3510 4185 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3240 4095 3510 4095 3510 4185 3240 4185 3240 4095 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3240 4005 3510 4005 3510 4095 3240 4095 3240 4005 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3240 3915 3510 3915 3510 4005 3240 4005 3240 3915 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 1890 4365 3780 4365 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2970 4185 3240 4185 3240 4275 2970 4275 2970 4185 +-6 +6 1260 5310 4230 5580 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 1260 5400 4230 5400 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1530 5310 1800 5310 1800 5400 1530 5400 1530 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2070 5310 2340 5310 2340 5400 2070 5400 2070 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2340 5310 2610 5310 2610 5400 2340 5400 2340 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2610 5310 2880 5310 2880 5400 2610 5400 2610 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2880 5310 3150 5310 3150 5400 2880 5400 2880 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3420 5310 3690 5310 3690 5400 3420 5400 3420 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3690 5310 3960 5310 3960 5400 3690 5400 3690 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3960 5310 4230 5310 4230 5400 3960 5400 3960 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1800 5310 2070 5310 2070 5400 1800 5400 1800 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3150 5310 3420 5310 3420 5400 
3150 5400 3150 5310 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1260 5310 1530 5310 1530 5400 1260 5400 1260 5310 +4 0 0 50 -1 0 10 0.0000 4 105 210 4005 5580 n-1\001 +4 0 0 50 -1 0 10 0.0000 4 105 75 1350 5580 0\001 +-6 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 1260 2925 4230 2925 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1530 2835 1800 2835 1800 2925 1530 2925 1530 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2070 2835 2340 2835 2340 2925 2070 2925 2070 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2340 2835 2610 2835 2610 2925 2340 2925 2340 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2610 2835 2880 2835 2880 2925 2610 2925 2610 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 2880 2835 3150 2835 3150 2925 2880 2925 2880 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3420 2835 3690 2835 3690 2925 3420 2925 3420 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3690 2835 3960 2835 3960 2925 3690 2925 3690 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3960 2835 4230 2835 4230 2925 3960 2925 3960 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1800 2835 2070 2835 2070 2925 1800 2925 1800 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 3150 2835 3420 2835 3420 2925 3150 2925 3150 2835 +2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 + 1260 2835 1530 2835 1530 2925 1260 2925 1260 2835 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3510 4410 3510 4590 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3510 4410 3600 4410 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3690 4410 3780 4410 +2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8 + 2025 4815 2295 4815 2295 4545 3285 4545 3285 4815 3555 4815 + 2790 5220 2025 4815 +2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 3780 4410 3780 4590 +4 0 0 50 -1 0 10 0.0000 4 135 585 2475 4860 Searching\001 +4 0 0 50 -1 0 10 0.0000 4 105 75 1980 4545 0\001 +4 0 0 50 -1 0 10 0.0000 4 105 690 4410 5400 Hash Table\001 +4 0 0 50 -1 0 10 0.0000 4 105 480 4410 4230 Buckets\001 +4 0 0 50 -1 0 10 0.0000 4 135 555 4410 2925 Key set S\001 +4 0 0 50 -1 0 10 0.0000 4 105 
75 1350 2745 0\001 +4 0 0 50 -1 0 10 0.0000 4 105 210 4005 2745 n-1\001 +4 0 0 50 -1 0 10 0.0000 4 105 420 3555 4545 n/b - 1\001 diff --git a/vldb07/figs/minimalperfecthash-ph-mph.png b/vldb07/figs/minimalperfecthash-ph-mph.png new file mode 100644 index 0000000..a12825c Binary files /dev/null and b/vldb07/figs/minimalperfecthash-ph-mph.png differ diff --git a/vldb07/introduction.tex b/vldb07/introduction.tex new file mode 100755 index 0000000..de7282b --- /dev/null +++ b/vldb07/introduction.tex @@ -0,0 +1,109 @@ +%% Nivio: 22/jan/06 23/jan/06 29/jan +% Time-stamp: +\section{Introduction} +\label{sec:intro} + +\enlargethispage{2\baselineskip} +Suppose~$U$ is a universe of \textit{keys} of size $u$. +Let $h:U\to M$ be a {\em hash function} that maps the keys from~$U$ +to a given interval of integers $M=[0,m-1]=\{0,1,\dots,m-1\}$. +Let~$S\subseteq U$ be a set of~$n$ keys from~$U$, where $ n \ll u$. +Given a key~$x\in S$, the hash function~$h$ computes an integer in +$[0,m-1]$ for the storage or retrieval of~$x$ in a {\em hash table}. +% Hashing methods for {\em non-static sets} of keys can be used to construct +% data structures storing $S$ and supporting membership queries +% ``$x \in S$?'' in expected time $O(1)$. +% However, they involve a certain amount of wasted space owing to unused +% locations in the table and waisted time to resolve collisions when +% two keys are hashed to the same table location. +A perfect hash function maps a {\em static set} $S$ of $n$ keys from $U$ into a set of $m$ integer +numbers without collisions, where $m$ is greater than or equal to $n$. +If $m$ is equal to $n$, the function is called minimal. + +% Figure~\ref{fig:minimalperfecthash-ph-mph}(a) illustrates a perfect hash function and +% Figure~\ref{fig:minimalperfecthash-ph-mph}(b) illustrates a minimal perfect hash function (MPHF). 
+%
+% \begin{figure}
+% \centering
+% \scalebox{0.7}{\epsfig{file=figs/minimalperfecthash-ph-mph.ps}}
+% \caption{(a) Perfect hash function (b) Minimal perfect hash function (MPHF)}
+% \label{fig:minimalperfecthash-ph-mph}
+% %\vspace{-5mm}
+% \end{figure}
+
+Minimal perfect hash functions are widely used for memory-efficient storage and fast
+retrieval of items from static sets, such as words in natural languages,
+reserved words in programming languages or interactive systems, uniform resource
+locators (URLs) in web search engines, or item sets in data mining techniques.
+Search engines nowadays index tens of billions of pages, and algorithms
+like PageRank~\cite{Brin1998}, which uses the web link structure to derive a
+measure of popularity for web pages, would benefit from a MPHF for storage and
+retrieval of such huge sets of URLs.
+For instance, the TodoBr\footnote{TodoBr ({\texttt www.todobr.com.br}) is a trademark of
+Akwan Information Technologies, which was acquired by Google Inc. in July 2005.}
+search engine used the algorithm proposed here to
+improve and scale its link analysis system.
+The WebGraph research group~\cite{bv04} would
+also benefit from a MPHF for sets on the order of billions of URLs to scale
+their graph compression algorithms and to improve their storage requirements.
+
+ Another interesting application for MPHFs is their use as an indexing structure
+ for databases.
+ The B+ tree is very popular as an indexing structure for dynamic applications
+ with frequent insertions and deletions of records.
+ However, for applications with sporadic modifications and a huge number of
+ queries, the B+ tree is not the best option,
+ because it performs poorly with very large sets of keys
+ such as those required for the new frontiers of database applications~\cite{s05}.
+ Therefore, there are applications for MPHFs in
+ information retrieval systems, database systems, language translation systems,
+ electronic commerce systems, compilers, and operating systems, among others.
+
+Until now, because of the limitations of current algorithms,
+the use of MPHFs has been restricted to scenarios where the set of keys being hashed is
+relatively small.
+However, in many cases it is crucial to deal efficiently with very large
+sets of keys.
+Due to the exponential growth of the Web, working with huge collections is becoming
+a daily task.
+For instance, the simple assignment of numeric identifiers to the web pages of a collection
+can be a challenging task.
+While traditional databases simply cannot handle more traffic once the working
+set of URLs does not fit in main memory anymore~\cite{s05}, the algorithm we propose here to
+construct MPHFs can easily scale to billions of entries.
+% using stock hardware.
+
+As there are many applications for MPHFs, it is
+important to design and implement space- and time-efficient algorithms for
+constructing such functions.
+The attractiveness of using MPHFs depends on the following issues:
+\begin{enumerate}
+\item The amount of CPU time required by the algorithms for constructing MPHFs.
+\item The space requirements of the algorithms for constructing MPHFs.
+\item The amount of CPU time required by a MPHF for each retrieval.
+\item The space requirements of the description of the resulting MPHFs to be
+ used at retrieval time.
+\end{enumerate}
+
+\enlargethispage{2\baselineskip}
+This paper presents a novel external memory based algorithm for constructing MPHFs that
+is very efficient in these four requirements.
+First, the algorithm takes time linear in the number of keys to construct a MPHF,
+which is optimal.
+For instance, for a collection of 1 billion URLs
+collected from the web, each one 64 characters long on average, the time to construct a
+MPHF using a 2.4 gigahertz PC with 500 megabytes of available main memory
+is approximately 3 hours.
+Second, the algorithm needs a small, a priori defined vector of $\lceil n/b \rceil$
+one-byte entries in main memory to construct a MPHF.
+For the collection of 1 billion URLs and using $b=175$, the algorithm needs only
+5.45 megabytes of internal memory.
+Third, the evaluation of the MPHF for each retrieval requires three memory accesses and
+the computation of three universal hash functions.
+This is not optimal, as any MPHF requires at least one memory access and the computation
+of two universal hash functions.
+Fourth, the description of a MPHF takes a constant number of bits per key, which is optimal.
+For the collection of 1 billion URLs, it needs 8.1 bits per key,
+while the theoretical lower bound is $1/\ln 2 \approx 1.4427$ bits per
+key~\cite{m84}.
+
diff --git a/vldb07/makefile b/vldb07/makefile
new file mode 100755
index 0000000..1b95644
--- /dev/null
+++ b/vldb07/makefile
@@ -0,0 +1,17 @@
+all:
+	latex vldb.tex
+	bibtex vldb
+	latex vldb.tex
+	latex vldb.tex
+	dvips vldb.dvi -o vldb.ps
+	ps2pdf vldb.ps
+	chmod -R g+rwx *
+
+perm:
+	chmod -R g+rwx *
+
+run: clean all
+	gv vldb.ps &
+clean:
+	rm -f *.aux *.bbl *.blg *.log *.ps *.pdf *.dvi
+
diff --git a/vldb07/partitioningthekeys.tex b/vldb07/partitioningthekeys.tex
new file mode 100755
index 0000000..e9a48c4
--- /dev/null
+++ b/vldb07/partitioningthekeys.tex
@@ -0,0 +1,141 @@
+%% Nivio: 21/jan/06
+% Time-stamp:
+\vspace{-2mm}
+\subsection{Partitioning step}
+\label{sec:partitioning-keys}
+
+The set $S$ of $n$ keys is partitioned into $\lceil n/b \rceil$ buckets,
+where $b$ is a suitable parameter chosen to guarantee
+that each bucket has at most 256 keys with high probability
+(see Section~\ref{sec:determining-b}).
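As a quick sanity check on the memory figures quoted in the text, the following sketch (illustrative only, not part of the paper's code; it just plugs in the quoted constants $n = 10^9$ and $b = 175$) computes the footprint of the vector of $\lceil n/b \rceil$ one-byte entries:

```python
from math import ceil

# Illustrative check: the algorithm keeps a vector of ceil(n/b)
# one-byte entries in main memory during construction.
n = 10**9   # 1 billion keys (URLs), as quoted in the text
b = 175     # keys-per-bucket parameter, as quoted in the text

buckets = ceil(n / b)              # number of buckets = number of vector entries
megabytes = buckets / (1024 * 1024)  # one byte per entry

print(buckets)               # 5714286
print(round(megabytes, 2))   # 5.45, matching the 5.45 MB quoted above
```

This reproduces the 5.45-megabyte figure for the 1-billion-URL collection.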
+The partitioning step works as follows:
+
+\begin{figure}[h]
+\hrule
+\hrule
+\vspace{2mm}
+\begin{tabbing}
+aa\=type booleanx \== (false, true); \kill
+\> $\blacktriangleright$ Let $\beta$ be the size in bytes of the set $S$ \\
+\> $\blacktriangleright$ Let $\mu$ be the size in bytes of an a priori reserved \\
+\> ~~~ internal memory area \\
+\> $\blacktriangleright$ Let $N = \lceil \beta/\mu \rceil$ be the number of key blocks that will \\
+\> ~~~ be read from disk into an internal memory area \\
+\> $\blacktriangleright$ Let $\mathit{size}$ be a vector that stores the size of each bucket \\
+\> $1.$ {\bf for} $j = 1$ {\bf to} $N$ {\bf do} \\
+\> ~~ $1.1$ Read block $B_j$ of keys from disk \\
+\> ~~ $1.2$ Cluster $B_j$ into $\lceil n/b \rceil$ buckets using a bucket sort \\
+\> ~~~~~~~ algorithm and update the entries in the vector {\it size} \\
+\> ~~ $1.3$ Dump $B_j$ to the disk into File $j$\\
+\> $2.$ Compute the {\it offset} vector and dump it to the disk.
+\end{tabbing}
+\hrule
+\hrule
+\vspace{-1.0mm}
+\caption{Partitioning step}
+\vspace{-3mm}
+\label{fig:partitioningstep}
+\end{figure}

+Statement 1.1 of the {\bf for} loop presented in Figure~\ref{fig:partitioningstep}
+sequentially reads all the keys of block $B_j$ from disk into an internal area
+of size $\mu$.
+
+Statement 1.2 performs an indirect bucket sort of the keys in block $B_j$
+and at the same time updates the entries in the vector {\em size}.
+Let us briefly describe how~$B_j$ is partitioned among the~$\lceil n/b\rceil$
+buckets.
+We use a local array of $\lceil n/b \rceil$ counters to store a
+count of how many keys from $B_j$ belong to each bucket.
+%At the same time, the global vector {\it size} is computed based on the local
+%counters.
+The pointers to the keys in each bucket $i$, $0 \leq i < \lceil n/b \rceil$,
+are stored in contiguous positions in an array.
+For this we first reserve the required number of entries +in this array of pointers using the information from the array of counters. +Next, we place the pointers to the keys in each bucket into the respective +reserved areas in the array (i.e., we place the pointers to the keys in bucket 0, +followed by the pointers to the keys in bucket 1, and so on). + +\enlargethispage{2\baselineskip} +To find the bucket address of a given key +we use the universal hash function $h_0(k)$~\cite{j97}. +Key~$k$ goes into bucket~$i$, where +%Then, for each integer $h_0(k)$ the respective bucket address is obtained +%as follows: +\begin{eqnarray} \label{eq:bucketindex} +i=h_0(k) \bmod \left \lceil \frac{n}{b} \right \rceil. +\end{eqnarray} + +Figure~\ref{fig:brz-partitioning}(a) shows a \emph{logical} view of the +$\lceil n/b \rceil$ buckets generated in the partitioning step. +%In this case, the keys of each bucket are put together by the pointers to +%each key stored +%in contiguous positions in the array of pointers. +In reality, the keys belonging to each bucket are distributed among many files, +as depicted in Figure~\ref{fig:brz-partitioning}(b). +In the example of Figure~\ref{fig:brz-partitioning}(b), the keys in bucket 0 +appear in files 1 and $N$, the keys in bucket 1 appear in files 1, 2 +and $N$, and so on. 
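The indirect bucket sort of statement 1.2 and the bucket address computation above can be sketched in a few lines. The following Python sketch is illustrative only: the hash `h0` below is a simple stand-in for Jenkins' universal hash function, and all names are ours. It performs the two passes described in the text: a counting pass over the block, a prefix sum that reserves a contiguous area per bucket (the same prefix sum that later yields the offset vector from size), and a placement pass that groups the keys by bucket index.

```python
import math

def partition_block(block, n, b, h0):
    """Two-pass indirect bucket sort of one block of keys.

    Pass 1 counts how many keys fall in each bucket; a prefix sum then
    reserves a contiguous area per bucket; pass 2 places each key into
    its bucket's area, so keys end up grouped by bucket index."""
    nbuckets = math.ceil(n / b)
    count = [0] * nbuckets
    for key in block:                       # pass 1: count keys per bucket
        count[h0(key) % nbuckets] += 1
    start = [0] * nbuckets                  # start[i] = sum of count[0..i-1]
    for i in range(1, nbuckets):
        start[i] = start[i - 1] + count[i - 1]
    placed = [None] * len(block)
    pos = list(start)                       # next free slot in each bucket
    for key in block:                       # pass 2: place keys contiguously
        i = h0(key) % nbuckets
        placed[pos[i]] = key
        pos[i] += 1
    return placed, count

# toy stand-in for Jenkins' universal hash h0 (an assumption, not the real h0)
def h0(key):
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h

keys = ["key%d" % i for i in range(10)]
placed, count = partition_block(keys, n=10, b=4, h0=h0)
```

After the call, `placed` holds the same keys as the input block, but grouped by bucket index; the prefix-sum array plays the role of the reserved areas in the array of pointers.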
+ +\vspace{-7mm} +\begin{figure}[ht] +\centering +\begin{picture}(0,0)% +\includegraphics{figs/brz-partitioning.ps}% +\end{picture}% +\setlength{\unitlength}{4144sp}% +% +\begingroup\makeatletter\ifx\SetFigFont\undefined% +\gdef\SetFigFont#1#2#3#4#5{% + \reset@font\fontsize{#1}{#2pt}% + \fontfamily{#3}\fontseries{#4}\fontshape{#5}% + \selectfont}% +\fi\endgroup% +\begin{picture}(4371,1403)(1,-6977) +\put(333,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}} +\put(545,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}} +\put(759,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}2}}}} +\put(1539,-6421){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}${\lceil n/b\rceil - 1}$}}}} +\put(541,-6676){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Buckets Logical View}}}} +\put(3547,-6120){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(3547,-6188){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(3547,-6255){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(3107,-6120){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(3107,-6188){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(3107,-6255){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(4177,-6224){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(4177,-6269){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} +\put(4177,-6314){\makebox(0,0)[lb]{\smash{{\SetFigFont{12}{14.4}{\familydefault}{\mddefault}{\updefault}.}}}} 
+\put(3016,-6721){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}File 1}}}}
+\put(3466,-6721){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}File 2}}}}
+\put(4096,-6721){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}File N}}}}
+\put(3196,-6946){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Buckets Physical View}}}}
+\end{picture}%
+\caption{Situation of the buckets at the end of the partitioning step: (a) Logical view (b) Physical view}
+\label{fig:brz-partitioning}
+\vspace{-2mm}
+\end{figure}
+
+This scattering of the keys in the buckets could generate a performance
+problem because of the potential number of seeks
+needed to read the keys in each bucket from the $N$ files on disk
+during the searching step.
+However, as we show later in Section~\ref{sec:analytcal-results}, the number of seeks
+can be kept small using buffering techniques.
+Considering that only the vector {\it size}, which has $\lceil n/b \rceil$
+one-byte entries (remember that each bucket has at most 256 keys),
+must be maintained in main memory during the searching step,
+almost all main memory is available to be used as a disk I/O buffer.
+
+The last step is to compute the {\it offset} vector and dump it to the disk.
+We use the vector $\mathit{size}$ to compute the
+$\mathit{offset}$ displacement vector.
+The $\mathit{offset}[i]$ entry contains the number of keys
+in the buckets $0, 1, \dots, i-1$.
+As {\it size}$[i]$ stores the number of keys
+in bucket $i$, where $0 \leq i <\lceil n/b \rceil$, we have
+\begin{displaymath}
+\mathit{offset}[i] = \sum_{j=0}^{i-1} \mathit{size}[j].
+\end{displaymath}
+
diff --git a/vldb07/performancenewalgorithm.tex b/vldb07/performancenewalgorithm.tex
new file mode 100755
index 0000000..6911282
--- /dev/null
+++ b/vldb07/performancenewalgorithm.tex
@@ -0,0 +1,113 @@
+% Nivio: 29/jan/06
+% Time-stamp:
+\subsection{Performance of the new algorithm}
+\label{sec:performance}
+%As we have done for the internal memory based algorithm,
+
+The runtime of our algorithm is also a random variable, but now it follows a
+(highly concentrated) normal distribution, as we discuss at the end of this
+section. Again, we are interested in verifying the linearity claim made in
+Section~\ref{sec:linearcomplexity}. Therefore, we ran the algorithm for
+several numbers $n$ of keys in $S$.
+
+The values chosen for $n$ were $1, 2, 4, 8, 16, 32, 64, 128, 512$ and $1000$
+million.
+%Just the small vector {\it size} must be kept in main memory,
+%as we saw in Section~\ref{sec:memconstruction}.
+We limited the main memory to 500 megabytes for the experiments.
+The size $\mu$ of the a priori reserved internal memory area
+was set to 250 megabytes, the parameter $b$ was set to $175$ and
+the building block algorithm parameter $c$ was again set to $1$.
+In Section~\ref{sec:contr-disk-access} we show how $\mu$
+affects the runtime of the algorithm. The other two parameters
+have insignificant influence on the runtime.
+
+We again use a statistical method for determining a suitable sample size
+%~\cite[Chapter 13]{j91}
+to estimate the number of trials to be run for each value of $n$. We found that
+just one trial for each $n$ would be enough with a confidence level of $95\%$.
+However, we made 10 trials.
This number of trials seems rather small, but, as
+shown below, the behavior of our algorithm is very stable and its runtime is
+almost deterministic (i.e., the standard deviation is very small).
+
+Table~\ref{tab:mediasbrz} presents the runtime average for each $n$,
+the respective standard deviations, and
+the respective confidence intervals given by
+the average time $\pm$ the distance from the average time
+considering a confidence level of $95\%$.
+Observing the runtime averages we noticed that
+the algorithm runs in expected linear time,
+as shown in~Section~\ref{sec:linearcomplexity}. Better still,
+it is only approximately $60\%$ slower than our internal memory based algorithm.
+To obtain that value, we used the linear regression model obtained for the runtime of
+the internal memory based algorithm to estimate how much time it would require
+for constructing a MPHF for a set of 1 billion keys.
+We got 2.3 hours for the internal memory based algorithm and we measured
+3.67 hours on average for our algorithm.
+Increasing the size of the internal memory area
+from 250 to 600 megabytes (see Section~\ref{sec:contr-disk-access}),
+we brought the time down to 3.09 hours. With this setup, our algorithm is
+just $34\%$ slower.
+
+\enlargethispage{2\baselineskip}
+\begin{table*}[htb]
+\vspace{-1mm}
+\begin{center}
+{\scriptsize
+\begin{tabular}{|l|c|c|c|c|c|}
+\hline
+$n$ (millions) & 1 & 2 & 4 & 8 & 16 \\
+\hline % Part. 16 \% 16 \% 16 \% 18 \% 20\%
+Average time (s) & $6.9 \pm 0.3$ & $13.8 \pm 0.2$ & $31.9 \pm 0.7$ & $69.9 \pm 1.1$ & $140.6 \pm 2.5$ \\
+SD & $0.4$ & $0.2$ & $0.9$ & $1.5$ & $3.5$ \\
+\hline
+\hline
+$n$ (millions) & 32 & 64 & 128 & 512 & 1000 \\
+\hline % Part.
20 \% 20\% 20\% 18\% 18\%
+Average time (s) & $284.3 \pm 1.1$ & $587.9 \pm 3.9$ & $1223.6 \pm 4.9$ & $5966.4 \pm 9.5$ & $13229.5 \pm 12.7$ \\
+SD & $1.6$ & $5.5$ & $6.8$ & $13.2$ & $18.6$ \\
+\hline
+
+\end{tabular}
+\vspace{-1mm}
+}
+\end{center}
+\caption{Our algorithm: average time in seconds for constructing a MPHF,
+the standard deviation (SD), and the confidence intervals considering
+a confidence level of $95\%$.
+}
+\label{tab:mediasbrz}
+\vspace{-5mm}
+\end{table*}
+
+Figure~\ref{fig:brz_temporegressao}
+presents the runtime for each trial. In addition,
+the solid line corresponds to a linear regression model
+obtained from the experimental measurements.
+As we expected, the runtime for a given $n$ has almost no
+variation.
+
+\begin{figure}[htb]
+\begin{center}
+\scalebox{0.4}{\includegraphics{figs/brz_temporegressao.eps}}
+\caption{Time versus number of keys in $S$ for our algorithm. The solid line corresponds to
+a linear regression model.}
+\label{fig:brz_temporegressao}
+\end{center}
+\vspace{-9mm}
+\end{figure}
+
+An intriguing observation is that the runtime of the algorithm is almost
+deterministic, in spite of the fact that it uses as building block an
+algorithm with a considerable fluctuation in its runtime. A given bucket~$i$,
+$0 \leq i < \lceil n/b \rceil$, is a small set of keys (at most 256 keys) and,
+as argued in Section~\ref{sec:intern-memory-algor}, the runtime of the
+building block algorithm is a random variable~$X_i$ with high fluctuation.
+However, the runtime~$Y$ of the searching step of our algorithm is given
+by~$Y=\sum_{0\leq i<\lceil n/b\rceil}X_i$. Under the hypothesis that
+the~$X_i$ are independent and bounded, the {\it law of large numbers} (see,
+e.g., \cite{j91}) implies that the random variable $Y/\lceil n/b\rceil$
+converges to a constant as~$n\to\infty$. This explains why the runtime of our
+algorithm is almost deterministic.
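The concentration argument can be checked with a quick simulation. The sketch below is illustrative only: the bounded uniform draws stand in for the per-bucket runtimes $X_i$ and are not the actual runtime distribution. It shows that although each draw fluctuates by a factor of 10, the relative standard deviation of the total $Y=\sum_i X_i$ shrinks as the number of buckets grows.

```python
import random

def relative_sd(num_buckets, trials=200, seed=42):
    """Relative standard deviation of Y = sum of X_i over `trials`
    simulated runs, where each X_i is a bounded draw with a large
    individual fluctuation (a stand-in for a per-bucket runtime)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        totals.append(sum(rng.uniform(1.0, 10.0) for _ in range(num_buckets)))
    mean = sum(totals) / trials
    var = sum((t - mean) ** 2 for t in totals) / trials
    return (var ** 0.5) / mean

# with many buckets the total concentrates around its mean
few, many = relative_sd(10), relative_sd(1000)
```

The relative deviation drops roughly as the inverse square root of the number of buckets, which is the behavior the law of large numbers predicts for sums of independent bounded variables.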
+
+
diff --git a/vldb07/references.bib b/vldb07/references.bib
new file mode 100755
index 0000000..d2ea475
--- /dev/null
+++ b/vldb07/references.bib
@@ -0,0 +1,814 @@
+
+@InProceedings{Brin1998,
+  author = "Sergey Brin and Lawrence Page",
+  title = "The Anatomy of a Large-Scale Hypertextual Web Search Engine",
+  booktitle = "Proceedings of the 7th International {World Wide Web}
+               Conference",
+  pages = "107--117",
+  address = "Brisbane, Australia",
+  month = "April",
+  year = 1998,
+  annote = "The Google paper."
+}
+
+@inproceedings{p99,
+ author = {R. Pagh},
+ title = {Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions},
+ booktitle = {Workshop on Algorithms and Data Structures},
+ pages = {49-54},
+ year = 1999,
+ url = {citeseer.nj.nec.com/pagh99hash.html},
+ key = {author}
+}
+
+@article{p00,
+ author = {R. Pagh},
+ title = {Faster deterministic dictionaries},
+ journal = {Symposium on Discrete Algorithms (ACM SODA)},
+ OPTvolume = {43},
+ OPTnumber = {5},
+ pages = {487--493},
+ year = {2000}
+}
+@article{g81,
+ author = {G. H. Gonnet},
+ title = {Expected Length of the Longest Probe Sequence in Hash Code Searching},
+ journal = {J. ACM},
+ volume = {28},
+ number = {2},
+ year = {1981},
+ issn = {0004-5411},
+ pages = {289--304},
+ doi = {http://doi.acm.org/10.1145/322248.322254},
+ publisher = {ACM Press},
+ address = {New York, NY, USA},
+ }
+
+@misc{r04,
+ author = "S. Rao",
+ title = "Combinatorial Algorithms Data Structures",
+ year = 2004,
+ howpublished = {CS 270 Spring},
+ url = "citeseer.ist.psu.edu/700201.html"
+}
+@article{ra98,
+ author = {Martin Raab and Angelika Steger},
+ title = {``{B}alls into Bins'' --- {A} Simple and Tight Analysis},
+ journal = {Lecture Notes in Computer Science},
+ volume = 1518,
+ pages = {159--170},
+ year = 1998,
+ url = "citeseer.ist.psu.edu/raab98balls.html"
+}
+
+@misc{mrs00,
+ author = "M. Mitzenmacher and A. Richa and R.
Sitaraman", + title = "The power of two random choices: A survey of the techniques and results", + howpublished={In Handbook of Randomized + Computing, P. Pardalos, S. Rajasekaran, and J. Rolim, Eds. Kluwer}, + year = "2000", + url = "citeseer.ist.psu.edu/article/mitzenmacher00power.html" +} + +@article{dfm02, + author = {E. Drinea and A. Frieze and M. Mitzenmacher}, + title = {Balls and bins models with feedback}, + journal = {Symposium on Discrete Algorithms (ACM SODA)}, + pages = {308--315}, + year = {2002} +} +@Article{j97, + author = {Bob Jenkins}, + title = {Algorithm Alley: Hash Functions}, + journal = {Dr. Dobb's Journal of Software Tools}, + volume = {22}, + number = {9}, + month = {september}, + year = {1997} +} + +@article{gss01, + author = {N. Galli and B. Seybold and K. Simon}, + title = {Tetris-Hashing or optimal table compression}, + journal = {Discrete Applied Mathematics}, + volume = {110}, + number = {1}, + pages = {41--58}, + month = {june}, + publisher = {Elsevier Science}, + year = {2001} +} + +@article{s05, + author = {M. Seltzer}, + title = {Beyond Relational Databases}, + journal = {ACM Queue}, + volume = {3}, + number = {3}, + month = {April}, + year = {2005} +} + +@InProceedings{ss89, + author = {P. Schmidt and A. Siegel}, + title = {On aspects of universality and performance for closed hashing}, + booktitle = {Proc. 21th Ann. ACM Symp. on Theory of Computing -- STOC'89}, + month = {May}, + year = {1989}, + pages = {355--366} +} + +@article{asw00, + author = {M. Atici and D. R. Stinson and R. Wei.}, + title = {A new practical algorithm for the construction of a perfect hash function}, + journal = {Journal Combin. Math. Combin. Comput.}, + volume = {35}, + pages = {127--145}, + year = {2000} +} + +@article{swz00, + author = {D. R. Stinson and R. Wei and L. Zhu}, + title = {New constructions for perfect hash families and related structures using combinatorial designs and codes}, + journal = {Journal Combin. 
Designs.},
+ volume = {8},
+ pages = {189--200},
+ year = {2000}
+}
+
+@inproceedings{ht01,
+ author = {T. Hagerup and T. Tholey},
+ title = {Efficient minimal perfect hashing in nearly minimal space},
+ booktitle = {The 18th Symposium on Theoretical Aspects of Computer Science (STACS), volume 2010 of Lecture Notes in Computer Science},
+ year = 2001,
+ pages = {317--326},
+ key = {author}
+}
+
+@inproceedings{dh01,
+ author = {M. Dietzfelbinger and T. Hagerup},
+ title = {Simple minimal perfect hashing in less space},
+ booktitle = {The 9th European Symposium on Algorithms (ESA), volume 2161 of Lecture Notes in Computer Science},
+ year = 2001,
+ pages = {109--120},
+ key = {author}
+}
+
+
+@MastersThesis{mar00,
+ author = {M. S. Neubert},
+ title = {Algoritmos Distribu\'{\i}dos para a Constru\c{c}\~ao de Arquivos Invertidos},
+ school = {Departamento de Ci\^encia da Computa\c{c}\~ao, Universidade Federal de Minas Gerais},
+ year = 2000,
+ month = {Mar\c{c}o},
+ key = {author}
+}
+
+
+@Book{clrs01,
+ author = {T. H. Cormen and C. E. Leiserson and R. L. Rivest and C. Stein},
+ title = {Introduction to Algorithms},
+ publisher = {MIT Press},
+ year = {2001},
+ edition = {second},
+}
+
+@Book{j91,
+ author = {R. Jain},
+ title = {The art of computer systems performance analysis: techniques for experimental design, measurement, simulation, and modeling},
+ publisher = {John Wiley},
+ year = {1991},
+ edition = {first}
+}
+
+@Book{k73,
+ author = {D. E. Knuth},
+ title = {The Art of Computer Programming: Sorting and Searching},
+ publisher = {Addison-Wesley},
+ volume = {3},
+ year = {1973},
+ edition = {second},
+}
+
+@inproceedings{rp99,
+ author = {R. Pagh},
+ title = {Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions},
+ booktitle = {Workshop on Algorithms and Data Structures},
+ pages = {49-54},
+ year = 1999,
+ url = {citeseer.nj.nec.com/pagh99hash.html},
+ key = {author}
+}
+
+@inproceedings{hmwc93,
+ author = {G. Havas and B.S. Majewski and N.C. Wormald and Z.J.
Czech},
+ title = {Graphs, Hypergraphs and Hashing},
+ booktitle = {19th International Workshop on Graph-Theoretic Concepts in Computer Science},
+ publisher = {Springer Lecture Notes in Computer Science vol. 790},
+ pages = {153-165},
+ year = 1993,
+ key = {author}
+}
+
+@inproceedings{bkz05,
+ author = {F.C. Botelho and Y. Kohayakawa and N. Ziviani},
+ title = {A Practical Minimal Perfect Hashing Method},
+ booktitle = {4th International Workshop on Efficient and Experimental Algorithms},
+ publisher = {Springer Lecture Notes in Computer Science vol. 3503},
+ pages = {488-500},
+ month = {May},
+ year = 2005,
+ key = {author}
+}
+
+@Article{chm97,
+ author = {Z.J. Czech and G. Havas and B.S. Majewski},
+ title = {Fundamental Study Perfect Hashing},
+ journal = {Theoretical Computer Science},
+ volume = {182},
+ year = {1997},
+ pages = {1-143},
+ key = {author}
+}
+
+@article{chm92,
+ author = {Z.J. Czech and G. Havas and B.S. Majewski},
+ title = {An Optimal Algorithm for Generating Minimal Perfect Hash Functions},
+ journal = {Information Processing Letters},
+ volume = {43},
+ number = {5},
+ pages = {257-264},
+ year = {1992},
+ url = {citeseer.nj.nec.com/czech92optimal.html},
+ key = {author}
+}
+
+@Article{mwhc96,
+ author = {B.S. Majewski and N.C. Wormald and G. Havas and Z.J. Czech},
+ title = {A family of perfect hashing methods},
+ journal = {The Computer Journal},
+ year = {1996},
+ volume = {39},
+ number = {6},
+ pages = {547-554},
+ key = {author}
+}
+
+@InProceedings{bv04,
+author = {P. Boldi and S. Vigna},
+title = {The WebGraph Framework I: Compression Techniques},
+booktitle = {13th International World Wide Web Conference},
+pages = {595--602},
+year = {2004}
+}
+
+
+@Book{z04,
+ author = {N. Ziviani},
+ title = {Projeto de Algoritmos com implementa\c{c}\~oes em Pascal e C},
+ publisher = {Pioneira Thompson},
+ year = 2004,
+ edition = {segunda edi\c{c}\~ao}
+}
+
+
+@Book{p85,
+ author = {E. M.
Palmer}, + title = {Graphical Evolution: An Introduction to the Theory of Random Graphs}, + publisher = {John Wiley \& Sons}, + year = {1985}, + address = {New York} +} + +@Book{imb99, + author = {I.H. Witten and A. Moffat and T.C. Bell}, + title = {Managing Gigabytes: Compressing and Indexing Documents and Images}, + publisher = {Morgan Kaufmann Publishers}, + year = 1999, + edition = {second edition} +} +@Book{wfe68, + author = {W. Feller}, + title = { An Introduction to Probability Theory and Its Applications}, + publisher = {Wiley}, + year = 1968, + volume = 1, + optedition = {second edition} +} + + +@Article{fhcd92, + author = {E.A. Fox and L. S. Heath and Q. Chen and A.M. Daoud}, + title = {Practical Minimal Perfect Hash Functions For Large Databases}, + journal = {Communications of the ACM}, + year = {1992}, + volume = {35}, + number = {1}, + pages = {105--121} +} + + +@inproceedings{fch92, + author = {E.A. Fox and Q.F. Chen and L.S. Heath}, + title = {A Faster Algorithm for Constructing Minimal Perfect Hash Functions}, + booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference + on Research and Development in Information Retrieval}, + year = {1992}, + pages = {266-273}, +} + +@article{c80, + author = {R.J. Cichelli}, + title = {Minimal perfect hash functions made simple}, + journal = {Communications of the ACM}, + volume = {23}, + number = {1}, + year = {1980}, + issn = {0001-0782}, + pages = {17--19}, + doi = {http://doi.acm.org/10.1145/358808.358813}, + publisher = {ACM Press}, + } + + +@TechReport{fhc89, + author = {E.A. Fox and L.S. Heath and Q.F. Chen}, + title = {An $O(n\log n)$ algorithm for finding minimal perfect hash functions}, + institution = {Virginia Polytechnic Institute and State University}, + year = {1989}, + OPTkey = {}, + OPTtype = {}, + OPTnumber = {}, + address = {Blacksburg, VA}, + month = {April}, + OPTnote = {}, + OPTannote = {} +} + +@TechReport{bkz06t, + author = {F.C. Botelho and Y. Kohayakawa and N. 
Ziviani}, + title = {An Approach for Minimal Perfect Hash Functions in Very Large Databases}, + institution = {Department of Computer Science, Federal University of Minas Gerais}, + note = {Available at http://www.dcc.ufmg.br/\texttt{\~ }nivio/pub/technicalreports.html}, + year = {2006}, + OPTkey = {}, + OPTtype = {}, + number = {RT.DCC.003}, + address = {Belo Horizonte, MG, Brazil}, + month = {April}, + OPTannote = {} +} + +@inproceedings{fcdh90, + author = {E.A. Fox and Q.F. Chen and A.M. Daoud and L.S. Heath}, + title = {Order preserving minimal perfect hash functions and information retrieval}, + booktitle = {Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval}, + year = {1990}, + isbn = {0-89791-408-2}, + pages = {279--311}, + location = {Brussels, Belgium}, + doi = {http://doi.acm.org/10.1145/96749.98233}, + publisher = {ACM Press}, + } + +@Article{fkp89, + author = {P. Flajolet and D. E. Knuth and B. Pittel}, + title = {The first cycles in an evolving graph}, + journal = {Discrete Math}, + year = {1989}, + volume = {75}, + pages = {167-215}, +} + +@Article{s77, + author = {R. Sprugnoli}, + title = {Perfect Hashing Functions: A Single Probe Retrieving + Method For Static Sets}, + journal = {Communications of the ACM}, + year = {1977}, + volume = {20}, + number = {11}, + pages = {841--850}, + month = {November}, +} + +@Article{j81, + author = {G. Jaeschke}, + title = {Reciprocal Hashing: A method For Generating Minimal Perfect + Hashing Functions}, + journal = {Communications of the ACM}, + year = {1981}, + volume = {24}, + number = {12}, + month = {December}, + pages = {829--833} +} + +@Article{c84, + author = {C. C. Chang}, + title = {The Study Of An Ordered Minimal Perfect Hashing Scheme}, + journal = {Communications of the ACM}, + year = {1984}, + volume = {27}, + number = {4}, + month = {December}, + pages = {384--387} +} + +@Article{c86, + author = {C. C. 
Chang}, + title = {Letter-Oriented Reciprocal Hashing Scheme}, + journal = {Inform. Sci.}, + year = {1986}, + volume = {27}, + pages = {243--255} +} + +@Article{cl86, + author = {C. C. Chang and R. C. T. Lee}, + title = {A Letter-Oriented Minimal Perfect Hashing Scheme}, + journal = {Computer Journal}, + year = {1986}, + volume = {29}, + number = {3}, + month = {June}, + pages = {277--281} +} + + +@Article{cc88, + author = {C. C. Chang and C. H. Chang}, + title = {An Ordered Minimal Perfect Hashing Scheme with Single Parameter}, + journal = {Inform. Process. Lett.}, + year = {1988}, + volume = {27}, + number = {2}, + month = {February}, + pages = {79--83} +} + +@Article{w90, + author = {V. G. Winters}, + title = {Minimal Perfect Hashing in Polynomial Time}, + journal = {BIT}, + year = {1990}, + volume = {30}, + number = {2}, + pages = {235--244} +} + +@Article{fcdh91, + author = {E. A. Fox and Q. F. Chen and A. M. Daoud and L. S. Heath}, + title = {Order Preserving Minimal Perfect Hash Functions and Information Retrieval}, + journal = {ACM Trans. Inform. Systems}, + year = {1991}, + volume = {9}, + number = {3}, + month = {July}, + pages = {281--308} +} + +@Article{fks84, + author = {M. L. Fredman and J. Koml\'os and E. Szemer\'edi}, + title = {Storing a sparse table with {O(1)} worst case access time}, + journal = {J. ACM}, + year = {1984}, + volume = {31}, + number = {3}, + month = {July}, + pages = {538--544} +} + +@Article{dhjs83, + author = {M. W. Du and T. M. Hsieh and K. F. Jea and D. W. Shieh}, + title = {The study of a new perfect hash scheme}, + journal = {IEEE Trans. Software Eng.}, + year = {1983}, + volume = {9}, + number = {3}, + month = {May}, + pages = {305--313} +} + +@Article{bt94, + author = {M. D. Brain and A. L. Tharp}, + title = {Using Tries to Eliminate Pattern Collisions in Perfect Hashing}, + journal = {IEEE Trans. 
on Knowledge and Data Eng.}, + year = {1994}, + volume = {6}, + number = {2}, + month = {April}, + pages = {239--247} +} + +@Article{bt90, + author = {M. D. Brain and A. L. Tharp}, + title = {Perfect hashing using sparse matrix packing}, + journal = {Inform. Systems}, + year = {1990}, + volume = {15}, + number = {3}, + OPTmonth = {April}, + pages = {281--290} +} + +@Article{ckw93, + author = {C. C. Chang and H. C.Kowng and T. C. Wu}, + title = {A refinement of a compression-oriented addressing scheme}, + journal = {BIT}, + year = {1993}, + volume = {33}, + number = {4}, + OPTmonth = {April}, + pages = {530--535} +} + +@Article{cw91, + author = {C. C. Chang and T. C. Wu}, + title = {A letter-oriented perfect hashing scheme based upon sparse table compression}, + journal = {Software -- Practice Experience}, + year = {1991}, + volume = {21}, + number = {1}, + month = {january}, + pages = {35--49} +} + +@Article{ty79, + author = {R. E. Tarjan and A. C. C. Yao}, + title = {Storing a sparse table}, + journal = {Comm. ACM}, + year = {1979}, + volume = {22}, + number = {11}, + month = {November}, + pages = {606--611} +} + +@Article{yd85, + author = {W. P. Yang and M. W. Du}, + title = {A backtracking method for constructing perfect hash functions from a set of mapping functions}, + journal = {BIT}, + year = {1985}, + volume = {25}, + number = {1}, + pages = {148--164} +} + +@Article{s85, + author = {T. J. Sager}, + title = {A polynomial time generator for minimal perfect hash functions}, + journal = {Commun. ACM}, + year = {1985}, + volume = {28}, + number = {5}, + month = {May}, + pages = {523--532} +} + +@Article{cm93, + author = {Z. J. Czech and B. S. Majewski}, + title = {A linear time algorithm for finding minimal perfect hash functions}, + journal = {The computer Journal}, + year = {1993}, + volume = {36}, + number = {6}, + pages = {579--587} +} + +@Article{gbs94, + author = {R. Gupta and S. Bhaskar and S. 
Smolka}, + title = {On randomization in sequential and distributed algorithms}, + journal = {ACM Comput. Surveys}, + year = {1994}, + volume = {26}, + number = {1}, + month = {March}, + pages = {7--86} +} + +@InProceedings{sb84, + author = {C. Slot and P. V. E. Boas}, + title = {On tape versus core; an application of space efficient perfect hash functions to the + invariance of space}, + booktitle = {Proc. 16th Ann. ACM Symp. on Theory of Computing -- STOC'84}, + address = {Washington}, + month = {May}, + year = {1984}, + pages = {391--400}, +} + +@InProceedings{wi90, + author = {V. G. Winters}, + title = {Minimal perfect hashing for large sets of data}, + booktitle = {Internat. Conf. on Computing and Information -- ICCI'90}, + address = {Canada}, + month = {May}, + year = {1990}, + pages = {275--284}, +} + +@InProceedings{lr85, + author = {P. Larson and M. V. Ramakrishna}, + title = {External perfect hashing}, + booktitle = {Proc. ACM SIGMOD Conf.}, + address = {Austin TX}, + month = {June}, + year = {1985}, + pages = {190--199}, +} + +@Book{m84, + author = {K. Mehlhorn}, + editor = {W. Brauer and G. Rozenberg and A. Salomaa}, + title = {Data Structures and Algorithms 1: Sorting and Searching}, + publisher = {Springer-Verlag}, + year = {1984}, +} + +@PhdThesis{c92, + author = {Q. F. Chen}, + title = {An Object-Oriented Database System for Efficient Information Retrieval Appliations}, + school = {Virginia Tech Dept. of Computer Science}, + year = {1992}, + month = {March} +} + +@article {er59, + AUTHOR = {Erd{\H{o}}s, P. and R{\'e}nyi, A.}, + TITLE = {On random graphs {I}}, + JOURNAL = {Pub. Math. Debrecen}, + VOLUME = {6}, + YEAR = {1959}, + PAGES = {290--297}, + MRCLASS = {05.00}, + MRNUMBER = {MR0120167 (22 \#10924)}, +MRREVIEWER = {A. Dvoretzky}, +} + + +@article {erdos61, + AUTHOR = {Erd{\H{o}}s, P. and R{\'e}nyi, A.}, + TITLE = {On the evolution of random graphs}, + JOURNAL = {Bull. Inst. Internat. 
Statist.}, + VOLUME = 38, + YEAR = 1961, + PAGES = {343--347}, + MRCLASS = {05.40 (55.10)}, + MRNUMBER = {MR0148055 (26 \#5564)}, +} + +@article {er60, + AUTHOR = {Erd{\H{o}}s, P. and R{\'e}nyi, A.}, + TITLE = {On the evolution of random graphs}, + JOURNAL = {Magyar Tud. Akad. Mat. Kutat\'o Int. K\"ozl.}, + VOLUME = {5}, + YEAR = {1960}, + PAGES = {17--61}, + MRCLASS = {05.40}, + MRNUMBER = {MR0125031 (23 \#A2338)}, +MRREVIEWER = {J. Riordan}, +} + +@Article{er60:_Old, + author = {P. Erd{\H{o}}s and A. R\'enyi}, + title = {On the evolution of random graphs}, + journal = {Publications of the Mathematical Institute of the Hungarian + Academy of Sciences}, + year = {1960}, + volume = {56}, + pages = {17-61} +} + +@Article{er61, + author = {P. Erd{\H{o}}s and A. R\'enyi}, + title = {On the strength of connectedness of a random graph}, + journal = {Acta Mathematica Scientia Hungary}, + year = {1961}, + volume = {12}, + pages = {261-267} +} + + +@Article{bp04, + author = {B. Bollob\'as and O. Pikhurko}, + title = {Integer Sets with Prescribed Pairwise Differences Being Distinct}, + journal = {European Journal of Combinatorics}, + OPTkey = {}, + OPTvolume = {}, + OPTnumber = {}, + OPTpages = {}, + OPTmonth = {}, + note = {To Appear}, + OPTannote = {} +} + +@Article{pw04:_OLD, + author = {B. Pittel and N. C. Wormald}, + title = {Counting connected graphs inside-out}, + journal = {Journal of Combinatorial Theory}, + OPTkey = {}, + OPTvolume = {}, + OPTnumber = {}, + OPTpages = {}, + OPTmonth = {}, + note = {To Appear}, + OPTannote = {} +} + + +@Article{mr95, + author = {M. Molloy and B. Reed}, + title = {A critical point for random graphs with a given degree sequence}, + journal = {Random Structures and Algorithms}, + year = {1995}, + volume = {6}, + pages = {161-179} +} + +@TechReport{bmz04, + author = {F. C. Botelho and D. Menoti and N. Ziviani}, + title = {A New algorithm for constructing minimal perfect hash functions}, + institution = {Federal Univ. 
of Minas Gerais}, + year = {2004}, + OPTkey = {}, + OPTtype = {}, + number = {TR004}, + OPTaddress = {}, + OPTmonth = {}, + note = {(http://www.dcc.ufmg.br/\texttt{\~ }nivio/pub/technicalreports.html)}, + OPTannote = {} +} + +@Article{mr98, + author = {M. Molloy and B. Reed}, + title = {The size of the giant component of a random graph with a given degree sequence}, + journal = {Combinatorics, Probability and Computing}, + year = {1998}, + volume = {7}, + pages = {295-305} +} + +@misc{h98, + author = {D. Hawking}, + title = {Overview of TREC-7 Very Large Collection Track (Draft for Notebook)}, + url = {citeseer.ist.psu.edu/4991.html}, + year = {1998}} + +@book {jlr00, + AUTHOR = {Janson, S. and {\L}uczak, T. and Ruci{\'n}ski, A.}, + TITLE = {Random graphs}, + PUBLISHER = {Wiley-Inter.}, + YEAR = 2000, + PAGES = {xii+333}, + ISBN = {0-471-17541-2}, + MRCLASS = {05C80 (60C05 82B41)}, + MRNUMBER = {2001k:05180}, +MRREVIEWER = {Mark R. Jerrum}, +} + +@incollection {jlr90, + AUTHOR = {Janson, Svante and {\L}uczak, Tomasz and Ruci{\'n}ski, + Andrzej}, + TITLE = {An exponential bound for the probability of nonexistence of a + specified subgraph in a random graph}, + BOOKTITLE = {Random graphs '87 (Pozna\'n, 1987)}, + PAGES = {73--87}, + PUBLISHER = {Wiley}, + ADDRESS = {Chichester}, + YEAR = 1990, + MRCLASS = {05C80 (60C05)}, + MRNUMBER = {91m:05168}, +MRREVIEWER = {J. Spencer}, +} + +@book {b01, + AUTHOR = {Bollob{\'a}s, B.}, + TITLE = {Random graphs}, + SERIES = {Cambridge Studies in Advanced Mathematics}, + VOLUME = 73, + EDITION = {Second}, + PUBLISHER = {Cambridge University Press}, + ADDRESS = {Cambridge}, + YEAR = 2001, + PAGES = {xviii+498}, + ISBN = {0-521-80920-7; 0-521-79722-5}, + MRCLASS = {05C80 (60C05)}, + MRNUMBER = {MR1864966 (2002j:05132)}, +} + +@article {pw04, + AUTHOR = {Pittel, Boris and Wormald, Nicholas C.}, + TITLE = {Counting connected graphs inside-out}, + JOURNAL = {J. Combin. Theory Ser. B}, + FJOURNAL = {Journal of Combinatorial Theory. 
Series B}, + VOLUME = 93, + YEAR = 2005, + NUMBER = 2, + PAGES = {127--172}, + ISSN = {0095-8956}, + CODEN = {JCBTB8}, + MRCLASS = {05C30 (05A16 05C40 05C80)}, + MRNUMBER = {MR2117934 (2005m:05117)}, +MRREVIEWER = {Edward A. Bender}, +} diff --git a/vldb07/relatedwork.tex b/vldb07/relatedwork.tex new file mode 100755 index 0000000..7693002 --- /dev/null +++ b/vldb07/relatedwork.tex @@ -0,0 +1,112 @@ +% Time-stamp: +\vspace{-3mm} +\section{Related work} +\label{sec:relatedprevious-work} +\vspace{-2mm} + +% Optimal speed for hashing means that each key from the key set $S$ +% will map to an unique location in the hash table, avoiding time wasted +% in resolving collisions. That is achieved with a MPHF and +% because of that many algorithms for constructing static +% and dynamic MPHFs, when static or dynamic sets are involved, +% were developed. Our focus has been on static MPHFs, since +% in many applications the key sets change slowly, if at all~\cite{s05}. + +\enlargethispage{2\baselineskip} +Czech, Havas and Majewski~\cite{chm97} provide a +comprehensive survey of the most important theoretical and practical results +on perfect hashing. +In this section we review some of the most important results. +%We also present more recent algorithms that share some features with +%the one presented hereinafter. + +Fredman, Koml\'os and Szemer\'edi~\cite{FKS84} showed that it is possible to +construct space efficient perfect hash functions that can be evaluated in +constant time with table sizes that are linear in the number of keys: +$m=O(n)$. In their model of computation, an element of the universe~$U$ fits +into one machine word, and arithmetic operations and memory accesses have unit +cost. Randomized algorithms in the FKS model can construct a perfect hash +function in expected time~$O(n)$: +this is the case of our algorithm and the works in~\cite{chm92,p99}. 
+
+Mehlhorn~\cite{m84} showed
+that at least $\Omega((1/\ln 2)n + \ln\ln u)$ bits are
+required to represent a MPHF (i.e., at least 1.4427 bits per
+key must be stored).
+To the best of our knowledge our algorithm
+is the first one capable of generating MPHFs for key sets on the order
+of a billion keys, and the generated functions
+require less than 9 bits per key to be stored.
+This is one order of magnitude larger than the largest
+key set for which a MPHF had been obtained in the literature~\cite{bkz05}.
+%which is close to the lower bound presented in~\cite{m84}.
+
+Some work on minimal perfect hashing has been done under the assumption that
+the algorithm can pick and store truly random functions~\cite{bkz05,chm92,p99}.
+Since the space requirements for truly random functions make them unsuitable for
+implementation, one has to settle for pseudo-random functions in practice.
+Empirical studies show that limited randomness properties are often as good as
+total randomness.
+We verified this phenomenon in our experiments by using the universal hash
+function proposed by Jenkins~\cite{j97}, which is
+efficient at retrieval time and requires just an integer to be used as a
+random seed (the function is completely determined by the seed).
+% The works~\cite{asw00,swz00} present algorithms to construct
+% PHFs and MPHFs deterministically.
+% The generated functions require $O(n \log(n) + \log(\log(u)))$ bits to be described.
+% The average-case complexity of the algorithms that generate the functions is
+% $O(n\log(n) \log( \log (u)))$ and the worst case is $O(n^3\log(n) \log(\log(u)))$.
+% The evaluation complexity of the functions is $O(\log(n) + \log(\log(u)))$. 
+% Thus, the algorithms do not generate functions that can be evaluated in
+% $O(1)$ time, they are a factor of $\log n$ away from the optimal complexity to describe
+% PHFs and MPHFs (Mehlhorn shows in~\cite{m84}
+% that storing a PHF requires at least
+% $\Omega(n^2/(2\ln 2) m + \log\log u)$ bits), and they do not generate the
+% functions in linear time.
+% Moreover, the universe $U$ of keys is restricted to integers, which may
+% limit their use in practice.
+
+Pagh~\cite{p99} proposed a family of randomized algorithms for
+constructing MPHFs
+in which the resulting function has the form $h(x) = (f(x) + d[g(x)]) \bmod n$,
+where $f$ and $g$ are universal hash functions and $d$ is a set of
+displacement values that resolve the collisions caused by the function $f$.
+Pagh identified a set of conditions concerning $f$ and $g$ and showed
+that if these conditions are satisfied, then a minimal perfect hash
+function can be computed in expected time $O(n)$ and stored in
+$(2+\epsilon)n\log_2n$ bits.
+
+Dietzfelbinger and Hagerup~\cite{dh01} improved the result of~\cite{p99},
+reducing the number of bits required to store the function from
+$(2+\epsilon)n\log_2n$ to $(1+\epsilon)n\log_2n$, but in their approach~$f$ and~$g$ must
+be chosen from a class
+of hash functions that meet additional requirements.
+%Differently from the works in~\cite{dh01, p99}, our algorithm generates a MPHF
+%$h$ in expected linear time and $h$ can be stored in $O(n)$ bits (9 bits per key).
+
+% Galli, Seybold and Simon~\cite{gss01} proposed a randomized algorithm
+% that generates MPHFs of the same form as those generated by the algorithms of Pagh~\cite{p99}
+% and of Dietzfelbinger and Hagerup~\cite{dh01}. However, they defined the form of the
+% functions as $f(k) = h_c(k) \bmod n$ and $g(k) = \lfloor h_c(k)/n \rfloor$ to obtain in expected time $O(n)$ a function that can be described in $O(n\log n)$ bits, where
+% $h_c(k) = (ck \bmod p) \bmod n^2$, $1 \leq c \leq p-1$, and $p$ is a prime larger than $u$.
+%Our algorithm is the first one capable of generating MPHFs for sets on the order of
+%a billion keys. This is because we do not need to keep in main memory
+%at generation time complex data structures such as a graph, lists and so on. We just need to maintain
+%a small vector that occupies around 8MB for a set of 1 billion keys.
+
+Fox et al.~\cite{fch92,fhcd92} studied MPHFs
+%that also share features with the ones generated by our algorithm.
+that bring the storage requirements down to between 2 and 4 bits per key,
+below what we obtain.
+However, it is shown in~\cite[Section 6.7]{chm97} that their algorithms have exponential
+running times, and in our implementation of their algorithm it did not scale to
+sets larger than 11 million keys.
+
+Our previous work~\cite{bkz05} improves the one by Czech, Havas and Majewski~\cite{chm92}:
+we obtained more compact functions in less time. Although
+the algorithm in~\cite{bkz05} is the fastest algorithm
+we know of, the resulting functions are stored in $O(n\log n)$ bits, and
+at generation time one needs to keep in main memory a random graph of $n$ edges
+and $cn$ vertices,
+where $c\in[0.93,1.15]$. Using the well-known divide-and-conquer approach,
+we use that algorithm as a building block for the new one, in which the
+resulting functions are stored in $O(n)$ bits.
diff --git a/vldb07/searching.tex b/vldb07/searching.tex
new file mode 100755
index 0000000..8feb6f1
--- /dev/null
+++ b/vldb07/searching.tex
@@ -0,0 +1,155 @@
+%% Nivio: 22/jan/06
+% Time-stamp:
+\vspace{-7mm}
+\subsection{Searching step}
+\label{sec:searching}
+
+\enlargethispage{2\baselineskip}
+The searching step is responsible for generating a MPHF for each
+bucket. 
+Figure~\ref{fig:searchingstep} presents the searching step algorithm.
+\vspace{-2mm}
+\begin{figure}[h]
+%\centering
+\hrule
+\hrule
+\vspace{2mm}
+\begin{tabbing}
+aa\=type booleanx \== (false, true); \kill
+\> $\blacktriangleright$ Let $H$ be a minimum heap of size $N$, where the \\
+\> ~~ order relation in $H$ is given by Eq.~(\ref{eq:bucketindex}), that is, the\\
+\> ~~ remove operation removes the item with smallest $i$\\
+\> $1.$ {\bf for} $j = 1$ {\bf to} $N$ {\bf do} \{ Heap construction \}\\
+\> ~~ $1.1$ Read key $k$ from File $j$ on disk\\
+\> ~~ $1.2$ Insert $(i, j, k)$ in $H$ \\
+\> $2.$ {\bf for} $i = 0$ {\bf to} $\lceil n/b \rceil - 1$ {\bf do} \\
+\> ~~ $2.1$ Read bucket $i$ from disk driven by heap $H$ \\
+\> ~~ $2.2$ Generate a MPHF for bucket $i$ \\
+\> ~~ $2.3$ Write the description of MPHF$_i$ to the disk
+\end{tabbing}
+\vspace{-1mm}
+\hrule
+\hrule
+\caption{Searching step}
+\label{fig:searchingstep}
+\vspace{-4mm}
+\end{figure}
+
+Statement 1 of Figure~\ref{fig:searchingstep} inserts one key from each file
+into a minimum heap $H$ of size $N$.
+The order relation in $H$ is the bucket address $i$ obtained from
+Eq.~(\ref{eq:bucketindex}).
+
+%\enlargethispage{-\baselineskip}
+Statement 2 has two important steps.
+In statement 2.1, a bucket is read from disk,
+as described below.
+%in Section~\ref{sec:readingbucket}.
+In statement 2.2, a MPHF is generated for each bucket $i$, as described
+next.
+%in Section~\ref{sec:mphfbucket}.
+The description of MPHF$_i$ is a vector $g_i$ of 8-bit integers.
+Finally, statement 2.3 writes the description $g_i$ of MPHF$_i$ to disk.
+
+\vspace{-3mm}
+\label{sec:readingbucket}
+\subsubsection{Reading a bucket from disk.}
+
+In this section we present the refinement of statement 2.1 of
+Figure~\ref{fig:searchingstep}.
+The algorithm to read bucket $i$ from disk is presented
+in Figure~\ref{fig:readingbucket}. 
+
+\begin{figure}[h]
+\hrule
+\hrule
+\vspace{2mm}
+\begin{tabbing}
+aa\=type booleanx \== (false, true); \kill
+\> $1.$ {\bf while} bucket $i$ is not full {\bf do} \\
+\> ~~ $1.1$ Remove $(i, j, k)$ from $H$\\
+\> ~~ $1.2$ Insert $k$ into bucket $i$ \\
+\> ~~ $1.3$ Read sequentially all keys $k$ from File $j$ that have \\
+\> ~~~~~~~ the same $i$ and insert them into bucket $i$ \\
+\> ~~ $1.4$ Insert the triple $(i, j, x)$ in $H$, where $x$ is the first \\
+\> ~~~~~~~ key read from File $j$ that does not have the \\
+\> ~~~~~~~ same bucket index $i$
+\end{tabbing}
+\hrule
+\hrule
+\vspace{-1.0mm}
+\caption{Reading a bucket}
+\vspace{-4.0mm}
+\label{fig:readingbucket}
+\end{figure}
+
+Bucket $i$ is distributed among many files and the heap $H$ is used to drive a
+multiway merge operation.
+In Figure~\ref{fig:readingbucket}, statement 1.1 removes the triple
+$(i, j, k)$ from $H$, where $i$ is the minimum bucket address in $H$.
+Statement 1.2 inserts key $k$ in bucket $i$.
+Notice that the $k$ in the triple $(i, j, k)$ is in fact a pointer to
+the first byte of the key, which is kept in contiguous positions of an array of characters
+(this array containing the keys is initialized during the heap construction
+in statement 1 of Figure~\ref{fig:searchingstep}).
+Statement 1.3 performs a seek operation in File $j$ on disk for the first
+read operation and then reads sequentially all keys $k$ that have the same $i$
+%(obtained from Eq.~(\ref{eq:bucketindex}))
+and inserts them all in bucket $i$.
+Finally, statement 1.4 inserts in $H$ the triple $(i, j, x)$,
+where $x$ is the first key read from File $j$ (in statement 1.3)
+that does not have the same bucket address as the previous keys.
+
+The number of seek operations on disk performed in statement 1.3 is discussed
+in Section~\ref{sec:linearcomplexity},
+where we present a buffering technique that brings down
+the time spent on seeks. 
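As an illustration only (not the authors' implementation), the heap-driven multiway merge of Figure~\ref{fig:readingbucket} can be sketched in Python. The sketch makes simplifying assumptions: each file is modeled as an in-memory list of keys already ordered by bucket address, `bucket_of` stands in for Eq.~(\ref{eq:bucketindex}), and the heap stores `(bucket address, file index)` pairs instead of the paper's `(i, j, k)` triples.

```python
import heapq

def read_buckets(files, bucket_of):
    """Merge N key files into buckets, driven by a minimum heap.

    `files` is a list of key lists, each ordered by bucket address;
    `bucket_of(k)` maps a key to its bucket address.
    Yields (bucket_address, keys) pairs in increasing address order.
    """
    pos = [0] * len(files)  # next unread position in each file
    # Statement 1 of Figure "Searching step": one entry per file.
    heap = [(bucket_of(f[0]), j) for j, f in enumerate(files) if f]
    heapq.heapify(heap)
    while heap:
        i = heap[0][0]  # smallest bucket address present in the heap
        bucket = []
        # Statements 1.1-1.4: drain every file whose next key
        # belongs to bucket i, reading each run sequentially.
        while heap and heap[0][0] == i:
            _, j = heapq.heappop(heap)
            f = files[j]
            # Read sequentially all keys of File j with address i.
            while pos[j] < len(f) and bucket_of(f[pos[j]]) == i:
                bucket.append(f[pos[j]])
                pos[j] += 1
            # Reinsert File j keyed by its first unread address.
            if pos[j] < len(f):
                heapq.heappush(heap, (bucket_of(f[pos[j]]), j))
        yield i, bucket
```

For example, with `bucket_of` the identity map, merging the files `[[0, 0, 2], [1, 2], [0, 1]]` yields bucket 0 with keys `[0, 0, 0]`, bucket 1 with `[1, 1]`, and bucket 2 with `[2, 2]`.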
+
+\vspace{-2mm}
+\enlargethispage{2\baselineskip}
+\subsubsection{Generating a MPHF for each bucket.} \label{sec:mphfbucket}
+
+To the best of our knowledge, the algorithm we designed in
+our previous work~\cite{bkz05} is the fastest published algorithm for
+constructing MPHFs.
+That is why we use that algorithm as a building block for the
+algorithm presented here.
+
+%\enlargethispage{-\baselineskip}
+Our previous algorithm is a three-step internal memory based algorithm
+that produces a MPHF based on random graphs.
+For a set of $n$ keys, the algorithm outputs the resulting MPHF in expected time $O(n)$.
+For a given bucket $i$, $0 \leq i < \lceil n/b \rceil$, the corresponding MPHF$_i$
+has the following form:
+\begin{eqnarray}
+  \mathrm{MPHF}_i(k) &=& g_i[a] + g_i[b] \label{eq:mphfi}
+\end{eqnarray}
+where $a = h_{i1}(k) \bmod t$, $b = h_{i2}(k) \bmod t$ and
+$t = c\times \mathit{size}[i]$. The functions
+$h_{i1}(k)$ and $h_{i2}(k)$ are instances of the same universal hash function proposed by Jenkins~\cite{j97}
+that was used in the partitioning step described in Section~\ref{sec:partitioning-keys}.
+
+To generate the function above, the algorithm generates simple random graphs
+$G_i = (V_i, E_i)$ with~$|V_i|=t=c\times\mathit{size}[i]$ and $|E_i|=\mathit{size}[i]$, with $c \in [0.93, 1.15]$.
+To generate a simple random graph with high
+probability\footnote{We use the term `with high probability'
+to mean `with probability tending to~$1$ as~$n\to\infty$'.}, two vertices $a$ and $b$ are
+computed for each key $k$ in bucket $i$.
+Thus, each bucket $i$ has a corresponding graph~$G_i=(V_i,E_i)$, where $V_i=\{0,1,
+\ldots,t-1\}$ and $E_i=\big\{\{a,b\}:k \in \mathrm{bucket}\: i\big\}$.
+To get a simple graph,
+the algorithm repeatedly selects $h_{i1}$ and $h_{i2}$ from a family of universal hash functions
+until the corresponding graph is simple.
+The probability of getting a simple graph is $p=e^{-1/c^2}$. 
+For $c=1$, this probability is $p \simeq 0.368$, and the expected number of +iterations to obtain a simple graph is~$1/p \simeq 2.72$. + +The construction of MPHF$_i$ ends with a computation of a suitable labelling of the vertices +of~$G_i$. The labelling is stored into vector $g_i$. +We choose~$g_i[v]$ for each~$v\in V_i$ in such +a way that Eq.~(\ref{eq:mphfi}) is a MPHF for bucket $i$. +In order to get the values of each entry of $g_i$ we first +run a breadth-first search on the 2-\textit{core} of $G_i$, i.e., the maximal subgraph +of~$G_i$ with minimal degree at least~$2$ (see, e.g., \cite{b01,jlr00,pw04}) and +a depth-first search on the acyclic part of $G_i$ (see \cite{bkz05} for details). + diff --git a/vldb07/svglov2.clo b/vldb07/svglov2.clo new file mode 100644 index 0000000..d98306e --- /dev/null +++ b/vldb07/svglov2.clo @@ -0,0 +1,77 @@ +% SVJour2 DOCUMENT CLASS OPTION SVGLOV2 -- for standardised journals +% +% This is an enhancement for the LaTeX +% SVJour2 document class for Springer journals +% +%% +%% +%% \CharacterTable +%% {Upper-case \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z +%% Lower-case \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z +%% Digits \0\1\2\3\4\5\6\7\8\9 +%% Exclamation \! Double quote \" Hash (number) \# +%% Dollar \$ Percent \% Ampersand \& +%% Acute accent \' Left paren \( Right paren \) +%% Asterisk \* Plus \+ Comma \, +%% Minus \- Point \. Solidus \/ +%% Colon \: Semicolon \; Less than \< +%% Equals \= Greater than \> Question mark \? 
+%% Commercial at \@ Left bracket \[ Backslash \\ +%% Right bracket \] Circumflex \^ Underscore \_ +%% Grave accent \` Left brace \{ Vertical bar \| +%% Right brace \} Tilde \~} +\ProvidesFile{svglov2.clo} + [2004/10/25 v2.1 + style option for standardised journals] +\typeout{SVJour Class option: svglov2.clo for standardised journals} +\def\validfor{svjour2} +\ExecuteOptions{final,10pt,runningheads} +% No size changing allowed, hence a copy of size10.clo is included +\renewcommand\normalsize{% + \@setfontsize\normalsize{10.2pt}{4mm}% + \abovedisplayskip=3 mm plus6pt minus 4pt + \belowdisplayskip=3 mm plus6pt minus 4pt + \abovedisplayshortskip=0.0 mm plus6pt + \belowdisplayshortskip=2 mm plus4pt minus 4pt + \let\@listi\@listI} +\normalsize +\newcommand\small{% + \@setfontsize\small{8.7pt}{3.25mm}% + \abovedisplayskip 8.5\p@ \@plus3\p@ \@minus4\p@ + \abovedisplayshortskip \z@ \@plus2\p@ + \belowdisplayshortskip 4\p@ \@plus2\p@ \@minus2\p@ + \def\@listi{\leftmargin\leftmargini + \parsep 0\p@ \@plus1\p@ \@minus\p@ + \topsep 4\p@ \@plus2\p@ \@minus4\p@ + \itemsep0\p@}% + \belowdisplayskip \abovedisplayskip +} +\let\footnotesize\small +\newcommand\scriptsize{\@setfontsize\scriptsize\@viipt\@viiipt} +\newcommand\tiny{\@setfontsize\tiny\@vpt\@vipt} +\newcommand\large{\@setfontsize\large\@xiipt{14pt}} +\newcommand\Large{\@setfontsize\Large\@xivpt{16dd}} +\newcommand\LARGE{\@setfontsize\LARGE\@xviipt{17dd}} +\newcommand\huge{\@setfontsize\huge\@xxpt{25}} +\newcommand\Huge{\@setfontsize\Huge\@xxvpt{30}} +% +%ALT% \def\runheadhook{\rlap{\smash{\lower5pt\hbox to\textwidth{\hrulefill}}}} +\def\runheadhook{\rlap{\smash{\lower11pt\hbox to\textwidth{\hrulefill}}}} +\AtEndOfClass{\advance\headsep by5pt} +\if@twocolumn +\setlength{\textwidth}{17.6cm} +\setlength{\textheight}{230mm} +\AtEndOfClass{\setlength\columnsep{4mm}} +\else +\setlength{\textwidth}{11.7cm} +\setlength{\textheight}{517.5dd} % 19.46cm +\fi +% +\AtBeginDocument{% +\@ifundefined{@journalname} + {\typeout{Unknown 
journal: specify \string\journalname\string{% +\string} in preambel^^J}}{}} +% +\endinput +%% +%% End of file `svglov2.clo'. diff --git a/vldb07/svjour2.cls b/vldb07/svjour2.cls new file mode 100644 index 0000000..56d9216 --- /dev/null +++ b/vldb07/svjour2.cls @@ -0,0 +1,1419 @@ +% SVJour2 DOCUMENT CLASS -- version 2.8 for LaTeX2e +% +% LaTeX document class for Springer journals +% +%% +%% +%% \CharacterTable +%% {Upper-case \A\B\C\D\E\F\G\H\I\J\K\L\M\N\O\P\Q\R\S\T\U\V\W\X\Y\Z +%% Lower-case \a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z +%% Digits \0\1\2\3\4\5\6\7\8\9 +%% Exclamation \! Double quote \" Hash (number) \# +%% Dollar \$ Percent \% Ampersand \& +%% Acute accent \' Left paren \( Right paren \) +%% Asterisk \* Plus \+ Comma \, +%% Minus \- Point \. Solidus \/ +%% Colon \: Semicolon \; Less than \< +%% Equals \= Greater than \> Question mark \? +%% Commercial at \@ Left bracket \[ Backslash \\ +%% Right bracket \] Circumflex \^ Underscore \_ +%% Grave accent \` Left brace \{ Vertical bar \| +%% Right brace \} Tilde \~} +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesClass{svjour2}[2005/08/29 v2.8 +^^JLaTeX document class for Springer journals] +\newcommand\@ptsize{} +\newif\if@restonecol +\newif\if@titlepage +\@titlepagefalse +\DeclareOption{a4paper} + {\setlength\paperheight {297mm}% + \setlength\paperwidth {210mm}} +\DeclareOption{10pt}{\renewcommand\@ptsize{0}} +\DeclareOption{twoside}{\@twosidetrue \@mparswitchtrue} +\DeclareOption{draft}{\setlength\overfullrule{5pt}} +\DeclareOption{final}{\setlength\overfullrule{0pt}} +\DeclareOption{fleqn}{\input{fleqn.clo}\AtBeginDocument{\mathindent\z@}} +\DeclareOption{twocolumn}{\@twocolumntrue\ExecuteOptions{fleqn}} +\newif\if@avier\@avierfalse +\DeclareOption{onecollarge}{\@aviertrue} +\let\if@mathematic\iftrue +\let\if@numbook\iffalse +\DeclareOption{numbook}{\let\if@envcntsect\iftrue + \AtEndOfPackage{% + \renewcommand\thefigure{\thesection.\@arabic\c@figure}% + 
\renewcommand\thetable{\thesection.\@arabic\c@table}% + \renewcommand\theequation{\thesection.\@arabic\c@equation}% + \@addtoreset{figure}{section}% + \@addtoreset{table}{section}% + \@addtoreset{equation}{section}% + }% +} +\DeclareOption{openbib}{% + \AtEndOfPackage{% + \renewcommand\@openbib@code{% + \advance\leftmargin\bibindent + \itemindent -\bibindent + \listparindent \itemindent + \parsep \z@ + }% + \renewcommand\newblock{\par}}% +} +\DeclareOption{natbib}{% +\AtEndOfClass{\RequirePackage{natbib}% +% Changing some parameters of NATBIB +\setlength{\bibhang}{\parindent}% +%\setlength{\bibsep}{0mm}% +\let\bibfont=\small +\def\@biblabel#1{#1.}% +\newcommand{\etal}{et al.}% +\bibpunct{(}{)}{;}{a}{}{,}}} +% +\let\if@runhead\iffalse +\DeclareOption{runningheads}{\let\if@runhead\iftrue} +\let\if@smartrunh\iffalse +\DeclareOption{smartrunhead}{\let\if@smartrunh\iftrue} +\DeclareOption{nosmartrunhead}{\let\if@smartrunh\iffalse} +\let\if@envcntreset\iffalse +\DeclareOption{envcountreset}{\let\if@envcntreset\iftrue} +\let\if@envcntsame\iffalse +\DeclareOption{envcountsame}{\let\if@envcntsame\iftrue} +\let\if@envcntsect\iffalse +\DeclareOption{envcountsect}{\let\if@envcntsect\iftrue} +\let\if@referee\iffalse +\DeclareOption{referee}{\let\if@referee\iftrue} +\def\makereferee{\def\baselinestretch{2}} +\let\if@instindent\iffalse +\DeclareOption{instindent}{\let\if@instindent\iftrue} +\let\if@smartand\iffalse +\DeclareOption{smartand}{\let\if@smartand\iftrue} +\let\if@spthms\iftrue +\DeclareOption{nospthms}{\let\if@spthms\iffalse} +% +% language and babel dependencies +\DeclareOption{deutsch}{\def\switcht@@therlang{\switcht@deutsch}% +\gdef\svlanginfo{\typeout{Man spricht deutsch.}\global\let\svlanginfo\relax}} +\DeclareOption{francais}{\def\switcht@@therlang{\switcht@francais}% +\gdef\svlanginfo{\typeout{On parle francais.}\global\let\svlanginfo\relax}} +\let\switcht@@therlang\relax +\let\svlanginfo\relax +% +\AtBeginDocument{\@ifpackageloaded{babel}{% 
+\@ifundefined{extrasenglish}{}{\addto\extrasenglish{\switcht@albion}}% +\@ifundefined{extrasUKenglish}{}{\addto\extrasUKenglish{\switcht@albion}}% +\@ifundefined{extrasfrenchb}{}{\addto\extrasfrenchb{\switcht@francais}}% +\@ifundefined{extrasgerman}{}{\addto\extrasgerman{\switcht@deutsch}}% +\@ifundefined{extrasngerman}{}{\addto\extrasngerman{\switcht@deutsch}}% +}{\switcht@@therlang}% +} +% +\def\ClassInfoNoLine#1#2{% + \ClassInfo{#1}{#2\@gobble}% +} +\let\journalopt\@empty +\DeclareOption*{% +\InputIfFileExists{sv\CurrentOption.clo}{% +\global\let\journalopt\CurrentOption}{% +\ClassWarning{Springer-SVJour2}{Specified option or subpackage +"\CurrentOption" not found -}\OptionNotUsed}} +\ExecuteOptions{a4paper,twoside,10pt,instindent} +\ProcessOptions +% +\ifx\journalopt\@empty\relax +\ClassInfoNoLine{Springer-SVJour2}{extra/valid Springer sub-package (-> *.clo) +\MessageBreak not found in option list of \string\documentclass +\MessageBreak - autoactivating "global" style}{} +\input{svglov2.clo} +\else +\@ifundefined{validfor}{% +\ClassError{Springer-SVJour2}{Possible option clash for sub-package +\MessageBreak "sv\journalopt.clo" - option file not valid +\MessageBreak for this class}{Perhaps you used an option of the old +Springer class SVJour!} +}{} +\fi +% +\if@smartrunh\AtEndDocument{\islastpageeven\getlastpagenumber}\fi +% +\newcommand{\twocoltest}[2]{\if@twocolumn\def\@gtempa{#2}\else\def\@gtempa{#1}\fi +\@gtempa\makeatother} +\newcommand{\columncase}{\makeatletter\twocoltest} +% +\DeclareMathSymbol{\Gamma}{\mathalpha}{letters}{"00} +\DeclareMathSymbol{\Delta}{\mathalpha}{letters}{"01} +\DeclareMathSymbol{\Theta}{\mathalpha}{letters}{"02} +\DeclareMathSymbol{\Lambda}{\mathalpha}{letters}{"03} +\DeclareMathSymbol{\Xi}{\mathalpha}{letters}{"04} +\DeclareMathSymbol{\Pi}{\mathalpha}{letters}{"05} +\DeclareMathSymbol{\Sigma}{\mathalpha}{letters}{"06} +\DeclareMathSymbol{\Upsilon}{\mathalpha}{letters}{"07} +\DeclareMathSymbol{\Phi}{\mathalpha}{letters}{"08} 
+\DeclareMathSymbol{\Psi}{\mathalpha}{letters}{"09} +\DeclareMathSymbol{\Omega}{\mathalpha}{letters}{"0A} +% +\setlength\parindent{15\p@} +\setlength\smallskipamount{3\p@ \@plus 1\p@ \@minus 1\p@} +\setlength\medskipamount{6\p@ \@plus 2\p@ \@minus 2\p@} +\setlength\bigskipamount{12\p@ \@plus 4\p@ \@minus 4\p@} +\setlength\headheight{12\p@} +\setlength\headsep {16.74dd} +\setlength\topskip {10\p@} +\setlength\footskip{30\p@} +\setlength\maxdepth{.5\topskip} +% +\@settopoint\textwidth +\setlength\marginparsep {10\p@} +\setlength\marginparpush{5\p@} +\setlength\topmargin{-10pt} +\if@twocolumn + \setlength\oddsidemargin {-30\p@} + \setlength\evensidemargin{-30\p@} +\else + \setlength\oddsidemargin {\z@} + \setlength\evensidemargin{\z@} +\fi +\setlength\marginparwidth {48\p@} +\setlength\footnotesep{8\p@} +\setlength{\skip\footins}{9\p@ \@plus 4\p@ \@minus 2\p@} +\setlength\floatsep {12\p@ \@plus 2\p@ \@minus 2\p@} +\setlength\textfloatsep{20\p@ \@plus 2\p@ \@minus 4\p@} +\setlength\intextsep {20\p@ \@plus 2\p@ \@minus 2\p@} +\setlength\dblfloatsep {12\p@ \@plus 2\p@ \@minus 2\p@} +\setlength\dbltextfloatsep{20\p@ \@plus 2\p@ \@minus 4\p@} +\setlength\@fptop{0\p@} +\setlength\@fpsep{12\p@ \@plus 2\p@ \@minus 2\p@} +\setlength\@fpbot{0\p@ \@plus 1fil} +\setlength\@dblfptop{0\p@} +\setlength\@dblfpsep{12\p@ \@plus 2\p@ \@minus 2\p@} +\setlength\@dblfpbot{0\p@ \@plus 1fil} +\setlength\partopsep{2\p@ \@plus 1\p@ \@minus 1\p@} +\def\@listi{\leftmargin\leftmargini + \parsep \z@ + \topsep 6\p@ \@plus2\p@ \@minus4\p@ + \itemsep\parsep} +\let\@listI\@listi +\@listi +\def\@listii {\leftmargin\leftmarginii + \labelwidth\leftmarginii + \advance\labelwidth-\labelsep + \topsep \z@ + \parsep \topsep + \itemsep \parsep} +\def\@listiii{\leftmargin\leftmarginiii + \labelwidth\leftmarginiii + \advance\labelwidth-\labelsep + \topsep \z@ + \parsep \topsep + \itemsep \parsep} +\def\@listiv {\leftmargin\leftmarginiv + \labelwidth\leftmarginiv + \advance\labelwidth-\labelsep} +\def\@listv 
{\leftmargin\leftmarginv + \labelwidth\leftmarginv + \advance\labelwidth-\labelsep} +\def\@listvi {\leftmargin\leftmarginvi + \labelwidth\leftmarginvi + \advance\labelwidth-\labelsep} +% +\setlength\lineskip{1\p@} +\setlength\normallineskip{1\p@} +\renewcommand\baselinestretch{} +\setlength\parskip{0\p@ \@plus \p@} +\@lowpenalty 51 +\@medpenalty 151 +\@highpenalty 301 +\setcounter{topnumber}{4} +\renewcommand\topfraction{.9} +\setcounter{bottomnumber}{2} +\renewcommand\bottomfraction{.7} +\setcounter{totalnumber}{6} +\renewcommand\textfraction{.1} +\renewcommand\floatpagefraction{.85} +\setcounter{dbltopnumber}{3} +\renewcommand\dbltopfraction{.85} +\renewcommand\dblfloatpagefraction{.85} +\def\ps@headings{% + \let\@oddfoot\@empty\let\@evenfoot\@empty + \def\@evenhead{\small\csname runheadhook\endcsname + \rlap{\thepage}\hfil\leftmark\unskip}% + \def\@oddhead{\small\csname runheadhook\endcsname + \ignorespaces\rightmark\hfil\llap{\thepage}}% + \let\@mkboth\@gobbletwo + \let\sectionmark\@gobble + \let\subsectionmark\@gobble + } +% make indentations changeable +\def\setitemindent#1{\settowidth{\labelwidth}{#1}% + \leftmargini\labelwidth + \advance\leftmargini\labelsep + \def\@listi{\leftmargin\leftmargini + \labelwidth\leftmargini\advance\labelwidth by -\labelsep + \parsep=\parskip + \topsep=\medskipamount + \itemsep=\parskip \advance\itemsep by -\parsep}} +\def\setitemitemindent#1{\settowidth{\labelwidth}{#1}% + \leftmarginii\labelwidth + \advance\leftmarginii\labelsep +\def\@listii{\leftmargin\leftmarginii + \labelwidth\leftmarginii\advance\labelwidth by -\labelsep + \parsep=\parskip + \topsep=\z@ + \itemsep=\parskip \advance\itemsep by -\parsep}} +% labels of description +\def\descriptionlabel#1{\hspace\labelsep #1\hfil} +% adjusted environment "description" +% if an optional parameter (at the first two levels of lists) +% is present, its width is considered to be the widest mark +% throughout the current list. 
+\def\description{\@ifnextchar[{\@describe}{\list{}{\labelwidth\z@ + \itemindent-\leftmargin \let\makelabel\descriptionlabel}}} +\let\enddescription\endlist +% +\def\describelabel#1{#1\hfil} +\def\@describe[#1]{\relax\ifnum\@listdepth=0 +\setitemindent{#1}\else\ifnum\@listdepth=1 +\setitemitemindent{#1}\fi\fi +\list{--}{\let\makelabel\describelabel}} +% +\newdimen\logodepth +\logodepth=1.2cm +\newdimen\headerboxheight +\headerboxheight=180pt % 18 10.5dd-lines - 2\baselineskip +\advance\headerboxheight by-14.5mm +\newdimen\betweenumberspace % dimension for space between +\betweenumberspace=3.33pt % number and text of titles. +\newdimen\aftertext % dimension for space after +\aftertext=5pt % text of title. +\newdimen\headlineindent % dimension for space between +\headlineindent=1.166cm % number and text of headings. +\if@mathematic + \def\runinend{} % \enspace} + \def\floatcounterend{\enspace} + \def\sectcounterend{} +\else + \def\runinend{.} + \def\floatcounterend{.\ } + \def\sectcounterend{.} +\fi +\def\email#1{\emailname: #1} +\def\keywords#1{\par\addvspace\medskipamount{\rightskip=0pt plus1cm +\def\and{\ifhmode\unskip\nobreak\fi\ $\cdot$ +}\noindent\keywordname\enspace\ignorespaces#1\par}} +% +\def\subclassname{{\bfseries Mathematics Subject Classification +(2000)}\enspace} +\def\subclass#1{\par\addvspace\medskipamount{\rightskip=0pt plus1cm +\def\and{\ifhmode\unskip\nobreak\fi\ $\cdot$ +}\noindent\subclassname\ignorespaces#1\par}} +% +\def\PACSname{\textbf{PACS}\enspace} +\def\PACS#1{\par\addvspace\medskipamount{\rightskip=0pt plus1cm +\def\and{\ifhmode\unskip\nobreak\fi\ $\cdot$ +}\noindent\PACSname\ignorespaces#1\par}} +% +\def\CRclassname{{\bfseries CR Subject Classification}\enspace} +\def\CRclass#1{\par\addvspace\medskipamount{\rightskip=0pt plus1cm +\def\and{\ifhmode\unskip\nobreak\fi\ $\cdot$ +}\noindent\CRclassname\ignorespaces#1\par}} +% +\def\ESMname{\textbf{Electronic Supplementary Material}\enspace} +\def\ESM#1{\par\addvspace\medskipamount 
+\noindent\ESMname\ignorespaces#1\par} +% +\newcounter{inst} +\newcounter{auth} +\def\authdepth{2} +\newdimen\instindent +\newbox\authrun +\newtoks\authorrunning +\newbox\titrun +\newtoks\titlerunning +\def\authorfont{\bfseries} + +\def\combirunning#1{\gdef\@combi{#1}} +\def\@combi{} +\newbox\combirun +% +\def\ps@last{\def\@evenhead{\small\rlap{\thepage}\hfil +\lastevenhead}} +\newcounter{lastpage} +\def\islastpageeven{\@ifundefined{lastpagenumber} +{\setcounter{lastpage}{0}}{\setcounter{lastpage}{\lastpagenumber}} +\ifnum\value{lastpage}>0 + \ifodd\value{lastpage}% + \else + \if@smartrunh + \thispagestyle{last}% + \fi + \fi +\fi} +\def\getlastpagenumber{\clearpage +\addtocounter{page}{-1}% + \immediate\write\@auxout{\string\gdef\string\lastpagenumber{\thepage}}% + \immediate\write\@auxout{\string\newlabel{LastPage}{{}{\thepage}}}% + \addtocounter{page}{1}} + +\def\journalname#1{\gdef\@journalname{#1}} + +\def\dedication#1{\gdef\@dedic{#1}} +\def\@dedic{} + +\let\@date\undefined +\def\notused{~} + +\def\institute#1{\gdef\@institute{#1}} + +\def\offprints#1{\begingroup +\def\protect{\noexpand\protect\noexpand}\xdef\@thanks{\@thanks +\protect\footnotetext[0]{\unskip\hskip-15pt{\itshape Send offprint requests +to\/}: \ignorespaces#1}}\endgroup\ignorespaces} + +%\def\mail#1{\gdef\@mail{#1}} +%\def\@mail{} + +\def\@thanks{} + +\def\@fnsymbol#1{\ifcase#1\or\star\or{\star\star}\or{\star\star\star}% + \or \dagger\or \ddagger\or + \mathchar "278\or \mathchar "27B\or \|\or **\or \dagger\dagger + \or \ddagger\ddagger \else\@ctrerr\fi\relax} +% +%\def\invthanks#1{\footnotetext[0]{\kern-\bibindent#1}} +% +\def\nothanksmarks{\def\thanks##1{\protected@xdef\@thanks{\@thanks + \protect\footnotetext[0]{\kern-\bibindent##1}}}} +% +\def\subtitle#1{\gdef\@subtitle{#1}} +\def\@subtitle{} + +\def\headnote#1{\gdef\@headnote{#1}} +\def\@headnote{} + +\def\papertype#1{\gdef\paper@type{\MakeUppercase{#1}}} +\def\paper@type{} + +\def\ch@ckobl#1#2{\@ifundefined{@#1} + {\typeout{SVJour2 
warning: Missing +\expandafter\string\csname#1\endcsname}% + \csname #1\endcsname{#2}} + {}} +% +\def\ProcessRunnHead{% + \def\\{\unskip\ \ignorespaces}% + \def\thanks##1{\unskip{}}% + \instindent=\textwidth + \advance\instindent by-\headlineindent + \if!\the\titlerunning!\else + \edef\@title{\the\titlerunning}% + \fi + \global\setbox\titrun=\hbox{\small\rmfamily\unboldmath\ignorespaces\@title + \unskip}% + \ifdim\wd\titrun>\instindent + \typeout{^^JSVJour2 Warning: Title too long for running head.}% + \typeout{Please supply a shorter form with \string\titlerunning + \space prior to \string\maketitle}% + \global\setbox\titrun=\hbox{\small\rmfamily + Title Suppressed Due to Excessive Length}% + \fi + \xdef\@title{\copy\titrun}% +% + \if!\the\authorrunning! + \else + \setcounter{auth}{1}% + \edef\@author{\the\authorrunning}% + \fi + \ifnum\value{inst}>\authdepth + \def\stripauthor##1\and##2\endauthor{% + \protected@xdef\@author{##1\unskip\unskip\if!##2!\else\ et al.\fi}}% + \expandafter\stripauthor\@author\and\endauthor + \else + \gdef\and{\unskip, \ignorespaces}% + {\def\and{\noexpand\protect\noexpand\and}% + \protected@xdef\@author{\@author}} + \fi + \global\setbox\authrun=\hbox{\small\rmfamily\unboldmath\ignorespaces + \@author\unskip}% + \ifdim\wd\authrun>\instindent + \typeout{^^JSVJour2 Warning: Author name(s) too long for running head. 
+ ^^JPlease supply a shorter form with \string\authorrunning + \space prior to \string\maketitle}% + \global\setbox\authrun=\hbox{\small\rmfamily Please give a shorter version + with: {\tt\string\authorrunning\space and + \string\titlerunning\space prior to \string\maketitle}}% + \fi + \xdef\@author{\copy\authrun}% + \markboth{\@author}{\@title}% +} +% +\let\orithanks=\thanks +\def\thanks#1{\ClassWarning{SVJour2}{\string\thanks\space may only be +used inside of \string\title, \string\author,\MessageBreak +and \string\date\space prior to \string\maketitle}} +% +\def\maketitle{\par\let\thanks=\orithanks +\ch@ckobl{journalname}{Noname} +\ch@ckobl{date}{the date of receipt and acceptance should be inserted +later} +\ch@ckobl{title}{A title should be given} +\ch@ckobl{author}{Name(s) and initial(s) of author(s) should be given} +\ch@ckobl{institute}{Address(es) of author(s) should be given} +\begingroup +% + \renewcommand\thefootnote{\@fnsymbol\c@footnote}% + \def\@makefnmark{$^{\@thefnmark}$}% + \renewcommand\@makefntext[1]{% + \noindent + \hb@xt@\bibindent{\hss\@makefnmark\enspace}##1\vrule height0pt + width0pt depth8pt} +% + \def\lastand{\ifnum\value{inst}=2\relax + \unskip{} \andname\ + \else + \unskip, \andname\ + \fi}% + \def\and{\stepcounter{auth}\relax + \if@smartand + \ifnum\value{auth}=\value{inst}% + \lastand + \else + \unskip, + \fi + \else + \unskip, + \fi}% + \thispagestyle{empty} + \ifnum \col@number=\@ne + \@maketitle + \else + \twocolumn[\@maketitle]% + \fi +% + \global\@topnum\z@ + \if!\@thanks!\else + \@thanks +\insert\footins{\vskip-3pt\hrule width\columnwidth\vskip3pt}% + \fi + {\def\thanks##1{\unskip{}}% + \def\iand{\\[5pt]\let\and=\nand}% + \def\nand{\ifhmode\unskip\nobreak\fi\ $\cdot$ }% + \let\and=\nand + \def\at{\\\let\and=\iand}% + \footnotetext[0]{\kern-\bibindent + \ignorespaces\@institute}\vspace{5dd}}% +%\if!\@mail!\else +% \footnotetext[0]{\kern-\bibindent\mailname\ +% \ignorespaces\@mail}% +%\fi +% + \if@runhead + \ProcessRunnHead + \fi 
+% + \endgroup + \setcounter{footnote}{0} + \global\let\thanks\relax + \global\let\maketitle\relax + \global\let\@maketitle\relax + \global\let\@thanks\@empty + \global\let\@author\@empty + \global\let\@date\@empty + \global\let\@title\@empty + \global\let\@subtitle\@empty + \global\let\title\relax + \global\let\author\relax + \global\let\date\relax + \global\let\and\relax} + +\def\makeheadbox{{% +\hbox to0pt{\vbox{\baselineskip=10dd\hrule\hbox +to\hsize{\vrule\kern3pt\vbox{\kern3pt +\hbox{\bfseries\@journalname\ manuscript No.} +\hbox{(will be inserted by the editor)} +\kern3pt}\hfil\kern3pt\vrule}\hrule}% +\hss}}} +% +\def\rubric{\setbox0=\hbox{\small\strut}\@tempdima=\ht0\advance +\@tempdima\dp0\advance\@tempdima2\fboxsep\vrule\@height\@tempdima +\@width\z@} +\newdimen\rubricwidth +% +\def\@maketitle{\newpage +\normalfont +\vbox to0pt{\if@twocolumn\vskip-39pt\else\vskip-49pt\fi +\nointerlineskip +\makeheadbox\vss}\nointerlineskip +\vbox to 0pt{\offinterlineskip\rubricwidth=\columnwidth +\vskip-12.5pt +\if@twocolumn\else % one column journal + \divide\rubricwidth by144\multiply\rubricwidth by89 % perform golden section + \vskip-\topskip +\fi +\hrule\@height0.35mm\noindent +\advance\fboxsep by.25mm +\global\advance\rubricwidth by0pt +\rubric +\vss}\vskip19.5pt +% +\if@twocolumn\else + \gdef\footnoterule{% + \kern-3\p@ + \hrule\@width\columnwidth %rubricwidth + \kern2.6\p@} +\fi +% + \setbox\authrun=\vbox\bgroup + \hrule\@height 9mm\@width0\p@ + \pretolerance=10000 + \rightskip=0pt plus 4cm + \nothanksmarks +% \if!\@headnote!\else +% \noindent +% {\LARGE\normalfont\itshape\ignorespaces\@headnote\par}\vskip 3.5mm +% \fi + {\authorfont + \setbox0=\vbox{\setcounter{auth}{1}\def\and{\stepcounter{auth} }% + \hfuzz=2\textwidth\def\thanks##1{}\@author}% + \setcounter{footnote}{0}% + \global\value{inst}=\value{auth}% + \setcounter{auth}{1}% + \if@twocolumn + \rightskip43mm plus 4cm minus 3mm + \else % one column journal + \rightskip=\linewidth + \advance\rightskip 
by-\rubricwidth + \advance\rightskip by0pt plus 4cm minus 3mm + \fi +% +\def\and{\unskip\nobreak\enskip{\boldmath$\cdot$}\enskip\ignorespaces}% + \noindent\ignorespaces\@author\vskip7.23pt} + {\LARGE\bfseries + \noindent\ignorespaces + \@title \par}\vskip 11.24pt\relax + \if!\@subtitle!\else + {\large\bfseries + \pretolerance=10000 + \rightskip=0pt plus 3cm + \vskip-5pt + \noindent\ignorespaces\@subtitle \par}\vskip 11.24pt + \fi + \small + \if!\@dedic!\else + \par + \normalsize\it + \addvspace\baselineskip + \noindent\@dedic + \fi + \egroup % end of header box + \@tempdima=\headerboxheight + \advance\@tempdima by-\ht\authrun + \unvbox\authrun + \ifdim\@tempdima>0pt + \vrule width0pt height\@tempdima\par + \fi + \noindent{\small\@date\vskip 6.2mm} + \global\@minipagetrue + \global\everypar{\global\@minipagefalse\global\everypar{}}% +%\vskip22.47pt +} +% +\if@mathematic + \def\vec#1{\ensuremath{\mathchoice + {\mbox{\boldmath$\displaystyle\mathbf{#1}$}} + {\mbox{\boldmath$\textstyle\mathbf{#1}$}} + {\mbox{\boldmath$\scriptstyle\mathbf{#1}$}} + {\mbox{\boldmath$\scriptscriptstyle\mathbf{#1}$}}}} +\else + \def\vec#1{\ensuremath{\mathchoice + {\mbox{\boldmath$\displaystyle#1$}} + {\mbox{\boldmath$\textstyle#1$}} + {\mbox{\boldmath$\scriptstyle#1$}} + {\mbox{\boldmath$\scriptscriptstyle#1$}}}} +\fi +% +\def\tens#1{\ensuremath{\mathsf{#1}}} +% +\setcounter{secnumdepth}{3} +\newcounter {section} +\newcounter {subsection}[section] +\newcounter {subsubsection}[subsection] +\newcounter {paragraph}[subsubsection] +\newcounter {subparagraph}[paragraph] +\renewcommand\thesection {\@arabic\c@section} +\renewcommand\thesubsection {\thesection.\@arabic\c@subsection} +\renewcommand\thesubsubsection{\thesubsection.\@arabic\c@subsubsection} +\renewcommand\theparagraph {\thesubsubsection.\@arabic\c@paragraph} +\renewcommand\thesubparagraph {\theparagraph.\@arabic\c@subparagraph} +% +\def\@hangfrom#1{\setbox\@tempboxa\hbox{#1}% + \hangindent \z@\noindent\box\@tempboxa} +% 
+\def\@seccntformat#1{\csname the#1\endcsname\sectcounterend +\hskip\betweenumberspace} +% +\newif\if@sectrule +\if@twocolumn\else\let\@sectruletrue=\relax\fi +\if@avier\let\@sectruletrue=\relax\fi +\def\makesectrule{\if@sectrule\global\@sectrulefalse\null\vglue-\topskip +\hrule\nobreak\parskip=5pt\relax\fi} +% +\let\makesectruleori=\makesectrule +\def\restoresectrule{\global\let\makesectrule=\makesectruleori\global\@sectrulefalse} +\def\nosectrule{\let\makesectrule=\restoresectrule} +% +\def\@startsection#1#2#3#4#5#6{% + \if@noskipsec \leavevmode \fi + \par + \@tempskipa #4\relax + \@afterindenttrue + \ifdim \@tempskipa <\z@ + \@tempskipa -\@tempskipa \@afterindentfalse + \fi + \if@nobreak + \everypar{}% + \else + \addpenalty\@secpenalty\addvspace\@tempskipa + \fi + \ifnum#2=1\relax\@sectruletrue\fi + \@ifstar + {\@ssect{#3}{#4}{#5}{#6}}% + {\@dblarg{\@sect{#1}{#2}{#3}{#4}{#5}{#6}}}} +% +\def\@sect#1#2#3#4#5#6[#7]#8{% + \ifnum #2>\c@secnumdepth + \let\@svsec\@empty + \else + \refstepcounter{#1}% + \protected@edef\@svsec{\@seccntformat{#1}\relax}% + \fi + \@tempskipa #5\relax + \ifdim \@tempskipa>\z@ + \begingroup + #6{\makesectrule + \@hangfrom{\hskip #3\relax\@svsec}% + \raggedright + \hyphenpenalty \@M% + \interlinepenalty \@M #8\@@par}% + \endgroup + \csname #1mark\endcsname{#7}% + \addcontentsline{toc}{#1}{% + \ifnum #2>\c@secnumdepth \else + \protect\numberline{\csname the#1\endcsname\sectcounterend}% + \fi + #7}% + \else + \def\@svsechd{% + #6{\hskip #3\relax + \@svsec #8\/\hskip\aftertext}% + \csname #1mark\endcsname{#7}% + \addcontentsline{toc}{#1}{% + \ifnum #2>\c@secnumdepth \else + \protect\numberline{\csname the#1\endcsname}% + \fi + #7}}% + \fi + \@xsect{#5}} +% +\def\@ssect#1#2#3#4#5{% + \@tempskipa #3\relax + \ifdim \@tempskipa>\z@ + \begingroup + #4{\makesectrule + \@hangfrom{\hskip #1}% + \interlinepenalty \@M #5\@@par}% + \endgroup + \else + \def\@svsechd{#4{\hskip #1\relax #5}}% + \fi + \@xsect{#3}} + +% +% measures and setting of sections +% 
+\def\section{\@startsection{section}{1}{\z@}% + {-21dd plus-8pt minus-4pt}{10.5dd} + {\normalsize\bfseries\boldmath}} +\def\subsection{\@startsection{subsection}{2}{\z@}% + {-21dd plus-8pt minus-4pt}{10.5dd} + {\normalsize\upshape}} +\def\subsubsection{\@startsection{subsubsection}{3}{\z@}% + {-13dd plus-8pt minus-4pt}{10.5dd} + {\normalsize\itshape}} +\def\paragraph{\@startsection{paragraph}{4}{\z@}% + {-13pt plus-8pt minus-4pt}{\z@}{\normalsize\itshape}} + +\setlength\leftmargini {\parindent} +\leftmargin \leftmargini +\setlength\leftmarginii {\parindent} +\setlength\leftmarginiii {1.87em} +\setlength\leftmarginiv {1.7em} +\setlength\leftmarginv {.5em} +\setlength\leftmarginvi {.5em} +\setlength \labelsep {.5em} +\setlength \labelwidth{\leftmargini} +\addtolength\labelwidth{-\labelsep} +\@beginparpenalty -\@lowpenalty +\@endparpenalty -\@lowpenalty +\@itempenalty -\@lowpenalty +\renewcommand\theenumi{\@arabic\c@enumi} +\renewcommand\theenumii{\@alph\c@enumii} +\renewcommand\theenumiii{\@roman\c@enumiii} +\renewcommand\theenumiv{\@Alph\c@enumiv} +\newcommand\labelenumi{\theenumi.} +\newcommand\labelenumii{(\theenumii)} +\newcommand\labelenumiii{\theenumiii.} +\newcommand\labelenumiv{\theenumiv.} +\renewcommand\p@enumii{\theenumi} +\renewcommand\p@enumiii{\theenumi(\theenumii)} +\renewcommand\p@enumiv{\p@enumiii\theenumiii} +\newcommand\labelitemi{\normalfont\bfseries --} +\newcommand\labelitemii{\normalfont\bfseries --} +\newcommand\labelitemiii{$\m@th\bullet$} +\newcommand\labelitemiv{$\m@th\cdot$} + +\if@spthms +% definition of the "\spnewtheorem" command. +% +% Usage: +% +% \spnewtheorem{env_nam}{caption}[within]{cap_font}{body_font} +% or \spnewtheorem{env_nam}[numbered_like]{caption}{cap_font}{body_font} +% or \spnewtheorem*{env_nam}{caption}{cap_font}{body_font} +% +% New is "cap_font" and "body_font". It stands for +% fontdefinition of the caption and the text itself. +% +% "\spnewtheorem*" gives a theorem without number. 
+% +% A defined spnewthoerem environment is used as described +% by Lamport. +% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\def\@thmcountersep{} +\def\@thmcounterend{} +\newcommand\nocaption{\noexpand\@gobble} +\newdimen\spthmsep \spthmsep=5pt + +\def\spnewtheorem{\@ifstar{\@sthm}{\@Sthm}} + +% definition of \spnewtheorem with number + +\def\@spnthm#1#2{% + \@ifnextchar[{\@spxnthm{#1}{#2}}{\@spynthm{#1}{#2}}} +\def\@Sthm#1{\@ifnextchar[{\@spothm{#1}}{\@spnthm{#1}}} + +\def\@spxnthm#1#2[#3]#4#5{\expandafter\@ifdefinable\csname #1\endcsname + {\@definecounter{#1}\@addtoreset{#1}{#3}% + \expandafter\xdef\csname the#1\endcsname{\expandafter\noexpand + \csname the#3\endcsname \noexpand\@thmcountersep \@thmcounter{#1}}% + \expandafter\xdef\csname #1name\endcsname{#2}% + \global\@namedef{#1}{\@spthm{#1}{\csname #1name\endcsname}{#4}{#5}}% + \global\@namedef{end#1}{\@endtheorem}}} + +\def\@spynthm#1#2#3#4{\expandafter\@ifdefinable\csname #1\endcsname + {\@definecounter{#1}% + \expandafter\xdef\csname the#1\endcsname{\@thmcounter{#1}}% + \expandafter\xdef\csname #1name\endcsname{#2}% + \global\@namedef{#1}{\@spthm{#1}{\csname #1name\endcsname}{#3}{#4}}% + \global\@namedef{end#1}{\@endtheorem}}} + +\def\@spothm#1[#2]#3#4#5{% + \@ifundefined{c@#2}{\@latexerr{No theorem environment `#2' defined}\@eha}% + {\expandafter\@ifdefinable\csname #1\endcsname + {\global\@namedef{the#1}{\@nameuse{the#2}}% + \expandafter\xdef\csname #1name\endcsname{#3}% + \global\@namedef{#1}{\@spthm{#2}{\csname #1name\endcsname}{#4}{#5}}% + \global\@namedef{end#1}{\@endtheorem}}}} + +\def\@spthm#1#2#3#4{\topsep 7\p@ \@plus2\p@ \@minus4\p@ +\labelsep=\spthmsep\refstepcounter{#1}% +\@ifnextchar[{\@spythm{#1}{#2}{#3}{#4}}{\@spxthm{#1}{#2}{#3}{#4}}} + +\def\@spxthm#1#2#3#4{\@spbegintheorem{#2}{\csname the#1\endcsname}{#3}{#4}% + \ignorespaces} + +\def\@spythm#1#2#3#4[#5]{\@spopargbegintheorem{#2}{\csname + the#1\endcsname}{#5}{#3}{#4}\ignorespaces} + 
+\def\normalthmheadings{\def\@spbegintheorem##1##2##3##4{\trivlist\normalfont + \item[\hskip\labelsep{##3##1\ ##2\@thmcounterend}]##4} +\def\@spopargbegintheorem##1##2##3##4##5{\trivlist + \item[\hskip\labelsep{##4##1\ ##2}]{##4(##3)\@thmcounterend\ }##5}} +\normalthmheadings + +\def\reversethmheadings{\def\@spbegintheorem##1##2##3##4{\trivlist\normalfont + \item[\hskip\labelsep{##3##2\ ##1\@thmcounterend}]##4} +\def\@spopargbegintheorem##1##2##3##4##5{\trivlist + \item[\hskip\labelsep{##4##2\ ##1}]{##4(##3)\@thmcounterend\ }##5}} + +% definition of \spnewtheorem* without number + +\def\@sthm#1#2{\@Ynthm{#1}{#2}} + +\def\@Ynthm#1#2#3#4{\expandafter\@ifdefinable\csname #1\endcsname + {\global\@namedef{#1}{\@Thm{\csname #1name\endcsname}{#3}{#4}}% + \expandafter\xdef\csname #1name\endcsname{#2}% + \global\@namedef{end#1}{\@endtheorem}}} + +\def\@Thm#1#2#3{\topsep 7\p@ \@plus2\p@ \@minus4\p@ +\@ifnextchar[{\@Ythm{#1}{#2}{#3}}{\@Xthm{#1}{#2}{#3}}} + +\def\@Xthm#1#2#3{\@Begintheorem{#1}{#2}{#3}\ignorespaces} + +\def\@Ythm#1#2#3[#4]{\@Opargbegintheorem{#1} + {#4}{#2}{#3}\ignorespaces} + +\def\@Begintheorem#1#2#3{#3\trivlist + \item[\hskip\labelsep{#2#1\@thmcounterend}]} + +\def\@Opargbegintheorem#1#2#3#4{#4\trivlist + \item[\hskip\labelsep{#3#1}]{#3(#2)\@thmcounterend\ }} + +% initialize theorem environment + +\if@envcntsect + \def\@thmcountersep{.} + \spnewtheorem{theorem}{Theorem}[section]{\bfseries}{\itshape} +\else + \spnewtheorem{theorem}{Theorem}{\bfseries}{\itshape} + \if@envcntreset + \@addtoreset{theorem}{section} + \else + \@addtoreset{theorem}{chapter} + \fi +\fi + +%definition of divers theorem environments +\spnewtheorem*{claim}{Claim}{\itshape}{\rmfamily} +\spnewtheorem*{proof}{Proof}{\itshape}{\rmfamily} +\if@envcntsame % all environments like "Theorem" - using its counter + \def\spn@wtheorem#1#2#3#4{\@spothm{#1}[theorem]{#2}{#3}{#4}} +\else % all environments with their own counter + \if@envcntsect % show section counter + 
\def\spn@wtheorem#1#2#3#4{\@spxnthm{#1}{#2}[section]{#3}{#4}} + \else % not numbered with section + \if@envcntreset + \def\spn@wtheorem#1#2#3#4{\@spynthm{#1}{#2}{#3}{#4} + \@addtoreset{#1}{section}} + \else + \let\spn@wtheorem=\@spynthm + \fi + \fi +\fi +% +\let\spdefaulttheorem=\spn@wtheorem +% +\spn@wtheorem{case}{Case}{\itshape}{\rmfamily} +\spn@wtheorem{conjecture}{Conjecture}{\itshape}{\rmfamily} +\spn@wtheorem{corollary}{Corollary}{\bfseries}{\itshape} +\spn@wtheorem{definition}{Definition}{\bfseries}{\rmfamily} +\spn@wtheorem{example}{Example}{\itshape}{\rmfamily} +\spn@wtheorem{exercise}{Exercise}{\bfseries}{\rmfamily} +\spn@wtheorem{lemma}{Lemma}{\bfseries}{\itshape} +\spn@wtheorem{note}{Note}{\itshape}{\rmfamily} +\spn@wtheorem{problem}{Problem}{\bfseries}{\rmfamily} +\spn@wtheorem{property}{Property}{\itshape}{\rmfamily} +\spn@wtheorem{proposition}{Proposition}{\bfseries}{\itshape} +\spn@wtheorem{question}{Question}{\itshape}{\rmfamily} +\spn@wtheorem{solution}{Solution}{\bfseries}{\rmfamily} +\spn@wtheorem{remark}{Remark}{\itshape}{\rmfamily} +% +\newenvironment{theopargself} + {\def\@spopargbegintheorem##1##2##3##4##5{\trivlist + \item[\hskip\labelsep{##4##1\ ##2}]{##4##3\@thmcounterend\ }##5} + \def\@Opargbegintheorem##1##2##3##4{##4\trivlist + \item[\hskip\labelsep{##3##1}]{##3##2\@thmcounterend\ }}}{} +\newenvironment{theopargself*} + {\def\@spopargbegintheorem##1##2##3##4##5{\trivlist + \item[\hskip\labelsep{##4##1\ ##2}]{\hspace*{-\labelsep}##4##3\@thmcounterend}##5} + \def\@Opargbegintheorem##1##2##3##4{##4\trivlist + \item[\hskip\labelsep{##3##1}]{\hspace*{-\labelsep}##3##2\@thmcounterend}}}{} +% +\fi + +\def\@takefromreset#1#2{% + \def\@tempa{#1}% + \let\@tempd\@elt + \def\@elt##1{% + \def\@tempb{##1}% + \ifx\@tempa\@tempb\else + \@addtoreset{##1}{#2}% + \fi}% + \expandafter\expandafter\let\expandafter\@tempc\csname cl@#2\endcsname + \expandafter\def\csname cl@#2\endcsname{}% + \@tempc + \let\@elt\@tempd} + 
+\def\squareforqed{\hbox{\rlap{$\sqcap$}$\sqcup$}} +\def\qed{\ifmmode\else\unskip\quad\fi\squareforqed} +\def\smartqed{\def\qed{\ifmmode\squareforqed\else{\unskip\nobreak\hfil +\penalty50\hskip1em\null\nobreak\hfil\squareforqed +\parfillskip=0pt\finalhyphendemerits=0\endgraf}\fi}} + +% Define `abstract' environment +\def\abstract{\topsep=0pt\partopsep=0pt\parsep=0pt\itemsep=0pt\relax +\trivlist\item[\hskip\labelsep +{\bfseries\abstractname}]\if!\abstractname!\hskip-\labelsep\fi} +\if@twocolumn + \if@avier + \def\endabstract{\endtrivlist\addvspace{5mm}\strich} + \def\strich{\hrule\vskip1ptplus12pt} + \else + \def\endabstract{\endtrivlist\addvspace{3mm}} + \fi +\else +\fi +% +\newenvironment{verse} + {\let\\\@centercr + \list{}{\itemsep \z@ + \itemindent -1.5em% + \listparindent\itemindent + \rightmargin \leftmargin + \advance\leftmargin 1.5em}% + \item\relax} + {\endlist} +\newenvironment{quotation} + {\list{}{\listparindent 1.5em% + \itemindent \listparindent + \rightmargin \leftmargin + \parsep \z@ \@plus\p@}% + \item\relax} + {\endlist} +\newenvironment{quote} + {\list{}{\rightmargin\leftmargin}% + \item\relax} + {\endlist} +\newcommand\appendix{\par\small + \setcounter{section}{0}% + \setcounter{subsection}{0}% + \renewcommand\thesection{\@Alph\c@section}} +\setlength\arraycolsep{1.5\p@} +\setlength\tabcolsep{6\p@} +\setlength\arrayrulewidth{.4\p@} +\setlength\doublerulesep{2\p@} +\setlength\tabbingsep{\labelsep} +\skip\@mpfootins = \skip\footins +\setlength\fboxsep{3\p@} +\setlength\fboxrule{.4\p@} +\renewcommand\theequation{\@arabic\c@equation} +\newcounter{figure} +\renewcommand\thefigure{\@arabic\c@figure} +\def\fps@figure{tbp} +\def\ftype@figure{1} +\def\ext@figure{lof} +\def\fnum@figure{\figurename~\thefigure} +\newenvironment{figure} + {\@float{figure}} + {\end@float} +\newenvironment{figure*} + {\@dblfloat{figure}} + {\end@dblfloat} +\newcounter{table} +\renewcommand\thetable{\@arabic\c@table} +\def\fps@table{tbp} +\def\ftype@table{2} 
+\def\ext@table{lot} +\def\fnum@table{\tablename~\thetable} +\newenvironment{table} + {\@float{table}} + {\end@float} +\newenvironment{table*} + {\@dblfloat{table}} + {\end@dblfloat} +% +\def \@floatboxreset {% + \reset@font + \small + \@setnobreak + \@setminipage +} +% +\newcommand{\tableheadseprule}{\noalign{\hrule height.375mm}} +% +\newlength\abovecaptionskip +\newlength\belowcaptionskip +\setlength\abovecaptionskip{10\p@} +\setlength\belowcaptionskip{0\p@} +\newcommand\leftlegendglue{} + +\def\fig@type{figure} + +\newdimen\figcapgap\figcapgap=3pt +\newdimen\tabcapgap\tabcapgap=5.5pt + +\@ifundefined{floatlegendstyle}{\def\floatlegendstyle{\bfseries}}{} + +\long\def\@caption#1[#2]#3{\par\addcontentsline{\csname + ext@#1\endcsname}{#1}{\protect\numberline{\csname + the#1\endcsname}{\ignorespaces #2}}\begingroup + \@parboxrestore + \@makecaption{\csname fnum@#1\endcsname}{\ignorespaces #3}\par + \endgroup} + +\def\capstrut{\vrule\@width\z@\@height\topskip} + +\@ifundefined{captionstyle}{\def\captionstyle{\normalfont\small}}{} + +\long\def\@makecaption#1#2{% + \captionstyle + \ifx\@captype\fig@type + \vskip\figcapgap + \fi + \setbox\@tempboxa\hbox{{\floatlegendstyle #1\floatcounterend}% + \capstrut #2}% + \ifdim \wd\@tempboxa >\hsize + {\floatlegendstyle #1\floatcounterend}\capstrut #2\par + \else + \hbox to\hsize{\leftlegendglue\unhbox\@tempboxa\hfil}% + \fi + \ifx\@captype\fig@type\else + \vskip\tabcapgap + \fi} + +\newdimen\figgap\figgap=1cc +\long\def\@makesidecaption#1#2{% + \parbox[b]{\@tempdimb}{\captionstyle{\floatlegendstyle + #1\floatcounterend}#2}} +\def\sidecaption#1\caption{% +\setbox\@tempboxa=\hbox{#1\unskip}% +\if@twocolumn + \ifdim\hsize<\textwidth\else + \ifdim\wd\@tempboxa<\columnwidth + \typeout{Double column float fits into single column - + ^^Jyou'd better switch the environment. 
}% + \fi + \fi +\fi +\@tempdimb=\hsize +\advance\@tempdimb by-\figgap +\advance\@tempdimb by-\wd\@tempboxa +\ifdim\@tempdimb<3cm + \typeout{\string\sidecaption: No sufficient room for the legend; + using normal \string\caption. }% + \unhbox\@tempboxa + \let\@capcommand=\@caption +\else + \let\@capcommand=\@sidecaption + \leavevmode + \unhbox\@tempboxa + \hfill +\fi +\refstepcounter\@captype +\@dblarg{\@capcommand\@captype}} + +\long\def\@sidecaption#1[#2]#3{\addcontentsline{\csname + ext@#1\endcsname}{#1}{\protect\numberline{\csname + the#1\endcsname}{\ignorespaces #2}}\begingroup + \@parboxrestore + \@makesidecaption{\csname fnum@#1\endcsname}{\ignorespaces #3}\par + \endgroup} + +% Define `acknowledgement' environment +\def\acknowledgement{\par\addvspace{17pt}\small\rmfamily +\trivlist\if!\ackname!\item[]\else +\item[\hskip\labelsep +{\bfseries\ackname}]\fi} +\def\endacknowledgement{\endtrivlist\addvspace{6pt}} +\newenvironment{acknowledgements}{\begin{acknowledgement}} +{\end{acknowledgement}} +% Define `noteadd' environment +\def\noteadd{\par\addvspace{17pt}\small\rmfamily +\trivlist\item[\hskip\labelsep +{\itshape\noteaddname}]} +\def\endnoteadd{\endtrivlist\addvspace{6pt}} + +\DeclareOldFontCommand{\rm}{\normalfont\rmfamily}{\mathrm} +\DeclareOldFontCommand{\sf}{\normalfont\sffamily}{\mathsf} +\DeclareOldFontCommand{\tt}{\normalfont\ttfamily}{\mathtt} +\DeclareOldFontCommand{\bf}{\normalfont\bfseries}{\mathbf} +\DeclareOldFontCommand{\it}{\normalfont\itshape}{\mathit} +\DeclareOldFontCommand{\sl}{\normalfont\slshape}{\@nomath\sl} +\DeclareOldFontCommand{\sc}{\normalfont\scshape}{\@nomath\sc} +\DeclareRobustCommand*\cal{\@fontswitch\relax\mathcal} +\DeclareRobustCommand*\mit{\@fontswitch\relax\mathnormal} +\newcommand\@pnumwidth{1.55em} +\newcommand\@tocrmarg{2.55em} +\newcommand\@dotsep{4.5} +\setcounter{tocdepth}{1} +\newcommand\tableofcontents{% + \section*{\contentsname}% + \@starttoc{toc}% + \addtocontents{toc}{\begingroup\protect\small}% + 
\AtEndDocument{\addtocontents{toc}{\endgroup}}% + } +\newcommand*\l@part[2]{% + \ifnum \c@tocdepth >-2\relax + \addpenalty\@secpenalty + \addvspace{2.25em \@plus\p@}% + \begingroup + \setlength\@tempdima{3em}% + \parindent \z@ \rightskip \@pnumwidth + \parfillskip -\@pnumwidth + {\leavevmode + \large \bfseries #1\hfil \hb@xt@\@pnumwidth{\hss #2}}\par + \nobreak + \if@compatibility + \global\@nobreaktrue + \everypar{\global\@nobreakfalse\everypar{}}% + \fi + \endgroup + \fi} +\newcommand*\l@section{\@dottedtocline{1}{0pt}{1.5em}} +\newcommand*\l@subsection{\@dottedtocline{2}{1.5em}{2.3em}} +\newcommand*\l@subsubsection{\@dottedtocline{3}{3.8em}{3.2em}} +\newcommand*\l@paragraph{\@dottedtocline{4}{7.0em}{4.1em}} +\newcommand*\l@subparagraph{\@dottedtocline{5}{10em}{5em}} +\newcommand\listoffigures{% + \section*{\listfigurename + \@mkboth{\listfigurename}% + {\listfigurename}}% + \@starttoc{lof}% + } +\newcommand*\l@figure{\@dottedtocline{1}{1.5em}{2.3em}} +\newcommand\listoftables{% + \section*{\listtablename + \@mkboth{\listtablename}{\listtablename}}% + \@starttoc{lot}% + } +\let\l@table\l@figure +\newdimen\bibindent +\setlength\bibindent{\parindent} +\def\@biblabel#1{#1.} +\def\@lbibitem[#1]#2{\item[{[#1]}\hfill]\if@filesw + {\let\protect\noexpand + \immediate + \write\@auxout{\string\bibcite{#2}{#1}}}\fi\ignorespaces} +\newenvironment{thebibliography}[1] + {\section*{\refname + \@mkboth{\refname}{\refname}}\small + \list{\@biblabel{\@arabic\c@enumiv}}% + {\settowidth\labelwidth{\@biblabel{#1}}% + \leftmargin\labelwidth + \advance\leftmargin\labelsep + \@openbib@code + \usecounter{enumiv}% + \let\p@enumiv\@empty + \renewcommand\theenumiv{\@arabic\c@enumiv}}% + \sloppy\clubpenalty4000\widowpenalty4000% + \sfcode`\.\@m} + {\def\@noitemerr + {\@latex@warning{Empty `thebibliography' environment}}% + \endlist} +% +\newcount\@tempcntc +\def\@citex[#1]#2{\if@filesw\immediate\write\@auxout{\string\citation{#2}}\fi + 
\@tempcnta\z@\@tempcntb\m@ne\def\@citea{}\@cite{\@for\@citeb:=#2\do + {\@ifundefined + {b@\@citeb}{\@citeo\@tempcntb\m@ne\@citea\def\@citea{,}{\bfseries + ?}\@warning + {Citation `\@citeb' on page \thepage \space undefined}}% + {\setbox\z@\hbox{\global\@tempcntc0\csname b@\@citeb\endcsname\relax}% + \ifnum\@tempcntc=\z@ \@citeo\@tempcntb\m@ne + \@citea\def\@citea{,\hskip0.1em\ignorespaces}\hbox{\csname b@\@citeb\endcsname}% + \else + \advance\@tempcntb\@ne + \ifnum\@tempcntb=\@tempcntc + \else\advance\@tempcntb\m@ne\@citeo + \@tempcnta\@tempcntc\@tempcntb\@tempcntc\fi\fi}}\@citeo}{#1}} +\def\@citeo{\ifnum\@tempcnta>\@tempcntb\else + \@citea\def\@citea{,\hskip0.1em\ignorespaces}% + \ifnum\@tempcnta=\@tempcntb\the\@tempcnta\else + {\advance\@tempcnta\@ne\ifnum\@tempcnta=\@tempcntb \else \def\@citea{--}\fi + \advance\@tempcnta\m@ne\the\@tempcnta\@citea\the\@tempcntb}\fi\fi} +% +\newcommand\newblock{\hskip .11em\@plus.33em\@minus.07em} +\let\@openbib@code\@empty +\newenvironment{theindex} + {\if@twocolumn + \@restonecolfalse + \else + \@restonecoltrue + \fi + \columnseprule \z@ + \columnsep 35\p@ + \twocolumn[\section*{\indexname}]% + \@mkboth{\indexname}{\indexname}% + \thispagestyle{plain}\parindent\z@ + \parskip\z@ \@plus .3\p@\relax + \let\item\@idxitem} + {\if@restonecol\onecolumn\else\clearpage\fi} +\newcommand\@idxitem{\par\hangindent 40\p@} +\newcommand\subitem{\@idxitem \hspace*{20\p@}} +\newcommand\subsubitem{\@idxitem \hspace*{30\p@}} +\newcommand\indexspace{\par \vskip 10\p@ \@plus5\p@ \@minus3\p@\relax} + +\if@twocolumn + \renewcommand\footnoterule{% + \kern-3\p@ + \hrule\@width\columnwidth + \kern2.6\p@} +\else + \renewcommand\footnoterule{% + \kern-3\p@ + \hrule\@width.382\columnwidth + \kern2.6\p@} +\fi +\newcommand\@makefntext[1]{% + \noindent + \hb@xt@\bibindent{\hss\@makefnmark\enspace}#1} +% +\def\trans@english{\switcht@albion} +\def\trans@french{\switcht@francais} +\def\trans@german{\switcht@deutsch} +\newenvironment{translation}[1]{\if!#1!\else 
+\@ifundefined{selectlanguage}{\csname trans@#1\endcsname}{\selectlanguage{#1}}% +\fi}{} +% languages +% English section +\def\switcht@albion{%\typeout{English spoken.}% + \def\abstractname{Abstract}% + \def\ackname{Acknowledgements}% + \def\andname{and}% + \def\lastandname{, and}% + \def\appendixname{Appendix}% + \def\chaptername{Chapter}% + \def\claimname{Claim}% + \def\conjecturename{Conjecture}% + \def\contentsname{Contents}% + \def\corollaryname{Corollary}% + \def\definitionname{Definition}% + \def\emailname{E-mail}% + \def\examplename{Example}% + \def\exercisename{Exercise}% + \def\figurename{Fig.}% + \def\keywordname{{\bfseries Keywords}}% + \def\indexname{Index}% + \def\lemmaname{Lemma}% + \def\contriblistname{List of Contributors}% + \def\listfigurename{List of Figures}% + \def\listtablename{List of Tables}% + \def\mailname{{\itshape Correspondence to\/}:}% + \def\noteaddname{Note added in proof}% + \def\notename{Note}% + \def\partname{Part}% + \def\problemname{Problem}% + \def\proofname{Proof}% + \def\propertyname{Property}% + \def\questionname{Question}% + \def\refname{References}% + \def\remarkname{Remark}% + \def\seename{see}% + \def\solutionname{Solution}% + \def\tablename{Table}% + \def\theoremname{Theorem}% +}\switcht@albion % make English default +% +% French section +\def\switcht@francais{\svlanginfo +%\typeout{On parle francais.}% + \def\abstractname{R\'esum\'e\runinend}% + \def\ackname{Remerciements\runinend}% + \def\andname{et}% + \def\lastandname{ et}% + \def\appendixname{Appendice}% + \def\chaptername{Chapitre}% + \def\claimname{Pr\'etention}% + \def\conjecturename{Hypoth\`ese}% + \def\contentsname{Table des mati\`eres}% + \def\corollaryname{Corollaire}% + \def\definitionname{D\'efinition}% + \def\emailname{E-mail}% + \def\examplename{Exemple}% + \def\exercisename{Exercice}% + \def\figurename{Fig.}% + \def\keywordname{{\bfseries Mots-cl\'e\runinend}}% + \def\indexname{Index}% + \def\lemmaname{Lemme}% + \def\contriblistname{Liste des 
contributeurs}% + \def\listfigurename{Liste des figures}% + \def\listtablename{Liste des tables}% + \def\mailname{{\itshape Correspondence to\/}:}% + \def\noteaddname{Note ajout\'ee \`a l'\'epreuve}% + \def\notename{Remarque}% + \def\partname{Partie}% + \def\problemname{Probl\`eme}% + \def\proofname{Preuve}% + \def\propertyname{Caract\'eristique}% +%\def\propositionname{Proposition}% + \def\questionname{Question}% + \def\refname{Bibliographie}% + \def\remarkname{Remarque}% + \def\seename{voyez}% + \def\solutionname{Solution}% +%\def\subclassname{{\it Subject Classifications\/}:}% + \def\tablename{Tableau}% + \def\theoremname{Th\'eor\`eme}% +} +% +% German section +\def\switcht@deutsch{\svlanginfo +%\typeout{Man spricht deutsch.}% + \def\abstractname{Zusammenfassung\runinend}% + \def\ackname{Danksagung\runinend}% + \def\andname{und}% + \def\lastandname{ und}% + \def\appendixname{Anhang}% + \def\chaptername{Kapitel}% + \def\claimname{Behauptung}% + \def\conjecturename{Hypothese}% + \def\contentsname{Inhaltsverzeichnis}% + \def\corollaryname{Korollar}% +%\def\definitionname{Definition}% + \def\emailname{E-Mail}% + \def\examplename{Beispiel}% + \def\exercisename{\"Ubung}% + \def\figurename{Abb.}% + \def\keywordname{{\bfseries Schl\"usselw\"orter\runinend}}% + \def\indexname{Index}% +%\def\lemmaname{Lemma}% + \def\contriblistname{Mitarbeiter}% + \def\listfigurename{Abbildungsverzeichnis}% + \def\listtablename{Tabellenverzeichnis}% + \def\mailname{{\itshape Correspondence to\/}:}% + \def\noteaddname{Nachtrag}% + \def\notename{Anmerkung}% + \def\partname{Teil}% +%\def\problemname{Problem}% + \def\proofname{Beweis}% + \def\propertyname{Eigenschaft}% +%\def\propositionname{Proposition}% + \def\questionname{Frage}% + \def\refname{Literatur}% + \def\remarkname{Anmerkung}% + \def\seename{siehe}% + \def\solutionname{L\"osung}% +%\def\subclassname{{\it Subject Classifications\/}:}% + \def\tablename{Tabelle}% +%\def\theoremname{Theorem}% +} +\newcommand\today{} 
+\edef\today{\ifcase\month\or
+  January\or February\or March\or April\or May\or June\or
+  July\or August\or September\or October\or November\or December\fi
+  \space\number\day, \number\year}
+\setlength\columnsep{1.5cc}
+\setlength\columnseprule{0\p@}
+%
+\frenchspacing
+\clubpenalty=10000
+\widowpenalty=10000
+\def\thisbottomragged{\def\@textbottom{\vskip\z@ plus.0001fil
+\global\let\@textbottom\relax}}
+\pagestyle{headings}
+\pagenumbering{arabic}
+\if@twocolumn
+  \twocolumn
+\fi
+\if@avier
+  \onecolumn
+  \setlength{\textwidth}{156mm}
+  \setlength{\textheight}{226mm}
+\fi
+\if@referee
+  \makereferee
+\fi
+\flushbottom
+\endinput
%%
%% End of file `svjour2.cls'.
diff --git a/vldb07/terminology.tex b/vldb07/terminology.tex
new file mode 100755
index 0000000..fd2cf1d
--- /dev/null
+++ b/vldb07/terminology.tex
@@ -0,0 +1,18 @@
+% Time-stamp:
+\vspace{-3mm}
+\section{Notation and terminology}
+\vspace{-2mm}
+\label{sec:notation}
+
+\enlargethispage{2\baselineskip}
+The essential notation and terminology used throughout this paper are as follows.
+\begin{itemize}
+\item $U$: key universe. $|U| = u$.
+\item $S$: actual static key set. $S \subset U$, $|S| = n \ll u$.
+\item $h: U \to M$ is a hash function that maps keys from a universe $U$ into
+a given range $M = \{0,1,\dots,m-1\}$ of integers.
+\item $h$ is a perfect hash function if it is one-to-one on~$S$, i.e., if
+  $h(k_1) \neq h(k_2)$ for all $k_1 \neq k_2$ in $S$.
+\item $h$ is a minimal perfect hash function (MPHF) if it is one-to-one on~$S$
+  and $n=m$.
+\end{itemize}
diff --git a/vldb07/thealgorithm.tex b/vldb07/thealgorithm.tex
new file mode 100755
index 0000000..1fb256f
--- /dev/null
+++ b/vldb07/thealgorithm.tex
@@ -0,0 +1,78 @@
+%% Nivio: 13/jan/06, 21/jan/06 29/jan/06
+% Time-stamp:
+\vspace{-3mm}
+\section{The algorithm}
+\label{sec:new-algorithm}
+\vspace{-2mm}
+
+\enlargethispage{2\baselineskip}
+The main idea supporting our algorithm is the classical divide-and-conquer technique.
+The algorithm works in two steps, relies on external memory,
+and generates a MPHF $h$ for a set $S$ of $n$ keys.
+Figure~\ref{fig:new-algo-main-steps} illustrates these two steps:
+the partitioning step and the searching step.
+
+\vspace{-2mm}
+\begin{figure}[ht]
+\centering
+\begin{picture}(0,0)%
+\includegraphics{figs/brz.ps}%
+\end{picture}%
+\setlength{\unitlength}{4144sp}%
+%
+\begingroup\makeatletter\ifx\SetFigFont\undefined%
+\gdef\SetFigFont#1#2#3#4#5{%
+  \reset@font\fontsize{#1}{#2pt}%
+  \fontfamily{#3}\fontseries{#4}\fontshape{#5}%
+  \selectfont}%
+\fi\endgroup%
+\begin{picture}(3704,2091)(1426,-5161)
+\put(2570,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
+\put(2782,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
+\put(2996,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}2}}}}
+\put(4060,-4006){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Buckets}}}}
+\put(3776,-4301){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}${\lceil n/b\rceil - 1}$}}}}
+\put(4563,-3329){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Key Set $S$}}}}
+\put(2009,-3160){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
+\put(2221,-3160){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
+\put(4315,-3160){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}n-1}}}}
+\put(1992,-5146){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}0}}}}
+\put(2204,-5146){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}1}}}}
+\put(4298,-5146){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}n-1}}}}
+\put(4546,-4977){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Hash Table}}}}
+\put(1441,-3616){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Partitioning}}}}
+\put(1441,-4426){\makebox(0,0)[lb]{\smash{{\SetFigFont{7}{8.4}{\familydefault}{\mddefault}{\updefault}Searching}}}}
+\put(1981,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_0$}}}}
+\put(2521,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_1$}}}}
+\put(3016,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_2$}}}}
+\put(3826,-4786){\makebox(0,0)[lb]{\smash{{\SetFigFont{5}{6.0}{\familydefault}{\mddefault}{\updefault}MPHF$_{\lceil n/b \rceil - 1}$}}}}
+\end{picture}%
+\vspace{-1mm}
+\caption{Main steps of our algorithm}
+\label{fig:new-algo-main-steps}
+\vspace{-3mm}
+\end{figure}

+The partitioning step takes a key set $S$ and uses a universal hash function
+$h_0$ proposed by Jenkins~\cite{j97}
+%for each key $k \in S$ of length $|k|$
+to transform each key~$k\in S$ into an integer~$h_0(k)$.
+Reducing~$h_0(k)$ modulo~$\lceil n/b\rceil$, we partition~$S$ into $\lceil n/b
+\rceil$ buckets, each containing at most 256 keys (with high
+probability).
+
+The searching step generates a MPHF$_i$ for each bucket $i$,
+$0 \leq i < \lceil n/b \rceil$.
+The resulting MPHF $h(k)$, $k \in S$, is given by
+\begin{eqnarray}\label{eq:mphf}
+h(k) = \mathrm{MPHF}_i (k) + \mathit{offset}[i],
+\end{eqnarray}
+where~$i=h_0(k)\bmod\lceil n/b\rceil$.
+The $i$th entry~$\mathit{offset}[i]$ of the displacement vector
+$\mathit{offset}$, $0 \leq i < \lceil n/b \rceil$, stores the total number
+of keys in buckets 0 through $i-1$; that is, it marks where the interval of
+the hash table addressed by MPHF$_i$ begins. In the following, we explain
+each step in detail.
+
+
+
diff --git a/vldb07/thedataandsetup.tex b/vldb07/thedataandsetup.tex
new file mode 100755
index 0000000..8739705
--- /dev/null
+++ b/vldb07/thedataandsetup.tex
@@ -0,0 +1,21 @@
+% Nivio: 29/jan/06
+% Time-stamp:
+\vspace{-2mm}
+\subsection{The data and the experimental setup}
+\label{sec:data-exper-set}
+
+The algorithms were implemented in the C language and
+are available at \texttt{http://\-cmph.sf.net}
+under the GNU Lesser General Public License (LGPL).
+% free software licence.
+All experiments were carried out on
+a computer running the Linux operating system, version 2.6,
+with a 2.4 gigahertz processor and
+1 gigabyte of main memory.
+In the experiments related to the new
+algorithm we limited the main memory to 500 megabytes.
+
+Our data consists of a collection of 1 billion
+URLs collected from the Web, each URL 64 characters long on average.
+The collection occupies 60.5 gigabytes on disk.
+
diff --git a/vldb07/vldb.tex b/vldb07/vldb.tex
new file mode 100644
index 0000000..618c108
--- /dev/null
+++ b/vldb07/vldb.tex
@@ -0,0 +1,194 @@
+%%%%%%%%%%%%%%%%%%%%%%% file template.tex %%%%%%%%%%%%%%%%%%%%%%%%%
+%
+% This is a template file for the LaTeX package SVJour2 for the
+% Springer journal "The VLDB Journal".
+%
+% Springer Heidelberg 2004/12/03
+%
+% Copy it to a new file with a new name and use it as the basis
+% for your article. Delete % as needed.
+% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% +% First comes an example EPS file -- just ignore it and +% proceed on the \documentclass line +% your LaTeX will extract the file if required +%\begin{filecontents*}{figs/minimalperfecthash-ph-mph.ps} +%!PS-Adobe-3.0 EPSF-3.0 +%%BoundingBox: 19 19 221 221 +%%CreationDate: Mon Sep 29 1997 +%%Creator: programmed by hand (JK) +%%EndComments +%gsave +%newpath +% 20 20 moveto +% 20 220 lineto +% 220 220 lineto +% 220 20 lineto +%closepath +%2 setlinewidth +%gsave +% .4 setgray fill +%grestore +%stroke +%grestore +%\end{filecontents*} +% +\documentclass[twocolumn,fleqn,runningheads]{svjour2} +% +\smartqed % flush right qed marks, e.g. at end of proof +% +\usepackage{graphicx} +\usepackage{listings} +\usepackage{epsfig} +\usepackage{textcomp} +\usepackage[latin1]{inputenc} +\usepackage{amssymb} + +%\DeclareGraphicsExtensions{.png} +% +% \usepackage{mathptmx} % use Times fonts if available on your TeX system +% +% insert here the call for the packages your document requires +%\usepackage{latexsym} +% etc. 
+%
+% please place your own definitions here and don't use \def but
+% \newcommand{}{}
+%
+
+\lstset{
+  language=Pascal,
+  basicstyle=\fontsize{9}{9}\selectfont,
+  captionpos=t,
+  aboveskip=1mm,
+  belowskip=1mm,
+  abovecaptionskip=1mm,
+  belowcaptionskip=1mm,
+% numbers = left,
+  mathescape=true,
+  escapechar=@,
+  extendedchars=true,
+  showstringspaces=false,
+  columns=fixed,
+  basewidth=0.515em,
+  frame=single,
+  framesep=2mm,
+  xleftmargin=2mm,
+  xrightmargin=2mm,
+  framerule=0.5pt
+}
+
+\def\cG{{\mathcal G}}
+\def\crit{{\rm crit}}
+\def\ncrit{{\rm ncrit}}
+\def\scrit{{\rm scrit}}
+\def\bedges{{\rm bedges}}
+\def\ZZ{{\mathbb Z}}
+
+\journalname{The VLDB Journal}
+%
+
+\begin{document}
+
+\title{Space and Time Efficient Minimal Perfect Hash \\[0.2cm]
+Functions for Very Large Databases\thanks{
+This work was supported in part by
+GERINDO Project--grant MCT/CNPq/CT-INFO 552.087/02-5,
+CAPES/PROF Scholarship (Fabiano C. Botelho),
+FAPESP Proj.\ Tem.\ 03/09925-5 and CNPq Grant 30.0334/93-1
+(Yoshiharu Kohayakawa),
+and CNPq Grant 30.5237/02-0 (Nivio Ziviani).}
+}
+%\subtitle{Do you have a subtitle?\\ If so, write it here}
+
+%\titlerunning{Short form of title} % if too long for running head
+
+\author{Fabiano C. Botelho \and Davi C. Reis \and Yoshiharu Kohayakawa \and Nivio Ziviani}
+%\authorrunning{Short form of author list} % if too long for running head
+\institute{
+F. C. Botelho \and
+N. Ziviani \at
+Dept. of Computer Science,
+Federal Univ. of Minas Gerais,
+Belo Horizonte, Brazil\\
+\email{\{fbotelho,nivio\}@dcc.ufmg.br}
+\and
+D. C. Reis \at
+Google, Brazil \\
+\email{davi.reis@gmail.com}
+\and
+Y. Kohayakawa \at
+Dept. of Computer Science,
+Univ.
of S\~ao Paulo, +S\~ao Paulo, Brazil\\ +\email{yoshi@ime.usp.br} +} + +\date{Received: date / Accepted: date} +% The correct dates will be entered by the editor + + +\maketitle + +\begin{abstract} +We propose a novel external memory based algorithm for constructing minimal +perfect hash functions~$h$ for huge sets of keys. +For a set of~$n$ keys, our algorithm outputs~$h$ in time~$O(n)$. +The algorithm needs a small vector of one byte entries +in main memory to construct $h$. +The evaluation of~$h(x)$ requires three memory accesses for any key~$x$. +The description of~$h$ takes a constant number of bits +for each key, which is optimal, i.e., the theoretical lower bound is $1/\ln 2$ +bits per key. +In our experiments, we used a collection of 1 billion URLs collected +from the web, each URL 64 characters long on average. +For this collection, our algorithm +(i) finds a minimal perfect hash function in approximately +3 hours using a commodity PC, +(ii) needs just 5.45 megabytes of internal memory to generate $h$ +and (iii) takes 8.1 bits per key for the description of~$h$. +\keywords{Minimal Perfect Hashing \and Large Databases} +\end{abstract} + +% main text + +\def\cG{{\mathcal G}} +\def\crit{{\rm crit}} +\def\ncrit{{\rm ncrit}} +\def\scrit{{\rm scrit}} +\def\bedges{{\rm bedges}} +\def\ZZ{{\mathbb Z}} +\def\BSmax{\mathit{BS}_{\mathit{max}}} +\def\Bi{\mathop{\rm Bi}\nolimits} + +\input{introduction} +%\input{terminology} +\input{relatedwork} +\input{thealgorithm} +\input{partitioningthekeys} +\input{searching} +%\input{computingoffset} +%\input{hashingbuckets} +\input{determiningb} +%\input{analyticalandexperimentalresults} +\input{analyticalresults} +%\input{results} +\input{conclusions} + + + + +%\input{acknowledgments} +%\begin{acknowledgements} +%If you'd like to thank anyone, place your comments here +%and remove the percent signs. 
+%\end{acknowledgements} + +% BibTeX users please use +%\bibliographystyle{spmpsci} +%\bibliography{} % name your BibTeX data base +\bibliographystyle{plain} +\bibliography{references} +\input{appendix} +\end{document}