Comparison Between BMZ And CHM Algorithms


%!includeconf: CONFIG.t2t

----------------------------------------

==Characteristics==
Table 1 presents the main characteristics of the two algorithms.
The number of edges in the graph [figs/img27.png] is [figs/img236.png],
the number of keys in the input set [figs/img20.png].
The number of vertices of [figs/img32.png] is
[figs/img12.png] for the BMZ algorithm and [figs/img237.png] for the CHM algorithm.
This number determines the amount of space needed to store the array [figs/img37.png].
It reduces the space required to store a function generated by the BMZ algorithm to [figs/img238.png] of the space required by the CHM algorithm.
The number of critical edges is [figs/img76.png] for the BMZ algorithm and 0 for the CHM algorithm.
The BMZ algorithm generates random graphs that necessarily contain cycles, whereas the
CHM algorithm generates acyclic random graphs.
Finally, the CHM algorithm generates [order preserving functions concepts.html],
while the BMZ algorithm does not preserve order.

%!include(html): ''TABLE1.t2t''
| **Table 1:** Main characteristics of the algorithms.
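To make the difference in graph size concrete, the following minimal sketch assumes that the graph has //cn// vertices for //n// keys, using the values of //c// listed in Tables 2 and 3 (1.15 for BMZ and 2.09 for CHM). The helper name ``num_vertices`` is hypothetical and is not part of the cmph library.

```python
def num_vertices(c: float, n: int) -> int:
    """Vertices in the random graph for n keys, assuming |V| = c * n (rounded)."""
    return round(c * n)


n = 1_000_000                      # illustrative number of keys
bmz_vertices = num_vertices(1.15, n)
chm_vertices = num_vertices(2.09, n)

# Fraction of the CHM graph size that the BMZ graph needs.
ratio = bmz_vertices / chm_vertices
print(f"BMZ: {bmz_vertices} vertices, CHM: {chm_vertices} vertices, ratio: {ratio:.2f}")
```

Under these assumptions the BMZ graph has a little over half as many vertices as the CHM graph, which is where the space saving comes from.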

----------------------------------------

==Memory Consumption==

- Memory consumption to generate the minimal perfect hash function (MPHF):
|| Algorithm | //c// | Memory consumption to generate an MPHF |
| BMZ | 0.93 | //24.80n + O(1)// |
| BMZ | 1.15 | //26.42n + O(1)// |
| CHM | 2.09 | //33.00n + O(1)// |

| **Table 2:** Memory consumption to generate an MPHF using the algorithms BMZ and CHM.

- Memory consumption to store the resulting minimal perfect hash function (MPHF):
|| Algorithm | //c// | Memory consumption to store an MPHF |
| BMZ | 0.93 | //3.72n// |
| BMZ | 1.15 | //4.60n// |
| CHM | 2.09 | //8.36n// |

| **Table 3:** Memory consumption to store an MPHF generated by the algorithms BMZ and CHM.
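The storage figures in Table 3 are each exactly four times the corresponding value of //c//, which is consistent with one 4-byte word per vertex. The tables do not state their unit, so reading the coefficients as bytes per key is an assumption. Under that assumption, the sketch below estimates memory for a hypothetical set of 100 million keys; all names in it are illustrative.

```python
# Per-key coefficients copied from Tables 2 and 3, keyed by (algorithm, c).
# Assumption: the coefficients are bytes per key; the O(1) term is ignored.
GENERATE_PER_KEY = {("BMZ", 0.93): 24.80, ("BMZ", 1.15): 26.42, ("CHM", 2.09): 33.00}
STORE_PER_KEY = {("BMZ", 0.93): 3.72, ("BMZ", 1.15): 4.60, ("CHM", 2.09): 8.36}


def estimate_bytes(per_key: float, n: int) -> float:
    """Estimated memory in bytes for n keys, given a per-key coefficient."""
    return per_key * n


def implied_word_size(c: float, per_key: float) -> float:
    """Bytes per vertex implied by a storage coefficient, assuming c * n vertices."""
    return per_key / c


n = 100_000_000  # hypothetical key count
for (algorithm, c), per_key in STORE_PER_KEY.items():
    megabytes = estimate_bytes(per_key, n) / 1e6
    word = implied_word_size(c, per_key)
    print(f"{algorithm} c={c}: about {megabytes:.0f} MB to store, {word:.2f} bytes per vertex")
```

The same ``estimate_bytes`` helper can be applied to ``GENERATE_PER_KEY`` to estimate the memory needed during construction.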

----------------------------------------

==Run times==
We now present some experimental results to compare the BMZ and CHM algorithms.
The data consists of a collection of 100 million uniform resource locators
(URLs) collected from the Web.
The average length of a URL in the collection is 63 bytes.
All experiments were carried out on
a computer running the Linux operating system, version 2.6.7,
with a 2.4 gigahertz processor and
4 gigabytes of main memory.

Table 4 presents time measurements.
All times are in seconds.
The table entries represent averages over 50 trials.
The column labelled [figs/img243.png] represents
the number of iterations to generate the random graph [figs/img32.png] in the
mapping step of the algorithms.
The next columns represent the run times
for the mapping plus ordering steps together and for the searching
step of each algorithm.
The last column represents the percent gain of the BMZ algorithm
over the CHM algorithm.

%!include(html): ''TABLE4.t2t''
| **Table 4:** Time measurements for the BMZ and CHM algorithms.

Two facts make the BMZ algorithm faster in the mapping step.
First, the expected number of iterations in the mapping step to generate [figs/img32.png] is
2.13 for the BMZ algorithm against 2.92 for the CHM algorithm
(see [[2 bmz.html#papers]] for details).
Second, the graph [figs/img32.png] generated by the BMZ algorithm
has [figs/img12.png] vertices, against [figs/img237.png] for the CHM algorithm.
The time of the ordering step of the BMZ algorithm is approximately equal to
the time taken to check whether [figs/img32.png] is acyclic in the CHM algorithm.
The searching step of the CHM algorithm is faster, but overall
the BMZ algorithm is, on average, approximately 59% faster
than the CHM algorithm.
Note that the searching step does not dominate the run time
of either algorithm,
and the experimental results clearly show
linear behavior for the searching step.
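
The expected iteration counts quoted above can be read through a simple model. If each attempt of the mapping step is treated as an independent trial that succeeds with a fixed probability //p//, the number of attempts is geometrically distributed with mean //1/p//. This is a modelling assumption, not something stated in the text. The sketch below inverts the reported expectations to obtain the implied success probabilities; the function names are hypothetical.

```python
def success_probability(expected_iterations: float) -> float:
    """Per-attempt success probability implied by a geometric mean of 1/p."""
    if expected_iterations < 1.0:
        raise ValueError("a geometric distribution has mean >= 1")
    return 1.0 / expected_iterations


def expected_iterations(p: float) -> float:
    """Mean number of attempts until the first success, for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


p_bmz = success_probability(2.13)  # expectation reported for BMZ
p_chm = success_probability(2.92)  # expectation reported for CHM
print(f"implied success probability: BMZ {p_bmz:.3f}, CHM {p_chm:.3f}")
```

Under this model, a BMZ attempt at generating a usable graph succeeds noticeably more often than a CHM attempt, which matches the lower iteration count.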

We now present run times for the BMZ algorithm using a [heuristic bmz.html#heuristic] that
reduces the space requirement
to any given value between [figs/img12.png] words and [figs/img13.png] words.
For example, for [figs/img244.png] and [figs/img6.png], the analytical expected numbers
of iterations are [figs/img245.png] and [figs/img246.png], respectively
(for [figs/img247.png], the number of iterations is 2.78 for [figs/img244.png] and 3.04
for [figs/img6.png]).
Table 5 presents the total times to construct a
function for [figs/img247.png]. The total time increases from [figs/img248.png] seconds
for [figs/img128.png] (see Table 4) to [figs/img249.png] seconds for [figs/img244.png] and
to [figs/img250.png] seconds for [figs/img6.png].

%!include(html): ''TABLE5.t2t''
| **Table 5:** Time measurements for the tuned BMZ algorithm with [figs/img5.png] and [figs/img6.png].

%!include: ALGORITHMS.t2t

%!include: FOOTER.t2t

%!include(html): ''GOOGLEANALYTICS.t2t''