Comparison Between BMZ And CHM Algorithms
%!includeconf: CONFIG.t2t
----------------------------------------
==Characteristics==
Table 1 presents the main characteristics of the two algorithms.
The number of edges in the graph [figs/img27.png] is [figs/img236.png],
the number of keys in the input set [figs/img20.png].
The number of vertices of [figs/img32.png] is equal
to [figs/img12.png] for the BMZ algorithm and [figs/img237.png] for the CHM algorithm.
This measure is related to the amount of space needed to store the array [figs/img37.png].
As a result, the space required to store a function generated by the BMZ algorithm is [figs/img238.png] of the space required by the CHM algorithm.
The number of critical edges is [figs/img76.png] for the BMZ algorithm and 0 for the CHM algorithm.
The BMZ algorithm generates random graphs that necessarily contain cycles, whereas the
CHM algorithm generates acyclic random graphs.
Finally, the CHM algorithm generates [order preserving functions concepts.html],
while the BMZ algorithm does not preserve order.
%!include(html): ''TABLE1.t2t''
| **Table 1:** Main characteristics of the algorithms.
----------------------------------------
==Memory Consumption==
- Memory consumption to generate the minimal perfect hash function (MPHF):
|| Algorithm | //c// | Memory consumption to generate an MPHF |
| BMZ | 0.93 | //24.80n + O(1)// |
| BMZ | 1.15 | //26.42n + O(1)// |
| CHM | 2.09 | //33.00n + O(1)// |
| **Table 2:** Memory consumption to generate an MPHF using the BMZ and CHM algorithms.
- Memory consumption to store the resulting minimal perfect hash function (MPHF):
|| Algorithm | //c// | Memory consumption to store an MPHF |
| BMZ | 0.93 | //3.72n// |
| BMZ | 1.15 | //4.60n// |
| CHM | 2.09 | //8.36n// |
| **Table 3:** Memory consumption to store an MPHF generated by the BMZ and CHM algorithms.
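
The coefficients in Tables 2 and 3 translate directly into memory estimates for a given number of keys. The sketch below is illustrative only. It assumes that each coefficient is a number of bytes per key, which is consistent with Table 3 if each of the //c·n// stored entries takes 4 bytes (4 × 0.93 = 3.72). The function and variable names are hypothetical, and the //O(1)// terms are ignored.

```python
# Coefficients copied from Tables 2 and 3, keyed by (algorithm, c).
# ASSUMPTION: each coefficient is a number of bytes per key.
GENERATION = {("BMZ", 0.93): 24.80, ("BMZ", 1.15): 26.42, ("CHM", 2.09): 33.00}
STORAGE = {("BMZ", 0.93): 3.72, ("BMZ", 1.15): 4.60, ("CHM", 2.09): 8.36}


def estimate_bytes(table, algorithm, c, n):
    """Return the estimated byte count for n keys, ignoring O(1) terms."""
    return table[(algorithm, c)] * n


def storage_ratio(c_bmz):
    """Return BMZ storage as a fraction of CHM storage (CHM at c = 2.09)."""
    return STORAGE[("BMZ", c_bmz)] / STORAGE[("CHM", 2.09)]


if __name__ == "__main__":
    n = 1_000_000  # hypothetical key count, chosen only for illustration
    for algorithm, c in STORAGE:
        gen = estimate_bytes(GENERATION, algorithm, c, n)
        sto = estimate_bytes(STORAGE, algorithm, c, n)
        print(f"{algorithm} c={c}: generate {gen / 1e6:.2f} MB, store {sto / 1e6:.2f} MB")
    print(f"BMZ/CHM storage ratio at c=1.15: {storage_ratio(1.15):.2f}")
```

With these coefficients, a BMZ function needs about 55% of the storage of a CHM function at //c// = 1.15, and about 44% at //c// = 0.93.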
----------------------------------------
==Run times==
We now present some experimental results to compare the BMZ and CHM algorithms.
The data consists of a collection of 100 million uniform resource locators
(URLs) collected from the Web.
The average length of a URL in the collection is 63 bytes.
All experiments were carried out on
a computer running the Linux operating system, version 2.6.7,
with a 2.4 gigahertz processor and
4 gigabytes of main memory.
Table 4 presents time measurements.
All times are in seconds.
The table entries represent averages over 50 trials.
The column labelled as [figs/img243.png] represents
the number of iterations to generate the random graph [figs/img32.png] in the
mapping step of the algorithms.
The next columns represent the run times
for the mapping plus ordering steps together and the searching
step for each algorithm.
The last column represents the percent gain of our algorithm (BMZ)
over the CHM algorithm.
%!include(html): ''TABLE4.t2t''
| **Table 4:** Time measurements for BMZ and the CHM algorithm.
The mapping step of the BMZ algorithm is faster for two reasons.
First, the expected numbers of iterations in the mapping step to generate [figs/img32.png] are
2.13 and 2.92 for the BMZ algorithm and the CHM algorithm, respectively
(see [[2 bmz.html#papers]] for details).
Second, the graph [figs/img32.png] generated by the BMZ algorithm
has [figs/img12.png] vertices, against [figs/img237.png] for the CHM algorithm.
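
The iteration counts quoted above can be read as means of a geometric distribution: if a single attempt at generating a usable graph succeeds with probability //p//, the expected number of attempts is //1/p//. The sketch below illustrates this reading. The closed form //p = exp(-1/c^2)// used for BMZ is an assumption that is not stated on this page. It is used here only because it reproduces the 2.13 figure at //c// = 1.15, so the referenced paper remains the authority for the actual derivation. The function names are hypothetical.

```python
import math


def expected_iterations(p_success):
    """Mean of a geometric distribution: expected attempts until the first success."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return 1.0 / p_success


def bmz_success_probability(c):
    """ASSUMED closed form for the per-attempt success probability in BMZ."""
    return math.exp(-1.0 / (c * c))


if __name__ == "__main__":
    # Expected attempts for BMZ at c = 1.15 under the assumed closed form.
    print(f"{expected_iterations(bmz_success_probability(1.15)):.2f}")
    # Inverse reading of the CHM figure: 2.92 expected attempts implies p = 1 / 2.92.
    print(f"{1.0 / 2.92:.2f}")
```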
The time of the ordering step of the BMZ algorithm is approximately equal to
the time to check if [figs/img32.png] is acyclic for the CHM algorithm.
The searching step of the CHM algorithm is faster, but in total
the BMZ algorithm is, on average, approximately 59% faster
than the CHM algorithm.
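
A figure such as "59% faster" can be read in more than one way, and this page does not define how the gain column of Table 4 is computed. The sketch below shows two common conventions side by side. The timings are hypothetical placeholders, not values from Table 4, and the function names are illustrative.

```python
def speedup_gain_percent(t_baseline, t_new):
    """Gain as relative speedup: how much faster the new method is than the baseline."""
    return (t_baseline / t_new - 1.0) * 100.0


def time_reduction_percent(t_baseline, t_new):
    """Gain as time saved: the share of the baseline run time that is removed."""
    return (1.0 - t_new / t_baseline) * 100.0


if __name__ == "__main__":
    # Hypothetical total times in seconds (NOT taken from Table 4).
    t_chm, t_bmz = 15.9, 10.0
    print(f"speedup gain:   {speedup_gain_percent(t_chm, t_bmz):.1f}%")
    print(f"time reduction: {time_reduction_percent(t_chm, t_bmz):.1f}%")
```

The two conventions give noticeably different percentages for the same pair of timings, so it is worth checking which one a reported gain uses.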
Note that, for both algorithms, the searching step does not dominate the total time,
and the experimental results clearly show
linear behavior for the searching step.
We now present run times for the BMZ algorithm using a [heuristic bmz.html#heuristic] that
reduces the space requirement
to any given value between [figs/img12.png] words and [figs/img13.png] words.
For example, for [figs/img244.png] and [figs/img6.png], the analytically expected numbers
of iterations are [figs/img245.png] and [figs/img246.png], respectively
(for [figs/img247.png], the numbers of iterations are 2.78 for [figs/img244.png] and 3.04
for [figs/img6.png]).
Table 5 presents the total times to construct a
function for [figs/img247.png], with an increase from [figs/img248.png] seconds
for [figs/img128.png] (see Table 4) to [figs/img249.png] seconds for [figs/img244.png] and
to [figs/img250.png] seconds for [figs/img6.png].
%!include(html): ''TABLE5.t2t''
| **Table 5:** Time measurements for the tuned BMZ algorithm with [figs/img5.png] and [figs/img6.png].
%!include: ALGORITHMS.t2t
%!include: FOOTER.t2t
%!include(html): ''GOOGLEANALYTICS.t2t''