Comparison Between BMZ And CHM Algorithms
Characteristics
Table 1 presents the main characteristics of the two algorithms.
The number of edges in the graph G is n,
the number of keys in the input set.
The number of vertices of G is equal
to 1.15n and 2.09n for the BMZ algorithm and the CHM algorithm, respectively
(these are the values of c listed in Tables 2 and 3).
This measure determines the amount of space needed to store the array g.
As a result, a function generated by the BMZ algorithm needs about 55% of the space required by one generated by the CHM algorithm (4.60n against 8.36n in Table 3).
The BMZ graph has a nonzero number of critical edges, whereas the CHM graph has 0.
The BMZ algorithm generates random graphs that necessarily contain cycles, whereas the
CHM algorithm generates acyclic random graphs.
Finally, the CHM algorithm generates order-preserving functions,
while the BMZ algorithm does not preserve order.
Table 1: Main characteristics of the algorithms.
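To make the order-preserving property concrete, the following is a minimal sketch of a CHM-style construction, not the library's implementation. It assumes the usual form of such functions, where a key is mapped to two vertices and the function value is the sum of two entries of an array (called `g` here) taken modulo n. The helper names `salted_hash`, `assign`, `build_chm` and `lookup`, and the use of BLAKE2b as the hash, are assumptions made for illustration only. Keys are assumed to be distinct.

```python
import hashlib
import itertools


def salted_hash(key, seed, num_vertices):
    # Hypothetical keyed hash: BLAKE2b salted with an integer seed, reduced to a vertex id.
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % num_vertices


def assign(edges, num_vertices, n):
    """Return g with (g[u] + g[v]) % n == edge index, or None if the graph has a cycle."""
    adjacency = [[] for _ in range(num_vertices)]
    for index, (u, v) in enumerate(edges):
        if u == v:
            return None  # a self-loop is a cycle
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))
    g = [None] * num_vertices
    for root in range(num_vertices):
        if g[root] is not None:
            continue
        g[root] = 0
        stack = [(root, -1)]
        while stack:
            u, via = stack.pop()
            for v, index in adjacency[u]:
                if index == via:
                    continue  # do not walk back along the edge we arrived by
                if g[v] is not None:
                    return None  # reached a visited vertex by another edge: cycle
                g[v] = (index - g[u]) % n
                stack.append((v, index))
    return g


def build_chm(keys, c=2.09):
    # Retry with fresh hash seeds until the random graph is acyclic.
    n = len(keys)
    num_vertices = int(c * n) + 1
    for attempt in itertools.count(1):
        s1, s2 = 2 * attempt, 2 * attempt + 1
        edges = [
            (salted_hash(k, s1, num_vertices), salted_hash(k, s2, num_vertices))
            for k in keys
        ]
        g = assign(edges, num_vertices, n)
        if g is not None:
            return g, s1, s2, num_vertices, attempt


def lookup(key, g, s1, s2, num_vertices, n):
    u = salted_hash(key, s1, num_vertices)
    v = salted_hash(key, s2, num_vertices)
    return (g[u] + g[v]) % n
```

In this sketch the i-th key of the input list is mapped to the value i, which is what order preserving means here. The array `g` has one entry per vertex, which is why the number of vertices drives the storage cost.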
Memory Consumption
- Memory consumption to generate the minimal perfect hash function (MPHF):
Algorithm | c    | Memory consumption to generate a MPHF
BMZ       | 0.93 | 24.80n + O(1)
BMZ       | 1.15 | 26.42n + O(1)
CHM       | 2.09 | 33.00n + O(1)
Table 2: Memory consumption to generate a MPHF using the algorithms BMZ and CHM.
- Memory consumption to store the resulting minimal perfect hash function (MPHF):
Algorithm | c    | Memory consumption to store a MPHF
BMZ       | 0.93 | 3.72n
BMZ       | 1.15 | 4.60n
CHM       | 2.09 | 8.36n
Table 3: Memory consumption to store a MPHF generated by the algorithms BMZ and CHM.
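The storage figures in Table 3 follow a simple pattern: each entry equals 4 times c times n. The short sketch below reproduces that arithmetic. The constant of 4 units per array entry is an assumption inferred from the table (3.72 / 0.93 = 4), and the function names are hypothetical.

```python
UNITS_PER_ENTRY = 4  # assumption: inferred from Table 3, where 3.72n / 0.93 = 4n


def storage_cost(c, n):
    # Space to store the function: one array entry per vertex, c * n vertices.
    return UNITS_PER_ENTRY * c * n


def storage_ratio(c_a, c_b):
    # Relative storage of two configurations; n and the entry size cancel out.
    return c_a / c_b


if __name__ == "__main__":
    for name, c in [("BMZ", 0.93), ("BMZ", 1.15), ("CHM", 2.09)]:
        print(f"{name} c={c}: {storage_cost(c, 1):.2f}n")
    print(f"BMZ (c=1.15) / CHM (c=2.09): {storage_ratio(1.15, 2.09):.2f}")
```

The ratio for c = 1.15 against c = 2.09 is about 0.55, which matches the comparison of 4.60n against 8.36n in Table 3.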
Run times
We now present some experimental results to compare the BMZ and CHM algorithms.
The data consists of a collection of 100 million uniform resource locators
(URLs) collected from the Web.
The average length of a URL in the collection is 63 bytes.
All experiments were carried out on
a computer running the Linux operating system, version 2.6.7,
with a 2.4 gigahertz processor and
4 gigabytes of main memory.
Table 4 presents time measurements.
All times are in seconds.
The table entries represent averages over 50 trials.
One column gives the number of iterations needed to generate
the random graph in the
mapping step of the algorithms.
The next columns give the run times
for the mapping and ordering steps together and for the searching
step of each algorithm.
The last column gives the percent gain of our algorithm (BMZ)
over the CHM algorithm.
Table 4: Time measurements for BMZ and the CHM algorithm.
The mapping step of the BMZ algorithm is faster for two reasons.
First, the expected number of iterations in the mapping step to generate G is
2.13 for the BMZ algorithm and 2.92 for the CHM algorithm
(see [2] for details).
Second, the graph G generated by the BMZ algorithm
has 1.15n vertices, against 2.09n for the CHM algorithm.
The time of the ordering step of the BMZ algorithm is approximately equal to
the time to check whether G is acyclic in the CHM algorithm.
The searching step of the CHM algorithm is faster, but the total
time of the BMZ algorithm is, on average, approximately 59% faster
than that of the CHM algorithm.
Note that for both algorithms the searching step does not dominate the total time,
and the experimental results clearly show
linear behavior for the searching step.
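The notion of "iterations in the mapping step" can be illustrated for the CHM case, where a random graph is regenerated until it is acyclic. The sketch below is a small simulation under stated assumptions: endpoints are drawn uniformly at random (self-loops and repeated edges are allowed and count as cycles), and acyclicity is tested with a union-find structure. The function names are hypothetical, and the average it reports depends on this random graph model, so it need not reproduce the 2.92 figure quoted above. The BMZ acceptance condition is different and is not modelled here.

```python
import random


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_acyclic(edges, num_vertices):
    # An undirected graph is acyclic iff no edge joins two vertices
    # that are already in the same connected component.
    parent = list(range(num_vertices))
    for u, v in edges:
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u == root_v:
            return False
        parent[root_u] = root_v
    return True


def mapping_attempts(n, c, rng):
    # Count how many random graphs with n edges and int(c * n) vertices
    # are generated before an acyclic one appears.
    num_vertices = int(c * n)
    attempts = 0
    while True:
        attempts += 1
        edges = [
            (rng.randrange(num_vertices), rng.randrange(num_vertices))
            for _ in range(n)
        ]
        if is_acyclic(edges, num_vertices):
            return attempts


def average_attempts(n, c, trials, seed=0):
    rng = random.Random(seed)
    return sum(mapping_attempts(n, c, rng) for _ in range(trials)) / trials
```

Calling `average_attempts(1000, 2.09, 200)` gives an empirical average for this model. Lowering c toward 2 makes acyclic graphs rarer and the average grows, which is why the CHM algorithm needs the larger number of vertices.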
We now present run times for the BMZ algorithm using a heuristic that
reduces the space requirement
to any given value between 1.15n words and 0.93n words.
For example, for and , the analytical expected number
of iterations are and , respectively
(for , the number of iterations are 2.78 for and 3.04
for ).
Table 5 presents the total times to construct a
function for , with an increase from seconds
for (see Table 4) to seconds for and
to seconds for .
Table 5: Time measurements for BMZ tuned algorithm with and . |
Enjoy!
Davi de Castro Reis
Djamel Belazzougui
Fabiano Cupertino Botelho
Nivio Ziviani