CHM Algorithm


%!includeconf: CONFIG.t2t

----------------------------------------

==The Algorithm==
The algorithm is presented in [[1,2,3 #papers]].
----------------------------------------

==Memory Consumption==

This section details the memory needed to generate and to store minimal perfect hash functions
using the CHM algorithm. The following structures account for
the memory consumption:
- Graph:
    + **first**: a vector of //cn// integers, each one holding
      the index (in the vector edges) of the first edge in the list of
      edges of the corresponding vertex.
      Each integer is 4 bytes long, so
      the vector first takes //4cn// bytes.

    + **edges**: a vector that represents the edges of the graph. Since each edge
      is a pair of vertices, each entry stores two 4-byte integers
      that identify those vertices. As there are //n// edges, the
      vector edges takes //8n// bytes.

    + **next**: given a vertex [figs/img139.png], we can find the edges that
      contain [figs/img139.png] by following its list of edges, which starts at
      first[[figs/img139.png]]; the subsequent
      edges are given by next[...first[[figs/img139.png]]...]. The
      vectors first and next therefore represent
      the linked lists of edges of each vertex. As each edge has two vertices,
      an edge inserted in the graph must be added to the linked lists
      of both of its vertices. Hence the vector next has //2n// integer
      entries and takes //4*2n = 8n// bytes.

- Other auxiliary structures:
    + **visited**: a vector of //cn// bits, where each bit indicates whether the //g// value of
      a given vertex has already been defined. The vector visited therefore takes
      //cn/8// bytes.

    + **function //g//**: represented by a vector of //cn// integers.
      As each integer is 4 bytes long, the function //g// takes
      //4cn// bytes.


Thus, the total memory consumption of the CHM algorithm for generating a minimal
perfect hash function (MPHF) is //(8.125c + 16)n + O(1)// bytes.
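The linked lists formed by the vectors first, edges and next in the list above can be sketched in Python as follows. This is only an illustration, not the library's code: the class name ``Graph``, the ``EMPTY`` sentinel, and the convention that slot //e// of next belongs to the first endpoint of edge //e// while slot //e + n// belongs to the second are assumptions made for this sketch. In particular, first stores a slot of next here, and the edge index is recovered as ``slot % n``.

```python
EMPTY = -1  # end-of-list sentinel (an assumption of this sketch)


class Graph:
    """Graph with n edges stored as per-vertex linked lists of edges."""

    def __init__(self, num_vertices, num_edges):
        self.num_edges = num_edges
        self.first = [EMPTY] * num_vertices         # cn entries
        self.edges = [(EMPTY, EMPTY)] * num_edges   # n entries, two vertices each
        self.next = [EMPTY] * (2 * num_edges)       # 2n entries
        self.count = 0                              # edges inserted so far

    def add_edge(self, u, v):
        """Store edge (u, v) and push it onto the lists of both endpoints."""
        e = self.count
        self.count += 1
        self.edges[e] = (u, v)
        # Slot e is the list node for u, slot e + n is the list node for v.
        for slot, vertex in ((e, u), (e + self.num_edges, v)):
            self.next[slot] = self.first[vertex]
            self.first[vertex] = slot
        return e

    def incident_edges(self, vertex):
        """Yield the indices of all edges that contain the given vertex."""
        slot = self.first[vertex]
        while slot != EMPTY:
            yield slot % self.num_edges
            slot = self.next[slot]


graph = Graph(num_vertices=4, num_edges=3)
graph.add_edge(0, 1)
graph.add_edge(1, 2)
graph.add_edge(1, 3)
print(list(graph.incident_edges(1)))  # -> [2, 1, 0]
```

Because each new edge is pushed at the head of a list, the edges of a vertex come out in reverse insertion order. The vector next needs //2n// entries precisely because every edge occupies one list node per endpoint.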
Since the constant //c// must be at least 2.09, the memory needed for generation is:
|| //c// | Memory consumption to generate an MPHF |
| 2.09 | //32.98n + O(1)// |

| **Table 1:** Memory consumption to generate an MPHF using the CHM algorithm.
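The arithmetic behind Table 1 can be checked with a few lines of Python. The sketch below adds up the per-key byte counts of the five structures described earlier; the function name ``generation_bytes_per_key`` is hypothetical, and the //O(1)// overhead is ignored.

```python
def generation_bytes_per_key(c):
    """Per-key bytes of each structure used while building the function."""
    return {
        "first": 4 * c,     # cn integers of 4 bytes
        "edges": 8,         # n entries of two 4-byte integers
        "next": 8,          # 2n integers of 4 bytes
        "visited": c / 8,   # cn bits
        "g": 4 * c,         # cn integers of 4 bytes
    }


sizes = generation_bytes_per_key(2.09)
total = sum(sizes.values())
print(round(total, 2))  # coefficient of n in the total
```

The sum equals //8.125c + 16//. For //c = 2.09// it evaluates to about 32.98 bytes per key, that is, roughly //33n// bytes in total.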
Now we present the memory consumption to store the resulting function.
We only need to store the //g// function, which takes //4cn// bytes.
Again, for //c = 2.09// we have:
|| //c// | Memory consumption to store an MPHF |
| 2.09 | //8.36n// |

| **Table 2:** Memory consumption to store an MPHF generated by the CHM algorithm.
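To put Table 2 in concrete terms, the short Python sketch below converts the //4cn// figure into bytes and bits per key and into an approximate size for a given number of keys. The function names and the key count of one million are illustrative assumptions, not values taken from any experiment.

```python
def storage_bytes_per_key(c):
    """Bytes per key needed to store g: cn integers of 4 bytes each."""
    return 4 * c


def storage_bits_per_key(c):
    """The same quantity expressed in bits per key."""
    return 8 * storage_bytes_per_key(c)


def storage_megabytes(n, c):
    """Approximate size of g in megabytes (10**6 bytes) for n keys."""
    return storage_bytes_per_key(c) * n / 1e6


print(storage_bytes_per_key(2.09))
print(storage_bits_per_key(2.09))
print(storage_megabytes(1_000_000, 2.09))
```

With //c = 2.09// this gives about 8.36 bytes (66.88 bits) per key, so a set of one million keys needs roughly 8.36 MB for the stored function.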
----------------------------------------

==Experimental Results==

[CHM x BMZ comparison.html]

----------------------------------------

==Papers==[papers]

+ Z.J. Czech, G. Havas, and B.S. Majewski. [An optimal algorithm for generating minimal perfect hash functions. papers/chm92.pdf], Information Processing Letters, 43(5):257-264, 1992.

+ Z.J. Czech, G. Havas, and B.S. Majewski. Perfect hashing.
  Theoretical Computer Science, 182:1-143, 1997.

+ B.S. Majewski, N.C. Wormald, G. Havas, and Z.J. Czech. A family of perfect hashing methods.
  The Computer Journal, 39(6):547-554, 1996.


%!include: ALGORITHMS.t2t

%!include: FOOTER.t2t

%!include(html): ''GOOGLEANALYTICS.t2t''