122 lines
5.0 KiB
Plaintext
Executable File
122 lines
5.0 KiB
Plaintext
Executable File
BDZ Algorithm
|
|
|
|
|
|
%!includeconf: CONFIG.t2t
|
|
|
|
----------------------------------------
|
|
==Introduction==
|
|
|
|
Coming soon...
|
|
|
|
----------------------------------------
|
|
|
|
==The Algorithm==
|
|
|
|
Coming soon...
|
|
|
|
|
|
----------------------------------------
|
|
|
|
===Mapping Step===
|
|
|
|
Coming soon...
|
|
|
|
----------------------------------------
|
|
|
|
===Assigning Step===
|
|
|
|
Coming soon...
|
|
|
|
|
|
----------------------------------------
|
|
|
|
===Ranking Step===
|
|
|
|
Coming soon...
|
|
|
|
----------------------------------------
|
|
|
|
==Memory Consumption==
|
|
|
|
Now we detail the memory consumption to generate and to store minimal perfect hash functions
|
|
using the BDZ algorithm. The structures responsible for memory consumption are in the
|
|
following:
|
|
- 3-graph:
|
|
+ **first**: is a vector that stores //cn// integer numbers, each one representing
|
|
the first edge (index in the vector edges) in the list of
|
|
incident edges of each vertex. The integer numbers are 4 bytes long. Therefore,
|
|
the vector first is stored in //4cn// bytes.
|
|
|
|
+ **edges**: is a vector to represent the edges of the graph. As each edge
|
|
is compounded by three vertices, each entry stores three integer numbers
|
|
of 4 bytes that represent the vertices. As there are //n// edges, the
|
|
vector edges is stored in //12n// bytes.
|
|
|
|
+ **next**: given a vertex [figs/img139.png], we can discover the edges that
|
|
contain [figs/img139.png] following its list of incident edges,
|
|
which starts on first[[figs/img139.png]] and the next
|
|
edges are given by next[...first[[figs/img139.png]]...]. Therefore, the vectors first and next represent
|
|
the linked lists of edges of each vertex. As there are three vertices for each edge,
|
|
when an edge is iserted in the 3-graph, it must be inserted in the three linked lists
|
|
of the vertices in its composition. Therefore, there are //3n// entries of integer
|
|
numbers in the vector next, so it is stored in //4*3n = 12n// bytes.
|
|
|
|
+ **Vertices degree (vert_degree vector)**: is a vector of //cn// bytes
|
|
that represents the degree of each vertex. We can use just one byte for each
|
|
vertex because the 3-graph is sparse, once it has more vertices than edges.
|
|
Therefore, the vertices degree is represented in //cn// bytes.
|
|
|
|
- Acyclicity test:
|
|
+ **List of deleted edges obtained when we test whether the 3-graph is a forest (queue vector)**:
|
|
is a vector of //n// integer numbers containing indexes of vector edges. Therefore, it
|
|
requires //4n// bytes in internal memory.
|
|
|
|
+ **Marked edges in the acyclicity test (marked_edges vector)**:
|
|
is a bit vector of //n// bits to indicate the edges that have already been deleted during
|
|
the acyclicity test. Therefore, it requires //n/8// bytes in internal memory.
|
|
|
|
- MPHF description
|
|
+ **function //g//**: is represented by a vector of //2cn// bits. Therefore, it is
|
|
stored in //0.25cn// bytes
|
|
+ **ranktable**: is a lookup table used to store some precomputed ranking information.
|
|
It has //(cn)/(2^b)// entries of 4-byte integer numbers. Therefore it is stored in
|
|
//(4cn)/(2^b)// bytes. The larger is b, the more compact is the resulting MPHFs and
|
|
the slower are the functions. So b imposes a trade-of between space and time.
|
|
+ **Total**: 0.25cn + (4cn)/(2^b) bytes
|
|
|
|
|
|
Thus, the total memory consumption of BDZ algorithm for generating a minimal
|
|
perfect hash function (MPHF) is: //(28.125 + 5c)n + 0.25cn + (4cn)/(2^b) + O(1)// bytes.
|
|
As the value of constant //c// may be larger than or equal to 1.23 we have:
|
|
|| //c// | //b// | Memory consumption to generate a MPHF (in bytes) |
|
|
| 1.23 | //7// | //34.62n + O(1)// |
|
|
| 1.23 | //8// | //34.60n + O(1)// |
|
|
|
|
| **Table 1:** Memory consumption to generate a MPHF using the BDZ algorithm.
|
|
|
|
Now we present the memory consumption to store the resulting function.
|
|
So we have:
|
|
|| //c// | //b// | Memory consumption to store a MPHF (in bits) |
|
|
| 1.23 | //7// | //2.77n + O(1)// |
|
|
| 1.23 | //8// | //2.61n + O(1)// |
|
|
|
|
| **Table 2:** Memory consumption to store a MPHF generated by the BDZ algorithm.
|
|
----------------------------------------
|
|
|
|
==Experimental Results==
|
|
|
|
Experimental results to compare the BDZ algorithm with the other ones in the CMPH
|
|
library are presented in Botelho, Pagh and Ziviani [[1 #papers],[2 #papers]].
|
|
----------------------------------------
|
|
|
|
==Papers==[papers]
|
|
|
|
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], R. Pagh, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [Simple and space-efficient minimal perfect hash functions papers/wads07.pdf]. //10th International Workshop on Algorithms and Data Structures (WADs'07),// Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150.
|
|
|
|
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho]. [Near Space-Optimal Perfect Hashing Algorithms papers/thesis.pdf]. //Thesis Proposal//, //Department of Computer Science//, //Federal University of Minas Gerais//, July 2007.
|
|
|
|
|
|
%!include: ALGORITHMS.t2t
|
|
|
|
%!include: FOOTER.t2t
|