BDZ Algorithm
%!includeconf: CONFIG.t2t
----------------------------------------
==Introduction==
Coming soon...
----------------------------------------
==The Algorithm==
Coming soon...
----------------------------------------
===Mapping Step===
Coming soon...
----------------------------------------
===Assigning Step===
Coming soon...
----------------------------------------
===Ranking Step===
Coming soon...
----------------------------------------
==Memory Consumption==
Now we detail the memory consumption to generate and to store minimal perfect hash functions
using the BDZ algorithm. The structures responsible for memory consumption are the
following:
- 3-graph:
+ **first**: is a vector that stores //cn// integer numbers, each one representing
the first edge (index in the vector edges) in the list of
incident edges of each vertex. The integer numbers are 4 bytes long. Therefore,
the vector first is stored in //4cn// bytes.
+ **edges**: is a vector to represent the edges of the graph. As each edge
is composed of three vertices, each entry stores three integer numbers
of 4 bytes that represent the vertices. As there are //n// edges, the
vector edges is stored in //12n// bytes.
+ **next**: given a vertex [figs/img139.png], we can discover the edges that
contain [figs/img139.png] by following its list of incident edges,
which starts on first[[figs/img139.png]] and the next
edges are given by next[...first[[figs/img139.png]]...]. Therefore, the vectors first and next represent
the linked lists of edges of each vertex. As there are three vertices for each edge,
when an edge is inserted in the 3-graph, it must be inserted in the three linked lists
of the vertices in its composition. Therefore, there are //3n// entries of integer
numbers in the vector next, so it is stored in //4*3n = 12n// bytes.
+ **Vertices degree (vert_degree vector)**: is a vector of //cn// bytes
that represents the degree of each vertex. We can use just one byte for each
vertex because the 3-graph is sparse, since it has more vertices than edges.
Therefore, the vertex degrees are stored in //cn// bytes.
- Acyclicity test:
+ **List of deleted edges obtained when we test whether the 3-graph is a forest (queue vector)**:
is a vector of //n// integer numbers containing indexes of vector edges. Therefore, it
requires //4n// bytes in internal memory.
+ **Marked edges in the acyclicity test (marked_edges vector)**:
is a bit vector of //n// bits to indicate the edges that have already been deleted during
the acyclicity test. Therefore, it requires //n/8// bytes in internal memory.
- MPHF description:
+ **function //g//**: is represented by a vector of //2cn// bits. Therefore, it is
stored in //0.25cn// bytes.
+ **ranktable**: is a lookup table used to store some precomputed ranking information.
It has //(cn)/(2^b)// entries of 4-byte integer numbers. Therefore, it is stored in
//(4cn)/(2^b)// bytes. The larger //b// is, the more compact the resulting MPHF is and
the slower its evaluation becomes. So //b// imposes a trade-off between space and time.
+ **Total**: 0.25cn + (4cn)/(2^b) bytes
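
The linked-list layout and the acyclicity test described above can be sketched in Python as follows. This is only an illustrative sketch, not the CMPH implementation: the class Graph3, the sentinel NIL, the flat indexing next[3*e + j] and the function is_acyclic are hypothetical choices made for this example, and the sketch assumes that the three vertices of an edge are distinct. The test repeatedly deletes an edge that contains a vertex of degree one, records it in queue and flags it in marked_edges; the 3-graph is accepted when every edge ends up deleted.

```python
NIL = -1  # hypothetical sentinel marking the end of a linked list


class Graph3:
    """Sketch of a 3-graph stored as linked lists of incident edges."""

    def __init__(self, num_vertices, num_edges):
        self.first = [NIL] * num_vertices           # cn entries: head of each list
        self.edges = []                             # up to n triples of vertices
        self.next = [NIL] * (3 * num_edges)         # 3n entries: one per (edge, slot)
        self.vert_degree = bytearray(num_vertices)  # one byte per vertex

    def add_edge(self, v0, v1, v2):
        # Assumption: v0, v1 and v2 are pairwise distinct.
        e = len(self.edges)
        self.edges.append((v0, v1, v2))
        for j, v in enumerate((v0, v1, v2)):
            # Push edge e onto the front of the list of vertex v.
            self.next[3 * e + j] = self.first[v]
            self.first[v] = e
            self.vert_degree[v] += 1
        return e

    def incident_edges(self, v):
        # Follow the list that starts at first[v] and continues through next.
        e = self.first[v]
        while e != NIL:
            yield e
            j = self.edges[e].index(v)  # slot of v inside edge e
            e = self.next[3 * e + j]


def is_acyclic(g):
    """Delete edges that contain a degree-one vertex until none is left."""
    n = len(g.edges)
    marked_edges = [False] * n     # a real implementation would pack this into n bits
    queue = []                     # indexes of deleted edges, in deletion order
    degree = list(g.vert_degree)   # working copy, so the graph is left untouched
    pending = [v for v in range(len(degree)) if degree[v] == 1]
    while pending:
        v = pending.pop()
        if degree[v] != 1:
            continue  # the last edge of v was already deleted through another vertex
        e = next(x for x in g.incident_edges(v) if not marked_edges[x])
        marked_edges[e] = True
        queue.append(e)
        for u in g.edges[e]:
            degree[u] -= 1
            if degree[u] == 1:
                pending.append(u)
    return len(queue) == n


g = Graph3(num_vertices=6, num_edges=2)
g.add_edge(0, 2, 4)
g.add_edge(1, 2, 5)
print(list(g.incident_edges(2)))  # -> [1, 0]
print(is_acyclic(g))              # -> True
```

Inserting at the front of each list keeps add_edge at constant time per vertex, which is why the most recently added edge is visited first.
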
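Thus, the total memory consumption of the BDZ algorithm for generating a minimal
perfect hash function (MPHF) is: //(28.125 + 5c)n + 0.25cn + (4cn)/(2^b) + O(1)// bytes,
where the first term adds up the 3-graph and the acyclicity test structures
(//4cn + 12n + 12n + cn + 4n + n/8// bytes).
As the constant //c// can be any value greater than or equal to 1.23, taking //c = 1.23// we have: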
|| //c// | //b// | Memory consumption to generate a MPHF (in bytes) |
| 1.23 | //7// | //34.62n + O(1)// |
| 1.23 | //8// | //34.60n + O(1)// |
| **Table 1:** Memory consumption to generate a MPHF using the BDZ algorithm.
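
The entries of Table 1 can be checked by adding up the per-key cost of each structure listed above. The following Python sketch does just that; the function name generation_bytes_per_key and its local variable names are hypothetical and exist only for this example.

```python
def generation_bytes_per_key(c, b):
    """Bytes per key needed to generate an MPHF, ignoring the O(1) term."""
    first = 4 * c               # cn integers of 4 bytes
    edges = 12                  # three 4-byte vertices per edge
    next_ = 12                  # 3n integers of 4 bytes
    vert_degree = c             # one byte per vertex
    queue = 4                   # n edge indexes of 4 bytes
    marked_edges = 1 / 8        # one bit per edge
    g = 0.25 * c                # 2cn bits
    ranktable = 4 * c / 2 ** b  # cn / 2^b integers of 4 bytes
    return (first + edges + next_ + vert_degree
            + queue + marked_edges + g + ranktable)


for b in (7, 8):
    print(f"c = 1.23, b = {b}: {generation_bytes_per_key(1.23, b):.2f}n bytes")
# c = 1.23, b = 7: 34.62n bytes
# c = 1.23, b = 8: 34.60n bytes
```
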
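Now we present the memory consumption to store the resulting function.
From the MPHF description above, the function occupies //0.25cn + (4cn)/(2^b)// bytes,
that is, //2cn + (32cn)/(2^b)// bits. So for //c = 1.23// we have: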
|| //c// | //b// | Memory consumption to store a MPHF (in bits) |
| 1.23 | //7// | //2.77n + O(1)// |
| 1.23 | //8// | //2.61n + O(1)// |
| **Table 2:** Memory consumption to store a MPHF generated by the BDZ algorithm.
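
As a quick check of Table 2, the stored size in bits per key is eight times the byte total of the MPHF description. The Python sketch below evaluates it; the function name storage_bits_per_key is hypothetical and the O(1) term is ignored.

```python
def storage_bits_per_key(c, b):
    """Bits per key needed to store an MPHF, ignoring the O(1) term."""
    g_bits = 2 * c                    # function g: 2cn bits
    ranktable_bits = 32 * c / 2 ** b  # cn / 2^b entries of 32 bits
    return g_bits + ranktable_bits


for b in (7, 8):
    print(f"c = 1.23, b = {b}: {storage_bits_per_key(1.23, b):.2f}n bits")
# c = 1.23, b = 7: 2.77n bits
# c = 1.23, b = 8: 2.61n bits
```

Doubling the ranktable sampling interval (from //b = 7// to //b = 8//) halves the ranktable term, which accounts for the difference of about 0.15 bits per key between the two rows.
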
----------------------------------------
==Experimental Results==
Experimental results comparing the BDZ algorithm with the other algorithms in the CMPH
library are presented in Botelho, Pagh and Ziviani [[1 #papers],[2 #papers]].
----------------------------------------
==Papers==[papers]
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], R. Pagh, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [Simple and space-efficient minimal perfect hash functions papers/wads07.pdf]. //10th International Workshop on Algorithms and Data Structures (WADS'07),// Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150.
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho]. [Near Space-Optimal Perfect Hashing Algorithms papers/thesis.pdf]. //Thesis Proposal//, //Department of Computer Science//, //Federal University of Minas Gerais//, July 2007.
%!include: ALGORITHMS.t2t
%!include: FOOTER.t2t