BDZ Algorithm %!includeconf: CONFIG.t2t ---------------------------------------- ==Introduction== ---------------------------------------- ==The Algorithm== ---------------------------------------- ===Mapping Step=== ---------------------------------------- ===Assigning Step=== ---------------------------------------- ===Ranking Step=== ---------------------------------------- ==Memory Consumption== Now we detail the memory consumption to generate and to store minimal perfect hash functions using the BDZ algorithm. The structures responsible for memory consumption are in the following: - 3-graph: + **first**: is a vector that stores //cn// integer numbers, each one representing the first edge (index in the vector edges) in the list of incident edges of each vertex. The integer numbers are 4 bytes long. Therefore, the vector first is stored in //4cn// bytes. + **edges**: is a vector to represent the edges of the graph. As each edge is compounded by three vertices, each entry stores three integer numbers of 4 bytes that represent the vertices. As there are //n// edges, the vector edges is stored in //12n// bytes. + **next**: given a vertex [figs/img139.png], we can discover the edges that contain [figs/img139.png] following its list of incident edges, which starts on first[[figs/img139.png]] and the next edges are given by next[...first[[figs/img139.png]]...]. Therefore, the vectors first and next represent the linked lists of edges of each vertex. As there are three vertices for each edge, when an edge is iserted in the 3-graph, it must be inserted in the three linked lists of the vertices in its composition. Therefore, there are //3n// entries of integer numbers in the vector next, so it is stored in //4*3n = 12n// bytes. + **Vertices degree (vert_degree vector)**: is a vector of //cn// bytes that represents the degree of each vertex. We can use just one byte for each vertex because the 3-graph is sparse, once it has more vertices than edges. Therefore, the vertices degree is represented in //cn// bytes. - Acyclicity test: + **List of deleted edges obtained when we test whether the 3-graph is a forest (queue vector)**: is a vector of //n// integer numbers containing indexes of vector edges. Therefore, it requires //4n// bytes in internal memory. + **Marked edges in the acyclicity test (marked_edges vector)**: is a bit vector of //n// bits to indicate the edges that have already been deleted during the acyclicity test. Therefore, it requires //n/8// bytes in internal memory. - MPHF description + **function //g//**: is represented by a vector of //2cn// bits. Therefore, it is stored in //0.25cn// bytes + **ranktable**: is a lookup table used to store some precomputed ranking information. It has //(cn)/(2^b)// entries of 4-byte integer numbers. Therefore it is stored in //(4cn)/(2^b)// bytes. The larger is b, the more compact is the resulting MPHFs and the slower are the functions. So b imposes a trade-of between space and time. + **Total**: 0.25cn + (4cn)/(2^b) bytes Thus, the total memory consumption of BDZ algorithm for generating a minimal perfect hash function (MPHF) is: //(28.125 + 5c)n + 0.25cn + (4cn)/(2^b) + O(1)// bytes. As the value of constant //c// may be larger than or equal to 1.23 we have: || //c// | //b// | Memory consumption to generate a MPHF (in bytes) | | 1.23 | //7// | //34.62n + O(1)// | | 1.23 | //8// | //34.60n + O(1)// | | **Table 1:** Memory consumption to generate a MPHF using the BDZ algorithm. Now we present the memory consumption to store the resulting function. So we have: || //c// | //b// | Memory consumption to store a MPHF (in bits) | | 1.23 | //7// | //2.77n + O(1)// | | 1.23 | //8// | //2.61n + O(1)// | | **Table 2:** Memory consumption to store a MPHF generated by the BDZ algorithm. ---------------------------------------- ==Experimental Results== Experimental results to compare the BDZ algorithm with the other ones in the CMPH library are presented in Botelho, Pagh and Ziviani [[1 #papers],[2 #papers]]. ---------------------------------------- ==Papers==[papers] + [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], R. Pagh, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [Simple and space-efficient minimal perfect hash functions papers/wads07.pdf]. //10th International Workshop on Algorithms and Data Structures (WADs'07),// Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150. + [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho]. [Near Space-Optimal Perfect Hashing Algorithms papers/thesis.pdf]. //Thesis Proposal//, //Department of Computer Science//, //Federal University of Minas Gerais//, July 2007. %!include: ALGORITHMS.t2t %!include: FOOTER.t2t