diff --git a/ALGORITHMS.t2t b/ALGORITHMS.t2t index c78fac9..8e1536e 100644 --- a/ALGORITHMS.t2t +++ b/ALGORITHMS.t2t @@ -2,5 +2,5 @@ ---------------------------------------- - | [Home index.html] | [BDZ bdz.html] | [BMZ bmz.html] | [CHM chm.html] | [BRZ brz.html] | [FCH fch.html] + | [Home index.html] | [CHD chd.html] | [BDZ bdz.html] | [BMZ bmz.html] | [CHM chm.html] | [BRZ brz.html] | [FCH fch.html] ---------------------------------------- diff --git a/BDZ.t2t b/BDZ.t2t index e0b2a45..02d0ae2 100755 --- a/BDZ.t2t +++ b/BDZ.t2t @@ -6,33 +6,64 @@ BDZ Algorithm ---------------------------------------- ==Introduction== -Coming soon... +The BDZ algorithm was designed by Fabiano C. Botelho, Djamal Belazzougui, Rasmus Pagh and Nivio Ziviani. It is a simple, efficient, near-optimal space and practical algorithm to generate a family [figs/bdz/img8.png] of PHFs and MPHFs. It is also referred to as the BPZ algorithm because of the work presented by Botelho, Pagh and Ziviani in [[2 #papers]]. In Botelho's PhD dissertation [[1 #papers]] it is also referred to as the RAM algorithm because it is more suitable for key sets that can be handled in internal memory. + +The BDZ algorithm uses //r//-uniform random hypergraphs given by function values of //r// uniform random hash functions on the input key set //S// for generating PHFs and MPHFs that require //O(n)// bits to be stored. A hypergraph is the generalization of a standard undirected graph where each edge connects [figs/bdz/img12.png] vertices. This idea is not new, see e.g. [[8 #papers]], but we have proceeded differently to achieve a space usage of //O(n)// bits rather than //O(n log n)// bits. Evaluation time for all schemes considered is constant. For //r=3// we obtain a space usage of approximately //2.6n// bits for an MPHF. More compact, and even simpler, representations can be achieved for larger //m//. For example, for //m=1.23n// we can get a space usage of //1.95n// bits. 
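The two space figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not code from the CMPH library: it assumes the PHF bound //cn log 3// given later on this page and the MPHF bound //(2 + x)cn// with //x = 1/b// quoted in the README, and the function names and the hard-coded constant are hypothetical.

```c
#include <assert.h>

/* log2(3), hard-coded so that the sketch needs no math library. */
const double LOG2_3 = 1.584962500721156;

/* PHF: g has m = c*n entries, each taking one of 3 values,
 * so the cost is c * log2(3) bits per key. */
double bdz_phf_bits_per_key(double c)
{
    assert(c > 1.0);
    return c * LOG2_3;
}

/* MPHF: 2 bits per entry of g plus x = 1/b extra bits per entry
 * for the ranking information, i.e. (2 + 1/b) * c bits per key. */
double bdz_mphf_bits_per_key(double c, double b)
{
    assert(c > 1.0 && b > 0.0);
    return (2.0 + 1.0 / b) * c;
}
```

Under these assumptions, //c = 1.23// gives roughly //1.95// bits per key for a PHF, and //c = 1.23// with //b = 8// gives roughly //2.6// bits per key for an MPHF.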
+ +Our best MPHF space upper bound is within a factor of //2// from the information theoretical lower bound of approximately //1.44// bits per key. We have shown that the BDZ algorithm is far more practical than previous methods with proven space complexity, both because of its simplicity, and because the constant factor of the space complexity is more than //6// times lower than its closest competitor, for plausible problem sizes. We verify the practicality experimentally, using slightly more space than in the mentioned theoretical bounds. ---------------------------------------- ==The Algorithm== -Coming soon... +The BDZ algorithm is a three-step algorithm that generates PHFs and MPHFs based on random //r//-partite hypergraphs. This is an approach that provides a much tighter analysis and is much simpler than the one presented in [[3 #papers]], where it was implicit how to construct similar PHFs. The fastest and most compact functions are generated when //r=3//. In this case a PHF can be stored in approximately //1.95// bits per key and an MPHF in approximately //2.62// bits per key. + +Figure 1 gives an overview of the algorithm for //r=3//, taking as input a key set [figs/bdz/img22.png] containing three English words, i.e., //S={who,band,the}//. The edge-oriented data structure proposed in [[4 #papers]] is used to represent hypergraphs, where each edge is explicitly represented as an array of //r// vertices and, for each vertex //v//, there is a list of edges that are incident on //v//. + + | [figs/bdz/img50.png] + | **Figure 1:** (a) The mapping step generates a random acyclic //3//-partite hypergraph + | with //m=6// vertices and //n=3// edges and a list [figs/bdz/img4.png] of edges obtained when we test + | whether the hypergraph is acyclic. (b) The assigning step builds an array //g// that + | maps values from //[0,5]// to //[0,3]// to uniquely assign an edge to a vertex. 
(c) The ranking + | step builds the data structure used to compute function //rank// in //O(1)// time. ----------------------------------------- -===Mapping Step=== +The //Mapping Step// in Figure 1(a) carries out two important tasks: -Coming soon... - ----------------------------------------- - -===Assigning Step=== - -Coming soon... ++ It assumes that it is possible to find three uniform hash functions //h,,0,,//, //h,,1,,// and //h,,2,,//, with ranges //{0,1}//, //{2,3}// and //{4,5}//, respectively. These functions build a one-to-one mapping of the key set //S// to the edge set //E// of a random acyclic //3//-partite hypergraph //G=(V,E)//, where //|V|=m=6// and //|E|=n=3//. In [[1,2 #papers]] it is shown that it is possible to obtain such a hypergraph with probability tending to //1// as //n// tends to infinity whenever //m=cn// and //c > 1.22//. The value of //c// that minimizes the hypergraph size (and thereby the number of bits to represent the resulting functions) is in the range //(1.22,1.23)//. To illustrate the mapping, key "who" is mapped to edge //{h,,0,,("who"), h,,1,,("who"), h,,2,,("who")} = {1,3,5}//, key "band" is mapped to edge //{h,,0,,("band"), h,,1,,("band"), h,,2,,("band")} = {1,2,4}//, and key "the" is mapped to edge //{h,,0,,("the"), h,,1,,("the"), h,,2,,("the")} = {0,2,5}//. ++ It tests whether the resulting random //3//-partite hypergraph contains cycles by iteratively deleting edges that contain a vertex of degree 1. The deleted edges are stored in the order of deletion in a list [figs/bdz/img4.png] to be used in the assigning step. The first deleted edge in Figure 1(a) was //{1,2,4}//, the second one was //{1,3,5}// and the third one was //{0,2,5}//. If it ends with an empty graph, then the test succeeds, otherwise it fails. 
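The acyclicity test of the mapping step can be sketched as follows. This is a minimal illustration and not the CMPH implementation: it assumes a small graph stored as a plain array of edges rather than the edge-oriented structure of [[4 #papers]], it rescans the edges instead of keeping a queue of degree-1 vertices (so it is quadratic rather than linear), and the names ``peel_hypergraph``, ``MAX_M`` and ``MAX_N`` are hypothetical.

```c
#include <assert.h>

#define R 3        /* vertices per edge (r = 3) */
#define MAX_M 64   /* hypothetical size limits for this sketch */
#define MAX_N 64

/*
 * Repeatedly deletes an edge that contains a vertex of degree 1 and
 * appends its index to `order` (the list of deleted edges).
 * Returns the number of deleted edges; the test succeeds, i.e. the
 * hypergraph is acyclic, exactly when the return value equals n.
 */
int peel_hypergraph(int m, int n, int edges[][R], int order[])
{
    int degree[MAX_M] = {0};
    int removed[MAX_N] = {0};
    int deleted = 0;
    int progress = 1;

    assert(m <= MAX_M && n <= MAX_N);

    /* Count how many edges are incident on each vertex. */
    for (int e = 0; e < n; e++)
        for (int j = 0; j < R; j++)
            degree[edges[e][j]]++;

    /* Rescan until a full pass deletes nothing. */
    while (progress) {
        progress = 0;
        for (int e = 0; e < n; e++) {
            if (removed[e])
                continue;
            for (int j = 0; j < R; j++) {
                if (degree[edges[e][j]] == 1) {
                    for (int t = 0; t < R; t++)
                        degree[edges[e][t]]--;
                    removed[e] = 1;
                    order[deleted++] = e;
                    progress = 1;
                    break;
                }
            }
        }
    }
    return deleted;
}
```

For the three edges of the example, //{1,3,5}//, //{1,2,4}// and //{0,2,5}//, the sketch deletes all three edges, so the test succeeds. The deletion order it produces depends on the scan order and need not match the one shown in Figure 1(a).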
----------------------------------------- -===Ranking Step=== +We now show how to use the Jenkins hash functions [[7 #papers]] to implement the three hash functions //h,,i,,//, which map values from //S// to //V,,i,,//, where [figs/bdz/img52.png]. These functions are used to build a random //3//-partite hypergraph, where [figs/bdz/img53.png] and [figs/bdz/img54.png]. Let [figs/bdz/img55.png] be a Jenkins hash function for [figs/bdz/img56.png], where +//w=32 or 64// for 32-bit and 64-bit architectures, respectively. +Let //H'// be an array of 3 //w//-bit values. The Jenkins hash function +allows us to compute in parallel the three entries in //H'// +and thereby the three hash functions //h,,i,,//, as follows: + + | //H' = h'(x)// + | //h,,0,,(x) = H'[0] mod// [figs/bdz/img136.png] + | //h,,1,,(x) = H'[1] mod// [figs/bdz/img136.png] //+// [figs/bdz/img136.png] + | //h,,2,,(x) = H'[2] mod// [figs/bdz/img136.png] //+ 2//[figs/bdz/img136.png] + + +The //Assigning Step// in Figure 1(b) outputs a PHF that maps the key set //S// into the range //[0,m-1]// and is represented by an array //g// storing values from the range //[0,3]//. The array //g// allows us to select one out of the //3// vertices of a given edge, which is associated with a key //k//. A vertex for a key //k// is given by either //h,,0,,(k)//, //h,,1,,(k)// or //h,,2,,(k)//. The function //h,,i,,(k)// to be used for //k// is chosen by calculating //i = (g[h,,0,,(k)] + g[h,,1,,(k)] + g[h,,2,,(k)]) mod 3//. For instance, the values 1 and 4 represent the keys "who" and "band" because //i = (g[1] + g[3] + g[5]) mod 3 = 0// and //h,,0,,("who") = 1//, and //i = (g[1] + g[2] + g[4]) mod 3 = 2// and //h,,2,,("band") = 4//, respectively. The assigning step first initializes //g[i]=3// to mark every vertex as unassigned and //Visited[i]= false//, [figs/bdz/img88.png]. Let //Visited// be a boolean vector of size //m// to indicate whether a vertex has been visited. 
Then, for each edge [figs/bdz/img90.png] from tail to head, it looks for the first vertex //u// belonging to //e// that has not yet been visited. This is a sufficient condition for success [[1,2,8 #papers]]. Let //j// be the index of //u// in //e// for //j// in the range //[0,2]//. Then, it assigns [figs/bdz/img95.png]. Whenever it passes through a vertex //u// from //e//, if //u// has not yet been visited, it sets //Visited[u] = true//. + + +If we stop the BDZ algorithm in the assigning step we obtain a PHF with range //[0,m-1]//. The PHF has the following form: //phf(x) = h,,i(x),,(x)//, where key //x// is in //S// and //i(x) = (g[h,,0,,(x)] + g[h,,1,,(x)] + g[h,,2,,(x)]) mod 3//. In this case we do not need information for ranking and can set //g[i] = 0// whenever //g[i]// is equal to //3//, where //i// is in the range //[0,m-1]//. Therefore, the range of the values stored in //g// is narrowed from //[0,3]// to //[0,2]//. By using arithmetic coding on blocks of values (see [[1,2 #papers]] for details), or any compression technique that allows random access in constant time to an array of compressed values [[5,6,12 #papers]], we can store the resulting PHFs in //m log 3 = cn log 3// bits, where //c > 1.22//. For //c = 1.23//, the space requirement is //1.95n// bits. + +The //Ranking Step// in Figure 1(c) outputs a data structure that makes it possible to narrow the range of a PHF generated in the assigning step from //[0,m-1]// to //[0,n-1]//, thereby producing an MPHF. The data structure allows us to compute in constant time a function //rank// from //[0,m-1]// to //[0,n-1]// that counts the number of assigned positions before a given position //v// in //g//. For instance, //rank(4) = 2// because the positions //0// and //1// are assigned since //g[0]// and //g[1]// are not equal to //3//. + + +For the implementation of the ranking step we have borrowed a simple and efficient implementation from [[10 #papers]]. 
It requires [figs/bdz/img111.png] additional bits of space, where [figs/bdz/img112.png], and is obtained by storing explicitly the //rank// of every //k//th index in a rankTable, where [figs/bdz/img114.png]. The larger //k// is, the more compact the resulting MPHF. Therefore, the users can trade off space for evaluation time by setting //k// appropriately in the implementation. We only allow values for //k// that are powers of two (i.e., //k=2^^b,,k,,^^// for some constant //b,,k,,//) in order to replace the expensive division and modulo operations by bit-shift and bitwise "and" operations, respectively. We have used //k=256// in the experiments for generating more succinct MPHFs. We remark that it is still possible to obtain a more compact data structure by using the results presented in [[9,11 #papers]], but at the cost of a much more complex implementation. + + +We need to use an additional lookup table //T,,r,,// to guarantee the constant evaluation time of //rank(u)//. Let us illustrate how //rank(u)// is computed using both the rankTable and the lookup table //T,,r,,//. We first look up the rank of the largest precomputed index //v// less than or equal to //u// in the rankTable, and use //T,,r,,// to count the number of assigned vertices from position //v// to //u-1//. The lookup table //T,,r,,// allows us to count in constant time the number of assigned vertices in [figs/bdz/img122.png] bits, where [figs/bdz/img112.png]. Thus the actual evaluation time is [figs/bdz/img123.png]. For simplicity and without loss of generality we let [figs/bdz/img124.png] be a multiple of the number of bits [figs/bdz/img125.png] used to encode each entry of //g//. As the values in //g// come from the range //[0,3]//, +then [figs/bdz/img126.png] bits and we have tried [figs/bdz/img124.png] equal to //8// and //16//. We would expect that [figs/bdz/img124.png] equal to 16 should provide a faster evaluation time because we would need to carry out fewer lookups in //T,,r,,//. 
But, for both values the lookup table //T,,r,,// fits entirely in the CPU cache and we did not observe any significant difference in the evaluation times. Therefore we settle for the value //8//. We remark that each value of //r// requires a different lookup table //T,,r,,// that can be generated a priori. + +The resulting MPHFs have the following form: //h(x) = rank(phf(x))//. Therefore, we cannot get rid of the ranking information by replacing the values 3 by 0 in the entries of //g//. In this case each entry in the array //g// is encoded with //2// bits and we need [figs/bdz/img133.png] additional bits to compute function //rank// in constant time. Then, the total space to store the resulting functions is [figs/bdz/img134.png] bits. By using //c = 1.23// and [figs/bdz/img135.png] we have obtained MPHFs that require approximately //2.62// bits per key to be stored. -Coming soon... ---------------------------------------- @@ -106,16 +137,38 @@ So we have: ==Experimental Results== Experimental results to compare the BDZ algorithm with the other ones in the CMPH -library are presented in Botelho, Pagh and Ziviani [[1 #papers],[2 #papers]]. +library are presented in Botelho, Pagh and Ziviani [[1,2 #papers]]. ---------------------------------------- ==Papers==[papers] -+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], R. Pagh, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [Simple and space-efficient minimal perfect hash functions papers/wads07.pdf]. //10th International Workshop on Algorithms and Data Structures (WADs'07),// Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150. ++ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho]. [Near-Optimal Space Perfect Hashing Algorithms papers/thesis.pdf]. //PhD. Thesis//, //Department of Computer Science//, //Federal University of Minas Gerais//, September 2008. Supervised by [N. Ziviani http://www.dcc.ufmg.br/~nivio]. -+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho]. 
[Near Space-Optimal Perfect Hashing Algorithms papers/thesis.pdf]. //Thesis Proposal//, //Department of Computer Science//, //Federal University of Minas Gerais//, July 2007. ++ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], [R. Pagh http://www.itu.dk/~pagh/], [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [Simple and space-efficient minimal perfect hash functions papers/wads07.pdf]. //In Proceedings of the 10th International Workshop on Algorithms and Data Structures (WADs'07),// Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150. + ++ B. Chazelle, J. Kilian, R. Rubinfeld, and A. Tal. The bloomier filter: An efficient data structure for static support lookup tables. //In Proceedings of the 15th annual ACM-SIAM symposium on Discrete algorithms (SODA'04)//, pages 30–39, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics. + ++ J. Ebert. A versatile data structure for edges oriented graph algorithms. //Communication of The ACM//, (30):513–519, 1987. + ++ K. Fredriksson and F. Nikitin. Simple compression code supporting random access and fast string matching. //In Proceedings of the 6th International Workshop on Efficient and Experimental Algorithms (WEA’07)//, pages 203–216, 2007. + ++ R. Gonzalez and G. Navarro. Statistical encoding of succinct data structures. //In Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM’06)//, pages 294–305, 2006. + ++ B. Jenkins. Algorithm alley: Hash functions. //Dr. Dobb's Journal of Software Tools//, 22(9), september 1997. Extended version available at [http://burtleburtle.net/bob/hash/doobs.html http://burtleburtle.net/bob/hash/doobs.html]. + ++ B.S. Majewski, N.C. Wormald, G. Havas, and Z.J. Czech. A family of perfect hashing methods. //The Computer Journal//, 39(6):547–554, 1996. + ++ D. Okanohara and K. Sadakane. Practical entropy-compressed rank/select dictionary. 
//In Proceedings of the Workshop on Algorithm Engineering and Experiments (ALENEX’07)//, 2007. + ++ [R. Pagh http://www.itu.dk/~pagh/]. Low redundancy in static dictionaries with constant query time. //SIAM Journal on Computing//, 31(2):353–363, 2001. + ++ R. Raman, V. Raman, and S. S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. //In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms (SODA’02)//, pages 233–242, Philadelphia PA, USA, 2002. Society for Industrial and Applied Mathematics. + ++ K. Sadakane and R. Grossi. Squeezing succinct data structures into entropy bounds. //In Proceedings of the 17th annual ACM-SIAM symposium on Discrete algorithms (SODA’06)//, pages 1230–1239, 2006. %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/BMZ.t2t b/BMZ.t2t index 72921a6..8d0460f 100644 --- a/BMZ.t2t +++ b/BMZ.t2t @@ -401,3 +401,5 @@ Again we have: %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/BRZ.t2t b/BRZ.t2t index 079029a..59c032f 100644 --- a/BRZ.t2t +++ b/BRZ.t2t @@ -436,3 +436,5 @@ has smart policies for avoiding seeks and diminishing the average seek time %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/CHD.t2t b/CHD.t2t new file mode 100644 index 0000000..f17a142 --- /dev/null +++ b/CHD.t2t @@ -0,0 +1,44 @@ +Compress, Hash and Displace: CHD Algorithm + + +%!includeconf: CONFIG.t2t + +---------------------------------------- +==Introduction== + +The important performance parameters of a PHF are representation size, evaluation time and construction time. The representation size plays an important role when the whole function fits in a faster memory and the actual data is stored in a slower memory. 
For instance, compact PHFs can fit entirely in a CPU cache and this makes their computation really fast by avoiding cache misses. The CHD algorithm plays an important role in this context. It was designed by Djamal Belazzougui, Fabiano C. Botelho, and Martin Dietzfelbinger in [[2 #papers]]. + + +The CHD algorithm makes it possible to obtain PHFs with representation size very close to optimal while retaining //O(n)// construction time and //O(1)// evaluation time. For example, in the case //m=2n// we obtain a PHF that uses //0.67// bits per key, and for //m=1.23n// we obtain //1.4// bits per key, which was not achievable with previously known methods. The CHD algorithm is inspired by several known algorithms; the main new feature is that it combines a modification of Pagh's "hash-and-displace" approach with data compression on a sequence of hash function indices. That combination makes it possible to significantly reduce space usage while retaining linear construction time and constant query time. The CHD algorithm can also be used for //k//-perfect hashing, where at most //k// keys may be mapped to the same value. For the analysis we assume that fully random hash functions are given for free; such assumptions can be justified and were made in previous papers. + +The compact PHFs generated by the CHD algorithm can be used in many applications in which we want to assign a unique identifier to each key without storing any information on the key. One of the most obvious applications of those functions (or //k//-perfect hash functions) is when we have a small fast memory in which we can store the perfect hash function while the keys and associated satellite data are stored in slower but larger memory. The size of a block or a transfer unit may be chosen so that //k// data items can be retrieved in one read access. In this case we can ensure that data associated with a key can be retrieved in a single probe to slower memory. 
This has been used for example in hardware routers [[4 #papers]]. + + +The CHD algorithm generates the most compact PHFs and MPHFs we know of in //O(n)// time. The time required to evaluate the generated functions is constant (in practice less than //1.4// microseconds). The storage space of the resulting PHFs and MPHFs is within a factor of //1.43// of the information theoretic lower bound. The closest competitor is the algorithm by Dietzfelbinger and Pagh [[3 #papers]] but their algorithm does not work in linear time. Furthermore, the CHD algorithm can be tuned to run faster than the BPZ algorithm [[1 #papers]] (the fastest algorithm available in the literature so far) and to obtain more compact functions. The most impressive characteristic is that it has the ability, in principle, to approximate the information theoretic lower bound while being practical. A detailed description of the CHD algorithm can be found in [[2 #papers]]. + + + +---------------------------------------- + +==Experimental Results== + +Experimental results comparing the CHD algorithm with [the BDZ algorithm bdz.html] +and others available in the CMPH library are presented in [[2 #papers]]. +---------------------------------------- + +==Papers==[papers] + ++ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], [R. Pagh http://www.itu.dk/~pagh/], [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [Simple and space-efficient minimal perfect hash functions papers/wads07.pdf]. //In Proceedings of the 10th International Workshop on Algorithms and Data Structures (WADs'07),// Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150. + ++ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], D. Belazzougui and M. Dietzfelbinger. [Compress, hash and displace papers/esa09.pdf]. //In Proceedings of the 17th European Symposium on Algorithms (ESA’09)//. Springer LNCS, 2009. + ++ M. Dietzfelbinger and [R. Pagh http://www.itu.dk/~pagh/]. 
Succinct data structures for retrieval and approximate membership. //In Proceedings of the 35th international colloquium on Automata, Languages and Programming (ICALP’08)//, pages 385–396, Berlin, Heidelberg, 2008. Springer-Verlag. + ++ B. Prabhakar and F. Bonomi. Perfect hashing for network applications. //In Proceedings of the IEEE International Symposium on Information Theory//. IEEE Press, 2006. + + +%!include: ALGORITHMS.t2t + +%!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/CHM.t2t b/CHM.t2t index d696d38..adf9b30 100644 --- a/CHM.t2t +++ b/CHM.t2t @@ -84,3 +84,5 @@ Again we have: %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/COMPARISON.t2t b/COMPARISON.t2t index 21b3dd1..d5aba53 100644 --- a/COMPARISON.t2t +++ b/COMPARISON.t2t @@ -107,3 +107,5 @@ to [figs/img250.png] seconds for [figs/img6.png]. %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/CONCEPTS.t2t b/CONCEPTS.t2t index 745a03d..b8cb2c9 100644 --- a/CONCEPTS.t2t +++ b/CONCEPTS.t2t @@ -52,3 +52,5 @@ data mining techniques. %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/CONFIG.t2t b/CONFIG.t2t index cf391eb..d3eb24f 100644 --- a/CONFIG.t2t +++ b/CONFIG.t2t @@ -44,3 +44,8 @@ %! PostProc(html): 'ALIGN="middle" SRC="figs/img248.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img237.png"\1>' %! PostProc(html): 'ALIGN="middle" SRC="figs/img249.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img249.png"\1>' %! PostProc(html): 'ALIGN="middle" SRC="figs/img250.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img250.png"\1>' +%! 
PostProc(html): 'ALIGN="middle" SRC="figs/bdz/img8.png"(.*?)>' 'ALIGN="bottom" SRC="figs/bdz/img8.png"\1>' +% The ^ need to be escaped by \ +%!postproc(html): \^\^(.*?)\^\^ \1 +%!postproc(html): ,,(.*?),, \1 + diff --git a/EXAMPLES.t2t b/EXAMPLES.t2t index 8f523bf..cddb03a 100644 --- a/EXAMPLES.t2t +++ b/EXAMPLES.t2t @@ -12,30 +12,45 @@ Using cmph is quite simple. Take a look in the following examples. #include // Create minimal perfect hash function from in-memory vector int main(int argc, char **argv) -{ - // Creating a filled vector - const char *vector[] = {"aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "dddddddddd", "eeeeeeeeee", - "ffffffffff", "gggggggggg", "hhhhhhhhhh", "iiiiiiiiii", "jjjjjjjjjj"}; - unsigned int nkeys = 10; - // Source of keys - cmph_io_adapter_t *source = cmph_io_vector_adapter((char **)vector, nkeys); - - //Create minimal perfect hash function using the default (chm) algorithm. - cmph_config_t *config = cmph_config_new(source); - cmph_t *hash = cmph_new(config); - cmph_config_destroy(config); - - //Find key - const char *key = "jjjjjjjjjj"; - unsigned int id = cmph_search(hash, key, strlen(key)); - fprintf(stderr, "Id:%u\n", id); - //Destroy hash - cmph_destroy(hash); - cmph_io_vector_adapter_destroy(source); - return 0; +{ + + // Creating a filled vector + unsigned int i = 0; + const char *vector[] = {"aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "dddddddddd", "eeeeeeeeee", + "ffffffffff", "gggggggggg", "hhhhhhhhhh", "iiiiiiiiii", "jjjjjjjjjj"}; + unsigned int nkeys = 10; + FILE* mphf_fd = fopen("temp.mph", "w"); + // Source of keys + cmph_io_adapter_t *source = cmph_io_vector_adapter((char **)vector, nkeys); + + //Create minimal perfect hash function using the brz algorithm. 
+ cmph_config_t *config = cmph_config_new(source); + cmph_config_set_algo(config, CMPH_BRZ); + cmph_config_set_mphf_fd(config, mphf_fd); + cmph_t *hash = cmph_new(config); + cmph_config_destroy(config); + cmph_dump(hash, mphf_fd); + cmph_destroy(hash); + fclose(mphf_fd); + + //Find key + mphf_fd = fopen("temp.mph", "r"); + hash = cmph_load(mphf_fd); + while (i < nkeys) { + const char *key = vector[i]; + unsigned int id = cmph_search(hash, key, (cmph_uint32)strlen(key)); + fprintf(stderr, "key:%s -- hash:%u\n", key, id); + i++; + } + + //Destroy hash + cmph_destroy(hash); + cmph_io_vector_adapter_destroy(source); + fclose(mphf_fd); + return 0; } ``` -Download [vector_adapter_ex1.c examples/vector_adapter_ex1.c]. This example does not work in versions below 0.3. +Download [vector_adapter_ex1.c examples/vector_adapter_ex1.c]. This example does not work in versions below 0.6. ------------------------------- ``` @@ -45,9 +60,9 @@ Download [vector_adapter_ex1.c examples/vector_adapter_ex1.c]. 
This example does #pragma pack(1) typedef struct { - cmph_uint32 id; - char key[11]; - cmph_uint32 year; + cmph_uint32 id; + char key[11]; + cmph_uint32 year; } rec_t; #pragma pack(0) @@ -56,15 +71,15 @@ int main(int argc, char **argv) // Creating a filled vector unsigned int i = 0; rec_t vector[10] = {{1, "aaaaaaaaaa", 1999}, {2, "bbbbbbbbbb", 2000}, {3, "cccccccccc", 2001}, - {4, "dddddddddd", 2002}, {5, "eeeeeeeeee", 2003}, {6, "ffffffffff", 2004}, - {7, "gggggggggg", 2005}, {8, "hhhhhhhhhh", 2006}, {9, "iiiiiiiiii", 2007}, - {10,"jjjjjjjjjj", 2008}}; + {4, "dddddddddd", 2002}, {5, "eeeeeeeeee", 2003}, {6, "ffffffffff", 2004}, + {7, "gggggggggg", 2005}, {8, "hhhhhhhhhh", 2006}, {9, "iiiiiiiiii", 2007}, + {10,"jjjjjjjjjj", 2008}}; unsigned int nkeys = 10; FILE* mphf_fd = fopen("temp_struct_vector.mph", "w"); // Source of keys - cmph_io_adapter_t *source = cmph_io_struct_vector_adapter(vector, sizeof(rec_t), sizeof(cmph_uint32), 11, nkeys); + cmph_io_adapter_t *source = cmph_io_struct_vector_adapter(vector, (cmph_uint32)sizeof(rec_t), (cmph_uint32)sizeof(cmph_uint32), 11, nkeys); - //Create minimal perfect hash function using the default (chm) algorithm. + //Create minimal perfect hash function using the BDZ algorithm. cmph_config_t *config = cmph_config_new(source); cmph_config_set_algo(config, CMPH_BDZ); cmph_config_set_mphf_fd(config, mphf_fd); @@ -78,10 +93,10 @@ int main(int argc, char **argv) mphf_fd = fopen("temp_struct_vector.mph", "r"); hash = cmph_load(mphf_fd); while (i < nkeys) { - const char *key = vector[i].key; - unsigned int id = cmph_search(hash, key, 11); - fprintf(stderr, "key:%s -- hash:%u\n", key, id); - i++; + const char *key = vector[i].key; + unsigned int id = cmph_search(hash, key, 11); + fprintf(stderr, "key:%s -- hash:%u\n", key, id); + i++; } //Destroy hash @@ -91,45 +106,47 @@ int main(int argc, char **argv) return 0; } ``` -Download [struct_vector_adapter_ex3.c examples/struct_vector_adapter_ex3.c]. 
This example does not work in versions below 0.7. +Download [struct_vector_adapter_ex3.c examples/struct_vector_adapter_ex3.c]. This example does not work in versions below 0.8. ------------------------------- ``` #include #include #include - // Create minimal perfect hash function from in-disk keys using BMZ algorithm + // Create minimal perfect hash function from in-disk keys using BDZ algorithm int main(int argc, char **argv) -{ - //Open file with newline separated list of keys +{ + //Open file with newline separated list of keys FILE * keys_fd = fopen("keys.txt", "r"); cmph_t *hash = NULL; - if (keys_fd == NULL) + if (keys_fd == NULL) { - fprintf(stderr, "File \"keys.txt\" not found\n"); - exit(1); - } + fprintf(stderr, "File \"keys.txt\" not found\n"); + exit(1); + } // Source of keys cmph_io_adapter_t *source = cmph_io_nlfile_adapter(keys_fd); - + cmph_config_t *config = cmph_config_new(source); - cmph_config_set_algo(config, CMPH_BMZ); + cmph_config_set_algo(config, CMPH_BDZ); hash = cmph_new(config); cmph_config_destroy(config); - + //Find key const char *key = "jjjjjjjjjj"; - unsigned int id = cmph_search(hash, key, strlen(key)); + unsigned int id = cmph_search(hash, key, (cmph_uint32)strlen(key)); fprintf(stderr, "Id:%u\n", id); //Destroy hash cmph_destroy(hash); - cmph_io_nlfile_adapter_destroy(source); + cmph_io_nlfile_adapter_destroy(source); fclose(keys_fd); return 0; } ``` -Download [file_adapter_ex2.c examples/file_adapter_ex2.c] and [keys.txt examples/keys.txt] +Download [file_adapter_ex2.c examples/file_adapter_ex2.c] and [keys.txt examples/keys.txt]. This example does not work in versions below 0.8. %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/FAQ.t2t b/FAQ.t2t index a013867..7807bc6 100644 --- a/FAQ.t2t +++ b/FAQ.t2t @@ -34,3 +34,5 @@ one is executed? 
%!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/FCH.t2t b/FCH.t2t index 73acfa5..872e040 100644 --- a/FCH.t2t +++ b/FCH.t2t @@ -43,3 +43,5 @@ We only need to store the //g// function and a constant number of bytes for the %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/GOOGLEANALYTICS.t2t b/GOOGLEANALYTICS.t2t new file mode 100644 index 0000000..360af4c --- /dev/null +++ b/GOOGLEANALYTICS.t2t @@ -0,0 +1,9 @@ + + \ No newline at end of file diff --git a/GPERF.t2t b/GPERF.t2t index 218ce52..b047af6 100644 --- a/GPERF.t2t +++ b/GPERF.t2t @@ -35,3 +35,5 @@ the compiler programming area (detect reserved keywords). %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/NEWSLOG.t2t b/NEWSLOG.t2t index 772757a..606a843 100644 --- a/NEWSLOG.t2t +++ b/NEWSLOG.t2t @@ -3,6 +3,15 @@ News Log %!includeconf: CONFIG.t2t +---------------------------------------- + +==News for version 0.9== + +- [The CHD algorithm chd.html], which is an algorithm that can be tuned to generate MPHFs that require approximately 2.07 bits per key to be stored. The algorithm outperforms [the BDZ algorithm bdz.html] and therefore is the fastest one available in the literature for sets that can be treated in internal memory. +- [The CHD_PH algorithm chd.html], which is an algorithm to generate PHFs with load factor up to //99 %//. It is actually the CHD algorithm without the ranking step. If we set the load factor to //81 %//, which is the maximum that can be obtained with [the BDZ algorithm bdz.html], the resulting functions can be stored in //1.40// bits per key. The space requirement increases with the load factor. +- All reported bugs and suggestions have been corrected and included as well. 
+ + ---------------------------------------- ==News for version 0.8== @@ -61,3 +70,5 @@ News Log %!include: ALGORITHMS.t2t %!include: FOOTER.t2t + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/README.t2t b/README.t2t index 71cf01e..a187043 100644 --- a/README.t2t +++ b/README.t2t @@ -42,43 +42,61 @@ The CMPH Library encapsulates the newest and more efficient algorithms in an eas ==Supported Algorithms== -%html% - [BDZ Algorithm bdz.html]. -%txt% - BDZ Algorithm. - The fastest algorithm to build PHFs and MPHFs. It is based on random 3-graphs. A 3-graph is a - generalization of a graph where each edge connects 3 vertices instead of only 2. The - resulting functions are not order preserving and can be stored in only //(2 + x)cn// - bits, where //c// should be larger than or equal to //1.23// and //x// is a constant - larger than //0// (actually, x = 1/b and b is a parameter that should be larger than 2). - For //c = 1.23// and //b = 8//, the resulting functions are stored in approximately 2.6 bits per key. -%html% - [BMZ Algorithm bmz.html]. -%txt% - BMZ Algorithm. - A very fast algorithm based on cyclic random graphs to construct minimal - perfect hash functions in linear time. The resulting functions are not order preserving and - can be stored in only //4cn// bytes, where //c// is between 0.93 and 1.15. -%html% - [BRZ Algorithm brz.html]. -%txt% - BRZ Algorithm. - A very fast external memory based algorithm for constructing minimal perfect hash functions - for sets in the order of billion of keys in linear time. The resulting functions are not order preserving and - can be stored using just 8.1 bits per key. -%html% - [CHM Algorithm chm.html]. -%txt% - CHM Algorithm. - An algorithm based on acyclic random graphs to construct minimal - perfect hash functions in linear time. The resulting functions are order preserving and - are stored in //4cn// bytes, where //c// is greater than 2. -%html% - [FCH Algorithm fch.html]. 
-%txt% - FCH Algorithm. - An algorithm to construct minimal perfect hash functions that require - less than 4 bits per key to be stored. Although the resulting MPHFs are - very compact, the algorithm is only efficient for small sets. - However, it is used as internal algorithm in the BRZ algorithm for efficiently solving - larger problems and even so to generate MPHFs that require approximately - 4.1 bits per key to be stored. For that, you just need to set the parameters -a to brz and - -c to a value larger than or equal to 2.6. +%html% - [CHD Algorithm chd.html]: +%txt% - CHD Algorithm: + - It is the fastest algorithm to build PHFs and MPHFs in linear time. + - It generates the most compact PHFs and MPHFs we know of. + - It can generate PHFs with a load factor up to //99 %//. + - It can be used to generate //t//-perfect hash functions. A //t//-perfect hash function allows at most //t// collisions in a given bin. Modern memories are organized as blocks that constitute the unit of transfer. Examples of such blocks are cache lines for internal memory and sectors for hard disks. Thus, //t//-perfect hash functions can be very useful for devices that carry out I/O operations in blocks. + - It is a two-level scheme. It uses a first-level hash function to split the key set into buckets whose average size is determined by a parameter //b// in the range //[1,32]//. In the second level it uses displacement values to resolve the collisions that have given rise to the buckets. + - It can generate MPHFs that can be stored in approximately //2.07// bits per key. + - For a load factor equal to the maximum one that is achieved by the BDZ algorithm (//81 %//), the resulting PHFs are stored in approximately //1.40// bits per key. +%html% - [BDZ Algorithm bdz.html]: +%txt% - BDZ Algorithm: + - It is very simple and efficient. It outperforms all the ones below. + - It constructs both PHFs and MPHFs in linear time. + - The maximum load factor one can achieve for a PHF is //1/1.23//.
+ - It is based on acyclic random 3-graphs. A 3-graph is a generalization of a graph where each edge connects 3 vertices instead of only 2. + - The resulting MPHFs are not order preserving. + - The resulting MPHFs can be stored in only //(2 + x)cn// bits, where //c// should be larger than or equal to //1.23// and //x// is a constant larger than //0// (actually, x = 1/b and b is a parameter that should be larger than 2). For //c = 1.23// and //b = 8//, the resulting functions are stored in approximately 2.6 bits per key. + - For its maximum load factor (//81 %//), the resulting PHFs are stored in approximately //1.95// bits per key. +%html% - [BMZ Algorithm bmz.html]: +%txt% - BMZ Algorithm: + - It constructs MPHFs in linear time. + - It is based on cyclic random graphs. This makes it faster than the CHM algorithm. + - The resulting MPHFs are not order preserving. + - The resulting MPHFs are more compact than the ones generated by the CHM algorithm and can be stored in //4cn// bytes, where //c// is in the range //[0.93,1.15]//. +%html% - [BRZ Algorithm brz.html]: +%txt% - BRZ Algorithm: + - A very fast external memory based algorithm for constructing minimal perfect hash functions for sets in the order of billions of keys. + - It works in linear time. + - The resulting MPHFs are not order preserving. + - The resulting MPHFs can be stored using less than //8.0// bits per key. +%html% - [CHM Algorithm chm.html]: +%txt% - CHM Algorithm: + - It constructs MPHFs in linear time. + - It is based on acyclic random graphs. + - The resulting MPHFs are order preserving. + - The resulting MPHFs are stored in //4cn// bytes, where //c// is greater than 2. +%html% - [FCH Algorithm fch.html]: +%txt% - FCH Algorithm: + - It constructs minimal perfect hash functions that require less than 4 bits per key to be stored. + - The resulting MPHFs are very compact and very efficient at evaluation time. + - The algorithm is only efficient for small sets.
+ - It is used as an internal algorithm in the BRZ algorithm to efficiently solve larger problems while still generating MPHFs that require approximately 4.1 bits per key to be stored. To do that, set the parameter -a to brz and -c to a value larger than or equal to 2.6. ---------------------------------------- -==News for version 0.8 (Coming soon)== +==News for version 0.9== + +- [The CHD algorithm chd.html], which is an algorithm that can be tuned to generate MPHFs that require approximately 2.07 bits per key to be stored. The algorithm outperforms [the BDZ algorithm bdz.html] and therefore is the fastest one available in the literature for sets that can be treated in internal memory. +- [The CHD_PH algorithm chd.html], which is an algorithm to generate PHFs with load factor up to //99 %//. It is actually the CHD algorithm without the ranking step. If we set the load factor to //81 %//, which is the maximum that can be obtained with [the BDZ algorithm bdz.html], the resulting functions can be stored in //1.40// bits per key. The space requirement increases with the load factor. +- All reported bugs and suggestions have been corrected and included as well. + + + +==News for version 0.8== - [An algorithm to generate MPHFs that require around 2.6 bits per key to be stored bdz.html], which is referred to as BDZ algorithm. The algorithm is the fastest one available in the literature for sets that can be treated in internal memory. - [An algorithm to generate PHFs with range m = cn, for c > 1.22 bdz.html], which is referred to as BDZ_PH algorithm. It is actually the BDZ algorithm without the ranking step. The resulting functions can be stored in 1.95 bits per key for //c = 1.23// and are considerably faster than the MPHFs generated by the BDZ algorithm. @@ -88,10 +106,6 @@ The CMPH Library encapsulates the newest and more efficient algorithms in an eas - All reported bugs and suggestions have been corrected and included as well.
-==News for version 0.7== - -- Added man pages and a pkgconfig file. - [News log newslog.html] ---------------------------------------- @@ -106,67 +120,82 @@ Using cmph is quite simple. Take a look. #include // Create minimal perfect hash function from in-memory vector int main(int argc, char **argv) -{ - // Creating a filled vector - const char *vector[] = {"aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "dddddddddd", "eeeeeeeeee", - "ffffffffff", "gggggggggg", "hhhhhhhhhh", "iiiiiiiiii", "jjjjjjjjjj"}; - unsigned int nkeys = 10; - // Source of keys - cmph_io_adapter_t *source = cmph_io_vector_adapter((char **)vector, nkeys); - - //Create minimal perfect hash function using the default (chm) algorithm. - cmph_config_t *config = cmph_config_new(source); - cmph_t *hash = cmph_new(config); - cmph_config_destroy(config); - - //Find key - const char *key = "jjjjjjjjjj"; - unsigned int id = cmph_search(hash, key, strlen(key)); - fprintf(stderr, "Id:%u\n", id); - //Destroy hash - cmph_destroy(hash); - cmph_io_vector_adapter_destroy(source); - return 0; +{ + + // Creating a filled vector + unsigned int i = 0; + const char *vector[] = {"aaaaaaaaaa", "bbbbbbbbbb", "cccccccccc", "dddddddddd", "eeeeeeeeee", + "ffffffffff", "gggggggggg", "hhhhhhhhhh", "iiiiiiiiii", "jjjjjjjjjj"}; + unsigned int nkeys = 10; + FILE* mphf_fd = fopen("temp.mph", "w"); + // Source of keys + cmph_io_adapter_t *source = cmph_io_vector_adapter((char **)vector, nkeys); + + //Create minimal perfect hash function using the brz algorithm. 
+ cmph_config_t *config = cmph_config_new(source); + cmph_config_set_algo(config, CMPH_BRZ); + cmph_config_set_mphf_fd(config, mphf_fd); + cmph_t *hash = cmph_new(config); + cmph_config_destroy(config); + cmph_dump(hash, mphf_fd); + cmph_destroy(hash); + fclose(mphf_fd); + + //Find key + mphf_fd = fopen("temp.mph", "r"); + hash = cmph_load(mphf_fd); + while (i < nkeys) { + const char *key = vector[i]; + unsigned int id = cmph_search(hash, key, (cmph_uint32)strlen(key)); + fprintf(stderr, "key:%s -- hash:%u\n", key, id); + i++; + } + + //Destroy hash + cmph_destroy(hash); + cmph_io_vector_adapter_destroy(source); + fclose(mphf_fd); + return 0; } ``` -Download [vector_adapter_ex1.c examples/vector_adapter_ex1.c]. This example does not work in version 0.3. You need to update the sources from CVS to make it works. +Download [vector_adapter_ex1.c examples/vector_adapter_ex1.c]. This example does not work in versions below 0.6. You need to update the sources from GIT to make it work. ------------------------------- ``` #include #include #include - // Create minimal perfect hash function from in-disk keys using BMZ algorithm + // Create minimal perfect hash function from in-disk keys using BDZ algorithm int main(int argc, char **argv) -{ - //Open file with newline separated list of keys +{ + //Open file with newline separated list of keys FILE * keys_fd = fopen("keys.txt", "r"); cmph_t *hash = NULL; - if (keys_fd == NULL) + if (keys_fd == NULL) { - fprintf(stderr, "File \"keys.txt\" not found\n"); - exit(1); - } + fprintf(stderr, "File \"keys.txt\" not found\n"); + exit(1); + } // Source of keys cmph_io_adapter_t *source = cmph_io_nlfile_adapter(keys_fd); - + cmph_config_t *config = cmph_config_new(source); - cmph_config_set_algo(config, CMPH_BMZ); + cmph_config_set_algo(config, CMPH_BDZ); hash = cmph_new(config); cmph_config_destroy(config); - + //Find key const char *key = "jjjjjjjjjj"; - unsigned int id = cmph_search(hash, key, strlen(key)); + unsigned int id = 
cmph_search(hash, key, (cmph_uint32)strlen(key)); fprintf(stderr, "Id:%u\n", id); //Destroy hash cmph_destroy(hash); - cmph_io_nlfile_adapter_destroy(source); + cmph_io_nlfile_adapter_destroy(source); fclose(keys_fd); return 0; } ``` -Download [file_adapter_ex2.c examples/file_adapter_ex2.c] and [keys.txt examples/keys.txt] +Download [file_adapter_ex2.c examples/file_adapter_ex2.c] and [keys.txt examples/keys.txt]. This example does not work in versions below 0.8. You need to update the sources from GIT to make it work. [Click here to see more examples examples.html] -------------------------------------- @@ -195,41 +224,55 @@ utility. ``` -usage: cmph [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] - [-a algorithm] [-M memory_in_MB] [-b BRZ_parameter] [-d tmp_dir] +usage: cmph [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c algorithm_dependent_value][-s seed] ] + [-a algorithm] [-M memory_in_MB] [-b algorithm_dependent_value] [-t keys_per_bin] [-d tmp_dir] [-m file.mph] keysfile Minimum perfect hashing tool - -h print this help message - -c c value determines: - the number of vertices in the graph for the algorithms BMZ and CHM - the number of bits per key required in the FCH algorithm - -a algorithm - valid values are - * bmz - * bmz8 - * chm - * brz - * fch - * bdz - * bdz_ph - -f hash function (may be used multiple times) - valid values are - * jenkins - -V print version number and exit - -v increase verbosity (may be used multiple times) - -k number of keys - -g generation mode - -s random seed - -m minimum perfect hash function file - -M main memory availability (in MB) - -d temporary directory used in brz algorithm - -b the meaning of this parameter depends on the algorithm used. - If BRZ algorithm is selected in -a option, than it is used - to make the maximal number of keys in a bucket lower than 256. - In this case its value should be an integer in the range [64,175]. 
If BDZ algorithm is selected in option -a, than it is used to - determine the size of some precomputed rank information and - its value should be an integer in the range [3,10] - keysfile line separated file with keys + -h print this help message + -c c value determines: + * the number of vertices in the graph for the algorithms BMZ and CHM + * the number of bits per key required in the FCH algorithm + * the load factor in the CHD_PH algorithm + -a algorithm - valid values are + * bmz + * bmz8 + * chm + * brz + * fch + * bdz + * bdz_ph + * chd_ph + * chd + -f hash function (may be used multiple times) - valid values are + * jenkins + -V print version number and exit + -v increase verbosity (may be used multiple times) + -k number of keys + -g generation mode + -s random seed + -m minimum perfect hash function file + -M main memory availability (in MB) used in BRZ algorithm + -d temporary directory used in BRZ algorithm + -b the meaning of this parameter depends on the algorithm selected in the -a option: + * For BRZ it is used to make the maximal number of keys in a bucket lower than 256. + In this case its value should be an integer in the range [64,175]. Default is 128. + + * For BDZ it is used to determine the size of some precomputed rank + information and its value should be an integer in the range [3,10]. Default + is 7. The larger this value, the more compact the resulting functions + and the slower they are at evaluation time. + + * For CHD and CHD_PH it is used to set the average number of keys per bucket + and its value should be an integer in the range [1,32]. Default is 4. The + larger this value, the slower the construction of the functions. + This parameter has no effect for other algorithms. + + -t set the number of keys per bin for a t-perfect hashing function. A t-perfect + hash function allows at most t collisions in a given bin. This parameter applies + only to the CHD and CHD_PH algorithms.
Its value should be an integer in the + range [1,128]. Default is 1. + keysfile line separated file with keys ``` ==Additional Documentation== @@ -250,3 +293,5 @@ Code is under the LGPL and the MPL 1.1. %!include(html): ''LOGO.t2t'' Last Updated: %%date(%c) + +%!include(html): ''GOOGLEANALYTICS.t2t'' \ No newline at end of file diff --git a/examples/file_adapter_ex2.c b/examples/file_adapter_ex2.c index 9dfa22c..bcdfada 100644 --- a/examples/file_adapter_ex2.c +++ b/examples/file_adapter_ex2.c @@ -1,7 +1,7 @@ #include #include #include - // Create minimal perfect hash function from in-disk keys using BMZ algorithm + // Create minimal perfect hash function from in-disk keys using BDZ algorithm int main(int argc, char **argv) { //Open file with newline separated list of keys @@ -16,7 +16,7 @@ int main(int argc, char **argv) cmph_io_adapter_t *source = cmph_io_nlfile_adapter(keys_fd); cmph_config_t *config = cmph_config_new(source); - cmph_config_set_algo(config, CMPH_BMZ); + cmph_config_set_algo(config, CMPH_BDZ); hash = cmph_new(config); cmph_config_destroy(config); diff --git a/examples/struct_vector_adapter_ex3.c b/examples/struct_vector_adapter_ex3.c index b80c576..ed61764 100644 --- a/examples/struct_vector_adapter_ex3.c +++ b/examples/struct_vector_adapter_ex3.c @@ -12,40 +12,40 @@ typedef struct { int main(int argc, char **argv) { - // Creating a filled vector + // Creating a filled vector unsigned int i = 0; rec_t vector[10] = {{1, "aaaaaaaaaa", 1999}, {2, "bbbbbbbbbb", 2000}, {3, "cccccccccc", 2001}, - {4, "dddddddddd", 2002}, {5, "eeeeeeeeee", 2003}, {6, "ffffffffff", 2004}, - {7, "gggggggggg", 2005}, {8, "hhhhhhhhhh", 2006}, {9, "iiiiiiiiii", 2007}, - {10,"jjjjjjjjjj", 2008}}; - unsigned int nkeys = 10; + {4, "dddddddddd", 2002}, {5, "eeeeeeeeee", 2003}, {6, "ffffffffff", 2004}, + {7, "gggggggggg", 2005}, {8, "hhhhhhhhhh", 2006}, {9, "iiiiiiiiii", 2007}, + {10,"jjjjjjjjjj", 2008}}; + unsigned int nkeys = 10; FILE* mphf_fd = fopen("temp_struct_vector.mph",
"w"); - // Source of keys - cmph_io_adapter_t *source = cmph_io_struct_vector_adapter(vector, (cmph_uint32)sizeof(rec_t), (cmph_uint32)sizeof(cmph_uint32), 11, nkeys); + // Source of keys + cmph_io_adapter_t *source = cmph_io_struct_vector_adapter(vector, (cmph_uint32)sizeof(rec_t), (cmph_uint32)sizeof(cmph_uint32), 11, nkeys); - //Create minimal perfect hash function using the default (chm) algorithm. - cmph_config_t *config = cmph_config_new(source); - cmph_config_set_algo(config, CMPH_BDZ); - cmph_config_set_mphf_fd(config, mphf_fd); - cmph_t *hash = cmph_new(config); - cmph_config_destroy(config); + //Create minimal perfect hash function using the BDZ algorithm. + cmph_config_t *config = cmph_config_new(source); + cmph_config_set_algo(config, CMPH_BDZ); + cmph_config_set_mphf_fd(config, mphf_fd); + cmph_t *hash = cmph_new(config); + cmph_config_destroy(config); cmph_dump(hash, mphf_fd); - cmph_destroy(hash); - fclose(mphf_fd); + cmph_destroy(hash); + fclose(mphf_fd); - //Find key + //Find key mphf_fd = fopen("temp_struct_vector.mph", "r"); hash = cmph_load(mphf_fd); while (i < nkeys) { - const char *key = vector[i].key; - unsigned int id = cmph_search(hash, key, 11); - fprintf(stderr, "key:%s -- hash:%u\n", key, id); - i++; + const char *key = vector[i].key; + unsigned int id = cmph_search(hash, key, 11); + fprintf(stderr, "key:%s -- hash:%u\n", key, id); + i++; } - //Destroy hash - cmph_destroy(hash); - cmph_io_vector_adapter_destroy(source); + //Destroy hash + cmph_destroy(hash); + cmph_io_vector_adapter_destroy(source); fclose(mphf_fd); - return 0; + return 0; } diff --git a/examples/vector_adapter_ex1.c b/examples/vector_adapter_ex1.c index 85769a5..44305dc 100755 --- a/examples/vector_adapter_ex1.c +++ b/examples/vector_adapter_ex1.c @@ -13,7 +13,7 @@ int main(int argc, char **argv) // Source of keys cmph_io_adapter_t *source = cmph_io_vector_adapter((char **)vector, nkeys); - //Create minimal perfect hash function using the default (chm) algorithm. 
+ //Create minimal perfect hash function using the brz algorithm. cmph_config_t *config = cmph_config_new(source); cmph_config_set_algo(config, CMPH_BRZ); cmph_config_set_mphf_fd(config, mphf_fd); diff --git a/figs/bdz/img1.png b/figs/bdz/img1.png new file mode 100644 index 0000000..e113680 Binary files /dev/null and b/figs/bdz/img1.png differ diff --git a/figs/bdz/img10.png b/figs/bdz/img10.png new file mode 100644 index 0000000..9aa3459 Binary files /dev/null and b/figs/bdz/img10.png differ diff --git a/figs/bdz/img100.png b/figs/bdz/img100.png new file mode 100644 index 0000000..ca07eb8 Binary files /dev/null and b/figs/bdz/img100.png differ diff --git a/figs/bdz/img101.png b/figs/bdz/img101.png new file mode 100644 index 0000000..28fa162 Binary files /dev/null and b/figs/bdz/img101.png differ diff --git a/figs/bdz/img102.png b/figs/bdz/img102.png new file mode 100644 index 0000000..59a1949 Binary files /dev/null and b/figs/bdz/img102.png differ diff --git a/figs/bdz/img103.png b/figs/bdz/img103.png new file mode 100644 index 0000000..5e76dad Binary files /dev/null and b/figs/bdz/img103.png differ diff --git a/figs/bdz/img104.png b/figs/bdz/img104.png new file mode 100644 index 0000000..784fb28 Binary files /dev/null and b/figs/bdz/img104.png differ diff --git a/figs/bdz/img105.png b/figs/bdz/img105.png new file mode 100644 index 0000000..64f65d9 Binary files /dev/null and b/figs/bdz/img105.png differ diff --git a/figs/bdz/img106.png b/figs/bdz/img106.png new file mode 100644 index 0000000..f673433 Binary files /dev/null and b/figs/bdz/img106.png differ diff --git a/figs/bdz/img107.png b/figs/bdz/img107.png new file mode 100644 index 0000000..142ee7f Binary files /dev/null and b/figs/bdz/img107.png differ diff --git a/figs/bdz/img108.png b/figs/bdz/img108.png new file mode 100644 index 0000000..d1a4cc5 Binary files /dev/null and b/figs/bdz/img108.png differ diff --git a/figs/bdz/img109.png b/figs/bdz/img109.png new file mode 100644 index 0000000..1cdf449 Binary 
files /dev/null and b/figs/bdz/img109.png differ diff --git a/figs/bdz/img11.png b/figs/bdz/img11.png new file mode 100644 index 0000000..c536697 Binary files /dev/null and b/figs/bdz/img11.png differ diff --git a/figs/bdz/img110.png b/figs/bdz/img110.png new file mode 100644 index 0000000..8eca594 Binary files /dev/null and b/figs/bdz/img110.png differ diff --git a/figs/bdz/img111.png b/figs/bdz/img111.png new file mode 100644 index 0000000..b843878 Binary files /dev/null and b/figs/bdz/img111.png differ diff --git a/figs/bdz/img112.png b/figs/bdz/img112.png new file mode 100644 index 0000000..112e52d Binary files /dev/null and b/figs/bdz/img112.png differ diff --git a/figs/bdz/img113.png b/figs/bdz/img113.png new file mode 100644 index 0000000..2915506 Binary files /dev/null and b/figs/bdz/img113.png differ diff --git a/figs/bdz/img114.png b/figs/bdz/img114.png new file mode 100644 index 0000000..3680a92 Binary files /dev/null and b/figs/bdz/img114.png differ diff --git a/figs/bdz/img115.png b/figs/bdz/img115.png new file mode 100644 index 0000000..fe22b2c Binary files /dev/null and b/figs/bdz/img115.png differ diff --git a/figs/bdz/img116.png b/figs/bdz/img116.png new file mode 100644 index 0000000..aa8c194 Binary files /dev/null and b/figs/bdz/img116.png differ diff --git a/figs/bdz/img117.png b/figs/bdz/img117.png new file mode 100644 index 0000000..113e5bf Binary files /dev/null and b/figs/bdz/img117.png differ diff --git a/figs/bdz/img118.png b/figs/bdz/img118.png new file mode 100644 index 0000000..fad99fe Binary files /dev/null and b/figs/bdz/img118.png differ diff --git a/figs/bdz/img119.png b/figs/bdz/img119.png new file mode 100644 index 0000000..19bb214 Binary files /dev/null and b/figs/bdz/img119.png differ diff --git a/figs/bdz/img12.png b/figs/bdz/img12.png new file mode 100644 index 0000000..05eafb0 Binary files /dev/null and b/figs/bdz/img12.png differ diff --git a/figs/bdz/img120.png b/figs/bdz/img120.png new file mode 100644 index 
0000000..67e70e8 Binary files /dev/null and b/figs/bdz/img120.png differ diff --git a/figs/bdz/img121.png b/figs/bdz/img121.png new file mode 100644 index 0000000..7964ed8 Binary files /dev/null and b/figs/bdz/img121.png differ diff --git a/figs/bdz/img122.png b/figs/bdz/img122.png new file mode 100644 index 0000000..2bc10dc Binary files /dev/null and b/figs/bdz/img122.png differ diff --git a/figs/bdz/img123.png b/figs/bdz/img123.png new file mode 100644 index 0000000..6ea4568 Binary files /dev/null and b/figs/bdz/img123.png differ diff --git a/figs/bdz/img124.png b/figs/bdz/img124.png new file mode 100644 index 0000000..5775722 Binary files /dev/null and b/figs/bdz/img124.png differ diff --git a/figs/bdz/img125.png b/figs/bdz/img125.png new file mode 100644 index 0000000..229dfe5 Binary files /dev/null and b/figs/bdz/img125.png differ diff --git a/figs/bdz/img126.png b/figs/bdz/img126.png new file mode 100644 index 0000000..3db0a8b Binary files /dev/null and b/figs/bdz/img126.png differ diff --git a/figs/bdz/img127.png b/figs/bdz/img127.png new file mode 100644 index 0000000..3800c2d Binary files /dev/null and b/figs/bdz/img127.png differ diff --git a/figs/bdz/img128.png b/figs/bdz/img128.png new file mode 100644 index 0000000..aba0085 Binary files /dev/null and b/figs/bdz/img128.png differ diff --git a/figs/bdz/img129.png b/figs/bdz/img129.png new file mode 100644 index 0000000..2fb6ac7 Binary files /dev/null and b/figs/bdz/img129.png differ diff --git a/figs/bdz/img13.png b/figs/bdz/img13.png new file mode 100644 index 0000000..afe53ea Binary files /dev/null and b/figs/bdz/img13.png differ diff --git a/figs/bdz/img130.png b/figs/bdz/img130.png new file mode 100644 index 0000000..05eafb0 Binary files /dev/null and b/figs/bdz/img130.png differ diff --git a/figs/bdz/img131.png b/figs/bdz/img131.png new file mode 100644 index 0000000..5129044 Binary files /dev/null and b/figs/bdz/img131.png differ diff --git a/figs/bdz/img132.png b/figs/bdz/img132.png new file mode 
100644 index 0000000..f7f7cb1 Binary files /dev/null and b/figs/bdz/img132.png differ diff --git a/figs/bdz/img133.png b/figs/bdz/img133.png new file mode 100644 index 0000000..2941544 Binary files /dev/null and b/figs/bdz/img133.png differ diff --git a/figs/bdz/img134.png b/figs/bdz/img134.png new file mode 100644 index 0000000..07dc9a4 Binary files /dev/null and b/figs/bdz/img134.png differ diff --git a/figs/bdz/img135.png b/figs/bdz/img135.png new file mode 100644 index 0000000..2334771 Binary files /dev/null and b/figs/bdz/img135.png differ diff --git a/figs/bdz/img136.png b/figs/bdz/img136.png new file mode 100644 index 0000000..458da07 Binary files /dev/null and b/figs/bdz/img136.png differ diff --git a/figs/bdz/img137.png b/figs/bdz/img137.png new file mode 100644 index 0000000..af3b0f8 Binary files /dev/null and b/figs/bdz/img137.png differ diff --git a/figs/bdz/img138.png b/figs/bdz/img138.png new file mode 100644 index 0000000..28eb39f Binary files /dev/null and b/figs/bdz/img138.png differ diff --git a/figs/bdz/img14.png b/figs/bdz/img14.png new file mode 100644 index 0000000..a02fd7e Binary files /dev/null and b/figs/bdz/img14.png differ diff --git a/figs/bdz/img15.png b/figs/bdz/img15.png new file mode 100644 index 0000000..49300f3 Binary files /dev/null and b/figs/bdz/img15.png differ diff --git a/figs/bdz/img16.png b/figs/bdz/img16.png new file mode 100644 index 0000000..0be3c78 Binary files /dev/null and b/figs/bdz/img16.png differ diff --git a/figs/bdz/img17.png b/figs/bdz/img17.png new file mode 100644 index 0000000..9956e98 Binary files /dev/null and b/figs/bdz/img17.png differ diff --git a/figs/bdz/img18.png b/figs/bdz/img18.png new file mode 100644 index 0000000..d51b70f Binary files /dev/null and b/figs/bdz/img18.png differ diff --git a/figs/bdz/img19.png b/figs/bdz/img19.png new file mode 100644 index 0000000..1b9da3d Binary files /dev/null and b/figs/bdz/img19.png differ diff --git a/figs/bdz/img2.png b/figs/bdz/img2.png new file mode 100644 
index 0000000..6eeaca4 Binary files /dev/null and b/figs/bdz/img2.png differ diff --git a/figs/bdz/img20.png b/figs/bdz/img20.png new file mode 100644 index 0000000..77f0f3a Binary files /dev/null and b/figs/bdz/img20.png differ diff --git a/figs/bdz/img21.png b/figs/bdz/img21.png new file mode 100644 index 0000000..e1cfbbd Binary files /dev/null and b/figs/bdz/img21.png differ diff --git a/figs/bdz/img22.png b/figs/bdz/img22.png new file mode 100644 index 0000000..453156a Binary files /dev/null and b/figs/bdz/img22.png differ diff --git a/figs/bdz/img23.png b/figs/bdz/img23.png new file mode 100644 index 0000000..210427f Binary files /dev/null and b/figs/bdz/img23.png differ diff --git a/figs/bdz/img24.png b/figs/bdz/img24.png new file mode 100644 index 0000000..57e6170 Binary files /dev/null and b/figs/bdz/img24.png differ diff --git a/figs/bdz/img25.png b/figs/bdz/img25.png new file mode 100644 index 0000000..9ea6253 Binary files /dev/null and b/figs/bdz/img25.png differ diff --git a/figs/bdz/img26.png b/figs/bdz/img26.png new file mode 100644 index 0000000..0f68b9f Binary files /dev/null and b/figs/bdz/img26.png differ diff --git a/figs/bdz/img27.png b/figs/bdz/img27.png new file mode 100644 index 0000000..11577b9 Binary files /dev/null and b/figs/bdz/img27.png differ diff --git a/figs/bdz/img28.png b/figs/bdz/img28.png new file mode 100644 index 0000000..04d203f Binary files /dev/null and b/figs/bdz/img28.png differ diff --git a/figs/bdz/img29.png b/figs/bdz/img29.png new file mode 100644 index 0000000..4a6df7b Binary files /dev/null and b/figs/bdz/img29.png differ diff --git a/figs/bdz/img3.png b/figs/bdz/img3.png new file mode 100644 index 0000000..6f12957 Binary files /dev/null and b/figs/bdz/img3.png differ diff --git a/figs/bdz/img30.png b/figs/bdz/img30.png new file mode 100644 index 0000000..bc8b7dd Binary files /dev/null and b/figs/bdz/img30.png differ diff --git a/figs/bdz/img31.png b/figs/bdz/img31.png new file mode 100644 index 0000000..365af58 
Binary files /dev/null and b/figs/bdz/img31.png differ diff --git a/figs/bdz/img32.png b/figs/bdz/img32.png new file mode 100644 index 0000000..81af967 Binary files /dev/null and b/figs/bdz/img32.png differ diff --git a/figs/bdz/img33.png b/figs/bdz/img33.png new file mode 100644 index 0000000..82c9229 Binary files /dev/null and b/figs/bdz/img33.png differ diff --git a/figs/bdz/img34.png b/figs/bdz/img34.png new file mode 100644 index 0000000..b6fcf46 Binary files /dev/null and b/figs/bdz/img34.png differ diff --git a/figs/bdz/img35.png b/figs/bdz/img35.png new file mode 100644 index 0000000..ba08621 Binary files /dev/null and b/figs/bdz/img35.png differ diff --git a/figs/bdz/img36.png b/figs/bdz/img36.png new file mode 100644 index 0000000..169fb93 Binary files /dev/null and b/figs/bdz/img36.png differ diff --git a/figs/bdz/img37.png b/figs/bdz/img37.png new file mode 100644 index 0000000..dc766e6 Binary files /dev/null and b/figs/bdz/img37.png differ diff --git a/figs/bdz/img38.png b/figs/bdz/img38.png new file mode 100644 index 0000000..5b58a4f Binary files /dev/null and b/figs/bdz/img38.png differ diff --git a/figs/bdz/img39.png b/figs/bdz/img39.png new file mode 100644 index 0000000..c951bae Binary files /dev/null and b/figs/bdz/img39.png differ diff --git a/figs/bdz/img4.png b/figs/bdz/img4.png new file mode 100644 index 0000000..ac4059b Binary files /dev/null and b/figs/bdz/img4.png differ diff --git a/figs/bdz/img40.png b/figs/bdz/img40.png new file mode 100644 index 0000000..dd582c9 Binary files /dev/null and b/figs/bdz/img40.png differ diff --git a/figs/bdz/img41.png b/figs/bdz/img41.png new file mode 100644 index 0000000..052ece0 Binary files /dev/null and b/figs/bdz/img41.png differ diff --git a/figs/bdz/img42.png b/figs/bdz/img42.png new file mode 100644 index 0000000..e6817d5 Binary files /dev/null and b/figs/bdz/img42.png differ diff --git a/figs/bdz/img43.png b/figs/bdz/img43.png new file mode 100644 index 0000000..1bd5e88 Binary files /dev/null and 
b/figs/bdz/img43.png differ diff --git a/figs/bdz/img44.png b/figs/bdz/img44.png new file mode 100644 index 0000000..83fc06b Binary files /dev/null and b/figs/bdz/img44.png differ diff --git a/figs/bdz/img45.png b/figs/bdz/img45.png new file mode 100644 index 0000000..805ce11 Binary files /dev/null and b/figs/bdz/img45.png differ diff --git a/figs/bdz/img46.png b/figs/bdz/img46.png new file mode 100644 index 0000000..bcb077f Binary files /dev/null and b/figs/bdz/img46.png differ diff --git a/figs/bdz/img47.png b/figs/bdz/img47.png new file mode 100644 index 0000000..15d2511 Binary files /dev/null and b/figs/bdz/img47.png differ diff --git a/figs/bdz/img48.png b/figs/bdz/img48.png new file mode 100644 index 0000000..10a75e9 Binary files /dev/null and b/figs/bdz/img48.png differ diff --git a/figs/bdz/img49.png b/figs/bdz/img49.png new file mode 100644 index 0000000..08d3c41 Binary files /dev/null and b/figs/bdz/img49.png differ diff --git a/figs/bdz/img5.png b/figs/bdz/img5.png new file mode 100644 index 0000000..8a601b2 Binary files /dev/null and b/figs/bdz/img5.png differ diff --git a/figs/bdz/img50.png b/figs/bdz/img50.png new file mode 100644 index 0000000..971b9fe Binary files /dev/null and b/figs/bdz/img50.png differ diff --git a/figs/bdz/img51.png b/figs/bdz/img51.png new file mode 100644 index 0000000..b2d8970 Binary files /dev/null and b/figs/bdz/img51.png differ diff --git a/figs/bdz/img52.png b/figs/bdz/img52.png new file mode 100644 index 0000000..6f9131d Binary files /dev/null and b/figs/bdz/img52.png differ diff --git a/figs/bdz/img53.png b/figs/bdz/img53.png new file mode 100644 index 0000000..419997a Binary files /dev/null and b/figs/bdz/img53.png differ diff --git a/figs/bdz/img54.png b/figs/bdz/img54.png new file mode 100644 index 0000000..7fdaf73 Binary files /dev/null and b/figs/bdz/img54.png differ diff --git a/figs/bdz/img55.png b/figs/bdz/img55.png new file mode 100644 index 0000000..8f7c5aa Binary files /dev/null and b/figs/bdz/img55.png 
differ diff --git a/figs/bdz/img56.png b/figs/bdz/img56.png new file mode 100644 index 0000000..005cfa5 Binary files /dev/null and b/figs/bdz/img56.png differ diff --git a/figs/bdz/img57.png b/figs/bdz/img57.png new file mode 100644 index 0000000..1335545 Binary files /dev/null and b/figs/bdz/img57.png differ diff --git a/figs/bdz/img58.png b/figs/bdz/img58.png new file mode 100644 index 0000000..7bed5d1 Binary files /dev/null and b/figs/bdz/img58.png differ diff --git a/figs/bdz/img59.png b/figs/bdz/img59.png new file mode 100644 index 0000000..4b53e6b Binary files /dev/null and b/figs/bdz/img59.png differ diff --git a/figs/bdz/img6.png b/figs/bdz/img6.png new file mode 100644 index 0000000..0ce7463 Binary files /dev/null and b/figs/bdz/img6.png differ diff --git a/figs/bdz/img60.png b/figs/bdz/img60.png new file mode 100644 index 0000000..6c275a5 Binary files /dev/null and b/figs/bdz/img60.png differ diff --git a/figs/bdz/img61.png b/figs/bdz/img61.png new file mode 100644 index 0000000..803a1d7 Binary files /dev/null and b/figs/bdz/img61.png differ diff --git a/figs/bdz/img62.png b/figs/bdz/img62.png new file mode 100644 index 0000000..c98fe78 Binary files /dev/null and b/figs/bdz/img62.png differ diff --git a/figs/bdz/img63.png b/figs/bdz/img63.png new file mode 100644 index 0000000..a9fbde4 Binary files /dev/null and b/figs/bdz/img63.png differ diff --git a/figs/bdz/img64.png b/figs/bdz/img64.png new file mode 100644 index 0000000..f1bfdc7 Binary files /dev/null and b/figs/bdz/img64.png differ diff --git a/figs/bdz/img65.png b/figs/bdz/img65.png new file mode 100644 index 0000000..78cee79 Binary files /dev/null and b/figs/bdz/img65.png differ diff --git a/figs/bdz/img66.png b/figs/bdz/img66.png new file mode 100644 index 0000000..77599c0 Binary files /dev/null and b/figs/bdz/img66.png differ diff --git a/figs/bdz/img67.png b/figs/bdz/img67.png new file mode 100644 index 0000000..4654a88 Binary files /dev/null and b/figs/bdz/img67.png differ diff --git 
a/figs/bdz/img68.png b/figs/bdz/img68.png new file mode 100644 index 0000000..ee429ec Binary files /dev/null and b/figs/bdz/img68.png differ diff --git a/figs/bdz/img69.png b/figs/bdz/img69.png new file mode 100644 index 0000000..66ef636 Binary files /dev/null and b/figs/bdz/img69.png differ diff --git a/figs/bdz/img7.png b/figs/bdz/img7.png new file mode 100644 index 0000000..214174a Binary files /dev/null and b/figs/bdz/img7.png differ diff --git a/figs/bdz/img70.png b/figs/bdz/img70.png new file mode 100644 index 0000000..cb305b1 Binary files /dev/null and b/figs/bdz/img70.png differ diff --git a/figs/bdz/img71.png b/figs/bdz/img71.png new file mode 100644 index 0000000..a4c645c Binary files /dev/null and b/figs/bdz/img71.png differ diff --git a/figs/bdz/img72.png b/figs/bdz/img72.png new file mode 100644 index 0000000..abd1b6d Binary files /dev/null and b/figs/bdz/img72.png differ diff --git a/figs/bdz/img73.png b/figs/bdz/img73.png new file mode 100644 index 0000000..0878ba6 Binary files /dev/null and b/figs/bdz/img73.png differ diff --git a/figs/bdz/img74.png b/figs/bdz/img74.png new file mode 100644 index 0000000..48b4c45 Binary files /dev/null and b/figs/bdz/img74.png differ diff --git a/figs/bdz/img75.png b/figs/bdz/img75.png new file mode 100644 index 0000000..e88e608 Binary files /dev/null and b/figs/bdz/img75.png differ diff --git a/figs/bdz/img76.png b/figs/bdz/img76.png new file mode 100644 index 0000000..373ffcd Binary files /dev/null and b/figs/bdz/img76.png differ diff --git a/figs/bdz/img77.png b/figs/bdz/img77.png new file mode 100644 index 0000000..f90d843 Binary files /dev/null and b/figs/bdz/img77.png differ diff --git a/figs/bdz/img78.png b/figs/bdz/img78.png new file mode 100644 index 0000000..5bb4230 Binary files /dev/null and b/figs/bdz/img78.png differ diff --git a/figs/bdz/img79.png b/figs/bdz/img79.png new file mode 100644 index 0000000..ecc271b Binary files /dev/null and b/figs/bdz/img79.png differ diff --git a/figs/bdz/img8.png 
b/figs/bdz/img8.png new file mode 100644 index 0000000..f9cbf14 Binary files /dev/null and b/figs/bdz/img8.png differ diff --git a/figs/bdz/img80.png b/figs/bdz/img80.png new file mode 100644 index 0000000..d99c085 Binary files /dev/null and b/figs/bdz/img80.png differ diff --git a/figs/bdz/img81.png b/figs/bdz/img81.png new file mode 100644 index 0000000..6dd8680 Binary files /dev/null and b/figs/bdz/img81.png differ diff --git a/figs/bdz/img82.png b/figs/bdz/img82.png new file mode 100644 index 0000000..3673fe9 Binary files /dev/null and b/figs/bdz/img82.png differ diff --git a/figs/bdz/img83.png b/figs/bdz/img83.png new file mode 100644 index 0000000..1b48dd3 Binary files /dev/null and b/figs/bdz/img83.png differ diff --git a/figs/bdz/img84.png b/figs/bdz/img84.png new file mode 100644 index 0000000..aa611e1 Binary files /dev/null and b/figs/bdz/img84.png differ diff --git a/figs/bdz/img85.png b/figs/bdz/img85.png new file mode 100644 index 0000000..464e03b Binary files /dev/null and b/figs/bdz/img85.png differ diff --git a/figs/bdz/img86.png b/figs/bdz/img86.png new file mode 100644 index 0000000..d22ecaf Binary files /dev/null and b/figs/bdz/img86.png differ diff --git a/figs/bdz/img87.png b/figs/bdz/img87.png new file mode 100644 index 0000000..15466c7 Binary files /dev/null and b/figs/bdz/img87.png differ diff --git a/figs/bdz/img88.png b/figs/bdz/img88.png new file mode 100644 index 0000000..59a1949 Binary files /dev/null and b/figs/bdz/img88.png differ diff --git a/figs/bdz/img89.png b/figs/bdz/img89.png new file mode 100644 index 0000000..d2a5978 Binary files /dev/null and b/figs/bdz/img89.png differ diff --git a/figs/bdz/img9.png b/figs/bdz/img9.png new file mode 100644 index 0000000..d20eff1 Binary files /dev/null and b/figs/bdz/img9.png differ diff --git a/figs/bdz/img90.png b/figs/bdz/img90.png new file mode 100644 index 0000000..9a0875e Binary files /dev/null and b/figs/bdz/img90.png differ diff --git a/figs/bdz/img91.png b/figs/bdz/img91.png new 
file mode 100644 index 0000000..b676745 Binary files /dev/null and b/figs/bdz/img91.png differ diff --git a/figs/bdz/img92.png b/figs/bdz/img92.png new file mode 100644 index 0000000..7bbb2ca Binary files /dev/null and b/figs/bdz/img92.png differ diff --git a/figs/bdz/img93.png b/figs/bdz/img93.png new file mode 100644 index 0000000..2f0f6fb Binary files /dev/null and b/figs/bdz/img93.png differ diff --git a/figs/bdz/img94.png b/figs/bdz/img94.png new file mode 100644 index 0000000..79ed640 Binary files /dev/null and b/figs/bdz/img94.png differ diff --git a/figs/bdz/img95.png b/figs/bdz/img95.png new file mode 100644 index 0000000..d2d76e2 Binary files /dev/null and b/figs/bdz/img95.png differ diff --git a/figs/bdz/img96.png b/figs/bdz/img96.png new file mode 100644 index 0000000..6878925 Binary files /dev/null and b/figs/bdz/img96.png differ diff --git a/figs/bdz/img97.png b/figs/bdz/img97.png new file mode 100644 index 0000000..05637c8 Binary files /dev/null and b/figs/bdz/img97.png differ diff --git a/figs/bdz/img98.png b/figs/bdz/img98.png new file mode 100644 index 0000000..0e6208a Binary files /dev/null and b/figs/bdz/img98.png differ diff --git a/figs/bdz/img99.png b/figs/bdz/img99.png new file mode 100644 index 0000000..bae28bc Binary files /dev/null and b/figs/bdz/img99.png differ diff --git a/figs/brz/bmz_temporegressao.png b/figs/brz/bmz_temporegressao.png new file mode 100644 index 0000000..dc6c6a9 Binary files /dev/null and b/figs/brz/bmz_temporegressao.png differ diff --git a/figs/brz/brz-partitioning.png b/figs/brz/brz-partitioning.png new file mode 100644 index 0000000..750d378 Binary files /dev/null and b/figs/brz/brz-partitioning.png differ diff --git a/figs/brz/brz.png b/figs/brz/brz.png new file mode 100644 index 0000000..6733d79 Binary files /dev/null and b/figs/brz/brz.png differ diff --git a/figs/brz/brz_temporegressao.eps b/figs/brz/brz_temporegressao.eps new file mode 100755 index 0000000..2986c0a --- /dev/null +++ 
b/figs/brz/brz_temporegressao.eps @@ -0,0 +1,701 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%% This is a Stata generated postscript file +%%BoundingBox: 0 0 396 288 +%%HiResBoundingBox: 0.000 0.000 396.000 288.000 +/xratio 0.012375 def +/yratio 0.012375 def +/Sbgfill { + /y1 exch def + /x1 exch def + /y0 exch def + /x0 exch def + x0 y0 moveto + x0 y1 lineto x1 y1 lineto x1 y0 lineto x0 y0 lineto + fill +} def +/Spt { + yratio mul + /yp exch def + xratio mul + /xp exch def + Slrgb setrgbcolor + newpath + xp yp moveto + xp Slw add yp + lineto + currentlinecap + 1 setlinecap + stroke + setlinecap +} def +/Sln { + yratio mul + /y1p exch def + xratio mul + /x1p exch def + yratio mul + /y0p exch def + xratio mul + /x0p exch def + Slw setlinewidth + Slrgb setrgbcolor + x0p y0p M x1p y1p lineto S +} def +/Stxtl { + /sp exch def + yratio mul + /sizep exch def + dup + /anglep exch def + 0 exch sub + /angle2p exch def + yratio mul + /y0p exch def + xratio mul + /x0p exch def + Strgb setrgbcolor + x0p y0p M anglep rotate sizep fntsize sp show stroke angle2p rotate clear +} def +/Stxtc { + /sp exch def + yratio mul + /sizep exch def + dup + /anglep exch def + 0 exch sub + /angle2p exch def + yratio mul + /y0p exch def + xratio mul + /x0p exch def + Strgb setrgbcolor + x0p y0p M anglep rotate sizep fntsize sp stringwidth exch -2 div exch rm sp show stroke angle2p rotate clear +} def +/Stxtr { + /sp exch def + yratio mul + /sizep exch def + dup + /anglep exch def + 0 exch sub + /angle2p exch def + yratio mul + /y0p exch def + xratio mul + /x0p exch def + Strgb setrgbcolor + x0p y0p M anglep rotate sizep fntsize sp stringwidth 1 index -1 mul exch rm pop sp show stroke angle2p rotate clear +} def +/Srect { + /sfill exch def + yratio mul + /y1 exch def + xratio mul + /x1 exch def + yratio mul + /y0 exch def + xratio mul + /x0 exch def + sfill 1 eq { + Ssrgb setrgbcolor + x0 y0 moveto + x0 y1 lineto x1 y1 lineto x1 y0 lineto x0 y0 lineto + fill + } if + Slw setlinewidth + Slrgb setrgbcolor + x0 y0 
moveto + x0 y1 lineto x1 y1 lineto x1 y0 lineto x0 y0 lineto + stroke +} def +/Stri { + /sfill exch def + xratio mul + /r exch def + yratio mul + /y0 exch def + xratio mul + /x0 exch def + /xcen x0 def + y0 r add + /ytop exch def + r 2 div + y0 exch sub + /ybot exch def + r 3 sqrt 2 div mul dup + xcen exch sub + /xleft exch def + xcen add + /xright exch def + sfill 1 eq { + Ssrgb setrgbcolor + xcen ytop moveto xright ybot lineto xleft ybot lineto xcen ytop lineto fill + } if + Slw setlinewidth + Slrgb setrgbcolor + xcen ytop moveto xright ybot lineto xleft ybot lineto xcen ytop lineto stroke +} def +/Soldtri { + /sfill exch def + xratio mul + /r exch def + yratio mul + /y0 exch def + xratio mul + /x0 exch def + x0 r sub + /x1 exch def + y0 r sub + /y1 exch def + x0 r add + /x2 exch def + y0 r sub + /y2 exch def + /x3 x0 def + y0 r add + /y3 exch def + sfill 1 eq { + Ssrgb setrgbcolor + x1 y1 moveto x2 y2 lineto x3 y3 lineto x1 y1 lineto fill + } if + Slw setlinewidth + Slrgb setrgbcolor + x1 y1 moveto x2 y2 lineto x3 y3 lineto x1 y1 lineto stroke +} def +/Sdia { + /sfill exch def + xratio mul + /r exch def + yratio mul + /y exch def + xratio mul + /x exch def + x r sub + /x0 exch def + /y0 y def + /x1 x def + y r sub + /y1 exch def + x r add + /x2 exch def + /y2 y def + /x3 x def + y r add + /y3 exch def + sfill 1 eq { + Ssrgb setrgbcolor + x0 y0 moveto x1 y1 lineto x2 y2 lineto x3 y3 lineto x0 y0 lineto fill + } if + Slw setlinewidth + Slrgb setrgbcolor + x0 y0 moveto x1 y1 lineto x2 y2 lineto x3 y3 lineto x0 y0 lineto stroke +} def +/Scc { + /sfill exch def + xratio mul + /r0 exch def + yratio mul + /y0 exch def + xratio mul + /x0 exch def + sfill 1 eq { + Ssrgb setrgbcolor + x0 y0 r0 0 360 arc fill + } if + Slw setlinewidth + Slrgb setrgbcolor + x0 y0 r0 0 360 arc stroke +} def +/Spie { + /sfill exch def + /a1 exch def + /a0 exch def + xratio mul + /r exch def + yratio mul + /y exch def + xratio mul + /x exch def + sfill 1 eq { + Ssrgb setrgbcolor + newpath x y 
moveto x y r a0 a1 arc closepath + fill + } if + Slw setlinewidth + Slrgb setrgbcolor + newpath x y moveto x y r a0 a1 arc closepath + stroke +} def +/Splu { + xratio mul + /r exch def + yratio mul + /y exch def + xratio mul + /x exch def + x r sub + /x0 exch def + x r add + /x1 exch def + x0 y M x1 y L + y r sub + /y0 exch def + y r add + /y1 exch def + x y0 M x y1 L +} def +/Scro { + xratio mul + /r exch def + yratio mul + /y exch def + xratio mul + /x exch def + x r sub + /x0 exch def + y r sub + /y0 exch def + x r add + /x1 exch def + y r add + /y1 exch def + x0 y0 M x1 y1 L + x r add + /x0 exch def + y r sub + /y0 exch def + x r sub + /x1 exch def + y r add + /y1 exch def + x0 y0 M x1 y1 L +} def +/Sm { + yratio mul + /y exch def + xratio mul + /x exch def + x y M +} def +/Sl { + yratio mul + /y exch def + xratio mul + /x exch def + x y L +} def +/SPl { + yratio mul + /y exch def + xratio mul + /x exch def + x y PL +} def +/Sbp { + newpath +} def +/Sep { + /sfill exch def + closepath + sfill 1 eq { + Ssrgb setrgbcolor + gsave + fill + grestore + } if + Slw setlinewidth + Slrgb setrgbcolor + stroke +} def +/cp {currentpoint} def +/M {moveto} def +/rm {rmoveto} def +/S {stroke} def +/L {Slw setlinewidth Slrgb setrgbcolor lineto S} def +/PL {Slw setlinewidth Slrgb setrgbcolor lineto} def +/MF { % make new latin1 encoded font + /newfontname exch def + /fontname exch def + /fontdict fontname findfont def + /newfont fontdict maxlength dict def + fontdict { + exch dup /FID eq {pop pop} {exch newfont 3 1 roll put} ifelse + } forall + newfont /FontName newfontname put + newfont /Encoding ISOLatin1Encoding put + newfontname newfont definefont pop +} def +/Helvetica /Helvetica-Latin1 MF +/reg {/Helvetica-Latin1 findfont 1 scalefont setfont } def +/fntsize {/Helvetica-Latin1 findfont exch scalefont setfont } def +/Slw 0.120 def +0.882 0.902 0.941 setrgbcolor +0 0 396.000 288.000 Sbgfill +/Slrgb {1.000 1.000 1.000} def +/Strgb {1.000 1.000 1.000} def +/Ssrgb {1.000 1.000 
1.000} def +/Ssrgb {0.882 0.902 0.941} def +/Slw 0.576 def +/Slrgb {0.882 0.902 0.941} def +0 0 31999 23272 1 Srect +/Ssrgb {1.000 1.000 1.000} def +/Slrgb {1.000 1.000 1.000} def +4189 4189 31184 22457 1 Srect +/Slrgb {0.000 0.000 0.000} def +/Strgb {0.000 0.000 0.000} def +/Slw 0.864 def +/Slrgb {0.882 0.902 0.941} def +4189 4856 31184 4856 Sln +/Slw 0.576 def +/Slrgb {0.000 0.000 0.000} def +/Slw 0.864 def +/Slrgb {0.882 0.902 0.941} def +4189 10552 31184 10552 Sln +/Slw 0.576 def +/Slrgb {0.000 0.000 0.000} def +/Slw 0.864 def +/Slrgb {0.882 0.902 0.941} def +4189 16249 31184 16249 Sln +/Slw 0.576 def +/Slrgb {0.000 0.000 0.000} def +/Slw 0.864 def +/Slrgb {0.882 0.902 0.941} def +4189 21945 31184 21945 Sln +/Slw 0.576 def +/Slrgb {0.000 0.000 0.000} def +/Slw 0.864 def +/Slrgb {0.153 0.247 0.435} def +/Ssrgb {0.153 0.247 0.435} def +4726 4864 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4726 4863 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4752 4870 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4752 4871 178 1 Scc +4804 4889 178 1 Scc +4804 4890 178 1 Scc +4804 4892 178 1 Scc +4804 4891 178 1 Scc +4804 4892 178 1 Scc +4804 4892 178 1 Scc +4804 4892 178 1 Scc +4804 4892 178 1 Scc +4804 4892 178 1 Scc +4804 4892 178 1 Scc +4908 4935 178 1 Scc +4908 4935 178 1 Scc +4908 4934 178 1 Scc +4908 4934 178 1 Scc +4908 4935 178 1 Scc +4908 4933 178 1 Scc +4908 4935 178 1 Scc +4908 4933 178 1 Scc +4908 4939 178 1 Scc +4908 4934 178 1 Scc +5116 5015 178 1 Scc +5116 5013 178 1 Scc +5116 5014 178 1 Scc +5116 5015 178 1 Scc +5116 5013 178 1 Scc +5116 5013 178 1 Scc +5116 5016 178 1 Scc +5116 5014 178 1 Scc +5116 5013 178 1 Scc +5116 5026 178 1 Scc +5531 5179 178 1 Scc +5531 5182 178 1 Scc +5531 5179 178 1 Scc +5531 5181 178 1 Scc +5531 5180 178 1 
Scc +5531 5177 178 1 Scc +5531 5177 178 1 Scc +5531 5179 178 1 Scc +5531 5178 178 1 Scc +5531 5177 178 1 Scc +6363 5518 178 1 Scc +6363 5524 178 1 Scc +6363 5521 178 1 Scc +6363 5522 178 1 Scc +6363 5524 178 1 Scc +6363 5527 178 1 Scc +6363 5541 178 1 Scc +6363 5524 178 1 Scc +6363 5526 178 1 Scc +6363 5523 178 1 Scc +8025 6265 178 1 Scc +8025 6247 178 1 Scc +8025 6239 178 1 Scc +8025 6251 178 1 Scc +8025 6251 178 1 Scc +8025 6244 178 1 Scc +8025 6251 178 1 Scc +8025 6256 178 1 Scc +8025 6240 178 1 Scc +8025 6247 178 1 Scc +17998 11668 178 1 Scc +17998 11667 178 1 Scc +17998 11661 178 1 Scc +17998 11634 178 1 Scc +17998 11640 178 1 Scc +17998 11645 178 1 Scc +17998 11680 178 1 Scc +17998 11649 178 1 Scc +17998 11645 178 1 Scc +17998 11639 178 1 Scc +30672 19916 178 1 Scc +30672 19954 178 1 Scc +30672 19915 178 1 Scc +30672 19960 178 1 Scc +30672 19920 178 1 Scc +30672 19936 178 1 Scc +30672 19916 178 1 Scc +30672 19905 178 1 Scc +30672 19928 178 1 Scc +/Slrgb {0.000 0.000 0.000} def +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4726 4700 Sl +4726 4700 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4752 4715 Sl +4752 4715 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4804 4745 Sl +4804 4745 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 
4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +4908 4804 Sl +4908 4804 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5116 4923 Sl +5116 4923 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +5531 5160 Sl +5531 5160 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +6363 5635 Sl +6363 5635 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +8025 6584 Sl +8025 6584 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +17998 12281 Sl +17998 12281 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +30672 19520 Sm +30672 19520 Sl +/Slw 0.576 def +4189 4189 4189 22457 Sln +4189 4856 3866 4856 Sln +3365 4856 90.000 1131 (0) Stxtc +4189 10552 3866 10552 Sln +3365 10552 90.000 1131 (5000) Stxtc +4189 16249 3866 
16249 Sln +3365 16249 90.000 1131 (10000) Stxtc +4189 21945 3866 21945 Sln +3365 21945 90.000 1131 (15000) Stxtc +1768 13323 90.000 1131 (Time \(s\)) Stxtc +4189 4189 31184 4189 Sln +4701 4189 4701 3866 Sln +4701 2912 0.000 1131 (0) Stxtc +9895 4189 9895 3866 Sln +9895 2912 0.000 1131 (200) Stxtc +15090 4189 15090 3866 Sln +15090 2912 0.000 1131 (400) Stxtc +20284 4189 20284 3866 Sln +20284 2912 0.000 1131 (600) Stxtc +25478 4189 25478 3866 Sln +25478 2912 0.000 1131 (800) Stxtc +30672 4189 30672 3866 Sln +30672 2912 0.000 1131 (1000) Stxtc +17687 1315 0.000 1131 (Number of keys \(millions\)) Stxtc +/Ssrgb {1.000 1.000 1.000} def +4468 20349 29084 22178 1 Srect +/Slw 0.864 def +/Slrgb {0.153 0.247 0.435} def +/Ssrgb {0.153 0.247 0.435} def +4817 21263 178 1 Scc +/Slrgb {0.000 0.000 0.000} def +16281 21263 19306 21263 Sln +5302 21037 0.000 1131 (Experimental times) Stxtl +19791 21037 0.000 1131 (Linear regression) Stxtl +S showpage +%%EOF diff --git a/figs/brz/brz_temporegressao.png b/figs/brz/brz_temporegressao.png new file mode 100644 index 0000000..851570d Binary files /dev/null and b/figs/brz/brz_temporegressao.png differ diff --git a/figs/brz/img135.png b/figs/brz/img135.png new file mode 100644 index 0000000..d05d35a Binary files /dev/null and b/figs/brz/img135.png differ diff --git a/figs/brz/img159.png b/figs/brz/img159.png new file mode 100644 index 0000000..8f61c14 Binary files /dev/null and b/figs/brz/img159.png differ diff --git a/figs/brz/img160.png b/figs/brz/img160.png new file mode 100644 index 0000000..f6f5068 Binary files /dev/null and b/figs/brz/img160.png differ diff --git a/figs/brz/img162.png b/figs/brz/img162.png new file mode 100644 index 0000000..ffccb38 Binary files /dev/null and b/figs/brz/img162.png differ diff --git a/figs/brz/img167.png b/figs/brz/img167.png new file mode 100644 index 0000000..ec42d9f Binary files /dev/null and b/figs/brz/img167.png differ diff --git a/figs/brz/img168.png b/figs/brz/img168.png new file mode 100644 index 
0000000..47fbf9e Binary files /dev/null and b/figs/brz/img168.png differ diff --git a/figs/brz/img169.png b/figs/brz/img169.png new file mode 100644 index 0000000..bb5055e Binary files /dev/null and b/figs/brz/img169.png differ diff --git a/figs/brz/img170.png b/figs/brz/img170.png new file mode 100644 index 0000000..2019c65 Binary files /dev/null and b/figs/brz/img170.png differ diff --git a/figs/brz/img171.png b/figs/brz/img171.png new file mode 100644 index 0000000..f7d3827 Binary files /dev/null and b/figs/brz/img171.png differ diff --git a/figs/brz/img172.png b/figs/brz/img172.png new file mode 100644 index 0000000..406afa8 Binary files /dev/null and b/figs/brz/img172.png differ diff --git a/figs/brz/img173.png b/figs/brz/img173.png new file mode 100644 index 0000000..092b379 Binary files /dev/null and b/figs/brz/img173.png differ diff --git a/figs/brz/img174.png b/figs/brz/img174.png new file mode 100644 index 0000000..cde8e03 Binary files /dev/null and b/figs/brz/img174.png differ diff --git a/figs/brz/img175.png b/figs/brz/img175.png new file mode 100644 index 0000000..e72529a Binary files /dev/null and b/figs/brz/img175.png differ diff --git a/figs/brz/img176.png b/figs/brz/img176.png new file mode 100644 index 0000000..4b2b690 Binary files /dev/null and b/figs/brz/img176.png differ diff --git a/figs/brz/img177.png b/figs/brz/img177.png new file mode 100644 index 0000000..bf32902 Binary files /dev/null and b/figs/brz/img177.png differ diff --git a/figs/brz/img178.png b/figs/brz/img178.png new file mode 100644 index 0000000..1787b97 Binary files /dev/null and b/figs/brz/img178.png differ diff --git a/figs/brz/img179.png b/figs/brz/img179.png new file mode 100644 index 0000000..bc2cfec Binary files /dev/null and b/figs/brz/img179.png differ diff --git a/figs/brz/img181.png b/figs/brz/img181.png new file mode 100644 index 0000000..cd14e96 Binary files /dev/null and b/figs/brz/img181.png differ diff --git a/figs/brz/img182.png b/figs/brz/img182.png new file 
mode 100644 index 0000000..be8bab0 Binary files /dev/null and b/figs/brz/img182.png differ diff --git a/figs/brz/img187.png b/figs/brz/img187.png new file mode 100644 index 0000000..50ae12f Binary files /dev/null and b/figs/brz/img187.png differ diff --git a/figs/brz/img188.png b/figs/brz/img188.png new file mode 100644 index 0000000..86096bf Binary files /dev/null and b/figs/brz/img188.png differ diff --git a/figs/brz/img189.png b/figs/brz/img189.png new file mode 100644 index 0000000..ee14934 Binary files /dev/null and b/figs/brz/img189.png differ diff --git a/figs/brz/img19.png b/figs/brz/img19.png new file mode 100644 index 0000000..d0904ab Binary files /dev/null and b/figs/brz/img19.png differ diff --git a/figs/brz/img190.png b/figs/brz/img190.png new file mode 100644 index 0000000..da1b90b Binary files /dev/null and b/figs/brz/img190.png differ diff --git a/figs/brz/img191.png b/figs/brz/img191.png new file mode 100644 index 0000000..7272572 Binary files /dev/null and b/figs/brz/img191.png differ diff --git a/figs/brz/img192.png b/figs/brz/img192.png new file mode 100644 index 0000000..71acf80 Binary files /dev/null and b/figs/brz/img192.png differ diff --git a/figs/brz/img193.png b/figs/brz/img193.png new file mode 100644 index 0000000..6d2d374 Binary files /dev/null and b/figs/brz/img193.png differ diff --git a/figs/brz/img194.png b/figs/brz/img194.png new file mode 100644 index 0000000..8fa967d Binary files /dev/null and b/figs/brz/img194.png differ diff --git a/figs/brz/img195.png b/figs/brz/img195.png new file mode 100644 index 0000000..5b84b7c Binary files /dev/null and b/figs/brz/img195.png differ diff --git a/figs/brz/img196.png b/figs/brz/img196.png new file mode 100644 index 0000000..5f6dcff Binary files /dev/null and b/figs/brz/img196.png differ diff --git a/figs/brz/img197.png b/figs/brz/img197.png new file mode 100644 index 0000000..339ba68 Binary files /dev/null and b/figs/brz/img197.png differ diff --git a/figs/brz/img198.png 
b/figs/brz/img198.png new file mode 100644 index 0000000..2340ee0 Binary files /dev/null and b/figs/brz/img198.png differ diff --git a/figs/brz/img199.png b/figs/brz/img199.png new file mode 100644 index 0000000..9f06f9c Binary files /dev/null and b/figs/brz/img199.png differ diff --git a/figs/brz/img200.png b/figs/brz/img200.png new file mode 100644 index 0000000..a8860fc Binary files /dev/null and b/figs/brz/img200.png differ diff --git a/figs/brz/img201.png b/figs/brz/img201.png new file mode 100644 index 0000000..fbf7a1d Binary files /dev/null and b/figs/brz/img201.png differ diff --git a/figs/brz/img202.png b/figs/brz/img202.png new file mode 100644 index 0000000..11b3dda Binary files /dev/null and b/figs/brz/img202.png differ diff --git a/figs/brz/img203.png b/figs/brz/img203.png new file mode 100644 index 0000000..777165f Binary files /dev/null and b/figs/brz/img203.png differ diff --git a/figs/brz/img204.png b/figs/brz/img204.png new file mode 100644 index 0000000..8af2aad Binary files /dev/null and b/figs/brz/img204.png differ diff --git a/figs/brz/img205.png b/figs/brz/img205.png new file mode 100644 index 0000000..e20316c Binary files /dev/null and b/figs/brz/img205.png differ diff --git a/figs/brz/img206.png b/figs/brz/img206.png new file mode 100644 index 0000000..d2fdd29 Binary files /dev/null and b/figs/brz/img206.png differ diff --git a/figs/brz/img207.png b/figs/brz/img207.png new file mode 100644 index 0000000..d5e4612 Binary files /dev/null and b/figs/brz/img207.png differ diff --git a/figs/brz/img209.png b/figs/brz/img209.png new file mode 100644 index 0000000..826181a Binary files /dev/null and b/figs/brz/img209.png differ diff --git a/figs/brz/img210.png b/figs/brz/img210.png new file mode 100644 index 0000000..8f50457 Binary files /dev/null and b/figs/brz/img210.png differ diff --git a/figs/brz/img212.png b/figs/brz/img212.png new file mode 100644 index 0000000..b2373fc Binary files /dev/null and b/figs/brz/img212.png differ diff --git 
a/figs/brz/img213.png b/figs/brz/img213.png new file mode 100644 index 0000000..718d3cf Binary files /dev/null and b/figs/brz/img213.png differ diff --git a/figs/brz/img214.png b/figs/brz/img214.png new file mode 100644 index 0000000..f73229b Binary files /dev/null and b/figs/brz/img214.png differ diff --git a/figs/brz/img215.png b/figs/brz/img215.png new file mode 100644 index 0000000..2dc082b Binary files /dev/null and b/figs/brz/img215.png differ diff --git a/figs/brz/img216.png b/figs/brz/img216.png new file mode 100644 index 0000000..cec0a36 Binary files /dev/null and b/figs/brz/img216.png differ diff --git a/figs/brz/img217.png b/figs/brz/img217.png new file mode 100644 index 0000000..ef8bc48 Binary files /dev/null and b/figs/brz/img217.png differ diff --git a/figs/brz/img218.png b/figs/brz/img218.png new file mode 100644 index 0000000..dbb8e20 Binary files /dev/null and b/figs/brz/img218.png differ diff --git a/figs/brz/img219.png b/figs/brz/img219.png new file mode 100644 index 0000000..3c8198b Binary files /dev/null and b/figs/brz/img219.png differ diff --git a/figs/brz/img22.png b/figs/brz/img22.png new file mode 100644 index 0000000..6d4ec43 Binary files /dev/null and b/figs/brz/img22.png differ diff --git a/figs/brz/img220.png b/figs/brz/img220.png new file mode 100644 index 0000000..b33351f Binary files /dev/null and b/figs/brz/img220.png differ diff --git a/figs/brz/img221.png b/figs/brz/img221.png new file mode 100644 index 0000000..9c6ed0c Binary files /dev/null and b/figs/brz/img221.png differ diff --git a/figs/brz/img222.png b/figs/brz/img222.png new file mode 100644 index 0000000..d9fe91a Binary files /dev/null and b/figs/brz/img222.png differ diff --git a/figs/brz/img223.png b/figs/brz/img223.png new file mode 100644 index 0000000..c9969f6 Binary files /dev/null and b/figs/brz/img223.png differ diff --git a/figs/brz/img224.png b/figs/brz/img224.png new file mode 100644 index 0000000..6113dc1 Binary files /dev/null and b/figs/brz/img224.png 
differ diff --git a/figs/brz/img225.png b/figs/brz/img225.png new file mode 100644 index 0000000..d9d1e91 Binary files /dev/null and b/figs/brz/img225.png differ diff --git a/figs/brz/img226.png b/figs/brz/img226.png new file mode 100644 index 0000000..11713b3 Binary files /dev/null and b/figs/brz/img226.png differ diff --git a/figs/brz/img227.png b/figs/brz/img227.png new file mode 100644 index 0000000..47e6cee Binary files /dev/null and b/figs/brz/img227.png differ diff --git a/figs/brz/img228.png b/figs/brz/img228.png new file mode 100644 index 0000000..464162f Binary files /dev/null and b/figs/brz/img228.png differ diff --git a/figs/brz/img229.png b/figs/brz/img229.png new file mode 100644 index 0000000..c4efc26 Binary files /dev/null and b/figs/brz/img229.png differ diff --git a/figs/brz/img23.png b/figs/brz/img23.png new file mode 100644 index 0000000..4ef1c82 Binary files /dev/null and b/figs/brz/img23.png differ diff --git a/figs/brz/img230.png b/figs/brz/img230.png new file mode 100644 index 0000000..44c50ff Binary files /dev/null and b/figs/brz/img230.png differ diff --git a/figs/brz/img231.png b/figs/brz/img231.png new file mode 100644 index 0000000..08f939d Binary files /dev/null and b/figs/brz/img231.png differ diff --git a/figs/brz/img232.png b/figs/brz/img232.png new file mode 100644 index 0000000..f6336b6 Binary files /dev/null and b/figs/brz/img232.png differ diff --git a/figs/brz/img233.png b/figs/brz/img233.png new file mode 100644 index 0000000..e7c3100 Binary files /dev/null and b/figs/brz/img233.png differ diff --git a/figs/brz/img234.png b/figs/brz/img234.png new file mode 100644 index 0000000..e02cf5c Binary files /dev/null and b/figs/brz/img234.png differ diff --git a/figs/brz/img235.png b/figs/brz/img235.png new file mode 100644 index 0000000..8a4239c Binary files /dev/null and b/figs/brz/img235.png differ diff --git a/figs/brz/img236.png b/figs/brz/img236.png new file mode 100644 index 0000000..461172f Binary files /dev/null and 
b/figs/brz/img236.png differ diff --git a/figs/brz/img237.png b/figs/brz/img237.png new file mode 100644 index 0000000..d9282e5 Binary files /dev/null and b/figs/brz/img237.png differ diff --git a/figs/brz/img238.png b/figs/brz/img238.png new file mode 100644 index 0000000..59a24fc Binary files /dev/null and b/figs/brz/img238.png differ diff --git a/figs/brz/img239.png b/figs/brz/img239.png new file mode 100644 index 0000000..3e2a468 Binary files /dev/null and b/figs/brz/img239.png differ diff --git a/figs/brz/img24.png b/figs/brz/img24.png new file mode 100644 index 0000000..c2f7a58 Binary files /dev/null and b/figs/brz/img24.png differ diff --git a/figs/brz/img240.png b/figs/brz/img240.png new file mode 100644 index 0000000..3f51c58 Binary files /dev/null and b/figs/brz/img240.png differ diff --git a/figs/brz/img241.png b/figs/brz/img241.png new file mode 100644 index 0000000..9e68232 Binary files /dev/null and b/figs/brz/img241.png differ diff --git a/figs/brz/img242.png b/figs/brz/img242.png new file mode 100644 index 0000000..c15b703 Binary files /dev/null and b/figs/brz/img242.png differ diff --git a/figs/brz/img243.png b/figs/brz/img243.png new file mode 100644 index 0000000..f081eae Binary files /dev/null and b/figs/brz/img243.png differ diff --git a/figs/brz/img4.png b/figs/brz/img4.png new file mode 100644 index 0000000..42bbe81 Binary files /dev/null and b/figs/brz/img4.png differ diff --git a/figs/brz/img42.png b/figs/brz/img42.png new file mode 100644 index 0000000..7a19030 Binary files /dev/null and b/figs/brz/img42.png differ diff --git a/figs/brz/img43.png b/figs/brz/img43.png new file mode 100644 index 0000000..317d1cb Binary files /dev/null and b/figs/brz/img43.png differ diff --git a/figs/brz/img44.png b/figs/brz/img44.png new file mode 100644 index 0000000..435bbf1 Binary files /dev/null and b/figs/brz/img44.png differ diff --git a/figs/brz/img46.png b/figs/brz/img46.png new file mode 100644 index 0000000..37142c9 Binary files /dev/null and 
b/figs/brz/img46.png differ diff --git a/figs/brz/img47.png b/figs/brz/img47.png new file mode 100644 index 0000000..8c3006f Binary files /dev/null and b/figs/brz/img47.png differ diff --git a/figs/brz/img49.png b/figs/brz/img49.png new file mode 100644 index 0000000..eebdfb0 Binary files /dev/null and b/figs/brz/img49.png differ diff --git a/figs/brz/img5.png b/figs/brz/img5.png new file mode 100644 index 0000000..4047b12 Binary files /dev/null and b/figs/brz/img5.png differ diff --git a/figs/brz/img50.png b/figs/brz/img50.png new file mode 100644 index 0000000..cdbf3fe Binary files /dev/null and b/figs/brz/img50.png differ diff --git a/figs/brz/img54.png b/figs/brz/img54.png new file mode 100644 index 0000000..6187668 Binary files /dev/null and b/figs/brz/img54.png differ diff --git a/figs/brz/img55.png b/figs/brz/img55.png new file mode 100644 index 0000000..b1290ec Binary files /dev/null and b/figs/brz/img55.png differ diff --git a/figs/brz/img57.png b/figs/brz/img57.png new file mode 100644 index 0000000..218bcb3 Binary files /dev/null and b/figs/brz/img57.png differ diff --git a/figs/brz/img58.png b/figs/brz/img58.png new file mode 100644 index 0000000..04d6356 Binary files /dev/null and b/figs/brz/img58.png differ diff --git a/figs/brz/img63.png b/figs/brz/img63.png new file mode 100644 index 0000000..89a86fd Binary files /dev/null and b/figs/brz/img63.png differ diff --git a/figs/brz/img64.png b/figs/brz/img64.png new file mode 100644 index 0000000..0340f8a Binary files /dev/null and b/figs/brz/img64.png differ diff --git a/figs/brz/img66.png b/figs/brz/img66.png new file mode 100644 index 0000000..e90b7b5 Binary files /dev/null and b/figs/brz/img66.png differ diff --git a/figs/brz/img67.png b/figs/brz/img67.png new file mode 100644 index 0000000..0b8a6cf Binary files /dev/null and b/figs/brz/img67.png differ diff --git a/figs/brz/img8.png b/figs/brz/img8.png new file mode 100644 index 0000000..d257d38 Binary files /dev/null and b/figs/brz/img8.png differ 
diff --git a/figs/brz/img83.png b/figs/brz/img83.png new file mode 100644 index 0000000..f2ff02d Binary files /dev/null and b/figs/brz/img83.png differ diff --git a/gendocs b/gendocs index 0fc609d..332675b 100755 --- a/gendocs +++ b/gendocs @@ -1,6 +1,7 @@ #!/bin/sh txt2tags -t html --mask-email -i README.t2t -o index.html +txt2tags -t html -i CHD.t2t -o chd.html txt2tags -t html -i BDZ.t2t -o bdz.html txt2tags -t html -i BMZ.t2t -o bmz.html txt2tags -t html -i BRZ.t2t -o brz.html @@ -14,6 +15,7 @@ txt2tags -t html -i NEWSLOG.t2t -o newslog.html txt2tags -t html -i EXAMPLES.t2t -o examples.html txt2tags -t txt --mask-email -i README.t2t -o README +txt2tags -t txt -i CHD.t2t -o CHD txt2tags -t txt -i BDZ.t2t -o BDZ txt2tags -t txt -i BMZ.t2t -o BMZ txt2tags -t txt -i BRZ.t2t -o BRZ diff --git a/papers/esa09.pdf b/papers/esa09.pdf new file mode 100644 index 0000000..820882a Binary files /dev/null and b/papers/esa09.pdf differ diff --git a/papers/thesis.pdf b/papers/thesis.pdf index b89c4f9..c8b8371 100755 Binary files a/papers/thesis.pdf and b/papers/thesis.pdf differ diff --git a/scpscript b/scpscript index 9c113c1..a4b2e54 100755 --- a/scpscript +++ b/scpscript @@ -1,3 +1,3 @@ -scp -r *.html fc_botelho@shell.sourceforge.net:/home/groups/c/cm/cmph/htdocs -scp -r examples/*.c examples/keys.txt fc_botelho@shell.sourceforge.net:/home/groups/c/cm/cmph/htdocs/examples -scp -r papers/*.pdf fc_botelho@shell.sourceforge.net:/home/groups/c/cm/cmph/htdocs/papers/ +scp -r *.html fc_botelho,cmph@web.sourceforge.net:htdocs/ +scp -r examples/*.c examples/keys.txt fc_botelho,cmph@web.sourceforge.net:htdocs/examples/ +scp -r papers/*.pdf fc_botelho,cmph@web.sourceforge.net:htdocs/papers/ diff --git a/src/main.c b/src/main.c index 9bb92db..f739b32 100644 --- a/src/main.c +++ b/src/main.c @@ -31,9 +31,9 @@ void usage_long(const char *prg) fprintf(stderr, "Minimum perfect hashing tool\n\n"); fprintf(stderr, " -h\t print this help message\n"); fprintf(stderr, " -c\t c value 
determines:\n"); - fprintf(stderr, " \t the number of vertices in the graph for the algorithms BMZ and CHM\n"); - fprintf(stderr, " \t the number of bits per key required in the FCH algorithm\n"); - fprintf(stderr, " \t the load factor in the CHD_PH algorithm\n"); + fprintf(stderr, " \t * the number of vertices in the graph for the algorithms BMZ and CHM\n"); + fprintf(stderr, " \t * the number of bits per key required in the FCH algorithm\n"); + fprintf(stderr, " \t * the load factor in the CHD_PH algorithm\n"); fprintf(stderr, " -a\t algorithm - valid values are\n"); for (i = 0; i < CMPH_COUNT; ++i) fprintf(stderr, " \t * %s\n", cmph_names[i]); fprintf(stderr, " -f\t hash function (may be used multiple times) - valid values are\n"); @@ -44,20 +44,23 @@ void usage_long(const char *prg) fprintf(stderr, " -g\t generation mode\n"); fprintf(stderr, " -s\t random seed\n"); fprintf(stderr, " -m\t minimum perfect hash function file \n"); - fprintf(stderr, " -M\t main memory availability (in MB)\n"); - fprintf(stderr, " -d\t temporary directory used in brz algorithm \n"); - fprintf(stderr, " -b\t the meaning of this parameter depends on the algorithm used.\n"); - fprintf(stderr, " \t If BRZ algorithm is selected in -a option, than it is used\n"); - fprintf(stderr, " \t to make the maximal number of keys in a bucket lower than 256.\n"); - fprintf(stderr, " \t In this case its value should be an integer in the range [64,175].\n"); - fprintf(stderr, " \t If BDZ algorithm is selected in option -a, than it is used to\n"); - fprintf(stderr, " \t determine the size of some precomputed rank information and\n"); - fprintf(stderr, " \t its value should be an integer in the range [3,10].\n"); - fprintf(stderr, " \t If CHD_PH algorithm is selected in option -a, than it is used to\n"); - fprintf(stderr, " \t set average number of keys per bucket and its value should be an\n"); - fprintf(stderr, " \t an integer in the range [1,32].\n"); - fprintf(stderr, " -t\t set the number of keys 
per bin for a t-perfect hashing function.\n"); - fprintf(stderr, " \t A t-perfect hashing function allows at most t collisions in a given bin.\n"); + fprintf(stderr, " -M\t main memory availability (in MB) used in BRZ algorithm \n"); + fprintf(stderr, " -d\t temporary directory used in BRZ algorithm \n"); + fprintf(stderr, " -b\t the meaning of this parameter depends on the algorithm selected in the -a option:\n"); + fprintf(stderr, " \t * For BRZ it is used to make the maximal number of keys in a bucket lower than 256.\n"); + fprintf(stderr, " \t In this case its value should be an integer in the range [64,175]. Default is 128.\n\n"); + fprintf(stderr, " \t * For BDZ it is used to determine the size of some precomputed rank\n"); + fprintf(stderr, " \t information and its value should be an integer in the range [3,10]. Default\n"); + fprintf(stderr, " \t is 7. The larger this value, the more compact the resulting functions\n"); + fprintf(stderr, " \t and the slower they are at evaluation time.\n\n"); + fprintf(stderr, " \t * For CHD and CHD_PH it is used to set the average number of keys per bucket\n"); + fprintf(stderr, " \t and its value should be an integer in the range [1,32]. Default is 4. The\n"); + fprintf(stderr, " \t larger this value, the slower the construction of the functions.\n"); + fprintf(stderr, " \t This parameter has no effect for other algorithms.\n\n"); + fprintf(stderr, " -t\t set the number of keys per bin for a t-perfect hashing function. A t-perfect\n"); + fprintf(stderr, " \t hash function allows at most t collisions in a given bin. This parameter applies\n"); + fprintf(stderr, " \t only to the CHD and CHD_PH algorithms. Its value should be an integer in the\n"); + fprintf(stderr, " \t range [1,128]. 
Default is 1.\n"); fprintf(stderr, " keysfile\t line separated file with keys\n"); } diff --git a/tex/bdz/bdz.bib b/tex/bdz/bdz.bib new file mode 100755 index 0000000..3727169 --- /dev/null +++ b/tex/bdz/bdz.bib @@ -0,0 +1,140 @@ +@inproceedings{bpz07, + author = {F.C. Botelho and R. Pagh and N. Ziviani}, + title = {Simple and Space-Efficient Minimal Perfect Hash Functions}, + booktitle = {Proceedings of the 10th Workshop on Algorithms and Data Structures (WADs'07)}, + publisher = {Springer LNCS vol. 4619}, + pages = {139-150}, + month = {August}, + location = {Halifax, Canada}, + year = 2007, + key = {author} +} + +@PhdThesis{b08, +author = {F. C. Botelho}, +title = {Near-Optimal Space Perfect Hashing Algorithms}, +school = {Federal University of Minas Gerais}, +year = {2008}, +OPTkey = {}, +OPTtype = {}, +OPTaddress = {}, +month = {September}, +note = {Supervised by Nivio Ziviani, \url{http://www.dcc.ufmg.br/pos/cursos/defesas/255D.PDF}}, +OPTannote = {}, +OPTurl = {http://www.dcc.ufmg.br/pos/cursos/defesas/255D.PDF}, +OPTdoi = {}, +OPTissn = {}, +OPTlocalfile = {}, +OPTabstract = {} +} + +@Article{mwhc96, + author = {B.S. Majewski and N.C. Wormald and G. Havas and Z.J. Czech}, + title = {A family of perfect hashing methods}, + journal = {The Computer Journal}, + year = {1996}, + volume = {39}, + number = {6}, + pages = {547-554}, + key = {author} +} + +@inproceedings{ckrt04, + author = {B. Chazelle and J. Kilian and R. Rubinfeld and A. Tal}, + title = {The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables}, + booktitle = {Proceedings of the 15th annual ACM-SIAM symposium on Discrete algorithms (SODA'04)}, + year = {2004}, + isbn = {0-89871-558-X}, + pages = {30--39}, + location = {New Orleans, Louisiana}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + optpublisher = {Society for Industrial and Applied Mathematics} + } + +@Article{j97, + author = {B. 
Jenkins}, + title = {Algorithm Alley: Hash Functions}, + journal = {Dr. Dobb's Journal of Software Tools}, + volume = {22}, + number = {9}, + month = {september}, + year = {1997}, + note = {Extended version available at \url{http://burtleburtle.net/bob/hash/doobs.html}} +} + + +@Article{e87, + author = {J. Ebert}, + title = {A Versatile Data Structure for Edges Oriented Graph Algorithms}, + journal = {Communication of The ACM}, + year = {1987}, + OPTkey = {}, + OPTvolume = {}, + number = {30}, + pages = {513-519}, + OPTmonth = {}, + OPTnote = {}, + OPTannote = {} +} + +@article {dict-jour, + AUTHOR = {R. Pagh}, + TITLE = {Low Redundancy in Static Dictionaries with Constant Query Time}, + OPTJOURNAL = sicomp, + JOURNAL = fsicomp, + VOLUME = {31}, + YEAR = {2001}, + NUMBER = {2}, + PAGES = {353--363}, +} + + +@inproceedings{sg06, + author = {K. Sadakane and R. Grossi}, + title = {Squeezing succinct data structures into entropy bounds}, + booktitle = {Proceedings of the 17th annual ACM-SIAM symposium on Discrete algorithms (SODA'06)}, + year = {2006}, + pages = {1230--1239} +} + +@inproceedings{gn06, + author = {R. Gonzalez and + G. Navarro}, + title = {Statistical Encoding of Succinct Data Structures}, + booktitle = {Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM'06)}, + year = {2006}, + pages = {294--305} +} + +@inproceedings{fn07, + author = {K. Fredriksson and + F. Nikitin}, + title = {Simple Compression Code Supporting Random Access and Fast + String Matching}, + booktitle = {Proceedings of the 6th International Workshop on Efficient and Experimental Algorithms (WEA'07)}, + year = {2007}, + pages = {203--216} +} + +@inproceedings{os07, + author = {D. Okanohara and K. 
Sadakane}, + title = {Practical Entropy-Compressed Rank/Select Dictionary}, + booktitle = {Proceedings of the Workshop on Algorithm Engineering and + Experiments (ALENEX'07)}, + year = {2007}, + location = {New Orleans, Louisiana, USA} + } + + +@inproceedings{rrr02, + author = {R. Raman and V. Raman and S. S. Rao}, + title = {Succinct indexable dictionaries with applications to encoding k-ary trees and multisets}, + booktitle = {Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms (SODA'02)}, + year = {2002}, + isbn = {0-89871-513-X}, + pages = {233--242}, + location = {San Francisco, California}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + } diff --git a/tex/bdz/bdz.tex b/tex/bdz/bdz.tex new file mode 100755 index 0000000..3af13ad --- /dev/null +++ b/tex/bdz/bdz.tex @@ -0,0 +1,70 @@ +\documentclass[12pt]{article} +\usepackage{graphicx} + +\usepackage{latexsym} +\usepackage{url} + +\usepackage{a4wide} +\usepackage{amsmath} +\usepackage{amssymb} +\usepackage{amsfonts} +\usepackage{graphicx} +\usepackage{listings} +\usepackage{fancyhdr} +\usepackage{graphics} +\usepackage{multicol} +\usepackage{epsfig} +\usepackage{textcomp} +\usepackage{url} + +% \usepackage{subfigure} +% \usepackage{subfig} +% \usepackage{wrapfig} + + +\bibliographystyle{plain} +% \bibliographystyle{sbc} +% \bibliographystyle{abnt-alf} +% \bibliographystyle{abnt-num} + +\begin{document} + +\sloppy + +% \renewcommand{\baselinestretch}{1.24}\normalsize % set the space between lines to 1.24 + +% set headings +% \pagestyle{fancy} +% \lhead[\fancyplain{}{\footnotesize\thepage}] +% {\fancyplain{}{\footnotesize\rightmark}} +% \rhead[\fancyplain{}{\footnotesize\leftmark}] +% {\fancyplain{}{\footnotesize\thepage}} +% +% \cfoot{} + +\lstset{ + language=C, + basicstyle=\fontsize{8}{8}\selectfont, + captionpos=t, + aboveskip=0mm, + belowskip=0mm, + abovecaptionskip=0.5mm, + belowcaptionskip=0.5mm, +% numbers = left, + 
mathescape=true, + escapechar=@, + extendedchars=true, + showstringspaces=false, +% columns=fixed, + basewidth=0.515em, + frame=single, + framesep=1mm, + xleftmargin=1mm, + xrightmargin=1mm, + framerule=0pt +} + +\include{introduction} % Introducao +\bibliography{bdz} + +\end{document} diff --git a/tex/bdz/figs/overviewinternal3g.eps b/tex/bdz/figs/overviewinternal3g.eps new file mode 100644 index 0000000..f646da8 --- /dev/null +++ b/tex/bdz/figs/overviewinternal3g.eps @@ -0,0 +1,783 @@ +%!PS-Adobe-2.0 EPSF-2.0 +%%Title: overviewinternal3g.fig +%%Creator: fig2dev Version 3.2 Patchlevel 5 +%%CreationDate: Fri May 29 11:09:04 2009 +%%For: fbotelho@fbotelho-laptop (Fabiano C. Botelho,,,) +%%BoundingBox: 0 0 342 128 +%Magnification: 1.0000 +%%EndComments +%%BeginProlog +/MyAppDict 100 dict dup begin def +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 
{0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end + +% This junk string is used by the show operators +/PATsstr 1 string def +/PATawidthshow { % cx cy cchar rx ry string + % Loop over each character in the string + { % cx cy cchar rx ry char + % Show the character + dup % cx cy cchar rx ry char char + PATsstr dup 0 4 -1 roll put % cx cy cchar rx ry char (char) + false charpath % cx cy cchar rx ry char + /clip load PATdraw + % Move past the character (charpath modified the + % current point) + currentpoint % cx cy cchar rx ry char x y + newpath + moveto % cx cy cchar rx ry char + % Reposition by cx,cy if the character in the string is cchar + 3 index eq { % cx cy cchar rx ry + 4 index 4 index rmoveto + } if + % Reposition all characters by rx ry + 2 copy rmoveto % cx cy cchar rx ry + } forall + pop pop pop pop pop % - + currentpoint + newpath + moveto +} bind def +/PATcg { + 7 dict dup begin + /lw currentlinewidth def + /lc currentlinecap def + /lj currentlinejoin def + /ml currentmiterlimit def + /ds [ currentdash ] def + /cc [ currentrgbcolor ] def + /cm matrix currentmatrix def + end +} bind def +% PATdraw - calculates the boundaries of the object and +% fills it with the current pattern +/PATdraw { % proc + save exch + PATpcalc % proc nw nh px py + 5 -1 roll exec % nw nh px py + newpath + PATfill % - + restore +} bind def +% PATfill - performs the tiling for the shape +/PATfill { % nw nh px py PATfill - + PATDict /CurrentPattern get dup begin + setfont + % Set the coordinate system to Pattern Space + PatternGState PATsg + % Set the color for uncolored pattezns + PaintType 2 eq { PATDict /PColor get PATsc } if + % Create the string for showing + 3 index string % nw nh px py str + % Loop for each of the pattern sources + 0 1 Multi 1 sub { % nw nh px py str source + % 
Move to the starting location + 3 index 3 index % nw nh px py str source px py + moveto % nw nh px py str source + % For multiple sources, set the appropriate color + Multi 1 ne { dup PC exch get PATsc } if + % Set the appropriate string for the source + 0 1 7 index 1 sub { 2 index exch 2 index put } for pop + % Loop over the number of vertical cells + 3 index % nw nh px py str nh + { % nw nh px py str + currentpoint % nw nh px py str cx cy + 2 index oldshow % nw nh px py str cx cy + YStep add moveto % nw nh px py str + } repeat % nw nh px py str + } for + 5 { pop } repeat + end +} bind def + +% PATkshow - kshow with the current pattezn +/PATkshow { % proc string + exch bind % string proc + 1 index 0 get % string proc char + % Loop over all but the last character in the string + 0 1 4 index length 2 sub { + % string proc char idx + % Find the n+1th character in the string + 3 index exch 1 add get % string proc char char+1 + exch 2 copy % strinq proc char+1 char char+1 char + % Now show the nth character + PATsstr dup 0 4 -1 roll put % string proc chr+1 chr chr+1 (chr) + false charpath % string proc char+1 char char+1 + /clip load PATdraw + % Move past the character (charpath modified the current point) + currentpoint newpath moveto + % Execute the user proc (should consume char and char+1) + mark 3 1 roll % string proc char+1 mark char char+1 + 4 index exec % string proc char+1 mark... 
+ cleartomark % string proc char+1 + } for + % Now display the last character + PATsstr dup 0 4 -1 roll put % string proc (char+1) + false charpath % string proc + /clip load PATdraw + neewath + pop pop % - +} bind def +% PATmp - the makepattern equivalent +/PATmp { % patdict patmtx PATmp patinstance + exch dup length 7 add % We will add 6 new entries plus 1 FID + dict copy % Create a new dictionary + begin + % Matrix to install when painting the pattern + TilingType PATtcalc + /PatternGState PATcg def + PatternGState /cm 3 -1 roll put + % Check for multi pattern sources (Level 1 fast color patterns) + currentdict /Multi known not { /Multi 1 def } if + % Font dictionary definitions + /FontType 3 def + % Create a dummy encoding vector + /Encoding 256 array def + 3 string 0 1 255 { + Encoding exch dup 3 index cvs cvn put } for pop + /FontMatrix matrix def + /FontBBox BBox def + /BuildChar { + mark 3 1 roll % mark dict char + exch begin + Multi 1 ne {PaintData exch get}{pop} ifelse % mark [paintdata] + PaintType 2 eq Multi 1 ne or + { XStep 0 FontBBox aload pop setcachedevice } + { XStep 0 setcharwidth } ifelse + currentdict % mark [paintdata] dict + /PaintProc load % mark [paintdata] dict paintproc + end + gsave + false PATredef exec true PATredef + grestore + cleartomark % - + } bind def + currentdict + end % newdict + /foo exch % /foo newlict + definefont % newfont +} bind def +% PATpcalc - calculates the starting point and width/height +% of the tile fill for the shape +/PATpcalc { % - PATpcalc nw nh px py + PATDict /CurrentPattern get begin + gsave + % Set up the coordinate system to Pattern Space + % and lock down pattern + PatternGState /cm get setmatrix + BBox aload pop pop pop translate + % Determine the bounding box of the shape + pathbbox % llx lly urx ury + grestore + % Determine (nw, nh) the # of cells to paint width and height + PatHeight div ceiling % llx lly urx qh + 4 1 roll % qh llx lly urx + PatWidth div ceiling % qh llx lly qw + 4 1 roll % qw qh 
llx lly + PatHeight div floor % qw qh llx ph + 4 1 roll % ph qw qh llx + PatWidth div floor % ph qw qh pw + 4 1 roll % pw ph qw qh + 2 index sub cvi abs % pw ph qs qh-ph + exch 3 index sub cvi abs exch % pw ph nw=qw-pw nh=qh-ph + % Determine the starting point of the pattern fill + %(px, py) + 4 2 roll % nw nh pw ph + PatHeight mul % nw nh pw py + exch % nw nh py pw + PatWidth mul exch % nw nh px py + end +} bind def + +% Save the original routines so that we can use them later on +/oldfill /fill load def +/oldeofill /eofill load def +/oldstroke /stroke load def +/oldshow /show load def +/oldashow /ashow load def +/oldwidthshow /widthshow load def +/oldawidthshow /awidthshow load def +/oldkshow /kshow load def + +% These defs are necessary so that subsequent procs don't bind in +% the originals +/fill { oldfill } bind def +/eofill { oldeofill } bind def +/stroke { oldstroke } bind def +/show { oldshow } bind def +/ashow { oldashow } bind def +/widthshow { oldwidthshow } bind def +/awidthshow { oldawidthshow } bind def +/kshow { oldkshow } bind def +/PATredef { + MyAppDict begin + { + /fill { /clip load PATdraw newpath } bind def + /eofill { /eoclip load PATdraw newpath } bind def + /stroke { PATstroke } bind def + /show { 0 0 null 0 0 6 -1 roll PATawidthshow } bind def + /ashow { 0 0 null 6 3 roll PATawidthshow } + bind def + /widthshow { 0 0 3 -1 roll PATawidthshow } + bind def + /awidthshow { PATawidthshow } bind def + /kshow { PATkshow } bind def + } { + /fill { oldfill } bind def + /eofill { oldeofill } bind def + /stroke { oldstroke } bind def + /show { oldshow } bind def + /ashow { oldashow } bind def + /widthshow { oldwidthshow } bind def + /awidthshow { oldawidthshow } bind def + /kshow { oldkshow } bind def + } ifelse + end +} bind def +false PATredef +% Conditionally define setcmykcolor if not available +/setcmykcolor where { pop } { + /setcmykcolor { + 1 sub 4 1 roll + 3 { + 3 index add neg dup 0 lt { pop 0 } if 3 1 roll + } repeat + setrgbcolor - pop + 
} bind def +} ifelse +/PATsc { % colorarray + aload length % c1 ... cn length + dup 1 eq { pop setgray } { 3 eq { setrgbcolor } { setcmykcolor + } ifelse } ifelse +} bind def +/PATsg { % dict + begin + lw setlinewidth + lc setlinecap + lj setlinejoin + ml setmiterlimit + ds aload pop setdash + cc aload pop setrgbcolor + cm setmatrix + end +} bind def + +/PATDict 3 dict def +/PATsp { + true PATredef + PATDict begin + /CurrentPattern exch def + % If it's an uncolored pattern, save the color + CurrentPattern /PaintType get 2 eq { + /PColor exch def + } if + /CColor [ currentrgbcolor ] def + end +} bind def +% PATstroke - stroke with the current pattern +/PATstroke { + countdictstack + save + mark + { + currentpoint strokepath moveto + PATpcalc % proc nw nh px py + clip newpath PATfill + } stopped { + (*** PATstroke Warning: Path is too complex, stroking + with gray) = + cleartomark + restore + countdictstack exch sub dup 0 gt + { { end } repeat } { pop } ifelse + gsave 0.5 setgray oldstroke grestore + } { pop restore pop } ifelse + newpath +} bind def +/PATtcalc { % modmtx tilingtype PATtcalc tilematrix + % Note: tiling types 2 and 3 are not supported + gsave + exch concat % tilingtype + matrix currentmatrix exch % cmtx tilingtype + % Tiling type 1 and 3: constant spacing + 2 ne { + % Distort the pattern so that it occupies + % an integral number of device pixels + dup 4 get exch dup 5 get exch % tx ty cmtx + XStep 0 dtransform + round exch round exch % tx ty cmtx dx.x dx.y + XStep div exch XStep div exch % tx ty cmtx a b + 0 YStep dtransform + round exch round exch % tx ty cmtx a b dy.x dy.y + YStep div exch YStep div exch % tx ty cmtx a b c d + 7 -3 roll astore % { a b c d tx ty } + } if + grestore +} bind def +/PATusp { + false PATredef + PATDict begin + CColor PATsc + end +} bind def + +% crosshatch30 +11 dict begin +/PaintType 1 def +/PatternType 1 def +/TilingType 1 def +/BBox [0 0 1 1] def +/XStep 1 def +/YStep 1 def +/PatWidth 1 def +/PatHeight 1 def +/Multi 2 
def +/PaintData [ + { clippath } bind + { 32 16 true [ 32 0 0 -16 0 16 ] + {<033003300c0c0c0c30033003c000c000300330030c0c0c0c + 0330033000c000c0033003300c0c0c0c30033003c000c000 + 300330030c0c0c0c0330033000c000c0>} + imagemask } bind +] def +/PaintProc { + pop + exec fill +} def +currentdict +end +/P3 exch def + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def + /DrawEllipse { + /endangle exch def + /startangle exch def + /yrad exch def + /xrad exch def + /y exch def + /x exch def + /savematrix mtrx currentmatrix def + x y tr xrad yrad sc 0 0 1 startangle endangle arc + closepath + savematrix setmatrix + } def + +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +/pageheader { +save +newpath 0 128 moveto 0 0 lineto 342 0 lineto 342 128 lineto closepath clip newpath +-40.3 230.6 translate +1 -1 scale +$F2psBegin +10 setmiterlimit +0 slj 0 slc + 0.06299 0.06299 sc +} bind def +/pagefooter { +$F2psEnd +restore +} bind def +%%EndProlog +pageheader +% +% Fig objects follow +% +% +% here starts figure with depth 53 +% Polyline +0 slj +0 slc +7.500 slw +n 757 1980 m 652 1980 652 2640 105 arcto 4 {pop} repeat + 
652 2745 1155 2745 105 arcto 4 {pop} repeat + 1260 2745 1260 2085 105 arcto 4 {pop} repeat + 1260 1980 757 1980 105 arcto 4 {pop} repeat + cp gs col0 s gr +% here ends figure; +% +% here starts figure with depth 51 +% Polyline +0 slj +0 slc +7.500 slw +gs clippath +5215 2261 m 5264 2278 l 5278 2235 l 5229 2219 l 5229 2219 l 5251 2250 l 5215 2261 l cp +eoclip +n 4399 1969 m + 5257 2252 l gs col7 1.00 shd ef gr gs col0 s gr gr + +% arrowhead +n 5215 2261 m 5251 2250 l 5229 2219 l 5215 2261 l cp gs 0.00 setgray ef gr col0 s +% Polyline +gs clippath +5223 2432 m 5272 2449 l 5286 2406 l 5237 2390 l 5237 2390 l 5259 2421 l 5223 2432 l cp +eoclip +n 4407 2140 m + 5265 2423 l gs col7 1.00 shd ef gr gs col0 s gr gr + +% arrowhead +n 5223 2432 m 5259 2421 l 5237 2390 l 5223 2432 l cp gs 0.00 setgray ef gr col0 s +% Polyline +gs clippath +5216 2650 m 5267 2647 l 5264 2602 l 5213 2605 l 5213 2605 l 5245 2626 l 5216 2650 l cp +eoclip +n 4398 2687 m + 5251 2626 l gs col7 1.00 shd ef gr gs col0 s gr gr + +% arrowhead +n 5216 2650 m 5245 2626 l 5213 2605 l 5216 2650 l cp gs 0.00 setgray ef gr col0 s +% Polyline +n 5362 2523 m 5752 2523 l 5752 2696 l 5362 2696 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +n 5362 2165 m 5752 2165 l 5752 2338 l 5362 2338 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +0.000 slw +n 720 2070 m 900 2070 l 900 2160 l 720 2160 l + cp gs /PC [[1.00 1.00 1.00] [0.00 0.00 0.00]] def +15.00 15.00 sc P3 [16 0 0 -8 48.00 138.00] PATmp PATsp ef gr PATusp +% Polyline +n 720 2565 m 900 2565 l 900 2655 l 720 2655 l + cp gs col7 0.00 shd ef gr +% Polyline +7.500 slw +n 4245 2415 m 4425 2415 l 4425 2595 l 4245 2595 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +n 4245 2235 m 4425 2235 l 4425 2415 l 4245 2415 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +n 5362 2343 m 5752 2343 l 5752 2516 l 5362 2516 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +n 2835 3150 m 3330 3150 l 3330 3465 l 2835 3465 l + cp gs col7 1.00 shd ef 
gr gs col0 s gr +% Polyline +0.000 slw +n 2880 3330 m 3015 3330 l 3015 3420 l 2880 3420 l + cp gs col7 0.00 shd ef gr +% Polyline +7.500 slw +n 2340 3150 m 2835 3150 l 2835 3465 l 2340 3465 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +n 1845 3150 m 2340 3150 l 2340 3465 l 1845 3465 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +0.000 slw +n 2385 3330 m 2520 3330 l 2520 3420 l 2385 3420 l + cp gs /PC [[1.00 1.00 1.00] [0.00 0.00 0.00]] def +15.00 15.00 sc P3 [16 0 0 -8 159.00 222.00] PATmp PATsp ef gr PATusp +% Polyline +n 2602 3017 m 2605 2425 l 2792 2423 l 2788 3044 l 2588 3030 l + cp gs /PC [[1.00 1.00 1.00] [0.00 0.00 0.00]] def +15.00 15.00 sc P3 [16 0 0 -8 172.53 161.53] PATmp PATsp ef gr PATusp +% Polyline +n 2609 2477 m 2612 1885 l 2799 1883 l 2795 2504 l 2595 2490 l + cp gs /PC [[1.00 1.00 1.00] [0.00 0.00 0.00]] def +15.00 15.00 sc P3 [16 0 0 -8 173.00 125.53] PATmp PATsp ef gr PATusp +% Polyline +7.500 slw +n 4245 1890 m 4425 1890 l 4425 2070 l 4245 2070 l + cp gs col7 0.85 shd ef gr gs col0 s gr +% Polyline +n 4245 2063 m 4425 2063 l 4425 2243 l 4245 2243 l + cp gs col7 0.85 shd ef gr gs col0 s gr +% Polyline +n 4245 2595 m 4425 2595 l 4425 2775 l 4245 2775 l + cp gs col7 0.85 shd ef gr gs col0 s gr +% Polyline +n 4247 2748 m 4427 2748 l 4427 2928 l 4247 2928 l + cp gs col7 1.00 shd ef gr gs col0 s gr +% Polyline +0.000 slw +n 2657 3060 m 2111 2491 l 2244 2360 l 2786 2937 l + cp gs col7 0.00 shd ef gr +% Polyline +n 2111 2402 m 2660 1838 l 2797 1966 l 2242 2527 l + cp gs col7 0.55 shd ef gr +% Polyline +n 2115 3017 m 2118 2425 l 2305 2423 l 2301 3044 l 2101 3030 l + cp gs col7 0.55 shd ef gr +% Polyline +n 1890 3330 m 2025 3330 l 2025 3420 l 1890 3420 l + cp gs col7 0.55 shd ef gr +% Polyline +n 720 2340 m 900 2340 l 900 2430 l 720 2430 l + cp gs col7 0.55 shd ef gr +% Polyline +n 2113 2439 m 2116 1847 l 2303 1845 l 2299 2466 l 2099 2452 l + cp gs col7 0.00 shd ef gr +/Times-Italic ff 142.88 scf sf +2835 2474 m +gs 1 -1 sc (h \(x\)) 
col0 sh gr +/Times-Roman ff 111.13 scf sf +2916 2520 m +gs 1 -1 sc (1) col0 sh gr +/Times-Italic ff 142.88 scf sf +2835 3030 m +gs 1 -1 sc (h \(x\)) col0 sh gr +/Times-Roman ff 111.13 scf sf +2916 3076 m +gs 1 -1 sc (2) col0 sh gr +/Times-Italic ff 142.88 scf sf +2835 1950 m +gs 1 -1 sc (h \(x\)) col0 sh gr +/Times-Roman ff 111.13 scf sf +2916 1996 m +gs 1 -1 sc (0) col0 sh gr +/Times-Roman ff 142.88 scf sf +4095 2025 m +gs 1 -1 sc (0) col0 sh gr +/Times-Roman ff 142.88 scf sf +4095 2205 m +gs 1 -1 sc (1) col0 sh gr +/Times-Roman ff 142.88 scf sf +4095 2385 m +gs 1 -1 sc (2) col0 sh gr +/Times-Roman ff 142.88 scf sf +4095 2565 m +gs 1 -1 sc (3) col0 sh gr +/Times-Roman ff 142.88 scf sf +4095 2745 m +gs 1 -1 sc (4) col0 sh gr +/Times-Roman ff 142.88 scf sf +4095 2925 m +gs 1 -1 sc (5) col0 sh gr +/Times-Italic ff 142.88 scf sf +4320 1800 m +gs 1 -1 sc (g) col0 sh gr +/Times-Roman ff 142.88 scf sf +5220 2115 m +gs 1 -1 sc (Hash Table ) col0 sh gr +/Times-Roman ff 142.88 scf sf +5265 2475 m +gs 1 -1 sc (1) col0 sh gr +/Times-Roman ff 142.88 scf sf +5265 2655 m +gs 1 -1 sc (2) col0 sh gr +/Times-Roman ff 142.88 scf sf +5265 2295 m +gs 1 -1 sc (0) col0 sh gr +/Times-Roman ff 158.75 scf sf +1575 1755 m +gs 1 -1 sc (\(a\)) col0 sh gr +/Times-Roman ff 158.75 scf sf +3465 1755 m +gs 1 -1 sc (\(b\)) col0 sh gr +/Times-Roman ff 158.75 scf sf +4680 1755 m +gs 1 -1 sc (\(c\)) col0 sh gr +/Times-Roman ff 142.88 scf sf +3015 3645 m +gs 1 -1 sc (2) col0 sh gr +/Times-Roman ff 142.88 scf sf +2565 3645 m +gs 1 -1 sc (1) col0 sh gr +/Times-Roman ff 142.88 scf sf +2070 3645 m +gs 1 -1 sc (0) col0 sh gr +/ZapfChancery-MediumItalic ff 174.63 scf sf +3420 3375 m +gs 1 -1 sc (L) col0 sh gr +/Times-Roman ff 142.88 scf sf +2865 3277 m +gs 1 -1 sc ({0,2,5}) col0 sh gr +/Times-Roman ff 142.88 scf sf +2370 3277 m +gs 1 -1 sc ({1,3,5}) col0 sh gr +/Times-Roman ff 142.88 scf sf +1895 3277 m +gs 1 -1 sc ({1,2,4}) col0 sh gr +% here ends figure; +% +% here starts figure with depth 45 +% Polyline 
+0 slj +0 slc +7.500 slw +gs clippath +1944 2497 m 1995 2497 l 1995 2452 l 1944 2452 l 1944 2452 l 1974 2475 l 1944 2497 l cp +eoclip +n 1357 2475 m + 1980 2475 l gs col7 1.00 shd ef gr gs col0 s gr gr + +% arrowhead +n 1944 2497 m 1974 2475 l 1944 2452 l 1944 2497 l cp gs 0.00 setgray ef gr col0 s +% Polyline +gs clippath +3879 2497 m 3930 2497 l 3930 2452 l 3879 2452 l 3879 2452 l 3909 2475 l 3879 2497 l cp +eoclip +n 3292 2475 m + 3915 2475 l gs col7 1.00 shd ef gr gs col0 s gr gr + +% arrowhead +n 3879 2497 m 3909 2475 l 3879 2452 l 3879 2497 l cp gs 0.00 setgray ef gr col0 s +% Ellipse +n 2704 2448 101 101 0 360 DrawEllipse gs col7 1.00 shd ef gr gs col0 s gr + +% Ellipse +n 2209 2448 101 101 0 360 DrawEllipse gs col7 1.00 shd ef gr gs col0 s gr + +% Ellipse +n 2704 2988 101 101 0 360 DrawEllipse gs col7 1.00 shd ef gr gs col0 s gr + +% Ellipse +n 2209 1908 101 101 0 360 DrawEllipse gs col7 1.00 shd ef gr gs col0 s gr + +% Ellipse +n 2704 1908 101 101 0 360 DrawEllipse gs col7 1.00 shd ef gr gs col0 s gr + +% Ellipse +n 2209 2988 101 101 0 360 DrawEllipse gs col7 1.00 shd ef gr gs col0 s gr + +/Times-Roman ff 142.88 scf sf +5423 2663 m +gs 1 -1 sc (band) col0 sh gr +/Times-Roman ff 142.88 scf sf +5460 2304 m +gs 1 -1 sc (the) col0 sh gr +/Times-Roman ff 142.88 scf sf +1418 2430 m +gs 1 -1 sc (Mapping) col0 sh gr +/Times-Roman ff 142.88 scf sf +3285 2430 m +gs 1 -1 sc (Assigning) col0 sh gr +/Times-Roman ff 142.88 scf sf +2674 2485 m +gs 1 -1 sc (3) col0 sh gr +/Times-Roman ff 142.88 scf sf +2179 2485 m +gs 1 -1 sc (2) col0 sh gr +/Times-Italic ff 142.88 scf sf +945 1935 m +gs 1 -1 sc (S) col0 sh gr +/Times-Roman ff 142.88 scf sf +967 2160 m +gs 1 -1 sc (who) col0 sh gr +/Times-Roman ff 142.88 scf sf +960 2430 m +gs 1 -1 sc (band) col0 sh gr +/Times-Roman ff 142.88 scf sf +1005 2655 m +gs 1 -1 sc (the) col0 sh gr +/Times-Roman ff 142.88 scf sf +4305 2378 m +gs 1 -1 sc (3) col0 sh gr +/Times-Roman ff 142.88 scf sf +5422 2482 m +gs 1 -1 sc (who) col0 sh gr 
+/Times-Roman ff 142.88 scf sf +4545 2430 m +gs 1 -1 sc (Ranking) col0 sh gr +/Times-Roman ff 142.88 scf sf +3060 3420 m +gs 1 -1 sc (the) col0 sh gr +/Times-Roman ff 142.88 scf sf +2539 3420 m +gs 1 -1 sc (who) col0 sh gr +/Times-Roman ff 142.88 scf sf +2045 3420 m +gs 1 -1 sc (band) col0 sh gr +/Times-Roman ff 142.88 scf sf +2179 1945 m +gs 1 -1 sc (0) col0 sh gr +/Times-Roman ff 142.88 scf sf +2674 1945 m +gs 1 -1 sc (1) col0 sh gr +/Times-Roman ff 142.88 scf sf +2179 3025 m +gs 1 -1 sc (4) col0 sh gr +/Times-Roman ff 142.88 scf sf +2674 3025 m +gs 1 -1 sc (5) col0 sh gr +/Times-Roman ff 142.88 scf sf +4300 2875 m +gs 1 -1 sc (3) col0 sh gr +/Times-Roman ff 142.88 scf sf +4305 2548 m +gs 1 -1 sc (3) col0 sh gr +/Times-Roman ff 142.88 scf sf +4305 2715 m +gs 1 -1 sc (2) col0 sh gr +/Times-Roman ff 142.88 scf sf +4299 2190 m +gs 1 -1 sc (0) col0 sh gr +/Times-Roman ff 142.88 scf sf +4299 2033 m +gs 1 -1 sc (0) col0 sh gr +% here ends figure; +pagefooter +showpage +%%Trailer +end +%EOF diff --git a/tex/bdz/figs/overviewinternal3g.fig b/tex/bdz/figs/overviewinternal3g.fig new file mode 100644 index 0000000..bfc0da1 --- /dev/null +++ b/tex/bdz/figs/overviewinternal3g.fig @@ -0,0 +1,156 @@ +#FIG 3.2 Produced by xfig version 3.2.5 +Landscape +Center +Metric +A4 +100.00 +Single +-2 +1200 2 +6 5355 2520 5760 2700 +6 5400 2520 5715 2700 +4 0 0 45 -1 0 9 0.0000 4 105 285 5423 2663 band\001 +-6 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 5362 2523 5752 2523 5752 2696 5362 2696 5362 2523 +-6 +6 5355 2162 5760 2342 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 5362 2165 5752 2165 5752 2338 5362 2338 5362 2165 +4 0 0 45 -1 0 9 0.0000 4 105 195 5460 2304 the\001 +-6 +6 1350 2340 1980 2520 +6 1350 2340 1980 2520 +2 1 0 1 0 7 45 -1 20 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 30.00 + 1357 2475 1980 2475 +4 0 0 45 -1 0 9 0.0000 4 135 555 1418 2430 Mapping\001 +-6 +-6 +6 3285 2340 3915 2520 +6 3285 2340 3915 2520 +2 1 0 1 0 7 45 -1 20 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 30.00 + 3292 2475 3915 
2475 +4 0 0 45 -1 0 9 0.0000 4 135 630 3285 2430 Assigning\001 +-6 +-6 +6 2603 2347 2805 2549 +1 3 0 1 0 7 45 -1 20 0.000 1 0.0000 2704 2448 101 101 2704 2448 2749 2538 +4 0 0 45 -1 0 9 0.0000 4 105 75 2674 2485 3\001 +-6 +6 2108 2347 2310 2549 +1 3 0 1 0 7 45 -1 20 0.000 1 0.0000 2209 2448 101 101 2209 2448 2254 2538 +4 0 0 45 -1 0 9 0.0000 4 105 75 2179 2485 2\001 +-6 +6 2835 2340 3150 2520 +4 0 0 50 -1 1 9 0.0000 4 135 300 2835 2474 h (x)\001 +4 0 0 50 -1 0 7 0.0000 4 75 60 2916 2520 1\001 +-6 +6 2835 2925 3150 3105 +4 0 0 50 -1 1 9 0.0000 4 135 300 2835 3030 h (x)\001 +4 0 0 50 -1 0 7 0.0000 4 75 60 2916 3076 2\001 +-6 +6 2835 1845 3135 1996 +4 0 0 50 -1 1 9 0.0000 4 135 300 2835 1950 h (x)\001 +4 0 0 50 -1 0 7 0.0000 4 75 60 2916 1996 0\001 +-6 +1 3 0 1 0 7 45 -1 20 0.000 1 0.0000 2704 2988 101 101 2704 2988 2749 3078 +1 3 0 1 0 7 45 -1 20 0.000 1 0.0000 2209 1908 101 101 2209 1908 2254 1998 +1 3 0 1 0 7 45 -1 20 0.000 1 0.0000 2704 1908 101 101 2704 1908 2749 1998 +1 3 0 1 0 7 45 -1 20 0.000 1 0.0000 2209 2988 101 101 2209 2988 2254 3078 +2 4 0 1 0 7 53 -1 -1 0.000 0 0 7 0 0 5 + 1260 2745 1260 1980 652 1980 652 2745 1260 2745 +2 2 0 0 0 7 50 -1 43 0.000 0 0 -1 0 0 5 + 720 2070 900 2070 900 2160 720 2160 720 2070 +2 2 0 0 0 7 50 -1 0 0.000 0 0 7 0 0 5 + 720 2565 900 2565 900 2655 720 2655 720 2565 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 4245 2415 4425 2415 4425 2595 4245 2595 4245 2415 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 4245 2235 4425 2235 4425 2415 4245 2415 4245 2235 +2 1 0 1 0 7 51 -1 20 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 30.00 + 4399 1969 5257 2252 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 5362 2343 5752 2343 5752 2516 5362 2516 5362 2343 +2 1 0 1 0 7 51 -1 20 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 30.00 + 4407 2140 5265 2423 +2 1 0 1 0 7 51 -1 20 0.000 0 0 7 1 0 2 + 1 1 1.00 45.00 30.00 + 4398 2687 5251 2626 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 2835 3150 3330 3150 3330 3465 2835 3465 2835 3150 +2 2 0 0 0 7 50 -1 0 0.000 0 0 7 0 0 5 + 2880 3330 
3015 3330 3015 3420 2880 3420 2880 3330 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 2340 3150 2835 3150 2835 3465 2340 3465 2340 3150 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 1845 3150 2340 3150 2340 3465 1845 3465 1845 3150 +2 2 0 0 0 7 50 -1 43 0.000 0 0 7 0 0 5 + 2385 3330 2520 3330 2520 3420 2385 3420 2385 3330 +2 3 0 0 0 7 50 -1 43 0.000 0 0 7 0 0 6 + 2602 3017 2605 2425 2792 2423 2788 3044 2588 3030 2602 3017 +2 3 0 0 0 7 50 -1 43 0.000 0 0 7 0 0 6 + 2609 2477 2612 1885 2799 1883 2795 2504 2595 2490 2609 2477 +2 2 0 1 0 7 50 -1 17 0.000 0 0 7 0 0 5 + 4245 1890 4425 1890 4425 2070 4245 2070 4245 1890 +2 2 0 1 0 7 50 -1 17 0.000 0 0 7 0 0 5 + 4245 2063 4425 2063 4425 2243 4245 2243 4245 2063 +2 2 0 1 0 7 50 -1 17 0.000 0 0 7 0 0 5 + 4245 2595 4425 2595 4425 2775 4245 2775 4245 2595 +2 2 0 1 0 7 50 -1 20 0.000 0 0 7 0 0 5 + 4247 2748 4427 2748 4427 2928 4247 2928 4247 2748 +2 3 0 0 0 7 50 -1 0 0.000 0 0 7 0 0 5 + 2657 3060 2111 2491 2244 2360 2786 2937 2657 3060 +2 3 0 0 0 7 50 -1 11 0.000 0 0 7 0 0 5 + 2111 2402 2660 1838 2797 1966 2242 2527 2111 2402 +2 3 0 0 0 7 50 -1 11 0.000 0 0 7 0 0 6 + 2115 3017 2118 2425 2305 2423 2301 3044 2101 3030 2115 3017 +2 2 0 0 0 7 50 -1 11 0.000 0 0 7 0 0 5 + 1890 3330 2025 3330 2025 3420 1890 3420 1890 3330 +2 2 0 0 0 7 50 -1 11 0.000 0 0 7 0 0 5 + 720 2340 900 2340 900 2430 720 2430 720 2340 +2 3 0 0 0 7 50 -1 0 0.000 0 0 7 0 0 6 + 2113 2439 2116 1847 2303 1845 2299 2466 2099 2452 2113 2439 +4 0 0 45 -1 1 9 0.0000 4 105 75 945 1935 S\001 +4 0 0 45 -1 0 9 0.0000 4 105 270 967 2160 who\001 +4 0 0 45 -1 0 9 0.0000 4 105 285 960 2430 band\001 +4 0 0 45 -1 0 9 0.0000 4 105 195 1005 2655 the\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 4095 2025 0\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 4095 2205 1\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 4095 2385 2\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 4095 2565 3\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 4095 2745 4\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 4095 2925 5\001 +4 0 0 50 -1 1 9 0.0000 4 120 60 4320 1800 g\001 
+4 0 0 45 -1 0 9 0.0000 4 105 75 4305 2378 3\001 +4 0 0 50 -1 0 9 0.0000 4 105 810 5220 2115 Hash Table \001 +4 0 0 45 -1 0 9 0.0000 4 105 270 5422 2482 who\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 5265 2475 1\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 5265 2655 2\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 5265 2295 0\001 +4 0 0 50 -1 0 10 0.0000 4 135 180 1575 1755 (a)\001 +4 0 0 50 -1 0 10 0.0000 4 135 195 3465 1755 (b)\001 +4 0 0 50 -1 0 10 0.0000 4 135 180 4680 1755 (c)\001 +4 0 0 45 -1 0 9 0.0000 4 135 510 4545 2430 Ranking\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 3015 3645 2\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 2565 3645 1\001 +4 0 0 50 -1 0 9 0.0000 4 105 75 2070 3645 0\001 +4 0 0 50 -1 33 11 0.0000 4 135 90 3420 3375 L\001 +4 0 0 50 -1 0 9 0.0000 4 135 435 2865 3277 {0,2,5}\001 +4 0 0 45 -1 0 9 0.0000 4 105 195 3060 3420 the\001 +4 0 0 50 -1 0 9 0.0000 4 135 435 2370 3277 {1,3,5}\001 +4 0 0 45 -1 0 9 0.0000 4 105 270 2539 3420 who\001 +4 0 0 45 -1 0 9 0.0000 4 105 285 2045 3420 band\001 +4 0 0 50 -1 0 9 0.0000 4 135 435 1895 3277 {1,2,4}\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 2179 1945 0\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 2674 1945 1\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 2179 3025 4\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 2674 3025 5\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 4300 2875 3\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 4305 2548 3\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 4305 2715 2\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 4299 2190 0\001 +4 0 0 45 -1 0 9 0.0000 4 105 75 4299 2033 0\001 diff --git a/tex/bdz/introduction.tex b/tex/bdz/introduction.tex new file mode 100755 index 0000000..97630ef --- /dev/null +++ b/tex/bdz/introduction.tex @@ -0,0 +1,371 @@ +\section{Introduction} \label{sec:introduction} + +The BDZ algorithm was designed by Fabiano C. Botelho, Djamal Belazzougui, Rasmus Pagh and Nivio Ziviani. +It is a simple, efficient, near-optimal space and practical +algorithm to generate a family $\cal F$ of PHFs and MPHFs. 
+It is also referred to as the BPZ algorithm because of the work presented +by Botelho, Pagh and Ziviani in \cite{bpz07}. +In Botelho's PhD dissertation \cite{b08} it is also referred to as the RAM algorithm +because it is more suitable for key sets that can be handled in internal memory. + +The BDZ algorithm uses $r$-uniform random hypergraphs +given by function values of $r$ +uniform random hash functions on the input key set $S$ for generating PHFs and MPHFs that +require $O(n)$ bits to be stored. +A hypergraph is a generalization of a standard undirected +graph where each edge connects $r\geq 2$ vertices. +This idea is not new, see e.g. \cite{mwhc96}, +but we have proceeded differently to achieve +a space usage of $O(n)$ bits rather than $O(n\log n)$ bits. +Evaluation time for all schemes considered is constant. +For $r=3$ we obtain a space usage of approximately $2.6n$ bits for +an MPHF. More compact, and even simpler, representations can be +achieved for larger $m$. For example, for $m=1.23n$ we can get a +space usage of $1.95n$ bits. + +Our best MPHF space upper bound is within a +factor of 2 from the information theoretical lower bound of approximately +$1.4427n$ bits. We have shown that the BDZ algorithm is far more +practical than previous methods with proven space complexity, both +because of its simplicity, and because the constant factor of the +space complexity is more than 6 times lower than its closest +competitor, for plausible problem sizes. We verify the practicality +experimentally, using slightly more space than in the mentioned +theoretical bounds. + +\section{The Algorithm} + +The BDZ algorithm is a three-step algorithm that generates PHFs and MPHFs based on +random $r$-partite hypergraphs. +This is an approach that provides a much tighter analysis and is +much simpler than the one presented in +\cite{ckrt04}, where it was implicit how to construct +similar PHFs. +The fastest and most compact functions +are generated when $r=3$. 
+In this case a PHF can be stored in +approximately $1.95$ bits per key and +an MPHF in approximately +$2.62$ bits per key. + +Figure~\ref{fig:overview} gives an overview of the algorithm for $r=3$, +taking as input a key set $S \subseteq U$ containing three English words, i.e., $S=\{\mathrm{who},\mathrm{band},\mathrm{the}\}$. +% which are nicely hashed to the name of a rock band ``the who band''. +The edge-oriented data structure proposed in~\cite{e87} is used +to represent hypergraphs, where each edge is explicitly represented +as an array of $r$ vertices and, for each vertex $v$, +there is a list of edges that are incident on $v$. + +The {\em Mapping Step} in Figure~\ref{fig:overview}(a) carries out two +important tasks: +\begin{enumerate} +\item +It assumes that it is possible to find three uniform +hash functions, $h_0$, $h_1$ and $h_2$, with ranges $\{0,1\}$, $\{2,3\}$ and $\{4,5\}$, respectively. +These functions build a one-to-one mapping of the key set $S$ to the edge set $E$ +of a random acyclic +$3$-partite hypergraph $G=(V,E)$, where $|V|=m=6$ and $|E|=n=3$. +In \cite{b08,bpz07} it is shown that +it is possible to obtain such a hypergraph with probability tending to $1$ as $n$ +tends to infinity +whenever $m=cn$ and $c \ge 1.23$. The value of $c$ that minimizes the hypergraph size +(and thereby the number of bits needed to represent the resulting functions) is $c \approx 1.23$. +To illustrate the mapping, +key ``who'' is mapped to edge $\{h_0(\text{``who''}),h_1(\text{``who''}),h_2(\text{``who''})\}=\{1,3,5\}$, +key ``band'' is mapped to edge $\{h_0(\text{``band''}),h_1(\text{``band''}),h_2(\text{``band''})\}=\{1,2,4\}$, and +key ``the'' is mapped to edge $\{h_0(\text{``the''}),h_1(\text{``the''}),h_2(\text{``the''})\}=\{0,2,5\}$. +\item +It tests whether the resulting random $3$-partite hypergraph contains cycles +by iteratively deleting edges connecting vertices of degree 1. 
+The deleted edges are stored in the order of deletion in a list $\cal L$ +to be used in the assigning step. +The first deleted edge in Figure~\ref{fig:overview}(a) +was $\{1,2,4\}$, the second one was $\{1,3,5\}$ and +the third one was $\{0,2,5\}$. +% the last one was $\{0,2,5\}$. +If it ends with an empty graph, then the test succeeds, +otherwise it fails. +\end{enumerate} + + +\begin{figure} +\begin{center} +\scalebox{0.9}{\epsfig{file=figs/overviewinternal3g.eps}} +\end{center} +\caption{(a) The mapping step generates a random acyclic $3$-partite hypergraph with $m=6$ vertices and $n=3$ edges +and a list $\cal L$ of edges obtained when we test whether the hypergraph is acyclic. +(b) The assigning step builds an array $g:[0,5] \to [0,3]$ to uniquely +assign an edge to a vertex. (c) The ranking step builds the data structure used to +compute function $\mathit{rank}: [0,5] \to [0,2]$ in $O(1)$ time.~~~~} +\label{fig:overview} +\end{figure} + + + +We now show how to use the Jenkins hash functions \cite{j97} +to implement the three hash functions $h_i: S \to V_i$, $0\le i \le 2$, which are used to build a random $3$-partite hypergraph +$G=(V,E)$, +where $V= V_0 \cup V_1 \cup V_2$ and $|V_i| = \eta = \lceil \frac{m}{3} \rceil$. +Let $h':S \to \{0,1\}^\gamma$ be a Jenkins hash function +for $\gamma = 3 \times w$, where +$w = 32 \text{ or } 64$ for +32-bit and 64-bit architectures, respectively. +Let $H'$ be an array of 3 $w$-bit values. 
+The Jenkins hash function +allows us to compute in parallel the three entries in $H'$ +and thereby the three hash functions $h_i$, as follows: +% Thus we can compute the three hash functions $h_i$ +% as follows: +\begin{eqnarray} + H' &=& h'(x) \nonumber \\ + h_0(x) &=& H'[0] \bmod \eta \nonumber \\ + h_1(x) &=& H'[1] \bmod \eta + \eta \nonumber \\ + h_2(x) &=& H'[2] \bmod \eta + 2\eta +\end{eqnarray} + +The {\em Assigning Step} in Figure~\ref{fig:overview}(b) outputs +a PHF that maps the key set $S$ into the range $[0,m-1]$ and is represented by +an array $g$ storing values from the range $[0,3]$. +The array $g$ makes it possible to select one out of the $3$ +vertices of a given edge, which is associated with a +key $k$. +A vertex for a key $k$ is given +by either $h_0(k)$, $h_1(k)$ or $h_2(k)$. +The function $h_i(k)$ +to be used for $k$ is chosen by calculating $i = (g[h_0(k)] + g[h_1(k)] + g[h_2(k)]) \bmod 3$. +For instance, +the values 1 and 4 represent the keys ``who'' and ``band'' +because $i = (g[1] + g[3] + g[5]) \bmod 3 = 0$ and $h_0(\text{``who''}) = 1$, +and $i = (g[1] + g[2] + g[4]) \bmod 3 = 2$ and $h_2(\text{``band''}) = 4$, respectively. +% Likewise, the value 4 represents the key +% because $(g[1] + g[2] + g[4]) \bmod 3 = 2$ and $h_2(\text{``band''}) = 4$, and so on. +The assigning step first initializes $g[i]=3$ +to mark every vertex as unassigned +% (i.e., each vertex is unassigned) +and +$\mathit{Visited}[i]=\mathit{false}$, $0\leq i \leq m-1$. +Let $\mathit{Visited}$ be a boolean vector of size $m$ +to indicate whether a vertex has been visited. +Then, for each edge $e \in \cal L$ from tail to head, +it looks for the first +vertex $u$ belonging to $e$ not yet visited. +This is a sufficient condition for success \cite{b08,bpz07,mwhc96}. +Let $j$, $0 \leq j \leq 2$, be the index of $u$ in $e$. +Then, it assigns $g[u]=(j-\sum_{v \in e \wedge \mathit{Visited}[v] = true} g[v]) \bmod 3$. 
+Whenever it passes through a vertex $u$ from $e$, +if $u$ has not yet been visited, +it sets $\mathit{Visited}[u] = true$. +% The value $g[i]=3$ is used to represent unassigned vertices. + +If we stop the BDZ algorithm in the assigning step +we obtain a PHF with range $[0,m-1]$. +The PHF has the following form: +$phf(x) = h_{i(x)}(x)$, where $x\in S$ and $i(x) = (g[h_0(x)] + g[h_1(x)] + g[h_2(x)]) \bmod 3$. +In this case we do not need information for ranking and +can set $g[i] = 0$ whenever $g[i]$ is equal to 3, where $0 \le i \le m-1$. +Therefore, the range of the values stored in $g$ is narrowed +from $[0,3]$ to $[0,2]$. By applying arithmetic coding to blocks of +values (see \cite{b08,bpz07} for details), +or any compression technique that allows +random access in constant time to an array of compressed values \cite{fn07,gn06,sg06}, +we can store the resulting PHFs in $m\log 3 = c n\log 3$ bits, +where $c \ge 1.23$. For $c = 1.23$, the space requirement is $1.95n$ bits. + + +The {\em Ranking Step} in Figure~\ref{fig:overview}(c) +outputs a data structure +that makes it possible to narrow the range of a PHF generated in the +assigning step from $[0,m-1]$ to $[0,n-1]$ and thereby +an MPHF is produced. +The data structure makes it possible to compute in constant time +a function $\mathit{rank}\!\!:[0,m-1]\to [0,n-1]$ +that counts the number of assigned positions +before a given position $v$ in $g$. +For instance, $\mathit{rank}(4) = 2$ because +the positions $0$ and $1$ are assigned +since $g[0] \text{ and } g[1] \not = 3$. +% and they come before position 4 in $g$. + +For the implementation of the ranking step +we have borrowed +a simple and efficient implementation from +\cite{dict-jour}. +It requires $\epsilon \, m$ additional bits of space, where $0 < \epsilon < 1$, +and is obtained by storing explicitly the +$\mathit{rank}$ of every $k$th index in a rankTable, where $k +=\lfloor\log(m)/\epsilon\rfloor$. +The larger $k$ is, the more compact the resulting MPHF is. 
+Therefore, users can trade off space for evaluation time +by setting $k$ appropriately in the implementation. +% In the implementation we let +% $k$ to be set by the users so that they can trade off +% space for evaluation time and vice-versa. +We only allow values for $k$ +that are powers of two (i.e., $k=2^{b_k}$ for some constant $b_k$) in order to replace the expensive +division and modulo operations by +bit-shift and bitwise ``and'' operations, respectively. +We have used $k=256$ +in the experiments +for generating more succinct MPHFs. +We remark that it is still possible to obtain a more compact data structure by +using the results presented in \cite{os07,rrr02}, but at the cost of a much more +complex implementation. + +We need to use an additional lookup table $T_r$ +to guarantee the constant evaluation time of $\mathit{rank}(u)$. +Let us illustrate how $\mathit{rank}(u)$ is computed +using both the rankTable and the lookup table $T_r$. +We first look up +the rank of the largest precomputed index +$v\leq u$ in the rankTable, +and use $T_r$ to count the number of assigned vertices from position +$v$ to $u-1$. +The lookup table $T_r$ allows us to count in constant time +the number of assigned vertices in $\flat=\epsilon \log m$ bits, +where $0 < \epsilon < 1$. Thus the actual evaluation time is $O(1/\epsilon)$. +For simplicity and +without loss of generality we let $\flat$ be a multiple of the number of +bits $\beta$ used to encode each entry of $g$. +As the values in $g$ come from the range $[0,3]$, +then $\beta=2$ bits and we have tried $\flat = 8 \text{ and } 16$. +We would expect that $\flat = 16$ should provide +a faster evaluation time because we would need to carry out fewer lookups +in $T_r$. But, for both values of $\flat$ the lookup table $T_r$ fits entirely in +the CPU cache and we did not observe any significant difference in +the evaluation times. Therefore we settle for $\flat=8$. 
+We remark that each $r \ge 2$ requires +a different lookup table $T_r$ that can be generated a priori. + + +% To do this in $O(1/\epsilon)$ time +% we use a lookup table $T_r$ that allows us to count +% the number of assigned vertices in $\flat=\epsilon \log m$ bits +% in constant time for any $0 < \epsilon < 1$. + + + +% In general the PHFs or MPHFs are constructed based on random acyclic $r$-partite hypergraphs $G_r=(V,E)$, +% where $V= V_0 \cup V_1 \cup \dots \cup V_{r-1}$ and $|V_i| = \eta = \lceil \frac{m}{r} \rceil$, where $0\leq i < r$. +% The most efficient and compact functions are generated +% when $r=3$ and $m=1.23n$. The value $1.23n$ is required to generate a +% random acyclic $3$-partite hypergraph with high probability\footnote{Throughout this paper +% we write ``with high probability'' to mean with probability +% $1 - n^{-\delta}$ for $\delta > 0$.}~\cite{b08,bpz07}. + + +% the family of linear transformations +% presented in \cite{admp99}. A still faster option is the Jenkins function +% proposed in \cite{j97}, which was used for all methods considered in this paper. + +The resulting +MPHFs have the following form: +$h(x) = \mathit{rank}(\mathit{phf}(x))$. +Therefore, we cannot get rid of +the ranking information by replacing the values 3 by 0 in the entries of $g$. +% The array +% $g$ is now representing a function $g:V\to \{0,1,2,3\}$ +% and $\mathit{rank}: V \to [0,n-1]$ is +% now the cardinality of +% $\{ u\in V \;\mid\; u\!> b_k + 1)$ $\delta$-bit entries, where $\delta = 32 \text{ or } 64$ depending on the architecture. 
The operator $>\!>$ denotes the right shift of bits.\\[2mm]@ +% void @BDZ@ (@$S$@, @$\cal H$@, @$c$@, @$b_k$@, @$g$@, @rankTable@)@\\[2mm]@ +% // Mapping step +% do +% @$G.E = \emptyset$@; +% select @$h'$@ at random from @$\cal H$@; +% for @{\bf each}@ @$x \in S$@ do +% @$H'$ = $h'(x)$@; +% @$e$@ = @$\{h_0(x), h_1(x), h_2(x)\}$@; +% addEdge (@$G$@, @$e$@); +% @$\cal L$@ = isAcyclic(@$G$@); +% while (@$G.E$@ is not empty); +% +% // Assigning step +% for (@$u = 0$@; @$u < m$@; @$u$++@) +% Visited[@$u$@] = @{\bf false}@; +% @$g[u]$@ = @$3$@; +% for (i = @$|{\cal L}|-1$@; i @$\ge 0$@; i@$--$@) +% @$e$@ = @$\cal L$@[i]; +% sum = 0; +% for (@$v$@ = 2; @$v \ge 0$@; @$v$@@$--$@) +% if (not Visited[@$e[v]$@]) +% Visited[@$e[v]$@] = @{\bf true}@; +% @$u$@ = @$e[v]$@; +% @$j$@ = @$v$@; +% else sum += @$g[e[v]]$@; +% @g[u]@ = @$(j - \mathrm{sum}) \bmod 3$@; +% +% // Ranking step +% sum = 0; +% kmask = @$(2^{b_k}-1)$@; +% for (i = 0; i < @$|g|$@; i++) +% if((i & kmask) @==@ 0) +% rankTable[i @$>\!> b_k$@] = sum; +% if(@$g$@[i] @$\not = 3$@) sum++; +% +% @{\bf PHF Algorithm}\\[1mm]@ +% @{\bf Input:} a key $x \in S$, an array $g$ with $m = \lceil cn \rceil$ 2-bit entries, where $c \ge 1.23$, and the ``good'' hash functions $h'$ selected by the BDZ algorithm.\\[1mm]@ +% @{\bf Output:} the perfect hash function value for the key $x \in S$.\\[2mm]@ +% int phf (@$x$@, @$g$@, @$h'$@) +% @$H'$@ = @$h'(x)$@; +% @$e$@ = @$\{h_0(x), h_1(x), h_2(x)\}$@; +% @$v$@ = @$(g[e[0]] + g[e[1]] + g[e[2]]) \bmod 3$@; +% return @$e[v]$@; +% +% @{\bf Algorithm to Generate the Lookup Table}\\[1mm]@ +% @{\bf Input:} none\\[1mm]@ +% @{\bf Output:} the lookup table @$T_r$@ to be used by the mphf function. It counts the number of assigned +% vertices in a single byte. As each entry in the array $g$ is encoded by 2 bits, then a single byte can store at most four 2-bit values. 
LS($i'$,2) stands for the value of the 2 least significant bits of $i'$.\\[2mm]@ +% void genLookupTable (@$T_r$@) +% for (i = 0; i < 256; i++) +% sum = 0; +% @$i'$@ = i; +% for (j = 0; j < 4; j++) +% if(@$\text{LS}(i',2) \not = 3$@) sum++; +% @$i'$@ = @$i' >\!> 2$@; +% @$T_r[i]$@ = sum; +% +% @{\bf MPHF Algorithm}\\[1mm]@ +% @{\bf Input:} a key $x \in S$, an array $g$ with $m = \lceil cn \rceil$ 2-bit entries, where $c \ge 1.23$, the chosen ``good'' hash functions $h'$, a constant $b_k$ that makes $k=2^{b_k}$, the lookup table $T_r$ that counts the number of assigned vertices in a single byte, and a rankTable with $(m >\!> b_k + 1)$ $\delta$-bit entries, where $\delta = 32 \text{ or } 64$ depending on the architecture. The notation $g[i \to j]$ represents the values stored in the entries from $g[i]$ to $g[j]$ for $i\leq j$.\\[1mm]@ +% @{\bf Output:} the minimal perfect hash function value for the key $x \in S$.\\[2mm]@ +% int mphf (@$x$@, @$g$@, @$h'$@, @$b_k$@, @$T_r$@, @rankTable@) +% @$u$@ = phf(@$x$@, @$g$@, @$h'$@); +% j = @$u >\!> b_k$@; // @j@ = @$u$@/k +% rank = rankTable[j]; +% i = j @$<\!< b_k$@; // @i@ = @j*k@ +% for(j = i + 4; j < u; i = j, j += 4) +% rank += @$T_r[g[$@i @$\to$@ j@$]]$@; +% for(j = j - 4; j < u; j ++) +% if(@$g$@[j] @$\not =$@ 3) rank ++ ; +% return rank; +% \end{lstlisting} +% \end{center} +% \vspace{-6mm} +% \caption{The BDZ algorithm and the resulting PHFs and MPHFs.} +% \label{prog:ram} +% \vspace{-7mm} +% \end{figure} + +$\eta$ ~~ +$\epsilon$ ~~ +$\varepsilon$ \ No newline at end of file diff --git a/tex/bdz/makefile b/tex/bdz/makefile new file mode 100755 index 0000000..6c378e7 --- /dev/null +++ b/tex/bdz/makefile @@ -0,0 +1,12 @@ +all: + latex bdz.tex + bibtex bdz + latex bdz.tex + latex bdz.tex + dvips bdz.dvi -o bdz.ps +run: clean all + gv bdz.ps & +html: clean all + latex2html bdz.tex +clean: + rm bdz.dvi bdz.ps *.lot *.lof *.aux *.bbl *.blg *.log *.toc diff --git a/tex/chd/chd.bib b/tex/chd/chd.bib new file mode 100755 index 
0000000..c29164d --- /dev/null +++ b/tex/chd/chd.bib @@ -0,0 +1,176 @@ +@inproceedings{bpz07, + author = {F.C. Botelho and R. Pagh and N. Ziviani}, + title = {Simple and Space-Efficient Minimal Perfect Hash Functions}, + booktitle = {Proceedings of the 10th Workshop on Algorithms and Data Structures (WADS'07)}, + publisher = {Springer LNCS vol. 4619}, + pages = {139-150}, + month = aug, + location = {Halifax, Canada}, + year = 2007, + key = {author} +} + +@inproceedings{pb06, + author = {B. Prabhakar and F. Bonomi}, + title = {Perfect Hashing for Network Applications}, + booktitle = {Proceedings of the IEEE International Symposium +on Information Theory}, + year = {2006}, + location = {Seattle, Washington, USA}, + publisher = {IEEE Press} + } + +@inproceedings{dp08, + author = {Martin Dietzfelbinger and Rasmus Pagh}, + title = {Succinct Data Structures for Retrieval and Approximate Membership}, + booktitle = {Proceedings of the 35th international colloquium on Automata, Languages and Programming (ICALP'08)}, + year = {2008}, + isbn = {978-3-540-70574-1}, + pages = {385--396}, + location = {Reykjavik, Iceland}, + doi = {http://dx.doi.org/10.1007/978-3-540-70575-8_32}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + } + + +@inproceedings{bbd09, + author = {D. Belazzougui and F.C. Botelho and M. Dietzfelbinger}, + title = {Compress, Hash and Displace}, + booktitle = {Proceedings of the 17th European Symposium on Algorithms (ESA'09)}, + publisher = {Springer LNCS}, + OPTpages = {139-150}, + month = sep, + location = {Copenhagen, Denmark}, + year = 2009, + key = {author} +} + +@PhdThesis{b08, +author = {F. C. 
Botelho}, +title = {Near-Optimal Space Perfect Hashing Algorithms}, +school = {Federal University of Minas Gerais}, +year = {2008}, +OPTkey = {}, +OPTtype = {}, +OPTaddress = {}, +month = {September}, +note = {Supervised by Nivio Ziviani, \url{http://www.dcc.ufmg.br/pos/cursos/defesas/255D.PDF}}, +OPTannote = {}, +OPTurl = {http://www.dcc.ufmg.br/pos/cursos/defesas/255D.PDF}, +OPTdoi = {}, +OPTissn = {}, +OPTlocalfile = {}, +OPTabstract = {} +} + +@Article{mwhc96, + author = {B.S. Majewski and N.C. Wormald and G. Havas and Z.J. Czech}, + title = {A family of perfect hashing methods}, + journal = {The Computer Journal}, + year = {1996}, + volume = {39}, + number = {6}, + pages = {547-554}, + key = {author} +} + +@inproceedings{ckrt04, + author = {B. Chazelle and J. Kilian and R. Rubinfeld and A. Tal}, + title = {The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables}, + booktitle = {Proceedings of the 15th annual ACM-SIAM symposium on Discrete algorithms (SODA'04)}, + year = {2004}, + isbn = {0-89871-558-X}, + pages = {30--39}, + location = {New Orleans, Louisiana}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + optpublisher = {Society for Industrial and Applied Mathematics} + } + +@Article{j97, + author = {B. Jenkins}, + title = {Algorithm Alley: Hash Functions}, + journal = {Dr. Dobb's Journal of Software Tools}, + volume = {22}, + number = {9}, + month = {september}, + year = {1997}, + note = {Extended version available at \url{http://burtleburtle.net/bob/hash/doobs.html}} +} + + +@Article{e87, + author = {J. Ebert}, + title = {A Versatile Data Structure for Edges Oriented Graph Algorithms}, + journal = {Communication of The ACM}, + year = {1987}, + OPTkey = {}, + OPTvolume = {}, + number = {30}, + pages = {513-519}, + OPTmonth = {}, + OPTnote = {}, + OPTannote = {} +} + +@article {dict-jour, + AUTHOR = {R. 
Pagh}, + TITLE = {Low Redundancy in Static Dictionaries with Constant Query Time}, + OPTJOURNAL = sicomp, + JOURNAL = fsicomp, + VOLUME = {31}, + YEAR = {2001}, + NUMBER = {2}, + PAGES = {353--363}, +} + + +@inproceedings{sg06, + author = {K. Sadakane and R. Grossi}, + title = {Squeezing succinct data structures into entropy bounds}, + booktitle = {Proceedings of the 17th annual ACM-SIAM symposium on Discrete algorithms (SODA'06)}, + year = {2006}, + pages = {1230--1239} +} + +@inproceedings{gn06, + author = {R. Gonzalez and + G. Navarro}, + title = {Statistical Encoding of Succinct Data Structures}, + booktitle = {Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM'06)}, + year = {2006}, + pages = {294--305} +} + +@inproceedings{fn07, + author = {K. Fredriksson and + F. Nikitin}, + title = {Simple Compression Code Supporting Random Access and Fast + String Matching}, + booktitle = {Proceedings of the 6th International Workshop on Efficient and Experimental Algorithms (WEA'07)}, + year = {2007}, + pages = {203--216} +} + +@inproceedings{os07, + author = {D. Okanohara and K. Sadakane}, + title = {Practical Entropy-Compressed Rank/Select Dictionary}, + booktitle = {Proceedings of the Workshop on Algorithm Engineering and + Experiments (ALENEX'07)}, + year = {2007}, + location = {New Orleans, Louisiana, USA} + } + + +@inproceedings{rrr02, + author = {R. Raman and V. Raman and S. S. 
Rao}, + title = {Succinct indexable dictionaries with applications to encoding k-ary trees and multisets}, + booktitle = {Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms (SODA'02)}, + year = {2002}, + isbn = {0-89871-513-X}, + pages = {233--242}, + location = {San Francisco, California}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + } diff --git a/tex/chd/chd.tex b/tex/chd/chd.tex new file mode 100755 index 0000000..ce870dc --- /dev/null +++ b/tex/chd/chd.tex @@ -0,0 +1,70 @@ +\documentclass[12pt]{article} +\usepackage{graphicx} + +\usepackage{latexsym} +\usepackage{url} + +\usepackage{a4wide} +\usepackage{amsmath} +\usepackage{amssymb} +\usepackage{amsfonts} +\usepackage{graphicx} +\usepackage{listings} +\usepackage{fancyhdr} +\usepackage{graphics} +\usepackage{multicol} +\usepackage{epsfig} +\usepackage{textcomp} +\usepackage{url} + +% \usepackage{subfigure} +% \usepackage{subfig} +% \usepackage{wrapfig} + + +\bibliographystyle{plain} +% \bibliographystyle{sbc} +% \bibliographystyle{abnt-alf} +% \bibliographystyle{abnt-num} + +\begin{document} + +\sloppy + +% \renewcommand{\baselinestretch}{1.24}\normalsize % set the space between lines to 1.24 + +% set headings +% \pagestyle{fancy} +% \lhead[\fancyplain{}{\footnotesize\thepage}] +% {\fancyplain{}{\footnotesize\rightmark}} +% \rhead[\fancyplain{}{\footnotesize\leftmark}] +% {\fancyplain{}{\footnotesize\thepage}} +% +% \cfoot{} + +\lstset{ + language=C, + basicstyle=\fontsize{8}{8}\selectfont, + captionpos=t, + aboveskip=0mm, + belowskip=0mm, + abovecaptionskip=0.5mm, + belowcaptionskip=0.5mm, +% numbers = left, + mathescape=true, + escapechar=@, + extendedchars=true, + showstringspaces=false, +% columns=fixed, + basewidth=0.515em, + frame=single, + framesep=1mm, + xleftmargin=1mm, + xrightmargin=1mm, + framerule=0pt +} + +\include{introduction} % Introducao +\bibliography{chd} + +\end{document} diff --git 
a/tex/chd/introduction.tex b/tex/chd/introduction.tex new file mode 100755 index 0000000..4f365e7 --- /dev/null +++ b/tex/chd/introduction.tex @@ -0,0 +1,38 @@ +\section{Introduction} \label{sec:introduction} + + +The important performance parameters of a PHF are representation size, evaluation time and construction time. The representation size plays an important role when the whole function fits in a faster memory and the actual data is stored in a slower memory. For instace, compact PHFs can be entirely fit in a CPU cache and this makes their computation really fast by avoiding cache misses. The CHD algorithm plays an important role in this context. It was designed by Djamal Belazzougui, Fabiano C. Botelho, and Martin Dietzfelbinger in \cite{bbd09}. + + +The CHD algorithm permits to obtain PHFs with representation size very close to optimal while retaining $O(n)$ construction time and $O(1)$ evaluation time. For example, in the case $m=2n$ we obtain a PHF that uses space $0.67$ bits per key, and for $m=1.23n$ we obtain space $1.4$ bits per key, which was not achievable with previously known methods. The CHD algorithm is inspired by several known algorithms; +the main new feature is that it combines a modification of Pagh's ``hash-and-displace'' approach +with data compression on a sequence of hash function indices. +That combination makes it possible to significantly reduce space usage +while retaining linear construction time and constant query time. +The CHD algorithm can also be used for $k$-perfect hashing, +where at most $k$ keys may be mapped to the same value. +For the analysis we assume that fully random hash functions are given for free; +such assumptions can be justified and were made in previous papers. + +The compact PHFs generated by the CHD algorithm can be used in many applications in which we want to assign a unique identifier to each key without storing any information on the key. 
One of the most obvious applications of those functions +(or $k$-perfect hash functions) is when we have a small fast memory in which we can store the perfect hash function while the keys and associated satellite data are stored in slower but larger memory. +The size of a block or a transfer unit may be chosen so that $k$ data items can be retrieved in +one read access. In this case we can ensure that data associated with a key can be retrieved in a single probe to slower memory. This has been used, for example, in hardware routers~\cite{pb06}. +% Perfect hashing has also been found to be competitive with traditional hashing in internal memory~\cite{blmz08} on standard computers. Recently perfect hashing has been used to accelerate algorithms on graphs~\cite{ESS08} when the graph representation does not fit in main memory. + + +The CHD algorithm generates the most compact PHFs and MPHFs we know of in~$O(n)$ time. +The time required to evaluate the generated functions is constant (in practice less than $1.4$ microseconds). +The storage space of the resulting PHFs and MPHFs is within a factor of $1.43$ of the information +theoretic lower bound. +The closest competitor is the algorithm by Dietzfelbinger and Pagh \cite{dp08}, but +their algorithm does not run in linear time. +Furthermore, the CHD algorithm +can be tuned to run faster than the BPZ algorithm \cite{bpz07} (the fastest algorithm +available in the literature so far) and to obtain more compact functions. +The most impressive characteristic is that it has the ability, in principle, to +approximate the information theoretic lower bound while being practical. +A detailed description of the CHD algorithm can be found in \cite{bbd09}. 
+ + + diff --git a/tex/chd/makefile b/tex/chd/makefile new file mode 100755 index 0000000..686054d --- /dev/null +++ b/tex/chd/makefile @@ -0,0 +1,12 @@ +all: + latex chd.tex + bibtex chd + latex chd.tex + latex chd.tex + dvips chd.dvi -o chd.ps +run: clean all + gv chd.ps & +html: clean all + latex2html chd.tex +clean: + rm -f chd.dvi chd.ps *.lot *.lof *.aux *.bbl *.blg *.log *.toc