Improved the documentation of the BMZ and CHM algorithms
parent
dfa28a005a
commit
f1b1f12dda
162
BMZ.t2t
@ -4,46 +4,176 @@ BMZ Algorithm
|
|||||||
%!includeconf: CONFIG.t2t
|
%!includeconf: CONFIG.t2t
|
||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
**History**
|
==History==
|
||||||
|
|
||||||
At the end of 2003, professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] was
|
At the end of 2003, professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] was
|
||||||
finishing the second edition of his book.
|
finishing the second edition of his [book http://www.dcc.ufmg.br/algoritmos/].
|
||||||
During the book writing, professor Nivio studied the problem of generating minimal perfect hash
|
During the [book http://www.dcc.ufmg.br/algoritmos/] writing,
|
||||||
|
professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] studied the problem of generating minimal perfect hash
|
||||||
functions (if you are not familiar with this problem, see [1][2]).
|
functions (if you are not familiar with this problem, see [1][2]).
|
||||||
Professor Nivio coded a modified version of the [CHM algorithm chm.html], which was proposed by
|
Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] coded a modified version of
|
||||||
Czech, Havas and Majewski and put it in his book.
|
the [CHM algorithm chm.html], which was proposed by
|
||||||
|
Czech, Havas and Majewski and put it in his [book http://www.dcc.ufmg.br/algoritmos/].
|
||||||
The [CHM algorithm chm.html] is based on acyclic random graphs to generate order preserving
|
The [CHM algorithm chm.html] is based on acyclic random graphs to generate order preserving
|
||||||
minimal perfect hash functions in linear time. Professor Nivio asked himself: why must the random graph
|
minimal perfect hash functions in linear time. Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio]
|
||||||
be acyclic? In the modified version available in his book, he got rid of this restriction.
|
asked himself: why must the random graph
|
||||||
|
be acyclic? In the modified version available in his [book http://www.dcc.ufmg.br/algoritmos/], he got rid of this restriction.
|
||||||
|
|
||||||
The modification presented a problem: it was impossible to generate minimal perfect hash functions
|
The modification presented a problem: it was impossible to generate minimal perfect hash functions
|
||||||
for sets with more than 1000 keys.
|
for sets with more than 1000 keys.
|
||||||
At the same time, [Fabiano C. Botelho http://www.dcc.ufmg.br/~fbotelho],
|
At the same time, [Fabiano C. Botelho http://www.dcc.ufmg.br/~fbotelho],
|
||||||
a master's degree student at the [Department of Computer Science http://www.dcc.ufmg.br] of the
|
a master's degree student at the [Department of Computer Science http://www.dcc.ufmg.br] of the
|
||||||
[Federal University of Minas Gerais http://www.ufmg.br],
|
[Federal University of Minas Gerais http://www.ufmg.br],
|
||||||
started to be advised by Nivio who presented the problem to Fabiano.
|
started to be advised by [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] who presented the problem
|
||||||
|
to [Fabiano http://www.dcc.ufmg.br/~fbotelho].
|
||||||
|
|
||||||
During the master's program, Fabiano and Nivio faced lots of problems.
|
During the master's program, [Fabiano http://www.dcc.ufmg.br/~fbotelho] and
|
||||||
Talking with a friend of mine (David Menoti) about our problems, many ideas
|
[Nivio Ziviani http://www.dcc.ufmg.br/~nivio] faced lots of problems.
|
||||||
appeared and, after implementing them, we got a very fast algorithm to generate
|
In April 2004, [Fabiano http://www.dcc.ufmg.br/~fbotelho] was talking with a
|
||||||
minimal perfect hash functions that do not preserve order.
|
friend of his (David Menoti) about the problems
|
||||||
|
and many ideas appeared.
|
||||||
|
The ideas were implemented and we noticed that a very fast algorithm to generate
|
||||||
|
minimal perfect hash functions had been designed.
|
||||||
We refer to the algorithm as **BMZ**, because it was conceived by Fabiano C. **B**otelho,
|
We refer to the algorithm as **BMZ**, because it was conceived by Fabiano C. **B**otelho,
|
||||||
David **M**enoti and Nivio **Z**iviani. The algorithm is described in [1].
|
David **M**enoti and Nivio **Z**iviani. The algorithm is described in [1].
|
||||||
To analyse the BMZ algorithm we needed some results from random graph theory, so
|
To analyse the BMZ algorithm we needed some results from random graph theory, so
|
||||||
we invited professor [Yoshiharu Kohayakawa http://www.ime.usp.br/~yoshi] to help us.
|
we invited professor [Yoshiharu Kohayakawa http://www.ime.usp.br/~yoshi] to help us.
|
||||||
The final description and analysis of the BMZ algorithm are presented in [2].
|
The final description and analysis of the BMZ algorithm are presented in [2].
|
||||||
|
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
==The Algorithm==
|
||||||
|
|
||||||
|
Let us show how the minimal perfect hash function [figs/img7.png] will be constructed.
|
||||||
|
We make use of two auxiliary random functions [figs/img41.png] and [figs/img55.png],
|
||||||
|
where [figs/img56.png] for some suitably chosen integer [figs/img57.png],
|
||||||
|
where [figs/img58.png]. We build a random graph [figs/img59.png] on [figs/img60.png],
|
||||||
|
whose edge set is [figs/img61.png]. There is an edge in [figs/img32.png] for each
|
||||||
|
key in the set of keys [figs/img20.png].
|
||||||
|
|
||||||
|
In what follows, we shall be interested in the //2-core// of
|
||||||
|
the random graph [figs/img32.png], that is, the maximal subgraph
|
||||||
|
of [figs/img32.png] with minimal degree at
|
||||||
|
least 2 (see, e.g., [2] for details).
|
||||||
|
Because of its importance in our context, we call the 2-core the
|
||||||
|
//critical// subgraph of [figs/img32.png] and denote it by [figs/img63.png].
|
||||||
|
The vertices and edges in [figs/img63.png] are said to be //critical//.
|
||||||
|
We let [figs/img64.png] and [figs/img65.png].
|
||||||
|
Moreover, we let [figs/img66.png] be the set of //non-critical//
|
||||||
|
vertices in [figs/img32.png].
|
||||||
|
We also let [figs/img67.png] be the set of all critical
|
||||||
|
vertices that have at least one non-critical vertex as a neighbour.
|
||||||
|
Let [figs/img68.png] be the set of //non-critical// edges in [figs/img32.png].
|
||||||
|
Finally, we let [figs/img69.png] be the //non-critical// subgraph
|
||||||
|
of [figs/img32.png].
|
||||||
|
The non-critical subgraph [figs/img70.png] corresponds to the //acyclic part//
|
||||||
|
of [figs/img32.png].
|
||||||
|
We have [figs/img71.png].
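
The split into critical and non-critical edges described above can be illustrated with a small sketch. It is not the cmph implementation: it assumes the graph is given as a plain list of vertex pairs without self-loops, and the function name ``two_core`` is hypothetical. It repeatedly removes vertices of degree 1; the edges that survive form the 2-core (the critical subgraph), and the removed edges form the acyclic, non-critical part.

```python
from collections import defaultdict


def two_core(edges):
    """Return the indices of the edges that belong to the 2-core.

    edges: list of (u, v) vertex pairs (assumed to contain no self-loops).
    Vertices of degree 1 are peeled off repeatedly; every remaining
    vertex then has degree at least 2.
    """
    # incident[v] holds the indices of the still-alive edges touching v.
    incident = defaultdict(set)
    for i, (u, v) in enumerate(edges):
        incident[u].add(i)
        incident[v].add(i)

    alive = set(range(len(edges)))
    stack = [v for v, es in incident.items() if len(es) == 1]
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # degree changed since v was pushed
        (e,) = incident[v]
        alive.discard(e)  # e is a non-critical edge
        for w in edges[e]:
            incident[w].discard(e)
            if len(incident[w]) == 1:
                stack.append(w)
    return alive


# A triangle (edges 0, 1, 2) with a two-edge tail (edges 3, 4) attached.
example = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
critical = two_core(example)
non_critical = set(range(len(example))) - critical
print(sorted(critical), sorted(non_critical))
```

In the example, the triangle is the critical subgraph and the tail is the non-critical subgraph; vertex 2 is a critical vertex with a non-critical neighbour.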
|
||||||
|
|
||||||
|
We then construct a suitable labelling [figs/img72.png] of the vertices
|
||||||
|
of [figs/img32.png]: we choose [figs/img73.png] for each [figs/img74.png] in such
|
||||||
|
a way that [figs/img75.png] ([figs/img18.png]) is a
|
||||||
|
minimal perfect hash function for [figs/img20.png].
|
||||||
|
We will see later on that this labelling [figs/img37.png] can be found in linear time
|
||||||
|
if the number of edges in [figs/img63.png] is at most [figs/img76.png].
|
||||||
|
|
||||||
|
Figure 2 presents pseudocode for the algorithm.
|
||||||
|
The procedure GenerateMPHF ([figs/img20.png], [figs/img37.png]) receives as input the set of
|
||||||
|
keys [figs/img20.png] and produces the labelling [figs/img37.png].
|
||||||
|
The method uses a mapping, ordering and searching approach.
|
||||||
|
We now describe each step.
|
||||||
|
| procedure GenerateMPHF ([figs/img20.png], [figs/img37.png])
|
||||||
|
| Mapping ([figs/img20.png], [figs/img32.png]);
|
||||||
|
| Ordering ([figs/img32.png], [figs/img63.png], [figs/img70.png]);
|
||||||
|
| Searching ([figs/img32.png], [figs/img63.png], [figs/img70.png], [figs/img37.png]);
|
||||||
|
**Figure 2**: Main steps of the algorithm for constructing a minimal perfect hash function
|
||||||
|
|
||||||
|
===Mapping Step===
|
||||||
|
|
||||||
|
===Ordering Step===
|
||||||
|
|
||||||
|
===Searching Step===
|
||||||
|
|
||||||
|
====Assignment of Values to Critical Vertices====
|
||||||
|
|
||||||
|
====Assignment of Values to Non-Critical Vertices====
|
||||||
|
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
==The Heuristic==
|
||||||
|
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
==Memory Consumption==
|
||||||
|
|
||||||
|
Now we detail the memory consumption to generate and to store minimal perfect hash functions
|
||||||
|
using the BMZ algorithm. The structures responsible for memory consumption are the
|
||||||
|
following:
|
||||||
|
- Graph:
|
||||||
|
+ **first**: is a vector that stores //cn// integer numbers, each one representing
|
||||||
|
the first edge (index in the vector edges) in the list of
|
||||||
|
edges of each vertex.
|
||||||
|
The integer numbers are 4 bytes long. Therefore,
|
||||||
|
the vector first is stored in //4cn// bytes.
|
||||||
|
|
||||||
|
+ **edges**: is a vector to represent the edges of the graph. As each edge
|
||||||
|
is composed of a pair of vertices, each entry stores two integer numbers
|
||||||
|
of 4 bytes that represent the vertices. As there are //n// edges, the
|
||||||
|
vector edges is stored in //8n// bytes.
|
||||||
|
|
||||||
|
+ **next**: given a vertex //v//, we can discover the edges that contain //v//
|
||||||
|
following its list of edges, which starts on first[//v//] and the next
|
||||||
|
edges are given by next[...first[//v//]...]. Therefore, the vectors first and next represent
|
||||||
|
the linked lists of edges of each vertex. As there are two vertices for each edge,
|
||||||
|
when an edge is inserted in the graph, it must be inserted in the two linked lists
|
||||||
|
of the vertices in its composition. Therefore, there are //2n// entries of integer
|
||||||
|
numbers in the vector next, so it is stored in //4*2n = 8n// bytes.
|
||||||
|
|
||||||
|
+ **critical vertices (critical_nodes vector)**: is a vector of //cn// bits,
|
||||||
|
where each bit indicates if a vertex is critical (1) or non-critical (0).
|
||||||
|
Therefore, the critical and non-critical vertices are represented in //cn/8// bytes.
|
||||||
|
|
||||||
|
+ **critical edges (used_edges vector)**: is a vector of //n// bits, where each
|
||||||
|
bit indicates if an edge is critical (1) or non-critical (0). Therefore, the
|
||||||
|
critical and non-critical edges are represented in //n/8// bytes.
|
||||||
|
|
||||||
|
- Other auxiliary structures
|
||||||
|
+ **queue**: is a queue of integer numbers used in the breadth-first search of the
|
||||||
|
assignment of values to critical vertices. There is an entry in the queue for
|
||||||
|
each two critical vertices. Let //|Vcrit|// be the expected number of critical
|
||||||
|
vertices. Therefore, the queue is stored in //4*0.5*|Vcrit|=2|Vcrit|// bytes.
|
||||||
|
|
||||||
|
+ **visited**: is a vector of //cn// bits, where each bit indicates if the g value of
|
||||||
|
a given vertex was already defined. Therefore, the vector visited is stored
|
||||||
|
in //cn/8// bytes.
|
||||||
|
|
||||||
|
+ **function //g//**: is represented by a vector of //cn// integer numbers.
|
||||||
|
As each integer number is 4 bytes long, the function //g// is stored in
|
||||||
|
//4cn// bytes.
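
The **first**, **edges** and **next** vectors described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not the cmph source: the class name ``Graph``, the sentinel ``NIL`` and the slot layout (slot //e// links edge //e// into the list of its first vertex, slot //e + n// into the list of its second vertex) are hypothetical choices made here so that the vector next has exactly //2n// entries.

```python
NIL = -1  # assumed sentinel meaning "end of list"


class Graph:
    """Adjacency lists stored in three flat vectors (illustrative sketch)."""

    def __init__(self, num_vertices, num_edges):
        self.first = [NIL] * num_vertices       # one entry per vertex (cn)
        self.edges = [(NIL, NIL)] * num_edges   # one vertex pair per edge (n)
        self.next = [NIL] * (2 * num_edges)     # one link per endpoint (2n)
        self.count = 0                          # edges inserted so far

    def add_edge(self, u, v):
        n = len(self.edges)
        e = self.count
        self.count += 1
        self.edges[e] = (u, v)
        # Insert the edge at the head of u's list (slot e) ...
        self.next[e] = self.first[u]
        self.first[u] = e
        # ... and at the head of v's list (slot e + n).
        self.next[e + n] = self.first[v]
        self.first[v] = e + n

    def incident_edges(self, v):
        """Yield the indices (into edges) of all edges that contain v."""
        n = len(self.edges)
        slot = self.first[v]
        while slot != NIL:
            yield slot % n
            slot = self.next[slot]


g = Graph(num_vertices=4, num_edges=3)
g.add_edge(0, 1)
g.add_edge(1, 2)
g.add_edge(1, 3)
print(sorted(g.incident_edges(1)))
```

Each edge is linked into two lists, which is why next needs two entries per edge while edges needs only one.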
|
||||||
|
|
||||||
|
|
||||||
**The Algorithm**
|
Thus, the total memory consumption of BMZ algorithm for generating a minimal
|
||||||
|
perfect hash function (MPHF) is: //(8.25c + 16.125)n +2|Vcrit| + O(1)// bytes.
|
||||||
|
As the constant //c// may be 0.93 or 1.15, we have:
|
||||||
|
|| //c// | //|Vcrit|// | Memory consumption to generate a MPHF |
|
||||||
|
| 0.93 | //0.497n// | //24.80n + O(1)// |
|
||||||
|
| 1.15 | //0.401n// | //26.42n + O(1)// |
|
||||||
|
The values of |Vcrit| were calculated using Eq.(1) presented in [2].
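
The total above can be re-derived by summing the per-structure sizes listed in this section. The sketch below does that per key (that is, divided by //n//); the function name is hypothetical, and the //|Vcrit|// ratios are simply the ones from the table. With these rounded inputs the results land within roughly //0.01n// of the table values; the small gap presumably comes from rounding of //|Vcrit|//.

```python
def bmz_generation_bytes_per_key(c, vcrit_per_key):
    """Bytes per key needed to generate an MPHF with BMZ, ignoring O(1)."""
    first = 4 * c                     # cn integers of 4 bytes
    edges = 8                         # n entries, two 4-byte integers each
    next_ = 8                         # 2n integers of 4 bytes
    critical_nodes = c / 8            # cn bits
    used_edges = 1 / 8                # n bits
    queue = 4 * 0.5 * vcrit_per_key   # one 4-byte entry per two critical vertices
    visited = c / 8                   # cn bits
    g = 4 * c                         # cn integers of 4 bytes
    return (first + edges + next_ + critical_nodes
            + used_edges + queue + visited + g)


for c, vcrit in [(0.93, 0.497), (1.15, 0.401)]:
    total = bmz_generation_bytes_per_key(c, vcrit)
    print(f"c = {c}: about {total:.2f}n bytes")
```

Grouping the terms gives //(4 + 0.125 + 0.125 + 4)c = 8.25c// and //8 + 8 + 0.125 = 16.125//, which matches the closed form //(8.25c + 16.125)n + 2|Vcrit|//.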
|
||||||
|
|
||||||
**The Heuristic**
|
Now we present the memory consumption to store the resulting function.
|
||||||
|
We only need to store the //g// function. Thus, we need //4cn// bytes.
|
||||||
|
Again we have:
|
||||||
|
|| //c// | Memory consumption to store a MPHF |
|
||||||
|
| 0.93 | //3.72n// |
|
||||||
|
| 1.15 | //4.60n// |
|
||||||
|
|
||||||
**Papers**
|
----------------------------------------
|
||||||
|
|
||||||
|
==Papers==
|
||||||
|
|
||||||
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], D. Menoti, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A New algorithm for constructing minimal perfect hash functions papers/bmz_tr004_04.ps], Technical Report TR004/04, Department of Computer Science, Federal University of Minas Gerais, 2004.
|
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], D. Menoti, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A New algorithm for constructing minimal perfect hash functions papers/bmz_tr004_04.ps], Technical Report TR004/04, Department of Computer Science, Federal University of Minas Gerais, 2004.
|
||||||
|
|
||||||
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], Y. Kohayakawa, and [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A Practical Minimal Perfect Hashing Method papers/bmz_wea2005.ps], 4th International Workshop on Efficient and Experimental Algorithms (WEA), 2005.(submitted)
|
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], Y. Kohayakawa, and [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A Practical Minimal Perfect Hashing Method papers/bmz_wea2005.ps] (Submitted).
|
||||||
|
|
||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
51
CHM.t2t
@ -4,12 +4,57 @@ CHM Algorithm
|
|||||||
%!includeconf: CONFIG.t2t
|
%!includeconf: CONFIG.t2t
|
||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
==The Algorithm==
|
||||||
|
|
||||||
**History**
|
==Memory Consumption==
|
||||||
|
|
||||||
**The Algorithm**
|
Now we detail the memory consumption to generate and to store minimal perfect hash functions
|
||||||
|
using the CHM algorithm. The structures responsible for memory consumption are the
|
||||||
|
following:
|
||||||
|
- Graph:
|
||||||
|
+ **first**: is a vector that stores //cn// integer numbers, each one representing
|
||||||
|
the first edge (index in the vector edges) in the list of
|
||||||
|
edges of each vertex.
|
||||||
|
The integer numbers are 4 bytes long. Therefore,
|
||||||
|
the vector first is stored in //4cn// bytes.
|
||||||
|
|
||||||
**Papers**
|
+ **edges**: is a vector to represent the edges of the graph. As each edge
|
||||||
|
is composed of a pair of vertices, each entry stores two integer numbers
|
||||||
|
of 4 bytes that represent the vertices. As there are //n// edges, the
|
||||||
|
vector edges is stored in //8n// bytes.
|
||||||
|
|
||||||
|
+ **next**: given a vertex //v//, we can discover the edges that contain //v//
|
||||||
|
following its list of edges, which starts on first[//v//] and the next
|
||||||
|
edges are given by next[...first[//v//]...]. Therefore, the vectors first and next represent
|
||||||
|
the linked lists of edges of each vertex. As there are two vertices for each edge,
|
||||||
|
when an edge is inserted in the graph, it must be inserted in the two linked lists
|
||||||
|
of the vertices in its composition. Therefore, there are //2n// entries of integer
|
||||||
|
numbers in the vector next, so it is stored in //4*2n = 8n// bytes.
|
||||||
|
|
||||||
|
- Other auxiliary structures
|
||||||
|
+ **visited**: is a vector of //cn// bits, where each bit indicates if the g value of
|
||||||
|
a given vertex was already defined. Therefore, the vector visited is stored
|
||||||
|
in //cn/8// bytes.
|
||||||
|
|
||||||
|
+ **function //g//**: is represented by a vector of //cn// integer numbers.
|
||||||
|
As each integer number is 4 bytes long, the function //g// is stored in
|
||||||
|
//4cn// bytes.
|
||||||
|
|
||||||
|
|
||||||
|
Thus, the total memory consumption of CHM algorithm for generating a minimal
|
||||||
|
perfect hash function (MPHF) is: //(8.125c + 16)n + O(1)// bytes.
|
||||||
|
As the value of constant //c// must be at least 2.09 we have:
|
||||||
|
|| //c// | Memory consumption to generate a MPHF |
|
||||||
|
| 2.09 | //33.00n + O(1)// |
|
||||||
|
|
||||||
|
Now we present the memory consumption to store the resulting function.
|
||||||
|
We only need to store the //g// function. Thus, we need //4cn// bytes.
|
||||||
|
Again we have:
|
||||||
|
|| //c// | Memory consumption to store a MPHF |
|
||||||
|
| 2.09 | //8.36n// |
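
Both CHM figures can be checked by summing the structure sizes listed in this section. The sketch below works per key (divided by //n//) and uses hypothetical function names; it only restates the arithmetic of the text. With //c = 2.09// the generation cost comes out within about //0.02n// of the rounded table value.

```python
def chm_generation_bytes_per_key(c):
    """Bytes per key needed to generate an MPHF with CHM, ignoring O(1)."""
    first = 4 * c     # cn integers of 4 bytes
    edges = 8         # n entries, two 4-byte integers each
    next_ = 8         # 2n integers of 4 bytes
    visited = c / 8   # cn bits
    g = 4 * c         # cn integers of 4 bytes
    return first + edges + next_ + visited + g


def storage_bytes_per_key(c):
    """Bytes per key needed to store the function: only g is kept."""
    return 4 * c


c = 2.09
print(f"generate: about {chm_generation_bytes_per_key(c):.2f}n bytes")
print(f"store:    about {storage_bytes_per_key(c):.2f}n bytes")
```

The grouped form is //(4 + 0.125 + 4)c + 16 = 8.125c + 16//, the same expression as in the text.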
|
||||||
|
|
||||||
|
|
||||||
|
==Papers==
|
||||||
|
|
||||||
+ Z.J. Czech, G. Havas, and B.S. Majewski. [An optimal algorithm for generating minimal perfect hash functions. papers/chm92.pdf], Information Processing Letters, 43(5):257-264, 1992.
|
+ Z.J. Czech, G. Havas, and B.S. Majewski. [An optimal algorithm for generating minimal perfect hash functions. papers/chm92.pdf], Information Processing Letters, 43(5):257-264, 1992.
|
||||||
|
|
||||||
|
@ -5,14 +5,14 @@ Comparison Between BMZ And CHM Algorithms
|
|||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
|
||||||
**Features**
|
==Features==
|
||||||
|
|
||||||
**Constructing Minimal Perfect Hash Functions**
|
==Constructing Minimal Perfect Hash Functions==
|
||||||
|
|
||||||
**Memory Consumption**
|
==Memory Consumption==
|
||||||
|
|
||||||
|
|
||||||
**Run times**
|
==Run times==
|
||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
[Home index.html]
|
[Home index.html]
|
||||||
|
@ -1,2 +1,4 @@
|
|||||||
%! PreProc(html): '^%html% ' ''
|
%! PreProc(html): '^%html% ' ''
|
||||||
%! PreProc(txt): '^%txt% ' ''
|
%! PreProc(txt): '^%txt% ' ''
|
||||||
|
%! PostProc(html): "&" "&"
|
||||||
|
%! PostProc(txt): " " " "
|
||||||
|
22
README.t2t
@ -5,7 +5,7 @@ CMPH - C Minimal Perfect Hashing Library
|
|||||||
|
|
||||||
-------------------------------------------------------------------
|
-------------------------------------------------------------------
|
||||||
|
|
||||||
**Description**
|
==Description==
|
||||||
|
|
||||||
C Minimal Perfect Hashing Library is a portable LGPLed library to create and
|
C Minimal Perfect Hashing Library is a portable LGPLed library to create and
|
||||||
to work with minimal perfect hash functions. The cmph library encapsulates the newest
|
to work with minimal perfect hash functions. The cmph library encapsulates the newest
|
||||||
@ -31,35 +31,35 @@ of the distinguishable features of cmph:
|
|||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
|
||||||
**Supported Algorithms**
|
==Supported Algorithms==
|
||||||
|
|
||||||
|
|
||||||
%html% - [BMZ Algorithm bmz.html].
|
%html% - [BMZ Algorithm bmz.html].
|
||||||
%txt% - BMZ Algorithm.
|
%txt% - BMZ Algorithm.
|
||||||
A very fast algorithm based on cyclic random graphs to construct minimal
|
A very fast algorithm based on cyclic random graphs to construct minimal
|
||||||
perfect hash functions in linear time. The resulting functions are not order preserving and
|
perfect hash functions in linear time. The resulting functions are not order preserving and
|
||||||
can be stored in only 4cn bytes, where c is between 0.93 and 1.15.
|
can be stored in only //4cn// bytes, where //c// is between 0.93 and 1.15.
|
||||||
%html% - [CHM Algorithm chm.html].
|
%html% - [CHM Algorithm chm.html].
|
||||||
%txt% - CHM Algorithm.
|
%txt% - CHM Algorithm.
|
||||||
An algorithm based on acyclic random graphs to construct minimal
|
An algorithm based on acyclic random graphs to construct minimal
|
||||||
perfect hash functions in linear time. The resulting functions are order preserving and
|
perfect hash functions in linear time. The resulting functions are order preserving and
|
||||||
are stored in 4cn bytes, where c is greater than 2.
|
are stored in //4cn// bytes, where //c// is greater than 2.
|
||||||
|
|
||||||
%html% [Click Here comparison.html] to see a comparison of the supported algorithms.
|
%html% [Click Here comparison.html] to see a comparison of the supported algorithms.
|
||||||
|
|
||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
|
||||||
**News for version 0.3**
|
==News for version 0.3==
|
||||||
|
|
||||||
- A new heuristic added to the BMZ algorithm makes it possible to generate an MPHF with only
|
- A new heuristic added to the BMZ algorithm makes it possible to generate an MPHF with only
|
||||||
24.61*n + O(1) bytes. The resulting function can be stored in 3.72*n bytes.
|
//24.6n + O(1)// bytes. The resulting function can be stored in //3.72n// bytes.
|
||||||
%html% [click here bmz.html] for details.
|
%html% [click here bmz.html] for details.
|
||||||
|
|
||||||
|
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
|
||||||
**Examples**
|
==Examples==
|
||||||
|
|
||||||
Using cmph is quite simple. Take a look.
|
Using cmph is quite simple. Take a look.
|
||||||
|
|
||||||
@ -113,7 +113,7 @@ Using cmph is quite simple. Take a look.
|
|||||||
```
|
```
|
||||||
--------------------------------------
|
--------------------------------------
|
||||||
|
|
||||||
**The cmph application**
|
==The cmph application==
|
||||||
|
|
||||||
cmph is the name of both the library and the utility
|
cmph is the name of both the library and the utility
|
||||||
application that comes with this package. You can use the cmph
|
application that comes with this package. You can use the cmph
|
||||||
@ -157,16 +157,16 @@ utility.
|
|||||||
keysfile line separated file with keys
|
keysfile line separated file with keys
|
||||||
```
|
```
|
||||||
|
|
||||||
**Additional Documentation**
|
==Additional Documentation==
|
||||||
|
|
||||||
[FAQ faq.html]
|
[FAQ faq.html]
|
||||||
|
|
||||||
**Downloads**
|
==Downloads==
|
||||||
|
|
||||||
Use the project page at sourceforge: http://sf.net/projects/cmph
|
Use the project page at sourceforge: http://sf.net/projects/cmph
|
||||||
|
|
||||||
|
|
||||||
**License Stuff**
|
==License Stuff==
|
||||||
|
|
||||||
Code is under the LGPL.
|
Code is under the LGPL.
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|