BMZ documentation was finished
parent
9110014044
commit
9abc48f91c
309
BMZ.t2t
@ -9,15 +9,17 @@ BMZ Algorithm
|
||||
At the end of 2003, professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] was
|
||||
finishing the second edition of his [book http://www.dcc.ufmg.br/algoritmos/].
|
||||
During the [book http://www.dcc.ufmg.br/algoritmos/] writing,
|
||||
professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] studied the problem of generating minimal perfect hash
|
||||
functions (if you are not familiarized with this problem, see [1][2]).
|
||||
professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] studied the problem of generating
|
||||
[minimal perfect hash functions concepts.html]
|
||||
(if you are not familiar with this problem, see [[1 #papers]][[2 #papers]]).
|
||||
Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] coded a modified version of
|
||||
the [CHM algorithm chm.html], which was proposed by
|
||||
Czech, Havas and Majewski and put it in his [book http://www.dcc.ufmg.br/algoritmos/].
|
||||
The [CHM algorithm chm.html] is based on acyclic random graphs to generate order preserving
|
||||
minimal perfect hash functions in linear time. Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio]
|
||||
Czech, Havas and Majewski, and put it in his [book http://www.dcc.ufmg.br/algoritmos/].
|
||||
The [CHM algorithm chm.html] is based on acyclic random graphs to generate
|
||||
[order preserving minimal perfect hash functions concepts.html] in linear time.
|
||||
Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio]
|
||||
asked himself: why must the random graph
|
||||
be acyclic? In the modified version available in his [book http://www.dcc.ufmg.br/algoritmos/] he got rid of such restriction.
|
||||
be acyclic? In the modified version available in his [book http://www.dcc.ufmg.br/algoritmos/] he got rid of this restriction.
|
||||
|
||||
The modification presented a problem: it was impossible to generate minimal perfect hash functions
|
||||
for sets with more than 1000 keys.
|
||||
@ -32,19 +34,38 @@ During the master, [Fabiano http://www.dcc.ufmg.br/~fbotelho] and
|
||||
In April 2004, [Fabiano http://www.dcc.ufmg.br/~fbotelho] was talking with a
|
||||
friend of his (David Menoti) about the problems
|
||||
and many ideas appeared.
|
||||
The ideas were implemented and we noticed that a very fast algorithm to generate
|
||||
The ideas were implemented and a very fast algorithm to generate
|
||||
minimal perfect hash functions had been designed.
|
||||
We refer to the algorithm as **BMZ**, because it was conceived by Fabiano C. **B**otelho
|
||||
David **M**enoti and Nivio **Z**iviani. The algorithm is described in [1].
|
||||
We refer to the algorithm as **BMZ**, because it was conceived by Fabiano C. **B**otelho,
|
||||
David **M**enoti and Nivio **Z**iviani. The algorithm is described in [[1 #papers]].
|
||||
To analyse the BMZ algorithm we needed some results from random graph theory, so
|
||||
we invited Professor [Yoshiharu Kohayakawa http://www.ime.usp.br/~yoshi] to help us.
|
||||
The final description and analysis of BMZ algorithm is presented in [2].
|
||||
The final description and analysis of the BMZ algorithm are presented in [[2 #papers]].
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==The Algorithm==
|
||||
|
||||
Let us show how the minimal perfect hash function [figs/img7.png] will be constructed.
|
||||
The BMZ algorithm shares several features with the [CHM algorithm chm.html].
|
||||
In particular, BMZ algorithm is also
|
||||
based on the generation of random graphs [figs/img27.png], where [figs/img28.png] is in
|
||||
one-to-one correspondence with the key set [figs/img20.png] for which we wish to
|
||||
generate a [minimal perfect hash function concepts.html].
|
||||
The two main differences between BMZ algorithm and CHM algorithm
|
||||
are as follows: (//i//) BMZ algorithm generates random
|
||||
graphs [figs/img27.png] with [figs/img29.png] and [figs/img30.png], where [figs/img31.png],
|
||||
and hence [figs/img32.png] necessarily contains cycles,
|
||||
while CHM algorithm generates //acyclic// random
|
||||
graphs [figs/img27.png] with [figs/img29.png] and [figs/img30.png],
|
||||
with a greater number of vertices: [figs/img33.png];
|
||||
(//ii//) CHM algorithm generates [order preserving minimal perfect hash functions concepts.html]
|
||||
while BMZ algorithm does not preserve order. Thus, BMZ algorithm improves
|
||||
the space requirement at the expense of generating functions that are not
|
||||
order preserving.
|
||||
|
||||
Suppose [figs/img14.png] is a universe of //keys//.
|
||||
Let [figs/img17.png] be a set of [figs/img8.png] keys from [figs/img14.png].
|
||||
Let us show how the BMZ algorithm constructs a minimal perfect hash function [figs/img7.png].
|
||||
We make use of two auxiliary random functions [figs/img41.png] and [figs/img55.png],
|
||||
where [figs/img56.png] for some suitably chosen integer [figs/img57.png],
|
||||
where [figs/img58.png]. We build a random graph [figs/img59.png] on [figs/img60.png],
|
||||
@ -54,7 +75,7 @@ key in the set of keys [figs/img20.png].
|
||||
In what follows, we shall be interested in the //2-core// of
|
||||
the random graph [figs/img32.png], that is, the maximal subgraph
|
||||
of [figs/img32.png] with minimal degree at
|
||||
least 2 (see, e.g., [2] for details).
|
||||
least 2 (see [[2 #papers]] for details).
|
||||
Because of its importance in our context, we call the 2-core the
|
||||
//critical// subgraph of [figs/img32.png] and denote it by [figs/img63.png].
|
||||
The vertices and edges in [figs/img63.png] are said to be //critical//.
|
||||
@ -65,7 +86,7 @@ We also let [figs/img67.png] be the set of all critical
|
||||
vertices that have at least one non-critical vertex as a neighbour.
|
||||
Let [figs/img68.png] be the set of //non-critical// edges in [figs/img32.png].
|
||||
Finally, we let [figs/img69.png] be the //non-critical// subgraph
|
||||
of [figs/img32.png.
|
||||
of [figs/img32.png].
|
||||
The non-critical subgraph [figs/img70.png] corresponds to the //acyclic part//
|
||||
of [figs/img32.png].
|
||||
We have [figs/img71.png].
|
||||
@ -74,33 +95,222 @@ We then construct a suitable labelling [figs/img72.png] of the vertices
|
||||
of [figs/img32.png]: we choose [figs/img73.png] for each [figs/img74.png] in such
|
||||
a way that [figs/img75.png] ([figs/img18.png]) is a
|
||||
minimal perfect hash function for [figs/img20.png].
|
||||
We will see later on that this labelling [figs/img37.png] can be found in linear time
|
||||
if the number of edges in [figs/img63.png] is at most [figs/img76.png].
|
||||
This labelling [figs/img37.png] can be found in linear time
|
||||
if the number of edges in [figs/img63.png] is at most [figs/img76.png] (see [[2 #papers]]
|
||||
for details).
|
||||
|
||||
Figure 2 presents a pseudo code for the algorithm.
|
||||
The procedure GenerateMPHF ([figs/img20.png], [figs/img37.png]) receives as input the set of
|
||||
Figure 1 presents pseudocode for the BMZ algorithm.
|
||||
The procedure BMZ ([figs/img20.png], [figs/img37.png]) receives as input the set of
|
||||
keys [figs/img20.png] and produces the labelling [figs/img37.png].
|
||||
The method uses a mapping, ordering and searching approach.
|
||||
We now describe each step.
|
||||
| procedure GenerateMPHF ([figs/img20.png], [figs/img37.png])
|
||||
| Mapping ([figs/img20.png], [figs/img32.png]);
|
||||
| Ordering ([figs/img32.png], [figs/img63.png], [figs/img70.png]);
|
||||
| Searching ([figs/img32.png], [figs/img63.png], [figs/img70.png], [figs/img37.png]);
|
||||
**Figure 2**: Main steps of the algorithm for constructing a minimal perfect hash function
|
||||
|
||||
===Mapping Step===
|
||||
|
||||
===Ordering Step===
|
||||
|
||||
===Searching Step===
|
||||
|
||||
====Assignment of Values to Critical Vertices====
|
||||
|
||||
====Assignment of Values to Non-Critical Vertices====
|
||||
| procedure BMZ ([figs/img20.png], [figs/img37.png])
|
||||
| Mapping ([figs/img20.png], [figs/img32.png]);
|
||||
| Ordering ([figs/img32.png], [figs/img63.png], [figs/img70.png]);
|
||||
| Searching ([figs/img32.png], [figs/img63.png], [figs/img70.png], [figs/img37.png]);
|
||||
| **Figure 1**: Main steps of BMZ algorithm for constructing a minimal perfect hash function
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==The Heuristic==
|
||||
===Mapping Step===
|
||||
|
||||
The procedure Mapping ([figs/img20.png], [figs/img32.png]) receives as input the set
|
||||
of keys [figs/img20.png] and generates the random graph [figs/img59.png], by generating
|
||||
two auxiliary functions [figs/img41.png], [figs/img78.png].
|
||||
|
||||
The functions [figs/img41.png] and [figs/img42.png] are constructed as follows.
|
||||
We impose some upper bound [figs/img79.png] on the lengths of the keys in [figs/img20.png].
|
||||
To define [figs/img80.png] ([figs/img81.png], [figs/img62.png]), we generate
|
||||
an [figs/img82.png] table of random integers [figs/img83.png].
|
||||
For a key [figs/img18.png] of length [figs/img84.png] and [figs/img85.png], we let
|
||||
|
||||
| [figs/img86.png]
|
||||
|
||||
The random graph [figs/img59.png] has vertex set [figs/img56.png] and
|
||||
edge set [figs/img61.png]. We need [figs/img32.png] to be
|
||||
simple, i.e., [figs/img32.png] should have neither loops nor multiple edges.
|
||||
A loop occurs when [figs/img87.png] for some [figs/img18.png].
|
||||
We solve this in an ad hoc manner: we simply let [figs/img88.png] in this case.
|
||||
If we still find a loop after this, we generate another pair [figs/img89.png].
|
||||
When a multiple edge occurs we abort and generate a new pair [figs/img89.png].
|
||||
Although the function above causes [collisions concepts.html] with probability //1/t//,
|
||||
in [cmph library index.html] we use faster hash
|
||||
functions ([DJB2 hash http://], [FNV hash http://], [Jenkins hash http://]
|
||||
and [SDBM hash http://]) in which we do not need to impose any upper bound [figs/img79.png] on the lengths of the keys in [figs/img20.png].
|
||||
|
||||
As mentioned before, for us to find the labelling [figs/img72.png] of the
|
||||
vertices of [figs/img59.png] in linear time,
|
||||
we require that [figs/img108.png].
|
||||
The crucial step now is to determine the value
|
||||
of [figs/img1.png] (in [figs/img57.png]) to obtain a random
|
||||
graph [figs/img71.png] with [figs/img109.png].
|
||||
Botelho, Menoti and Ziviani determined empirically in [[1 #papers]] that
|
||||
the value of [figs/img1.png] is //1.15//. This value is remarkably
|
||||
close to the theoretical value determined in [[2 #papers]],
|
||||
which is around [figs/img112.png].
|
||||
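As a rough illustration of the mapping step, the following Python sketch builds a simple random graph with one edge per key. It assumes that the number of vertices is the ceiling of //c// times the number of keys, and it uses a table-based hash in the spirit of the construction described above. The names ``random_table_hash`` and ``mapping``, the 256-column byte alphabet, and the policy of regenerating the pair of functions on any loop or multiple edge are assumptions made for illustration. The ad hoc loop repair mentioned above is not reproduced, and the cmph library itself uses other hash functions.

```python
import math
import random


def random_table_hash(rng, max_len, t):
    """Return h(key) = (sum of table[j][key[j]]) mod t for a fresh random table."""
    # One row per key position, one column per byte value (assumed alphabet).
    table = [[rng.randrange(t) for _ in range(256)] for _ in range(max_len)]

    def h(key: bytes) -> int:
        return sum(table[j][byte] for j, byte in enumerate(key)) % t

    return h


def mapping(keys, c=1.15, seed=0, max_tries=1000):
    """Try random pairs (h1, h2) until the induced graph is simple."""
    rng = random.Random(seed)
    t = math.ceil(c * len(keys))          # number of vertices (assumption)
    max_len = max(len(k) for k in keys)   # upper bound on key length
    for _ in range(max_tries):
        h1 = random_table_hash(rng, max_len, t)
        h2 = random_table_hash(rng, max_len, t)
        edges, seen = [], set()
        for key in keys:
            a, b = h1(key), h2(key)
            undirected = (min(a, b), max(a, b))
            if a == b or undirected in seen:   # loop or multiple edge
                break
            seen.add(undirected)
            edges.append((a, b))
        else:
            return t, edges               # one edge per key, in key order
    raise RuntimeError("no simple graph found; try another seed")


keys = [b"apple", b"banana", b"cherry", b"date", b"elder", b"fig"]
t, edges = mapping(keys)
```

Regenerating both functions on every failure keeps the sketch short; it matches the handling of multiple edges described above, but is stricter than the described handling of loops.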
|
||||
----------------------------------------
|
||||
|
||||
===Ordering Step===
|
||||
|
||||
The procedure Ordering ([figs/img32.png], [figs/img63.png], [figs/img70.png]) receives
|
||||
as input the graph [figs/img32.png] and partitions [figs/img32.png] into the two
|
||||
subgraphs [figs/img63.png] and [figs/img70.png], so that [figs/img71.png].
|
||||
|
||||
Figure 2 presents a sample graph with 9 vertices
|
||||
and 8 edges, where the degree of a vertex is shown beside each vertex.
|
||||
Initially, all vertices with degree 1 are added to a queue [figs/img136.png].
|
||||
For the example shown in Figure 2(a), [figs/img137.png] after the initialization step.
|
||||
|
||||
| [figs/img138.png]
|
||||
| **Figure 2:** Ordering step for a graph with 9 vertices and 8 edges.
|
||||
|
||||
Next, we remove one vertex [figs/img139.png] from the queue, decrement its degree and
|
||||
the degree of the vertices with degree greater than 0 in the adjacency
|
||||
list of [figs/img139.png], as depicted in Figure 2(b) for [figs/img140.png].
|
||||
At this point, the neighbours of [figs/img139.png] with degree 1 are
|
||||
inserted into the queue, such as vertex 1.
|
||||
This process is repeated until the queue becomes empty.
|
||||
All vertices with degree 0 are non-critical vertices and the others are
|
||||
critical vertices, as depicted in Figure 2(c).
|
||||
Finally, to determine the vertices in [figs/img141.png] we collect all
|
||||
vertices [figs/img142.png] with at least one vertex [figs/img143.png] that
|
||||
is in Adj[figs/img144.png] and in [figs/img145.png], such as vertex 8 in Figure 2(c).
|
||||
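The peeling process described above can be sketched in Python as follows. This is a minimal illustration, not the cmph implementation: the function name ``ordering``, the adjacency-list representation and the sample graph are assumptions, and the sample graph is not the one shown in Figure 2.

```python
from collections import deque


def ordering(num_vertices, edges):
    """Return (critical vertices, critical vertices with a non-critical neighbour)."""
    adj = [[] for _ in range(num_vertices)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    degree = [len(neighbours) for neighbours in adj]

    # Initially, all vertices with degree 1 are added to the queue.
    queue = deque(v for v in range(num_vertices) if degree[v] == 1)
    while queue:
        v = queue.popleft()
        if degree[v] != 1:        # already removed together with its last edge
            continue
        degree[v] = 0             # remove v ...
        for u in adj[v]:
            if degree[u] > 0:     # ... and its single remaining edge
                degree[u] -= 1
                if degree[u] == 1:
                    queue.append(u)

    # Vertices left with a positive degree form the 2-core.
    critical = {v for v in range(num_vertices) if degree[v] > 0}
    boundary = {v for v in critical if any(u not in critical for u in adj[v])}
    return critical, boundary


# Hypothetical graph: a 4-cycle 0-1-2-3, a tail 3-4-5, an isolated edge 6-7
# and an isolated vertex 8.
sample_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (6, 7)]
critical, boundary = ordering(9, sample_edges)
```

In this sample only the cycle survives the peeling, and vertex 3 is the single critical vertex that touches the acyclic part.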
|
||||
----------------------------------------
|
||||
|
||||
===Searching Step===
|
||||
|
||||
In the searching step, the key part is
|
||||
the //perfect assignment problem//: find [figs/img153.png] such that
|
||||
the function [figs/img154.png] defined by
|
||||
|
||||
| [figs/img155.png]
|
||||
|
||||
is a bijection from [figs/img156.png] to [figs/img157.png] (recall [figs/img158.png]).
|
||||
We are interested in a labelling [figs/img72.png] of
|
||||
the vertices of the graph [figs/img59.png] with
|
||||
the property that if [figs/img11.png] and [figs/img22.png] are keys
|
||||
in [figs/img20.png], then [figs/img159.png]; that is, if we associate
|
||||
to each edge the sum of the labels on its endpoints, then these values
|
||||
should be all distinct.
|
||||
Moreover, we require that all the sums [figs/img160.png] ([figs/img18.png])
|
||||
fall between [figs/img115.png] and [figs/img161.png], and thus we have a bijection
|
||||
between [figs/img20.png] and [figs/img157.png].
|
||||
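The requirement above can be checked mechanically. The sketch below tests whether a labelling gives every edge a distinct sum covering the addresses 0 to one less than the number of edges. The function name ``is_perfect_assignment`` and the tiny path graph are hypothetical, chosen only to illustrate the property.

```python
def is_perfect_assignment(edges, g):
    """True if the edge sums g[a] + g[b] are exactly 0, 1, ..., len(edges) - 1."""
    sums = sorted(g[a] + g[b] for a, b in edges)
    return sums == list(range(len(edges)))


# Path 0-1-2-3 with three edges: the sums are 0, 1 and 2.
path_edges = [(0, 1), (1, 2), (2, 3)]
good = is_perfect_assignment(path_edges, [0, 0, 1, 1])
bad = is_perfect_assignment(path_edges, [0, 0, 0, 0])   # all sums collide at 0
```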
|
||||
The procedure Searching ([figs/img32.png], [figs/img63.png], [figs/img70.png], [figs/img37.png])
|
||||
receives as input [figs/img32.png], [figs/img63.png], [figs/img70.png] and finds a
|
||||
suitable [figs/img162.png] bit value for each vertex [figs/img74.png], stored in the
|
||||
array [figs/img37.png].
|
||||
This step is first performed for the vertices in the
|
||||
critical subgraph [figs/img63.png] of [figs/img32.png] (the 2-core of [figs/img32.png])
|
||||
and then it is performed for the vertices in [figs/img70.png] (the non-critical subgraph
|
||||
of [figs/img32.png] that contains the "acyclic part" of [figs/img32.png]).
|
||||
The reason the assignment of the [figs/img37.png] values is first
|
||||
performed on the vertices in [figs/img63.png] is to resolve reassignments
|
||||
as early as possible (such reassignments are consequences of the cycles
|
||||
in [figs/img63.png] and are described below).
|
||||
|
||||
----------------------------------------
|
||||
|
||||
====Assignment of Values to Critical Vertices====
|
||||
|
||||
The labels [figs/img73.png] ([figs/img142.png])
|
||||
are assigned in increasing order following a greedy
|
||||
strategy where the critical vertices [figs/img139.png] are considered one at a time,
|
||||
according to a breadth-first search on [figs/img63.png].
|
||||
If a candidate value [figs/img11.png] for [figs/img73.png] is forbidden
|
||||
because setting [figs/img163.png] would create two edges with the same sum,
|
||||
we try [figs/img164.png] for [figs/img73.png]. This step is referred to
|
||||
as a //reassignment//.
|
||||
|
||||
Let [figs/img165.png] be the set of addresses assigned to edges in [figs/img166.png].
|
||||
Initially [figs/img167.png].
|
||||
Let [figs/img11.png] be a candidate value for [figs/img73.png].
|
||||
Initially [figs/img168.png].
|
||||
Considering the subgraph [figs/img63.png] in Figure 2(c),
|
||||
a step by step example of the assignment of values to vertices in [figs/img63.png] is
|
||||
presented in Figure 3.
|
||||
Initially, a vertex [figs/img139.png] is chosen, the assignment [figs/img163.png] is made
|
||||
and [figs/img11.png] is set to [figs/img164.png].
|
||||
For example, suppose that vertex [figs/img169.png] in Figure 3(a) is
|
||||
chosen, the assignment [figs/img170.png] is made and [figs/img11.png] is set to [figs/img96.png].
|
||||
|
||||
| [figs/img171.png]
|
||||
| **Figure 3:** Example of the assignment of values to critical vertices.
|
||||
|
||||
In Figure 3(b), following the adjacency list of vertex [figs/img169.png],
|
||||
the unassigned vertex [figs/img115.png] is reached.
|
||||
At this point, we collect in the temporary variable [figs/img172.png] all adjacencies
|
||||
of vertex [figs/img115.png] that have been assigned an [figs/img11.png] value,
|
||||
and [figs/img173.png].
|
||||
Next, for all [figs/img174.png], we check if [figs/img175.png].
|
||||
Since [figs/img176.png], then [figs/img177.png] is set
|
||||
to [figs/img96.png], [figs/img11.png] is incremented
|
||||
by 1 (now [figs/img178.png]) and [figs/img179.png].
|
||||
Next, vertex [figs/img180.png] is reached, [figs/img181.png] is set
|
||||
to [figs/img62.png], [figs/img11.png] is set to [figs/img180.png] and [figs/img182.png].
|
||||
Next, vertex [figs/img183.png] is reached and [figs/img184.png].
|
||||
Since [figs/img185.png] and [figs/img186.png], then [figs/img187.png] is
|
||||
set to [figs/img180.png], [figs/img11.png] is set to [figs/img183.png] and [figs/img188.png].
|
||||
Finally, vertex [figs/img189.png] is reached and [figs/img190.png].
|
||||
Since [figs/img191.png], [figs/img11.png] is incremented by 1 and set to 5, as depicted in
|
||||
Figure 3(c).
|
||||
Since [figs/img192.png], [figs/img11.png] is again incremented by 1 and set to 6,
|
||||
as depicted in Figure 3(d).
|
||||
These two reassignments are indicated by the arrows in Figure 3.
|
||||
Since [figs/img193.png] and [figs/img194.png], then [figs/img195.png] is set
|
||||
to [figs/img196.png] and [figs/img197.png]. This finishes the algorithm.
|
||||
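The greedy strategy above can be sketched in Python as follows. This is an illustrative reading of the description, not the cmph code: the name ``assign_critical``, the dictionary-based adjacency lists and the sample graph (a 4-cycle with a chord, not the graph of Figure 3) are assumptions. The sketch only guarantees that critical edges receive distinct sums; it does not check that the sums stay within the address range, which depends on the bound on the number of critical edges discussed earlier.

```python
from collections import deque


def assign_critical(adj, critical):
    """Label critical vertices in BFS order so that edge sums are distinct."""
    g = {}          # vertex -> label
    used = set()    # addresses (edge sums) already assigned
    x = 0           # next candidate label
    for root in sorted(critical):
        if root in g:
            continue
        g[root] = x
        x += 1
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in critical or u in g:
                    continue
                # Critical neighbours of u that already carry a label.
                assigned = [w for w in adj[u] if w in g]
                # Reassignment: skip candidates that would repeat a sum.
                while any(g[w] + x in used for w in assigned):
                    x += 1
                g[u] = x
                used.update(g[w] + x for w in assigned)
                x += 1
                queue.append(u)
    return g, used


# Hypothetical critical subgraph: cycle 0-1-2-3 plus the chord 0-2.
crit_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adj = {v: [] for v in range(4)}
for a, b in crit_edges:
    adj[a].append(b)
    adj[b].append(a)
g, used = assign_critical(adj, {0, 1, 2, 3})
```

Because the candidate label only ever grows, all labels are distinct, so the sums created in a single step cannot collide with one another; only collisions with previously assigned addresses need to be tested.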
|
||||
----------------------------------------
|
||||
|
||||
====Assignment of Values to Non-Critical Vertices====
|
||||
|
||||
As [figs/img70.png] is acyclic, we can impose the order in which addresses are
|
||||
associated with edges in [figs/img70.png], making this step simple to solve
|
||||
by a standard depth-first search algorithm.
|
||||
Therefore, in the assignment of values to vertices in [figs/img70.png] we
|
||||
benefit from the unused addresses in the gaps left by the assignment of values
|
||||
to vertices in [figs/img63.png].
|
||||
For that, we start the depth-first search from the vertices in [figs/img141.png] because
|
||||
the [figs/img37.png] values for these critical vertices were already assigned
|
||||
and cannot be changed.
|
||||
|
||||
Considering the subgraph [figs/img70.png] in Figure 2(c),
|
||||
a step by step example of the assignment of values to vertices in [figs/img70.png] is
|
||||
presented in Figure 4.
|
||||
Figure 4(a) presents the initial state of the algorithm.
|
||||
The critical vertex 8 is the only one that has non-critical vertices as
|
||||
neighbours.
|
||||
In the example presented in Figure 3, the addresses [figs/img198.png] were not used.
|
||||
So, taking the first unused address [figs/img115.png] and the vertex [figs/img96.png],
|
||||
which is reached from the vertex [figs/img169.png], [figs/img199.png] is set
|
||||
to [figs/img200.png], as shown in Figure 4(b).
|
||||
The only vertex that is reached from vertex [figs/img96.png] is vertex [figs/img62.png], so
|
||||
taking the unused address [figs/img183.png] we set [figs/img201.png] to [figs/img202.png],
|
||||
as shown in Figure 4(c).
|
||||
This process is repeated until the UnAssignedAddresses list becomes empty.
|
||||
|
||||
| [figs/img203.png]
|
||||
| **Figure 4:** Example of the assignment of values to non-critical vertices.
|
||||
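A possible Python sketch of this step is given below. It assumes that labels may become negative, since each new label is an unused address minus the label of the vertex it was reached from; the function name ``assign_non_critical``, the handling of acyclic components without any critical vertex (their root receives label 0) and the sample graph are assumptions, and the graph is not the one in Figure 4.

```python
def assign_non_critical(adj, g_critical, used, num_edges):
    """Extend the critical labelling to the acyclic part by depth-first search."""
    g = dict(g_critical)
    # Unused addresses, in increasing order.
    free = (a for a in range(num_edges) if a not in used)
    # Start from critical vertices that have a non-critical neighbour.
    stack = [v for v in g_critical if any(u not in g for u in adj[v])]
    # Candidate roots for acyclic components with no critical vertex.
    pending = [v for v in adj if v not in g and adj[v]]
    while stack or pending:
        if not stack:
            root = pending.pop()
            if root in g:
                continue
            g[root] = 0
            stack.append(root)
        v = stack.pop()
        for u in adj[v]:
            if u not in g:
                g[u] = next(free) - g[v]   # edge (v, u) receives this address
                stack.append(u)
    return g


# Hypothetical graph: critical triangle 0-1-2 with labels 0, 1, 2
# (sums 1, 2, 3) and a non-critical tail 2-3-4.
all_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = {v: [] for v in range(5)}
for a, b in all_edges:
    adj[a].append(b)
    adj[b].append(a)
g = assign_non_critical(adj, {0: 0, 1: 1, 2: 2}, {1, 2, 3}, len(all_edges))
```

In this example the unused addresses 0 and 4 fill the gaps left by the critical edges, so the five edge sums cover the addresses 0 to 4.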
|
||||
----------------------------------------
|
||||
|
||||
==The Heuristic==[heuristic]
|
||||
|
||||
We now present a heuristic for the BMZ algorithm that
|
||||
reduces the value of [figs/img1.png] to any given value between //1.15// and //0.93//.
|
||||
This reduces the space requirement to store the resulting function
|
||||
to any given value between [figs/img12.png] words and [figs/img13.png] words.
|
||||
The heuristic reuses, when possible, the set
|
||||
of [figs/img11.png] values that caused reassignments, just before
|
||||
trying [figs/img164.png].
|
||||
Decreasing the value of [figs/img1.png] leads to an increase in the number of
|
||||
iterations to generate [figs/img32.png].
|
||||
For example, for [figs/img244.png] and [figs/img6.png], the analytical expected numbers
|
||||
of iterations are [figs/img245.png] and [figs/img246.png], respectively (see [[2 #papers]]
|
||||
for details),
|
||||
while for [figs/img128.png] the same value is around //2.13//.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
@ -121,9 +331,10 @@ following:
|
||||
of 4 bytes that represent the vertices. As there are //n// edges, the
|
||||
vector edges is stored in //8n// bytes.
|
||||
|
||||
+ **next**: given a vertex //v//, we can discover the edges that contain //v//
|
||||
following its list of edges, which starts on first[//v//] and the next
|
||||
edges are given by next[...first[//v//]...]. Therefore, the vectors first and next represent
|
||||
+ **next**: given a vertex [figs/img139.png], we can discover the edges that
|
||||
contain [figs/img139.png] following its list of edges,
|
||||
which starts on first[[figs/img139.png]] and the next
|
||||
edges are given by next[...first[[figs/img139.png]]...]. Therefore, the vectors first and next represent
|
||||
the linked lists of edges of each vertex. As there are two vertices for each edge,
|
||||
when an edge is inserted in the graph, it must be inserted in the two linked lists
|
||||
of the vertices in its composition. Therefore, there are //2n// entries of integer
|
||||
@ -140,8 +351,8 @@ following:
|
||||
- Other auxiliary structures
|
||||
+ **queue**: is a queue of integer numbers used in the breadth-first search of the
|
||||
assignment of values to critical vertices. There is an entry in the queue for
|
||||
each two critical vertices. Let //|Vcrit|// be the expected number of critical
|
||||
vertices. Therefore, the queue is stored in //4*0.5*|Vcrit|=2|Vcrit|//.
|
||||
each two critical vertices. Let [figs/img110.png] be the expected number of critical
|
||||
vertices. Therefore, the queue is stored in //4*0.5*[figs/img110.png]=2[figs/img110.png]//.
|
||||
|
||||
+ **visited**: is a vector of //cn// bits, where each bit indicates if the g value of
|
||||
a given vertex was already defined. Therefore, the vector visited is stored
|
||||
@ -153,12 +364,15 @@ following:
|
||||
|
||||
|
||||
Thus, the total memory consumption of BMZ algorithm for generating a minimal
|
||||
perfect hash function (MPHF) is: //(8.25c + 16.125)n +2|Vcrit| + O(1)// bytes.
|
||||
perfect hash function (MPHF) is: //(8.25c + 16.125)n +2[figs/img110.png] + O(1)// bytes.
|
||||
As the constant //c// may be 1.15 or 0.93, we have:
|
||||
|| //c// | //|Vcrit|// | Memory consumption to generate a MPHF |
|
||||
|| //c// | [figs/img110.png] | Memory consumption to generate a MPHF |
|
||||
| 0.93 | //0.497n// | //24.80n + O(1)// |
|
||||
| 1.15 | //0.401n// | //26.42n + O(1)// |
|
||||
The values of |Vcrit| were calculated using Eq.(1) presented in [2].
|
||||
|
||||
| **Table 1:** Memory consumption to generate a MPHF using the BMZ algorithm.
|
||||
|
||||
The values of [figs/img110.png] were calculated using Eq.(1) presented in [[2 #papers]].
|
||||
|
||||
Now we present the memory consumption to store the resulting function.
|
||||
We only need to store the //g// function. Thus, we need //4cn// bytes.
|
||||
@ -166,10 +380,17 @@ Again we have:
|
||||
|| //c// | Memory consumption to store a MPHF |
|
||||
| 0.93 | //3.72n// |
|
||||
| 1.15 | //4.60n// |
|
||||
|
||||
|
||||
| **Table 2:** Memory consumption to store a MPHF generated by the BMZ algorithm.
|
||||
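The per-key figures in Tables 1 and 2 can be reproduced from the two formulas above. The short Python sketch below evaluates them; the function names are hypothetical. The generation formula gives 24.7915 and 26.4145 bytes per key, so the table entries 24.80 and 26.42 appear to be rounded upward.

```python
def generation_bytes_per_key(c, vcrit_per_key):
    """(8.25c + 16.125) + 2|Vcrit|/n, ignoring the O(1) term."""
    return 8.25 * c + 16.125 + 2 * vcrit_per_key


def storage_bytes_per_key(c):
    """4c bytes per key to store the g function."""
    return 4 * c


# (c, expected |Vcrit| / n) pairs taken from Table 1.
for c, vcrit in [(0.93, 0.497), (1.15, 0.401)]:
    print(c, generation_bytes_per_key(c, vcrit), storage_bytes_per_key(c))
```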
----------------------------------------
|
||||
|
||||
==Papers==
|
||||
==Experimental Results==
|
||||
|
||||
[CHM x BMZ comparison.html]
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==Papers==[papers]
|
||||
|
||||
+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], D. Menoti, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A New algorithm for constructing minimal perfect hash functions papers/bmz_tr004_04.ps], Technical Report TR004/04, Department of Computer Science, Federal University of Minas Gerais, 2004.
|
||||
|
||||
@ -177,7 +398,7 @@ Again we have:
|
||||
|
||||
|
||||
----------------------------------------
|
||||
[Home index.html]
|
||||
| [Home index.html] | [CHM chm.html] | [BMZ bmz.html]
|
||||
----------------------------------------
|
||||
|
||||
%!include: FOOTER.t2t
|
||||
|
24
CHM.t2t
@ -4,8 +4,11 @@ CHM Algorithm
|
||||
%!includeconf: CONFIG.t2t
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==The Algorithm==
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==Memory Consumption==
|
||||
|
||||
Now we detail the memory consumption to generate and to store minimal perfect hash functions
|
||||
@ -23,9 +26,11 @@ following:
|
||||
of 4 bytes that represent the vertices. As there are //n// edges, the
|
||||
vector edges is stored in //8n// bytes.
|
||||
|
||||
+ **next**: given a vertex //v//, we can discover the edges that contain //v//
|
||||
following its list of edges, which starts on first[//v//] and the next
|
||||
edges are given by next[...first[//v//]...]. Therefore, the vectors first and next represent
|
||||
+ **next**: given a vertex [figs/img139.png], we can discover the edges that
|
||||
contain [figs/img139.png] following its list of edges, which starts on
|
||||
first[[figs/img139.png]] and the next
|
||||
edges are given by next[...first[[figs/img139.png]]...]. Therefore,
|
||||
the vectors first and next represent
|
||||
the linked lists of edges of each vertex. As there are two vertices for each edge,
|
||||
when an edge is inserted in the graph, it must be inserted in the two linked lists
|
||||
of the vertices in its composition. Therefore, there are //2n// entries of integer
|
||||
@ -47,12 +52,23 @@ As the value of constant //c// must be at least 2.09 we have:
|
||||
|| //c// | Memory consumption to generate a MPHF |
|
||||
| 2.09 | //33.00n + O(1)// |
|
||||
|
||||
| **Table 1:** Memory consumption to generate a MPHF using the CHM algorithm.
|
||||
|
||||
Now we present the memory consumption to store the resulting function.
|
||||
We only need to store the //g// function. Thus, we need //4cn// bytes.
|
||||
Again we have:
|
||||
|| //c// | Memory consumption to store a MPHF |
|
||||
| 2.09 | //8.36n// |
|
||||
|
||||
| **Table 2:** Memory consumption to store a MPHF generated by the CHM algorithm.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==Experimental Results==
|
||||
|
||||
[CHM x BMZ comparison.html]
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==Papers==
|
||||
|
||||
@ -66,7 +82,7 @@ Again we have:
|
||||
|
||||
|
||||
----------------------------------------
|
||||
[Home index.html]
|
||||
| [Home index.html] | [CHM chm.html] | [BMZ bmz.html]
|
||||
----------------------------------------
|
||||
|
||||
%!include: FOOTER.t2t
|
||||
|
105
COMPARISON.t2t
@ -5,17 +5,106 @@ Comparison Between BMZ And CHM Algorithms
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==Features==
|
||||
==Characteristics==
|
||||
Table 1 presents the main characteristics of the two algorithms.
|
||||
The number of edges in the graph [figs/img27.png] is [figs/img236.png],
|
||||
the number of keys in the input set [figs/img20.png].
|
||||
The number of vertices of [figs/img32.png] is equal
|
||||
to [figs/img12.png] and [figs/img237.png] for BMZ algorithm and the CHM algorithm, respectively.
|
||||
This measure is related to the amount of space to store the array [figs/img37.png].
|
||||
This improves the space required to store a function in BMZ algorithm to [figs/img238.png] of the space required by the CHM algorithm.
|
||||
The number of critical edges is [figs/img76.png] and 0, for BMZ algorithm and the CHM algorithm,
|
||||
respectively.
|
||||
BMZ algorithm generates random graphs that necessarily contain cycles and the
|
||||
CHM algorithm
|
||||
generates
|
||||
acyclic random graphs.
|
||||
Finally, the CHM algorithm generates [order preserving functions concepts.html]
|
||||
while BMZ algorithm does not preserve order.
|
||||
|
||||
==Constructing Minimal Perfect Hash Functions==
|
||||
|
||||
==Memory Consumption==
|
||||
|
||||
|
||||
==Run times==
|
||||
%!include(html): ''TABLE1.t2t''
|
||||
| **Table 1:** Main characteristics of the algorithms.
|
||||
|
||||
----------------------------------------
|
||||
[Home index.html]
|
||||
|
||||
==Memory Consumption==
|
||||
|
||||
- Memory consumption to generate the minimal perfect hash function (MPHF):
|
||||
|| Algorithm | //c// | Memory consumption to generate a MPHF |
|
||||
| BMZ | 0.93 | //24.80n + O(1)// |
|
||||
| BMZ | 1.15 | //26.42n + O(1)// |
|
||||
| CHM | 2.09 | //33.00n + O(1)// |
|
||||
|
||||
| **Table 2:** Memory consumption to generate a MPHF using the algorithms BMZ and CHM.
|
||||
|
||||
- Memory consumption to store the resulting minimal perfect hash function (MPHF):
|
||||
|| Algorithm | //c// | Memory consumption to store a MPHF |
|
||||
| BMZ | 0.93 | //3.72n// |
|
||||
| BMZ | 1.15 | //4.60n// |
|
||||
| CHM | 2.09 | //8.36n// |
|
||||
|
||||
| **Table 3:** Memory consumption to store a MPHF generated by the algorithms BMZ and CHM.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
==Run times==
|
||||
We now present some experimental results to compare the BMZ and CHM algorithms.
|
||||
The data consists of a collection of 100 million uniform resource locators
|
||||
(URLs) collected from the Web.
|
||||
The average length of a URL in the collection is 63 bytes.
|
||||
All experiments were carried out on
|
||||
a computer running the Linux operating system, version 2.6.7,
|
||||
with a 2.4 gigahertz processor and
|
||||
4 gigabytes of main memory.
|
||||
|
||||
Table 4 presents time measurements.
|
||||
All times are in seconds.
|
||||
The table entries represent averages over 50 trials.
|
||||
The column labelled as [figs/img243.png] represents
|
||||
the number of iterations to generate the random graph [figs/img32.png] in the
|
||||
mapping step of the algorithms.
|
||||
The next columns represent the run times
|
||||
for the mapping plus ordering steps together and the searching
|
||||
step for each algorithm.
|
||||
The last column represents the percent gain of our algorithm
|
||||
over the CHM algorithm.
|
||||
|
||||
%!include(html): ''TABLE4.t2t''
|
||||
| **Table 4:** Time measurements for BMZ and the CHM algorithm.
|
||||
|
||||
The mapping step of the BMZ algorithm is faster because
|
||||
the expected numbers of iterations in the mapping step to generate [figs/img32.png] are
|
||||
2.13 and 2.92 for BMZ algorithm and the CHM algorithm, respectively
|
||||
(see [[2 bmz.html#papers]] for details).
|
||||
The graph [figs/img32.png] generated by BMZ algorithm
|
||||
has [figs/img12.png] vertices, against [figs/img237.png] for the CHM algorithm.
|
||||
These two facts make BMZ algorithm faster in the mapping step.
|
||||
The time of the ordering step of the BMZ algorithm is approximately equal to
|
||||
the time to check if [figs/img32.png] is acyclic for the CHM algorithm.
|
||||
The searching step of the CHM algorithm is faster, but the total
|
||||
time of BMZ algorithm is, on average, approximately 59 % faster
|
||||
than the CHM algorithm.
|
||||
It is important to notice the times for the searching step:
|
||||
for both algorithms they are not the dominant times,
|
||||
and the experimental results clearly show
|
||||
a linear behavior for the searching step.
|
||||
|
||||
We now present run times for BMZ algorithm using a [heuristic bmz.html#heuristic] that
|
||||
reduces the space requirement
|
||||
to any given value between [figs/img12.png] words and [figs/img13.png] words.
|
||||
For example, for [figs/img244.png] and [figs/img6.png], the analytical expected numbers
|
||||
of iterations are [figs/img245.png] and [figs/img246.png], respectively
|
||||
(for [figs/img247.png], the number of iterations are 2.78 for [figs/img244.png] and 3.04
|
||||
for [figs/img6.png]).
|
||||
Table 5 presents the total times to construct a
|
||||
function for [figs/img247.png], with an increase from [figs/img248.png] seconds
|
||||
for [figs/img128.png] (see Table 4) to [figs/img249.png] seconds for [figs/img244.png] and
|
||||
to [figs/img250.png] seconds for [figs/img6.png].
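
To put the time cost of the heuristic in relative terms, the sketch below compares the Table 5 totals with the BMZ total for the same number of keys in Table 4. It assumes that the 12,500,000-key row of Table 4 is the baseline the text refers to; the labels ``c=1.00`` and ``c=0.93`` are taken from the Table 5 column headers, and ``overhead_percent`` is an illustrative name.

```python
# Total construction times in seconds for n = 12,500,000.
baseline_total = 86.30  # BMZ row of Table 4 (assumed to be the baseline)
tuned_totals = {"c=1.00": 101.74, "c=0.93": 102.19}  # Table 5


def overhead_percent(tuned_total, baseline):
    """Relative increase in total time caused by the space-saving heuristic."""
    return 100.0 * (tuned_total - baseline) / baseline


overheads = {
    label: overhead_percent(total, baseline_total)
    for label, total in tuned_totals.items()
}

for label, overhead in overheads.items():
    print(f"{label}: {overhead:.1f}% more time than the baseline")
```

With these figures the heuristic costs roughly 18% more construction time in both settings.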
|
||||
|
||||
%!include(html): ''TABLE5.t2t''
|
||||
| **Table 5:** Time measurements for BMZ tuned algorithm with [figs/img5.png] and [figs/img6.png].
|
||||
----------------------------------------
|
||||
| [Home index.html] | [CHM chm.html] | [BMZ bmz.html]
|
||||
----------------------------------------
|
||||
|
||||
%!include: FOOTER.t2t
|
||||
|
56
CONCEPTS.t2t
Normal file
@ -0,0 +1,56 @@
|
||||
Minimal Perfect Hash Functions - Introduction
|
||||
|
||||
|
||||
%!includeconf: CONFIG.t2t
|
||||
|
||||
----------------------------------------
|
||||
==Basic Concepts==
|
||||
|
||||
Suppose [figs/img14.png] is a universe of //keys//.
Let [figs/img15.png] be a //hash function// that maps the keys from [figs/img14.png] to a given interval of integers [figs/img16.png].
Let [figs/img17.png] be a set of [figs/img8.png] keys from [figs/img14.png].
Given a key [figs/img18.png], the hash function [figs/img7.png] computes an
integer in [figs/img19.png] for the storage or retrieval of [figs/img11.png] in
a //hash table//.
Hashing methods for //non-static sets// of keys can be used to construct
data structures storing [figs/img20.png] and supporting membership queries
"[figs/img18.png]?" in expected time [figs/img21.png].
However, they waste space on unused
locations in the table and waste time resolving collisions when
two keys are hashed to the same table location.
|
||||
|
||||
For //static sets// of keys it is possible to compute a function
that finds any key in a table in one probe; such hash functions are called
//perfect//.
More precisely, given a set of keys [figs/img20.png], we say that a
hash function [figs/img15.png] is a //perfect hash function//
for [figs/img20.png] if [figs/img7.png] is an injection on [figs/img20.png],
that is, there are no //collisions// among the keys in [figs/img20.png]:
if [figs/img11.png] and [figs/img22.png] are in [figs/img20.png] and [figs/img23.png],
then [figs/img24.png].
Figure 1(a) illustrates a perfect hash function.
Since no collisions occur, each key can be retrieved from the table
with a single probe.
If [figs/img25.png], that is, the table has the same size as [figs/img20.png],
then we say that [figs/img7.png] is a //minimal perfect hash function//
for [figs/img20.png].
Figure 1(b) illustrates a minimal perfect hash function.
Minimal perfect hash functions completely avoid the problem of wasted
space and time. A perfect hash function [figs/img7.png] is //order preserving//
if the keys in [figs/img20.png] are arranged in some given order
and [figs/img7.png] preserves this order in the hash table.
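
The three definitions above can be checked mechanically on a small key set. The sketch below is only an illustration: the key list and the toy functions ``sparse``, ``dense`` and ``ordered`` are hypothetical lookup tables, not output of the cmph library, and the order preserving test assumes the given order is the order of the key list.

```python
def is_perfect(h, keys):
    """True if h is an injection on keys, i.e. there are no collisions."""
    values = [h(k) for k in keys]
    return len(set(values)) == len(values)


def is_minimal_perfect(h, keys):
    """True if h maps the n keys one-to-one onto 0, 1, ..., n-1."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))


def is_order_preserving(h, keys):
    """True if h is perfect and its values follow the order of the key list."""
    values = [h(k) for k in keys]
    return is_perfect(h, keys) and values == sorted(values)


keys = ["jan", "feb", "mar", "apr"]

# Hypothetical toy hash functions, written as lookup tables.
sparse = {"jan": 5, "feb": 0, "mar": 3, "apr": 7}   # perfect, table of size 8
dense = {"jan": 2, "feb": 0, "mar": 3, "apr": 1}    # minimal perfect
ordered = {"jan": 0, "feb": 1, "mar": 2, "apr": 3}  # order preserving

print(is_perfect(len, keys))                      # string length collides
print(is_perfect(sparse.get, keys), is_minimal_perfect(sparse.get, keys))
print(is_minimal_perfect(dense.get, keys), is_order_preserving(dense.get, keys))
print(is_minimal_perfect(ordered.get, keys), is_order_preserving(ordered.get, keys))
```

Using the string length as a hash function fails the first test because all four keys have length 3, which is the kind of collision a perfect hash function rules out.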
|
||||
|
||||
| [figs/img26.png]
|
||||
| **Figure 1:** (a) Perfect hash function. (b) Minimal perfect hash function.
|
||||
|
||||
Minimal perfect hash functions are widely used for memory-efficient
storage and fast retrieval of items from static sets, such as words in natural
languages, reserved words in programming languages or interactive systems,
uniform resource locators (URLs) in Web search engines, or item sets in
data mining techniques.
|
||||
|
||||
----------------------------------------
|
||||
| [Home index.html] | [CHM chm.html] | [BMZ bmz.html]
|
||||
----------------------------------------
|
||||
|
||||
%!include: FOOTER.t2t
|
42
CONFIG.t2t
@ -1,4 +1,46 @@
|
||||
%! style(html): DOC.css
|
||||
%! PreProc(html): '^%html% ' ''
|
||||
%! PreProc(txt): '^%txt% ' ''
|
||||
%! PostProc(html): "&amp;" "&"
|
||||
%! PostProc(txt): " " " "
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img7.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img7.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img57.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img57.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img32.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img32.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img20.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img20.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img60.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img60.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img62.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img62.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img79.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img79.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img139.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img139.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img140.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img140.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img143.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img143.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img115.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img115.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img11.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img11.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img169.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img169.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img96.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img96.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img178.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img178.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img180.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img180.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img183.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img183.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img189.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img189.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img196.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img196.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img172.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img172.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img8.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img8.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img1.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img1.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img14.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img14.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img128.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img128.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img112.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img112.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img12.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img12.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img13.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img13.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img244.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img244.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img245.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img245.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img246.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img246.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img15.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img15.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img25.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img25.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img168.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img168.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img6.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img6.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img5.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img5.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img28.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img28.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img237.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img237.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img248.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img248.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img249.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img249.png"\1>'
|
||||
%! PostProc(html): 'ALIGN="middle" SRC="figs/img250.png"(.*?)>' 'ALIGN="bottom" SRC="figs/img250.png"\1>'
|
||||
|
33
DOC.css
Normal file
@ -0,0 +1,33 @@
|
||||
/* implement both fixed-size and relative sizes */
|
||||
SMALL.XTINY { }
|
||||
SMALL.TINY { }
|
||||
SMALL.SCRIPTSIZE { }
|
||||
BODY { font-size: 13px }
|
||||
TD { font-size: 13px }
|
||||
SMALL.FOOTNOTESIZE { font-size: 13px }
|
||||
SMALL.SMALL { }
|
||||
BIG.LARGE { }
|
||||
BIG.XLARGE { }
|
||||
BIG.XXLARGE { }
|
||||
BIG.HUGE { }
|
||||
BIG.XHUGE { }
|
||||
|
||||
/* heading styles */
|
||||
H1 { }
|
||||
H2 { }
|
||||
H3 { }
|
||||
H4 { }
|
||||
H5 { }
|
||||
|
||||
|
||||
/* mathematics styles */
|
||||
DIV.displaymath { } /* math displays */
|
||||
TD.eqno { } /* equation-number cells */
|
||||
|
||||
|
||||
/* document-specific styles come next */
|
||||
DIV.navigation { }
|
||||
DIV.center { }
|
||||
SPAN.textit { font-style: italic }
|
||||
SPAN.arabic { }
|
||||
SPAN.eqn-number { }
|
3
FAQ.t2t
@ -1,6 +1,7 @@
|
||||
CMPH FAQ
|
||||
|
||||
|
||||
%!includeconf: CONFIG.t2t
|
||||
|
||||
- How do I define the ids of the keys?
|
||||
- You don't. The ids will be assigned by the algorithm creating the minimal
|
||||
@ -26,7 +27,7 @@ one is executed?
|
||||
|
||||
|
||||
----------------------------------------
|
||||
[Home index.html]
|
||||
| [Home index.html] | [CHM chm.html] | [BMZ bmz.html]
|
||||
----------------------------------------
|
||||
|
||||
%!include: FOOTER.t2t
|
||||
|
@ -1,6 +1,7 @@
|
||||
GPERF versus CMPH
|
||||
|
||||
|
||||
%!includeconf: CONFIG.t2t
|
||||
|
||||
You might ask why cmph if [gperf http://www.gnu.org/software/gperf/gperf.html]
|
||||
already works perfectly. Actually, gperf and cmph have different goals.
|
||||
@ -32,7 +33,7 @@ assigning ids to millions of documents), while the former is usually found in
|
||||
the compiler programming area (detect reserved keywords).
|
||||
|
||||
----------------------------------------
|
||||
[Home index.html]
|
||||
| [Home index.html] | [CHM chm.html] | [BMZ bmz.html]
|
||||
----------------------------------------
|
||||
|
||||
%!include: FOOTER.t2t
|
||||
|
1
LOGO.t2t
Normal file
@ -0,0 +1 @@
|
||||
<a href="http://sourceforge.net"><img src="http://sourceforge.net/sflogo.php?group_id=96251&type=1" width="88" height="31" border="0" alt="SourceForge.net Logo" /> </a>
|
@ -8,7 +8,8 @@ CMPH - C Minimal Perfect Hashing Library
|
||||
==Description==
|
||||
|
||||
C Minimal Perfect Hashing Library is a portable LGPLed library to create and
|
||||
to work with minimal perfect hash functions. The cmph library encapsulates the newest
|
||||
to work with [minimal perfect hash functions concepts.html].
|
||||
The cmph library encapsulates the newest
|
||||
and more efficient algorithms (available in the literature) in an easy-to-use,
|
||||
production-quality and fast API. The library is designed to work with big entries that
|
||||
can not fit in the main memory. It has been used successfully for constructing minimal perfect
|
||||
@ -54,7 +55,7 @@ of the distinguishable features of cmph:
|
||||
|
||||
- A new heuristic added to the BMZ algorithm makes it possible to generate an MPHF with only
|
||||
//24.6n + O(1)// bytes. The resulting function can be stored in //3.72n// bytes.
|
||||
%html% [click here bmz.html] for details.
|
||||
%html% [click here bmz.html#heuristic] for details.
|
||||
|
||||
|
||||
----------------------------------------
|
||||
@ -173,5 +174,5 @@ Code is under the LGPL.
|
||||
|
||||
%!include: FOOTER.t2t
|
||||
|
||||
%!include(html): ''LOGO.html''
|
||||
%!include(html): ''LOGO.t2t''
|
||||
Last Updated: %%date(%c)
|
||||
|
76
TABLE1.t2t
Normal file
@ -0,0 +1,76 @@
|
||||
<TABLE CELLPADDING=3 BORDER="1" ALIGN="center">
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
Characteristics </SMALL></TD>
|
||||
<TD ALIGN="CENTER" COLSPAN=2><SMALL CLASS="FOOTNOTESIZE"> <SPAN>Algorithms</SPAN></SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
|
||||
</SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> BMZ </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> CHM </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="11" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img1.png"
|
||||
ALT="$c$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 1.15 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.09 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="50" HEIGHT="32" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img239.png"
|
||||
ALT="$\vert E(G)\vert$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img8.png"
|
||||
ALT="$n$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img8.png"
|
||||
ALT="$n$"></SPAN> </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="89" HEIGHT="32" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img240.png"
|
||||
ALT="$\vert V(G)\vert=\vert g\vert$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="20" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img241.png"
|
||||
ALT="$cn$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="20" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img241.png"
|
||||
ALT="$cn$"></SPAN> </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
<!-- MATH
|
||||
$|E(G_{\rm crit})|$
|
||||
-->
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="70" HEIGHT="32" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img111.png"
|
||||
ALT="$\vert E(G_{\rm crit})\vert$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="71" HEIGHT="32" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img242.png"
|
||||
ALT="$0.5\vert E(G)\vert$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 0</SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="17" HEIGHT="14" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img32.png"
|
||||
ALT="$G$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> cyclic </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> acyclic </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
Order preserving </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> no </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> yes </SMALL></TD>
|
||||
</TR>
|
||||
</TABLE>
|
109
TABLE4.t2t
Normal file
@ -0,0 +1,109 @@
|
||||
<TABLE CELLPADDING=3 BORDER="1" ALIGN="center">
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img8.png"
|
||||
ALT="$n$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER" COLSPAN=4><SMALL CLASS="FOOTNOTESIZE"> <SPAN> BMZ </SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER" COLSPAN=4><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN>CHM algorithm</SPAN></SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> Gain</SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
</SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="22" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img243.png"
|
||||
ALT="$N_i$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Map+Ord </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
Search </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Total </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="22" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img243.png"
|
||||
ALT="$N_i$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Map+Ord </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Search </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
Total </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> (%)</SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 1,562,500 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.28 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 8.54 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.37 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 10.91 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.70 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 14.56 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 1.57 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 16.13 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 48 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 3,125,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.16 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 15.92 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 4.88 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 20.80 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.85 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 30.36 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 3.20 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 33.56 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 61 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 6,250,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.20 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 33.09 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 10.48 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 43.57 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.90 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 62.26 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 6.76 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 69.02 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 58 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 12,500,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.00 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 63.26 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 23.04 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 86.30 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.60 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 117.99 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 14.94 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 132.92 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 54 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 25,000,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.00 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 130.79 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 51.55 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 182.34 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.80 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 262.05 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 33.68 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 295.73 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 62 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 50,000,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.07 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 273.75 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 114.12 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 387.87 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.90 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 577.59 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 73.97 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 651.56 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 68 </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 100,000,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.07 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 567.47 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 243.13 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 810.60 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.80 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 1,131.06 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 157.23 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 1,288.29 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 59 </SMALL></TD>
|
||||
</TR>
|
||||
</TABLE>
|
46
TABLE5.t2t
Normal file
@ -0,0 +1,46 @@
|
||||
<TABLE CELLPADDING=3 BORDER="1" ALIGN="center">
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="14" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img8.png"
|
||||
ALT="$n$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER" COLSPAN=4><SMALL CLASS="FOOTNOTESIZE"> <SPAN> BMZ <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="60" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img5.png"
|
||||
ALT="$c=1.00$"></SPAN></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER" COLSPAN=4><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN> BMZ <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="60" HEIGHT="13" ALIGN="BOTTOM" BORDER="0"
|
||||
SRC="figs/img6.png"
|
||||
ALT="$c=0.93$"></SPAN></SPAN> </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
</SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> <SPAN CLASS="MATH"><IMG
|
||||
WIDTH="22" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img243.png"
|
||||
ALT="$N_i$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Map+Ord </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
Search </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Total </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
<SPAN CLASS="MATH"><IMG
|
||||
WIDTH="22" HEIGHT="30" ALIGN="MIDDLE" BORDER="0"
|
||||
SRC="figs/img243.png"
|
||||
ALT="$N_i$"></SPAN> </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Map+Ord </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">Search </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE">
|
||||
Total </SMALL></TD>
|
||||
</TR>
|
||||
<TR><TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 12,500,000 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 2.78 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 76.68 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 25.06 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 101.74 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 3.04 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 76.39 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 25.80 </SMALL></TD>
|
||||
<TD ALIGN="CENTER"><SMALL CLASS="FOOTNOTESIZE"> 102.19 </SMALL></TD>
|
||||
</TR>
|
||||
</TABLE>
|