From 29d8ae69951d14a57e21c6659391b88063bd1b0e Mon Sep 17 00:00:00 2001
From: fc_botelho <fc_botelho>
Date: Fri, 28 Jan 2005 20:07:22 +0000
Subject: [PATCH] Improve the documentation of the BMZ and CHM algorithms

---
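Note: the totals in the Memory Consumption sections added to BMZ.t2t and
CHM.t2t can be re-derived from the per-structure sizes listed there. The
following is a minimal Python sketch of that arithmetic, not code from cmph.
The function names are hypothetical, and it assumes 4-byte integers and the
|Vcrit| ratios quoted in the BMZ table. The results come out within a few
hundredths of the tabulated coefficients.

```python
# Hypothetical helpers (not part of cmph): re-derive the per-key byte counts
# from the structure sizes listed in the Memory Consumption sections.

INT_BYTES = 4  # the documentation assumes 4-byte integers


def graph_bytes_per_key(c):
    """first (cn ints) + edges (2n ints) + next (2n ints), divided by n."""
    return INT_BYTES * c + 2 * INT_BYTES + 2 * INT_BYTES


def bmz_generate_bytes_per_key(c, vcrit_ratio):
    """BMZ generation cost per key; vcrit_ratio is |Vcrit| / n."""
    critical_nodes = c / 8                  # bit vector with cn bits
    used_edges = 1 / 8                      # bit vector with n bits
    queue = INT_BYTES * 0.5 * vcrit_ratio   # one int per two critical vertices
    visited = c / 8                         # bit vector with cn bits
    g = INT_BYTES * c                       # cn ints
    return (graph_bytes_per_key(c) + critical_nodes + used_edges
            + queue + visited + g)


def chm_generate_bytes_per_key(c):
    """CHM generation cost per key: graph + visited bits + g."""
    visited = c / 8
    g = INT_BYTES * c
    return graph_bytes_per_key(c) + visited + g


def store_bytes_per_key(c):
    """Only the g vector (cn ints) is stored with the resulting function."""
    return INT_BYTES * c


if __name__ == "__main__":
    for c, vcrit in ((0.93, 0.497), (1.15, 0.401)):
        gen = bmz_generate_bytes_per_key(c, vcrit)
        print(f"BMZ c={c}: generate {gen:.4f}n, store {store_bytes_per_key(c):.2f}n")
    gen = chm_generate_bytes_per_key(2.09)
    print(f"CHM c=2.09: generate {gen:.4f}n, store {store_bytes_per_key(2.09):.2f}n")
```

Summing the terms symbolically gives (8.25c + 16.125)n + 2|Vcrit| for BMZ and
(8.125c + 16)n for CHM, matching the formulas in the two sections.
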
 BMZ.t2t        | 164 ++++++++++++++++++++++++++++++++++++++++++++-----
 CHM.t2t        |  51 ++++++++++++++-
 COMPARISON.t2t |   8 +--
 CONFIG.t2t     |   2 +
 README.t2t     |  22 +++----
 5 files changed, 212 insertions(+), 35 deletions(-)

diff --git a/BMZ.t2t b/BMZ.t2t
index 616d6bd..37e3101 100644
--- a/BMZ.t2t
+++ b/BMZ.t2t
@@ -4,46 +4,176 @@ BMZ Algorithm
 %!includeconf: CONFIG.t2t
 
 ----------------------------------------
-**History**
+==History==
 
 At the end of 2003, professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] was
-finishing the second edition of his book.
-During the book writing, professor Nivio studied the problem of generating minimal perfect hash
+finishing the second edition of his [book http://www.dcc.ufmg.br/algoritmos/].
+While writing the [book http://www.dcc.ufmg.br/algoritmos/],
+professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] studied the problem of generating minimal perfect hash
 functions (if you are not familiarized with this problem, see [1][2]). 
-Professor Nivio coded a modified version of the [CHM algorithm chm.html], which was proposed by
-Czech, Havas and Majewski and put it in his book.
+Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] coded a modified version of 
+the [CHM algorithm chm.html], which was proposed by
+Czech, Havas and Majewski, and put it in his [book http://www.dcc.ufmg.br/algoritmos/].
 The [CHM algorithm chm.html] is based on acyclic random graphs to generate order preserving 
-minimal perfect hash functions in linear time. Professor Nivio argued himself, why must the random graph 
-be acyclic? In the modified version availalbe in his book he got rid of such restriction.
+minimal perfect hash functions in linear time. Professor [Nivio Ziviani http://www.dcc.ufmg.br/~nivio] 
+asked himself: why must the random graph 
+be acyclic? In the modified version available in his [book http://www.dcc.ufmg.br/algoritmos/], he removed this restriction.
 
 The modification presented a problem, it was impossible to generate minimal perfect hash functions
 for sets with more than 1000 keys.
 At the same time, [Fabiano C. Botelho http://www.dcc.ufmg.br/~fbotelho],
 a master degree student at [Departament of Computer Science http://www.dcc.ufmg.br] in 
 [Federal University of Minas Gerais http://www.ufmg.br],
-started to be advised by Nivio who presented the problem to Fabiano.
+started to be advised by [Nivio Ziviani http://www.dcc.ufmg.br/~nivio], who presented the problem 
+to [Fabiano http://www.dcc.ufmg.br/~fbotelho].
 
-During the master, Fabiano and Nivio faced lots of problems.
-Talking with a friend of mine (David Menoti) about our problems, many ideas 
-appeared and after of implementing them, we got a very fast algorithm to generate
-minimal perfect hash functions that does not preserve order.
+During the master's degree, [Fabiano http://www.dcc.ufmg.br/~fbotelho] and 
+[Nivio Ziviani http://www.dcc.ufmg.br/~nivio] faced many problems.
+In April 2004, [Fabiano http://www.dcc.ufmg.br/~fbotelho] was discussing these problems with a 
+friend of his (David Menoti),
+and many ideas emerged.
+Once the ideas were implemented, we noticed that a very fast algorithm to generate
+minimal perfect hash functions had been designed.
 We refer the algorithm to as **BMZ**, because it was conceived by Fabiano C. **B**otelho
 David **M**enoti and Nivio **Z**iviani. The algorithm is described in [1].
 To analyse BMZ algorithm we needed some results from the random graph theory, so 
 we invite professor [Yoshiharu Kohayakawa http://www.ime.usp.br/~yoshi] to help us.
 The final description and analysis of BMZ algorithm is presented in [2].
 
-
+----------------------------------------
  
-**The Algorithm**
+==The Algorithm==
 
-**The Heuristic**
+Let us show how the minimal perfect hash function [figs/img7.png] will be constructed.
+We make use of two auxiliary random functions [figs/img41.png] and [figs/img55.png], 
+where [figs/img56.png] for some suitably chosen integer [figs/img57.png], 
+where [figs/img58.png]. We build a random graph [figs/img59.png] on [figs/img60.png],
+whose edge set is [figs/img61.png]. There is an edge in [figs/img32.png] for each 
+key in the set of keys [figs/img20.png].
 
-**Papers**
+In what follows, we shall be interested in the //2-core// of
+the random graph [figs/img32.png], that is, the maximal subgraph 
+of [figs/img32.png] with minimum degree at 
+least 2 (see, e.g., [2] for details).
+Because of its importance in our context, we call the 2-core the
+//critical// subgraph of [figs/img32.png] and denote it by [figs/img63.png].
+The vertices and edges in [figs/img63.png] are said to be //critical//.
+We let [figs/img64.png] and [figs/img65.png].
+Moreover, we let [figs/img66.png] be the set of //non-critical//
+vertices in [figs/img32.png].
+We also let [figs/img67.png] be the set of all critical
+vertices that have at least one non-critical vertex as a neighbour.
+Let [figs/img68.png] be the set of //non-critical// edges in [figs/img32.png].
+Finally, we let [figs/img69.png] be the //non-critical// subgraph 
+of [figs/img32.png].
+The non-critical subgraph [figs/img70.png] corresponds to the //acyclic part//
+of [figs/img32.png].
+We have [figs/img71.png].
+
+We then construct a suitable labelling [figs/img72.png] of the vertices
+of [figs/img32.png]: we choose [figs/img73.png] for each [figs/img74.png] in such
+a way that [figs/img75.png] ([figs/img18.png]) is a
+minimal perfect hash function for [figs/img20.png].
+We will see later on that this labelling [figs/img37.png] can be found in linear time
+if the number of edges in [figs/img63.png] is at most [figs/img76.png].
+
+Figure 2 presents pseudocode for the algorithm.
+The procedure GenerateMPHF ([figs/img20.png], [figs/img37.png]) receives as input the set of
+keys [figs/img20.png] and produces the labelling [figs/img37.png].
+The method uses a mapping, ordering and searching approach.
+We now describe each step.
+| procedure GenerateMPHF ([figs/img20.png], [figs/img37.png])                              
+| &nbsp;&nbsp;&nbsp;&nbsp;Mapping ([figs/img20.png], [figs/img32.png]);                                           
+| &nbsp;&nbsp;&nbsp;&nbsp;Ordering ([figs/img32.png], [figs/img63.png], [figs/img70.png]);                        
+| &nbsp;&nbsp;&nbsp;&nbsp;Searching ([figs/img32.png], [figs/img63.png], [figs/img70.png], [figs/img37.png]);     
+**Figure 2**: Main steps of the algorithm for constructing a minimal perfect hash function 
+
+===Mapping Step===
+
+===Ordering Step===
+
+===Searching Step===
+
+====Assignment of Values to Critical Vertices====
+
+====Assignment of Values to Non-Critical Vertices====
+
+----------------------------------------
+
+==The Heuristic==
+
+----------------------------------------
+
+==Memory Consumption==
+
+Now we detail the memory consumption to generate and to store minimal perfect hash functions
+using the BMZ algorithm. The structures responsible for memory consumption are 
+the following:
+- Graph:
+  + **first**: is a vector that stores //cn// integer numbers, each one representing 
+    the first edge (index in the vector edges) in the list of 
+    edges of each vertex. 
+    The integer numbers are 4 bytes long. Therefore,
+    the vector first is stored in //4cn// bytes.
+    
+  + **edges**: is a vector to represent the edges of the graph. As each edge
+    is composed of a pair of vertices, each entry stores two integer numbers 
+    of 4 bytes that represent the vertices. As there are //n// edges, the 
+    vector edges is stored in //8n// bytes. 
+    
+  + **next**: given a vertex //v//, we can discover the edges that contain //v// 
+    by following its list of edges, which starts at first[//v//]; the next
+    edges are given by next[...first[//v//]...]. Therefore, the vectors first and next represent 
+    the linked lists of edges of each vertex. As there are two vertices for each edge,
+    when an edge is inserted in the graph, it must be inserted in the two linked lists 
+    of its two vertices. Therefore, there are //2n// entries of integer
+    numbers in the vector next, so it is stored in //4*2n = 8n// bytes.
+    
+  + **critical vertices (critical_nodes vector)**: is a vector of //cn// bits, 
+    where each bit indicates if a vertex is critical (1) or non-critical (0). 
+    Therefore, the critical and non-critical vertices are represented in //cn/8// bytes.
+    
+  + **critical edges (used_edges vector)**: is a vector of //n// bits, where each 
+    bit indicates if an edge is critical (1) or non-critical (0). Therefore, the 
+    critical and non-critical edges are represented in //n/8// bytes. 
+    
+- Other auxiliary structures 
+  + **queue**: is a queue of integer numbers used in the breadth-first search of the
+    assignment of values to critical vertices. There is one entry in the queue for 
+    every two critical vertices. Let //|Vcrit|// be the expected number of critical 
+    vertices. Therefore, the queue is stored in //4*0.5*|Vcrit|=2|Vcrit|// bytes.
+    
+  + **visited**: is a vector of //cn// bits, where each bit indicates if the g value of 
+    a given vertex was already defined. Therefore, the vector visited is stored
+    in //cn/8// bytes.
+    
+  + **function //g//**: is represented by a vector of //cn// integer numbers.
+    As each integer number is 4 bytes long, the function //g// is stored in
+    //4cn// bytes. 
+
+    
+Thus, the total memory consumption of the BMZ algorithm for generating a minimal 
+perfect hash function (MPHF) is: //(8.25c + 16.125)n +2|Vcrit| + O(1)// bytes.
+Since the constant //c// may be 0.93 or 1.15, we have:
+ || //c// |  //|Vcrit|// | Memory consumption to generate a MPHF |
+  | 0.93  |  //0.497n//  |         //24.80n + O(1)//             |
+  | 1.15  |  //0.401n//  |         //26.42n + O(1)//             |
+The values of |Vcrit| were calculated using Eq.(1) presented in [2].
+    
+Now we present the memory consumption to store the resulting function.
+We only need to store the //g// function. Thus, we need //4cn// bytes.
+Again we have:
+ || //c// | Memory consumption to store a MPHF |
+  | 0.93  |            //3.72n//               |
+  | 1.15  |            //4.60n//               |
+    
+----------------------------------------
+
+==Papers==
 
 + [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], D. Menoti, [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A New algorithm for constructing minimal perfect hash functions papers/bmz_tr004_04.ps], Technical Report TR004/04, Department of Computer Science, Federal University of Minas Gerais, 2004.
 
-+ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], Y. Kohayakawa, and [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A Practical Minimal Perfect Hashing Method papers/bmz_wea2005.ps], 4th International Workshop on Efficient and Experimental Algorithms (WEA), 2005.(submitted) 
++ [F. C. Botelho http://www.dcc.ufmg.br/~fbotelho], Y. Kohayakawa, and [N. Ziviani http://www.dcc.ufmg.br/~nivio]. [A Practical Minimal Perfect Hashing Method papers/bmz_wea2005.ps] (Submitted).
 
 
 ----------------------------------------
diff --git a/CHM.t2t b/CHM.t2t
index 1ceccaa..8859eff 100644
--- a/CHM.t2t
+++ b/CHM.t2t
@@ -4,12 +4,57 @@ CHM Algorithm
 %!includeconf: CONFIG.t2t
 
 ----------------------------------------
+==The Algorithm==
 
-**History**
+==Memory Consumption==
 
-**The Algorithm**
+Now we detail the memory consumption to generate and to store minimal perfect hash functions
+using the CHM algorithm. The structures responsible for memory consumption are 
+the following:
+- Graph:
+  + **first**: is a vector that stores //cn// integer numbers, each one representing 
+    the first edge (index in the vector edges) in the list of 
+    edges of each vertex. 
+    The integer numbers are 4 bytes long. Therefore,
+    the vector first is stored in //4cn// bytes.
+    
+  + **edges**: is a vector to represent the edges of the graph. As each edge
+    is composed of a pair of vertices, each entry stores two integer numbers 
+    of 4 bytes that represent the vertices. As there are //n// edges, the 
+    vector edges is stored in //8n// bytes. 
+    
+  + **next**: given a vertex //v//, we can discover the edges that contain //v// 
+    by following its list of edges, which starts at first[//v//]; the next
+    edges are given by next[...first[//v//]...]. Therefore, the vectors first and next represent 
+    the linked lists of edges of each vertex. As there are two vertices for each edge,
+    when an edge is inserted in the graph, it must be inserted in the two linked lists 
+    of its two vertices. Therefore, there are //2n// entries of integer
+    numbers in the vector next, so it is stored in //4*2n = 8n// bytes.
+    
+- Other auxiliary structures    
+  + **visited**: is a vector of //cn// bits, where each bit indicates if the g value of 
+    a given vertex was already defined. Therefore, the vector visited is stored
+    in //cn/8// bytes.
+    
+  + **function //g//**: is represented by a vector of //cn// integer numbers.
+    As each integer number is 4 bytes long, the function //g// is stored in
+    //4cn// bytes. 
 
-**Papers**
+    
+Thus, the total memory consumption of the CHM algorithm for generating a minimal 
+perfect hash function (MPHF) is: //(8.125c + 16)n + O(1)// bytes.
+Since the constant //c// must be at least 2.09, we have:
+ || //c// |  Memory consumption to generate a MPHF |
+  | 2.09  |          //33.00n + O(1)//             |
+
+Now we present the memory consumption to store the resulting function.
+We only need to store the //g// function. Thus, we need //4cn// bytes.
+Again we have:
+ || //c// | Memory consumption to store a MPHF |
+  | 2.09  |             //8.36n//              |
+  
+  
+==Papers==
 
 + Z.J. Czech, G. Havas, and B.S. Majewski. [An optimal algorithm for generating minimal perfect hash functions. papers/chm92.pdf], Information Processing Letters, 43(5):257-264, 1992.
 
diff --git a/COMPARISON.t2t b/COMPARISON.t2t
index 4176c28..a6ff823 100644
--- a/COMPARISON.t2t
+++ b/COMPARISON.t2t
@@ -5,14 +5,14 @@ Comparison Between BMZ And CHM Algorithms
 
 ----------------------------------------
 
-**Features**
+==Features==
 
-**Constructing Minimal Perfect Hash Functions**
+==Constructing Minimal Perfect Hash Functions==
 
-**Memory Consumption**
+==Memory Consumption==
 
 
-**Run times**
+==Run times==
 
 ----------------------------------------
 [Home index.html]
diff --git a/CONFIG.t2t b/CONFIG.t2t
index 807454c..19dd4e9 100644
--- a/CONFIG.t2t
+++ b/CONFIG.t2t
@@ -1,2 +1,4 @@
 %! PreProc(html): '^%html% ' ''
 %! PreProc(txt): '^%txt% ' ''
+%! PostProc(html): "&amp;" "&"
+%! PostProc(txt): "&nbsp;" " "
diff --git a/README.t2t b/README.t2t
index 0b25b81..e491c32 100644
--- a/README.t2t
+++ b/README.t2t
@@ -5,7 +5,7 @@ CMPH - C Minimal Perfect Hashing Library
 
 -------------------------------------------------------------------
 
-**Description**
+==Description==
 
 C Minimal Perfect Hashing Library is a portable LGPLed library to create and
 to work with minimal perfect hash functions. The cmph library encapsulates the newest
@@ -31,35 +31,35 @@ of the distinguishable features of cmph:
 
 ----------------------------------------
 
-**Supported Algorithms**
+==Supported Algorithms==
 
  
 %html% - [BMZ Algorithm bmz.html].
 %txt% - BMZ Algorithm.
   A very fast algorithm based on cyclic random graphs to construct minimal
   perfect hash functions in linear time. The resulting functions are not order preserving and
-  can be stored in only 4cn bytes, where c is between 0.93 and 1.15.  
+  can be stored in only //4cn// bytes, where //c// is between 0.93 and 1.15.  
 %html% - [CHM Algorithm chm.html].
 %txt% - CHM Algorithm.
   An algorithm based on acyclic random graphs to construct minimal
   perfect hash functions in linear time. The resulting functions are order preserving and
-  are stored in 4cn bytes, where c is greater than 2.
+  are stored in //4cn// bytes, where //c// is greater than 2.
 
 %html% [Click Here comparison.html] to see a comparison of the supported algorithms. 
 
 
 ----------------------------------------
 
-**News for version 0.3**
+==News for version 0.3==
 
 - New heuristic added to the bmz algorithm permits to generate a mphf with only
-  24.61*n + O(1) bytes. The resulting function can be stored in 3.72*n bytes.
+  //24.6n + O(1)// bytes. The resulting function can be stored in //3.72n// bytes.
 %html% [click here bmz.html] for details.
 
 
 ----------------------------------------
 
-**Examples**
+==Examples==
 
 Using cmph is quite simple. Take a look.
 
@@ -113,7 +113,7 @@ Using cmph is quite simple. Take a look.
 ```
 --------------------------------------
 
-**The cmph application**
+==The cmph application==
 
 cmph is the name of both the library and the utility
 application that comes with this package. You can use the cmph
@@ -157,16 +157,16 @@ utility.
    keysfile       line separated file with keys
 ```
 
-**Additional Documentation**
+==Additional Documentation==
 
 [FAQ faq.html]
 
-**Downloads**
+==Downloads==
 
 Use the project page at sourceforge: http://sf.net/projects/cmph
 
 
-**License Stuff**
+==License Stuff==
 
 Code is under the LGPL. 
 ----------------------------------------