FCH algorithm documentation was added

2007-02-14 02:14:10 +00:00
parent beeea04351
commit 60c686a2fc
5 changed files with 60 additions and 7 deletions
--- a/CHM.t2t
+++ b/CHM.t2t
@@ -6,7 +6,7 @@ CHM Algorithm
 ----------------------------------------
 ==The Algorithm==
-
+The algorithm is presented in [[1,2,3 #papers]].
 ----------------------------------------
 ==Memory Consumption==
@@ -70,7 +70,7 @@ Again we have:
 ----------------------------------------
-==Papers==
+==Papers==[papers]
 + Z.J. Czech, G. Havas, and B.S. Majewski. [An optimal algorithm for generating minimal perfect hash functions. papers/chm92.pdf], Information Processing Letters, 43(5):257-264, 1992.
--- a/FCH.t2t
+++ b/FCH.t2t
@@ -0,0 +1,45 @@
 FCH Algorithm
 %!includeconf: CONFIG.t2t
 ----------------------------------------
 ==The Algorithm==
 The algorithm is presented in [[1 #papers]].
 ----------------------------------------
 ==Memory Consumption==
 Now we detail the memory consumption to generate and to store minimal perfect hash functions
 using the FCH algorithm. The structures responsible for memory consumption are the
 following:
 - A vector containing all the //n// keys.
 - Data structures to speed up the searching step:
  + **random_table**: a vector that tracks the currently empty slots in the hash table. It stores //n// 4-byte integers and initially contains a random permutation of the //n// hash addresses. A pointer called filled_count maintains the invariant that every slot stored at or to the right of position filled_count is empty and every slot to its left is filled.
  + **hash_table**: table used to check whether all the collisions were resolved. It has //n// one-byte entries.
  + **map_table**: a vector of //n// 4-byte pointers into random_table such that random_table[map_table[x]] = x for any unfilled slot //x// in hash_table. Thus, given an empty slot //x// in hash_table, we can locate its position in random_table through map_table.
 - Other auxiliary structures:
  + **sorted_indexes**: a vector of //cn/(log(n) + 1)// 4-byte pointers that indirectly keeps the buckets sorted in decreasing order of size.
  + **function //g//**: represented by a vector of //cn/(log(n) + 1)// 4-byte integers, one per bucket. It is used to spread all the keys of a given bucket into the hash table without collisions.
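The searching-step structures above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not cmph's implementation: the type slot_tracker and the functions tracker_init, tracker_next_empty, tracker_fill and tracker_free are hypothetical names, filled_count is kept as an index, the shuffle uses rand() for simplicity, and map_table is kept up to date for every slot rather than only for unfilled ones. Filling a slot swaps it into position filled_count of random_table, so the empty/filled invariant holds after every step and picking the next empty slot in random order costs O(1).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of the FCH empty-slot bookkeeping (not cmph's code). */
typedef struct {
    unsigned n;                 /* number of hash addresses */
    unsigned filled_count;      /* random_table[0 .. filled_count-1] are filled slots */
    unsigned *random_table;     /* permutation of the n hash addresses */
    unsigned *map_table;        /* random_table[map_table[x]] == x for every slot x */
    unsigned char *hash_table;  /* hash_table[x] == 1 once slot x is filled */
} slot_tracker;

static int tracker_init(slot_tracker *t, unsigned n, unsigned seed)
{
    t->n = n;
    t->filled_count = 0;
    t->random_table = malloc(n * sizeof *t->random_table);
    t->map_table = malloc(n * sizeof *t->map_table);
    t->hash_table = calloc(n, 1);
    if (!t->random_table || !t->map_table || !t->hash_table)
        return -1;
    for (unsigned i = 0; i < n; ++i)
        t->random_table[i] = i;
    srand(seed);
    for (unsigned i = n; i > 1; --i) {  /* Fisher-Yates shuffle (rand() is slightly biased) */
        unsigned j = (unsigned)rand() % i;
        unsigned tmp = t->random_table[i - 1];
        t->random_table[i - 1] = t->random_table[j];
        t->random_table[j] = tmp;
    }
    for (unsigned i = 0; i < n; ++i)
        t->map_table[t->random_table[i]] = i;
    return 0;
}

/* Next empty slot in random order, or n when every slot is filled. */
static unsigned tracker_next_empty(const slot_tracker *t)
{
    return t->filled_count < t->n ? t->random_table[t->filled_count] : t->n;
}

/* Mark the empty slot x as filled by swapping it into position filled_count. */
static void tracker_fill(slot_tracker *t, unsigned x)
{
    unsigned pos = t->map_table[x];
    assert(pos >= t->filled_count && t->hash_table[x] == 0);  /* x must be empty */
    unsigned other = t->random_table[t->filled_count];
    t->random_table[pos] = other;
    t->map_table[other] = pos;
    t->random_table[t->filled_count] = x;
    t->map_table[x] = t->filled_count;
    t->filled_count++;
    t->hash_table[x] = 1;
}

static void tracker_free(slot_tracker *t)
{
    free(t->random_table);
    free(t->map_table);
    free(t->hash_table);
}
```

Keeping the filled prefix and the empty suffix in one array is what lets a bucket's keys be tried against random empty slots without scanning hash_table.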
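 Thus, the total memory consumption of the FCH algorithm for generating a minimal perfect hash function (MPHF) is //O(n) + 9n + 8cn/(log(n) + 1)// bytes: //O(n)// for the keys,
 //9n// for random_table, hash_table and map_table (//4n + n + 4n//), and //8cn/(log(n) + 1)// for sorted_indexes and //g//.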
 The value of parameter //c// must be greater than or equal to 2.6.
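As a worked check of the generation formula, the C sketch below evaluates //9n + 8cn/(log(n) + 1)// for given //n// and //c//, leaving out the //O(n)// key vector. Assumptions: log is taken as base 2 and approximated by floor(log2 n) (exact when //n// is a power of two), the bucket count is rounded up, and fch_buckets and fch_generation_bytes are hypothetical helper names, not cmph functions.

```c
#include <stdint.h>

/* floor(log2(n)) for n >= 1; integer stand-in for log(n), assumed base 2. */
static unsigned floor_log2(uint64_t n)
{
    unsigned lg = 0;
    while (n >>= 1)
        ++lg;
    return lg;
}

/* Hypothetical helper: number of buckets, ceil(c*n / (log2(n) + 1)). */
static uint64_t fch_buckets(double c, uint64_t n)
{
    double x = c * (double)n / (floor_log2(n) + 1.0);
    uint64_t b = (uint64_t)x;
    return ((double)b < x) ? b + 1 : b;  /* round up without libm */
}

/* Bytes used while generating, excluding the O(n) key vector.
   Returns 0 when c < 2.6 (ruled out by the text) or n == 0. */
static uint64_t fch_generation_bytes(double c, uint64_t n)
{
    if (c < 2.6 || n == 0)
        return 0;
    uint64_t b = fch_buckets(c, n);
    return 9 * n    /* random_table 4n + hash_table n + map_table 4n */
         + 8 * b;   /* sorted_indexes 4b + g 4b */
}
```

For example, with n = 1024 and c = 2.6 the sketch gives 243 buckets and 9216 + 1944 = 11160 bytes.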
 Now we present the memory consumption to store the resulting function.
 We only need to store the //g// function and a constant number of bytes for the seed of the hash functions used in the resulting MPHF. Thus, we need //cn/(log(n) + 1) + O(1)// bytes.
 ----------------------------------------
 ==Papers==[papers]
 + E.A. Fox, Q.F. Chen, and L.S. Heath. [A faster algorithm for constructing minimal perfect hash functions. papers/fch92.pdf], In Proc. 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 266-273, 1992.
 %!include: ALGORITHMS.t2t
 %!include: FOOTER.t2t
--- a/README.t2t
+++ b/README.t2t
@@ -176,16 +176,21 @@ utility.
 ```
- usage: cmph [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-a algorithm] [-M memory_in_MB] [-b BRZ_parameter] [-d tmp_dir] [-m file.mph] keysfile
+usage: cmph [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] 
            [-a algorithm] [-M memory_in_MB] [-b BRZ_parameter] [-d tmp_dir] 
            [-m file.mph] keysfile
 Minimum perfect hashing tool
  -h     print this help message
-  -c     c value that determines the number of vertices in the graph
+  -c     c value determines:
           the number of vertices in the graph for the algorithms BMZ and CHM
           the number of bits per key required in the FCH algorithm
  -a     algorithm - valid values are
          * bmz
          * bmz8
          * chm
          * brz
          * fch
  -f     hash function (may be used multiple times) - valid values are
          * djb2
          * fnv
@@ -201,7 +206,6 @@ utility.
  -d     temporary directory used in brz algorithm
  -b     parameter of BRZ algorithm to make the maximal number of keys in a bucket lower than 256
  keysfile       line separated file with keys
 ```
 ==Additional Documentation==
--- a/2
+++ b/2
@@ -2,6 +2,7 @@ txt2tags -t html --mask-email -i README.t2t -o index.html
 txt2tags -t html -i BMZ.t2t -o bmz.html
 txt2tags -t html -i BRZ.t2t -o brz.html
 txt2tags -t html -i CHM.t2t -o chm.html
 txt2tags -t html -i FCH.t2t -o fch.html
 txt2tags -t html -i COMPARISON.t2t -o comparison.html
 txt2tags -t html -i GPERF.t2t -o gperf.html
 txt2tags -t html -i FAQ.t2t -o faq.html
@@ -12,6 +13,7 @@ txt2tags -t txt --mask-email -i README.t2t -o README
 txt2tags -t txt -i BMZ.t2t -o BMZ
 txt2tags -t txt -i BRZ.t2t -o BRZ
 txt2tags -t txt -i CHM.t2t -o CHM
 txt2tags -t txt -i FCH.t2t -o FCH
 txt2tags -t txt -i COMPARISON.t2t -o COMPARISON
 txt2tags -t txt -i GPERF.t2t -o GPERF
 txt2tags -t txt -i FAQ.t2t -o FAQ
--- a/src/main.c
+++ b/src/main.c
@@ -30,7 +30,9 @@ void usage_long(const char *prg)
 	fprintf(stderr, "usage: %s [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-a algorithm] [-M memory_in_MB] [-b BRZ_parameter] [-d tmp_dir] [-m file.mph] keysfile\n", prg);   
 	fprintf(stderr, "Minimum perfect hashing tool\n\n"); 
 	fprintf(stderr, "  -h\t print this help message\n");
-	fprintf(stderr, "  -c\t c value that determines the number of vertices in the graph\n");
+	fprintf(stderr, "  -c\t c value determines:\n");
 	fprintf(stderr, "    \t   the number of vertices in the graph for the algorithms BMZ and CHM\n");
 	fprintf(stderr, "    \t   the number of bits per key required in the FCH algorithm\n");
 	fprintf(stderr, "  -a\t algorithm - valid values are\n");
 	for (i = 0; i < CMPH_COUNT; ++i) fprintf(stderr, "    \t  * %s\n", cmph_names[i]);
 	fprintf(stderr, "  -f\t hash function (may be used multiple times) - valid values are\n");