From e5f0aef11c19c2dcd17d0574a26d2eed626d1cbd Mon Sep 17 00:00:00 2001
From: fc_botelho
Date: Mon, 24 Jan 2005 18:15:50 +0000
Subject: [PATCH] It was fixed some English mistakes and It was included the files BMZ.t2t, CZECH.t2t and COMPARISON.t2t

---
 BMZ.t2t        |  26 ++++++++++
 COMPARISON.t2t |  27 +++++++++++
 CZECH.t2t      |  24 ++++++++++
 README.t2t     | 125 +++++++++++++++++++++++++++++--------------------
 4 files changed, 152 insertions(+), 50 deletions(-)
 create mode 100644 BMZ.t2t
 create mode 100644 COMPARISON.t2t
 create mode 100644 CZECH.t2t

diff --git a/BMZ.t2t b/BMZ.t2t
new file mode 100644
index 0000000..ca4bb87
--- /dev/null
+++ b/BMZ.t2t
@@ -0,0 +1,26 @@
+BMZ Algorithm
+
+
+----------------------------------------
+
+**History**
+
+**The Algorithm**
+
+**The Heuristic**
+
+**Papers**
+
+----------------------------------------
+[Home README.html]
+----------------------------------------
+Enjoy!
+
+Davi de Castro Reis
+
+Fabiano Cupertino Botelho
+
+
+%preproc(html): '^%html% ' ''
+%html% SourceForge.net Logo
+Last Updated: %%date(%c)
diff --git a/COMPARISON.t2t b/COMPARISON.t2t
new file mode 100644
index 0000000..ee4774e
--- /dev/null
+++ b/COMPARISON.t2t
@@ -0,0 +1,27 @@
+Comparison Between BMZ And CZECH Algorithms
+
+
+----------------------------------------
+
+**Features**
+
+**Constructing Minimal Perfect Hash Functions**
+
+**Memory Consumption**
+
+
+**Run times**
+
+----------------------------------------
+[Home README.html]
+----------------------------------------
+Enjoy!
+
+Davi de Castro Reis
+
+Fabiano Cupertino Botelho
+
+
+%preproc(html): '^%html% ' ''
+%html% SourceForge.net Logo
+Last Updated: %%date(%c)
diff --git a/CZECH.t2t b/CZECH.t2t
new file mode 100644
index 0000000..d7dc701
--- /dev/null
+++ b/CZECH.t2t
@@ -0,0 +1,24 @@
+CZECH Algorithm
+
+
+----------------------------------------
+
+**History**
+
+**The Algorithm**
+
+**Papers**
+
+----------------------------------------
+[Home README.html]
+----------------------------------------
+Enjoy!
+
+Davi de Castro Reis
+
+Fabiano Cupertino Botelho
+
+
+%preproc(html): '^%html% ' ''
+%html% SourceForge.net Logo
+Last Updated: %%date(%c)
diff --git a/README.t2t b/README.t2t
index d0c1946..36f7bf2 100644
--- a/README.t2t
+++ b/README.t2t
@@ -1,41 +1,65 @@
-== cmph - C Minimal Perfect Hashing Library ==
+CMPH - C Minimal Perfect Hashing Library
+----------------------------------------
+
 **Description**
 C Minimal Perfect Hashing Library is a portable LGPLed library to create and
-work with minimal perfect hashes. The cmph library encapsulates the newest
-and more efficient algorithms in the literature in a ease-to-use,
-production-quality, fast API. The library is designed to work big entries that
-won't fit in the main memory. It has been used successfully to create hashes
-bigger than 100 million entries. Although there is a lack of similar libraries
-in the free software world, we can point out some of the "distinguishing"
+to work with minimal perfect hashing functions. The cmph library encapsulates the newest
+and more efficient algorithms (available in the literature) in an easy-to-use,
+production-quality and fast API. The library is designed to work with big entries that
+can not be fit in the main memory. It has been used successfully for constructing minimal perfect
+hashing functions for sets with more than 100 million of keys.
+Although there is a lack of similar libraries
+in the free software world, we can point out some of the distinguishable
 features of cmph:
-- Fast
-- Space-efficient with main memory usage carefully documented
-- The best modern algorithms are available (or at least scheduled for implementation :-))
-- Works with in-disk key sets through use of adapter pattern
-- Serialization of hash functions
-- Portable C code (currently works on GNU/Linux and WIN32)
-- Object oriented implementation
-- Easily extensible
-- Well encapsulated API aiming binary compatibility through releases
-- Free Software
+- Fast.
+- Space-efficient with main memory usage carefully documented.
+- The best modern algorithms are available (or at least scheduled for implementation :-)).
+- Works with in-disk key sets through of using the adapter pattern.
+- Serialization of hash functions.
+- Portable C code (currently works on GNU/Linux and WIN32).
+- Object oriented implementation.
+- Easily extensible.
+- Well encapsulated API aiming binary compatibility through releases.
+- Free Software.
+----------------------------------------
+
+**Supported Algorithms**
+
+- [BMZ Algorithm BMZ.html]. A very fast algorithm based on cyclic random graphs to construct minimal
+ perfect hash functions in linear time. The resulting functions are not order preserving and
+ can be stored in only 4cn bytes, where c is between 0.93 and 1.15.
+
+- [CZECH Algorithm CZECH.html]. An algorithm based on acyclic random graphs to construct minimal
+ perfect hash functions in linear time. The resulting functions are order preserving and
+ are stored in 4cn bytes, where c is greater than 2.
+
+[Click Here COMPARISON.html] to see a comparison of the supported algorithms.
+
+
+----------------------------------------
+
 **News for version 0.3**
-- New heuristics in bmz algorithm, providing hash creation with only
- (0.93 * 16 + 4)*n bytes and hash query with (0.93*4)n bytes
+- New heuristic added to the bmz algorithm permits to generate a mphf with only
+ (xxx)*n bytes. The resulting function can be stored in (0.93*4)n bytes.
+ [click here BMZ.html] for details.
+
+
+----------------------------------------
 **Examples**
-Using cmph is quite ease. Take a look.
+Using cmph is quite simple. Take a look.
 ```
- // Create minimal perfect hash from in-memory vector
+ // Create minimal perfect hash function from in-memory vector
 #include ...
@@ -44,7 +68,7 @@ Using cmph is quite ease. Take a look.
 //Fill vector
 //...
- //Create minimal perfect hash
+ //Create minimal perfect hashing function using the default(czech) algorithm.
 cmph_config_t *config = cmph_config_new(cmph_io_vector_adapter(vector, nkeys));
 cmph_t *hash = cmph_new(config);
 cmph_config_destroy(config);
@@ -59,7 +83,7 @@ Using cmph is quite ease. Take a look.
 -------------------------------
 ```
- // Create minimal perfect hash from in-disk keys using BMZ algorithm
+ // Create minimal perfect hash function from in-disk keys using BMZ algorithm
 #include ...
@@ -83,18 +107,18 @@ Using cmph is quite ease. Take a look.
 ```
 --------------------------------------
-
 **The cmph application**
 cmph is the name of both the library and the utility application that comes with this package. You can use the cmph
-application to create minimal perfect hashes from command line. The cmph utility
-comes with a number of flags, but it is very simple to create and query
-minimal perfect hashes:
+application for constructing minimal perfect hashing functions from the command line.
+The cmph utility
+comes with a number of flags, but it is very simple to create and to query
+minimal perfect hashing functions:
 ```
- $ # Create mph for keys in file keys_file
- $ ./cmph keys_file
+ $ # Using the czech algorithm (default one) for constructing a mphf for keys in file keys_file
+ $ ./cmph -g keys_file
 $ # Query id of keys in the file keys_query
 $ ./cmph -m keys_file.mph keys_query
 ```
@@ -105,34 +129,35 @@ utility.
 ```
- usage: cmph [-v] [-h] [-V] [-k] [-g [-s seed] ] [-m file.mph] [-a algorithm] keysfile
+ usage: cmph [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-m file.mph] [-a algorithm] keysfile
 Minimum perfect hashing tool
-
- -h print this help message
- -c c value that determines the number of vertices in the graph
- -a algorithm - valid values are
- * czech
- * bmz
- -f hash function (may be used multiple times) - valid values are
- * jenkins
- * djb2
- * sdbm
- * fnv
- * glib
- * pjw
- -V print version number and exit
- -v increase verbosity (may be used multiple times)
- -k number of keys
- -g generation mode
- -s random seed
- -m minimum perfect hash function file
- keysfile line separated file with keys
+
+ -h print this help message
+ -c c value that determines the number of vertices in the graph
+ -a algorithm - valid values are
+ * bmz
+ * czech
+ -f hash function (may be used multiple times) - valid values are
+ * djb2
+ * fnv
+ * glib
+ * jenkins
+ * pjw
+ * sdbm
+ -V print version number and exit
+ -v increase verbosity (may be used multiple times)
+ -k number of keys
+ -g generation mode
+ -s random seed
+ -m minimum perfect hash function file
+ keysfile line separated file with keys
 ```
 **Downloads**
 Use the project page at sourceforge: http://sf.net/projects/cmph
+
 **License Stuff**
 Code is under the LGPL.