CMPH - C Minimal Perfect Hashing Library


%!includeconf: CONFIG.t2t

-------------------------------------------------------------------

==Description==

C Minimal Perfect Hashing Library is a portable LGPLed library to create and
work with [minimal perfect hash functions concepts.html].
The cmph library encapsulates the newest
and most efficient algorithms (available in the literature) in an easy-to-use,
production-quality and fast API. The library is designed to work with large key sets that
cannot fit in main memory. It has been used successfully to construct minimal perfect
hash functions for sets with more than 100 million keys.
Although there is a lack of similar libraries
in the free software world ([gperf is a bit different gperf.html]), we can point out some
of the distinguishing features of cmph:

- Fast.
- Space-efficient with main memory usage carefully documented.
- The best modern algorithms are available (or at least scheduled for implementation :-)).
- Works with on-disk key sets by using the adapter pattern.
- Serialization of hash functions.
- Portable C code (currently works on GNU/Linux and WIN32).
- Object-oriented implementation.
- Easily extensible.
- Well-encapsulated API aiming at binary compatibility across releases.
- Free Software.


----------------------------------------

==Supported Algorithms==

%html% - [BMZ Algorithm bmz.html].
%txt% - BMZ Algorithm.
A very fast algorithm based on cyclic random graphs to construct minimal
perfect hash functions in linear time. The resulting functions are not order preserving and
can be stored in only //4cn// bytes, where //c// is between 0.93 and 1.15.
%html% - [CHM Algorithm chm.html].
%txt% - CHM Algorithm.
An algorithm based on acyclic random graphs to construct minimal
perfect hash functions in linear time. The resulting functions are order preserving and
are stored in //4cn// bytes, where //c// is greater than 2.


%html% [Click Here comparison.html] to see a comparison of the supported algorithms.
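
As a rough illustration of the //4cn// figures quoted above, the sketch below turns them into byte counts. The function name ``mphf_size_bytes`` is hypothetical and not part of the cmph API, and the value //c// = 2.09 used for CHM in the note after the code is only an assumed example of a value greater than 2.

```
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of cmph: estimated size in bytes of a
 * stored function according to the 4cn formula, where nkeys is the
 * number of keys (n) and c is the constant quoted for the algorithm. */
double mphf_size_bytes(double c, size_t nkeys)
{
    assert(c > 0.0);
    return 4.0 * c * (double)nkeys;
}
```

For one million keys this gives about 4.6 MB for BMZ with //c// = 1.15, and about 8.36 MB for CHM with the assumed //c// = 2.09.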

----------------------------------------

==News for version 0.3==

- A new heuristic added to the BMZ algorithm makes it possible to generate an MPHF using only
//24.80n + O(1)// bytes. The resulting function can be stored in //3.72n// bytes.
%html% [click here bmz.html#heuristic] for details.


----------------------------------------

==Examples==

Using cmph is quite simple. Take a look.

```
// Create minimal perfect hash function from in-memory vector
#include <cmph.h>
#include <string.h> // for strlen
...

const char **vector;
unsigned int nkeys;
//Fill vector
//...

//Create minimal perfect hash function using the default (chm) algorithm.
cmph_config_t *config = cmph_config_new(cmph_io_vector_adapter(vector, nkeys));
cmph_t *hash = cmph_new(config);
cmph_config_destroy(config);

//Find key
const char *key = "sample key";
unsigned int id = cmph_search(hash, key, strlen(key));

//Destroy hash
cmph_destroy(hash);
```
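
The example above leaves the filling of ``vector`` open. One possible way to do it, sketched here under assumptions, is to read one key per line from a file. ``read_keys`` and ``free_keys`` are hypothetical helpers, not part of cmph. The sketch assumes keys shorter than 1023 bytes, skips empty lines, and does not check for duplicate keys.

```
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of cmph: reads one key per line from fp
 * into a newly allocated array of strings. *nkeys receives the number of
 * keys read. Returns NULL on allocation failure or if no key was read. */
char **read_keys(FILE *fp, unsigned int *nkeys)
{
    char line[1024];              /* assumption: keys shorter than 1023 bytes */
    char **keys = NULL;
    unsigned int n = 0, cap = 0;

    assert(fp != NULL && nkeys != NULL);
    *nkeys = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strcspn(line, "\r\n");   /* cut at the line terminator */
        line[len] = '\0';
        if (len == 0)
            continue;                         /* skip empty lines */
        if (n == cap) {                       /* grow the array when full */
            unsigned int new_cap = cap ? cap * 2 : 16;
            char **tmp = realloc(keys, new_cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            keys = tmp;
            cap = new_cap;
        }
        keys[n] = malloc(len + 1);
        if (keys[n] == NULL)
            goto fail;
        memcpy(keys[n], line, len + 1);       /* copy key and its '\0' */
        n++;
    }
    *nkeys = n;
    return keys;

fail:
    while (n > 0)
        free(keys[--n]);
    free(keys);
    return NULL;
}

/* Releases an array returned by read_keys. */
void free_keys(char **keys, unsigned int nkeys)
{
    while (nkeys > 0)
        free(keys[--nkeys]);
    free(keys);
}
```

The returned array could then stand in for ``vector`` in the example above, with a cast if the adapter's parameter type requires one, and should be released with ``free_keys`` only after the hash function is no longer needed.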

-------------------------------

```
// Create minimal perfect hash function from on-disk keys using BMZ algorithm
#include <cmph.h>
#include <stdio.h>  // for FILE, fopen, fclose
#include <string.h> // for strlen
...

//Open file with newline separated list of keys
FILE *fd = fopen("keysfile_newline_separated", "r");
//check for errors
//...

cmph_config_t *config = cmph_config_new(cmph_io_nlfile_adapter(fd));
cmph_config_set_algo(config, CMPH_BMZ);
cmph_t *hash = cmph_new(config);
cmph_config_destroy(config);
fclose(fd);

//Find key
const char *key = "sample key";
unsigned int id = cmph_search(hash, key, strlen(key));

//Destroy hash
cmph_destroy(hash);
```
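
By definition, a minimal perfect hash function maps its //n// keys to //n// distinct values in the range 0 to //n// - 1. Assuming the ids returned by ``cmph_search`` for every key of the set have been collected into an array, a sanity check could look like the sketch below. ``ids_are_minimal_perfect`` is a hypothetical helper, not part of cmph.

```
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of cmph: returns 1 if ids[0..n-1] are
 * pairwise distinct and all lie in [0, n), 0 if they are not, and -1 if
 * memory for the bookkeeping array cannot be allocated. */
int ids_are_minimal_perfect(const unsigned int *ids, unsigned int n)
{
    unsigned char *seen;
    unsigned int i;
    int ok = 1;

    assert(ids != NULL || n == 0);
    if (n == 0)
        return 1;                 /* an empty set is trivially fine */
    seen = calloc(n, 1);          /* seen[v] == 1 once id v has appeared */
    if (seen == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (ids[i] >= n || seen[ids[i]]) {   /* out of range or repeated */
            ok = 0;
            break;
        }
        seen[ids[i]] = 1;
    }
    free(seen);
    return ok;
}
```

The check uses one byte per key, so it is meant for testing on moderately sized key sets, not for the very large sets the library targets.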

--------------------------------------

==The cmph application==

cmph is the name of both the library and the utility
application that comes with this package. You can use the cmph
application to construct minimal perfect hash functions from the command line.
The cmph utility
comes with a number of flags, but creating and querying
minimal perfect hash functions is very simple:

```
$ # Using the chm algorithm (default one) for constructing a mphf for keys in file keys_file
$ ./cmph -g keys_file
$ # Query id of keys in the file keys_query
$ ./cmph -m keys_file.mph keys_query
```

The additional options let you set most of the parameters you have
available through the C API. Below you can see the full help message for the
utility.


```
usage: cmph [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-m file.mph] [-a algorithm] keysfile
Minimum perfect hashing tool

  -h  print this help message
  -c  c value that determines the number of vertices in the graph
  -a  algorithm - valid values are
      * bmz
      * chm
  -f  hash function (may be used multiple times) - valid values are
      * djb2
      * fnv
      * jenkins
      * sdbm
  -V  print version number and exit
  -v  increase verbosity (may be used multiple times)
  -k  number of keys
  -g  generation mode
  -s  random seed
  -m  minimum perfect hash function file
  keysfile  line separated file with keys
```

==Additional Documentation==

[FAQ faq.html]

==Downloads==

Use the project page at SourceForge: http://sf.net/projects/cmph

==License Stuff==

Code is under the LGPL.

----------------------------------------

%!include: FOOTER.t2t

%!include(html): ''LOGO.t2t''
Last Updated: %%date(%c)