1
Fork 0

stable version of BRZ algorithm using buffers

This commit is contained in:
fc_botelho 2006-01-25 19:45:14 +00:00
parent 312947b34f
commit dcd8e025e2
14 changed files with 508 additions and 383 deletions

166
INSTALL
View File

@ -1,27 +1,43 @@
Installation Instructions
*************************
Copyright (C) 1994, 1995, 1996, 1999, 2000, 2001, 2002, 2004, 2005 Free
Software Foundation, Inc.
This file is free documentation; the Free Software Foundation gives
unlimited permission to copy, distribute and modify it.
Basic Installation Basic Installation
================== ==================
These are generic installation instructions. These are generic installation instructions.
The `configure' shell script attempts to guess correct values for The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package. those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file you can run in the future to recreate the current configuration, and a
`config.cache' that saves the results of its tests to speed up file `config.log' containing compiler output (useful mainly for
reconfiguring, and a file `config.log' containing compiler output debugging `configure').
(useful mainly for debugging `configure').
It can also use an optional file (typically called `config.cache'
and enabled with `--cache-file=config.cache' or simply `-C') that saves
the results of its tests to speed up reconfiguring. (Caching is
disabled by default to prevent problems with accidental use of stale
cache files.)
If you need to do unusual things to compile the package, please try If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache' be considered for the next release. If you are using the cache, and at
contains results you don't want to keep, you may remove or edit it. some point `config.cache' contains results you don't want to keep, you
may remove or edit it.
The file `configure.in' is used to create `configure' by a program The file `configure.ac' (or `configure.in') is used to create
called `autoconf'. You only need `configure.in' if you want to change `configure' by a program called `autoconf'. You only need
it or regenerate `configure' using a newer version of `autoconf'. `configure.ac' if you want to change it or regenerate `configure' using
a newer version of `autoconf'.
The simplest way to compile this package is: The simplest way to compile this package is:
@ -54,20 +70,22 @@ The simplest way to compile this package is:
Compilers and Options Compilers and Options
===================== =====================
Some systems require unusual options for compilation or linking that Some systems require unusual options for compilation or linking that the
the `configure' script does not know about. You can give `configure' `configure' script does not know about. Run `./configure --help' for
initial values for variables by setting them in the environment. Using details on some of the pertinent environment variables.
a Bourne-compatible shell, you can do that on the command line like
this:
CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure
Or on systems that have the `env' program, you can do it like this: You can give `configure' initial values for configuration parameters
env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure by setting variables in the command line or in the environment. Here
is an example:
./configure CC=c89 CFLAGS=-O2 LIBS=-lposix
*Note Defining Variables::, for more details.
Compiling For Multiple Architectures Compiling For Multiple Architectures
==================================== ====================================
You can compile the package for more than one kind of computer at the You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the supports the `VPATH' variable, such as GNU `make'. `cd' to the
@ -75,28 +93,28 @@ directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'. source code in the directory that `configure' is in and in `..'.
If you have to use a `make' that does not supports the `VPATH' If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time variable, you have to compile the package for one architecture at a
in the source code directory. After you have installed the package for time in the source code directory. After you have installed the
one architecture, use `make distclean' before reconfiguring for another package for one architecture, use `make distclean' before reconfiguring
architecture. for another architecture.
Installation Names Installation Names
================== ==================
By default, `make install' will install the package's files in By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an `/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'. option `--prefix=PREFIX'.
You can specify separate installation prefixes for You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use give `configure' the option `--exec-prefix=PREFIX', the package will
PATH as the prefix for installing programs and libraries. use PREFIX as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix. Documentation and other data files will still use the regular prefix.
In addition, if you use an unusual directory layout you can give In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular options like `--bindir=DIR' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them. you can set and what kinds of files go in them.
@ -107,7 +125,7 @@ option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
Optional Features Optional Features
================= =================
Some packages pay attention to `--enable-FEATURE' options to Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package. `configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The is something like `gnu-as' or `x' (for the X Window System). The
@ -122,48 +140,86 @@ you can use the `configure' options `--x-includes=DIR' and
Specifying the System Type Specifying the System Type
========================== ==========================
There may be some features `configure' can not figure out There may be some features `configure' cannot figure out automatically,
automatically, but needs to determine by the type of host the package but needs to determine by the type of machine the package will run on.
will run on. Usually `configure' can figure that out, but if it prints Usually, assuming the package is built to be run on the _same_
a message saying it can not guess the host type, give it the architectures, `configure' can figure that out, but if it prints a
`--host=TYPE' option. TYPE can either be a short name for the system message saying it cannot guess the machine type, give it the
type, such as `sun4', or a canonical name with three fields: `--build=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name which has the form:
CPU-COMPANY-SYSTEM CPU-COMPANY-SYSTEM
See the file `config.sub' for the possible values of each field. If where SYSTEM can have one of these forms:
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.
If you are building compiler tools for cross-compiling, you can also OS KERNEL-OS
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the machine type.
If you are _building_ compiler tools for cross-compiling, you should
use the `--target=TYPE' option to select the type of system they will use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of produce code for.
system on which you are compiling the package.
If you want to _use_ a cross compiler, that generates code for a
platform different from the build platform, you should specify the
"host" platform (i.e., that on which the generated programs will
eventually be run) with `--host=TYPE'.
Sharing Defaults Sharing Defaults
================ ================
If you want to set default values for `configure' scripts to share, If you want to set default values for `configure' scripts to share, you
you can create a site shell script called `config.site' that gives can create a site shell script called `config.site' that gives default
default values for variables like `CC', `cache_file', and `prefix'. values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then `configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the `PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script. `CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script. A warning: not all `configure' scripts look for a site script.
Operation Controls Defining Variables
================== ==================
`configure' recognizes the following options to control how it Variables not defined in a site shell script can be set in the
operates. environment passed to `configure'. However, some packages may run
configure again during the build, and the customized values of these
variables may be lost. In order to avoid this problem, you should set
them in the `configure' command line, using `VAR=value'. For example:
`--cache-file=FILE' ./configure CC=/usr/local2/bin/gcc
Use and save the results of the tests in FILE instead of
`./config.cache'. Set FILE to `/dev/null' to disable caching, for causes the specified `gcc' to be used as the C compiler (unless it is
debugging `configure'. overridden in the site shell script). Here is a another example:
/bin/bash ./configure CONFIG_SHELL=/bin/bash
Here the `CONFIG_SHELL=/bin/bash' operand causes subsequent
configuration-related scripts to be executed by `/bin/bash'.
`configure' Invocation
======================
`configure' recognizes the following options to control how it operates.
`--help' `--help'
`-h'
Print a summary of the options to `configure', and exit. Print a summary of the options to `configure', and exit.
`--version'
`-V'
Print the version of Autoconf used to generate the `configure'
script, and exit.
`--cache-file=FILE'
Enable the cache: use and save the results of the tests in FILE,
traditionally `config.cache'. FILE defaults to `/dev/null' to
disable caching.
`--config-cache'
`-C'
Alias for `--cache-file=config.cache'.
`--quiet' `--quiet'
`--silent' `--silent'
`-q' `-q'
@ -175,8 +231,6 @@ operates.
Look for the package's source code in directory DIR. Usually Look for the package's source code in directory DIR. Usually
`configure' can determine that directory automatically. `configure' can determine that directory automatically.
`--version' `configure' also accepts some other, not widely useful, options. Run
Print the version of Autoconf used to generate the `configure' `configure --help' for more details.
script, and exit.
`configure' also accepts some other, not widely useful, options.

View File

@ -1,7 +1,7 @@
bin_PROGRAMS = cmph bin_PROGRAMS = cmph
lib_LTLIBRARIES = libcmph.la lib_LTLIBRARIES = libcmph.la
include_HEADERS = cmph.h cmph_types.h include_HEADERS = cmph.h cmph_types.h
libcmph_la_SOURCES = debug.h\ libcmph_la_SOURCES = util.h debug.h\
bitbool.h bitbool.c\ bitbool.h bitbool.c\
cmph_types.h\ cmph_types.h\
hash.h hash_state.h hash.c\ hash.h hash_state.h hash.c\
@ -17,9 +17,11 @@ libcmph_la_SOURCES = debug.h\
chm.h chm_structs.h chm.c\ chm.h chm_structs.h chm.c\
bmz.h bmz_structs.h bmz.c\ bmz.h bmz_structs.h bmz.c\
bmz8.h bmz8_structs.h bmz8.c\ bmz8.h bmz8_structs.h bmz8.c\
buffer_manage.h buffer_manage.c\
buffer_entry.h buffer_entry.c\
brz.h brz_structs.h brz.c brz.h brz_structs.h brz.c
libcmph_la_LDFLAGS = -version-info 0:0:0 libcmph_la_LDFLAGS = -version-info 0:0:0
cmph_SOURCES = main.c ../wingetopt.h ../wingetopt.c cmph_SOURCES = main.c wingetopt.h wingetopt.c
cmph_LDADD = libcmph.la cmph_LDADD = libcmph.la

View File

@ -64,7 +64,7 @@ cmph_t *bmz_new(cmph_config_t *mph, float c)
cmph_uint8 *used_edges = NULL; cmph_uint8 *used_edges = NULL;
cmph_uint8 restart_mapping = 0; cmph_uint8 restart_mapping = 0;
cmph_uint8 * visited = NULL; cmph_uint8 * visited = NULL;
bmz_config_data_t *bmz = (bmz_config_data_t *)mph->data; bmz_config_data_t *bmz = (bmz_config_data_t *)mph->data;
DEBUGP("c: %f\n", c); DEBUGP("c: %f\n", c);
bmz->m = mph->key_source->nkeys; bmz->m = mph->key_source->nkeys;
@ -93,7 +93,7 @@ cmph_t *bmz_new(cmph_config_t *mph, float c)
bmz->hashes[0] = hash_state_new(bmz->hashfuncs[0], bmz->n); bmz->hashes[0] = hash_state_new(bmz->hashfuncs[0], bmz->n);
DEBUGP("hash function 2\n"); DEBUGP("hash function 2\n");
bmz->hashes[1] = hash_state_new(bmz->hashfuncs[1], bmz->n); bmz->hashes[1] = hash_state_new(bmz->hashfuncs[1], bmz->n);
DEBUGP("Generating edges\n"); DEBUGP("Generating edges\n");
ok = bmz_gen_edges(mph); ok = bmz_gen_edges(mph);
if (!ok) if (!ok)
{ {
@ -109,20 +109,18 @@ cmph_t *bmz_new(cmph_config_t *mph, float c)
} }
if (iterations == 0) break; if (iterations == 0) break;
} }
else break; else break;
} }
if (iterations == 0) if (iterations == 0)
{ {
graph_destroy(bmz->graph); graph_destroy(bmz->graph);
return NULL; return NULL;
} }
// Ordering step // Ordering step
if (mph->verbosity) if (mph->verbosity)
{ {
fprintf(stderr, "Starting ordering step\n"); fprintf(stderr, "Starting ordering step\n");
} }
graph_obtain_critical_nodes(bmz->graph); graph_obtain_critical_nodes(bmz->graph);
// Searching step // Searching step
@ -164,7 +162,7 @@ cmph_t *bmz_new(cmph_config_t *mph, float c)
free(used_edges); free(used_edges);
free(visited); free(visited);
}while(restart_mapping && iterations_map > 0); }while(restart_mapping && iterations_map > 0);
graph_destroy(bmz->graph); graph_destroy(bmz->graph);
bmz->graph = NULL; bmz->graph = NULL;
if (iterations_map == 0) if (iterations_map == 0)
{ {
@ -181,6 +179,7 @@ cmph_t *bmz_new(cmph_config_t *mph, float c)
bmzf->m = bmz->m; bmzf->m = bmz->m;
mphf->data = bmzf; mphf->data = bmzf;
mphf->size = bmz->m; mphf->size = bmz->m;
DEBUGP("Successfully generated minimal perfect hash\n"); DEBUGP("Successfully generated minimal perfect hash\n");
if (mph->verbosity) if (mph->verbosity)
{ {

View File

@ -5,7 +5,6 @@
#include "hash.h" #include "hash.h"
#include "vqueue.h" #include "vqueue.h"
#include "bitbool.h" #include "bitbool.h"
#include <math.h> #include <math.h>
#include <stdlib.h> #include <stdlib.h>
#include <stdio.h> #include <stdio.h>
@ -66,7 +65,6 @@ cmph_t *bmz8_new(cmph_config_t *mph, float c)
cmph_uint8 * visited = NULL; cmph_uint8 * visited = NULL;
bmz8_config_data_t *bmz8 = (bmz8_config_data_t *)mph->data; bmz8_config_data_t *bmz8 = (bmz8_config_data_t *)mph->data;
if (mph->key_source->nkeys >= 256) if (mph->key_source->nkeys >= 256)
{ {
if (mph->verbosity) fprintf(stderr, "The number of keys in BMZ8 must be lower than 256.\n"); if (mph->verbosity) fprintf(stderr, "The number of keys in BMZ8 must be lower than 256.\n");
@ -168,9 +166,11 @@ cmph_t *bmz8_new(cmph_config_t *mph, float c)
iterations_map--; iterations_map--;
if (mph->verbosity) fprintf(stderr, "Restarting mapping step. %u iterations remaining.\n", iterations_map); if (mph->verbosity) fprintf(stderr, "Restarting mapping step. %u iterations remaining.\n", iterations_map);
} }
free(used_edges); free(used_edges);
free(visited); free(visited);
}while(restart_mapping && iterations_map > 0);
}while(restart_mapping && iterations_map > 0);
graph_destroy(bmz8->graph); graph_destroy(bmz8->graph);
bmz8->graph = NULL; bmz8->graph = NULL;
if (iterations_map == 0) if (iterations_map == 0)
@ -266,8 +266,8 @@ static cmph_uint8 bmz8_traverse_critical_nodes(bmz8_config_data_t *bmz8, cmph_ui
static cmph_uint8 bmz8_traverse_critical_nodes_heuristic(bmz8_config_data_t *bmz8, cmph_uint8 v, cmph_uint8 * biggest_g_value, cmph_uint8 * biggest_edge_value, cmph_uint8 * used_edges, cmph_uint8 * visited) static cmph_uint8 bmz8_traverse_critical_nodes_heuristic(bmz8_config_data_t *bmz8, cmph_uint8 v, cmph_uint8 * biggest_g_value, cmph_uint8 * biggest_edge_value, cmph_uint8 * used_edges, cmph_uint8 * visited)
{ {
cmph_uint8 next_g; cmph_uint8 next_g;
cmph_uint32 u; /* Auxiliary vertex */ cmph_uint32 u;
cmph_uint32 lav; /* lookahead vertex */ cmph_uint32 lav;
cmph_uint8 collision; cmph_uint8 collision;
cmph_uint8 * unused_g_values = NULL; cmph_uint8 * unused_g_values = NULL;
cmph_uint8 unused_g_values_capacity = 0; cmph_uint8 unused_g_values_capacity = 0;
@ -278,7 +278,7 @@ static cmph_uint8 bmz8_traverse_critical_nodes_heuristic(bmz8_config_data_t *bmz
DEBUGP("Labelling critical vertices\n"); DEBUGP("Labelling critical vertices\n");
bmz8->g[v] = (cmph_uint8)ceil ((double)(*biggest_edge_value)/2) - 1; bmz8->g[v] = (cmph_uint8)ceil ((double)(*biggest_edge_value)/2) - 1;
SETBIT(visited, v); SETBIT(visited, v);
next_g = (cmph_uint8)floor((double)(*biggest_edge_value/2)); /* next_g is incremented in the do..while statement*/ next_g = (cmph_uint8)floor((double)(*biggest_edge_value/2));
vqueue_insert(q, v); vqueue_insert(q, v);
while(!vqueue_is_empty(q)) while(!vqueue_is_empty(q))
{ {
@ -324,14 +324,15 @@ static cmph_uint8 bmz8_traverse_critical_nodes_heuristic(bmz8_config_data_t *bmz
{ {
if(nunused_g_values == unused_g_values_capacity) if(nunused_g_values == unused_g_values_capacity)
{ {
unused_g_values = realloc(unused_g_values, (unused_g_values_capacity + BUFSIZ)*sizeof(cmph_uint8)); unused_g_values = realloc(unused_g_values, (unused_g_values_capacity + BUFSIZ)*sizeof(cmph_uint8));
unused_g_values_capacity += BUFSIZ; unused_g_values_capacity += BUFSIZ;
} }
unused_g_values[nunused_g_values++] = next_g; unused_g_values[nunused_g_values++] = next_g;
} }
if (next_g > *biggest_g_value) *biggest_g_value = next_g; if (next_g > *biggest_g_value) *biggest_g_value = next_g;
} }
next_g_index--; next_g_index--;
if (next_g_index < nunused_g_values) unused_g_values[next_g_index] = unused_g_values[--nunused_g_values]; if (next_g_index < nunused_g_values) unused_g_values[next_g_index] = unused_g_values[--nunused_g_values];
@ -345,9 +346,11 @@ static cmph_uint8 bmz8_traverse_critical_nodes_heuristic(bmz8_config_data_t *bmz
if(next_g + bmz8->g[lav] > *biggest_edge_value) *biggest_edge_value = next_g + bmz8->g[lav]; if(next_g + bmz8->g[lav] > *biggest_edge_value) *biggest_edge_value = next_g + bmz8->g[lav];
} }
} }
bmz8->g[u] = next_g; // Labelling vertex u. bmz8->g[u] = next_g; // Labelling vertex u.
SETBIT(visited, u); SETBIT(visited, u);
vqueue_insert(q, u); vqueue_insert(q, u);
} }
} }
@ -537,7 +540,7 @@ cmph_uint8 bmz8_search(cmph_t *mphf, const char *key, cmph_uint32 keylen)
void bmz8_destroy(cmph_t *mphf) void bmz8_destroy(cmph_t *mphf)
{ {
bmz8_data_t *data = (bmz8_data_t *)mphf->data; bmz8_data_t *data = (bmz8_data_t *)mphf->data;
free(data->g); free(data->g);
hash_state_destroy(data->hashes[0]); hash_state_destroy(data->hashes[0]);
hash_state_destroy(data->hashes[1]); hash_state_destroy(data->hashes[1]);
free(data->hashes); free(data->hashes);

364
src/brz.c
View File

@ -4,10 +4,10 @@
#include "brz.h" #include "brz.h"
#include "cmph_structs.h" #include "cmph_structs.h"
#include "brz_structs.h" #include "brz_structs.h"
#include "buffer_manage.h"
#include "cmph.h" #include "cmph.h"
#include "hash.h" #include "hash.h"
#include "bitbool.h" #include "bitbool.h"
#include <math.h> #include <math.h>
#include <stdlib.h> #include <stdlib.h>
#include <stdio.h> #include <stdio.h>
@ -21,12 +21,14 @@ static int brz_gen_graphs(cmph_config_t *mph);
static cmph_uint32 brz_min_index(cmph_uint32 * vector, cmph_uint32 n); static cmph_uint32 brz_min_index(cmph_uint32 * vector, cmph_uint32 n);
static char * brz_read_key(FILE * fd); static char * brz_read_key(FILE * fd);
static void brz_destroy_keys_vd(char ** keys_vd, cmph_uint8 nkeys); static void brz_destroy_keys_vd(char ** keys_vd, cmph_uint8 nkeys);
static void brz_copy_partial_mphf(brz_config_data_t *brz, bmz8_data_t * bmzf, cmph_uint32 index, cmph_io_adapter_t *source); static char * brz_copy_partial_mphf(brz_config_data_t *brz, bmz8_data_t * bmzf, cmph_uint32 index, cmph_uint32 *buflen);
//static void brz_copy_partial_mphf(brz_config_data_t *brz, bmz8_data_t * bmzf, cmph_uint32 index);
static void brz_flush_g(brz_config_data_t *brz, cmph_uint32 *start_index, FILE * fd); static void brz_flush_g(brz_config_data_t *brz, cmph_uint32 *start_index, FILE * fd);
brz_config_data_t *brz_config_new() brz_config_data_t *brz_config_new()
{ {
brz_config_data_t *brz = NULL; brz_config_data_t *brz = NULL;
brz = (brz_config_data_t *)malloc(sizeof(brz_config_data_t)); brz = (brz_config_data_t *)malloc(sizeof(brz_config_data_t));
brz->b = 128;
brz->hashfuncs[0] = CMPH_HASH_JENKINS; brz->hashfuncs[0] = CMPH_HASH_JENKINS;
brz->hashfuncs[1] = CMPH_HASH_JENKINS; brz->hashfuncs[1] = CMPH_HASH_JENKINS;
brz->hashfuncs[2] = CMPH_HASH_JENKINS; brz->hashfuncs[2] = CMPH_HASH_JENKINS;
@ -35,10 +37,11 @@ brz_config_data_t *brz_config_new()
brz->g = NULL; brz->g = NULL;
brz->h1 = NULL; brz->h1 = NULL;
brz->h2 = NULL; brz->h2 = NULL;
brz->h3 = NULL; brz->h0 = NULL;
brz->memory_availability = 1024*1024; brz->memory_availability = 1024*1024;
brz->tmp_dir = (cmph_uint8 *)calloc(10, sizeof(cmph_uint8)); brz->tmp_dir = (cmph_uint8 *)calloc(10, sizeof(cmph_uint8));
strcpy(brz->tmp_dir, "/var/tmp/\0"); brz->mphf_fd = NULL;
strcpy((char *)(brz->tmp_dir), "/var/tmp/");
assert(brz); assert(brz);
return brz; return brz;
} }
@ -46,6 +49,7 @@ brz_config_data_t *brz_config_new()
void brz_config_destroy(cmph_config_t *mph) void brz_config_destroy(cmph_config_t *mph)
{ {
brz_config_data_t *data = (brz_config_data_t *)mph->data; brz_config_data_t *data = (brz_config_data_t *)mph->data;
free(data->tmp_dir);
DEBUGP("Destroying algorithm dependent data\n"); DEBUGP("Destroying algorithm dependent data\n");
free(data); free(data);
} }
@ -74,22 +78,35 @@ void brz_config_set_tmp_dir(cmph_config_t *mph, cmph_uint8 *tmp_dir)
brz_config_data_t *brz = (brz_config_data_t *)mph->data; brz_config_data_t *brz = (brz_config_data_t *)mph->data;
if(tmp_dir) if(tmp_dir)
{ {
cmph_uint32 len = strlen(tmp_dir); cmph_uint32 len = strlen((char *)tmp_dir);
free(brz->tmp_dir); free(brz->tmp_dir);
if(tmp_dir[len-1] != '/') if(tmp_dir[len-1] != '/')
{ {
brz->tmp_dir = calloc(len+2, sizeof(cmph_uint8)); brz->tmp_dir = calloc(len+2, sizeof(cmph_uint8));
sprintf(brz->tmp_dir, "%s/", tmp_dir); sprintf((char *)(brz->tmp_dir), "%s/", (char *)tmp_dir);
} }
else else
{ {
brz->tmp_dir = calloc(len+1, sizeof(cmph_uint8)); brz->tmp_dir = calloc(len+1, sizeof(cmph_uint8));
sprintf(brz->tmp_dir, "%s", tmp_dir); sprintf((char *)(brz->tmp_dir), "%s", (char *)tmp_dir);
} }
} }
} }
void brz_config_set_mphf_fd(cmph_config_t *mph, FILE *mphf_fd)
{
brz_config_data_t *brz = (brz_config_data_t *)mph->data;
brz->mphf_fd = mphf_fd;
assert(brz->mphf_fd);
}
void brz_config_set_b(cmph_config_t *mph, cmph_uint8 b)
{
brz_config_data_t *brz = (brz_config_data_t *)mph->data;
brz->b = b;
}
cmph_t *brz_new(cmph_config_t *mph, float c) cmph_t *brz_new(cmph_config_t *mph, float c)
{ {
cmph_t *mphf = NULL; cmph_t *mphf = NULL;
@ -102,7 +119,7 @@ cmph_t *brz_new(cmph_config_t *mph, float c)
brz->c = c; brz->c = c;
brz->m = mph->key_source->nkeys; brz->m = mph->key_source->nkeys;
DEBUGP("m: %u\n", brz->m); DEBUGP("m: %u\n", brz->m);
brz->k = ceil(brz->m/170); brz->k = ceil(brz->m/(brz->b));
DEBUGP("k: %u\n", brz->k); DEBUGP("k: %u\n", brz->k);
brz->size = (cmph_uint8 *) calloc(brz->k, sizeof(cmph_uint8)); brz->size = (cmph_uint8 *) calloc(brz->k, sizeof(cmph_uint8));
@ -112,22 +129,22 @@ cmph_t *brz_new(cmph_config_t *mph, float c)
fprintf(stderr, "Partioning the set of keys.\n"); fprintf(stderr, "Partioning the set of keys.\n");
} }
brz->h1 = (hash_state_t **)malloc(sizeof(hash_state_t *)*brz->k); // brz->h1 = (hash_state_t **)calloc(brz->k, sizeof(hash_state_t *));
brz->h2 = (hash_state_t **)malloc(sizeof(hash_state_t *)*brz->k); // brz->h2 = (hash_state_t **)calloc(brz->k, sizeof(hash_state_t *));
brz->g = (cmph_uint8 **) malloc(sizeof(cmph_uint8 *) *brz->k); // brz->g = (cmph_uint8 **) calloc(brz->k, sizeof(cmph_uint8 *));
while(1) while(1)
{ {
int ok; int ok;
DEBUGP("hash function 3\n"); DEBUGP("hash function 3\n");
brz->h3 = hash_state_new(brz->hashfuncs[2], brz->k); brz->h0 = hash_state_new(brz->hashfuncs[2], brz->k);
DEBUGP("Generating graphs\n"); DEBUGP("Generating graphs\n");
ok = brz_gen_graphs(mph); ok = brz_gen_graphs(mph);
if (!ok) if (!ok)
{ {
--iterations; --iterations;
hash_state_destroy(brz->h3); hash_state_destroy(brz->h0);
brz->h3 = NULL; brz->h0 = NULL;
DEBUGP("%u iterations remaining to create the graphs in a external file\n", iterations); DEBUGP("%u iterations remaining to create the graphs in a external file\n", iterations);
if (mph->verbosity) if (mph->verbosity)
{ {
@ -150,7 +167,6 @@ cmph_t *brz_new(cmph_config_t *mph, float c)
{ {
brz->offset[i] = brz->size[i-1] + brz->offset[i-1]; brz->offset[i] = brz->size[i-1] + brz->offset[i-1];
} }
// Generating a mphf // Generating a mphf
mphf = (cmph_t *)malloc(sizeof(cmph_t)); mphf = (cmph_t *)malloc(sizeof(cmph_t));
mphf->algo = mph->algo; mphf->algo = mph->algo;
@ -161,14 +177,12 @@ cmph_t *brz_new(cmph_config_t *mph, float c)
brz->h1 = NULL; //transfer memory ownership brz->h1 = NULL; //transfer memory ownership
brzf->h2 = brz->h2; brzf->h2 = brz->h2;
brz->h2 = NULL; //transfer memory ownership brz->h2 = NULL; //transfer memory ownership
brzf->h3 = brz->h3; brzf->h0 = brz->h0;
brz->h3 = NULL; //transfer memory ownership brz->h0 = NULL; //transfer memory ownership
brzf->size = brz->size; brzf->size = brz->size;
brz->size = NULL; //transfer memory ownership brz->size = NULL; //transfer memory ownership
brzf->offset = brz->offset; brzf->offset = brz->offset;
brz->offset = NULL; //transfer memory ownership brz->offset = NULL; //transfer memory ownership
brzf->tmp_dir = brz->tmp_dir;
brz->tmp_dir = NULL; //transfer memory ownership
brzf->k = brz->k; brzf->k = brz->k;
brzf->c = brz->c; brzf->c = brz->c;
brzf->m = brz->m; brzf->m = brz->m;
@ -186,47 +200,42 @@ static int brz_gen_graphs(cmph_config_t *mph)
{ {
cmph_uint32 i, e; cmph_uint32 i, e;
brz_config_data_t *brz = (brz_config_data_t *)mph->data; brz_config_data_t *brz = (brz_config_data_t *)mph->data;
//cmph_uint32 memory_availability = 200*1024*1024;
cmph_uint32 memory_usage = 0; cmph_uint32 memory_usage = 0;
cmph_uint32 nkeys_in_buffer = 0; cmph_uint32 nkeys_in_buffer = 0;
cmph_uint8 *buffer = (cmph_uint8 *)malloc(brz->memory_availability); cmph_uint8 *buffer = (cmph_uint8 *)malloc(brz->memory_availability);
cmph_uint32 *buckets_size = (cmph_uint32 *)calloc(brz->k, sizeof(cmph_uint32)); cmph_uint32 *buckets_size = (cmph_uint32 *)calloc(brz->k, sizeof(cmph_uint32));
cmph_uint32 *keys_index = NULL; cmph_uint32 *keys_index = NULL;
cmph_uint8 **buffer_merge = NULL; cmph_uint8 **buffer_merge = NULL;
cmph_uint32 *buffer_h3 = NULL; cmph_uint32 *buffer_h0 = NULL;
cmph_uint32 nflushes = 0; cmph_uint32 nflushes = 0;
cmph_uint32 h3; cmph_uint32 h0;
FILE * tmp_fd = NULL; FILE * tmp_fd = NULL;
FILE ** tmp_fds = NULL; buffer_manage_t * buff_manage = NULL;
char *filename = NULL; char *filename = NULL;
char *key = NULL; char *key = NULL;
cmph_uint32 keylen; cmph_uint32 keylen;
cmph_uint32 max_size = 0;
cmph_uint32 cur_bucket = 0; cmph_uint32 cur_bucket = 0;
cmph_uint8 nkeys_vd = 0; cmph_uint8 nkeys_vd = 0;
cmph_uint32 start_index = 0;
char ** keys_vd = NULL; char ** keys_vd = NULL;
mph->key_source->rewind(mph->key_source->data); mph->key_source->rewind(mph->key_source->data);
DEBUGP("Generating graphs from %u keys\n", brz->m); DEBUGP("Generating graphs from %u keys\n", brz->m);
// Partitioning // Partitioning
for (e = 0; e < brz->m; ++e) for (e = 0; e < brz->m; ++e)
{ {
mph->key_source->read(mph->key_source->data, &key, &keylen); mph->key_source->read(mph->key_source->data, &key, &keylen);
/* Buffers management */ /* Buffers management */
if (memory_usage + keylen + 1 > brz->memory_availability) // flush buffers if (memory_usage + keylen + 1 > brz->memory_availability) // flush buffers
{ {
if(mph->verbosity) if(mph->verbosity)
{ {
fprintf(stderr, "Flushing %u\n", nkeys_in_buffer); fprintf(stderr, "Flushing %u\n", nkeys_in_buffer);
} }
cmph_uint32 value = buckets_size[0]; cmph_uint32 value = buckets_size[0];
cmph_uint32 sum = 0; cmph_uint32 sum = 0;
cmph_uint32 keylen1 = 0;
cmph_uint32 keylen1 = 0; buckets_size[0] = 0;
buckets_size[0] = 0;
for(i = 1; i < brz->k; i++) for(i = 1; i < brz->k; i++)
{ {
if(buckets_size[i] == 0) continue; if(buckets_size[i] == 0) continue;
@ -239,20 +248,20 @@ static int brz_gen_graphs(cmph_config_t *mph)
keys_index = (cmph_uint32 *)calloc(nkeys_in_buffer, sizeof(cmph_uint32)); keys_index = (cmph_uint32 *)calloc(nkeys_in_buffer, sizeof(cmph_uint32));
for(i = 0; i < nkeys_in_buffer; i++) for(i = 0; i < nkeys_in_buffer; i++)
{ {
keylen1 = strlen(buffer + memory_usage); keylen1 = strlen((char *)(buffer + memory_usage));
h3 = hash(brz->h3, buffer + memory_usage, keylen1) % brz->k; h0 = hash(brz->h0, (char *)(buffer + memory_usage), keylen1) % brz->k;
keys_index[buckets_size[h3]] = memory_usage; keys_index[buckets_size[h0]] = memory_usage;
buckets_size[h3]++; buckets_size[h0]++;
memory_usage = memory_usage + keylen1 + 1; memory_usage = memory_usage + keylen1 + 1;
} }
filename = (char *)calloc(strlen(brz->tmp_dir) + 11, sizeof(char)); filename = (char *)calloc(strlen((char *)(brz->tmp_dir)) + 11, sizeof(char));
sprintf(filename, "%s%u.cmph",brz->tmp_dir, nflushes); sprintf(filename, "%s%u.cmph",brz->tmp_dir, nflushes);
tmp_fd = fopen(filename, "wb"); tmp_fd = fopen(filename, "wb");
free(filename); free(filename);
filename = NULL; filename = NULL;
for(i = 0; i < nkeys_in_buffer; i++) for(i = 0; i < nkeys_in_buffer; i++)
{ {
keylen1 = strlen(buffer + keys_index[i]) + 1; keylen1 = strlen((char *)(buffer + keys_index[i])) + 1;
fwrite(buffer + keys_index[i], 1, keylen1, tmp_fd); fwrite(buffer + keys_index[i], 1, keylen1, tmp_fd);
} }
nkeys_in_buffer = 0; nkeys_in_buffer = 0;
@ -264,17 +273,16 @@ static int brz_gen_graphs(cmph_config_t *mph)
} }
memcpy(buffer + memory_usage, key, keylen + 1); memcpy(buffer + memory_usage, key, keylen + 1);
memory_usage = memory_usage + keylen + 1; memory_usage = memory_usage + keylen + 1;
h3 = hash(brz->h3, key, keylen) % brz->k; h0 = hash(brz->h0, key, keylen) % brz->k;
if ((brz->size[h3] == MAX_BUCKET_SIZE) || ((brz->c >= 1.0) && (cmph_uint8)(brz->c * brz->size[h3]) < brz->size[h3])) if ((brz->size[h0] == MAX_BUCKET_SIZE) || ((brz->c >= 1.0) && (cmph_uint8)(brz->c * brz->size[h0]) < brz->size[h0]))
{ {
free(buffer); free(buffer);
free(buckets_size); free(buckets_size);
return 0; return 0;
} }
brz->size[h3] = brz->size[h3] + 1; brz->size[h0] = brz->size[h0] + 1;
buckets_size[h3] ++; buckets_size[h0] ++;
nkeys_in_buffer++; nkeys_in_buffer++;
mph->key_source->dispose(mph->key_source->data, key, keylen); mph->key_source->dispose(mph->key_source->data, key, keylen);
} }
@ -299,20 +307,20 @@ static int brz_gen_graphs(cmph_config_t *mph)
keys_index = (cmph_uint32 *)calloc(nkeys_in_buffer, sizeof(cmph_uint32)); keys_index = (cmph_uint32 *)calloc(nkeys_in_buffer, sizeof(cmph_uint32));
for(i = 0; i < nkeys_in_buffer; i++) for(i = 0; i < nkeys_in_buffer; i++)
{ {
keylen1 = strlen(buffer + memory_usage); keylen1 = strlen((char *)(buffer + memory_usage));
h3 = hash(brz->h3, buffer + memory_usage, keylen1) % brz->k; h0 = hash(brz->h0, (char *)(buffer + memory_usage), keylen1) % brz->k;
keys_index[buckets_size[h3]] = memory_usage; keys_index[buckets_size[h0]] = memory_usage;
buckets_size[h3]++; buckets_size[h0]++;
memory_usage = memory_usage + keylen1 + 1; memory_usage = memory_usage + keylen1 + 1;
} }
filename = (char *)calloc(strlen(brz->tmp_dir) + 11, sizeof(char)); filename = (char *)calloc(strlen((char *)(brz->tmp_dir)) + 11, sizeof(char));
sprintf(filename, "%s%u.cmph",brz->tmp_dir, nflushes); sprintf(filename, "%s%u.cmph",brz->tmp_dir, nflushes);
tmp_fd = fopen(filename, "wb"); tmp_fd = fopen(filename, "wb");
free(filename); free(filename);
filename = NULL; filename = NULL;
for(i = 0; i < nkeys_in_buffer; i++) for(i = 0; i < nkeys_in_buffer; i++)
{ {
keylen1 = strlen(buffer + keys_index[i]) + 1; keylen1 = strlen((char *)(buffer + keys_index[i])) + 1;
fwrite(buffer + keys_index[i], 1, keylen1, tmp_fd); fwrite(buffer + keys_index[i], 1, keylen1, tmp_fd);
} }
nkeys_in_buffer = 0; nkeys_in_buffer = 0;
@ -322,66 +330,70 @@ static int brz_gen_graphs(cmph_config_t *mph)
free(keys_index); free(keys_index);
fclose(tmp_fd); fclose(tmp_fd);
} }
free(buffer); free(buffer);
free(buckets_size); free(buckets_size);
if(nflushes > 1024) return 0; // Too many files generated. if(nflushes > 1024) return 0; // Too many files generated.
// mphf generation // mphf generation
if(mph->verbosity) if(mph->verbosity)
{ {
fprintf(stderr, "\nMPHF generation \n"); fprintf(stderr, "\nMPHF generation \n");
} }
tmp_fds = (FILE **)calloc(nflushes, sizeof(FILE *)); /* Starting to dump to disk the resultant MPHF: __cmph_dump function */
fwrite(cmph_names[CMPH_BRZ], (cmph_uint32)(strlen(cmph_names[CMPH_BRZ]) + 1), 1, brz->mphf_fd);
fwrite(&(brz->m), sizeof(brz->m), 1, brz->mphf_fd);
fwrite(&(brz->c), sizeof(cmph_float32), 1, brz->mphf_fd);
fwrite(&(brz->k), sizeof(cmph_uint32), 1, brz->mphf_fd); // number of MPHFs
fwrite(brz->size, sizeof(cmph_uint8)*(brz->k), 1, brz->mphf_fd);
//tmp_fds = (FILE **)calloc(nflushes, sizeof(FILE *));
buff_manage = buffer_manage_new(brz->memory_availability, nflushes);
buffer_merge = (cmph_uint8 **)calloc(nflushes, sizeof(cmph_uint8 *)); buffer_merge = (cmph_uint8 **)calloc(nflushes, sizeof(cmph_uint8 *));
buffer_h3 = (cmph_uint32 *)calloc(nflushes, sizeof(cmph_uint32)); buffer_h0 = (cmph_uint32 *)calloc(nflushes, sizeof(cmph_uint32));
filename = (char *)calloc(strlen(brz->tmp_dir) + 11, sizeof(char));
sprintf(filename, "%stmpg.cmph",brz->tmp_dir);
tmp_fd = fopen(filename, "w");
free(filename);
memory_usage = 0; memory_usage = 0;
for(i = 0; i < nflushes; i++) for(i = 0; i < nflushes; i++)
{ {
filename = (char *)calloc(strlen(brz->tmp_dir) + 11, sizeof(char)); filename = (char *)calloc(strlen((char *)(brz->tmp_dir)) + 11, sizeof(char));
sprintf(filename, "%s%u.cmph",brz->tmp_dir, i); sprintf(filename, "%s%u.cmph",brz->tmp_dir, i);
tmp_fds[i] = fopen(filename, "rb"); buffer_manage_open(buff_manage, i, filename);
free(filename); free(filename);
filename = NULL; filename = NULL;
key = brz_read_key(tmp_fds[i]); key = (char *)buffer_manage_read_key(buff_manage, i);
keylen = strlen(key); keylen = strlen(key);
h3 = hash(brz->h3, key, keylen) % brz->k; h0 = hash(brz->h0, key, keylen) % brz->k;
buffer_h3[i] = h3; buffer_h0[i] = h0;
buffer_merge[i] = (cmph_uint8 *)calloc(keylen + 1, sizeof(cmph_uint8)); buffer_merge[i] = (cmph_uint8 *)calloc(keylen + 1, sizeof(cmph_uint8));
memcpy(buffer_merge[i], key, keylen + 1); memcpy(buffer_merge[i], key, keylen + 1);
free(key); free(key);
} }
e = 0; e = 0;
keys_vd = (char **)calloc(MAX_BUCKET_SIZE, sizeof(char *)); keys_vd = (char **)calloc(MAX_BUCKET_SIZE, sizeof(char *));
nkeys_vd = 0; nkeys_vd = 0;
while(e < brz->m) while(e < brz->m)
{ {
i = brz_min_index(buffer_h3, nflushes); i = brz_min_index(buffer_h0, nflushes);
cur_bucket = buffer_h3[i]; cur_bucket = buffer_h0[i];
key = brz_read_key(tmp_fds[i]); key = (char *)buffer_manage_read_key(buff_manage, i);
if(key) if(key)
{ {
while(key) while(key)
{ {
keylen = strlen(key); keylen = strlen(key);
h3 = hash(brz->h3, key, keylen) % brz->k; h0 = hash(brz->h0, key, keylen) % brz->k;
if (h0 != buffer_h0[i]) break;
if (h3 != buffer_h3[i]) break;
keys_vd[nkeys_vd++] = key; keys_vd[nkeys_vd++] = key;
key = NULL; //transfer memory ownership
e++; e++;
key = brz_read_key(tmp_fds[i]); key = (char *)buffer_manage_read_key(buff_manage, i);
} }
if (key) if (key)
{ {
assert(nkeys_vd < brz->size[cur_bucket]); assert(nkeys_vd < brz->size[cur_bucket]);
keys_vd[nkeys_vd++] = buffer_merge[i]; keys_vd[nkeys_vd++] = (char *)buffer_merge[i];
buffer_merge[i] = NULL; //transfer memory ownership
e++; e++;
buffer_h3[i] = h3; buffer_h0[i] = h0;
buffer_merge[i] = (cmph_uint8 *)calloc(keylen + 1, sizeof(cmph_uint8)); buffer_merge[i] = (cmph_uint8 *)calloc(keylen + 1, sizeof(cmph_uint8));
memcpy(buffer_merge[i], key, keylen + 1); memcpy(buffer_merge[i], key, keylen + 1);
free(key); free(key);
@ -390,10 +402,10 @@ static int brz_gen_graphs(cmph_config_t *mph)
if(!key) if(!key)
{ {
assert(nkeys_vd < brz->size[cur_bucket]); assert(nkeys_vd < brz->size[cur_bucket]);
keys_vd[nkeys_vd++] = buffer_merge[i]; keys_vd[nkeys_vd++] = (char *)buffer_merge[i];
buffer_merge[i] = NULL; //transfer memory ownership
e++; e++;
buffer_h3[i] = UINT_MAX; buffer_h0[i] = UINT_MAX;
buffer_merge[i] = NULL;
} }
if(nkeys_vd == brz->size[cur_bucket]) // Generating mphf for each bucket. if(nkeys_vd == brz->size[cur_bucket]) // Generating mphf for each bucket.
@ -402,35 +414,33 @@ static int brz_gen_graphs(cmph_config_t *mph)
cmph_config_t *config = NULL; cmph_config_t *config = NULL;
cmph_t *mphf_tmp = NULL; cmph_t *mphf_tmp = NULL;
bmz8_data_t * bmzf = NULL; bmz8_data_t * bmzf = NULL;
char *bufmphf = NULL;
cmph_uint32 buflenmphf = 0;
// Source of keys // Source of keys
if(nkeys_vd > max_size) max_size = nkeys_vd;
source = cmph_io_vector_adapter(keys_vd, (cmph_uint32)nkeys_vd); source = cmph_io_vector_adapter(keys_vd, (cmph_uint32)nkeys_vd);
config = cmph_config_new(source); config = cmph_config_new(source);
cmph_config_set_algo(config, CMPH_BMZ8); cmph_config_set_algo(config, CMPH_BMZ8);
cmph_config_set_graphsize(config, brz->c); cmph_config_set_graphsize(config, brz->c);
mphf_tmp = cmph_new(config); mphf_tmp = cmph_new(config);
bmzf = (bmz8_data_t *)mphf_tmp->data; bmzf = (bmz8_data_t *)mphf_tmp->data;
brz_copy_partial_mphf(brz, bmzf, cur_bucket, source); bufmphf = brz_copy_partial_mphf(brz, bmzf, cur_bucket, &buflenmphf);
memory_usage += brz->size[cur_bucket]; bmzf = NULL;
if((cur_bucket+1 == brz->k)||(memory_usage > brz->memory_availability)) fwrite(bufmphf, buflenmphf, 1, brz->mphf_fd);
{ free(bufmphf);
brz_flush_g(brz, &start_index, tmp_fd); bufmphf = NULL;
memory_usage = 0;
}
cmph_config_destroy(config); cmph_config_destroy(config);
brz_destroy_keys_vd(keys_vd, nkeys_vd); brz_destroy_keys_vd(keys_vd, nkeys_vd);
cmph_destroy(mphf_tmp); cmph_destroy(mphf_tmp);
free(source); cmph_io_vector_adapter_destroy(source);
nkeys_vd = 0; nkeys_vd = 0;
} }
} }
fclose(tmp_fd);
for(i = 0; i < nflushes; i++) fclose(tmp_fds[i]); buffer_manage_destroy(buff_manage);
free(tmp_fds);
free(keys_vd); free(keys_vd);
free(buffer_merge); free(buffer_merge);
free(buffer_h3); free(buffer_h0);
fprintf(stderr, "Maximal Size: %u\n", max_size);
return 1; return 1;
} }
@ -467,7 +477,7 @@ static char * brz_read_key(FILE * fd)
static void brz_destroy_keys_vd(char ** keys_vd, cmph_uint8 nkeys) static void brz_destroy_keys_vd(char ** keys_vd, cmph_uint8 nkeys)
{ {
cmph_uint8 i; cmph_uint8 i;
for(i = 0; i < nkeys; i++) free(keys_vd[i]); for(i = 0; i < nkeys; i++) { free(keys_vd[i]); keys_vd[i] = NULL;}
} }
static void brz_flush_g(brz_config_data_t *brz, cmph_uint32 *start_index, FILE * fd) static void brz_flush_g(brz_config_data_t *brz, cmph_uint32 *start_index, FILE * fd)
@ -481,93 +491,84 @@ static void brz_flush_g(brz_config_data_t *brz, cmph_uint32 *start_index, FILE *
} }
} }
static void brz_copy_partial_mphf(brz_config_data_t *brz, bmz8_data_t * bmzf, cmph_uint32 index, cmph_io_adapter_t *source) static char * brz_copy_partial_mphf(brz_config_data_t *brz, bmz8_data_t * bmzf, cmph_uint32 index, cmph_uint32 *buflen)
{
cmph_uint32 i;
cmph_uint32 buflenh1 = 0;
cmph_uint32 buflenh2 = 0;
char * bufh1 = NULL;
char * bufh2 = NULL;
char * buf = NULL;
cmph_uint32 n = ceil(brz->c * brz->size[index]);
hash_state_dump(bmzf->hashes[0], &bufh1, &buflenh1);
hash_state_dump(bmzf->hashes[1], &bufh2, &buflenh2);
*buflen = buflenh1 + buflenh2 + n + 2*sizeof(cmph_uint32);
buf = (char *)malloc(*buflen);
//fprintf(stderr,"entrei passei\n");
memcpy(buf, &buflenh1, sizeof(cmph_uint32));
memcpy(buf+sizeof(cmph_uint32), bufh1, buflenh1);
memcpy(buf+sizeof(cmph_uint32)+buflenh1, &buflenh2, sizeof(cmph_uint32));
memcpy(buf+2*sizeof(cmph_uint32)+buflenh1, bufh2, buflenh2);
memcpy(buf+2*sizeof(cmph_uint32)+buflenh1+buflenh2,bmzf->g, n);
free(bufh1);
free(bufh2);
return buf;
}
/*static void brz_copy_partial_mphf(brz_config_data_t *brz, bmz8_data_t * bmzf, cmph_uint32 index)
{ {
cmph_uint32 i; cmph_uint32 i;
cmph_uint32 n = ceil(brz->c * brz->size[index]); cmph_uint32 n = ceil(brz->c * brz->size[index]);
if( brz->g[index]) {fprintf(stderr, "index:%u\n",index);exit(10);}
brz->g[index] = (cmph_uint8 *)calloc(n, sizeof(cmph_uint8)); brz->g[index] = (cmph_uint8 *)calloc(n, sizeof(cmph_uint8));
for(i = 0; i < n; i++) for(i = 0; i < n; i++)
{ {
brz->g[index][i] = bmzf->g[i]; brz->g[index][i] = bmzf->g[i];
//fprintf(stderr, "gsrc[%u]: %u gdest: %u\n", i, (cmph_uint8) bmzf->g[i], brz->g[index][i]); //fprintf(stderr, "gsrc[%u]: %u gdest: %u\n", i, (cmph_uint8) bmzf->g[i], brz->g[index][i]);
} }
brz->h1[index] = hash_state_copy(bmzf->hashes[0]); brz->h1[index] = hash_state_copy(bmzf->hashes[0]);
brz->h2[index] = hash_state_copy(bmzf->hashes[1]); brz->h2[index] = hash_state_copy(bmzf->hashes[1]);
} }
*/
int brz_dump(cmph_t *mphf, FILE *fd) int brz_dump(cmph_t *mphf, FILE *fd)
{ {
brz_data_t *data = (brz_data_t *)mphf->data;
char *buf = NULL; char *buf = NULL;
cmph_uint32 buflen; cmph_uint32 buflen;
cmph_uint32 i;
brz_data_t *data = (brz_data_t *)mphf->data;
FILE * tmpg_fd = NULL;
char * filename = NULL;
filename = (char *)calloc(strlen(data->tmp_dir) + 11, sizeof(char));
sprintf(filename, "%stmpg.cmph",data->tmp_dir);
tmpg_fd = fopen(filename, "rb");
free(filename);
DEBUGP("Dumping brzf\n"); DEBUGP("Dumping brzf\n");
__cmph_dump(mphf, fd); // The initial part of the MPHF have already been dumped to disk during construction
// Dumping h0
fwrite(&(data->k), sizeof(cmph_uint32), 1, fd); hash_state_dump(data->h0, &buf, &buflen);
//dumping h1 and h2. DEBUGP("Dumping hash state with %u bytes to disk\n", buflen);
for(i = 0; i < data->k; i++) fwrite(&buflen, sizeof(cmph_uint32), 1, fd);
{ fwrite(buf, buflen, 1, fd);
// h1 free(buf);
hash_state_dump(data->h1[i], &buf, &buflen); // Dumping m and the vector offset.
DEBUGP("Dumping hash state with %u bytes to disk\n", buflen); fwrite(&(data->m), sizeof(cmph_uint32), 1, fd);
fwrite(&buflen, sizeof(cmph_uint32), 1, fd);
fwrite(buf, buflen, 1, fd);
free(buf);
// h2
hash_state_dump(data->h2[i], &buf, &buflen);
DEBUGP("Dumping hash state with %u bytes to disk\n", buflen);
fwrite(&buflen, sizeof(cmph_uint32), 1, fd);
fwrite(buf, buflen, 1, fd);
free(buf);
}
// Dumping h3.
hash_state_dump(data->h3, &buf, &buflen);
DEBUGP("Dumping hash state with %u bytes to disk\n", buflen);
fwrite(&buflen, sizeof(cmph_uint32), 1, fd);
fwrite(buf, buflen, 1, fd);
free(buf);
// Dumping c, m, size vector and offset vector.
fwrite(&(data->c), sizeof(cmph_float32), 1, fd);
fwrite(&(data->m), sizeof(cmph_uint32), 1, fd);
fwrite(data->size, sizeof(cmph_uint8)*(data->k), 1, fd);
fwrite(data->offset, sizeof(cmph_uint32)*(data->k), 1, fd); fwrite(data->offset, sizeof(cmph_uint32)*(data->k), 1, fd);
// Dumping g function.
for(i = 0; i < data->k; i++)
{
cmph_uint32 n = ceil(data->c * data->size[i]);
buf = (char *)calloc(n, sizeof(cmph_uint8));
fread(buf, sizeof(cmph_uint8), n, tmpg_fd);
fwrite(buf, sizeof(cmph_uint8), n, fd);
free(buf);
}
fclose(tmpg_fd);
return 1; return 1;
} }
void brz_load(FILE *f, cmph_t *mphf) void brz_load(FILE *f, cmph_t *mphf)
{ {
char *buf = NULL; char *buf = NULL;
cmph_uint32 buflen; cmph_uint32 buflen;
cmph_uint32 i; cmph_uint32 i, n;
brz_data_t *brz = (brz_data_t *)malloc(sizeof(brz_data_t)); brz_data_t *brz = (brz_data_t *)malloc(sizeof(brz_data_t));
DEBUGP("Loading brz mphf\n"); DEBUGP("Loading brz mphf\n");
mphf->data = brz; mphf->data = brz;
fread(&(brz->c), sizeof(cmph_float32), 1, f);
fread(&(brz->k), sizeof(cmph_uint32), 1, f); fread(&(brz->k), sizeof(cmph_uint32), 1, f);
brz->size = (cmph_uint8 *) malloc(sizeof(cmph_uint8)*brz->k);
fread(brz->size, sizeof(cmph_uint8)*(brz->k), 1, f);
brz->h1 = (hash_state_t **)malloc(sizeof(hash_state_t *)*brz->k); brz->h1 = (hash_state_t **)malloc(sizeof(hash_state_t *)*brz->k);
brz->h2 = (hash_state_t **)malloc(sizeof(hash_state_t *)*brz->k); brz->h2 = (hash_state_t **)malloc(sizeof(hash_state_t *)*brz->k);
brz->g = (cmph_uint8 **) calloc(brz->k, sizeof(cmph_uint8 *));
DEBUGP("Reading %u h1 and %u h2\n", brz->k, brz->k); DEBUGP("Reading %u h1 and %u h2\n", brz->k, brz->k);
//loading h1 and h2. //loading h_i1, h_i2 and g_i.
for(i = 0; i < brz->k; i++) for(i = 0; i < brz->k; i++)
{ {
// h1 // h1
@ -583,69 +584,62 @@ void brz_load(FILE *f, cmph_t *mphf)
buf = (char *)malloc(buflen); buf = (char *)malloc(buflen);
fread(buf, buflen, 1, f); fread(buf, buflen, 1, f);
brz->h2[i] = hash_state_load(buf, buflen); brz->h2[i] = hash_state_load(buf, buflen);
free(buf); free(buf);
n = ceil(brz->c * brz->size[i]);
DEBUGP("g_i has %u bytes\n", n);
brz->g[i] = (cmph_uint8 *)calloc(n, sizeof(cmph_uint8));
fread(brz->g[i], sizeof(cmph_uint8)*n, 1, f);
} }
//loading h3 //loading h0
fread(&buflen, sizeof(cmph_uint32), 1, f); fread(&buflen, sizeof(cmph_uint32), 1, f);
DEBUGP("Hash state has %u bytes\n", buflen); DEBUGP("Hash state has %u bytes\n", buflen);
buf = (char *)malloc(buflen); buf = (char *)malloc(buflen);
fread(buf, buflen, 1, f); fread(buf, buflen, 1, f);
brz->h3 = hash_state_load(buf, buflen); brz->h0 = hash_state_load(buf, buflen);
free(buf); free(buf);
//loading c, m, size vector and offset vector. //loading c, m, and the vector offset.
fread(&(brz->c), sizeof(cmph_float32), 1, f);
fread(&(brz->m), sizeof(cmph_uint32), 1, f); fread(&(brz->m), sizeof(cmph_uint32), 1, f);
brz->size = (cmph_uint8 *) malloc(sizeof(cmph_uint8)*brz->k);
brz->offset = (cmph_uint32 *)malloc(sizeof(cmph_uint32)*brz->k); brz->offset = (cmph_uint32 *)malloc(sizeof(cmph_uint32)*brz->k);
fread(brz->size, sizeof(cmph_uint8)*(brz->k), 1, f); fread(brz->offset, sizeof(cmph_uint32)*(brz->k), 1, f);
fread(brz->offset, sizeof(cmph_uint32)*(brz->k), 1, f);
//loading g function.
brz->g = (cmph_uint8 **) malloc(sizeof(cmph_uint8 *)*brz->k);
for(i = 0; i < brz->k; i++)
{
cmph_uint32 n = ceil(brz->c * brz->size[i]);
brz->g[i] = (cmph_uint8 *)malloc(sizeof(cmph_uint8)*n);
fread(brz->g[i], sizeof(cmph_uint8)*n, 1, f);
}
return; return;
} }
cmph_uint32 brz_search(cmph_t *mphf, const char *key, cmph_uint32 keylen) cmph_uint32 brz_search(cmph_t *mphf, const char *key, cmph_uint32 keylen)
{ {
brz_data_t *brz = mphf->data; brz_data_t *brz = mphf->data;
cmph_uint32 h3 = hash(brz->h3, key, keylen) % brz->k; cmph_uint32 h0 = hash(brz->h0, key, keylen) % brz->k;
cmph_uint32 m = brz->size[h3]; cmph_uint32 m = brz->size[h0];
cmph_uint32 n = ceil(brz->c * m); cmph_uint32 n = ceil(brz->c * m);
cmph_uint32 h1 = hash(brz->h1[h3], key, keylen) % n; cmph_uint32 h1 = hash(brz->h1[h0], key, keylen) % n;
cmph_uint32 h2 = hash(brz->h2[h3], key, keylen) % n; cmph_uint32 h2 = hash(brz->h2[h0], key, keylen) % n;
cmph_uint8 mphf_bucket; cmph_uint8 mphf_bucket;
if (h1 == h2 && ++h2 >= n) h2 = 0; if (h1 == h2 && ++h2 >= n) h2 = 0;
mphf_bucket = brz->g[h3][h1] + brz->g[h3][h2]; mphf_bucket = brz->g[h0][h1] + brz->g[h0][h2];
DEBUGP("key: %s h1: %u h2: %u h3: %u\n", key, h1, h2, h3); DEBUGP("key: %s h1: %u h2: %u h0: %u\n", key, h1, h2, h0);
DEBUGP("key: %s g[h1]: %u g[h2]: %u offset[h3]: %u edges: %u\n", key, brz->g[h3][h1], brz->g[h3][h2], brz->offset[h3], brz->m); DEBUGP("key: %s g[h1]: %u g[h2]: %u offset[h0]: %u edges: %u\n", key, brz->g[h0][h1], brz->g[h0][h2], brz->offset[h0], brz->m);
DEBUGP("Address: %u\n", mphf_bucket + brz->offset[h3]); DEBUGP("Address: %u\n", mphf_bucket + brz->offset[h0]);
return (mphf_bucket + brz->offset[h3]); return (mphf_bucket + brz->offset[h0]);
} }
void brz_destroy(cmph_t *mphf) void brz_destroy(cmph_t *mphf)
{ {
cmph_uint32 i; cmph_uint32 i;
brz_data_t *data = (brz_data_t *)mphf->data; brz_data_t *data = (brz_data_t *)mphf->data;
for(i = 0; i < data->k; i++) if(data->g)
{ {
free(data->g[i]); for(i = 0; i < data->k; i++)
hash_state_destroy(data->h1[i]); {
hash_state_destroy(data->h2[i]); free(data->g[i]);
hash_state_destroy(data->h1[i]);
hash_state_destroy(data->h2[i]);
}
free(data->g);
free(data->h1);
free(data->h2);
} }
hash_state_destroy(data->h3); hash_state_destroy(data->h0);
free(data->g);
free(data->h1);
free(data->h2);
free(data->size); free(data->size);
free(data->offset); free(data->offset);
free(data->tmp_dir);
free(data); free(data);
free(mphf); free(mphf);
} }

View File

@ -9,6 +9,8 @@ typedef struct __brz_config_data_t brz_config_data_t;
brz_config_data_t *brz_config_new(); brz_config_data_t *brz_config_new();
void brz_config_set_hashfuncs(cmph_config_t *mph, CMPH_HASH *hashfuncs); void brz_config_set_hashfuncs(cmph_config_t *mph, CMPH_HASH *hashfuncs);
void brz_config_set_tmp_dir(cmph_config_t *mph, cmph_uint8 *tmp_dir); void brz_config_set_tmp_dir(cmph_config_t *mph, cmph_uint8 *tmp_dir);
void brz_config_set_mphf_fd(cmph_config_t *mph, FILE *mphf_fd);
void brz_config_set_b(cmph_config_t *mph, cmph_uint8 b);
void brz_config_set_memory_availability(cmph_config_t *mph, cmph_uint32 memory_availability); void brz_config_set_memory_availability(cmph_config_t *mph, cmph_uint32 memory_availability);
void brz_config_destroy(cmph_config_t *mph); void brz_config_destroy(cmph_config_t *mph);
cmph_t *brz_new(cmph_config_t *mph, float c); cmph_t *brz_new(cmph_config_t *mph, float c);

View File

@ -13,8 +13,7 @@ struct __brz_data_t
cmph_uint32 k; // number of components cmph_uint32 k; // number of components
hash_state_t **h1; hash_state_t **h1;
hash_state_t **h2; hash_state_t **h2;
hash_state_t * h3; hash_state_t * h0;
cmph_uint8 * tmp_dir; // temporary directory
}; };
struct __brz_config_data_t struct __brz_config_data_t
@ -25,12 +24,14 @@ struct __brz_config_data_t
cmph_uint8 *size; // size[i] stores the number of edges represented by g[i][...]. cmph_uint8 *size; // size[i] stores the number of edges represented by g[i][...].
cmph_uint32 *offset; // offset[i] stores the sum: size[0] + size[1] + ... size[i-1]. cmph_uint32 *offset; // offset[i] stores the sum: size[0] + size[1] + ... size[i-1].
cmph_uint8 **g; // g function. cmph_uint8 **g; // g function.
cmph_uint8 b; // parameter b.
cmph_uint32 k; // number of components cmph_uint32 k; // number of components
hash_state_t **h1; hash_state_t **h1;
hash_state_t **h2; hash_state_t **h2;
hash_state_t * h3; hash_state_t * h0;
cmph_uint32 memory_availability; cmph_uint32 memory_availability;
cmph_uint8 * tmp_dir; // temporary directory cmph_uint8 * tmp_dir; // temporary directory
FILE * mphf_fd; // mphf file
}; };
#endif #endif

View File

@ -49,7 +49,8 @@ static int key_nlfile_read(void *data, char **key, cmph_uint32 *keylen)
static int key_vector_read(void *data, char **key, cmph_uint32 *keylen) static int key_vector_read(void *data, char **key, cmph_uint32 *keylen)
{ {
cmph_vector_t *cmph_vector = (cmph_vector_t *)data; /*
cmph_vector_t *cmph_vector = (cmph_vector_t *)data;
char **keys_vd = (char **)cmph_vector->vector; char **keys_vd = (char **)cmph_vector->vector;
if (keys_vd + cmph_vector->position == NULL) return -1; if (keys_vd + cmph_vector->position == NULL) return -1;
@ -57,7 +58,17 @@ static int key_vector_read(void *data, char **key, cmph_uint32 *keylen)
*key = (char *)malloc(*keylen + 1); *key = (char *)malloc(*keylen + 1);
strcpy(*key, *(keys_vd + cmph_vector->position)); strcpy(*key, *(keys_vd + cmph_vector->position));
cmph_vector->position = cmph_vector->position + 1; cmph_vector->position = cmph_vector->position + 1;
*/
cmph_vector_t *cmph_vector = (cmph_vector_t *)data;
char **keys_vd = (char **)cmph_vector->vector;
// if (keys_vd + cmph_vector->position == NULL) return -1;
*keylen = strlen(keys_vd[cmph_vector->position]);
*key = (char *)malloc(*keylen + 1);
strcpy(*key, keys_vd[cmph_vector->position]);
cmph_vector->position = cmph_vector->position + 1;
return *keylen; return *keylen;
} }
@ -68,7 +79,7 @@ static void key_nlfile_dispose(void *data, char *key, cmph_uint32 keylen)
static void key_vector_dispose(void *data, char *key, cmph_uint32 keylen) static void key_vector_dispose(void *data, char *key, cmph_uint32 keylen)
{ {
key_nlfile_dispose(data, key, keylen); free(key);
} }
static void key_nlfile_rewind(void *data) static void key_nlfile_rewind(void *data)
@ -236,7 +247,43 @@ void cmph_config_set_tmp_dir(cmph_config_t *mph, cmph_uint8 *tmp_dir)
default: default:
assert(0); assert(0);
} }
}
void cmph_config_set_mphf_fd(cmph_config_t *mph, FILE *mphf_fd)
{
switch (mph->algo)
{
case CMPH_CHM:
break;
case CMPH_BMZ: /* included -- Fabiano */
break;
case CMPH_BMZ8: /* included -- Fabiano */
break;
case CMPH_BRZ: /* included -- Fabiano */
brz_config_set_mphf_fd(mph, mphf_fd);
break;
default:
assert(0);
}
}
void cmph_config_set_b(cmph_config_t *mph, cmph_uint8 b)
{
switch (mph->algo)
{
case CMPH_CHM:
break;
case CMPH_BMZ: /* included -- Fabiano */
break;
case CMPH_BMZ8: /* included -- Fabiano */
break;
case CMPH_BRZ: /* included -- Fabiano */
brz_config_set_b(mph, b);
break;
default:
assert(0);
}
} }
void cmph_config_set_memory_availability(cmph_config_t *mph, cmph_uint32 memory_availability) void cmph_config_set_memory_availability(cmph_config_t *mph, cmph_uint32 memory_availability)

View File

@ -41,6 +41,8 @@ void cmph_config_set_verbosity(cmph_config_t *mph, cmph_uint32 verbosity);
void cmph_config_set_graphsize(cmph_config_t *mph, float c); void cmph_config_set_graphsize(cmph_config_t *mph, float c);
void cmph_config_set_algo(cmph_config_t *mph, CMPH_ALGO algo); void cmph_config_set_algo(cmph_config_t *mph, CMPH_ALGO algo);
void cmph_config_set_tmp_dir(cmph_config_t *mph, cmph_uint8 *tmp_dir); void cmph_config_set_tmp_dir(cmph_config_t *mph, cmph_uint8 *tmp_dir);
void cmph_config_set_mphf_fd(cmph_config_t *mph, FILE *mphf_fd);
void cmph_config_set_b(cmph_config_t *mph, cmph_uint8 b);
void cmph_config_set_memory_availability(cmph_config_t *mph, cmph_uint32 memory_availability); void cmph_config_set_memory_availability(cmph_config_t *mph, cmph_uint32 memory_availability);
void cmph_config_destroy(cmph_config_t *mph); void cmph_config_destroy(cmph_config_t *mph);

View File

@ -89,9 +89,6 @@ jenkins_state_t *jenkins_state_new(cmph_uint32 size) //size of hash table
jenkins_state_t *state = (jenkins_state_t *)malloc(sizeof(jenkins_state_t)); jenkins_state_t *state = (jenkins_state_t *)malloc(sizeof(jenkins_state_t));
DEBUGP("Initializing jenkins hash\n"); DEBUGP("Initializing jenkins hash\n");
state->seed = rand() % size; state->seed = rand() % size;
state->nbits = (cmph_uint32)ceil(log(size)/M_LOG2E);
state->size = size;
DEBUGP("Initialized jenkins with size %u, nbits %u and seed %u\n", size, state->nbits, state->seed);
return state; return state;
} }
void jenkins_state_destroy(jenkins_state_t *state) void jenkins_state_destroy(jenkins_state_t *state)
@ -164,7 +161,7 @@ cmph_uint32 jenkins_hash(jenkins_state_t *state, const char *k, cmph_uint32 keyl
void jenkins_state_dump(jenkins_state_t *state, char **buf, cmph_uint32 *buflen) void jenkins_state_dump(jenkins_state_t *state, char **buf, cmph_uint32 *buflen)
{ {
*buflen = sizeof(cmph_uint32)*3; *buflen = sizeof(cmph_uint32);
*buf = malloc(*buflen); *buf = malloc(*buflen);
if (!*buf) if (!*buf)
{ {
@ -172,10 +169,7 @@ void jenkins_state_dump(jenkins_state_t *state, char **buf, cmph_uint32 *buflen)
return; return;
} }
memcpy(*buf, &(state->seed), sizeof(cmph_uint32)); memcpy(*buf, &(state->seed), sizeof(cmph_uint32));
memcpy(*buf + sizeof(cmph_uint32), &(state->nbits), sizeof(cmph_uint32));
memcpy(*buf + sizeof(cmph_uint32)*2, &(state->size), sizeof(cmph_uint32));
DEBUGP("Dumped jenkins state with seed %u\n", state->seed); DEBUGP("Dumped jenkins state with seed %u\n", state->seed);
return; return;
} }
@ -184,8 +178,6 @@ jenkins_state_t *jenkins_state_copy(jenkins_state_t *src_state)
jenkins_state_t *dest_state = (jenkins_state_t *)malloc(sizeof(jenkins_state_t)); jenkins_state_t *dest_state = (jenkins_state_t *)malloc(sizeof(jenkins_state_t));
dest_state->hashfunc = src_state->hashfunc; dest_state->hashfunc = src_state->hashfunc;
dest_state->seed = src_state->seed; dest_state->seed = src_state->seed;
dest_state->nbits = src_state->nbits;
dest_state->size = src_state->size;
return dest_state; return dest_state;
} }
@ -193,8 +185,6 @@ jenkins_state_t *jenkins_state_load(const char *buf, cmph_uint32 buflen)
{ {
jenkins_state_t *state = (jenkins_state_t *)malloc(sizeof(jenkins_state_t)); jenkins_state_t *state = (jenkins_state_t *)malloc(sizeof(jenkins_state_t));
state->seed = *(cmph_uint32 *)buf; state->seed = *(cmph_uint32 *)buf;
state->nbits = *(((cmph_uint32 *)buf) + 1);
state->size = *(((cmph_uint32 *)buf) + 2);
state->hashfunc = CMPH_HASH_JENKINS; state->hashfunc = CMPH_HASH_JENKINS;
DEBUGP("Loaded jenkins state with seed %u\n", state->seed); DEBUGP("Loaded jenkins state with seed %u\n", state->seed);
return state; return state;

View File

@ -7,8 +7,6 @@ typedef struct __jenkins_state_t
{ {
CMPH_HASH hashfunc; CMPH_HASH hashfunc;
cmph_uint32 seed; cmph_uint32 seed;
cmph_uint32 nbits;
cmph_uint32 size;
} jenkins_state_t; } jenkins_state_t;
jenkins_state_t *jenkins_state_new(cmph_uint32 size); //size of hash table jenkins_state_t *jenkins_state_new(cmph_uint32 size); //size of hash table

View File

@ -22,12 +22,12 @@
void usage(const char *prg) void usage(const char *prg)
{ {
fprintf(stderr, "usage: %s [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-a algorithm] [-M memory_in_MB] [-d tmp_dir] [-m file.mph] keysfile\n", prg); fprintf(stderr, "usage: %s [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-a algorithm] [-M memory_in_MB] [-b BRZ_parameter] [-d tmp_dir] [-m file.mph] keysfile\n", prg);
} }
void usage_long(const char *prg) void usage_long(const char *prg)
{ {
cmph_uint32 i; cmph_uint32 i;
fprintf(stderr, "usage: %s [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-a algorithm] [-M memory_in_MB] [-d tmp_dir] [-m file.mph] keysfile\n", prg); fprintf(stderr, "usage: %s [-v] [-h] [-V] [-k nkeys] [-f hash_function] [-g [-c value][-s seed] ] [-a algorithm] [-M memory_in_MB] [-b BRZ_parameter] [-d tmp_dir] [-m file.mph] keysfile\n", prg);
fprintf(stderr, "Minimum perfect hashing tool\n\n"); fprintf(stderr, "Minimum perfect hashing tool\n\n");
fprintf(stderr, " -h\t print this help message\n"); fprintf(stderr, " -h\t print this help message\n");
fprintf(stderr, " -c\t c value that determines the number of vertices in the graph\n"); fprintf(stderr, " -c\t c value that determines the number of vertices in the graph\n");
@ -43,13 +43,13 @@ void usage_long(const char *prg)
fprintf(stderr, " -m\t minimum perfect hash function file \n"); fprintf(stderr, " -m\t minimum perfect hash function file \n");
fprintf(stderr, " -M\t main memory availability (in MB)\n"); fprintf(stderr, " -M\t main memory availability (in MB)\n");
fprintf(stderr, " -d\t temporary directory used in brz algorithm \n"); fprintf(stderr, " -d\t temporary directory used in brz algorithm \n");
fprintf(stderr, " -b\t parmeter of BRZ algorithm to make the maximal number of keys in a bucket lower than 256\n");
fprintf(stderr, " keysfile\t line separated file with keys\n"); fprintf(stderr, " keysfile\t line separated file with keys\n");
} }
int main(int argc, char **argv) int main(int argc, char **argv)
{ {
char verbosity = 0; char verbosity = 0;
char generate = 0; char generate = 0;
char *mphf_file = NULL; char *mphf_file = NULL;
FILE *mphf_fd = stdout; FILE *mphf_fd = stdout;
@ -67,9 +67,10 @@ int main(int argc, char **argv)
cmph_uint8 * tmp_dir = NULL; cmph_uint8 * tmp_dir = NULL;
cmph_io_adapter_t *source; cmph_io_adapter_t *source;
cmph_uint32 memory_availability = 0; cmph_uint32 memory_availability = 0;
cmph_uint32 b = 128;
while (1) while (1)
{ {
char ch = getopt(argc, argv, "hVvgc:k:a:M:f:m:d:s:"); char ch = getopt(argc, argv, "hVvgc:k:a:M:b:f:m:d:s:");
if (ch == -1) break; if (ch == -1) break;
switch (ch) switch (ch)
{ {
@ -122,6 +123,16 @@ int main(int argc, char **argv)
} }
} }
break; break;
case 'b':
{
char *cptr;
b = strtoul(optarg, &cptr, 10);
if(*cptr != 0) {
fprintf(stderr, "Parameter b was not found: %s\n", optarg);
exit(1);
}
}
break;
case 'v': case 'v':
++verbosity; ++verbosity;
break; break;
@ -184,9 +195,9 @@ int main(int argc, char **argv)
return 1; return 1;
} }
keys_file = argv[optind]; keys_file = argv[optind];
if (seed == UINT_MAX) seed = (cmph_uint32)time(NULL); if (seed == UINT_MAX) seed = (cmph_uint32)time(NULL);
srand(seed); srand(seed);
int ret = 0; int ret = 0;
if (mphf_file == NULL) if (mphf_file == NULL)
{ {
@ -196,6 +207,7 @@ int main(int argc, char **argv)
} }
keys_fd = fopen(keys_file, "r"); keys_fd = fopen(keys_file, "r");
if (keys_fd == NULL) if (keys_fd == NULL)
{ {
fprintf(stderr, "Unable to open file %s: %s\n", keys_file, strerror(errno)); fprintf(stderr, "Unable to open file %s: %s\n", keys_file, strerror(errno));
@ -209,33 +221,35 @@ int main(int argc, char **argv)
if (generate) if (generate)
{ {
//Create mphf //Create mphf
mphf_fd = fopen(mphf_file, "w");
config = cmph_config_new(source); config = cmph_config_new(source);
cmph_config_set_algo(config, mph_algo); cmph_config_set_algo(config, mph_algo);
if (nhashes) cmph_config_set_hashfuncs(config, hashes); if (nhashes) cmph_config_set_hashfuncs(config, hashes);
cmph_config_set_verbosity(config, verbosity); cmph_config_set_verbosity(config, verbosity);
cmph_config_set_tmp_dir(config, tmp_dir); cmph_config_set_tmp_dir(config, tmp_dir);
cmph_config_set_mphf_fd(config, mphf_fd);
cmph_config_set_memory_availability(config, memory_availability); cmph_config_set_memory_availability(config, memory_availability);
cmph_config_set_b(config, b);
if(mph_algo == CMPH_BMZ && c >= 2.0) c=1.15; if(mph_algo == CMPH_BMZ && c >= 2.0) c=1.15;
if (c != 0) cmph_config_set_graphsize(config, c); if (c != 0) cmph_config_set_graphsize(config, c);
mphf = cmph_new(config); mphf = cmph_new(config);
cmph_config_destroy(config);
if (mphf == NULL) if (mphf == NULL)
{ {
fprintf(stderr, "Unable to create minimum perfect hashing function\n"); fprintf(stderr, "Unable to create minimum perfect hashing function\n");
cmph_config_destroy(config); //cmph_config_destroy(config);
free(mphf_file); free(mphf_file);
return -1; return -1;
} }
mphf_fd = fopen(mphf_file, "w");
if (mphf_fd == NULL) if (mphf_fd == NULL)
{ {
fprintf(stderr, "Unable to open output file %s: %s\n", mphf_file, strerror(errno)); fprintf(stderr, "Unable to open output file %s: %s\n", mphf_file, strerror(errno));
free(mphf_file); free(mphf_file);
return -1; return -1;
} }
cmph_dump(mphf, mphf_fd); cmph_dump(mphf, mphf_fd);
cmph_destroy(mphf); cmph_destroy(mphf);
fclose(mphf_fd); fclose(mphf_fd);
} }
else else
@ -289,6 +303,7 @@ int main(int argc, char **argv)
fclose(keys_fd); fclose(keys_fd);
free(mphf_file); free(mphf_file);
free(tmp_dir); free(tmp_dir);
free(source); cmph_io_nlfile_adapter_destroy(source);
return ret; return ret;
} }

View File

@ -46,5 +46,5 @@ void vqueue_print(vqueue_t * q)
void vqueue_destroy(vqueue_t *q) void vqueue_destroy(vqueue_t *q)
{ {
free(q->values); q->values = NULL; free(q->values); q->values = NULL; free(q);
} }

View File

@ -8,96 +8,79 @@ Single
-2 -2
1200 2 1200 2
0 32 #bebebe 0 32 #bebebe
6 3285 3600 3555 4230 6 2025 3015 3555 3690
6 3285 3780 3555 4230
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 4140 3555 4140 3555 4230 3285 4230 3285 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 4050 3555 4050 3555 4140 3285 4140 3285 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3960 3555 3960 3555 4050 3285 4050 3285 3960
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3870 3555 3870 3555 3960 3285 3960 3285 3870
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3780 3555 3780 3555 3870 3285 3870 3285 3780
-6
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3690 3555 3690 3555 3780 3285 3780 3285 3690
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3600 3555 3600 3555 3690 3285 3690 3285 3600
-6
6 1800 4500 3330 5175
2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8 2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8
1800 4770 2070 4770 2070 4500 3060 4500 3060 4770 3330 4770 2025 3285 2295 3285 2295 3015 3285 3015 3285 3285 3555 3285
2565 5175 1800 4770 2790 3690 2025 3285
4 0 0 50 -1 0 10 0.0000 4 150 600 2265 4867 Spreading\001 4 0 0 50 -1 0 10 0.0000 4 135 765 2385 3330 Partitioning\001
-6 -6
6 2250 3060 2880 3600 6 1890 3735 3780 4365
6 2250 3060 2880 3600 6 2430 3735 2700 4365
6 2250 3060 2880 3600 6 2430 3915 2700 4365
6 2250 3060 2880 3600 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8 2430 4275 2700 4275 2700 4365 2430 4365 2430 4275
2250 3330 2430 3330 2430 3060 2700 3060 2700 3330 2880 3330 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2565 3600 2250 3330 2430 4185 2700 4185 2700 4275 2430 4275 2430 4185
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2430 4095 2700 4095 2700 4185 2430 4185 2430 4095
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2430 4005 2700 4005 2700 4095 2430 4095 2430 4005
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2430 3915 2700 3915 2700 4005 2430 4005 2430 3915
-6 -6
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2430 3825 2700 3825 2700 3915 2430 3915 2430 3825
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2430 3735 2700 3735 2700 3825 2430 3825 2430 3735
-6 -6
4 0 0 50 -1 0 10 0.0000 4 105 75 2521 3382 h\001 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-6 1890 4275 2160 4275 2160 4365 1890 4365 1890 4275
4 0 0 50 -1 0 6 0.0000 4 60 45 2589 3419 1\001 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
-6 1890 4185 2160 4185 2160 4275 1890 4275 1890 4185
6 1395 2655 3825 2970 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2 4 0 1 0 7 50 -1 -1 0.000 0 0 7 0 0 5 2160 4275 2430 4275 2430 4365 2160 4365 2160 4275
3825 2970 3825 2655 1395 2655 1395 2970 3825 2970 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
4 0 0 50 -1 0 10 0.0000 4 135 795 2212 2850 Set of Keys S\001 2160 4185 2430 4185 2430 4275 2160 4275 2160 4185
-6 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2160 4095 2430 4095 2430 4185 2160 4185 2160 4095
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2160 4005 2430 4005 2430 4095 2160 4095 2160 4005
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2160 3915 2430 3915 2430 4005 2160 4005 2160 3915
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2700 4275 2970 4275 2970 4365 2700 4365 2700 4275
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2700 4185 2970 4185 2970 4275 2700 4275 2700 4185
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2700 4095 2970 4095 2970 4185 2700 4185 2700 4095
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2700 4005 2970 4005 2970 4095 2700 4095 2700 4005
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2160 3825 2430 3825 2430 3915 2160 3915 2160 3825
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3240 4275 3510 4275 3510 4365 3240 4365 3240 4275
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3510 4275 3780 4275 3780 4365 3510 4365 3510 4275
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2970 4275 3240 4275 3240 4365 2970 4365 2970 4275
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3240 4185 3510 4185 3510 4275 3240 4275 3240 4185
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1890 4095 2160 4095 2160 4185 1890 4185 1890 4095
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3510 4185 3780 4185 3780 4275 3510 4275 3510 4185
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3240 4095 3510 4095 3510 4185 3240 4185 3240 4095
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3240 4005 3510 4005 3510 4095 3240 4095 3240 4005
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3240 3915 3510 3915 3510 4005 3240 4005 3240 3915
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
1395 4230 3825 4230 1890 4365 3780 4365
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1395 4140 1665 4140 1665 4230 1395 4230 1395 4140 2970 4185 3240 4185 3240 4275 2970 4275 2970 4185
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 -6
1395 4050 1665 4050 1665 4140 1395 4140 1395 4050 6 1260 5310 4230 5580
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1665 4140 1935 4140 1935 4230 1665 4230 1665 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1665 4050 1935 4050 1935 4140 1665 4140 1665 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1665 3960 1935 3960 1935 4050 1665 4050 1665 3960
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1665 3870 1935 3870 1935 3960 1665 3960 1665 3870
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1665 3780 1935 3780 1935 3870 1665 3870 1665 3780
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2205 4140 2475 4140 2475 4230 2205 4230 2205 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2205 4050 2475 4050 2475 4140 2205 4140 2205 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2205 3960 2475 3960 2475 4050 2205 4050 2205 3960
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2205 3870 2475 3870 2475 3960 2205 3960 2205 3870
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1665 3690 1935 3690 1935 3780 1665 3780 1665 3690
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2745 4140 3015 4140 3015 4230 2745 4230 2745 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3015 4140 3285 4140 3285 4230 3015 4230 3015 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2475 4140 2745 4140 2745 4230 2475 4230 2475 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2745 4050 3015 4050 3015 4140 2745 4140 2745 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1395 3960 1665 3960 1665 4050 1395 4050 1395 3960
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3555 4140 3825 4140 3825 4230 3555 4230 3555 4140
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3555 4050 3825 4050 3825 4140 3555 4140 3555 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3015 4050 3285 4050 3285 4140 3015 4140 3015 4050
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2745 3960 3015 3960 3015 4050 2745 4050 2745 3960
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2745 3870 3015 3870 3015 3960 2745 3960 2745 3870
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2745 3780 3015 3780 3015 3870 2745 3870 2745 3780
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
1260 5400 4230 5400 1260 5400 4230 5400
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
@ -122,14 +105,49 @@ Single
3150 5310 3420 5310 3420 5400 3150 5400 3150 5310 3150 5310 3420 5310 3420 5400 3150 5400 3150 5310
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1260 5310 1530 5310 1530 5400 1260 5400 1260 5310 1260 5310 1530 5310 1530 5400 1260 5400 1260 5310
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3510 3555 3510 3555 3600 3285 3600 3285 3510
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3285 3420 3555 3420 3555 3510 3285 3510 3285 3420
4 0 0 50 -1 0 10 0.0000 4 105 75 1485 4410 0\001
4 0 0 50 -1 0 10 0.0000 4 105 210 3600 4410 b-1\001
4 0 0 50 -1 0 10 0.0000 4 105 480 720 4050 Buckets\001
4 0 0 50 -1 0 10 0.0000 4 105 90 900 4230 B\001
4 0 0 50 -1 0 10 0.0000 4 105 210 4005 5580 n-1\001 4 0 0 50 -1 0 10 0.0000 4 105 210 4005 5580 n-1\001
4 0 0 50 -1 0 10 0.0000 4 105 75 1350 5580 0\001 4 0 0 50 -1 0 10 0.0000 4 105 75 1350 5580 0\001
4 0 0 50 -1 0 10 0.0000 4 105 690 450 5400 Hash Table\001 -6
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
1260 2925 4230 2925
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1530 2835 1800 2835 1800 2925 1530 2925 1530 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2070 2835 2340 2835 2340 2925 2070 2925 2070 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2340 2835 2610 2835 2610 2925 2340 2925 2340 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2610 2835 2880 2835 2880 2925 2610 2925 2610 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
2880 2835 3150 2835 3150 2925 2880 2925 2880 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3420 2835 3690 2835 3690 2925 3420 2925 3420 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3690 2835 3960 2835 3960 2925 3690 2925 3690 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3960 2835 4230 2835 4230 2925 3960 2925 3960 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1800 2835 2070 2835 2070 2925 1800 2925 1800 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
3150 2835 3420 2835 3420 2925 3150 2925 3150 2835
2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
1260 2835 1530 2835 1530 2925 1260 2925 1260 2835
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
3510 4410 3510 4590
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
3510 4410 3600 4410
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
3690 4410 3780 4410
2 3 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 8
2025 4815 2295 4815 2295 4545 3285 4545 3285 4815 3555 4815
2790 5220 2025 4815
2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
3780 4410 3780 4590
4 0 0 50 -1 0 10 0.0000 4 135 585 2475 4860 Searching\001
4 0 0 50 -1 0 10 0.0000 4 105 75 1980 4545 0\001
4 0 0 50 -1 0 10 0.0000 4 105 690 4410 5400 Hash Table\001
4 0 0 50 -1 0 10 0.0000 4 105 480 4410 4230 Buckets\001
4 0 0 50 -1 0 10 0.0000 4 135 555 4410 2925 Key set S\001
4 0 0 50 -1 0 10 0.0000 4 105 75 1350 2745 0\001
4 0 0 50 -1 0 10 0.0000 4 105 210 4005 2745 n-1\001
4 0 0 50 -1 0 10 0.0000 4 105 420 3555 4545 n/b - 1\001