Pre-release comments.

2012-06-03 04:17:14 -03:00 · 2012-06-03 04:17:14 -03:00 · cc42ab3b74
commit cc42ab3b74
parent 0d7a176458
4 changed files with 41 additions and 13 deletions
--- a/14
+++ b/14
@@ -84,6 +84,18 @@ The CMPH Library encapsulates the newest and more efficient algorithms in an eas
 ----------------------------------------


+	News for version 2.0
+	====================
+
+Cleaned up most warnings for the C code.
+
+Experimental C++ interface (--enable-cxxmph) implementing the BDZ algorithm in
+a convenient SimpleMPHIndex interface, which serves as the basis
+for drop-in replacements for std::unordered_map, sparsehash::sparse_hash_map
+and sparsehash::dense_hash_map. Faster lookup time at the expense of insertion
+time. See cxxmph/mph_map.h and cxxmph/mph_index.h for details.
+
+
 	News for version 1.1
 	====================

@@ -310,5 +322,5 @@ Fabiano Cupertino Botelho (fc_botelho@users.sourceforge.net)

 Nivio Ziviani (nivio@dcc.ufmg.br)

-Last Updated: Fri Jun  1 19:04:40 2012
+Last Updated: Sun Jun  3 04:09:55 2012

--- a/README.t2t
+++ b/README.t2t
@@ -88,6 +88,16 @@ The CMPH Library encapsulates the newest and more efficient algorithms in an eas

 ----------------------------------------

+==News for version 2.0==
+
+Cleaned up most warnings for the C code.
+
+Experimental C++ interface (--enable-cxxmph) implementing the BDZ algorithm in
+a convenient interface, which serves as the basis
+for drop-in replacements for std::unordered_map, sparsehash::sparse_hash_map
+and sparsehash::dense_hash_map. Potentially faster lookup time at the expense
+of insertion time. See cxxmph/mph_map.h and cxxmph/mph_index.h for details.
+
 ==News for version 1.1==

 Fixed a bug in the chd_pc algorithm and reorganized tests.
--- a/cxxmph/mph_index.h
+++ b/cxxmph/mph_index.h
@@ -10,16 +10,12 @@
 // This is a pretty uncommon data structure, and if your application has a real
 // use case for it, chances are that it is a real win. If all you are doing is
 // a straightforward implementation of an in-memory associative mapping data
-// structure, then it will probably be slower, since that the
-// evaluation of index() is typically slower than the total cost of running a
-// traditional hash function over a key and doing 2-3 conflict resolutions on
-// 100byte-ish strings. If you still want to do, take a look at mph_map.h
+// structure, then it will probably be slower. Take a look at mph_map.h
 // instead.
 //
 // Thesis presenting this and similar algorithms:
 // http://homepages.dcc.ufmg.br/~fbotelho/en/talks/thesis2008/thesis.pdf
 //
-//
 // Notes:
 //
 // Most users can use the SimpleMPHIndex wrapper instead of the MPHIndex which
--- a/cxxmph/mph_map.h
+++ b/cxxmph/mph_map.h
@@ -3,15 +3,25 @@
 // Implementation of the unordered associative mapping interface using a
 // minimal perfect hash function.
 //
-// This class not necessarily faster than unordered_map (or ext/hash_map).
-// Benchmark your code before using it. If you do not call rehash() before
-// starting your reads, it will be very likely slower than unordered_map.
+// Since these are header-mostly libraries, make sure you compile your code
+// with -DNDEBUG and -O3. The code requires a modern C++11 compiler.
+//
+// The container comes in 3 flavors, all in the cxxmph namespace and drop-in
+// replacements for the popular classes with the same names.
+// * dense_hash_map
+//    -> fast, uses more memory, 2.93 bits per bucket, ~50% occupation
+// * unordered_map (aliases: hash_map, mph_map)
+//    -> middle ground, uses 2.93 bits per bucket, ~81% occupation
+// * sparse_hash_map
+//    -> slower, uses 3.6 bits per bucket, 100% occupation
+//
+// These classes are not necessarily faster than their existing counterparts.
+// Benchmark your code before adopting them. The larger the key, the larger the
+// number of elements inserted, and the larger the number of failed searches,
+// the more likely these classes are to outperform existing code.
 //
 // For large sets of URLs (>100k), which are somewhat expensive to compare, I
-// found this class to be about 10%-30% faster than unordered_map.
-//
-// The space overhead of this map is 2.6 bits per bucket and it achieves 100%
-// occupation with a rehash call.
+// found those classes to be about 10%-50% faster than unordered_map.

 #include <algorithm>
 #include <iostream>