Commit Graph

36 Commits

Author SHA1 Message Date
Matthias Sohn 0db0476542 Fire IndexChangedEvent on DirCache.commit()
Since we replaced GitIndex with DirCache, JGit no longer fired
IndexChangedEvents. For EGit this still worked, but with high
latency: its RepositoryChangeScanner, which is scheduled to run
every 10 seconds, fires the event when the index changes.
This scanner is meant to detect index changes made by a different
process, e.g. by calling "git add" from native git.

When the index is changed from within the same process we should fire
the event synchronously. Compare the index checksum on write to the
checksum recorded when the index was read earlier to determine whether
the index really changed. Use the IndexChangedListener interface to
keep DirCache decoupled from Repository.
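
A minimal sketch of the checksum comparison described above. The names
IndexSketch, Listener, read and commit are hypothetical and are not
JGit's API; SHA-1 over the content stands in for the index checksum.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical stand-in for an index file; not the real DirCache.
final class IndexSketch {
    // Hypothetical listener, playing the role of IndexChangedListener.
    interface Listener {
        void onIndexChanged();
    }

    private final Listener listener;
    private byte[] readChecksum = new byte[0];

    IndexSketch(Listener listener) {
        this.listener = listener;
    }

    static byte[] checksum(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Remember the checksum of the content as it was when read.
    void read(byte[] content) {
        readChecksum = checksum(content);
    }

    // On write, notify the listener only if the checksum differs.
    void commit(byte[] content) {
        byte[] writeChecksum = checksum(content);
        if (!Arrays.equals(readChecksum, writeChecksum)) {
            readChecksum = writeChecksum;
            listener.onIndexChanged();
        }
    }
}
```

Because the listener is a plain interface, the index class in this
sketch needs no reference to a repository object.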

Change-Id: Id4311f7a7859ffe8738863b3d86c83c8b5f513af
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-30 00:00:22 +02:00
Matthias Sohn 19a366d532 Prepare 1.2.0 builds
Change-Id: I9ec247135d93ef28d732e94f18d0ec1d0e2e6d44
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-15 22:51:46 +02:00
Matthias Sohn 57d6585522 Prepare post v1.1.0.201109151100-r build
Change-Id: Ib099ec93d8243b238641d79328216874532ab5eb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-15 21:51:23 +02:00
Matthias Sohn 1cb0510cee JGit v1.1.0.201109151100-r
Change-Id: Iadcec7e5973600e005cbdeb837fa197d3ae2ea86
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-15 17:32:58 +02:00
Matthias Sohn b09d21b6eb Prepare post v1.1.0.201109071825-rc3 builds
Change-Id: I1244f6639263d156a6f9e4530167e5eb1826a535
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-08 01:50:41 +02:00
Matthias Sohn 75611a8314 JGit v1.1.0.201109071825-rc3
Change-Id: I1b989d3101272632eacabe25a0b111ad0ff5bb3b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-08 00:54:27 +02:00
Matthias Sohn cfdb09e9db Use commit message best practices for Mylyn Commit template
We should use a template for Mylyn commit messages that matches our
guidelines for commit messages.

http://wiki.eclipse.org/EGit/Contributor_Guide#Commit_message_guidelines

Bug: 337401
Change-Id: I05812abf0eb0651d22c439142640f173fc2f2ba0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-05 23:57:21 +02:00
Matthias Sohn df117d3da9 Prepare post-v1.1.0.201109011030-rc2 builds
Change-Id: I8dda83cdbe88beba4a480df9846848bf3aceb9e2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-01 17:36:10 +02:00
Matthias Sohn 384ffa7ee9 JGit v1.1.0.201109011030-rc2
Change-Id: Ie6d65fe45ad92c813ce3a227729aa43681922249
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-09-01 16:38:13 +02:00
Kevin Sawicki e54404d555 Reassign symbolic ref list after calling put.
This is required since RefList.put returns a new RefList.
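
A minimal sketch of why the reassignment matters. ImmutableNames is a
hypothetical class that only mimics the copy-on-put behavior described
above; it is not JGit's RefList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable list with a copy-on-put method.
final class ImmutableNames {
    private final List<String> names;

    ImmutableNames(List<String> names) {
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    // Returns a NEW list containing the extra name; 'this' is unchanged.
    ImmutableNames put(String name) {
        List<String> copy = new ArrayList<>(names);
        copy.add(name);
        return new ImmutableNames(copy);
    }

    int size() {
        return names.size();
    }
}
```

With such a type, calling symbolic.put("HEAD") alone discards the
result; the caller has to write symbolic = symbolic.put("HEAD").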

Change-Id: I717d75d6f6154a6e0dc7cde3b72b0a59c68d955c
Signed-off-by: Kevin Sawicki <kevin@github.com>
2011-08-24 13:22:05 -07:00
Shawn O. Pearce d34ec12019 DHT: Change DhtReader caches to be dynamic by workload
Instead of fixing the prefetch queue and recent chunk queue as
different sizes, allow these to share the same limit but be scaled
based on the work being performed.

During walks about 20% of the space will be given to the prefetcher,
and the other 80% will be used by the recent chunks cache. This
should improve cases where there is bad locality between chunks.

During writing of a pack stream, 90-100% of the space should be
made available to the prefetcher, as the prefetch plan is usually
very accurate about the order chunks will be needed in.
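
The split described above can be sketched as simple arithmetic over one
shared limit. CacheBudget and Workload are hypothetical names, and the
90% figure is an assumption taken from the low end of "90-100%".

```java
// Hypothetical helper: splits one shared chunk budget between the
// prefetcher and the recent chunk cache based on the current workload.
final class CacheBudget {
    enum Workload { WALK, WRITE_PACK }

    final int prefetch;
    final int recent;

    CacheBudget(int total, Workload workload) {
        // About 20% goes to the prefetcher during walks; 90% is assumed
        // here for writing a pack stream.
        int percent = workload == Workload.WALK ? 20 : 90;
        prefetch = total * percent / 100;
        recent = total - prefetch;
    }
}
```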

Change-Id: I1ca7acb4518e66eb9d4138fb753df38e7254704d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-06-09 19:10:15 -07:00
Shawn O. Pearce 1e6b02643c DHT: Use a proper HashMap for RecentChunk lookups
A linear search is somewhat acceptable for only 4 recent chunks, but
a HashMap based lookup would be better. The table will have 16 slots
by default and given the hashCode() of ChunkKey is derived from the
SHA-1 of the chunk, each chunk will fall into its own bucket within
the table and thus evaluate only 1 entry during lookup instead of 4.

Some users may also want to devote more memory to the recent chunks,
in which case expanding this list to a longer length will help to
reduce chunk faults, but would increase search time. Using a HashMap
will help this code to scale to larger sizes better.
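
A minimal sketch of a hash based recent chunk lookup, assuming chunk
keys can be represented as strings. RecentChunksSketch is a
hypothetical class built on java.util.LinkedHashMap and is not the
code changed by this commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical recent-chunk cache keyed by a chunk id string.
final class RecentChunksSketch<V> {
    private final Map<String, V> byKey;

    RecentChunksSketch(final int maxSize) {
        // accessOrder=true keeps the most recently used entries last,
        // and removeEldestEntry evicts once maxSize is exceeded.
        byKey = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    void put(String key, V chunk) {
        byKey.put(key, chunk);
    }

    // One hash lookup instead of scanning every recent chunk.
    V get(String key) {
        return byKey.get(key);
    }
}
```

Lookup cost in this sketch stays roughly constant as maxSize grows,
which is the scaling property the message asks for.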

Change-Id: Ia41b7a1cc69ad27b85749e3b74cbf8d0aa338044
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-06-09 17:59:22 -07:00
Shawn O. Pearce 57853e4949 DHT: Always have at least one recent chunk in DhtReader
The RecentChunks cache assumes there is always at least one recent
chunk in the maxSize that it receives from the DhtReaderOptions.
Ensure that is true by requiring the size to be at least 1.

Running with a recent chunk cache of size 0 is a very bad idea. Often
during commit walking the parents of a commit will be found
on the same chunk as the commit that was just accessed. In
these cases it's a good idea to keep that last chunk around
so the parents can be quickly accessed.
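
One way to enforce the minimum is to clamp the configured value. This
is a sketch only: the class and method names are hypothetical, the
default of 4 is an assumption, and the real options class may reject
small values instead of clamping them.

```java
// Hypothetical options holder for the recent chunk cache size.
final class RecentChunkOptionsSketch {
    private int recentChunkCacheSize = 4; // assumed default

    // Clamp to 1 so the cache always has room for the last chunk read.
    RecentChunkOptionsSketch setRecentChunkCacheSize(int n) {
        recentChunkCacheSize = Math.max(1, n);
        return this;
    }

    int getRecentChunkCacheSize() {
        return recentChunkCacheSize;
    }
}
```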

Change-Id: I33b65286e8a4cbf6ef4ced28c547837f173e065d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-06-09 17:55:52 -07:00
Shawn O. Pearce d00f527d65 DHT: Fix NPE during prefetch
The Prefetcher may have loaded a chunk that is a fragment. If the
DhtReader is scanning the Prefetcher's chunks for a particular
object, fragment chunks will be missing the index and will throw
an NPE during the findOffset() call into the index itself.
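
A minimal sketch of the guard, assuming a fragment chunk is one whose
index is null. PrefetchScanSketch and its Chunk type are hypothetical
and much simpler than the real classes.

```java
import java.util.List;
import java.util.Map;

final class PrefetchScanSketch {
    // Hypothetical chunk: fragment chunks carry no index (index == null).
    static final class Chunk {
        final Map<String, Integer> index; // object id -> offset, or null

        Chunk(Map<String, Integer> index) {
            this.index = index;
        }
    }

    // Returns the offset of objectId in the first chunk that indexes
    // it, or -1 if no scanned chunk contains the object.
    static int findOffset(List<Chunk> chunks, String objectId) {
        for (Chunk c : chunks) {
            if (c.index == null)
                continue; // fragment chunk: nothing to search
            Integer off = c.index.get(objectId);
            if (off != null)
                return off;
        }
        return -1;
    }
}
```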

Change-Id: Ie2823724c289f745655076c5209acec32361a1ea
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-06-09 17:29:46 -07:00
Shawn O. Pearce 0e1d5ad8f8 DHT: Drop leading hash digits from row keys
Originally I put the first two digits of the object SHA-1 into the
start of a row key to try and spread the load of objects around a DHT
service. Unfortunately this tends to not work as well as I had hoped.

Servers reading a repository need to contact every node in a DHT
cluster if the cluster tries to evenly distribute the object rows.
This is a lot of connections, especially if the cluster has many
backend storage servers.  If the library has an open connection
limit (possibly due to JVM file descriptor limitations) it may need
to open and close a lot of connections to access a repository,
rather than being able to reuse the same connection to a handful
of backend servers.  This results in a lot of connection thrashing
for some DHT type databases, and is inefficient.

Some DHTs are able to operate even if part of the database space
is currently unavailable.  For example, a DHT service might assign
some section of the key space to a node, and then fail that section
over to another node when the primary is noticed as being offline.
During that failover period that section of the key space is not
available, but other sections hosted by other backends are still
ready for service. Spreading keys all over the cluster makes it
likely that any single backend being temporarily down means the
entire cluster is down, rather than only some.

This is a massive schema change, but it should improve reliability
and performance for any DHT system.
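
The difference between the two key styles can be sketched as below.
Both key formats are hypothetical illustrations, not the actual row
key layout, and the sketch assumes the storage system places keys
with a common prefix on the same few backends.

```java
final class RowKeySketch {
    // Old style (hypothetical format): the first two hex digits of the
    // object SHA-1 lead the key, scattering a repository's rows.
    static String hashedKey(String repo, String sha1) {
        return sha1.substring(0, 2) + "." + repo + "." + sha1;
    }

    // New style (hypothetical format): the repository leads the key,
    // so all of its rows share one prefix.
    static String plainKey(String repo, String sha1) {
        return repo + "." + sha1;
    }
}
```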

Change-Id: I6b65bfb4c14b6f7bd323c2bd0638b49d429245be
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-06-09 17:29:46 -07:00
Matthias Sohn 0ab7be9681 Merge branch 'stable-1.0'
* stable-1.0:
  Prepare post JGit v1.0.0.201106090707-r builds
  JGit v1.0.0.201106090707-r
  Include about.html files in maven build
  Prepare post v1.0.0.201106081625-r builds
  JGit v1.0.0.201106081625-r
  Add missing about.html files to all shipped bundles
  Prepare post v1.0.0.201106071701-r builds
  JGit v1.0.0.201106071701-r
2011-06-09 17:41:16 +02:00
Matthias Sohn 6646c72d17 Prepare post JGit v1.0.0.201106090707-r builds
Change-Id: I35292f9f6fb5ebc591308fdd2d069203413e189d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-09 14:11:23 +02:00
Matthias Sohn b26ff6ebd6 JGit v1.0.0.201106090707-r
Change-Id: Iba44e71b6441a0e39122ca8666b51989e605f25f
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-09 13:11:58 +02:00
Matthias Sohn e1af16ad99 Include about.html files in maven build
Change-Id: Ifa96090eb0fc336ee8080385f48212b5158dd9f7
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-09 11:08:07 +02:00
Matthias Sohn 22df55c8b3 Prepare post v1.0.0.201106081625-r builds
Change-Id: I5e6994844405f7839ad3b3439f98bcadb59d329b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-09 11:08:07 +02:00
Matthias Sohn eacd7104a2 JGit v1.0.0.201106081625-r
Change-Id: I629990189083bab4737938ad712080fba7917582
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-08 22:42:20 +02:00
Matthias Sohn 8c5f403c0c Add missing about.html files to all shipped bundles
Change-Id: I5a4ad9493da3816f21d9fdd0b5b977388d074500
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-08 21:51:51 +02:00
Matthias Sohn 9c67a391f1 Prepare post v1.0.0.201106071701-r builds
Change-Id: I67ee2912ef54462cf860dc4ec0a6334e9c619384
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-08 16:32:01 +02:00
Matthias Sohn ac71f9045a JGit v1.0.0.201106071701-r
Change-Id: Ic8f49336ba96c8dcf4bab2f74c0f1efc1ab55131
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-07 23:04:55 +02:00
Matthias Sohn f1713abcdc Prepare 1.1.0 builds
Change-Id: I4cf017cd567543846839612ab3ace6d26233e01d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-06 01:24:32 +02:00
Matthias Sohn 4a4e1f764c Prepare post v1.0.0.201106051725-r builds
Change-Id: I4839877e1a6fa7782f37423213af8d579727a494
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-06 01:17:16 +02:00
Matthias Sohn f65513f753 JGit v1.0.0.201106051725-r
Change-Id: I39f4a23cf284505395d511dfedf02b7f5608df95
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-05 23:26:56 +02:00
Matthias Sohn ada903085d Prepare post v1.0.0.201106011211-rc3 builds
Change-Id: I4dec8eba7e35858aef65fcc10f91fad3fe5b52b9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-01 18:55:11 +02:00
Matthias Sohn 81371d385b JGit v1.0.0.201106011211-rc3
Change-Id: I574a05200471c431b3a02ac6ff208dc6aa90f539
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-06-01 18:22:44 +02:00
Matthias Sohn f5f1536f3f Remove incubation marker
Change-Id: I6018ce0cd3b7c8137e137848fe1f04551b257538
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2011-05-31 22:53:53 +02:00
Shawn O. Pearce 50f236aff8 DHT: Support removing a repository name
The first step to deleting a repository from the DHT storage is to
remove the name binding in the RepositoryIndexTable, making the
repository unavailable for lookup.

Change-Id: I469bf92f4bf2f555a15949569b21937c14cb142b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-31 08:58:45 -07:00
Shawn O. Pearce 042a66fe8c DHT: Fix thread-safety issue in AbstractWriteBuffer
There is a data corruption issue with the 'running' list if a
background thread schedules something onto the buffer while the
application thread is also using it.
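
A minimal sketch of one way to guard such a list, assuming a single
lock around every access. WriteBufferSketch is hypothetical and is not
the fix applied to AbstractWriteBuffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer; 'running' mirrors the list named above.
final class WriteBufferSketch {
    private final List<Runnable> running = new ArrayList<>();

    // Both the application thread and background threads call this, so
    // every access to 'running' is guarded by the same lock.
    void schedule(Runnable task) {
        synchronized (running) {
            running.add(task);
        }
    }

    int pending() {
        synchronized (running) {
            return running.size();
        }
    }

    // Schedules 1000 tasks from each of two threads, returns the total.
    static int demo() {
        WriteBufferSketch buf = new WriteBufferSketch();
        Runnable filler = () -> {
            for (int i = 0; i < 1000; i++)
                buf.schedule(() -> { });
        };
        Thread background = new Thread(filler);
        background.start();
        filler.run();
        try {
            background.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return buf.pending();
    }
}
```

Without the synchronized blocks, concurrent add() calls on an
ArrayList can lose entries or corrupt its internal array.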

Change-Id: I5ba78b98b6632965d677a9c8f209f0cf8320cc3d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-31 08:58:45 -07:00
Shawn O. Pearce b8c508e54d DHT: Add sequence RefData
RefData now uses a sequence number as part of the field, ensuring
that updates always increase the sequence number by one whenever
a reference is modified.

Attaching a sequence number to RefData will help with storing
reference log entries during updates. As the sequence number should
be unique within the reference name space, log entries can be keyed
by the sequence number and remain unique.  Making this work over
reference delete-create cycles will require an additional RefTable
API to return the oldest sequence number previously used in the
reference log to seed the recreated reference.
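
A minimal sketch of the idea, assuming a single in-memory reference.
RefSequenceSketch is hypothetical; it shows only that a sequence
number bumped by one per update gives unique log keys.

```java
import java.util.TreeMap;

final class RefSequenceSketch {
    private long sequence;      // last sequence number used
    private String target = ""; // current object id of the reference
    // Hypothetical reference log keyed by sequence number.
    private final TreeMap<Long, String> log = new TreeMap<>();

    // Every modification bumps the sequence by exactly one and records
    // a log entry under the new number.
    long update(String newTarget) {
        sequence++;
        log.put(sequence, target + " -> " + newTarget);
        target = newTarget;
        return sequence;
    }

    int logSize() {
        return log.size();
    }
}
```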

Change-Id: I11cfff2a96ef962e57f29925a3eef41bdbf9f9bb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2011-05-25 09:08:33 -05:00
Shawn O. Pearce 6ec6169215 DHT: Replace TinyProtobuf with Google Protocol Buffers
The standard Google distribution of Protocol Buffers in Java is better
maintained than TinyProtobuf, and should be faster for most uses.  It
does use slightly more memory due to many of our key types being
stored as strings in protobuf messages, but this is probably worth the
small hit to memory in exchange for better maintained code that is
easier to reuse in other applications.

Exposing all of our data members to the underlying implementation
makes it easier to develop reporting and data mining tools, or to
expand out a nested structure like RefData into a flat format in a SQL
database table.

Since the C++ `protoc` tool is necessary to convert the protobuf
script into Java code, the generated files are committed as part of
the source repository to make it easier for developers who do not have
this tool installed to still build the overall JGit package and make
use of it.  Reviewers will need to be careful to ensure that any edits
made to a *.proto file come in a commit that also updates the
generated code to match.

CQ: 5135
Change-Id: I53e11e82c186b9cf0d7b368e0276519e6a0b2893
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2011-05-25 09:00:42 -05:00
Shawn O. Pearce 7cad0adc7d DHT: Remove per-process ChunkCache
Performance testing has indicated the per-process ChunkCache isn't
very effective for the DHT storage implementation.  If a server is
using the DHT storage backend, it is most likely part of a larger
cluster where requests are distributed in a round-robin fashion
between the member servers.

In such a scenario there is insufficient data locality between
requests to get a good hit ratio on the per-process ChunkCache.  A low
hit ratio means the cache is actually hurting performance by eating up
memory that could otherwise be used for transient request data, and
increasing pressure on the GC when it needs to find free space.

Remove all of the ChunkCache code.  Installations that want to cache
(to reduce database usage) should wrap their Database with a
CacheDatabase and use a network based CacheServer.

I left the ChunkCache in the original DHT storage commit because I
wanted to document in the history of the project that it's probably
worth *not* having, but leave open a door for someone to revert this
change if they find otherwise at a later date.

Change-Id: I364d0725c46c5a19f7443642a40c89ba4d3fdd29
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2011-05-25 08:50:30 -05:00
Shawn O. Pearce de8946c0c2 Store Git on any DHT
jgit.storage.dht is a storage provider implementation for JGit that
permits storing the Git repository in a distributed hashtable, NoSQL
system, or other database.  The actual underlying storage system is
undefined, and can be plugged in by implementing 7 small interfaces:

  *  Database
  *  RepositoryIndexTable
  *  RepositoryTable
  *  RefTable
  *  ChunkTable
  *  ObjectIndexTable
  *  WriteBuffer

The storage provider interface tries to assume very little about the
underlying storage system, and requires only three key features:

  *  key -> value lookup (a hashtable is suitable)
  *  atomic updates on single rows
  *  asynchronous operations (Java's ExecutorService is easy to use)

Most NoSQL database products offer all 3 of these features in their
clients, and so does any decent network based cache system like the
open source memcache product.  Relying only on key equality for data
retrieval makes it simple for the storage engine to distribute across
multiple machines.  Traditional SQL systems could also be used with a
JDBC based spi implementation.
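
The three required features can be sketched with standard JDK classes.
MemTableSketch is a hypothetical in-memory table for illustration; it
is not one of the 7 spi interfaces and does not follow their
signatures.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Hypothetical in-memory table showing the three required features.
final class MemTableSketch {
    private final ConcurrentHashMap<String, String> rows =
            new ConcurrentHashMap<>();
    private final ExecutorService executor;

    MemTableSketch(ExecutorService executor) {
        this.executor = executor;
    }

    // key -> value lookup, run asynchronously on the executor.
    CompletableFuture<String> get(String key) {
        return CompletableFuture.supplyAsync(() -> rows.get(key), executor);
    }

    // Atomic single-row update: succeeds only if the row still holds
    // 'expected' (null means the row must not exist yet).
    boolean compareAndPut(String key, String expected, String update) {
        if (expected == null)
            return rows.putIfAbsent(key, update) == null;
        return rows.replace(key, expected, update);
    }
}
```

Nothing in the sketch depends on key ordering, only on key equality,
which is what lets a real backend spread rows across machines.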

Before submitting this change I have implemented six storage systems
for the spi layer:

  * Apache HBase[1]
  * Apache Cassandra[2]
  * Google Bigtable[3]
  * an in-memory implementation for unit testing
  * a JDBC implementation for SQL
  * a generic cache provider that can ride on top of memcache

All six systems came in with an spi layer of around 1000 lines of code to
implement the above 7 interfaces.  This is a huge reduction in size
compared to prior attempts to implement a new JGit storage layer.  As
this package shows, a complete JGit storage implementation is more
than 17,000 lines of fairly complex code.

A simple cache is provided in storage.dht.spi.cache.  Implementers can
use CacheDatabase to wrap any other type of Database and perform fast
reads against a network based cache service, such as the open source
memcached[4].  An implementation of CacheService must be provided to
glue this spi onto the network cache.
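
The wrapping idea can be sketched as a read-through cache. The class
below is hypothetical: a local map stands in for the network cache
service, a Function stands in for the wrapped Database, and none of it
reflects the actual CacheDatabase or CacheService signatures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through wrapper around a slower lookup function.
final class ReadThroughSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database;
    int databaseReads; // counts how often the wrapped database is hit

    ReadThroughSketch(Function<String, String> database) {
        this.database = database;
    }

    // Serve from the cache when possible; otherwise read the database
    // and remember the answer for later reads.
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            databaseReads++;
            value = database.apply(key);
            if (value != null)
                cache.put(key, value);
        }
        return value;
    }
}
```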

[1] https://github.com/spearce/jgit_hbase
[2] https://github.com/spearce/jgit_cassandra
[3] http://labs.google.com/papers/bigtable.html
[4] http://memcached.org/

Change-Id: I0aa4072781f5ccc019ca421c036adff2c40c4295
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-05 10:21:12 -07:00