Commit Graph

2675 Commits

Author SHA1 Message Date
Shawn Pearce 437be8dfad Simplify UploadPack by parsing wants separately from haves
The DHT backend was very slow at parsing objects. To work around
that performance limitation I obfuscated UploadPack by folding both
the want and have sets together in a single parse queue. Since DHT
was removed the complexity is no longer constructive to JGit.

Doing this refactoring prepares the code for a slightly future
change where the have lines need to be handled specially from the
want lines. Splitting the parsing up into two phases makes such
a modification trivial.

Change-Id: If7aad533b82448bbb688278e21f709282e5ccf4b
2013-03-08 12:25:12 -08:00
Shawn Pearce ea5eef912a Cluster UNREACHABLE_GARBAGE packs at the end of the search list
Garbage is unlikely to be used by a reader. Ensure they always
cluster at the end of the search list, no matter what timestamp
was used on the pack files.

Change-Id: I3bed89e9569ee3363c36bb3f73fcd34057a3883f
2013-03-08 11:19:44 -08:00
Shawn Pearce bb002c619b Avoid repacking unreachable garbage in DfsGarbageCollector
If a repository has significant amounts of unreachable garbage the
final phase to coalesce it can take longer than any other part of the
garbage collection phase. Provide a setting for applications to tweak
the threshold where coalescing ends and files just remain on disk.

Change-Id: I5f11a998a7185c75ece3271d8bc6181bb83f54c1
2013-03-08 11:07:51 -08:00
Robin Stocker 3ee04e3531 Include the number of ms in timeout error message
Noticed that while analyzing bug 402131.

Change-Id: If3fd40b64d5088c4579946271a67346cbd9e6556
2013-03-08 18:00:19 +01:00
Robin Rosenberg 3ad454497c Do not cherry-pick merge commits during rebase
Rebase computes the list of commits that are included in
the merges, just like Git does, so do not try to include
the merge commits. Re-recreating merges during rebase is
a bit more complicated and might be a useful future extension,
but for now just linearize during rebase.

Change-Id: I61239d265f395e5ead580df2528e46393dc6bdbd
Signed-off-by: Robin Stocker <robin@nibor.org>
2013-03-08 16:40:19 +01:00
Robin Rosenberg 08d5ede281 Extend FileUtils.delete with option to delete empty directories only
The new option EMPTY_DIRECTORIES_ONLY will make delete() only delete
empty directories. Any attempt to delete files will fail. Can be
combined with RECURSIVE to wipe out entire tree structures and
IGNORE_ERRORS to silently ignore any files or non-empty directories.

Change-Id: Icaa9a30e5302ee5c0ba23daad11c7b93e26b7445
Signed-off-by: Robin Stocker <robin@nibor.org>
2013-03-08 16:26:10 +01:00
Matthias Sohn 13ea3b0957 Add javaewah bundle to features using it
This ensures that OSGi consumers can retrieve this dependency from the
JGit or EGit p2 repository.

Change-Id: I6f88a4914a19e4e18aa60d59b0cc8a33b61f7fc2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-03-07 08:51:06 +01:00
Shawn Pearce 913cccd5c4 Do not attempt to read bitmap from invalid pack
If a pack file has been marked invalid due to a prior IOException
accessing its contents, do not offer its bitmap index to callers.
The pack cannot be used so its bitmap should be off limits from
any reader trying to work from a bitmap.

Change-Id: Ia44e46558abdddee560bb184158b1e0af9437eee
2013-03-06 12:48:25 -08:00
Shawn Pearce 88c962484f Rename DfsPackFile getBitmap method to match PackFile
There is no reason for these to differ in name. Match the
shorter name used by PackFile.

Change-Id: I2d3a299069acc5ce276b1b5439ff2258903c6ff3
2013-03-06 12:47:37 -08:00
Colby Ranger c660362768 Write the bitmap index correctly in DFS GC.
A bug caused the .bitmap to actually have the .idx contents.

Change-Id: I428bb27d419e8b1b69b6f3e2fd07cd29703669ad
2013-03-06 12:33:05 -08:00
Colby Ranger e6883dfe4b Enable writing bitmaps during GC by default.
Bitmaps provide a huge performance boost for counting objects and they
play nice with the cgit implementation.

Change-Id: I33b05a6c8f1ee2df7770f0b9fdc50d0b4bbf1029
2013-03-05 11:16:08 -08:00
Colby Ranger f82821728b Enable writing pack indexes with bitmaps in the GC.
Update the dfs and file GC implementations to prepare and write
bitmaps on the packs that contain the full closure of the object
graph. Update the DfsPackDescription to include the index version.

Change-Id: I3f1421e9cd90fe93e7e2ef2b8179ae2f1ba819ed
2013-03-05 11:15:19 -08:00
Colby Ranger 43ea887c8b Enable serving upload requests using bitmaps.
If the pack index has bitmaps, allow the PackWriter to use the bitmaps
for upload requests.

Change-Id: Iefa995fe927a11e4fd78afb34530995614221fc0
2013-03-05 11:14:48 -08:00
Colby Ranger dafcb8f6db Support creating pack bitmap indexes in PackWriter.
Update the PackWriter to support writing out pack bitmap indexes,
a parallel ".bitmap" file to the ".pack" file.
Bitmaps are selected at commits every 1 to 5,000 commits for
each unique path from the start. The most recent 100 commits are
all bitmapped. The next 19,000 commits have a bitmaps every 100
commits. The remaining commits have a bitmap every 5,000 commits.
Commits with more than 1 parent are prefered over ones
with 1 or less. Furthermore, previously computed bitmaps are reused,
if the previous entry had the reuse flag set, which is set when the
bitmap was placed at the max allowed distance.

Bitmaps are used to speed up the counting phase when packing, for
requests that are not shallow. The PackWriterBitmapWalker uses
a RevFilter to proactively mark commits with RevFlag.SEEN, when
they appear in a bitmap. The walker produces the full closure
of reachable ObjectIds, given the collection of starting ObjectIds.

For fetch request, two ObjectWalks are executed to compute the
ObjectIds reachable from the haves and from the wants. The
ObjectIds needed to be written are determined by taking all the
resulting wants AND NOT the haves.

For clone requests, we get cached pack support for "free" since
it is possible to determine if all of the ObjectIds in a pack file
are included in the resulting list of ObjectIds to write.

On my machine, the best times for clones and fetches of the linux
kernel repository (with about 2.6M objects and 300K commits) are
tabulated below:

Operation                   Index V2               Index VE003
Clone                       37530ms (524.06 MiB)     82ms (524.06 MiB)
Fetch (1 commit back)          75ms                 107ms
Fetch (10 commits back)       456ms (269.51 KiB)    341ms (265.19 KiB)
Fetch (100 commits back)      449ms (269.91 KiB)    337ms (267.28 KiB)
Fetch (1000 commits back)    2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
Fetch (10000 commits back)   2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
Fetch (100000 commits back) 14340ms (185.83 MiB)   1655ms (189.39 MiB)

Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d
2013-03-05 11:14:45 -08:00
Colby Ranger 3b325917a5 Added read/write support for pack bitmap index.
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.

Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
2013-03-05 11:09:44 -08:00
Shawn Pearce 234b4e0432 Merge "Break the dependency on RevObject when creating a newObjectToPack()." 2013-03-04 20:12:35 -05:00
Shawn Pearce 374406ac46 Merge "Fix RefUpdate performance for existing Refs" 2013-03-04 19:11:08 -05:00
Shawn Pearce 22625cd1d8 Merge "Fix corrupted CloneCommand bare-repo fetch-refspec (#402031)" 2013-03-04 17:53:02 -05:00
Colby Ranger be7a135e94 Break the dependency on RevObject when creating a newObjectToPack().
Update the ObjectReuseAsIs API to support creating new
ObjectToPack with only the AnyObjectId and Git object type. This is
needed to support the future pack index bitmaps, which only contain
this information and do not want the overhead of creating a temporary
object for every ObjectId.

Change-Id: I906360b471412688bf429ecef74fd988f47875dc
2013-03-04 14:43:22 -08:00
Colby Ranger 8d4f227c13 Merge "Remove the unused method PackFile.hasExt()." 2013-03-04 17:30:27 -05:00
Colby Ranger 1512d0ab4e Remove the unused method PackFile.hasExt().
It will be used in a future change, so just include it with that change.

Change-Id: I7db28d86f8e8b282a403acd9a4c4defaae828f94
2013-03-04 14:16:36 -08:00
Roberto Tyley a46b042905 Fix corrupted CloneCommand bare-repo fetch-refspec (#402031)
CloneCommand has been creating fetch refspecs like this on bare clones:

[remote "origin"]
        url = ssh://example.com/my-repo.git
        fetch = +refs/heads/*:refs/heads//*

As you can see, the destination ref pattern has a superfluous slash.

It looks like this behaviour has always been the case for CloneCommand,
at least since cc2197ed when code catering to bare-clone fetch refspecs
was added. That was released with JGit v1.0 almost 2 years ago, so
there will probably be some bare repos in the wild which will have been
cloned with JGit and have these corrupted refspecs.

The effect of the corrupted fetch refspec is quite interesting. Up to
and including JGit 2.0, the corrupt refspec was tolerated and fetches
would work as intended with no indication to the user that anything was
amiss. With JGit 2.1, a change was introduced which made JGit less
tolerant, and fetches now attempt to update the non-existing ref
"refs/heads//master". No exception is raised, but the real ref -
"refs/heads/master" - is not updated.

This behaviour was noticed by a user of Agit (which does bare clones by
default and recently updated from JGit v2.0 to v2.2), reported here:

https://github.com/rtyley/agit/issues/92


If you run C-Git fetch on a bare-repo cloned by JGit, it flat-out
rejects the refspec (checked against v1.7.10.4):

fatal: Invalid refspec '+refs/heads/*:refs/heads//*'

Incidentally, C-Git does not create an explicit fetch refspec at all
when performing a bare clone - the full remote config generated by C-Git
looks like this:

[remote "origin"]
        url = ssh://example.com/my-repo.git

Using JGit on such a repository works fine, so omitting the fetch
refspec entirely is also an option.

Change-Id: I14b0d359dc69b8908f68e02cea7a756ac34bf881
2013-03-04 00:03:20 +00:00
Roberto Tyley f1dea3e279 Fix RefUpdate performance for existing Refs
No longer invoke the expensive RefDatabase.isNameConflicting() check on
updating existing refs, reducing batch ref update time by ~97%.

The RefDirectory implementation of isNameConflicting() is quite
slow (it has to do an expensive loose-ref scan) but it's only necessary
to perform this check on ref update if the ref is being *created* - if
the ref already exists, we can already guarantee that it does not
conflict with any other refs.

C-Git seems to use a similar condition before making the
is_refname_available() check:

https://github.com/git/git/blob/v1.8.1.4/refs.c#L1660-L1670

As an example of the effects on performance, here's a simple timing
experiment using The BFG to remove one file from the JGit repo:

---
$ wget http://repo1.maven.org/maven2/com/madgag/bfg-repo-cleaner/1.0.1/bfg-1.0.1.jar
$ git clone --mirror https://git.eclipse.org/r/p/jgit/jgit.git
$ java -jar bfg-1.0.1.jar -D make_jgit.sh jgit.git
....
Updating references:    100% (5760/5760)
...Ref update completed in 148,949 ms.

BFG run is complete!
---

The execution time for the run is completely dominated by the batch ref
update at the end. Repeating the experiment with BFG v1.0.2 (using JGit
patched with this change), the refs update is dramatically reduced:

---
Updating references:    100% (5760/5760)
...Ref update completed in 4,327 ms.
---

Change-Id: I9057bc4ee22f9cc269b1cc00c493841c71527cd6
2013-03-01 21:49:58 +00:00
Shawn Pearce 178d55c24d Merge "Improve the documentation of the ByteArraySet used by PathFilterGroup" 2013-02-28 19:21:59 -05:00
Colby Ranger 4a317a1790 Include supported extensions in PackFile constructor.
Previously a PackFile class was assumed to only support a .pack and .idx
file. Update the constructor to enumerate the supported extensions for
the pack file. This will allow the bitmap code to only be executed if
the bitmap extension file is known to exist.

Change-Id: Ie59041dffec5f60d7ea2771026ffd945106bd4bf
2013-02-28 11:35:07 -08:00
Gustaf Lundh 212fb3071c Fix while boundries in DateRevQueue.add()
In add(), "low" will never equals "first". This fact
should be reflected in the code.

Change-Id: I5cab51374e67bd2d3301e5d9dac47c4259b5e562
2013-02-25 18:30:03 +01:00
Shawn Pearce 9613b04d81 Merge "Performance fixes in DateRevQueue" 2013-02-25 11:50:21 -05:00
Gustaf Lundh 84afea9179 Performance fixes in DateRevQueue
When a lot of commits are added to DateRevQueue, the
sort-on-insertion approach is very heavy on CPU cycles.

One approach to fix this was made by Dave Borowitz:
https://git.eclipse.org/r/#/c/5491/

But using Java's PriorityQueue seems to have brought some
extra overhead, and the desired performance could not be
reached.

This fix takes another approach to the insertion problem,
without changing the expected behaviour or bringing extra
memory overhead:

If we detect over 1000 commits in the DateRevQueue, a
"seek-index" is rebuilt every 1000th added commit.

The index keeps track of every 100th commit in the
DateRevQueue. During insertions, it will be used for a
preliminary scanning (binary search) of the queue, with
the intention of helping add() find a good starting point
to start walking from. After finding this starting point,
add() will step commit-by-commit until the correct
insertion place in the queue is found (today, the queue
is expected to be sorted at all times).

When applied to repositories with many refs, this approach
has proven to bring huge performance gains and scales quite
well.

For instance, in a repository with close to 80000 refs,
we could cut down the time a typical Gerrit replication
of 1 commit would take (just a push from JGit's point of
view) from 32sec down to 3.5sec.

Below you see some typical times to add a specific amount
of commits (with random commit times) to the DateRevQueue
and the difference the preliminary seek-index makes:

Commits | Index | No Index
   1024     8ms        8ms
   2048    13ms        9ms
   4096     5ms       59ms
   8192    11ms      595ms
  16384    22ms     3058ms
  32768    64ms    13811ms
  65536   201ms    62677ms
 131072   783ms   331585ms

Only one extra reference is needed for every 100 inserted
commits (and only when we see more than 1000 commits in
the queue), so the memory overhead should be negligible.

Various index-stepping values were tested, and 100 seemed to
scale very well and be effective from start.

In the future, it should probably be dynamic and based on
the number of refs in the queue, but this should serve well
as a starting point.

Note: While other fundamentally different data structures may
be more suitable, the DateRevQueue is extremely central to
many of the Git core operations. This approach was chosen,
since the effect of the patch is easy to predict in conjuction
with the current implementation. A totally new data structure
will make it harder to predict behaviour in many common and
uncommon cases (in terms of breaking ties, memory usage, cost
when using few elements, object creation/disposing overhead,
etc).

Change-Id: Ie7b99f40eacf6324bfb4716d82073adeda64d10f
2013-02-25 12:36:29 +01:00
Matthias Sohn 912e19a8d6 Update last release version to 2.3.1.201302201838-r
Change-Id: I9c6d774526028e56707e15e80370460d964de76e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-24 00:11:21 +01:00
Matthias Sohn af64b9a3b3 Deploy Maven artifacts to Eclipse Nexus repository
Bug: 401469
Bug: 401470
Change-Id: I4901dc208fe8f9e4055d27ab7e0ced979fd234f5
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-23 14:30:57 +01:00
George C. Young ab99b78ca0 Implement recursive merge strategy
Extend ResolveMerger with RecursiveMerger to merge two tips
that have up to 200 bases.

Bug: 380314
CQ: 6854
Change-Id: I6292bb7bda55c0242a448a94956f2d6a94fddbaa
Also-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-22 23:51:50 +01:00
Robin Rosenberg 78606404de Improve the documentation of the ByteArraySet used by PathFilterGroup
Change-Id: I2ba7a67e8e1596aa6c33a9caddee03a6be48f008
2013-02-22 23:21:57 +01:00
Colby Ranger 95ef1e83d0 Fix off by one error in PackReverseIndex.
The last 32bit offset is at Integer.MAX_VALUE.

Change-Id: Idee8be3c7887e1d0c8339ff94aceff36dbf000db
2013-02-20 22:59:35 -08:00
Matthias Sohn c033f016c9 Merge branch 'stable-2.3'
* stable-2.3:
  Prepare 2.3.2-SNAPSHOT builds
  JGit v2.3.1.201302201838-r
  Accept Change-Id even if footer contains not well-formed entries
  Fix false positives in hashing used by PathFilterGroup

Change-Id: I5882aa3b482d6bcd40a45bed51e5ab03f018a5bc
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-21 02:34:17 +01:00
Matthias Sohn 49ec6c1b3b Prepare 2.3.2-SNAPSHOT builds
Change-Id: I51a8a53194928416b1aef1f3fce0ce66aadceca4
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-21 02:13:15 +01:00
Matthias Sohn 63dceceb0e JGit v2.3.1.201302201838-r
Change-Id: I0d79873137ad4042ecc2a0210fe1f6305608b851
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-21 01:00:08 +01:00
Matthias Sohn 301df23d9b Merge "Accept Change-Id even if footer contains not well-formed entries" into stable-2.3 2013-02-20 18:35:44 -05:00
Stefan Lay 3b41fcbd96 Accept Change-Id even if footer contains not well-formed entries
Instead of only looking for a Change-Id in the last section if it 
consists only of well-formed "key: value" lines replace the last
occurrence of a valid Change-Id line in the last section. Some tools
require footer lines e.g. without a colon.

Gerrit doesn't accept Change-Id lines in the footer if the Change-Id
line doesn't start at the beginning of the line.

Bug: 400818
Change-Id: Icce54872adc8c566994beea848448a2f7ca87085
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-20 23:49:43 +01:00
Robin Stocker 5d7b722f6e Fix false positives in hashing used by PathFilterGroup
The ByteArraySet failed to check the length of the entry correctly leading
to matches where no match should be.

Bug: 401249
Change-Id: I925bc48d9cafcdf13e1a797bb09fc2555eb270c5
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2013-02-20 00:37:57 +01:00
Robin Rosenberg 6cadceee16 Must use double single quotes around parameters
Change-Id: I34da782e6b9a492e3e291b36ef82f06ce8347660
2013-02-16 10:54:52 -05:00
Robin Rosenberg 29546877b1 Add the --branch flag to the jgit clone command
--branch or -b allows the user to specify which branch to checkout after
clone.

Change-Id: Ie27533e5ecb43097862a8337a27a742b501e17a5
2013-02-16 10:38:54 -05:00
Matthias Sohn ae57189712 Merge "Prepare 2.4.0-SNAPSHOT builds" 2013-02-13 19:00:11 -05:00
Matthias Sohn ba6ae0c7ec Prepare 2.4.0-SNAPSHOT builds
Change-Id: I4ab2baeb5d598d40d5dadfccdfe75152a1b9b7bf
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-14 00:59:25 +01:00
Robin Rosenberg 0e4c74fdf6 Merge "Cleanup unused imports" 2013-02-13 18:39:27 -05:00
Robin Rosenberg 9ac9f13c15 Cleanup unused imports
Change-Id: I7fe51f4a6187c32c917b2f503d6b50d0d6b28a61
2013-02-14 00:34:15 +01:00
Matthias Sohn da59df358b Remove unused imports
These imports are unused since commit
cb349da017

Change-Id: I74ea2a17bf4976d9c74255500e5deeff18208e87
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-14 00:08:46 +01:00
Matthias Sohn 647acec3d9 Merge branch 'stable-2.3'
* stable-2.3:
  Prepare post 2.3.0.201302130906 builds
  JGit v2.3.0.201302130906
  Replace explicit version by property where possible
  Add better documentation to DirCacheCheckout
  Prepare post 2.3rc1 builds
  JGit v2.3.0.201302060400-rc1

Change-Id: I5a7626014dc38e7623937a4241dbf8a6db05c1f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-13 23:12:40 +01:00
Matthias Sohn 9a5f4b46cc Prepare post 2.3.0.201302130906 builds
Change-Id: Ia11b4000557d0cf235c4e33cda4539cab25fef47
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-13 23:08:30 +01:00
Robin Rosenberg c0cfcc76e0 Merge "Remove unused availableRefs local from Clone.guessHEAD" 2013-02-13 16:46:29 -05:00
Matthias Sohn 19d6cadeff JGit v2.3.0.201302130906
Change-Id: If2e5fcbc01c2a7f058ef13d60b0bba5f77300d52
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-13 15:24:41 +01:00