2690 Commits

Author SHA1 Message Date
Edwin Kempin eac218b7b4 Add toString() for PackConfig
This is helpful for writing the pack configuration into a log file.

Change-Id: I5e7f5ff7e01c9538ca12a1860844ba9b467bdf05
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
2013-03-15 10:24:58 +01:00
Edwin Kempin 9b20a3b0dd Add toString() for RepoStatistics
This is helpful for writing the repository statistics into a log file.

Change-Id: I0e8cd9ad05f123ab3851960890a50213f353a373
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
2013-03-15 09:42:19 +01:00
Shawn Pearce 3760e4319b Remove cached_packs support in favor of bitmaps
The bitmap code in PackWriter knows exactly when to use a pack as
a "cached pack". It enables cached pack usage only when the pack
has a bitmap and its entire closure of objects needs to be sent.
This is a much simpler code path to maintain, and JGit actually
has a way to write the necessary index.

Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881
2013-03-14 16:36:57 -07:00
Shawn Pearce b2c0021b8a Remove objects before optimization from DfsGarbageCollector
Just counting objects is not sufficient. There are some race
conditions with receive packs and delta base completion that
may confuse such a simple algorithm.

Instead always do the larger set computations, and rely on the
PackWriter having no objects pending as the way to avoid creating
an empty pack file.

Change-Id: Ic81fefb158ed6ef8d6522062f2be0338a49f6bc4
2013-03-14 16:36:36 -07:00
Shawn Pearce fc6b898cbe Simplify caching of DfsPackDescription from PackWriter.Statistics
Let the pack description copy the relevant stats values. This
moves it out of the garbage collector and compactor algorithms,
co-locating with something that might care.

Remove some unnecessary code from the DfsPackCompactor; the stats
track the same information and can supply it.

Change-Id: Id64ab38d507c0ed19ae0d106862d175b7364eba3
2013-03-14 16:36:04 -07:00
Dave Borowitz 8e2a24a3b6 NameRevCommand: Use ~ notation for first parents of merges
Prefer ~(N+1) to ^1~N. Although both are correct, the former is
cleaner and matches "git name-rev".

Change-Id: I772001a219e5eb346f5552c92e6d98c70b2cfa98
2013-03-14 09:35:00 -07:00
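A minimal sketch of the notation rule described in the commit above, not NameRevCommand's actual code. The class NameRevSuffix and its format method are hypothetical names introduced for illustration; parentIndex is the 1-based parent taken at a merge and generations is the number of further first-parent steps (N).

```java
/** Hypothetical helper: renders the path suffix appended to a ref name. */
final class NameRevSuffix {
    /**
     * @param parentIndex 1-based index of the parent followed at the merge
     * @param generations further first-parent steps after that parent (N)
     */
    static String format(int parentIndex, int generations) {
        if (parentIndex == 1) {
            // ^1~N and ~(N+1) name the same commit; prefer the shorter form.
            return "~" + (generations + 1);
        }
        String suffix = "^" + parentIndex;
        return generations > 0 ? suffix + "~" + generations : suffix;
    }
}
```

Under these assumptions, NameRevSuffix.format(1, 3) yields the ~4 form rather than ^1~3, while non-first parents keep the ^ notation.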
Dave Borowitz d2a6c4b955 Allow adding single refs or all tags to NameRevCommand
Change-Id: I90e85bc835d11278631afd0e801425a292578bba
2013-03-13 12:28:58 -07:00
Shawn Pearce e175daf123 Merge "Cluster UNREACHABLE_GARBAGE packs at the end of the search list" 2013-03-12 17:56:48 -04:00
Shawn Pearce 7e229c75c1 Merge "Avoid repacking unreachable garbage in DfsGarbageCollector" 2013-03-12 17:55:47 -04:00
Shawn Pearce c017d7ef45 Merge changes Icd550359,If7aad533
* changes:
  Avoid looking at UNREACHABLE_GARBAGE for client have lines
  Simplify UploadPack by parsing wants separately from haves
2013-03-12 17:45:31 -04:00
Shawn Pearce ef91da3605 Merge "Add a NameRevCommand for describing IDs in terms of refnames" 2013-03-11 18:33:55 -04:00
Dave Borowitz 30ba407a9a Add a NameRevCommand for describing IDs in terms of refnames
The walk logic does not use RevWalk because it needs to walk all paths
to each of the requested commits, keeping track of each path along which
the commit was found in the RevCommit subclass. From these paths, a
single "best" path is chosen based on the total path length, with a
penalty applied for paths that traverse merges.

This functionality parallels "git name-rev".

Change-Id: I92bfb47dd16c898313d2ee525395609c3bf72ebe
2013-03-11 12:47:28 -07:00
Robin Rosenberg 3cd089f04c A folder does not constitute a dirty work tree
This fixes two cases:
- A folder without tracked content exists both in the workdir and merged
commit, as long as the names within that folder do not conflict.
- An empty folder structure exists with the same name as a file in the
merged commit.

Bug: 402834
Change-Id: I4c5b9f11313dd1665fcbdae2d0755fdb64deb3ef
2013-03-10 16:53:23 +01:00
Robin Stocker 9105e1c9af Add isRebasing to RepositoryState
See EGit change Ic69f5c952a49f023c0949f04b3e976be1b267fbe where this
could be used.

Change-Id: I9ec8568fa1100d2e9c8d4ca0e347bf77ec6d8734
2013-03-09 16:20:57 +01:00
Shawn Pearce 4e9fe58bb5 Avoid looking at UNREACHABLE_GARBAGE for client have lines
Clients send a bunch of unknown objects to UploadPack on each round
of negotiation. Many of these are not known to the server, which
leads the implementation to be looking at indexes for garbage packs.

Disable examining the index of a garbage pack, allowing servers to
avoid reading them from disk during negotiation.

The effect of this change is the server will only ACK a have line
if the object was reachable during the last garbage collection,
or was recently added to the repository. For most repositories
there is no impact in this behavior change.

If a repository rewinds a branch, runs GC, and then resets the
branch back to where it was before, the now current tip is going to
be skipped by this change. A client that has the commit may wind up
getting a slightly larger data transfer from the server as an older
common ancestor will be chosen during negotiation. This is fixable
on the server side by running GC again to correct the layout of
objects in pack files.

Change-Id: Icd550359ef70fc7b701980f9b13d923fd13c744b
2013-03-08 12:45:28 -08:00
Shawn Pearce 437be8dfad Simplify UploadPack by parsing wants separately from haves
The DHT backend was very slow at parsing objects. To work around
that performance limitation I obfuscated UploadPack by folding both
the want and have sets together in a single parse queue. Since DHT
was removed the complexity is no longer constructive to JGit.

This refactoring prepares the code for an upcoming
change where the have lines need to be handled differently from the
want lines. Splitting the parsing up into two phases makes such
a modification trivial.

Change-Id: If7aad533b82448bbb688278e21f709282e5ccf4b
2013-03-08 12:25:12 -08:00
Shawn Pearce ea5eef912a Cluster UNREACHABLE_GARBAGE packs at the end of the search list
Garbage is unlikely to be used by a reader. Ensure garbage packs always
cluster at the end of the search list, no matter what timestamp
was used on the pack files.

Change-Id: I3bed89e9569ee3363c36bb3f73fcd34057a3883f
2013-03-08 11:19:44 -08:00
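The ordering described in the commit above could be sketched as a comparator. This is an illustration only: the Pack class, its fields, and the newest-first ordering among non-garbage packs are assumptions, not the DFS code itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PackOrder {
    /** Hypothetical stand-in for a pack description. */
    static final class Pack {
        final String name;
        final boolean garbage;      // true for an UNREACHABLE_GARBAGE pack
        final long lastModified;

        Pack(String name, boolean garbage, long lastModified) {
            this.name = name;
            this.garbage = garbage;
            this.lastModified = lastModified;
        }
    }

    /** Garbage packs sort last regardless of timestamp; others newest first (assumed). */
    static final Comparator<Pack> SEARCH_ORDER = (a, b) -> {
        if (a.garbage != b.garbage) {
            return a.garbage ? 1 : -1;
        }
        return Long.compare(b.lastModified, a.lastModified);
    };

    /** Returns the pack names in search order without modifying the input list. */
    static List<String> sortedNames(List<Pack> packs) {
        List<Pack> copy = new ArrayList<>(packs);
        copy.sort(SEARCH_ORDER);
        List<String> names = new ArrayList<>();
        for (Pack p : copy) {
            names.add(p.name);
        }
        return names;
    }
}
```

With this comparator, a garbage pack carrying the newest timestamp still lands behind every regular pack.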
Shawn Pearce bb002c619b Avoid repacking unreachable garbage in DfsGarbageCollector
If a repository has significant amounts of unreachable garbage the
final phase to coalesce it can take longer than any other part of the
garbage collection phase. Provide a setting for applications to tweak
the threshold where coalescing ends and files just remain on disk.

Change-Id: I5f11a998a7185c75ece3271d8bc6181bb83f54c1
2013-03-08 11:07:51 -08:00
Robin Stocker 3ee04e3531 Include the number of ms in timeout error message
Noticed that while analyzing bug 402131.

Change-Id: If3fd40b64d5088c4579946271a67346cbd9e6556
2013-03-08 18:00:19 +01:00
Robin Rosenberg 3ad454497c Do not cherry-pick merge commits during rebase
Rebase computes the list of commits that are included in
the merges, just like Git does, so do not try to include
the merge commits. Recreating merges during rebase is
a bit more complicated and might be a useful future extension,
but for now just linearize during rebase.

Change-Id: I61239d265f395e5ead580df2528e46393dc6bdbd
Signed-off-by: Robin Stocker <robin@nibor.org>
2013-03-08 16:40:19 +01:00
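The linearization described in the commit above amounts to skipping any commit with more than one parent when building the cherry-pick list. A rough sketch, with a hypothetical Commit class standing in for the real commit type:

```java
import java.util.ArrayList;
import java.util.List;

final class RebasePlan {
    /** Hypothetical stand-in for a commit: an id plus its number of parents. */
    static final class Commit {
        final String id;
        final int parentCount;

        Commit(String id, int parentCount) {
            this.id = id;
            this.parentCount = parentCount;
        }
    }

    /** Keeps only non-merge commits, preserving their order. */
    static List<String> toCherryPick(List<Commit> commits) {
        List<String> picks = new ArrayList<>();
        for (Commit c : commits) {
            if (c.parentCount <= 1) {
                picks.add(c.id);
            }
        }
        return picks;
    }
}
```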
Robin Rosenberg 08d5ede281 Extend FileUtils.delete with option to delete empty directories only
The new option EMPTY_DIRECTORIES_ONLY will make delete() only delete
empty directories. Any attempt to delete files will fail. Can be
combined with RECURSIVE to wipe out entire tree structures and
IGNORE_ERRORS to silently ignore any files or non-empty directories.

Change-Id: Icaa9a30e5302ee5c0ba23daad11c7b93e26b7445
Signed-off-by: Robin Stocker <robin@nibor.org>
2013-03-08 16:26:10 +01:00
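The semantics described in the commit above can be approximated with plain java.io. This is a sketch, not the FileUtils implementation: the class EmptyDirDelete and the numeric flag values are assumptions chosen for illustration, and only the option names come from the commit message.

```java
import java.io.File;
import java.io.IOException;

final class EmptyDirDelete {
    // Flag values are illustrative assumptions.
    static final int RECURSIVE = 1;
    static final int IGNORE_ERRORS = 2;
    static final int EMPTY_DIRECTORIES_ONLY = 4;

    static void delete(File f, int options) throws IOException {
        boolean emptyOnly = (options & EMPTY_DIRECTORIES_ONLY) != 0;
        boolean ignoreErrors = (options & IGNORE_ERRORS) != 0;

        if (f.isDirectory() && (options & RECURSIVE) != 0) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    delete(child, options);
                }
            }
        }

        // With EMPTY_DIRECTORIES_ONLY, plain files are never removed.
        // File.delete() itself refuses to remove a non-empty directory.
        boolean deleted = (f.isDirectory() || !emptyOnly) && f.delete();
        if (!deleted && !ignoreErrors) {
            throw new IOException("Could not delete " + f);
        }
    }
}
```

Combining all three flags prunes empty directory chains while leaving files, and the directories that still contain them, in place.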
Matthias Sohn 13ea3b0957 Add javaewah bundle to features using it
This ensures that OSGi consumers can retrieve this dependency from the
JGit or EGit p2 repository.

Change-Id: I6f88a4914a19e4e18aa60d59b0cc8a33b61f7fc2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-03-07 08:51:06 +01:00
Shawn Pearce 913cccd5c4 Do not attempt to read bitmap from invalid pack
If a pack file has been marked invalid due to a prior IOException
accessing its contents, do not offer its bitmap index to callers.
The pack cannot be used so its bitmap should be off limits from
any reader trying to work from a bitmap.

Change-Id: Ia44e46558abdddee560bb184158b1e0af9437eee
2013-03-06 12:48:25 -08:00
Shawn Pearce 88c962484f Rename DfsPackFile getBitmap method to match PackFile
There is no reason for these to differ in name. Match the
shorter name used by PackFile.

Change-Id: I2d3a299069acc5ce276b1b5439ff2258903c6ff3
2013-03-06 12:47:37 -08:00
Colby Ranger c660362768 Write the bitmap index correctly in DFS GC.
A bug caused the .bitmap to actually have the .idx contents.

Change-Id: I428bb27d419e8b1b69b6f3e2fd07cd29703669ad
2013-03-06 12:33:05 -08:00
Colby Ranger e6883dfe4b Enable writing bitmaps during GC by default.
Bitmaps provide a huge performance boost for counting objects and they
play nice with the cgit implementation.

Change-Id: I33b05a6c8f1ee2df7770f0b9fdc50d0b4bbf1029
2013-03-05 11:16:08 -08:00
Colby Ranger f82821728b Enable writing pack indexes with bitmaps in the GC.
Update the dfs and file GC implementations to prepare and write
bitmaps on the packs that contain the full closure of the object
graph. Update the DfsPackDescription to include the index version.

Change-Id: I3f1421e9cd90fe93e7e2ef2b8179ae2f1ba819ed
2013-03-05 11:15:19 -08:00
Colby Ranger 43ea887c8b Enable serving upload requests using bitmaps.
If the pack index has bitmaps, allow the PackWriter to use the bitmaps
for upload requests.

Change-Id: Iefa995fe927a11e4fd78afb34530995614221fc0
2013-03-05 11:14:48 -08:00
Colby Ranger dafcb8f6db Support creating pack bitmap indexes in PackWriter.
Update the PackWriter to support writing out pack bitmap indexes,
a parallel ".bitmap" file to the ".pack" file.
Bitmaps are selected at commits every 1 to 5,000 commits for
each unique path from the start. The most recent 100 commits are
all bitmapped. The next 19,000 commits have a bitmap every 100
commits. The remaining commits have a bitmap every 5,000 commits.
Commits with more than 1 parent are preferred over ones
with 1 or fewer. Furthermore, previously computed bitmaps are reused,
if the previous entry had the reuse flag set, which is set when the
bitmap was placed at the max allowed distance.

Bitmaps are used to speed up the counting phase when packing, for
requests that are not shallow. The PackWriterBitmapWalker uses
a RevFilter to proactively mark commits with RevFlag.SEEN, when
they appear in a bitmap. The walker produces the full closure
of reachable ObjectIds, given the collection of starting ObjectIds.

For fetch requests, two ObjectWalks are executed to compute the
ObjectIds reachable from the haves and from the wants. The
ObjectIds needed to be written are determined by taking all the
resulting wants AND NOT the haves.

For clone requests, we get cached pack support for "free" since
it is possible to determine if all of the ObjectIds in a pack file
are included in the resulting list of ObjectIds to write.

On my machine, the best times for clones and fetches of the linux
kernel repository (with about 2.6M objects and 300K commits) are
tabulated below:

Operation                   Index V2               Index VE003
Clone                       37530ms (524.06 MiB)     82ms (524.06 MiB)
Fetch (1 commit back)          75ms                 107ms
Fetch (10 commits back)       456ms (269.51 KiB)    341ms (265.19 KiB)
Fetch (100 commits back)      449ms (269.91 KiB)    337ms (267.28 KiB)
Fetch (1000 commits back)    2229ms ( 14.75 MiB)    189ms ( 14.42 MiB)
Fetch (10000 commits back)   2177ms ( 16.30 MiB)    254ms ( 15.88 MiB)
Fetch (100000 commits back) 14340ms (185.83 MiB)   1655ms (189.39 MiB)

Change-Id: Icdb0cdd66ff168917fb9ef17b96093990cc6a98d
2013-03-05 11:14:45 -08:00
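The set arithmetic described in the commit above (wants AND NOT haves, plus the containment test that gives cached-pack behaviour for clones) can be illustrated with java.util.BitSet. This is a sketch under assumptions: the real code uses compressed bitmaps rather than BitSet, and the class and method names here are hypothetical. Each bit position stands for one object.

```java
import java.util.BitSet;

final class BitmapNegotiation {
    /** Objects reachable from the wants but not from the haves. */
    static BitSet objectsToSend(BitSet wants, BitSet haves) {
        BitSet result = (BitSet) wants.clone();   // leave the caller's bitmap untouched
        result.andNot(haves);
        return result;
    }

    /** True if every object of the pack is part of the set to send. */
    static boolean packFullyIncluded(BitSet packObjects, BitSet toSend) {
        BitSet missing = (BitSet) packObjects.clone();
        missing.andNot(toSend);
        return missing.isEmpty();
    }
}
```

For a clone the haves bitmap is empty, so a pack holding the full closure passes the containment test and could be streamed as-is.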
Colby Ranger 3b325917a5 Added read/write support for pack bitmap index.
A pack bitmap index is an additional index of compressed
bitmaps of the object graph. Furthermore, a logical API of the index
functionality is included, as it is expected to be used by the
PackWriter.

Compressed bitmaps are created using the javaewah library, which is a
word-aligned compressed variant of the Java bitset class based on
run-length encoding. The library only works with positive integer
values. Thus, the maximum number of ObjectIds in a pack file that
this index can currently support is limited to Integer.MAX_VALUE.

Every ObjectId is given an integer mapping. The integer is the
position of the ObjectId in the complete ObjectId list, sorted
by offset, for the pack file. That integer is what the bitmaps
use to reference the ObjectId. Currently, the new index format can
only be used with pack files that contain a complete closure of the
object graph e.g. the result of a garbage collection.

The index file includes four bitmaps for the Git object types i.e.
commits, trees, blobs, and tags. In addition, a collection of
bitmaps keyed by an ObjectId is also included. The bitmap for each entry
in the collection represents the full closure of ObjectIds reachable
from the keyed ObjectId (including the keyed ObjectId itself). The
bitmaps are further compressed by XORing the current bitmaps against
prior bitmaps in the index, and selecting the smallest representation.
The XOR'd bitmap and offset from the current entry to the position
of the bitmap to XOR against is the actual representation of the entry
in the index file. Each entry contains one byte, which is currently
used to note whether the bitmap should be blindly reused.

Change-Id: Id328724bf6b4c8366a088233098c18643edcf40f
2013-03-05 11:09:44 -08:00
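The XOR step described in the commit above can be sketched with java.util.BitSet. Several things here are assumptions made for illustration: the class XorDelta and its Entry type are hypothetical, BitSet replaces the compressed javaewah bitmaps, and the number of set bits is used as a stand-in for the size of the compressed representation.

```java
import java.util.BitSet;
import java.util.List;

final class XorDelta {
    /** What would be written for one entry: the bits plus the XOR offset. */
    static final class Entry {
        final BitSet stored;
        final int xorOffset;   // 0 means the bitmap is stored as-is

        Entry(BitSet stored, int xorOffset) {
            this.stored = stored;
            this.xorOffset = xorOffset;
        }
    }

    /** Picks the smallest of: the raw bitmap, or its XOR against each prior bitmap. */
    static Entry encode(BitSet current, List<BitSet> prior) {
        BitSet best = (BitSet) current.clone();
        int bestOffset = 0;
        for (int offset = 1; offset <= prior.size(); offset++) {
            BitSet candidate = (BitSet) current.clone();
            candidate.xor(prior.get(prior.size() - offset));
            if (candidate.cardinality() < best.cardinality()) {
                best = candidate;
                bestOffset = offset;
            }
        }
        return new Entry(best, bestOffset);
    }

    /** XOR is its own inverse, so decoding repeats the same operation. */
    static BitSet decode(Entry entry, List<BitSet> prior) {
        BitSet out = (BitSet) entry.stored.clone();
        if (entry.xorOffset > 0) {
            out.xor(prior.get(prior.size() - entry.xorOffset));
        }
        return out;
    }
}
```

Because a commit's closure usually differs from a nearby commit's closure by only a few objects, the XOR'd form tends to have far fewer set bits than the raw bitmap.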
Shawn Pearce 234b4e0432 Merge "Break the dependency on RevObject when creating a newObjectToPack()." 2013-03-04 20:12:35 -05:00
Shawn Pearce 374406ac46 Merge "Fix RefUpdate performance for existing Refs" 2013-03-04 19:11:08 -05:00
Shawn Pearce 22625cd1d8 Merge "Fix corrupted CloneCommand bare-repo fetch-refspec (#402031)" 2013-03-04 17:53:02 -05:00
Colby Ranger be7a135e94 Break the dependency on RevObject when creating a newObjectToPack().
Update the ObjectReuseAsIs API to support creating new
ObjectToPack with only the AnyObjectId and Git object type. This is
needed to support the future pack index bitmaps, which only contain
this information and do not want the overhead of creating a temporary
object for every ObjectId.

Change-Id: I906360b471412688bf429ecef74fd988f47875dc
2013-03-04 14:43:22 -08:00
Colby Ranger 8d4f227c13 Merge "Remove the unused method PackFile.hasExt()." 2013-03-04 17:30:27 -05:00
Colby Ranger 1512d0ab4e Remove the unused method PackFile.hasExt().
It will be used in a future change, so just include it with that change.

Change-Id: I7db28d86f8e8b282a403acd9a4c4defaae828f94
2013-03-04 14:16:36 -08:00
Roberto Tyley a46b042905 Fix corrupted CloneCommand bare-repo fetch-refspec (#402031)
CloneCommand has been creating fetch refspecs like this on bare clones:

[remote "origin"]
        url = ssh://example.com/my-repo.git
        fetch = +refs/heads/*:refs/heads//*

As you can see, the destination ref pattern has a superfluous slash.

It looks like this behaviour has always been the case for CloneCommand,
at least since cc2197ed when code catering to bare-clone fetch refspecs
was added. That was released with JGit v1.0 almost 2 years ago, so
there will probably be some bare repos in the wild which will have been
cloned with JGit and have these corrupted refspecs.

The effect of the corrupted fetch refspec is quite interesting. Up to
and including JGit 2.0, the corrupt refspec was tolerated and fetches
would work as intended with no indication to the user that anything was
amiss. With JGit 2.1, a change was introduced which made JGit less
tolerant, and fetches now attempt to update the non-existing ref
"refs/heads//master". No exception is raised, but the real ref -
"refs/heads/master" - is not updated.

This behaviour was noticed by a user of Agit (which does bare clones by
default and recently updated from JGit v2.0 to v2.2), reported here:

https://github.com/rtyley/agit/issues/92


If you run C-Git fetch on a bare-repo cloned by JGit, it flat-out
rejects the refspec (checked against v1.7.10.4):

fatal: Invalid refspec '+refs/heads/*:refs/heads//*'

Incidentally, C-Git does not create an explicit fetch refspec at all
when performing a bare clone - the full remote config generated by C-Git
looks like this:

[remote "origin"]
        url = ssh://example.com/my-repo.git

Using JGit on such a repository works fine, so omitting the fetch
refspec entirely is also an option.

Change-Id: I14b0d359dc69b8908f68e02cea7a756ac34bf881
2013-03-04 00:03:20 +00:00
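A small sketch of how a well-formed default fetch refspec might be assembled, and how a spec carrying the superfluous slash could be detected. It does not reproduce CloneCommand; the class FetchRefSpecs and both method names are hypothetical.

```java
final class FetchRefSpecs {
    /** Builds the default fetch refspec for a bare or non-bare clone. */
    static String defaultFetchSpec(boolean bare, String remoteName) {
        // The destination prefix already ends with '/', so only "*" is appended.
        String dst = bare ? "refs/heads/" : "refs/remotes/" + remoteName + "/";
        return "+refs/heads/*:" + dst + "*";
    }

    /** Flags specs such as +refs/heads/*:refs/heads//* from the commit above. */
    static boolean hasDoubleSlash(String refSpec) {
        return refSpec.contains("//");
    }
}
```

A check like hasDoubleSlash could help locate bare repositories in the wild that still carry the corrupted configuration.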
Roberto Tyley f1dea3e279 Fix RefUpdate performance for existing Refs
No longer invoke the expensive RefDatabase.isNameConflicting() check on
updating existing refs, reducing batch ref update time by ~97%.

The RefDirectory implementation of isNameConflicting() is quite
slow (it has to do an expensive loose-ref scan) but it's only necessary
to perform this check on ref update if the ref is being *created* - if
the ref already exists, we can already guarantee that it does not
conflict with any other refs.

C-Git seems to use a similar condition before making the
is_refname_available() check:

https://github.com/git/git/blob/v1.8.1.4/refs.c#L1660-L1670

As an example of the effects on performance, here's a simple timing
experiment using The BFG to remove one file from the JGit repo:

---
$ wget http://repo1.maven.org/maven2/com/madgag/bfg-repo-cleaner/1.0.1/bfg-1.0.1.jar
$ git clone --mirror https://git.eclipse.org/r/p/jgit/jgit.git
$ java -jar bfg-1.0.1.jar -D make_jgit.sh jgit.git
....
Updating references:    100% (5760/5760)
...Ref update completed in 148,949 ms.

BFG run is complete!
---

The execution time for the run is completely dominated by the batch ref
update at the end. Repeating the experiment with BFG v1.0.2 (using JGit
patched with this change), the refs update is dramatically reduced:

---
Updating references:    100% (5760/5760)
...Ref update completed in 4,327 ms.
---

Change-Id: I9057bc4ee22f9cc269b1cc00c493841c71527cd6
2013-03-01 21:49:58 +00:00
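The condition described in the commit above, running the name-conflict check only when a ref is being created, can be sketched with an in-memory map. RefStore is a hypothetical class, not RefDirectory or RefUpdate; the counter exists only to make the saved work visible.

```java
import java.util.HashMap;
import java.util.Map;

final class RefStore {
    private final Map<String, String> refs = new HashMap<>();
    int conflictChecks;   // how many times the expensive scan ran

    /** Stand-in for the expensive scan: a/b conflicts with a and with a/b/c. */
    boolean isNameConflicting(String name) {
        conflictChecks++;
        for (String existing : refs.keySet()) {
            if (existing.startsWith(name + "/") || name.startsWith(existing + "/")) {
                return true;
            }
        }
        return false;
    }

    /** Returns false if the update is rejected because of a name conflict. */
    boolean update(String name, String newId) {
        // An existing ref cannot conflict with other refs, so skip the scan.
        if (!refs.containsKey(name) && isNameConflicting(name)) {
            return false;
        }
        refs.put(name, newId);
        return true;
    }
}
```

In a batch that rewrites thousands of existing refs, the scan then runs zero times instead of once per ref.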
Shawn Pearce 178d55c24d Merge "Improve the documentation of the ByteArraySet used by PathFilterGroup" 2013-02-28 19:21:59 -05:00
Colby Ranger 4a317a1790 Include supported extensions in PackFile constructor.
Previously a PackFile class was assumed to only support a .pack and .idx
file. Update the constructor to enumerate the supported extensions for
the pack file. This will allow the bitmap code to only be executed if
the bitmap extension file is known to exist.

Change-Id: Ie59041dffec5f60d7ea2771026ffd945106bd4bf
2013-02-28 11:35:07 -08:00
Gustaf Lundh 212fb3071c Fix while boundaries in DateRevQueue.add()
In add(), "low" will never equal "first". This fact
should be reflected in the code.

Change-Id: I5cab51374e67bd2d3301e5d9dac47c4259b5e562
2013-02-25 18:30:03 +01:00
Shawn Pearce 9613b04d81 Merge "Performance fixes in DateRevQueue" 2013-02-25 11:50:21 -05:00
Gustaf Lundh 84afea9179 Performance fixes in DateRevQueue
When a lot of commits are added to DateRevQueue, the
sort-on-insertion approach is very heavy on CPU cycles.

One approach to fix this was made by Dave Borowitz:
https://git.eclipse.org/r/#/c/5491/

But using Java's PriorityQueue seems to have brought some
extra overhead, and the desired performance could not be
reached.

This fix takes another approach to the insertion problem,
without changing the expected behaviour or bringing extra
memory overhead:

If we detect over 1000 commits in the DateRevQueue, a
"seek-index" is rebuilt every 1000th added commit.

The index keeps track of every 100th commit in the
DateRevQueue. During insertions, it will be used for a
preliminary scanning (binary search) of the queue, with
the intention of helping add() find a good starting point
to start walking from. After finding this starting point,
add() will step commit-by-commit until the correct
insertion place in the queue is found (today, the queue
is expected to be sorted at all times).

When applied to repositories with many refs, this approach
has proven to bring huge performance gains and scales quite
well.

For instance, in a repository with close to 80000 refs,
we could cut down the time a typical Gerrit replication
of 1 commit would take (just a push from JGit's point of
view) from 32sec down to 3.5sec.

Below you see some typical times to add a specific amount
of commits (with random commit times) to the DateRevQueue
and the difference the preliminary seek-index makes:

Commits   Index   No Index
   1024     8ms        8ms
   2048    13ms        9ms
   4096     5ms       59ms
   8192    11ms      595ms
  16384    22ms     3058ms
  32768    64ms    13811ms
  65536   201ms    62677ms
 131072   783ms   331585ms

Only one extra reference is needed for every 100 inserted
commits (and only when we see more than 1000 commits in
the queue), so the memory overhead should be negligible.

Various index-stepping values were tested, and 100 seemed to
scale very well and be effective from the start.

In the future, it should probably be dynamic and based on
the number of refs in the queue, but this should serve well
as a starting point.

Note: While other fundamentally different data structures may
be more suitable, the DateRevQueue is extremely central to
many of the Git core operations. This approach was chosen,
since the effect of the patch is easy to predict in conjunction
with the current implementation. A totally new data structure
will make it harder to predict behaviour in many common and
uncommon cases (in terms of breaking ties, memory usage, cost
when using few elements, object creation/disposing overhead,
etc).

Change-Id: Ie7b99f40eacf6324bfb4716d82073adeda64d10f
2013-02-25 12:36:29 +01:00
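The seek-index idea described in the commit above can be sketched as a sorted linked list plus a sparse array of shortcuts. This is a simplified illustration, not DateRevQueue itself: the class IndexedDateQueue is hypothetical, entries are bare int timestamps, only insertion is modelled, and ties are placed ahead of existing equal timestamps as an arbitrary choice.

```java
final class IndexedDateQueue {
    private static final int REBUILD_EVERY = 1000;  // adds between index rebuilds
    private static final int INDEX_STEP = 100;      // index every 100th entry

    private static final class Node {
        final int time;
        Node next;
        Node(int time) { this.time = time; }
    }

    private Node head;                    // newest first
    private int size;
    private int addsSinceRebuild;
    private Node[] index = new Node[0];   // shortcuts, also newest first

    void add(int time) {
        Node n = new Node(time);
        Node q = seek(time);              // indexed node with q.time > time, or null
        if (q == null && (head == null || head.time <= time)) {
            n.next = head;                // becomes the new head
            head = n;
        } else {
            if (q == null) {
                q = head;
            }
            // Step entry by entry from the starting point.
            while (q.next != null && q.next.time > time) {
                q = q.next;
            }
            n.next = q.next;
            q.next = n;
        }
        size++;
        if (++addsSinceRebuild >= REBUILD_EVERY && size > REBUILD_EVERY) {
            rebuildIndex();
        }
    }

    /** Binary search for the last indexed node that is strictly newer than time. */
    private Node seek(int time) {
        Node best = null;
        int lo = 0;
        int hi = index.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].time > time) {
                best = index[mid];
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    private void rebuildIndex() {
        Node[] fresh = new Node[size / INDEX_STEP];
        int filled = 0;
        int pos = 0;
        for (Node p = head; p != null && filled < fresh.length; p = p.next, pos++) {
            if (pos % INDEX_STEP == INDEX_STEP - 1) {
                fresh[filled++] = p;
            }
        }
        index = fresh;
        addsSinceRebuild = 0;
    }

    /** Returns all timestamps, newest first. */
    int[] toArray() {
        int[] out = new int[size];
        int i = 0;
        for (Node p = head; p != null; p = p.next) {
            out[i++] = p.time;
        }
        return out;
    }
}
```

Between rebuilds the indexed nodes drift away from exact multiples of 100, but they remain valid starting points because the list stays sorted and, in this sketch, nothing is removed.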
Matthias Sohn 912e19a8d6 Update last release version to 2.3.1.201302201838-r
Change-Id: I9c6d774526028e56707e15e80370460d964de76e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-24 00:11:21 +01:00
Matthias Sohn af64b9a3b3 Deploy Maven artifacts to Eclipse Nexus repository
Bug: 401469
Bug: 401470
Change-Id: I4901dc208fe8f9e4055d27ab7e0ced979fd234f5
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-23 14:30:57 +01:00
George C. Young ab99b78ca0 Implement recursive merge strategy
Extend ResolveMerger with RecursiveMerger to merge two tips
that have up to 200 bases.

Bug: 380314
CQ: 6854
Change-Id: I6292bb7bda55c0242a448a94956f2d6a94fddbaa
Also-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-22 23:51:50 +01:00
Robin Rosenberg 78606404de Improve the documentation of the ByteArraySet used by PathFilterGroup
Change-Id: I2ba7a67e8e1596aa6c33a9caddee03a6be48f008
2013-02-22 23:21:57 +01:00
Colby Ranger 95ef1e83d0 Fix off by one error in PackReverseIndex.
The last 32bit offset is at Integer.MAX_VALUE.

Change-Id: Idee8be3c7887e1d0c8339ff94aceff36dbf000db
2013-02-20 22:59:35 -08:00
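The boundary named in the commit above can be stated as a one-line predicate. The faulty expression is not shown in the commit message, so this sketch only illustrates the inclusive upper bound; OffsetWidth and fitsIn32Bits are hypothetical names.

```java
final class OffsetWidth {
    /** True if the pack offset belongs in the 32-bit offset range. */
    static boolean fitsIn32Bits(long offset) {
        // Integer.MAX_VALUE itself is still a valid 32-bit offset (inclusive bound).
        return offset >= 0 && offset <= Integer.MAX_VALUE;
    }
}
```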
Matthias Sohn c033f016c9 Merge branch 'stable-2.3'
* stable-2.3:
  Prepare 2.3.2-SNAPSHOT builds
  JGit v2.3.1.201302201838-r
  Accept Change-Id even if footer contains not well-formed entries
  Fix false positives in hashing used by PathFilterGroup

Change-Id: I5882aa3b482d6bcd40a45bed51e5ab03f018a5bc
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-21 02:34:17 +01:00
Matthias Sohn 49ec6c1b3b Prepare 2.3.2-SNAPSHOT builds
Change-Id: I51a8a53194928416b1aef1f3fce0ce66aadceca4
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2013-02-21 02:13:15 +01:00