Shawn Pearce 61d4922928 Fix missing deltas near type boundaries
Delta search was discarding discovered deltas if an object appeared
near a type boundary in the delta search window. This has caused JGit
to produce larger pack files than other implementations of the packing
algorithm.

Delta search works by pushing prior objects into a search window, an
ordered list of objects to attempt to delta compress the next object
against. (The window size is bounded, avoiding O(N^2) behavior.)

For implementation reasons multiple object types can appear in the
input list, and the window. PackWriter commonly passes both trees and
blobs in the input list handed to the DeltaWindow algorithm. The pack
file format requires an object to only delta compress against the same
type, so the DeltaWindow algorithm must stop doing comparisons if a
blob would be compared to a tree.

Because the input list is sorted by object type and the window holds
recently considered prior objects, once a wrong type is discovered in
the window the search algorithm stops and uses the current result.

Unfortunately the termination condition was discarding any found
delta by setting deltaBase and deltaBuf to null when it was trying
to break the window search.

When this bug occurs, the state of the DeltaWindow looks like this:

                                 current
                                  |
                                 \ /
  input list:  tree0 tree1 blob1 blob2

  window:      blob1 tree1 tree0
                / \
                 |
              res.prev

As the loop iterates to the right across the window, it first finds
that blob1 is a suitable delta base for blob2, and temporarily holds
this in the bestDelta/deltaBuf fields. It then considers tree1, but
tree1 has the wrong type (blob != tree), so the window loop must give
up and fall through the remaining code.

Moving the condition up and discarding the window contents allows
the bestDelta/deltaBuf to be kept, letting the final file delta
compress blob2 against blob1.
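
A minimal Java sketch of the before/after behavior described above. It is
not the JGit DeltaWindow code: the class DeltaWindowSketch, its Obj and Type
types, the `fixed` flag, and the rule that any same-type object counts as a
usable delta base are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DeltaWindowSketch {
    enum Type { TREE, BLOB }

    // Hypothetical stand-in for an object handed to the delta search.
    static final class Obj {
        final String name;
        final Type type;

        Obj(String name, Type type) {
            this.name = name;
            this.type = type;
        }
    }

    // The input list from the diagram, already sorted by type.
    static List<Obj> sampleInput() {
        return Arrays.asList(
                new Obj("tree0", Type.TREE), new Obj("tree1", Type.TREE),
                new Obj("blob1", Type.BLOB), new Obj("blob2", Type.BLOB));
    }

    // Maps each object name to the name of its chosen delta base (null if none).
    static Map<String, String> search(List<Obj> input, int windowSize, boolean fixed) {
        Deque<Obj> window = new ArrayDeque<>(); // most recent object first
        Map<String, String> bases = new LinkedHashMap<>();
        for (Obj cur : input) {
            // Fixed variant: check the type switch up front and drop the window.
            if (fixed && !window.isEmpty() && window.peekFirst().type != cur.type) {
                window.clear();
            }
            String best = null;
            for (Obj cand : window) {
                if (cand.type != cur.type) {
                    if (!fixed) {
                        best = null; // buggy variant: throws away the delta already found
                    }
                    break; // the rest of the window has the other type too
                }
                if (best == null) {
                    best = cand.name; // assumption: any same-type object is a usable base
                }
            }
            bases.put(cur.name, best);
            window.addFirst(cur);
            if (window.size() > windowSize) {
                window.removeLast(); // bounded window avoids O(N^2) comparisons
            }
        }
        return bases;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + search(sampleInput(), 10, false));
        System.out.println("fixed: " + search(sampleInput(), 10, true));
    }
}
```

With the four-object input from the diagram, the buggy variant leaves blob2
without a base, while the fixed variant keeps blob1 as the base for blob2.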

The impact of this bug (and its fix) on real world repositories is
likely minimal. The boundary from blob to tree happens approximately
once in the search, as the input list is sorted by type. Only the
first window size worth of blobs (e.g. 10 or 250) were failing to
produce a delta in the final file.

This bug fix does produce significantly different results for small
test repositories created in the unit test suite, such as when a pack
may contain 6 objects (2 commits, 2 trees, 2 blobs). Packing test
cases can now better sample different output pack file sizes depending
on delta compression and object reuse flags in PackConfig.

Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
2017-02-08 14:36:24 -08:00
.mvn Configure max heap size for Maven build 2016-12-09 11:02:10 +01:00
lib Update JavaEWAH to 1.1.6 2016-11-17 00:26:44 +01:00
org.eclipse.jgit Fix missing deltas near type boundaries 2017-02-08 14:36:24 -08:00
org.eclipse.jgit.ant Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00
org.eclipse.jgit.ant.test Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00
org.eclipse.jgit.archive Format Bazel files with buildifier 2017-01-22 22:34:11 +01:00
org.eclipse.jgit.http.apache Remove unused org.apache.http.impl.client.cache requirement 2017-01-26 15:30:36 -04:00
org.eclipse.jgit.http.server Format Bazel files with buildifier 2017-01-22 22:34:11 +01:00
org.eclipse.jgit.http.test Follow redirects in transport 2017-02-02 21:20:23 -04:00
org.eclipse.jgit.junit RepositoryCacheTest: avoid to close already closed repository 2017-01-28 21:19:55 +01:00
org.eclipse.jgit.junit.http Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00
org.eclipse.jgit.lfs Don't rely on default locale when using toUpperCase() and toLowerCase() 2017-01-28 15:06:15 +01:00
org.eclipse.jgit.lfs.server Don't rely on default locale when using toUpperCase() and toLowerCase() 2017-01-28 15:06:15 +01:00
org.eclipse.jgit.lfs.server.test Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00
org.eclipse.jgit.lfs.test Don't rely on default locale when using toUpperCase() and toLowerCase() 2017-01-28 15:06:15 +01:00
org.eclipse.jgit.packaging Merge branch 'stable-4.6' 2017-01-26 11:01:32 +09:00
org.eclipse.jgit.pgm Don't rely on default locale when using toUpperCase() and toLowerCase() 2017-01-28 15:06:15 +01:00
org.eclipse.jgit.pgm.test Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00
org.eclipse.jgit.test Fix missing deltas near type boundaries 2017-02-08 14:36:24 -08:00
org.eclipse.jgit.ui Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00
tools Implement initial framework of Bazel build 2017-01-18 19:13:16 -04:00
.buckconfig Change JGit minimum execution environment to JavaSE-1.8 2016-09-20 11:32:36 +02:00
.buckversion Upgrade buck to latest version 2016-12-01 15:57:17 +09:00
.gitattributes Initial JGit contribution to eclipse.org 2009-09-29 16:47:03 -07:00
.gitignore Implement initial framework of Bazel build 2017-01-18 19:13:16 -04:00
.mailmap Update .mailmap 2016-08-09 11:10:39 +09:00
BUCK Buck: Simplify root build file 2016-02-14 11:45:30 +01:00
BUILD Implement Bazel build for http-apache, lfs, lfs-server 2017-01-22 22:34:12 +01:00
CONTRIBUTING.md Update SUBMITTING_PATCHES 2014-07-20 17:44:53 -04:00
LICENSE Clean up LICENSE file 2010-07-02 14:52:49 -07:00
README.md Remove references to org.eclipse.jgit.java7 2016-08-05 11:22:27 +09:00
WORKSPACE Format Bazel files with buildifier 2017-01-22 22:34:11 +01:00
pom.xml Prepare 4.7.0-SNAPSHOT builds 2016-12-27 01:45:50 +01:00

README.md

Java Git

An implementation of the Git version control system in pure Java.

This package is licensed under the EDL (Eclipse Distribution License).

JGit can be imported straight into Eclipse, built and tested from there, but the automated builds use Maven.

  • org.eclipse.jgit

    A pure Java library capable of being run standalone, with no additional support libraries. It provides classes to read and write a Git repository and operate on a working directory.

    All portions of JGit are covered by the EDL. Absolutely no GPL, LGPL or EPL contributions are accepted within this package.

  • org.eclipse.jgit.ant

    Ant tasks based on JGit.

  • org.eclipse.jgit.archive

    Support for exporting to various archive formats (zip etc).

  • org.eclipse.jgit.http.apache

    Apache httpclient support

  • org.eclipse.jgit.http.server

    Server for the smart and dumb Git HTTP protocol.

  • org.eclipse.jgit.pgm

    Command-line interface Git commands implemented using JGit ("pgm" stands for program).

  • org.eclipse.jgit.packaging

    Production of Eclipse features and p2 repository for JGit. See the JGit Wiki on why and how to use this module.

Tests

  • org.eclipse.jgit.junit

    Helpers for unit testing

  • org.eclipse.jgit.test

    Unit tests for org.eclipse.jgit

  • org.eclipse.jgit.ant.test

  • org.eclipse.jgit.pgm.test

  • org.eclipse.jgit.http.test

  • org.eclipse.jgit.junit.test

    No further description needed

Warnings/Caveats

  • Native symbolic links are supported, provided the file system supports them. For Windows you must have Windows Vista/Windows 2008 or newer, use a non-administrator account and have the SeCreateSymbolicLinkPrivilege.

  • Only the timestamp of the index is used by jgit if the index is dirty.

  • JGit requires at least a Java 8 JDK.

  • CRLF conversion is performed depending on the core.autocrlf setting. However, Git for Windows by default stores that setting during installation in the "system wide" configuration file. If Git is not installed, use the global or repository configuration for the core.autocrlf setting.

  • The system wide configuration file is located relative to where C Git is installed. Make sure Git can be found via the PATH environment variable. When installing Git for Windows check the "Run Git from the Windows Command Prompt" option. There are other options like Eclipse settings that can be used for pointing out where C Git is installed. Modifying PATH is the recommended option if C Git is installed.

  • We try to use the same notation of $HOME as C Git does. On Windows this is often not the same value as the user.home system property.
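
The difference between $HOME and user.home can be inspected with a small sketch like the one below. The class `HomeLookupSketch`, the method `resolveHome`, and the lookup order (HOME environment variable first, then the user.home system property) are assumptions for illustration, not JGit's actual resolution logic.

```java
import java.util.Map;

final class HomeLookupSketch {
    // Hypothetical lookup order: prefer the HOME environment variable,
    // fall back to the JVM's user.home property when HOME is unset or empty.
    static String resolveHome(Map<String, String> env, String userHomeProperty) {
        String home = env.get("HOME");
        if (home != null && !home.isEmpty()) {
            return home;
        }
        return userHomeProperty;
    }

    public static void main(String[] args) {
        String fromProp = System.getProperty("user.home");
        System.out.println("HOME env:  " + System.getenv("HOME"));
        System.out.println("user.home: " + fromProp);
        System.out.println("resolved:  " + resolveHome(System.getenv(), fromProp));
    }
}
```

If the first two printed values differ on a Windows machine, configuration files written by C Git under $HOME may not be where the user.home property points.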

Package Features

  • org.eclipse.jgit/

    • Read loose and packed commits, trees, blobs, including deltafied objects.

    • Read objects from shared repositories

    • Write loose commits, trees, blobs.

    • Write blobs from local files or Java InputStreams.

    • Read blobs as Java InputStreams.

    • Copy trees to local directory, or local directory to a tree.

    • Lazily loads objects as necessary.

    • Read and write .git/config files.

    • Create a new repository.

    • Read and write refs, including walking through symrefs.

    • Read, update and write the Git index.

    • Checkout in dirty working directory if trivial.

    • Walk the history from a given set of commits looking for commits introducing changes in files under a specified path.

    • Object transport: fetch via ssh, git, http, Amazon S3 and bundles; push via ssh, git and Amazon S3. JGit does not yet deltify the pushed packs so they may be a lot larger than C Git packs.

    • Garbage collection

    • Merge

    • Rebase

    • And much more

  • org.eclipse.jgit.pgm/

    • Assorted set of command line utilities. Mostly for ad-hoc testing of jgit log, glog, fetch etc.

  • org.eclipse.jgit.ant/

    • Ant tasks

  • org.eclipse.jgit.archive/

    • Support for Zip/Tar and other formats

  • org.eclipse.jgit.http.*/

    • HTTP client and server support

Missing Features

There are some missing features:

  • gitattributes support

Support

Post questions, comments or patches to the jgit-dev@eclipse.org mailing list. You need to be subscribed to post; see here:

https://dev.eclipse.org/mailman/listinfo/jgit-dev

Contributing

See the EGit Contributor Guide:

http://wiki.eclipse.org/EGit/Contributor_Guide

About Git

More information about Git, its repository format, and the canonical C based implementation can be obtained from the Git website:

http://git-scm.com/