Shawn O. Pearce 53db854185 Speed up ObjectWalk by 6235 objects/sec
The "Counting objects" phase of packing is the most time consuming
part for any server providing access to Git repositories. Scanning
through the entire project history, including every revision of
every tree that has ever existed, is expensive and takes an incredible
amount of CPU time.

Inline the tree parsing logic, unroll a number of loops, and set up
to better handle the common case of seeing another occurrence of
an object that was already marked SEEN.
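
As a rough illustration of the SEEN fast path described above, a walk
can test a flag bit before doing any other work on an object it has
already visited. This is a hypothetical sketch, not JGit's ObjectWalk
code; the SeenWalkSketch and Node classes and the countObjects method
are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not JGit's ObjectWalk: a graph walk that checks a
// SEEN flag bit first, so repeated references to an object cost one test.
final class SeenWalkSketch {
    static final int SEEN = 1 << 0; // flag bit; name borrowed from the commit message

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();
        int flags;

        Node(String name) {
            this.name = name;
        }
    }

    /** Counts each reachable node once, however many parents reference it. */
    static int countObjects(Node root) {
        int count = 0;
        Deque<Node> pending = new ArrayDeque<Node>();
        root.flags |= SEEN;
        pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            count++;
            for (Node child : n.children) {
                if ((child.flags & SEEN) != 0)
                    continue; // already marked: skip without further work
                child.flags |= SEEN;
                pending.push(child);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node blob = new Node("README");
        Node treeA = new Node("tree-a");
        Node treeB = new Node("tree-b");
        treeA.children.add(blob);
        treeB.children.add(blob); // the same blob appears in two trees
        Node root = new Node("root");
        root.children.add(treeA);
        root.children.add(treeB);
        System.out.println("objects counted: " + countObjects(root));
    }
}
```

The shared blob is reached through both trees but counted only once,
because the second reference finds the SEEN bit already set.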

This change boosts the "Counting objects" phase when JGit is acting
as a server and is packing the linux-2.6 repository for its client.
Compared to CGit on the same hardware, a JGit daemon server is now
21883 objects/sec faster:

CGit:
  Counted 2058062 objects in 38981 ms at 52796.54 objects/sec
  Counted 2058062 objects in 38920 ms at 52879.29 objects/sec
  Counted 2058062 objects in 39059 ms at 52691.11 objects/sec

JGit (before):
  Counted 2058062 objects in 31529 ms at 65275.21 objects/sec
  Counted 2058062 objects in 30359 ms at 67790.84 objects/sec
  Counted 2058062 objects in 30033 ms at 68526.69 objects/sec

JGit (this commit):
  Counted 2058062 objects in 28726 ms at 71644.57 objects/sec
  Counted 2058062 objects in 27652 ms at 74427.24 objects/sec
  Counted 2058062 objects in 27528 ms at 74762.50 objects/sec

Above, the first run was a "cold server". For JGit the JVM had just
started up with `jgit daemon`, and for CGit we hadn't touched the
repository "recently" (but it was certainly in kernel buffer cache).
The second and third runs were against the running JGit JVM, allowing
timing tests to better reflect the benefits of JGit's pack and index
caching, as well as any optimizations the JIT may have performed.

The timings are fair.  CGit is opening, checking and mmap'ing both
the pack and index during the timer.  JGit is opening, checking
and malloc+read'ing the pack and index data into its Java heap
during the timer. Both processes are walking the same graph space,
and are computing the "path hash" necessary to sort objects in the
object table for delta compression.  Since this commit only impacts
the "Counting objects" phase, delta compression was obviously not
included in the timings and JGit may still be performing delta
compression slower than CGit, resulting in an overall slower server
experience for clients.

Change-Id: Ieb184bfaed8475d6960a494b1f3c870e0382164a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2011-08-13 14:53:18 -07:00
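
The rates in the listings above follow from dividing the object count
by the elapsed time. The sketch below recomputes them. CountingRate is
a hypothetical helper, not part of JGit, and pairing the fastest run of
each implementation is an inference from the figures rather than
something the commit message states.

```java
// Hypothetical helper, not part of JGit: recomputes the rates quoted in
// the commit message from the object count and elapsed milliseconds.
final class CountingRate {
    /** Objects per second for a run that counted `objects` in `millis` ms. */
    static double objectsPerSecond(long objects, long millis) {
        return objects * 1000.0 / millis;
    }

    public static void main(String[] args) {
        long objects = 2058062L;
        // Assumption: compare the fastest of the three runs in each listing.
        double cgit = objectsPerSecond(objects, 38920);
        double before = objectsPerSecond(objects, 30033);
        double after = objectsPerSecond(objects, 27528);

        System.out.printf("CGit:          %.2f objects/sec%n", cgit);
        System.out.printf("JGit (before): %.2f objects/sec%n", before);
        System.out.printf("JGit (after):  %.2f objects/sec%n", after);
        System.out.printf("gain over previous JGit: %d objects/sec%n",
                (long) Math.floor(after - before));
        System.out.printf("lead over CGit:          %d objects/sec%n",
                (long) Math.floor(after - cgit));
    }
}
```

Under that pairing, the truncated differences come out to the 6235 and
21883 objects/sec quoted in the commit message.
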
org.eclipse.jgit Speed up ObjectWalk by 6235 objects/sec 2011-08-13 14:53:18 -07:00
org.eclipse.jgit.ant Merge branch 'stable-1.0' 2011-06-09 17:41:16 +02:00
org.eclipse.jgit.ant.test Prepare post JGit v1.0.0.201106090707-r builds 2011-06-09 14:11:23 +02:00
org.eclipse.jgit.console Merge branch 'stable-1.0' 2011-06-09 17:41:16 +02:00
org.eclipse.jgit.generated.storage.dht.proto Merge branch 'stable-1.0' 2011-06-09 17:41:16 +02:00
org.eclipse.jgit.http.server Push errors back over sideband when possible 2011-06-09 17:29:46 -07:00
org.eclipse.jgit.http.test Refactor out ReflogEntry 2011-06-20 10:25:50 -05:00
org.eclipse.jgit.iplog Merge branch 'stable-1.0' 2011-06-09 17:41:16 +02:00
org.eclipse.jgit.junit Cleanup directories leftover by test. 2011-07-07 23:16:40 +02:00
org.eclipse.jgit.junit.http Prepare 1.1.0 builds 2011-06-06 01:24:32 +02:00
org.eclipse.jgit.packaging Prepare post JGit v1.0.0.201106090707-r builds 2011-06-09 14:11:23 +02:00
org.eclipse.jgit.pgm blame: Implement blame on the command line 2011-08-13 14:12:03 -07:00
org.eclipse.jgit.storage.dht Merge branch 'stable-1.0' 2011-06-09 17:41:16 +02:00
org.eclipse.jgit.storage.dht.test Prepare post JGit v1.0.0.201106090707-r builds 2011-06-09 14:11:23 +02:00
org.eclipse.jgit.test Merge "Fix reading of ref names containing characters that sort before /" 2011-08-10 14:40:25 -04:00
org.eclipse.jgit.ui Merge branch 'stable-1.0' 2011-06-09 17:41:16 +02:00
tools Fix version.sh 2011-02-11 23:21:49 +01:00
.eclipse_iplog Update Eclipse IP log for 1.0 2011-05-25 20:14:35 +02:00
.gitattributes Initial JGit contribution to eclipse.org 2009-09-29 16:47:03 -07:00
LICENSE Clean up LICENSE file 2010-07-02 14:52:49 -07:00
README Initial JGit contribution to eclipse.org 2009-09-29 16:47:03 -07:00
SUBMITTING_PATCHES Correcting explanation of EDL 2009-10-28 14:12:07 +01:00
pom.xml Prepare post JGit v1.0.0.201106090707-r builds 2011-06-09 14:11:23 +02:00

README

            == Java GIT ==

This package is licensed under the BSD.

  org.eclipse.jgit/

    A pure Java library capable of being run standalone, with no
    additional support libraries.  Some JUnit tests are provided
    to exercise the library.  The library provides functions to
    read and write a GIT formatted repository.

    All portions of jgit are covered by the BSD.  Absolutely no GPL,
    LGPL or EPL contributions are accepted within this package.

  org.eclipse.jgit.test/
    Unit tests for org.eclipse.jgit and the same licensing rules.

            == WARNINGS / CAVEATS              ==

- Symbolic links are not supported because Java does not support them.
  Such links could be damaged.

- Only the timestamp of the index is used by jgit to check if the index
  is dirty.

- Don't try the library with a JDK other than 1.6 (Java 6) unless you
  are prepared to investigate problems yourself. JDK 1.5.0_11 and later
  Java 5 versions *may* work. Earlier versions do not. JDK 1.4 is *not*
  supported. Apple's Java 1.5.0_07 is reported to work acceptably. We
  have no information about other vendors. Please report your findings
  if you try.

- CRLF conversion is never performed. On Windows you should therefore
  make sure your projects and workspaces are configured to save files
  with Unix (LF) line endings.
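
Because jgit stores file content byte for byte, one way to catch stray
CRLF line endings before committing is to scan the raw bytes. The sketch
below is a hypothetical helper (LineEndingCheck is not part of jgit) and
only detects CR LF pairs; it does not convert anything.

```java
// Hypothetical helper, not part of jgit: reports whether raw file content
// contains a CR LF pair.
final class LineEndingCheck {
    /** Returns true if any '\r' byte is immediately followed by '\n'. */
    static boolean containsCrLf(byte[] content) {
        for (int i = 0; i + 1 < content.length; i++) {
            if (content[i] == '\r' && content[i + 1] == '\n')
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] windowsStyle = { 'a', '\r', '\n', 'b', '\r', '\n' };
        byte[] unixStyle = { 'a', '\n', 'b', '\n' };
        System.out.println("windows-style has CRLF: " + containsCrLf(windowsStyle));
        System.out.println("unix-style has CRLF:    " + containsCrLf(unixStyle));
    }
}
```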

            == Package Features                ==

  org.eclipse.jgit/

    * Read loose and packed commits, trees, blobs, including
      deltafied objects.

    * Read objects from shared repositories.

    * Write loose commits, trees, blobs.

    * Write blobs from local files or Java InputStreams.

    * Read blobs as Java InputStreams.

    * Copy trees to local directory, or local directory to a tree.

    * Lazily loads objects as necessary.

    * Read and write .git/config files.

    * Create a new repository.

    * Read and write refs, including walking through symrefs.

    * Read, update and write the Git index.

    * Checkout in dirty working directory if trivial.

    * Walk the history from a given set of commits looking for commits
      introducing changes in files under a specified path.

    * Object transport
      Fetch via ssh, git, http, Amazon S3 and bundles.
      Push via ssh, git and Amazon S3. JGit does not yet deltify
      the pushed packs so they may be a lot larger than C Git packs.

  org.eclipse.jgit.pgm/

    * An assorted set of command line utilities, mostly for ad-hoc
      testing of jgit: log, glog, fetch, etc.

            == Missing Features                ==

There are a lot of missing features. You need the real Git for these.
For some operations it may simply be the preferred solution anyway. It
offers more than a command line; git-gui, for example, makes committing
partial files simple.

- Merging.

- Repacking.

- Generate a GIT format patch.

- Apply a GIT format patch.

- Documentation. :-)

- gitattributes support
  In particular CRLF conversion is not implemented. Files are treated
  as byte sequences.

- submodule support
  Submodules are not supported or even recognized.

            == Support                         ==

  Post questions, comments or patches to the git@vger.kernel.org mailing list.


            == Contributing                    ==

  See SUBMITTING_PATCHES in this directory. However, feedback and bug reports
  are also contributions.


            == About GIT                       ==

More information about GIT, its repository format, and the canonical
C based implementation can be obtained from the GIT websites:

  http://git.or.cz/
  http://www.kernel.org/pub/software/scm/git/
  http://www.kernel.org/pub/software/scm/git/docs/