The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and costs a lot
of temporary working set memory.

Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository
owner to first construct the cached pack by hand, and record the
tip commits inside of $GIT_DIR/objects/info/cached-packs:

    cd $GIT_DIR
    root=$(git rev-parse master)
    tmp=objects/.tmp-$$
    names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
    for n in $names; do
      chmod a-w $tmp-$n.pack $tmp-$n.idx
      touch objects/pack/pack-$n.keep
      mv $tmp-$n.pack objects/pack/pack-$n.pack
      mv $tmp-$n.idx objects/pack/pack-$n.idx
    done

    (echo "+ $root";
     for n in $names; do echo "P $n"; done;
     echo) >>objects/info/cached-packs

    git repack -a -d

When a clone request needs to include $root, the corresponding
cached pack will be copied as-is, rather than enumerating all of
the objects that are reachable from $root.

For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.

With this cached pack in hand, JGit daemon completes a clone request
in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):

Before:

    remote: Counting objects: 1861830, done
    remote: Finding sources: 100% (1861830/1861830)
    remote: Getting sizes: 100% (88243/88243)
    remote: Compressing objects: 100% (88184/88184)
    Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
    remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
    Resolving deltas: 100% (1564621/1564621), done.

    real 3m19.005s

After:

    remote: Counting objects: 1601, done
    remote: Counting objects: 1828460, done
    remote: Finding sources: 100% (50475/50475)
    remote: Getting sizes: 100% (18843/18843)
    remote: Compressing objects: 100% (7585/7585)
    remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
    Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
    Resolving deltas: 100% (1559477/1559477), done.

    real 2m2.938s

Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.

[1] In this test $root was set back about two weeks.

Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

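As a rough illustration of the list file that the script in the commit message appends to, the shell sketch below reads objects/info/cached-packs back and prints each tip commit and pack file it names. It assumes only the layout that script writes: "+ <tip>" lines, "P <name>" lines, and a blank line closing each record. The function name `list_cached_packs` is hypothetical and is not part of Git or JGit, and the real file format may carry details this sketch does not handle.

```shell
#!/bin/sh
# list_cached_packs FILE: print the tips and pack files named in FILE.
# Assumed layout (taken from the script in the commit message):
#   "+ <tip commit>"  one line per tip commit
#   "P <pack name>"   one line per pack, stored as objects/pack/pack-<name>.pack
#   ""                a blank line ends the record
list_cached_packs() {
  record=1
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "+ "*) echo "record $record tip ${line#+ }" ;;
      "P "*) echo "record $record pack pack-${line#P }.pack" ;;
      "")    record=$((record + 1)) ;;
    esac
  done <"$1"
}
```

Called as `list_cached_packs "$GIT_DIR/objects/info/cached-packs"`, its output can be compared against the contents of objects/pack to see whether every listed pack is still present after a repack.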
||
---|---|---|
org.eclipse.jgit | ||
org.eclipse.jgit.console | ||
org.eclipse.jgit.http.server | ||
org.eclipse.jgit.http.test | ||
org.eclipse.jgit.iplog | ||
org.eclipse.jgit.junit | ||
org.eclipse.jgit.junit.http | ||
org.eclipse.jgit.packaging | ||
org.eclipse.jgit.pgm | ||
org.eclipse.jgit.test | ||
org.eclipse.jgit.ui | ||
tools | ||
.eclipse_iplog | ||
.gitattributes | ||
LICENSE | ||
README | ||
SUBMITTING_PATCHES | ||
pom.xml |
README
== Java GIT ==

This package is licensed under the BSD.

  org.eclipse.jgit/

    A pure Java library capable of being run standalone, with no
    additional support libraries. Some JUnit tests are provided
    to exercise the library. The library provides functions to
    read and write a GIT formatted repository.

    All portions of jgit are covered by the BSD. Absolutely no GPL,
    LGPL or EPL contributions are accepted within this package.

  org.eclipse.jgit.test/

    Unit tests for org.eclipse.jgit, under the same licensing rules.

== WARNINGS / CAVEATS ==

- Symbolic links are not supported because Java does not support them.
  Such links could be damaged.

- Only the timestamp of the index is used by jgit to check if the index
  is dirty.

- Don't try the library with a JDK other than 1.6 (Java 6) unless you
  are prepared to investigate problems yourself. JDK 1.5.0_11 and later
  Java 5 versions *may* work. Earlier versions do not. JDK 1.4 is *not*
  supported. Apple's Java 1.5.0_07 is reported to work acceptably. We
  have no information about other vendors. Please report your findings
  if you try.

- CRLF conversion is never performed. On Windows you should therefore
  make sure your projects and workspaces are configured to save files
  with Unix (LF) line endings.

== Package Features ==

  org.eclipse.jgit/

    * Read loose and packed commits, trees, blobs, including
      deltified objects.

    * Read objects from shared repositories

    * Write loose commits, trees, blobs.

    * Write blobs from local files or Java InputStreams.

    * Read blobs as Java InputStreams.

    * Copy trees to local directory, or local directory to a tree.

    * Lazily loads objects as necessary.

    * Read and write .git/config files.

    * Create a new repository.

    * Read and write refs, including walking through symrefs.

    * Read, update and write the Git index.

    * Checkout in dirty working directory if trivial.

    * Walk the history from a given set of commits looking for commits
      introducing changes in files under a specified path.

    * Object transport
      Fetch via ssh, git, http, Amazon S3 and bundles.
      Push via ssh, git and Amazon S3. JGit does not yet deltify
      the pushed packs so they may be a lot larger than C Git packs.

  org.eclipse.jgit.pgm/

    * Assorted set of command line utilities. Mostly for ad-hoc testing
      of jgit log, glog, fetch etc.

== Missing Features ==

There are a lot of missing features. You need the real Git for this.
For some operations it may just be the preferred solution also. There
is not just a command line; there is also, e.g., git-gui, which makes
committing partial files simple.

- Merging.

- Repacking.

- Generate a GIT format patch.

- Apply a GIT format patch.

- Documentation. :-)

- gitattributes support
  In particular CRLF conversion is not implemented. Files are treated
  as byte sequences.

- submodule support
  Submodules are not supported or even recognized.

== Support ==

Post questions, comments or patches to the git@vger.kernel.org mailing
list.

== Contributing ==

See SUBMITTING_PATCHES in this directory. However, feedback and bug
reports are also contributions.

== About GIT ==

More information about GIT, its repository format, and the canonical
C based implementation can be obtained from the GIT websites:

  http://git.or.cz/
  http://www.kernel.org/pub/software/scm/git/
  http://www.kernel.org/pub/software/scm/git/docs/
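
For the CRLF caveat listed under WARNINGS / CAVEATS, one way to spot files that were saved with Windows line endings is sketched below. It uses only the standard find and grep utilities. The function name `find_crlf_files` is hypothetical, and the check is an assumption about how one might audit a working tree, not something jgit provides. It also reports binary files that happen to contain a carriage return byte.

```shell
#!/bin/sh
# find_crlf_files DIR: print every regular file under DIR that contains a
# carriage return byte, skipping anything inside a .git directory.
find_crlf_files() {
  cr=$(printf '\r')   # a literal CR character to search for
  find "$1" -name .git -prune -o -type f -exec grep -l "$cr" {} +
}
```

Running `find_crlf_files .` from the top of a working tree lists the candidates to re-save with Unix (LF) line endings before they are committed through jgit.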