* stable-6.4:
Fix getPackedRefs to not throw NoSuchFileException
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Change-Id: I36051c623fcd480aa80ed32b4e89f9bdd1b798e0
* stable-6.3:
Fix getPackedRefs to not throw NoSuchFileException
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Change-Id: I1073098fb06eabafdb3c5e7fcf44d55b86a1b152
* stable-6.2:
Fix getPackedRefs to not throw NoSuchFileException
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Change-Id: I765c7302ce84a6a9c28fdef29da2bfaa49477c6e
The object size index can have up to #(blobs-in-repo) entries, taking
a relevant amount of memory. Let operators configure the threshold size
to include objects in the size index.
The index will include objects with size *at or above* this
value (with -1 for none). This is more effective for the
filter-by-size case.
Lowering the threshold adds more objects to the index. This improves
performance at the cost of memory/storage space. For the object-size
case, more calls will use the index instead of reading IO. For the
filter-by-size case, lower threshold means better granularity (if
ObjectReader#isSmallerThan is implemented based only on the index).
Change-Id: I6ccd9334adbbc2abf95fde51dbbfc85b8230ade0
* stable-6.1:
Fix getPackedRefs to not throw NoSuchFileException
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Change-Id: Id32683d5f506e082d39af269803bccee0280cc27
* stable-6.0:
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Change-Id: I876a38c2de8b7d5eaacd00e36b85599f88173221
* stable-5.13:
Add pack options to preserve and prune old pack files
Allow to perform PackedBatchRefUpdate without locking loose refs
Document option "core.sha1Implementation" introduced in 59029aec
Change-Id: I423f410578f5bbe178832b80fef8998a5372182c
Since Files.newInputStream is from java.nio package, it throws
java.nio.file.NoSuchFileException. This was missed in the change
I00da88e. Without this change, getPackedRefs fails with
NoSuchFileException when there is no packed-refs file in a project.
Change-Id: I93c202ddb73a0a5979af8e4d09e45f5645664b45
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
Operations like "clone --filter=blob:limit=N" or the "object-info"
command need to read the size of the objects from the storage. An
index would provide those sizes at once rather than having to seek in
the packfile.
Introduce an interface for the Object-size index. This index returns
the inflated size of an object. Not all objects could be indexed (to
limit memory usage).
This implementation indexes only blobs (no trees, nor
commits) *above* certain size threshold (configurable). Lower
threshold adds more objects to the index, consumes more memory and
provides better performance. 0 means "all blobs" and -1 "disabled".
If we don't index everything, for the filter use case is more
efficient to index the biggest objects first: the set is small and
most objects are filtered by NOT being in the index. For the
object-size, the more objects in the index the better, regardless
their size. All together, it is more helpful to index above threshold.
Change-Id: I9ed608ac240677e199b90ca40d420bcad9231489
The object size index stores positions of objects in the main
index (when ordered by sha1). These positions are per-pack and usually
a pack has <16 million objects (there are exceptions but rather
rare). It could save some memory storing these positions in three bytes
instead of four. Note that these positions are sorted and always positive.
Implement a wrapper around a byte[] to access and search "ints" while
they are stored as unsigned 3 bytes.
Change-Id: Iaa26ce8e2272e706e35fe4cdb648fb6ca7591972
The primary index returns the offset in the pack for an
objectId. Internally it keeps the object-ids in lexicographical order,
but doesn't expose an API to find the position of an object-id in that
list. This is needed for the object-size index, that we want to store
as "position-in-idx, size".
Add a #findPosition(object-id) method to the PackIndex interface to
know where an object-id sits in the ordered list of ids in the pack.
Note that this index position is over the list of ordered object-ids,
while reverse-index position is over the list of objects in packed
order.
Change-Id: I89fa146599e347a26d3012d3477d7f5bbbda7ba4
Add the options
- pack.preserveOldPacks
- pack.prunePreserved
This allows to configure in git config if old packs should be preserved
during gc and pruned during the next gc.
The original implementation in 91132bb0 only allows to set these options
using the API.
Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.
The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The pack is stored as extra stream in the GC
pack. DfsPackFile mimicks how other indices are loaded storing the
reference in DFS cache.
Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
ObjectReader#getCommitGraph doesn't report errors loading the
commit graph. The caller should be aware of the situation and
ultimately decide what to do.
Add IOException to ObjectReader#getCommitGraph signature. RevWalk
defaults to an empty commit-graph on IO errors.
Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I38eeacff76c7f926b6dfb192d1e5916e40770024
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.
This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.
Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
(cherry picked from commit cb90ed0852)
The 'size' packet line is an argument, so it
must be preceeded by a 0001 delimiter. See also git's
t5701-git-serve.sh test,
https://github.com/git/git/blob/8b8d9a2/t/t5701-git-serve.sh#L329
Without this fix, the server will choke on the delimiter line, saying
PackProtocolException: unexpected <empty string>
To test, I ran Gerrit locally with this fix
$ curl -X POST -H 'git-protocol: version=2' -H 'content-type:
application/x-git-upload-pack-request' -H 'accept:
application/x-git-upload-pack-result' --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
http://localhost:8080/git.git/git-upload-pack
=>
0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000
The same command completes identically on Gitlab (which supports the
object-info command)
$ curl -X POST -H 'git-protocol: version=2' -H 'content-type:
application/x-git-upload-pack-request' -H 'accept:
application/x-git-upload-pack-result' --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
https://gitlab.com/gitlab-org/git.git/git-upload-pack
=>
0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000
In this case, the blob is for the COPYING file in the Git source tree,
which is 26859 bytes long.
Change-Id: Ief4ce1eb9303a3b2479547d7950ef01c7c28f472
This change only affects inCore repositories.
Before this change, any file that wasn't part of the patch
wasn't read, and therefore wasn't part of the output tree.
Change-Id: I246ef957088f17aaf367143f7a0b3af0f8264ffb
Bug: Google b/267270348
* master:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
[pgm] Fetch-CLI: add support for shallow
Speedup GC listing objects referenced from reflogs
Re-add servlet-api 4.0 to the target platform
Upgrade maven plugins
Cache trustFolderStat/trustPackedRefsStat value per-instance
Refresh 'objects' dir and retry if a loose object is not found
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: I370bc228481864912c3cd88d43e5a70517b1c186
* stable-6.4:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Id0ebfbd85eb815716383b9495eb7dd1f54cf4d74
* stable-6.3:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Iefcf5d832bd0087c1027876f2200689e1150abce
* stable-6.2:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: I2ff386d9a096277360e6c7bd5535b49984620fb3
* stable-6.1:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Iff2fba026b49463016015b2fae1a42cf76ee2dbb
* stable-6.0:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
* stable-5.13:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.
When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.
Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s
A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.
Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.
Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)
Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.
It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.
Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.
A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.
Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
The InMemoryRepository is used in tests (e.g. in gerrit tests) and it
can be useful to create a custom MemRefDatabase for some tests.
Change-Id: I6fbbbfe04400ea1edc988c8788c8eeb06ca8480a
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.
Sample use-case:
refs/heads/main
^
|
commit1 <-- commit2 <- annotated-tag1 <- tag1
^
|
commit0
When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.
The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.
Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.
Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G
Before this change:
GC time: 5h
clone --bare time: 7 mins
After this change:
GC time: 20 mins
clone --bare time: 3 mins
Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.
Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
We are adding commit-graph loading to the DFS stack and the stats object doesn't have fields to track that.
This change replicates the stats of the primary index for the commit-graph.
Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I4a657bed50083c4ae8bc9f059d4943d612ea2d49
This adds support for shallow cloning. The CloneCommand and the
FetchCommand now have the new options --depth, --shallow-since and
--shallow-exclude to tell the server that the client doesn't want to
download the complete history.
Bug: https://bugs.eclipse.org/bugs/show_bug.cgi?id=475615
Change-Id: I8f113bed25dd6df64f2f95de6a59d4675ab8a903
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.
Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.
This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.
Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4