PackWriter offers the ability to write out the pack file and its various
index files, except for the newly introduced file-based reverse index.
Now that PackReverseIndexWriter can write reverse index files,
PackWriter#writeReverseIndex will write one for a pack if the
corresponding config flag PackConfig#writeReverseIndex is on.
Change-Id: Ib75dd2bbfb9ee9366d5aacb46700d8cf8af4823a
Signed-off-by: Anna Papitto <annapapitto@google.com>
With completely independent branches, there is no merge base. In this
case, the list of commits must include the root commit of the branch to
be rebased.
Bug: 581832
Change-Id: I0f5bdf179d5b07ff09f1a274d61c7a0b1c0011c6
Signed-off-by: Thomas Wolf <twolf@apache.org>
Handle the case of the commit to be picked not having any parents.
Since JGit implements cherry-pick as a 3-way-merge between the commit
to be picked and the target commit, using the parent of the picked
commit as merge base, this is super simple: just don't set a base tree.
The merger will not find any merge base and will supply an empty tree
iterator for the base.
Bug: 581832
Change-Id: I88985f1b1723db5b35ce58bf228bc48d23d6fca3
Signed-off-by: Thomas Wolf <twolf@apache.org>
JGit's AddCommand always renormalizes tracked files. C git does so only
on git add --renormalize. Especially for git add . and the JGit
equivalent git.add().addFilepattern(".").call() this can make a big
difference if there are many files, or large files.
Add a "renormalize" option to AddCommand. To maintain compatibility with
existing uses, this option is "true" by default, and the behavior of
AddCommand is as it has always been in JGit.
If set to "false", use an IndexDiffFilter (in addition to a path filter,
if any). This skips any unchanged files (that are not racily clean) from
content checks. Note that changes in CRLF settings or in filters will be
ignored for such files if renormalize == false.
Add the "--renormalize" option to the Add command in the JGit command
line program. For the command-line program, the default is as in C git:
renormalize is off by default and enabled only if the option is given.
Note that --renormalize implies --update in the command line program, as
in C git. In AddCommand, the two settings are independent.
Additionally, avoid opening input streams unnecessarily in
WorkingTreeIterator.getEntryContentLength() and fix some bogus
indentation.
Add a simple test that adds 1000 files of 10kB in 10 directories twice
and that fails if the second invocation (without any changes) with
renormalize=false is not significantly faster.
Locally, I observe for that second invocation
* git.add().addFilepattern(".").call() ~660ms
* git.add().addFilepattern(".").setRenormalize(false).call() ~16ms
Bug: 494323
Change-Id: I30f9d518563fa55d7058a48c27c425f3b60aeb4c
Signed-off-by: Thomas Wolf <twolf@apache.org>
* stable-6.5:
[bazel] Move ToolTestCase to src folder (6.2)
GcConcurrentTest: @Ignore flaky testInterruptGc
Fix CommitTemplateConfigTest
Fix after_open config and Snapshotting RefDir tests to work with bazel
[bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
Demote severity of some error prone bug patterns to warnings
Parse pull.rebase=preserve as alias for pull.rebase=merges
UploadPack: Fix NPE when traversing a tag chain
Change-Id: I16e8553d187a8ef541f578291f47fc39c3da4ac0
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
bucket sort with insertion sort, which has an average runtime of
O(n log n) and worst case runtime of O(n^2); and memory usage of
3*size(int)*n because it maintains 3 int arrays, even after sorting is
completed. The computation must be performed every time that the reverse
index object is created in memory.
In contrast, Cgit persists a pack reverse index file to avoid
recomputing the reverse index ordering every time that it is needed.
Instead they write a file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into an in-memory reverse index each
time it is needed.
Introduce these reverse index files to JGit. PackReverseIndexWriter
writes out a reverse index file to be read later when needed. Subclass
PackReverseIndexWriterV1 writes a file with the official version 1
format.
To avoid temporarily allocating an Integer collection while sorting and
writing out the contents, using memory 4*size(Integer)*n, use an
IntList and its #sort method, which uses quicksort.
Change-Id: I6437745777a16f723e2f1cfcce4e0d94e599dcee
Signed-off-by: Anna Papitto <annapapitto@google.com>
IntList is a class for working with lists of primitive ints without
boxing them into Integers. For writing the reverse index file format,
sorting ints will be needed but IntList doesn't provide a sorting
method yet.
Add the #sort method to sort an IntList by an IntComparator, using
quicksort, which has a average runtime of O(n log n) and sorts in-place
by using O(log n) stack frames for recursive calls.
Change-Id: Id69a687c8a16d46b13b28783b194a880f3f4c437
Signed-off-by: Anna Papitto <annapapitto@google.com>
* stable-6.4:
[bazel] Move ToolTestCase to src folder (6.2)
GcConcurrentTest: @Ignore flaky testInterruptGc
Fix CommitTemplateConfigTest
Fix after_open config and Snapshotting RefDir tests to work with bazel
[bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
Demote severity of some error prone bug patterns to warnings
UploadPack: Fix NPE when traversing a tag chain
Change-Id: I6d20fea3a417e4361b61e81756253343668eb5de
* stable-6.3:
[bazel] Move ToolTestCase to src folder (6.2)
GcConcurrentTest: @Ignore flaky testInterruptGc
Fix CommitTemplateConfigTest
Fix after_open config and Snapshotting RefDir tests to work with bazel
[bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
Demote severity of some error prone bug patterns to warnings
UploadPack: Fix NPE when traversing a tag chain
Change-Id: I463f8528e623316add204848d551c44d44d04858
* stable-6.2:
[bazel] Move ToolTestCase to src folder (6.2)
GcConcurrentTest: @Ignore flaky testInterruptGc
Fix CommitTemplateConfigTest
Fix after_open config and Snapshotting RefDir tests to work with bazel
[bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
Demote severity of some error prone bug patterns to warnings
UploadPack: Fix NPE when traversing a tag chain
Change-Id: I736c7d0ed9c6e9718fa98976c3dc5a25ab8cda85
* stable-6.1:
GcConcurrentTest: @Ignore flaky testInterruptGc
Fix CommitTemplateConfigTest
Fix after_open config and Snapshotting RefDir tests to work with bazel
[bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
Demote severity of some error prone bug patterns to warnings
UploadPack: Fix NPE when traversing a tag chain
Change-Id: I9863cbce95d845efc891724898954b0b2f8dbf7b
During my development of Id7721cc5b7ea650e77c2db47042715487983cae6, I
have found this test to be flaky when run by CI. As a speculative fix,
mark this test as @Ignore so it won't be run.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Change-Id: Idfe04d7f1fb72a772d4c8d249ca86a9c2eec0b1a
The changes I1db6fcf414b and I634b92877f added tests which were failing
with errors [1] and [2] with "bazel test //...". This was not caught
because we don't have CI running with bazel. Fix bazel build file so
that these errors are no longer thrown when run with bazel.
[1] error: cannot find symbol FileRepositoryBuilderTest
[2] error: cannot find symbol RefDirectoryTest
Bug: 581816
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
Change-Id: I1e57111662825f5f14f373bc4f8d24cce1fec0b8
* stable-6.0:
[bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
Demote severity of some error prone bug patterns to warnings
UploadPack: Fix NPE when traversing a tag chain
Change-Id: I5e13d5b5414aef97e518898166bfa166c692e60f
Move this test to another class and skip it when running tests with
bazel since the bazel test runner does not allow to create files in the
home directory.
FS#userHome retrieves the home directory on the first call and caches it
for subsequent calls to avoid overhead in case path translation is
required (currently on cygwin). This prevents that the test can mock the
home directory using MockSystemReader like SshTestHarness does.
Change-Id: I6a22f37f4a19eb4b4935509eae508a23e56db7aa
This ensures backwards compatibility to the old config value which was
removed in git 2.34 which JGit followed in Ic07ff954e2.
Change-Id: I2b4e27fd71898b6e0e227e406c40682bd9786cd4
Always parse RevTags including their body before getting their object
to ensure that non-cached objects are handled correctly when traversing
a tag chain. An NPE in UploadPack#addTagChain will occur on a depth=1
fetch of a branch containing a tag chain and the ref to one of the
middle tags in the chain is deleted.
Change-Id: Ifd8fe868869070b365df926fec5dcd8e64d4f521
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
A commit A can reach a commit B only if the generation number of A is
strictly larger than the generation number of B. This condition allows
significantly short-circuiting commit-graph walks.
On a copy of the Linux repository where HEAD is contained in v6.3-rc4
but no earlier tag, the command 'git tag --contains HEAD' of
ListTagCommand#call() had the following peformance improvement:
(excluded the startup time of the repo)
Before: 2649ms (core.commitgraph=true)
11909ms (core.commitgraph=false)
After: 91ms (core.commitgraph=true)
11934ms (core.commitgraph=false)
Bug: 574368
Change-Id: Ia2efaa4e9ae598266f72e70eb7e3b27655cbf85b
Signed-off-by: kylezhao <kylezhao@tencent.com>
* stable-6.5:
Ensure parsed RevCommitCG has derived data from commit-graph
Downgrade maven-site-plugin to 3.12.1
Use wagon-ssh-external to deploy Maven site
Change-Id: Ide721fb088fa04f6276ac495968a45e732f6e139
If a RevCommitCG was newly created and called #parseCanonical(RevWalk,
byte[]) method immediately, its flag was marked as PARSED, but no
derived data was obtained from the commit-graph. This is different from
what we expected.
Change-Id: I5d417efa3c42d211f19e6acf255f761e84d84450
Signed-off-by: kylezhao <kylezhao@tencent.com>
Running
bazel test //org.eclipse.jgit.test:org_eclipse_jgit_patch_PatchApplierTest
results in a DefaultCharset error on my machine, which can be avoided
by explicitly specifying a charset when calling getBytes on a string. In
these tests, the charset used doesn't really matter, so go with UTF_8.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Change-Id: Id7721cc5b7ea650e77c2db47042715487983cae6
During my development of Id7721cc5b7ea650e77c2db47042715487983cae6, I
have found this test to be flaky when run by CI. As a speculative fix,
mark this test as @Ignore so it won't be run.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Change-Id: Idfe04d7f1fb72a772d4c8d249ca86a9c2eec0b1a
Introduce a SnapshottingRefDirectory class which allows users to get
a snapshot of the ref database and use it in a request scope (for
example a Gerrit query) instead of having to re-read packed-refs
several times in a request.
This can potentially be further improved to avoid scanning/reading a
loose ref several times in a request. This would especially help
repeated lookups of a packed ref, where we check for the existence of
a loose ref each time.
Change-Id: I634b92877f819f8bf36a3b9586bbc1815108189a
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Since the first attempt to read a ref is not expected to trigger
a RefsChangedEvent, update the test to ensure 'lastNotifiedModCnt'
is not 0 before we start the actual work. The test has been passing
luckily because createBareRepository in setUp() happens to bump
'lastNotifiedModCnt'.
Change-Id: Ibd981f677920e8c3b965aa742fe669c42b8c1c93
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
By default, Git will report, to the server, commits reachable from all local refs to find common commits in an attempt to reduce the size of the to-be-received packfile. If specified with negotiation tip, Git will only report commits reachable from the given tips. This is useful to speed up fetches when the user knows which local ref is likely to have commits in common with the upstream ref being fetched.
When negotation-tip is on, use the wanted refs instead of all refs as source of the "have" list to send.
This is controlled by the `fetch.usenegotationtip` flag, false by default. This works only for programmatic fetches and there is no support for it yet in the CLI.
Change-Id: I19f8fe48889bfe0ece7cdf78019b678ede5c6a32
Support the new option index.skipHash which was introduced in git 2.40
[1]. If it is set to true skip computing the git index checksum. This
accelerates Git commands that manipulate the index, such as git add, git
commit, or git status. Instead of storing the checksum, write a trailing
set of bytes with value zero, indicating that the computation was
skipped.
Accept a skipped checksum consisting of 20 null bytes when reading the
index since the option could have been set to true at the time when the
index was written.
[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-indexskipHash
Bug: 581723
Change-Id: I28ebe44c5ca1cbcb882438665d686452a0c111b2
Some test data introduced in [1] includes code with Apache
license. This triggers warnings in license analyzers.
Remove the licensed files and the test case as a quick relief. We
should restore the test with appropiate data in a follow-up change.
[1] https://git.eclipse.org/r/c/jgit/jgit/+/197724
Change-Id: I42670dc7d994f77d2c7f2c2156bcf1e112374022
1. For general errors, throw IOException instead of wrapping them with
PatchApplyException. The wrapping was moved (back) to ApplyCommand.
2. For file specific errors, log the errors as part of
PatchApplier::Result.
3. Change applyPatch() to receive the parsed Patch object, so the caller
can decide how to handle parsing errors.
Background: this utility class was extracted from ApplyCommand on V6.4.0.
During the extraction, we left the exception wrapping by
PatchApplyException intact. This attitude made it harder for the callers to
distinguish between the actual error causes.
Change-Id: Ib0f2b5e97a13df2339d8b65f2fea1c819c161ac3
When commit-graph file already exists in the repository, a newly
created FileCommitGraph didn't scan CommitGraph until the file was
modified, resulting in wrong result.
Change-Id: Ic85676f2d3b6a88f3ae28d4065729926b6fb2f23
Signed-off-by: kylezhao <kylezhao@tencent.com>
As of today, we don't have test coverage for RefDirectory when
core.trustPackedRefsStat config is set to after_open. Thus create new
test classes which set core.trustPackedRefsStat config to after_open in
setup and extend RefDirectoryTest and FileRepositoryBuilderTest
respectively.
Change-Id: I1db6fcf414bc488106ad4c85fb934480f299c995
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
The native git option to preserve merge commits during rebase
has been changed from pull.rebase=preserve to pull.rebase=merges.
This changeset in jgit makes the same config change. The old "preserve"
option is no longer recognized and is replaced by new option called
"merges".
This makes jgit's rebase configuration compatible with native git
versions 2.34 and newer where the old "preserve" option has been
removed.
Change-Id: Ic07ff954e258115e76465a1593ef3259f4c418a3
Display elapsed time per task if enabled via
ProgressMonitor#showDuration or if system property or environment
variable GIT_TRACE_PERFORMANCE is set to "true". If both the system
property and the environment variable are set the system property takes
precedence.
E.g. using jgit CLI:
$ GIT_TRACE_PERFORMANCE=true jgit clone https://foo.bar/foobar
Cloning into 'foobar'...
remote: Counting objects: 1 [0.002s]
remote: Finding sources: 100% (15531/15531) [0.006s]
Receiving objects: 100% (169737/169737) [13.045s]
Resolving deltas: 100% (67579/67579) [1.842s]
Change-Id: I4d624e7858b286aeddbe7d4e557589986d73659e
PackWriter callers tell the writer what do the want to include in the
pack and invoke #writePack(). Afterwards, they can invoke #writeIndex()
to write the corresponding pack index.
Mirror this for the object-size index, adding a #writeObjectSizeIndex()
method.
Change-Id: Ic319975c72c239cd6488303f7d4cced797e6fe00
* stable-6.4:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Change-Id: Ia2797b44a60342eb9df53f0b3d674cba92a512fc
* stable-6.3:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Change-Id: I5af34c92e423a651db53b4dc45ed844d5f39910d
* stable-6.2:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Change-Id: I5b6b10622b61fde3f0f10455a74ae159a0b69082
* stable-6.1:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Change-Id: I3ffe92566cc145053bb762f612dd96bc6d542c62
* stable-6.0:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Change-Id: Idea23e555c024557d7e39a86efe25f609400b962
* stable-5.13:
If tryLock fails to get the lock another gc has it
Fix GcConcurrentTest#testInterruptGc
Don't swallow IOException in GC.PidLock#lock
Check if FileLock is valid before using or releasing it
Change-Id: I708d0936fa86b028e4da4e7e21f332f8b48ad293
C git 2.11 supports setting the equivalent of RequestPolicy.ANY with
uploadpack.allowAnySHA1InWant[1]. Parse this into TransportConfig and
use it from UploadPack.
Add additional tests for [2] and this change.
We can execute "git clone --filter=blob:none --no-checkout" successfully
with config uploadPack.allowFilter is true. But when we checkout, the
git will fetch other missing objects required by the checkout(this is
why we need this config).
When both uploadPack.allowFilter and uploadPack.allowAnySHA1InWant are
true, jgit will support partial clone. If you are using an extremely
large monorepo, this feature can help. It allows users to work on an
incomplete repo which reduces disk usage.
[1] f8edeaa05d
[2] change Id39771a6e42d8082099acde11249306828a053c0
Bug: 573390
Change-Id: I8fe75f03bf1fea7c11e0d67c8637bd05dd1f9b89
Signed-off-by: kylezhao <kylezhao@tencent.com>
Operations like "clone --filter=blob:limit=N" or the "object-info"
command need to read the size of the objects from the storage. An
index would provide those sizes at once rather than having to seek in
the packfile.
Introduce an interface for the Object-size index. This index returns
the inflated size of an object. Not all objects could be indexed (to
limit memory usage).
This implementation indexes only blobs (no trees, nor
commits) *above* certain size threshold (configurable). Lower
threshold adds more objects to the index, consumes more memory and
provides better performance. 0 means "all blobs" and -1 "disabled".
If we don't index everything, for the filter use case is more
efficient to index the biggest objects first: the set is small and
most objects are filtered by NOT being in the index. For the
object-size, the more objects in the index the better, regardless
their size. All together, it is more helpful to index above threshold.
Change-Id: I9ed608ac240677e199b90ca40d420bcad9231489
The object size index stores positions of objects in the main
index (when ordered by sha1). These positions are per-pack and usually
a pack has <16 million objects (there are exceptions but rather
rare). It could save some memory storing these positions in three bytes
instead of four. Note that these positions are sorted and always positive.
Implement a wrapper around a byte[] to access and search "ints" while
they are stored as unsigned 3 bytes.
Change-Id: Iaa26ce8e2272e706e35fe4cdb648fb6ca7591972
The primary index returns the offset in the pack for an
objectId. Internally it keeps the object-ids in lexicographical order,
but doesn't expose an API to find the position of an object-id in that
list. This is needed for the object-size index, that we want to store
as "position-in-idx, size".
Add a #findPosition(object-id) method to the PackIndex interface to
know where an object-id sits in the ordered list of ids in the pack.
Note that this index position is over the list of ordered object-ids,
while reverse-index position is over the list of objects in packed
order.
Change-Id: I89fa146599e347a26d3012d3477d7f5bbbda7ba4
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.
The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The pack is stored as extra stream in the GC
pack. DfsPackFile mimicks how other indices are loaded storing the
reference in DFS cache.
Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
The 'size' packet line is an argument, so it
must be preceeded by a 0001 delimiter. See also git's
t5701-git-serve.sh test,
https://github.com/git/git/blob/8b8d9a2/t/t5701-git-serve.sh#L329
Without this fix, the server will choke on the delimiter line, saying
PackProtocolException: unexpected <empty string>
To test, I ran Gerrit locally with this fix
$ curl -X POST -H 'git-protocol: version=2' -H 'content-type:
application/x-git-upload-pack-request' -H 'accept:
application/x-git-upload-pack-result' --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
http://localhost:8080/git.git/git-upload-pack
=>
0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000
The same command completes identically on Gitlab (which supports the
object-info command)
$ curl -X POST -H 'git-protocol: version=2' -H 'content-type:
application/x-git-upload-pack-request' -H 'accept:
application/x-git-upload-pack-result' --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
https://gitlab.com/gitlab-org/git.git/git-upload-pack
=>
0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000
In this case, the blob is for the COPYING file in the Git source tree,
which is 26859 bytes long.
Change-Id: Ief4ce1eb9303a3b2479547d7950ef01c7c28f472
This change only affects inCore repositories.
Before this change, any file that wasn't part of the patch
wasn't read, and therefore wasn't part of the output tree.
Change-Id: I246ef957088f17aaf367143f7a0b3af0f8264ffb
Bug: Google b/267270348
* stable-6.4:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Id0ebfbd85eb815716383b9495eb7dd1f54cf4d74
* stable-6.3:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Iefcf5d832bd0087c1027876f2200689e1150abce
* stable-6.2:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: I2ff386d9a096277360e6c7bd5535b49984620fb3
* stable-6.1:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Iff2fba026b49463016015b2fae1a42cf76ee2dbb
* stable-6.0:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
* stable-5.13:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.
Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)
Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.
It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.
Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.
A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.
Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.
Sample use-case:
refs/heads/main
^
|
commit1 <-- commit2 <- annotated-tag1 <- tag1
^
|
commit0
When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.
The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.
Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.
Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G
Before this change:
GC time: 5h
clone --bare time: 7 mins
After this change:
GC time: 20 mins
clone --bare time: 3 mins
Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.
Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
RevWalk#createCommit() will inspect the commit-graph file to find the
specified object's graph position and then return a new RevCommitCG
instance.
RevCommitGC is a RevCommit with an additional "pointer" (the position)
to the commit-graph, so it can load the headers and metadata from there
instead of the pack. This saves IO access in walks where the body is not
needed (i.e. #isRetainBody is false and #parseBody is not invoked).
RevWalk uses automatically the commit-graph if available, no action
needed from callers. The commit-graph is fetched on first access from
the reader (that internally can keep it loaded and reuse it between
walks).
The startup cost of reading the entire commit graph is small. After
testing, reading a commit-graph with 1 million commits takes less than
50ms. If we use RepositoryCache, it will not be initialized util the
commit-graph is rewritten.
Bug: 574368
Change-Id: I90d0f64af24f3acc3eae6da984eae302d338f5ee
Signed-off-by: kylezhao <kylezhao@tencent.com>
Add a couple tests that confirm what the docs say about isModified() and
equals(MISSING_FILE) behavior.
Change-Id: I6093040ba3594934c3270331405a44b2634b97c5
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
In shallow repos, GC writes to the commit-graph that shallow commits
do not have parents. This won't be true after a "git fetch --unshallow"
(and before another GC).
Do not write the commit-graph from shallow clones of a repo. The
commit-graph must have the real metadata of commits and that is not
available in a shallow view of the repo.
Change-Id: Ic9f2358ddaa607c74f4dbf289c9bf2a2f0af9ce0
Signed-off-by: kylezhao <kylezhao@tencent.com>
A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.
Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.
Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.
They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.
Searching by prefix returns all keys if the prefix is an empty string.
Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac
FileRepository's ObjectReader#getCommitGraph will return commit-graph
when it exists and core.commitGraph is true.
DfsRepository is not supported currently.
Change-Id: I992d43d104cf542797e6949470e95e56de025107
Signed-off-by: kylezhao <kylezhao@tencent.com>
If the last line came from the patch, use the patch to determine whether
or not there should be a trailing newline. Otherwise use the old text.
Add test cases for
- no newline at end, last line not in patch hunk
- no newline at end, last line in patch hunk
- patch removing the last newline
- patch adding a newline at the end of file not having one
all for core.autocrlf false, true, and input.
Add a test case where the "no newline" indicator line is not the last
line of the last hunk. This can happen if the patch ends with removals
at the file end.
Bug: 581234
Change-Id: I09d079b51479b89400ad300d0662c1dcb50deab6
Also-by: Yuriy Mitrofanov <a2terminator@mail.ru>
Signed-off-by: Thomas Wolf <twolf@apache.org>
This change makes JGit can read .git/objects/info/commit-graph file
and then get CommitGraph.
Loading a new commit-graph into memory requires additional time. After
testing, loading a copy of the Linux's commit-graph(1039139 commits)
is under 50ms.
Bug: 574368
Change-Id: Iadfdd6ed437945d3cdfdbe988cf541198140a8bf
Signed-off-by: kylezhao <kylezhao@tencent.com>
Some lines were too long, unnecessary fully qualified class names,
and an assertEquals(actual, expected) when it should have been
assertEquals(expected, actual).
Change-Id: I3b3c46c963afe2fb82a79c1e93970e73778877e5
Signed-off-by: Thomas Wolf <twolf@apache.org>
IO#readFully is often called with the intent to fill the destination
array from beginning to end. The redundant arguments for where to start
and stop filling are opportunities for bugs if specified incorrectly or
if not changed to match a changed array length.
Provide a overloaded method for filling the full destination array.
Change-Id: I964f18f4a061189cce1ca00ff0258669277ff499
Signed-off-by: Anna Papitto <annapapitto@google.com>