StartGenerator is responsible for propagating the RevWalk's
parent rewrite setting, but it currently only does so when a
non-default TreeFilter is set, when it should also do so if
the default TreeFilter is used with a non-default RevFilter.
Add a new condition in StartGenerator to enable parent rewrite when a
non-default RevFilter is set.
TreeRevFilter relied on the old buggy functionality and has
been modified to explicitly refrain from rewriting parents.
Change-Id: I4e4ff67fb279edbcc3461496b132cea774fb742f
We want to know what objects had bitmaps in the walk of the
request. We can check their position in the history and evaluate
our bitmap selection algorithm.
Use the listener interface of the BitmapWalker to get the objects
walked with bitmaps and store them in the statistics.
Change-Id: Id15a904eb642d7f50d80ac77d1146db4fe4706eb
According to https://git-scm.com/docs/git-interpret-trailers, CGit
supports multiline trailers. Subsequent lines of such multiline
trailers have to start with whitespace.
We also rewrite the original parsing code to make it easier to work
with. The old code had pointers moving both backwards and forwards at
the same time. In the rewritten code we first find the start of the last
paragraph and then do all the parsing.
Since all the getters of FooterLine return String, I considered
rewriting the parsing code to operate on strings. However, the original
code seems to be written with the idea that the data is only lazily
copied in getters and no extra allocations should be performed during
the original parsing (e.g. during RevWalk). The changed code keeps to
this idea.
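The two-step approach (find the last paragraph, then parse it top to bottom) can be sketched roughly as follows. This is only an illustration operating on strings, not the byte-based FooterLine parser; the class and method names are hypothetical, and it simplifies by treating any text after the last blank line as the trailer paragraph.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not JGit's FooterLine parsing code. */
final class TrailerSketch {
    /** Returns "Key: value" entries from the last paragraph, joining continuation lines. */
    static List<String> parseTrailers(String message) {
        String trimmed = message.stripTrailing();
        // Step 1: find the start of the last paragraph (text after the last blank line).
        int start = trimmed.lastIndexOf("\n\n");
        String lastParagraph = start < 0 ? trimmed : trimmed.substring(start + 2);

        // Step 2: parse the paragraph in a single forward pass.
        List<String> trailers = new ArrayList<>();
        for (String line : lastParagraph.split("\n")) {
            boolean continuation = !line.isEmpty()
                    && Character.isWhitespace(line.charAt(0));
            if (continuation && !trailers.isEmpty()) {
                // A line starting with whitespace extends the previous trailer.
                int last = trailers.size() - 1;
                trailers.set(last, trailers.get(last) + " " + line.strip());
            } else if (line.indexOf(':') > 0) {
                trailers.add(line.strip());
            }
        }
        return trailers;
    }
}
```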
Bug: Google b/312440626
Change-Id: Ie1e3b17a4a5ab767b771c95f00c283ea6c300220
We want to know what objects had bitmaps in the walk of the
request. We can check their position in the history and evaluate our
bitmap selection algorithm.
Introduce a listener interface to the BitmapIndex to report which
getBitmap() calls returned a bitmap (or not) and a method to the
bitmap index to set the listener.
Change-Id: Iac8fcc1539ddd2dd450e8a1cf5a5b1089679c378
We can track bitmap queries that found a bitmap directly in the
BitmapIndex.
Remove the listener.
Change-Id: I5ad518a58b681bf327fee3ae5c5f6e4449d3da1f
When a tag with the same name as the branch exists, branch creation
should still work. We should detect that the branch already
exists, and allow force-creating it when the force option is used.
Bug: 582538
Change-Id: I3b350d03be8edcde10e97b2318343240ca896cb0
In this new interface, default methods are useful only to instantiate
noop instances. We would rather reuse the same noop instance and save
"default" for adding backward-compatible methods to existing interfaces.
Make the methods regular interface methods and provide a noop
instance.
Change-Id: Ie84ff17c8e9f16837245751739ee8c99463e76ee
During the walk, the commit can be either
1. already in the walk bitmap
2. unvisited so far with bitmap in the bitmap index
3. unvisited so far without bitmap in the bitmap index
Expose these three states in the interface. This makes the interface
easier to explain: it reports the commits found during the walk.
As it is all about commits, rename the methods to onCommit***.
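A minimal sketch of what a listener exposing these three states could look like. The interface, the method names and the use of plain strings as commit ids are illustrative assumptions, not necessarily the actual JGit API.

```java
/** Hypothetical listener reporting the state of each commit found during a walk. */
interface WalkListenerSketch {
    /** The commit was already in the walk bitmap. */
    void onCommitSeen(String commitId);

    /** The commit was unvisited so far and has a bitmap in the bitmap index. */
    void onCommitWithBitmap(String commitId);

    /** The commit was unvisited so far and has no bitmap in the bitmap index. */
    void onCommitWithoutBitmap(String commitId);
}

/** Example implementation that only counts how often each state was reported. */
final class CountingWalkListener implements WalkListenerSketch {
    int seen;
    int withBitmap;
    int withoutBitmap;

    @Override
    public void onCommitSeen(String commitId) {
        seen++;
    }

    @Override
    public void onCommitWithBitmap(String commitId) {
        withBitmap++;
    }

    @Override
    public void onCommitWithoutBitmap(String commitId) {
        withoutBitmap++;
    }
}
```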
Change-Id: I661f303eb22d3e735b0e439f16df7ace612376d9
The documentation for TemporaryBuffer::length says:
"The length is only accurate after {@link #close()} has been invoked".
However, we need to have the stream open while accessing the length.
This prevents patches on large files from being applied correctly, as
the result gets trimmed.
Bug: Google b/309500446
Change-Id: Ic1540f6d0044088f3b46f1fad5f6a28ec254b711
We want to know what objects had a bitmap in the walk, to see where
they sit in the commit history and evaluate our bitmap selection
algorithm.
Add a listener interface to the bitmap walker announcing the objects
walked and whether they had a bitmap.
Change-Id: I956fe2ad927a500710d2cbe78ecd4d26f178c266
Since it's part of the API, deprecate the wrong spelling and add the
correct one with the same value.
Change-Id: I0f6ea95a5e66c9e80142eb6d40eb7ec3a7aaf8e2
By first checking for null-ness and then for the number of strings to
compare, we can get rid of a redundant null check.
Change-Id: I0d9a088352c6c1ffea12bc2cded2c63e5293a8a7
Currently, for file-based repositories, JGit goes over all refs in the
repository for each `ref-prefix` listed in the `ls-refs` command of a
git protocol v2 request.
Native git uses a different approach, where all refs are read once and
then, for each ref, all `ref-prefix` filter values are checked in one
pass.
This change implements this approach in JGit, but only in the
`RefDirectory` backend. It makes `ref-prefix` filtering ~40% faster for
repositories with packed refs.
Different implementations were tested on a synthetic file repository
with 10k refs in `refs/heads/` and `290k` in `refs/changes`. Before
testing `git pack-refs` command was executed. All results are in
seconds.
Current Impl:               39.340 37.093 35.996
Nested for loops:           25.077 24.742 24.748
Nested streams:             24.827 24.890 27.525
Parallel stream + stream:   23.357 23.318 23.174
Nested parallel streams:    23.490 23.318 23.317
Stream + for loop:          23.147 23.210 23.126
Parallel stream + for loop: 23.317 23.423 22.847
The elapsed time was measured around the `getRefsByPrefix` call in
`UploadPack.getFilteredRefs(Collection<String>)` (around lines 952 and
954). For testing, a modified version of
`UploadPackTest.testV2LsRefsRefPrefix()` was used. The modifications
included:
* shadowing protected `repo` variable with `FileRepository` pointing
to the synthetic repo with 300k refs described above,
* mimicking the git client clone request by adding `ref-prefix HEAD`,
`ref-prefix refs/heads/` and `ref-prefix refs/tags/`
Based on the above results, the implementation with parallel stream and
stream was selected.
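The "parallel stream + stream" shape can be sketched roughly as below: refs are traversed once, and each ref is checked against all prefixes. This is a simplified illustration on ref names as plain strings; the class and method names are hypothetical and it is not the code in the `RefDirectory` backend.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative sketch only; works on ref names instead of Ref objects. */
final class RefPrefixFilterSketch {
    /** Keeps every ref name that starts with at least one of the given prefixes. */
    static List<String> filterByPrefixes(Collection<String> refNames,
            Collection<String> prefixes) {
        // Outer parallel stream: each ref is visited exactly once.
        // Inner sequential stream: all prefixes are checked for that ref.
        return refNames.parallelStream()
                .filter(name -> prefixes.stream().anyMatch(name::startsWith))
                .collect(Collectors.toList());
    }
}
```

For a clone-like request, the prefixes would be `HEAD`, `refs/heads/` and `refs/tags/`, so refs under `refs/changes/` are dropped in the same pass.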
Bug: 578550
Signed-off-by: Dariusz Luksza <dariusz.luksza@gmail.com>
Change-Id: I6416846c074b611ff6ec9d351dbafcfbcaf68e66
Change [1] reduced the scope of the "writing commit graph" monitoring
task. This left some monitor#update() calls out of any task. When out
of a task, the #update call is a noop.
Delete these update calls as they are noops and misleading.
The affected chunks are usually small and quick to write, so probably
they don't need progress monitoring.
[1] https://git.eclipse.org/r/c/jgit/jgit/+/205339
Change-Id: I74d94e6e44e58816937dc8a84e5a10b340e54e0b
* changes:
Use try-with-resource to ensure UploadPack is closed
Fix hiding field warning
Fix warning for empty code blocks
Fix boxing warnings
errorprone: remove unnecessary parentheses
Update mockito to 5.7.0 and bytebuddy to 1.14.9
Enable Maven reproducible builds
Upgrade bazlets to the latest revision
- configure Maven to run build reproducibly [1]
- use UTC timestamp of checked out commit as build timestamp
- add git-describe, git-commit-id, git-tags,
git-remote-origin-url to MANIFEST.MF files
- configure cyclonedx-maven-plugin to also use UTC timestamp of
checked out commit
- for packaging build use tycho-buildtimestamp-jgit [2] to ensure
version uses the timestamp of the last commit
- SBOMs are not reproducible by design [3]: they should have a build
timestamp matching the time when the build was executed and a serial
number which is a unique UUID per build run. Hence exclude them from
comparison [4].
- Use gmavenplus-plugin to format build timestamps. Maven expects
build timestamp in ISO-8601 format, to replace the qualifier in
versions the timestamp format must be compatible with rules for OSGi
version numbers. Didn't find a way to read the properties set by the
git-commit-id-maven-plugin from another plugin. Hence use JGit in a
groovy script to get the commit time of the current HEAD and provide
it in these two formats.
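As an illustration of providing one commit time in two formats, the following sketch uses plain java.time instead of the groovy script and JGit. The qualifier pattern `yyyyMMddHHmm` and all names are assumptions for this example; the pattern actually used by the build may differ.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative sketch only; the build itself uses a groovy script with JGit. */
final class BuildTimestampSketch {
    // Assumed qualifier pattern: digits only, so it is valid in an OSGi version.
    private static final DateTimeFormatter QUALIFIER = DateTimeFormatter
            .ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    /** Formats the commit time (seconds since the epoch) as ISO-8601 in UTC. */
    static String iso8601(long commitEpochSeconds) {
        return DateTimeFormatter.ISO_INSTANT
                .format(Instant.ofEpochSecond(commitEpochSeconds));
    }

    /** Formats the same commit time as a version qualifier in UTC. */
    static String osgiQualifier(long commitEpochSeconds) {
        return QUALIFIER.format(Instant.ofEpochSecond(commitEpochSeconds));
    }
}
```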
TODO: packaging build (features and p2 repository) is not yet binary
reproducible since that's not yet supported by Tycho [5], artefacts have
reproducible version numbers but file lastModified timestamps are not
yet reproducible.
Test plan for Maven build:
- build using
mvn clean install
- verify second build is reproducible:
mvn -T1 clean verify artifact:compare
verification seems not to be thread-safe, hence run it with a single
thread using option -T1
For packaging build (still fails due to non-reproducible file
timestamps):
- build using
mvn -f org.eclipse.jgit.packaging/pom.xml clean install
- verify second build is reproducible:
mvn -T1 -f org.eclipse.jgit.packaging/pom.xml clean verify artifact:compare
[1] https://maven.apache.org/guides/mini/guide-reproducible-builds.html
[2] https://wiki.eclipse.org/Tycho/Reproducible_Version_Qualifiers
[3] https://github.com/CycloneDX/cyclonedx-maven-plugin/issues/84
[4] https://maven.apache.org/plugins/maven-artifact-plugin/compare-mojo.html
[5] https://github.com/eclipse-tycho/tycho/issues/233
Change-Id: I0202f55a1b6ae0edd922cfef638beb39d2ce9417
This reverts commit 3937300f3e.
Reason for revert: This kills performance on the DFS side, which relies on loading the minimal amount of refs and reftables for quick prefix searches.
Reverting as a safe option to keep master performing well until we decide how to reintroduce this change.
Change-Id: I7b1a3f900d9c78ce95cf0972abb50b6becfe3bb1
Bloom filter computation can be an expensive process and right now it
is invisible to the user.
Report progress while calculating bloom filters.
Log of GC with bloom filter enabled:
Computing commit-graph path bloom filters: 100% (9551/9551)
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)
Change-Id: Ife65e63ac2c37d064d5f049a366cbb52c3ef6798
The same progress monitor is passed around as parameter and inside the
output stream. The functions use one to start tasks and another to
report progress, which is confusing. The stream needs the monitor to
check cancellations so we cannot remove it from there.
Make all code take the monitor from the stream.
Change-Id: Id3cb9c1cb0bd47318b46ef934a9d4037341e25a7
The ProgressMonitor task to track the calculation of generation
numbers is nested inside the task that follows the writing of all
lines in the commit-graph. ProgressMonitor doesn't support nested
tasks and this confuses the counting.
Move the start/end of the "writing commit graph" task to the
writeCommitData section, after calculating the generation
numbers. Make that task track by commits instead of by lines.
Moving the start/end of the progress task to the chunk-writing
functions is clearer and easier to extend.
Logging of GC before:
Writing out commit-graph in 3 passes: 51% ( 9807/19358)
Computing commit-graph generation numbers: 100% (9551/9551)
Logging of GC after:
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)
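The sequential task structure can be sketched with a small stand-in monitor. `MonitorSketch` below only mirrors the beginTask/update/endTask shape of ProgressMonitor; the recording implementation, the class names and the choice to throw on a nested task are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a progress monitor; not the JGit interface itself. */
interface MonitorSketch {
    void beginTask(String title, int totalWork);
    void update(int completed);
    void endTask();
}

/** Records finished tasks; rejects nesting to make the constraint visible. */
final class RecordingMonitor implements MonitorSketch {
    final List<String> events = new ArrayList<>();
    private String current;
    private int done;
    private int total;

    @Override
    public void beginTask(String title, int totalWork) {
        if (current != null) {
            throw new IllegalStateException("nested task: " + title);
        }
        current = title;
        done = 0;
        total = totalWork;
    }

    @Override
    public void update(int completed) {
        if (current == null) {
            return; // outside of a task, update does nothing
        }
        done += completed;
    }

    @Override
    public void endTask() {
        events.add(current + ": " + done + "/" + total);
        current = null;
    }
}

final class CommitGraphProgressSketch {
    /** Runs the two phases one after the other, each in its own task. */
    static void write(MonitorSketch pm, int commits) {
        pm.beginTask("Computing commit-graph generation numbers", commits);
        for (int i = 0; i < commits; i++) {
            pm.update(1);
        }
        pm.endTask();

        pm.beginTask("Writing out commit-graph", commits);
        for (int i = 0; i < commits; i++) {
            pm.update(1);
        }
        pm.endTask();
    }
}
```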
Change-Id: I87d69c06c9a3c7e75be12b6f0d1a63b5924e298a
Currently JGit goes over all refs in the repository for each
`ref-prefix`. This means that refs are read multiple
times, which leads to subpar performance.
Native git uses a different approach, where all refs are read once
and then, for each ref, all `ref-prefix` filter values are checked in
one pass.
This change implements this approach in JGit. It makes `ref-prefix`
filtering ~28% faster for a repository with fully packed refs
and ~5% faster when RefTable is used instead of refdir.
Different implementations were tested on a synthetic file repository
with 300k refs, using unpacked refs, fully packed refs and RefTable
(results are in seconds).
Unpacked refs:
Current Impl:               54.838 57.234 56.138
Nested for loops:           36.094 37.025 36.502
Nested streams:             36.154 35.989 37.262
Parallel stream + stream:   36.923 37.272 35.362
Nested parallel streams:    35.512 38.395 36.745
Stream + for loop:          34.950 36.164 37.191
Parallel stream + for loop: 37.695 35.511 35.378
Packed refs:
Current Impl:               39.713 39.954 38.653
Nested for loops:           29.891 29.753 29.377
Nested streams:             30.340 29.637 30.412
Parallel stream + stream:   28.653 28.254 29.138
Nested parallel streams:    29.942 28.850 31.030
Stream + for loop:          29.405 29.576 30.539
Parallel stream + for loop: 29.012 29.215 29.380
RefTable:
Current Impl:               0.273 0.294 0.330
Nested for loops:           0.252 0.169 0.215
Nested streams:             0.252 0.228 0.213
Parallel stream + stream:   0.233 0.259 0.247
Nested parallel streams:    0.416 0.309 0.340
Stream + for loop:          0.224 0.247 0.242
Parallel stream + for loop: 0.347 0.246 0.346
The elapsed time was measured around `getRefsByPrefix` call in
`UploadPack.getFilteredRefs(Collection<String>)` (around lines 952 and
954).
Based on the above results, the implementation with parallel stream and
stream was selected.
Bug: 578550
Change-Id: Iac3a3aacf897b87b3448c1d528cdac64ad312199
Signed-off-by: Dariusz Luksza <dariusz.luksza@gmail.com>
Protocol v2 introduced refs-in-wants and ls-remote with
prefixes. UploadPack already uses prefixes provided by the client
during a v2 ref advertisement (ls-refs). However, when the client
subsequently sends another request to fetch a previously advertised
ref (with want-ref lines), the server uses the whole set of advertised
refs to compute reachability.
In repos with many refs, this slows down the reachability checks, which
set up and walk through unnecessary refs. For Gerrit it can
also break valid requests because in Gerrit "all" means "recent" and
the wanted-ref could fall out of the "recent" range when reloading all
refs at fetch time.
Treat wanted-refs like a ref-prefix when calculating the advertised
refs on the v2 fetch command. Fewer refs mean a faster setup and a
shorter walk for the reachability checks. Note that wanted-refs filters only over
the refs visible to the user, so this doesn't give any extra
visibility to the caller.
If the request also contains "want <oid>" lines, we cannot use this
optimization. Those objects could be reachable from any visible
branch, not necessarily in the wanted-refs.
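The decision described above can be sketched as a small helper. The class, method and parameter names are hypothetical and refs and object ids are plain strings here; this is not the UploadPack code.

```java
import java.util.List;
import java.util.Optional;

/** Illustrative sketch only; not the UploadPack implementation. */
final class WantedRefsSketch {
    /**
     * Returns the prefixes to restrict the advertised refs to, or an empty
     * Optional when all visible refs have to be considered.
     */
    static Optional<List<String>> advertisedRefPrefixes(List<String> wantedRefs,
            List<String> wantIds) {
        // "want <oid>" objects may be reachable from any visible branch,
        // so the optimization only applies when no such lines are present.
        if (!wantIds.isEmpty() || wantedRefs.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(List.copyOf(wantedRefs));
    }
}
```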
Google-Bug: b/122888978
Change-Id: I2a4ae171d4fc5d4cb30b020cb073ad23dd5a66c4
With the useNegotiationTip flag (introduced in change 738dacb), the client sends to the server only the tips of the wanted refs for the negotiation. Some wanted refs may not exist in the client (yet) and our implementation ignores them. So when only non-existing refs are wanted, jgit doesn't send any tips and the server understands it is a full clone.
In useNegotiationTip, send ALL_REFS if any of the wanted refs does not exist locally.
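A rough sketch of this fallback, with refs represented as plain strings. The names below are hypothetical and the real client code works on Ref objects and object ids rather than ref names.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative sketch only; not the JGit fetch implementation. */
final class NegotiationTipSketch {
    /** Chooses the refs whose tips are sent to the server for negotiation. */
    static Set<String> negotiationTipRefs(Collection<String> wantedRefs,
            Set<String> localRefs) {
        // If any wanted ref is missing locally, fall back to all local refs
        // so the server does not see an empty set of tips.
        if (!localRefs.containsAll(wantedRefs)) {
            return localRefs;
        }
        return new LinkedHashSet<>(wantedRefs);
    }
}
```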
Change-Id: Ide04c5df785b9212abcd9d3cba194515e0af166f
Other getters (e.g. bitmap or commit graph) cover the case that the
pack doesn't have the corresponding extension.
Do the same here to detect this early and avoid an IOException in
openFile.
Change-Id: I29726b7ede0f795d35543453a3e7f92cee872a78
When the exception is thrown, we don't know if it is because the
stream didn't have data or had a wrong header.
Log the read bytes to differentiate these cases.
Change-Id: Ie7612eab39016f5ad7f1bfb2e07cab972dab796f
This was added in
- f103a1d5c6 "Add support for git config repack.packKeptObjects"
- f5f4bf0ad9 "Do not exclude objects in locked packs from bitmap
processing"
Change-Id: Id6af9fe549535c4e92de9080a41ef9f72a6646dd