Since Files.newInputStream is from java.nio package, it throws
java.nio.file.NoSuchFileException. This was missed in the change
I00da88e. Without this change, getPackedRefs fails with
NoSuchFileException when there is no packed-refs file in a project.
Change-Id: I93c202ddb73a0a5979af8e4d09e45f5645664b45
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
* stable-6.0:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
* stable-5.13:
Shortcut during git fetch for avoiding looping through all local refs
FetchCommand: fix fetchSubmodules to work on a Ref to a blob
Silence API warnings introduced by I466dcde6
Allow the exclusions of refs prefixes from bitmap
PackWriterBitmapPreparer: do not include annotated tags in bitmap
BatchingProgressMonitor: avoid int overflow when computing percentage
Speedup GC listing objects referenced from reflogs
FileSnapshotTest: Add more MISSING_FILE coverage
Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.
When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.
Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s
A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.
Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.
Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)
Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.
It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.
Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.
A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.
Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.
Sample use-case:
refs/heads/main
^
|
commit1 <-- commit2 <- annotated-tag1 <- tag1
^
|
commit0
When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.
The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.
Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.
Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G
Before this change:
GC time: 5h
clone --bare time: 7 mins
After this change:
GC time: 20 mins
clone --bare time: 3 mins
Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.
Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.
Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.
This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.
Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4
Instead of re-reading the config every time the methods using these
values were called, cache the config value at the time of instance
construction. Caching the values improves performance for each of the
method calls. These configs are set based on the filesystem storing the
repository and unlikely to change while an application is running.
Change-Id: I1cae26dad672dd28b766ac532a871671475652df
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
A new loose object may not be immediately visible on a NFS
client if it was created on another client. Refreshing the
'objects' dir and trying again can help work around the NFS
behavior.
Here's an E2E problem that this change can help fix. Consider
a Gerrit multi-primary setup with repositories based on NFS.
Add a new patch-set to an existing change and then immediately
fetch the new patch-set of that change. If the fetch is handled
by a Gerrit primary different that the one which created the
patch-set, then we sometimes run into a MissingObjectException
that causes the fetch to fail.
Bug: 581317
Change-Id: Iccc6676c68ef13a1e8b2ff52b3eeca790a89a13d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Add a couple tests that confirm what the docs say about isModified() and
equals(MISSING_FILE) behavior.
Change-Id: I6093040ba3594934c3270331405a44b2634b97c5
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Currently, we always read packed-refs file when 'trustFolderStat'
is false. Introduce a new config 'trustPackedRefsStat' which takes
precedence over 'trustFolderStat' when reading packed refs. Possible
values for this new config are:
* always: Trust packed-refs file attributes
* after_open: Same as 'always', but refresh the file attributes of
packed-refs before trusting it
* never: Always read the packed-refs file
* unset: Fallback to 'trustFolderStat' to determine if the file
attributes of packed-refs can be trusted
Folks whose repositories are on NFS and have traditionally been
setting 'trustFolderStat=false' can now get some performance improvement
with 'trustPackedRefsStat=after_open' as it refreshes the file
attributes of packed-refs (at least on some NFS clients) before
considering it.
For example, consider a repository on NFS with ~500k packed-refs. Here
are some stats which illustrate the improvement with this new config
when reading packed refs on NFS:
trustFolderStat=true trustPackedRefsStat=unset: 0.2ms
trustFolderStat=false trustPackedRefsStat=unset: 155ms
trustFolderStat=false trustPackedRefsStat=after_open: 1.5ms
Change-Id: I00da88e4cceebbcf3475be0fc0011ff65767c111
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Update documentation for core.trustFolderStat to highlight that it is
also used when reading the packed-refs file.
Change-Id: I3eac377c3a7f48493abc8ae6d0889ee70a05d24d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
introduced by
- addition of configurable SHA1 implementation in 5.13.2
- 3-digit @since 5.9.1 annotations on GitServlet methods
Change-Id: If19853fcc5e3677e5b18e8e3fbbcd2773378dffc
* stable-6.0:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Change-Id: I176419026c3f4fdd8ebd34c61468c1ec3482ff45
* stable-5.13:
[benchmarks] Remove profiler configuration
Add SHA1 benchmark
[benchmarks] Set version of maven-compiler-plugin to 3.8.1
Fix running JMH benchmarks
Add option to allow using JDK's SHA1 implementation
Ignore IllegalStateException if JVM is already shutting down
Change-Id: I40105336f0b9e593a8a2c242a9557f854c274fdc
Without this I sometimes hit the error:
$ java -jar target/benchmarks.jar
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to
find the resource: /META-INF/BenchmarkList
at org.openjdk.jmh.runner.AbstractResourceReader.getReaders(AbstractResourceReader.java:98)
at org.openjdk.jmh.runner.BenchmarkList.find(BenchmarkList.java:124)
at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:253)
at org.openjdk.jmh.runner.Runner.run(Runner.java:209)
at org.openjdk.jmh.Main.main(Main.java:71)
Change-Id: Iea9431d5f332f799d55a8a2d178c79a2ef0da22b
The change If6da9833 moved the computation of SHA1 from the JVM's
JCE to a pure Java implementation with collision detection.
The extra security for public sites comes with a cost of slower
SHA1 processing compared to the native implementation in the JDK.
When JGit is used internally and not exposed to any traffic from
external or untrusted users, the extra cost of the pure Java SHA1
implementation can be avoided, falling back to the previous
native MessageDigest implementation.
Bug: 580310
Change-Id: Ic24c0ba1cb0fb6282b8ca3025ffbffa84035565e
Extract private static method UploadPackServlet#statusCodeForThrowable
to a public static method in the UploadPackErrorHandler interface so
that implementers of this interface can reuse the default mapping.
Change-Id: Ie4a0a006b0148d5b828d610c55d19ce407aab055
The fix that fixed the propagation of error-codes:
8984e1f66 HTTP Smart: set correct HTTP status on error [1]
made some faulty assumptions.
"Wants not valid", can be an intermittent git error and the HTTP
response should be 200 and not 400 since the request isn't necessary
faulty.
[1] https://git.eclipse.org/r/c/jgit/jgit/+/192677
Bug: 579676
Change-Id: I461bc78ff6e450636811ece50d21c57a2a7f2ae3
Trying to register/unregister a shutdown hook when the JVM is already in
shutdown throws an IllegalStateException. Ignore this exception since we
can't do anything about it.
Bug: 580953
Change-Id: I8fc6fdd5585837c81ad0ebd6944430856556d90e
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.
This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.
Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
* stable-6.0:
UploadPack: don't prematurely terminate timer in case of error
Do not create reflog for remote tracking branches during clone
UploadPack: do not check reachability of visible SHA1s
Add missing package import javax.management to org.eclipse.jgit
Change-Id: I08734ee2c8f3296d908da6a29d53ed87c4b48eb2
* stable-5.13:
UploadPack: don't prematurely terminate timer in case of error
Do not create reflog for remote tracking branches during clone
UploadPack: do not check reachability of visible SHA1s
Add missing package import javax.management to org.eclipse.jgit
Change-Id: I6db0a4d74399fde892eeec62efd2946f97547a5d
In uploadWithExceptionPropagation don't prematurely terminate timer in
case of error to enable reporting it to the client. Expose a close
method so that callers can terminate it at the appropriate time.
If the timer is already terminated when trying to report it to the
client this failed with the error java.lang.IllegalStateException:
"Timer already terminated".
Bug: 579670
Change-Id: I95827442ccb0f9b1ede83630cf7c51cf619c399a
When using JGit on a non-bare repository, the CloneCommand
it previously created local reflogs for all branches including remote
tracking ones, causing the generation of a potentially large
number of files on the local filesystem.
The creation of the remote-tracking branches (refs/remotes/*) during
clone is not an issue for the local filesystem because all of them are
stored in a single packed-refs file. However, the creation of a large
number of ref logs on a local filesystem IS an issue because it
may not be tuned or initialised in term of inodes to contain a very
large number of files.
When a user (or a CI system) performs the CloneCommand against
a potentially large repository (e.g., millions of branches), it is
interested in working or validating a single branch or tag and is
unlikely to work with all the remote-tracking branches.
The eager creation of a reflogs for all the remote-tracking branches is
not just a performance issue but may also compromise the ability to
use JGit for cloning a large repository.
The behaviour implemented in this change is also consistent with the
optimisation done in the C code-base [1].
We differentiate between clone and fetch commands using --branch
<initialBranch> option, that is only available in clone command,
and is set as HEAD per default.
[1] 58f233ce1e
Bug: 579805
Change-Id: I58d0d36a8a4ce42e0f59b8bf063747c4b81bd859
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
When JGit needs to serve a Git client requesting SHA1s
during the want phase, it needs to make a full reachability
check from the advertised refs to the ones requested to
keep all objects in the correct scope of confidentiality
allowed by the avertised refs.
The check is also performed when the SHA1 corresponds to
one of the tips of the advertised refs which is a waste of
resources.
Example:
fetch> ref-prefix refs/heads/foo
fetch< 900505eb8ce8ced2a1757906da1b25c357b9654e refs/heads/foo
fetch< 0000
fetch> command=fetch
fetch> 0001
fetch> thin-pack
fetch> ofs-delta
fetch> want 900505eb8ce8ced2a1757906da1b25c357b9654e
The SHA1 in the want is the tip of refs/heads/foo and therefore
the full reachability check can be shortened and resolved more
quickly.
Change-Id: I49bd9e2464e0bd3bca2abf14c6e9df550d07383b
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Class org.eclipse.jgit.util.Monitoring uses JMX hence we need this
import otherwise OSGi applications can face ClassNotFoundException.
Bug: 577018
Change-Id: Ifd75337b87c7faec95d333b771bb0a2f3e46a418
* stable-6.0:
Prepare 5.13.2-SNAPSHOT builds
JGit v5.13.1.202206130422-r
AmazonS3: Add support for AWS API signature version 4
Change-Id: Ie9c38ab8033fe1283e8b444b6acd3f4298062bf3
* stable-5.13:
Prepare 5.13.2-SNAPSHOT builds
JGit v5.13.1.202206130422-r
AmazonS3: Add support for AWS API signature version 4
Change-Id: Ibd663a1d874d1aac274abc3dd44354fd99f64c39
Updating the AmazonS3 class to support AWS Signature version 4 because
version 2 is no longer supported in all AWS regions. The version can be
selected with the new 'aws.api.signature.version' property (defaults to
2 for backwards compatibility). When set to '4', the user must also
specify the AWS region via the 'region' property. The 'region' property
must match the region that the 'domain' property resolves to.
Bug: 579907
Change-Id: If289dbc6d0f57323cfeaac2624c4eb5028f78d13