Change Ide3445e652 introduced the `--pack-kept-objects` option to GC for
including the objects contained in the locked packfiles during the
repack phase.
Whilst this allowed to explicitly pass a command line argument to the
jgit gc program, it did not allow the option to be read from
configuration.
Allow the pack kept objects option to be configured exactly as C-Git
documents [1], by introducing a new `repack.packKeptObjects`
configuration.
`repack.packKeptObjects` defaults to `true`, when the
`pack.buildBitmaps` is `true` (which is the default case), `false`
otherwise.
[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-repackpackKeptObjects
Bug: 582292
Change-Id: Ia931667277410d71bc079d27c097a57094299840
Packfiles having an equivalent .keep file are associated with in-flight
pushes that haven't been completed, with potentially a set of git
objects not yet referenced by a ref.
If the Git client is not up-to-date, it may result in pushing a
packfile, generating a <packfile>.keep on the server, which
may also contain existing commits due to the lack of Git protocol
negotiation in the git-receive-pack.
The Git protocol negotiation is the phase where the client and the
server exchange the list of refs they have for trying to find a common
base and minimise the amount of objects to be transferred.
The repack phase in GC was previously skipping all objects that were
contained in all packfiles having a <packfile>.keep file associated
(aka "locked packfiles"), which did not take into consideration the
fact that excluding the existing commits would have resulted in the
generation of an invalid bitmap file.
The code for excluding the objects in the locked packfiles was written
well before the bitmap was introduced, hence could not consider a use
case that did not exist at that time.
However, when the bitmap was introduced, the exclusion of locked
packfiles was not changed, hence creating a potential problem.
The issue went unnoticed for many years because the bitmap generation
was disabled when JGit noticed any locked packfiles; however, the
bitmaps are enabled again since Id722e68d9f , and the the issue is now
visible and is impacting the GC repack phase.
Introduce the '--pack-kept-objects' option in GC for including the
objects contained in the locked packfiles during the repack phase,
which is not an issue because of the following:
- If there are any existing commits duplicated in the packfiles
they will be just considered once anyway because the repack doesn't
generate duplicates in the output packfile.
- If there are any new commits that do not have any ref pointing to
them, they will be automatically excluded from the output repacked
packfile.
The same identical solution is adopted in the C implementation of git
in repack.c.
Because the locked packfile is not pruned, any new commits not pointed
by any refs will remain in the repository and there will not be any
accidental pruning or object loss as it is today before this change.
As a side-effect of this change, it is now potentially possible to still
have duplicate BLOBs after GC when the keep packfile contained existing
objects. However, it is way better to keep the duplication until the
next GC phase rather than omitting existing objects from repacking and,
therefore generating an invalid bitmap and incorrect packfile.
Bug: 582292
Bug: 582455
Change-Id: Ide3445e652fcf256a7912f881cb898897c99b8f8
The packfiles with the .keep extensions are meant to prevent
a packfile from being processed or removed during GC.
From the point of view of the GC process then, the associated
packfile should be completely transparent:
- it should not included in the repacked file
- it should not pruned
- its objects should be left untouched, even if unreachable
- the GC process, including the bitmap generation should continue
as usual, as the the packfiles with .keep file did not exist
Add one explicit test for making sure that the management
of .keep file is also transparent to the generation of bitmaps,
which are still generated if a .keep file exists.
Bug: 582039
Change-Id: I14f6adc3f961c606fbc617e51ea6ed6e2ef8604f
Add an explicit flag to PackWriter for allowing the
GC.repack() phase to explicitly generate bitmaps only for the
heads packfile and not for the others.
Previously the bitmap generation was conditioned to the
presence of object ids exclusion from the PackWriter.
The introduction of the bitmap generation in the PackWriter
done in Icdb0cdd66 has accidentally made the .keep files not
completely transparent, because their presence have disabled
the generation of the bitmap index, even if the generation
of bitmaps is enabled.
This bug has been an accidental consequence of the intention
of the bitmap generator to avoid generating bitmaps for the
non-heads packfile, however the implementation done by Colby
decided to use the excludeInPacks variable (see [1]) which
is unfortunately also used for excluding the packfiles having
an associated .keep file (see [2]).
[1] https://git.eclipse.org/r/c/jgit/jgit/+/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617
[2] dafcb8f6db/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/GC.java (506)
Bug: 582039
Change-Id: Id722e68d9ff4ac24e73bf765ab11017586b6766e
When loosening the objects inside the packfiles to be pruned, make sure
that the packfile list is stable and prune all the files after the
loosening is done.
This prevents a series of exceptions previously thrown when loosening
the packfiles, due to the too early pruning of the packfiles that were
still in the pack list.
Bug: 581532
Change-Id: I776776e2e083f1fa749d53f965bf50f919823b4f
This reverts commit 9c33f7364d.
Reason for revert: This change was based on the false claim that the
packedrefs file lock is held while the CAS is being done, but it is
actually released before the CAS (the in memory lock is still held,
however that does not prevent external actors from updating the
packedrefs files and then another thread from subsequently re-reading it
and updating the in memory packedRefList). Although reverting this
change can cause the CAS to fail, it should not actually matter since
the failure would indicate that another thread has already updated the
in memory packedRefList to either the same version this thread was
trying to update it too, or to a more recent version. Either way,
failing the CAS is then appropriate and should not be problematic.
Although this change reverts the code in the RefDirectory class, it
keeps the "improvements" to the test so that it continues to pass
reliably. The reason for the quotes around the word "improvements" is
because I believe the test alteration actually dramatically changes the
intent of the test, and that the original intent of the test is
untestable with the GC and RefDirectory classes as is.
Bug: 582044
Change-Id: I3acee7527bb542996dcdfaddfb2bdb45ec444db5
Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
(cherry picked from commit c5617711a1)
During my development of Id7721cc5b7ea650e77c2db47042715487983cae6, I
have found this test to be flaky when run by CI. As a speculative fix,
mark this test as @Ignore so it won't be run.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Change-Id: Idfe04d7f1fb72a772d4c8d249ca86a9c2eec0b1a
Move this test to another class and skip it when running tests with
bazel since the bazel test runner does not allow to create files in the
home directory.
FS#userHome retrieves the home directory on the first call and caches it
for subsequent calls to avoid overhead in case path translation is
required (currently on cygwin). This prevents that the test can mock the
home directory using MockSystemReader like SshTestHarness does.
Change-Id: I6a22f37f4a19eb4b4935509eae508a23e56db7aa
The code is not violation free, so demote the severity of these bug
patterns to warning:
o DefaultCharset
o FutureReturnValueIgnored
o UnusedException
Change-Id: Ie886a4a247770a74953385f018498ac2515ed209
* stable-5.12:
Add missing since tag for SshBasicTestBase
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on
PackMismatchException
Remove blank in maven.config
Change-Id: Ibe6652374ab5971105e62b05279f218c8c130fee
* stable-5.11:
Add missing since tag for SshBasicTestBase
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on PackMismatchException
Remove blank in maven.config
Change-Id: I25bb99687b969f9915a7cbda8d1332bec778096a
* stable-5.10:
Add missing since tag for SshTestHarness#publicKey2
Silence API errors
Prevent infinite loop rescanning the pack list on
PackMismatchException
Remove blank in maven.config
Migrated "Prevent infinite loop rescanning the pack list on
PackMismatchException" to refactoring done in
https://git.eclipse.org/r/q/topic:restore-preserved-packs
Change-Id: I0fb77bb9b498d48d5da88a93486b99bf8121e3bd
* stable-5.9:
Prevent infinite loop rescanning the pack list on
PackMismatchException
Remove blank in maven.config
Change-Id: I15ff2d7716ecaceb0eb87b8823d85670f5db709d
We found, when analysing an incident where Gerrit's gc runner thread got
stuck, that we can end up in an infinite loop in
ObjectDirectory#openPackedObject which tries to rescan the pack
list and starts over trying to open a packed object in an unconfined
loop if it catches a PackMismatchException.
Here the relevant part of a thread dump we created while the gc runner
was stuck:
"WorkQueue-2[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@350812a3[Not
completed,
task = java.util.concurrent.Executors$RunnableAdapter@5425d7ee]]" #72
tid=0x00007f73cee1c800 nid=0x584
runnable [0x00007f7392d57000]
java.lang.Thread.State: RUNNABLE
at org.eclipse.jgit.internal.storage.file.WindowCache.removeAll(WindowCache.java:716)
at org.eclipse.jgit.internal.storage.file.WindowCache.purge(WindowCache.java:399)
at org.eclipse.jgit.internal.storage.file.PackFile.close(PackFile.java:296)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.reuseMap(ObjectDirectory.java:973)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.scanPacksImpl(ObjectDirectory.java:904)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.scanPacks(ObjectDirectory.java:895)
- locked <0x000000050a498f60> (a
java.util.concurrent.atomic.AtomicReference)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.searchPacksAgain(ObjectDirectory.java:794)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:465)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:417)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:408)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
at org.eclipse.jgit.lib.ObjectReader$1.open(ObjectReader.java:279)
at org.eclipse.jgit.revwalk.RevWalk$2.next(RevWalk.java:1031)
at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPack(PackWriter.java:1911)
at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:960)
at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:876)
at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:1168)
at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:852)
at org.eclipse.jgit.internal.storage.file.GC.doGc(GC.java:269)
at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:220)
at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:179)
at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:112)
at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:75)
at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:71)
at com.google.gerrit.server.git.GarbageCollectionRunner.run(GarbageCollectionRunner.java:76)
at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.18/Executors.java:515)
at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.18/FutureTask.java:305)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.18/ScheduledThreadPoolExecutor.java:305)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.18/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.18/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)
The code in ObjectDirectory#openPackedObject [1] apparently assumes that
this is caused by a transient problem which it can resume from by
retrying. We use `core.trustFolderStat = false` on this server since it
uses NFS. The incident we had showed that we can enter into an infinite
loop here if there is a permanent mismatch between a pack file and its
corresponding pack index. I am not yet sure how this can happen.
Break the infinite loop by limiting the number of attempts rescanning
the pack list to 5 retries. When we exceed this threshold set the type
of the PackMismatchException to permanent and rethrow it which breaks
the infinite loop.
Also apply the same limit in #getPackedObjectSize
and #selectObjectRepresentation where we use similar retry loops.
[1] 011c26ff36/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/ObjectDirectory.java (465)
Change-Id: I20fb63bcc1fdc3a03d39b963f06a90e6f0ba73dc
Support the new option index.skipHash which was introduced in git 2.40
[1]. If it is set to true skip computing the git index checksum. This
accelerates Git commands that manipulate the index, such as git add, git
commit, or git status. Instead of storing the checksum, write a trailing
set of bytes with value zero, indicating that the computation was
skipped.
Accept a skipped checksum consisting of 20 null bytes when reading the
index since the option could have been set to true at the time when the
index was written.
[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-indexskipHash
Bug: 581723
Change-Id: I28ebe44c5ca1cbcb882438665d686452a0c111b2
From File#lines javadoc: The returned stream from File Lines
encapsulates a Reader. If timely disposal of file system resources is
required, the try-with-resources construct should be used to ensure
that the stream's close method is
invoked after the stream operations are completed.
Wrap File.lines with try-with-resources.
Change-Id: I82c6faa3ef1083f6c7e964f96e9540b4db18eee8
Signed-off-by: Xing Huang <xingkhuang@google.com>
(cherry picked from commit 172a207945)
This broke the test GcConcurrentTest#testInterruptGc which expects
ClosedByInterruptException when the thread doing gc is interrupted.
Change-Id: I89e02fc37aceeccb04c20cfc5b71cb8fa21793d6
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.
Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.
If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.
Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.
Change-Id: I00b838dcbf4fb0d03863bf7a2cd86b743c6c6971
Add the options
- pack.preserveOldPacks
- pack.prunePreserved
This allows to configure in git config if old packs should be preserved
during gc and pruned during the next gc.
The original implementation in 91132bb0 only allows to set these options
using the API.
Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.
This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.
Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
(cherry picked from commit cb90ed0852)
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.
When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.
Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s
A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.
Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.
Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)
Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.
It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.
Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.
A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.
Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.
Sample use-case:
refs/heads/main
^
|
commit1 <-- commit2 <- annotated-tag1 <- tag1
^
|
commit0
When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.
The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.
Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.
Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G
Before this change:
GC time: 5h
clone --bare time: 7 mins
After this change:
GC time: 20 mins
clone --bare time: 3 mins
Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.
Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.
Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.
This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.
Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4
Add a couple tests that confirm what the docs say about isModified() and
equals(MISSING_FILE) behavior.
Change-Id: I6093040ba3594934c3270331405a44b2634b97c5
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
introduced by
- addition of configurable SHA1 implementation in 5.13.2
- 3-digit @since 5.9.1 annotations on GitServlet methods
Change-Id: If19853fcc5e3677e5b18e8e3fbbcd2773378dffc
Without this I sometimes hit the error:
$ java -jar target/benchmarks.jar
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to
find the resource: /META-INF/BenchmarkList
at org.openjdk.jmh.runner.AbstractResourceReader.getReaders(AbstractResourceReader.java:98)
at org.openjdk.jmh.runner.BenchmarkList.find(BenchmarkList.java:124)
at org.openjdk.jmh.runner.Runner.internalRun(Runner.java:253)
at org.openjdk.jmh.runner.Runner.run(Runner.java:209)
at org.openjdk.jmh.Main.main(Main.java:71)
Change-Id: Iea9431d5f332f799d55a8a2d178c79a2ef0da22b
The change If6da9833 moved the computation of SHA1 from the JVM's
JCE to a pure Java implementation with collision detection.
The extra security for public sites comes with a cost of slower
SHA1 processing compared to the native implementation in the JDK.
When JGit is used internally and not exposed to any traffic from
external or untrusted users, the extra cost of the pure Java SHA1
implementation can be avoided, falling back to the previous
native MessageDigest implementation.
Bug: 580310
Change-Id: Ic24c0ba1cb0fb6282b8ca3025ffbffa84035565e
Trying to register/unregister a shutdown hook when the JVM is already in
shutdown throws an IllegalStateException. Ignore this exception since we
can't do anything about it.
Bug: 580953
Change-Id: I8fc6fdd5585837c81ad0ebd6944430856556d90e
In uploadWithExceptionPropagation don't prematurely terminate timer in
case of error to enable reporting it to the client. Expose a close
method so that callers can terminate it at the appropriate time.
If the timer is already terminated when trying to report it to the
client this failed with the error java.lang.IllegalStateException:
"Timer already terminated".
Bug: 579670
Change-Id: I95827442ccb0f9b1ede83630cf7c51cf619c399a