8909 Commits

Author SHA1 Message Date
Matthias Sohn 20b7e9435b Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: Idf757bcab0d7a65ea63504674a681170c6db2f94
2023-04-15 00:49:59 +02:00
Matthias Sohn 273df319fe Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: I0cc3033b1876c8c691c2a6876206cd71fa07d2e0
2023-04-15 00:49:08 +02:00
Matthias Sohn f73efea21f Remove blank in maven.config
Maven 3.9.1 doesn't accept this whitespace.

Change-Id: I0f6e3652b1e581615c370d35bc782184712ac922
2023-04-15 00:42:22 +02:00
Matthias Sohn 23b9693a75 DirCache: support option index.skipHash
Support the new option index.skipHash which was introduced in git 2.40
[1]. If it is set to true, skip computing the git index checksum. This
accelerates Git commands that manipulate the index, such as git add, git
commit, or git status. Instead of storing the checksum, write a trailing
set of bytes with value zero, indicating that the computation was
skipped.

Accept a skipped checksum consisting of 20 null bytes when reading the
index since the option could have been set to true at the time when the
index was written.

[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-indexskipHash

Bug: 581723
Change-Id: I28ebe44c5ca1cbcb882438665d686452a0c111b2
2023-03-28 23:16:08 +02:00
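
The read and write behaviour described in the index.skipHash commit above can be sketched roughly as follows. This is a minimal illustration, not JGit's DirCache code: the class `SkipHashSketch` and its methods are hypothetical names, and the sketch assumes the trailer is a 20-byte SHA-1 over the preceding content.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; not part of JGit.
class SkipHashSketch {
    static final int CHECKSUM_LENGTH = 20; // length of a SHA-1 digest in bytes

    // True if the trailer consists of 20 zero bytes, i.e. the checksum was skipped.
    static boolean isSkippedChecksum(byte[] trailer) {
        if (trailer.length != CHECKSUM_LENGTH) {
            return false;
        }
        for (byte b : trailer) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    // Trailer to write: zero bytes when skipHash is set, else the SHA-1 of the content.
    static byte[] trailer(byte[] content, boolean skipHash) {
        if (skipHash) {
            return new byte[CHECKSUM_LENGTH];
        }
        try {
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // On read, accept either a skipped checksum or a matching one.
    static boolean isTrailerAcceptable(byte[] content, byte[] trailer) {
        return isSkippedChecksum(trailer)
                || MessageDigest.isEqual(trailer, trailer(content, false));
    }
}
```

A reader built this way accepts an index written with the option enabled even if the option is currently unset, which is the case the commit message calls out.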
Matthias Sohn 5b16c8ae15 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  GC: Close File.lines stream

Change-Id: I2f9e6da5584a40bb4b4efed0b87ae456f119d757
2023-03-23 09:05:42 +01:00
Matthias Sohn 55164c43b9 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  GC: Close File.lines stream

Change-Id: Ib473750e5a3ad3d74b0cb41f25052890f50a975c
2023-03-23 09:04:50 +01:00
Xing Huang 3212c8fa38 GC: Close File.lines stream
From the Files#lines javadoc: The returned stream encapsulates a Reader.
If timely disposal of file system resources is required, the
try-with-resources construct should be used to ensure that the stream's
close method is invoked after the stream operations are completed.

Wrap Files.lines with try-with-resources.

Change-Id: I82c6faa3ef1083f6c7e964f96e9540b4db18eee8
Signed-off-by: Xing Huang <xingkhuang@google.com>
(cherry picked from commit 172a207945)
2023-03-23 08:19:26 +01:00
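
The pattern named in the commit above, wrapping the stream returned by `java.nio.file.Files.lines` in try-with-resources, looks roughly like this. The class and method names are hypothetical and the line-counting task is only a stand-in for whatever GC does with the lines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical example; not the code changed in GC.
class FilesLinesSketch {
    // Counts non-empty lines; the stream (and its underlying reader)
    // is closed when the try block exits, even on an exception.
    static long countNonEmptyLines(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```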
Prudhvi Akhil Alahari a4ca500d26 Improve test coverage when core.trustPackedRefsStat set to after_open
As of today, we don't have test coverage for RefDirectory when
core.trustPackedRefsStat config is set to after_open. Thus create new
test classes which set core.trustPackedRefsStat config to after_open in
setup and extend RefDirectoryTest and FileRepositoryBuilderTest
respectively.

Change-Id: I1db6fcf414bc488106ad4c85fb934480f299c995
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
2023-03-02 21:20:02 +05:30
Matthias Sohn b526829fba Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Idea23e555c024557d7e39a86efe25f609400b962
2023-02-22 21:02:47 +01:00
Matthias Sohn 238f1693f7 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I708d0936fa86b028e4da4e7e21f332f8b48ad293
2023-02-22 21:02:09 +01:00
Matthias Sohn d9f75e8bb2 If tryLock fails to get the lock another gc has it
Change-Id: Ifd3bbcc5e0591883b774d23256949a83010ea134
2023-02-22 20:38:43 +01:00
Matthias Sohn 1691e38779 Fix GcConcurrentTest#testInterruptGc
With the new GC.PidLock interrupting a running GC throws a
ClosedByInterruptException.

Change-Id: I7ccea1ae9a43d4edfdab2fcfd1324c64cc22b38f
2023-02-22 20:38:29 +01:00
Matthias Sohn 49f5273867 Don't swallow IOException in GC.PidLock#lock
This broke the test GcConcurrentTest#testInterruptGc which expects
ClosedByInterruptException when the thread doing gc is interrupted.

Change-Id: I89e02fc37aceeccb04c20cfc5b71cb8fa21793d6
2023-02-22 19:27:30 +01:00
Matthias Sohn a6da439b47 Check if FileLock is valid before using or releasing it
Change-Id: I23ba67b61b9b03772f33a929c080c0d02b8c8652
2023-02-22 02:56:06 +01:00
Matthias Sohn 4c111e59d0 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ib9a2419253ffcbc90874adbfdb8129fee3178210
2023-02-22 01:26:36 +01:00
Matthias Sohn 2a2a208fa1 Use Java 11 ProcessHandle to get pid of the current process
Change-Id: I790f218601c1d5e1b39c4101e3b2708e76b9d782
2023-02-22 01:06:06 +01:00
Matthias Sohn aa13d1daf5 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ibb5c46cb79377d2d2cd7d4586f31c86665d2851c
2023-02-22 01:00:26 +01:00
Matthias Sohn 8eee800fb1 Acquire file lock "gc.pid" before running gc
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.

Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.

If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.

Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.

Change-Id: I00b838dcbf4fb0d03863bf7a2cd86b743c6c6971
2023-02-21 00:18:33 +01:00
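
A rough sketch of the locking idea from the "gc.pid" commit above, using `FileChannel.tryLock` and the Java 11 `ProcessHandle` API mentioned in a neighbouring commit. This is not JGit's GC.PidLock: the class name, the "<pid> <hostname>" file layout, and the hostname fallback are assumptions made for illustration, and the 12-hour staleness check and shutdown hook are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not JGit's GC.PidLock.
class GcPidLockSketch implements AutoCloseable {
    private final Path pidFile;
    private FileChannel channel;
    private FileLock lock;

    GcPidLockSketch(Path gitDir) {
        this.pidFile = gitDir.resolve("gc.pid");
    }

    // Returns true if the lock was acquired, false if another process holds it.
    boolean tryAcquire() {
        try {
            channel = FileChannel.open(pidFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            // tryLock() returns null when another process holds the lock.
            lock = channel.tryLock();
            if (lock == null) {
                channel.close();
                channel = null;
                return false;
            }
            // Record pid and hostname (layout assumed for this sketch).
            String owner = ProcessHandle.current().pid() + " " + hostName();
            channel.truncate(0);
            ByteBuffer buf = ByteBuffer.wrap(owner.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // fallback chosen for this sketch
        }
    }

    @Override
    public void close() {
        try {
            // Only release a lock that is still valid.
            if (lock != null && lock.isValid()) {
                lock.release();
            }
            if (channel != null) {
                channel.close();
            }
            // Remove the file only if this instance owned the lock.
            if (lock != null) {
                Files.deleteIfExists(pidFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would skip gc when `tryAcquire()` returns false, matching the commit's choice not to kill the other process. Note that within a single JVM an overlapping `tryLock` throws `OverlappingFileLockException` rather than returning null.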
Matthias Sohn 380f091fa5 Silence API errors introduced by 9424052f
Change-Id: Ia9e619a8fa06648086b583c994e4b107ae06c44d
2023-02-21 00:18:33 +01:00
Matthias Sohn 07a9eb06ff Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I876a38c2de8b7d5eaacd00e36b85599f88173221
2023-02-16 16:59:09 +01:00
Matthias Sohn c46eb91611 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I423f410578f5bbe178832b80fef8998a5372182c
2023-02-16 16:48:24 +01:00
Prudhvi Akhil Alahari 012cb77930 Fix getPackedRefs to not throw NoSuchFileException
Since Files.newInputStream is from java.nio package, it throws
java.nio.file.NoSuchFileException. This was missed in the change
I00da88e. Without this change, getPackedRefs fails with
NoSuchFileException when there is no packed-refs file in a project.

Change-Id: I93c202ddb73a0a5979af8e4d09e45f5645664b45
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
2023-02-16 16:44:12 +05:30
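
The difference the getPackedRefs commit above describes can be illustrated with a small sketch: `java.nio.file.Files.newInputStream` reports a missing file as `java.nio.file.NoSuchFileException`, not as `java.io.FileNotFoundException`, so code that only catches the latter lets the former escape. The class and method names below are hypothetical and this is not the RefDirectory code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical example; not JGit's getPackedRefs.
class PackedRefsReadSketch {
    // Returns the file content, or an empty array if the file does not exist.
    static byte[] readOrEmpty(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        } catch (FileNotFoundException | NoSuchFileException e) {
            // java.io streams throw the first type, java.nio.file throws the second.
            return new byte[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```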
Matthias Sohn 9424052f27 Add pack options to preserve and prune old pack files
Add the options
- pack.preserveOldPacks
- pack.prunePreserved

This allows configuring in git config whether old packs should be
preserved during gc and pruned during the next gc.

The original implementation in 91132bb0 only allows setting these
options using the API.

Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a
2023-02-11 01:19:28 +01:00
Saša Živkov ed2cbd9e8a Allow to perform PackedBatchRefUpdate without locking loose refs
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.

This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.

Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
(cherry picked from commit cb90ed0852)
2023-02-03 10:18:47 +01:00
Matthias Sohn e55bad514b Document option "core.sha1Implementation" introduced in 59029aec
Bug: 580310
Change-Id: I10f3d6f6b5af7ab96683994c9cbd85e6c18a5084
2023-02-02 21:18:43 +01:00
Matthias Sohn b5de5ccb9e Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
2023-02-01 00:41:52 +01:00
Matthias Sohn da21265a14 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
2023-02-01 00:33:20 +01:00
Luca Milanesio 21e902dd7f Shortcut during git fetch for avoiding looping through all local refs
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.

When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.

Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s

A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.

Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b
2023-02-01 00:07:45 +01:00
Matthias Sohn 7650832002 FetchCommand: fix fetchSubmodules to work on a Ref to a blob
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.

Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4
2023-01-31 23:52:20 +01:00
Matthias Sohn 8040936f8a Silence API warnings introduced by I466dcde6
Change-Id: I510510da34d33757c2f83af8cd1e26f6206a486a
2023-01-31 23:45:07 +01:00
Luca Milanesio ad977f1572 Allow the exclusions of refs prefixes from bitmap
When running a GC.repack() against a repository with over a
thousand refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonably big file that would take up to
several hours to compute.

Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)

Having such a large bitmap file is also slow in the runtime
processing and has negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.

It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exclude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.

Example: Gerrit sandbox branches may even be actively
used and selected automatically because their commits are very
recent; however, they may bloat the bitmap, making it ineffective.

A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundreds of
thousands.

Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-01-31 17:14:09 -05:00
Luca Milanesio e4529cd39c PackWriterBitmapPreparer: do not include annotated tags in bitmap
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.

Sample use-case:

refs/heads/main
  ^
  |
 commit1 <-- commit2 <- annotated-tag1 <- tag1
  ^
  |
 commit0

When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.

The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitmapPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.

Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.

Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotated tags)
# packfiles: 2.7G

Before this change:
GC time: 5h
clone --bare time: 7 mins

After this change:
GC time: 20 mins
clone --bare time: 3 mins

Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
2023-01-31 14:15:56 +01:00
Matthias Sohn 611412a055 BatchingProgressMonitor: avoid int overflow when computing percentage
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.

Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
2023-01-31 14:15:53 +01:00
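
The overflow described in the BatchingProgressMonitor commit above is easy to reproduce in isolation. The sketch below uses hypothetical method names and shows one common way to avoid it, widening to `long` before multiplying; it is not claimed to be the exact change made in JGit.

```java
// Hypothetical example; not the BatchingProgressMonitor code.
class PercentSketch {
    // done * 100 is computed in int and wraps once it exceeds Integer.MAX_VALUE.
    static int percentOverflowing(int done, int total) {
        return done * 100 / total;
    }

    // 100L forces the multiplication to be done in long, so it cannot wrap here.
    static int percentSafe(int done, int total) {
        return (int) (done * 100L / total);
    }

    public static void main(String[] args) {
        int done = 30_000_000;
        int total = 40_000_000;
        System.out.println(percentOverflowing(done, total)); // negative value
        System.out.println(percentSafe(done, total));        // 75
    }
}
```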
Matthias Sohn cd3fc7a299 Speedup GC listing objects referenced from reflogs
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.

Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.

This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.

Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4
2023-01-23 17:19:14 +01:00
Nasser Grainawi 21b2aef0aa Cache trustFolderStat/trustPackedRefsStat value per-instance
Instead of re-reading the config every time the methods using these
values were called, cache the config value at the time of instance
construction. Caching the values improves performance for each of the
method calls. These configs are set based on the filesystem storing the
repository and unlikely to change while an application is running.

Change-Id: I1cae26dad672dd28b766ac532a871671475652df
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2023-01-13 18:45:02 +01:00
Kaushik Lingarkar fed1a54935 Refresh 'objects' dir and retry if a loose object is not found
A new loose object may not be immediately visible on an NFS
client if it was created on another client. Refreshing the
'objects' dir and trying again can help work around the NFS
behavior.

Here's an E2E problem that this change can help fix. Consider
a Gerrit multi-primary setup with repositories based on NFS.
Add a new patch-set to an existing change and then immediately
fetch the new patch-set of that change. If the fetch is handled
by a Gerrit primary different than the one which created the
patch-set, then we sometimes run into a MissingObjectException
that causes the fetch to fail.

Bug: 581317
Change-Id: Iccc6676c68ef13a1e8b2ff52b3eeca790a89a13d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-01-13 18:44:35 +01:00
Nasser Grainawi 2011fe06d2 FileSnapshotTest: Add more MISSING_FILE coverage
Add a couple tests that confirm what the docs say about isModified() and
equals(MISSING_FILE) behavior.

Change-Id: I6093040ba3594934c3270331405a44b2634b97c5
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2023-01-06 14:23:12 -07:00
Kaushik Lingarkar 82b5aaf7e3 Introduce core.trustPackedRefsStat config
Currently, we always read packed-refs file when 'trustFolderStat'
is false. Introduce a new config 'trustPackedRefsStat' which takes
precedence over 'trustFolderStat' when reading packed refs. Possible
values for this new config are:

* always: Trust packed-refs file attributes
* after_open: Same as 'always', but refresh the file attributes of
              packed-refs before trusting it
* never: Always read the packed-refs file
* unset: Fallback to 'trustFolderStat' to determine if the file
  attributes of packed-refs can be trusted

Folks whose repositories are on NFS and have traditionally been
setting 'trustFolderStat=false' can now get some performance improvement
with 'trustPackedRefsStat=after_open' as it refreshes the file
attributes of packed-refs (at least on some NFS clients) before
considering it.

For example, consider a repository on NFS with ~500k packed-refs. Here
are some stats which illustrate the improvement with this new config
when reading packed refs on NFS:

trustFolderStat=true trustPackedRefsStat=unset: 0.2ms
trustFolderStat=false trustPackedRefsStat=unset: 155ms
trustFolderStat=false trustPackedRefsStat=after_open: 1.5ms

Change-Id: I00da88e4cceebbcf3475be0fc0011ff65767c111
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-01-05 15:52:36 +01:00
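
The precedence rules listed in the core.trustPackedRefsStat commit above can be summarized as a small decision function. This is only a sketch of the described semantics: the enum and method names are hypothetical and do not mirror JGit's internal types.

```java
// Hypothetical model of the documented semantics; not JGit's implementation.
class TrustPackedRefsStatSketch {
    enum TrustPackedRefsStat { ALWAYS, AFTER_OPEN, NEVER, UNSET }

    enum Action {
        TRUST_ATTRIBUTES,   // trust the packed-refs file attributes
        REFRESH_THEN_TRUST, // refresh the file attributes first, then trust them
        ALWAYS_READ         // always read the packed-refs file
    }

    // trustPackedRefsStat takes precedence; UNSET falls back to trustFolderStat.
    static Action resolve(TrustPackedRefsStat packed, boolean trustFolderStat) {
        switch (packed) {
        case ALWAYS:
            return Action.TRUST_ATTRIBUTES;
        case AFTER_OPEN:
            return Action.REFRESH_THEN_TRUST;
        case NEVER:
            return Action.ALWAYS_READ;
        case UNSET:
        default:
            return trustFolderStat ? Action.TRUST_ATTRIBUTES : Action.ALWAYS_READ;
        }
    }
}
```

For the NFS setup described in the commit, `resolve(TrustPackedRefsStat.AFTER_OPEN, false)` yields `REFRESH_THEN_TRUST`, whereas leaving the new option unset with trustFolderStat=false yields `ALWAYS_READ`.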
Kaushik Lingarkar 52aa9c81fc Fix documentation for core.trustFolderStat
Update documentation for core.trustFolderStat to highlight that it is
also used when reading the packed-refs file.

Change-Id: I3eac377c3a7f48493abc8ae6d0889ee70a05d24d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2022-12-14 12:27:34 -08:00
Matthias Sohn fec271c11b Silence API errors
Change-Id: I07c42fe9417edb0570dd475a7e935112a878a93b
2022-11-20 20:09:56 +01:00
Matthias Sohn 97ad9bdae6 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Silence API errors
  Silence API warnings

Change-Id: I2b8336652e60dec97666582cf9331c8505729473
2022-11-20 20:08:42 +01:00
Matthias Sohn 41b33a16b8 Silence API errors
Change-Id: Ie112b2099ea2125bc85863524e56f09ba4907373
2022-11-20 19:55:22 +01:00
Matthias Sohn 12f48276bd Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Silence API warnings

Change-Id: If5ab988a0e177c37b125e0b10625e506eeb2a74f
2022-11-20 19:54:44 +01:00
Matthias Sohn aa9f736c33 Silence API warnings
introduced by
- addition of configurable SHA1 implementation in 5.13.2
- 3-digit @since 5.9.1 annotations on GitServlet methods

Change-Id: If19853fcc5e3677e5b18e8e3fbbcd2773378dffc
2022-11-20 19:45:54 +01:00
Matthias Sohn 77e2f4bd27 Merge "Merge branch 'stable-6.0' into stable-6.1" into stable-6.1
2022-11-16 04:10:43 -05:00
Matthias Sohn 7f36943d0c Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  [benchmarks] Remove profiler configuration
  Add SHA1 benchmark
  [benchmarks] Set version of maven-compiler-plugin to 3.8.1
  Fix running JMH benchmarks
  Add option to allow using JDK's SHA1 implementation
  Ignore IllegalStateException if JVM is already shutting down

Change-Id: I176419026c3f4fdd8ebd34c61468c1ec3482ff45
2022-11-16 09:54:28 +01:00
Matthias Sohn f1909615d3 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  [benchmarks] Remove profiler configuration
  Add SHA1 benchmark
  [benchmarks] Set version of maven-compiler-plugin to 3.8.1
  Fix running JMH benchmarks
  Add option to allow using JDK's SHA1 implementation
  Ignore IllegalStateException if JVM is already shutting down

Change-Id: I40105336f0b9e593a8a2c242a9557f854c274fdc
2022-11-16 00:15:17 +01:00
Matthias Sohn 48316309b1 [benchmarks] Remove profiler configuration
Profiler configuration can be added when required. It was commented out
in most benchmarks.

Change-Id: I725f98757f7d4d2ba2589658e34e2fd6fbbbedee
2022-11-15 23:08:20 +01:00
Matthias Sohn c20e9676c4 Add SHA1 benchmark
Results on a Mac M1 max:

    size     SHA1Native SHA1Java    SHA1Java
                        without     with
                        collision   collision
                        detection   detection
    [kB]     [us/op]    [us/op]     [us/op]
---------------------------------------------
      1       3.662       4.200       4.707
      2       7.053       7.868       8.928
      4      13.883      15.149      17.608
      8      27.225      30.049      35.237
     16      54.014      59.655      70.867
     32     106.457     118.022     140.403
     64     212.712     237.702     281.522
   1024    3469.519    3868.883    4637.287
 131072  445011.724  501751.992  604061.308
1048576 3581702.104 4008087.854 4831023.563

The last 3 sizes (1, 128, 1024 MB) weren't committed
here to limit the total runtime.

Bug: 580310
Change-Id: I7d0382fd4aa4c4734806b12e96b671bee37d26e3
2022-11-15 23:08:19 +01:00
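
For readers who want a feel for the kind of measurement in the SHA1 benchmark commit above, here is a rough sketch that hashes a buffer with the JDK's `MessageDigest` and reports microseconds per operation. It is not the JMH benchmark from the commit, a plain `System.nanoTime` loop is far less reliable than JMH, and the assumption that the SHA1Native column corresponds to the JDK implementation is mine, not stated in the commit message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example; not the JMH benchmark added by the commit.
class JdkSha1Sketch {
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Naive timing loop: average microseconds per hash of a zero-filled buffer.
    static double microsPerOp(int sizeInKiB, int iterations) {
        byte[] data = new byte[sizeInKiB * 1024];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sha1(data);
        }
        return (System.nanoTime() - start) / 1000.0 / iterations;
    }

    public static void main(String[] args) {
        // Well-known SHA-1 test vector: a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(toHex(sha1("abc".getBytes(StandardCharsets.UTF_8))));
        System.out.printf("8 kB: %.3f us/op%n", microsPerOp(8, 1000));
    }
}
```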
Matthias Sohn 4fac796588 [benchmarks] Set version of maven-compiler-plugin to 3.8.1
Change-Id: Ib14db133c76a55358ea79663ef38d9fb47a67f45
2022-11-15 23:08:19 +01:00