6378 Commits

Author SHA1 Message Date
Matthias Sohn a30c1da323 Merge branch 'stable-6.5'
* stable-6.5:
  Ensure FileCommitGraph scans commit-graph file if it already exists

Change-Id: I5218ff5214222c7d6d96e452cf427eea1f20c316
2023-03-27 11:02:52 +02:00
kylezhao 827849017d Ensure FileCommitGraph scans commit-graph file if it already exists
When a commit-graph file already exists in the repository, a newly
created FileCommitGraph didn't scan the CommitGraph until the file was
modified, which produced wrong results.

Change-Id: Ic85676f2d3b6a88f3ae28d4065729926b6fb2f23
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-03-27 10:51:07 +02:00
Matthias Sohn 67fcf76b4b Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
  GC: Close File.lines stream

Change-Id: I7e3a4b3671e779fd62062c4e10d224f432e39b54
2023-03-23 09:07:33 +01:00
Matthias Sohn cd2dc85f31 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  GC: Close File.lines stream

Change-Id: I99455916d447f5dffed85e9a5c1d51b323f07a16
2023-03-23 09:07:09 +01:00
Matthias Sohn 137efda0ba Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  GC: Close File.lines stream

Change-Id: Id93b1933a5ce1ede9eb388c9fd54a4b3749694bf
2023-03-23 09:06:43 +01:00
Matthias Sohn b118e7b4c4 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  GC: Close File.lines stream

Change-Id: Ia2be0b05ed860125a388b01d6c291832f08dd990
2023-03-23 09:06:16 +01:00
Matthias Sohn 5b16c8ae15 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  GC: Close File.lines stream

Change-Id: I2f9e6da5584a40bb4b4efed0b87ae456f119d757
2023-03-23 09:05:42 +01:00
Matthias Sohn 55164c43b9 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  GC: Close File.lines stream

Change-Id: Ib473750e5a3ad3d74b0cb41f25052890f50a975c
2023-03-23 09:04:50 +01:00
Xing Huang 3212c8fa38 GC: Close File.lines stream
From the Files#lines javadoc: The returned stream encapsulates a Reader.
If timely disposal of file system resources is required, the
try-with-resources construct should be used to ensure that the stream's
close method is invoked after the stream operations are completed.

Wrap Files.lines with try-with-resources.

Change-Id: I82c6faa3ef1083f6c7e964f96e9540b4db18eee8
Signed-off-by: Xing Huang <xingkhuang@google.com>
(cherry picked from commit 172a207945)
2023-03-23 08:19:26 +01:00
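The pattern named in the commit message above can be sketched as follows. This is a minimal illustration, not the code changed in GC; the class name CloseLinesStream and the method countNonBlankLines are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical example class, not part of JGit.
final class CloseLinesStream {
    // Counts non-blank lines. The try-with-resources block closes the
    // stream, and with it the underlying Reader, even if the pipeline throws.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isBlank()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, List.of("a", "", "b"));
            System.out.println(countNonBlankLines(tmp));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Without the try block the file handle stays open until the stream is garbage collected, which is the leak the commit addresses.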
Xing Huang 172a207945 GC: Close File.lines stream
From the Files#lines javadoc: The returned stream encapsulates a Reader.
If timely disposal of file system resources is required, the
try-with-resources construct should be used to ensure that the stream's
close method is invoked after the stream operations are completed.

Wrap Files.lines with try-with-resources.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I82c6faa3ef1083f6c7e964f96e9540b4db18eee8
Signed-off-by: Xing Huang <xingkhuang@google.com>
2023-03-21 17:57:12 -05:00
Matthias Sohn 228e4de484 Merge branch 'stable-6.5'
* stable-6.5:
  Rerun flaky tests 3 times
  Prepare 6.5.1-SNAPSHOT builds
  JGit v6.5.0.202303070854-r
  Ignore generated org.eclipse.jgit.benchmarks/dependency-reduced-pom.xml
  [sshd] Fix calculation of timeout in AbstractClientProxyConnector
  Silence API error raised for removed BranchRebaseMode#PRESERVE

Change-Id: Ie615980c81371ee26b2395e67e026bbd17422fbd
2023-03-07 16:41:19 +01:00
Matthias Sohn 8dcb02140d Prepare 6.5.1-SNAPSHOT builds
Change-Id: Idd9977ac08a339906e33beb73f57f8f6885ad86f
2023-03-07 16:39:19 +01:00
Matthias Sohn c72dd241f4 JGit v6.5.0.202303070854-r
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I8da37ead0bd527bc4990ed5f8d5d4fb4f4d5cf01
2023-03-07 14:54:32 +01:00
Matthias Sohn 9be2b7f8a8 Silence API error raised for removed BranchRebaseMode#PRESERVE
It was replaced by MERGES to match C git which did that in 2.34.

Change-Id: Ib6a33b4a3650345bf0f9d3726dd9e14c5797e836
2023-03-06 21:39:42 +01:00
Matthias Sohn 0687c73a12 Merge branch 'stable-6.5'
* stable-6.5:
  [errorprone] Suppress [Finally] warnings
  Update Orbit to R20230302014618 for 2023-03
  Improve test coverage when core.trustPackedRefsStat set to after_open
  Prepare 6.5.0-SNAPSHOT builds
  JGit v6.5.0.202302281825-rc1
  Prepare 6.5.0-SNAPSHOT builds
  JGit v6.5.0.202302221508-m3

Change-Id: Ice109c060d14c455262f61aed088111b238d735b
2023-03-03 16:04:00 +01:00
Matthias Sohn 1d073e30d7 [errorprone] Suppress [Finally] warnings
In these cases we use Throwable#addSuppressed to ensure the exception
thrown in the catch block preceding the finally block throwing another
exception isn't lost.

Change-Id: I96e78a5c15238ab77ac90ca1901850ce19acfcd8
2023-03-02 23:33:27 +01:00
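The Throwable#addSuppressed pattern mentioned above can be sketched like this. It is an illustration under assumptions, not the JGit code that received the suppression; the class SuppressedInFinally and its run method are hypothetical.

```java
// Hypothetical example class, not part of JGit.
final class SuppressedInFinally {
    // Runs work, then cleanup. If both fail, the failure from work stays the
    // primary exception and the cleanup failure is attached as suppressed.
    @SuppressWarnings("Finally") // the throw inside finally is intentional
    static void run(Runnable work, Runnable cleanup) {
        RuntimeException primary = null;
        try {
            work.run();
        } catch (RuntimeException e) {
            primary = e;
            throw e;
        } finally {
            try {
                cleanup.run();
            } catch (RuntimeException cleanupFailure) {
                if (primary == null) {
                    // Nothing else failed, so report the cleanup failure.
                    throw cleanupFailure;
                }
                // Keep the original exception and record the second one.
                primary.addSuppressed(cleanupFailure);
            }
        }
    }

    public static void main(String[] args) {
        try {
            run(() -> { throw new IllegalStateException("work"); },
                () -> { throw new IllegalArgumentException("cleanup"); });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " / suppressed: "
                    + e.getSuppressed()[0].getMessage());
        }
    }
}
```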
Matthias Sohn f34ae6fe31 Prepare 6.6.0-SNAPSHOT builds
Change-Id: I17893f9db12bcb208866f40a06cd4f1ccbb4fe30
2023-03-01 15:40:45 +01:00
Matthias Sohn 69671a7026 Prepare 6.5.0-SNAPSHOT builds
Change-Id: I313e3deed8fa00df0406b3d7b73e5b643dc25a05
2023-03-01 15:30:29 +01:00
Matthias Sohn f43560a760 JGit v6.5.0.202302281825-rc1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I1eb2e87b70c2da1dc81468cdc7ecf7dbd21d4190
2023-03-01 00:23:58 +01:00
Pavel Salamon 0518a6b0c1 Change config pull.rebase=preserve to pull.rebase=merges
The native git option to preserve merge commits during rebase
has been changed from pull.rebase=preserve to pull.rebase=merges.

This changeset in jgit makes the same config change. The old "preserve"
option is no longer recognized and is replaced by new option called
"merges".

This makes jgit's rebase configuration compatible with native git
versions 2.34 and newer where the old "preserve" option has been
removed.

Change-Id: Ic07ff954e258115e76465a1593ef3259f4c418a3
2023-02-28 23:44:41 +01:00
Matthias Sohn 2d0b908048 BatchingProgressMonitor: expose time spent per task
Display elapsed time per task if enabled via
ProgressMonitor#showDuration or if system property or environment
variable GIT_TRACE_PERFORMANCE is set to "true". If both the system
property and the environment variable are set the system property takes
precedence.

E.g. using jgit CLI:

$ GIT_TRACE_PERFORMANCE=true jgit clone https://foo.bar/foobar
Cloning into 'foobar'...
remote: Counting objects: 1 [0.002s]
remote: Finding sources: 100% (15531/15531) [0.006s]
Receiving objects:      100% (169737/169737) [13.045s]
Resolving deltas:       100% (67579/67579) [1.842s]

Change-Id: I4d624e7858b286aeddbe7d4e557589986d73659e
2023-02-27 16:41:33 -05:00
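The precedence rule described above (system property before environment variable) could be expressed roughly as below. This is a sketch, not BatchingProgressMonitor's code: the class TracePerformanceFlag is hypothetical, and the use of Boolean.parseBoolean (which accepts "true" case-insensitively) is an assumption.

```java
// Hypothetical example class, not part of JGit.
final class TracePerformanceFlag {
    static final String KEY = "GIT_TRACE_PERFORMANCE";

    // The system property wins whenever it is set; the environment
    // variable is only consulted when the property is absent.
    static boolean resolve(String systemProperty, String environmentVariable) {
        String value = systemProperty != null ? systemProperty : environmentVariable;
        return Boolean.parseBoolean(value); // null parses as false
    }

    static boolean isEnabled() {
        return resolve(System.getProperty(KEY), System.getenv(KEY));
    }

    public static void main(String[] args) {
        System.out.println("show duration: " + isEnabled());
    }
}
```

Keeping resolve free of global state makes the precedence easy to check without touching the process environment.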
Ivan Frade ca2c57b2ec PackWriter: offer to write an object-size index for the pack
PackWriter callers tell the writer what they want to include in the
pack and invoke #writePack(). Afterwards, they can invoke #writeIndex()
to write the corresponding pack index.

Mirror this for the object-size index, adding a #writeObjectSizeIndex()
method.

Change-Id: Ic319975c72c239cd6488303f7d4cced797e6fe00
2023-02-24 12:56:33 -08:00
Matthias Sohn cfacc43b52 Fix formatting in GC#doGc
Change-Id: Ifa3adb66d4e0404bab4036d6b165d6c4dafe921a
2023-02-24 15:18:39 +01:00
Ivan Frade ad07196d60 PackExt: Define new extension for the object size index
Change-Id: I6bbaf43b4e6fb456ca0e9e0c6efcfeded0f94d6d
2023-02-23 09:32:20 -08:00
Matthias Sohn 176f17d05e Merge branch 'stable-6.4'
* stable-6.4:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Ia2797b44a60342eb9df53f0b3d674cba92a512fc
2023-02-22 21:06:41 +01:00
Matthias Sohn f4eda3360a Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I5af34c92e423a651db53b4dc45ed844d5f39910d
2023-02-22 21:05:55 +01:00
Matthias Sohn 636f377e4e Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I5b6b10622b61fde3f0f10455a74ae159a0b69082
2023-02-22 21:03:52 +01:00
Matthias Sohn 6cc741aa23 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I3ffe92566cc145053bb762f612dd96bc6d542c62
2023-02-22 21:03:22 +01:00
Matthias Sohn b526829fba Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Idea23e555c024557d7e39a86efe25f609400b962
2023-02-22 21:02:47 +01:00
Matthias Sohn 238f1693f7 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I708d0936fa86b028e4da4e7e21f332f8b48ad293
2023-02-22 21:02:09 +01:00
Matthias Sohn d9f75e8bb2 If tryLock fails to get the lock another gc has it
Change-Id: Ifd3bbcc5e0591883b774d23256949a83010ea134
2023-02-22 20:38:43 +01:00
Matthias Sohn 49f5273867 Don't swallow IOException in GC.PidLock#lock
This broke the test GcConcurrentTest#testInterruptGc which expects
ClosedByInterruptException when the thread doing gc is interrupted.

Change-Id: I89e02fc37aceeccb04c20cfc5b71cb8fa21793d6
2023-02-22 19:27:30 +01:00
Matthias Sohn a6da439b47 Check if FileLock is valid before using or releasing it
Change-Id: I23ba67b61b9b03772f33a929c080c0d02b8c8652
2023-02-22 02:56:06 +01:00
Matthias Sohn e92212a5a0 Merge branch 'stable-6.4'
* stable-6.4:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ifa4e56b6ecca9305f3f1685e45450019bfc82e22
2023-02-22 01:29:32 +01:00
Matthias Sohn dcd6367391 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ic40dbab18616d8d9fe3820b9890c86652b80eb47
2023-02-22 01:28:27 +01:00
Matthias Sohn c70374e641 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: I53cf9675deac0b588048d8224216d2a7e8bd16ec
2023-02-22 01:27:50 +01:00
Matthias Sohn 628ca9bd6f Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: I0562a4a224779ccf1e4cc1ff8f5a352e55ab220a
2023-02-22 01:27:16 +01:00
Matthias Sohn 4c111e59d0 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ib9a2419253ffcbc90874adbfdb8129fee3178210
2023-02-22 01:26:36 +01:00
Matthias Sohn 2a2a208fa1 Use Java 11 ProcessHandle to get pid of the current process
Change-Id: I790f218601c1d5e1b39c4101e3b2708e76b9d782
2023-02-22 01:06:06 +01:00
Matthias Sohn aa13d1daf5 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ibb5c46cb79377d2d2cd7d4586f31c86665d2851c
2023-02-22 01:00:26 +01:00
kylezhao d789fe2f4d UploadPack: use allow-any-sha1-in-want configuration
C git 2.11 supports setting the equivalent of RequestPolicy.ANY with
uploadpack.allowAnySHA1InWant[1]. Parse this into TransportConfig and
use it from UploadPack.

Add additional tests for [2] and this change.

With uploadPack.allowFilter set to true, "git clone --filter=blob:none
--no-checkout" succeeds. On checkout, however, git fetches the missing
objects that the checkout requires, which is why this config is needed.

When both uploadPack.allowFilter and uploadPack.allowAnySHA1InWant are
true, jgit supports partial clone. This can help with an extremely
large monorepo: users can work on an incomplete repo, which reduces
disk usage.

[1] f8edeaa05d
[2] change Id39771a6e42d8082099acde11249306828a053c0

Bug: 573390
Change-Id: I8fe75f03bf1fea7c11e0d67c8637bd05dd1f9b89
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-02-21 09:11:21 +01:00
Matthias Sohn 8eee800fb1 Acquire file lock "gc.pid" before running gc
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.

Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.

If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.

Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.

Change-Id: I00b838dcbf4fb0d03863bf7a2cd86b743c6c6971
2023-02-21 00:18:33 +01:00
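The core of the locking scheme described above (try to lock "gc.pid", skip gc if someone else holds it) can be sketched with java.nio file locks. This is not JGit's GC.PidLock: the class GcPidLockSketch is hypothetical, the "pid hostname" file layout is an assumption, and the 12-hour staleness rule and the shutdown hook are left out.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class, not part of JGit.
final class GcPidLockSketch implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private GcPidLockSketch(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    // Returns a held lock, or null if another process or thread has it,
    // in which case the caller should skip gc.
    static GcPidLockSketch tryAcquire(Path pidFile) throws IOException {
        FileChannel channel = FileChannel.open(pidFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock;
            try {
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                lock = null; // already locked inside this JVM
            }
            if (lock == null) {
                channel.close();
                return null;
            }
            // Record who owns the lock (assumed layout: "<pid> <hostname>").
            String owner = ProcessHandle.current().pid() + " " + hostName();
            ByteBuffer buf = ByteBuffer.wrap(owner.getBytes(StandardCharsets.UTF_8));
            channel.truncate(0);
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            channel.force(true);
            return new GcPidLockSketch(channel, lock);
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    private static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // fallback chosen for this sketch only
        }
    }

    @Override
    public void close() throws IOException {
        // Check validity before releasing, then close the channel.
        if (lock.isValid()) {
            lock.release();
        }
        channel.close();
    }
}
```

A caller would wrap gc in try-with-resources and return early when tryAcquire yields null.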
Matthias Sohn 380f091fa5 Silence API errors introduced by 9424052f
Change-Id: Ia9e619a8fa06648086b583c994e4b107ae06c44d
2023-02-21 00:18:33 +01:00
Matthias Sohn 5d5a0d5376 Externalize strings introduced in c9552aba
Change-Id: I81bb78344df61e6eb42622fcef6235d4da0ae052
2023-02-20 21:40:40 +01:00
Matthias Sohn 37dd45e8a9 Silence API error introduced by 596c445a
Change-Id: I961ba2d89c11373ccb81e6450d7d951204ffca36
2023-02-20 21:31:09 +01:00
Matthias Sohn fe64445c11 Merge branch 'stable-6.4'
* stable-6.4:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I36051c623fcd480aa80ed32b4e89f9bdd1b798e0
2023-02-20 21:29:30 +01:00
Matthias Sohn f8e6bcba48 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I1073098fb06eabafdb3c5e7fcf44d55b86a1b152
2023-02-20 21:01:38 +01:00
Matthias Sohn 6ea0e11869 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I765c7302ce84a6a9c28fdef29da2bfaa49477c6e
2023-02-20 20:59:14 +01:00
Ivan Frade 596c445af2 PackConfig: add entry for minimum size to index
The object size index can have up to #(blobs-in-repo) entries, taking
a significant amount of memory. Let operators configure the threshold size
to include objects in the size index.

The index will include objects with size *at or above* this
value (with -1 for none). This is more effective for the
filter-by-size case.

Lowering the threshold adds more objects to the index. This improves
performance at the cost of memory/storage space. For the object-size
case, more calls will use the index instead of reading IO. For the
filter-by-size case, lower threshold means better granularity (if
ObjectReader#isSmallerThan is implemented based only on the index).

Change-Id: I6ccd9334adbbc2abf95fde51dbbfc85b8230ade0
2023-02-16 10:25:44 -08:00
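The threshold semantics stated in this commit ("at or above" the value, -1 for none) amount to a small predicate. The sketch below is illustrative only; the class SizeIndexThreshold and the parameter name minBytes are hypothetical and do not come from PackConfig.

```java
// Hypothetical example class, not part of JGit.
final class SizeIndexThreshold {
    // Returns true if an object of the given size belongs in the
    // object-size index for the configured threshold.
    static boolean shouldIndex(long objectSize, int minBytes) {
        if (minBytes < 0) {
            return false; // -1: index nothing
        }
        return objectSize >= minBytes; // at or above the threshold
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex(2048, 1024));
        System.out.println(shouldIndex(512, 1024));
        System.out.println(shouldIndex(2048, -1));
    }
}
```

A threshold of 0 admits every object passed to the predicate, which matches the "lower threshold adds more objects" trade-off described above.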
Matthias Sohn d8155c137e Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: Id32683d5f506e082d39af269803bccee0280cc27
2023-02-16 16:59:56 +01:00
Matthias Sohn 07a9eb06ff Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I876a38c2de8b7d5eaacd00e36b85599f88173221
2023-02-16 16:59:09 +01:00
Matthias Sohn c46eb91611 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I423f410578f5bbe178832b80fef8998a5372182c
2023-02-16 16:48:24 +01:00
Prudhvi Akhil Alahari 012cb77930 Fix getPackedRefs to not throw NoSuchFileException
Since Files.newInputStream is from java.nio package, it throws
java.nio.file.NoSuchFileException. This was missed in the change
I00da88e. Without this change, getPackedRefs fails with
NoSuchFileException when there is no packed-refs file in a project.

Change-Id: I93c202ddb73a0a5979af8e4d09e45f5645664b45
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
2023-02-16 16:44:12 +05:30
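The difference between the two exception types can be seen in a small reader that treats a missing file as empty. This is a sketch of the general technique, not the getPackedRefs code; the class MissingFileRead and method readOrEmpty are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical example class, not part of JGit.
final class MissingFileRead {
    // Returns the file content, or an empty array when the file is absent.
    // java.io streams signal absence with FileNotFoundException, while
    // Files.newInputStream signals it with NoSuchFileException, so code
    // migrated to java.nio has to handle the latter as well.
    static byte[] readOrEmpty(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        } catch (FileNotFoundException | NoSuchFileException missing) {
            return new byte[0];
        }
    }

    public static void main(String[] args) throws IOException {
        Path absent = Path.of("no-such-dir", "packed-refs");
        System.out.println(readOrEmpty(absent).length);
    }
}
```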
Ivan Frade c9552abaf3 PackObjectSizeIndex: interface and impl for the object-size index
Operations like "clone --filter=blob:limit=N" or the "object-info"
command need to read the size of the objects from the storage. An
index would provide those sizes at once rather than having to seek in
the packfile.

Introduce an interface for the Object-size index. This index returns
the inflated size of an object. Not all objects could be indexed (to
limit memory usage).

This implementation indexes only blobs (no trees, nor
commits) *above* a certain size threshold (configurable). A lower
threshold adds more objects to the index, consumes more memory and
provides better performance. 0 means "all blobs" and -1 "disabled".

If we don't index everything, for the filter use case it is more
efficient to index the biggest objects first: the set is small and
most objects are filtered by NOT being in the index. For the
object-size case, the more objects in the index the better, regardless
of their size. Altogether, it is more helpful to index above the threshold.

Change-Id: I9ed608ac240677e199b90ca40d420bcad9231489
2023-02-14 11:50:29 -08:00
Ivan Frade 62d0e7be7c UInt24Array: Array of unsigned ints encoded in 3 bytes.
The object size index stores positions of objects in the main
index (when ordered by sha1). These positions are per-pack and usually
a pack has <16 million objects (there are exceptions but rather
rare). It could save some memory storing these positions in three bytes
instead of four. Note that these positions are sorted and always positive.

Implement a wrapper around a byte[] to access and search "ints" while
they are stored as unsigned 3 bytes.

Change-Id: Iaa26ce8e2272e706e35fe4cdb648fb6ca7591972
2023-02-14 10:19:12 -08:00
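The idea of a byte[] wrapper holding sorted unsigned 3-byte values can be sketched as follows. This is not JGit's UInt24Array: the class UInt24ArraySketch, its method names, and the big-endian byte order are assumptions made for illustration.

```java
// Hypothetical example class, not part of JGit.
final class UInt24ArraySketch {
    private final byte[] data; // 3 bytes per value, big-endian (assumed)

    private UInt24ArraySketch(byte[] data) {
        this.data = data;
    }

    // Packs the given values, each in [0, 0xFFFFFF], into 3 bytes apiece.
    static UInt24ArraySketch of(int... values) {
        byte[] data = new byte[values.length * 3];
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v < 0 || v > 0xFFFFFF) {
                throw new IllegalArgumentException("out of 24-bit range: " + v);
            }
            data[3 * i] = (byte) (v >>> 16);
            data[3 * i + 1] = (byte) (v >>> 8);
            data[3 * i + 2] = (byte) v;
        }
        return new UInt24ArraySketch(data);
    }

    int size() {
        return data.length / 3;
    }

    // Reassembles the value at index i, masking bytes to avoid sign extension.
    int get(int i) {
        int o = 3 * i;
        return (data[o] & 0xff) << 16 | (data[o + 1] & 0xff) << 8 | (data[o + 2] & 0xff);
    }

    // Binary search over values stored in ascending order; -1 if absent.
    int binarySearch(int needle) {
        int lo = 0;
        int hi = size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int v = get(mid);
            if (v < needle) {
                lo = mid + 1;
            } else if (v > needle) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        UInt24ArraySketch a = of(1, 70000, 0xFFFFFF);
        System.out.println(a.get(1) + " at " + a.binarySearch(70000));
    }
}
```

Compared with an int[], this layout uses three quarters of the memory at the cost of a few shifts per access.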
Ivan Frade 5b9ca7df42 PackIndex: expose the position of an object-id in the index
The primary index returns the offset in the pack for an
objectId. Internally it keeps the object-ids in lexicographical order,
but doesn't expose an API to find the position of an object-id in that
list. This is needed for the object-size index, that we want to store
as "position-in-idx, size".

Add a #findPosition(object-id) method to the PackIndex interface to
know where an object-id sits in the ordered list of ids in the pack.

Note that this index position is over the list of ordered object-ids,
while reverse-index position is over the list of objects in packed
order.

Change-Id: I89fa146599e347a26d3012d3477d7f5bbbda7ba4
2023-02-14 10:01:29 -08:00
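The distinction between an object's offset and its position in the ordered id list can be illustrated with a toy index. This is only a sketch: the class SortedIdIndexSketch is hypothetical, ids are modeled as lowercase hex strings, and returning -1 for an unknown id is an assumption rather than the documented contract of PackIndex#findPosition.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class, not part of JGit.
final class SortedIdIndexSketch {
    private final String[] sortedIds; // lowercase hex ids, lexicographical order
    private final long[] offsets;     // offsets[i] belongs to sortedIds[i]

    SortedIdIndexSketch(Map<String, Long> idToOffset) {
        TreeMap<String, Long> sorted = new TreeMap<>(idToOffset);
        sortedIds = sorted.keySet().toArray(new String[0]);
        offsets = sorted.values().stream().mapToLong(Long::longValue).toArray();
    }

    // Position of the id in the ordered id list, or -1 if it is not present.
    int findPosition(String id) {
        int pos = Arrays.binarySearch(sortedIds, id);
        return pos >= 0 ? pos : -1;
    }

    // Offset in the pack for the id, or -1 if it is not present.
    long findOffset(String id) {
        int pos = findPosition(id);
        return pos >= 0 ? offsets[pos] : -1;
    }

    public static void main(String[] args) {
        SortedIdIndexSketch idx =
                new SortedIdIndexSketch(Map.of("aa", 10L, "0f", 20L, "b1", 30L));
        System.out.println(idx.findPosition("aa") + " " + idx.findOffset("aa"));
    }
}
```

A secondary structure such as the object-size index can then store the small position value instead of the full object id.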
Matthias Sohn 9424052f27 Add pack options to preserve and prune old pack files
Add the options
- pack.preserveOldPacks
- pack.prunePreserved

This allows configuring in git config whether old packs should be preserved
during gc and pruned during the next gc.

The original implementation in 91132bb0 only allows setting these options
using the API.

Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a
2023-02-11 01:19:28 +01:00
Xing Huang df5b7959be DfsPackFile/DfsGC: Write commit graphs and expose in pack
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.

The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The commit-graph is stored as an extra stream in
the GC pack. DfsPackFile mimics how other indices are loaded, storing the
reference in the DFS cache.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
2023-02-07 16:59:56 -05:00
Xing Huang eccae7cf0b ObjectReader: Allow getCommitGraph to throw IOException
ObjectReader#getCommitGraph doesn't report errors loading the
commit graph. The caller should be aware of the situation and
ultimately decide what to do.

Add IOException to ObjectReader#getCommitGraph signature. RevWalk
defaults to an empty commit-graph on IO errors.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I38eeacff76c7f926b6dfb192d1e5916e40770024
2023-02-07 11:32:12 -05:00
Saša Živkov ed2cbd9e8a Allow to perform PackedBatchRefUpdate without locking loose refs
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.

This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.

Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
(cherry picked from commit cb90ed0852)
2023-02-03 10:18:47 +01:00
Han-Wen NIenhuys a1fa0ee679 Merge "UploadPack: consume delimiter in object-info command" 2023-02-02 09:09:25 -05:00
Han-Wen NIenhuys f94ab7680c Merge "PatchApplier fix - init cache with provided tree" 2023-02-02 09:00:56 -05:00
Han-Wen Nienhuys 341116103e UploadPack: consume delimiter in object-info command
The 'size' packet line is an argument, so it
must be preceded by a 0001 delimiter. See also git's
t5701-git-serve.sh test:

https://github.com/git/git/blob/8b8d9a2/t/t5701-git-serve.sh#L329

Without this fix, the server will choke on the delimiter line, saying
PackProtocolException: unexpected <empty string>

To test, I ran Gerrit locally with this fix

$ curl -X POST \
    -H 'git-protocol: version=2' \
    -H 'content-type: application/x-git-upload-pack-request' \
    -H 'accept: application/x-git-upload-pack-result' \
    --data $'0018command=object-info\n00010009size\n0031oid d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000' \
    http://localhost:8080/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000


The same command completes identically on GitLab (which supports the
object-info command):

$ curl -X POST \
    -H 'git-protocol: version=2' \
    -H 'content-type: application/x-git-upload-pack-request' \
    -H 'accept: application/x-git-upload-pack-result' \
    --data $'0018command=object-info\n00010009size\n0031oid d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000' \
    https://gitlab.com/gitlab-org/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000

In this case, the blob is for the COPYING file in the Git source tree,
which is 26859 bytes long.

Change-Id: Ief4ce1eb9303a3b2479547d7950ef01c7c28f472
2023-02-02 08:47:35 -05:00
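The request body used in the curl commands above follows the pkt-line framing: each line carries a 4-digit hex length that includes the 4 length bytes themselves, 0001 is the delimiter and 0000 the flush packet. A minimal sketch that rebuilds that body is shown below; the class PktLineSketch and its methods are hypothetical helpers, not JGit API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of JGit.
final class PktLineSketch {
    static final String DELIM = "0001"; // separates command from arguments
    static final String FLUSH = "0000"; // ends the request

    // Frames a payload: 4 hex digits of total length, then the payload.
    static String pkt(String payload) {
        int length = payload.getBytes(StandardCharsets.UTF_8).length + 4;
        if (length > 0xFFFF) {
            throw new IllegalArgumentException("payload too long for one pkt-line");
        }
        return String.format("%04x", length) + payload;
    }

    // Builds an object-info request asking for the size of a single object.
    static String objectInfoSizeRequest(String oid) {
        return pkt("command=object-info\n")
                + DELIM
                + pkt("size\n")
                + pkt("oid " + oid + "\n")
                + FLUSH;
    }

    public static void main(String[] args) {
        System.out.print(objectInfoSizeRequest("d38b1b92bdb2893eb4505667375563f2d6d4086b"));
    }
}
```

The 0001 delimiter between the command line and the size argument is exactly the packet the fix teaches UploadPack to consume.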
Nitzan Gur-Furman a399bd13b1 PatchApplier fix - init cache with provided tree
This change only affects inCore repositories.
Before this change, any file that wasn't part of the patch
wasn't read, and therefore wasn't part of the output tree.

Change-Id: I246ef957088f17aaf367143f7a0b3af0f8264ffb
Bug: Google b/267270348
2023-02-02 12:39:26 +01:00
Ivan Frade 8898d62dbc Merge "DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats" 2023-02-01 18:06:56 -05:00
Matthias Sohn 8bd960bf2b Merge changes I343cc3cf,I9dedf61b
* changes:
  Avoid error-prone warning
  Fix unused exception error-prone warning
2023-02-01 16:52:37 -05:00
Han-Wen Nienhuys b30c75be40 Fix unused exception error-prone warning
Ignoring the exception seems intended in this case.

Change-Id: I9dedf61b9cb5a6ff39fb141dd5da19143f4f6978
2023-02-01 10:53:43 +01:00
Han-Wen Nienhuys 97e8b4cc71 UploadPack: advertise object-info command if enabled
Change-Id: Iad8e5b5f4fdd84bd275eb19ee0d01eb6986d79f2
2023-02-01 10:52:33 +01:00
Han-Wen NIenhuys 66b871b777 Merge "Move MemRefDatabase creation in a separate method." 2023-02-01 04:15:44 -05:00
Matthias Sohn 580cb13f21 Merge branch 'stable-6.4'
* stable-6.4:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Id0ebfbd85eb815716383b9495eb7dd1f54cf4d74
2023-02-01 01:23:34 +01:00
Matthias Sohn ef010db594 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iefcf5d832bd0087c1027876f2200689e1150abce
2023-02-01 01:12:06 +01:00
Matthias Sohn 82e1362e07 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I2ff386d9a096277360e6c7bd5535b49984620fb3
2023-02-01 01:10:56 +01:00
Matthias Sohn d8c02aec6a Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iff2fba026b49463016015b2fae1a42cf76ee2dbb
2023-02-01 00:54:30 +01:00
Matthias Sohn b5de5ccb9e Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
2023-02-01 00:41:52 +01:00
Matthias Sohn da21265a14 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
2023-02-01 00:33:20 +01:00
Luca Milanesio 21e902dd7f Shortcut during git fetch for avoiding looping through all local refs
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.

When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.

Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s

A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.

Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b
2023-02-01 00:07:45 +01:00
Matthias Sohn 7650832002 FetchCommand: fix fetchSubmodules to work on a Ref to a blob
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.

Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4
2023-01-31 23:52:20 +01:00
Matthias Sohn 8040936f8a Silence API warnings introduced by I466dcde6
Change-Id: I510510da34d33757c2f83af8cd1e26f6206a486a
2023-01-31 23:45:07 +01:00
Luca Milanesio ad977f1572 Allow the exclusions of refs prefixes from bitmap
When running GC.repack() against a repository with over a
thousand refs/heads and tens of millions of ObjectIds,
calculating the bitmaps associated with all the refs
would result in an unreasonably large file that could take
several hours to compute.

Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)

Such a large bitmap file is also slow to process at runtime
and has negligible or even negative benefit, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.

It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and to give the ability to exclude some ref
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.

Example: Gerrit sandbox branches may even be actively
used and selected automatically because their commits are very
recent; however, they may bloat the bitmap, making it ineffective.

A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may reach tens to hundreds of
thousands.

Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-01-31 17:14:09 -05:00
Dmitrii Filippov 0f3a3fde95 Move MemRefDatabase creation in a separate method.
The InMemoryRepository is used in tests (e.g. in gerrit tests) and it
can be useful to create a custom MemRefDatabase for some tests.

Change-Id: I6fbbbfe04400ea1edc988c8788c8eeb06ca8480a
2023-01-31 13:55:25 -05:00
Luca Milanesio e4529cd39c PackWriterBitmapPreparer: do not include annotated tags in bitmap
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.

Sample use-case:

refs/heads/main
  ^
  |
 commit1 <-- commit2 <- annotated-tag1 <- tag1
  ^
  |
 commit0

When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.

The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitmapPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.

Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.
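
The additional check described above can be sketched as follows. This is a simplified model under stated assumptions: object ids are plain strings, and the class and method names are hypothetical rather than taken from PackWriterBitmapPreparer.

```java
import java.util.Set;

// Hypothetical sketch of the exclusion check for a ref that may point
// to an annotated tag.
public class TagExclusionSketch {

    // Before the fix (as described in the message): only the peeled id
    // was looked up in the exclusion set.
    static boolean excludedPeeledOnly(String objectId, String peeledId,
            Set<String> excluded) {
        return excluded.contains(peeledId);
    }

    // After the fix: the original object id (the annotated tag itself)
    // is checked as well.
    static boolean excluded(String objectId, String peeledId,
            Set<String> excluded) {
        return excluded.contains(peeledId) || excluded.contains(objectId);
    }
}
```

In the sample graph, tag1 has the object id of annotated-tag1 and peels to commit2. With an exclusion set containing only annotated-tag1, the first check misses it and the second one catches it.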

Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotated tags)
# packfiles: 2.7G

Before this change:
GC time: 5h
clone --bare time: 7 mins

After this change:
GC time: 20 mins
clone --bare time: 3 mins

Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
2023-01-31 14:15:56 +01:00
Matthias Sohn 611412a055 BatchingProgressMonitor: avoid int overflow when computing percentage
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.
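
The overflow can be reproduced in isolation. The sketch below is not the BatchingProgressMonitor code; the class and method names are made up for illustration, and widening to long is one common way to avoid the problem, not necessarily the one used in this change.

```java
// Hypothetical demo of the int overflow in a percentage computation.
public class PercentOverflowDemo {

    // Overflow-prone: lastWork * 100 is evaluated in 32-bit int arithmetic
    // and wraps around once the product exceeds Integer.MAX_VALUE.
    static int percentInt(int lastWork, int totalWork) {
        return lastWork * 100 / totalWork;
    }

    // Widening to long before multiplying keeps the intermediate product
    // in range for any int inputs.
    static int percentLong(int lastWork, int totalWork) {
        return (int) (lastWork * 100L / totalWork);
    }
}
```

For 30,000,000 of 40,000,000 objects the product is 3,000,000,000, which does not fit in an int, so `percentInt` returns a negative value while `percentLong` returns 75.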

Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
2023-01-31 14:15:53 +01:00
Xing Huang 66ad43a6c7 DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats
We are adding commit-graph loading to the DFS stack and the stats object doesn't have fields to track that.

This change replicates the stats of the primary index for the commit-graph.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I4a657bed50083c4ae8bc9f059d4943d612ea2d49
2023-01-25 15:29:04 -06:00
Matthias Sohn cd3fc7a299 Speedup GC listing objects referenced from reflogs
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.

Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.

This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.

Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4
2023-01-23 17:19:14 +01:00
Matthias Sohn a1901305b2 Merge branch 'stable-6.4'
* stable-6.4:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: Iea8038dfde29ab988501469f86ee829e578a2fe8
2023-01-13 19:33:54 +01:00
Matthias Sohn 14300dd77b Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: I1db2b51ae8101f345d08235d4f3dc416bfcb42d5
2023-01-13 19:32:56 +01:00
Matthias Sohn 5bd2832134 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: Ibc9bffab8c9ef9c39384b53c142d99878f7f3f98
2023-01-13 19:32:06 +01:00
Matthias Sohn 9eef6790cf Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: I9e876f72f735f58bf02c7862a3d8e657fc46a7b9
2023-01-13 19:31:18 +01:00
Nasser Grainawi 21b2aef0aa Cache trustFolderStat/trustPackedRefsStat value per-instance
Instead of re-reading the config every time the methods using these
values were called, cache the config value at the time of instance
construction. Caching the values improves performance for each of the
method calls. These configs are set based on the filesystem storing the
repository and unlikely to change while an application is running.

Change-Id: I1cae26dad672dd28b766ac532a871671475652df
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2023-01-13 18:45:02 +01:00
Kaushik Lingarkar fed1a54935 Refresh 'objects' dir and retry if a loose object is not found
A new loose object may not be immediately visible on a NFS
client if it was created on another client. Refreshing the
'objects' dir and trying again can help work around the NFS
behavior.

Here's an E2E problem that this change can help fix. Consider
a Gerrit multi-primary setup with repositories based on NFS.
Add a new patch-set to an existing change and then immediately
fetch the new patch-set of that change. If the fetch is handled
by a Gerrit primary different than the one which created the
patch-set, then we sometimes run into a MissingObjectException
that causes the fetch to fail.
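
The refresh-and-retry pattern can be sketched generically as follows. The names are hypothetical and the lookup and refresh steps are passed in as callbacks; this does not show how JGit's object directory implements them.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: look up once, and if nothing is found refresh the
// stale view (e.g. a directory listing cached by an NFS client) and try
// exactly one more time.
public class RetryAfterRefreshSketch {

    static <T> Optional<T> lookupWithRefresh(Supplier<Optional<T>> lookup,
            Runnable refresh) {
        Optional<T> first = lookup.get();
        if (first.isPresent()) {
            return first;
        }
        refresh.run();
        return lookup.get();
    }
}
```

A single retry keeps the cost bounded: an object that is really missing still fails after one extra refresh.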

Bug: 581317
Change-Id: Iccc6676c68ef13a1e8b2ff52b3eeca790a89a13d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-01-13 18:44:35 +01:00
kylezhao de7d06775c RevWalk: integrate commit-graph with commit parsing
RevWalk#createCommit() will inspect the commit-graph file to find the
specified object's graph position and then return a new RevCommitCG
instance.

RevCommitCG is a RevCommit with an additional "pointer" (the position)
to the commit-graph, so it can load the headers and metadata from there
instead of the pack. This saves IO access in walks where the body is not
needed (i.e. #isRetainBody is false and #parseBody is not invoked).

RevWalk automatically uses the commit-graph if available, no action
needed from callers. The commit-graph is fetched on first access from
the reader (that internally can keep it loaded and reuse it between
walks).

The startup cost of reading the entire commit graph is small. After
testing, reading a commit-graph with 1 million commits takes less than
50ms. If we use RepositoryCache, it will not be initialized until the
commit-graph is rewritten.

Bug: 574368
Change-Id: I90d0f64af24f3acc3eae6da984eae302d338f5ee
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-10 14:56:33 +08:00
Matthias Sohn 801a56b48a Merge branch 'stable-6.4'
* stable-6.4:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: I93ad0c49b70113134026364c9f647de89d948693
2023-01-06 22:09:55 +01:00
kylezhao 05e5e9907c GC: disable writing commit-graph for shallow repos
In shallow repos, GC writes to the commit-graph that shallow commits
do not have parents. This won't be true after a "git fetch --unshallow"
(and before another GC).

Do not write the commit-graph from shallow clones of a repo. The
commit-graph must have the real metadata of commits and that is not
available in a shallow view of the repo.

Change-Id: Ic9f2358ddaa607c74f4dbf289c9bf2a2f0af9ce0
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-06 13:13:13 -05:00
Matthias Sohn 6a35235d16 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: I18d9fc89c9ac1ef069dcefa7d7f992a28539ccf3
2023-01-05 16:09:58 +01:00
Matthias Sohn e4c2331af6 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: I48b6c095ac62dc859829d6fef45325accbb0a144
2023-01-05 16:05:14 +01:00
Matthias Sohn 62ed46da16 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: Ic78630f74c72624932a384eed52ef79ae1eff3e5
2023-01-05 15:55:19 +01:00
Kaushik Lingarkar 82b5aaf7e3 Introduce core.trustPackedRefsStat config
Currently, we always read packed-refs file when 'trustFolderStat'
is false. Introduce a new config 'trustPackedRefsStat' which takes
precedence over 'trustFolderStat' when reading packed refs. Possible
values for this new config are:

* always: Trust packed-refs file attributes
* after_open: Same as 'always', but refresh the file attributes of
              packed-refs before trusting it
* never: Always read the packed-refs file
* unset: Fallback to 'trustFolderStat' to determine if the file
  attributes of packed-refs can be trusted

Folks whose repositories are on NFS and have traditionally been
setting 'trustFolderStat=false' can now get some performance improvement
with 'trustPackedRefsStat=after_open' as it refreshes the file
attributes of packed-refs (at least on some NFS clients) before
considering it.

For example, consider a repository on NFS with ~500k packed-refs. Here
are some stats which illustrate the improvement with this new config
when reading packed refs on NFS:

trustFolderStat=true trustPackedRefsStat=unset: 0.2ms
trustFolderStat=false trustPackedRefsStat=unset: 155ms
trustFolderStat=false trustPackedRefsStat=after_open: 1.5ms
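
The precedence between the two settings can be modelled as a small decision function. This is only a sketch of the rules listed above: the enum and method names are assumptions, and the real decision is made inside JGit's ref database code.

```java
// Hypothetical model of how trustPackedRefsStat overrides trustFolderStat.
public class PackedRefsStatSketch {

    enum TrustPackedRefsStat { ALWAYS, AFTER_OPEN, NEVER, UNSET }

    enum Action {
        TRUST_ATTRIBUTES,              // rely on cached file attributes
        REFRESH_THEN_TRUST_ATTRIBUTES, // refresh attributes first, then rely on them
        ALWAYS_READ                    // re-read packed-refs every time
    }

    static Action decide(TrustPackedRefsStat mode, boolean trustFolderStat) {
        switch (mode) {
        case ALWAYS:
            return Action.TRUST_ATTRIBUTES;
        case AFTER_OPEN:
            return Action.REFRESH_THEN_TRUST_ATTRIBUTES;
        case NEVER:
            return Action.ALWAYS_READ;
        default:
            // UNSET: fall back to trustFolderStat.
            return trustFolderStat ? Action.TRUST_ATTRIBUTES
                    : Action.ALWAYS_READ;
        }
    }
}
```

The three measured configurations map to TRUST_ATTRIBUTES, ALWAYS_READ and REFRESH_THEN_TRUST_ATTRIBUTES respectively.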

Change-Id: I00da88e4cceebbcf3475be0fc0011ff65767c111
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-01-05 15:52:36 +01:00
Matthias Sohn 8ef58089a8 RefDatabase: fix javadoc formatting
Change-Id: I547819ac380a0e6a88d05206ff171b69f46a8549
2023-01-04 23:51:30 +01:00
Matthias Sohn ddf1c1ed3c Pull up additionalRefsNames from RefDirectory to RefDatabase
This enables reusing this constant in all RefDatabase implementations.

Change-Id: I13d8fb780de24f71e005b698965fb5bcdbf3c728
2023-01-04 23:51:30 +01:00
Matthias Sohn 70b436b1b2 Add TernarySearchTree
A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.

Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.

Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.

They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.

Searching by prefix returns all keys if the prefix is an empty string.
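
The structure described above can be sketched in a few lines. This is a minimal illustration, not the TernarySearchTree class added by this commit: the class name `TstSketch`, its methods, and the restriction to non-empty keys and non-null values are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal ternary search tree with prefix search.
public class TstSketch<V> {

    private final class Node {
        final char c;
        V value;         // non-null only if a key ends at this node
        Node lo, eq, hi; // lo kid, equal kid, hi kid

        Node(char c) {
            this.c = c;
        }
    }

    private Node root;

    public void put(String key, V value) {
        if (key.isEmpty() || value == null) {
            throw new IllegalArgumentException("empty key or null value");
        }
        root = put(root, key, 0, value);
    }

    private Node put(Node n, String key, int i, V value) {
        char c = key.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.c) n.lo = put(n.lo, key, i, value);
        else if (c > n.c) n.hi = put(n.hi, key, i, value);
        else if (i < key.length() - 1) n.eq = put(n.eq, key, i + 1, value);
        else n.value = value;
        return n;
    }

    public V get(String key) {
        if (key.isEmpty()) return null;
        Node n = find(key);
        return n == null ? null : n.value;
    }

    // Returns the node matching the last character of a non-empty key.
    private Node find(String key) {
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = key.charAt(i);
            if (c < n.c) n = n.lo;
            else if (c > n.c) n = n.hi;
            else if (i < key.length() - 1) { n = n.eq; i++; }
            else return n;
        }
        return null;
    }

    // Returns all keys starting with the prefix, in sorted order.
    // An empty prefix returns every key.
    public List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) {
            collect(root, new StringBuilder(), out);
            return out;
        }
        Node n = find(prefix);
        if (n == null) return out;
        if (n.value != null) out.add(prefix);
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    // In-order traversal: lo subtree, this node and its equal kid, hi subtree.
    private void collect(Node n, StringBuilder path, List<String> out) {
        if (n == null) return;
        collect(n.lo, path, out);
        path.append(n.c);
        if (n.value != null) out.add(path.toString());
        collect(n.eq, path, out);
        path.setLength(path.length() - 1);
        collect(n.hi, path, out);
    }
}
```

Storing ref names such as `refs/heads/main` as keys and calling `keysWithPrefix("refs/heads/")` walks only the subtree under the shared prefix, which is the property that makes the structure attractive for prefix lookups in a ref database.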

Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac
2023-01-04 23:51:23 +01:00