Commit Graph

3289 Commits

Author SHA1 Message Date
Pavel Salamon 0518a6b0c1 Change config pull.rebase=preserve to pull.rebase=merges
The native git option to preserve merge commits during rebase
has been changed from pull.rebase=preserve to pull.rebase=merges.

This changeset in jgit makes the same config change. The old "preserve"
option is no longer recognized and is replaced by new option called
"merges".

This makes jgit's rebase configuration compatible with native git
versions 2.34 and newer where the old "preserve" option has been
removed.

Change-Id: Ic07ff954e258115e76465a1593ef3259f4c418a3
2023-02-28 23:44:41 +01:00
Matthias Sohn 2d0b908048 BatchingProgressMonitor: expose time spent per task
Display elapsed time per task if enabled via
ProgressMonitor#showDuration or if system property or environment
variable GIT_TRACE_PERFORMANCE is set to "true". If both the system
property and the environment variable are set the system property takes
precedence.

E.g. using jgit CLI:

$ GIT_TRACE_PERFORMANCE=true jgit clone https://foo.bar/foobar
Cloning into 'foobar'...
remote: Counting objects: 1 [0.002s]
remote: Finding sources: 100% (15531/15531) [0.006s]
Receiving objects:      100% (169737/169737) [13.045s]
Resolving deltas:       100% (67579/67579) [1.842s]

Change-Id: I4d624e7858b286aeddbe7d4e557589986d73659e
2023-02-27 16:41:33 -05:00
Ivan Frade ca2c57b2ec PackWriter: offer to write an object-size index for the pack
PackWriter callers tell the writer what do the want to include in the
pack and invoke #writePack(). Afterwards, they can invoke #writeIndex()
to write the corresponding pack index.

Mirror this for the object-size index, adding a #writeObjectSizeIndex()
method.

Change-Id: Ic319975c72c239cd6488303f7d4cced797e6fe00
2023-02-24 12:56:33 -08:00
Matthias Sohn 176f17d05e Merge branch 'stable-6.4'
* stable-6.4:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Ia2797b44a60342eb9df53f0b3d674cba92a512fc
2023-02-22 21:06:41 +01:00
Matthias Sohn f4eda3360a Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I5af34c92e423a651db53b4dc45ed844d5f39910d
2023-02-22 21:05:55 +01:00
Matthias Sohn 636f377e4e Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I5b6b10622b61fde3f0f10455a74ae159a0b69082
2023-02-22 21:03:52 +01:00
Matthias Sohn 6cc741aa23 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I3ffe92566cc145053bb762f612dd96bc6d542c62
2023-02-22 21:03:22 +01:00
Matthias Sohn b526829fba Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Idea23e555c024557d7e39a86efe25f609400b962
2023-02-22 21:02:47 +01:00
Matthias Sohn 238f1693f7 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I708d0936fa86b028e4da4e7e21f332f8b48ad293
2023-02-22 21:02:09 +01:00
Matthias Sohn 1691e38779 Fix GcConcurrentTest#testInterruptGc
With the new GC.PidLock interrupting a running GC throws a
ClosedByInterruptException.

Change-Id: I7ccea1ae9a43d4edfdab2fcfd1324c64cc22b38f
2023-02-22 20:38:29 +01:00
kylezhao d789fe2f4d UploadPack: use allow-any-sha1-in-want configuration
C git 2.11 supports setting the equivalent of RequestPolicy.ANY with
uploadpack.allowAnySHA1InWant[1]. Parse this into TransportConfig and
use it from UploadPack.

Add additional tests for [2] and this change.

We can execute "git clone --filter=blob:none --no-checkout" successfully
with config uploadPack.allowFilter is true. But when we checkout, the
git will fetch other missing objects required by the checkout(this is
why we need this config).

When both uploadPack.allowFilter and uploadPack.allowAnySHA1InWant are
true, jgit will support partial clone. If you are using an extremely
large monorepo, this feature can help. It allows users to work on an
incomplete repo which reduces disk usage.

[1] f8edeaa05d
[2] change Id39771a6e42d8082099acde11249306828a053c0

Bug: 573390
Change-Id: I8fe75f03bf1fea7c11e0d67c8637bd05dd1f9b89
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-02-21 09:11:21 +01:00
Ivan Frade c9552abaf3 PackObjectSizeIndex: interface and impl for the object-size index
Operations like "clone --filter=blob:limit=N" or the "object-info"
command need to read the size of the objects from the storage. An
index would provide those sizes at once rather than having to seek in
the packfile.

Introduce an interface for the Object-size index. This index returns
the inflated size of an object. Not all objects could be indexed (to
limit memory usage).

This implementation indexes only blobs (no trees, nor
commits) *above* certain size threshold (configurable). Lower
threshold adds more objects to the index, consumes more memory and
provides better performance. 0 means "all blobs" and -1 "disabled".

If we don't index everything, for the filter use case is more
efficient to index the biggest objects first: the set is small and
most objects are filtered by NOT being in the index. For the
object-size, the more objects in the index the better, regardless
their size. All together, it is more helpful to index above threshold.

Change-Id: I9ed608ac240677e199b90ca40d420bcad9231489
2023-02-14 11:50:29 -08:00
Ivan Frade 62d0e7be7c UInt24Array: Array of unsigned ints encoded in 3 bytes.
The object size index stores positions of objects in the main
index (when ordered by sha1). These positions are per-pack and usually
a pack has <16 million objects (there are exceptions but rather
rare). It could save some memory storing these positions in three bytes
instead of four. Note that these positions are sorted and always positive.

Implement a wrapper around a byte[] to access and search "ints" while
they are stored as unsigned 3 bytes.

Change-Id: Iaa26ce8e2272e706e35fe4cdb648fb6ca7591972
2023-02-14 10:19:12 -08:00
Ivan Frade 5b9ca7df42 PackIndex: expose the position of an object-id in the index
The primary index returns the offset in the pack for an
objectId. Internally it keeps the object-ids in lexicographical order,
but doesn't expose an API to find the position of an object-id in that
list. This is needed for the object-size index, that we want to store
as "position-in-idx, size".

Add a #findPosition(object-id) method to the PackIndex interface to
know where an object-id sits in the ordered list of ids in the pack.

Note that this index position is over the list of ordered object-ids,
while reverse-index position is over the list of objects in packed
order.

Change-Id: I89fa146599e347a26d3012d3477d7f5bbbda7ba4
2023-02-14 10:01:29 -08:00
Xing Huang df5b7959be DfsPackFile/DfsGC: Write commit graphs and expose in pack
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.

The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The pack is stored as extra stream in the GC
pack. DfsPackFile mimicks how other indices are loaded storing the
reference in DFS cache.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
2023-02-07 16:59:56 -05:00
Han-Wen NIenhuys a1fa0ee679 Merge "UploadPack: consume delimiter in object-info command" 2023-02-02 09:09:25 -05:00
Han-Wen NIenhuys f94ab7680c Merge "PatchApplier fix - init cache with provided tree" 2023-02-02 09:00:56 -05:00
Han-Wen Nienhuys 341116103e UploadPack: consume delimiter in object-info command
The 'size' packet line is an argument, so it
must be preceeded by a 0001 delimiter. See also git's
t5701-git-serve.sh test,

https://github.com/git/git/blob/8b8d9a2/t/t5701-git-serve.sh#L329

Without this fix, the server will choke on the delimiter line, saying
PackProtocolException: unexpected <empty string>

To test, I ran Gerrit locally with this fix

$ curl -X POST   -H 'git-protocol: version=2'   -H 'content-type:
application/x-git-upload-pack-request'   -H 'accept:
application/x-git-upload-pack-result'   --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
http://localhost:8080/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000


The same command completes identically on Gitlab (which supports the
object-info command)

$ curl -X POST   -H 'git-protocol: version=2'   -H 'content-type:
application/x-git-upload-pack-request'   -H 'accept:
application/x-git-upload-pack-result'   --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
https://gitlab.com/gitlab-org/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000

In this case, the blob is for the COPYING file in the Git source tree,
which is 26859 bytes long.

Change-Id: Ief4ce1eb9303a3b2479547d7950ef01c7c28f472
2023-02-02 08:47:35 -05:00
Nitzan Gur-Furman a399bd13b1 PatchApplier fix - init cache with provided tree
This change only affects inCore repositories.
Before this change, any file that wasn't part of the patch
wasn't read, and therefore wasn't part of the output tree.

Change-Id: I246ef957088f17aaf367143f7a0b3af0f8264ffb
Bug: Google b/267270348
2023-02-02 12:39:26 +01:00
Han-Wen Nienhuys 9dfd2ff665 Avoid error-prone warning
GC.gc() returns a Future, which should not be discarded. See also
https://errorprone.info/bugpattern/FutureReturnValueIgnored

Change-Id: I343cc3cfe74a564ad7f8d53f0fe9d96a23aaed00
2023-02-01 10:53:46 +01:00
Matthias Sohn 580cb13f21 Merge branch 'stable-6.4'
* stable-6.4:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Id0ebfbd85eb815716383b9495eb7dd1f54cf4d74
2023-02-01 01:23:34 +01:00
Matthias Sohn ef010db594 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iefcf5d832bd0087c1027876f2200689e1150abce
2023-02-01 01:12:06 +01:00
Matthias Sohn 82e1362e07 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I2ff386d9a096277360e6c7bd5535b49984620fb3
2023-02-01 01:10:56 +01:00
Matthias Sohn d8c02aec6a Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iff2fba026b49463016015b2fae1a42cf76ee2dbb
2023-02-01 00:54:30 +01:00
Matthias Sohn b5de5ccb9e Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
2023-02-01 00:41:52 +01:00
Matthias Sohn da21265a14 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
2023-02-01 00:33:20 +01:00
Luca Milanesio ad977f1572 Allow the exclusions of refs prefixes from bitmap
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.

Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)

Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.

It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.

Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.

A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.

Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-01-31 17:14:09 -05:00
Luca Milanesio e4529cd39c PackWriterBitmapPreparer: do not include annotated tags in bitmap
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.

Sample use-case:

refs/heads/main
  ^
  |
 commit1 <-- commit2 <- annotated-tag1 <- tag1
  ^
  |
 commit0

When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.

The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.

Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.

Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G

Before this change:
GC time: 5h
clone --bare time: 7 mins

After this change:
GC time: 20 mins
clone --bare time: 3 mins

Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
2023-01-31 14:15:56 +01:00
Matthias Sohn 611412a055 BatchingProgressMonitor: avoid int overflow when computing percentage
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.

Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
2023-01-31 14:15:53 +01:00
kylezhao de7d06775c RevWalk: integrate commit-graph with commit parsing
RevWalk#createCommit() will inspect the commit-graph file to find the
specified object's graph position and then return a new RevCommitCG
instance.

RevCommitGC is a RevCommit with an additional "pointer" (the position)
to the commit-graph, so it can load the headers and metadata from there
instead of the pack. This saves IO access in walks where the body is not
needed (i.e. #isRetainBody is false and #parseBody is not invoked).

RevWalk uses automatically the commit-graph if available, no action
needed from callers. The commit-graph is fetched on first access from
the reader (that internally can keep it loaded and reuse it between
walks).

The startup cost of reading the entire commit graph is small. After
testing, reading a commit-graph with 1 million commits takes less than
50ms. If we use RepositoryCache, it will not be initialized util the
commit-graph is rewritten.

Bug: 574368
Change-Id: I90d0f64af24f3acc3eae6da984eae302d338f5ee
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-10 14:56:33 +08:00
Nasser Grainawi 2011fe06d2 FileSnapshotTest: Add more MISSING_FILE coverage
Add a couple tests that confirm what the docs say about isModified() and
equals(MISSING_FILE) behavior.

Change-Id: I6093040ba3594934c3270331405a44b2634b97c5
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2023-01-06 14:23:12 -07:00
kylezhao 05e5e9907c GC: disable writing commit-graph for shallow repos
In shallow repos, GC writes to the commit-graph that shallow commits
do not have parents. This won't be true after a "git fetch --unshallow"
(and before another GC).

Do not write the commit-graph from shallow clones of a repo. The
commit-graph must have the real metadata of commits and that is not
available in a shallow view of the repo.

Change-Id: Ic9f2358ddaa607c74f4dbf289c9bf2a2f0af9ce0
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-06 13:13:13 -05:00
Matthias Sohn 70b436b1b2 Add TernarySearchTree
A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.

Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.

Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.

They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.

Searching by prefix returns all keys if the prefix is an empty string.

Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac
2023-01-04 23:51:23 +01:00
kylezhao 414bfe05ff CommitGraph: teach ObjectReader to get commit-graph
FileRepository's ObjectReader#getCommitGraph will return commit-graph
when it exists and core.commitGraph is true.

DfsRepository is not supported currently.

Change-Id: I992d43d104cf542797e6949470e95e56de025107
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-04 14:50:38 +08:00
Ivan Frade 93ac99b52a Merge "CommitGraph: add commit-graph for FileObjectDatabase" 2023-01-03 14:56:53 -05:00
Thomas Wolf 9a6d602488 PatchApplier: fix handling of last newline in text patch
If the last line came from the patch, use the patch to determine whether
or not there should be a trailing newline. Otherwise use the old text.

Add test cases for
- no newline at end, last line not in patch hunk
- no newline at end, last line in patch hunk
- patch removing the last newline
- patch adding a newline at the end of file not having one

all for core.autocrlf false, true, and input.

Add a test case where the "no newline" indicator line is not the last
line of the last hunk. This can happen if the patch ends with removals
at the file end.

Bug: 581234
Change-Id: I09d079b51479b89400ad300d0662c1dcb50deab6
Also-by: Yuriy Mitrofanov <a2terminator@mail.ru>
Signed-off-by: Thomas Wolf <twolf@apache.org>
2022-12-26 11:51:25 +01:00
kylezhao 8a7348df69 CommitGraph: add commit-graph for FileObjectDatabase
This change makes JGit can read .git/objects/info/commit-graph file
and then get CommitGraph.

Loading a new commit-graph into memory requires additional time. After
testing, loading a copy of the Linux's commit-graph(1039139 commits)
is under 50ms.

Bug: 574368
Change-Id: Iadfdd6ed437945d3cdfdbe988cf541198140a8bf
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-23 13:06:06 +08:00
Thomas Wolf aeb74f63d4 Reformat PatchApplier and PatchApplierTest
Some lines were too long, unnecessary fully qualified class names,
and an assertEquals(actual, expected) when it should have been
assertEquals(expected, actual).

Change-Id: I3b3c46c963afe2fb82a79c1e93970e73778877e5
Signed-off-by: Thomas Wolf <twolf@apache.org>
2022-12-22 23:30:02 +01:00
Anna Papitto 9b7c3ac11f IO#readFully: provide overload that fills the full array
IO#readFully is often called with the intent to fill the destination
array from beginning to end. The redundant arguments for where to start
and stop filling are opportunities for bugs if specified incorrectly or
if not changed to match a changed array length.

Provide a overloaded method for filling the full destination array.

Change-Id: I964f18f4a061189cce1ca00ff0258669277ff499
Signed-off-by: Anna Papitto <annapapitto@google.com>
2022-12-19 10:26:41 -08:00
kylezhao b082c58e0f GC: Write commit-graph files when gc
If 'core.commitGraph' and 'gc.writeCommitGraph' are both true, then gc
will rewrite the commit-graph file when 'git gc' is run. Defaults to
false while the commit-graph feature matures.

Bug: 574368
Change-Id: Ic94cd69034c524285c938414610f2e152198e06e
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-16 11:11:45 -05:00
kylezhao 7016e2ddae CommitGraph: add core.commitGraph config
Change-Id: I3b5e735ebafba09ca18fd83da479c7950fa3ea8d
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-16 10:21:09 -05:00
Ivan Frade 6ea36794d1 Merge "Gc#deleteOrphans: avoid dependence on PackExt alphabetical ordering" 2022-12-16 08:20:24 -05:00
kylezhao 7b0f633b67 CommitGraph: implement commit-graph read
Git introduced a new file storing the topology and some metadata of
the commits in the repo (commitGraph). With this data, git can browse
commit history without parsing the pack, speeding up e.g.
reachability checks.

This change teaches JGit to read commit-graph-format file, following
the upstream format([1]).

JGit can read a commit-graph file from a buffered stream, which means
that we can provide this feature for both FileRepository and
DfsRepository.

[1] https://git-scm.com/docs/commit-graph-format/2.21.0

Bug: 574368
Change-Id: Ib5c0d6678cb242870a0f5841bd413ad3885e95f6
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-16 06:57:06 -05:00
Anna Papitto 5c6c374ff6 Gc#deleteOrphans: avoid dependence on PackExt alphabetical ordering
Deleting orphan files depends on .pack and .keep being reverse-sorted
to before the corresponding index files that could be orphans. The new
reverse index file extension (.rev) will break that frail dependency.

Rewrite Gc#deleteOrphans to avoid that dependence by tracking which pack
names have a .pack or .keep file and then deleting any index files that
without a corresponding one. This approach takes linear time instead of
the O(n logn) time needed for sorting.

Change-Id: If83c378ea070b8871d4b01ae008e7bf8270de763
Signed-off-by: Anna Papitto <annapapitto@google.com>
2022-12-15 11:54:11 -08:00
Matthias Sohn ebc1f7d65c commitgraph package: fix exports/imports, add @since tag for new API
Change-Id: I9175b1d796f91f5ba4e21d3418550ae451c054b0
2022-12-08 02:00:58 +01:00
kylezhao cf70e7cbe4 CommitGraph: implement commit-graph writer
Teach JGit to write a commit-graph formatted file by walking commit
graph from specified commit objects.

See: https://git-scm.com/docs/commit-graph-format/2.21.0

Bug: 574368
Change-Id: I34f9f28f8729080c275f86215ebf30b2d05af41d
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-06 20:34:46 +08:00
Matthias Sohn 339b38340f Prepare 6.4.1-SNAPSHOT builds
Change-Id: I860bfde113c05015c41304c4a77c44c224bd0923
2022-11-30 15:41:41 +01:00
Matthias Sohn acd079b372 JGit v6.4.0.202211300538-r
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: If4001b255a209849b4acabd2083164d0794f00c4
2022-11-30 11:38:12 +01:00
Dmitrii Filippov cb9f058f9b Fix crashes on rare combination of file names
The NameConflictTreeWalk class is used in merge for iterating over
entries in commits. The class uses a separate iterator for each
commit's tree. In rare cases it can incorrectly report the same entry
twice. As a result, duplicated entries are added to the merge result
and later jgit throws an exception when it tries to process merge
result.

The problem appears only when there is a directory-file conflict for
the last item in trees. Example from the bug:
Commit 1:
* subtree - file
* subtree-0 - file
Commit 2:
* subtree - directory
* subtree-0 - file
Here the names are ordered like this:
"subtree" file <"subtree-0" file < "subtree" directory.

The NameConflictTreeWalk handles similar cases correctly if there are
other files after subtree... in commits - this is processed in the
AbstractTreeIterator.min function. Existing code has a special
optimization for the case, when all trees are pointed to the same
entry name - it skips additional checks. However, this optimization
incorrectly skips checks if one of trees reached the end.

The fix processes a situation when some trees reached the end, while
others are still point to an entry.

bug: 535919
Change-Id: I62fde3dd89779fac282479c093400448b4ac5c86
2022-11-29 10:49:27 +01:00
Han-Wen NIenhuys 1d5a6c77a6 Merge "Fix crashes on rare combination of file names" 2022-11-28 09:34:46 -05:00