Commit Graph

5820 Commits

Author SHA1 Message Date
Jonathan Tan ec11129b1d Merge changes I8c60d970,I09bdd4b8,I87ff3933
* changes:
  Pack: open reverse index from file if present
  PackReverseIndex: open file if present otherwise compute
  PackReverseIndex: verify checksums
2023-07-26 16:39:13 -04:00
Ivan Frade c8f5a3f99d RevCommitCG: Read changed-path-filters directly from commit graph
RevCommit and RevCommitCG were designed like "pointers" to data that
load the content on demand, not on construction. This saves memory.

Make the loading of changed-path-filter follow the same pattern. The
ChangedPathFilters are only pointers to locations in the commit-graph
(not the actual data), so the memory saving is not that big, but this
is more consistent with the rest of the API.

As 6.7 is not released, we can still change the RevWalk API.

Change-Id: Id4186ea744b8a2418d0329facae69f785108d356
2023-07-26 12:22:01 +02:00
Matthias Sohn 68c08265aa Add missing @since tags for new API methods
This was missed in d3b40e72ac.

Change-Id: I6e90157c6be34ae6618e246b02cf77631c8e9732
2023-07-25 23:22:46 +02:00
Jonathan Tan c77fb93478 Merge "Identify a commit that generates a diffEntry on a rename Event." 2023-07-25 12:09:40 -04:00
Ronald Bhuleskar ec3d919aa5 Identify a commit that generates a diffEntry on a rename Event.
When using FollowFilter's rename callback, a callback is generated with the diff. The caller that is interested in the renames knows what the diff's are but have no idea what commit generated that diff.

This will allow FollowFilter's rename callback to track diffEntry for a given commit.

Change-Id: If1e63ccd19fdcb9c58c59137110fe24e0ce023d2
2023-07-24 19:42:51 -04:00
Jonathan Tan 0f4af2bc36 Merge changes I60a92463,Ic3b68220
* changes:
  PackReverseIndexV1: reverse index parsed from version 1 file
  ComputedPackReverseIndex: Clarify custom bucket sort algorithm
2023-07-21 14:05:38 -04:00
Anna Papitto f196c7a0e8 Pack: open reverse index from file if present
The reverse index for a pack is still always computed if needed, which
is slower than parsing it from a file.

Supply the file path where the reverse index file might be so that it
parsed instead of computed if the file is present.

Change-Id: I8c60d970fd587341dfb2763fb87f1c586279f2a5
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:26 -07:00
Anna Papitto 000e7caf5e PackReverseIndexV1: reverse index parsed from version 1 file
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
insertion sort, which has an average runtime of O(n^2).

Cgit persists a pack reverse index file
to avoid recomputing the reverse index ordering. Instead they write a
file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into the in-memory reverse index
each time it is needed.

PackReverseIndexV1 parses a reverse index file with the official
version 1 format into an in-memory representation of the reverse index
which implements methods to find an object's forward index position
from its offset in logorithmic time.

Change-Id: I60a92463fbd6a8cc9c1c7451df1c14d0a21a0f64
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:26 -07:00
Anna Papitto 2eba4e5b41 PackReverseIndex: open file if present otherwise compute
The existing #read and #computeFromIndex static builder methods require
the caller to choose whether to supply an input stream of a reverse
index file or a forward index to compute the reverse index from, which
is slower.

Allow a caller to provide a file path where the pack's reverse index
might be and the pack's forward index index and simply get some reverse
index instance back. Prefer opening and parsing the file if it is
present, to save computation time. Otherwise, fall back onto computing
the reverse index from the pack's forward index.

Change-Id: I09bdd4b813ad62c86add586417b2ab86e9331aec
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:26 -07:00
Anna Papitto 8123dcd699 PackReverseIndex: verify checksums
The new version 1 file-based reverse index has a footer with the
checksum of the corresponding pack file and a checksum of its own
contents. The initial implementation doesn't enforce that the pack
checksum matches the checksum found in the forward index nor that the
self checksum matches the contents of the file just read in.

Offer a method for reverse index users to verify the checksums in a way
appropriate to the version being used. For the pre-existing computed
version, always succeed since it is not based on a file so there is no
possibility of corruption.

Check for corruption of the file itself during parsing the checksum
footer, by comparing the self checksum with the digest of the file
contents read.

Change-Id: I87ff3933cf1afa76663350400b616695e4966cb6
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:26 -07:00
Anna Papitto 7d2669587f ComputedPackReverseIndex: Clarify custom bucket sort algorithm
The ComputedPackReverseIndex uses a custom sorting algorithm, based on
bucket sort with insertion sort but with the data managed as a linked
list across two int arrays. This custom algorithm relies on the set of
values being sorted being exactly 0, ..., n-1; so that they can serve a
second purpose of being indexes into a second equally sized list.

This custom algorithm was introduced ~10 years ago in
6cc532a43c.
The original author is no longer an active contributor, so it is
valuable for the code to be readable, especially as there is currently
active work on reverse indexes.

Rename variables and add comments to clarify the algorithm and improve
readability. There are no functional changes to the algorithm.

Change-Id: Ic3b682203f20e06f9f865f81259e034230f9720a
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:20 -07:00
Ronald Bhuleskar 3b77e33ad8 CommitGraphWriter: add option for writing/using bloom filters
Currently, bloom filters are written and used without any way to turn
them off. Add a per-repo config variable to control whether bloom
filters are written. As for reading, add a JGit option to control this.
(A JGit option is used instead of a per-repo config variable as there is
usually no reason not to use the bloom filters if they are present, but
a global control to disable them is useful if there turns out to be an
issue with the implementation of bloom filters.)

The config that controls reading is the same as C Git, but the config
for writing is not: C Git has no config to control writing, but whether
bloom filters are written depends on whether bloom filters are already
present and what arguments are passed to "git commit-graph write". See
the manpage of "git commit-graph" for more information.

Change-Id: I1b7b25340387673506252b9260b22bfe147bde58
2023-07-18 14:21:48 -07:00
Jonathan Tan 77aec62141 CommitGraphWriter: reuse changed path filters
Teach CommitGraphWriter to reuse changed path filters that have been
read from the commit graph file whenever possible.

Change-Id: I1acbfa1613ca7198386a49209028886af360ddb6
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
2023-07-18 14:21:48 -07:00
Jonathan Tan d3b40e72ac RevWalk: use changed path filters
Teach RevWalk, TreeRevFilter, PathFilter, and FollowFilter to use
changed path filters, whenever available, to speed revision walks by
skipping commits that fail the changed path filter.

This work is based on earlier work by Kyle Zhao
(I441be984b609669cff77617ecfc838b080ce0816).

Change-Id: I7396f70241e571c63aabe337f6de1b8b9800f7ed
Signed-off-by: kylezhao <kylezhao@tencent.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
2023-07-18 14:21:48 -07:00
Jonathan Tan ff0f7c174f CommitGraphLoader: read changed-path filters
As described in the parent commit, add support for reading the BIDX and
BDAT chunks of the commit graph file, as described in man gitformat-
commit-graph(5).

This work is based on earlier work by Kyle Zhao
(I160f6b022afaa842c331fb9a086974e49dced7b2).

Change-Id: I82e02e6a3a3b758e6bf9d7bbd2198f0ffe3a331b
Signed-off-by: kylezhao <kylezhao@tencent.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
2023-07-18 14:21:48 -07:00
Jonathan Tan 49beb5ae51 CommitGraphWriter: write changed-path filters
Add support for writing the BIDX and BDAT chunks of the commit graph
file, as described in man gitformat-commit-graph(5). The ability to read
such chunks will be added in a subsequent commit.

This work is based on earlier work by Kyle Zhao
(Ib863782af209f26381e3ca0a2c119b99e84b679c).

Change-Id: Ic18e6f0eeec7da1e1ff31751aabda5e6952dbe6e
Signed-off-by: kylezhao <kylezhao@tencent.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
2023-07-18 14:21:48 -07:00
Matthias Sohn 5dc63514d0 Merge "ssh: PKCS#11 support" 2023-07-17 18:13:06 -04:00
Thomas Wolf 23758d7a61 ssh: PKCS#11 support
Support PKCS#11 HSMs (like YubiKey PIV) for SSH authentication.

Use the SunPKCS11 provider as described at [1]. This provider
dynamically loads the library from the PKCS11Provider SSH configuration
and creates a Java KeyStore with that provider. A Java CallbackHandler
is needed to feed PIN prompts from the KeyStore into the JGit
CredentialsProvider framework. Because the JGit CredentialsProvider may
be specific to a SSH session but the PKCS11Provider may be used by
several sessions, the CallbackHandler needs to be configurable per
session.

PIN prompts respect the NumberOfPasswordPrompts SSH configuration. As
long as the library asks only for a PIN, we use the KeyPasswordProvider
to prompt for it. This gives automatic integration in Eclipse with the
Eclipse secure storage, so a user has even the option to store the PIN
there. (Eclipse will then ask for the secure storage master password on
first access, so the usefulness of this is debatable.)

By default the provider uses the first PKCS#11 token (slot list index
zero). This can be overridden by a non-standard PKCS11SlotListIndex
ssh configuration entry. (For OpenSSH interoperability, also set
"IgnoreUnknown PKCS11SlotListIndex" in the SSH config file then.)

Once loaded, the provider and its shared library and the keys
contained remain available until the application exits.

Manually tested using SoftHSM. See file manual_tests.txt. Kudos to
Christopher Lamb for additional manual testing with a real YubiKey,
also on Windows.[2]

[1] https://docs.oracle.com/en/java/javase/11/security/pkcs11-reference-guide1.html
[2] https://www.eclipse.org/forums/index.php/t/1113295/

Change-Id: I544c97e1e24d05e28a9f0e803fd4b9151a76ed11
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-07-17 04:52:30 -04:00
Matthias Sohn db08835c6c GC: Remove handling of extra pack for RefTree
RefTree was packed in its own packfile, see
Icbb735be8fa91ccbf0708ca3a219b364e11a6b83.

RefTree was deleted in Ia3da7f2b82d9e365cec2ccf9397cbc47439cd150, since
it was experimental and never used productively. This change missed to
remove the extra pack handling for RefTree.

Change-Id: I8c0d0a66440c331c3d03d0e07d5629682af2a7a9
2023-07-17 00:57:21 +02:00
Matthias Sohn b2f7dc189a Remove redundant specification of type arguments
Change-Id: I8289e0a6ca9154d6411993d250176a35df7cb905
2023-07-16 15:11:17 +02:00
Ivan Frade 760bdd09b1 DfsPackParser: Create object indices if config says so
The DfsInserter writes the pack and its indices in the flush() method,
but when the writing happens via DfsPackParser, it is the parser which
writes the pack and indices. When combined with a parser, flushing the
inserter is a noop.

Add the writing of the object size index to the packparser#parse
method, mirroring how the primary index is written.

Change-Id: I52c5db153fea7e4a8ecd8b3d5de7ad21f7f81a60
2023-07-14 10:51:18 -07:00
Ivan Frade cb99ff5bbb DfsInserter: generate object size index if config says so
DfsInserter receives objects and on flush() writes a pack and its
primary index.

Teach the DfsInserter to write also the object size index if the
config says so.

Change-Id: I89308312f8fd898d4c714a9b68ff948d3663800b
2023-07-14 10:34:46 -07:00
Ivan Frade 4d2a003b91 DfsInserter: populate full size on object insertion
We need the full size of the object to populate the object size index
later.

Save the size the PackedObjectInfo while adding objects to the
pack. Then we don't need to re-read it from the pack at indexing time.

Change-Id: I5bd7ad402df60b4637038def8ef7be2ab45faf87
2023-07-14 10:25:20 -07:00
Ivan Frade 12a4a4ccaa DFSGarbargeCollector: Write object size indices
PackWriter knows how to add an object size index to the pack, but the
garbage collector is not using it yet.

Teach DfsGarbageCollector to write the object size index on
writePack(). Disable by default in the unreachable-garbage pack.

Callers control the content/presence of the index through the
PackConfig option (minBytesForObjSizeIndex) for all other packs, so
there is no need of a specific flag in DfsGarbageCollector.

Change-Id: I86f5f17310e6913381125bec4caab32dc45b7c9d
2023-07-14 10:25:06 -07:00
Ivan Frade 9dace2e6d6 DfsReader/PackFile: Implement isNotLargerThan using the obj size idx
isNotLargerThan() can avoid reading the size of a blob from disk using
the object size idx if available.

Load the object size index in the DfsPackfile following the same
pattern than the other indices. Override isNotLargerThan in DfsReader
to use the index when available.

Following CL introduces the writing of the object size index and the
tests cover this code.

Change-Id: I15c95b84c1424707c487a7d29c5c46b1a9d0ceba
2023-07-13 11:24:17 -07:00
Luca Milanesio 3a6eec9bb6 Express the explicit intention of creating bitmaps in GC
Add an explicit flag to PackWriter for allowing the
GC.repack() phase to explicitly generate bitmaps only for the
heads packfile and not for the others.

Previously the bitmap generation was conditioned to the
presence of object ids exclusion from the PackWriter.

The introduction of the bitmap generation in the PackWriter
done in Icdb0cdd66 has accidentally made the .keep files not
completely transparent, because their presence have disabled
the generation of the bitmap index, even if the generation
of bitmaps is enabled.

This bug has been an accidental consequence of the intention
of the bitmap generator to avoid generating bitmaps for the
non-heads packfile, however the implementation done by Colby
decided to use the excludeInPacks variable (see [1]) which
is unfortunately also used for excluding the packfiles having
an associated .keep file (see [2]).

[1] https://git.eclipse.org/r/c/jgit/jgit/+/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617
[2] dafcb8f6db/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/GC.java (506)

Bug: 582039
Change-Id: Id722e68d9ff4ac24e73bf765ab11017586b6766e
2023-07-05 15:30:11 +02:00
Luca Milanesio ac8d7838f0 GC: prune all packfiles after the loosen phase
When loosening the objects inside the packfiles to be pruned, make sure
that the packfile list is stable and prune all the files after the
loosening is done.

This prevents a series of exceptions previously thrown when loosening
the packfiles, due to the too early pruning of the packfiles that were
still in the pack list.

Bug: 581532
Change-Id: I776776e2e083f1fa749d53f965bf50f919823b4f
2023-07-05 15:28:16 +02:00
Anna Papitto 91b23cc552 DfsPackFile: make #getReverseIdx public
The DfsPackFile#getReverseIdx method, which wraps creating a
PackReverseIndex in caching, was package-private. This caused
implementations on top of DfsPackFile to directly instantiate a
PackReverseIndex in cases where it would benefit from caching.

Instead, make #getReverseIdx public so that the caching logic can be
reused by implementations where appropriate.

Change-Id: I4553e514a4ac320bfe2455c00023343ad97f9d15
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-06-27 13:25:29 -07:00
Anna Papitto 8e61971620 PackReverseIndex: separate out the computed implementation
PackReverseIndex is a concrete class whose implementation is computed
from a pack's forward index. Callers which have a reverse index file may
want to use an implementation that is file-based instead.

Generalize PackReverseIndex into an interface without
implementation-specific logic and separate out the logic for the
computed implementation into a new concrete class.

Change-Id: I98d9835363c5e1c8c3c11a81b0761af3cdeaa41a
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-06-21 14:04:12 -07:00
Thomas Wolf faefa90f99 Default for global (user) git ignore file
C git has a default for git config core.excludesfile: "Its default
value is $XDG_CONFIG_HOME/git/ignore. If $XDG_CONFIG_HOME is either
not set or empty, $HOME/.config/git/ignore is used instead." [1]

Implement this in the WorkingTreeIterator$RootIgnoreNode.

To make this testable, mock the "user.home" directory for all JGit
tests, otherwise tests might pick up a real user's git ignore file.
Also ensure that JGit code always reads "user.home" via the
SystemReader.

Add tests for both locations.

[1] https://git-scm.com/docs/gitignore#_description

Bug: 436127
Change-Id: Ie510259320286c3c13a6464a37da1bd9ca1e373a
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-06-19 08:19:29 +02:00
Antoine Musso 7b955048eb Fix all Javadoc warnings and fail on them
This fixes all the javadoc warnings, stops ignoring doclint 'missing'
category and fails the build on javadoc warnings for public and
protected classes and class members.

Since javadoc doesn't allow access specifiers when specifying doclint
configuration we cannot set `-Xdoclint:all,-missing/private`
hence there is no simple way to skip private elements from doclint.
Therefore we check javadoc using the Eclipse Java compiler
(which is used by default) and javadoc configuration in
`.settings/org.eclipse.jdt.core.prefs` files.
This allows more fine grained configuration.

We can reconsider this when javadoc starts supporting access specifiers
in the doclint configuration.

Below are detailled explanations for most modifications.

@inheritDoc
===========
doclint complains about explicits `{@inheritDoc}` when the parent does
not have any documentation. As far as I can tell, javadoc defaults to
inherit comments and should only be used when one wants to append extra
documentation from the parent. Given the parent has no documentation,
remove those usages which doclint complains about.

In some case I have moved up the documentation from the concrete class
up to the abstract class.

Remove `{@inheritDoc}` on overriden methods which don't add additional
documentation since javadoc defaults to inherit javadoc of overridden
methods.

@value to @link
===============
In PackConfig, DEFAULT_SEARCH_FOR_REUSE_TIMEOUT and similar are forged
from Integer.MAX_VALUE and are thus not considered constants (I guess
cause the value would depends on the platform). Replace it with a link
to `Integer.MAX_VALUE`.

In `StringUtils.toBoolean`, @value was used to refer to the
`stringValue` parameter. I have replaced it with `{@code stringValue}`.

{@link <url>} to <a>
====================
@link does not support being given an external URL. Replaces them with
HTML `<a>`.

@since: being invalid
=====================

org.eclipse.jgit/src/org/eclipse/jgit/util/Equality.java has an invalid
tag `@since: ` due to the extra `:`. Javadoc does not complain about it
with version 11.0.18+10 but does with 11.0.19.7. It is invalid
regardless.

invalid HTML syntax
===================

- javadoc doesn't allow <br/>, <p/> and </p> anymore, use <br> and <p>
instead
- replace <tt>code</tt> by {@code code}
- <table> tags don't allow summary attribute, specify caption as
<caption>caption</caption> to fix this

doclint visibility issue
========================

In the private abstract classes `BaseDirCacheEditor` and
`BasePackConnection` links to other methods in the abstract class are
inherited in the public subclasses but doclint gets confused and
considers them unreachable. The HTML documentation for the sub classes
shows the relative links in the sub classes, so it is all correct. It
must be a bug somewhere in javadoc.
Mute those warnings with: @SuppressWarnings("doclint:missing")

Misc
====
Replace `<` and `>` with HTML encoded entities (`&lt; and `&gt;`).
In `SshConstants` I went enclosing a serie of -> arrows in @literal.

Additional tags
===============
Configure maven-javad0c-plugin to allow the following additional tags
defined in https://openjdk.org/jeps/8068562:
- apiNote
- implSpec
- implNote

Missing javadoc
===============
Add missing @params and descriptions

Change-Id: I840056389aa59135cfb360da0d5e40463ce35bd0
Also-By: Matthias Sohn <matthias.sohn@sap.com>
2023-06-16 01:08:13 +02:00
Antoine Musso c7960910f0 Mark COMMIT_GENERATION_* constants final
In org.eclipse.jgit.lib.Constants the constants are all marked final
with the exception of:

- COMMIT_GENERATION_UNKOWN
- COMMIT_GENERATION_NOT_COMPUTED

They were introduced by cf70e7cbe4 without the `final` keyword while
other constants have it which certainly has been forgotten.

The javadoc `{@value}` tag causes raises a warning about the fields not
being constants which is how I have discovered the ommission.

Change-Id: I0ad87f42355440c7d50158e773a280a0526e9671
2023-06-09 16:40:35 +02:00
Luca Milanesio ff581f51e9 Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
  Revert "RefDirectory: Throw exception if CAS of packed ref list fails"

Change-Id: I7d922a92b7674723cbf6a93fb7c9bc5c0cdb8206
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-06-08 00:27:49 +01:00
Luca Milanesio b6237ca8b6 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Revert "RefDirectory: Throw exception if CAS of packed ref list fails"

Change-Id: I33049e70595f097a66e8f4a63b3d8d1c147e878e
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-06-08 00:27:00 +01:00
Luca Milanesio 880f1234b2 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Revert "RefDirectory: Throw exception if CAS of packed ref list fails"

Change-Id: I70db1bc8529eb6a66610946946da5447a578bffa
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-06-08 00:26:04 +01:00
Luca Milanesio b77fdf6df4 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Revert "RefDirectory: Throw exception if CAS of packed ref list fails"

Change-Id: I1a98e293ef10917b2d8ad64e88be9e82c7bcf693
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-06-08 00:25:12 +01:00
Luca Milanesio 42d201f46c Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Revert "RefDirectory: Throw exception if CAS of packed ref list fails"

Change-Id: Idc0d1f8ab4524868b7e9754799f70acc1d24f2cb
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-06-08 00:24:00 +01:00
Luca Milanesio a22c62cb6d Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Revert "RefDirectory: Throw exception if CAS of packed ref list fails"

Change-Id: I883b21b00317cc6d9951a8a5f9505078ddd2a3a7
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-06-08 00:21:15 +01:00
Martin Fick f6928f5736 Revert "RefDirectory: Throw exception if CAS of packed ref list fails"
This reverts commit 9c33f7364d.

Reason for revert: This change was based on the false claim that the
packedrefs file lock is held while the CAS is being done, but it is
actually released before the CAS (the in memory lock is still held,
however that does not prevent external actors from updating the
packedrefs files and then another thread from subsequently re-reading it
and updating the in memory packedRefList). Although reverting this
change can cause the CAS to fail, it should not actually matter since
the failure would indicate that another thread has already updated the
in memory packedRefList to either the same version this thread was
trying to update it too, or to a more recent version. Either way,
failing the CAS is then appropriate and should not be problematic.

Although this change reverts the code in the RefDirectory class, it
keeps the "improvements" to the test so that it continues to pass
reliably. The reason for the quotes around the word "improvements" is
because I believe the test alteration actually dramatically changes the
intent of the test, and that the original intent of the test is
untestable with the GC and RefDirectory classes as is.

Bug: 582044
Change-Id: I3acee7527bb542996dcdfaddfb2bdb45ec444db5
Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
(cherry picked from commit c5617711a1)
2023-06-07 19:08:39 -04:00
Anna Papitto 74547f4a68 PackReverseIndex: use static builder instead of constructor
PackReverseIndex instances are created using the constructor directly,
which limits control over the construction logic and refactoring
opportunities for the class itself. These will be needed for a
file-based implementation of the reverse index.

Use a static builder method to create a PackReverseIndex instance using
a pack's forward index.

Change-Id: I4421d907cd61d9ac932df5377e5e28a81679b63f
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-05-31 10:09:50 +02:00
Anna Papitto 181b629f7d Gc#writePack: write the reverse index file to disk
The reverse index is currently created in-memory when needed. A writer
for reverse index files was already implemented.

Make garbage collection write the reverse index file when the PackConfig
enables it. Write it during #writePack, which mirrors how the primary
index is written.

Change-Id: I50131af6622c41a7b24534aaaf2a423ab4178981
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-05-31 10:09:50 +02:00
Jonathan Tan 44461b215e Merge "GraphObjectIndex: fix search in findGraphPosition" 2023-05-23 18:26:47 -04:00
Jonathan Tan 6b3b2b33a5 GraphObjectIndex: fix search in findGraphPosition
In findGraphPosition, when there is no object whose OID starts with
the first byte of the sought OID, low equals high. This violates an
invariant of the loop, and when the sought OID is lexicographically
greater than every other OID in the repository, causes an
ArrayIndexOutOfBoundsException (because we're trying to read outside the
list of OIDs).

Therefore, check the "low < high" condition at the start of the loop,
not only after the first iteration.

Change-Id: Ic8ac198c151bd161c4996b9e7cb6e6660f151733
Helped-by: Ivan Frade <ifrade@google.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
2023-05-23 13:57:32 -07:00
Matthias Sohn 0eedb1affb Merge "Also add suppressed exception if unchecked exception occurs in finally" 2023-05-19 13:49:42 -04:00
Matthias Sohn 2cbf0c1774 Also add suppressed exception if unchecked exception occurs in finally
If a method called in a finally block throws an exception we should add
exceptions caught earlier to the exception we throw in the finally block
not regarding if it's a checked or unchecked exception.

Change-Id: I4c6be9a3a08482b07659ca31d6987ce719d81ca5
2023-05-18 21:10:33 +02:00
Fabio Ponciroli 334852c52f Candidate: use "Objects.equals" instead of "=="
Errorprone raises the following warning:
"[ReferenceEquality] Comparison using reference equality
instead of value equality".

Change-Id: Iacb207ef0625bb987a08406d4e7461e48fade97f
2023-05-18 05:37:11 -04:00
Matthias Sohn 6f2d93fb8d Remove unused $NON-NLS-1$
Change-Id: I3314e5106d873c03903562f9798de6af2ae588a7
2023-05-17 17:01:33 +02:00
Thomas Wolf 43954ea62a [releng] API filter for PackIndex.DEFAULT_WRITE_REVERSE_INDEX
New static final constant is a (very minor) API break that needs to be
suppressed explicitly despite @since 6.6.

Remove a number of no longer needed API filters, and fix a broken
$NON-NLS-1$.

Change-Id: Ie4b0c45e8bd1f3067b6ff81c07d4b21b50bb8685
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-05-15 20:45:22 +02:00
Ivan Frade c40683929c Merge "UploadPack: Record negotiation stats on fetchV2 call" 2023-05-11 16:53:12 -04:00
Anna Papitto 2c89a3ec74 PackExt: add a #getTmpExtension method
During garbage collection, extensions for temporary files for indices
are formatted manually.

Add a method to PackExt to generate the temporary file extensions for
each type of index file programmatically.

Change-Id: I210bc2702e750bf0aea643b1a9a8536adebef179
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-05-11 16:49:55 -04:00
Ronald Bhuleskar d0564cf8ae UploadPack: Record negotiation stats on fetchV2 call
ServiceV2 is not collecting wants/have in PackStatistics. This records
the stats for fetch and push-negotiation.

Change-Id: Iefd79f36b3d7837195e8bd9fc7007de352089e66
2023-05-11 11:55:38 -07:00
Ivan Frade 73f9f55e3b Merge "PackWriter: write the PackReverseIndex file" 2023-05-08 15:00:46 -04:00
Anna Papitto ce88e62edc PackWriter: write the PackReverseIndex file
PackWriter offers the ability to write out the pack file and its various
index files, except for the newly introduced file-based reverse index.

Now that PackReverseIndexWriter can write reverse index files,
PackWriter#writeReverseIndex will write one for a pack if the
corresponding config flag PackConfig#writeReverseIndex is on.

Change-Id: Ib75dd2bbfb9ee9366d5aacb46700d8cf8af4823a
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-05-08 11:23:30 -07:00
Matthias Sohn 74fa245b3c Merge "Fix inProcessPackedRefsLock not shared with copies of the instance" 2023-05-03 11:10:14 -04:00
Matthias Sohn 3d90c4a433 Add TransportHttp#getAdditionalHeaders
to enable inspecting which additional HTTP headers have been set on the
transport.

Change-Id: I0771be9cb7c837de7c203b7f044109b9b2a7d7ad
2023-05-03 02:40:41 +02:00
Nasser Grainawi 06cfebd066 Fix inProcessPackedRefsLock not shared with copies of the instance
The in process lock is intended to manage contention on locking the
packed-refs file within a single process without acquiring the file
system lock. Not sharing it across RefDirectory instances of the same
repository undermines that intent and results in more contention at the
file system level.

Change-Id: I68f11856aa0b4b1524f43554d7391a322a0a6897
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2023-05-02 17:14:52 -06:00
Matthias Sohn 076b8e7636 Add missing @since tag to IntComparator
Change-Id: Ic190ab404ccb3af675cdd90cac231ce6e856ea68
2023-05-01 15:34:47 +02:00
Thomas Wolf 8c0c96e0a7 Support rebasing independent branches
With completely independent branches, there is no merge base. In this
case, the list of commits must include the root commit of the branch to
be rebased.

Bug: 581832
Change-Id: I0f5bdf179d5b07ff09f1a274d61c7a0b1c0011c6
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-04-29 13:24:58 +02:00
Thomas Wolf 8bc13fb79d Support cherry-picking a root commit
Handle the case of the commit to be picked not having any parents.

Since JGit implements cherry-pick as a 3-way-merge between the commit
to be picked and the target commit, using the parent of the picked
commit as merge base, this is super simple: just don't set a base tree.
The merger will not find any merge base and will supply an empty tree
iterator for the base.

Bug: 581832
Change-Id: I88985f1b1723db5b35ce58bf228bc48d23d6fca3
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-04-29 13:24:32 +02:00
Thomas Wolf 3ed4cdda6b AddCommand: ability to switch off renormalization
JGit's AddCommand always renormalizes tracked files. C git does so only
on git add --renormalize. Especially for git add . and the JGit
equivalent git.add().addFilepattern(".").call() this can make a big
difference if there are many files, or large files.

Add a "renormalize" option to AddCommand. To maintain compatibility with
existing uses, this option is "true" by default, and the behavior of
AddCommand is as it has always been in JGit.

If set to "false", use an IndexDiffFilter (in addition to a path filter,
if any). This skips any unchanged files (that are not racily clean) from
content checks. Note that changes in CRLF settings or in filters will be
ignored for such files if renormalize == false.

Add the "--renormalize" option to the Add command in the JGit command
line program. For the command-line program, the default is as in C git:
renormalize is off by default and enabled only if the option is given.
Note that --renormalize implies --update in the command line program, as
in C git. In AddCommand, the two settings are independent.

Additionally, avoid opening input streams unnecessarily in
WorkingTreeIterator.getEntryContentLength() and fix some bogus
indentation.

Add a simple test that adds 1000 files of 10kB in 10 directories twice
and that fails if the second invocation (without any changes) with
renormalize=false is not significantly faster.

Locally, I observe for that second invocation

* git.add().addFilepattern(".").call()                        ~660ms
* git.add().addFilepattern(".").setRenormalize(false).call()   ~16ms

Bug: 494323
Change-Id: I30f9d518563fa55d7058a48c27c425f3b60aeb4c
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-04-28 17:04:47 -04:00
Matthias Sohn 2277f13041 Merge "Merge branch 'stable-6.5'" 2023-04-28 15:22:52 -04:00
Matthias Sohn 4d9db14a5e Merge branch 'stable-6.5'
* stable-6.5:
  [bazel] Move ToolTestCase to src folder (6.2)
  GcConcurrentTest: @Ignore flaky testInterruptGc
  Fix CommitTemplateConfigTest
  Fix after_open config and Snapshotting RefDir tests to work with bazel
  [bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
  Demote severity of some error prone bug patterns to warnings
  Parse pull.rebase=preserve as alias for pull.rebase=merges
  UploadPack: Fix NPE when traversing a tag chain

Change-Id: I16e8553d187a8ef541f578291f47fc39c3da4ac0
2023-04-28 19:51:01 +02:00
Anna Papitto 64615b44e6 PackReverseIndexWriter: write out version 1 reverse index file
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
bucket sort with insertion sort, which has an average runtime of
O(n log n) and worst case runtime of O(n^2); and memory usage of
3*size(int)*n because it maintains 3 int arrays, even after sorting is
completed. The computation must be performed every time that the reverse
index object is created in memory.

In contrast, Cgit persists a pack reverse index file to avoid
recomputing the reverse index ordering every time that it is needed.
Instead they write a file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into an in-memory reverse index each
time it is needed.

Introduce these reverse index files to JGit. PackReverseIndexWriter
writes out a reverse index file to be read later when needed. Subclass
PackReverseIndexWriterV1 writes a file with the official version 1
format.

To avoid temporarily allocating an Integer collection while sorting and
writing out the contents, using memory 4*size(Integer)*n, use an
IntList and its #sort method, which uses quicksort.

Change-Id: I6437745777a16f723e2f1cfcce4e0d94e599dcee
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-04-28 10:19:18 -07:00
Anna Papitto 7d3f893d31 IntList: add #sort using quick sort for O(n log n) runtime.
IntList is a class for working with lists of primitive ints without
boxing them into Integers. For writing the reverse index file format,
sorting ints will be needed but IntList doesn't provide a sorting
method yet.

Add the #sort method to sort an IntList by an IntComparator, using
quicksort, which has a average runtime of O(n log n) and sorts in-place
by using O(log n) stack frames for recursive calls.

Change-Id: Id69a687c8a16d46b13b28783b194a880f3f4c437
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-04-28 10:19:18 -07:00
Matthias Sohn 34a81889b8 Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
  [bazel] Move ToolTestCase to src folder (6.2)
  GcConcurrentTest: @Ignore flaky testInterruptGc
  Fix CommitTemplateConfigTest
  Fix after_open config and Snapshotting RefDir tests to work with bazel
  [bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
  Demote severity of some error prone bug patterns to warnings
  UploadPack: Fix NPE when traversing a tag chain

Change-Id: I6d20fea3a417e4361b61e81756253343668eb5de
2023-04-27 02:30:20 +02:00
Matthias Sohn f87c456e8a Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  [bazel] Move ToolTestCase to src folder (6.2)
  GcConcurrentTest: @Ignore flaky testInterruptGc
  Fix CommitTemplateConfigTest
  Fix after_open config and Snapshotting RefDir tests to work with bazel
  [bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
  Demote severity of some error prone bug patterns to warnings
  UploadPack: Fix NPE when traversing a tag chain

Change-Id: I463f8528e623316add204848d551c44d44d04858
2023-04-27 02:20:10 +02:00
Matthias Sohn cdf35e8ead Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  [bazel] Move ToolTestCase to src folder (6.2)
  GcConcurrentTest: @Ignore flaky testInterruptGc
  Fix CommitTemplateConfigTest
  Fix after_open config and Snapshotting RefDir tests to work with bazel
  [bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
  Demote severity of some error prone bug patterns to warnings
  UploadPack: Fix NPE when traversing a tag chain

Change-Id: I736c7d0ed9c6e9718fa98976c3dc5a25ab8cda85
2023-04-27 02:08:05 +02:00
Matthias Sohn 206f0f44f6 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  GcConcurrentTest: @Ignore flaky testInterruptGc
  Fix CommitTemplateConfigTest
  Fix after_open config and Snapshotting RefDir tests to work with bazel
  [bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
  Demote severity of some error prone bug patterns to warnings
  UploadPack: Fix NPE when traversing a tag chain

Change-Id: I9863cbce95d845efc891724898954b0b2f8dbf7b
2023-04-27 01:48:07 +02:00
Matthias Sohn 6082ae25dd Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  [bazel] Skip ConfigTest#testCommitTemplatePathInHomeDirecory
  Demote severity of some error prone bug patterns to warnings
  UploadPack: Fix NPE when traversing a tag chain

Change-Id: I5e13d5b5414aef97e518898166bfa166c692e60f
2023-04-26 21:55:16 +02:00
Matthias Sohn 032eef5b12 Parse pull.rebase=preserve as alias for pull.rebase=merges
This ensures backwards compatibility to the old config value which was
removed in git 2.34 which JGit followed in Ic07ff954e2.

Change-Id: I2b4e27fd71898b6e0e227e406c40682bd9786cd4
2023-04-22 23:43:40 +02:00
Kaushik Lingarkar 064691e90c UploadPack: Fix NPE when traversing a tag chain
Always parse RevTags including their body before getting their object
to ensure that non-cached objects are handled correctly when traversing
a tag chain. An NPE in UploadPack#addTagChain will occur on a depth=1
fetch of a branch containing a tag chain and the ref to one of the
middle tags in the chain is deleted.

Change-Id: Ifd8fe868869070b365df926fec5dcd8e64d4f521
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-04-21 02:04:35 +02:00
Matthias Sohn 4117bf9d74 Add missing @since tag for BatchRefUpdate#getRefDatabase
Change-Id: I5d37cbbd6c338e6e6bb8b95ae13a5ed9b5178a8b
2023-04-21 00:59:07 +02:00
Matthias Sohn f0829b0c46 Merge branch 'stable-6.5'
* stable-6.5:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on PackMismatchException
  Remove blank in maven.config

Change-Id: I0b03ca566053a158c6c8e75ccec8360a2f368ed9
2023-04-21 00:52:18 +02:00
Matthias Sohn 06b40b95c2 Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on
PackMismatchException
  Remove blank in maven.config

Change-Id: I89af76946014fb44bd64c20e2b01a53397768d90
2023-04-21 00:45:30 +02:00
Matthias Sohn 6456059795 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on PackMismatchException
  Remove blank in maven.config

Change-Id: I18b46be0f09535c61efabe24ab1579faa3d06ba8
2023-04-21 00:33:26 +02:00
Matthias Sohn 76aa6f2840 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on PackMismatchException
  Remove blank in maven.config

Change-Id: I8006068f16ae442a2246e043a680053f2af34e9f
2023-04-21 00:25:51 +02:00
Matthias Sohn 2ca2671b0c Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on PackMismatchException
  Remove blank in maven.config

Change-Id: I4c5b000b09287cc32f0e4d6a24a766ef4e17ddbe
2023-04-21 00:19:38 +02:00
Matthias Sohn e59ade2a6f Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on PackMismatchException
  Remove blank in maven.config

Change-Id: Ia01c5ac5259b8820afb823d97bee247b5a5fb14a
2023-04-21 00:11:40 +02:00
Matthias Sohn 40daa780ef Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on
PackMismatchException
  Remove blank in maven.config

Change-Id: Id37bee59ca3c7947604c54b6d4e7c02628a657fe
2023-04-21 00:09:10 +02:00
Matthias Sohn 393074368b Merge branch 'stable-5.12' into stable-5.13
* stable-5.12:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on
PackMismatchException
  Remove blank in maven.config

Change-Id: Ibe6652374ab5971105e62b05279f218c8c130fee
2023-04-20 16:00:30 +02:00
Matthias Sohn f164bd988d Merge branch 'stable-5.11' into stable-5.12
* stable-5.11:
  Add missing since tag for SshBasicTestBase
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on PackMismatchException
  Remove blank in maven.config

Change-Id: I25bb99687b969f9915a7cbda8d1332bec778096a
2023-04-20 15:12:01 +02:00
Matthias Sohn 48b0781cfe Merge branch 'stable-5.10' into stable-5.11
* stable-5.10:
  Add missing since tag for SshTestHarness#publicKey2
  Silence API errors
  Prevent infinite loop rescanning the pack list on
PackMismatchException
  Remove blank in maven.config

Migrated "Prevent infinite loop rescanning the pack list on
PackMismatchException" to refactoring done in
https://git.eclipse.org/r/q/topic:restore-preserved-packs

Change-Id: I0fb77bb9b498d48d5da88a93486b99bf8121e3bd
2023-04-20 14:58:50 +02:00
Matthias Sohn 26865d5a84 Merge branch 'stable-5.9' into stable-5.10
* stable-5.9:
  Prevent infinite loop rescanning the pack list on
PackMismatchException
  Remove blank in maven.config

Change-Id: I15ff2d7716ecaceb0eb87b8823d85670f5db709d
2023-04-20 14:37:30 +02:00
Matthias Sohn 96d9e3eb19 Prevent infinite loop rescanning the pack list on PackMismatchException
We found, when analysing an incident where Gerrit's gc runner thread got
stuck, that we can end up in an infinite loop in
ObjectDirectory#openPackedObject which tries to rescan the pack
list and starts over trying to open a packed object in an unconfined
loop if it catches a PackMismatchException.

Here the relevant part of a thread dump we created while the gc runner
was stuck:

"WorkQueue-2[java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@350812a3[Not
completed,
task = java.util.concurrent.Executors$RunnableAdapter@5425d7ee]]" #72
tid=0x00007f73cee1c800 nid=0x584
runnable  [0x00007f7392d57000]
   java.lang.Thread.State: RUNNABLE
	at org.eclipse.jgit.internal.storage.file.WindowCache.removeAll(WindowCache.java:716)
	at org.eclipse.jgit.internal.storage.file.WindowCache.purge(WindowCache.java:399)
	at org.eclipse.jgit.internal.storage.file.PackFile.close(PackFile.java:296)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.reuseMap(ObjectDirectory.java:973)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.scanPacksImpl(ObjectDirectory.java:904)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.scanPacks(ObjectDirectory.java:895)
	- locked <0x000000050a498f60> (a
java.util.concurrent.atomic.AtomicReference)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.searchPacksAgain(ObjectDirectory.java:794)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:465)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:417)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:408)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
	at org.eclipse.jgit.lib.ObjectReader$1.open(ObjectReader.java:279)
	at org.eclipse.jgit.revwalk.RevWalk$2.next(RevWalk.java:1031)
	at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPack(PackWriter.java:1911)
	at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:960)
	at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:876)
	at org.eclipse.jgit.internal.storage.file.GC.writePack(GC.java:1168)
	at org.eclipse.jgit.internal.storage.file.GC.repack(GC.java:852)
	at org.eclipse.jgit.internal.storage.file.GC.doGc(GC.java:269)
	at org.eclipse.jgit.internal.storage.file.GC.gc(GC.java:220)
	at org.eclipse.jgit.api.GarbageCollectCommand.call(GarbageCollectCommand.java:179)
	at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:112)
	at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:75)
	at com.google.gerrit.server.git.GarbageCollection.run(GarbageCollection.java:71)
	at com.google.gerrit.server.git.GarbageCollectionRunner.run(GarbageCollectionRunner.java:76)
	at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:103)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.18/Executors.java:515)
	at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.18/FutureTask.java:305)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.18/ScheduledThreadPoolExecutor.java:305)
	at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:612)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.18/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.18/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)

The code in ObjectDirectory#openPackedObject [1] apparently assumes that
this is caused by a transient problem which it can resume from by
retrying. We use `core.trustFolderStat = false` on this server since it
uses NFS. The incident we had showed that we can enter into an infinite
loop here if there is a permanent mismatch between a pack file and its
corresponding pack index. I am not yet sure how this can happen.

Break the infinite loop by limiting the number of attempts rescanning
the pack list to 5 retries.  When we exceed this threshold set the type
of the PackMismatchException to permanent and rethrow it which breaks
the infinite loop.

Also apply the same limit in #getPackedObjectSize
and #selectObjectRepresentation where we use similar retry loops.

[1] 011c26ff36/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/ObjectDirectory.java (465)

Change-Id: I20fb63bcc1fdc3a03d39b963f06a90e6f0ba73dc
2023-04-19 16:29:44 +02:00
Matthias Sohn f1a9adf7da PackedBatchRefUpdate#execute: reduce nesting of try-catch blocks
Change-Id: I7ddf20fcbf4971ee908b20d8df9d6328ce9f9f1b
2023-04-18 11:00:54 +02:00
Kaushik Lingarkar 8f8bc703e9 PackedBatchRefUpdate: Handle the case where loose refs fail to pack
If packing loose refs fails due to a lock failure, reject update with
a LOCK_FAILURE.

Change-Id: I100e81efd528d963231a1b87bacd9d68f9245a1b
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-04-18 09:06:27 +02:00
Matthias Sohn 2e3f12a0fc Merge branch 'stable-6.5'
* stable-6.5:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: I5bfff523b3174c7b741ab0eaf53937c3ab501252
2023-04-15 22:50:11 +02:00
Matthias Sohn 831da296d9 Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: I7f822e8a751516a32afccd180cbf6afb389f3a28
2023-04-15 21:39:03 +02:00
Matthias Sohn 4ec1252f90 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: I18cf0da3a5dcc74865c44d82e7c328329814acae
2023-04-15 21:38:27 +02:00
Matthias Sohn 34dc17ac3a Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: If0bb5f1a317ab981e6bbf5671851f124b18ab8ca
2023-04-15 21:35:16 +02:00
Matthias Sohn de5cb9a031 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: Ief50a2ca8e5a8630627506f4d2142d62c0554615
2023-04-15 21:34:41 +02:00
Matthias Sohn 20b7e9435b Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: Idf757bcab0d7a65ea63504674a681170c6db2f94
2023-04-15 00:49:59 +02:00
Matthias Sohn 273df319fe Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Remove blank in maven.config
  DirCache: support option index.skipHash

Change-Id: I0cc3033b1876c8c691c2a6876206cd71fa07d2e0
2023-04-15 00:49:08 +02:00
Pat Patterson e06ce59607 Add protocol configuration to Amazon S3 transport
Before this change, attempting to use the jgit Amazon S3 transport with an S3-compatible service that requires https (for example, Backblaze B2) results in an error:

$ jgit push b2
fatal: amazon-s3://metadaddy-jgit/repos/test/objects: error in packed-refs

This change adds a "protocol" property to the Amazon S3 transport configuration, defaulting to http, and uses that value when constructing the URL for the S3 service.

Example configuration for Backblaze B2:

accesskey: <Your B2 Application Key>
secretkey: <Your B2 Application Key Id>
acl: private
protocol: https
domain: s3.us-west-004.backblazeb2.com
region: us-west-004
aws.api.signature.version: 4

Behavior after this change:

$ jgit push b2
Counting objects:       3
Finding sources:        100% (3/3)
Getting sizes:          100% (2/2)
Compressing objects:    100% (37/37)
Writing objects:        100% (3/3)
Put pack-673f9bb.idx:   100% (1/1)
To amazon-s3://.jgit_b2@metadaddy-jgit/repos/test
 * [new branch]      main -> main

Change-Id: I03bdbb3510fb81a416c225a720178f42eec41b21
2023-04-13 08:49:48 -04:00
Matthias Sohn 060dcf1cca ListTagCommand: implement git tag --contains
Change-Id: I07e57ba098eace9656393837fad4cb3590f31b22
2023-04-12 13:56:52 +02:00
kylezhao 5cc9ecde8f RevWalk: use generation number to optimize getMergedInto()
A commit A can reach a commit B only if the generation number of A is
strictly larger than the generation number of B. This condition allows
significantly short-circuiting commit-graph walks.

On a copy of the Linux repository where HEAD is contained in v6.3-rc4
but no earlier tag, the command 'git tag --contains HEAD' of
ListTagCommand#call() had the following peformance improvement:
(excluded the startup time of the repo)

Before: 2649ms    (core.commitgraph=true)
        11909ms   (core.commitgraph=false)
After:  91ms     (core.commitgraph=true)
        11934ms   (core.commitgraph=false)

Bug: 574368
Change-Id: Ia2efaa4e9ae598266f72e70eb7e3b27655cbf85b
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-04-12 11:29:09 +08:00
Ivan Frade 89f7378da5 DfsPackFile: Extract block aligment code
Loading of pack, bitmap and commit-graph copy the same code to adjust
the input stream buffering.

Extract to a common function. Besides reusing the code, the name hints
what it is doing.

This block aligment seems unnecessary as the reading is from storage
not dfs cache. The channel probably knows better. Left a TODO because
I don't know the original intention.

Change-Id: I18b77ae8189830fcd4d5932b6b5823b693ed6090
2023-04-11 13:39:50 -07:00
Matthias Sohn 75db060673 Merge branch 'stable-6.5'
* stable-6.5:
  Ensure parsed RevCommitCG has derived data from commit-graph
  Downgrade maven-site-plugin to 3.12.1
  Use wagon-ssh-external to deploy Maven site

Change-Id: Ide721fb088fa04f6276ac495968a45e732f6e139
2023-04-06 22:16:41 +02:00
kylezhao d3ba40c803 Ensure parsed RevCommitCG has derived data from commit-graph
If a RevCommitCG was newly created and called #parseCanonical(RevWalk,
byte[]) method immediately, its flag was marked as PARSED, but no
derived data was obtained from the commit-graph. This is different from
what we expected.

Change-Id: I5d417efa3c42d211f19e6acf255f761e84d84450
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-04-06 20:05:13 +02:00
Han-Wen NIenhuys d174684273 Merge "PatchApplier: Check for existence of src/dest files before any operation" 2023-03-31 06:24:32 -04:00
Nitzan Gur-Furman 903645835b PatchApplier: Check for existence of src/dest files before any operation
Change-Id: Ia3ec0ce1af65114b48669157a934f70f1e22fd37
Bug: Google b/271474227
2023-03-31 06:22:29 -04:00
Martin Fick c5617711a1 Revert "RefDirectory: Throw exception if CAS of packed ref list fails"
This reverts commit 9c33f7364d.

Reason for revert: This change was based on the false claim that the
packedrefs file lock is held while the CAS is being done, but it is
actually released before the CAS (the in memory lock is still held,
however that does not prevent external actors from updating the
packedrefs files and then another thread from subsequently re-reading it
and updating the in memory packedRefList). Although reverting this
change can cause the CAS to fail, it should not actually matter since
the failure would indicate that another thread has already updated the
in memory packedRefList to either the same version this thread was
trying to update it too, or to a more recent version. Either way,
failing the CAS is then appropriate and should not be problematic.

Although this change reverts the code in the RefDirectory class, it
keeps the "improvements" to the test so that it continues to pass
reliably. The reason for the quotes around the word "improvements" is
because I believe the test alteration actually dramatically changes the
intent of the test, and that the original intent of the test is
untestable with the GC and RefDirectory classes as is.

Change-Id: I3acee7527bb542996dcdfaddfb2bdb45ec444db5
Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
2023-03-30 18:04:16 -04:00
Kaushik Lingarkar 5ae8d28faa RefDirectory.delete: Prevent failures when packed-refs is outdated
The in-memory copy of packed refs might be outdated by the time the
packed-refs lock is acquired, so ensure the one read from disk is
used after acquiring the lock to prevent commit packed-refs from
throwing an exception. As a side-effect, since this updates the
in-memory copy of packed-refs when it is re-read from disk, it can
prevent other callers needing to re-read if it had changed.

Change-Id: I724c866b330b397e955b5eb04b259eedd9911e93
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-03-30 22:32:50 +02:00
Kaushik Lingarkar 4f7627be24 RefDirectory.pack: Only rely on packed refs from disk
Since packed-refs is read from disk anyway, don't rely on the
in-memory copy as that is racy and if outdated, could result in
commit of pack-refs throwing an exception. This change also avoids
a possible unnecessary double read of packed-refs from disk.

Change-Id: I684a64991f53f8bdad58bbd248aae6522d11267d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-03-30 22:32:50 +02:00
Kaushik Lingarkar 4652dd956e RefDirectory: Make pack() and commitPackRefs() void
There are no more callers (since Iae71cb3) of these methods that need
the returned value. These methods should not have been returning
anything in the first place as that can introduce bugs such as the
one described in Iae71cb3.

Change-Id: I1d083a91603da803a106cfb1506925a82c2ef809
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-03-30 22:32:50 +02:00
Kaushik Lingarkar 33c00f3347 Implement a snapshotting RefDirectory for use in request scope
Introduce a SnapshottingRefDirectory class which allows users to get
a snapshot of the ref database and use it in a request scope (for
example a Gerrit query) instead of having to re-read packed-refs
several times in a request.

This can potentially be further improved to avoid scanning/reading a
loose ref several times in a request. This would especially help
repeated lookups of a packed ref, where we check for the existence of
a loose ref each time.

Change-Id: I634b92877f819f8bf36a3b9586bbc1815108189a
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-03-30 22:25:53 +02:00
Kaushik Lingarkar 47f2f3613c PackedBatchRefUpdate: Ensure updates are applied on latest packed refs
In the window between refs being packed (via refDb.pack) and obtaining
updates (via applyUpdates), packed-refs may have been updated by another
actor and relying on the previously read contents may lead to losing the
updates done by the other actor. To help avoid this, read packed-refs
from disk to ensure we have the latest copy after it is locked and
before committing updates to it.

Bug: 581641
Change-Id: Iae71cb30830b307d0df929c9131911ee476c711c
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-03-30 22:25:52 +02:00
Thomas Wolf d33b9ab5d3 PatchApplier: missing @since, and minor formatting
Change-Id: I561ca2f522579571b29d3e6f35f24e201d1c1663
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-03-29 21:06:11 +02:00
Ronald Bhuleskar 738dacb7fb BasePackFetchConnection: support negotiationTip feature
By default, Git will report, to the server, commits reachable from all local refs to find common commits in an attempt to reduce the size of the to-be-received packfile. If specified with negotiation tip, Git will only report commits reachable from the given tips. This is useful to speed up fetches when the user knows which local ref is likely to have commits in common with the upstream ref being fetched.

When negotation-tip is on, use the wanted refs instead of all refs as source of the "have" list to send.

This is controlled by the `fetch.usenegotationtip` flag, false by default. This works only for programmatic fetches and there is no support for it yet in the CLI.

Change-Id: I19f8fe48889bfe0ece7cdf78019b678ede5c6a32
2023-03-28 14:29:54 -07:00
Matthias Sohn 23b9693a75 DirCache: support option index.skipHash
Support the new option index.skipHash which was introduced in git 2.40
[1]. If it is set to true skip computing the git index checksum. This
accelerates Git commands that manipulate the index, such as git add, git
commit, or git status. Instead of storing the checksum, write a trailing
set of bytes with value zero, indicating that the computation was
skipped.

Accept a skipped checksum consisting of 20 null bytes when reading the
index since the option could have been set to true at the time when the
index was written.

[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-indexskipHash

Bug: 581723
Change-Id: I28ebe44c5ca1cbcb882438665d686452a0c111b2
2023-03-28 23:16:08 +02:00
Han-Wen NIenhuys 5166ded098 Merge "Fix PatchApplier error handling." 2023-03-28 05:51:18 -04:00
Nitzan Gur-Furman 3a913a8c34 Fix PatchApplier error handling.
1. For general errors, throw IOException instead of wrapping them with
PatchApplyException. The wrapping was moved (back) to ApplyCommand.
2. For file specific errors, log the errors as part of
PatchApplier::Result.
3. Change applyPatch() to receive the parsed Patch object, so the caller
can decide how to handle parsing errors.

Background: this utility class was extracted from ApplyCommand on V6.4.0.
During the extraction, we left the exception wrapping by
PatchApplyException intact. This attitude made it harder for the callers to
distinguish between the actual error causes.

Change-Id: Ib0f2b5e97a13df2339d8b65f2fea1c819c161ac3
2023-03-28 11:18:08 +02:00
kylezhao 827849017d Ensure FileCommitGraph scans commit-graph file if it already exists
When commit-graph file already exists in the repository, a newly
created FileCommitGraph didn't scan CommitGraph until the file was
modified, resulting in wrong result.

Change-Id: Ic85676f2d3b6a88f3ae28d4065729926b6fb2f23
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-03-27 10:51:07 +02:00
Matthias Sohn 67fcf76b4b Merge branch 'stable-6.4' into stable-6.5
* stable-6.4:
  GC: Close File.lines stream

Change-Id: I7e3a4b3671e779fd62062c4e10d224f432e39b54
2023-03-23 09:07:33 +01:00
Matthias Sohn cd2dc85f31 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  GC: Close File.lines stream

Change-Id: I99455916d447f5dffed85e9a5c1d51b323f07a16
2023-03-23 09:07:09 +01:00
Matthias Sohn 137efda0ba Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  GC: Close File.lines stream

Change-Id: Id93b1933a5ce1ede9eb388c9fd54a4b3749694bf
2023-03-23 09:06:43 +01:00
Matthias Sohn b118e7b4c4 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  GC: Close File.lines stream

Change-Id: Ia2be0b05ed860125a388b01d6c291832f08dd990
2023-03-23 09:06:16 +01:00
Matthias Sohn 5b16c8ae15 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  GC: Close File.lines stream

Change-Id: I2f9e6da5584a40bb4b4efed0b87ae456f119d757
2023-03-23 09:05:42 +01:00
Matthias Sohn 55164c43b9 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  GC: Close File.lines stream

Change-Id: Ib473750e5a3ad3d74b0cb41f25052890f50a975c
2023-03-23 09:04:50 +01:00
Xing Huang 3212c8fa38 GC: Close File.lines stream
From File#lines javadoc: The returned stream from File Lines
encapsulates a Reader. If timely disposal of file system resources is
required, the try-with-resources construct should be used to ensure
that the stream's close method is
invoked after the stream operations are completed.

Wrap File.lines with try-with-resources.

Change-Id: I82c6faa3ef1083f6c7e964f96e9540b4db18eee8
Signed-off-by: Xing Huang <xingkhuang@google.com>
(cherry picked from commit 172a207945)
2023-03-23 08:19:26 +01:00
Matthias Sohn 1d073e30d7 [errorprone] Suppress [Finally] warnings
In these cases we use Throwable#addSuppressed to ensure the exception
thrown in the catch block preceding the finally block throwing another
exception isn't lost.

Change-Id: I96e78a5c15238ab77ac90ca1901850ce19acfcd8
2023-03-02 23:33:27 +01:00
Pavel Salamon 0518a6b0c1 Change config pull.rebase=preserve to pull.rebase=merges
The native git option to preserve merge commits during rebase
has been changed from pull.rebase=preserve to pull.rebase=merges.

This changeset in jgit makes the same config change. The old "preserve"
option is no longer recognized and is replaced by new option called
"merges".

This makes jgit's rebase configuration compatible with native git
versions 2.34 and newer where the old "preserve" option has been
removed.

Change-Id: Ic07ff954e258115e76465a1593ef3259f4c418a3
2023-02-28 23:44:41 +01:00
Matthias Sohn 2d0b908048 BatchingProgressMonitor: expose time spent per task
Display elapsed time per task if enabled via
ProgressMonitor#showDuration or if system property or environment
variable GIT_TRACE_PERFORMANCE is set to "true". If both the system
property and the environment variable are set the system property takes
precedence.

E.g. using jgit CLI:

$ GIT_TRACE_PERFORMANCE=true jgit clone https://foo.bar/foobar
Cloning into 'foobar'...
remote: Counting objects: 1 [0.002s]
remote: Finding sources: 100% (15531/15531) [0.006s]
Receiving objects:      100% (169737/169737) [13.045s]
Resolving deltas:       100% (67579/67579) [1.842s]

Change-Id: I4d624e7858b286aeddbe7d4e557589986d73659e
2023-02-27 16:41:33 -05:00
Ivan Frade ca2c57b2ec PackWriter: offer to write an object-size index for the pack
PackWriter callers tell the writer what do the want to include in the
pack and invoke #writePack(). Afterwards, they can invoke #writeIndex()
to write the corresponding pack index.

Mirror this for the object-size index, adding a #writeObjectSizeIndex()
method.

Change-Id: Ic319975c72c239cd6488303f7d4cced797e6fe00
2023-02-24 12:56:33 -08:00
Matthias Sohn cfacc43b52 Fix formatting in GC#doGc
Change-Id: Ifa3adb66d4e0404bab4036d6b165d6c4dafe921a
2023-02-24 15:18:39 +01:00
Ivan Frade ad07196d60 PackExt: Define new extension for the object size index
Change-Id: I6bbaf43b4e6fb456ca0e9e0c6efcfeded0f94d6d
2023-02-23 09:32:20 -08:00
Matthias Sohn 176f17d05e Merge branch 'stable-6.4'
* stable-6.4:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Ia2797b44a60342eb9df53f0b3d674cba92a512fc
2023-02-22 21:06:41 +01:00
Matthias Sohn f4eda3360a Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I5af34c92e423a651db53b4dc45ed844d5f39910d
2023-02-22 21:05:55 +01:00
Matthias Sohn 636f377e4e Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I5b6b10622b61fde3f0f10455a74ae159a0b69082
2023-02-22 21:03:52 +01:00
Matthias Sohn 6cc741aa23 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I3ffe92566cc145053bb762f612dd96bc6d542c62
2023-02-22 21:03:22 +01:00
Matthias Sohn b526829fba Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: Idea23e555c024557d7e39a86efe25f609400b962
2023-02-22 21:02:47 +01:00
Matthias Sohn 238f1693f7 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  If tryLock fails to get the lock another gc has it
  Fix GcConcurrentTest#testInterruptGc
  Don't swallow IOException in GC.PidLock#lock
  Check if FileLock is valid before using or releasing it

Change-Id: I708d0936fa86b028e4da4e7e21f332f8b48ad293
2023-02-22 21:02:09 +01:00
Matthias Sohn d9f75e8bb2 If tryLock fails to get the lock another gc has it
Change-Id: Ifd3bbcc5e0591883b774d23256949a83010ea134
2023-02-22 20:38:43 +01:00
Matthias Sohn 49f5273867 Don't swallow IOException in GC.PidLock#lock
This broke the test GcConcurrentTest#testInterruptGc which expects
ClosedByInterruptException when the thread doing gc is interrupted.

Change-Id: I89e02fc37aceeccb04c20cfc5b71cb8fa21793d6
2023-02-22 19:27:30 +01:00
Matthias Sohn a6da439b47 Check if FileLock is valid before using or releasing it
Change-Id: I23ba67b61b9b03772f33a929c080c0d02b8c8652
2023-02-22 02:56:06 +01:00
Matthias Sohn e92212a5a0 Merge branch 'stable-6.4'
* stable-6.4:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ifa4e56b6ecca9305f3f1685e45450019bfc82e22
2023-02-22 01:29:32 +01:00
Matthias Sohn dcd6367391 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ic40dbab18616d8d9fe3820b9890c86652b80eb47
2023-02-22 01:28:27 +01:00
Matthias Sohn c70374e641 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: I53cf9675deac0b588048d8224216d2a7e8bd16ec
2023-02-22 01:27:50 +01:00
Matthias Sohn 628ca9bd6f Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: I0562a4a224779ccf1e4cc1ff8f5a352e55ab220a
2023-02-22 01:27:16 +01:00
Matthias Sohn 4c111e59d0 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ib9a2419253ffcbc90874adbfdb8129fee3178210
2023-02-22 01:26:36 +01:00
Matthias Sohn 2a2a208fa1 Use Java 11 ProcessHandle to get pid of the current process
Change-Id: I790f218601c1d5e1b39c4101e3b2708e76b9d782
2023-02-22 01:06:06 +01:00
Matthias Sohn aa13d1daf5 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ibb5c46cb79377d2d2cd7d4586f31c86665d2851c
2023-02-22 01:00:26 +01:00
kylezhao d789fe2f4d UploadPack: use allow-any-sha1-in-want configuration
C git 2.11 supports setting the equivalent of RequestPolicy.ANY with
uploadpack.allowAnySHA1InWant[1]. Parse this into TransportConfig and
use it from UploadPack.

Add additional tests for [2] and this change.

We can execute "git clone --filter=blob:none --no-checkout" successfully
with config uploadPack.allowFilter is true. But when we checkout, the
git will fetch other missing objects required by the checkout(this is
why we need this config).

When both uploadPack.allowFilter and uploadPack.allowAnySHA1InWant are
true, jgit will support partial clone. If you are using an extremely
large monorepo, this feature can help. It allows users to work on an
incomplete repo which reduces disk usage.

[1] f8edeaa05d
[2] change Id39771a6e42d8082099acde11249306828a053c0

Bug: 573390
Change-Id: I8fe75f03bf1fea7c11e0d67c8637bd05dd1f9b89
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-02-21 09:11:21 +01:00
Matthias Sohn 8eee800fb1 Acquire file lock "gc.pid" before running gc
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.

Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.

If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.

Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.

Change-Id: I00b838dcbf4fb0d03863bf7a2cd86b743c6c6971
2023-02-21 00:18:33 +01:00
Matthias Sohn 5d5a0d5376 Externalize strings introduced in c9552aba
Change-Id: I81bb78344df61e6eb42622fcef6235d4da0ae052
2023-02-20 21:40:40 +01:00
Matthias Sohn fe64445c11 Merge branch 'stable-6.4'
* stable-6.4:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I36051c623fcd480aa80ed32b4e89f9bdd1b798e0
2023-02-20 21:29:30 +01:00
Matthias Sohn f8e6bcba48 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I1073098fb06eabafdb3c5e7fcf44d55b86a1b152
2023-02-20 21:01:38 +01:00
Matthias Sohn 6ea0e11869 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I765c7302ce84a6a9c28fdef29da2bfaa49477c6e
2023-02-20 20:59:14 +01:00
Ivan Frade 596c445af2 PackConfig: add entry for minimum size to index
The object size index can have up to #(blobs-in-repo) entries, taking
a relevant amount of memory. Let operators configure the threshold size
to include objects in the size index.

The index will include objects with size *at or above* this
value (with -1 for none). This is more effective for the
filter-by-size case.

Lowering the threshold adds more objects to the index. This improves
performance at the cost of memory/storage space. For the object-size
case, more calls will use the index instead of reading IO. For the
filter-by-size case, lower threshold means better granularity (if
ObjectReader#isSmallerThan is implemented based only on the index).

Change-Id: I6ccd9334adbbc2abf95fde51dbbfc85b8230ade0
2023-02-16 10:25:44 -08:00
Matthias Sohn d8155c137e Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Fix getPackedRefs to not throw NoSuchFileException
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: Id32683d5f506e082d39af269803bccee0280cc27
2023-02-16 16:59:56 +01:00
Matthias Sohn 07a9eb06ff Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I876a38c2de8b7d5eaacd00e36b85599f88173221
2023-02-16 16:59:09 +01:00
Matthias Sohn c46eb91611 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Add pack options to preserve and prune old pack files
  Allow to perform PackedBatchRefUpdate without locking loose refs
  Document option "core.sha1Implementation" introduced in 59029aec

Change-Id: I423f410578f5bbe178832b80fef8998a5372182c
2023-02-16 16:48:24 +01:00
Prudhvi Akhil Alahari 012cb77930 Fix getPackedRefs to not throw NoSuchFileException
Since Files.newInputStream is from java.nio package, it throws
java.nio.file.NoSuchFileException. This was missed in the change
I00da88e. Without this change, getPackedRefs fails with
NoSuchFileException when there is no packed-refs file in a project.

Change-Id: I93c202ddb73a0a5979af8e4d09e45f5645664b45
Signed-off-by: Prudhvi Akhil Alahari <quic_prudhvi@quicinc.com>
2023-02-16 16:44:12 +05:30
Ivan Frade c9552abaf3 PackObjectSizeIndex: interface and impl for the object-size index
Operations like "clone --filter=blob:limit=N" or the "object-info"
command need to read the size of the objects from the storage. An
index would provide those sizes at once rather than having to seek in
the packfile.

Introduce an interface for the Object-size index. This index returns
the inflated size of an object. Not all objects could be indexed (to
limit memory usage).

This implementation indexes only blobs (no trees, nor
commits) *above* certain size threshold (configurable). Lower
threshold adds more objects to the index, consumes more memory and
provides better performance. 0 means "all blobs" and -1 "disabled".

If we don't index everything, for the filter use case is more
efficient to index the biggest objects first: the set is small and
most objects are filtered by NOT being in the index. For the
object-size, the more objects in the index the better, regardless
their size. All together, it is more helpful to index above threshold.

Change-Id: I9ed608ac240677e199b90ca40d420bcad9231489
2023-02-14 11:50:29 -08:00
Ivan Frade 62d0e7be7c UInt24Array: Array of unsigned ints encoded in 3 bytes.
The object size index stores positions of objects in the main
index (when ordered by sha1). These positions are per-pack and usually
a pack has <16 million objects (there are exceptions but rather
rare). It could save some memory storing these positions in three bytes
instead of four. Note that these positions are sorted and always positive.

Implement a wrapper around a byte[] to access and search "ints" while
they are stored as unsigned 3 bytes.

Change-Id: Iaa26ce8e2272e706e35fe4cdb648fb6ca7591972
2023-02-14 10:19:12 -08:00
Ivan Frade 5b9ca7df42 PackIndex: expose the position of an object-id in the index
The primary index returns the offset in the pack for an
objectId. Internally it keeps the object-ids in lexicographical order,
but doesn't expose an API to find the position of an object-id in that
list. This is needed for the object-size index, that we want to store
as "position-in-idx, size".

Add a #findPosition(object-id) method to the PackIndex interface to
know where an object-id sits in the ordered list of ids in the pack.

Note that this index position is over the list of ordered object-ids,
while reverse-index position is over the list of objects in packed
order.

Change-Id: I89fa146599e347a26d3012d3477d7f5bbbda7ba4
2023-02-14 10:01:29 -08:00
Matthias Sohn 9424052f27 Add pack options to preserve and prune old pack files
Add the options
- pack.preserveOldPacks
- pack.prunePreserved

This allows to configure in git config if old packs should be preserved
during gc and pruned during the next gc.

The original implementation in 91132bb0 only allows to set these options
using the API.

Change-Id: I5b23ab4f317d12f5ccd234401419913e8263cc9a
2023-02-11 01:19:28 +01:00
Xing Huang df5b7959be DfsPackFile/DfsGC: Write commit graphs and expose in pack
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.

The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The pack is stored as extra stream in the GC
pack. DfsPackFile mimicks how other indices are loaded storing the
reference in DFS cache.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
2023-02-07 16:59:56 -05:00
Xing Huang eccae7cf0b ObjectReader: Allow getCommitGraph to throw IOException
ObjectReader#getCommitGraph doesn't report errors loading the
commit graph. The caller should be aware of the situation and
ultimately decide what to do.

Add IOException to ObjectReader#getCommitGraph signature. RevWalk
defaults to an empty commit-graph on IO errors.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I38eeacff76c7f926b6dfb192d1e5916e40770024
2023-02-07 11:32:12 -05:00
Saša Živkov ed2cbd9e8a Allow to perform PackedBatchRefUpdate without locking loose refs
Add another newBatchUpdate method in the RefDirectory where we can
control if the created PackedBatchRefUpdate will lock the loose refs or
not.

This can be useful in cases when we run programs which have exclusive
access to a Git repository and we know that locking loose refs is
unnecessary and just a performance loss.

Change-Id: I7d0932eb1598a3871a2281b1a049021380234df9
(cherry picked from commit cb90ed0852)
2023-02-03 10:18:47 +01:00
Han-Wen NIenhuys a1fa0ee679 Merge "UploadPack: consume delimiter in object-info command" 2023-02-02 09:09:25 -05:00
Han-Wen NIenhuys f94ab7680c Merge "PatchApplier fix - init cache with provided tree" 2023-02-02 09:00:56 -05:00
Han-Wen Nienhuys 341116103e UploadPack: consume delimiter in object-info command
The 'size' packet line is an argument, so it
must be preceeded by a 0001 delimiter. See also git's
t5701-git-serve.sh test,

https://github.com/git/git/blob/8b8d9a2/t/t5701-git-serve.sh#L329

Without this fix, the server will choke on the delimiter line, saying
PackProtocolException: unexpected <empty string>

To test, I ran Gerrit locally with this fix

$ curl -X POST   -H 'git-protocol: version=2'   -H 'content-type:
application/x-git-upload-pack-request'   -H 'accept:
application/x-git-upload-pack-result'   --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
http://localhost:8080/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000


The same command completes identically on Gitlab (which supports the
object-info command)

$ curl -X POST   -H 'git-protocol: version=2'   -H 'content-type:
application/x-git-upload-pack-request'   -H 'accept:
application/x-git-upload-pack-result'   --data
$'0018command=object-info\n00010009size\n0031oid
d38b1b92bdb2893eb4505667375563f2d6d4086b\n0000'
https://gitlab.com/gitlab-org/git.git/git-upload-pack

=>

0008size0032d38b1b92bdb2893eb4505667375563f2d6d4086b 268590000

In this case, the blob is for the COPYING file in the Git source tree,
which is 26859 bytes long.

Change-Id: Ief4ce1eb9303a3b2479547d7950ef01c7c28f472
2023-02-02 08:47:35 -05:00
Nitzan Gur-Furman a399bd13b1 PatchApplier fix - init cache with provided tree
This change only affects inCore repositories.
Before this change, any file that wasn't part of the patch
wasn't read, and therefore wasn't part of the output tree.

Change-Id: I246ef957088f17aaf367143f7a0b3af0f8264ffb
Bug: Google b/267270348
2023-02-02 12:39:26 +01:00
Ivan Frade 8898d62dbc Merge "DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats" 2023-02-01 18:06:56 -05:00
Matthias Sohn 8bd960bf2b Merge changes I343cc3cf,I9dedf61b
* changes:
  Avoid error-prone warning
  Fix unused exception error-prone warning
2023-02-01 16:52:37 -05:00
Han-Wen Nienhuys b30c75be40 Fix unused exception error-prone warning
Ignoring the exception seems intended in this case.

Change-Id: I9dedf61b9cb5a6ff39fb141dd5da19143f4f6978
2023-02-01 10:53:43 +01:00
Han-Wen Nienhuys 97e8b4cc71 UploadPack: advertise object-info command if enabled
Change-Id: Iad8e5b5f4fdd84bd275eb19ee0d01eb6986d79f2
2023-02-01 10:52:33 +01:00
Han-Wen NIenhuys 66b871b777 Merge "Move MemRefDatabase creation in a separate method." 2023-02-01 04:15:44 -05:00
Matthias Sohn 580cb13f21 Merge branch 'stable-6.4'
* stable-6.4:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Id0ebfbd85eb815716383b9495eb7dd1f54cf4d74
2023-02-01 01:23:34 +01:00
Matthias Sohn ef010db594 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iefcf5d832bd0087c1027876f2200689e1150abce
2023-02-01 01:12:06 +01:00
Matthias Sohn 82e1362e07 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I2ff386d9a096277360e6c7bd5535b49984620fb3
2023-02-01 01:10:56 +01:00
Matthias Sohn d8c02aec6a Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Iff2fba026b49463016015b2fae1a42cf76ee2dbb
2023-02-01 00:54:30 +01:00
Matthias Sohn b5de5ccb9e Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: Ib5055f2f3b8a313c178d6f6c7c5630285ad5a726
2023-02-01 00:41:52 +01:00
Matthias Sohn da21265a14 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Shortcut during git fetch for avoiding looping through all local refs
  FetchCommand: fix fetchSubmodules to work on a Ref to a blob
  Silence API warnings introduced by I466dcde6
  Allow the exclusions of refs prefixes from bitmap
  PackWriterBitmapPreparer: do not include annotated tags in bitmap
  BatchingProgressMonitor: avoid int overflow when computing percentage
  Speedup GC listing objects referenced from reflogs
  FileSnapshotTest: Add more MISSING_FILE coverage

Change-Id: I58ad4c210a5e7e5a1ba6b22315b04211c8909950
2023-02-01 00:33:20 +01:00
Luca Milanesio 21e902dd7f Shortcut during git fetch for avoiding looping through all local refs
The FetchProcess needs to verify that all the refs received point
to objects that are reachable from the local refs, which could be
very expensive but is needed to avoid missing objects exceptions
because of broken chains.

When the local repository has a lot of refs (e.g. millions) and the
client is fetching a non-commit object (e.g. refs/sequences/changes in
Gerrit) the reachability check on all local refs can be very expensive
compared to the time to fetch the remote ref.

Example for a 2M refs repository:
- fetching a single non-commit object: 50ms
- checking the reachability of local refs: 30s

A ref pointing to a non-commit object doesn't have any parent or
successor objects, hence would never need to have a reachability check
done. Skipping the askForIsComplete() altogether would save the 30s
time spent in an unnecessary phase.

Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: I09ac66ded45cede199ba30f9e71cc1055f00941b
2023-02-01 00:07:45 +01:00
Matthias Sohn 7650832002 FetchCommand: fix fetchSubmodules to work on a Ref to a blob
FetchCommand#fetchSubmodules assumed that FETCH_HEAD can always be
parsed as a tree. This isn't true if it refers to a Ref referring to a
BLOB. This is e.g. used in Gerrit for Refs like refs/sequences/changes
which are used to implement sequences stored in git.

Change-Id: I414f5b7d9f2184b2d7d53af1dfcd68cccb725ca4
2023-01-31 23:52:20 +01:00
Luca Milanesio ad977f1572 Allow the exclusions of refs prefixes from bitmap
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.

Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)

Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.

It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.

Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.

A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.

Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-01-31 17:14:09 -05:00
Dmitrii Filippov 0f3a3fde95 Move MemRefDatabase creation in a separate method.
The InMemoryRepository is used in tests (e.g. in gerrit tests) and it
can be useful to create a custom MemRefDatabase for some tests.

Change-Id: I6fbbbfe04400ea1edc988c8788c8eeb06ca8480a
2023-01-31 13:55:25 -05:00
Luca Milanesio e4529cd39c PackWriterBitmapPreparer: do not include annotated tags in bitmap
The annotated tags should be excluded from the bitmap associated
with the heads-only packfile. However, this was not happening
because of the check of exclusion of the peeled object instead
of the objectId to be excluded from the bitmap.

Sample use-case:

refs/heads/main
  ^
  |
 commit1 <-- commit2 <- annotated-tag1 <- tag1
  ^
  |
 commit0

When creating a bitmap for the above commit graph, before this
change all the commits are included (3 bitmaps), which is
incorrect, because all commits reachable from annotated tags
should not be included.

The heads-only bitmap should include only commit0 and commit1
but because PackWriterBitPreparer was checking for the peeled
pointer of tag1 to be excluded (commit2) which was not found in
the list of tags to exclude (annotated-tag1), the commit2 was
included, even if it wasn't reachable only from the head.

Add an additional check for exclusion of the original objectId
for allowing the exclusion of annotated tags and their pointed
commits. Add one specific test associated with an annotated tag
for making sure that this use-case is covered also.

Example repository benchmark for measuring the improvement:
# refs: 400k (2k heads, 88k tags, 310k changes)
# objects: 11M (88k of them are annotate tags)
# packfiles: 2.7G

Before this change:
GC time: 5h
clone --bare time: 7 mins

After this change:
GC time: 20 mins
clone --bare time: 3 mins

Bug: 581267
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Change-Id: Iff2bfc6587153001837220189a120ead9ac649dc
2023-01-31 14:15:56 +01:00
Matthias Sohn 611412a055 BatchingProgressMonitor: avoid int overflow when computing percentage
When cloning huge repositories I observed percentage of object counts
turning negative. This happened if lastWork * 100 exceeded
Integer.MAX_VALUE.

Change-Id: Ic5f5cf5a911a91338267aace4daba4b873ab3900
2023-01-31 14:15:53 +01:00
Xing Huang 66ad43a6c7 DfsReaderIoStats: Add Commit Graph fields into DfsReaderIoStats
We are adding commit-graph loading to the DFS stack and the stats object doesn't have fields to track that.

This change replicates the stats of the primary index for the commit-graph.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I4a657bed50083c4ae8bc9f059d4943d612ea2d49
2023-01-25 15:29:04 -06:00
Matthias Sohn cd3fc7a299 Speedup GC listing objects referenced from reflogs
GC needs to get a ReflogReader for all existing refs to list all objects
referenced from reflogs. The existing Repository#getReflogReader method
accepts the ref name and then resolves the Ref to create a ReflogReader.
GC calling that for a huge number of Refs one by one is very slow. GC
first gets all Refs in bulk and then calls getReflogReader for each of
them.

Fix this by adding another getReflogReader method to Repository which
accepts a Ref directly.

This speeds up running JGit gc on a mirror clone of the Gerrit
repository from 15:36 min to 1:08 min. The repository used in this test
had 45k refs, 275k commits and 1.2m git objects.

Change-Id: I474897fdc6652923e35d461c065a29f54d9949f4
2023-01-23 17:19:14 +01:00
Matthias Sohn a1901305b2 Merge branch 'stable-6.4'
* stable-6.4:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: Iea8038dfde29ab988501469f86ee829e578a2fe8
2023-01-13 19:33:54 +01:00
Matthias Sohn 14300dd77b Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: I1db2b51ae8101f345d08235d4f3dc416bfcb42d5
2023-01-13 19:32:56 +01:00
Matthias Sohn 5bd2832134 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: Ibc9bffab8c9ef9c39384b53c142d99878f7f3f98
2023-01-13 19:32:06 +01:00
Matthias Sohn 9eef6790cf Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Cache trustFolderStat/trustPackedRefsStat value per-instance
  Refresh 'objects' dir and retry if a loose object is not found

Change-Id: I9e876f72f735f58bf02c7862a3d8e657fc46a7b9
2023-01-13 19:31:18 +01:00
Nasser Grainawi 21b2aef0aa Cache trustFolderStat/trustPackedRefsStat value per-instance
Instead of re-reading the config every time the methods using these
values were called, cache the config value at the time of instance
construction. Caching the values improves performance for each of the
method calls. These configs are set based on the filesystem storing the
repository and unlikely to change while an application is running.

Change-Id: I1cae26dad672dd28b766ac532a871671475652df
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2023-01-13 18:45:02 +01:00
Kaushik Lingarkar fed1a54935 Refresh 'objects' dir and retry if a loose object is not found
A new loose object may not be immediately visible on a NFS
client if it was created on another client. Refreshing the
'objects' dir and trying again can help work around the NFS
behavior.

Here's an E2E problem that this change can help fix. Consider
a Gerrit multi-primary setup with repositories based on NFS.
Add a new patch-set to an existing change and then immediately
fetch the new patch-set of that change. If the fetch is handled
by a Gerrit primary different that the one which created the
patch-set, then we sometimes run into a MissingObjectException
that causes the fetch to fail.

Bug: 581317
Change-Id: Iccc6676c68ef13a1e8b2ff52b3eeca790a89a13d
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-01-13 18:44:35 +01:00
kylezhao de7d06775c RevWalk: integrate commit-graph with commit parsing
RevWalk#createCommit() will inspect the commit-graph file to find the
specified object's graph position and then return a new RevCommitCG
instance.

RevCommitGC is a RevCommit with an additional "pointer" (the position)
to the commit-graph, so it can load the headers and metadata from there
instead of the pack. This saves IO access in walks where the body is not
needed (i.e. #isRetainBody is false and #parseBody is not invoked).

RevWalk uses automatically the commit-graph if available, no action
needed from callers. The commit-graph is fetched on first access from
the reader (that internally can keep it loaded and reuse it between
walks).

The startup cost of reading the entire commit graph is small. After
testing, reading a commit-graph with 1 million commits takes less than
50ms. If we use RepositoryCache, it will not be initialized util the
commit-graph is rewritten.

Bug: 574368
Change-Id: I90d0f64af24f3acc3eae6da984eae302d338f5ee
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-10 14:56:33 +08:00
Matthias Sohn 801a56b48a Merge branch 'stable-6.4'
* stable-6.4:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: I93ad0c49b70113134026364c9f647de89d948693
2023-01-06 22:09:55 +01:00
kylezhao 05e5e9907c GC: disable writing commit-graph for shallow repos
In shallow repos, GC writes to the commit-graph that shallow commits
do not have parents. This won't be true after a "git fetch --unshallow"
(and before another GC).

Do not write the commit-graph from shallow clones of a repo. The
commit-graph must have the real metadata of commits and that is not
available in a shallow view of the repo.

Change-Id: Ic9f2358ddaa607c74f4dbf289c9bf2a2f0af9ce0
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-06 13:13:13 -05:00
Matthias Sohn 6a35235d16 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: I18d9fc89c9ac1ef069dcefa7d7f992a28539ccf3
2023-01-05 16:09:58 +01:00
Matthias Sohn e4c2331af6 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: I48b6c095ac62dc859829d6fef45325accbb0a144
2023-01-05 16:05:14 +01:00
Matthias Sohn 62ed46da16 Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Introduce core.trustPackedRefsStat config
  Fix documentation for core.trustFolderStat

Change-Id: Ic78630f74c72624932a384eed52ef79ae1eff3e5
2023-01-05 15:55:19 +01:00
Kaushik Lingarkar 82b5aaf7e3 Introduce core.trustPackedRefsStat config
Currently, we always read packed-refs file when 'trustFolderStat'
is false. Introduce a new config 'trustPackedRefsStat' which takes
precedence over 'trustFolderStat' when reading packed refs. Possible
values for this new config are:

* always: Trust packed-refs file attributes
* after_open: Same as 'always', but refresh the file attributes of
              packed-refs before trusting it
* never: Always read the packed-refs file
* unset: Fallback to 'trustFolderStat' to determine if the file
  attributes of packed-refs can be trusted

Folks whose repositories are on NFS and have traditionally been
setting 'trustFolderStat=false' can now get some performance improvement
with 'trustPackedRefsStat=after_open' as it refreshes the file
attributes of packed-refs (at least on some NFS clients) before
considering it.

For example, consider a repository on NFS with ~500k packed-refs. Here
are some stats which illustrate the improvement with this new config
when reading packed refs on NFS:

trustFolderStat=true trustPackedRefsStat=unset: 0.2ms
trustFolderStat=false trustPackedRefsStat=unset: 155ms
trustFolderStat=false trustPackedRefsStat=after_open: 1.5ms

Change-Id: I00da88e4cceebbcf3475be0fc0011ff65767c111
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
2023-01-05 15:52:36 +01:00
Matthias Sohn 8ef58089a8 RefDatabase: fix javadoc formatting
Change-Id: I547819ac380a0e6a88d05206ff171b69f46a8549
2023-01-04 23:51:30 +01:00
Matthias Sohn ddf1c1ed3c Pull up additionalRefsNames from RefDirectory to RefDatabase
This enables to reuse this constant in all RefDatabase implementations.

Change-Id: I13d8fb780de24f71e005b698965fb5bcdbf3c728
2023-01-04 23:51:30 +01:00
Matthias Sohn 70b436b1b2 Add TernarySearchTree
A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.

Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.

Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.

They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.

Searching by prefix returns all keys if the prefix is an empty string.

Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac
2023-01-04 23:51:23 +01:00
kylezhao 414bfe05ff CommitGraph: teach ObjectReader to get commit-graph
FileRepository's ObjectReader#getCommitGraph will return commit-graph
when it exists and core.commitGraph is true.

DfsRepository is not supported currently.

Change-Id: I992d43d104cf542797e6949470e95e56de025107
Signed-off-by: kylezhao <kylezhao@tencent.com>
2023-01-04 14:50:38 +08:00