Commit Graph

479 Commits

Author SHA1 Message Date
Ivan Frade 593fbf7c3d CommitGraphWriter: Add progress monitor to bloom filter computation
Bloom filter computation can be an expensive process and right now it
is invisible to the user.

Report progress while calculating bloom filters.

Log of GC with bloom filter enabled:

Computing commit-graph path bloom filters: 100% (9551/9551)
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)

Change-Id: Ife65e63ac2c37d064d5f049a366cbb52c3ef6798
2023-11-07 14:34:38 -08:00
Ivan Frade c46b54eeac CommitGraphWriter: Unnest generation-number progress
The ProgressMonitor task to track the calculation of generation
numbers is nested inside the task that follows the writing of all
lines in the commit-graph. ProgressMonitor doesn't support nested
tasks and this confuses the counting.

Move the start/end of the "writing commit graph" task to the
writeCommitData section, after calculating the generation
numbers. Make that task track by commits instead of by lines.

Moving the start/end of the progress task to the chunk-writing
functions is clearer and easier to extend.

Logging of GC before:
Writing out commit-graph in 3 passes:  51% ( 9807/19358)
Computing commit-graph generation numbers: 100% (9551/9551)

Logging of GC after:
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)

Change-Id: I87d69c06c9a3c7e75be12b6f0d1a63b5924e298a
2023-11-07 13:41:54 -08:00
Ivan Frade 9323b430b9 PackObjectSizeIndexLoader: Log wrong bytes on exception
When the exception is thrown, we don't know if it is because the
stream didn't have data or had a wrong header.

Log the read bytes to differentiate these cases.

Change-Id: Ie7612eab39016f5ad7f1bfb2e07cab972dab796f
2023-10-17 15:45:51 -07:00
Thomas Wolf 621685d3ca DeleteBranchCommand: update config only at the end
When multiple branches were to be removed, the git config was updated
after each and every branch. Newly do so only once at the end, after all
branches have been deleted.

Because there may be an exception after some branches have already been
deleted, take care to update the config even if an exception is thrown.

Bug: 451508
Change-Id: I645be8a1a59a1476d421e46933c3f7cbd0639fec
Signed-off-by: Thomas Wolf <twolf@apache.org>
2023-10-14 23:33:11 +02:00
Matthias Sohn a2bce029aa WorkingTreeIterator: directly filter input stream
This way we can avoid to access the byte buffers backing array.
Implement a ByteBufferInputStream to wrap a byte buffer which we can use
to expose the filter result as an input stream.

Change-Id: I461c82090de2562ea9b649b3f953aad4571e3d25
2023-09-25 22:06:13 +02:00
Matthias Sohn 4262150f74 CommitGraphWriter: fix boxing warnings
Change-Id: I35c3a3baadb8d7d73c01252fe4333fa2159722ee
2023-09-22 17:00:15 +02:00
Matthias Sohn d4d6c2b5af Add ShutdownHook to cleanup FileLocks on graceful JVM shutdown
This should avoid stale lock files if the JVM is terminated gracefully.

Implement a ShutdownHook which can register/unregister listeners which
need to do some cleanup during graceful JVM shutdown. This hook is
registered as a Java shutdown hook and  when the JVM shuts down
calls #onShutdown of registered listeners using a parallel stream
to let them run concurrently.

See https://docs.oracle.com/javase/8/docs/technotes/guides/lang/hook-design.html

Bug: 582379
Change-Id: I1621dc5f7d9a8c832b6d1b74cbc47578b1c2f0b8
2023-09-12 22:43:15 +02:00
Matthias Sohn 7a6e852745 Merge branch 'stable-6.6' into stable-6.7
* stable-6.6:
  Prepare 6.6.2-SNAPSHOT builds
  JGit v6.6.1.202309021850-r
  Checkout: better directory handling

Change-Id: Ice82d68b2d343a5fac214807cdb369e486481aab
2023-09-03 02:16:04 +02:00
Thomas Wolf 9072103f3b Checkout: better directory handling
When checking out a file into the working tree ensure that all parent
directories of the file below the working tree root are actually
directories and do exist before we try to create the file.

When multiple files are to be checked out (or even a whole tree), this
may check the same directories over and over again. Asking the file
system every time for file attributes is a potentially expensive
operation. As a remedy, introduce an in-memory cache of directory
states for a particular check-out operation.

Apply the same fix also in the ResolveMerger, which may also check out
files, and also in the PatchApplier. In PatchApplier, also validate
paths.

Change-Id: Ie12864c54c9f901a2ccee7caddec73027f353111
Signed-off-by: Thomas Wolf <twolf@apache.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2023-09-03 00:16:26 +02:00
Anna Papitto 8123dcd699 PackReverseIndex: verify checksums
The new version 1 file-based reverse index has a footer with the
checksum of the corresponding pack file and a checksum of its own
contents. The initial implementation doesn't enforce that the pack
checksum matches the checksum found in the forward index nor that the
self checksum matches the contents of the file just read in.

Offer a method for reverse index users to verify the checksums in a way
appropriate to the version being used. For the pre-existing computed
version, always succeed since it is not based on a file so there is no
possibility of corruption.

Check for corruption of the file itself during parsing the checksum
footer, by comparing the self checksum with the digest of the file
contents read.

Change-Id: I87ff3933cf1afa76663350400b616695e4966cb6
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:26 -07:00
Anna Papitto 000e7caf5e PackReverseIndexV1: reverse index parsed from version 1 file
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
insertion sort, which has an average runtime of O(n^2).

Cgit persists a pack reverse index file
to avoid recomputing the reverse index ordering. Instead they write a
file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into the in-memory reverse index
each time it is needed.

PackReverseIndexV1 parses a reverse index file with the official
version 1 format into an in-memory representation of the reverse index
which implements methods to find an object's forward index position
from its offset in logorithmic time.

Change-Id: I60a92463fbd6a8cc9c1c7451df1c14d0a21a0f64
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-07-18 15:19:26 -07:00
Anna Papitto 64615b44e6 PackReverseIndexWriter: write out version 1 reverse index file
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
bucket sort with insertion sort, which has an average runtime of
O(n log n) and worst case runtime of O(n^2); and memory usage of
3*size(int)*n because it maintains 3 int arrays, even after sorting is
completed. The computation must be performed every time that the reverse
index object is created in memory.

In contrast, Cgit persists a pack reverse index file to avoid
recomputing the reverse index ordering every time that it is needed.
Instead they write a file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into an in-memory reverse index each
time it is needed.

Introduce these reverse index files to JGit. PackReverseIndexWriter
writes out a reverse index file to be read later when needed. Subclass
PackReverseIndexWriterV1 writes a file with the official version 1
format.

To avoid temporarily allocating an Integer collection while sorting and
writing out the contents, using memory 4*size(Integer)*n, use an
IntList and its #sort method, which uses quicksort.

Change-Id: I6437745777a16f723e2f1cfcce4e0d94e599dcee
Signed-off-by: Anna Papitto <annapapitto@google.com>
2023-04-28 10:19:18 -07:00
Nitzan Gur-Furman 903645835b PatchApplier: Check for existence of src/dest files before any operation
Change-Id: Ia3ec0ce1af65114b48669157a934f70f1e22fd37
Bug: Google b/271474227
2023-03-31 06:22:29 -04:00
Nitzan Gur-Furman 3a913a8c34 Fix PatchApplier error handling.
1. For general errors, throw IOException instead of wrapping them with
PatchApplyException. The wrapping was moved (back) to ApplyCommand.
2. For file specific errors, log the errors as part of
PatchApplier::Result.
3. Change applyPatch() to receive the parsed Patch object, so the caller
can decide how to handle parsing errors.

Background: this utility class was extracted from ApplyCommand on V6.4.0.
During the extraction, we left the exception wrapping by
PatchApplyException intact. This attitude made it harder for the callers to
distinguish between the actual error causes.

Change-Id: Ib0f2b5e97a13df2339d8b65f2fea1c819c161ac3
2023-03-28 11:18:08 +02:00
Matthias Sohn e92212a5a0 Merge branch 'stable-6.4'
* stable-6.4:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ifa4e56b6ecca9305f3f1685e45450019bfc82e22
2023-02-22 01:29:32 +01:00
Matthias Sohn dcd6367391 Merge branch 'stable-6.3' into stable-6.4
* stable-6.3:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ic40dbab18616d8d9fe3820b9890c86652b80eb47
2023-02-22 01:28:27 +01:00
Matthias Sohn c70374e641 Merge branch 'stable-6.2' into stable-6.3
* stable-6.2:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: I53cf9675deac0b588048d8224216d2a7e8bd16ec
2023-02-22 01:27:50 +01:00
Matthias Sohn 628ca9bd6f Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: I0562a4a224779ccf1e4cc1ff8f5a352e55ab220a
2023-02-22 01:27:16 +01:00
Matthias Sohn 4c111e59d0 Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Use Java 11 ProcessHandle to get pid of the current process
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ib9a2419253ffcbc90874adbfdb8129fee3178210
2023-02-22 01:26:36 +01:00
Matthias Sohn aa13d1daf5 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Acquire file lock "gc.pid" before running gc
  Silence API errors introduced by 9424052f

Change-Id: Ibb5c46cb79377d2d2cd7d4586f31c86665d2851c
2023-02-22 01:00:26 +01:00
Matthias Sohn 8eee800fb1 Acquire file lock "gc.pid" before running gc
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.

Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.

If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.

Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.

Change-Id: I00b838dcbf4fb0d03863bf7a2cd86b743c6c6971
2023-02-21 00:18:33 +01:00
Matthias Sohn 5d5a0d5376 Externalize strings introduced in c9552aba
Change-Id: I81bb78344df61e6eb42622fcef6235d4da0ae052
2023-02-20 21:40:40 +01:00
Xing Huang df5b7959be DfsPackFile/DfsGC: Write commit graphs and expose in pack
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.

The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The pack is stored as extra stream in the GC
pack. DfsPackFile mimicks how other indices are loaded storing the
reference in DFS cache.

Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
2023-02-07 16:59:56 -05:00
Matthias Sohn 70b436b1b2 Add TernarySearchTree
A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.

Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.

Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.

They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.

Searching by prefix returns all keys if the prefix is an empty string.

Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac
2023-01-04 23:51:23 +01:00
kylezhao 8a7348df69 CommitGraph: add commit-graph for FileObjectDatabase
This change makes JGit can read .git/objects/info/commit-graph file
and then get CommitGraph.

Loading a new commit-graph into memory requires additional time. After
testing, loading a copy of the Linux's commit-graph(1039139 commits)
is under 50ms.

Bug: 574368
Change-Id: Iadfdd6ed437945d3cdfdbe988cf541198140a8bf
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-23 13:06:06 +08:00
kylezhao 7b0f633b67 CommitGraph: implement commit-graph read
Git introduced a new file storing the topology and some metadata of
the commits in the repo (commitGraph). With this data, git can browse
commit history without parsing the pack, speeding up e.g.
reachability checks.

This change teaches JGit to read commit-graph-format file, following
the upstream format([1]).

JGit can read a commit-graph file from a buffered stream, which means
that we can provide this feature for both FileRepository and
DfsRepository.

[1] https://git-scm.com/docs/commit-graph-format/2.21.0

Bug: 574368
Change-Id: Ib5c0d6678cb242870a0f5841bd413ad3885e95f6
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-16 06:57:06 -05:00
kylezhao cf70e7cbe4 CommitGraph: implement commit-graph writer
Teach JGit to write a commit-graph formatted file by walking commit
graph from specified commit objects.

See: https://git-scm.com/docs/commit-graph-format/2.21.0

Bug: 574368
Change-Id: I34f9f28f8729080c275f86215ebf30b2d05af41d
Signed-off-by: kylezhao <kylezhao@tencent.com>
2022-12-06 20:34:46 +08:00
Nitzan Gur-Furman acde6c8f5b Split out ApplyCommand logic to PatchApplier class
PatchApplier now routes updates through the index. This has two
results:

* we can now execute patches in-memory.

* the JGit apply command will now always update the
index to match the working tree.

Change-Id: Id60a88232f05d0367787d038d2518c670cdb543f
Co-authored-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Nitzan Gur-Furman <nitzan@google.com>
2022-09-15 09:15:55 +02:00
Robin Müller 673007d529 ObjectDirectory: improve reading of shallow file
Use FileUtils.readWithRetries().

Change-Id: I5929184caca6b83a1ee87b462e541620bd68aa90
2022-07-31 14:08:48 +02:00
Robin Müller 207dd4c938 Fetch: add support for shallow
This adds support for shallow cloning. The CloneCommand and the
FetchCommand now have the new methods setDepth, setShallowSince and
addShallowExclude to tell the server that the client doesn't want to
download the complete history.

Bug: 475615
Change-Id: Ic80fb6efb5474543ae59be590ebe385bec21cc0d
2022-07-31 14:08:47 +02:00
Matthias Sohn 58f5302e1d Merge branch 'stable-6.1' into stable-6.2
* stable-6.1:
  Prepare 5.13.2-SNAPSHOT builds
  JGit v5.13.1.202206130422-r
  AmazonS3: Add support for AWS API signature version 4

Change-Id: Id4965aacd4e2ea1e8575a2c1bd4845729db6049a
2022-06-15 17:39:52 +02:00
Matthias Sohn d0bc2b544a Merge branch 'stable-6.0' into stable-6.1
* stable-6.0:
  Prepare 5.13.2-SNAPSHOT builds
  JGit v5.13.1.202206130422-r
  AmazonS3: Add support for AWS API signature version 4

Change-Id: Ie9c38ab8033fe1283e8b444b6acd3f4298062bf3
2022-06-15 16:32:08 +02:00
Matthias Sohn d961bb6502 Merge branch 'stable-5.13' into stable-6.0
* stable-5.13:
  Prepare 5.13.2-SNAPSHOT builds
  JGit v5.13.1.202206130422-r
  AmazonS3: Add support for AWS API signature version 4

Change-Id: Ibd663a1d874d1aac274abc3dd44354fd99f64c39
2022-06-15 16:31:38 +02:00
eric.steele e9a5430c25 AmazonS3: Add support for AWS API signature version 4
Updating the AmazonS3 class to support AWS Signature version 4 because
version 2 is no longer supported in all AWS regions. The version can be
selected with the new 'aws.api.signature.version' property (defaults to
2 for backwards compatibility). When set to '4', the user must also
specify the AWS region via the 'region' property. The 'region' property
must match the region that the 'domain' property resolves to.

Bug: 579907
Change-Id: If289dbc6d0f57323cfeaac2624c4eb5028f78d13
2022-06-13 09:44:23 +02:00
Andre Bossert c32694e5ae Teach JGit to handle external diff/merge tools defined in .gitattributes
Adds API that allows UI to find (and handle) diff/merge tools, specific
for the given path. The assumption is that user can specify file type
specific diff/merge tools via gitattributes.

Bug: 552840
Change-Id: I1daa091e9afa542a9ebb5417853dff0452ed52dd
Signed-off-by: Mykola Zakharchuk <zakharchuk.vn@gmail.com>
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Signed-off-by: Andre Bossert <andre.bossert@siemens.com>
2022-06-02 10:36:39 +02:00
yunjieli eca101fc05 Fetch: Introduce negative refspecs.
Implement negative refspecs in JGit fetch, following C Git. Git
supports negative refspecs in source only while this change supports
them in both source and destination.

If one branch is equal to any branch or matches any pattern in the
negative refspecs collection, the branch will not be fetched even if
it's in the toFetch collection.

With this feature, users can express more complex patterns during fetch.

Change-Id: Iaa1cd4de5c08c273e198b72e12e3dadae7be709f
Sign-off-by: Yunjie Li<yunjieli@google.com>
2022-04-13 10:21:20 -07:00
Matthias Sohn 6f175ea6c4 Describe: add support for core.abbrev config option
If core.abbrev is unset or "auto" estimate abbreviation length like C
git does:
- Estimate repository's object count by only considering packed objects,
  round up to next power of 2
- With the order of 2^len objects, we expect a collision at 2^(len/2).
  But we also care about hex chars, not bits, and there are 4 bits per
  hex. So all together we need to divide by 2; but we also want to round
  odd numbers up, hence adding one before dividing.
- For small repos use at least 7 hexdigits
- If object database fails to determine object count use 7 hexdigits as
  fallback

If it is set to "no" do not abbreviate object-ids.

Otherwise set it to the configured value capped to the range between 4
and length of an unabbreviated object-id.

Change-Id: I425f9724b69813dbb57872466bf2d2e1d6dc72c6
2022-03-02 19:29:48 +01:00
Matthias Sohn 9244c07d73 Add a typed config getter for integers confined to a range
Use Integer#MIN_VALUE to denote unset option.

Change-Id: I4d65f2434013111f25520c0ed2b9a9dc8123c6cf
2022-03-02 19:28:14 +01:00
Rolf Theunissen 504001228b PushCommand: consider push.default when no RefSpecs are given
When no RefSpecs are given, PushCommand until now simply fell back to
pushing the current branch to an upstream branch of the same name. This
corresponds to push.default=current. Any setting from the git config
for push.default was simply ignored.

Implement the other modes (nothing, matching, upstream, and simple),
too. Add a setter and getter for the PushDefault so that an application
can force a particular mode to be used. For backwards compatibility,
use "current" as the default setting; to figure out the value from the
git config, which defaults to "simple", call setPushDefault(null).

Bug: 351314
Change-Id: I86c5402318771e47d80b137e99947762e1150bb4
Signed-off-by: Rolf Theunissen <rolf.theunissen@gmail.com>
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2022-02-14 10:45:15 +01:00
Thomas Wolf 3444a3be8c Factor out parsing git-style size numbers to StringUtils
Move the code to parse numbers with an optional 'k', 'm', or 'g' suffix
from the config file handling to StringUtils. This enables me to re-use
it in EGit, which has duplicate code in StorageSizeFieldEditor.

As this is generally useful functionality, providing it in the library
makes sense.

Change-Id: I86e4f5f62e14f99b35726b198ba3bbf1669418d9
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-10-30 23:05:22 +02:00
Thomas Wolf fc6fe793ce Don't rely on an implicit default character set
JEP 400 (Java 18) will change the default character set to UTF-8
unconditionally.[1] Introduce SystemReader.getDefaultCharset() that
provides the locale-dependent charset the way JEP 400 recommends.

Change all code locations using Charset.defaultCharset() to use the
new SystemReader method instead.

[1] https://openjdk.java.net/jeps/400

Change-Id: I986f97a410d2fc70748b6f93228a2d45ff100b2c
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-10-26 17:49:20 -04:00
Julian Ruppel e2798413f9 Support commit.template config property
Adds functionality to read the git commit.template property. The
template content is read either via a default encoding or, if present,
via encoding specified by i18n.commitEncoding property.

Bug: 446355

Change-Id: I0c45db98e324ddff26a7e0262835f259d6528a86
Signed-off-by: Julian Ruppel <julian.ruppel@sap.com>
2021-07-21 15:50:02 -04:00
Matthias Sohn d46af8c69d Merge branch 'stable-5.12'
* stable-5.12:
  Retry loose object read upon "Stale file handle" exception
  Ignore missing javadoc in test bundles

Change-Id: I67c613c066a3252f9b0d0a3dcc026b57e10bfe1d
2021-06-26 16:37:59 +02:00
Matthias Sohn 8bd0161c83 Merge branch 'stable-5.11' into stable-5.12
* stable-5.11:
  Retry loose object read upon "Stale file handle" exception
  Ignore missing javadoc in test bundles

Change-Id: Ia4dc886c920cec3c9da86e1a90a0af68bd016b4f
2021-06-26 16:36:55 +02:00
Matthias Sohn e6ace4a96d Merge branch 'stable-5.10' into stable-5.11
* stable-5.10:
  Retry loose object read upon "Stale file handle" exception
  Ignore missing javadoc in test bundles

Change-Id: Ia385fa6b5d2fee64476793e06860a279bf2f6e36
2021-06-26 16:35:20 +02:00
Matthias Sohn bbb1c7f645 Merge branch 'stable-5.9' into stable-5.10
* stable-5.9:
  Retry loose object read upon "Stale file handle" exception
  Ignore missing javadoc in test bundles

Change-Id: I56fc2c47193a891285a705d44b3507f23982dc8a
2021-06-25 22:23:47 +02:00
Fabio Ponciroli 6976a30f44 searchForReuse might impact performance in large repositories
The search for reuse phase for *all* the objects scans *all*
the packfiles, looking for the best candidate to serve back to the
client.

This can lead to an expensive operation when the number of
packfiles and objects is high.

Add parameter "pack.searchForReuseTimeout" to limit the time spent
on this search.

Change-Id: I54f5cddb6796fdc93ad9585c2ab4b44854fa6c48
2021-06-25 17:57:59 +02:00
Antonio Barone 24d6d60538 Retry loose object read upon "Stale file handle" exception
When reading loose objects over NFS it is possible that the OS syscall
would fail with ESTALE errors: This happens when the open file
descriptor no longer refers to a valid file.

Notoriously it is possible to hit this scenario when git data is shared
among multiple clients, for example by multiple gerrit instances in HA.

If one of the two clients performs a GC operation that would cause the
packing and then the pruning of loose objects, the other client might
still hold a reference to those objects, which would cause an exception
to bubble up the stack.

The Linux NFS FAQ[1] (at point A.10), suggests that the proper way to
handle such ESTALE scenarios is to:

"[...] close the file or directory where the error occurred, and reopen
it so the NFS client can resolve the pathname again and retrieve the new
file handle."

In case of a stale file handle exception, we now attempt to read the
loose object again (up to 5 times), until we either succeed or encounter
a FileNotFoundException, in which case the search can continue to
Packfiles and alternates.

The limit of 5 provides an arbitrary upper bounds that is consistent to
the one chosen when handling stale file handles for packed-refs
files (see [2] for context).

[1] http://nfs.sourceforge.net/
[2] https://git.eclipse.org/r/c/jgit/jgit/+/54350

Bug: 573791
Change-Id: I9950002f772bbd8afeb9c6108391923be9d0ef51
2021-06-24 23:52:22 +02:00
Thomas Wolf 10ac449911 ApplyCommand: support binary patches
Implement applying binary patches. Handles both literal and delta
patches. Note that C git also runs binary files through the clean
and smudge filters. Implement the same safeguards against corrupted
patches as in C git: require the full OIDs to be present in the patch
file, and apply a binary patch only if both pre- and post-image hashes
match.

Add tests for applying literal and delta patches.

Bug: 371725
Change-Id: I71dc214fe4145d7cc8e4769384fb78c7d0d6c220
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-26 00:38:00 +02:00
Thomas Wolf 0fe794a433 ApplyCommand: add a stream to apply a delta patch
Add a new BinaryDeltaInputStream that applies a delta provided by
another InputStream to a given base. Because delta application needs
random access to the base, the base itself cannot be yet another
InputStream. But at least this enables streaming of the result.

Add a simple test using delta hunks generated by C git.

Bug: 371725
Change-Id: Ibd26fa2f49860737ad5c5387f7f4870d3e85e628
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-05-26 00:37:59 +02:00