Bloom filter computation can be an expensive process and right now it
is invisible to the user.
Report progress while calculating bloom filters.
Log of GC with bloom filter enabled:
Computing commit-graph path bloom filters: 100% (9551/9551)
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)
Change-Id: Ife65e63ac2c37d064d5f049a366cbb52c3ef6798
The ProgressMonitor task to track the calculation of generation
numbers is nested inside the task that follows the writing of all
lines in the commit-graph. ProgressMonitor doesn't support nested
tasks and this confuses the counting.
Move the start/end of the "writing commit graph" task to the
writeCommitData section, after calculating the generation
numbers. Make that task track by commits instead of by lines.
Moving the start/end of the progress task to the chunk-writing
functions is clearer and easier to extend.
Logging of GC before:
Writing out commit-graph in 3 passes: 51% ( 9807/19358)
Computing commit-graph generation numbers: 100% (9551/9551)
Logging of GC after:
Computing commit-graph generation numbers: 100% (9551/9551)
Writing out commit-graph: 100% (9551/9551)
Change-Id: I87d69c06c9a3c7e75be12b6f0d1a63b5924e298a
When the exception is thrown, we don't know if it is because the
stream didn't have data or had a wrong header.
Log the read bytes to differentiate these cases.
Change-Id: Ie7612eab39016f5ad7f1bfb2e07cab972dab796f
When multiple branches were to be removed, the git config was updated
after each and every branch. Now do so only once at the end, after all
branches have been deleted.
Because there may be an exception after some branches have already been
deleted, take care to update the config even if an exception is thrown.
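
A minimal sketch of the pattern described above, not the actual command
code: branches are removed in a loop and the config is saved once in a
finally block, so the save also happens when a later deletion throws.
SketchConfig and BranchDeleter are hypothetical stand-ins, not JGit
classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a git config; it only counts how often it is saved.
class SketchConfig {
    final List<String> sections = new ArrayList<>();
    int saveCount;

    void unsetSection(String branch) {
        sections.remove(branch);
    }

    void save() {
        saveCount++;
    }
}

class BranchDeleter {
    // Deletes the given branches and saves the config once, even on failure.
    static List<String> deleteAll(SketchConfig config, List<String> branches) {
        List<String> deleted = new ArrayList<>();
        try {
            for (String branch : branches) {
                if (branch.isEmpty()) {
                    // Simulates a failure in the middle of the operation.
                    throw new IllegalArgumentException("empty branch name");
                }
                config.unsetSection(branch);
                deleted.add(branch);
            }
        } finally {
            // One save at the end instead of one save per branch.
            if (!deleted.isEmpty()) {
                config.save();
            }
        }
        return deleted;
    }
}
```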
Bug: 451508
Change-Id: I645be8a1a59a1476d421e46933c3f7cbd0639fec
Signed-off-by: Thomas Wolf <twolf@apache.org>
This way we can avoid accessing the byte buffer's backing array.
Implement a ByteBufferInputStream to wrap a byte buffer which we can use
to expose the filter result as an input stream.
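
A sketch of what such a wrapper might look like, assuming it only needs
sequential reads; the class name and details are illustrative and may
differ from the implementation in this change. It reads through the
ByteBuffer API only, so it also works for read-only or direct buffers
that expose no backing array.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Objects;

// Illustrative InputStream view over a ByteBuffer; never calls buf.array().
class ByteBufferInputStreamSketch extends InputStream {
    private final ByteBuffer buf;

    ByteBufferInputStreamSketch(ByteBuffer buf) {
        this.buf = Objects.requireNonNull(buf);
    }

    @Override
    public int read() {
        // Mask to return the byte as an unsigned value in 0..255.
        return buf.hasRemaining() ? buf.get() & 0xff : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        Objects.checkFromIndexSize(off, len, dst.length);
        if (len == 0) {
            return 0;
        }
        if (!buf.hasRemaining()) {
            return -1;
        }
        int n = Math.min(len, buf.remaining());
        buf.get(dst, off, n);
        return n;
    }

    @Override
    public int available() {
        return buf.remaining();
    }
}
```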
Change-Id: I461c82090de2562ea9b649b3f953aad4571e3d25
This should avoid stale lock files if the JVM is terminated gracefully.
Implement a ShutdownHook which can register/unregister listeners which
need to do some cleanup during graceful JVM shutdown. This hook is
registered as a Java shutdown hook. When the JVM shuts down, it calls
#onShutdown on the registered listeners using a parallel stream so
that they run concurrently.
See https://docs.oracle.com/javase/8/docs/technotes/guides/lang/hook-design.html
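
The idea can be sketched as follows. This is a simplified illustration
under the assumption of a single hook holding a concurrent set of
listeners; the names ShutdownHookSketch and Listener are hypothetical
and the real class may differ.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ShutdownHookSketch {
    // Hypothetical listener type with the callback named in the text above.
    interface Listener {
        void onShutdown();
    }

    private final Set<Listener> listeners = ConcurrentHashMap.newKeySet();

    // Registers this object with the JVM; cleanup() then runs on graceful shutdown.
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::cleanup));
    }

    boolean register(Listener l) {
        return listeners.add(l);
    }

    boolean unregister(Listener l) {
        return listeners.remove(l);
    }

    // Notifies all listeners; the parallel stream lets them run concurrently.
    void cleanup() {
        listeners.parallelStream().forEach(l -> {
            try {
                l.onShutdown();
            } catch (RuntimeException e) {
                // One failing listener must not prevent the others from running.
            }
        });
    }
}
```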
Bug: 582379
Change-Id: I1621dc5f7d9a8c832b6d1b74cbc47578b1c2f0b8
When checking out a file into the working tree ensure that all parent
directories of the file below the working tree root are actually
directories and do exist before we try to create the file.
When multiple files are to be checked out (or even a whole tree), this
may check the same directories over and over again. Asking the file
system every time for file attributes is a potentially expensive
operation. As a remedy, introduce an in-memory cache of directory
states for a particular check-out operation.
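
A rough sketch of such a cache, assuming one instance per check-out
operation. CheckoutDirectoryCache is a hypothetical name, and the
sketch simply fails when a non-directory is in the way, which may not
match how the real code resolves such conflicts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

class CheckoutDirectoryCache {
    private final Path root;
    // Directories already verified (or created) during this check-out.
    private final Set<Path> knownDirectories = new HashSet<>();
    // Counts real file system queries, to make the effect of the cache visible.
    int fileSystemChecks;

    CheckoutDirectoryCache(Path workTreeRoot) {
        this.root = workTreeRoot;
    }

    // Ensures all parents of a path relative to the root exist as directories.
    void ensureParentDirectories(Path relativeFile) {
        Path parent = relativeFile.getParent();
        if (parent == null) {
            return; // file lives directly in the working tree root
        }
        Path dir = root;
        for (Path name : parent) {
            dir = dir.resolve(name);
            if (knownDirectories.contains(dir)) {
                continue; // answered from memory, no file system access
            }
            fileSystemChecks++;
            if (!Files.isDirectory(dir, LinkOption.NOFOLLOW_LINKS)) {
                try {
                    // Fails if a file or symlink occupies the path.
                    Files.createDirectory(dir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            knownDirectories.add(dir);
        }
    }
}
```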
Apply the same fix also in the ResolveMerger, which may also check out
files, and also in the PatchApplier. In PatchApplier, also validate
paths.
Change-Id: Ie12864c54c9f901a2ccee7caddec73027f353111
Signed-off-by: Thomas Wolf <twolf@apache.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The new version 1 file-based reverse index has a footer with the
checksum of the corresponding pack file and a checksum of its own
contents. The initial implementation doesn't enforce that the pack
checksum matches the checksum found in the forward index nor that the
self checksum matches the contents of the file just read in.
Offer a method for reverse index users to verify the checksums in a way
appropriate to the version being used. For the pre-existing computed
version, always succeed since it is not based on a file so there is no
possibility of corruption.
Check for corruption of the file itself while parsing the checksum
footer, by comparing the self checksum with the digest of the file
contents read.
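
As an illustration of the two checks, here is a sketch that works on
the whole file as a byte array and assumes SHA-1 (20-byte) checksums,
with the footer being the pack checksum followed by the self checksum.
The class and method names are made up for this example and are not
the API added by this change.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ReverseIndexChecksumSketch {
    static final int HASH_LEN = 20; // assumes SHA-1

    static byte[] sha1(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(data, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The last HASH_LEN bytes must equal the digest of everything before them.
    static boolean selfChecksumMatches(byte[] file) {
        if (file.length < 2 * HASH_LEN) {
            return false;
        }
        int selfStart = file.length - HASH_LEN;
        byte[] stored = Arrays.copyOfRange(file, selfStart, file.length);
        return MessageDigest.isEqual(stored, sha1(file, 0, selfStart));
    }

    // The pack checksum in the footer must equal the one from the forward index.
    static boolean packChecksumMatches(byte[] file, byte[] forwardIndexPackChecksum) {
        if (file.length < 2 * HASH_LEN) {
            return false;
        }
        int start = file.length - 2 * HASH_LEN;
        byte[] stored = Arrays.copyOfRange(file, start, start + HASH_LEN);
        return MessageDigest.isEqual(stored, forwardIndexPackChecksum);
    }
}
```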
Change-Id: I87ff3933cf1afa76663350400b616695e4966cb6
Signed-off-by: Anna Papitto <annapapitto@google.com>
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
insertion sort, which has an average runtime of O(n^2).
Cgit avoids recomputing the reverse index ordering by persisting a
pack reverse index file. It writes a file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into the in-memory reverse index
each time it is needed.
PackReverseIndexV1 parses a reverse index file with the official
version 1 format into an in-memory representation of the reverse index
which implements methods to find an object's forward index position
from its offset in logarithmic time.
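
The lookup can be pictured as a binary search over the forward index
positions stored in offset order. The following is a sketch on plain
arrays with hypothetical names, not the PackReverseIndexV1 code, which
reads these values from the parsed file and the forward index.

```java
class ReverseIndexLookupSketch {
    /**
     * @param offsetsByIndexPos     pack offset of the object at each forward index position
     * @param indexPosInOffsetOrder forward index positions sorted by ascending pack offset
     * @param offset                pack offset to look up
     * @return the forward index position of the object at that offset, or -1
     */
    static int findPosition(long[] offsetsByIndexPos,
            int[] indexPosInOffsetOrder, long offset) {
        int lo = 0;
        int hi = indexPosInOffsetOrder.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int pos = indexPosInOffsetOrder[mid];
            long candidate = offsetsByIndexPos[pos];
            if (candidate < offset) {
                lo = mid + 1;
            } else if (candidate > offset) {
                hi = mid - 1;
            } else {
                return pos;
            }
        }
        return -1; // no object starts at this offset
    }
}
```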
Change-Id: I60a92463fbd6a8cc9c1c7451df1c14d0a21a0f64
Signed-off-by: Anna Papitto <annapapitto@google.com>
The reverse index for a pack is used to quickly find an object's
position in the pack's forward index based on that object's pack offset.
It is currently computed from the forward index by sorting the index
entries by the corresponding pack offset. This computation uses
bucket sort with insertion sort, which has an average runtime of
O(n log n) and worst case runtime of O(n^2); and memory usage of
3*size(int)*n because it maintains 3 int arrays, even after sorting is
completed. The computation must be performed every time that the reverse
index object is created in memory.
In contrast, Cgit persists a pack reverse index file to avoid
recomputing the reverse index ordering every time that it is needed.
It writes a file with format
https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format
which can later be read and parsed into an in-memory reverse index each
time it is needed.
Introduce these reverse index files to JGit. PackReverseIndexWriter
writes out a reverse index file to be read later when needed. Subclass
PackReverseIndexWriterV1 writes a file with the official version 1
format.
To avoid temporarily allocating an Integer collection while sorting and
writing out the contents, using memory 4*size(Integer)*n, use an
IntList and its #sort method, which uses quicksort.
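
To make the layout concrete, here is a sketch that builds a version 1
file in memory: the "RIDX" magic, version 1, hash id 1 (SHA-1), the
index positions in offset order, then the pack checksum and a checksum
over everything before it. It is not the PackReverseIndexWriterV1
code. Instead of IntList#sort it sorts primitive longs that pack
(offset, position) together, which only works under the stated
assumption that offsets fit in 31 bits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ReverseIndexWriterSketch {
    // Returns forward index positions ordered by ascending pack offset.
    static int[] positionsByOffset(long[] offsetsByIndexPos) {
        long[] keys = new long[offsetsByIndexPos.length];
        for (int i = 0; i < keys.length; i++) {
            long offset = offsetsByIndexPos[i];
            if (offset < 0 || offset > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("sketch assumes 31-bit offsets");
            }
            keys[i] = (offset << 32) | i; // offset in the high half, position in the low half
        }
        Arrays.sort(keys); // primitive sort, no boxing
        int[] positions = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            positions[i] = (int) keys[i]; // keep the low 32 bits
        }
        return positions;
    }

    static byte[] write(long[] offsetsByIndexPos, byte[] packChecksum) {
        if (packChecksum.length != 20) {
            throw new IllegalArgumentException("sketch assumes a SHA-1 pack checksum");
        }
        int[] positions = positionsByOffset(offsetsByIndexPos);
        // ByteBuffer is big-endian by default, matching network byte order.
        ByteBuffer out = ByteBuffer.allocate(12 + 4 * positions.length + 40);
        out.put("RIDX".getBytes(StandardCharsets.US_ASCII));
        out.putInt(1); // format version
        out.putInt(1); // hash function id, 1 = SHA-1
        for (int p : positions) {
            out.putInt(p);
        }
        out.put(packChecksum);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(out.array(), 0, out.position());
            out.put(md.digest()); // checksum of all of the above
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        return out.array();
    }
}
```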
Change-Id: I6437745777a16f723e2f1cfcce4e0d94e599dcee
Signed-off-by: Anna Papitto <annapapitto@google.com>
1. For general errors, throw IOException instead of wrapping them with
PatchApplyException. The wrapping was moved (back) to ApplyCommand.
2. For file specific errors, log the errors as part of
PatchApplier::Result.
3. Change applyPatch() to receive the parsed Patch object, so the caller
can decide how to handle parsing errors.
Background: this utility class was extracted from ApplyCommand in v6.4.0.
During the extraction, we left the exception wrapping by
PatchApplyException intact. This made it harder for callers to
distinguish between the actual error causes.
Change-Id: Ib0f2b5e97a13df2339d8b65f2fea1c819c161ac3
* stable-6.4:
Use Java 11 ProcessHandle to get pid of the current process
Acquire file lock "gc.pid" before running gc
Silence API errors introduced by 9424052f
Change-Id: Ifa4e56b6ecca9305f3f1685e45450019bfc82e22
* stable-6.3:
Use Java 11 ProcessHandle to get pid of the current process
Acquire file lock "gc.pid" before running gc
Silence API errors introduced by 9424052f
Change-Id: Ic40dbab18616d8d9fe3820b9890c86652b80eb47
* stable-6.2:
Use Java 11 ProcessHandle to get pid of the current process
Acquire file lock "gc.pid" before running gc
Silence API errors introduced by 9424052f
Change-Id: I53cf9675deac0b588048d8224216d2a7e8bd16ec
* stable-6.1:
Use Java 11 ProcessHandle to get pid of the current process
Acquire file lock "gc.pid" before running gc
Silence API errors introduced by 9424052f
Change-Id: I0562a4a224779ccf1e4cc1ff8f5a352e55ab220a
* stable-6.0:
Use Java 11 ProcessHandle to get pid of the current process
Acquire file lock "gc.pid" before running gc
Silence API errors introduced by 9424052f
Change-Id: Ib9a2419253ffcbc90874adbfdb8129fee3178210
Git guards gc by locking a lock file "gc.pid" before starting execution.
The lock file contains the pid and hostname of the process holding the
lock. Git tries to kill the process holding that lock if the lock file
wasn't modified in the last 12 hours and was started from the same host.
Teach JGit to acquire this lock before running gc but skip execution if
another process already holds the lock. Killing the other process could
be undesired if it's a long running application.
If the lock file wasn't modified in the last 12 hours try to lock it and
run gc if locking succeeds.
Register a shutdown hook for the lock file to ensure it is cleaned up if
the process is gracefully killed.
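
The locking rules can be sketched roughly as below. This is an
illustration only: GcPidLockSketch is a hypothetical class, the file
content of "<pid> <hostname>" is an assumption based on the
description above, and the check-then-delete of a stale lock is not
atomic here, unlike what a real lock file implementation would need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.time.Instant;

class GcPidLockSketch {
    static final Duration STALE_AFTER = Duration.ofHours(12);

    // Returns true if the lock was acquired, false if gc should be skipped.
    static boolean tryLock(Path lockFile, String hostname) {
        try {
            if (Files.exists(lockFile)) {
                Instant modified = Files.getLastModifiedTime(lockFile).toInstant();
                if (modified.plus(STALE_AFTER).isAfter(Instant.now())) {
                    return false; // recently touched: another process holds it
                }
                Files.deleteIfExists(lockFile); // stale: try to take it over
            }
            String content = ProcessHandle.current().pid() + " " + hostname;
            // CREATE_NEW fails if someone else created the file in the meantime.
            Files.write(lockFile, content.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            // Remove the lock file if the JVM is shut down gracefully.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    Files.deleteIfExists(lockFile);
                } catch (IOException e) {
                    // nothing sensible left to do during shutdown
                }
            }));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```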
Change-Id: I00b838dcbf4fb0d03863bf7a2cd86b743c6c6971
JGit knows how to read/write commit graphs but the DFS stack is not
using it yet.
The DFS garbage collector generates a commit-graph with commits
reachable from any ref. The commit-graph is stored as an extra stream
in the GC pack. DfsPackFile mimics how other indices are loaded,
storing the reference in the DFS cache.
Signed-off-by: Xing Huang <xingkhuang@google.com>
Change-Id: I3f94997377986d21a56b300d8358dd27be37f5de
A ternary search tree is a type of tree where nodes are arranged in a
manner similar to a binary search tree, but with up to three children
rather than the binary tree's limit of two.
Each node of a ternary search tree stores a single character, a
reference to a value object and references to its three children named
equal kid, lo kid and hi kid. The lo kid pointer must point to a node
whose character value is less than the current node. The hi kid pointer
must point to a node whose character is greater than the current
node.[1] The equal kid points to the next character in the word. Each
node in a ternary search tree represents a prefix of the stored strings.
All strings in the middle subtree of a node start with that prefix.
Like other prefix trees, a ternary search tree can be used as an
associative map with the ability for incremental string search. Ternary
search trees are more space efficient compared to standard prefix trees,
at the cost of speed.
They allow efficient prefix search which is important to implement
searching refs by prefix in a RefDatabase.
Searching by prefix returns all keys if the prefix is an empty string.
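
For illustration, a compact ternary search tree with insert, exact
lookup and prefix search could look like this. It is a textbook-style
sketch, not the class added by this change; the names are hypothetical
and empty keys are rejected for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

class TernarySearchTreeSketch<V> {
    private static final class Node<V> {
        final char c;
        V value;
        Node<V> lo, eq, hi; // lo kid, equal kid, hi kid

        Node(char c) {
            this.c = c;
        }
    }

    private Node<V> root;

    void insert(String key, V value) {
        if (key.isEmpty()) {
            throw new IllegalArgumentException("empty key");
        }
        root = insert(root, key, 0, value);
    }

    private Node<V> insert(Node<V> n, String key, int i, V value) {
        char c = key.charAt(i);
        if (n == null) {
            n = new Node<>(c);
        }
        if (c < n.c) {
            n.lo = insert(n.lo, key, i, value);
        } else if (c > n.c) {
            n.hi = insert(n.hi, key, i, value);
        } else if (i < key.length() - 1) {
            n.eq = insert(n.eq, key, i + 1, value);
        } else {
            n.value = value;
        }
        return n;
    }

    V get(String key) {
        Node<V> n = key.isEmpty() ? null : find(key);
        return n == null ? null : n.value;
    }

    // Returns the node matching the last character of a non-empty key.
    private Node<V> find(String key) {
        Node<V> n = root;
        int i = 0;
        while (n != null) {
            char c = key.charAt(i);
            if (c < n.c) {
                n = n.lo;
            } else if (c > n.c) {
                n = n.hi;
            } else if (i < key.length() - 1) {
                n = n.eq;
                i++;
            } else {
                return n;
            }
        }
        return null;
    }

    // Returns all keys starting with the prefix, in sorted order.
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) {
            collect(root, new StringBuilder(), out); // empty prefix: all keys
            return out;
        }
        Node<V> n = find(prefix);
        if (n == null) {
            return out;
        }
        if (n.value != null) {
            out.add(prefix);
        }
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    private void collect(Node<V> n, StringBuilder prefix, List<String> out) {
        if (n == null) {
            return;
        }
        collect(n.lo, prefix, out);
        prefix.append(n.c);
        if (n.value != null) {
            out.add(prefix.toString());
        }
        collect(n.eq, prefix, out);
        prefix.setLength(prefix.length() - 1);
        collect(n.hi, prefix, out);
    }
}
```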
Bug: 576165
Change-Id: If160df70151a8e1c1bd6716ee4968e4c45b2c7ac
This change teaches JGit to read the .git/objects/info/commit-graph file
and build a CommitGraph from it.
Loading a new commit-graph into memory requires additional time. In
testing, loading a copy of the Linux repository's commit-graph
(1039139 commits) took under 50 ms.
Bug: 574368
Change-Id: Iadfdd6ed437945d3cdfdbe988cf541198140a8bf
Signed-off-by: kylezhao <kylezhao@tencent.com>
Git introduced a new file storing the topology and some metadata of
the commits in the repo (commitGraph). With this data, git can browse
commit history without parsing the pack, speeding up e.g.
reachability checks.
This change teaches JGit to read a commit-graph file, following
the upstream format [1].
JGit can read a commit-graph file from a buffered stream, which means
that we can provide this feature for both FileRepository and
DfsRepository.
[1] https://git-scm.com/docs/commit-graph-format/2.21.0
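
As a small illustration of reading from a stream, the sketch below
parses only the fixed 8-byte header of the upstream format: the "CGPH"
signature followed by one byte each for the format version, the hash
version, the number of chunks and the number of base commit-graphs.
CommitGraphHeaderSketch is a hypothetical class, not part of this
change, and it ignores the chunk lookup table and the chunks
themselves.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class CommitGraphHeaderSketch {
    final int version;
    final int hashVersion; // 1 = SHA-1
    final int chunkCount;
    final int baseGraphCount;

    private CommitGraphHeaderSketch(int version, int hashVersion,
            int chunkCount, int baseGraphCount) {
        this.version = version;
        this.hashVersion = hashVersion;
        this.chunkCount = chunkCount;
        this.baseGraphCount = baseGraphCount;
    }

    static CommitGraphHeaderSketch read(InputStream in) throws IOException {
        byte[] h = in.readNBytes(8);
        if (h.length < 8) {
            throw new EOFException("commit-graph header truncated");
        }
        if (h[0] != 'C' || h[1] != 'G' || h[2] != 'P' || h[3] != 'H') {
            throw new IOException("not a commit-graph file");
        }
        int version = h[4] & 0xff;
        if (version != 1) {
            throw new IOException("unsupported commit-graph version " + version);
        }
        return new CommitGraphHeaderSketch(version, h[5] & 0xff,
                h[6] & 0xff, h[7] & 0xff);
    }
}
```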
Bug: 574368
Change-Id: Ib5c0d6678cb242870a0f5841bd413ad3885e95f6
Signed-off-by: kylezhao <kylezhao@tencent.com>
PatchApplier now routes updates through the index. This has two
results:
* we can now execute patches in-memory.
* the JGit apply command will now always update the
index to match the working tree.
Change-Id: Id60a88232f05d0367787d038d2518c670cdb543f
Co-authored-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Nitzan Gur-Furman <nitzan@google.com>
This adds support for shallow cloning. The CloneCommand and the
FetchCommand now have the new methods setDepth, setShallowSince and
addShallowExclude to tell the server that the client doesn't want to
download the complete history.
Bug: 475615
Change-Id: Ic80fb6efb5474543ae59be590ebe385bec21cc0d
* stable-6.1:
Prepare 5.13.2-SNAPSHOT builds
JGit v5.13.1.202206130422-r
AmazonS3: Add support for AWS API signature version 4
Change-Id: Id4965aacd4e2ea1e8575a2c1bd4845729db6049a
* stable-6.0:
Prepare 5.13.2-SNAPSHOT builds
JGit v5.13.1.202206130422-r
AmazonS3: Add support for AWS API signature version 4
Change-Id: Ie9c38ab8033fe1283e8b444b6acd3f4298062bf3
* stable-5.13:
Prepare 5.13.2-SNAPSHOT builds
JGit v5.13.1.202206130422-r
AmazonS3: Add support for AWS API signature version 4
Change-Id: Ibd663a1d874d1aac274abc3dd44354fd99f64c39
Update the AmazonS3 class to support AWS Signature version 4 because
version 2 is no longer supported in all AWS regions. The version can be
selected with the new 'aws.api.signature.version' property (defaults to
2 for backwards compatibility). When set to '4', the user must also
specify the AWS region via the 'region' property. The 'region' property
must match the region that the 'domain' property resolves to.
Bug: 579907
Change-Id: If289dbc6d0f57323cfeaac2624c4eb5028f78d13
Add an API that allows a UI to find (and handle) diff/merge tools
specific to the given path. The assumption is that the user can specify
file-type-specific diff/merge tools via gitattributes.
Bug: 552840
Change-Id: I1daa091e9afa542a9ebb5417853dff0452ed52dd
Signed-off-by: Mykola Zakharchuk <zakharchuk.vn@gmail.com>
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Signed-off-by: Andre Bossert <andre.bossert@siemens.com>
Implement negative refspecs in JGit fetch, following C Git. Git
supports negative refspecs in source only while this change supports
them in both source and destination.
If one branch is equal to any branch or matches any pattern in the
negative refspecs collection, the branch will not be fetched even if
it's in the toFetch collection.
With this feature, users can express more complex patterns during fetch.
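
The filtering rule can be sketched as follows. This is a simplified
model that works on plain ref name strings, supports a single '*'
wildcard per pattern, and expects the negative patterns without their
leading '^'. NegativeRefSpecSketch is a hypothetical helper, not the
RefSpec API.

```java
import java.util.List;
import java.util.stream.Collectors;

class NegativeRefSpecSketch {
    // Matches a ref against a pattern with at most one '*' wildcard.
    static boolean matches(String pattern, String ref) {
        int star = pattern.indexOf('*');
        if (star < 0) {
            return pattern.equals(ref);
        }
        String prefix = pattern.substring(0, star);
        String suffix = pattern.substring(star + 1);
        return ref.length() >= prefix.length() + suffix.length()
                && ref.startsWith(prefix) && ref.endsWith(suffix);
    }

    // Drops every ref that equals or matches any of the negative patterns.
    static List<String> filter(List<String> toFetch, List<String> negativePatterns) {
        return toFetch.stream()
                .filter(ref -> negativePatterns.stream()
                        .noneMatch(p -> matches(p, ref)))
                .collect(Collectors.toList());
    }
}
```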
Change-Id: Iaa1cd4de5c08c273e198b72e12e3dadae7be709f
Signed-off-by: Yunjie Li <yunjieli@google.com>
If core.abbrev is unset or "auto" estimate abbreviation length like C
git does:
- Estimate repository's object count by only considering packed objects,
round up to next power of 2
- With the order of 2^len objects, we expect a collision at 2^(len/2).
But we also care about hex chars, not bits, and there are 4 bits per
hex. So all together we need to divide by 2; but we also want to round
odd numbers up, hence adding one before dividing.
- For small repos use at least 7 hexdigits
- If object database fails to determine object count use 7 hexdigits as
fallback
If it is set to "no" do not abbreviate object-ids.
Otherwise set it to the configured value capped to the range between 4
and length of an unabbreviated object-id.
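
The arithmetic above can be written down as a short sketch. The class
and constant names are hypothetical, a negative count stands in for
"object count unavailable", and 40 assumes SHA-1 object ids. For
example, about one million packed objects need 20 bits, which gives
(20 + 1) / 2 = 10 hex digits.

```java
class AbbrevLengthSketch {
    static final int MIN_LENGTH = 4;
    static final int FALLBACK_LENGTH = 7;
    static final int FULL_LENGTH = 40; // hex digits of a SHA-1 object id

    // Estimates the abbreviation length for core.abbrev unset or "auto".
    static int estimate(long packedObjectCount) {
        if (packedObjectCount < 0) {
            return FALLBACK_LENGTH; // object count could not be determined
        }
        // Number of bits needed for the count, i.e. the exponent of the
        // next power of 2 above it.
        int bits = 64 - Long.numberOfLeadingZeros(packedObjectCount);
        // Halve for the expected collision point, halve again for 4 bits
        // per hex digit, double for safety: net divide by 2, rounding up.
        int hexDigits = (bits + 1) / 2;
        return Math.max(hexDigits, FALLBACK_LENGTH);
    }

    // Caps an explicitly configured core.abbrev value to the valid range.
    static int configured(int value) {
        return Math.max(MIN_LENGTH, Math.min(value, FULL_LENGTH));
    }
}
```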
Change-Id: I425f9724b69813dbb57872466bf2d2e1d6dc72c6
When no RefSpecs are given, PushCommand until now simply fell back to
pushing the current branch to an upstream branch of the same name. This
corresponds to push.default=current. Any setting from the git config
for push.default was simply ignored.
Implement the other modes (nothing, matching, upstream, and simple),
too. Add a setter and getter for the PushDefault so that an application
can force a particular mode to be used. For backwards compatibility,
use "current" as the default setting; to figure out the value from the
git config, which defaults to "simple", call setPushDefault(null).
Bug: 351314
Change-Id: I86c5402318771e47d80b137e99947762e1150bb4
Signed-off-by: Rolf Theunissen <rolf.theunissen@gmail.com>
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Move the code to parse numbers with an optional 'k', 'm', or 'g' suffix
from the config file handling to StringUtils. This enables me to re-use
it in EGit, which has duplicate code in StorageSizeFieldEditor.
As this is generally useful functionality, providing it in the library
makes sense.
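
A sketch of such a parser, assuming the suffixes are case-insensitive
and scale by powers of 1024 as in git config values. SizeParserSketch
and parseLongWithSuffix are illustrative names, and the signature and
error handling of the StringUtils method may differ.

```java
class SizeParserSketch {
    // Parses values like "42", "10k", "2M" or "1g" into a number of bytes.
    static long parseLongWithSuffix(String value) {
        String s = value.trim();
        if (s.isEmpty()) {
            throw new NumberFormatException("empty value");
        }
        long multiplier = 1;
        switch (Character.toLowerCase(s.charAt(s.length() - 1))) {
        case 'g':
            multiplier = 1024L * 1024 * 1024;
            break;
        case 'm':
            multiplier = 1024L * 1024;
            break;
        case 'k':
            multiplier = 1024L;
            break;
        default:
            break;
        }
        if (multiplier > 1) {
            s = s.substring(0, s.length() - 1).trim();
        }
        // multiplyExact reports overflow instead of silently wrapping around.
        return Math.multiplyExact(Long.parseLong(s), multiplier);
    }
}
```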
Change-Id: I86e4f5f62e14f99b35726b198ba3bbf1669418d9
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
JEP 400 (Java 18) will change the default character set to UTF-8
unconditionally.[1] Introduce SystemReader.getDefaultCharset() that
provides the locale-dependent charset the way JEP 400 recommends.
Change all code locations using Charset.defaultCharset() to use the
new SystemReader method instead.
[1] https://openjdk.java.net/jeps/400
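
One way to obtain the locale-dependent charset along the lines of JEP
400 is to consult the "native.encoding" system property (available
since Java 17) and fall back to Charset.defaultCharset() on older
runtimes. The following is a sketch of that idea with a hypothetical
class name; the SystemReader implementation may differ in detail.

```java
import java.nio.charset.Charset;

class DefaultCharsetSketch {
    static Charset localeCharset() {
        return fromNativeEncoding(System.getProperty("native.encoding"));
    }

    // Separated out so the fallback logic can be exercised directly.
    static Charset fromNativeEncoding(String nativeEncoding) {
        if (nativeEncoding != null && !nativeEncoding.isEmpty()) {
            try {
                return Charset.forName(nativeEncoding);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through.
            }
        }
        // Before Java 18 this is still the locale-dependent charset.
        return Charset.defaultCharset();
    }
}
```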
Change-Id: I986f97a410d2fc70748b6f93228a2d45ff100b2c
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Adds functionality to read the git commit.template property. The
template content is read either via a default encoding or, if present,
via the encoding specified by the i18n.commitEncoding property.
Bug: 446355
Change-Id: I0c45db98e324ddff26a7e0262835f259d6528a86
Signed-off-by: Julian Ruppel <julian.ruppel@sap.com>
The search for reuse phase for *all* the objects scans *all*
the packfiles, looking for the best candidate to serve back to the
client.
This can lead to an expensive operation when the number of
packfiles and objects is high.
Add parameter "pack.searchForReuseTimeout" to limit the time spent
on this search.
Change-Id: I54f5cddb6796fdc93ad9585c2ab4b44854fa6c48
When reading loose objects over NFS it is possible that the OS syscall
would fail with ESTALE errors: This happens when the open file
descriptor no longer refers to a valid file.
This scenario typically occurs when git data is shared among multiple
clients, for example by multiple Gerrit instances in an HA setup.
If one of the two clients performs a GC operation that would cause the
packing and then the pruning of loose objects, the other client might
still hold a reference to those objects, which would cause an exception
to bubble up the stack.
The Linux NFS FAQ[1] (at point A.10), suggests that the proper way to
handle such ESTALE scenarios is to:
"[...] close the file or directory where the error occurred, and reopen
it so the NFS client can resolve the pathname again and retrieve the new
file handle."
In case of a stale file handle exception, we now attempt to read the
loose object again (up to 5 times), until we either succeed or encounter
a FileNotFoundException, in which case the search can continue to
packfiles and alternates.
The limit of 5 is an arbitrary upper bound that is consistent with
the one chosen when handling stale file handles for packed-refs
files (see [2] for context).
[1] http://nfs.sourceforge.net/
[2] https://git.eclipse.org/r/c/jgit/jgit/+/54350
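
The retry behaviour can be sketched generically as below. This is an
illustration, not the object directory code: the Reader interface is
hypothetical, and detecting a stale handle by looking for "stale" in
the exception message is an assumption made for this example. Each
call to read() is expected to reopen the file.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Locale;

class StaleHandleRetrySketch {
    static final int MAX_RETRIES = 5;

    // Hypothetical callback that opens and reads the loose object afresh.
    interface Reader<T> {
        T read() throws IOException;
    }

    // Assumption: the OS error text mentions a stale file handle.
    static boolean isStale(IOException e) {
        String message = e.getMessage();
        return message != null
                && message.toLowerCase(Locale.ROOT).contains("stale");
    }

    // Returns null if the file is gone, so the caller can continue the
    // search in packfiles and alternates.
    static <T> T readWithRetry(Reader<T> reader) throws IOException {
        for (int attempt = 0;; attempt++) {
            try {
                return reader.read();
            } catch (FileNotFoundException e) {
                return null;
            } catch (IOException e) {
                if (!isStale(e) || attempt >= MAX_RETRIES) {
                    throw e;
                }
                // Stale handle: loop to reopen and read again.
            }
        }
    }
}
```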
Bug: 573791
Change-Id: I9950002f772bbd8afeb9c6108391923be9d0ef51
Implement applying binary patches. Handles both literal and delta
patches. Note that C git also runs binary files through the clean
and smudge filters. Implement the same safeguards against corrupted
patches as in C git: require the full OIDs to be present in the patch
file, and apply a binary patch only if both pre- and post-image hashes
match.
Add tests for applying literal and delta patches.
Bug: 371725
Change-Id: I71dc214fe4145d7cc8e4769384fb78c7d0d6c220
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Add a new BinaryDeltaInputStream that applies a delta provided by
another InputStream to a given base. Because delta application needs
random access to the base, the base itself cannot be yet another
InputStream. But at least this enables streaming of the result.
Add a simple test using delta hunks generated by C git.
Bug: 371725
Change-Id: Ibd26fa2f49860737ad5c5387f7f4870d3e85e628
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>