From Git commit 9c1e657a8f:
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at
least one application wherein this is useful - push negotiation.
Therefore, make it possible for this negotiation step to be done
independently.
This feature is for protocol v2 only.
In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for
the client to say "done". In practice, the client will never say it;
instead it will cease requests once it is satisfied.
Advertising the server capability option "wait-for-done" is behind the
transport config: uploadpack.advertisewaitfordone, which by default is
false.
Change-Id: I5ebd3e99ad76b8943597216e23ced2ed38eb5224
This is similar to change Idbc2c29bd that skipped detecting content
renames for large files. With this change, we added a new option in
RenameDetector called "skipContentRenamesForBinaryFiles", that when set,
causes binary files with any slight modification to be identified as
added/deleted. The default for this boolean is false, so preserving
current behaviour.
Change-Id: I4770b1f69c60b1037025ddd0940ba86df6047299
The "branch" field in the .gitmodules is the signal for gerrit to keep
the superproject autoupdated. Tags are immutable and there is no need to
track them, plus the cgit client requires the field to be a "remote
branch name" but not a tag.
Do not set the "branch" field if the revision is a tag. Keep those tags
in another field ("ref") as they help other tools to find the commit in
the destination repository.
We can still have false negatives when a refname is not fully qualified,
but this check covers e.g. the most common case in android.
Note that the javadoc of #setRecordRemoteBranch already mentions that
"submodules that request a tag will not have branch name recorded".
Change-Id: Ib1c321a4d3b7f8d51ca2ea204f72dc0cfed50c37
Signed-off-by: Ivan Frade <ifrade@google.com>
Check the last line of the last hunk of a file, not the last line of
the whole patch.
Note that C git only checks that this line starts with "\ " and is at
least 12 characters long because of possible different texts when non-
English messages are used.
Change-Id: I0db81699eb3e99ed7b536a3e2b8dc97df1f58a89
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
C git treats completely empty lines as empty context lines (which
traditionally have a single blank). Apparently newer GNU diff may
produce such lines; see [1]. ("Newer" meaning "since 2006"...)
[1] https://github.com/git/git/commit/b507b465f7831
Change-Id: I80c1f030edb17a46289b1dabf11a2648d2660d38
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Instead of converting the patch bytes to strings apply the patch on
byte level, like C git does. Converting the input lines and the hunk
lines from bytes to strings and then applying the patch based on
strings may give surprising results if a patch converts a text file
from one encoding to another. Moreover, in the end we don't know which
encoding to use to write the result.
Previous code just wrote the result as UTF-8, which forcibly changed
the encoding if the original input had some other encoding (even if the
patch had the same non-UTF-8 encoding). It was also wrong if the input
was UTF-8, and the patch should have changed the encoding to something
else.
So use ByteBuffers instead of Strings. This has the additional advantage
that all these ByteBuffers can share the underlying byte arrays of the
input and of the patch, so it also reduces memory consumption.
Change-Id: I450975f2ba0e7d0bec8973e3113cc2e7aea187ee
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Implement applying binary patches. Handles both literal and delta
patches. Note that C git also runs binary files through the clean
and smudge filters. Implement the same safeguards against corrupted
patches as in C git: require the full OIDs to be present in the patch
file, and apply a binary patch only if both pre- and post-image hashes
match.
Add tests for applying literal and delta patches.
Bug: 371725
Change-Id: I71dc214fe4145d7cc8e4769384fb78c7d0d6c220
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Add a new BinaryDeltaInputStream that applies a delta provided by
another InputStream to a given base. Because delta application needs
random access to the base, the base itself cannot be yet another
InputStream. But at least this enables streaming of the result.
Add a simple test using delta hunks generated by C git.
Bug: 371725
Change-Id: Ibd26fa2f49860737ad5c5387f7f4870d3e85e628
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add streams that can encode or decode git binary patch data on the fly.
Git writes binary patches base-85 encoded, at most 52 un-encoded bytes,
with the unencoded data length prefixed in a one-character encoding, and
suffixed with a newline character.
Add a test for both the new input and the output stream. The test
roundtrips binary data of different lengths in different ways.
Bug: 371725
Change-Id: Ic3faebaa4637520f5448b3d1acd78d5aaab3907a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Add an implementation for base-85 encoding and decoding [1]. Git binary
patches use this format.
Base-85 encoding assembles bytes as 32-bit MSB values, then converts
these values to base-85 numbers (always 5 bytes) encoded as printable
ASCII characters. Decoding base-85 is the reverse operation. Note
that decoding may overflow on invalid input as 85^5 > 2^32. Encodings
always have a length that is a multiple of 5. If input length is not
divisible by 4, padding bytes are (logically) added, which are ignored
when decoding. The encoding for n bytes has thus always exactly length
(n + 3) / 4 * 5 in integer arithmetic (truncating division).
Includes tests.
[1] https://datatracker.ietf.org/doc/html/rfc1924
Bug: 371725
Change-Id: Ib5b9a503cd62cf70e080a4fb38c8cd1eeeaebcfe
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Update tests to record the number of events fired post-setup and only
assert for events fired during BatchRefUpdate.execute. For tests which
use writeLooseRef to setup refs, create new tests which assert the
number of RefsChangedEvent(s) rather than updating the existing ones
to call RefDirectory.exactRef as it changes the code path.
Change-Id: I0187811628d179d9c7e874c9bb8a7ddb44dd9df4
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Applying a patch on Windows failed if the patch had the (normal)
single-LF line endings, but the file on disk had the usual Windows
CR-LF line endings.
Git (and JGit) compute diffs on the git-internal blob, i.e., after
CR-LF transformation and clean filtering. Applying patches to files
directly is thus incorrect and may fail if CR-LF settings don't
match, or if clean/smudge filtering is involved.
Change ApplyCommand to run the file content through the check-in
filters before applying the patch, and run the result through the
check-out filters. This makes patch application succeed even if the
patch has single-LFs, but the file has CR-LF and core.autocrlf is
true.
Add tests for various combinations of line endings in the file and in
the patch, and a test to verify the clean/smudge handling.
See also [1].
Running the file though clean/smudge may give strange results with
LFS-managed files. JGit's DiffFormatter has some extra code and
applies the smudge filter again after having run the file through
the check-in filters (CR-LF and clean). So JGit can actually produce
a diff on LFS-managed files using the normal diff machinery. (If it
doesn't run out of memory, that is. After all, LFS is intended for
_large_ files.) How such a diff would be applied with either C git
or JGit is entirely unclear; neither has any code for this special
case. Compare also [2].
Note that C git just doesn't know about LFS and always diffs after
the check-in filter chain, so for LFS files, it'll produce a diff
of the LFS pointers.
[1] https://github.com/git/git/commit/c24f3abac
[2] https://github.com/git-lfs/git-lfs/issues/440
Bug: 571585
Change-Id: I8f71ff26313b5773ff1da612b0938ad2f18751f5
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.9:
LockFile: create OutputStream only when needed
Remove ReftableNumbersNotIncreasingException
Fix stamping to produce stable file timestamps
Change-Id: I056382d1d93f3e0a95838bdd1f0be89711c8a722
Don't create the stream eagerly in lock(); that may cause JGit to
exceed OS or JVM limits on open file descriptors if many locks need
to be created, for instance when creating many refs. Instead create
the output stream only when one really needs to write something.
Bug: 573328
Change-Id: If9441ed40494d46f594a896d34a5c4f56f91ebf4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Don't create the stream eagerly in lock(); that may cause JGit to
exceed OS or JVM limits on open file descriptors if many locks need
to be created, for instance when creating many refs. Instead create
the output stream only when one really needs to write something.
Bug: 573328
Change-Id: If9441ed40494d46f594a896d34a5c4f56f91ebf4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Git has different conflict resolution strategies:
* There is a tree merge strategy "ours" which just ignores any changes
from theirs ("-s ours"). JGit also has the mirror strategy "theirs"
ignoring any changes from "ours". (This doesn't exist in C git.)
Adapt StashApplyCommand and CherrypickCommand to be able to use those
tree merge strategies.
* For the resolve/recursive tree merge strategies, there are content
conflict resolution strategies "ours" and "theirs", which resolve
any conflict hunks by taking the "ours" or "theirs" hunk. In C git
those correspond to "-Xours" or -Xtheirs". Implement that in
MergeAlgorithm, and add API to set and pass through such a strategy
for resolving content conflicts.
* The "ours/theirs" content conflict resolution strategies also apply
for binary files. Handle these cases in ResolveMerger.
Note that the content conflict resolution strategies ("-X ours/theirs")
do _not_ apply to modify/delete or delete/modify conflicts. Such
conflicts are always reported as conflicts by C git. They do apply,
however, if one side completely clears a file's content.
Bug: 501111
Change-Id: I2c9c170c61c440a2ab9c387991e7a0c3ab960e07
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Similar to https://git.eclipse.org/r/c/jgit/jgit/+/175166, ignore
path that have conflicts on attributes, so that the virtual base could
be used by RecursiveMerger.
Change-Id: I99c95445a305558d55bbb9c9e97446caaf61c154
Signed-off-by: Marija Savtchouk <mariasavtchouk@google.com>
In cases where we need to determine if a given commit is merged
into many refs, using isMergedInto(base, tip) for each ref would
cause multiple unwanted walks.
getMergedInto() marks the unreachable commits as uninteresting
which would then avoid walking that same path again.
Using the same api, also introduce isMergedIntoAny() and
isMergedIntoAll()
Change-Id: I65de9873dce67af9c415d1d236bf52d31b67e8fe
Signed-off-by: Adithya Chakilam <quic_achakila@quicinc.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There are two code paths for detecting renames: one on tree diffs
(using DiffFormatter#scan) and the other on single file diffs (using
DiffFormatter#format). The latter skips binary and large files
for rename detection - check [1], but the former doesn't.
This change skips content rename detection for the tree diffs case for
large files. This is essential to avoid expensive computations while
reading the file, especially for callers who don't want to pay that
cost. Content renames are those which involve files with slightly
modified content. Exact renames will still be identified.
The default threshold for file sizes is reused from
PackConfig.DEFAULT_BIG_FILE_THRESHOLD: 50 MB.
[1] 232876421d/org.eclipse.jgit/src/org/eclipse/jgit/diff/RawText.java (386)
Change-Id: Idbc2c29bd381c6e387185204638f76fda47df41e
Signed-off-by: Youssef Elghareeb <ghareeb@google.com>