Commit Graph

2928 Commits

Author SHA1 Message Date
Matthias Sohn 044c5f215c Prepare 5.12.0-SNAPSHOT builds
Change-Id: Ifc72d3f3ac84b9c4055b95ec0093d877ffb09ab0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-06-03 20:04:28 +02:00
Matthias Sohn 45a4c131ae JGit v5.12.0.202106021050-rc1
Change-Id: I622ee049f14f37504ff4a062f03d6fc25465d0ec
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-06-02 16:49:17 +02:00
Matthias Sohn 1f733663bf Prepare 5.12.0-SNAPSHOT builds
Change-Id: I25e4efc9b40ae4e7168b37385445c73992c5beb0
2021-06-02 08:47:28 +02:00
Matthias Sohn 94aa245023 JGit v5.12.0.202106011439-rc1
Change-Id: Ieac1d02879defe0f4791062448d4efc328a2f652
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-06-01 20:38:31 +02:00
Han-Wen NIenhuys 6dc3506b52 Merge "Skip detecting content renames for binary files" 2021-05-31 08:57:13 -04:00
Youssef Elghareeb 1788b72d1a Skip detecting content renames for binary files
This is similar to change Idbc2c29bd that skipped detecting content
renames for large files. With this change, we added a new option in
RenameDetector called "skipContentRenamesForBinaryFiles", that when set,
causes binary files with any slight modification to be identified as
added/deleted. The default for this boolean is false, so preserving
current behaviour.

Change-Id: I4770b1f69c60b1037025ddd0940ba86df6047299
2021-05-31 13:48:37 +02:00
Ivan Frade 0667b8ec4d RepoCommand: Do not set 'branch' if the revision is a tag
The "branch" field in the .gitmodules is the signal for gerrit to keep
the superproject autoupdated. Tags are immutable and there is no need to
track them, plus the cgit client requires the field to be a "remote
branch name" but not a tag.

Do not set the "branch" field if the revision is a tag. Keep those tags
in another field ("ref") as they help other tools to find the commit in
the destination repository.

We can still have false negatives when a refname is not fully qualified,
but this check covers e.g. the most common case in android.

Note that the javadoc of #setRecordRemoteBranch already mentions that
"submodules that request a tag will not have branch name recorded".

Change-Id: Ib1c321a4d3b7f8d51ca2ea204f72dc0cfed50c37
Signed-off-by: Ivan Frade <ifrade@google.com>
2021-05-26 14:32:04 +02:00
Thomas Wolf 1126f26d21 ApplyCommand: fix "no newline at end" detection
Check the last line of the last hunk of a file, not the last line of
the whole patch.

Note that C git only checks that this line starts with "\ " and is at
least 12 characters long because of possible different texts when non-
English messages are used.

Change-Id: I0db81699eb3e99ed7b536a3e2b8dc97df1f58a89
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-26 00:38:00 +02:00
Thomas Wolf 2a0295ccfd ApplyCommand: handle completely empty context lines in text patches
C git treats completely empty lines as empty context lines (which
traditionally have a single blank). Apparently newer GNU diff may
produce such lines; see [1]. ("Newer" meaning "since 2006"...)

[1] https://github.com/git/git/commit/b507b465f7831

Change-Id: I80c1f030edb17a46289b1dabf11a2648d2660d38
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-26 00:38:00 +02:00
Thomas Wolf 76b76a6048 ApplyCommand: use byte arrays for text patches, not strings
Instead of converting the patch bytes to strings apply the patch on
byte level, like C git does. Converting the input lines and the hunk
lines from bytes to strings and then applying the patch based on
strings may give surprising results if a patch converts a text file
from one encoding to another. Moreover, in the end we don't know which
encoding to use to write the result.

Previous code just wrote the result as UTF-8, which forcibly changed
the encoding if the original input had some other encoding (even if the
patch had the same non-UTF-8 encoding). It was also wrong if the input
was UTF-8, and the patch should have changed the encoding to something
else.

So use ByteBuffers instead of Strings. This has the additional advantage
that all these ByteBuffers can share the underlying byte arrays of the
input and of the patch, so it also reduces memory consumption.

Change-Id: I450975f2ba0e7d0bec8973e3113cc2e7aea187ee
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-26 00:38:00 +02:00
Thomas Wolf 10ac449911 ApplyCommand: support binary patches
Implement applying binary patches. Handles both literal and delta
patches. Note that C git also runs binary files through the clean
and smudge filters. Implement the same safeguards against corrupted
patches as in C git: require the full OIDs to be present in the patch
file, and apply a binary patch only if both pre- and post-image hashes
match.

Add tests for applying literal and delta patches.

Bug: 371725
Change-Id: I71dc214fe4145d7cc8e4769384fb78c7d0d6c220
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-26 00:38:00 +02:00
Thomas Wolf 0fe794a433 ApplyCommand: add a stream to apply a delta patch
Add a new BinaryDeltaInputStream that applies a delta provided by
another InputStream to a given base. Because delta application needs
random access to the base, the base itself cannot be yet another
InputStream. But at least this enables streaming of the result.

Add a simple test using delta hunks generated by C git.

Bug: 371725
Change-Id: Ibd26fa2f49860737ad5c5387f7f4870d3e85e628
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-05-26 00:37:59 +02:00
Thomas Wolf 2eb54afe6a ApplyCommand: add streams to read/write binary patch hunks
Add streams that can encode or decode git binary patch data on the fly.
Git writes binary patches base-85 encoded, at most 52 un-encoded bytes,
with the unencoded data length prefixed in a one-character encoding, and
suffixed with a newline character.

Add a test for both the new input and the output stream. The test
roundtrips binary data of different lengths in different ways.

Bug: 371725
Change-Id: Ic3faebaa4637520f5448b3d1acd78d5aaab3907a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-26 00:37:59 +02:00
Thomas Wolf 501fc0dadd ApplyCommand: add a base-85 codec
Add an implementation for base-85 encoding and decoding [1]. Git binary
patches use this format.

Base-85 encoding assembles bytes as 32-bit MSB values, then converts
these values to base-85 numbers (always 5 bytes) encoded as printable
ASCII characters. Decoding base-85 is the reverse operation. Note
that decoding may overflow on invalid input as 85^5 > 2^32. Encodings
always have a length that is a multiple of 5. If input length is not
divisible by 4, padding bytes are (logically) added, which are ignored
when decoding. The encoding for n bytes has thus always exactly length
(n + 3) / 4 * 5 in integer arithmetic (truncating division).

Includes tests.

[1] https://datatracker.ietf.org/doc/html/rfc1924

Bug: 371725
Change-Id: Ib5b9a503cd62cf70e080a4fb38c8cd1eeeaebcfe
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-05-26 00:37:45 +02:00
Thomas Wolf d2846cc8b2 ApplyCommand: convert to git internal format before applying patch
Applying a patch on Windows failed if the patch had the (normal)
single-LF line endings, but the file on disk had the usual Windows
CR-LF line endings.

Git (and JGit) compute diffs on the git-internal blob, i.e., after
CR-LF transformation and clean filtering. Applying patches to files
directly is thus incorrect and may fail if CR-LF settings don't
match, or if clean/smudge filtering is involved.

Change ApplyCommand to run the file content through the check-in
filters before applying the patch, and run the result through the
check-out filters. This makes patch application succeed even if the
patch has single-LFs, but the file has CR-LF and core.autocrlf is
true.

Add tests for various combinations of line endings in the file and in
the patch, and a test to verify the clean/smudge handling.

See also [1].

Running the file though clean/smudge may give strange results with
LFS-managed files. JGit's DiffFormatter has some extra code and
applies the smudge filter again after having run the file through
the check-in filters (CR-LF and clean). So JGit can actually produce
a diff on LFS-managed files using the normal diff machinery. (If it
doesn't run out of memory, that is. After all, LFS is intended for
_large_ files.) How such a diff would be applied with either C git
or JGit is entirely unclear; neither has any code for this special
case. Compare also [2].

Note that C git just doesn't know about LFS and always diffs after
the check-in filter chain, so for LFS files, it'll produce a diff
of the LFS pointers.

[1] https://github.com/git/git/commit/c24f3abac
[2] https://github.com/git-lfs/git-lfs/issues/440

Bug: 571585
Change-Id: I8f71ff26313b5773ff1da612b0938ad2f18751f5
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-18 17:23:34 +02:00
Matthias Sohn e6192c56af Add a cgit interoperability test for LockFile
Change-Id: I30cacd1f50f8f4ff4dd91ad291bf279980e3c4b5
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-05-09 22:49:19 +02:00
Thomas Wolf a9579ba60c LockFile: create OutputStream only when needed
Don't create the stream eagerly in lock(); that may cause JGit to
exceed OS or JVM limits on open file descriptors if many locks need
to be created, for instance when creating many refs. Instead create
the output stream only when one really needs to write something.

Bug: 573328
Change-Id: If9441ed40494d46f594a896d34a5c4f56f91ebf4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-05-07 12:10:47 +02:00
Thomas Wolf 8210f29fe4 Implement ours/theirs content conflict resolution
Git has different conflict resolution strategies:

* There is a tree merge strategy "ours" which just ignores any changes
  from theirs ("-s ours"). JGit also has the mirror strategy "theirs"
  ignoring any changes from "ours". (This doesn't exist in C git.)
  Adapt StashApplyCommand and CherrypickCommand to be able to use those
  tree merge strategies.
* For the resolve/recursive tree merge strategies, there are content
  conflict resolution strategies "ours" and "theirs", which resolve
  any conflict hunks by taking the "ours" or "theirs" hunk. In C git
  those correspond to "-Xours" or -Xtheirs". Implement that in
  MergeAlgorithm, and add API to set and pass through such a strategy
  for resolving content conflicts.
* The "ours/theirs" content conflict resolution strategies also apply
  for binary files. Handle these cases in ResolveMerger.

Note that the content conflict resolution strategies ("-X ours/theirs")
do _not_ apply to modify/delete or delete/modify conflicts. Such
conflicts are always reported as conflicts by C git. They do apply,
however, if one side completely clears a file's content.

Bug: 501111
Change-Id: I2c9c170c61c440a2ab9c387991e7a0c3ab960e07
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-04-19 01:52:19 +02:00
Thomas Wolf fd03e40256 Fix typo in test method name
Change-Id: I34718829435daf8ded4ce596c824dd3cfbafbaf6
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-04-09 11:31:56 +02:00
Marija Savtchouk 7ceb61494b Allow file mode conflicts in virtual base commit on recursive merge.
Similar to https://git.eclipse.org/r/c/jgit/jgit/+/175166, ignore
path that have conflicts on attributes, so that the virtual base could
be used by RecursiveMerger.

Change-Id: I99c95445a305558d55bbb9c9e97446caaf61c154
Signed-off-by: Marija Savtchouk <mariasavtchouk@google.com>
2021-04-06 09:33:04 +01:00
Adithya Chakilam 0bd2f4bf77 Introduce getMergedInto(RevCommit commit, Collection<Ref> refs)
In cases where we need to determine if a given commit is merged
into many refs, using isMergedInto(base, tip) for each ref would
cause multiple unwanted walks.

getMergedInto() marks the unreachable commits as uninteresting
which would then avoid walking that same path again.

Using the same api, also introduce isMergedIntoAny() and
isMergedIntoAll()

Change-Id: I65de9873dce67af9c415d1d236bf52d31b67e8fe
Signed-off-by: Adithya Chakilam <quic_achakila@quicinc.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-03-14 13:45:29 +01:00
Youssef Elghareeb 4a78d911c5 Skip detecting content renames for large files
There are two code paths for detecting renames: one on tree diffs
(using DiffFormatter#scan) and the other on single file diffs (using
DiffFormatter#format). The latter skips binary and large files
for rename detection - check [1], but the former doesn't.

This change skips content rename detection for the tree diffs case for
large files. This is essential to avoid expensive computations while
reading the file, especially for callers who don't want to pay that
cost. Content renames are those which involve files with slightly
modified content. Exact renames will still be identified.

The default threshold for file sizes is reused from
PackConfig.DEFAULT_BIG_FILE_THRESHOLD: 50 MB.

[1] 232876421d/org.eclipse.jgit/src/org/eclipse/jgit/diff/RawText.java (386)

Change-Id: Idbc2c29bd381c6e387185204638f76fda47df41e
Signed-off-by: Youssef Elghareeb <ghareeb@google.com>
2021-03-14 11:38:13 +01:00
Matthias Sohn 232876421d Prepare 5.12.0-SNAPSHOT builds
Change-Id: I736de7c3deb11da75777d459f47332df0b486443
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-03-10 16:34:28 +01:00
Matthias Sohn 1f368f8867 Prepare 5.11.1-SNAPSHOT builds
Change-Id: I94628ccbb5099a65aa4345cfd28a141ff5555b68
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-03-09 23:42:31 +01:00
Matthias Sohn 30b6887d44 JGit v5.11.0.202103091610-r
Change-Id: I8e6855eaf7228459f492036feb4e34ca085698a7
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-03-09 22:10:22 +01:00
Nasser Grainawi 2a6b2eddcf PackFile: Add id + ext based constructors
Add new constructors to PackFile to improve a common use case where
callers know the directory, id, and extension, but previously needed to
construct a valid file name (with prefix, '.', etc) to create a
PackFile. Most callers can use the variant that has id as an ObjectId,
but provide an id as String variant too.

Change-Id: I39e4466abe8c9509f5916d5bfe675066570b8585
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2021-03-07 00:02:56 +01:00
Martin Fick 6167641834 Restore preserved packs during missing object seeks
Provide a recovery path for objects being referenced during the pack
pruning race. Due to the pack pruning race, it is possible for objects
to become referenced after a pack has been deemed safe to prune, but
before it actually gets pruned. If this happened previously, the newly
referenced objects would be missing and potentially result in a
corrupted ref.

Add the ability to recover from this situation when an object is missing
but happens to still be available in a pack in the "preserved"
directory. This is likely only useful when used in conjunction with the
--preserve-old-packs GC option, which prunes packs by hard-linking to
the preserved directory. If an object is missing and found in a pack in
the preserved directory, immediately recover that pack and its
associated files (idx, bitmaps...) by moving them back to the original
pack directory, and then retry the operation that would have failed due
to the missing object. This retry can now succeed and the repository
may avoid corruption. This approach should drastically reduce the
chance of a corrupt repository during pack pruning at very little extra
cost. This extra cost should only be incurred when objects are missing
and a failure would normally occur.

Change-Id: I2a704e3276b88cc892159d9bfe2455c6eec64252
Signed-off-by: Martin Fick <quic_mfick@quicinc.com>
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2021-03-04 22:31:40 +01:00
Nasser Grainawi 7fbff35887 Pack: Replace extensions bitset with bitmapIdx PackFile
The only extension that was ever consulted from the bitmap was the
bitmap index. We can simplify the Pack code as well as the code of
all the callers if we focus on just that usage.

Change-Id: I799ddfdee93142af67ce5081d14a430d36aa4c15
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2021-03-04 22:25:48 +01:00
Nasser Grainawi 971dafd302 Create a PackFile class for Pack filenames
The PackFile class is intended to be a central place to do all
common pack filename manipulation and parsing to help reduce repeated
code and bugs. Use the PackFile class in the Pack class and in many
tests to ensure it works well in a variety of situations. Later changes
will expand use of PackFiles to even more areas.

Change-Id: I921b30f865759162bae46ddd2c6d669de06add4a
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-03-04 22:19:36 +01:00
Thomas Wolf 40d6eda3f1 HTTP: cookie file stores expiration in seconds
A cookie file stores the expiration in seconds since the Linux Epoch,
not in milliseconds. Correct reading and writing cookie files; with
a backwards-compatibility hack to read files that contain a millisecond
timestamp.

Add a test, and fix tests not to rely on the actual current time so
that they will also run successfully after 2030-01-01 noon.

Bug: 571574
Change-Id: If3ba68391e574520701cdee119544eedc42a1ff2
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-03-03 00:26:51 +01:00
Matthias Sohn 927deed5a5 init: add config option to set default for the initial branch name
We introduced the option --initial-branch=<branch-name> to allow
initializing a new repository with a different initial branch.

To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each 'git init'),
introduce the 'init.defaultBranch' option.

This option was added to git in 2.28.0.

See https://git-scm.com/docs/git-config#Documentation/git-config.txt-initdefaultBranch

Bug: 564794
Change-Id: I679b14057a54cd3d19e44460c4a5bd3a368ec848
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-02-22 23:11:45 +01:00
Matthias Sohn cb8924a80d init: allow specifying the initial branch name for the new repository
Add option --initial-branch/-b to InitCommand and the CLI init command.
This is the first step to implement support for the new option
init.defaultBranch. Both were added to git in release 2.28.

See https://git-scm.com/docs/git-init#Documentation/git-init.txt--bltbranch-namegt

Bug: 564794
Change-Id: Ia383b3f90b5549db80f99b2310450a7faf6bce4c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-02-22 23:11:45 +01:00
Jonathan Nieder f1312b4a90 Merge "Rename PackFile to Pack" 2021-02-18 17:04:07 -05:00
Matthias Sohn 3b94ba6c24 Fix boxing warnings
Change-Id: Idf4887a99e87c375ec32e2fd289cfce82d78cbce
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-02-17 16:24:34 -05:00
Nasser Grainawi efb154fc24 Rename PackFile to Pack
Pack better represents the purpose of the object and paves the way to
add a PackFile object that extends File.

Change-Id: I39b4f697902d395e9b6df5e8ce53078ce72fcea3
Signed-off-by: Nasser Grainawi <quic_nasserg@quicinc.com>
2021-02-10 22:46:15 -07:00
Marija Savtchouk 1b9911d9ae Allow dir/file conflicts in virtual base commit on recursive merge.
If RecursiveMerger finds multiple base commits, it tries to compute
the virtual ancestor to use as a base for the three way merge.
Currently, the content conflicts between ancestors are ignored (file
staged with the conflict markers). If the path is a file in one ancestor
and a dir in the other, it results in NoMergeBaseException
(CONFLICTS_DURING_MERGE_BASE_CALCULATION).

Allow these conflicts by ignoring this unmerged path in the virtual
base. The merger will compute diff in the children instead and it
can be further fixed manually if needed.

Change-Id: Id59648ae1d6bdf300b26fff513c3204317b755ab
Signed-off-by: Marija Savtchouk <mariasavtchouk@google.com>
2021-02-09 15:26:03 +00:00
David Ostrovsky 7755645777 Bazel: Remove unused resources variable
Change-Id: Iac2e547791929c26027ab4730ceac6177899ccf1
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
2021-02-07 23:06:25 +01:00
Adithya Chakilam c7685003d8 Fix DateRevQueue tie breaks with more than 2 elements
DateRevQueue is expected to give out the commits that have higher
commit time. But in case of tie(same commit time), it should give
the commit that is inserted first. This is inferred from the
testInsertTie test case written for DateRevQueue. Also that test
case, right now uses just two commits which caused it not to fail
with the current implementation, so added another commit to make
the test more robust.

By fixing the DateRevQueue, we would also match the behaviour of
LogCommand.addRange(c1,c2) with git log c1..c2. A test case for
the same is added to show that current behaviour is not the
expected one.

By fixing addRange(), the order in which commits are applied during
a rebase is altered. Rebase logic should have never depended upon
LogCommand.addRange() since the intended order of addRange() is not
the order a rebase should use. So, modify the RebaseCommand to use
RevWalk directly with TopoNonIntermixSortGenerator.

Add a new LogCommandTest.addRangeWithMerge() test case which creates
commits in the following order:

         A - B - C - M
              \     /
                -D-

Using git 2.30.0, git log B..M outputs:  M C D
LogCommand.addRange(B, M) without this fix outputs: M D C
LogCommand.addRange(B, M) with this fix outputs: M C D

Change-Id: I30cc3ba6c97f0960f64e9e021df96ff276f63db7
Signed-off-by: Adithya Chakilam <achakila@codeaurora.org>
2021-02-07 06:09:48 -05:00
Matthias Sohn 8ad53baa14 Fix bazel tests broken by classes moved in dbd05433
Change-Id: I88a3547c4b52bcf28c0f0f548ba1bb41a7787704
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-02-04 01:19:42 +01:00
Thomas Wolf 91ddc0e284 IO: fix IO.readFully(InputStream, byte[], int)
This would run into an endless loop if the offset given was not zero.
Fix the logic to exit the read loop when the buffer is full.

Luckily all existing uses of this method call it only with offset zero.

Change-Id: I0ec2a4fb43efe4a605d06ac2e88cf155d50e2f1e
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-01-31 10:31:10 +01:00
Jonathan Nieder 59420886e9 Merge "Move reachability checker generation into the ObjectReader object" 2021-01-29 01:52:13 -05:00
Terry Parker dbd05433ec Move reachability checker generation into the ObjectReader object
Reachability checkers are retrieved from RevWalk and ObjectWalk objects:
* RevWalk.createReachabilityChecker()
* ObjectWalk.createObjectReachabilityChecker()

Since RevWalks and ObjectWalks are themselves directly instantiated
in hundreds of places (e.g. UploadPack...) overriding them in a
consistent way requires overloading 100s of methods, which isn't
feasible. Moving reachability checker generation to a more central
place solves that problem.

The ObjectReader object seems a good place from which to get
reachability checkers, because reachability checkers return
information about relationships between objects. ObjectDatabases
delegate many operations to ObjectReaders, and reachability bitmaps
are attached to ObjectReaders.

The Bitmapped and Pedestrian reachability checker objects were
package private in the org.eclipse.jgit.revwalk package. This change
makes them public and moves them to the
org.eclipse.jgit.internal.revwalk package. Corresponding tests are
also moved.

Motivation:
1) Reachability checking algorithms need to scale. One of the
   internal Android repositories has ~2.4 million refs/changes/*
   references, causing bad long tail performance in reachability
   checks.
2) Reachability check performance is impacted by repository
   topography: number of refs, number of objects, amounts of
   related vs. unrelated history.
3) Reachability check performance is also affected by per-branch
   access (Gerrit branch permissions) since different users can
   see different branches.
4) Reachability check performance isn't affected by any state in a
   RevWalk or ObjectWalk.

I don't yet know if a single algorithm will work for all cases in #2
and #3. We may need to evolve the ReachabilityChecker interfaces
over time to solve the Gerrit branch permissions case, or use
Gerrit-specific identity information to solve that in an efficient
way.

This change takes the existing public API and moves it to the
ObjectReader/whole repository level, which is where we can do
consistent customizations for #2 and #3. We intend to upstream the
best of whatever works, but anticipate the need for multiple rounds
of experimentation.

Change-Id: I9185feff43551fb387957c436112d5250486833d
Signed-off-by: Terry Parker <tparker@google.com>
2021-01-28 22:17:26 -08:00
Youssef Elghareeb 6f82690aaf Add the "compression-level" option to all ArchiveCommand formats
Different archive formats support a compression level in the range
[0-9]. The value 0 is for lowest compressions and 9 for highest. Highest
levels produce output files of smaller sizes but require more memory to
do the compression.

This change allows passing a "compression-level" option to the git
archive command and implements using it for different file formats.

Change-Id: I5758f691c37ba630dbac24db67bb7da827bbc8e1
Signed-off-by: Youssef Elghareeb <ghareeb@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2021-01-28 02:57:22 -05:00
Jonathan Tan c29ec3447d Merge changes I36d9b63e,I8c5db581,I2c02e89c
* changes:
  Compare getting all refs except specific refs with seek and with filter
  Add getsRefsByPrefixWithSkips (excluding prefixes) to ReftableDatabase
  Add seekPastPrefix method to RefCursor
2021-01-27 13:36:43 -05:00
Gal Paikin a6b90b7ec5 Add getsRefsByPrefixWithSkips (excluding prefixes) to ReftableDatabase
We sometimes want to get all the refs except specific prefixes,
similarly to getRefsByPrefix that gets all the refs of a specific
prefix.

We now create a new method that gets all refs matching a prefix except a
set of specific prefixes.

One use-case is for Gerrit to be able to get all the refs except
refs/changes; in Gerrit we often have lots of refs/changes, but very
little other refs. Currently, to get all the refs except refs/changes we
need to get all the refs and then filter the refs/changes, which is very
inefficient. With this method, we can simply skip the unneeded prefix so
that we don't have to go over all the elements.

RefDirectory still uses the inefficient implementation, since there
isn't a simple way to use Refcursor to achieve the efficient
implementation (as done in ReftableDatabase).

Signed-off-by: Gal Paikin <paiking@google.com>
Change-Id: I8c5db581acdeb6698e3d3a2abde8da32f70c854c
2021-01-27 02:22:45 -05:00
Gal Paikin 68b95afc70 Add seekPastPrefix method to RefCursor
This method will be used by the follow-up change. This useful if we want
to go over all the changes after a specific ref.

For example, the new method allows us to create a follow-up that would
go over all the refs until we reach a specific ref (e.g refs/changes/),
and then we use seekPastPrefix(refs/changes/) to read the rest of the refs,
thus basically we return all refs except a specific prefix.

When seeking past a prefix, the previous condition that created the
RefCursor still applies. E.g, if the cursor was created by
seekRefsWithPrefix, we can skip some refs but we will not return refs
that are not starting with this prefix.

Signed-off-by: Gal Paikin <paiking@google.com>
Change-Id: I2c02e89c877fe90da8619cb8a4a9a0c865f238ef
2021-01-26 21:47:28 +01:00
Thomas Wolf 84dbc2d431 TemporaryBuffer: fix toByteArray(limit)
Heap always copied whole blocks, which leads to AIOOBEs. LocalFile
didn't overwrite the method and thus caused NPEs.

Change-Id: Ia37d4a875df9f25d4825e6bc95fed7f0dff42afb
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-01-22 23:00:01 +01:00
Thomas Wolf dd3846513b Tag message must not include the signature
Signatures on tags are just tacked onto the end of the message.
Getting the message must not return the signature. Compare [1]
and [2] in C git, which both drop a signature at the end of an
object body.

[1] https://github.com/git/git/blob/21bf933/builtin/tag.c#L173
[2] https://github.com/git/git/blob/21bf933/ref-filter.c#L1276

Change-Id: Ic8a1062b8bc77f2d7c138c3fe8a7fd13b1253f38
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-01-10 10:19:40 -05:00
Thomas Wolf 0853a2410f Client-side protocol V2 support for fetching
Make all transports request protocol V2 when fetching. Depending on
the transport, set the GIT_PROTOCOL environment variable (file and
ssh), pass the Git-Protocol header (http), or set the hidden
"\0version=2\0" (git anon). We'll fall back to V0 if the server
doesn't reply with a version 2 answer.

A user can control which protocol the client requests via the git
config protocol.version; if not set, JGit requests protocol V2 for
fetching. Pushing always uses protocol V0 still.

In the API, there is only a new Transport.openFetch() version that
takes a collection of RefSpecs plus additional patterns to construct
the Ref prefixes for the "ls-refs" command in protocol V2. If none
are given, the server will still advertise all refs, even in protocol
V2.

BasePackConnection.readAdvertisedRefs() handles falling back to
protocol V0. It newly returns true if V0 was used and the advertised
refs were read, and false if V2 is used and an explicit "ls-refs" is
needed. (This can't be done transparently inside readAdvertisedRefs()
because a "stateless RPC" transport like TransportHttp may need to
open a new connection for writing.)

BasePackFetchConnection implements the changes needed for the protocol
V2 "fetch" command (stateless protocol, simplified ACK handling,
delimiters, section headers).

In TransportHttp, change readSmartHeaders() to also recognize the
"version 2" packet line as a valid smart server indication.

Adapt tests, and run all the HTTP tests not only with both HTTP
connection factories (JDK and Apache HttpClient) but also with both
protocol V0 and V2. The SSH tests are much slower and much more
focused on the SSH protocol and SSH key handling. Factor out two
very simple cloning and pulling tests and make those run with
protocol V2.

Bug: 553083
Change-Id: I357c7f5daa7efb2872f1c64ee6f6d54229031ae1
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-01-01 21:22:30 +01:00
Thomas Wolf 5b1a6e0e38 Fix NPE in DirCacheCheckout
If a file exists in head, merge, and the working tree, but not in
the index, and we're doing a force checkout, the checkout must be
an "update", not a "keep".

This is a follow-up on If3a9b9e60064459d187c7db04eb4471a72c6cece.

Bug: 569962
Change-Id: I59a7ac41898ddc1dd90e86b09b621a41fdf45667
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2020-12-30 10:51:14 +01:00