Provide file helper methods in a reusable utility class to
replace many local implementations. java.io.File has some
methods reporting failure by returning false. We prefer to
throw IOException on failure so that callers can't forget
checking the return value.
Change-Id: I430c77b5d2cffcf8b47584326ad4817a7291845e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Static accessors should come before a constructor.
Change-Id: Iee1051ce4f2038f19a08741e7a3a33f06a97a3c0
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There are some files that need to exist so that the CLI can continue
after the rebase has been stopped due to conflicts
Change-Id: I3cb4dc98609c059bf0cf9fd5f9e47a9c681cea2d
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Currently the following can happen in LockFile.commit: deletion of the
original file succeeds but renaming fails afterwards. In this case the
original file (e.g. branch file in refs/heads) is lost.
To workaround the issue the same retry logic as for file deletion is
applied to file renaming.
Bug: 331890
Change-Id: I68620c07f2d3ab7f3279c71a91e184e8eac69832
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Because tags are more interesting here than local or remote branch
heads, tags get sorted earlier in the array than heads or remotes do.
Bug: 324939
Change-Id: Ifc3863461654df7f34fdecbd2abe1f4b5d2ffb8e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Mathias Kinzler <mathias.kinzler@sap.com>
CC: Stefan Lay <stefan.lay@sap.com>
DirCacheCheckout needs to use ObjectLoader.copyTo to avoid loading the
complete content of a large file into the JVM heap.
Bug: 321097
Change-Id: I967590b6f233fd1c83d873075db01d653208b3b9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
CC: Christian Halstrick <christian.halstrick@sap.com>
If the environment variable GIT_SSH is set, use GIT_SSH for any remote
protocol connections, instead of the local JSch library.
Bug: 321062
Change-Id: Ia18ea49d58f3ed657430067f1f72ef788a2dae4c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In order to honor GIT_SSH the TransportGitSsh class needs to run the
process named by the GIT_SSH environment variable and use that as the
pipes for connectivity to the remote peer. Refactor the current
transport code to support a different type of pipe connectivity, so we
can later add GIT_SSH.
Bug: 321062
Change-Id: I9d8ee1a95f1bac5013b33a4a42dcf1f98f92172f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Displaying the current tree in the ls-tree style output makes it
easier to see what entries are currently stored.
Change-Id: If17c414db0d2e8d84e65de8bbcba7fd1b79aa311
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Reviewed-by: Chris Aniszczyk <caniszczyk@gmail.com>
This makes usage of a TreeFormatter more similar to a CommitBuilder or
a TagBuilder: populate the formatter and pass to the ObjectInserter.
Change-Id: I5a45ef3a35cc73f4905a34bc6f6228510df8eb2c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Reviewed-by: Chris Aniszczyk <caniszczyk@gmail.com>
This better matches the existing API of TreeFormatter, but is just a
simple delegation to build().
Change-Id: I188f43acc34455e773d63836724b05e18f5c7a84
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Reviewed-by: Chris Aniszczyk <caniszczyk@gmail.com>
These objects don't need to be updated with the resulting ObjectId of
the formatted content, callers can get that from the ObjectInserter on
their own.
Change-Id: Idc5f097de9f7beafc5e54e597383d82daf9d7db4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Reviewed-by: Chris Aniszczyk <caniszczyk@gmail.com>
The correct names for these is build(), as that is what a Java
developer will expect given the "builder" pattern.
Bug: 323541
Change-Id: I35042bdc95a955beeaee29e54bde10e4240b2a71
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Reviewed-by: Chris Aniszczyk <caniszczyk@gmail.com>
When in OURS and THEIRS a new file is created we want a conflict
when the two contents differ. If on two branches the same file
with the same content is created this should not be a conflict.
But: the current merge algorithm is throwing NPEs in this case.
Fix this by choosing an empty RawText as common base if the
base is empty.
Change-Id: I21cb23f852965b82fb82ccd66ec961c7edb3ac3d
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
If the object type is a whole object and all we want is the type,
there is no need to skip the length header. The type is already known
and can be returned as-is. Instead skip the length header only for
the two delta formats, where the delta base must itself be scanned.
Change-Id: I87029258e88924b3e5850bdd6c9006a366191d10
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This variable was not used for anything, but Eclipse's JDT failed to
notice because of the "shift += " operation within the body of the
while loop. Here we don't need the shift because we do not decode the
length, but we do have to skip over the bytes that store the length to
locate the delta base.
Bug: 331319
Change-Id: I200a874fd7e39e3adf2640b8cd0f53dcf91ef4c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Remy Suen <remysuen@ca.ibm.com>
If the CLI stops a rebase upon conflict, the current
step is already popped from the git-rebase-todo and appended to the
"done" file. The current implementation wrongly pops the step only
after successful cherry-pick.
Change-Id: I8640dda0cbb2a5271ecf75fcbad69410122eeab6
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
The Repository is then in state "Rebase interactive".
Change-Id: I5d2de57f8670e1d4c71ed22509ab17f04e2561b5
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
The IndexDiff had not collected the info if the flag
"assume-unchanged" is set. This information is useful for clients
which may want to decide if specific actions are allowed on a file.
Bug: 326213
Change-Id: I14bb7b03247d6c0b429a9d8d3f6b10f21d8ddeb1
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
When the assume unchanged flag is set the Add command must not update
the index for this file if any changes are present in the working
directory.
Bug: 331351
Change-Id: I255870f689225a1d88971182e0eb377952641b42
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Rename detection should be considered enabled if
diff.renames config property is set to "copy" or "copies", instead of
throwing IllegalArgumentException.
Change-Id: If55d955e37235d4d00f5b0febd6aa10c0e27814e
In order to enable interoperability with the command line, we need to
remove line feeds when reading the files.
Change-Id: Ie2f5799037a60243bb4fac52346908ff85c0ce5d
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
The referenced bug showed that JGit produced different merge results
compared to C Git. Unit test was added to reproduce the issue. The
problem can be solved by switching to histogram diff algorithm.
Bug: 331078
Change-Id: I54f30afb3a9fef1dbca365ca5f98f4cc846092e3
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
The diff algorithm which is used by Merge, Cherry-Pick, Rebase
should be configurable. A new configuration parameter "diff.algorithm"
is introduced which currently accepts the values "myers" or
"histogram". Based on this parameter for example the ResolveMerger
will choose a diff algorithm. The reason for this is bug 331078.
This bug shows that JGit is more compatible with C Git when
histogram diff is in place. But since histogram diff is quite new we
need an easy way to fall back to Myers diff.
Bug: 331078
Change-Id: I2549c992e478d991c61c9508ad826d1a9e539ae3
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Coverage tests showed that we are missing to test certain areas
in the rebase command. Add the missing tests.
Change-Id: Ia4a272d26cde7e1861dac30496e4b6799fc8187a
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Add the ability to checkout a branch to the working tree.
Bug: 330860
Change-Id: Ie06b9e799a9e1be384da0b8996efa7209b32eac3
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There was a bug introduced by commit 0e815fe. For non-versioned files
the merge algorithm detected an incoming deletion from THEIRS.
Consequently such files were deleted. That's a severe bug which was
fixed by more precisely detecting incoming deletions.
Change-Id: I4385d3c990db11d62e371a385dc8ee89841db84a
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This is a first iteration to implement Rebase. At the moment, this
does not implement --continue and --skip, so if the first
conflict is found, the only option is to --abort the command.
Bug: 328217
Change-Id: I24d60c0214e71e5572955f8261e10a42e9e95298
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Instead of copying up to 4 fields from the parent iterator each time a
child iterator is initialized and used, construct a single state
object that contains the 4 fields, and pass that one state object
through to the child. This makes it easier to add additional state
fields that must be inherited, at the slight expense of an extra
object allocation per TreeWalk, and an extra level of field
indirection whenever the options, nameEncoder, or read buffer is
required by the iterator.
Change-Id: Ic4603c33b772d7a45f9c81140537d51945688fcb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Naming these inner classes ensures that stack traces which contain
them will give us useful information about which filter is involved in
the trace, rather than the generated names $1, $2, etc. This makes it
much easier to understand a stack trace at a glance.
Change-Id: Ia6a75fdb382ff6461e02054d94baf011bdeee5aa
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In the case where DirCacheCheckout was used to checkout a tree
without taking HEAD into account (e.g. during a clone or hard reset)
we didn't handle conflicts correctly. E.g. if there are conflicts
(entries with stage != 0) in the index and we tried to hard reset
we have been processing the conflicting pathes multiple times (once
for every stage). With this fix we will update the index with the
entry from the "merge" state (the state we want checkout) when we
detect existing conflicts.
Change-Id: Iffbddccaa588cf0d1460a5e44dabaf540d996e26
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
* rename-detection:
RenameDetector: Only scan deletes if adds exist
SimilarityRenameDetector: Initialize sizes to 0
SimilarityRenameDetector: Avoid allocating source index
SimilarityRenameDetector: Only attempt to index large files once
SimilarityIndex: Don't overflow internal counter fields
SimilarityIndex: Accept files larger than 8 MB
SimilarityIndex: Correct comment explaining the logic
* fs-fsync:
Remove unnecessary flush calls from LockFile
Remove unnecessary region locking from LockFile
Support core.fsyncRefFiles option
Support core.fsyncObjectFiles option
Simplify LockFile write(ObjectId) case
Rewrite the initialization of the encoding tables to be more clear,
but slightly slower to setup. We generally perfer a clear definition
of the data over a slightly slower class load time.
Change-Id: I0c7f89b6ab82dcf71525ffb69a388c312c195913
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since we have already modified this class to localize an error
message, we might as well strip it down to contain only the
functionality we need, or might ever use.
To keep this simple to review we don't adjust formatting right
away, so code that was buried inside of an if or else block whose
condition was removed might not have the correct indentation anymore.
We can fix this with a later reformatting change.
Change-Id: I2996aaa704e9d6182e5500c7a63240d5e9d722cc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When DirCacheCheckout should be used to checkout only one
tree (reset --hard, clone) then we had to use the standard
constructor and specify null as value for head. This change
adds explicit constructors not taking HEAD and documents
that.
Bug: 330021
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Fanout level notes trees are combined back together into a flat leaf
level tree if during a removal of a subtree there are less than 3/4 of
the fanout subtrees still existing, and the size of the combined leaf
is under the 256 split limit noted above.
This rule is used because deletes are less common than insertions, and
SHA-1's relatively uniform distribution suggests that with only 192
subtrees existing in the fanout, there should be approximately 192
names in the combined replacement leaf tree.
Change-Id: Ia9d145ffd5454982509fc40906bc4dbbf2b13952
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Leaf level notes trees are split into a new fan-out tree if an
insertion occurs and the tree already contains >= 256 notes in it.
The splitting may occur multiple times if all of the notes have the
same prefix; in the worst case this produces a tree path such as
"00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/be" if all
of the notes begin with zeros.
Change-Id: I2d7d98f35108def9ec49936ddbdc34b13822a3c7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some algorithms need to be able to iterate through all notes within a
particular bucket, such as when splitting or combining a bucket.
Exposing an Iterator<Note> makes this traversal possible.
For a LeafBucket the iteration is simple, its over the sorted array of
elements. For FanoutBucket its a bit more complex as the iteration
needs to union the iterators of each fanout bucket, lazily loading any
buckets that aren't already in-memory.
Change-Id: I3d5279b11984f44dcf0ddb14a82a4b4e51d4632d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is necessary to allow applications to wrap the note tree in
a commit and update the note branch with the new state.
Change-Id: Idbd7ead4a1b16ae2b64a30a4a01a29cfed548cdf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
NoteMap now supports editing in-memory, allowing applications to
modify the NoteMap once it has been loaded from the branch. The
ability to write the branch back to tree objects is not yet done,
so the edits are strictly transient.
Change-Id: I63448954abfca2a8e3e95369cd84c0d1176cdb79
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The lock file protocol relies on the atomic creation of a standardized
name in the parent directory of the file being updated. Since the
creation is atomic, at most one thread in any process can succeed on
this creation, and all others will fail. While the lock file exists,
that file is private to the thread that is writing it, and no others
will attempt to read or modify the file.
Consequently the use of the region level locks around the file are
unnecessary, and may actually reduce performance when using NFS, SMB,
or some other sort of remote filesystem that supports locking.
Change-Id: Ice312b6fb4fdf9d36c734c3624c6d0537903913b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If core.fsyncRefFiles is set to true, fsync is used whenever a
reference file is updated, ensuring the file contents are also
written to disk. This can help to prevent empty ref files after
a system crash when using a filesystem such as HFS+ where data
writes may be delayed.
Change-Id: Ie508a974da50f63b0409c38afe68772322dc19f1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some repositories may be on really unstable filesystems, but still
want to have good reliability when objects are written to disk. If
core.fsyncObjectFiles is set to true, request the JVM to ensure the
data is written before returning success to the caller of insert.
The option defaults to false because it should be useless on any
filesystem that orders writes and metadata, such as ext3 mounted with
data=ordered (or data=journal). But it may be useful on some systems
(especially HFS+) where file content may flush to the disk
independently of filesystem structure changes.
Because FileChannel.force(boolean) only claims to ensure data is
written if it was written using the write(ByteBuffer) method of
FileChannel, redirect all writes when using fsyncObjectFiles to go
through the FileChannel interface instead of through the older style
OutputStream interface. This may not be necessary on all JVMs, but
its more portable to follow the definition than the common behavior.
Change-Id: I57f6b6bb7e403c07fbae989dbf3758eaf5edbc78
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If there are only deletes, don't need perform rename or copy
detection. There are no adds (aka destinations) for the deletes
to match against.
Change-Id: I00fb90c509fa26a053de561dd8506cc1e0f5799a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Setting the array elements to -1 is more expensive than relying on
the allocator to zero the array for us first. Shifting the code to
always add 1 to the size (so an empty file is actually 1 byte long)
allows us to detect an unloaded size by comparing to 0, thus saving
the array fill calls.
Change-Id: Iad859e910655675b53ba70de8e6fceaef7cfcdd1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the only file added is really small, and all of the deleted
files are really big, none of the permutations will match up due
to the sizes being too far apart to fit the current rename score.
Avoid allocating the really big deleted SimilarityIndex by deferring
its construction until at least one add along that row has a
reasonable chance of matching it.
This avoids expending a lot of CPU time looking at big deleted
binary files when a small modified text file was broken due to a
high percentage of changed lines.
Change-Id: I11ae37edb80a7be1eef8cc01d79412017c2fc075
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a file fails to index the first time the loop encounters it, the
file is likely to fail to index again on the next row. Rather than
wasting a huge amount of CPU to index it again and fail, remember
which destination files failed to index and skip over them on each
subsequent row.
Because this condition is very unlikely, avoid allocating the BitSet
until its actually needed. This keeps the memory usage unaffected
for the common case.
Change-Id: I93509b28b61a9bba8f681a7b4df4c6127bca2a09
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The counter portion of each pair is only 32 bits wide, but is part
of a larger 64 bit integer. If the file size was larger than 4 GB
the counter could overflow and impact the key, changing the hash,
and later resulting in an incorrect similarity score.
Guard against this overflow condition by capping the count for each
record at 2^32-1. If any record contains more than that many bytes
the table aborts hashing and throws TableFullException.
This permits the index to scan and work on files that exceed 4 GB
in size, but only if the file contains more than one unique block.
The index throws TableFullException on a 4 GB file containing all
zeros, but should succeed on a 6 GB file containing unique lines.
The index now uses a 64 bit accumulator during the common scoring
algorithm, possibly resulting in slower summations. However this
index is already heavily dependent upon 64 bit integer operations
being efficient, so increasing from 32 bits to 64 bits allows us
to correctly handle 6 GB files.
Change-Id: I14e6dbc88d54ead19336a4c0c25eae18e73e6ec2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Files bigger than 8 MB (2^23 bytes) tended to overflow the internal
hashtable, as the table was capped in size to 2^17 records. If a
file contained 2^17 unique data blocks/lines, the table insertion
got stuck in an infinite loop as the able couldn't grow, and there
was no open slot for the new item.
Remove the artifical 2^17 table limit and instead allow the table
to grow to be as big as 2^30. With a 64 byte block size, this
permits hashing inputs as large as 64 GB.
If the table reaches 2^30 (or cannot be allocated) hashing is
aborted. RenameDetector no longer tries to break a modify file pair,
and it does not try to match the file for rename or copy detection.
Change-Id: Ibb4d756844f4667e181e24a34a468dc3655863ac
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This comment was wrong, due to a copy-and-paste error. Here the
code is looking at records of dst that do not exist in src, and
are skipping past them to find another match.
Change-Id: I07c1fba7dee093a1eeffcf7e0c7ec85446777ffb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In order to safely edit a notes tree, NoteMap needs to retain any
non-note tree entries it read from the source tree and put them
back out into the modified tree when it commits a new version of
the note branch.
Remember any tree entries that didn't look like a note during
the parsing of the tree, so they can be put into a TreeFormatter
later when the tree writes to the repository.
Change-Id: Ia284af7e7866da35db35374c6c5869f00c857944
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of reading a note tree recursively up front when the NoteMap
is loaded, read only the root tree and load subtrees on demand when
they are accessed by the application. This gives a lower latency
to read a note for the recent commits on a branch, as only the paths
that are needed get read.
Given a 2/38 style fanout, the tree will fully load when 256 objects
have been accessed by the application. But unlike the prior version
of NoteMap, the NoteMap will load faster and answer lookups sooner,
as the loading time for all 256 levels is spread out across each of
the get() requests.
Given a 2/2/36 style fanout, the tree won't need to fully load until
about 65,536 objects are accessed.
To simplify the implementation we only support the flat layout (all
notes in the top level tree), or a 2/38, 2/2/36, 2/2/2/34, through
2/.../2 style fanout. Unlike C Git we don't support reading the old
experimental 4/36 fanout. This is sufficient because C Git won't
create the 4/36 style fanout when creating or updating a notes tree,
and there really aren't any in the wild today.
Change-Id: I6099b35916a8404762f31e9c11f632e43e0c1bfd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The NoteMap makes it easy to read a small notes tree as created by
the `git notes` command in C Git. To make the initial implementation
simple a notes tree is read recursively into a map in memory.
This is reasonable if the application will need to access all notes,
or if there are less than 256 notes in the tree, but doesn't behave
well when the number of notes exceeds 256 and the application
doesn't need to access all of them.
We can later add support for lazily loading different subpaths,
thus fixing the large note tree problem described above.
Currently the implementation only supports reading. Writing notes
is more complex because trees need to be expanded or collapsed at
the exact 256 entry cut-off in order to retain the same tree SHA-1
that C Git would use for the same content. It also needs to retain
non-note tree entries such as ".gitignore" or ".gitattribute" files
that might randomly appear within a notes tree. We can also add
writing support later.
Change-Id: I93704bd84ebf650d51de34da3f1577ef0f7a9144
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
When setting up an SSH connection, use the caller supplied
CredentialsProvider, if one has been given to the Transport
or was defined as the default.
The CredentialsProvider is re-wrapped as a JSch UserInfo,
allowing the connection to use this for user interactive
prompts. This give a unified API for authentication on
any transport type.
Change-Id: Id3b4cf5bfd27a23207cdfb188bae3b78e71e02c0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This permits applications to set their preferred credentials UI
implementation once, rather than needing to define it on every
single Transport instance they open.
Change-Id: I010550de1a6becab27f7aa5a9901df5a1c7e74bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This change is based on http://egit.eclipse.org/r/#change,1652
by David Green. The change adds the concept of a CredentialsProvider
which can be registered for git transports and which is
responsible to return credential-related data like passwords and
usernames. Whenenver the transports detects that an authentication
with certain credentials has to be done it will ask the
CredentialsProvider for this data. Foreseen implementations for
such a Provider may be a EGitCredentialsProvider (caching
credential data entered e.g. in the Clone-Wizzard) or a NetRcProvider
(gathering data out of ~/.netrc file).
Bug: 296201
Change-Id: Ibe13e546b45eed3e193c09ecb414bbec2971d362
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: David Green <dgreen99@gmail.com>
The auth-scheme token (like "Basic" or "Digest") is not specified in a
case sensitive way. RFC2617 (http://tools.ietf.org/html/rfc2617) specifies
in section 1.2 the use of a "case-insensitive token to identify the
authentication scheme". Jetty, for example, uses "basic" as token.
Change-Id: I635a94eb0a741abcb3e68195da6913753bdbd889
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
The ObjectId (for a ref) can be easily reformatted into a temporary
byte[] and then passed off to write(byte[]), removing the duplicated
code that existed in both write methods.
Change-Id: I09740658e070d5f22682333a2e0d325fd1c4a6cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We stopped handling URIs such as "example.com:/some/p ath", because
this was confused with the Windows absolute path syntax of "c:/path".
Support absolute style scp URIs again, but only when the host name
is more than 2 characters long.
Change-Id: I9ab049bc9aad2d8d42a78c7ab34fa317a28efc1a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This reverts commit 1e510ec20e.
Instead work around the warning by defining our constant by
constructing it through a StringBuilder.
Change-Id: If139509e769d649609c62eff359ebaea5dd286b2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Matthias Sohn <matthias.sohn@sap.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
IndexDiff was extended to detect files which are both removed from the
index and untracked. Before this change these files were only added
to the removed collection.
Change-Id: I971d8261d2e8932039fce462b59c12e143f79f90
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The code analyzer can't know that passing a value known to be null is
not a problem. Hence better pass null explicitly instead of the
parameters being null.
Change-Id: I8db6f8014de6c00dd95974d60f61ecc66191e6d4
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There was a bug in ResolveMerger which is one reason for
bug 328841. If a merge was failing because of conflicts
deletions where not handled correctly. Files which have
to be deleted (because there was a non-conflicting deletion
coming in from THEIRS) are not deleted. In the
non-conflicting case we also forgot to delete the file but
in this case we explicitly checkout in the end these files
get deleted during that checkout.
This is fixed by handling incoming deletions explicitly.
Bug: 328841
Change-Id: I7f4c94ab54138e1b2f3fcdf34fb803d68e209ad0
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
The automatically generated commit message of a merge should have the
same structure as in C Git for consistency (as per git fmt-merge-msg).
Before this change:
merging refs/heads/a into refs/heads/master
After:
Merge branch 'a'
Plurals, "into" and joining by "," and "and" also work.
Change-Id: I9658ce2817adc90d2df1060e8ac508d7bd0571cb
This mirrors the getByte() API in ObjectId and allows the caller to
modify a single byte, which is useful when updating it as part of a
loop walking through 0x00..0xff inside of a range of objects.
Change-Id: I57fa8420011fe5ed5fc6bfeb26f87a02b3197dab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Processing git notes requires random access to part of the raw data
of each ObjectId... which isn't easy because ObjectIds are stored
with an internal representation of 5 ints. Expose random access
to the individual data bytes through new methods, avoiding the
need to convert first to a byte[20].
Change-Id: I99e64700b27fc0c95aa14ef8ad46a0e8832d4441
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of hiding this logic inside of DirCacheTree and the legacy
Tree type, pull it into a common place where we can reuse it by
creating tree records in a buffer that can be passed directly into
the ObjectInserter. This allows us to avoid some copying, as the
inserter can be given the internal buffer of the formatter.
Because we trust these two callers to feed us records in the proper
order, without '/' in the names, and without duplicate names in the
same tree, we don't do any validation inside of the formatter itself.
To protect themselves from making ordering errors, developers should
continue to use DirCache to process edits to source code trees.
Change-Id: Idf7f10e736d4a44ccdf8afe060535d7b0554a92f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
JGit merge algorithm behaved differently from C Git when
we had adjacent modifications. If line 9 was modified by
OURS and line 10 by theirs then C Git will return a
conflict while JGit was seeing this as independent
modifications. This change is not only there to achieve
compatibility, but there where also some really wrong
merge results produced by JGit in the area of adjacent
modifications.
Change-Id: I8d77cb59e82638214e45b3cf9ce3a1f1e9b35c70
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When adding a new method near the end of the sequence we want to
show the full method inserted, and not tear the prior method due
to the common trailing curly brace being consumed as part of the
common end region of the sequences.
Bug: 328895
Change-Id: I233bc40445fb5452863f5fb082bc3097433a8da6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
HistogramDiff failed on cases where the initial element for the LCS
was actually very common (e.g. has 20 occurrences), and the first
element of the inserted region after the LCS was also common but
had fewer occurrences (e.g. 10), while the LCS also contained a
unique element (1 occurrence).
This happens often in Java source code. The initial element for
the LCS might be the empty line ("\n"), and the inserted but common
element might be "\t/**\n", with the LCS being a large span of
lines that contains unique method declarations. Even though "/**"
occurs less often than the empty line its not a better LCS if the
LCS we already have contains a unique element.
The logic in HistogramDiff would normally have worked fine, except I
tried to optimize scanning of B by making tryLongestCommonSequence
return the end of the region when there are matching elements
found in A. This allows us to skip over the current LCS region,
as it has already been examined, but caused us to fail to identify
an element that had a lower occurrence count within the region.
The solution used here is to trade space-for-time by keeping a
table of A positions to their occurrence counts. This allows the
matching logic to always use the smallest count for this region,
even if the smallest count doesn't appear on the initial element.
The new unit test testEdit_LcsContainsUnique() verifies this new
behavior works as expected.
Bug: 328895
Change-Id: Id170783b891f645b6a8cf6f133c6682b8de40aaf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fixes the "Method ignores results of InputStream.read()" warning.
This is the only place where read() was used instead of readFully()
and the return value was not checked. So it was either an oversight
or should be documented. This change assumes it was an oversight.
Change-Id: I859404a7d80449c538a552427787f3e57d7c92b4
The value was accessed every time in the loop body with get(),
so use the more efficient entrySet().
Change-Id: I91d90cbd0b0d03ca4a3db986c58b8d80d80f40a4
This was already disabled in the Eclipse preferences for the project.
With this, Hudson should also ignore it.
Change-Id: I7a6b9a20451dc5ba9a61553248b5f4b6c6c7a78b
As described in Bug 328551 there was a bug that the merge algorithm
was not always reporting conflicts when the same line was deleted
and modified. This problem was introduced during commit
0c017188b4 when reported conflicts have
been checked for common pre- and suffixes.
This was fixed here by better determining whether after stripping
off common prefixes and suffixes from a conflicting region there
is still some conflicting part left.
I also added a unit test to test this situation.
Bug: 328551
Change-Id: Iec6c9055d00e5049938484a27ab98dda2577afc4
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When creating a local branch based on another local branch, the
upstream configuration contains "." as origin and the source branch
as "merge". The PullCommand should support this by skipping the
fetch step altogether and use the base branch to merge with.
Change-Id: I260a1771aeeffca5b0161d1494fd63c672ecc2a6
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
AmbiguousObjectException contains an AbbreviatedObjectId and is
supposed to be serializable, so it should be serializable as well.
Change-Id: I8056e78aee20fdd3cb9600b52cd8ed988544293d
It's probably not possible that these numbers are negative in the
algorithm, but it's cleaner this way and gets rid of three more
FindBugs warnings.
Change-Id: Ifbce4e2c787fb9a7cd309c605e8d86211ef8a352
Don't permit transient worker threads to access the underlying output
stream of a ProgressMonitor, as they might get marked as the stream's
writer thread. Instead proxy update events from the workers back onto
the application's real work thread. This ensures that the stream only
sees a single thread, and its the thread that will remain alive for
the entire life cycle of the operation.
This fixes IOException("Write end dead") during local repository fetch
when threaded delta search is enabled. One of the transient delta
search threads became the designated writer for the pipe, and when it
terminated the reader end thought the writer was dead, even though the
main writer thread was still executing in PackWriter.
Bug: 326557
Change-Id: I01d1b20a3d7be1c0b480c7fb5c9773c161fe5c15
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
RefsDirectory fires a RefsChangedEvent when it detect that one
ref changed (new, modified, deleted). But there was a potential
of wrong events beeing fired leading to a endless loop in EGit.
Problem is that when calling getRefs(ALL) we don't want to report
additional refs and by that we remove the additional refs from
the list of "refs reported upwards last time". We fire an
RefsChangedEvent because we think that the special refs are not
there anymore.
I fixed this by removing eventing for the additional refs. Another
alternative would be to always scan also for additional refs and
put them in the list of refs. But getRefs(ALL) would then remove
the additional refs again. I didn't do that for performance reasons
and also because I am not sure whether we want evnting for
additional refs.
Change-Id: Icb9398b55a8c6bbf03e38f6670feb67754ce91e0
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When checking out a tree, files that are identical to the file in
the current index and working directory don't need to be updated.
Change-Id: I9e025a53facd42410796eae821baaeff684a25c5
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
IndexDiff now allows to set an additional filter. This can be used
e.g. for restricting the tree walk to a given set of files.
Change-Id: I642de17e74b997fa0c5878c90631f6640ed70bdd
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
The RefDirectory class was not returning FETCH_HEAD and
MERGE_HEAD when trying to get all refs via getRefs(RefDatabase.ALL).
This fix adds constants for FETCH_HEAD and ORIG_HEAD and adds a
new getter getAdditionalRefs() to get these additional refs.
To be compatible with c git the getRefs(ALL) method will not return
FETCH_HEAD, MERGE_HEAD and ORIG_HEAD.
Change-Id: Ie114ca92e9d5e7d61d892f4413ade65acdc08c32
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
We need to use findbugs-maven-plugin:2.3.2-SNAPSHOT
since otherwise build fails with maven-3.0 [1], [2].
We should switch to the release version as soon
as this becomes available.
[1] http://www.sonatype.com/people/2010/10/maven-3-0-has-landed/
[2] http://jira.codehaus.org/browse/MFINDBUGS-122
Bug: 327799
Change-Id: I1c57f81cf6f0450e56411881488c4ee754e458e3
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If an object hasn't been accessed yet the pack list for a repository
may not have been scanned from disk. If an application (e.g. the dumb
transport servlet support code) asks for the pack list for an
ObjectDirectory, we should load it immediately.
Change-Id: I93d7b1bca422d905948e8e83b2afa83c8894a68b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If an ObjectInserter is created from a CachedObjectDirectory, we need
to ensure the cache is updated whenever a new loose object is actually
added to the loose objects directory, otherwise a future read from an
ObjectReader on the CachedObjectDirectory might not be able to open
the newly created object.
We mostly had the infrastructure in place to implement this due to the
injection of unpacked large deltas, but we didn't have a way to pass
the ObjectId from ObjectDirectoryInserter to CachedObjectDirectory,
because the inserter was using the underlying ObjectDirectory and not
the CachedObjectDirectory. Redirecting to CachedObjectDirectory
ensures the cache is updated.
Change-Id: I1f7bdfacc7ad77ebdb885f655e549cc570652225
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These messages may need to change depending on the current
thread's configured locale, and thus cannot be static.
Change-Id: I96751a63852ec9c4bf6c47edadcf8752700543df
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There was a chance that we hit a NPE which doing a checkout
with DirCacheCheckout when there is no HEAD (e.g. initial
checkout). This is fixed here.
Change-Id: Ie3b8cae21dcd90ba8352823ea94a700f8ee9221a
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Implemented the initial version of a cherry-pick command.
A correct error handling is missing (what happens if the
checkout fails, the cherry-pick leads to conflicts etc).
But straightforward cherry-picks works.
Change-Id: I235c0eb3a7a2d5bdfe40400f1deed06f29d746e1
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
These routines can be useful when debugging, because we can add an
expression to the Eclipse "Expressions" panel to show the text that
appears on a line. Gerrit Code Review also uses these in its own
subclass of RawText in order to format patch files, so pulling it up
to be part of core JGit may help other applications too.
Change-Id: I20a6b112e3403ecfc1c2715ae75dcecc1a85b167
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since the introduction of HashedSequence we no longer need to supply
the RawTextComparator at the time of constructing a RawText. Drop the
definition from the constructor, because it doesn't make sense as part
of our public API.
Change-Id: Iaab34611d60eee4a2036830142b089b2dae81842
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When an empty line was inserted at the beginning of the common end
part of a RawText the comparator incorrectly considered it to be
common, which meant the DiffAlgorithm would later not even have it be
part of the region it examines. This would cause JGit to skip a line
of insertion, which later confused Gerrit Code Review when it tried to
match up the pre and post RawText files for a difference that had this
type of insertion.
Define two new unit tests to check for this insertion of a blank line
condition and correct for it by removing the LF from the common region
when the condition is detected.
Change-Id: I2108570eb2929803b9a56f9fb9c400c758e7156b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The maven 2 build for jgit source bundle didn't create a correct
OSGi version string, instead of
org.eclipse.jgit.source_0.10.0.<timestamp>
the generated OSGi version was
org.eclipse.jgit.source_0.10.0.SNAPSHOT
This caused trouble when trying to install it from p2 repository.
Bug: 327616
Change-Id: Ic27c763ae9a4bcbb5bd6ed9562cd98bb4da3386b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
It wrongly uses the full name of the branch to remove the
configuration entries but must use the shortened one.
Change-Id: Ie386a128a6c6beccc20bafd15c2e36254c5f560d
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
When creating a branch with CreateBranchCommand.call() without
specifying an explicit startPoint HEAD should be used as startPoint.
There was a bug leading to an NPE in such a case.
Change-Id: Ic0a5dc1f33a0987d66c09996c8012c45785500ff
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
There was a wrong javadoc comment telling that MergeCommand
only supports fast-forward merges. This has been fixed.
Change-Id: I7edea779a83528beee34a1753026288c384881ce
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
We wanted to wrap all LowLevel JGit excpetions into a
JGitInternalException so that users of this high-level interface
don't have to explicitly catch all of them. This
was forgotten on BranchCreateCommand.call() and I added
it.
Change-Id: Ie140e99574fb004137c66e80fb92eb6c6d0fa5e1
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
HistogramDiff outperforms it for any case where PatienceDiff needs to
fallback to another algorithm. Consequently it's not worth keeping
around, because we would always want a fallback enabled.
Change-Id: I39b99cb1db4b3be74a764dd3d68cd4c9ecd91481
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Its behavior is similar to PatienceDiff, and runs nearly as fast,
often beating the performance of MyersDiff.
Change-Id: I43c3faefa8109f1a68ef57522bec9cf27b5df252
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Large objects stored as deltas get unpacked by JGit into a loose
object, so they are cheaper to access later on. This unpacking was
broken because TeeInputStream copied the wrong length into the loose
object, sometimes copying too many bytes into the result. This
created a loose object that did not have the correct content, and
whose length did not match the length denoted in the object header.
Change-Id: I3ce1fd9f3dc5bd195249c7872b3bec49570424a2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When passing to a fallback algorithm, we can avoid creating a new copy
of the hash codes for each sequence by passing in the hashed sequences
directly. This makes it cheaper to switch from HistogramDiff down to
MyersDiff in a single pass.
Change-Id: Ibf2e81be57c083862eeb134279aed676653bf9b5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is a corner case where we get an EMPTY region during recursion,
but we didn't expect to receive that. Its harmless to ignore the
region since the region is empty and has no content, so do so rather
than throwing an exception
Change-Id: I50dcec81ecba763072bb739adfab5879fb48b23a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Certain inputs caused an infinite loop because the prior match data
couldn't be used as expected. Rather than incrementing the match
pointer before looking at an element, do it after, so the loop breaks
when we wrap around to the starting point.
Change-Id: Ieab28bb3485a914eeddc68aa38c256f255dd778c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The need for branching becomes more pressing with pull
support: we need to make sure the upstream configuration entries
are written correctly when creating and renaming branches
(and of course are cleaned up when deleting them).
This adds support for listing, adding, deleting and renaming
branches including the more common options.
Bug: 326938
Change-Id: I00bcc19476e835d6fd78fd188acde64946c1505c
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
In bug 323571 it is mentioned that if you call
'toURI().toURL().toString()' on a java.io.File you cannot pass
that string to jgit as an URIish. Problem is that the passed
URI looks like 'file:/C:/a/b.txt' and that we where expecting
double slashes after scheme':'. This fix adds support for this
single-slash file URLs.
Bug: 323571
Change-Id: I866a76a4fcd0c3b58e0d26a104fc4564e7ba5999
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
This is the minimal implementation of a "Pull" command. It does not
have any parameters besides the generic progress monitor and timeout.
It works on the currently checked-out branch and assumes that the
configuration contains the keys "branch.<branch name>.remote" and
"branch.<branch name>.merge" to determine the remote configuration
for the fetch and the remote branch name for the merge.
Bug: 303404
Change-Id: I7fe09029996d0cfc09a7d8f097b5d6af1488fa93
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There where quite some bugs regarding wrong URI parsing. In order
to solve them the parsing has to be refactored. We now have
specialized regexps for 'scheme://host/...', scp URIs and local
file names. Now we can detect problems while parsing 'git://host:/abc' which
was previously not possible.
Bug: 315571
Bug: 292897
Bug: 307017
Bug: 323571
Bug: 317388
Change-Id: If72576576ebb6b9d9dc8b7e51ddd87c9909e8b62
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The regular expression which should handle the
user/password part in an URI was potentially
processing too many chars. This led to problems
when user/pwd and port was specified
Change-Id: I87db02494c4b367283e1d00437b1c06d2c8fdd28
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The regular expressions used to parse URI's are constructed by
concatenating different segments to a big String. Introduce
String constants for these segements and document them.
Change-Id: If8b9dbaaf57ca333ac0b6c9610c3d3a515c540f9
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Some strings were not externalized. Also use them in HTTP tests to
ensure that they will also succeed when message bundles are
translated.
Change-Id: Id02717176557e7d57e676e1339cd89f2be88d330
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The strings used to construct the regex to parse
URIs are split differently. This makes it easier
to introduce meaningful String constants later on.
Change-Id: I9355fd42e57e0983204465c5d6fe5b6b93655074
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Projects like org.eclipse.mdt contain large XML files about 6 MiB
in size. So does the Android project platform/frameworks/base.
Doing a clone of either project with JGit takes forever to checkout
the files into the working directory, because delta decompression
tends to be very expensive as we need to constantly reposition the
base stream for each copy instruction. This can be made worse by
a very bad ordering of offsets, possibly due to an XML editor that
doesn't preserve the order of elements in the file very well.
Increasing the threshold to the same limit PackWriter uses when
doing delta compression (50 MiB) permits a default configured
JGit to decompress these XML file objects using the faster
random-access arrays, rather than re-seeking through an inflate
stream, significantly reducing checkout time after a clone.
Since this new limit may be dangerously close to the JVM maximum
heap size, every allocation attempt is now wrapped in a try/catch
so that JGit can degrade by switching to the large object stream
mode when the allocation is refused. It will run slower, but the
operation will still complete.
The large stream mode will run very well for big objects that aren't
delta compressed, and is acceptable for delta compressed objects that
are using only forward referencing copy instructions. Copies using
prior offsets are still going to be horrible, and there is nothing
we can do about it except increase core.streamFileThreshold.
We might in the future want to consider changing the way the delta
generators work in JGit and native C Git to avoid prior offsets once
an object reaches a certain size, even if that causes the delta
instruction stream to be slightly larger. Unfortunately native
C Git won't want to do that until its also able to stream objects
rather than malloc them as contiguous blocks.
Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Natively support the HTTP basic and digest authentication methods
by setting the Authorization header without going through the JREs
java.net.Authenticator API. The Authenticator API is difficult to
work with in a multi-threaded server environment, where its using
a singleton for the entire JVM. Instead compute the Authorization
header from the URIish user and pass, if available.
Change-Id: Ibf83fea57cfb17964020d6aeb3363982be944f87
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
As findbugs pointed out, there was a small risk for creating multiple instances of
translation bundles. If that happens, drop the second instance.
Change-Id: I3aacda86251d511f6bbc2ed7481d561449ce3b6c
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
HistogramDiff is an alternative implementation of patience diff,
performing a search over all matching locations and picking the
longest common subsequence that has the lowest occurrence count.
If there are unique common elements, its behavior is identical to
that of patience diff.
Actual performance on real-world source files usually beats
MyersDiff, sometimes by a factor of 3, especially for complex
comparators that ignore whitespace.
Change-Id: I1806cd708087e36d144fb824a0e5ab7cdd579d73
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Pass through the addAll request to our underlying ArrayList.
This way the underlying ArrayList grows no more than once during the
call, which may be important if the list was originally allocated
at the default size of 16, but 64 Edits are being added.
Change-Id: I31c3261e895766f82c3c832b251a09f6e37e8860
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
FetchCommand was missing the ability to set dry run and thin
preferences on the transport operation.
Change-Id: I0bef388a9b8f2e3a01ecc9e7782aaed7f9ac82ce
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Optional inCore parameter to Resolver/Strategy will
instruct it to perform all the operations in memory
and avoid modifying working folder even if there is one.
Change-Id: I5b873dead3682f79110f58d7806e43f50bcc5045
The hash code returned by RawTextComparator (or that is used
by the SimilarityIndex) play an important role in the speed of
any algorithm that is based upon them. The lower the number of
collisions produced by the hash function, the shorter the hash
chains within hash tables will be, and the less likely we are to
fall into O(N^2) runtime behaviors for algorithms like PatienceDiff.
Our prior hash function was absolutely horrid, so replace it with
the proper definition of the DJB hash that was originally published
by Professor Daniel J. Bernstein.
To support this assertion, below is a table listing the maximum
number of collisions that result when hashing the unique lines in
each source code file of 3 randomly chosen projects:
test_jgit: 931 files; 122 avg. unique lines/file
Algorithm | Collisions
-------------+-----------
prior_hash 418
djb 5
sha1 6
string_hash31 11
test_linux26: 30198 files; 258 avg. unique lines/file
Algorithm | Collisions
-------------+-----------
prior_hash 8675
djb 32
sha1 8
string_hash31 32
test_frameworks_base: 8381 files; 184 avg. unique lines/file
Algorithm | Collisions
-------------+-----------
prior_hash 4615
djb 10
sha1 6
string_hash31 13
We can clearly see that prior_hash performed very poorly, resulting
in 8,675 collisions (elements in the same hash bucket) for at least
one file in the Linux kernel repository. This leads to some very
bad O(N) style insertion and lookup performance, even though the
hash table was sized to be the next power-of-2 larger than the
total number of unique lines in the file.
The djb hash we are replacing prior_hash with performs closer to
SHA-1 in terms of having very few collisions. This indicates it
provides a reasonably distributed output for this type of input,
despite being a much simpler algorithm (and therefore will be much
faster to execute).
The string_hash31 function is provided just to compare results with,
it is the algorithm commonly used by java.lang.String hashCode().
However, life isn't quite this simple.
djb produces a 32 bit hash code, but our hash tables are always
smaller than 2^32 buckets. Mashing the 32 bit code into an array
index used to be done by simply taking the lower bits of the hash
code by a bitwise and operator. This unfortuntely still produces
many collisions, e.g. 32 on the linux-2.6 repository files.
From [1] we can apply a final "cleanup" step to the hash code to
mix the bits together a little better, and give priority to the
higher order bits as they include data from more bytes of input:
test_jgit: 931 files; 122 avg. unique lines/file
Algorithm | Collisions
-------------+-----------
prior_hash 418
djb 5
djb + cleanup 6
test_linux26: 30198 files; 258 avg. unique lines/file
Algorithm | Collisions
-------------+-----------
prior_hash 8675
djb 32
djb + cleanup 7
test_frameworks_base: 8381 files; 184 avg. unique lines/file
Algorithm | Collisions
-------------+-----------
prior_hash 4615
djb 10
djb + cleanup 7
This is a massive improvement, as the number of collisions for
common inputs drops to acceptable levels, and we haven't really
made the hash functions any more complex than they were before.
[1] http://lkml.org/lkml/2009/10/27/404
Change-Id: Ia753b695de9526a157ddba265824240bd05dead1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As it turns out, every single diff algorithm we might try to
implement can benfit from using the SequenceComparator's native
concept of the simple reduceCommonStartEnd() step. For most inputs,
there can be a significant number of elements that can be removed
from the space the DiffAlgorithm needs to consider, which will
reduce the overall running time for the final solution.
Pool this logic inside of DiffAlgorithm itself as a default, but
permit a specific algorithm to override it when necessary.
Convert MyersDiff to use this reduction to reduce the space it
needs to search, making it perform slightly better on common inputs.
Change-Id: I14004d771117e4a4ab2a02cace8deaeda9814bc1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
PatienceDiff always uses a HashedSequence, which promises to provide
constant time access for hash codes during the equals method and
aborts fast if the hash codes don't match. Therefore we don't need
to cache the hash codes inside of the index, saving us memory.
Change-Id: I80bf1e95094b7670e6c0acc26546364a1012d60e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Most diff implementations really want to use cached hash codes for
elements, rather than element equality, as they need to perform many
compares and unique hash codes for elements can really speed that
process up.
To make it easier to define element hash functions, move the caching
of hash codes into a wrapper sequence type, so that individual
sequence types like RawText don't need to do this themselves. This
has a nice property of also allowing the sequence to no longer care
about the specific SequenceComparator that is going to be used, and
permits the caching to only examine the middle region that isn't
common to the two inputs.
Change-Id: If8623556da9419117b07c5073e8bce39de02570e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is a faster exact match based form that tries to improve
performance for the common case of the header and trailer of
a text file not changing at all. After this fast path we use
the slower path based on the super class' using equals() to
allow for whitespace ignore modes to still work.
Some simple performance testing showed a major improvement over the
older implementation for a common edit we see in JGit. The test
compared blob 29a89bc and 372a978, which is the ObjectDirectory.java
file difference in commit 41dd9ed1c0.
The two text files are approximately 22 KiB in size.
DEFAULT old 203900 ns
DEFAULT new 100400 ns
This new version is 2x faster for the DEFAULT comparator, which does
not treat space specially. This is because we can now examine a
larger swath of text with fewer instructions per byte compared. The
older algorithm had to stop at each line break and recompute how to
examine the next line, while the new algorithm only stops when the
first difference is found.
WS_IGNORE_ALL old 298500 ns
WS_IGNORE_ALL new 63300 ns
Its 4.7x faster for the whitespace ignore comparator, as the common
header and footer do not have a whitespace difference. Avoiding the
special case handling for whitespace on each byte considered saves a
lot of time.
Since most edits to source code (and other text like files) appears in
the interior of the file, fast elimination of common header/footer
means faster diff throughput. In the less common case of an actual
header or footer edit, the common header/footer elimination is stopped
rather quickly either way, so there is very little downside to the
optimiation applied here.
Change-Id: I1d501b4c3ff80ed086b20bf12faf51ae62167db7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
DiffAlgorithm implementations may find it useful to construct an Edit
and use that to later subsequence the two base sequences, so define
two new utility methods a() and b() to construct the A and B ranges.
Once a subsequence has had Edits created for it the indexes are
within the space of the subsequence. These must be shifted back to
the original base sequence's indexes. Define toBase() as a utility
method to perform that shifting work in-place, so DiffAlgorithm
implementations have an efficient way to convert back to the caller's
original space.
Change-Id: I8d788e4d158b9f466fa9cb4a40865fb806376aee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Adds API for performing git fetch operations.
Change-Id: Idd95664fd4e3bca03211e4ffda3e354849f92a35
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
If a thin pack has a large delta we need to be able to open
its cached copy from the loose object directory through the
CachedObjectDatabase handle. Unfortunately that did not support the
openObject2 method, which the LargePackedDeltaObject used directly
to bypass looking at the pack files.
Bug: 324868
Change-Id: I1d5886a6c3254c6dea2852d50b8614c31a93e615
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* stable-0.9:
Qualify post-0.9.3 builds
JGit 0.9.3
clone: Correct formatting of init message
Fix cloning of repositories with big objects
Qualify post-0.9.1 builds
JGit 0.9.1
Fix PlotCommitList to set lanes on child-less commits
When running IndexPack we use a CachedObjectDirectory, which
knows what objects are loose and tries to avoid stat(2) calls for
objects that do not exist in the repository, as stat(2) on Win32
is very slow.
However large delta objects found in a pack file are expanded into
a loose object, in order to avoid costly delta chain processing
when that object is used as a base for another delta.
If this expand occurs while working with the CachedObjectDirectory,
we need to update the cached directory data to include this new
object, otherwise it won't be available when we try to open it
during the object verify phase.
Bug: 324868
Change-Id: Idf0c76d4849d69aa415ead32e46a435622395d68
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When creating a new FileRepository, probe the capability of the
local filesystem and set core.filemode based on how it reacts.
We can't just rely on FS.supportsExecute() because a POSIX system
(which usually does support execute) might be storing the repository
on a partition that doesn't have execute support (e.g. plain FAT-32).
Creating a temporary file, setting both states, checking we get
the desired results will let us set the variable correctly on
all systems.
Change-Id: I551488ea8d352d2179c7b244f474d2e3d02567a2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In PlotCommitList.enter() commits are positioned on lanes for visual
presentation. This implementation was buggy: commits without
children (often the starting points for the RevWalk) are not positioned
on separate lanes.
The problem was that when handling commits with multiple children
(that's where branches fork out) it was not handled that some of the
children may not have been positioned on a lane yet. I fixed that and
added a number of tests which specifically test the layout of commits
on lanes.
Bug: 300282
Bug: 320263
Change-Id: I267b97ecccb5251cec54cec90207e075ab50503e
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
A diff algorithm may find this type useful if it wants to delegate a
particular range of elements to another algorithm, without changing
the underlying sequence types.
Change-Id: I4544467781233e21ac8b35081304b2bad7db00f6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This makes it easier to parametrize DiffFormatter with a different
implementation, as we later plan to add PatienceDiff to JGit.
Change-Id: Id35ef478d5fa20fe10a1ba297f9436fd7adde9ce
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
git allows remotes to be relative paths, but the regex
validating urls wouldn't accept anything starting with "..".
Other functionality works fine with these paths.
Bug: 311300
Change-Id: Ib74de0450a1c602b22884e19d994ce2f52634c77
Instead of spooling large delta bases into temporary files and then
immediately deleting them afterwards, spool the large delta out to
a normal loose object. Later any requests for that large delta can
be answered by reading from the loose object, which is much easier
to stream efficiently for readers.
Since the object is now duplicated, once in the pack as a delta and
again as a loose object, any future prune-packed will automatically
delete the loose object variant, releasing the wasted disk space.
As prune-packed is run automatically during either repack or gc, and
gc --auto triggers automatically based on the number of loose objects,
we get automatic cache management for free. Large objects that were
unpacked will be periodically cleared out, and will simply be restored
later if they are needed again.
After a short offline discussion with Junio Hamano today, we may want
to propose a change to prune-packed to hold onto larger loose objects
which also exist in pack files as deltas, if the loose object was
recently accessed or modified in the last 2 days.
Change-Id: I3668a3967c807010f48cd69f994dcbaaf582337c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Recently created objects are usually what branches point to, and
are usually written out as loose objects. But due to the high cost
of asking the operating system if a file exists, these are the last
thing that ObjectDirectory examines when looking for an object by
its ObjectId.
Caching recently seen loose objects permits the opening code to
jump directly to the loose object, accelerating lookup for branch
heads that are accessed often.
To avoid exploding the cache its limited to approximately 2048
entries. When more ids are added, the table is simply cleared
and reset in size.
Change-Id: I18f483217412b102f754ffd496c87061d592e535
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This class is used only to cache the unpacked form of an object that
was used as a base for another object. The theory goes that if an
object is used as a delta base for A, it will probably also be a
delta base for B, C, D, E, etc. and therefore having an unpacked copy
of it on hand will make delta resolution for the others very fast.
However since objects are usually only accessed once, we don't want
to cache everything we unpack, just things that we are likely to
need again. The only things we need again are the delta bases.
Hence, its a delta base cache.
This gets us the class name UnpackedObjectCache back, so we can
use it to actually create a cache of unpacked object information.
Change-Id: I121f356cf4eca7b80126497264eac22bd5825a1d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The core.autocrlf variable can take on three values: false, true,
and input. Parsing it as a boolean is wrong, we instead need to
parse a tri-state enumeration.
Add support for parsing and setting enum values from Java from and
to the text based configuration file, and use that to handle the
autocrlf variable.
Bug: 301775
Change-Id: I81b9e33087a33d2ef2eac89ba93b9e83b7ecc223
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of making the sequence itself responsible for the equivalence
function, use an external function that is supplied by the caller.
This cleans up the code because we now say cmp.equals(a, ai, b, bi)
instead of a.equals(ai, b, bi).
This refactoring also removes the odd concept of creating different
types of sequences to have different behaviors for whitespace
ignoring. Instead DiffComparator now supports singleton functions
that apply a particular equivalence algorithm to a type of sequence.
Change-Id: I559f494d81cdc6f06bfb4208f60780c0ae251df9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When checkReferencedIsReachable is set in ReceivePack we are trying
to prove that the push client is permitted to access an object that
it did not send to us, but that the received objects link to either
via a link inside of an object (e.g. commit parent pointer or tree
member) or by a delta base reference.
To do this check we are making a list of every potential delta base,
and then ensuring that every delta base used appears on this list.
If a delta base does not appear on this list, we abort with an error,
letting the client know we are missing a particular object.
Preventing spurious errors about missing delta base objects requires
us to use the exact same list of potential delta bases as the remote
push client used. This means we must use TOPO ordering, and we
need to enable BOUNDARY sorting so that ObjectWalk will correctly
include any trees found during the enumeration back to the common
merge base between the interesting and uninteresting heads.
To ensure JGit's own push client matches this same potential delta
base list, we need to undo 60aae90d4d ("Disable topological
sorting in PackWriter") and switch back to using the conventional
TOPO ordering for commits in a pack file. This ensures that our
own push client will use the same potential base object list as
checkReferencedIsReachable uses on the receiving side.
Change-Id: I14d0a326deb62a43f987b375cfe519711031e172
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since we are only checking the links between objects we don't need
to hold onto commit messages after their headers have been parsed
by the walker. Dropping them saves a bit of memory, which is always
good when accepting huge pack files.
Change-Id: I378920409b6acf04a35cdf24f81567b1ce030e36
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the copy instruction was larger than the input buffer given to us,
we copied the wrong part of the base stream during the next read().
This occurred on really big binary files where a copy instruction
of 64k wasn't unreasonable, but the caller's buffer was only 8192
bytes long. We copied the first 8192 bytes correctly, but then
reseeked the base stream back to the start of the copy region on
the second read of 8192 bytes. Instead of a sequence like ABCD
being read into the caller, we read AAAA.
Change-Id: I240a3f722a3eda1ce8ef5db93b380e3bceb1e201
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As ObjectStreams are supposed to be buffered, most implementors will
be wrapping their underlying stream inside of a BufferedInputStream
in order to satisfy this requirement. Because developers are by
nature lazy, they will use the default buffer size rather than
specify their own.
The OpenJDk JRE implementations use 8192 as the default buffer
size, and when the higher level reader uses the same buffer size
the buffers "stack" nicely by avoiding a copy to the internal
buffer array. As OpenJDK is a popular virtual machine, we should
try to benefit from this nice stacking property during copyTo().
Change-Id: I69d53f273b870b841ced2be2e9debdfd987d98f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We can slightly optimize this method by removing some compares
based on knowledge of how the orderings have to work. This way
a getType() invocation requires at most 2 int compares for any
result, vs. the 6 required to find REPLACE before.
Change-Id: I62a04cc513a6d28c300d1c1496a8608d5df4efa6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Exposing isEmpty, getLengthA, getLengthB make it easier to examine
the state of an edit and work with it from higher level code.
The before and after cut routines make it easy to split an edit
that contains another edit, such as to decompose a REPLACE that
contains a common sequence within it.
Change-Id: Id63d6476a7a6b23acb7ab237d414a0a1a7200290
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We shouldn't escape non-special ASCII characters such as '@' or '~'.
These are valid in a path name on POSIX systems, and may appear as
part of a path in a GNU or Git style patch script. Escaping them
into octal just obfuscates the user's intent, with no gain.
When parsing an escaped octal sequence, we must parse no more
than 3 digits. That is, "\1002" is actually "@2", not the Unicode
character \u0202.
Change-Id: I3a849a0d318e69b654f03fd559f5d7f99dd63e5c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
QuotedString.GIT_PATH returns the input reference exactly if
the string does not require quoting, otherwise it returns a
copy that contains the quotes on either end, plus escapes in
the middle where necessary to meet conventions.
Testing the return against '"' + name + '"' is always false,
because GIT_PATH will never return it that way. The only way
we have quotes on either end is if there is an escape in the
middle, in which case the string isn't equal anyway.
Change-Id: I4d21d8e5c7da0d7df9792c01ce719548fa2df16b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>