Commit Graph

481 Commits

Author SHA1 Message Date
Shawn O. Pearce 9f61c615e8 Support core.autocrlf = input
The core.autocrlf variable can take on three values: false, true,
and input.  Parsing it as a boolean is wrong, we instead need to
parse a tri-state enumeration.

Add support for parsing and setting enum values from Java from and
to the text based configuration file, and use that to handle the
autocrlf variable.

Bug: 301775
Change-Id: I81b9e33087a33d2ef2eac89ba93b9e83b7ecc223
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-09-07 17:14:27 -07:00
Shawn O. Pearce 33837e44c3 Merge branch 'unpack-error'
* unpack-error:
  ReceivePack: Rethrow exceptions caught during indexing

Change-Id: I0d0239d69cb5cd1a622bdee879978f0299e0ca40
2010-09-03 11:09:52 -07:00
Shawn O. Pearce 9239c10385 ReceivePack: Rethrow exceptions caught during indexing
If we get an exception while indexing the incoming pack, its likely
a stream corruption.  We already report an error to the client, but
we eat the stack trace, which makes debugging issues related to a
bug inside of JGit nearly impossible.  Rethrow it under a new type
UnpackException, so embedding servers or applications can catch the
error and provide it to a human who might be able to forward such
traces onto a JGit developer for evaluation.

Change-Id: Icad41148bbc0c76f284c7033a195a6b51911beab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-09-03 10:57:55 -07:00
Marc Strapetz 253b36d27a Partial support for index file format "3".
Extended flags are processed and available via DirCacheEntry's
new isSkipWorkTree() and isIntentToAdd() methods.  "resolve-undo"
information is completely ignored since its an optional extension.

Change-Id: Ie6e9c6784c9f265ca3c013c6dc0e6bd29d3b7233
2010-08-31 12:08:09 -07:00
Shawn O. Pearce e6bd689d2c Improve LargeObjectException reporting
Use 3 different types of LargeObjectException for the 3 major ways
that we can fail to load an object.  For each of these use a unique
string translation which describes the root cause better than just
the ObjectId.name() does.

Change-Id: I810c98d5691b74af9fc6cbd46fc9879e35a7bdca
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-30 11:53:25 -07:00
Shawn O. Pearce 1709800f27 Undo translation of protocol string 'unpack error'
This string is part of the network protocol, and isn't meant to
be translated into another language.  Clients actually scan for
the string "unpack error " off the wire and react magically to
this information.  If it were translated, they would instead have
a protocol exception, which isn't very useful when there is already
an error occurring.

Change-Id: Ia5dc8d36ba65ad2552f683bb637e80b77a7d92f0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-30 10:58:25 -07:00
Chris Aniszczyk f54e883566 Add TagCommand
A tag command is added to the Git porcelain API. Tests were
also added to stress test the tag command.

Change-Id: Iab282a918eb51b0e9c55f628a3396ff01c9eb9eb
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-08-27 21:11:31 +02:00
Christian Halstrick 2059ed205e Implement a Dircache checkout (needed for merge)
Implementation of a checkout (or 'git read-tree') operation which
works together with DirCache. This implementation does similar things
as WorkDirCheckout which main problem is that it works with deprecated
GitIndex. Since GitIndex doesn't support multiple stages of a file
which is required in merge situations this new implementation is
required to enable merge support.

Change-Id: I13f0f23ad60d98e5168118a7e7e7308e066ecf9c
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-08-27 16:06:49 +02:00
Shawn O. Pearce c44495fa2f Complete an abbreviation when formatting a patch
If we are given a DiffEntry header that already has abbreviated
ObjectIds on it, we may still be able to resolve those locally and
output the difference.  Try to do that through the new resolve API
on ObjectReader.

Change-Id: I0766aa5444b7b8fff73620290f8c9f54adc0be96
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-25 17:07:12 -07:00
Shawn O. Pearce edd8029558 Add setLength(long) to DirCacheEntry
Applications should favor the long style interface, especially when
their source input is a long type, e.g. coming from java.io.File.
This way when the index format is later changed to support a
larger file size than 2 GiB we can handle it by just changing the
entry code, and not need to fix a lot of applications.

Change-Id: I332563caeb110014e2d544dc33050ce67ae9e897
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-23 10:29:50 -07:00
Shawn O. Pearce b85af06324 Allow object reuse selection to occur in parallel
ObjectReader implementations may wish to use multiple threads in
order to evaluate object reuse faster.  Let the reader make that
decision by passing the iteration down into the reader.

Because the work is pushed into the reader, it may need to locate a
given ObjectToPack given its ObjectId.  This can easily occur if the
reader has sent a list of ObjectIds to the object database and gets
back information keyed only by ObjectId, without the ObjectToPack
handle.  Expose lookup using the PackWriter's own internal map,
so the reader doesn't need to build a redundant copy to track the
assocation of ObjectId back to ObjectToPack.

Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-20 17:41:27 -07:00
Shawn O. Pearce 707912b35d Make Tag class only for writing
The Tag class now only supports the creation of an annotated tag
object.  To read an annotated tag, applictions should use RevTag.
This permits us to have exactly one implementation, and RevTag's
is faster and more bug-free.

Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-20 17:38:53 -07:00
Christian Halstrick 75c9b24385 Enhance MergeResult to report conflicts, etc
The MergeResult class is enhanced to report more data about a
three-way merge. Information about conflicts and the base, ours,
theirs commits can be retrived.

Change-Id: Iaaf41a1f4002b8fe3ddfa62dc73c787f363460c2
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-08-19 12:16:39 -05:00
Mathias Kinzler 6e59e6dab9 Meaningful error message when trying to check-out submodules
Currently, a NullPointerException occurs in this case. We should
instead throw a more meaningful Exception with a proper message.
This is a very "stupid" implementation which simply checks for
the existence of a ".gitmodules" file.

Bug: 300731
Bug: 306765
Bug: 308452
Bug: 314853
Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-07-28 11:59:07 -07:00
Jeff Schumacher 396fe6da45 Break dissimilar file pairs during diff
File pairs that are very dissimilar during a diff were not being
broken apart into their constituent ADD/DELETE pairs. The leads to
sub-optimal rename detection. Take, for example, this situation:

A file exists at src/a.txt containing "foo". A user renames src/a.txt
to src/b.txt, then adds a new src/a.txt containing "bar".

Even though the old a.txt and the new b.txt are identical, the
rename detection algorithm would not detect it as a rename since
it was already paired in a MODIFY. I added code to split all
MODIFYs below a certain score into their constituent ADD/DELETE
pairs. This allows situations like the one I described above to be
more correctly handled.

Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-27 18:13:32 -07:00
Shawn O. Pearce fa9b225e06 Merge branch 'delta'
* delta: (103 commits)
  Discard the uncompressed delta as soon as its compressed
  Honor pack.windowlimit to cap memory usage during packing
  Honor pack.threads and perform delta search in parallel
  Cache small deltas during packing
  Implement delta generation during packing
  debug-show-packdelta:  Dump a pack delta to the console
  Initial pack format delta generator
  Add debugging toString() method to ObjectToPack
  Make ObjectToPack clearReuseAsIs signal available to subclasses
  Correctly classify the compressing objects phase
  Refactor ObjectToPack's delta depth setting
  Configure core.bigFileThreshold into PackWriter
  Add doNotDelta flag to ObjectToPack
  Add more configuration options to PackWriter
  Save object path hash codes during packing
  Add path hash code to ObjectWalk
  Add getObjectSize to ObjectReader
  Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB
  Define a constant for 127 in DeltaEncoder
  Cap delta copy instructions at 64k
  ...

Conflicts:
	org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Diff.java
	org.eclipse.jgit/resources/org/eclipse/jgit/JGitText.properties
	org.eclipse.jgit/src/org/eclipse/jgit/JGitText.java
	org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RewriteTreeFilter.java

Change-Id: I7c7a05e443a48d32c836173a409ee7d340c70796
2010-07-22 14:56:34 -07:00
Shawn O. Pearce 6e155d5f41 Merge branch 'js/rename'
* js/rename:
  Implemented file path based tie breaking to exact rename detection
  Added more test cases for RenameDetector
  Added very small optimization to exact rename detection
  Fixed Misleading Javadoc
  Added file path similarity to scoring metric in rename detection
  Fixed potential div by zero bug
  Added file size based rename detection optimization
  Create FileHeader from DiffEntry
  log: Implement --follow
  Cache the diff configuration section
  log: Add whitespace ignore options
  Format submodule links during differences
  Redo DiffFormatter API to be easier to use
  log, diff: Add rename detection support
  Implement similarity based rename detection
  Added a preliminary version of rename detection
  Refactored code out of FileHeader to facilitate rename detection
2010-07-16 10:22:15 -07:00
Shawn O. Pearce 0b46e70155 Fix infinite loop in IndexPack
A programming error using the Inflater API led to an infinite
loop within IndexPack, caused by the Inflater returning 0 from
the inflate() method, but it didn't want more input.  This happens
when it has reached the end of the stream, or has reached a spot
asking for an external dictionary.  Such a case is a failure for us,
and we should abort out.

Thanks to Alex for pointing out that we had 3 implementations of
the inflate rountine, which should be consolidated into one and
use a switch to determine where to load data from.

Bug: 317416
Change-Id: I34120482375b687ea36ed9154002d77047e94b1f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-16 10:12:04 -07:00
Stefan Lay 233e0130b5 Git Porcelain API: Add Command
The new Add command adds files to the Git Index. 
It  uses the DirCache to access the git index. It 
works also in case of an existing conflict. 

Fileglobs (e.g. *.c) are not yet supported. 

The new Add command does add ignored files because
there is no gitignore support in jgit yet.

Bug: 318440
Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-14 11:24:58 +00:00
Robin Rosenberg d787a82e50 Internationalize RepositoryState descriptions
Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-07-10 10:24:37 +02:00
Shawn O. Pearce 8a0c58394d log: Add whitespace ignore options
Similar to what we did with diff, implement whitespace ignore options
for log too.  This requires us to define some means of creating any
RawText object type at will inside of DiffFormatter, so we define a
new factory interface to construct RawText instances on demand.

Unfortunately we have to copy the entire block of common options.
args4j only processes the options/arguments on the one command class
and Java doesn't support multiple inheritance.

Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 17:32:47 -07:00
Shawn O. Pearce 5be90be996 Redo DiffFormatter API to be easier to use
Passing around the OutputStream and the Repository is crazy.  Instead
put the stream in the constructor, since this formatter exists only to
output to the stream, and put the repository as a member variable that
can be optionally set.

Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:58:37 -07:00
Shawn O. Pearce 978535b090 Implement similarity based rename detection
Content similarity based rename detection is performed only after
a linear time detection is performed using exact content match on
the ObjectIds.  Any names which were paired up during that exact
match phase are excluded from the inexact similarity based rename,
which reduces the space that must be considered.

During rename detection two entries cannot be marked as a rename
if they are different types of files.  This prevents a symlink from
being renamed to a regular file, even if their blob content appears
to be similar, or is identical.

Efficiently comparing two files is performed by building up two
hash indexes and hashing lines or short blocks from each file,
counting the number of bytes that each line or block represents.

Instead of using a standard java.util.HashMap, we use a custom
open hashing scheme similiar to what we use in ObjecIdSubclassMap.
This permits us to have a very light-weight hash, with very little
memory overhead per cell stored.

As we only need two ints per record in the map (line/block key and
number of bytes), we collapse them into a single long inside of
a long array, making very efficient use of available memory when
we create the index table.  We only need object headers for the
index structure itself, and the index table, but not per-cell.
This offers a massive space savings over using java.util.HashMap.

The score calculation is done by approximating how many bytes are
the same between the two inputs (which for a delta would be how much
is copied from the base into the result).  The score is derived by
dividing the approximate number of bytes in common into the length
of the larger of the two input files.

Right now the SimilarityIndex table should average about 1/2 full,
which means we waste about 50% of our memory on empty entries
after we are done indexing a file and sort the table's contents.
If memory becomes an issue we could discard the table and copy all
records over to a new array that is properly sized.

Building the index requires O(M + N log N) time, where M is the
size of the input file in bytes, and N is the number of unique
lines/blocks in the file.  The N log N time constraint comes
from the sort of the index table that is necessary to perform
linear time matching against another SimilarityIndex created for
a different file.

To actually perform the rename detection, a SxD matrix is created,
placing the sources (aka deletions) along one dimension and the
destinations (aka additions) along the other.  A simple O(S x D)
loop examines every cell in this matrix.

A SimilarityIndex is built along the row and reused for each
column compare along that row, avoiding the costly index rebuild
at the row level.  A future improvement would be to load a smaller
square matrix into SimilarityIndexes and process everything in that
sub-matrix before discarding the column dimension and moving down
to the next sub-matrix block along that same grid of rows.

An optional ProgressMonitor is permitted to be passed in, allowing
applications to see the progress of the detector as it works through
the matrix cells.  This provides some indication of current status
for very long running renames.

The default line/block hash function used by the SimilarityIndex
may not be optimal, and may produce too many collisions.  It is
borrowed from RawText's hash, which is used to quickly skip out of
a longer equality test if two lines have different hash functions.
We may need to refine this hash in the future, in order to minimize
the number of collisions we get on common source files.

Based on a handful of test commits in JGit (especially my own
recent rename repository refactoring series), this rename detector
produces output that is very close to C Git.  The content similarity
scores are sometimes off by 1%, which is most probably caused by
our SimilarityIndex type using a different hash function than C
Git uses when it computes the delta size between any two objects
in the rename matrix.

Bug: 318504
Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:32:03 -07:00
Shawn O. Pearce 08d349a27b amend commit: Refactor repository construction to builder class
During code review, Alex raised a few comments about commit
532421d989 ("Refactor repository construction to builder class").
Due to the size of the related series we aren't going to go back
and rebase in something this minor, so resolve them as a follow-up
commit instead.

Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:30 -07:00
Jeff Schumacher cb8e1e6014 Added a preliminary version of rename detection
JGit does not currently do rename detection during diffs. I added
a class that, given a TreeWalk to iterate over, can output a list
of DiffEntry's for that TreeWalk, taking into account renames. This
class only detects renames by SHA1's. More complex rename detection,
along the lines of what C Git does will be added later.

Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04
2010-07-01 17:33:53 -07:00
Shawn O. Pearce 71aace52f7 Simplify ObjectLoaders coming from PackFile
We no longer need an ObjectLoader to be lazy and try to delay
the materialization of the object content.  That was done only
to support PackWriter searching for a good reuse candidate.

Instead, simplify the code base by doing the materialization
immediately when the loader asks for it, because any caller
asking for the loader is going to need the content.

Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce 532421d989 Refactor repository construction to builder class
The new FileRepositoryBuilder class helps applications to construct
a properly configured FileRepository, with properties assumed based
upon the standard Git rules for the local filesystem.

To better support simple command line applications, environment
variable handling and repository searching was moved into this
builder class.

The change gets rid of the ever-growing FileRepository constructor
variants, and the multitude of java.io.File typed parameters,
by using simple named setter methods.

Change-Id: I17e8e0392ad1dbf6a90a7eb49a6d809388d27e4c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:58:40 -07:00
Mathias Kinzler 3c51b35e03 "Bare" Repository should not return working directory.
If a repository is "bare", it currently still returns a working directory.
This conflicts with the specification of "bare"-ness.

Bug: 311902

Change-Id: Ib54b31ddc80b9032e6e7bf013948bb83e12cfd88
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-06-16 08:50:26 +02:00
Stefan Lay 5b0e73b849 Add a merge command to the jgit API
Merges the current head with one other commit.
In this first iteration the merge command supports
only fast forward and already up-to-date.

Change-Id: I0db480f061e01b343570cf7da02cac13a0cbdf8f
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-05-24 09:52:28 -05:00
Christian Halstrick 6ca9843f3e Added merge support to CommitCommand
The CommitCommand should take care to create a merge commit if the file
$GIT_DIR/MERGE_HEAD exists. It should then read the parents for the merge
commit out of this file. It should also take care that when commiting
a merge and no commit message was specified to read the message from
$GIT_DIR/MERGE_MSG.
Finally the CommitCommand should remove these files if the commit
succeeded.

Change-Id: 	I4e292115085099d5b86546d2021680cb1454266c
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-05-21 01:49:46 +02:00
Sasa Zivkov f3d8a8ecad Externalize strings from JGit
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.

Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.

Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
2010-05-19 14:37:16 -07:00