345 Commits

Author SHA1 Message Date
Shawn O. Pearce e9de5643fa Cache the diff configuration section
This way we don't have to reparse for the rename limit every time
we create a new rename detector for a repository.

Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 18:17:52 -07:00
Shawn O. Pearce 8a0c58394d log: Add whitespace ignore options
Similar to what we did with diff, implement whitespace ignore options
for log too.  This requires us to define some means of creating any
RawText object type at will inside of DiffFormatter, so we define a
new factory interface to construct RawText instances on demand.

Unfortunately we have to copy the entire block of common options.
args4j only processes the options/arguments on the one command class
and Java doesn't support multiple inheritance.

Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 17:32:47 -07:00
Shawn O. Pearce bd8740dc14 Format submodule links during differences
Instead of crashing, output a submodule link with the simple
"Subproject commit $fullid\n" syntax used by C Git.

Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:59:06 -07:00
Shawn O. Pearce 5be90be996 Redo DiffFormatter API to be easier to use
Passing around the OutputStream and the Repository is crazy.  Instead
put the stream in the constructor, since this formatter exists only to
output to the stream, and put the repository as a member variable that
can be optionally set.

Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:58:37 -07:00
Shawn O. Pearce 04a9d23b9a log, diff: Add rename detection support
Implement rename detection in the command line diff and log commands.
Also support --name-status, -p and -U flags, as these can be quite
useful to view more detail.

All of the Git patch file formatting code is now moved over to the
DiffFormatter class.  This permits us to reuse it in any context,
including inside of IDEs.

Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:32:03 -07:00
Shawn O. Pearce 978535b090 Implement similarity based rename detection
Content similarity based rename detection is performed only after
a linear time detection is performed using exact content match on
the ObjectIds.  Any names which were paired up during that exact
match phase are excluded from the inexact similarity based rename,
which reduces the space that must be considered.

During rename detection two entries cannot be marked as a rename
if they are different types of files.  This prevents a symlink from
being renamed to a regular file, even if their blob content appears
to be similar, or is identical.

Efficiently comparing two files is performed by building up two
hash indexes and hashing lines or short blocks from each file,
counting the number of bytes that each line or block represents.

Instead of using a standard java.util.HashMap, we use a custom
open hashing scheme similar to what we use in ObjectIdSubclassMap.
This permits us to have a very light-weight hash, with very little
memory overhead per cell stored.

As we only need two ints per record in the map (line/block key and
number of bytes), we collapse them into a single long inside of
a long array, making very efficient use of available memory when
we create the index table.  We only need object headers for the
index structure itself, and the index table, but not per-cell.
This offers a massive space savings over using java.util.HashMap.
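
A minimal sketch of the packing idea described above, assuming the key
sits in the high 32 bits and the byte count in the low 32 bits.  The
class name, method names and bit layout are illustrative assumptions,
not necessarily what SimilarityIndex does:

```java
// Hypothetical helper: stores (key, count) pairs as single longs so a
// plain long[] can act as the table, with no per-cell object header.
final class PackedRecordSketch {
    // Assumed layout: key in the high 32 bits, count in the low 32 bits.
    static long pack(int key, int count) {
        return ((long) key << 32) | (count & 0xFFFFFFFFL);
    }

    // Recover the line/block key from a packed record.
    static int keyOf(long record) {
        return (int) (record >>> 32);
    }

    // Recover the byte count from a packed record.
    static int countOf(long record) {
        return (int) record;
    }
}
```

With this layout, sorting the long[] also orders the records by key,
which is convenient for a later merge-style comparison.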

The score calculation is done by approximating how many bytes are
the same between the two inputs (which for a delta would be how much
is copied from the base into the result).  The score is derived by
dividing the approximate number of bytes in common by the length
of the larger of the two input files.

Right now the SimilarityIndex table should average about 1/2 full,
which means we waste about 50% of our memory on empty entries
after we are done indexing a file and sort the table's contents.
If memory becomes an issue we could discard the table and copy all
records over to a new array that is properly sized.

Building the index requires O(M + N log N) time, where M is the
size of the input file in bytes, and N is the number of unique
lines/blocks in the file.  The N log N time constraint comes
from the sort of the index table that is necessary to perform
linear time matching against another SimilarityIndex created for
a different file.
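
The linear time matching and the score could be sketched roughly as
follows.  This assumes each table is a sorted long[] of packed
(key, count) records with unique keys and non-negative counts, and a
score on a 0..100 scale; all names here are hypothetical and the real
SimilarityIndex may differ:

```java
import java.util.Arrays;

// Hypothetical sketch of comparing two sorted index tables.
final class SimilaritySketch {
    // Assumed layout: key in the high 32 bits, count in the low 32 bits.
    static long pack(int key, int count) {
        return ((long) key << 32) | (count & 0xFFFFFFFFL);
    }

    // Build a sorted table from {key, count} pairs (the N log N step).
    static long[] index(int[][] records) {
        long[] table = new long[records.length];
        for (int i = 0; i < records.length; i++)
            table[i] = pack(records[i][0], records[i][1]);
        Arrays.sort(table);
        return table;
    }

    // Walk both sorted tables once, summing the bytes they share.
    static long commonBytes(long[] a, long[] b) {
        long total = 0;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            int ka = (int) (a[i] >>> 32);
            int kb = (int) (b[j] >>> 32);
            if (ka == kb) {
                total += Math.min((int) a[i], (int) b[j]);
                i++;
                j++;
            } else if (ka < kb) {
                i++;
            } else {
                j++;
            }
        }
        return total;
    }

    // Common bytes relative to the larger input, as a percentage.
    static int score(long[] a, long aSize, long[] b, long bSize) {
        long max = Math.max(aSize, bSize);
        if (max == 0)
            return 100; // two empty inputs are treated as identical here
        return (int) (commonBytes(a, b) * 100 / max);
    }
}
```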

To actually perform the rename detection, an SxD matrix is created,
placing the sources (aka deletions) along one dimension and the
destinations (aka additions) along the other.  A simple O(S x D)
loop examines every cell in this matrix.
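
A simplified sketch of such a loop is shown here.  The Scorer
interface, the minScore threshold and the "best destination per
source" policy are assumptions made for illustration; in particular
this sketch lets several sources pick the same destination, which a
real detector would have to resolve:

```java
// Hypothetical sketch of the O(S x D) scan over the rename matrix.
final class RenameMatrixSketch {
    // Supplies the similarity score for one (source, destination) cell.
    interface Scorer {
        int score(int src, int dst);
    }

    // For each source, return the index of the best scoring destination,
    // or -1 when no destination reaches minScore.
    static int[] bestMatches(int sources, int dests, int minScore, Scorer scorer) {
        int[] best = new int[sources];
        for (int s = 0; s < sources; s++) {
            int bestScore = minScore - 1;
            best[s] = -1;
            for (int d = 0; d < dests; d++) {
                int sc = scorer.score(s, d);
                if (sc > bestScore) {
                    bestScore = sc;
                    best[s] = d;
                }
            }
        }
        return best;
    }
}
```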

A SimilarityIndex is built along the row and reused for each
column compare along that row, avoiding the costly index rebuild
at the row level.  A future improvement would be to load a smaller
square matrix into SimilarityIndexes and process everything in that
sub-matrix before discarding the column dimension and moving down
to the next sub-matrix block along that same grid of rows.

An optional ProgressMonitor is permitted to be passed in, allowing
applications to see the progress of the detector as it works through
the matrix cells.  This provides some indication of current status
for very long running renames.

The default line/block hash function used by the SimilarityIndex
may not be optimal, and may produce too many collisions.  It is
borrowed from RawText's hash, which is used to quickly skip out of
a longer equality test if two lines have different hash values.
We may need to refine this hash in the future, in order to minimize
the number of collisions we get on common source files.

Based on a handful of test commits in JGit (especially my own
recent rename repository refactoring series), this rename detector
produces output that is very close to C Git.  The content similarity
scores are sometimes off by 1%, which is most probably caused by
our SimilarityIndex type using a different hash function than C
Git uses when it computes the delta size between any two objects
in the rename matrix.

Bug: 318504
Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:32:03 -07:00
Jeff Schumacher cb8e1e6014 Added a preliminary version of rename detection
JGit does not currently do rename detection during diffs. I added
a class that, given a TreeWalk to iterate over, can output a list
of DiffEntry objects for that TreeWalk, taking into account renames. This
class only detects renames by SHA-1. More complex rename detection,
along the lines of what C Git does, will be added later.

Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04
2010-07-01 17:33:53 -07:00
Jeff Schumacher 7b0b4110ed Refactored code out of FileHeader to facilitate rename detection
Refactored a superclass out of FileHeader called DiffEntry that holds
the more general data from FileHeader that is useful in rename
detection (old/new Ids, modes, names, as well as changeType and
score). FileHeader is now a DiffEntry that adds Hunks, parsing
abilities, etc.

Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4
2010-06-30 17:53:27 -07:00
Dmitry Neverov 44854741c5 Fix missing flush in StreamCopyThread
It is possible that StreamCopyThread will not flush everything
from its src to its dst.  In most cases StreamCopyThread works
like this:

  in loop:
    n = src.read(buf);
    dst.write(buf, 0, n);

and when we want to flush, we interrupt() StreamCopyThread and it
flushes everything it wrote to dst.

The problem is that our interrupt() could interrupt reading. In this
case we will flush everything we wrote to dst, but not everything
we wrote to src.

Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 10:48:44 -07:00
Jeff Schumacher 9f2249bd26 Added check for binary files while diffing
Added a check in Diff to ensure that files that are most likely
not text are not line-by-line diffed. Files are determined to be
binary by checking the first 8000 bytes for a null character. This
is a similar heuristic to what C Git uses.
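
The heuristic can be sketched as below.  The class and method names
are hypothetical; only the "NUL within the first 8000 bytes" rule
comes from the description above:

```java
// Hypothetical sketch of the binary detection heuristic.
final class BinaryCheckSketch {
    // Only this many leading bytes are inspected.
    static final int FIRST_FEW_BYTES = 8000;

    // Returns true if a NUL byte appears in the inspected prefix.
    static boolean isBinary(byte[] raw) {
        int n = Math.min(raw.length, FIRST_FEW_BYTES);
        for (int i = 0; i < n; i++) {
            if (raw[i] == 0)
                return true;
        }
        return false;
    }
}
```

A NUL that only appears after the inspected prefix is not noticed,
so the check is a guess rather than a guarantee.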

Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26
2010-06-29 17:23:00 -07:00
Matthias Sohn 730b708dae Merge "Update build to use Tycho 0.9.0" 2010-06-29 09:02:46 -04:00
Shawn Pearce 3fd4918852 Merge changes Ie56301aa,Ic2f79e85
* changes:
  Added further support for whitespace ignoring during diff
  Added support for whitespace ignoring
2010-06-28 20:27:04 -04:00
Jeff Schumacher 9869ef2592 Added further support for whitespace ignoring during diff
Added code to support ignoring leading, trailing, and changed
whitespace when performing a diff operation. I also added command
line options to Diff to enable the various whitespace ignoring
methods. These match the flags for git diff.

Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8
2010-06-28 17:25:19 -07:00
Matthias Sohn a2325f6885 Update build to use Tycho 0.9.0
Change-Id: I589267e6cfd0514383c2a3da51c9b7a659f77844
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-29 00:08:36 +02:00
Jeff Schumacher 543235b805 Added support for whitespace ignoring
JGit did not have support for skipping whitespace when comparing
lines in RawText objects. I added a subclass of RawText that skips
whitespace in its equals and hashCode methods. I used a subclass
rather than adding functionality into RawText so that performance
would not be impacted by extra logic.

This class only supports ignoring all whitespace. Others will follow
that allow other forms of whitespace ignoring.
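
As an illustration of the idea, an equals/hash pair that skips all
whitespace might look like this.  It works on plain byte arrays
rather than RawText, and the names and the set of bytes treated as
whitespace are assumptions:

```java
// Hypothetical sketch: compare and hash lines while ignoring all whitespace.
final class IgnoreAllWhitespaceSketch {
    // Assumed whitespace set: space, tab, CR, LF.
    static boolean isSpace(byte c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // True if both lines contain the same non-whitespace bytes in order.
    static boolean equalsIgnoringWhitespace(byte[] a, byte[] b) {
        int i = 0, j = 0;
        while (true) {
            while (i < a.length && isSpace(a[i]))
                i++;
            while (j < b.length && isSpace(b[j]))
                j++;
            if (i == a.length || j == b.length)
                return i == a.length && j == b.length;
            if (a[i++] != b[j++])
                return false;
        }
    }

    // Hash over the non-whitespace bytes only, so lines that are equal
    // under equalsIgnoringWhitespace always hash to the same value.
    static int hashIgnoringWhitespace(byte[] a) {
        int h = 0;
        for (byte c : a) {
            if (!isSpace(c))
                h = 31 * h + (c & 0xff);
        }
        return h;
    }
}
```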

Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a
2010-06-28 10:59:10 -07:00
Shawn O. Pearce 5ed96eb7f4 UploadPack: Avoid unnecessary flush in smart HTTP
Under smart HTTP the biDirectionalPipe flag is false, and we return
back immediately at this point in the negotiation process.  There is
no need to flush the stream to the client, the request is over and
it will be automatically flushed out by the higher level servlet
that invoked us.  Avoiding flush here allows us to only use flush
after a progress message is sent during pack generation.

Change-Id: Id0c8b7e95e3be6ca4c1b479e096bed6b0283b828
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce 066df3d1a1 Add MutableObjectId.copyFrom(AnyObjectId)
This simplifies the PackIndex code, which is trying to quickly copy
an existing ObjectId into a MutableObjectId.  Rather than having
the PackIndex violate the ObjectId's internals, expose a copy from
function similar to the other ones for copying from raw byte arrays
or hex formatted strings.

Change-Id: I142635cbece54af2ab83c58477961ce925dc8255
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce 677b9b17e2 Expose AnyObjectId compareTo(byte[]) and compareTo(int[])
Storage systems can use these implementations to compare a passed
AnyObjectId with a stored representation of an ObjectId in the
canonical network byte order format.  This can be useful to do a
binary search, or just linear scan, over an encoded storage file.

Change-Id: I8c72993c4f4c6e98d599ac2c9867453752f25fd2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce 864cc3de10 Expose RefWriter constructor taking RefList
An implementation might prefer to use the RefList type here, and
RefList is part of our public API.  Expose the constructor so callers
who have a RefList can take advantage of the existing sorting.

Change-Id: I545867f85aa2c479d2d610024ebbe318144709c8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce bfc43c13bc Expose RefUpdate constructor to any subclass
When we finally move RefDirectory to the new storage.file package,
its associated RefDirectoryUpdate will need visibility to this
constructor in order to initialize itself.  This is true of any
other repository implementation, so make it protected rather than
package level visible.

Change-Id: If838aec9baeb80ee2f12dcbca717657c725a9242
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce 8e40697047 Expose repository change event constructors
Repository implementations outside of .lib need to be able to
create these events and deliver them to listening application code.

Expose and document the constructors so that they are visible when
we move FileRepository into storage.file.FileRepository.

Change-Id: I7fb6e8f4f5fdab683c5ebb5267673aa6d5b560bb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce b3254d1159 isValidRefName: Inline the forbidden ref suffix of ".lock"
A Git reference name must never end with ".lock", as it would
confuse any existing C client that tries to obtain a clone of the
repository over the network.  Even if the repository isn't on a
local filesystem, it still should ban that suffix.

Because I plan to move LockFile to storage.file and make it a private
implementation detail of the local file system storage model,
we can't rely on its package level SUFFIX field here.  Making it
public probably won't work long-term either, as I also plan to
pull storage.file into its own separate project that depends on
the core library.

So, just inline the constant here.  It's as forbidden as ":" is.
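
A rough sketch of what inlining the constant could look like.  It
covers only the two rules mentioned here (the ".lock" suffix and
":"); the class and method names are hypothetical, and real ref name
validation has more rules than this:

```java
// Hypothetical sketch: only the two rules named in this message.
final class RefNameSketch {
    static boolean passesLockAndColonRules(String refName) {
        // Suffix inlined as a literal instead of referencing LockFile.
        if (refName.endsWith(".lock"))
            return false;
        if (refName.indexOf(':') >= 0)
            return false;
        return true;
    }
}
```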

Change-Id: If85076861baeacc183b82696375a13e935ba8836
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce 252cd74eb0 Remove pack stream from PackWriterTest
This stream was used only to determine how many bytes had been
written thus far.  Except we're always dumping it into a simple
ByteArrayOutputStream, which also knows that.  Drop the dependency
on the pack stream and use ByteArrayOutputStream directly.

This lets us later move this test into the new storage.file
package without dragging along the pack stream that is an internal
implementation detail of PackWriter, which is more general than
just the file storage layer.

Change-Id: I291689c0b1ed799270c213ee73b710b2637fb238
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce a5aec660eb Remove pointless setOldObjectId in test
Setting this value is pointless, because it's automatically set
by the refs.newUpdate call that created the update operation.
The API is protected by default, because application level code,
including this test, should not be calling it.

Change-Id: I8867a4e8007892e2bd44a05d7dec619081081943
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce 66e5895eb4 Remove speed tests based on mapCommit
The mapCommit API is being deprecated because it doesn't run very
fast.  Leaving tests around to test how fast it is relative to C Git
isn't instructive.  Remove them, which should help aid the transition
away from the mapCommit API.

Change-Id: I27e1c844610d7da5b2c44b33a00602706973c9cc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Matthias Sohn 33ae23b8b9 Change default target platform for maven build to galileo
Starting with 0.9 we no longer support ganymede.
http://dev.eclipse.org/mhonarc/lists/egit-dev/msg01277.html

Change-Id: Ibf40342f67d9706e86336748f15d10ea47278096
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-19 01:06:14 +02:00
Shawn Pearce f3186974b6 Merge "Fix line endings" 2010-06-18 18:15:53 -04:00
Matthias Sohn 767fb175ed Fix line endings
Some sources had dos line endings. Also configure all projects to use
unix line endings and UTF-8 text encoding.

Change-Id: I8fc9a1dbb219ffa91d1b3011b3b11b7e48e74ca7
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-18 23:36:18 +02:00
Shawn Pearce 3149f971e0 Merge ""Bare" Repository should not return working directory." 2010-06-16 22:34:46 -04:00
Andrew Bayer 068eb92710 Make ObjectId, RefSpec, RemoteConfig, URIish serializable
Modifications to various classes in order to allow serialization
for use of JGit in Hudson's git plugin.

Change-Id: If088717d3da7483538c00a927e433a74085ae9e6
2010-06-16 16:10:28 -07:00
Mathias Kinzler 3c51b35e03 "Bare" Repository should not return working directory.
If a repository is "bare", it currently still returns a working directory.
This conflicts with the specification of "bare"-ness.

Bug: 311902

Change-Id: Ib54b31ddc80b9032e6e7bf013948bb83e12cfd88
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-06-16 08:50:26 +02:00
Matthias Sohn 93ccd4a7fe Merge "tools/version.sh: Use backup files on Win32" 2010-06-15 19:34:52 -04:00
Chris Aniszczyk 8a11ac3d69 Merge "Add missing @Override tags in AlternateRepositoryDatabase" 2010-06-15 11:40:04 -04:00
Mathias Kinzler c1c1300a74 Allow to read configured keys
Currently, there is no way to read the content
of the Git configuration in a way that would
allow listing all configured values generically.
This change extends the Config class so that
callers can get a list of sections and
a list of names for any given section or
subsection.
This is required in order to implement proper
configuration handling in EGit (show all the
content of a given configuration similar to
"git config -l").

Change-Id: Idd4bc47be18ed0e36b11be8c23c9c707159dc830
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-06-15 10:12:26 +02:00
Shawn Pearce 86fcdc53ad Merge changes I53f71dc0,I3a899a3a,I3e8bd245,Ie7c9db83,If396326e,I6f4cf8da,I3bf96dd0,I3a2a43a1,I292fe88c,Ia1cf40cf
* changes:
  git-servlet: Fix comparing uploadFactory with the wrong DISABLED instance
  Prefer static inner classes
  Override equals for SwingLane since super class PlotLane defines it
  Make sure a Stream is closed upon errors in IpLogGenerator
  Make constant static in RebuildCommitGraph
  Make inner classes static in http code
  Cache filemode in GitIndex 
  Remove unused parent field in PlotLane
  Removed unused repo field in WorkDirCheckout
  Extend DiffFormatter API to simplify styling
2010-06-14 19:59:48 -04:00
Robin Rosenberg 6d5241110b git-servlet: Fix comparing uploadFactory with the wrong DISABLED instance
Change-Id: I53f71dc0e3c68839da5ff5a2e0f3eeb8340e4793
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-14 23:47:59 +02:00
Shawn O. Pearce bc238acdc5 Add missing @Override tags in AlternateRepositoryDatabase
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 12:59:30 -07:00
Shawn O. Pearce 200f3caefc tools/version.sh: Use backup files on Win32
Windows doesn't permit us to edit a file in-place with Perl.
So create backup files when we perform the edit, and remove them
when we are done.  This is a tad slower on POSIX systems, but is
much more portable.

Change-Id: I429c7d698924cb32e709363f5da82f7232bbdab2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 08:19:56 -07:00
Shawn O. Pearce 44ba1bc78c Merge branch 'stable-0.8'
* stable-0.8:
  Qualify post-0.8.4 builds
  JGit 0.8.4
  JGit 0.8.3
  Include about.html in org.eclipse.jgit artifact
  Fix build.properties of the JGit feature
  Added the standard SULA for JGit
  Add "resources/" as a source folder

Change-Id: I4ecb0af41184ef84d104345fd1adcc4a240a38f6
2010-06-14 08:12:48 -07:00
Shawn O. Pearce 239ce58553 Start 0.9 development
Change-Id: I84173ece5100f1fcb78168e2e102b649d9466c08
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 08:11:27 -07:00
Shawn O. Pearce d28a40d679 Qualify post-0.8.4 builds
Change-Id: I21efed66921eb7e1e4010fccc9fa9af6c4150fc1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 08:10:08 -07:00
Matthias Sohn 6970edf35a JGit 0.8.4
Created wrong tags for 0.8.3 hence creating another version.

Change-Id: I4e00bbcffe1cf872e2d7e3f3d88d068701fb5330
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-14 15:42:09 +02:00
Matthias Sohn 5255d66143 JGit 0.8.3
Change-Id: I845da83c74475d74ec25d68f53c0a4738a898550
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-14 01:34:34 +02:00
Robin Rosenberg 3a899a3af9 Prefer static inner classes
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:31:52 +02:00
Robin Rosenberg 3e8bd24580 Override equals for SwingLane since super class PlotLane defines it
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:30:10 +02:00
Robin Rosenberg e7c9db836b Make sure a Stream is closed upon errors in IpLogGenerator
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:28:04 +02:00
Robin Rosenberg f396326e0b Make constant static in RebuildCommitGraph
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:27:06 +02:00
Robin Rosenberg 6f4cf8daec Make inner classes static in http code
Static classes are preferable to keep unwanted dependencies away,
and they have one less member field.

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:19:47 +02:00
Robin Rosenberg 3bf96dd04b Cache filemode in GitIndex
Apparently this was the intention, but never happened

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:16:32 +02:00
Robin Rosenberg 3a2a43a1dc Remove unused parent field in PlotLane
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:13:57 +02:00