588 Commits

Author SHA1 Message Date
Shawn O. Pearce 97e93ca1ea Merge "Remove static progress task names from PackWriter" 2010-08-05 21:11:30 -04:00
Chris Aniszczyk b69900a415 Merge "Add "all" parameter to the commit Command" 2010-08-05 15:02:59 -04:00
Chris Aniszczyk ad4274abcc Merge "Add the parameter "update" to the Add command" 2010-08-05 15:01:53 -04:00
Stefan Lay 4b464ed458 Allow to replace existing Change-Id
It is useful to be able to replace an existing Change-Id
in the message, for example if the user decides not to
amend the previous commit.

Bug: 321188
Change-Id: I594e7f9efd0c57d794d2bd26d55ec45f4e6a47fd
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-08-05 12:23:38 -05:00
Shawn O. Pearce 60c5939b23 Rename getOldName,getNewName to getOldPath,getNewPath
TreeWalk calls this value "path", while "name" is the stuff after the
last slash.  FileHeader should do the same thing to be consistent.
Rename getOldName to getOldPath and getNewName to getNewPath.

Bug: 318526
Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-08-04 11:00:07 -07:00
Shawn O. Pearce 7514a6dbdc Merge branch 'js/diff'
* js/diff:
  Fixed bug in scoring mechanism for rename detection
2010-08-04 10:59:35 -07:00
Jeff Schumacher e64cb03065 Fixed bug in scoring mechanism for rename detection
A bug in rename detection would cause file scores to be wrong. The
bug was due to the way rename detection would judge the similarity
between files. If file A has three lines containing 'foo', and file
B has 5 lines containing 'foo', the rename detection phase should
record that A and B have three lines in common (the minimum of the
number of times that line appears in both files). Instead, it would
choose the number of times the line appeared in the destination
file, in this case file B. I fixed the bug by having the
SimilarityIndex instead choose the minimum number, as it should. I
also added a test case to verify that the bug had been fixed.

Change-Id: Ic75272a2d6e512a361f88eec91e1b8a7c2298d6b
2010-08-04 10:56:19 -07:00
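Note on e64cb03065: the counting rule described above can be sketched in a few lines of Java. This is a simplified illustration, not the SimilarityIndex code. The class and method names are hypothetical, and plain string lines stand in for whatever representation the real index uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CommonLineCount {
    /** Counts how often each distinct line occurs. */
    static Map<String, Integer> countLines(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }

    /** For every line present in both files, adds the smaller of the two counts. */
    static int common(List<String> src, List<String> dst) {
        Map<String, Integer> srcCounts = countLines(src);
        Map<String, Integer> dstCounts = countLines(dst);
        int total = 0;
        for (Map.Entry<String, Integer> e : srcCounts.entrySet()) {
            Integer inDst = dstCounts.get(e.getKey());
            if (inDst != null) {
                // The fix: take the minimum, not the destination count.
                total += Math.min(e.getValue(), inDst);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("foo", "foo", "foo");
        List<String> b = Arrays.asList("foo", "foo", "foo", "foo", "foo");
        System.out.println(common(a, b)); // prints 3
    }
}
```

With the minimum rule the result is the same whichever file is treated as the destination.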
Jens Baumgart 3ba1c7c068 Add gitignore support to IndexDiff and use TreeWalk
IndexDiff was re-implemented and now uses TreeWalk instead
of GitIndex. Additionally, gitignore support and retrieval of
untracked files was added.

Change-Id: Ie6a8e04833c61d44c668c906b161202b200bb509
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-08-04 10:03:20 -05:00
Stefan Lay ab57af08e8 Add "all" parameter to the commit Command
When the "all" parameter is set, all modified and deleted files
are staged prior to commit.

Change-Id: Id23bc25730fcdd151386cd495a7cdc0935cbc00b
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-08-04 13:53:08 +02:00
Stefan Lay fa7d9ac5b8 Add the parameter "update" to the Add command
This change is mainly done for a subsequent commit
which will introduce the "all" parameter to the Commit
command.

Bug: 318439
Change-Id: I85a8a76097d0197ef689a289288ba82addb92fc9
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-08-04 13:36:45 +02:00
Christian Halstrick 94207f0a43 Make use of Repository.writeMerge...()
The CommitCommand should not use java.io to delete MERGE_HEAD and MERGE_MSG
files since Repository already has utility methods for that.

Change-Id: If66a419349b95510e5b5c2237a91f06c1d5ba0d4
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-07-29 15:12:14 +02:00
Christian Halstrick fba2437111 Merge "Fix tag sorting in PlotWalk" 2010-07-28 17:13:27 -04:00
Shawn O. Pearce 5f5da8b1d4 Enable configuration of non-standard pack settings
For daemons we might want to disable delta compression entirely, or
in some strange case an administrator might need to turn off delta
reuse.  Expose these normally internal pack settings through the pack
configuration section.

Change-Id: I39bfefee8384c864cc04ffac724f197240c8a11a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 12:13:48 -07:00
Shawn O. Pearce 9fbce904e6 Pass PackConfig down to PackWriter when packing
When we are creating a pack the higher level application should be able
to override the PackConfig used, allowing it to control the number of
threads used or how much memory is allocated per writer.

Change-Id: I47795987bb0d161d3642082acc2f617d7cb28d8c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 12:13:48 -07:00
Shawn O. Pearce bb99ec0aa0 Simplify UploadPack use of options during writing
We only use these variables once, so just put them at the proper
use site and avoid assigning the local variable.  The code is a
bit shorter and the intent is a little bit more clear.

Change-Id: I70d120fb149b612ac93055ea39bc053b8d90a5db
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 12:13:48 -07:00
Shawn O. Pearce 1a06179ea7 Move PackWriter configuration to PackConfig
This refactoring permits applications to configure global per-process
settings for all packing and easily pass it through to per-request
PackWriters, ensuring that the process configuration overrides the
repository specific settings.

For example this might help in a daemon environment where the server
wants to cap the resources used to serve a dynamic upload pack
request, even though the repository's own pack.* settings might be
configured to be more aggressive.  This allows fast but less bandwidth
efficient serving of clients, while still retaining good compression
through a cron managed `git gc`.

Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 12:13:48 -07:00
Mathias Kinzler 6e59e6dab9 Meaningful error message when trying to check-out submodules
Currently, a NullPointerException occurs in this case. We should
instead throw a more meaningful Exception with a proper message.
This is a very "stupid" implementation which simply checks for
the existence of a ".gitmodules" file.

Bug: 300731
Bug: 306765
Bug: 308452
Bug: 314853
Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-07-28 11:59:07 -07:00
Christian Halstrick 08c0c5d938 Fix unit tests under windows
The following tests fail under Windows because certain input streams
are not closed and files cannot be deleted because of that.  The
main problem I found is UnpackedObject.InflaterInputStream.close().
This method may throw exceptions found by checkValidEndOfStream()
but doesn't call super.close() before leaving. It is not clear to me
which resources a close() method should release before it throws an
exception. But those resources which are not published to the
outside and which therefore cannot be closed by other means have to
be closed in all cases.
I changed the close() method to call super.close() under all
circumstances.

failing tests:
  testStandardFormat_LargeObject_TruncatedZLibStream(org.eclipse.jgit.storage.file.UnpackedObjectTest)
  testStandardFormat_LargeObject_TrailingGarbage(org.eclipse.jgit.storage.file.UnpackedObjectTest)
  testPackFormat_SmallObject(org.eclipse.jgit.storage.file.UnpackedObjectTest)

Change-Id: Id2e609a29e725aad953ff9bd88af6381df38399d
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-07-28 11:55:11 -07:00
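Note on 08c0c5d938: the "call super.close() under all circumstances" pattern can be sketched with try/finally. This is only an illustration of the idea, not the UnpackedObject code. The class name and the `valid` flag (standing in for a check like checkValidEndOfStream()) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CheckedCloseStream extends FilterInputStream {
    // Hypothetical stand-in for the result of an end-of-stream validation.
    private final boolean valid;

    CheckedCloseStream(InputStream in, boolean valid) {
        super(in);
        this.valid = valid;
    }

    @Override
    public void close() throws IOException {
        try {
            if (!valid) {
                throw new IOException("invalid end of stream");
            }
        } finally {
            // Runs even when the validation above throws, so the
            // underlying stream (and any file handle) is released.
            super.close();
        }
    }

    public static void main(String[] args) {
        try {
            new CheckedCloseStream(new ByteArrayInputStream(new byte[0]), false).close();
        } catch (IOException e) {
            System.out.println("close() reported: " + e.getMessage());
        }
    }
}
```

The caller still sees the validation failure, but the wrapped stream has been closed by the time the exception propagates.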
Shawn O. Pearce d0f8d1e819 Fix tag sorting in PlotWalk
By deferring tag sorting until the commit is produced by the walker
we can avoid an infinite loop that was triggered by trying to sort
tags while allocating a commit.  This also avoids needing to look
at commits which aren't going to be produced in the result.

Bug: 321103
Change-Id: I25acc739db2ec0221a50b72c2d2aa618a9a75f37
Reviewed-by: Mathias Kinzler <mathias.kinzler@sap.com>
Reviewed-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 11:51:17 -07:00
Shawn O. Pearce 21f76c2a69 Remove static progress task names from PackWriter
These need to be dynamic based on the current thread's environment
at time of execution in order to be properly localized for the end
user that will be seeing these messages.

Change-Id: I4976f462cfe606edd2761c0e36b2f6b20f63d53c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 10:50:28 -07:00
Shawn O. Pearce 1b783d0370 Allow PackWriter callers to manage the thread pool
By permitting the caller of PackWriter to select the Executor it
uses for task execution, we give the caller the ability to manage
the lifecycle of the thread pool, including reusing it across
concurrent pack generators.

This is the first step to supporting application thread pools
within Daemon or another managed service like Gerrit Code Review.

Change-Id: I96bee7b9c30ff9885f2bd261d0b6daaac713b5a4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 10:50:28 -07:00
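Note on 1b783d0370: the idea of letting the caller own the thread pool can be sketched as follows. This is a generic illustration with a hypothetical `SlicedWorker` class that sums an array in slices; it does not use or reproduce the PackWriter API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

final class SlicedWorker {
    private final Executor executor;

    /** The caller supplies the Executor and keeps control of its lifecycle. */
    SlicedWorker(Executor executor) {
        this.executor = executor;
    }

    /** Splits the work into slices and runs each slice on the supplied Executor. */
    long sum(int[] values, int slices) throws InterruptedException {
        AtomicLong total = new AtomicLong();
        CountDownLatch done = new CountDownLatch(slices);
        int per = (values.length + slices - 1) / slices;
        for (int s = 0; s < slices; s++) {
            final int from = Math.min(values.length, s * per);
            final int to = Math.min(values.length, from + per);
            executor.execute(() -> {
                try {
                    long part = 0;
                    for (int i = from; i < to; i++) {
                        part += values[i];
                    }
                    total.addAndGet(part);
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // The pool is created, shared and shut down by the application.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(new SlicedWorker(pool).sum(new int[] {1, 2, 3, 4, 5}, 2));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the worker only sees an `Executor`, the same pool can be reused across several concurrent workers, which is the lifecycle benefit the message describes.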
Christian Halstrick 74d279fbf0 Teach NameConflictTreeWalk to report DF conflicts
Add a method isDirectoryFileConflict() to NameConflictTreeWalk which
tells whether the current path is part of a directory/file conflict.

Change-Id: Iffcc7090aaec743dd6f3fd1a333cac96c587ae5d
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-28 17:26:31 +02:00
Mathias Kinzler 51c6f513b0 Stack Overflow in EGit History View
This is caused by a recursion in PlotWalk.getTags().
As a hotfix, the sort was simply removed. The sort
must be re-implemented so that parseAny() is not called
again (currently, this happens in the PlotRefComparator).

Change-Id: I060d26fda8a75ac803acaf89cfb7d3b4317328f3
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-07-28 11:46:05 +02:00
Jeff Schumacher 396fe6da45 Break dissimilar file pairs during diff
File pairs that are very dissimilar during a diff were not being
broken apart into their constituent ADD/DELETE pairs. This leads to
sub-optimal rename detection. Take, for example, this situation:

A file exists at src/a.txt containing "foo". A user renames src/a.txt
to src/b.txt, then adds a new src/a.txt containing "bar".

Even though the old a.txt and the new b.txt are identical, the
rename detection algorithm would not detect it as a rename since
it was already paired in a MODIFY. I added code to split all
MODIFYs below a certain score into their constituent ADD/DELETE
pairs. This allows situations like the one I described above to be
more correctly handled.

Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-27 18:13:32 -07:00
Christian Halstrick f56a459966 Add methods which write MERGE_HEAD and MERGE_MSG
Add methods to the Repository class which write into MERGE_HEAD
and MERGE_MSG files. Since we have the read methods in the same
class this seems to be the right place.

Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-27 11:48:23 -07:00
Jens Baumgart db82b8d7eb Fix concurrent read / write issue in LockFile on Windows
LockFile.commit fails if another thread concurrently reads
the base file. The problem is fixed by retrying the rename
operation if it fails.

Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb
Bug: 308506
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
2010-07-27 10:00:47 -07:00
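Note on db82b8d7eb: retrying a rename that fails can be sketched with java.io.File. This is a minimal sketch, not the LockFile implementation. The attempt count and the sleep between attempts are assumptions, since the message does not state them.

```java
import java.io.File;

final class RetryRename {
    /**
     * Tries the rename up to {@code attempts} times, pausing between tries.
     * Returns true as soon as one attempt succeeds.
     */
    static boolean renameWithRetry(File from, File to, int attempts, long sleepMillis) {
        for (int i = 0; i < attempts; i++) {
            if (from.renameTo(to)) {
                return true;
            }
            try {
                // Give a concurrent reader time to release the file.
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        File src = File.createTempFile("example", ".lock");
        File dst = new File(src.getParentFile(), src.getName() + ".renamed");
        System.out.println("renamed: " + renameWithRetry(src, dst, 5, 10L));
        dst.delete();
    }
}
```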
Robin Stocker a00377a7e2 Fix Javadoc warnings
There were some broken links, incorrect uses of @value, an invalid
tag and an outdated comment.

Change-Id: I22886bcc869a4b62bd606ebed40669f7b4723664
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-27 09:40:01 -07:00
Shawn O. Pearce 80fe789690 Make forPath(ObjectReader) variant in TreeWalk
This simplifies the logic for those who already have an ObjectReader
on hand and want to reuse it to look up a single path.

Change-Id: Ief17d6b2a0674ddb34bbc9f43121b756eae960fb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-27 08:36:24 -07:00
Shawn O. Pearce 7ff18f3ec9 Make StoredConfig an abstraction above FileBasedConfig
This exposes a load and save method, allowing a Repository to denote
that it has a persistent configuration of some kind which can be
accessed by the application, without needing to know exact details
of how it is stored.

Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-26 16:50:11 -07:00
Shawn O. Pearce fa9b225e06 Merge branch 'delta'
* delta: (103 commits)
  Discard the uncompressed delta as soon as its compressed
  Honor pack.windowlimit to cap memory usage during packing
  Honor pack.threads and perform delta search in parallel
  Cache small deltas during packing
  Implement delta generation during packing
  debug-show-packdelta:  Dump a pack delta to the console
  Initial pack format delta generator
  Add debugging toString() method to ObjectToPack
  Make ObjectToPack clearReuseAsIs signal available to subclasses
  Correctly classify the compressing objects phase
  Refactor ObjectToPack's delta depth setting
  Configure core.bigFileThreshold into PackWriter
  Add doNotDelta flag to ObjectToPack
  Add more configuration options to PackWriter
  Save object path hash codes during packing
  Add path hash code to ObjectWalk
  Add getObjectSize to ObjectReader
  Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB
  Define a constant for 127 in DeltaEncoder
  Cap delta copy instructions at 64k
  ...

Conflicts:
	org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Diff.java
	org.eclipse.jgit/resources/org/eclipse/jgit/JGitText.properties
	org.eclipse.jgit/src/org/eclipse/jgit/JGitText.java
	org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RewriteTreeFilter.java

Change-Id: I7c7a05e443a48d32c836173a409ee7d340c70796
2010-07-22 14:56:34 -07:00
Stefan Lay ab062caa22 Allow client of Add command to set a WorkingTreeIterator
This is e.g. useful when a client of the AddCommand has
additional rules to ignore files. In Eclipse a resource can
be set to derived or be excluded by preferences.

Change-Id: I6c47e54a1ce26315faf5ed0723298ad2c2db197c
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-22 14:57:00 +02:00
Stefan Lay 88957f6c5a Allow for filepattern "." in AddCommand
Enable adding on repository root level.

Change-Id: I415b10dc74cc9435578424d9f106c972fd703055
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-22 14:27:35 +02:00
Stefan Lay aa86cfc339 Do not add ignored files in Add command
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-22 11:26:04 +02:00
Shawn O. Pearce 09910ffa32 Move ignore node handling into WorkingTreeIterator
The working tree iterator has perfect knowledge of the path structure
as well as immediate information about whether or not an ignore file
even exists at this level.  We can exploit that to simplify the
logic and running time for testing ignored file status by pushing
all of the checks down into the iterator itself.

Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Charley Wang <chwang@redhat.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
CC: Stefan Lay <stefan.lay@sap.com>
CC: Matthias Sohn <matthias.sohn@sap.com>
2010-07-21 10:34:08 -07:00
Shawn Pearce 0ec0e21fdf Merge "Fix concurrent read / write issue in GitIndex on Windows" 2010-07-21 13:08:01 -04:00
Jens Baumgart e99c48a61a Fix concurrent read / write issue in GitIndex on Windows
GitIndex.write fails if another thread concurrently reads
the index file. The problem is fixed by retrying the rename
operation if it fails.

Bug: 311051
Change-Id: Ib243d2a90adae312712d02521de4834d06804944
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
2010-07-21 09:35:15 +02:00
Christian Halstrick 5c94321b47 Check for racy git in WorkingTreeIterator
The WorkingTreeIterator has a method to check whether
the current file differs from the corresponding index
entry. This commit improves this check to also handle
racy git situations.

See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD

Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-07-20 21:55:18 +02:00
Christian Halstrick c98d97731b Smudge racily clean index entries by truncating length (like git.git)
To mark an entry racily clean we set its length to 0 (like native git
does). Entries which are not racily clean and have zero length can be
distinguished from racily clean entries by checking P_OBJECTID
against the SHA1 of empty content. When length is 0 and P_OBJECTID is
different from SHA1 of empty content we know the entry is marked
racily clean.

See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html

Change-Id: I689552931441ab51964b430b303160c9126b66af
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-07-20 21:54:36 +02:00
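Note on c98d97731b: the test for a smudged entry described above (length 0, object id different from the id of empty content) can be sketched without any JGit types. The class and method names are hypothetical; the object id is passed as a raw 20-byte array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class SmudgeCheck {
    /** SHA-1 of an empty blob: the hash of the header "blob 0" plus a NUL byte. */
    static byte[] emptyBlobId() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest("blob 0\0".getBytes(StandardCharsets.US_ASCII));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True when the recorded length is 0 but the id is not that of empty content. */
    static boolean isSmudged(int length, byte[] objectId) {
        return length == 0 && !Arrays.equals(objectId, emptyBlobId());
    }

    public static void main(String[] args) {
        byte[] someOtherId = new byte[20]; // all zeros, not the empty blob id
        System.out.println(isSmudged(0, someOtherId));   // true
        System.out.println(isSmudged(0, emptyBlobId())); // false
    }
}
```

A genuinely empty file keeps length 0 and the empty-blob id, so it is never mistaken for a smudged entry.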
Shawn O. Pearce 938943d674 Use proper constants for .gitignore and .git directory
We have a constant for .gitignore, so use it.  While we are in
the same method, correct the reference of ".git" to be the actual
GIT_DIR given.  This might not be within the work tree if the
GIT_DIR and GIT_WORK_TREE environment variables were used.

Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Charley Wang <chwang@redhat.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
CC: Stefan Lay <stefan.lay@sap.com>
CC: Matthias Sohn <matthias.sohn@sap.com>
2010-07-20 09:11:39 -07:00
Shawn O. Pearce c59db09bc5 Remove gitIgnoreTimestamp from abstract iterator API
This never should have been exposed on the top of the
AbstractTreeIterator type hierarchy.  There is no concept of a
timestamp in a canonical tree read from the object database, and
the time in the DirCache isn't what we want here either.

Actually all that we need is to find the files whose names are
".gitignore" and are below the root directory.  We can accomplish
that with a suffix filter, and process them immediately.

Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Charley Wang <chwang@redhat.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
CC: Stefan Lay <stefan.lay@sap.com>
CC: Matthias Sohn <matthias.sohn@sap.com>
2010-07-20 09:09:01 -07:00
Shawn O. Pearce 395d236058 Fix NPE in RenameDetector
If we have two adds of the same object but no deletes the detector
threw an NPE because the entry that came back from the deleted map
was null (no matching objects).  In this case we need to put the
adds all back onto the list of left over additions since they did
not match a delete.

Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Jeffrey Schumacher <jeffschu@google.com>
2010-07-20 07:52:35 -07:00
Shawn O. Pearce b518189b5c IndexPack: Fix spurious pack file corruption errors
We didn't correctly handle the zlib trailer for an object.  If the
trailer bytes were outside of the current buffer window but we had
fully inflated the object itself, we broke out of the loop (as we had
our target size) but inflate wasn't finished (as it did not yet get
the trailer) so we failed the test and threw a corruption exception.

Use an infinite loop and only break out when the inflater is done.

Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-20 07:40:48 -07:00
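Note on b518189b5c: a loop that stops only when the inflater reports it is finished can be sketched with java.util.zip.Inflater. This is an illustration under stated assumptions, not the IndexPack code. The input is fed in small chunks (the hypothetical `chunkSize` parameter) to imitate a trailer that falls outside the current buffer window.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class InflateLoop {
    /** Inflates src, handing it to the inflater chunkSize bytes at a time. */
    static byte[] inflate(byte[] src, int chunkSize) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[64];
            int pos = 0;
            for (;;) {
                int n = inf.inflate(buf);
                out.write(buf, 0, n);
                if (inf.finished()) {
                    // Only now has the zlib trailer been consumed as well.
                    break;
                }
                if (n == 0) {
                    if (inf.needsDictionary()) {
                        throw new DataFormatException("external dictionary required");
                    } else if (inf.needsInput()) {
                        if (pos == src.length) {
                            throw new DataFormatException("truncated stream");
                        }
                        int len = Math.min(chunkSize, src.length - pos);
                        inf.setInput(src, pos, len);
                        pos += len;
                    } else {
                        // No output, no input wanted, not finished: abort
                        // rather than spin forever.
                        throw new DataFormatException("inflater made no progress");
                    }
                }
            }
            return out.toByteArray();
        } finally {
            inf.end();
        }
    }
}
```

Checking `finished()` rather than the expected output size is what lets the loop pick up trailer bytes that arrive in a later chunk.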
Shawn O. Pearce 12fe0f2d1e Discard the uncompressed delta as soon as its compressed
The DeltaCache will most likely need to copy the compressed delta
into a new buffer in order to compact away the wasted space at the
end caused by over allocation.  Since we don't need the uncompressed
format anymore, null out our only reference to it so the GC can
reclaim this memory if it needs to perform a collection in order
to satisfy the cache's allocation attempt.

Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-16 10:41:09 -07:00
Shawn O. Pearce 6e155d5f41 Merge branch 'js/rename'
* js/rename:
  Implemented file path based tie breaking to exact rename detection
  Added more test cases for RenameDetector
  Added very small optimization to exact rename detection
  Fixed Misleading Javadoc
  Added file path similarity to scoring metric in rename detection
  Fixed potential div by zero bug
  Added file size based rename detection optimization
  Create FileHeader from DiffEntry
  log: Implement --follow
  Cache the diff configuration section
  log: Add whitespace ignore options
  Format submodule links during differences
  Redo DiffFormatter API to be easier to use
  log, diff: Add rename detection support
  Implement similarity based rename detection
  Added a preliminary version of rename detection
  Refactored code out of FileHeader to facilitate rename detection
2010-07-16 10:22:15 -07:00
Shawn O. Pearce 0b46e70155 Fix infinite loop in IndexPack
A programming error using the Inflater API led to an infinite
loop within IndexPack, caused by the Inflater returning 0 from
the inflate() method, but it didn't want more input.  This happens
when it has reached the end of the stream, or has reached a spot
asking for an external dictionary.  Such a case is a failure for us,
and we should abort out.

Thanks to Alex for pointing out that we had 3 implementations of
the inflate routine, which should be consolidated into one and
use a switch to determine where to load data from.

Bug: 317416
Change-Id: I34120482375b687ea36ed9154002d77047e94b1f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-16 10:12:04 -07:00
Jeff Schumacher 31311cacfd Implemented file path based tie breaking to exact rename detection
During the exact rename detection phase in RenameDetector, ties were
resolved on a first-found basis. I added support for file path based
tie breaking during that phase. Basically, there are four situations
that have to be handled:

One add matching one delete:
	In this simple case, we pair them as a rename.

One add matching many deletes:
	Find the delete whose path matches the add the closest, and
	pair them as a rename.

Many adds matching one delete:
	Similar to the above case, we find the add that matches the
	delete the closest, and pair them as a rename. The other adds
	are marked as copies of the delete.

Many adds matching many deletes:
	Build a scoring matrix similar to the one used for content-
	based matching, scoring instead by file path. Some of the
	utility functions in SimilarityRenameDetector are used in
	this case, as we use the same encoding scheme. Once the
	matrix is built, scan it for the best matches, marking them
	as renames. The rest are marked as copies.

I don't particularly like the idea of using utility functions right
out of SimilarityRenameDetector, but it works for the moment. A later
commit will likely refactor this into a common utility class, as well
as bringing exact rename detection out of RenameDetector and into a
separate class, much like SimilarityRenameDetector.

Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611
2010-07-16 09:56:42 -07:00
Christian Halstrick b840ed0121 Added dirty-detection to WorkingTreeIterator
Added possibility to compare the current entry of a WorkingTreeIterator
to a given DirCacheEntry. This is done to detect whether an entry
in the index is dirty or not. 'Dirty' means that the file in the working tree
is different from what's in the index. Merge algorithms will make use of
this to detect conflicts.

Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-07-16 10:08:52 +02:00
Shawn Pearce 19473b1dbc Merge "Handle the tilde notation (~user) of git url" 2010-07-15 17:29:21 -04:00
Robin Rosenberg 845714158a Handle the tilde notation (~user) of git url
When the path is prefixed with ~, the URI parser treated it
as /~. Strip the / if the next character is the tilde.

Bug: 307017
Change-Id: I58203e5617956b46d83e8987d1f8042beddffac3
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-07-15 01:16:09 +02:00
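Note on 845714158a: the rule "strip the / if the next character is the tilde" is small enough to show directly. This sketch uses a hypothetical helper on plain strings and does not reproduce the URI parser itself.

```java
final class TildePath {
    /** Drops the leading slash when the path continues with a tilde. */
    static String normalize(String path) {
        if (path.startsWith("/~")) {
            return path.substring(1);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(normalize("/~user/repo.git")); // ~user/repo.git
        System.out.println(normalize("/srv/git/repo.git")); // /srv/git/repo.git
    }
}
```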
Stefan Lay 233e0130b5 Git Porcelain API: Add Command
The new Add command adds files to the Git Index.
It uses the DirCache to access the git index. It
also works in case of an existing conflict.

Fileglobs (e.g. *.c) are not yet supported.

The new Add command does add ignored files because
there is no gitignore support in jgit yet.

Bug: 318440
Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-14 11:24:58 +00:00
Shawn Pearce 0ef99921fa Merge changes I104cd62f,I1d0238b4
* changes:
  Internationalize RepositoryState descriptions
  Say that commit is allowed during bisect
2010-07-13 20:36:25 -04:00
Charley Wang b878cdcf6b Add compatibility with gitignore specifications
This patch adds ignore compatibility to jgit. It encompasses
exclude files as well as .gitignore. Uses TreeWalk and
FileTreeIterator to find nodes and parses .gitignore
files when required. The patch includes a simple cache that
can be used to save results and avoid excessive gitignore
parsing.

CQ: 4302
Bug: 303925
Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-07-13 00:34:15 +02:00
Jeff Schumacher bc08fafb41 Added very small optimization to exact rename detection
Optimized a small loop in findExactRenames. The loop would go through
all the items in a list of DiffEntries even after it already found
what it was looking for. I made it break out of the loop as soon as
a good match was found.

Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61
2010-07-12 12:54:01 -07:00
Jeff Schumacher a20e6f6fec Fixed Misleading Javadoc
The javadoc for the setRenameLimit method in RenameDetector said
that you could only have limits in the range (0,100), implying
that 0 and 100 were illegal inputs. The code, however, allowed 0 and
100. I changed the javadoc to say that the range [0,100] was legal.
I also documented the IllegalArgumentException that is thrown if the
limit is outside that range.

Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2
2010-07-12 12:54:01 -07:00
Jeff Schumacher 9a48de86d8 Added file path similarity to scoring metric in rename detection
The scoring method was not taking into account the similarity of
the file paths and file names. I changed the metric so that it is 99%
based on content (which used to be 100% of the old metric), and 1%
based on path similarity. Of that 1%, half (.5% of the total final
score) is based on the actual file names (e.g. "foo.java"), and half
on the directory (e.g. "src/com/foo/bar/").

Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def
2010-07-12 12:52:05 -07:00
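Note on 9a48de86d8: the weighting described above (99% content, 0.5% file name, 0.5% directory) amounts to a simple weighted sum. The sketch below assumes all three component scores are already on a 0 to 100 scale; how the name and directory scores are computed is not specified in the message and is left out. The names are hypothetical.

```java
final class WeightedScore {
    /** Combines three 0..100 scores using the 99 / 0.5 / 0.5 split. */
    static double combine(double contentScore, double nameScore, double dirScore) {
        return 0.99 * contentScore + 0.005 * nameScore + 0.005 * dirScore;
    }

    public static void main(String[] args) {
        // Same content score, different path similarity: the path acts as a tie breaker.
        System.out.println(combine(80, 100, 100));
        System.out.println(combine(80, 0, 0));
    }
}
```

Because the path terms contribute at most one point in total, they only separate candidates whose content scores are nearly equal.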
Jeff Schumacher 4c14b7869d Fixed potential div by zero bug
The scoring logic in SimilarityIndex was dividing by the max file
size. If both files are empty, this would cause a div by zero
error. This case cannot currently happen, since two empty files
would have the same SHA1, and would therefore be caught in the
earlier SHA1 based detection pass. Still, if this logic eventually
gets separated from that pass, a div by zero error would occur.

I changed the logic to instead consider two empty files to have a
similarity score of 100.

Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a
2010-07-12 12:24:42 -07:00
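Note on 4c14b7869d: the guard for two empty files can be sketched as below. The method signature is hypothetical and the score is a plain percentage of the larger file's size, which is a simplification of whatever SimilarityIndex actually computes.

```java
final class SafeScore {
    /** Percentage of the larger file covered by the common bytes. */
    static int score(long commonBytes, long srcSize, long dstSize) {
        long max = Math.max(srcSize, dstSize);
        if (max == 0) {
            // Two empty files: treat as identical instead of dividing by zero.
            return 100;
        }
        return (int) (commonBytes * 100 / max);
    }

    public static void main(String[] args) {
        System.out.println(score(0, 0, 0));      // 100
        System.out.println(score(50, 100, 200)); // 25
    }
}
```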
Jeff Schumacher 64b9458640 Added file size based rename detection optimization
Prior to this change, files that were very different in size (enough
so that they could not have enough in common to be detected as
renames) were still having their scores calculated. I added an
optimization to skip such files. For example, if the rename detection
threshold is 60%, the larger file is 200kb, and the smaller file is
50kb, the pair cannot be counted as a rename since they cannot
possibly share 60% of their content in common. (200*.6=120, 120>50)

Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc
2010-07-12 12:24:42 -07:00
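Note on 64b9458640: the size check from the example above (200kb against 50kb at a 60% threshold) can be written as a single comparison. The helper name is hypothetical; the comparison is cross-multiplied to avoid rounding from integer division.

```java
final class SizePrefilter {
    /**
     * Returns false when the smaller file is too small to ever cover
     * thresholdPercent of the larger one, so scoring can be skipped.
     */
    static boolean canReachThreshold(long sizeA, long sizeB, int thresholdPercent) {
        long max = Math.max(sizeA, sizeB);
        long min = Math.min(sizeA, sizeB);
        // Equivalent to min >= max * thresholdPercent / 100.
        return min * 100 >= max * thresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(canReachThreshold(200, 50, 60));  // false: 200 * 0.6 = 120 > 50
        System.out.println(canReachThreshold(200, 120, 60)); // true
    }
}
```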
Robin Rosenberg d787a82e50 Internationalize RepositoryState descriptions
Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-07-10 10:24:37 +02:00
Shawn O. Pearce 9734194917 Honor pack.windowlimit to cap memory usage during packing
The pack.windowlimit configuration parameter places an upper bound
on the number of bytes used by the DeltaWindow class as it scans
through the object list.  If memory usage would exceed the limit
the window is temporarily decreased in size to keep memory used
within that bound.

Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:19:07 -07:00
Shawn O. Pearce 74e0835012 Honor pack.threads and perform delta search in parallel
If we have multiple CPUs available, packing usually goes faster
when each CPU is assigned a slice of the available search space.
The number of threads to use is guessed from the runtime if it
wasn't set by the caller, or wasn't set in the configuration.

Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:17:30 -07:00
Shawn O. Pearce a960d1429e Cache small deltas during packing
PackWriter now caches small deltas, or deltas that are very tiny
compared to their source inputs, so that the writing phase goes
faster by reusing those cached deltas.

The cached data is stored compressed, which usually translates to
a bigger footprint due to deltas being very hard to compress, but
saves time during writing by avoiding the deflate step.  They are
held under SoftReferences so that the JVM GC can clear out deltas
if memory gets very tight.  We would rather continue working and
spend a bit more CPU time during writing than crash due to OOME.

To avoid OutOfMemoryErrors during the caching phase we also trap
OOME and just abort out of the caching.

Because deflateBound() always produces something larger than what
we need to actually store the deflated data, we copy it over into
a new buffer if the actual length doesn't match the buffer length.
When packing jgit.git this saves over 111 KiB in the cache, and is
thus a worthwhile hit on CPU time.

To further save memory we store the inflated size of the delta
(which we need for the object header) in the same field as the
pathHash, as the pathHash is no longer necessary by this phase
of the packing algorithm.

Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:15:54 -07:00
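Note on a960d1429e: the combination of "compress into an over-allocated buffer, copy into an exact-size buffer, hold under a SoftReference, give up on OOME" can be sketched with java.util.zip.Deflater. This is not the DeltaCache code. The class name is hypothetical, and the initial buffer size is an assumed stand-in for a deflateBound()-style upper bound.

```java
import java.lang.ref.SoftReference;
import java.util.Arrays;
import java.util.zip.Deflater;

final class CompactDeltaCache {
    /** Deflates raw and returns an array sized exactly to the compressed data. */
    static byte[] deflateCompact(byte[] raw) {
        Deflater def = new Deflater();
        try {
            def.setInput(raw);
            def.finish();
            // Assumed over-allocation; grown below if it turns out too small.
            byte[] buf = new byte[raw.length + (raw.length >> 3) + 64];
            int len = 0;
            while (!def.finished()) {
                if (len == buf.length) {
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
                len += def.deflate(buf, len, buf.length - len);
            }
            // Compact away the unused tail so the cache holds no wasted space.
            return len == buf.length ? buf : Arrays.copyOf(buf, len);
        } finally {
            def.end();
        }
    }

    /** Returns a softly referenced compressed copy, or null if memory ran out. */
    static SoftReference<byte[]> cache(byte[] raw) {
        try {
            return new SoftReference<>(deflateCompact(raw));
        } catch (OutOfMemoryError oom) {
            // Skip caching; the data can be compressed again at write time.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] raw = new byte[4096];
        System.out.println("compressed length: " + deflateCompact(raw).length);
    }
}
```

A reader of the cache must handle both a null reference and a cleared one (`get()` returning null) by recomputing the data.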
Shawn O. Pearce dfad23bf3d Implement delta generation during packing
PackWriter now produces new deltas if there is not a suitable delta
available for reuse from an existing pack file.  This permits JGit to
send less data on the wire by sending a delta relative to an object
the other side already has, instead of sending the whole object.

The delta searching algorithm is similar in style to what C Git
uses, but apparently has some differences (see below for more on this).
Briefly, objects that should be considered for delta compression are
pushed onto a list.  This list is then sorted by a rough similarity
score, which is derived from the path name the object was discovered
at in the repository during object counting.  The list is then
walked in order.

At each position in the list, up to $WINDOW objects prior to it
are attempted as delta bases.  Each object in the window is tried,
and the shortest delta instruction sequence selects the base object.
Some rough rules are used to prevent pathological behavior during
this matching phase, like skipping pairings of objects that are
not similar enough in size.

PackWriter intentionally excludes commits and annotated tags from
this new delta search phase.  In the JGit repository only 28 out
of 2600+ commits can be delta compressed by C Git.  As the commit
count tends to be a fair percentage of the total number of objects
in the repository, and they generally do not delta compress well,
skipping over them can improve performance with little increase in
the output pack size.

Because this implementation was rebuilt from scratch based on my own
memory of how the packing algorithm has evolved over the years in
C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly
the same rules everywhere, and that leads JGit to produce different
(but logically equivalent) pack files.

  Repository | Pack Size (bytes)                | Packing Time
             | JGit     - CGit     = Difference | JGit / CGit
  -----------+----------------------------------+-----------------
   git       | 25094348 - 24322890 = +771458    | 59.434s / 59.133s
   jgit      |  5669515 -  5709046 = - 39531    |  6.654s /  6.806s
   linux-2.6 |     389M -     386M = +3M        |  20m02s / 18m01s

For the above tests pack.threads was set to 1, window size=10,
delta depth=50, and delta and object reuse was disabled for both
implementations.  Both implementations were reading from an already
fully packed repository on local disk.  The running time reported
is after 1 warm-up run of the tested implementation.

PackWriter is writing 771 KiB more data on git.git, 3M more on
linux-2.6, but is actually 39.5 KiB smaller on jgit.git.  Being
larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an
extra 2 minutes to pack.  On the running time side, JGit is at a
major disadvantage because linux-2.6 doesn't fit into the default
WindowCache of 20M, while C Git is able to mmap the entire pack and
have it available instantly in physical memory (assuming hot cache).

CGit also has a feature where it caches deltas that were created
during the compression phase, and uses those cached deltas during
the writing phase.  PackWriter does not implement this (yet),
and therefore must create every delta twice.  This could easily
account for the increased running time we are seeing.

Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:14:18 -07:00
Shawn O. Pearce 074055d747 debug-show-packdelta: Dump a pack delta to the console
This is a horribly crude application; it doesn't even verify that
the object it's dumping is delta encoded.  Its method of getting the
delta is pretty abusive to the public PackWriter API, because right
now we don't want to expose the real internal low-level methods
actually required to do this.

Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:12:32 -07:00
Shawn O. Pearce 8612c0ace1 Initial pack format delta generator
DeltaIndex is a simple pack style delta generator.  The function works
by creating a compact index of a source buffer's blocks, and then
walking a sliding window along a desired result buffer, searching for
the window in the index.  When a match is found, the window is
stretched to the longest possible length that is common with the
source buffer, and a copy instruction is created.

Rabin's polynomial hash function is used to compute the hash for a
block, permitting efficient sliding of the window in single byte
increments.  The update function to slide one byte originated from
David Mazieres' work in LBFS, and our implementation of the update
step was certainly inspired by the initial work Geert Bosch proposed
for C Git in http://marc.info/?l=git&m=114565424620771&w=2.
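
The single byte slide can be sketched as below.  This uses a plain
multiplicative rolling hash as a stand-in, not Rabin's polynomial
arithmetic, and the block size, multiplier, and names are assumptions
made for illustration:

```java
final class RollingHash {
    static final int BLOCK = 16; // assumed block size in bytes
    static final int BASE = 31;  // assumed multiplier
    static final int TOP = pow(BASE, BLOCK - 1); // weight of the oldest byte

    private static int pow(int base, int exp) {
        int r = 1;
        for (int i = 0; i < exp; i++)
            r *= base; // wraps modulo 2^32, which is what we want here
        return r;
    }

    /** Hashes the BLOCK bytes starting at ptr from scratch. */
    static int hashBlock(byte[] buf, int ptr) {
        int h = 0;
        for (int i = 0; i < BLOCK; i++)
            h = h * BASE + (buf[ptr + i] & 0xff);
        return h;
    }

    /** Slides the window one byte: drops buf[ptr], appends buf[ptr + BLOCK]. */
    static int step(int h, byte[] buf, int ptr) {
        h -= (buf[ptr] & 0xff) * TOP;
        return h * BASE + (buf[ptr + BLOCK] & 0xff);
    }
}
```

Each slide costs a constant number of operations, instead of the
BLOCK operations needed to rehash the window from scratch.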

To ensure the encoder runs in linear time with respect to the size of
the two input buffers (source and result), the maximum number of
blocks that can share the same position in the index's hashtable is
capped at a constant number.  This prevents bad inputs from causing
the encoder to run in quadratic time, but comes with a penalty of
creating a longer delta due to fewer considered copy positions.

Strange hackery is used to cap the amount of memory used by the index
to be no more than 12 bytes for every 16 bytes of source buffer, no
matter what the JVM per-object overhead is.  This permits an index to
always be no larger than 1.75x the source buffer length, which is an
important feature to support large windows of candidates to match
against while packing.  Here the strange hackery is nothing more than
a manually managed chained hashtable, where pointers are array indexes
into storage arrays rather than object references.
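
A rough sketch of such a manually managed chained hashtable follows.
The class and field names are hypothetical, and the sizing here does
not reproduce the 12 bytes per 16 bytes budget described above:

```java
import java.util.Arrays;

final class IntChainTable {
    private final int[] buckets; // bucket -> index of first entry, or -1
    private final int[] next;    // entry -> index of next entry in its chain, or -1
    private final int[] keys;
    private final int[] vals;
    private int size;

    /** capacity must be positive; it bounds both buckets and entries. */
    IntChainTable(int capacity) {
        buckets = new int[capacity];
        next = new int[capacity];
        keys = new int[capacity];
        vals = new int[capacity];
        Arrays.fill(buckets, -1);
    }

    void add(int key, int val) {
        if (size == keys.length)
            throw new IllegalStateException("table full");
        int b = Math.floorMod(key, buckets.length);
        keys[size] = key;
        vals[size] = val;
        next[size] = buckets[b]; // link in front of the existing chain
        buckets[b] = size++;
    }

    /** Returns the most recently added value for key, or -1 if absent. */
    int find(int key) {
        for (int e = buckets[Math.floorMod(key, buckets.length)]; e >= 0; e = next[e])
            if (keys[e] == key)
                return vals[e];
        return -1;
    }
}
```

Because every "pointer" is an int index, the memory cost per entry is
fixed by the array element sizes and not by JVM object headers.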

Computation of the hash function for a single fixed sized block is
done through an unrolled loop, where the first 4 iterations have been
manually reduced down to eliminate unnecessary instructions.  The
pattern is derived from ObjectId.equals(byte[], int, byte[], int),
where we have unrolled the loop required to compare two 20 byte
arrays.  Hours of testing with the Sun 1.6 JRE concluded that the
non-obvious "foo[idx + 1]" style of reference is faster than
"foo[idx++]", and so that is what we use here during hashing.

Change-Id: If9fb2a1524361bc701405920560d8ae752221768
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:10:55 -07:00
Shawn O. Pearce b38426ae8c Add debugging toString() method to ObjectToPack
It's useful to know what the flags are or what the base that was
selected is.  Dump these out as part of the object's toString.

Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:09:19 -07:00
Shawn O. Pearce 699e4aa7c5 Make ObjectToPack clearReuseAsIs signal available to subclasses
A subclass may want to use this method to release handles that are
caching reuse information.  Make it protected so they can override
it and update themselves.

Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:07:45 -07:00
Shawn O. Pearce 4569d77e13 Correctly classify the compressing objects phase
Searching for reuse candidates should be fast compared to actually
doing delta compression.  So pull the progress monitor out of this
phase and rename it back to identify the compressing objects state.

Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:06:10 -07:00
Shawn O. Pearce 85b7a53d52 Refactor ObjectToPack's delta depth setting
Long ago when PackWriter was first written we thought that the delta
depth could be updated automatically.  But it's never used.  Instead
make this a simple standard setter so the caller can more directly
set the delta depth of this object.  This permits us to configure a
depth that takes into account more than just the depth of another
object in this same pack.

Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:04:35 -07:00
Shawn O. Pearce 6730f9e3c8 Configure core.bigFileThreshold into PackWriter
C Git's fast-import uses this to determine the maximum file size
that it tries to delta compress; anything equal to or above this
setting is stored as a whole object with simple deflate.

Define the configuration so we can use it later.

Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:02:54 -07:00
Shawn O. Pearce 823e9a9721 Add doNotDelta flag to ObjectToPack
This flag will later control whether or not PackWriter searches for a
delta base for this object.  Edge objects will never get searched,
as the writer won't be outputting them, so they should always have
this flag set on.  Sometime in the future this flag should also be
set for file blobs on file paths that have the "-delta" gitattribute
set in the repository's attributes file.

Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:00:49 -07:00
Shawn O. Pearce 616bc74cf7 Add more configuration options to PackWriter
We now at least import other pack settings like pack.window, which
means we can later use these to control how we search for deltas.

The compression level was fixed to use pack.compression rather than
the loose object core.compression setting.

Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 19:00:46 -07:00
Robin Rosenberg a1492f1922 Say that commit is allowed during bisect
C Git allows this and it is quite handy.

Change-Id: I1d0238b43fca931ad2079649fb7b431e2815c351
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-07-10 02:32:46 +02:00
Shawn O. Pearce 2f93a09dd1 Save object path hash codes during packing
We need to remember these so we can later cluster objects that
have similar file paths near each other as we search for deltas
between them.

Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 15:17:26 -07:00
Shawn O. Pearce c20daa7314 Add path hash code to ObjectWalk
PackWriter wants to categorize objects that are similar in path name,
so blobs that are probably from the same file (or same sort of file)
can be delta compressed against each other.  Avoid converting into
a string by performing the hashing directly against the path buffer
in the tree iterator.

We only hash the last 16 bytes of the path, and we try to avoid any
spaces, as we want the suffix of a file such as ".java" to be more
important than the directory it is in, like "src".
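
A sketch of a hash with these properties is shown below.  Only the
16 byte window and the skipping of spaces come from the description
above; the mixing step and all names are assumptions:

```java
import java.nio.charset.StandardCharsets;

final class PathHash {
    /** Hashes at most the last 16 bytes of path[0..pathLen), ignoring spaces. */
    static int hash(byte[] path, int pathLen) {
        int hash = 0;
        for (int i = Math.max(0, pathLen - 16); i < pathLen; i++) {
            byte c = path[i];
            if (c != ' ')
                // Later bytes land in the high bits, so the suffix dominates
                // (assumed mixing step).
                hash = (hash >>> 2) + (c << 24);
        }
        return hash;
    }

    static int hash(String path) {
        byte[] raw = path.getBytes(StandardCharsets.UTF_8);
        return hash(raw, raw.length);
    }
}
```

With this shape, two files named LongName.java in different
top-level directories hash identically, since only the trailing bytes
are considered.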

Change-Id: I31770ee711526306769a6f534afb19f937e0ba85
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 10:37:47 -07:00
Shawn O. Pearce b584cb8754 Add getObjectSize to ObjectReader
This is an informational function used by PackWriter to help it
better organize objects for delta compression.  Storage systems
can implement it to provide more detailed size information,
or they can simply rely on the default behavior that uses the
ObjectLoader obtained from open.

For local file storage, we can obtain this information faster
through specialized routines that parse a pack object header.

Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 10:37:47 -07:00
Shawn O. Pearce 97311cd3e0 Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB
If the heap limit was set to something smaller than 8 KiB, we were
still allocating the full 8 KiB block size, and accepting up to
the amount we allocated.  Instead actually put a hard cap on
the limit.

Change-Id: Id1da26fde2102e76510b1da4ede8493928a981cc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-09 10:37:47 -07:00
Matthias Sohn b8f2bb7d2a Add support for updateNeeded flag in DirCacheEntry
Change-Id: If06ff41d9ccd422afbc79ecbc3cfdf8bb2508dcd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-07-09 14:12:06 +02:00
Jeff Schumacher a8b29afd82 Create FileHeader from DiffEntry
Added support for converting DiffEntrys to FileHeaders. FileHeaders
are DiffEntrys with a buffer containing the diff output as well as
a list of HunkHeaders. The HunkHeaders contain EditLists. The
createFileHeader(DiffEntry) method in DiffFormatter performs a Myers
Diff on the files referred to by the DiffEntry, then puts the returned
EditList into a single HunkHeader, which is then put into the
FileHeader to be returned. It also generates the appropriate diff
header and puts it into the FileHeader's buffer. The rest of the diff
output, which would normally be parsed to generate the HunkHeaders,
is not generated. In fact, the purpose of this method is to avoid
the costly diff output generation and parsing normally required to
create a FileHeader.

Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887
2010-07-08 16:58:55 -07:00
Stefan Lay 354b90131a Fix javadoc typos in JGit API
There were some small errors which made it
difficult to read the JavaDoc.

Change-Id: Ib3b34353465162adebaca3514d596d0edf5aea51
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-08 10:42:29 +02:00
Shawn O. Pearce 711bd3e3d0 Define a constant for 127 in DeltaEncoder
The special value 127 here means how many bytes we can put into
a single insert command.  Rather than use the magical value 127,
let's name it to better document the code.
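
For illustration, a sketch of how a named limit reads in an insert
encoder.  The constant and class names are hypothetical; the layout
(one count byte with the high bit clear, then the literal bytes)
follows the pack delta format:

```java
import java.io.ByteArrayOutputStream;

final class InsertSketch {
    /** Largest byte count a single insert command can carry. */
    static final int MAX_INSERT_DATA_SIZE = 127;

    /** Encodes text as one or more insert commands. */
    static byte[] encodeInsert(byte[] text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int off = 0;
        while (off < text.length) {
            int n = Math.min(MAX_INSERT_DATA_SIZE, text.length - off);
            out.write(n);            // command byte: high bit clear, value is the count
            out.write(text, off, n); // the literal bytes themselves
            off += n;
        }
        return out.toByteArray();
    }
}
```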

Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-07 09:52:09 -07:00
Shawn O. Pearce cd7dd8591e Cap delta copy instructions at 64k
Although all modern delta decoders can process copy instructions
with a count as large as 0xffffff (just under 16 MiB), pack version 2 streams
are only supposed to use delta copy instructions up to 64 KiB.

Rewrite our copy instruction encode loop to use the lower 64 KiB
limit, even though modern decoders would support longer copies.

To improve encoding performance we now try to encode up to four full
copy commands in our buffer before we flush it to the stream, but
we don't try to implement full buffering here.  We are just trying
to amortize the virtual method call to the destination stream when
we have to do a large copy.
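
A sketch of splitting a long copy at the 64 KiB limit is below.  The
names are hypothetical and the buffering of up to four commands is
left out.  Per the pack delta format, zero offset or size bytes are
omitted, and a size of exactly 0x10000 is written by omitting both
size bytes:

```java
import java.io.ByteArrayOutputStream;

final class CopySketch {
    static final int MAX_COPY = 0x10000; // 64 KiB per copy instruction

    /** Encodes a copy of cnt bytes from offset as one or more instructions. */
    static byte[] encodeCopy(long offset, int cnt) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (cnt > 0) {
            int n = Math.min(cnt, MAX_COPY);
            writeOne(out, offset, n);
            offset += n;
            cnt -= n;
        }
        return out.toByteArray();
    }

    private static void writeOne(ByteArrayOutputStream out, long offset, int cnt) {
        byte[] buf = new byte[7]; // command + up to 4 offset + up to 2 size bytes
        int cmd = 0x80;           // high bit marks a copy instruction
        int p = 1;
        for (int i = 0; i < 4; i++) {
            int b = (int) (offset >>> (8 * i)) & 0xff;
            if (b != 0) {
                cmd |= 0x01 << i; // flag: this offset byte is present
                buf[p++] = (byte) b;
            }
        }
        if (cnt != MAX_COPY) {    // 0x10000 is implied when no size bytes follow
            for (int i = 0; i < 2; i++) {
                int b = (cnt >>> (8 * i)) & 0xff;
                if (b != 0) {
                    cmd |= 0x10 << i; // flag: this size byte is present
                    buf[p++] = (byte) b;
                }
            }
        }
        buf[0] = (byte) cmd;
        out.write(buf, 0, p);
    }
}
```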

Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-07 09:52:09 -07:00
Shawn O. Pearce 384a19eee0 Deprecate all of the older Tree related code
We want to get rid of these APIs, because they don't perform as well
as DirCache/TreeWalk, or don't offer nearly as many features.

Bug: 319145
Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-07 09:15:02 -07:00
Shawn O. Pearce a215914a56 Fix DeltaEncoder header for objects 128 bytes long
The encode loop had the wrong condition; objects that are 128 bytes
in size need to have their length encoded as two bytes, not one.
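
The boundary can be seen in a small sketch of the variable length
size encoding (7 bits per byte, least significant group first, high
bit set on every byte except the last).  The names are hypothetical
and this is not the DeltaEncoder code itself:

```java
import java.util.Arrays;

final class DeltaHeaderSketch {
    /** Encodes a non-negative size as a little-endian base-128 varint. */
    static byte[] encodeSize(long size) {
        byte[] buf = new byte[10];
        int n = 0;
        // 0x80 itself no longer fits in 7 bits, so it must take this branch.
        while (size >= 0x80) {
            buf[n++] = (byte) (0x80 | (size & 0x7f));
            size >>>= 7;
        }
        buf[n++] = (byte) size;
        return Arrays.copyOf(buf, n);
    }
}
```

A size of 127 fits in one byte (0x7f), while 128 needs the two bytes
0x80 0x01.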

Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-07 08:53:03 -07:00
Shawn O. Pearce f29741d1d8 amend commit: Support large delta packed objects as streams
Rename the ByteWindow's inflate() method to setInput.  We have
completely refactored the purpose of this method to be feeding part
(or all) of the window as input to the Inflater, and the actual
inflate activity happens in the caller.

Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-06 19:41:06 -07:00
Shawn O. Pearce ab3c68c512 amend commit: Support large loose objects as streams
We need to validate the stream state after the InflaterInputStream
thinks the stream is done.  Git expects a higher level of service from
the Inflater than the InflaterInputStream usually gives; we need to
ensure the embedded CRC is valid, and that there isn't trailing
garbage at the end of the file.

Change-Id: I1c9642a82dbd76b69e607dceccf8b85dc869a3c1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-06 19:41:01 -07:00
Stefan Lay 311da9b211 Fix comparison of nanoseconds
NB.decodeInt32(info, base + 4) already returns nanoseconds.
Therefore it must not be divided by 1000000.

Change-Id: Ie8f5c4a03f984d98935dccedc2b1ba4457094899
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-07-06 17:57:17 +02:00
Shawn O. Pearce 1913b41bc7 log: Implement --follow
The FollowFilter can be installed on a RevWalk to cause the path
to be updated through rename detection when the affected file is
found to be added to the project.

The filter works reasonably well, for example we can follow the
history of the fsck command in git-core:

  $ jgit log --name-status --follow builtin/fsck.c | grep ^R
  R100	builtin-fsck.c	builtin/fsck.c
  R099	fsck.c	builtin-fsck.c
  R099	fsck-objects.c	fsck.c
  R099	fsck-cache.c	fsck-objects.c

Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 18:17:55 -07:00
Shawn O. Pearce e9de5643fa Cache the diff configuration section
This way we don't have to reparse for the rename limit every time
we create a new rename detector for a repository.

Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 18:17:52 -07:00
Shawn O. Pearce 8a0c58394d log: Add whitespace ignore options
Similar to what we did with diff, implement whitespace ignore options
for log too.  This requires us to define some means of creating any
RawText object type at will inside of DiffFormatter, so we define a
new factory interface to construct RawText instances on demand.

Unfortunately we have to copy the entire block of common options.
args4j only processes the options/arguments on the one command class
and Java doesn't support multiple inheritance.

Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 17:32:47 -07:00
Shawn O. Pearce bd8740dc14 Format submodule links during differences
Instead of crashing, output a submodule link with the simple
"Subproject commit $fullid\n" syntax used by C Git.

Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:59:06 -07:00
Shawn O. Pearce 5be90be996 Redo DiffFormatter API to be easier to use
Passing around the OutputStream and the Repository is crazy.  Instead
put the stream in the constructor, since this formatter exists only to
output to the stream, and put the repository as a member variable that
can be optionally set.

Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:58:37 -07:00
Shawn O. Pearce 04a9d23b9a log, diff: Add rename detection support
Implement rename detection in the command line diff and log commands.
Also support --name-status, -p and -U flags, as these can be quite
useful to view more detail.

All of the Git patch file formatting code is now moved over to the
DiffFormatter class.  This permits us to reuse it in any context,
including inside of IDEs.

Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:32:03 -07:00
Shawn O. Pearce 978535b090 Implement similarity based rename detection
Content similarity based rename detection is performed only after
a linear time detection is performed using exact content match on
the ObjectIds.  Any names which were paired up during that exact
match phase are excluded from the inexact similarity based rename,
which reduces the space that must be considered.

During rename detection two entries cannot be marked as a rename
if they are different types of files.  This prevents a symlink from
being renamed to a regular file, even if their blob content appears
to be similar, or is identical.

Efficiently comparing two files is performed by building up two
hash indexes and hashing lines or short blocks from each file,
counting the number of bytes that each line or block represents.

Instead of using a standard java.util.HashMap, we use a custom
open hashing scheme similar to what we use in ObjectIdSubclassMap.
This permits us to have a very light-weight hash, with very little
memory overhead per cell stored.

As we only need two ints per record in the map (line/block key and
number of bytes), we collapse them into a single long inside of
a long array, making very efficient use of available memory when
we create the index table.  We only need object headers for the
index structure itself, and the index table, but not per-cell.
This offers a massive space savings over using java.util.HashMap.
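
A minimal sketch of collapsing the two ints into one long, assuming
the key sits in the high 32 bits and the byte count in the low 32
bits (the exact bit layout is an assumption, as are the names):

```java
final class PackedRecord {
    /** Packs a line/block key and a byte count into a single long. */
    static long pack(int key, int count) {
        return ((long) key << 32) | (count & 0xffffffffL);
    }

    static int keyOf(long rec) {
        return (int) (rec >>> 32);
    }

    static int countOf(long rec) {
        return (int) rec;
    }
}
```

With the key in the high bits, sorting a long[] of such records with
java.util.Arrays.sort groups records by key, which is what a linear
merge against another sorted table needs.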

The score calculation is done by approximating how many bytes are
the same between the two inputs (which for a delta would be how much
is copied from the base into the result).  The score is derived by
dividing the approximate number of bytes in common by the length
of the larger of the two input files.
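
As a worked sketch of that calculation, the code below uses
java.util.HashMap, String.hashCode() and character counts as simple
stand-ins for the custom table, the line hash and the byte counts.
All names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

final class SimilaritySketch {
    /** Maps each line's hash to the total characters held by lines with that hash. */
    static Map<Integer, Integer> index(String text) {
        Map<Integer, Integer> idx = new HashMap<>();
        for (String line : text.split("\n"))
            idx.merge(line.hashCode(), line.length() + 1, Integer::sum);
        return idx;
    }

    private static long total(Map<Integer, Integer> idx) {
        long sum = 0;
        for (int v : idx.values())
            sum += v;
        return sum;
    }

    /** Returns a 0..100 score: characters in common over the larger input. */
    static int score(String a, String b) {
        Map<Integer, Integer> ia = index(a);
        Map<Integer, Integer> ib = index(b);
        long common = 0;
        for (Map.Entry<Integer, Integer> e : ia.entrySet()) {
            Integer other = ib.get(e.getKey());
            if (other != null)
                common += Math.min(e.getValue(), other); // shared amount is the minimum
        }
        long max = Math.max(total(ia), total(ib));
        return max == 0 ? 100 : (int) (common * 100 / max);
    }
}
```

For a file with three "foo" lines compared against a file with five,
12 characters are in common and the larger input holds 20, giving a
score of 60.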

Right now the SimilarityIndex table should average about 1/2 full,
which means we waste about 50% of our memory on empty entries
after we are done indexing a file and sort the table's contents.
If memory becomes an issue we could discard the table and copy all
records over to a new array that is properly sized.

Building the index requires O(M + N log N) time, where M is the
size of the input file in bytes, and N is the number of unique
lines/blocks in the file.  The N log N time constraint comes
from the sort of the index table that is necessary to perform
linear time matching against another SimilarityIndex created for
a different file.

To actually perform the rename detection, a SxD matrix is created,
placing the sources (aka deletions) along one dimension and the
destinations (aka additions) along the other.  A simple O(S x D)
loop examines every cell in this matrix.

A SimilarityIndex is built along the row and reused for each
column compare along that row, avoiding the costly index rebuild
at the row level.  A future improvement would be to load a smaller
square matrix into SimilarityIndexes and process everything in that
sub-matrix before discarding the column dimension and moving down
to the next sub-matrix block along that same grid of rows.

An optional ProgressMonitor is permitted to be passed in, allowing
applications to see the progress of the detector as it works through
the matrix cells.  This provides some indication of current status
for very long running renames.

The default line/block hash function used by the SimilarityIndex
may not be optimal, and may produce too many collisions.  It is
borrowed from RawText's hash, which is used to quickly skip out of
a longer equality test if two lines have different hash codes.
We may need to refine this hash in the future, in order to minimize
the number of collisions we get on common source files.

Based on a handful of test commits in JGit (especially my own
recent rename repository refactoring series), this rename detector
produces output that is very close to C Git.  The content similarity
scores are sometimes off by 1%, which is most probably caused by
our SimilarityIndex type using a different hash function than C
Git uses when it computes the delta size between any two objects
in the rename matrix.

Bug: 318504
Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 16:32:03 -07:00
Shawn O. Pearce 4dd7b35b26 Improve description of isBare and NoWorkTreeException
Alex pointed out that my description of a bare repository might be
confusing for some readers.  Reword the description of the error,
and make it consistent throughout the Repository class's API.

Change-Id: I87929ddd3005f578a7022f363270952d1f7f8664
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:31 -07:00
Shawn O. Pearce 08d349a27b amend commit: Refactor repository construction to builder class
During code review, Alex raised a few comments about commit
532421d989 ("Refactor repository construction to builder class").
Due to the size of the related series we aren't going to go back
and rebase in something this minor, so resolve them as a follow-up
commit instead.

Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:30 -07:00
Shawn O. Pearce fe9860a444 Remove pointless size test in PackFile decompress
Now that any large objects are forced through a streaming loader
when they are bigger than getStreamFileThreshold(), and that threshold
is pegged at Integer.MAX_VALUE as its largest size, we will never
be able to reach this code path where we threw OutOfMemoryError.

Robin pointed out that we probably should include a message here,
but the code is effectively unreachable, so there isn't any value
in adding a message at this point.

So remove it.

Change-Id: Ie611d005622e38a75537f1350246df0ab89dd500
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:30 -07:00
Shawn O. Pearce 412ca65bd5 Avoid unbounded getCachedBytes during parseAny
Since we don't know the type of object we are parsing, we don't
know if it's a massive blob, or some small commit or annotated tag.
Avoid pulling the cached bytes until we have checked the type and
decided if we actually need them to continue parsing right now.

This way large blobs which won't fit in memory and would throw
a LargeObjectException don't abort parsing.

Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:30 -07:00
Shawn O. Pearce e4a480f658 Make type and size lazy for large delta objects
Callers don't necessarily need the getSize() result from a large
delta.  They instead should be always using openStream() or copyTo()
for blobs going to local files, or they should be checking the
result of the constant-time isLarge() method to determine the type
of access they can use on the ObjectLoader.  Avoid inflating the
delta instruction stream twice by delaying the decoding of the size
until after we have created the DeltaStream and decoded the header.

Likewise with the type, callers don't necessarily always need it
to be present in an ObjectLoader.  Delay looking at it as late as
we can, thereby avoiding an ugly O(N^2) loop looking up the type
for every single object in the entire delta chain.

Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:29 -07:00
Shawn O. Pearce 113577617b Use core.streamFileThreshold to set our streaming limit
We default this to 1 MiB for now, but we allow users to modify
it through the Repository's configuration file to be a different
value.  A new repository listener is used to identify when the
setting has been updated and trigger a reconfiguration of any
active ObjectReaders.

To prevent a horrible explosion we cap core.streamFileThreshold
at no more than 1/4 of the maximum JVM heap size.  We do this
because we need at least 2 byte arrays equal in size to the
stream threshold for the worst case delta inflation scenario,
and our host application probably also needs some amount of the
heap for their working set size.
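
The cap can be sketched as a pure function.  The names are
hypothetical; a caller would typically pass
Runtime.getRuntime().maxMemory() as the heap size:

```java
final class StreamThresholdSketch {
    static final long DEFAULT_THRESHOLD = 1L << 20; // 1 MiB default

    /** Clamps the configured threshold to a quarter of the heap and to int range. */
    static int effectiveThreshold(long configured, long maxHeap) {
        long limit = Math.min(configured, maxHeap / 4);
        return (int) Math.min(limit, Integer.MAX_VALUE);
    }
}
```

On a 64 MiB heap a configured value of 512 MiB is therefore reduced
to 16 MiB, while the 1 MiB default passes through unchanged.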

Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-02 12:41:39 -07:00
Shawn O. Pearce ad68553be4 Support large delta packed objects as streams
Very large delta instruction streams, or deltas which use very large
base objects, are now streamed through as large objects rather than
being inflated into a byte array.

This isn't the most efficient way to access delta encoded content, as
we may need to rewind and reprocess the base object when there was a
block moved within the file, but it will at least prevent the JVM from
having its heap explode.

When streaming a delta we have an inflater open for each level in the
delta chain, to inflate the instruction set of the delta, as well as
an inflater for the base level object.  The base object is buffered,
as is the top level delta requested by the application, but we do not
buffer the intermediate delta streams.  This keeps memory usage lower,
so it's closer to 1024 bytes per level in the chain, without having an
adverse impact on raw throughput as the top-level buffer gets pushed
down to the lowest stream that has the next region.

Delta instructions transparently collapse here, if the top level does
not copy a region from its base, the base won't materialize that part
from its own base, etc.  This allows us to avoid copying around a lot
of segments which have been deleted from the final version.

Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-02 02:19:12 -07:00
Shawn O. Pearce ded8f6c721 Support large whole packed objects as streams
Similar to the loose object support, whole packed objects can
now be streamed back to the caller.  The streaming is less
efficient as we copy the data from the cached window array
into the InflaterInputStream's internal buffer, then inflate
it there before returning to the application.

Like with unpacked objects, there is plenty of room for some
optimization, especially for the copyTo method, where we don't
necessarily need so much buffering to exist.

Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-01 19:34:21 -07:00
Shawn O. Pearce 13e0218a25 Replace PackedObjectLoader with ObjectLoader.SmallObject
The class is identical, but ObjectLoader.SmallObject is part of our
public API for storage implementations to build on top of.

Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-01 18:27:51 -07:00
Shawn O. Pearce fa23482ca7 Support large loose objects as streams
Big loose objects can now be streamed if they are over the large
object size threshold.  This prevents the JVM heap from exploding
with a very large byte array to hold the slurped file, and then
again with its uncompressed copy.

We may have slightly slowed down the simple case for small
loose objects, as the loader no longer slurps the entire thing
and decompresses in memory.  To try and keep good performance
for the very common small objects that are below 8 KiB in size,
buffers are set to 8 KiB, causing the reader to slurp most of the
file anyway.  However the data has to be copied at least once,
from the BufferedInputStream into the InflaterInputStream.

New unit tests are supplied to get nearly 100% code coverage on the
unpacked code paths, for both standard and pack style loose objects.
We tested a fair chunk of the code elsewhere, but these new tests
are better isolated to the specific branches in the code path.

Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-01 18:26:17 -07:00
Jeff Schumacher cb8e1e6014 Added a preliminary version of rename detection
JGit does not currently do rename detection during diffs. I added
a class that, given a TreeWalk to iterate over, can output a list
of DiffEntrys for that TreeWalk, taking into account renames. This
class only detects renames by SHA1. More complex rename detection,
along the lines of what C Git does, will be added later.

Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04
2010-07-01 17:33:53 -07:00
Shawn O. Pearce 2489088235 Permit AnyObjectId to compareTo AnyObjectId
Assume that the argument of compareTo won't be mutated while we
are doing the compare, and support the wider AnyObjectId type so
MutableObjectId is suitable on either side of the compareTo call.

Change-Id: I2a63a496c0a7b04f0e5f27d588689c6d5e149d98
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 19:07:36 -07:00
Shawn O. Pearce d04b7972d8 Use copyTo during checkout of files to working tree
This way we can stream a large file through memory, rather than
loading the entire thing into a single contiguous byte array.

Change-Id: I3ada2856af2bf518f072edec242667a486fb0df1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:56:20 -07:00
Shawn O. Pearce a0fd06e5c2 Stream whole deflated objects in PackWriter
Instead of loading the entire object as a byte array and passing
that into the deflater, let the ObjectLoader copy the object onto
the DeflaterOutputStream.  This has the nice side effect of using
some sort of stride hack in the Sun implementation that may improve
compression performance.

Change-Id: I3f3d681b06af0da93ab96c75468e00e183ff32fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:50:50 -07:00
Shawn O. Pearce ad0383734e Lazily allocate Deflater in PackWriter
Only allocate the Deflater if we can't reuse everything, but also
make sure we release it when we release the PackWriter's resources.

Change-Id: I16a32b94647af0778658eda87acbafc9a25b314a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:40:54 -07:00
Shawn O. Pearce 23e7f6376a Add openStream to ObjectLoader for big blobs
Blobs that are too large to read as a single byte array should be
accessed through an InputStream based interface instead, allowing
the application to walk through the data stream incrementally.

Define the basic interface to support streaming contents, but don't
implement it yet for the file based backend.

Change-Id: If9e4442e9ef4ed52c3e0f1af9398199a73145516
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:36:10 -07:00
Jeff Schumacher 7b0b4110ed Refactored code out of FileHeader to facilitate rename detection
Refactored a superclass out of FileHeader called DiffEntry that holds
the more general data from FileHeader that is useful in rename
detection (old/new Ids, modes, names, as well as changeType and
score). FileHeader is now a DiffEntry that adds Hunks, parsing
abilities, etc.

Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4
2010-06-30 17:53:27 -07:00
Dmitry Neverov 44854741c5 Fix missing flush in StreamCopyThread
It is possible that StreamCopyThread will not flush everything
from its src to its dst.  In most cases StreamCopyThread works
like this:

  in loop:
    n = src.read(buf);
    dst.write(buf, 0, n);

and when we want to flush, we interrupt() StreamCopyThread and it
flushes everything it wrote to dst.

The problem is that our interrupt() could interrupt reading. In this
case we will flush everything we wrote to dst, but not everything
we wrote to src.

Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 10:48:44 -07:00
Shawn O. Pearce a1d5f5b6b5 Move DirCache factory methods to Repository
Instead of creating the DirCache from a static factory method, use
an instance method on Repository, permitting the implementation to
override the method with a completely different type of DirCache
reading and writing.  This would better support a repository in the
cloud strategy, or even just an in-memory unit test environment.

Change-Id: I6399894b12d6480c4b3ac84d10775dfd1b8d13e7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 10:39:00 -07:00
Shawn O. Pearce cb9d8285ba Create NoWorkTreeException for bare repositories
Using a custom exception type makes it easier for an application
developer to understand why an exception was thrown out of a method
we declare.  To remain compatible with existing callers, we still
extend off IllegalStateException.
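
A small sketch of why this stays compatible. The exception name and its
superclass come from this change; the message text and the WorkTreeHolder
class are assumptions made up for the example.

```java
import java.io.File;

/** More specific type that is still an IllegalStateException. */
class NoWorkTreeException extends IllegalStateException {
    private static final long serialVersionUID = 1L;

    NoWorkTreeException() {
        super("Bare repository has no working tree"); // assumed wording
    }
}

/** Hypothetical holder standing in for a repository object. */
final class WorkTreeHolder {
    private final File workTree; // null models a bare repository

    WorkTreeHolder(File workTree) {
        this.workTree = workTree;
    }

    File getWorkTree() {
        if (workTree == null) {
            throw new NoWorkTreeException();
        }
        return workTree;
    }
}
```

Existing callers that catch IllegalStateException keep working, while new
callers can catch the narrower type.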

Change-Id: Ideeef2399b11ca460a2dbb3cd80eb76aa0a025ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 09:48:36 -07:00
Jeff Schumacher 9f2249bd26 Added check for binary files while diffing
Added a check in Diff to ensure that files that are most likely
not text are not line-by-line diffed. Files are determined to be
binary by checking the first 8000 bytes for a null character. This
is a similar heuristic to what C Git uses.
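
The heuristic can be sketched as follows. BinaryCheck and FIRST_FEW_BYTES
are hypothetical names; only the 8000 byte limit and the null byte test
come from the description above.

```java
final class BinaryCheck {
    /** Only this many leading bytes are inspected. */
    static final int FIRST_FEW_BYTES = 8000;

    /** Returns true if a NUL byte appears within the inspected prefix. */
    static boolean isBinary(byte[] raw) {
        int limit = Math.min(raw.length, FIRST_FEW_BYTES);
        for (int i = 0; i < limit; i++) {
            if (raw[i] == 0) {
                return true;
            }
        }
        return false;
    }
}
```

A NUL byte beyond the inspected prefix is not noticed, so this is a cheap
guess rather than a guarantee.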

Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26
2010-06-29 17:23:00 -07:00
Shawn O. Pearce 515deaf7e5 Ensure RevWalk is released when done
Update a number of calling sites of RevWalk to ensure the walker's
internal ObjectReader is released after the walk is no longer used.
Because the ObjectReader is likely to hold onto a native resource
like an Inflater, we don't want to leak them outside of their
useful scope.

Where possible we also try to share ObjectReaders across several
walk pools, or between a walker and a PackWriter.  This permits
the ObjectReader to actually do some caching if it felt inclined
to do so.

Not everything was updated; we'll probably need to come back and
update even more call sites, but these are some of the biggest
offenders.  Test cases in particular aren't updated.  My plan is to
move most storage-agnostic tests onto some purely in-memory storage
solution that doesn't do compression.

Change-Id: I04087ec79faeea208b19848939898ad7172b6672
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-29 15:12:53 -07:00
Shawn O. Pearce 94228bde22 Use ObjectReader in DirCacheBuilder.addTree
Rather than building a custom reader, have the caller supply us one.

Change-Id: Ief2b5a6b1b75f05c8a6bc732a60d4d1041dd8254
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-29 09:30:29 -07:00
Shawn O. Pearce d6e975f71b Use one ObjectReader for WalkFetchConnection
Instead of creating new ObjectReader for each walker, use one for
the entire connection and delegate reads through it.

Change-Id: I7f0a2ec8c9fe60b095a7be77dc423a2ff8b443a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 18:47:33 -07:00
Shawn O. Pearce 121d009b9b Use ObjectReader in RevWalk, TreeWalk
We don't actually need a Repository object here, just an ObjectReader
that can load content for us.  So change the API to depend on that.

However, this breaks the asCommit and asTag legacy translation methods
on RevCommit and RevTag, so we still have to keep the Repository
inside of RevWalk for those two types.  Hopefully we can drop those in
the future, and then drop the Repository off the RevWalk.

Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 18:47:29 -07:00
Shawn O. Pearce 06f635a4bc Fix minor formatting issue in UploadPack
Change-Id: Ifc0c3a94dc0e16126af6cf17e9c4a7cb96e8ffab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 18:47:28 -07:00
Shawn Pearce 3fd4918852 Merge changes Ie56301aa,Ic2f79e85
* changes:
  Added further support for whitespace ignoring during diff
  Added support for whitespace ignoring
2010-06-28 20:27:04 -04:00
Jeff Schumacher 9869ef2592 Added further support for whitespace ignoring during diff
Added code to support ignoring leading, trailing, and changed
whitespace when performing a diff operation. I also added command
line options to Diff to enable the various whitespace ignoring
methods. These match the flags for git diff.

Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8
2010-06-28 17:25:19 -07:00
Shawn O. Pearce 242b4026d9 Remove volatile keyword from RepositoryEvent
We don't need this field to be volatile.  Events are delivered by
the same thread that created the RepositoryEvent object, and thus
any cross-thread operations would need to be handled by some other
type of synchronization in the listener, and that would protect
both the repository field and any other per-event data.

Change-Id: Iefe345959e1a2d4669709dbf82962bcc1b8913e3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 12:46:18 -07:00
Shawn O. Pearce aa4b06e087 Rename openObject, hasObject to just open, has
Similar to what we did on Repository, the openObject method
already implied we wanted to open an object, given its main
argument was of type AnyObjectId.  Simplify the method name
to just the action, has or open.

Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:57:41 -07:00
Shawn O. Pearce acb7be2c5a Refactor Repository.openObject to be Repository.open
We drop the "Object" suffix, because it's pretty clear here that
we want to open an object, given that we pass in AnyObjectId as
the main parameter.  We also fix the calling convention to throw
a MissingObjectException or IncorrectObjectTypeException, so that
callers don't have to do this error checking themselves.

Change-Id: I72c43353cea8372278b032f5086d52082c1eee39
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:54:58 -07:00
Shawn O. Pearce 6b62e53b60 Move PackWriter progress monitors onto the operations
Rather than taking the ProgressMonitor objects in our constructor and
carrying them around as instance fields, take them as arguments to the
actual time consuming operations we need to run.

Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:28 -07:00
Shawn O. Pearce f288c27e46 Pass the PackOutputStream down the call stack
Rather than storing this in an instance member, pass it down the
calling stack.  It's cleaner, we don't have to poke the stream as
a temporary field, and then unset it.

Change-Id: I0fd323371bc12edb10f0493bf11885d7057aeb13
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:28 -07:00
Shawn O. Pearce 1ad2feb7b3 Remove Repository.openObject(ObjectReader, AnyObjectId)
Going through ObjectReader.openObject(AnyObjectId) is not only faster,
but also produces cleaner application level code.  The error checking
is done inside of the openObject method, which means it can be
removed from the application code.

Change-Id: Ia927b448d128005e1640362281585023582b1a3a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:28 -07:00
Shawn O. Pearce 9ba7bd4df4 Throw IncorrectObjectTypeException on bad type hints
If the type hint isn't OBJ_ANY and it doesn't match the actual type
observed from the object store, define the reader to throw back an
IncorrectObjectTypeException.  This way the caller doesn't have to
perform this check itself before it evaluates the object data, and
we can simplify quite a few call sites.
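
In outline, the check behaves like the sketch below. The class name, the
value chosen for OBJ_ANY and the unchecked TypeMismatchException are
assumptions for illustration; the real reader throws
IncorrectObjectTypeException as described above.

```java
/** Hypothetical unchecked stand-in for the real exception type. */
class TypeMismatchException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    TypeMismatchException(int expected, int actual) {
        super("expected type " + expected + " but found " + actual);
    }
}

final class TypeHintCheck {
    static final int OBJ_ANY = -1;   // assumed sentinel meaning "no hint"
    static final int OBJ_COMMIT = 1; // Git object type codes
    static final int OBJ_BLOB = 3;

    /** Returns the actual type, or throws if the hint rules it out. */
    static int checkType(int typeHint, int actualType) {
        if (typeHint != OBJ_ANY && typeHint != actualType) {
            throw new TypeMismatchException(typeHint, actualType);
        }
        return actualType;
    }
}
```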

Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:25 -07:00
Jeff Schumacher 543235b805 Added support for whitespace ignoring
JGit did not have support for skipping whitespace when comparing
lines in RawText objects. I added a subclass of RawText that skips
whitespace in its equals and hashCode methods. I used a subclass
rather than adding functionality into RawText so that performance
would not be impacted by extra logic.

This class only supports ignoring all whitespace. Others will follow
that allow other forms of whitespace ignoring.
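
As a simplified sketch of the comparison, working on Strings rather than
on RawText (IgnoreAllWhitespace and its method names are hypothetical):
both the equality test and the hash skip every whitespace character, so
lines that compare equal also hash alike.

```java
final class IgnoreAllWhitespace {
    /** True if the two lines match once all whitespace is skipped. */
    static boolean equalsIgnoringWhitespace(String a, String b) {
        int i = 0;
        int j = 0;
        while (true) {
            while (i < a.length() && Character.isWhitespace(a.charAt(i))) {
                i++;
            }
            while (j < b.length() && Character.isWhitespace(b.charAt(j))) {
                j++;
            }
            boolean endA = i == a.length();
            boolean endB = j == b.length();
            if (endA || endB) {
                return endA && endB;
            }
            if (a.charAt(i++) != b.charAt(j++)) {
                return false;
            }
        }
    }

    /** Hash that ignores whitespace, consistent with the equality above. */
    static int hashIgnoringWhitespace(String s) {
        int h = 5381;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c)) {
                h = h * 31 + c;
            }
        }
        return h;
    }
}
```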

Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a
2010-06-28 10:59:10 -07:00
Shawn O. Pearce a45728d7a4 Ensure ObjectReader used by PackWriter is released
The ObjectReader API demands that we release the reader when we are
done with it.  PackWriter contains a reader, which it uses for the
entire packing session.  Expose the release of the reader through
a release method on the writer.

This still doesn't address the RevWalk and TreeWalk users, who
don't correctly release their reader.  But it's a small step in the
right direction.

Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 10:25:11 -07:00
Shawn O. Pearce b5aa52e98a Ensure PackWriter releases its ObjectReader
Change-Id: I3f8af29066cc5a2132dc4a75c9654d97800f2f18
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 10:16:27 -07:00
Shawn O. Pearce e01abbd543 Release ObjectReader before the cached ObjectDatabase
I don't want to play games with the order of release here; it's
probably safer to release the reader before the database, just
in case the one depends on the other.

Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 09:47:20 -07:00
Shawn O. Pearce b40f02eb1a Release ObjectInserter in merge() not mergeImpl()
By doing the release at the higher level class, we can ensure
the release occurs if the inserter was allocated, even if the
implementation forgets to do this.  Since the higher level class
is what allocated it, it makes sense to have it also do the release.
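
The shape of this can be sketched with hypothetical stand-ins
(InserterSketch and MergerSketch are not the real classes): the outer
method owns the lazily allocated resource and releases it in a finally
block, whatever the implementation does.

```java
/** Hypothetical resource with an explicit release step. */
final class InserterSketch {
    boolean released;

    void release() {
        released = true;
    }
}

abstract class MergerSketch {
    private InserterSketch inserter; // allocated lazily, on first use

    protected InserterSketch getInserter() {
        if (inserter == null) {
            inserter = new InserterSketch();
        }
        return inserter;
    }

    /** Owns the allocation, and therefore also the release. */
    final boolean merge() {
        try {
            return mergeImpl();
        } finally {
            if (inserter != null) {
                inserter.release();
            }
        }
    }

    /** Implementations may use getInserter() and need not release it. */
    protected abstract boolean mergeImpl();
}
```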

Change-Id: Id617b2db864c3208ed68cba4eda80e51612359ad
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 09:35:55 -07:00
Shawn O. Pearce 5aae041a81 Commit: Use Repository.newObjectInserter
Everyone else does.  This must have been a spot I missed during
some sort of squash while developing the series.

Change-Id: I62eae50b618f47ee33ad7cf71fc05b724f603201
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 09:22:48 -07:00
Shawn O. Pearce ea21c111cb Move PackWriter over to storage.pack.PackWriter
Similar to what we did with the file code, move the pack writer
into its own package so the related classes and their package
private methods are hidden from the rest of the library.

Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:51:12 -07:00
Shawn O. Pearce 71aace52f7 Simplify ObjectLoaders coming from PackFile
We no longer need an ObjectLoader to be lazy and try to delay
the materialization of the object content.  That was done only
to support PackWriter searching for a good reuse candidate.

Instead, simplify the code base by doing the materialization
immediately when the loader asks for it, because any caller
asking for the loader is going to need the content.

Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce 68518ca3aa Remove getRawSize, getRawType from ObjectLoader
These were only used by PackWriter to help it filter object
representations.  Their only user disappeared when we rewrote the
object selection code path to use the new representation type.

Change-Id: I9ed676bfe4f87fcf94aa21e53bda43115912e145
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce 86547022f0 Tighten up local packed object representation during packing
Rather than making a loader, and then using that to fill the object
representation, parse the header and set up our data directly.
This saves some time, as we don't waste cycles on information we
won't use right now.

The weight computed for a representation is now its actual stored
size in the pack file, rather than its inflated size.  This accounts
for changes made when the compression level is modified on the
repository.  It is however more costly to determine the weight of
the object, since we have to find its length in the pack.  To try and
recover that cost we now cache the length as part of our ObjectToPack
record, so it doesn't have to be found during the output phase.

A LocalObjectToPack now costs us (assuming 32 bit pointers):

                   (32 bit)     (64 bit)
  vm header:         8 bytes      8 bytes
  ObjectId:         20 bytes     20 bytes
  PackedObjectInfo: 12 bytes     12 bytes
  ObjectToPack:      8 bytes     12 bytes
  LocalOTP:         20 bytes     24 bytes
                 -----------    ---------
                    68 bytes     76 bytes

Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce ad5238dc67 Move FileRepository to storage.file.FileRepository
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.

Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.

Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:34 -07:00
Shawn O. Pearce 3a7aec03e0 Implement zero-copy for single window objects
Objects that fall completely within a single window can be worked
with in a zero-copy fashion, provided that the window is backed by
a normal byte[] and not by a ByteBuffer.

This works for a surprising number of objects.  The default window
size is 8 KiB, but most deltas are quite a bit smaller than that.
Objects smaller than 1/2 of the window size have a very good chance
of falling completely within a window's array, which means we can
work with them without copying their data around.

Larger objects, or objects which are unlucky enough to span over a
window boundary, get copied through the temporary buffer.  We pay
a tiny penalty to realize we can't use the zero-copy code path,
but it's easier than trying to keep track of two adjacent windows.

With this change (as well as everything preceding it), packing
is actually a bit faster.  Some crude benchmarks based on cloning
linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using
C git client and JGit daemon shows we get better throughput, and
slightly better times:

  Total Time    | Throughput
  (old)  (now)  | (old)          (now)
  --------------+---------------------------
  2m45s  2m37s  | 12.49 MiB/s    21.17 MiB/s
  2m42s  2m36s  | 16.29 MiB/s    22.63 MiB/s
  2m37s  2m31s  | 16.07 MiB/s    21.92 MiB/s

Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 16:13:22 -07:00
Shawn O. Pearce ece88b99eb Redo PackWriter object reuse output
Output of selected reuses is refactored to use a new ObjectReuseAsIs
interface that extends the ObjectReader.  This interface allows the
reader to control how it performs the reuse into the output stream,
but also allows it to throw an exception to request the writer to
find a different candidate representation.

The PackFile reuse code was overhauled, cleaning up the APIs so they
aren't exposed in the object loader, but instead are now a single
method on the PackFile itself.  The reuse algorithm was changed to do
a data verification pass, followed by the copy pass to the output.
This permits us to work around a corrupt object in a pack file by
seeking another copy of that object when this one is bad.

The reuse code was also optimized for the common case, where the
in-pack representation is under 16 KiB.  In these smaller cases
data is sent to the pack writer more directly, avoiding some copying.

Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 14:46:05 -07:00
Shawn O. Pearce bf4ffff07f Redo PackWriter object reuse selection
The new selection implementation uses a public API on the
ObjectReader, allowing the storage library to enumerate its
candidates and select the best one for this packer without
needing to build a temporary list of the candidates first.

Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 14:16:06 -07:00
Shawn O. Pearce e0c9368f3e Reclaim some bits in ObjectToPack flags field
Make the lower bits available for flags that PackWriter can use to
keep track of facts about the object.  We shouldn't need more than
2^24 delta depths, unpacking that chain is unfathomable anyway.

This change gets us 4 bits that are unused in the lower end of the
word, which are typically easier to load from Java and most machine
instruction sets.  We can use these in later changes.

Change-Id: Ib9e11221b5bca17c8a531e4ed130ba14c0e3744f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 23:26:19 -07:00
Shawn O. Pearce 6fc3ecac84 Extract PackFile specific code to ObjectToPack subclass
The ObjectReader class is dual-purposed into being a factory for the
ObjectToPack, permitting specific ObjectDatabase implementations
to override the method and offer their own custom subclass of the
generic ObjectToPack class.  By allowing them to directly extend the
type, each implementation can add custom fields to support tracking
where an object is stored, without incurring any additional penalties
like a parallel Map<ObjectId,Object> would cost.

The reader was chosen to act as a factory rather than the database,
as the reader will eventually be tied more tightly with the
ObjectWalk and TreeWalk.  During object enumeration the reader
would have had to load the object for the RevWalk, and may choose
to cache object position data internally so it can later be reused
and fed into the ObjectToPack instance supplied to the PackWriter.
Since a reader is not thread-safe, and is scoped to this PackWriter
and its internal ObjectWalk, it's a great place for the database to
perform caching, if any.

Right now this change goes a bit backwards by changing what should
be generic ObjectToPack references inside of PackWriter to the very
PackFile specific LocalObjectToPack subclass.  We will correct these
in a later commit as we start to refine what the ObjectToPack API
will eventually look like in order to better support the PackWriter.

Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 23:26:19 -07:00
Shawn O. Pearce a2208be6aa Extract ObjectToPack to be top-level
This shortens the implementation within PackWriter, and starts
to open the door for some other refactorings based on changing
the ObjectToPack to be a public part of the API.

Change-Id: Id849cbffc4de20b903e844a2de7737eeb8b7a3ff
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 23:26:19 -07:00
Shawn O. Pearce ffe0614d4d Allow Repository.getDirectory() to be null
Some types of repositories might not be stored on local disk.  For
these, they will most likely return null for getDirectory() as the
java.io.File type cannot describe where their storage is, it's not
in the host's filesystem.

Document that getDirectory() can return null now, and update all
current non-test callers in JGit that might run into problems on
such repositories.  For the most part, just act like it's bare.

Change-Id: I061236a691372a267fd7d41f0550650e165d2066
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 18:03:41 -07:00
Shawn O. Pearce 8a9844b2af Redo event listeners to be more generic
Replace the old crude event listener system with a much more generic
implementation, patterned after the event dispatch techniques used
in Google Web Toolkit 1.5 and later.

Each event delivers to an interface that defines a single method,
and the event itself is what performs the delivery in a type-safe
way through its own dispatch method.

Listeners are registered in a generic listener list, indexed by
the interface they implement and wish to receive an event for.
Delivery of events is performed by looping through all listeners
implementing the event's corresponding listener interface, and using
the event's own dispatch method to deliver the event.  This is the
classical "double dispatch" pattern for event delivery.

Listeners can be unregistered by invoking remove() on their
registration handle.  This change therefore requires application
code to track the handle if it wishes to remove the listener at a
later point in time.
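
A compact sketch of the double dispatch and the registration handle.
Every name here (Listener, Event, PingListener, PingEvent, Handle,
ListenerList) is hypothetical and chosen only for illustration; the real
classes may differ in shape.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Marker for listener interfaces. */
interface Listener {
}

/** An event knows its listener interface and how to deliver itself. */
abstract class Event<T extends Listener> {
    abstract Class<T> listenerType();

    abstract void dispatch(T listener);
}

/** One listener interface per event type, with a single method. */
interface PingListener extends Listener {
    void onPing(PingEvent event);
}

final class PingEvent extends Event<PingListener> {
    @Override
    Class<PingListener> listenerType() {
        return PingListener.class;
    }

    @Override
    void dispatch(PingListener listener) {
        listener.onPing(this); // second half of the double dispatch
    }
}

/** Registration handle; remove() unregisters the listener. */
interface Handle {
    void remove();
}

final class ListenerList {
    private static final class Entry {
        final Class<? extends Listener> type;
        final Listener listener;

        Entry(Class<? extends Listener> type, Listener listener) {
            this.type = type;
            this.listener = listener;
        }
    }

    private final List<Entry> entries = new CopyOnWriteArrayList<>();

    <T extends Listener> Handle add(Class<T> type, T listener) {
        Entry entry = new Entry(type, listener);
        entries.add(entry);
        return () -> entries.remove(entry);
    }

    /** Delivers the event to every listener registered for its interface. */
    <T extends Listener> void dispatch(Event<T> event) {
        Class<T> type = event.listenerType();
        for (Entry entry : entries) {
            if (entry.type == type) {
                event.dispatch(type.cast(entry.listener));
            }
        }
    }
}
```

The list never needs a type-specific fire method; adding a new event only
requires a new Event subclass and its listener interface.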

Event delivery is now exposed as a generic public method on the
Repository class, making it easier for any type of message to
be sent out to any type of listener that has registered, without
needing to pre-arrange for type-safe fireFoo() methods.

New event types can be added in the future simply by defining a
new RepositoryEvent subclass and a corresponding RepositoryListener
interface that it dispatches to.  By always adding new events through
a new interface, we never need to worry about defining an Adapter
to provide default no-op implementations of new event methods.

Change-Id: I651417b3098b9afc93d91085e9f0b2265df8fc81
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 18:03:41 -07:00
Shawn O. Pearce 203bd66267 Rename Repository getWorkDir to getWorkTree
This better matches with the name used in the environment
(GIT_WORK_TREE), in the configuration file (core.worktree),
and in our builder object.

Since we are already breaking a good chunk of other code
related to repository access, and this is fairly easy to fix
in an application's code base, I'm not going to offer the
wrapper getWorkDir() method.

Change-Id: Ib698ba4bbc213c48114f342378cecfe377e37bb7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 18:03:41 -07:00
Shawn O. Pearce 532421d989 Refactor repository construction to builder class
The new FileRepositoryBuilder class helps applications to construct
a properly configured FileRepository, with properties assumed based
upon the standard Git rules for the local filesystem.

To better support simple command line applications, environment
variable handling and repository searching was moved into this
builder class.

The change gets rid of the ever-growing FileRepository constructor
variants, and the multitude of java.io.File typed parameters,
by using simple named setter methods.
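
To illustrate the named-setter style, here is a sketch with hypothetical
classes (RepoBuilderSketch and RepoLayoutSketch). The defaulting rules
shown are a simplified assumption, not the builder's full behavior.

```java
import java.io.File;

/** Hypothetical result object describing where a repository lives. */
final class RepoLayoutSketch {
    final File gitDir;
    final File workTree; // null when bare

    RepoLayoutSketch(File gitDir, File workTree) {
        this.gitDir = gitDir;
        this.workTree = workTree;
    }
}

final class RepoBuilderSketch {
    private File gitDir;
    private File workTree;
    private boolean bare;

    RepoBuilderSketch setGitDir(File dir) {
        gitDir = dir;
        return this;
    }

    RepoBuilderSketch setWorkTree(File dir) {
        workTree = dir;
        return this;
    }

    RepoBuilderSketch setBare() {
        bare = true;
        workTree = null;
        return this;
    }

    /** Fills in whichever directory was not set, using the ".git" convention. */
    RepoLayoutSketch build() {
        if (gitDir == null && workTree != null) {
            gitDir = new File(workTree, ".git");
        }
        if (gitDir == null) {
            throw new IllegalArgumentException("gitDir or workTree is required");
        }
        if (!bare && workTree == null && gitDir.getName().equals(".git")) {
            workTree = gitDir.getParentFile();
        }
        return new RepoLayoutSketch(gitDir, workTree);
    }
}
```

Each setter names the value it configures, so callers no longer have to
remember the order of several File parameters.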

Change-Id: I17e8e0392ad1dbf6a90a7eb49a6d809388d27e4c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:58:40 -07:00
Shawn O. Pearce 8f46ee4870 Remove Repository.toFile(ObjectId)
Not every type of Repository will be able to map an ObjectId into
a local file system path that stores that object's file contents.
Heck, it's not even true for the FileRepository, as an object can
be stored in a pack file and not in its loose format.

Remove this from our public API; it was a mistake to publish it.

Change-Id: I20d1b8c39104023936e6d46a5b0d7ef39ff118e8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:58:39 -07:00
Shawn O. Pearce 41c04bbb28 Use ObjectInserter for loose objects in WalkFetchConnection
Rather than relying on the repository's ability to give us the
local file path for a loose object, just pass its inflated form to
the ObjectInserter for the repository.  We have to recompress it,
which may slow down fetches, but this is the slow dumb protocol.
The extra cost to do the compression locally isn't going to be a
major bottleneck.

This nicely removes the nasty part about computing the object
identity by hand, allowing us to instead rely upon the inserter's
internal computation.  Unfortunately it means we might store a loose
object whose SHA-1 doesn't match the expected SHA-1, such as if the
remote repository was corrupted.  This is fairly harmless, as the
incorrectly named object will now be stored under its proper name,
and will eventually be garbage collected, as it's not referenced by
the local repository.

We have to flush the inserter after the object is stored because
we aren't sure if we need to read the object later, or not.

Change-Id: Idb1e2b1af1433a23f8c3fd55aeb20575e6047ef0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:58:06 -07:00
Shawn O. Pearce 5cfc29b491 Replace WindowCache with ObjectReader
The WindowCache is an implementation detail of PackFile and how it's
used by ObjectDirectory.  Let's start to hide it and replace the public
API with a more generic concept, ObjectReader.

Because PackedObjectLoader is also considered a private detail of
PackFile, we have to make PackWriter temporarily dependent upon the
WindowCursor and thus FileRepository and ObjectDirectory in order to
just start the refactoring.  In later changes we will clean up the
APIs more, exposing sufficient support to PackWriter without needing
the file specific implementation details.

Change-Id: I676be12b57f3534f1285854ee5de1aa483895398
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:58:01 -07:00
Shawn O. Pearce 133c987f4d Refactor alternate object databases below ObjectDirectory
Not every object storage system will have the concept of alternate
object databases to search, and even if they do, they may not have
the notion of fast-access / slow-access split like we do within
the ObjectDirectory code for pack files and loose objects.

Push all of that down below the generic API so that it is a hidden
detail of the ObjectDirectory and its related supporting classes.

Change-Id: I54bc1ca5ff2ac94dfffad1f9a9dad7af202b9523
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:41 -07:00
Shawn O. Pearce 88530a179e Start using ObjectInserter instead of ObjectWriter
Some newer style APIs are updated to use the newer ObjectInserter
interface instead of the now deprecated ObjectWriter.  In many of
the unit tests we don't bother to release the inserter; these are
typically using the file backend which doesn't need a release,
but in the future should use an in-memory HashMap based store,
which really wouldn't need it either.

Change-Id: I91a15e1dc42da68e6715397814e30fbd87fa2e73
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:41 -07:00
Shawn O. Pearce cad10e6640 Refactor object writing responsibilities to ObjectDatabase
The ObjectInserter API permits ObjectDatabase implementations to
control their own object insertion behavior, rather than forcing
it to always be a new loose file created in the local filesystem.
Inserted objects can also be queued and written asynchronously to
the main application, such as by appending into a pack file that
is later closed and added to the repository.

This change also starts to open the door to non-file based object
storage, such as an in-memory HashMap for unit testing, or a more
complex system built on top of a distributed hash table.

To help existing application code port to the newer interface we
are keeping ObjectWriter as a delegation wrapper to the new API.
Each ObjectWriter instance holds a reference to an ObjectInserter
for the Repository's top-level ObjectDatabase, and it flushes and
releases that instance on each object processed.

Change-Id: I413224fb95563e7330c82748deb0aada4e0d6ace
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:41 -07:00
Shawn O. Pearce 3e3a50db5e Change Repository.getConfig() to return non-file Configs
A repository implementation might support storing configurations
on a non-file storage system, so widen the return value to be any
type of configuration.

Change-Id: If9a0928f4b3ef29a24d270b0ce585a6e77f6fac6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce 4c14b7623d Make lib.Repository abstract and lib.FileRepository its implementation
To support storage models other than just the local filesystem,
we split the Repository class into a nearly abstract interface and
then create a concrete subclass called FileRepository with the file
based IO implementation.

We are using an abstract class for Repository rather than the much
more generic interface, as implementers will want to inherit a large
array of utility functions, such as resolve(String).  Having these in
a base class makes it easy to inherit them.

This isn't the final home for lib.FileRepository.  Future changes
will rename it into storage.file.FileRepository, but to do that we
need to also move a number of other related classes, which we aren't
quite ready to do.

Change-Id: I1bd54ea0500337799a8e792874c272eb14d555f7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce 77b39df5ec Consistently fail work tree methods on bare repositories
If the working tree isn't available, it doesn't make any sense to
obtain the merge heads, or the buffered commit message.  The
repository shouldn't have a partial merge state to read.  Throw back
the same exception we do when invoking getWorkDir() on a bare
repository instance.

Change-Id: I762c55890b7fe272a183da583f910671d1cadf71
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce f18b853044 Consistently use getDirectory() for work tree state
This permits us to leave the implementation of these methods here in
the Repository class, but later refactor how the directory is accessed
into a subclass.

Change-Id: I5785b2009c5b7cca0fb070a968e50814ce847076
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce a63494edee Add RepositoryState.BARE
A bare repository cannot be checked out, committed to, etc. as it
doesn't have a working directory.  Define this as a state since the
state enumeration exists only to describe how a working directory
can be modified.

Change-Id: I0a299013c6e42fef6cae3f6a9446f8f6c8e0514a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce c9c57d34de Rename Repository 'config' as 'repoConfig'
This better matches with the other configuration variable,
'userConfig', and helps to make it clear what config object
we are dealing with.

Change-Id: I2c585649aecc805e8e66db2f094828cd2649e549
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce 6a822f0ebf Remove RepositoryConfig and use FileBasedConfig instead
Change the Repository API to use straight-up FileBasedConfig.
This lets us remove the subclass RepositoryConfig and stop having
a specialized configuration type for repository, letting us instead
focus the config type hierarchy on type-of-storage rather than use.

Change-Id: I7236800e8090624453a89cb0c7a9a632702691c6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce bd8b06427f Delegate repository access to refs, objects
Instead of using the internal field directly to access references
or objects, use the getter method to obtain the proper type of
database, and follow down from there.  This permits us to later
do a refactoring that makes those methods abstract and strips the
field out of the Repository class, moving it into a concrete base
class that is more storage implementation specific.

Change-Id: Ic21dd48800e68a04ce372965ad233485b2a84bef
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:40 -07:00
Shawn O. Pearce f6c26dabd2 Cleanup Repository.create()
This method doesn't need to be synchronized, as it's only a proxy to
create(boolean), which is the real worker.  While we are touching
it try to improve the Javadoc and whitespace nearby.

Change-Id: Ibdddec6e518ca6d7439cfad90fedfcdc2d6b7a2e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:39 -07:00
Shawn O. Pearce 5309244713 Move additional have enumeration to Repository
This permits the repository implementation to know what its
alternates concept means, and avoids needing to expose finer details
about the ObjectDatabase to network code like the RefAdvertiser.

Change-Id: Ic6d173f300cb72de34519c7607cf7b0ff3ea6882
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:39 -07:00
Shawn O. Pearce 479fcf9e32 Refactor amazon-s3:// property file loading to support no directory
In the future getDirectory() can return null.  Avoid an NPE here by
refactoring the code to support conditionally skipping a check for
the properties file in the repository directory, falling back to only
the user's ~/ file location.

Change-Id: I76f5503d4063fdd9d24b7c1b58e1b09ddf1a5670
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:39 -07:00
Shawn O. Pearce f39c9fc741 Download pack-*.idx to /tmp if not on local filesystem
If the destination repository doesn't use an ObjectDirectory to
store its objects, we can't download to the object directory.
Instead pull the pack-*.idx files down to temporary files in the
JVM's default temporary directory.

Change-Id: Ied16bc89be624d87110ba42ba52d698a6ea7d982
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:39 -07:00
Shawn O. Pearce 553c2e5a42 DirCache must use getIndexFile
When reading or locking the index of a repository, we need to use
the index file specified by the repository, to ensure we correctly
honor what the repository was configured with.

Change-Id: I5be366ce32d7923b888dc01d19335912b01b7c4c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-25 17:46:39 -07:00
Shawn O. Pearce 60aae90d4d Disable topological sorting in PackWriter
It's not strictly required that we sort topologically in order to
produce a valid pack file.  This was just something that Linus
thought would be a good idea to do.  In practice it's not that
important for most repositories.  Local file IO quickly falls
out of the pattern that topological sorting provides any sort
of benefit for, so expending extra resources to enforce it when
we make a pack isn't really worth it.

I'm removing this sort in the pipeline because later changes
would support really efficient COMMIT_TIME_DESC sorting on a
non-file storage system, but TOPO sorting would be a bit more
ugly to run, due to the in-memory delays it imposes.

Change-Id: I0121453461c2140c6917cb10c6df584eb47e5795
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:32:41 -07:00
Shawn O. Pearce ccd0c0c911 UploadPack: Permit flushing progress messages under smart HTTP
If UploadPack invokes flush() on the output stream we pass it, it's
most likely the progress messages coming down the side band stream.
As pack generation can take a while, we want to push that down
to the client as early as we can, to keep the connection alive,
and to let the user know we are still working on their behalf.

Ensure we dump the temporary buffer whenever flush() is invoked,
otherwise the messages don't get sent in a timely fashion to the
user agent (in this case, git fetch).

We specifically don't implement flush() for ReceivePack right now,
as that protocol currently does not provide progress messages to
the user, but it does invoke flush several times, as the different
streams include '0000' type flush-pkts to denote various end points.

Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:32:41 -07:00
Shawn O. Pearce b6ba9739d5 Rewrite resolve in terms of RevWalk
We want to eventually get rid of the mapCommit, mapTree APIs on
Repository and force everyone into the faster parsers that exist
in RevWalk.  Rewriting resolve in terms of the faster parsers is
a good first step.

It actually simplifies the code a bit, as we no longer need to
keep track of an ObjectId and an Object (the parsed form), since
all RevObjects implicitly have their ObjectId readily available.

Change-Id: I4d234630195616e2c263e7e70038b55a1be4e7a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:32:41 -07:00
Shawn O. Pearce 47c07e1a0d Replace manual peel loops with RevWalk.peel
Instead of peeling things by hand in application level code, defer
the peeling logic into RevWalk's new peel utility method.

Change-Id: Idabd10dc41502e782f6a2eeb56f09566b97775a8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:32:40 -07:00
Shawn O. Pearce 599c0ce745 Use RevTag/RevCommit to sort in a PlotWalk
We already have these objects parsed and cached in our object pool.
We shouldn't be looking them up via the legacy mapObject API, but
instead can use the pool and the faster parsing routines available
through the RevWalk that we extend.

While we are here fixing the code, let's also correct the tag date
sorting to accept tags that have no tagger identity, because they
were created before Git knew how to store that field.

Change-Id: Id49a11f6d9c050c82b876e5e11058840c894b2d7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:32:40 -07:00
Shawn O. Pearce e1b312b5f7 Use CoreConfig, UserConfig and TransferConfig directly
Rather than relying on the helpers in RepositoryConfig to get
these objects, obtain them directly through the Config API.
It's only slightly more verbose, but permits us to work with the
base Config class, which is more flexible than the highly file
specific RepositoryConfig.

This is what I really meant to do when I added the section parser
and caching support to Config, we just failed to finish updating
all of the call sites.

Change-Id: I481cb365aa00bfa8c21e5ad0cd367ddd9c6c0edd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:29:38 -07:00
Shawn O. Pearce 8e396bcddc Use higher level Config types when possible
We don't have to assume/depend on RepositoryConfig here, these
two tests can use higher level versions of the class and still
come up with the same test.  That frees us up to do some changes
to the RepositoryConfig API.

Change-Id: Ia7b263c8c5efa3fae1054416d39c546867288132
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 17:29:37 -07:00
Shawn O. Pearce 5ed96eb7f4 UploadPack: Avoid unnecessary flush in smart HTTP
Under smart HTTP the biDirectionalPipe flag is false, and we return
back immediately at this point in the negotiation process.  There is
no need to flush the stream to the client, the request is over and
it will be automatically flushed out by the higher level servlet
that invoked us.  Avoiding flush here allows us to only use flush
after a progress message is sent during pack generation.

Change-Id: Id0c8b7e95e3be6ca4c1b479e096bed6b0283b828
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce 066df3d1a1 Add MutableObjectId.copyFrom(AnyObjectId)
This simplifies the PackIndex code, which is trying to quickly copy
an existing ObjectId into a MutableObjectId.  Rather than having
the PackIndex violate the ObjectId's internals, expose a copy from
function similar to the other ones for copying from raw byte arrays
or hex formatted strings.

Change-Id: I142635cbece54af2ab83c58477961ce925dc8255
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce 677b9b17e2 Expose AnyObjectId compareTo(byte[]) and compareTo(int[])
Storage systems can use these implementations to compare a passed
AnyObjectId with a stored representation of an ObjectId in the
canonical network byte order format.  This can be useful to do a
binary search, or just linear scan, over an encoded storage file.

Change-Id: I8c72993c4f4c6e98d599ac2c9867453752f25fd2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
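
The binary search over an encoded storage file mentioned above can be sketched roughly as follows. This is a minimal illustration, not JGit's implementation: the class `RawIdSearch` and its methods are hypothetical names, and the table is assumed to hold sorted, fixed-width 20-byte ids in network byte order.

```java
// Hypothetical sketch; not the actual AnyObjectId.compareTo(byte[]) code.
final class RawIdSearch {
    static final int ID_LEN = 20; // length of a SHA-1 object id in bytes

    private RawIdSearch() {
    }

    // Compares a 20-byte id with the record starting at offset in table.
    // Bytes are compared as unsigned values, most significant byte first.
    static int compare(byte[] id, byte[] table, int offset) {
        for (int i = 0; i < ID_LEN; i++) {
            int a = id[i] & 0xff;
            int b = table[offset + i] & 0xff;
            if (a != b) {
                return a < b ? -1 : 1;
            }
        }
        return 0;
    }

    // Binary search over sorted fixed-width records.
    // Returns the record index, or -1 if the id is not present.
    static int find(byte[] id, byte[] table) {
        int low = 0;
        int high = table.length / ID_LEN; // exclusive upper bound
        while (low < high) {
            int mid = (low + high) >>> 1;
            int cmp = compare(id, table, mid * ID_LEN);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                high = mid;
            } else {
                low = mid + 1;
            }
        }
        return -1;
    }
}
```

The unsigned comparison matters: Java bytes are signed, so a plain `id[i] - table[i]` comparison would sort 0xff before 0x01.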
Shawn O. Pearce 864cc3de10 Expose RefWriter constructor taking RefList
An implementation might prefer to use the RefList type here, and
RefList is part of our public API.  Expose the constructor so callers
who have a RefList can take advantage of the existing sorting.

Change-Id: I545867f85aa2c479d2d610024ebbe318144709c8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:15 -07:00
Shawn O. Pearce bfc43c13bc Expose RefUpdate constructor to any subclass
When we finally move RefDirectory to the new storage.file package,
its associated RefDirectoryUpdate will need visibility to this
constructor in order to initialize itself.  This is true of any
other repository implementation, so make it protected rather than
package level visible.

Change-Id: If838aec9baeb80ee2f12dcbca717657c725a9242
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce 8e40697047 Expose repository change event constructors
Repository implementations outside of .lib need to be able to
create these events and deliver them to listening application code.

Expose and document the constructors so that they are visible when
we move FileRepository into storage.file.FileRepository.

Change-Id: I7fb6e8f4f5fdab683c5ebb5267673aa6d5b560bb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
Shawn O. Pearce b3254d1159 isValidRefName: Inline the forbidden ref suffix of ".lock"
A Git reference name must never end with ".lock", as it would
confuse any existing C client that tries to obtain a clone of the
repository over the network.  Even if the repository isn't on a
local filesystem, it still should ban that suffix.

Because I plan to move LockFile to storage.file and make it a private
implementation detail of the local file system storage model,
we can't rely on its package level SUFFIX field here.  Making it
public probably won't work long-term either, as I also plan to
pull storage.file into its own separate project that depends on
the core library.

So, just inline the constant here.  It's as forbidden as ":" is.

Change-Id: If85076861baeacc183b82696375a13e935ba8836
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-23 16:54:14 -07:00
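
A minimal sketch of what inlining the suffix might look like, assuming a hypothetical helper class `RefNameSuffixCheck`. The real isValidRefName applies many more rules, which are deliberately omitted here.

```java
// Hypothetical helper; not JGit's actual Repository.isValidRefName.
final class RefNameSuffixCheck {
    private RefNameSuffixCheck() {
    }

    // Rejects names that end with ".lock" or contain ':'.
    // All other Git reference name rules are intentionally left out.
    static boolean passesSuffixAndColonRules(String refName) {
        // The suffix is written inline rather than read from LockFile,
        // so this check has no dependency on the file storage code.
        if (refName.endsWith(".lock")) {
            return false;
        }
        return refName.indexOf(':') < 0;
    }
}
```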
Shawn Pearce f3186974b6 Merge "Fix line endings" 2010-06-18 18:15:53 -04:00
Matthias Sohn 767fb175ed Fix line endings
Some sources had dos line endings. Also configure all projects to use
unix line endings and UTF-8 text encoding.

Change-Id: I8fc9a1dbb219ffa91d1b3011b3b11b7e48e74ca7
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-18 23:36:18 +02:00
Shawn Pearce 3149f971e0 Merge ""Bare" Repository should not return working directory." 2010-06-16 22:34:46 -04:00
Andrew Bayer 068eb92710 Make ObjectId, RefSpec, RemoteConfig, URIish serializable
Modifications to various classes in order to allow serialization
for use of JGit in Hudson's git plugin.

Change-Id: If088717d3da7483538c00a927e433a74085ae9e6
2010-06-16 16:10:28 -07:00
Mathias Kinzler 3c51b35e03 "Bare" Repository should not return working directory.
If a repository is "bare", it currently still returns a working directory.
This conflicts with the specification of "bare"-ness.

Bug: 311902

Change-Id: Ib54b31ddc80b9032e6e7bf013948bb83e12cfd88
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-06-16 08:50:26 +02:00
Chris Aniszczyk 8a11ac3d69 Merge "Add missing @Override tags in AlternateRepositoryDatabase" 2010-06-15 11:40:04 -04:00
Mathias Kinzler c1c1300a74 Allow to read configured keys
Currently, there is no way to read the content
of the Git configuration in a way that would
allow listing all configured values generically.
This change extends the Config class so that
callers can get a list of sections and
a list of names for any given section or
subsection.
This is required in order to implement proper
configuration handling in EGit (show all the
content of a given configuration, similar to
"git config -l").

Change-Id: Idd4bc47be18ed0e36b11be8c23c9c707159dc830
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
2010-06-15 10:12:26 +02:00
Shawn Pearce 86fcdc53ad Merge changes I53f71dc0,I3a899a3a,I3e8bd245,Ie7c9db83,If396326e,I6f4cf8da,I3bf96dd0,I3a2a43a1,I292fe88c,Ia1cf40cf
* changes:
  git-servlet: Fix comparing uploadFactory with the wrong DISABLED instance
  Prefer static inner classes
  Override equals for SwingLane since super class PlotLane defines it
  Make sure a Stream is closed upon errors in IpLogGenerator
  Make constant static in RebuildCommitGraph
  Make inner classes static in http code
  Cache filemode in GitIndex 
  Remove unused parent field in PlotLane
  Removed unused repo field in WorkDirCheckout
  Extend DiffFormatter API to simplify styling
2010-06-14 19:59:48 -04:00
Shawn O. Pearce bc238acdc5 Add missing @Override tags in AlternateRepositoryDatabase
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 12:59:30 -07:00
Shawn O. Pearce 44ba1bc78c Merge branch 'stable-0.8'
* stable-0.8:
  Qualify post-0.8.4 builds
  JGit 0.8.4
  JGit 0.8.3
  Include about.html in org.eclipse.jgit artifact
  Fix build.properties of the JGit feature
  Added the standard SULA for JGit
  Add "resources/" as a source folder

Change-Id: I4ecb0af41184ef84d104345fd1adcc4a240a38f6
2010-06-14 08:12:48 -07:00
Shawn O. Pearce 239ce58553 Start 0.9 development
Change-Id: I84173ece5100f1fcb78168e2e102b649d9466c08
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 08:11:27 -07:00
Shawn O. Pearce d28a40d679 Qualify post-0.8.4 builds
Change-Id: I21efed66921eb7e1e4010fccc9fa9af6c4150fc1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-14 08:10:08 -07:00
Matthias Sohn 6970edf35a JGit 0.8.4
Created wrong tags for 0.8.3 hence creating another version.

Change-Id: I4e00bbcffe1cf872e2d7e3f3d88d068701fb5330
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-14 15:42:09 +02:00
Matthias Sohn 5255d66143 JGit 0.8.3
Change-Id: I845da83c74475d74ec25d68f53c0a4738a898550
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-14 01:34:34 +02:00
Robin Rosenberg 3bf96dd04b Cache filemode in GitIndex
Apparently this was the intention, but never happened

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:16:32 +02:00
Robin Rosenberg 3a2a43a1dc Remove unused parent field in PlotLane
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:13:57 +02:00
Robin Rosenberg 292fe88c50 Removed unused repo field in WorkDirCheckout
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-13 03:12:41 +02:00
Robin Rosenberg ce56c5dcc9 Extend DiffFormatter API to simplify styling
Refactor and extend the internals so users can override and
intervene during formatting, e.g. to colorize output.

Change-Id: Ia1cf40cfd4a5ed7dfb6503f8dfc617237bee0659
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-12 15:31:04 +02:00
Matthias Sohn 4269a90aac Include about.html in org.eclipse.jgit artifact
This is required to enable accessing legal info for
org.eclipse.jgit from
Help > About > Installation Details > Plugins

Change-Id: I73f40dd2018112cd23102954d7647ecdbbbf0d89
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-08 02:41:37 +02:00
Matthias Sohn ab360d06de Add "resources/" as a source folder
Building jgit with pde.build was broken without resources.

Bug: 315823
Change-Id: I45be510ada068b3ffab0feb30ec60f2c96a5ca32
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-05 14:39:27 +02:00
Marc Strapetz 936e4ab2f2 Repository can be configured with FS
On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation
is present in the PATH. Assuming that the user works with the Cygwin
Git installation may result in unnecessary overhead if he actually
does not.

Applications built on top of jgit may have more knowledge on the
actually used Git client (Cygwin or not) and hence should be able to
configure which FS to use accordingly.

Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24
2010-06-04 19:08:58 -07:00
Robin Rosenberg 920d89d6af Add support for computing a Change-Id à la Gerrit
A Change-Id helps tools like Gerrit Code Review to keep different
versions of a patch together. The Change-Id is computed as a SHA-1
hash of some of the same basic information as a commit id on the first
commit intended to solve a particular problem and then reused for
updated solutions.

Change-Id: I04334f84e76e83a4185283cb72ea0308b1cb4182
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-06-04 18:42:14 -07:00
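
The general idea can be sketched as below. This is only an illustration under stated assumptions: the class `ChangeIdSketch`, its parameters, and the exact text that gets hashed are hypothetical, since the commit message does not spell out which fields JGit feeds into the hash. The only details taken from this log are the use of SHA-1 and the "I" prefix visible on every Change-Id line.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; the hashed fields are an assumption, not JGit's exact input.
final class ChangeIdSketch {
    private ChangeIdSketch() {
    }

    // Builds "I" followed by the 40 hex digits of a SHA-1 over commit-like text.
    static String compute(String tree, String parent, String author, String message) {
        String input = "tree " + tree + "\n"
                + "parent " + parent + "\n"
                + "author " + author + "\n\n"
                + message;
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] hash = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder("I");
            for (byte b : hash) {
                sb.append(Character.forDigit((b >> 4) & 0xf, 16));
                sb.append(Character.forDigit(b & 0xf, 16));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm on Java platforms.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Because the id is computed once for the first commit and then carried in the message footer, later versions of the patch keep the same value even though their commit ids differ.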
Alex Blewitt 046d1a2ef6 Provide a public entry method to determine whether a URI protocol is supported 2010-06-04 00:38:50 +01:00
Shawn O. Pearce d8ec8527a6 Qualify post-0.8.1 builds
Change-Id: Id86e5876b2f684b2a272c07061a276b054ba410d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-02 15:55:39 -07:00
Shawn O. Pearce be86767d71 JGit 0.8.1
Change-Id: I3d4ac7d0617a3575019e2ed748ed2a298a988340
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-02 14:47:31 -07:00
Shawn O. Pearce 16419dad35 Don't use interruptable pread() to access pack files
The J2SE NIO APIs require that FileChannel close the underlying file
descriptor if a thread is interrupted while it is inside of a read or
write operation on that channel.  This is insane, because it means we
cannot share the file descriptor between threads.  If a thread is in
the middle of the FileChannel variant of IO.readFully() and it
receives an interrupt, the pack will be automatically closed on us.
This causes the other threads trying to use that same FileChannel to
receive IOExceptions, which leads to the pack getting marked as
invalid.  Once the pack is marked invalid, JGit loses access to its
entire contents and starts to report MissingObjectExceptions.

Because PackWriter must ensure that the chosen pack file stays
available until the current object's data is fully copied to the
output, JGit cannot simply reopen the pack when it's automatically
closed due to an interrupt being sent at the wrong time.  The pack may
have been deleted by a concurrent `git gc` process, and that open file
descriptor might be the last reference to the inode on disk.  Once it's
closed, the PackWriter loses access to that object representation, and
it cannot complete sending the object to the client.

Fortunately, RandomAccessFile's readFully method does not have this
problem.  Interrupts during readFully() are ignored.  However, it
requires us to first seek to the offset we need to read, then issue
the read call.  This requires locking around the file descriptor to
prevent concurrent threads from moving the pointer before the read.

This reduces the concurrency level, as now only one window can be
paged in at a time from each pack.  However, the WindowCache should
already be holding most of the pages required to handle the working
set for a process, and its own internal locking was already limiting
us on the number of concurrent loads possible.  Provided that most
concurrent accesses are getting hits in the WindowCache, or are for
different repositories on the same server, we shouldn't see a major
performance hit due to the more serialized loading.

I would have preferred to use a pool of RandomAccessFiles for each
pack, with threads borrowing an instance dedicated to that thread
whenever they needed to page in a window.  This would permit much
higher levels of concurrency by using multiple file descriptors (and
file pointers) for each pack.  However the code became too complex to
develop in any reasonable period of time, so I've chosen to retrofit
the existing code with more serialization instead.

Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-27 08:27:32 -07:00
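
The seek-then-read pattern described above can be sketched as follows. This is a simplified illustration, not the PackFile code: `LockedFileReader` is a hypothetical wrapper, and it locks on the RandomAccessFile itself purely for brevity.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical wrapper; JGit's PackFile is considerably more involved.
final class LockedFileReader implements AutoCloseable {
    private final RandomAccessFile file;

    LockedFileReader(RandomAccessFile file) {
        this.file = file;
    }

    // Reads exactly len bytes starting at position. The seek and the read
    // share one lock, so no other thread using this reader can move the
    // file pointer between the two calls.
    byte[] read(long position, int len) throws IOException {
        byte[] buf = new byte[len];
        synchronized (file) {
            file.seek(position);
            file.readFully(buf, 0, len);
        }
        return buf;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The lock is the cost of the change: only one read per file descriptor can be in flight at a time, which is the reduced concurrency the message discusses.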
Stefan Lay 5b0e73b849 Add a merge command to the jgit API
Merges the current head with one other commit.
In this first iteration the merge command supports
only fast forward and already up-to-date.

Change-Id: I0db480f061e01b343570cf7da02cac13a0cbdf8f
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-05-24 09:52:28 -05:00
Christian Halstrick 6ca9843f3e Added merge support to CommitCommand
The CommitCommand should take care to create a merge commit if the file
$GIT_DIR/MERGE_HEAD exists. It should then read the parents for the merge
commit out of this file. When committing a merge with no commit message
specified, it should also take care to read the message from
$GIT_DIR/MERGE_MSG.
Finally the CommitCommand should remove these files if the commit
succeeded.

Change-Id: I4e292115085099d5b86546d2021680cb1454266c
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-05-21 01:49:46 +02:00
Sasa Zivkov f3d8a8ecad Externalize strings from JGit
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.

Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.

Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
2010-05-19 14:37:16 -07:00
Shawn O. Pearce 2e961989e4 Fix SSH deadlock during OutOfMemoryError
In the close() method of SshFetchConnection and SshPushConnection,
errorThread.join() can wait forever if JSch does not close the
channel's error stream.  Join with a timeout, and interrupt the
copy thread if it's blocked on data that will never arrive.

Bug: 312863
Change-Id: I763081267653153eed9cd7763a015059338c2df8
Reported-by: Dmitry Neverov <dmitry.neverov@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-19 11:43:42 -07:00
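
A bounded join of this kind might look like the sketch below. The helper `BoundedJoin` and its timeout parameter are hypothetical; the actual timeout value and surrounding cleanup in the SSH connection classes are not shown in this log.

```java
// Hypothetical helper illustrating a join with a timeout.
final class BoundedJoin {
    private BoundedJoin() {
    }

    // Waits up to timeoutMillis (must be positive; 0 would wait forever)
    // for worker to finish. If it is still running afterwards, interrupts
    // it and returns false. Returns true if the worker ended on its own.
    static boolean joinOrInterrupt(Thread worker, long timeoutMillis)
            throws InterruptedException {
        worker.join(timeoutMillis);
        if (worker.isAlive()) {
            worker.interrupt();
            return false;
        }
        return true;
    }
}
```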
Dmitry Neverov b3247ba524 Fix race condition in StreamCopyThread
If we get an interrupt during an IO operation (src.read or dst.write)
caused by the flush() method incrementing the flush counter, ensure
we restart the proper section of code.  Just ignore the interrupt
and continue running.

Bug: 313082
Change-Id: Ib2b37901af8141289bbac9807cacf42b4e2461bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-19 11:40:33 -07:00
Shawn O. Pearce ae972e774e Remove unnecessary truncation of in-pack size during copy
The number of bytes to copy was truncated to an int, but the
pack's copyToStream() method expected to be passed a long here.
Pass through the long so we don't truncate a giant object.

Change-Id: I0786ad60a3a33f84d8746efe51f68d64e127c332
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-17 07:13:55 -07:00
Shawn O. Pearce b6d0586bef Reduce the size of PackWriter's ObjectToPack instances
Rather than holding onto the PackedObjectLoader, only hold the
PackFile and the object offset.  During a reuse copy that is all
we should need to complete a reuse, and the other parts of the
PackedObjectLoader just waste memory.

This change reduces the per-object memory usage of a PackWriter by
32 bytes on a 32 bit JVM using only OFS_DELTA formatted objects.
The savings is even larger (by another 20 bytes) for REF_DELTAs.
This is close to a 50% reduction in the size of ObjectToPack,
making it rather worthwhile to do.

Beyond the memory reduction, this change will help to make future
refactoring work easier.  We need to redo the API used to support
copying data, and disconnecting it from the PackedObjectLoader is
a good first step.

Change-Id: I24ba4e621e101f14e79a16463aec5379f447aa9b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-15 17:51:11 -07:00
Shawn O. Pearce cb5bc19540 Reduce size of PackedObjectLoader by dropping long to int
Rather than keep track of both the position of the object, and the
position of its data, just keep track of the number of bytes used
by the object's header in the pack.  This shaves 4 bytes out of the
size of the PackedObjectLoader instances.

We also can defer the addition instruction to the materialize()
operation, avoiding it entirely if the caller never actually uses
the loader.  This may be relevant for PackWriter invocations,
where only 1 loader gets chosen for a given object, even though
the object may appear on disk in more than one pack file.

Error reporting is now simplified, as we can rely on the object
offset rather than its data offset.  This is the value displayed
by pack debugging tools like `git verify-pack -v`, so it's better
to use that in our own errors.

Because nobody needs getDataOffset() now, we can drop that from
the public API.

Change-Id: Ic639c0d5a722315f4f5c8ffda6e26643d90e5f42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-15 17:37:18 -07:00
Shawn O. Pearce 9c4d42e94d Factor out duplicate Inflater setup in WindowCursor
Since we use this code twice, pull it into a private method.  Let
the compiler/JIT worry about whether or not this logic should be
inlined into the call sites.

Change-Id: Ia44fb01e0328485bcdfd7af96835d62b227a0fb1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-15 16:18:44 -07:00
Shawn O. Pearce d8f20745bf Squash OffsetCache into WindowCache
Originally when I wrote this code I had hoped to use OffsetCache
to also implement the UnpackedObjectCache.  But it turns out they
need rather different code, and it just wasn't worth trying to
reuse the OffsetCache base class.

Before doing any major refactoring or code cleanups here, squash the
two classes together and delete OffsetCache.  As WindowCache is our
only subclass, this is pretty simple to do.  We also get a minor
code reduction due to less duplication between the two classes,
and the JIT should be able to do a better job of optimization here
as we can define types up front rather than relying on generics
that erase back to java.lang.Object.

Change-Id: Icac8bda01260e405899efabfdd274928e98f3521
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-15 16:12:13 -07:00
Shawn O. Pearce 3cb59216f5 Avoid unnecessary second read on OBJ_OFS_DELTA headers
When we read the object header we copy 20 bytes from the pack data,
then start parsing out the type and the inflated size.  For most
objects, this is only going to require 3 bytes, which is sufficient
to represent objects with inflated sizes of up to 2^16.  The local
buffer however still has 17 bytes remaining in it, and that can be
used to satisfy the OBJ_OFS_DELTA header.

We shouldn't need to worry about walking off the end of the buffer
here, because delta offsets cannot be larger than 64 bits, and that
requires only 9 bytes in the OFS_DELTA encoding.

Assuming worst-case scenarios of 9 bytes for the OFS_DELTA encoding,
the pack file itself must be approaching 2^64 bytes, an infeasible
size to store on any current technology.  However, even if this
were the case we still have 11 bytes for the type/size header.
In that encoding we can represent an object as large as 2^74 bytes,
which is also an infeasible size to process in JGit.

So drop the second read here.

The data offsets we pass into the ObjectLoaders being constructed
need to be computed individually now.  This saves a local variable,
but pushes the addition operation into each branch of the switch.

Change-Id: I6cf64697a9878db87bbf31c7636c03392b47a062
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-15 16:09:07 -07:00
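
To make the byte counts above concrete, here is a rough sketch of parsing both headers out of a single buffer. The class `PackHeaderSketch` and its method names are hypothetical, and the code follows the pack format's variable-length encodings as commonly documented rather than JGit's actual source.

```java
// Hypothetical sketch of pack entry header parsing from one buffer.
final class PackHeaderSketch {
    static final int OBJ_OFS_DELTA = 6;

    private PackHeaderSketch() {
    }

    // Parses the type/size header at pos.
    // Returns { type, inflatedSize, bytesConsumed }.
    static long[] parseTypeAndSize(byte[] buf, int pos) {
        int p = pos;
        int c = buf[p++] & 0xff;
        int type = (c >> 4) & 7;   // bits 6..4 hold the object type
        long size = c & 15;        // low 4 bits of the size
        int shift = 4;
        while ((c & 0x80) != 0) {  // high bit set: another size byte follows
            c = buf[p++] & 0xff;
            size += ((long) (c & 0x7f)) << shift;
            shift += 7;
        }
        return new long[] { type, size, p - pos };
    }

    // Parses the OFS_DELTA base offset at pos, reusing the same buffer.
    // Returns { offset, bytesConsumed }.
    static long[] parseOfsDelta(byte[] buf, int pos) {
        int p = pos;
        int c = buf[p++] & 0xff;
        long ofs = c & 0x7f;
        while ((c & 0x80) != 0) {
            ofs += 1;              // the encoding skips values already covered
            c = buf[p++] & 0xff;
            ofs <<= 7;
            ofs += c & 0x7f;
        }
        return new long[] { ofs, p - pos };
    }
}
```

A caller would read 20 bytes once, call `parseTypeAndSize`, and if the type is `OBJ_OFS_DELTA` call `parseOfsDelta` at the position right after the first header, with no second read from the pack.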
Shawn O. Pearce 3cba5377df Fix hang when fetching over SSH
JSch may hang or abort with the timeout if JGit connects before
it has obtained the streams.  Instead defer the connect() call until
after the streams have been configured.

Bug: 312383
Change-Id: I7c3a687ba4cb69a41a85e2b60d381d42b9090e3f
Reported-by: Dmitry Neverov <dmitry.neverov@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-13 10:23:33 -07:00
Shawn O. Pearce f999b4aa63 Fix interrupted write in StreamCopyThread
If a flush() gets delivered at the same time that we are blocking
while writing to an interruptible stream, the copy thread will
abort assuming it's a stream error.  Instead ignore the interrupt,
and retry the write.

Change-Id: Icbf62d1b8abe0fabbb532dbee088020eecf4c6c2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-13 09:58:21 -07:00
Dmitry Neverov 3f143b8d6b Fix missing flush in StreamCopyThread
It is possible to miss a flush() invocation in StreamCopyThread.
In this case some data will not be sent to the remote host and we will
wait forever (or until timeout) in src.read().

Use a counter to keep track of the flush requests.

Change-Id: Ia818be9b109a1674d9e2a9c78e125ab248cfb75b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-13 09:58:18 -07:00
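
The counter idea can be sketched as below. `FlushRequests` is a hypothetical class written for illustration; the real StreamCopyThread also interrupts the copy thread and handles the read/write loop, which is left out here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of counting flush requests instead of using a flag.
final class FlushRequests {
    private final AtomicInteger pending = new AtomicInteger();

    // Called by the thread that wants the destination flushed.
    void request() {
        pending.incrementAndGet();
    }

    // Called by the copy loop before each blocking read. Consumes one
    // pending request, if any, and flushes dst. Returns true if it flushed.
    boolean flushIfRequested(OutputStream dst) throws IOException {
        for (;;) {
            int n = pending.get();
            if (n == 0) {
                return false;
            }
            if (pending.compareAndSet(n, n - 1)) {
                dst.flush();
                return true;
            }
        }
    }
}
```

With a counter, a request that arrives while an earlier one is being serviced stays recorded until the loop comes back around, rather than being overwritten.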
Christian Halstrick f3fb5824ba Add builder-style API to jgit and Commit & Log cmd
Added a new package org.eclipse.jgit.api and a builder-style API for
jgit. Added also the first implementation for two git commands: Commit
and Log.

This API is intended to be used by external components when
functionalities of the standard git commands are required. It will also
help to ease writing JGit tests.

For internal usages this API may often not be optimal because the git
commands are doing much more than required or they expect parameters of
an inappropriate type.

Change-Id: I71ac4839ab9d2f848307eba9252090c586b4146b
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-05-10 15:17:55 +02:00
Robin Rosenberg 541ad72ac6 Merge "Added MERGING_RESOLVED repository state" 2010-05-08 17:16:26 -04:00
Robin Rosenberg 0df679aea1 Merge "A stages field and getter for GitIndex entry introduced" 2010-05-08 17:13:25 -04:00
Robin Rosenberg a496410df9 A stages field and getter for GitIndex entry introduced
Currently, if the Index contains a file in more than one stage, only
the last entry (containing the highest stage) will be registered in
GitIndex. For applications it can be useful to not only know about the
highest stage, but also which other stages are present, e.g. to detect
the type of conflict the file is in.

Change-Id: I2d4ff9f6023335d9ba6ea25d8e77c8e283ae53cb
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-05-08 23:12:19 +02:00
Christian Halstrick b9ab040b45 Added MERGING_RESOLVED repository state
The repository state describes which state the repo is in and which actions
are currently allowed. The state MERGING says that a commit is not
possible, but this is only true when there are unmerged paths in the index.
When we are merging but have resolved all conflicts, we are in a special
state: we are still merging (meaning the next commit should have multiple
parents), but a commit is now allowed.

Since MERGING's canCommit() cannot be enhanced to return true or false
based on the index state (MERGING is an enum value and holds no
reference to the repository whose state it represents), I had to introduce a new
state, MERGING_RESOLVED. This new state reports that a commit is possible.

CAUTION: users of jgit may previously have blindly made a plain commit
(with only one parent) whenever the RepositoryState allowed them to
do so. With this change these users will be confronted with a RepositoryState
that says a commit is possible, but before they can commit they'll have to
check the MERGE_MSG and MERGE_HEAD files and use the info from these
files.

Change-Id: I0a885e2fe8c85049fb23722351ab89cf2c81a431
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-05-08 22:03:18 +02:00
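The MERGING / MERGING_RESOLVED split described above can be illustrated with a minimal sketch. This is not JGit's actual RepositoryState source: the enum name RepoStateSketch, the SAFE constant, and the of(...) helper with its two boolean parameters are assumptions made for illustration. The point it shows is that an enum constant holds no repository reference, so the caller inspects the index and selects the constant.

```java
// Hypothetical sketch; names and the factory method are illustrative assumptions.
enum RepoStateSketch {
    SAFE(true),
    MERGING(false),          // unmerged paths remain in the index
    MERGING_RESOLVED(true);  // still merging, but every conflict is resolved

    private final boolean canCommit;

    RepoStateSketch(boolean canCommit) {
        this.canCommit = canCommit;
    }

    boolean canCommit() {
        return canCommit;
    }

    // The caller (which can see the repository) decides which constant applies;
    // the constants themselves carry only a fixed answer.
    static RepoStateSketch of(boolean mergeInProgress, boolean indexHasUnmergedPaths) {
        if (!mergeInProgress) {
            return SAFE;
        }
        return indexHasUnmergedPaths ? MERGING : MERGING_RESOLVED;
    }
}
```

A caller that sees MERGING_RESOLVED still has to build a commit with multiple parents, as the CAUTION paragraph notes; canCommit() alone does not say how many parents the commit needs.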
Shawn O. Pearce dd63f5cfc1 Fix FooterLine.matches(FooterKey) on same length keys
If two keys are the same length, but don't share the same sequence
of characters, we were incorrectly claiming they still matched due
to a bug in the for loop condition.  I used the wrong variable and
the loop never executed, resulting in equality anytime the two keys
being compared were the same length.

Use the proper local variable to loop through the arrays, and add
a JUnit test to verify equality works as expected.

Change-Id: I4a02400e65a9b2e0da925b05a2cc4b579e1dd33a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-05-04 16:25:20 -07:00
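The class of bug described in the FooterLine fix above can be sketched as follows. This is not the original FooterLine code: the class name, the method names, and the choice of "wrong variable" are assumptions made for illustration. It only shows how a loop bounded by the wrong variable never runs, so any two same-length keys compare as equal.

```java
// Hypothetical sketch of the bug shape; not JGit's FooterLine implementation.
final class FooterKeyMatchSketch {
    // Buggy shape: the loop bound is a counter that is still zero,
    // so the body never executes and same-length keys always "match".
    static boolean matchesBuggy(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int matched = 0; // the wrong variable to use as a loop bound
        for (int i = 0; i < matched; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    // Fixed shape: loop over the actual key length.
    static boolean matchesFixed(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```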
Chris Aniszczyk d011a377cb Merge "Fix handling of corruption for truncated objects" 2010-05-03 03:40:36 -04:00
Chris Aniszczyk 28e42cb463 Merge "Don't insert the same pack twice into a pack list" 2010-05-03 03:40:06 -04:00
Chris Aniszczyk 11096a89a5 Merge changes I0d339b9f,I0e6673b8
* changes:
  Favor earlier PackFile instances over later duplicates
  Cleanup duplicated object reuse code in PackWriter
2010-05-03 03:39:47 -04:00
Robin Rosenberg c10e134157 Fix handling of corruption for truncated objects
If a loose object was corrupted by truncation, JGit would hang.

Change-Id: I7e4c14f44183a5fcb37c1562e81682bddeba80ad
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-05-01 09:50:38 +02:00
Chris Aniszczyk f1946b0669 Cleaning up provider and feature names
It is incorrect to use Eclipse.org as the providerName now;
we'll use Eclipse JGit instead.

Change-Id: I1621b93d4f401176704e7c43935a5ce0c8ee8419
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-04-27 09:26:25 -05:00
Shawn O. Pearce 374c28057a Don't insert the same pack twice into a pack list
If a concurrent thread picks up a newly created PackFile and adds
it to the pack list before the IndexPack thread itself can insert
the item onto the front of the list, do nothing and use the item
that was picked up by that other concurrent scanning thread.

This avoids a potential condition where the same pack exists in
memory twice, which causes confusion later during a rescan of the
directory because we don't know exactly which PackFile instance
should be retained into the new list, and which should be discarded.

We can stop searching through the old pack list as soon as the
sort function declares that the item to insert should be before
the item already in the list.  Because the list is always sorted
by modification time (in seconds), we should never encounter a
case where the pack is positioned at the wrong spot in the list.
This early break out still permits an efficient implementation of
the common case, inserting a new pack at the head of the list.

Change-Id: Ice4459bbd4ee9487078aff5257893883d04f05fb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-26 17:33:53 -07:00
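The insertion rule from the "Don't insert the same pack twice" commit above can be sketched roughly as below. This is a simplified illustration, not JGit's pack list code: the Pack class, its fields, and identifying a pack by name are assumptions, and the real code also has to publish the new list atomically, which is omitted here. The list is assumed to be sorted newest first by modification time in seconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Pack and its fields are illustrative assumptions.
final class PackListSketch {
    static final class Pack {
        final String name;
        final long mtimeSeconds;

        Pack(String name, long mtimeSeconds) {
            this.name = name;
            this.mtimeSeconds = mtimeSeconds;
        }
    }

    // Returns the unchanged list if another thread already added the pack,
    // otherwise a new list with the pack at the head.
    static List<Pack> insert(List<Pack> old, Pack p) {
        for (Pack existing : old) {
            if (existing.name.equals(p.name)) {
                return old; // keep the instance that was picked up first
            }
            if (p.mtimeSeconds > existing.mtimeSeconds) {
                break; // p sorts before this entry, so it cannot appear further down
            }
        }
        List<Pack> next = new ArrayList<Pack>(old.size() + 1);
        next.add(p);
        next.addAll(old);
        return next;
    }
}
```

In the common case the new pack is strictly newer than the head of the list, so the loop exits after one comparison.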
Shawn O. Pearce a0a52897ed Favor earlier PackFile instances over later duplicates
There is a potential race condition during insertPack that can lead
to us having the same pack file open twice in the same directory.

A different thread can miss an object on disk, and trigger a scan
of the directory, and notice the pack that was put in by IndexPack.
So the pack winds up in the newly created PackList.

The IndexPack thread then wakes up and finishes its insertPack by
creating a new PackFile and inserting it into position 0 of the list.
We now have the same pack listed twice.

Readers will favor the earlier PackFile instance, because it's the
first one they come across as they iterate through the list.

Keep that earlier one when we scan the pack directory again, as
this will avoid needing to purge out all of the windows that may
have been cached.

Of course we should also fix that race condition, but this block
was taking the wrong resolution if this error ever shows up, so
let's first fix the block to use a more sane resolution.

Change-Id: I0d339b9fd1dd8012e8fe5a564b893c0f69109e28
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-26 17:32:04 -07:00
Shawn O. Pearce eeed0abd16 Cleanup duplicated object reuse code in PackWriter
This reuse line was identical between the two branches related to
reusing a delta, or reusing a whole object.  Either way they reuse
the body of the object as-is.  So just make that a common function
after the header is written.

Change-Id: I0e6673b8e813c8c08c594ea2ba546fd366339d5d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-26 17:29:10 -07:00
Robin Rosenberg 4ef96296f7 Merge "Fix NPE during InflaterCache return after corrupt loose object" 2010-04-24 08:19:01 -04:00
Shawn O. Pearce dafa8fbff4 Fix NPE during InflaterCache return after corrupt loose object
If a corrupt loose object is read, UnpackedObjectLoader was disposing
of the Inflater, and then attempting to return the disposed Inflater
to the InflaterCache.  Since the disposed Inflater had its native
libz resource deallocated and its reference cleared out, the Inflater
threw NullPointerException and refused to reset itself before being
put back into the cache.

Instead of disposing of the Inflater when corruption is found, do
nothing, and allow it to be returned to the cache.  The instance
will get reset, and should be usable by a future caller.

Bug: 310291
Change-Id: I44f2247c08b6e04fa62f8399609341b07508c096
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-23 11:16:25 -07:00
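The failure mode in the InflaterCache fix above can be reproduced with java.util.zip.Inflater alone. The sketch below does not use JGit's InflaterCache or UnpackedObjectLoader; the class and method names are assumptions for illustration. It relies on the behavior of the OpenJDK Inflater, where calling reset() after end() raises NullPointerException because the native zlib state has been released.

```java
import java.util.zip.Inflater;

// Hypothetical sketch; shows only the Inflater behavior, not JGit's cache.
final class InflaterReturnSketch {
    // Mimics the old behavior: dispose first, then try to reset for reuse.
    // Returns the exception thrown by reset(), or null if none was thrown.
    static RuntimeException resetAfterEnd() {
        Inflater inf = new Inflater();
        inf.end(); // releases the native zlib resource
        try {
            inf.reset();
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }

    // Mimics the fixed behavior: leave the Inflater alone so that
    // a later reset() succeeds and the instance can be reused.
    static boolean resetWithoutEnd() {
        Inflater inf = new Inflater();
        try {
            inf.reset();
            return true;
        } finally {
            inf.end(); // clean up the demo instance only
        }
    }
}
```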
Shawn O. Pearce f36df5dc6a Merge branch 'receive-pack-filter'
* receive-pack-filter:
  ReceivePack: Clarify the check reachable option
  ReceivePack: Micro-optimize object lookup when checking connectivity
  ReceivePack: Correct type of not provided object
  IndexPack: Tighten up new and base object bookkeeping
  ReceivePack: Remove need new,base object id properties
  ReceivePack: Discard IndexPack as soon as possible
  ReceivePack: fix ensureProvidedObjectsVisible on thin packs

Change-Id: I4ef2fcb931f3219872e0519abfcee220191d5133
2010-04-19 18:20:42 -07:00
Matthias Sohn 9605fcc0fb Merge "ObjectIdSubclassMap: Correct Iterator to throw NoSuchElementException" 2010-04-17 18:35:38 -04:00
Matthias Sohn f1be93eb87 Merge "ObjectIdSubclassMap: Add isEmpty() method" 2010-04-17 18:29:16 -04:00
Robin Rosenberg c2960cdf65 Merge "IndexPack: Correct thin pack fix using less than 20 bytes" 2010-04-17 07:26:45 -04:00
Shawn O. Pearce 585dcb7a1c ReceivePack: Clarify the check reachable option
This option was mis-named from day 1.  It's not checking that the
objects provided by the client are reachable, its actually doing
a scan to prove that objects referenced by the client are already
reachable through another reference on the server, or were sent
as part of the pack from the client.

Rename it checkReferencedObjectsAreReachable, since we really are
trying to validate that objects referenced by the client's actions
are reachable to the client.

We also need to ensure we run checkConnectivity() anytime this is
enabled, even if the caller didn't turn on fsck for object formats.
Otherwise the check would be completely bypassed.

Change-Id: Ic352ddb0ca8464d407c6da5c83573093e018af19
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 17:04:38 -07:00
Shawn O. Pearce a770205070 ReceivePack: Micro-optimize object lookup when checking connectivity
If we are checking the visibility of everything referenced in the
pack that isn't already reachable by a reference, it needs to be
in the provided set.  Since the provided set lists everything that
is in this pack, we can avoid checking to see if the blob exists
on disk, because we know it should be there, it was found in the
pack we just consumed.

Change-Id: Ie3c7746f734d13077242100a68e048f1ac18c34a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 17:04:38 -07:00
Shawn O. Pearce 6029bb24ad ReceivePack: Correct type of not provided object
If a tree was referenced but not provided in the pack, report it
as a missing tree and not as a missing blob.

Change-Id: Iab05705349cdf0d30cc3f8afc6698a8d2a941343
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 17:04:37 -07:00
Shawn O. Pearce 2bb8defa54 IndexPack: Tighten up new and base object bookkeeping
The only current consumer of these collections is ReceivePack,
where it needs to test ObjectId equality between a RevObject and an
ObjectId.  There we were copying from a traditional HashSet<ObjectId>
into an ObjectIdSubclassMap<ObjectId>, as the latter can perform
hashing using ObjectId's native value support, bypassing RevObject's
override on hashCode() and equals().  Instead of doing that copy,
directly create ObjectIdSubclassMap instances inside of ReceivePack.

We also only need to record the objects that do not appear in the
incoming pack, and were therefore copied from the local repository
in order to complete delta resolution.  Instead of listing everything
that used an OBJ_REF_DELTA format, list only the objects that we
pulled from the destination repository via a normal ObjectLoader.

ReceivePack can now discard the IndexPack object, and all of its
other data, as soon as these collections are held by the check
connectivity method.  This frees up memory for the ObjectWalk's
own RevObject pool.

Change-Id: I22ef71b45c2045a0202e7fd550a770ee1f6f38a6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 17:04:26 -07:00
Shawn O. Pearce 329a0e1689 ReceivePack: Remove need new,base object id properties
These are more like internal implementation details of how IndexPack
works with ReceivePack to validate the incoming object stream.
Callers who are embedding the ReceivePack logic in their own
application don't really need to know the details of which objects
were used for delta bases in the incoming thin pack, or exactly
which objects were newly transmitted.

Hide these from the API, as exposing them through ReceivePack was
an early mistake.

Change-Id: I7ee44a314fa19e6a8520472ce05de92c324ad43e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 16:32:33 -07:00
Shawn O. Pearce 8279361de8 ReceivePack: Discard IndexPack as soon as possible
The IndexPack object carries a good bit of state within itself about
the objects received over the wire.  The earlier we can discard it,
the sooner the GC is able to reclaim this chunk of memory for other
uses.  So drop it as soon as we are certain the pack is valid and we
have no connectivity concerns.

Change-Id: I1e8bc87c2e9183733043622237a064e55957891f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 16:32:33 -07:00
Shawn O. Pearce 7a91b180c1 ReceivePack: fix ensureProvidedObjectsVisible on thin packs
If ensureProvidedObjectsVisible is enabled we expected any trees or
blobs directly reachable from an advertised reference to be marked
with UNINTERESTING.  Unfortunately ObjectWalk doesn't bother setting
this until the traversal is complete.  Even then it won't necessarily
set it on every tree if the corresponding commit wasn't popped.

When we are going to check the base objects for the received pack,
ensure the UNINTERESTING flag gets carried into every immediately
reachable tree or blob, because these are the ones that the client
might try to use as delta bases in a thin pack.

Change-Id: I5d5fdcf07e25ac9fc360e79a25dff491925e4101
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 16:32:23 -07:00
Shawn O. Pearce 466bec3cc9 ObjectIdSubclassMap: Correct Iterator to throw NoSuchElementException
The Iterator contract says next() shall throw NoSuchElementException
if there are no more items remaining in the iteration.  We got this
wrong when I originally wrote the implementation, so fix it.

Change-Id: Iea25e6569ead5c8b3128b8a368c5b2caebec7ecc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-04-16 16:30:21 -07:00
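The Iterator contract mentioned in the ObjectIdSubclassMap fix above can be sketched with a small stand-alone iterator. This is not the ObjectIdSubclassMap implementation: the class name and the null-padded table layout are assumptions for illustration. The relevant part is that next() checks hasNext() and throws NoSuchElementException once the iteration is exhausted.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; iterates a table in which null marks an empty slot.
final class SparseTableIterator<T> implements Iterator<T> {
    private final T[] table;
    private int pos;

    SparseTableIterator(T[] table) {
        this.table = table;
        skipEmptySlots();
    }

    private void skipEmptySlots() {
        while (pos < table.length && table[pos] == null) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        return pos < table.length;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException(); // required by the Iterator contract
        }
        T value = table[pos++];
        skipEmptySlots();
        return value;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```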