motiejus/jgit - jgit - gitea: Gitea Service

Author	SHA1	Message	Date
Shawn O. Pearce	97e93ca1ea	Merge "Remove static progress task names from PackWriter"	2010-08-05 21:11:30 -04:00
Chris Aniszczyk	b69900a415	Merge "Add "all" parameter to the commit Command"	2010-08-05 15:02:59 -04:00
Chris Aniszczyk	ad4274abcc	Merge "Add the parameter "update" to the Add command"	2010-08-05 15:01:53 -04:00
Stefan Lay	4b464ed458	Allow to replace existing Change-Id It is useful to be able to replace an existing Change-Id in the message, for example if the user decides not to amend the previous commit. Bug: 321188 Change-Id: I594e7f9efd0c57d794d2bd26d55ec45f4e6a47fd Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-05 12:23:38 -05:00
Shawn O. Pearce	60c5939b23	Rename getOldName,getNewName to getOldPath,getNewPath TreeWalk calls this value "path", while "name" is the stuff after the last slash. FileHeader should do the same thing to be consistent. Rename getOldName to getOldPath and getNewName to getNewPath. Bug: 318526 Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-04 11:00:07 -07:00
Shawn O. Pearce	7514a6dbdc	Merge branch 'js/diff' * js/diff: Fixed bug in scoring mechanism for rename detection	2010-08-04 10:59:35 -07:00
Jeff Schumacher	e64cb03065	Fixed bug in scoring mechanism for rename detection A bug in rename detection would cause file scores to be wrong. The bug was due to the way rename detection would judge the similarity between files. If file A has three lines containing 'foo', and file B has five lines containing 'foo', the rename detection phase should record that A and B have three lines in common (the minimum of the number of times that line appears in both files). Instead, it would choose the number of times the line appeared in the destination file, in this case file B. I fixed the bug by having the SimilarityIndex instead choose the minimum number, as it should. I also added a test case to verify that the bug had been fixed. Change-Id: Ic75272a2d6e512a361f88eec91e1b8a7c2298d6b	2010-08-04 10:56:19 -07:00
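The corrected counting rule (shared lines = sum over distinct lines of the minimum occurrence count) can be illustrated with a minimal Java sketch. It works on raw strings rather than the hashed line records SimilarityIndex actually keeps, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CommonLines {
    // Counts how often each distinct line occurs in one file.
    static Map<String, Integer> histogram(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }

    // Number of lines src and dst have in common: each distinct line
    // contributes min(count in src, count in dst), not the dst count.
    static int common(List<String> src, List<String> dst) {
        Map<String, Integer> dstCounts = histogram(dst);
        int shared = 0;
        for (Map.Entry<String, Integer> e : histogram(src).entrySet()) {
            Integer other = dstCounts.get(e.getKey());
            if (other != null) {
                shared += Math.min(e.getValue(), other);
            }
        }
        return shared;
    }
}
```

With three 'foo' lines in A and five in B, `common(A, B)` yields 3, matching the example in the commit message.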
Jens Baumgart	3ba1c7c068	Add gitignore support to IndexDiff and use TreeWalk IndexDiff was re-implemented and now uses TreeWalk instead of GitIndex. Additionally, gitignore support and retrieval of untracked files were added. Change-Id: Ie6a8e04833c61d44c668c906b161202b200bb509 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-04 10:03:20 -05:00
Stefan Lay	ab57af08e8	Add "all" parameter to the commit Command When the "all" parameter is set, all modified and deleted files are staged prior to the commit. Change-Id: Id23bc25730fcdd151386cd495a7cdc0935cbc00b Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-08-04 13:53:08 +02:00
Stefan Lay	fa7d9ac5b8	Add the parameter "update" to the Add command This change is mainly done for a subsequent commit which will introduce the "all" parameter to the Commit command. Bug: 318439 Change-Id: I85a8a76097d0197ef689a289288ba82addb92fc9 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-08-04 13:36:45 +02:00
Christian Halstrick	94207f0a43	Make use of Repository.writeMerge...() The CommitCommand should not use java.io to delete MERGE_HEAD and MERGE_MSG files since Repository already has utility methods for that. Change-Id: If66a419349b95510e5b5c2237a91f06c1d5ba0d4 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-07-29 15:12:14 +02:00
Christian Halstrick	fba2437111	Merge "Fix tag sorting in PlotWalk"	2010-07-28 17:13:27 -04:00
Shawn O. Pearce	5f5da8b1d4	Enable configuration of non-standard pack settings For daemons we might want to disable delta compression entirely, or in some strange case an administrator might need to turn off delta reuse. Expose these normally internal pack settings through the pack configuration section. Change-Id: I39bfefee8384c864cc04ffac724f197240c8a11a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Shawn O. Pearce	9fbce904e6	Pass PackConfig down to PackWriter when packing When we are creating a pack the higher level application should be able to override the PackConfig used, allowing it to control the number of threads used or how much memory is allocated per writer. Change-Id: I47795987bb0d161d3642082acc2f617d7cb28d8c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Shawn O. Pearce	bb99ec0aa0	Simplify UploadPack use of options during writing We only use these variables once, so just put them at the proper use site and avoid assigning the local variable. The code is a bit shorter and the intent is a little bit more clear. Change-Id: I70d120fb149b612ac93055ea39bc053b8d90a5db Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Shawn O. Pearce	1a06179ea7	Move PackWriter configuration to PackConfig This refactoring permits applications to configure global per-process settings for all packing and easily pass it through to per-request PackWriters, ensuring that the process configuration overrides the repository specific settings. For example this might help in a daemon environment where the server wants to cap the resources used to serve a dynamic upload pack request, even though the repository's own pack.* settings might be configured to be more aggressive. This allows fast but less bandwidth efficient serving of clients, while still retaining good compression through a cron managed `git gc`. Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Mathias Kinzler	6e59e6dab9	Meaningful error message when trying to check-out submodules Currently, a NullPointerException occurs in this case. We should instead throw a more meaningful Exception with a proper message. This is a very "stupid" implementation which simply checks for the existence of a ".gitmodules" file. Bug: 300731 Bug: 306765 Bug: 308452 Bug: 314853 Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	2010-07-28 11:59:07 -07:00
Christian Halstrick	08c0c5d938	Fix unit tests under windows The following tests fail under Windows because certain input streams are not closed, so their files cannot be deleted. The main problem I found is UnpackedObject.InflaterInputStream.close(). This method may throw exceptions found by checkValidEndOfStream() but doesn't call super.close() before leaving. It is not clear to me which resources a close() method should release before it throws an exception. But those resources which are not published to the outside, and which therefore cannot be closed by other means, have to be closed in all cases. I changed the close() method to call super.close() under all circumstances. Failing tests: testStandardFormat_LargeObject_TruncatedZLibStream(org.eclipse.jgit.storage.file.UnpackedObjectTest) testStandardFormat_LargeObject_TrailingGarbage(org.eclipse.jgit.storage.file.UnpackedObjectTest) testPackFormat_SmallObject(org.eclipse.jgit.storage.file.UnpackedObjectTest) Change-Id: Id2e609a29e725aad953ff9bd88af6381df38399d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-07-28 11:55:11 -07:00
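The "always call super.close()" rule maps naturally onto try/finally. The sketch below is hypothetical (CheckedStream and its trailerValid flag stand in for the real InflaterInputStream and checkValidEndOfStream()); it shows that the underlying stream is released even when end-of-stream validation throws.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CheckedStream extends FilterInputStream {
    // Hypothetical flag standing in for the real trailer validation result.
    private final boolean trailerValid;

    CheckedStream(InputStream in, boolean trailerValid) {
        super(in);
        this.trailerValid = trailerValid;
    }

    // Stand-in for checkValidEndOfStream(): fails on a bad trailer.
    private void checkValidEndOfStream() throws IOException {
        if (!trailerValid) {
            throw new IOException("corrupt stream trailer");
        }
    }

    @Override
    public void close() throws IOException {
        try {
            checkValidEndOfStream();
        } finally {
            // Runs whether or not validation threw, so the file handle
            // underneath is always released and the file can be deleted.
            super.close();
        }
    }
}
```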
Shawn O. Pearce	d0f8d1e819	Fix tag sorting in PlotWalk By deferring tag sorting until the commit is produced by the walker we can avoid an infinite loop that was triggered by trying to sort tags while allocating a commit. This also avoids needing to look at commits which aren't going to be produced in the result. Bug: 321103 Change-Id: I25acc739db2ec0221a50b72c2d2aa618a9a75f37 Reviewed-by: Mathias Kinzler <mathias.kinzler@sap.com> Reviewed-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 11:51:17 -07:00
Shawn O. Pearce	21f76c2a69	Remove static progress task names from PackWriter These need to be dynamic based on the current thread's environment at time of execution in order to be properly localized for the end user that will be seeing these messages. Change-Id: I4976f462cfe606edd2761c0e36b2f6b20f63d53c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 10:50:28 -07:00
Shawn O. Pearce	1b783d0370	Allow PackWriter callers to manage the thread pool By permitting the caller of PackWriter to select the Executor it uses for task execution, we give the caller the ability to manage the lifecycle of the thread pool, including reusing it across concurrent pack generators. This is the first step to supporting application thread pools within Daemon or another managed service like Gerrit Code Review. Change-Id: I96bee7b9c30ff9885f2bd261d0b6daaac713b5a4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 10:50:28 -07:00
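The caller-managed thread pool pattern can be sketched as follows. SliceSummer is a hypothetical stand-in for a pack generator: it only submits tasks to the Executor it is given and never creates or shuts one down, so one pool can be shared by several concurrent users. Summing integers is a placeholder for the real delta search work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

final class SliceSummer {
    private final Executor executor;

    // The caller supplies the Executor and owns its lifecycle (e.g. shutdown);
    // this class only submits work to it.
    SliceSummer(Executor executor) {
        this.executor = executor;
    }

    // Sums each slice as a separate task on the caller's executor, then
    // waits for and combines the partial results.
    long sum(List<int[]> slices) throws InterruptedException, ExecutionException {
        List<FutureTask<Long>> tasks = new ArrayList<>();
        for (int[] slice : slices) {
            FutureTask<Long> task = new FutureTask<>(() -> {
                long s = 0;
                for (int v : slice) {
                    s += v;
                }
                return s;
            });
            tasks.add(task);
            executor.execute(task);
        }
        long total = 0;
        for (FutureTask<Long> task : tasks) {
            total += task.get();
        }
        return total;
    }
}
```

Because only the Executor interface is required, a caller can also pass `Runnable::run` to execute everything on the current thread.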
Christian Halstrick	74d279fbf0	Teach NameConflictTreeWalk to report DF conflicts Add a method isDirectoryFileConflict() to NameConflictTreeWalk which tells whether the current path is part of a directory/file conflict. Change-Id: Iffcc7090aaec743dd6f3fd1a333cac96c587ae5d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 17:26:31 +02:00
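As an illustration of what a directory/file conflict is, here is a set-based Java sketch: a path is in conflict when it names a file in one tree but is a parent directory of some file in the other. NameConflictTreeWalk detects this incrementally during the walk; this hypothetical DfConflicts helper just compares two flat path sets.

```java
import java.util.Set;
import java.util.TreeSet;

final class DfConflicts {
    // Paths that are a file in one tree and a directory in the other.
    static Set<String> find(Set<String> filesA, Set<String> filesB) {
        Set<String> conflicts = new TreeSet<>();
        collect(filesA, filesB, conflicts);
        collect(filesB, filesA, conflicts);
        return conflicts;
    }

    // For each path in otherFiles, check every parent directory
    // ("a/b/c" -> "a", "a/b") against the files of the first tree.
    private static void collect(Set<String> files, Set<String> otherFiles, Set<String> out) {
        for (String other : otherFiles) {
            int slash = other.indexOf('/');
            while (slash >= 0) {
                String dir = other.substring(0, slash);
                if (files.contains(dir)) {
                    out.add(dir);
                }
                slash = other.indexOf('/', slash + 1);
            }
        }
    }
}
```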
Mathias Kinzler	51c6f513b0	Stack Overflow in EGit History View This is caused by a recursion in PlotWalk.getTags(). As a hotfix, the sort was simply removed. The sort must be re-implemented so that parseAny() is not called again (currently, this happens in the PlotRefComparator). Change-Id: I060d26fda8a75ac803acaf89cfb7d3b4317328f3 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-07-28 11:46:05 +02:00
Jeff Schumacher	396fe6da45	Break dissimilar file pairs during diff File pairs that are very dissimilar during a diff were not being broken apart into their constituent ADD/DELETE pairs. This leads to sub-optimal rename detection. Take, for example, this situation: A file exists at src/a.txt containing "foo". A user renames src/a.txt to src/b.txt, then adds a new src/a.txt containing "bar". Even though the old a.txt and the new b.txt are identical, the rename detection algorithm would not detect it as a rename since it was already paired in a MODIFY. I added code to split all MODIFYs below a certain score into their constituent ADD/DELETE pairs. This allows situations like the one I described above to be more correctly handled. Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 18:13:32 -07:00
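The splitting step can be sketched in Java as below. The Change type, its 0..100 score, and the breakScore parameter are hypothetical simplifications of JGit's DiffEntry; the commit does not state the actual threshold.

```java
import java.util.ArrayList;
import java.util.List;

final class BreakModifies {
    enum Kind { ADD, DELETE, MODIFY }

    static final class Change {
        final Kind kind;
        final String path;
        final int score; // 0..100 similarity, only meaningful for MODIFY

        Change(Kind kind, String path, int score) {
            this.kind = kind;
            this.path = path;
            this.score = score;
        }
    }

    // Replace every MODIFY scoring below breakScore with a DELETE of the old
    // content plus an ADD of the new content, so rename detection can pair
    // each half with some other file.
    static List<Change> split(List<Change> changes, int breakScore) {
        List<Change> out = new ArrayList<>();
        for (Change c : changes) {
            if (c.kind == Kind.MODIFY && c.score < breakScore) {
                out.add(new Change(Kind.DELETE, c.path, 0));
                out.add(new Change(Kind.ADD, c.path, 0));
            } else {
                out.add(c);
            }
        }
        return out;
    }
}
```

In the src/a.txt example, the "foo" to "bar" MODIFY scores low, so its DELETE half becomes free to pair with the ADD of src/b.txt.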
Christian Halstrick	f56a459966	Add methods which write MERGE_HEAD and MERGE_MSG Add methods to the Repository class which write into MERGE_HEAD and MERGE_MSG files. Since we have the read methods in the same class this seems to be the right place. Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 11:48:23 -07:00
Jens Baumgart	db82b8d7eb	Fix concurrent read / write issue in LockFile on Windows LockFile.commit fails if another thread concurrently reads the base file. The problem is fixed by retrying the rename operation if it fails. Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb Bug: 308506 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	2010-07-27 10:00:47 -07:00
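The "retry the rename if it fails" fix (used here for LockFile and below for GitIndex) can be sketched with java.io.File.renameTo. The attempt count and sleep interval are hypothetical parameters; the commit does not say how many retries JGit performs.

```java
import java.io.File;

final class RetryingRename {
    // Tries src.renameTo(dst) up to maxAttempts times, sleeping between
    // attempts, since on Windows a concurrent reader holding the target
    // open can make a single rename attempt fail transiently.
    static boolean rename(File src, File dst, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (src.renameTo(dst)) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        return false;
    }
}
```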
Robin Stocker	a00377a7e2	Fix Javadoc warnings There were some broken links, incorrect uses of @value, an invalid tag and an outdated comment. Change-Id: I22886bcc869a4b62bd606ebed40669f7b4723664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 09:40:01 -07:00
Shawn O. Pearce	80fe789690	Make forPath(ObjectReader) variant in TreeWalk This simplifies the logic for those who already have an ObjectReader on hand and want to reuse it to look up a single path. Change-Id: Ief17d6b2a0674ddb34bbc9f43121b756eae960fb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 08:36:24 -07:00
Shawn O. Pearce	7ff18f3ec9	Make StoredConfig an abstraction above FileBasedConfig This exposes a load and save method, allowing a Repository to denote that it has a persistent configuration of some kind which can be accessed by the application, without needing to know exact details of how it's stored. Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-26 16:50:11 -07:00
Shawn O. Pearce	fa9b225e06	Merge branch 'delta' * delta: (103 commits) Discard the uncompressed delta as soon as its compressed Honor pack.windowlimit to cap memory usage during packing Honor pack.threads and perform delta search in parallel Cache small deltas during packing Implement delta generation during packing debug-show-packdelta: Dump a pack delta to the console Initial pack format delta generator Add debugging toString() method to ObjectToPack Make ObjectToPack clearReuseAsIs signal available to subclasses Correctly classify the compressing objects phase Refactor ObjectToPack's delta depth setting Configure core.bigFileThreshold into PackWriter Add doNotDelta flag to ObjectToPack Add more configuration options to PackWriter Save object path hash codes during packing Add path hash code to ObjectWalk Add getObjectSize to ObjectReader Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB Define a constant for 127 in DeltaEncoder Cap delta copy instructions at 64k ... Conflicts: org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Diff.java org.eclipse.jgit/resources/org/eclipse/jgit/JGitText.properties org.eclipse.jgit/src/org/eclipse/jgit/JGitText.java org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RewriteTreeFilter.java Change-Id: I7c7a05e443a48d32c836173a409ee7d340c70796	2010-07-22 14:56:34 -07:00
Stefan Lay	ab062caa22	Allow client of Add command to set a WorkingTreeIterator This is e.g. useful when a client of the AddCommand has additional rules to ignore files. In Eclipse a resource can be set to derived or be excluded by preferences. Change-Id: I6c47e54a1ce26315faf5ed0723298ad2c2db197c Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 14:57:00 +02:00
Stefan Lay	88957f6c5a	Allow for filepattern "." in AddCommand Enable adding on repository root level. Change-Id: I415b10dc74cc9435578424d9f106c972fd703055 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 14:27:35 +02:00
Stefan Lay	aa86cfc339	Do not add ignored files in Add command Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 11:26:04 +02:00
Shawn O. Pearce	09910ffa32	Move ignore node handling into WorkingTreeIterator The working tree iterator has perfect knowledge of the path structure as well as immediate information about whether or not an ignore file even exists at this level. We can exploit that to simplify the logic and running time for testing ignored file status by pushing all of the checks down into the iterator itself. Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-21 10:34:08 -07:00
Shawn Pearce	0ec0e21fdf	Merge "Fix concurrent read / write issue in GitIndex on Windows"	2010-07-21 13:08:01 -04:00
Jens Baumgart	e99c48a61a	Fix concurrent read / write issue in GitIndex on Windows GitIndex.write fails if another thread concurrently reads the index file. The problem is fixed by retrying the rename operation if it fails. Bug: 311051 Change-Id: Ib243d2a90adae312712d02521de4834d06804944 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	2010-07-21 09:35:15 +02:00
Christian Halstrick	5c94321b47	Check for racy git in WorkingTreeIterator The WorkingTreeIterator has a method to check whether the current file differs from the corresponding index entry. This commit improves this check to also handle racy git situations. See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 21:55:18 +02:00
Christian Halstrick	c98d97731b	Smudge racily clean index entries by truncating length (like git.git) To mark an entry racily clean we set its length to 0 (like native git does). Entries which are not racily clean and have zero length can be distinguished from racily clean entries by checking P_OBJECTID against the SHA1 of empty content. When length is 0 and P_OBJECTID is different from SHA1 of empty content we know the entry is marked racily clean. See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html Change-Id: I689552931441ab51964b430b303160c9126b66af Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 21:54:36 +02:00
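The racy-git rules from the two commits above can be condensed into a small Java sketch. An entry is racily clean when its file's mtime is not older than the index file's mtime; smudging sets the recorded length to 0; and a zero length paired with an object id other than the empty blob's id (e69de29bb2d1d6434b8b29ae775ad8c2e48c5391) marks a smudged entry whose content must be rechecked. The Entry type is hypothetical and much simpler than a DirCacheEntry.

```java
final class RacyGit {
    // SHA-1 of the empty blob.
    static final String EMPTY_BLOB_ID = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391";

    static final class Entry {
        final String objectId;
        long length;
        final long lastModified;

        Entry(String objectId, long length, long lastModified) {
            this.objectId = objectId;
            this.length = length;
            this.lastModified = lastModified;
        }
    }

    // The file may have changed within the timestamp granularity of the
    // index write if its mtime is not older than the index file's mtime.
    static boolean isRacilyClean(Entry e, long indexLastModified) {
        return e.lastModified >= indexLastModified;
    }

    // Smudge by truncating the recorded length to 0, as git.git does.
    static void smudgeIfRacy(Entry e, long indexLastModified) {
        if (isRacilyClean(e, indexLastModified)) {
            e.length = 0;
        }
    }

    // Length 0 with a non-empty object id can only mean "smudged".
    static boolean isSmudged(Entry e) {
        return e.length == 0 && !EMPTY_BLOB_ID.equals(e.objectId);
    }
}
```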
Shawn O. Pearce	938943d674	Use proper constants for .gitignore and .git directory We have a constant for .gitignore, so use it. While we are in the same method, correct the reference of ".git" to be the actual GIT_DIR given. This might not be within the work tree if the GIT_DIR and GIT_WORK_TREE environment variables were used. Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 09:11:39 -07:00
Shawn O. Pearce	c59db09bc5	Remove gitIgnoreTimestamp from abstract iterator API This never should have been exposed on the top of the AbstractTreeIterator type hierarchy. There is no concept of a timestamp in a canonical tree read from the object database, and the time in the DirCache isn't what we want here either. Actually all that we need is to find the files whose names are ".gitignore" and are below the root directory. We can accomplish that with a suffix filter, and process them immediately. Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 09:09:01 -07:00
Shawn O. Pearce	395d236058	Fix NPE in RenameDetector If we have two adds of the same object but no deletes, the detector threw an NPE because the entry that came back from the deleted map was null (no matching objects). In this case we need to put all the adds back onto the list of leftover additions, since they did not match a delete. Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Jeffrey Schumacher <jeffschu@google.com>	2010-07-20 07:52:35 -07:00
Shawn O. Pearce	b518189b5c	IndexPack: Fix spurious pack file corruption errors We didn't correctly handle the zlib trailer for an object. If the trailer bytes were outside of the current buffer window but we had fully inflated the object itself, we broke out of the loop (as we had our target size) but inflate wasn't finished (as it did not yet get the trailer) so we failed the test and threw a corruption exception. Use an infinite loop and only break out when the inflater is done. Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-20 07:40:48 -07:00
Shawn O. Pearce	12fe0f2d1e	Discard the uncompressed delta as soon as its compressed The DeltaCache will most likely need to copy the compressed delta into a new buffer in order to compact away the wasted space at the end caused by over allocation. Since we don't need the uncompressed format anymore, null out our only reference to it so the GC can reclaim this memory if it needs to perform a collection in order to satisfy the cache's allocation attempt. Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-16 10:41:09 -07:00
Shawn O. Pearce	6e155d5f41	Merge branch 'js/rename' * js/rename: Implemented file path based tie breaking to exact rename detection Added more test cases for RenameDetector Added very small optimization to exact rename detection Fixed Misleading Javadoc Added file path similarity to scoring metric in rename detection Fixed potential div by zero bug Added file size based rename detection optimization Create FileHeader from DiffEntry log: Implement --follow Cache the diff configuration section log: Add whitespace ignore options Format submodule links during differences Redo DiffFormatter API to be easier to use log, diff: Add rename detection support Implement similarity based rename detection Added a preliminary version of rename detection Refactored code out of FileHeader to facilitate rename detection	2010-07-16 10:22:15 -07:00
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method even though it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate routine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-16 10:12:04 -07:00
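Both Inflater pitfalls described here and in the IndexPack corruption fix above (stopping before the zlib trailer is consumed, and spinning when inflate() returns 0 without wanting input) can be avoided by one loop shape, sketched below with java.util.zip.Inflater. InflateLoop and its chunked feeding are hypothetical; chunkSize only simulates a sliding buffer window.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class InflateLoop {
    // Inflates exactly expectedSize bytes from zlib data, feeding input in
    // chunks of chunkSize bytes, and stops only once the Inflater is finished.
    static byte[] inflate(byte[] data, int expectedSize, int chunkSize) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            byte[] out = new byte[expectedSize];
            byte[] overflow = new byte[1];
            int outPos = 0;
            int inPos = 0;
            for (;;) {
                int n;
                if (outPos < out.length) {
                    n = inf.inflate(out, outPos, out.length - outPos);
                    outPos += n;
                } else {
                    // All bytes produced, but keep going so the zlib trailer is
                    // consumed; any extra output means the object is too large.
                    n = inf.inflate(overflow);
                    if (n > 0) {
                        throw new DataFormatException("object larger than expected");
                    }
                }
                if (inf.finished()) {
                    break;
                }
                if (n == 0) {
                    if (!inf.needsInput()) {
                        // 0 bytes, not finished, no input wanted: a dictionary
                        // request. Abort instead of looping forever.
                        throw new DataFormatException("inflater cannot make progress");
                    }
                    if (inPos >= data.length) {
                        throw new DataFormatException("truncated stream");
                    }
                    int len = Math.min(chunkSize, data.length - inPos);
                    inf.setInput(data, inPos, len);
                    inPos += len;
                }
            }
            if (outPos != out.length) {
                throw new DataFormatException("object smaller than expected");
            }
            return out;
        } finally {
            inf.end();
        }
    }
}
```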
Jeff Schumacher	31311cacfd	Implemented file path based tie breaking to exact rename detection During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whose path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content-based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611	2010-07-16 09:56:42 -07:00
Christian Halstrick	b840ed0121	Added dirty-detection to WorkingTreeIterator Added possibility to compare the current entry of a WorkingTreeIterator to a given DirCacheEntry. This is done to detect whether an entry in the index is dirty or not. 'Dirty' means that the file in the working tree is different from what's in the index. Merge algorithms will make use of this to detect conflicts. Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-16 10:08:52 +02:00
Shawn Pearce	19473b1dbc	Merge "Handle the tilde notation (~user) of git url"	2010-07-15 17:29:21 -04:00
Robin Rosenberg	845714158a	Handle the tilde notation (~user) of git url When the path is prefixed with ~, the URI parser treated it as /~. Strip the / if the next character is the tilde. Bug: 307017 Change-Id: I58203e5617956b46d83e8987d1f8042beddffac3 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-15 01:16:09 +02:00
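The fix amounts to a one-line path normalization. The helper below is a hypothetical sketch of that rule, not JGit's actual URIish parsing code.

```java
final class TildePath {
    // A parsed path such as "/~user/repo.git" really means "~user/repo.git"
    // (relative to that user's home), so drop the slash before a tilde.
    static String fixTilde(String path) {
        if (path != null && path.length() >= 2
                && path.charAt(0) == '/' && path.charAt(1) == '~') {
            return path.substring(1);
        }
        return path;
    }
}
```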
Stefan Lay	233e0130b5	Git Porcelain API: Add Command The new Add command adds files to the Git Index. It uses the DirCache to access the git index. It works also in case of an existing conflict. Fileglobs (e.g. *.c) are not yet supported. The new Add command does add ignored files because there is no gitignore support in jgit yet. Bug: 318440 Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-14 11:24:58 +00:00
Shawn Pearce	0ef99921fa	Merge changes I104cd62f,I1d0238b4 * changes: Internationalize RepositoryState descriptions Say that commit is allowed during bisect	2010-07-13 20:36:25 -04:00
Charley Wang	b878cdcf6b	Add compatibility with gitignore specifications This patch adds ignore compatibility to jgit. It encompasses exclude files as well as .gitignore. Uses TreeWalk and FileTreeIterator to find nodes and parses .gitignore files when required. The patch includes a simple cache that can be used to save results and avoid excessive gitignore parsing. CQ: 4302 Bug: 303925 Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-13 00:34:15 +02:00
Jeff Schumacher	bc08fafb41	Added very small optimization to exact rename detection Optimized a small loop in findExactRenames. The loop would go through all the items in a list of DiffEntries even after it already found what it was looking for. I made it break out of the loop as soon as a good match was found. Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61	2010-07-12 12:54:01 -07:00
Jeff Schumacher	a20e6f6fec	Fixed Misleading Javadoc The javadoc for the setRenameLimit method in RenameDetector said that you could only have limits in the range (0,100), implying that 0 and 100 were illegal inputs. The code, however, allowed 0 and 100. I changed the javadoc to say that the range [0,100] was legal. I also documented the IllegalArgumentException that is thrown if the limit is outside that range. Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2	2010-07-12 12:54:01 -07:00
Jeff Schumacher	9a48de86d8	Added file path similarity to scoring metric in rename detection The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def	2010-07-12 12:52:05 -07:00
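The weighting can be written as score = 0.99 * content + 0.005 * nameSimilarity + 0.005 * directorySimilarity, with every term on a 0..100 scale. The Java sketch below follows that blend. Note that the prefixSimilarity measure is a hypothetical placeholder, because the commit does not say how path similarity itself is computed.

```java
final class RenameScore {
    // Blend from the commit message: 99% content, 0.5% name, 0.5% directory.
    static double score(double content, double name, double directory) {
        return 0.99 * content + 0.005 * name + 0.005 * directory;
    }

    // "foo.java" in "src/com/foo/bar/foo.java".
    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // "src/com/foo/bar/" in "src/com/foo/bar/foo.java"; "" for top-level files.
    static String directory(String path) {
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    // Hypothetical measure (0..100): common prefix length over the longer string.
    static double prefixSimilarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 100;
        }
        int max = Math.min(a.length(), b.length());
        int common = 0;
        while (common < max && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        return 100.0 * common / longer;
    }

    static double pathScore(double contentScore, String oldPath, String newPath) {
        return score(contentScore,
                prefixSimilarity(fileName(oldPath), fileName(newPath)),
                prefixSimilarity(directory(oldPath), directory(newPath)));
    }
}
```

For identical content moved from a/x.txt to b/x.txt, this gives 99 + 0.5 + 0 = 99.5.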
Jeff Schumacher	4c14b7869d	Fixed potential div by zero bug The scoring logic in SimilarityIndex was dividing by the max file size. If both files are empty, this would cause a div by zero error. This case cannot currently happen, since two empty files would have the same SHA1, and would therefore be caught in the earlier SHA1 based detection pass. Still, if this logic eventually gets separated from that pass, a div by zero error would occur. I changed the logic to instead consider two empty files to have a similarity score of 100. Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a	2010-07-12 12:24:42 -07:00
Jeff Schumacher	64b9458640	Added file size based rename detection optimization Prior to this change, files that were very different in size (enough so that they could not have enough in common to be detected as renames) were still having their scores calculated. I added an optimization to skip such files. For example, if the rename detection threshold is 60%, the larger file is 200kb, and the smaller file is 50kb, the pair cannot be counted as a rename since they cannot possibly share 60% of their content in common. (200*.6=120, 120>50) Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc	2010-07-12 12:24:42 -07:00
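The two guards above (size-based pruning, and scoring two empty files as 100 instead of dividing by zero) can be combined into one small Java sketch. At most `smaller` bytes can be shared, so smaller * 100 / larger is an upper bound on the score. The class and method names are hypothetical.

```java
final class SizePruning {
    // Upper bound (0..100) on the similarity of two files given only sizes.
    static int maxScore(long sizeA, long sizeB) {
        long smaller = Math.min(sizeA, sizeB);
        long larger = Math.max(sizeA, sizeB);
        if (larger == 0) {
            return 100; // two empty files: identical, and no division by zero
        }
        return (int) (smaller * 100 / larger);
    }

    // Skip scoring a pair that cannot reach the rename threshold.
    static boolean canBeRename(long sizeA, long sizeB, int threshold) {
        return maxScore(sizeA, sizeB) >= threshold;
    }
}
```

For the 200 KB / 50 KB example, `maxScore` is 25, below a 60% threshold, so the pair is skipped without computing a content score.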
Robin Rosenberg	d787a82e50	Internationalize RepositoryState descriptions Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 10:24:37 +02:00
Shawn O. Pearce	9734194917	Honor pack.windowlimit to cap memory usage during packing The pack.windowlimit configuration parameter places an upper bound on the number of bytes used by the DeltaWindow class as it scans through the object list. If memory usage would exceed the limit the window is temporarily decreased in size to keep memory used within that bound. Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:19:07 -07:00
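A simplified Java model of a size-and-memory-bounded delta window is shown below. BoundedWindow is hypothetical and much simpler than DeltaWindow: it evicts the oldest candidates whenever either the object count or the byte budget is exceeded, but always keeps the newest entry, so a single oversized object can still exceed the budget.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BoundedWindow {
    private final int maxObjects;   // analogous to the window size
    private final long memoryLimit; // analogous to pack.windowlimit; 0 = unlimited
    private final Deque<byte[]> window = new ArrayDeque<>();
    private long used;

    BoundedWindow(int maxObjects, long memoryLimit) {
        this.maxObjects = maxObjects;
        this.memoryLimit = memoryLimit;
    }

    // Adds a candidate, then shrinks the window from the oldest end while
    // it holds too many objects or too many bytes.
    void add(byte[] data) {
        window.addLast(data);
        used += data.length;
        while (window.size() > 1
                && (window.size() > maxObjects || (memoryLimit > 0 && used > memoryLimit))) {
            used -= window.removeFirst().length;
        }
    }

    int size() {
        return window.size();
    }

    long bytesUsed() {
        return used;
    }
}
```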
Shawn O. Pearce	74e0835012	Honor pack.threads and perform delta search in parallel If we have multiple CPUs available, packing usually goes faster when each CPU is assigned a slice of the available search space. The number of threads to use is guessed from the runtime if it wasn't set by the caller, or wasn't set in the configuration. Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:17:30 -07:00
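Here is a minimal Java sketch of the thread-count fallback and of dividing the search space into per-thread slices. It uses Runtime.availableProcessors() as the guess. The even contiguous split is an assumption for illustration; the commit does not describe how PackWriter actually partitions the list.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadSlices {
    // Use the configured count if set, otherwise guess from the runtime.
    static int threadCount(int configured) {
        return configured > 0 ? configured : Runtime.getRuntime().availableProcessors();
    }

    // Splits [0, n) into at most `threads` contiguous, near-equal slices,
    // each returned as {startInclusive, endExclusive}.
    static List<int[]> slices(int n, int threads) {
        int t = Math.max(1, threads);
        List<int[]> out = new ArrayList<>();
        int base = n / t;
        int extra = n % t;
        int start = 0;
        for (int i = 0; i < t; i++) {
            int len = base + (i < extra ? 1 : 0);
            if (len > 0) {
                out.add(new int[] {start, start + len});
                start += len;
            }
        }
        return out;
    }
}
```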
Shawn O. Pearce	a960d1429e	Cache small deltas during packing PackWriter now caches small deltas, or deltas that are very tiny compared to their source inputs, so that the writing phase goes faster by reusing those cached deltas. The cached data is stored compressed, which usually translates to a bigger footprint due to deltas being very hard to compress, but saves time during writing by avoiding the deflate step. They are held under SoftReferences so that the JVM GC can clear out deltas if memory gets very tight. We would rather continue working and spend a bit more CPU time during writing than crash due to OOME. To avoid OutOfMemoryErrors during the caching phase we also trap OOME and just abort out of the caching. Because deflateBound() always produces something larger than what we need to actually store the deflated data, we copy it over into a new buffer if the actual length doesn't match the buffer length. When packing jgit.git this saves over 111 KiB in the cache, and is thus a worthwhile hit on CPU time. To further save memory we store the inflated size of the delta (which we need for the object header) in the same field as the pathHash, as the pathHash is no longer necessary by this phase of the packing algorithm. Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:15:54 -07:00
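The caching idea (store the delta deflated, trim the over-allocated output buffer to its real length, and hold it under a SoftReference so the GC may drop it) can be sketched with java.util.zip.Deflater. CachedDelta is hypothetical. Java's Deflater has no deflateBound(), so the initial buffer size here is only a rough over-estimate, and it grows if that estimate turns out too small.

```java
import java.lang.ref.SoftReference;
import java.util.Arrays;
import java.util.zip.Deflater;

final class CachedDelta {
    final int inflatedSize;                  // needed later for the object header
    private SoftReference<byte[]> compressed; // may be cleared under memory pressure

    CachedDelta(byte[] delta) {
        inflatedSize = delta.length;
        Deflater def = new Deflater();
        try {
            def.setInput(delta);
            def.finish();
            // Over-allocate, deflate, then copy into an exact-size array if the
            // buffer is larger than the actual compressed length.
            byte[] buf = new byte[delta.length + delta.length / 1000 + 64];
            int n = 0;
            while (!def.finished()) {
                if (n == buf.length) {
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
                n += def.deflate(buf, n, buf.length - n);
            }
            compressed = new SoftReference<>(n == buf.length ? buf : Arrays.copyOf(buf, n));
        } finally {
            def.end();
        }
    }

    // Null if the GC reclaimed the data; the caller must then recompute the delta.
    byte[] get() {
        return compressed.get();
    }
}
```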
Shawn O. Pearce	dfad23bf3d	Implement delta generation during packing PackWriter now produces new deltas if there is not a suitable delta available for reuse from an existing pack file. This permits JGit to send less data on the wire by sending a delta relative to an object the other side already has, instead of sending the whole object. The delta searching algorithm is similar in style to what C Git uses, but apparently has some differences (see below for more on this). Briefly, objects that should be considered for delta compression are pushed onto a list. This list is then sorted by a rough similarity score, which is derived from the path name the object was discovered at in the repository during object counting. The list is then walked in order. At each position in the list, up to $WINDOW objects prior to it are attempted as delta bases. Each object in the window is tried, and the shortest delta instruction sequence selects the base object. Some rough rules are used to prevent pathological behavior during this matching phase, like skipping pairings of objects that are not similar enough in size. PackWriter intentionally excludes commits and annotated tags from this new delta search phase. In the JGit repository only 28 out of 2600+ commits can be delta compressed by C Git. As the commit count tends to be a fair percentage of the total number of objects in the repository, and they generally do not delta compress well, skipping over them can improve performance with little increase in the output pack size. Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved over the years in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly the same rules everywhere, and that leads JGit to produce different (but logically equivalent) pack files.
Repository | Pack Size (bytes)                  | Packing Time
           | JGit - CGit = Difference           | JGit / CGit
-----------+------------------------------------+------------------
git        | 25094348 - 24322890 = +771458      | 59.434s / 59.133s
jgit       |  5669515 -  5709046 =  -39531      |  6.654s /  6.806s
linux-2.6  | 389M - 386M = +3M                  | 20m02s / 18m01s
For the above tests pack.threads was set to 1, window size=10, delta depth=50, and delta and object reuse were disabled for both implementations. Both implementations were reading from an already fully packed repository on local disk. The running time reported is after 1 warm-up run of the tested implementation. PackWriter is writing 771 KB more data on git.git, 3M more on linux-2.6, but is actually 39.5 KB smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack. On the running time side, JGit is at a major disadvantage because linux-2.6 doesn't fit into the default WindowCache of 20M, while C Git is able to mmap the entire pack and have it available instantly in physical memory (assuming hot cache). C Git also has a feature where it caches deltas that were created during the compression phase, and uses those cached deltas during the writing phase. PackWriter does not implement this (yet), and therefore must create every delta twice. This could easily account for the increased running time we are seeing. Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:14:18 -07:00
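The window search described in this commit can be sketched in Java. `DeltaWindowSketch`, its `Obj` type, the 2x size rule and the pluggable `deltaSize` function are all hypothetical stand-ins, not JGit's DeltaWindow. The sketch only shows the overall shape:

1. Sort the candidates so that similar paths cluster together.
2. Try up to `window` earlier objects as bases for each target.
3. Keep the base that gives the shortest delta.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntBiFunction;

final class DeltaWindowSketch {
    /** Hypothetical stand-in for an object awaiting delta search. */
    static final class Obj {
        final String name;
        final int pathHash; // similarity key derived from the path during counting
        final int size;
        Obj base;           // chosen delta base, or null to store the whole object

        Obj(String name, int pathHash, int size) {
            this.name = name;
            this.pathHash = pathHash;
            this.size = size;
        }
    }

    /**
     * Sorts candidates so objects with similar paths are adjacent, then tries up to
     * {@code window} preceding objects as bases and keeps the shortest delta.
     */
    static void search(List<Obj> objs, int window, ToIntBiFunction<Obj, Obj> deltaSize) {
        List<Obj> list = new ArrayList<>(objs);
        // Group by path hash; within a group, larger objects come first.
        list.sort(Comparator.comparingInt((Obj o) -> o.pathHash).thenComparingInt(o -> -o.size));
        for (int i = 0; i < list.size(); i++) {
            Obj target = list.get(i);
            int best = target.size; // a delta must be smaller than the whole object
            for (int j = Math.max(0, i - window); j < i; j++) {
                Obj src = list.get(j);
                // Assumed rough rule: skip pairs whose sizes differ by more than 2x.
                if (src.size > 2 * target.size || target.size > 2 * src.size) {
                    continue;
                }
                int d = deltaSize.applyAsInt(src, target);
                if (d < best) {
                    best = d;
                    target.base = src;
                }
            }
        }
    }
}
```

In a real packer `deltaSize` would run the delta encoder. Here any cost function can be plugged in to exercise the selection logic.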
Shawn O. Pearce	074055d747	debug-show-packdelta: Dump a pack delta to the console This is a horribly crude application, it doesn't even verify that the object it's dumping is delta encoded. Its method of getting the delta is pretty abusive to the public PackWriter API, because right now we don't want to expose the real internal low-level methods actually required to do this. Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:12:32 -07:00
Shawn O. Pearce	8612c0ace1	Initial pack format delta generator DeltaIndex is a simple pack style delta generator. The function works by creating a compact index of a source buffer's blocks, and then walking a sliding window along a desired result buffer, searching for the window in the index. When a match is found, the window is stretched to the longest possible length that is common with the source buffer, and a copy instruction is created. Rabin's polynomial hash function is used to compute the hash for a block, permitting efficient sliding of the window in single byte increments. The update function to slide one byte originated from David Mazieres' work in LBFS, and our implementation of the update step was certainly inspired by the initial work Geert Bosch proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2. To ensure the encoder runs in linear time with respect to the size of the two input buffers (source and result), the maximum number of blocks that can share the same position in the index's hashtable is capped at a constant number. This prevents bad inputs from causing the encoder to run in quadratic time, but comes with a penalty of creating a longer delta due to fewer considered copy positions. Strange hackery is used to cap the amount of memory used by the index to be no more than 12 bytes for every 16 bytes of source buffer, no matter what the JVM per-object overhead is. This keeps the source buffer plus its index to no more than 1.75x the source buffer length, which is an important feature to support large windows of candidates to match against while packing. Here the strange hackery is nothing more than a manually managed chained hashtable, where pointers are array indexes into storage arrays rather than object references. Computation of the hash function for a single fixed sized block is done through an unrolled loop, where the first 4 iterations have been manually reduced down to eliminate unnecessary instructions. 
The pattern is derived from ObjectId.equals(byte[], int, byte[], int), where we have unrolled the loop required to compare two 20 byte arrays. Hours of testing with the Sun 1.6 JRE concluded that the non-obvious "foo[idx + 1]" style of reference is faster than "foo[idx++]", and so that is what we use here during hashing. Change-Id: If9fb2a1524361bc701405920560d8ae752221768 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:10:55 -07:00
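The key property of the block hash is that sliding the window by one byte is O(1). This can be illustrated with a generic polynomial rolling hash (multiplier 31, arithmetic mod 2^32 via Java `int` overflow). That is an assumed simplification, not Rabin's fingerprint or JGit's table-driven DeltaIndex code. `RollingHashSketch`, `BLOCK` and `P` are hypothetical names. The update removes the outgoing byte's contribution, shifts by one power of `P`, and adds the incoming byte.

```java
final class RollingHashSketch {
    static final int BLOCK = 16; // assumed block size
    static final int P = 31;     // assumed multiplier, not Rabin's polynomial

    /** P^(BLOCK-1) mod 2^32, used to remove the byte leaving the window. */
    static final int P_POW;

    static {
        int p = 1;
        for (int i = 0; i < BLOCK - 1; i++) {
            p *= P; // int overflow wraps, giving arithmetic mod 2^32
        }
        P_POW = p;
    }

    /** Hash of buf[off, off + BLOCK) computed from scratch. */
    static int hash(byte[] buf, int off) {
        int h = 0;
        for (int i = 0; i < BLOCK; i++) {
            h = h * P + (buf[off + i] & 0xff);
        }
        return h;
    }

    /** Slides the window one byte: drop {@code out}, append {@code in}. */
    static int step(int h, byte out, byte in) {
        return (h - (out & 0xff) * P_POW) * P + (in & 0xff);
    }
}
```

Because every operation wraps consistently mod 2^32, `step` always agrees with recomputing `hash` from scratch at the new offset.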
Shawn O. Pearce	b38426ae8c	Add debugging toString() method to ObjectToPack It's useful to know what the flags are or what the base that was selected is. Dump these out as part of the object's toString. Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:09:19 -07:00
Shawn O. Pearce	699e4aa7c5	Make ObjectToPack clearReuseAsIs signal available to subclasses A subclass may want to use this method to release handles that are caching reuse information. Make it protected so they can override it and update themselves. Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:07:45 -07:00
Shawn O. Pearce	4569d77e13	Correctly classify the compressing objects phase Searching for reuse candidates should be fast compared to actually doing delta compression. So pull the progress monitor out of this phase and rename it back to identify the compressing objects state. Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:06:10 -07:00
Shawn O. Pearce	85b7a53d52	Refactor ObjectToPack's delta depth setting Long ago, when PackWriter was first written, we thought that the delta depth could be updated automatically. But it's never used. Instead make this a simple standard setter so the caller can more directly set the delta depth of this object. This permits us to configure a depth that takes into account more than just the depth of another object in this same pack. Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:04:35 -07:00
Shawn O. Pearce	6730f9e3c8	Configure core.bigFileThreshold into PackWriter C Git's fast-import uses this to determine the maximum file size that it tries to delta compress; anything equal to or above this setting is stored as a whole object with simple deflate. Define the configuration so we can use it later. Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:02:54 -07:00
Shawn O. Pearce	823e9a9721	Add doNotDelta flag to ObjectToPack This flag will later control whether or not PackWriter searches for a delta base for this object. Edge objects will never get searched, as the writer won't be outputting them, so they should always have this flag set on. Sometime in the future this flag should also be set for file blobs on file paths that have the "-delta" gitattribute set in the repository's attributes file. Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:49 -07:00
Shawn O. Pearce	616bc74cf7	Add more configuration options to PackWriter We now at least import other pack settings like pack.window, which means we can later use these to control how we search for deltas. The compression level was fixed to use pack.compression rather than the loose object core.compression setting. Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:46 -07:00
Robin Rosenberg	a1492f1922	Say that commit is allowed during bisect C Git allows this and it is quite handy. Change-Id: I1d0238b43fca931ad2079649fb7b431e2815c351 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 02:32:46 +02:00
Shawn O. Pearce	2f93a09dd1	Save object path hash codes during packing We need to remember these so we can later cluster objects that have similar file paths near each other as we search for deltas between them. Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 15:17:26 -07:00
Shawn O. Pearce	c20daa7314	Add path hash code to ObjectWalk PackWriter wants to categorize objects that are similar in path name, so blobs that are probably from the same file (or same sort of file) can be delta compressed against each other. Avoid converting into a string by performing the hashing directly against the path buffer in the tree iterator. We only hash the last 16 bytes of the path, and we try to avoid any spaces, as we want the suffix of a file such as ".java" to be more important than the directory it is in, like "src". Change-Id: I31770ee711526306769a6f534afb19f937e0ba85 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
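The path hash described above can be sketched as follows. It uses the shift-and-add scheme of C Git's `name_hash()` in pack-objects, applied to the last 16 bytes with spaces skipped. This is an assumed model, not JGit's exact ObjectWalk code, and `PathHashSketch` is a hypothetical name. Each byte enters at the top of the word and moves two bits right per later byte. Trailing characters such as ".java" therefore dominate the sort order, and leading directories barely matter.

```java
import java.nio.charset.StandardCharsets;

final class PathHashSketch {
    /**
     * Hashes only the last 16 bytes of a path, skipping spaces. Each byte enters
     * at the top of the word and is shifted right two bits per later byte, so the
     * trailing characters (such as ".java") weigh the most.
     */
    static int pathHash(byte[] path, int len) {
        int hash = 0;
        for (int i = Math.max(0, len - 16); i < len; i++) {
            int c = path[i] & 0xff;
            if (c == ' ') {
                continue;
            }
            hash = (hash >>> 2) + (c << 24);
        }
        return hash;
    }

    /** Convenience overload for tests; the real walker hashes its path buffer directly. */
    static int pathHash(String path) {
        byte[] b = path.getBytes(StandardCharsets.UTF_8);
        return pathHash(b, b.length);
    }
}
```

For example, `x/aaaa/src/Foo.java` and `yyyyy/aaaa/src/Foo.java` share their last 16 bytes, so both hash to the same value.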
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Shawn O. Pearce	97311cd3e0	Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB If the heap limit was set to something smaller than 8 KiB, we were still allocating the full 8 KiB block size, and accepting data up to the full amount we had allocated. Instead, put a hard cap on the limit. Change-Id: Id1da26fde2102e76510b1da4ede8493928a981cc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Matthias Sohn	b8f2bb7d2a	Add support for updateNeeded flag in DirCacheEntry Change-Id: If06ff41d9ccd422afbc79ecbc3cfdf8bb2508dcd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-09 14:12:06 +02:00
Jeff Schumacher	a8b29afd82	Create FileHeader from DiffEntry Added support for converting DiffEntrys to FileHeaders. FileHeaders are DiffEntrys with a buffer containing the diff output as well as a list of HunkHeaders. The HunkHeaders contain EditLists. The createFileHeader(DiffEntry) method in DiffFormatter performs a Myers Diff on the files referred to by the DiffEntry, then puts the returned EditList into a single HunkHeader, which is then put into the FileHeader to be returned. It also generates the appropriate diff header and puts it into the FileHeader's buffer. The rest of the diff output, which would normally be parsed to generate the HunkHeaders, is not generated. In fact, the purpose of this method is to avoid the costly diff output generation and parsing normally required to create a FileHeader. Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887	2010-07-08 16:58:55 -07:00
Stefan Lay	354b90131a	Fix javadoc typos in JGit API There were some small errors which made it difficult to read the JavaDoc. Change-Id: Ib3b34353465162adebaca3514d596d0edf5aea51 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-08 10:42:29 +02:00
Shawn O. Pearce	711bd3e3d0	Define a constant for 127 in DeltaEncoder The special value 127 here means how many bytes we can put into a single insert command. Rather than use the magical value 127, let's name it to better document the code. Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	cd7dd8591e	Cap delta copy instructions at 64k Although all modern delta decoders can process copy instructions with a count as large as 0xffffff (just under 16 MiB), pack version 2 streams are only supposed to use delta copy instructions up to 64 KiB. Rewrite our copy instruction encode loop to use the lower 64 KiB limit, even though modern decoders would support longer copies. To improve encoding performance we now try to encode up to four full copy commands in our buffer before we flush it to the stream, but we don't try to implement full buffering here. We are just trying to amortize the virtual method call to the destination stream when we have to do a large copy. Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
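The 64 KiB cap can be shown with a sketch of the standard pack delta copy instruction:

- The command byte has its high bit set.
- Flag bits 0-3 mark which little-endian offset bytes follow.
- Flag bits 4-6 mark which size bytes follow.
- A decoded size of 0 means 0x10000.

With the cap, a full 64 KiB chunk needs no size bytes at all. `CopyEncoderSketch` is a hypothetical helper, not DeltaEncoder itself. It also omits DeltaEncoder's batching of up to four commands per buffer flush.

```java
import java.io.ByteArrayOutputStream;

final class CopyEncoderSketch {
    /** Largest copy a pack version 2 stream is supposed to use: 64 KiB. */
    static final int MAX_V2_COPY = 0x10000;

    /** Emits copy instructions for base[offset, offset + length), split at 64 KiB. */
    static void copy(ByteArrayOutputStream out, long offset, int length) {
        while (length > 0) {
            int size = Math.min(length, MAX_V2_COPY);
            encodeCopy(out, offset, size);
            offset += size;
            length -= size;
        }
    }

    private static void encodeCopy(ByteArrayOutputStream out, long offset, int size) {
        byte[] args = new byte[6]; // at most 4 offset bytes + 2 size bytes
        int n = 0;
        int cmd = 0x80; // high bit set marks a copy instruction
        for (int i = 0; i < 4; i++) {
            int b = (int) (offset >>> (8 * i)) & 0xff;
            if (b != 0) { // zero bytes are omitted and their flag bit left clear
                cmd |= 1 << i;
                args[n++] = (byte) b;
            }
        }
        if (size != MAX_V2_COPY) { // a size of 0x10000 is written as "no size bytes"
            for (int i = 0; i < 2; i++) {
                int b = (size >>> (8 * i)) & 0xff;
                if (b != 0) {
                    cmd |= 0x10 << i;
                    args[n++] = (byte) b;
                }
            }
        }
        out.write(cmd);
        out.write(args, 0, n);
    }

    /** Decodes one copy instruction: returns {offset, size, nextPosition}. */
    static long[] decodeCopy(byte[] in, int pos) {
        int cmd = in[pos++] & 0xff;
        long offset = 0;
        int size = 0;
        for (int i = 0; i < 4; i++) {
            if ((cmd & (1 << i)) != 0) {
                offset |= (long) (in[pos++] & 0xff) << (8 * i);
            }
        }
        for (int i = 0; i < 3; i++) {
            if ((cmd & (0x10 << i)) != 0) {
                size |= (in[pos++] & 0xff) << (8 * i);
            }
        }
        if (size == 0) {
            size = 0x10000; // format rule: an all-zero size means 64 KiB
        }
        return new long[] {offset, size, pos};
    }
}
```

For example, copying 0x10005 bytes from offset 0x12345 produces two instructions: 64 KiB at 0x12345, then 5 bytes at 0x22345.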
Shawn O. Pearce	384a19eee0	Deprecate all of the older Tree related code We want to get rid of these APIs, because they don't perform as well as DirCache/TreeWalk, or don't offer nearly as many features. Bug: 319145 Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:15:02 -07:00
Shawn O. Pearce	a215914a56	Fix DeltaEncoder header for objects 128 bytes long The encode loop had the wrong condition: objects that are 128 bytes in size need to have their length encoded as two bytes, not one. Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 08:53:03 -07:00
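Delta headers store the base and result lengths as little-endian groups of 7 bits, with the high bit meaning "more bytes follow". The boundary in the fix above is the `>= 0x80` test: a length of exactly 128 does not fit in 7 bits, so it needs a second byte. `DeltaHeaderSketch` is a hypothetical helper illustrating the encoding, not DeltaEncoder's code.

```java
import java.io.ByteArrayOutputStream;

final class DeltaHeaderSketch {
    /** Writes a length as 7-bit groups, least significant first; high bit = more. */
    static void writeVarint(ByteArrayOutputStream out, long len) {
        // Keep emitting while more than 7 bits remain: 128 needs two bytes.
        while (len >= 0x80) {
            out.write((int) ((len & 0x7f) | 0x80));
            len >>>= 7;
        }
        out.write((int) len);
    }

    static byte[] encode(long len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, len);
        return out.toByteArray();
    }
}
```

With this condition, 127 encodes as `7f`, 128 as `80 01`, and 300 as `ac 02`.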
Shawn O. Pearce	f29741d1d8	amend commit: Support large delta packed objects as streams Rename the ByteWindow's inflate() method to setInput. We have completely refactored the purpose of this method to be feeding part (or all) of the window as input to the Inflater, and the actual inflate activity happens in the caller. Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:06 -07:00
Shawn O. Pearce	ab3c68c512	amend commit: Support large loose objects as streams We need to validate the stream state after the InflaterInputStream thinks the stream is done. Git expects a higher level of service from the Inflater than the InflaterInputStream usually gives; we need to ensure the embedded CRC is valid, and that there isn't trailing garbage at the end of the file. Change-Id: I1c9642a82dbd76b69e607dceccf8b85dc869a3c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:01 -07:00
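The extra validation can be sketched with `java.util.zip.Inflater` directly. The sketch inflates exactly the declared size, then drives the inflater to the end of the zlib stream. At that point zlib verifies its Adler-32 trailer (the checksum the message calls the embedded CRC) and throws `DataFormatException` on a mismatch. Finally it checks `getRemaining()` for trailing garbage. `StrictInflateSketch` is a hypothetical name, and this is not JGit's loose object loader.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class StrictInflateSketch {
    /**
     * Inflates a complete zlib stream of known size and rejects it unless the
     * stream ends exactly at the declared size with no trailing bytes.
     */
    static byte[] inflateExactly(byte[] compressed, int expectedSize) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(compressed);
            byte[] out = new byte[expectedSize];
            int n = 0;
            while (n < expectedSize) {
                int r = inf.inflate(out, n, expectedSize - n);
                if (r == 0 && (inf.finished() || inf.needsInput() || inf.needsDictionary())) {
                    throw new DataFormatException("stream ended before declared size");
                }
                n += r;
            }
            // Drive the inflater to the end of the stream so zlib checks its trailer.
            byte[] probe = new byte[1];
            while (!inf.finished()) {
                if (inf.inflate(probe) != 0) {
                    throw new DataFormatException("more data than declared");
                }
                if (inf.needsInput() || inf.needsDictionary()) {
                    throw new DataFormatException("truncated stream");
                }
            }
            if (inf.getRemaining() != 0) {
                throw new DataFormatException("trailing garbage after stream");
            }
            return out;
        } finally {
            inf.end();
        }
    }
}
```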
Stefan Lay	311da9b211	Fix comparison of nanoseconds NB.decodeInt32(info, base + 4) already returns nanoseconds. Therefore it must not be divided by 1000000. Change-Id: Ie8f5c4a03f984d98935dccedc2b1ba4457094899 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-06 17:57:17 +02:00
Shawn O. Pearce	1913b41bc7	log: Implement --follow The FollowFilter can be installed on a RevWalk to cause the path to be updated through rename detection when the affected file is found to be added to the project. The filter works reasonably well, for example we can follow the history of the fsck command in git-core:
    $ jgit log --name-status --follow builtin/fsck.c | grep ^R
    R100  builtin-fsck.c  builtin/fsck.c
    R099  fsck.c          builtin-fsck.c
    R099  fsck-objects.c  fsck.c
    R099  fsck-cache.c    fsck-objects.c
Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:55 -07:00
Shawn O. Pearce	e9de5643fa	Cache the diff configuration section This way we don't have to reparse for the rename limit every time we create a new rename detector for a repository. Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:52 -07:00
Shawn O. Pearce	8a0c58394d	log: Add whitespace ignore options Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 17:32:47 -07:00
Shawn O. Pearce	bd8740dc14	Format submodule links during differences Instead of crashing, output a submodule link with the simple "Subproject commit $fullid\n" syntax used by C Git. Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:59:06 -07:00
Shawn O. Pearce	5be90be996	Redo DiffFormatter API to be easier to use Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:58:37 -07:00
Shawn O. Pearce	04a9d23b9a	log, diff: Add rename detection support Implement rename detection in the command line diff and log commands. Also support --name-status, -p and -U flags, as these can be quite useful to view more detail. All of the Git patch file formatting code is now moved over to the DiffFormatter class. This permits us to reuse it in any context, including inside of IDEs. Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similar to what we use in ObjectIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common by the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. 
If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, an S x D matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash codes. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. 
The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
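The scoring scheme above can be sketched in Java:

1. Hash each line and count the bytes it represents.
2. Pack the (hash, byte count) pair into one `long`.
3. Sort the records and merge equal hashes.
4. Walk two sorted tables in linear time to approximate the bytes in common.

A line that appears in both files contributes the smaller of its two byte counts. The score is the bytes in common divided by the larger file's length. `SimilaritySketch` is hypothetical, its line hash (`h * 31 + b`) is an assumption rather than RawText's hash, and for brevity it collects and sorts records instead of using SimilarityIndex's open hashing table.

```java
import java.util.Arrays;

final class SimilaritySketch {
    /**
     * Indexes a file as sorted records, one per distinct line hash, each packed
     * into a long: line hash in the high 32 bits, total bytes in the low 32 bits.
     */
    static long[] index(byte[] raw) {
        long[] recs = new long[16];
        int n = 0;
        int start = 0;
        for (int i = 0; i <= raw.length; i++) {
            if (i < raw.length && raw[i] != '\n') {
                continue;
            }
            int end = (i < raw.length) ? i + 1 : i; // the newline counts toward its line
            if (end > start) {
                int h = 0;
                for (int k = start; k < end; k++) {
                    h = h * 31 + (raw[k] & 0xff); // assumed line hash, not RawText's
                }
                if (n == recs.length) {
                    recs = Arrays.copyOf(recs, n * 2);
                }
                recs[n++] = ((long) h << 32) | (end - start);
            }
            start = end;
        }
        recs = Arrays.copyOf(recs, n);
        Arrays.sort(recs); // records with equal hashes become adjacent
        int m = 0;
        for (int i = 0; i < n; i++) {
            if (m > 0 && (int) (recs[m - 1] >>> 32) == (int) (recs[i] >>> 32)) {
                recs[m - 1] += recs[i] & 0xffffffffL; // same line again: add its bytes
            } else {
                recs[m++] = recs[i];
            }
        }
        return Arrays.copyOf(recs, m);
    }

    /** Linear merge of two sorted indexes, approximating the bytes in common. */
    static long commonBytes(long[] a, long[] b) {
        long common = 0;
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int ka = (int) (a[i] >>> 32);
            int kb = (int) (b[j] >>> 32);
            if (ka == kb) {
                // A line in both files contributes only the smaller byte count.
                common += Math.min(a[i] & 0xffffffffL, b[j] & 0xffffffffL);
                i++;
                j++;
            } else if (ka < kb) {
                i++;
            } else {
                j++;
            }
        }
        return common;
    }

    /** Score in percent: bytes in common divided by the larger file's length. */
    static int score(byte[] a, byte[] b) {
        long max = Math.max(a.length, b.length);
        if (max == 0) {
            return 100;
        }
        return (int) (commonBytes(index(a), index(b)) * 100 / max);
    }
}
```

For example, if one file has three `foo` lines (12 bytes) and the other has five (20 bytes), 12 bytes are in common and the score is 60.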
Shawn O. Pearce	4dd7b35b26	Improve description of isBare and NoWorkTreeException Alex pointed out that my description of a bare repository might be confusing for some readers. Reword the description of the error, and make it consistent throughout the Repository class's API. Change-Id: I87929ddd3005f578a7022f363270952d1f7f8664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:31 -07:00
Shawn O. Pearce	08d349a27b	amend commit: Refactor repository construction to builder class During code review, Alex raised a few comments about commit `532421d989` ("Refactor repository construction to builder class"). Due to the size of the related series we aren't going to go back and rebase in something this minor, so resolve them as a follow-up commit instead. Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	fe9860a444	Remove pointless size test in PackFile decompress Now that any large objects are forced through a streaming loader when they are bigger than getStreamFileThreshold(), and that threshold is pegged at Integer.MAX_VALUE as its largest size, we will never be able to reach this code path where we threw OutOfMemoryError. Robin pointed out that we probably should include a message here, but the code is effectively unreachable, so there isn't any value in adding a message at this point. So remove it. Change-Id: Ie611d005622e38a75537f1350246df0ab89dd500 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	412ca65bd5	Avoid unbounded getCachedBytes during parseAny Since we don't know the type of object we are parsing, we don't know if it's a massive blob, or some small commit or annotated tag. Avoid pulling the cached bytes until we have checked the type and decided if we actually need them to continue parsing right now. This way large blobs which won't fit in memory and would throw a LargeObjectException don't abort parsing. Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	e4a480f658	Make type and size lazy for large delta objects Callers don't necessarily need the getSize() result from a large delta. They instead should be always using openStream() or copyTo() for blobs going to local files, or they should be checking the result of the constant-time isLarge() method to determine the type of access they can use on the ObjectLoader. Avoid inflating the delta instruction stream twice by delaying the decoding of the size until after we have created the DeltaStream and decoded the header. Likewise with the type, callers don't necessarily always need it to be present in an ObjectLoader. Delay looking at it as late as we can, thereby avoiding an ugly O(N^2) loop looking up the type for every single object in the entire delta chain. Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:29 -07:00
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 12:41:39 -07:00
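The clamping rule can be sketched in a few lines. `StreamThresholdSketch` and its method names are hypothetical, not JGit's configuration classes. The sketch takes the configured value (1 MiB by default), caps it at a quarter of the JVM's maximum heap, and also caps it at `Integer.MAX_VALUE`, because byte arrays are int-indexed.

```java
final class StreamThresholdSketch {
    /** Default taken from the commit message: 1 MiB. */
    static final long DEFAULT_THRESHOLD = 1024L * 1024;

    /** Clamps a configured threshold to 1/4 of the heap and to int range. */
    static int effectiveThreshold(long configured, long maxHeap) {
        long limit = Math.min(configured, maxHeap / 4);
        return (int) Math.min(limit, Integer.MAX_VALUE);
    }

    /** Uses the running JVM's maximum heap size. */
    static int effectiveThreshold(long configured) {
        return effectiveThreshold(configured, Runtime.getRuntime().maxMemory());
    }
}
```

With a 64 MiB heap, the 1 MiB default is kept, but a configured 100 MiB is reduced to 16 MiB.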
Shawn O. Pearce	ad68553be4	Support large delta packed objects as streams Very large delta instruction streams, or deltas which use very large base objects, are now streamed through as large objects rather than being inflated into a byte array. This isn't the most efficient way to access delta encoded content, as we may need to rewind and reprocess the base object when there was a block moved within the file, but it will at least prevent the JVM from having its heap explode. When streaming a delta we have an inflater open for each level in the delta chain, to inflate the instruction set of the delta, as well as an inflater for the base level object. The base object is buffered, as is the top level delta requested by the application, but we do not buffer the intermediate delta streams. This keeps memory usage lower, so it's closer to 1024 bytes per level in the chain, without having an adverse impact on raw throughput as the top-level buffer gets pushed down to the lowest stream that has the next region. Delta instructions transparently collapse here, if the top level does not copy a region from its base, the base won't materialize that part from its own base, etc. This allows us to avoid copying around a lot of segments which have been deleted from the final version. Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 02:19:12 -07:00
Shawn O. Pearce	ded8f6c721	Support large whole packed objects as streams Similar to the loose object support, whole packed objects can now be streamed back to the caller. The streaming is less efficient as we copy the data from the cached window array into the InflaterInputStream's internal buffer, then inflate it there before returning to the application. Like with unpacked objects, there is plenty of room for some optimization, especially for the copyTo method, where we don't necessarily need so much buffering to exist. Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 19:34:21 -07:00
Shawn O. Pearce	13e0218a25	Replace PackedObjectLoader with ObjectLoader.SmallObject The class is identical, but ObjectLoader.SmallObject is part of our public API for storage implementations to build on top of. Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:27:51 -07:00
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:26:17 -07:00
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	2010-07-01 17:33:53 -07:00
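Detecting renames purely by SHA-1 amounts to pairing each added path with a deleted path that has the same object id. The sketch below does this with a map from id to deleted entries. `ExactRenameSketch` and its `Entry` type are hypothetical, and hex strings stand in for ObjectIds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ExactRenameSketch {
    /** Hypothetical minimal entry: a path plus the blob id (hex SHA-1) at that path. */
    static final class Entry {
        final String path;
        final String id;

        Entry(String path, String id) {
            this.path = path;
            this.id = id;
        }
    }

    /** Pairs each added entry with a deleted entry that has the same object id. */
    static List<String[]> pairRenames(List<Entry> deleted, List<Entry> added) {
        Map<String, Deque<Entry>> byId = new HashMap<>();
        for (Entry d : deleted) {
            byId.computeIfAbsent(d.id, k -> new ArrayDeque<>()).add(d);
        }
        List<String[]> renames = new ArrayList<>();
        for (Entry a : added) {
            Deque<Entry> candidates = byId.get(a.id);
            if (candidates != null && !candidates.isEmpty()) {
                Entry src = candidates.poll(); // each deletion is used at most once
                renames.add(new String[] {src.path, a.path});
            }
        }
        return renames;
    }
}
```

Entries left unpaired remain plain adds and deletes. A later similarity pass, like the one in the rename detection commit above, can then consider only those.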
Shawn O. Pearce	2489088235	Permit AnyObjectId to compareTo AnyObjectId Assume that the argument of compareTo won't be mutated while we are doing the compare, and support the wider AnyObjectId type so MutableObjectId is suitable on either side of the compareTo call. Change-Id: I2a63a496c0a7b04f0e5f27d588689c6d5e149d98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 19:07:36 -07:00
Shawn O. Pearce	d04b7972d8	Use copyTo during checkout of files to working tree This way we can stream a large file through memory, rather than loading the entire thing into a single contiguous byte array. Change-Id: I3ada2856af2bf518f072edec242667a486fb0df1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:56:20 -07:00
Shawn O. Pearce	a0fd06e5c2	Stream whole deflated objects in PackWriter Instead of loading the entire object as a byte array and passing that into the deflater, let the ObjectLoader copy the object onto the DeflaterOutputStream. This has the nice side effect of using some sort of stride hack in the Sun implementation that may improve compression performance. Change-Id: I3f3d681b06af0da93ab96c75468e00e183ff32fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:50:50 -07:00
Shawn O. Pearce	ad0383734e	Lazily allocate Deflater in PackWriter Only allocate the Deflater if we can't reuse everything, but also make sure we release it when we release the PackWriter's resources. Change-Id: I16a32b94647af0778658eda87acbafc9a25b314a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:40:54 -07:00
Shawn O. Pearce	23e7f6376a	Add openStream to ObjectLoader for big blobs Blobs that are too large to read as a single byte array should be accessed through an InputStream based interface instead, allowing the application to walk through the data stream incrementally. Define the basic interface to support streaming contents, but don't implement it yet for the file based backend. Change-Id: If9e4442e9ef4ed52c3e0f1af9398199a73145516 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:36:10 -07:00
Jeff Schumacher	7b0b4110ed	Refactored code out of FileHeader to facilitate rename detection Refactored a superclass out of FileHeader called DiffEntry that holds the more general data from FileHeader that is useful in rename detection (old/new Ids, modes, names, as well as changeType and score). FileHeader is now a DiffEntry that adds Hunks, parsing abilities, etc. Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4	2010-06-30 17:53:27 -07:00
Dmitry Neverov	44854741c5	Fix missing flush in StreamCopyThread It is possible that StreamCopyThread will not flush everything from its src to its dst. In most cases StreamCopyThread works like this: in loop: n = src.read(buf); dst.write(buf, 0, n); and when we want to flush, we interrupt() StreamCopyThread and it flushes everything it wrote to dst. The problem is that our interrupt() could interrupt reading. In this case we will flush everything we wrote to dst, but not everything we wrote to src. Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 10:48:44 -07:00
Shawn O. Pearce	a1d5f5b6b5	Move DirCache factory methods to Repository Instead of creating the DirCache from a static factory method, use an instance method on Repository, permitting the implementation to override the method with a completely different type of DirCache reading and writing. This would better support a repository in the cloud strategy, or even just an in-memory unit test environment. Change-Id: I6399894b12d6480c4b3ac84d10775dfd1b8d13e7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 10:39:00 -07:00
Shawn O. Pearce	cb9d8285ba	Create NoWorkTreeException for bare repositories Using a custom exception type makes it easier for an application developer to understand why an exception was thrown out of a method we declare. To remain compatible with existing callers, we still extend IllegalStateException. Change-Id: Ideeef2399b11ca460a2dbb3cd80eb76aa0a025ba Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 09:48:36 -07:00
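A minimal sketch of the compatibility idea: a dedicated exception type that still extends IllegalStateException, so an existing `catch (IllegalStateException e)` block keeps working. The `Repo` class and its constructor are hypothetical stand-ins for illustration, not JGit's repository code.

```java
import java.io.File;

// Dedicated exception type; extending IllegalStateException keeps older
// callers that catch IllegalStateException working unchanged.
class NoWorkTreeException extends IllegalStateException {
    NoWorkTreeException() {
        super("Bare repository has no working tree");
    }
}

// Hypothetical minimal repository stand-in, for illustration only.
class Repo {
    private final File workTree; // null means the repository is bare

    Repo(File workTree) {
        this.workTree = workTree;
    }

    File getWorkTree() {
        if (workTree == null)
            throw new NoWorkTreeException();
        return workTree;
    }
}
```

Callers that know about the new type can catch it specifically; older callers see it as an ordinary IllegalStateException.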
Jeff Schumacher	9f2249bd26	Added check for binary files while diffing Added a check in Diff to ensure that files that are most likely not text are not line-by-line diffed. Files are determined to be binary by checking the first 8000 bytes for a null character. This is a similar heuristic to what C Git uses. Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26	2010-06-29 17:23:00 -07:00
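The binary-file heuristic described above can be sketched in a few lines: inspect at most the first 8000 bytes and treat any NUL byte as a sign of non-text content. The class name `BinaryCheck` and the constant name are hypothetical; only the 8000-byte limit and the NUL test come from the commit message.

```java
// Sketch of the "NUL byte in the first 8000 bytes" heuristic.
class BinaryCheck {
    // Number of leading bytes to inspect, per the commit message.
    static final int FIRST_FEW_BYTES = 8000;

    static boolean isBinary(byte[] content) {
        int limit = Math.min(content.length, FIRST_FEW_BYTES);
        for (int i = 0; i < limit; i++) {
            if (content[i] == 0)
                return true; // a NUL byte strongly suggests non-text data
        }
        return false;
    }
}
```

The bounded scan keeps the check cheap on large files, at the cost of missing a NUL byte that first appears after byte 8000.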
Shawn O. Pearce	515deaf7e5	Ensure RevWalk is released when done Update a number of calling sites of RevWalk to ensure the walker's internal ObjectReader is released after the walk is no longer used. Because the ObjectReader is likely to hold onto a native resource like an Inflater, we don't want to leak them outside of their useful scope. Where possible we also try to share ObjectReaders across several walk pools, or between a walker and a PackWriter. This permits the ObjectReader to actually do some caching if it felt inclined to do so. Not everything was updated, we'll probably need to come back and update even more call sites, but these are some of the biggest offenders. Test cases in particular aren't updated. My plan is to move most storage-agnostic tests onto some purely in-memory storage solution that doesn't do compression. Change-Id: I04087ec79faeea208b19848939898ad7172b6672 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-29 15:12:53 -07:00
Shawn O. Pearce	94228bde22	Use ObjectReader in DirCacheBuilder.addTree Rather than building a custom reader, have the caller supply us one. Change-Id: Ief2b5a6b1b75f05c8a6bc732a60d4d1041dd8254 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-29 09:30:29 -07:00
Shawn O. Pearce	d6e975f71b	Use one ObjectReader for WalkFetchConnection Instead of creating a new ObjectReader for each walker, use one for the entire connection and delegate reads through it. Change-Id: I7f0a2ec8c9fe60b095a7be77dc423a2ff8b443a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:33 -07:00
Shawn O. Pearce	121d009b9b	Use ObjectReader in RevWalk, TreeWalk We don't actually need a Repository object here, just an ObjectReader that can load content for us. So change the API to depend on that. However, this breaks the asCommit and asTag legacy translation methods on RevCommit and RevTag, so we still have to keep the Repository inside of RevWalk for those two types. Hopefully we can drop those in the future, and then drop the Repository off the RevWalk. Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:29 -07:00
Shawn O. Pearce	06f635a4bc	Fix minor formatting issue in UploadPack Change-Id: Ifc0c3a94dc0e16126af6cf17e9c4a7cb96e8ffab Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:28 -07:00
Shawn Pearce	3fd4918852	Merge changes Ie56301aa,Ic2f79e85 * changes: Added further support for whitespace ignoring during diff Added support for whitespace ignoring	2010-06-28 20:27:04 -04:00
Jeff Schumacher	9869ef2592	Added further support for whitespace ignoring during diff Added code to support ignoring leading, trailing, and changed whitespace when performing a diff operation. I also added command line options to Diff to enable the various whitespace ignoring methods. These match the flags for git diff. Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8	2010-06-28 17:25:19 -07:00
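A hedged sketch of the three whitespace modes as line comparisons. It assumes "changed" whitespace means what `git diff -b` does (runs of whitespace are equivalent, trailing whitespace is ignored). `WsMode` and `WsCompare` are hypothetical names, and normalizing to a String is for readability only; a diff implementation would more likely compare the raw bytes in place.

```java
// Hypothetical whitespace modes for comparing two lines.
enum WsMode { NONE, LEADING, TRAILING, CHANGE }

class WsCompare {
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\f';
    }

    // Rewrites a line so that lines equal under the mode become equal strings.
    static String normalize(String line, WsMode mode) {
        switch (mode) {
        case LEADING: {
            int i = 0;
            while (i < line.length() && isWs(line.charAt(i)))
                i++;
            return line.substring(i);
        }
        case TRAILING: {
            int e = line.length();
            while (e > 0 && isWs(line.charAt(e - 1)))
                e--;
            return line.substring(0, e);
        }
        case CHANGE: {
            // Collapse each whitespace run to one space; drop trailing runs.
            StringBuilder sb = new StringBuilder();
            boolean inWs = false;
            for (char c : line.toCharArray()) {
                if (isWs(c)) {
                    inWs = true;
                } else {
                    if (inWs)
                        sb.append(' ');
                    sb.append(c);
                    inWs = false;
                }
            }
            return sb.toString();
        }
        default:
            return line;
        }
    }

    static boolean equal(String a, String b, WsMode mode) {
        return normalize(a, mode).equals(normalize(b, mode));
    }
}
```

Under CHANGE, "a  b" and "a\tb" compare equal, but "ab" and "a b" do not, because whitespace that exists on one side only is still a change.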
Shawn O. Pearce	242b4026d9	Remove volatile keyword from RepositoryEvent We don't need this field to be volatile. Events are delivered by the same thread that created the RepositoryEvent object, and thus any cross-thread operations would need to be handled by some other type of synchronization in the listener, and that would protect both the repository field and any other per-event data. Change-Id: Iefe345959e1a2d4669709dbf82962bcc1b8913e3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 12:46:18 -07:00
Shawn O. Pearce	aa4b06e087	Rename openObject, hasObject to just open, has Similar to what we did on Repository, the openObject method already implied we wanted to open an object, given its main argument was of type AnyObjectId. Simplify the method name to just the action, has or open. Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:57:41 -07:00
Shawn O. Pearce	acb7be2c5a	Refactor Repository.openObject to be Repository.open We drop the "Object" suffix, because it's pretty clear here that we want to open an object, given that we pass in AnyObjectId as the main parameter. We also fix the calling convention to throw a MissingObjectException or IncorrectObjectTypeException, so that callers don't have to do this error checking themselves. Change-Id: I72c43353cea8372278b032f5086d52082c1eee39 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:54:58 -07:00
Shawn O. Pearce	6b62e53b60	Move PackWriter progress monitors onto the operations Rather than taking the ProgressMonitor objects in our constructor and carrying them around as instance fields, take them as arguments to the actual time consuming operations we need to run. Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	f288c27e46	Pass the PackOutputStream down the call stack Rather than storing this in an instance member, pass it down the calling stack. It's cleaner: we don't have to poke the stream into a temporary field and then unset it. Change-Id: I0fd323371bc12edb10f0493bf11885d7057aeb13 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	1ad2feb7b3	Remove Repository.openObject(ObjectReader, AnyObjectId) Going through ObjectReader.openObject(AnyObjectId) is faster, but also produces cleaner application level code. The error checking is done inside of the openObject method, which means it can be removed from the application code. Change-Id: Ia927b448d128005e1640362281585023582b1a3a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	9ba7bd4df4	Throw IncorrectObjectTypeException on bad type hints If the type hint isn't OBJ_ANY and it doesn't match the actual type observed from the object store, define the reader to throw back an IncorrectObjectTypeException. This way the caller doesn't have to perform this check itself before it evaluates the object data, and we can simplify quite a few call sites. Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:25 -07:00
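The type-hint contract above can be sketched as a single reader-side check. Assumptions: `OBJ_ANY` is taken to be -1 for this sketch, the other constants use Git's pack type codes (commit=1, tree=2, blob=3, tag=4), and the exception class is a local stand-in with the same name as JGit's, not JGit's actual class or constructors.

```java
// Local stand-in for the exception named in the commit; JGit's real class
// has different constructors.
class IncorrectObjectTypeException extends Exception {
    IncorrectObjectTypeException(String id, int expected, int actual) {
        super("object " + id + " has type " + actual + ", expected " + expected);
    }
}

class TypeHintCheck {
    static final int OBJ_ANY = -1;   // "caller accepts any type"; value assumed here
    static final int OBJ_COMMIT = 1; // Git pack type codes
    static final int OBJ_TREE = 2;
    static final int OBJ_BLOB = 3;
    static final int OBJ_TAG = 4;

    // Reader-side check, so callers no longer compare types themselves.
    static int checkType(String id, int actualType, int typeHint)
            throws IncorrectObjectTypeException {
        if (typeHint != OBJ_ANY && typeHint != actualType)
            throw new IncorrectObjectTypeException(id, typeHint, actualType);
        return actualType;
    }
}
```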
Jeff Schumacher	543235b805	Added support for whitespace ignoring JGit did not have support for skipping whitespace when comparing lines in RawText objects. I added a subclass of RawText that skips whitespace in its equals and hashCode methods. I used a subclass rather than adding functionality into RawText so that performance would not be impacted by extra logic. This class only supports ignoring all whitespace. Others will follow that allow other forms of whitespace ignoring. Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a	2010-06-28 10:59:10 -07:00
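A hedged sketch of "ignore all whitespace" equality over raw line bytes. The commit implements this as a RawText subclass overriding equals and hashCode; this sketch uses standalone static methods with hypothetical names instead. The key constraint is that the hash must skip exactly the bytes that the comparison skips, otherwise two lines that compare equal could hash differently.

```java
// Hypothetical helper: equality and hashing that skip every whitespace byte.
class IgnoreAllWhitespace {
    static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n' || b == '\f';
    }

    static boolean linesEqual(byte[] a, byte[] b) {
        int i = 0, j = 0;
        while (true) {
            while (i < a.length && isWhitespace(a[i]))
                i++;
            while (j < b.length && isWhitespace(b[j]))
                j++;
            if (i == a.length || j == b.length)
                return i == a.length && j == b.length; // both exhausted together
            if (a[i++] != b[j++])
                return false;
        }
    }

    // Must ignore the same bytes as linesEqual to stay consistent with it.
    static int lineHash(byte[] line) {
        int h = 5381;
        for (byte c : line) {
            if (!isWhitespace(c))
                h = (h << 5) + h + (c & 0xff); // djb2-style rolling hash
        }
        return h;
    }
}
```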
Shawn O. Pearce	a45728d7a4	Ensure ObjectReader used by PackWriter is released The ObjectReader API demands that we release the reader when we are done with it. PackWriter contains a reader, which it uses for the entire packing session. Expose the release of the reader through a release method on the writer. This still doesn't address the RevWalk and TreeWalk users, who don't correctly release their reader. But it's a small step in the right direction. Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 10:25:11 -07:00
Shawn O. Pearce	b5aa52e98a	Ensure PackWriter releases its ObjectReader Change-Id: I3f8af29066cc5a2132dc4a75c9654d97800f2f18 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 10:16:27 -07:00
Shawn O. Pearce	e01abbd543	Release ObjectReader before the cached ObjectDatabase I don't want to play games with the order of release here; it's probably safer to release the reader before the database, just in case the one depends on the other. Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:47:20 -07:00
Shawn O. Pearce	b40f02eb1a	Release ObjectInserter in merge() not mergeImpl() By doing the release at the higher level class, we can ensure the release occurs if the inserter was allocated, even if the implementation forgets to do this. Since the higher level class is what allocated it, it makes sense to have it also do the release. Change-Id: Id617b2db864c3208ed68cba4eda80e51612359ad Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:35:55 -07:00
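The pattern in this change can be sketched as a template method: the base class lazily allocates the inserter and releases it in a `finally` block around the subclass hook, so a `mergeImpl()` that forgets to clean up, or that throws, cannot leak it. `Inserter` and `Merger` here are simplified hypothetical stand-ins, not JGit's ObjectInserter and Merger classes.

```java
// Hypothetical stand-in for an inserter holding a native resource.
class Inserter {
    boolean released;

    void release() {
        released = true;
    }
}

abstract class Merger {
    private Inserter inserter; // allocated lazily, only if mergeImpl() needs it

    protected Inserter getObjectInserter() {
        if (inserter == null)
            inserter = new Inserter();
        return inserter;
    }

    // The class that allocated the inserter is also the one that releases it.
    public final boolean merge() {
        try {
            return mergeImpl();
        } finally {
            if (inserter != null) {
                inserter.release();
                inserter = null;
            }
        }
    }

    protected abstract boolean mergeImpl();
}
```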
Shawn O. Pearce	5aae041a81	Commit: Use Repository.newObjectInserter Everyone else does. This must have been a spot I missed during some sort of squash while developing the series. Change-Id: I62eae50b618f47ee33ad7cf71fc05b724f603201 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:22:48 -07:00
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:51:12 -07:00
Shawn O. Pearce	71aace52f7	Simplify ObjectLoaders coming from PackFile We no longer need an ObjectLoader to be lazy and try to delay the materialization of the object content. That was done only to support PackWriter searching for a good reuse candidate. Instead, simplify the code base by doing the materialization immediately when the loader asks for it, because any caller asking for the loader is going to need the content. Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	68518ca3aa	Remove getRawSize, getRawType from ObjectLoader These were only used by PackWriter to help it filter object representations. Their only user disappeared when we rewrote the object selection code path to use the new representation type. Change-Id: I9ed676bfe4f87fcf94aa21e53bda43115912e145 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (32 bit / 64 bit): vm header 8 / 8 bytes; ObjectId 20 / 20 bytes; PackedObjectInfo 12 / 12 bytes; ObjectToPack 8 / 12 bytes; LocalOTP 20 / 24 bytes; total 68 / 74 bytes. Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:34 -07:00
Shawn O. Pearce	3a7aec03e0	Implement zero-copy for single window objects Objects that fall completely within a single window can be worked with in a zero-copy fashion, provided that the window is backed by a normal byte[] and not by a ByteBuffer. This works for a surprising number of objects. The default window size is 8 KiB, but most deltas are quite a bit smaller than that. Objects smaller than 1/2 of the window size have a very good chance of falling completely within a window's array, which means we can work with them without copying their data around. Larger objects, or objects which are unlucky enough to span over a window boundary, get copied through the temporary buffer. We pay a tiny penalty to realize we can't use the zero-copy code path, but it's easier than trying to keep track of two adjacent windows. With this change (as well as everything preceding it), packing is actually a bit faster. Some crude benchmarks based on cloning linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using the C git client and the JGit daemon show we get better throughput, and slightly better times (old vs. now): total time 2m45s vs. 2m37s, throughput 12.49 vs. 21.17 MiB/s; 2m42s vs. 2m36s, 16.29 vs. 22.63 MiB/s; 2m37s vs. 2m31s, 16.07 vs. 21.92 MiB/s. Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 16:13:22 -07:00
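The zero-copy decision reduces to a range check. The sketch assumes a window is a heap `byte[]` plus the pack-file offset of its first byte. `PackWindow` and `zeroCopyOffset` are hypothetical names, not JGit's window classes.

```java
// Hypothetical heap-backed window over a region of a pack file.
class PackWindow {
    final long start;   // pack file offset of array[0]
    final byte[] array; // window contents (8 KiB by default, per the commit)

    PackWindow(long start, byte[] array) {
        this.start = start;
        this.array = array;
    }

    /**
     * Returns the index into array where an object at pack offset pos with
     * length len begins, or -1 if the object is not entirely inside this
     * window and must be copied through a temporary buffer instead.
     */
    int zeroCopyOffset(long pos, int len) {
        long end = start + array.length;
        if (pos >= start && pos + len <= end)
            return (int) (pos - start);
        return -1;
    }
}
```

An object of length L can only fit if it starts within the first (window size - L) bytes, which is why objects under half the window size usually take the zero-copy path.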
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 14:46:05 -07:00
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 14:16:06 -07:00
Shawn O. Pearce	e0c9368f3e	Reclaim some bits in ObjectToPack flags field Make the lower bits available for flags that PackWriter can use to keep track of facts about the object. We shouldn't need more than 2^24 delta depths, unpacking that chain is unfathomable anyway. This change gets us 4 bits that are unused in the lower end of the word, which are typically easier to load from Java and most machine instruction sets. We can use these in later changes. Change-Id: Ib9e11221b5bca17c8a531e4ed130ba14c0e3744f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
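The commit does not spell out the exact bit layout, so the sketch below assumes one for illustration: delta depth in the upper 24 bits of an `int` (hence a maximum depth of 2^24 - 1), a 3-bit object type in bits 4 to 6, and bits 0 to 3 left free as flags. `PackFlags` and its accessors are hypothetical names, not JGit's ObjectToPack code.

```java
// Hypothetical packing of several small fields into one int.
class PackFlags {
    static final int FLAG_MASK = 0x0F;         // bits 0..3: spare flags
    static final int TYPE_SHIFT = 4;           // bits 4..6: object type
    static final int TYPE_MASK = 0x7;
    static final int DEPTH_SHIFT = 8;          // bits 8..31: delta depth
    static final int MAX_DEPTH = (1 << 24) - 1;

    int word;

    int getType() {
        return (word >>> TYPE_SHIFT) & TYPE_MASK;
    }

    void setType(int type) {
        word = (word & ~(TYPE_MASK << TYPE_SHIFT)) | ((type & TYPE_MASK) << TYPE_SHIFT);
    }

    int getDepth() {
        return word >>> DEPTH_SHIFT; // unsigned shift: top bit is part of the depth
    }

    void setDepth(int depth) {
        if (depth < 0 || depth > MAX_DEPTH)
            throw new IllegalArgumentException("depth out of range: " + depth);
        word = (word & ((1 << DEPTH_SHIFT) - 1)) | (depth << DEPTH_SHIFT);
    }

    boolean isSet(int flag) {
        return (word & flag) != 0;
    }

    void set(int flag) {
        word |= flag & FLAG_MASK;
    }

    void clear(int flag) {
        word &= ~(flag & FLAG_MASK);
    }
}
```

Testing a low flag bit needs only a mask with a small immediate (`word & 0x1`), with no shift first.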
Shawn O. Pearce	6fc3ecac84	Extract PackFile specific code to ObjectToPack subclass The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost. The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may choose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, it's a great place for the database to perform caching, if any. Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter. Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
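The factory arrangement described above can be sketched with an overridable method and a covariant return type. The class names mirror the commit, but the bodies are simplified stand-ins: the `String id` parameter, the `offset` field, and `PackFileReader` are hypothetical and do not reflect JGit's actual signatures.

```java
// Generic record the packer works with.
class ObjectToPack {
    final String id;

    ObjectToPack(String id) {
        this.id = id;
    }
}

// Storage-specific subclass: extra fields live on the object itself,
// so no parallel Map<ObjectId, Object> is needed.
class LocalObjectToPack extends ObjectToPack {
    long offset; // hypothetical: where the object sits in its pack file

    LocalObjectToPack(String id) {
        super(id);
    }
}

// The reader doubles as the factory.
class ObjectReader {
    ObjectToPack newObjectToPack(String id) {
        return new ObjectToPack(id);
    }
}

// A file-backed reader overrides the factory with a covariant return type.
class PackFileReader extends ObjectReader {
    @Override
    LocalObjectToPack newObjectToPack(String id) {
        return new LocalObjectToPack(id);
    }
}
```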
Shawn O. Pearce	a2208be6aa	Extract ObjectToPack to be top-level This shortens the implementation within PackWriter, and starts to open the door for some other refactorings based on changing the ObjectToPack to be a public part of the API. Change-Id: Id849cbffc4de20b903e844a2de7737eeb8b7a3ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	ffe0614d4d	Allow Repository.getDirectory() to be null Some types of repositories might not be stored on local disk. For these, they will most likely return null for getDirectory() as the java.io.File type cannot describe where their storage is; it's not in the host's filesystem. Document that getDirectory() can return null now, and update all current non-test callers in JGit that might run into problems on such repositories. For the most part, just act like it's bare. Change-Id: I061236a691372a267fd7d41f0550650e165d2066 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	8a9844b2af	Redo event listeners to be more generic Replace the old crude event listener system with a much more generic implementation, patterned after the event dispatch techniques used in Google Web Toolkit 1.5 and later. Each event delivers to an interface that defines a single method, and the event itself is what performs the delivery in a type-safe way through its own dispatch method. Listeners are registered in a generic listener list, indexed by the interface they implement and wish to receive an event for. Delivery of events is performed by looping through all listeners implementing the event's corresponding listener interface, and using the event's own dispatch method to deliver the event. This is the classical "double dispatch" pattern for event delivery. Listeners can be unregistered by invoking remove() on their registration handle. This change therefore requires application code to track the handle if it wishes to remove the listener at a later point in time. Event delivery is now exposed as a generic public method on the Repository class, making it easier for any type of message to be sent out to any type of listener that has registered, without needing to pre-arrange for type-safe fireFoo() methods. New event types can be added in the future simply by defining a new RepositoryEvent subclass and a corresponding RepositoryListener interface that it dispatches to. By always adding new events through a new interface, we never need to worry about defining an Adapter to provide default no-op implementations of new event methods. Change-Id: I651417b3098b9afc93d91085e9f0b2265df8fc81 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
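The double-dispatch scheme can be sketched as follows. First the listener list selects handles registered under the event's listener interface, then the event calls the one method on that interface. The type names follow the commit's description, but this is a simplified sketch under those assumptions, not JGit's source.

```java
import java.util.ArrayList;
import java.util.List;

// Marker type; each concrete listener interface declares one callback.
interface RepositoryListener {
}

// An event names the interface it targets and knows how to invoke it.
abstract class RepositoryEvent<T extends RepositoryListener> {
    abstract Class<T> getListenerType();

    abstract void dispatch(T listener);
}

interface RefsChangedListener extends RepositoryListener {
    void onRefsChanged(RefsChangedEvent event);
}

class RefsChangedEvent extends RepositoryEvent<RefsChangedListener> {
    @Override
    Class<RefsChangedListener> getListenerType() {
        return RefsChangedListener.class;
    }

    @Override
    void dispatch(RefsChangedListener listener) {
        listener.onRefsChanged(this); // second dispatch: the event picks the method
    }
}

// Registration handle; the application keeps it in order to unregister.
class ListenerHandle {
    private final ListenerList parent;
    final Class<? extends RepositoryListener> type;
    final RepositoryListener listener;

    ListenerHandle(ListenerList parent, Class<? extends RepositoryListener> type,
            RepositoryListener listener) {
        this.parent = parent;
        this.type = type;
        this.listener = listener;
    }

    void remove() {
        parent.remove(this);
    }
}

class ListenerList {
    private final List<ListenerHandle> handles = new ArrayList<>();

    <T extends RepositoryListener> ListenerHandle addListener(Class<T> type, T listener) {
        ListenerHandle h = new ListenerHandle(this, type, listener);
        handles.add(h);
        return h;
    }

    void remove(ListenerHandle h) {
        handles.remove(h);
    }

    // First dispatch: select listeners registered under the event's interface.
    <T extends RepositoryListener> void fireEvent(RepositoryEvent<T> event) {
        Class<T> type = event.getListenerType();
        for (ListenerHandle h : new ArrayList<>(handles)) { // copy tolerates remove() during delivery
            if (h.type.equals(type))
                event.dispatch(type.cast(h.listener));
        }
    }
}
```

Adding a new event type needs only a new event subclass and a new single-method listener interface. The list itself never changes.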
Shawn O. Pearce	203bd66267	Rename Repository getWorkDir to getWorkTree This better matches with the name used in the environment (GIT_WORK_TREE), in the configuration file (core.worktree), and in our builder object. Since we are already breaking a good chunk of other code related to repository access, and this is fairly easy to fix in an application's code base, I'm not going to offer the wrapper getWorkDir() method. Change-Id: Ib698ba4bbc213c48114f342378cecfe377e37bb7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	532421d989	Refactor repository construction to builder class The new FileRepositoryBuilder class helps applications to construct a properly configured FileRepository, with properties assumed based upon the standard Git rules for the local filesystem. To better support simple command line applications, environment variable handling and repository searching was moved into this builder class. The change gets rid of the ever-growing FileRepository constructor variants, and the multitude of java.io.File typed parameters, by using simple named setter methods. Change-Id: I17e8e0392ad1dbf6a90a7eb49a6d809388d27e4c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:40 -07:00
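A hedged sketch of the builder idea: named setters replace a growing set of File-typed constructor overloads, and `build()` fills in defaults. The class names and the simplified defaulting rules below are hypothetical. They follow the usual Git layout (`<work tree>/.git`, `<git dir>/index`) and are not FileRepositoryBuilder's actual logic or method set.

```java
import java.io.File;

// Immutable result of the hypothetical builder.
class RepoSettings {
    final File gitDir;
    final File workTree;  // null for a bare repository
    final File indexFile; // null for a bare repository

    RepoSettings(File gitDir, File workTree, File indexFile) {
        this.gitDir = gitDir;
        this.workTree = workTree;
        this.indexFile = indexFile;
    }
}

// Hypothetical builder with fluent, named setters.
class RepoBuilder {
    private File gitDir;
    private File workTree;
    private File indexFile;
    private boolean bare;

    RepoBuilder setGitDir(File dir) { gitDir = dir; return this; }
    RepoBuilder setWorkTree(File dir) { workTree = dir; return this; }
    RepoBuilder setIndexFile(File file) { indexFile = file; return this; }
    RepoBuilder setBare() { bare = true; return this; }

    RepoSettings build() {
        if (gitDir == null && workTree != null)
            gitDir = new File(workTree, ".git"); // non-bare default: <work tree>/.git
        if (gitDir == null)
            throw new IllegalStateException("set a git directory or a work tree");
        if (bare)
            return new RepoSettings(gitDir, null, null);
        if (workTree == null)
            workTree = gitDir.getParentFile(); // assumes gitDir is <work tree>/.git
        if (indexFile == null)
            indexFile = new File(gitDir, "index");
        return new RepoSettings(gitDir, workTree, indexFile);
    }
}
```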
Shawn O. Pearce	8f46ee4870	Remove Repository.toFile(ObjectId) Not every type of Repository will be able to map an ObjectId into a local file system path that stores that object's file contents. Heck, it's not even true for the FileRepository, as an object can be stored in a pack file and not in its loose format. Remove this from our public API, it was a mistake to publish it. Change-Id: I20d1b8c39104023936e6d46a5b0d7ef39ff118e8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:39 -07:00
Shawn O. Pearce	41c04bbb28	Use ObjectInserter for loose objects in WalkFetchConnection Rather than relying on the repository's ability to give us the local file path for a loose object, just pass its inflated form to the ObjectInserter for the repository. We have to recompress it, which may slow down fetches, but this is the slow dumb protocol. The extra cost to do the compression locally isn't going to be a major bottleneck. This nicely removes the nasty part about computing the object identity by hand, allowing us to instead rely upon the inserter's internal computation. Unfortunately it means we might store a loose object whose SHA-1 doesn't match the expected SHA-1, such as if the remote repository was corrupted. This is fairly harmless, as the incorrectly named object will now be stored under its proper name, and will eventually be garbage collected, as it's not referenced by the local repository. We have to flush the inserter after the object is stored because we aren't sure if we need to read the object later, or not. Change-Id: Idb1e2b1af1433a23f8c3fd55aeb20575e6047ef0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:06 -07:00
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how it's used by ObjectDirectory. Let's start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:01 -07:00
Shawn O. Pearce	133c987f4d	Refactor alternate object databases below ObjectDirectory Not every object storage system will have the concept of alternate object databases to search, and even if they do, they may not have the notion of fast-access / slow-access split like we do within the ObjectDirectory code for pack files and loose objects. Push all of that down below the generic API so that it is a hidden detail of the ObjectDirectory and its related supporting classes. Change-Id: I54bc1ca5ff2ac94dfffad1f9a9dad7af202b9523 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:41 -07:00
Shawn O. Pearce	88530a179e	Start using ObjectInserter instead of ObjectWriter Some newer style APIs are updated to use the newer ObjectInserter interface instead of the now deprecated ObjectWriter. In many of the unit tests we don't bother to release the inserter, these are typically using the file backend which doesn't need a release, but in the future should use an in-memory HashMap based store, which really wouldn't need it either. Change-Id: I91a15e1dc42da68e6715397814e30fbd87fa2e73 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:41 -07:00
Shawn O. Pearce	cad10e6640	Refactor object writing responsibilities to ObjectDatabase The ObjectInserter API permits ObjectDatabase implementations to control their own object insertion behavior, rather than forcing it to always be a new loose file created in the local filesystem. Inserted objects can also be queued and written asynchronously to the main application, such as by appending into a pack file that is later closed and added to the repository. This change also starts to open the door to non-file based object storage, such as an in-memory HashMap for unit testing, or a more complex system built on top of a distributed hash table. To help existing application code port to the newer interface we are keeping ObjectWriter as a delegation wrapper to the new API. Each ObjectWriter instance holds a reference to an ObjectInserter for the Repository's top-level ObjectDatabase, and it flushes and releases that instance on each object processed. Change-Id: I413224fb95563e7330c82748deb0aada4e0d6ace Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:41 -07:00
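A minimal sketch of the in-memory HashMap store this change mentions for unit testing. Object names follow Git's rule: the SHA-1 of the header `"<type> <size>\0"` followed by the content. The class `InMemoryObjectStore` and its methods are hypothetical and are not part of the ObjectInserter API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory object database keyed by hex object name.
class InMemoryObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    // Computes the Git object name and stores a private copy of the content.
    String insert(String type, byte[] content) {
        byte[] header = (type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII);
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // Java platforms are required to provide SHA-1
        }
        md.update(header);
        md.update(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest())
            hex.append(String.format("%02x", b));
        String id = hex.toString();
        objects.put(id, content.clone());
        return id;
    }

    // Returns the stored content, or null if no such object exists.
    byte[] open(String id) {
        return objects.get(id);
    }
}
```

Inserting an empty blob yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same name `git hash-object` reports for an empty file.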
Shawn O. Pearce	3e3a50db5e	Change Repository.getConfig() to return non-file Configs A repository implementation might support storing configurations on a non-file storage system, so widen the return value to be any type of configuration. Change-Id: If9a0928f4b3ef29a24d270b0ce585a6e77f6fac6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	4c14b7623d	Make lib.Repository abstract and lib.FileRepository its implementation To support storage models other than just the local filesystem, we split the Repository class into a nearly abstract interface and then create a concrete subclass called FileRepository with the file based IO implementation. We are using an abstract class for Repository rather than the much more generic interface, as implementers will want to inherit a large array of utility functions, such as resolve(String). Having these in a base class makes it easy to inherit them. This isn't the final home for lib.FileRepository. Future changes will rename it into storage.file.FileRepository, but to do that we need to also move a number of other related classes, which we aren't quite ready to do. Change-Id: I1bd54ea0500337799a8e792874c272eb14d555f7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	77b39df5ec	Consistently fail work tree methods on bare repositories If the working tree isn't available, it doesn't make any sense to obtain the merge heads, or the buffered commit message. The repository shouldn't have a partial merge state to read. Throw back the same exception we do when invoking getWorkDir() on a bare repository instance. Change-Id: I762c55890b7fe272a183da583f910671d1cadf71 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	f18b853044	Consistently use getDirectory() for work tree state This permits us to leave the implementation of these methods here in the Repository class, but later refactor how the directory is accessed into a subclass. Change-Id: I5785b2009c5b7cca0fb070a968e50814ce847076 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	a63494edee	Add RepositoryState.BARE A bare repository cannot be checked out, committed to, etc. as it doesn't have a working directory. Define this as a state since the state enumeration exists only to describe how a working directory can be modified. Change-Id: I0a299013c6e42fef6cae3f6a9446f8f6c8e0514a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	c9c57d34de	Rename Repository 'config' as 'repoConfig' This better matches with the other configuration variable, 'userConfig', and helps to make it clear what config object we are dealing with. Change-Id: I2c585649aecc805e8e66db2f094828cd2649e549 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	6a822f0ebf	Remove RepositoryConfig and use FileBasedConfig instead Change the Repository API to use straight-up FileBasedConfig. This lets us remove the subclass RepositoryConfig and stop having a specialized configuration type for repository, letting us instead focus the config type hierarchy on type-of-storage rather than use. Change-Id: I7236800e8090624453a89cb0c7a9a632702691c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	bd8b06427f	Delegate repository access to refs, objects Instead of using the internal field directly to access references or objects, use the getter method to obtain the proper type of database, and follow down from there. This permits us to later do a refactoring that makes those methods abstract and strips the field out of the Repository class, moving it into a concrete base class that is more storage implementation specific. Change-Id: Ic21dd48800e68a04ce372965ad233485b2a84bef Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	f6c26dabd2	Cleanup Repository.create() This method doesn't need to be synchronized, as it's only a proxy to create(boolean), which is the real worker. While we are touching it try to improve the Javadoc and whitespace nearby. Change-Id: Ibdddec6e518ca6d7439cfad90fedfcdc2d6b7a2e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	5309244713	Move additional have enumeration to Repository This permits the repository implementation to know what its alternates concept means, and avoids needing to expose finer details about the ObjectDatabase to network code like the RefAdvertiser. Change-Id: Ic6d173f300cb72de34519c7607cf7b0ff3ea6882 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	479fcf9e32	Refactor amazon-s3:// property file loading to support no directory In the future getDirectory() can return null. Avoid an NPE here by refactoring the code to support conditionally skipping a check for the properties file in the repository directory, falling back to only the user's ~/ file location. Change-Id: I76f5503d4063fdd9d24b7c1b58e1b09ddf1a5670 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	f39c9fc741	Download pack-.idx to /tmp if not on local filesystem If the destination repository doesn't use an ObjectDirectory to store its objects, we can't download to the object directory. Instead pull the pack-.idx files down to temporary files in the JVM's default temporary directory. Change-Id: Ied16bc89be624d87110ba42ba52d698a6ea7d982 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	553c2e5a42	DirCache must use getIndexFile When reading or locking the index of a repository, we need to use the index file specified by the repository, to ensure we correctly honor what the repository was configured with. Change-Id: I5be366ce32d7923b888dc01d19335912b01b7c4c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	60aae90d4d	Disable topological sorting in PackWriter It's not strictly required that we sort topologically in order to produce a valid pack file. This was just something that Linus thought would be a good idea to do. In practice it's not that important for most repositories. Local file IO quickly falls out of the pattern that topological sorting provides any sort of benefit for, so expending extra resources to enforce it when we make a pack isn't really worth it. I'm removing this sort in the pipeline because later changes would support really efficient COMMIT_TIME_DESC sorting on a non-file storage system, but TOPO sorting would be a bit more ugly to run, due to the in-memory delays it imposes. Change-Id: I0121453461c2140c6917cb10c6df584eb47e5795 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:41 -07:00
Shawn O. Pearce	ccd0c0c911	UploadPack: Permit flushing progress messages under smart HTTP If UploadPack invokes flush() on the output stream we pass it, it's most likely the progress messages coming down the side band stream. As pack generation can take a while, we want to push that down at the client as early as we can, to keep the connection alive, and to let the user know we are still working on their behalf. Ensure we dump the temporary buffer whenever flush() is invoked, otherwise the messages don't get sent in a timely fashion to the user agent (in this case, git fetch). We specifically don't implement flush() for ReceivePack right now, as that protocol currently does not provide progress messages to the user, but it does invoke flush several times, as the different streams include '0000' type flush-pkts to denote various end points. Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:41 -07:00
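The flush behaviour described in ccd0c0c911 (buffer output, but dump the buffer to the client whenever flush() is called) can be illustrated with a small OutputStream wrapper. This is a hypothetical sketch; the class name `FlushThroughBuffer` is invented and is not the smart HTTP code in JGit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch: buffers writes in memory, but pushes everything
 * buffered so far to the underlying stream whenever flush() is called,
 * so progress messages are not held back until the request ends.
 */
final class FlushThroughBuffer extends OutputStream {
    private final OutputStream out;
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    FlushThroughBuffer(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) {
        buf.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buf.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        buf.writeTo(out); // dump the temporary buffer to the client now
        buf.reset();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }
}
```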
Shawn O. Pearce	b6ba9739d5	Rewrite resolve in terms of RevWalk We want to eventually get rid of the mapCommit, mapTree APIs on Repository and force everyone into the faster parsers that exist in RevWalk. Rewriting resolve in terms of the faster parsers is a good first step. It actually simplifies the code a bit, as we no longer need to keep track of an ObjectId and an Object (the parsed form), since all RevObjects implicitly have their ObjectId readily available. Change-Id: I4d234630195616e2c263e7e70038b55a1be4e7a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:41 -07:00
Shawn O. Pearce	47c07e1a0d	Replace manual peel loops with RevWalk.peel Instead of peeling things by hand in application level code, defer the peeling logic into RevWalk's new peel utility method. Change-Id: Idabd10dc41502e782f6a2eeb56f09566b97775a8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:40 -07:00
Shawn O. Pearce	599c0ce745	Use RevTag/RevCommit to sort in a PlotWalk We already have these objects parsed and cached in our object pool. We shouldn't be looking them up via the legacy mapObject API, but instead can use the pool and the faster parsing routines available through the RevWalk that we extend. While we are here fixing the code, let's also correct the tag date sorting to accept tags that have no tagger identity, because they were created before Git knew how to store that field. Change-Id: Id49a11f6d9c050c82b876e5e11058840c894b2d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:40 -07:00
Shawn O. Pearce	e1b312b5f7	Use CoreConfig, UserConfig and TransferConfig directly Rather than relying on the helpers in RepositoryConfig to get these objects, obtain them directly through the Config API. It's only slightly more verbose, but permits us to work with the base Config class, which is more flexible than the highly file specific RepositoryConfig. This is what I really meant to do when I added the section parser and caching support to Config, we just failed to finish updating all of the call sites. Change-Id: I481cb365aa00bfa8c21e5ad0cd367ddd9c6c0edd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:29:38 -07:00
Shawn O. Pearce	8e396bcddc	Use higher level Config types when possible We don't have to assume/depend on RepositoryConfig here, these two tests can use higher level versions of the class and still come up with the same test. That frees us up to do some changes to the RepositoryConfig API. Change-Id: Ia7b263c8c5efa3fae1054416d39c546867288132 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:29:37 -07:00
Shawn O. Pearce	5ed96eb7f4	UploadPack: Avoid unnecessary flush in smart HTTP Under smart HTTP the biDirectionalPipe flag is false, and we return back immediately at this point in the negotiation process. There is no need to flush the stream to the client, the request is over and it will be automatically flushed out by the higher level servlet that invoked us. Avoiding flush here allows us to only use flush after a progress message is sent during pack generation. Change-Id: Id0c8b7e95e3be6ca4c1b479e096bed6b0283b828 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	066df3d1a1	Add MutableObjectId.copyFrom(AnyObjectId) This simplifies the PackIndex code, which is trying to quickly copy an existing ObjectId into a MutableObjectId. Rather than having the PackIndex violate the ObjectId's internals, expose a copy from function similar to the other ones for copying from raw byte arrays or hex formatted strings. Change-Id: I142635cbece54af2ab83c58477961ce925dc8255 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	677b9b17e2	Expose AnyObjectId compareTo(byte[]) and compareTo(int[]) Storage systems can use these implementations to compare a passed AnyObjectId with a stored representation of an ObjectId in the canonical network byte order format. This can be useful to do a binary search, or just linear scan, over an encoded storage file. Change-Id: I8c72993c4f4c6e98d599ac2c9867453752f25fd2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
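Commit 677b9b17e2 mentions comparing an id against ids stored in canonical network byte order, for example to binary-search an encoded storage file. The following minimal sketch of that idea assumes a table of 20-byte ids stored back to back in ascending order. `RawIdTable` is a hypothetical name, and this is not JGit's `AnyObjectId` code. The key detail is comparing bytes as unsigned values.

```java
/**
 * Hypothetical sketch: binary search over 20-byte object ids stored
 * back to back in network byte order, sorted ascending.
 */
final class RawIdTable {
    static final int ID_LEN = 20;

    // Compare two ids byte by byte as unsigned values (0x80 sorts after 0x7f).
    static int compare(byte[] a, int aOff, byte[] b, int bOff) {
        for (int i = 0; i < ID_LEN; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return 0;
    }

    /** Returns the index of id within table, or -1 if it is not present. */
    static int find(byte[] table, byte[] id) {
        int lo = 0;
        int hi = table.length / ID_LEN;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(id, 0, table, mid * ID_LEN);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return -1;
    }
}
```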
Shawn O. Pearce	864cc3de10	Expose RefWriter constructor taking RefList An implementation might prefer to use the RefList type here, and RefList is part of our public API. Expose the constructor so callers who have a RefList can take advantage of the existing sorting. Change-Id: I545867f85aa2c479d2d610024ebbe318144709c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	bfc43c13bc	Expose RefUpdate constructor to any subclass When we finally move RefDirectory to the new storage.file package, its associated RefDirectoryUpdate will need visibility to this constructor in order to initialize itself. This is true of any other repository implementation, so make it protected rather than package level visible. Change-Id: If838aec9baeb80ee2f12dcbca717657c725a9242 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:14 -07:00
Shawn O. Pearce	8e40697047	Expose repository change event constructors Repository implementations outside of .lib need to be able to create these events and deliver them to listening application code. Expose and document the constructors so that they are visible when we move FileRepository into storage.file.FileRepository. Change-Id: I7fb6e8f4f5fdab683c5ebb5267673aa6d5b560bb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:14 -07:00
Shawn O. Pearce	b3254d1159	isValidRefName: Inline the forbidden ref suffix of ".lock" A Git reference name must never end with ".lock", as it would confuse any existing C client that tries to obtain a clone of the repository over the network. Even if the repository isn't on a local filesystem, it still should ban that suffix. Because I plan to move LockFile to storage.file and make it a private implementation detail of the local file system storage model, we can't rely on its package level SUFFIX field here. Making it public probably won't work long-term either, as I also plan to pull storage.file into its own separate project that depends on the core library. So, just inline the constant here. It's as forbidden as ":" is. Change-Id: If85076861baeacc183b82696375a13e935ba8836 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:14 -07:00
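The inlined-constant approach in b3254d1159 can be sketched as follows. This hypothetical helper checks only the two rules named in that commit, the ".lock" suffix and ":". Git's real ref-name rules are broader, so this helper is not a replacement for `isValidRefName`.

```java
/**
 * Hypothetical sketch: checks only the ".lock" suffix and ":" rules;
 * a complete ref-name validator must enforce many more rules.
 */
final class RefNames {
    // Inlined here instead of borrowing a file-storage LockFile constant.
    private static final String LOCK_SUFFIX = ".lock";

    static boolean passesLockAndColonRules(String refName) {
        if (refName.endsWith(LOCK_SUFFIX)) {
            return false;
        }
        return refName.indexOf(':') < 0;
    }
}
```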
Shawn Pearce	f3186974b6	Merge "Fix line endings"	2010-06-18 18:15:53 -04:00
Matthias Sohn	767fb175ed	Fix line endings Some sources had dos line endings. Also configure all projects to use unix line endings and UTF-8 text encoding. Change-Id: I8fc9a1dbb219ffa91d1b3011b3b11b7e48e74ca7 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-06-18 23:36:18 +02:00
Shawn Pearce	3149f971e0	Merge ""Bare" Repository should not return working directory."	2010-06-16 22:34:46 -04:00
Andrew Bayer	068eb92710	Make ObjectId, RefSpec, RemoteConfig, URIish serializable Modifications to various classes in order to allow serialization for use of JGit in Hudson's git plugin. Change-Id: If088717d3da7483538c00a927e433a74085ae9e6	2010-06-16 16:10:28 -07:00
Mathias Kinzler	3c51b35e03	"Bare" Repository should not return working directory. If a repository is "bare", it currently still returns a working directory. This conflicts with the specification of "bare"-ness. Bug: 311902 Change-Id: Ib54b31ddc80b9032e6e7bf013948bb83e12cfd88 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	2010-06-16 08:50:26 +02:00
Chris Aniszczyk	8a11ac3d69	Merge "Add missing @Override tags in AlternateRepositoryDatabase"	2010-06-15 11:40:04 -04:00
Mathias Kinzler	c1c1300a74	Allow to read configured keys Currently, there is no way to read the content of the Git Configuration in a way that would allow to list all configured values generically. This change extends the Config class so that callers can get a list of sections and a list of names for any given section or subsection. This is required in order to implement proper configuration handling in EGit (show all the content of a given configuration similar to "git config -l"). Change-Id: Idd4bc47be18ed0e36b11be8c23c9c707159dc830 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	2010-06-15 10:12:26 +02:00
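Commit c1c1300a74 describes listing sections and the names within each section or subsection. The idea is shown below as a deliberately simplified, hypothetical parser over git-config text. `ConfigKeys` is an invented name, not the `Config` API added by the commit. The sketch ignores continuation lines, includes and quoting rules. Section and variable names are lowercased because Git treats them case-insensitively, while subsection names keep their case.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: maps "section" or "section.subsection" to its variable names. */
final class ConfigKeys {
    static Map<String, Set<String>> list(String text) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        String section = null;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                section = sectionKey(line.substring(1, line.length() - 1));
                result.putIfAbsent(section, new LinkedHashSet<>());
            } else if (section != null) {
                int eq = line.indexOf('=');
                String name = (eq < 0 ? line : line.substring(0, eq)).trim();
                result.get(section).add(name.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    // [remote "origin"] becomes remote.origin; the subsection keeps its case.
    private static String sectionKey(String header) {
        int first = header.indexOf('"');
        int last = header.lastIndexOf('"');
        if (first < 0 || last == first) {
            return header.trim().toLowerCase(Locale.ROOT);
        }
        return header.substring(0, first).trim().toLowerCase(Locale.ROOT)
                + "." + header.substring(first + 1, last);
    }
}
```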
Shawn Pearce	86fcdc53ad	Merge changes I53f71dc0,I3a899a3a,I3e8bd245,Ie7c9db83,If396326e,I6f4cf8da,I3bf96dd0,I3a2a43a1,I292fe88c,Ia1cf40cf * changes: git-servlet: Fix comparing uploadFactory with the wrong DISABLED instance Prefer static inner classes Override equals for SwingLane since super class PlotLane defines it Make sure a Stream is closed upon errors in IpLogGenerator Make constant static in RebuildCommitGraph Make inner classes static in http code Cache filemode in GitIndex Remove unused parent field in PlotLane Removed unused repo field in WorkDirCheckout Extend DiffFormatter API to simplify styling	2010-06-14 19:59:48 -04:00
Shawn O. Pearce	bc238acdc5	Add missing @Override tags in AlternateRepositoryDatabase Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-14 12:59:30 -07:00
Shawn O. Pearce	44ba1bc78c	Merge branch 'stable-0.8' * stable-0.8: Qualify post-0.8.4 builds JGit 0.8.4 JGit 0.8.3 Include about.html in org.eclipse.jgit artifact Fix build.properties of the JGit feature Added the standard SULA for JGit Add "resources/" as a source folder Change-Id: I4ecb0af41184ef84d104345fd1adcc4a240a38f6	2010-06-14 08:12:48 -07:00
Shawn O. Pearce	239ce58553	Start 0.9 development Change-Id: I84173ece5100f1fcb78168e2e102b649d9466c08 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-14 08:11:27 -07:00
Shawn O. Pearce	d28a40d679	Qualify post-0.8.4 builds Change-Id: I21efed66921eb7e1e4010fccc9fa9af6c4150fc1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-14 08:10:08 -07:00
Matthias Sohn	6970edf35a	JGit 0.8.4 Created wrong tags for 0.8.3 hence creating another version. Change-Id: I4e00bbcffe1cf872e2d7e3f3d88d068701fb5330 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-06-14 15:42:09 +02:00
Matthias Sohn	5255d66143	JGit 0.8.3 Change-Id: I845da83c74475d74ec25d68f53c0a4738a898550 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-06-14 01:34:34 +02:00
Robin Rosenberg	3bf96dd04b	Cache filemode in GitIndex Apparently this was the intention, but never happened Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-06-13 03:16:32 +02:00
Robin Rosenberg	3a2a43a1dc	Remove unused parent field in PlotLane Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-06-13 03:13:57 +02:00
Robin Rosenberg	292fe88c50	Removed unused repo field in WorkDirCheckout Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-06-13 03:12:41 +02:00
Robin Rosenberg	ce56c5dcc9	Extend DiffFormatter API to simplify styling Refactor and extend the internals so users can override and intervene during formatting, e.g. to colorize output. Change-Id: Ia1cf40cfd4a5ed7dfb6503f8dfc617237bee0659 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-06-12 15:31:04 +02:00
Matthias Sohn	4269a90aac	Include about.html in org.eclipse.jgit artifact This is required to enable accessing legal info for org.eclipse.jgit from Help > About > Installation Details > Plugins Change-Id: I73f40dd2018112cd23102954d7647ecdbbbf0d89 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-06-08 02:41:37 +02:00
Matthias Sohn	ab360d06de	Add "resources/" as a source folder Building jgit with pde.build was broken without resources. Bug:315823 Change-Id: I45be510ada068b3ffab0feb30ec60f2c96a5ca32 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-06-05 14:39:27 +02:00
Marc Strapetz	936e4ab2f2	Repository can be configured with FS On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation is present in the PATH. Assuming that the user works with the Cygwin Git installation may result in unnecessary overhead if he actually does not. Applications built on top of jgit may have more knowledge on the actually used Git client (Cygwin or not) and hence should be able to configure which FS to use accordingly. Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24	2010-06-04 19:08:58 -07:00
Robin Rosenberg	920d89d6af	Add support for computing a Change-Id à la Gerrit A Change-Id helps tools like Gerrit Code Review to keep different versions of a patch together. The Change-Id is computed as a SHA-1 hash of some of the same basic information as a commit id on the first commit intended to solve a particular problem and then reused for updated solutions. Change-Id: I04334f84e76e83a4185283cb72ea0308b1cb4182 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-06-04 18:42:14 -07:00
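Commit 920d89d6af computes the Change-Id as a SHA-1 over commit-like information. The sketch below assumes the hash input mirrors a commit object's header lines (tree, parent, author, committer, blank line, message). It hashes that input the way Git hashes a commit object, with a `commit <length>\0` prefix, and prefixes the hex digest with "I". The exact fields and the cleanup JGit applies to the message are assumptions. `ChangeIdSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a Gerrit-style Change-Id computation. */
final class ChangeIdSketch {
    static String computeChangeId(String tree, String parent, String author,
            String committer, String message) {
        // Assumed input layout: the same header lines a commit object carries.
        StringBuilder body = new StringBuilder();
        body.append("tree ").append(tree).append('\n');
        if (parent != null) {
            body.append("parent ").append(parent).append('\n');
        }
        body.append("author ").append(author).append('\n');
        body.append("committer ").append(committer).append('\n');
        body.append('\n').append(message);
        byte[] content = body.toString().getBytes(StandardCharsets.UTF_8);

        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
        // Git object hashing: "<type> <length>\0" followed by the content.
        sha1.update(("commit " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        sha1.update(content);

        StringBuilder id = new StringBuilder("I");
        for (byte b : sha1.digest()) {
            id.append(String.format("%02x", b & 0xff));
        }
        return id.toString();
    }
}
```

Because the inputs include the author and committer lines with their timestamps, two otherwise identical first commits made at different times receive different Change-Ids. Reusing the id for later versions of the patch is what keeps them grouped.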
Alex Blewitt	046d1a2ef6	Provide a public entry method to determine whether a URI protocol is supported	2010-06-04 00:38:50 +01:00
Shawn O. Pearce	d8ec8527a6	Qualify post-0.8.1 builds Change-Id: Id86e5876b2f684b2a272c07061a276b054ba410d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-02 15:55:39 -07:00
Shawn O. Pearce	be86767d71	JGit 0.8.1 Change-Id: I3d4ac7d0617a3575019e2ed748ed2a298a988340 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-02 14:47:31 -07:00
Shawn O. Pearce	16419dad35	Don't use interruptable pread() to access pack files The J2SE NIO APIs require that FileChannel close the underlying file descriptor if a thread is interrupted while it is inside of a read or write operation on that channel. This is insane, because it means we cannot share the file descriptor between threads. If a thread is in the middle of the FileChannel variant of IO.readFully() and it receives an interrupt, the pack will be automatically closed on us. This causes the other threads trying to use that same FileChannel to receive IOExceptions, which leads to the pack getting marked as invalid. Once the pack is marked invalid, JGit loses access to its entire contents and starts to report MissingObjectExceptions. Because PackWriter must ensure that the chosen pack file stays available until the current object's data is fully copied to the output, JGit cannot simply reopen the pack when it's automatically closed due to an interrupt being sent at the wrong time. The pack may have been deleted by a concurrent `git gc` process, and that open file descriptor might be the last reference to the inode on disk. Once it's closed, the PackWriter loses access to that object representation, and it cannot complete sending the object to the client. Fortunately, RandomAccessFile's readFully method does not have this problem. Interrupts during readFully() are ignored. However, it requires us to first seek to the offset we need to read, then issue the read call. This requires locking around the file descriptor to prevent concurrent threads from moving the pointer before the read. This reduces the concurrency level, as now only one window can be paged in at a time from each pack. However, the WindowCache should already be holding most of the pages required to handle the working set for a process, and its own internal locking was already limiting us on the number of concurrent loads possible. Provided that most concurrent accesses are getting hits in the WindowCache, or are for different repositories on the same server, we shouldn't see a major performance hit due to the more serialized loading. I would have preferred to use a pool of RandomAccessFiles for each pack, with threads borrowing an instance dedicated to that thread whenever they needed to page in a window. This would permit much higher levels of concurrency by using multiple file descriptors (and file pointers) for each pack. However the code became too complex to develop in any reasonable period of time, so I've chosen to retrofit the existing code with more serialization instead. Bug: 308945 Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-27 08:27:32 -07:00
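Commit 16419dad35 relies on RandomAccessFile seek-then-read, which is only safe if no other thread can move the file pointer between the two calls. The minimal sketch below holds one lock per file descriptor across both calls. `LockedPackReader` is a hypothetical name, not JGit's pack file class.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch: one shared descriptor, with seek and read serialized. */
final class LockedPackReader implements AutoCloseable {
    private final RandomAccessFile fd;

    LockedPackReader(File pack) throws IOException {
        fd = new RandomAccessFile(pack, "r");
    }

    /** Reads len bytes starting at position. */
    byte[] read(long position, int len) throws IOException {
        byte[] buf = new byte[len];
        // Hold the lock across both calls so no other thread can move
        // the shared file pointer between seek() and readFully().
        synchronized (fd) {
            fd.seek(position);
            fd.readFully(buf, 0, len);
        }
        return buf;
    }

    @Override
    public void close() throws IOException {
        fd.close();
    }
}
```

The trade-off matches the commit message: only one read per pack proceeds at a time, in exchange for never losing the descriptor to an interrupt.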
Stefan Lay	5b0e73b849	Add a merge command to the jgit API Merges the current head with one other commit. In this first iteration the merge command supports only fast forward and already up-to-date. Change-Id: I0db480f061e01b343570cf7da02cac13a0cbdf8f Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-05-24 09:52:28 -05:00
Christian Halstrick	6ca9843f3e	Added merge support to CommitCommand The CommitCommand should take care to create a merge commit if the file $GIT_DIR/MERGE_HEAD exists. It should then read the parents for the merge commit out of this file. When committing a merge without an explicitly specified commit message, it should read the message from $GIT_DIR/MERGE_MSG. Finally the CommitCommand should remove these files if the commit succeeded. Change-Id: I4e292115085099d5b86546d2021680cb1454266c Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-05-21 01:49:46 +02:00
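The MERGE_HEAD and MERGE_MSG handling described in 6ca9843f3e can be sketched with plain file I/O. This sketch assumes MERGE_HEAD holds one hex commit id per line and that the new commit's parents are HEAD followed by those ids. `MergeState` and its methods are hypothetical and are not the CommitCommand implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of reading and clearing merge state in $GIT_DIR. */
final class MergeState {
    /** HEAD first, then every id listed in MERGE_HEAD (if the file exists). */
    static List<String> parentsFor(Path gitDir, String head) throws IOException {
        List<String> parents = new ArrayList<>();
        parents.add(head);
        Path mergeHead = gitDir.resolve("MERGE_HEAD");
        if (Files.exists(mergeHead)) {
            for (String line : Files.readAllLines(mergeHead, StandardCharsets.UTF_8)) {
                String id = line.trim();
                if (!id.isEmpty()) {
                    parents.add(id);
                }
            }
        }
        return parents;
    }

    /** Uses the caller's message, falling back to MERGE_MSG when none was given. */
    static String messageFor(Path gitDir, String message) throws IOException {
        if (message != null) {
            return message;
        }
        Path mergeMsg = gitDir.resolve("MERGE_MSG");
        return Files.exists(mergeMsg)
                ? new String(Files.readAllBytes(mergeMsg), StandardCharsets.UTF_8)
                : null;
    }

    /** Called only after the commit succeeded. */
    static void clear(Path gitDir) throws IOException {
        Files.deleteIfExists(gitDir.resolve("MERGE_HEAD"));
        Files.deleteIfExists(gitDir.resolve("MERGE_MSG"));
    }
}
```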
Sasa Zivkov	f3d8a8ecad	Externalize strings from JGit The strings are externalized into the root resource bundles. The resource bundles are stored under the new "resources" source folder to get proper maven build. Strings from tests are, in general, not externalized. Only in cases where it was necessary to make the test pass the strings were externalized. This was typically necessary in cases where e.getMessage() was used in assert and the exception message was slightly changed due to reuse of the externalized strings. Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>	2010-05-19 14:37:16 -07:00
Shawn O. Pearce	2e961989e4	Fix SSH deadlock during OutOfMemoryError In close() method of SshFetchConnection and SshPushConnection errorThread.join() can wait forever if JSch will not close the channel's error stream. Join with a timeout, and interrupt the copy thread if it's blocked on data that will never arrive. Bug: 312863 Change-Id: I763081267653153eed9cd7763a015059338c2df8 Reported-by: Dmitry Neverov <dmitry.neverov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-19 11:43:42 -07:00
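The fix in 2e961989e4 (join with a timeout, then interrupt a copy thread that is still blocked) reduces to the pattern below. `Joins.joinOrInterrupt` is a hypothetical helper. The actual change lives in the SSH connection close() methods.

```java
/** Hypothetical sketch of a bounded join followed by an interrupt. */
final class Joins {
    /**
     * Waits up to timeoutMillis for t to finish. If it is still alive,
     * for example blocked on data that will never arrive, interrupt it.
     * Returns true if the thread finished within the timeout.
     */
    static boolean joinOrInterrupt(Thread t, long timeoutMillis) throws InterruptedException {
        t.join(timeoutMillis);
        if (t.isAlive()) {
            t.interrupt();
            return false;
        }
        return true;
    }
}
```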
Dmitry Neverov	b3247ba524	Fix race condition in StreamCopyThread If we get an interrupt during an IO operation (src.read or dst.write) caused by the flush() method incrementing the flush counter, ensure we restart the proper section of code. Just ignore the interrupt and continue running. Bug: 313082 Change-Id: Ib2b37901af8141289bbac9807cacf42b4e2461bd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-19 11:40:33 -07:00
Shawn O. Pearce	ae972e774e	Remove unnecessary truncation of in-pack size during copy The number of bytes to copy was truncated to an int, but the pack's copyToStream() method expected to be passed a long here. Pass through the long so we don't truncate a giant object. Change-Id: I0786ad60a3a33f84d8746efe51f68d64e127c332 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-17 07:13:55 -07:00
Shawn O. Pearce	b6d0586bef	Reduce the size of PackWriter's ObjectToPack instances Rather than holding onto the PackedObjectLoader, only hold the PackFile and the object offset. During a reuse copy that is all we should need to complete a reuse, and the other parts of the PackedObjectLoader just waste memory. This change reduces the per-object memory usage of a PackWriter by 32 bytes on a 32 bit JVM using only OFS_DELTA formatted objects. The savings is even larger (by another 20 bytes) for REF_DELTAs. This is close to a 50% reduction in the size of ObjectToPack, making it rather worthwhile to do. Beyond the memory reduction, this change will help to make future refactoring work easier. We need to redo the API used to support copying data, and disconnecting it from the PackedObjectLoader is a good first step. Change-Id: I24ba4e621e101f14e79a16463aec5379f447aa9b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-15 17:51:11 -07:00
Shawn O. Pearce	cb5bc19540	Reduce size of PackedObjectLoader by dropping long to int Rather than keep track of both the position of the object, and the position of its data, just keep track of the number of bytes used by the object's header in the pack. This shaves 4 bytes out of the size of the PackedObjectLoader instances. We also can defer the addition instruction to the materialize() operation, avoiding it entirely if the caller never actually uses the loader. This may be relevant for PackWriter invocations, where only 1 loader gets chosen for a given object, even though the object may appear on disk in more than one pack file. Error reporting is now simplified, as we can rely on the object offset rather than its data offset. This is the value displayed by pack debugging tools like `git verify-pack -v`, so it's better to use that in our own errors. Because nobody needs getDataOffset() now, we can drop that from the public API. Change-Id: Ic639c0d5a722315f4f5c8ffda6e26643d90e5f42 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-15 17:37:18 -07:00
Shawn O. Pearce	9c4d42e94d	Factor out duplicate Inflater setup in WindowCursor Since we use this code twice, pull it into a private method. Let the compiler/JIT worry about whether or not this logic should be inlined into the call sites. Change-Id: Ia44fb01e0328485bcdfd7af96835d62b227a0fb1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-15 16:18:44 -07:00
Shawn O. Pearce	d8f20745bf	Squash OffsetCache into WindowCache Originally when I wrote this code I had hoped to use OffsetCache to also implement the UnpackedObjectCache. But it turns out they need rather different code, and it just wasn't worth trying to reuse the OffsetCache base class. Before doing any major refactoring or code cleanups here, squash the two classes together and delete OffsetCache. As WindowCache is our only subclass, this is pretty simple to do. We also get a minor code reduction due to less duplication between the two classes, and the JIT should be able to do a better job of optimization here as we can define types up front rather than relying on generics that erase back to java.lang.Object. Change-Id: Icac8bda01260e405899efabfdd274928e98f3521 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-15 16:12:13 -07:00
Shawn O. Pearce	3cb59216f5	Avoid unnecessary second read on OBJ_OFS_DELTA headers When we read the object header we copy 20 bytes from the pack data, then start parsing out the type and the inflated size. For most objects, this is only going to require 3 bytes, which is sufficient to represent objects with inflated sizes of up to 2^16. The local buffer however still has 17 bytes remaining in it, and that can be used to satisfy the OBJ_OFS_DELTA header. We shouldn't need to worry about walking off the end of the buffer here, because delta offsets cannot be larger than 64 bits, and that requires only 9 bytes in the OFS_DELTA encoding. Assuming worst-case scenarios of 9 bytes for the OFS_DELTA encoding, the pack file itself must be approaching 2^64 bytes, an infeasible size to store on any current technology. However, even if this were the case we still have 11 bytes for the type/size header. In that encoding we can represent an object as large as 2^74 bytes, which is also an infeasible size to process in JGit. So drop the second read here. The data offsets we pass into the ObjectLoaders being constructed need to be computed individually now. This saves a local variable, but pushes the addition operation into each branch of the switch. Change-Id: I6cf64697a9878db87bbf31c7636c03392b47a062 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-15 16:09:07 -07:00
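Commit 3cb59216f5 relies on the fact that a pack entry's type/size header and its OBJ_OFS_DELTA base offset both fit in one small buffer. Both use Git's variable-length pack encodings. In the type/size header, 4 size bits come from the first byte and 7 more from each continuation byte. The OFS_DELTA distance adds 1 before each 7-bit shift. The hypothetical `PackHeader.parse` decodes both from one array. It is a sketch, not JGit's WindowCursor/PackFile code.

```java
/** Hypothetical sketch: decode a pack entry header from one buffer. */
final class PackHeader {
    static final int OBJ_OFS_DELTA = 6;

    int type;          // 1 commit, 2 tree, 3 blob, 4 tag, 6 ofs-delta, 7 ref-delta
    long inflatedSize; // size of the object (or delta) once inflated
    long baseDistance; // OBJ_OFS_DELTA only: subtract from this entry's offset
    int headerLength;  // bytes consumed, so dataOffset = objectOffset + headerLength

    static PackHeader parse(byte[] buf, int off) {
        PackHeader h = new PackHeader();
        int p = off;
        int c = buf[p++] & 0xff;
        h.type = (c >> 4) & 7;
        long size = c & 15;
        int shift = 4;
        while ((c & 0x80) != 0) { // continuation bit set: 7 more size bits
            c = buf[p++] & 0xff;
            size += (long) (c & 0x7f) << shift;
            shift += 7;
        }
        h.inflatedSize = size;

        if (h.type == OBJ_OFS_DELTA) {
            // Read from the same buffer; no second read is needed.
            c = buf[p++] & 0xff;
            long ofs = c & 127;
            while ((c & 128) != 0) {
                ofs += 1;
                c = buf[p++] & 0xff;
                ofs <<= 7;
                ofs += c & 127;
            }
            h.baseDistance = ofs;
        }
        h.headerLength = p - off;
        return h;
    }
}
```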
Shawn O. Pearce	3cba5377df	Fix hang when fetching over SSH JSch may hang or abort with the timeout if JGit connects before it's obtained the streams. Instead defer the connect() call until after the streams have been configured. Bug: 312383 Change-Id: I7c3a687ba4cb69a41a85e2b60d381d42b9090e3f Reported-by: Dmitry Neverov <dmitry.neverov@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-13 10:23:33 -07:00
Shawn O. Pearce	f999b4aa63	Fix interrupted write in StreamCopyThread If a flush() gets delivered at the same time that we are blocking while writing to an interruptable stream, the copy thread will abort assuming it's a stream error. Instead ignore the interrupt, and retry the write. Change-Id: Icbf62d1b8abe0fabbb532dbee088020eecf4c6c2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-13 09:58:21 -07:00
Dmitry Neverov	3f143b8d6b	Fix missing flush in StreamCopyThread It is possible to miss flush() invocation in StreamCopyThread. In this case some data will not be sent to remote host and we will wait forever (or until timeout) in src.read(). Use a counter to keep track of the flush requests. Change-Id: Ia818be9b109a1674d9e2a9c78e125ab248cfb75b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-13 09:58:18 -07:00
Christian Halstrick	f3fb5824ba	Add builder-style API to jgit and Commit & Log cmd Added a new package org.eclipse.jgit.api and a builder-style API for jgit. Added also the first implementation for two git commands: Commit and Log. This API is intended to be used by external components when functionalities of the standard git commands are required. It will also help to ease writing JGit tests. For internal usages this API may often not be optimal because the git commands are doing much more than required or they expect parameters of an inappropriate type. Change-Id: I71ac4839ab9d2f848307eba9252090c586b4146b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-05-10 15:17:55 +02:00
Robin Rosenberg	541ad72ac6	Merge "Added MERGING_RESOLVED repository state"	2010-05-08 17:16:26 -04:00
Robin Rosenberg	0df679aea1	Merge "A stages field and getter for GitIndex entry introduced"	2010-05-08 17:13:25 -04:00
Robin Rosenberg	a496410df9	A stages field and getter for GitIndex entry introduced Currently, if the Index contains a file in more than one stage, only the last entry (containing the highest stage) will be registered in GitIndex. For applications it can be useful to not only know about the highest stage, but also which other stages are present, e.g. to detect the type of conflict the file is in. Change-Id: I2d4ff9f6023335d9ba6ea25d8e77c8e283ae53cb Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-05-08 23:12:19 +02:00
Christian Halstrick	b9ab040b45	Added MERGING_RESOLVED repository state The repository state tells in which state the repo is and also which actions are currently allowed. The state MERGING is telling that a commit is not possible. But this is only true in the case of unmerged paths in the index. When we are merging but have resolved all conflicts then we are in a special state: We are still merging (means the next commit should have multiple parents) but a commit is now allowed. Since the MERGING state "canCommit()" cannot be enhanced to return true/false based on the index state (MERGING is an enum value which has no reference to the repository whose state it represents) I had to introduce a new state MERGING_RESOLVED. This new state will report that a commit is possible. CAUTION: there might be the chance that users of jgit previously blindly did a plain commit (with only one parent) when the RepositoryState allowed them to do so. With this change these users will now be confronted with a RepositoryState which says a commit is possible but before they can commit they'll have to check the MERGE_MSG and MERGE_HEAD files and use the info from these files. Change-Id: I0a885e2fe8c85049fb23722351ab89cf2c81a431 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-05-08 22:03:18 +02:00
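Commit b9ab040b45 notes that an enum constant cannot inspect the index, so the decision has to be made when the state is computed. A separate constant then carries a different canCommit() answer. The sketch below shows that shape with a hypothetical `RepoState` enum and factory method. It is not JGit's RepositoryState, which has more states.

```java
/** Hypothetical sketch: the state is chosen from facts the enum cannot see. */
enum RepoState {
    SAFE(true),
    MERGING(false),         // unmerged paths remain in the index
    MERGING_RESOLVED(true); // still merging, but every conflict is resolved

    private final boolean canCommit;

    RepoState(boolean canCommit) {
        this.canCommit = canCommit;
    }

    boolean canCommit() {
        return canCommit;
    }

    /** Hypothetical helper: the caller inspects MERGE_HEAD and the index. */
    static RepoState of(boolean mergeHeadExists, boolean hasUnmergedPaths) {
        if (!mergeHeadExists) {
            return SAFE;
        }
        return hasUnmergedPaths ? MERGING : MERGING_RESOLVED;
    }
}
```

A caller that sees MERGING_RESOLVED can commit, but must still build a merge commit using MERGE_HEAD and MERGE_MSG rather than a plain single-parent commit.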
Shawn O. Pearce	dd63f5cfc1	Fix FooterLine.matches(FooterKey) on same length keys If two keys are the same length, but don't share the same sequence of characters, we were incorrectly claiming they still matched due to a bug in the for loop condition. I used the wrong variable and the loop never executed, resulting in equality anytime the two keys being compared were the same length. Use the proper local variable to loop through the arrays, and add a JUnit test to verify equality works as expected. Change-Id: I4a02400e65a9b2e0da925b05a2cc4b579e1dd33a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-05-04 16:25:20 -07:00
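The bug fixed in dd63f5cfc1 was a comparison loop that never ran, so equal length alone counted as a match. The hypothetical helper below shows the intended check. Equal length is necessary but not sufficient, and every byte of the key must then be compared case-insensitively. `FooterKeys.matches` is an invented name, not JGit's `FooterLine.matches`.

```java
/** Hypothetical sketch of case-insensitive footer key matching. */
final class FooterKeys {
    static boolean matches(byte[] raw, int keyStart, int keyEnd, String key) {
        int len = keyEnd - keyStart;
        if (len != key.length()) {
            return false; // different lengths can never match
        }
        // Same length is not enough: compare every character of the key.
        for (int i = 0; i < len; i++) {
            char a = Character.toLowerCase((char) (raw[keyStart + i] & 0xff));
            char b = Character.toLowerCase(key.charAt(i));
            if (a != b) {
                return false;
            }
        }
        return true;
    }
}
```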
Chris Aniszczyk	d011a377cb	Merge "Fix handling of corruption for truncated objects"	2010-05-03 03:40:36 -04:00
Chris Aniszczyk	28e42cb463	Merge "Don't insert the same pack twice into a pack list"	2010-05-03 03:40:06 -04:00
Chris Aniszczyk	11096a89a5	Merge changes I0d339b9f,I0e6673b8 * changes: Favor earlier PackFile instances over later duplicates Cleanup duplicated object reuse code in PackWriter	2010-05-03 03:39:47 -04:00
Robin Rosenberg	c10e134157	Fix handling of corruption for truncated objects If a loose object was corrupted by truncation, JGit would hang. Change-Id: I7e4c14f44183a5fcb37c1562e81682bddeba80ad Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-05-01 09:50:38 +02:00
Chris Aniszczyk	f1946b0669	Cleaning up provider and feature names It is incorrect to use Eclipse.org as the providerName now, we'll use Eclipse JGit. Change-Id: I1621b93d4f401176704e7c43935a5ce0c8ee8419 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-04-27 09:26:25 -05:00
Shawn O. Pearce	374c28057a	Don't insert the same pack twice into a pack list If a concurrent thread picks up a newly created PackFile and adds it to the pack list before the IndexPack thread itself can insert the item onto the front of the list, do nothing and use the item that was picked up by that other concurrent scanning thread. This avoids a potential condition where the same pack exists in memory twice, which causes confusion later during a rescan of the directory because we don't know exactly which PackFile instance should be retained into the new list, and which should be discarded. We can stop searching through the old pack list as soon as the sort function declares that the item to insert should be before the item already in the list. Because the list is always sorted by modification time (in seconds), we should never encounter a case where the pack is positioned at the wrong spot in the list. This early break out still permits an efficient implementation of the common case, inserting a new pack at the head of the list. Change-Id: Ice4459bbd4ee9487078aff5257893883d04f05fb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-26 17:33:53 -07:00
Shawn O. Pearce	a0a52897ed	Favor earlier PackFile instances over later duplicates There is a potential race condition during insertPack that can lead to us having the same pack file open twice in the same directory. A different thread can miss an object on disk, trigger a scan of the directory, and notice the pack that was put in by IndexPack. So the pack winds up in the newly created PackList. The IndexPack thread then wakes up and finishes its insertPack by creating a new PackFile and inserting it into position 0 of the list. We now have the same pack listed twice. Readers will favor the earlier PackFile instance, because it's the first one they come across as they iterate through the list. Keep that earlier one when we scan the pack directory again, as this will avoid needing to purge out all of the windows that may have been cached. Of course we should also fix that race condition, but this block was taking the wrong resolution if this error ever shows up, so let's first fix the block to use a more sane resolution. Change-Id: I0d339b9fd1dd8012e8fe5a564b893c0f69109e28 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-26 17:32:04 -07:00
Shawn O. Pearce	eeed0abd16	Cleanup duplicated object reuse code in PackWriter This reuse line was identical between the two branches related to reusing a delta, or reusing a whole object. Either way they reuse the body of the object as-is. So just make that a common function after the header is written. Change-Id: I0e6673b8e813c8c08c594ea2ba546fd366339d5d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-26 17:29:10 -07:00
Robin Rosenberg	4ef96296f7	Merge "Fix NPE during InflaterCache return after corrupt loose object"	2010-04-24 08:19:01 -04:00
Shawn O. Pearce	dafa8fbff4	Fix NPE during InflaterCache return after corrupt loose object If a corrupt loose object is read, UnpackedObjectLoader was disposing of the Inflater, and then attempting to return the disposed Inflater to the InflaterCache. Since the disposed Inflater had its native libz resource deallocated and its reference cleared out, the Inflater threw NullPointerException and refused to reset itself before being put back into the cache. Instead of disposing of the Inflater when corruption is found, do nothing, and allow it to be returned to the cache. The instance will get reset, and should be usable by a future caller. Bug: 310291 Change-Id: I44f2247c08b6e04fa62f8399609341b07508c096 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-23 11:16:25 -07:00
Shawn O. Pearce	f36df5dc6a	Merge branch 'receive-pack-filter' * receive-pack-filter: ReceivePack: Clarify the check reachable option ReceivePack: Micro-optimize object lookup when checking connectivity ReceivePack: Correct type of not provided object IndexPack: Tighten up new and base object bookkeeping ReceivePack: Remove need new,base object id properties ReceivePack: Discard IndexPack as soon as possible ReceivePack: fix ensureProvidedObjectsVisible on thin packs Change-Id: I4ef2fcb931f3219872e0519abfcee220191d5133	2010-04-19 18:20:42 -07:00
Matthias Sohn	9605fcc0fb	Merge "ObjectIdSubclassMap: Correct Iterator to throw NoSuchElementException"	2010-04-17 18:35:38 -04:00
Matthias Sohn	f1be93eb87	Merge "ObjectIdSubclassMap: Add isEmpty() method"	2010-04-17 18:29:16 -04:00
Robin Rosenberg	c2960cdf65	Merge "IndexPack: Correct thin pack fix using less than 20 bytes"	2010-04-17 07:26:45 -04:00
Shawn O. Pearce	585dcb7a1c	ReceivePack: Clarify the check reachable option This option was misnamed from day 1. It's not checking that the objects provided by the client are reachable, it's actually doing a scan to prove that objects referenced by the client are already reachable through another reference on the server, or were sent as part of the pack from the client. Rename it checkReferencedObjectsAreReachable, since we really are trying to validate that objects referenced by the client's actions are reachable to the client. We also need to ensure we run checkConnectivity() anytime this is enabled, even if the caller didn't turn on fsck for object formats. Otherwise the check would be completely bypassed. Change-Id: Ic352ddb0ca8464d407c6da5c83573093e018af19 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 17:04:38 -07:00
Shawn O. Pearce	a770205070	ReceivePack: Micro-optimize object lookup when checking connectivity If we are checking the visibility of everything referenced in the pack that isn't already reachable by a reference, it needs to be in the provided set. Since the provided set lists everything that is in this pack, we can avoid checking to see if the blob exists on disk, because we know it should be there: it was found in the pack we just consumed. Change-Id: Ie3c7746f734d13077242100a68e048f1ac18c34a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 17:04:38 -07:00
Shawn O. Pearce	6029bb24ad	ReceivePack: Correct type of not provided object If a tree was referenced but not provided in the pack, report it as a missing tree and not as a missing blob. Change-Id: Iab05705349cdf0d30cc3f8afc6698a8d2a941343 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 17:04:37 -07:00
Shawn O. Pearce	2bb8defa54	IndexPack: Tighten up new and base object bookkeeping The only current consumer of these collections is ReceivePack, where it needs to test ObjectId equality between a RevObject and an ObjectId. There we were copying from a traditional HashSet<ObjectId> into an ObjectIdSubclassMap<ObjectId>, as the latter can perform hashing using ObjectId's native value support, bypassing RevObject's override on hashCode() and equals(). Instead of doing that copy, directly create ObjectIdSubclassMap instances inside of ReceivePack. We also only need to record the objects that do not appear in the incoming pack, and were therefore copied from the local repository in order to complete delta resolution. Instead of listing everything that used an OBJ_REF_DELTA format, list only the objects that we pulled from the destination repository via a normal ObjectLoader. ReceivePack can now discard the IndexPack object, and all of its other data, as soon as these collections are held by the check connectivity method. This frees up memory for the ObjectWalk's own RevObject pool. Change-Id: I22ef71b45c2045a0202e7fd550a770ee1f6f38a6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 17:04:26 -07:00
Shawn O. Pearce	329a0e1689	ReceivePack: Remove need new,base object id properties These are more like internal implementation details of how IndexPack works with ReceivePack to validate the incoming object stream. Callers who are embedding the ReceivePack logic in their own application don't really need to know the details of which objects were used for delta bases in the incoming thin pack, or exactly which objects were newly transmitted. Hide these from the API, as exposing them through ReceivePack was an early mistake. Change-Id: I7ee44a314fa19e6a8520472ce05de92c324ad43e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 16:32:33 -07:00
Shawn O. Pearce	8279361de8	ReceivePack: Discard IndexPack as soon as possible The IndexPack object carries a good bit of state within itself about the objects received over the wire. The earlier we can discard it, the sooner the GC is able to reclaim this chunk of memory for other uses. So drop it as soon as we are certain the pack is valid and we have no connectivity concerns. Change-Id: I1e8bc87c2e9183733043622237a064e55957891f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 16:32:33 -07:00
Shawn O. Pearce	7a91b180c1	ReceivePack: fix ensureProvidedObjectsVisible on thin packs If ensureProvidedObjectsVisible is enabled we expected any trees or blobs directly reachable from an advertised reference to be marked with UNINTERESTING. Unfortunately ObjectWalk doesn't bother setting this until the traversal is complete. Even then it won't necessarily set it on every tree if the corresponding commit wasn't popped. When we are going to check the base objects for the received pack, ensure the UNINTERESTING flag gets carried into every immediately reachable tree or blob, because these are the ones that the client might try to use as delta bases in a thin pack. Change-Id: I5d5fdcf07e25ac9fc360e79a25dff491925e4101 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 16:32:23 -07:00
Shawn O. Pearce	466bec3cc9	ObjectIdSubclassMap: Correct Iterator to throw NoSuchElementException The Iterator contract says next() shall throw NoSuchElementException if there are no more items remaining in the iteration. We got this wrong when I originally wrote the implementation, so fix it. Change-Id: Iea25e6569ead5c8b3128b8a368c5b2caebec7ecc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-04-16 16:30:21 -07:00
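
The Iterator contract referenced in 466bec3cc9 can be illustrated with a minimal sketch. The class below is hypothetical (SparseTableIterator is not part of JGit, and this is not the actual ObjectIdSubclassMap code); it assumes an open-addressed table whose empty slots are null, skips those slots, and throws NoSuchElementException from next() once the iteration is exhausted instead of returning null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over an open-addressed table with null (empty) slots.
// Illustrative only; not JGit's ObjectIdSubclassMap implementation.
class SparseTableIterator<T> implements Iterator<T> {
    private final T[] table;
    private int index; // next slot to examine
    private T next;    // next non-null element, or null when exhausted

    SparseTableIterator(T[] table) {
        this.table = table;
        advance();
    }

    // Move 'next' to the next non-null slot, or set it to null if none remain.
    private void advance() {
        next = null;
        while (index < table.length) {
            T candidate = table[index++];
            if (candidate != null) {
                next = candidate;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public T next() {
        if (next == null) {
            // Iterator contract: signal exhaustion with NoSuchElementException.
            throw new NoSuchElementException();
        }
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        String[] slots = {null, "a", null, "b", null};
        SparseTableIterator<String> it = new SparseTableIterator<>(slots);
        StringBuilder seen = new StringBuilder();
        while (it.hasNext()) {
            seen.append(it.next());
        }
        System.out.println(seen); // prints "ab"
        try {
            it.next();
            System.out.println("no exception");
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException");
        }
    }
}
```

Pre-computing the next non-null element keeps hasNext() side-effect free, so callers may invoke it any number of times before each next().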

588 Commits