motiejus/jgit - jgit - gitea: Gitea Service

Author	SHA1	Message	Date
Shawn O. Pearce	1c3f3fdbd2	Fix ObjectDirectory abbreviation resolution to notice new packs If we can't resolve an abbreviation, it might be because there is a new pack file we haven't picked up yet. Try scanning the packs again and recheck each pack if there were differences from the last scan we did. Because of this, we don't have to open a pack during the test where we generate a pack on the fly. We'll miss on the first loop during which the PackList is the NO_PACKS magic initialization constant, and pick up the newly created index during this retry logic. Change-Id: I7b97efb29a695ee60c90818be380f7ea23ad13a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-24 17:37:07 -07:00
Shawn O. Pearce	a5c18fcfc7	Fully implement SHA-1 abbreviations ObjectReader implementations are now responsible for creating the unique abbreviation of an ObjectId, or for resolving an abbreviation back to its full form. In this latter case the reader can offer up multiple candidates to the caller, who may be able to disambiguate them based on context. Repository.resolve() doesn't take multiple candidates into account right now, but it could in the future by looking for a remaining ^0 or ^{commit} suffix and take an expansion if there is only one commit that matches the input abbreviation. It could also use the distance from an annotated tag to resolve "tag-NNN-gcommit" style strings that are often output by `git describe`. Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 15:53:11 -07:00
Shawn O. Pearce	32466c33ba	Delete deprecated ObjectWriter ObjectWriter is a deprecated API that people shouldn't be using. So get rid of it in favor of the ObjectInserter API. Change-Id: I6218bcb26b6b9ffb64e3e470dba5dca2e0a62fd4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 10:59:30 -07:00
Shawn O. Pearce	9d5b926ed1	Add openEntryStream to WorkingTreeIterator This makes it easier for abstract tools like AddCommand to open the file from the working tree, without knowing internal details about how the tree is managed. Change-Id: Ie64a552f07895d67506fbffb3ecf1c1be8a7b407 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 10:30:58 -07:00
Shawn O. Pearce	edd8029558	Add setLength(long) to DirCacheEntry Applications should favor the long style interface, especially when their source input is a long type, e.g. coming from java.io.File. This way when the index format is later changed to support a larger file size than 2 GiB we can handle it by just changing the entry code, and not need to fix a lot of applications. Change-Id: I332563caeb110014e2d544dc33050ce67ae9e897 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 10:29:50 -07:00
Shawn O. Pearce	6df5d3397c	Move commit and tag formatting to CommitBuilder, TagBuilder These objects should be responsible for their own formatting, rather than delegating it to some obtuse type called ObjectInserter. While we are at it, simplify the way we insert these into a database. Passing in the type and calling format in application code turned out to be a huge mistake in terms of ease-of-use of the insert API. Change-Id: Id5bb95ee56aa2a002243e9b7853b84ec8df1d7bf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 10:13:29 -07:00
Shawn O. Pearce	22b285695a	Rename Commit, Tag to CommitBuilder, TagBuilder Since these types no longer support reading, calling them a Builder is a better description of what they do. They help the caller to build a commit or a tag object. Change-Id: I53cae5a800a66ea1721b0fe5e702599df31da05d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 09:46:14 -07:00
Shawn O. Pearce	6a51d97948	Add documentation explaining how to read Commit and Tag Since we stopped supporting these types for reading, but their name is a natural candidate for someone to try and use in code, explain where they should be looking instead. Change-Id: I091a1b0ef71b842016020f938ba3161431aab9c9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-23 09:40:41 -07:00
Christian Halstrick	5fc990130b	Improved creation of JGitInternalException There were 3 cases where a JGitInternalException was created without specifying the root cause. This has been fixed. Change-Id: I2ee08d04732371cd9e30874b1437b61217770b6a Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-08-23 10:20:43 +02:00
Marc Strapetz	e2e38792b5	Perform automatic CRLF to LF conversion during WorkingTreeIterator WorkingTreeIterator now optionally performs CRLF to LF conversion for text files. A basic framework is left in place to support enabling (or disabling) this feature based on gitattributes, and also to support the more generic smudge/clean filter system. As there is no gitattribute support yet in JGit this is left unimplemented, but the mightNeedCleaning(), isBinary() and filterClean() methods will provide reasonable places to plug that into in the future. [sp: All bugs inside of WorkingTreeIterator are my fault, I wrote most of it while cherry-picking this patch and building it on top of Marc's original work.] CQ: 4419 Bug: 301775 Change-Id: I0ca35cfbfe3f503729cbfc1d5034ad4abcd1097e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 18:03:07 -07:00
Shawn O. Pearce	2b23aac1c0	Expose pack fetch/push connections for subclassing These classes need to be visible if an application wants to define its own native pack based protocol embedded within another layer, much like we already support for smart HTTP. Change-Id: I7e2ac3ad01d15b94d340128a395fe0b2f560ff35 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:59:36 -07:00
Shawn O. Pearce	28ba4747bc	Allow ObjectReuseAsIs to have more control over write ordering The reuse system used by an object database may be able to benefit from knowing what objects are coming next, and even improve data throughput by delaying (or moving up) objects that are stored near each other in the source database. Pushing the iteration down into the reuse code makes it possible for a smarter implementation to aggregate reuse. But for the standard pack file format on disk we don't bother; it's quite efficient already. Change-Id: I64f0048ca7071a8b44950d6c2a5dfbca3be6bba6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:59:36 -07:00
Shawn O. Pearce	fe18e52195	Allow ObjectToPack subclasses to use up to 4 bits of flags Some instances may benefit from having access to memory efficient storage for some small values, like single flag bits. Give up a portion of our delta depth field to make 4 bits available to any subclass that wants it. This still gives us room for delta chains of 1,048,576 objects, and that is just insane. Unpacking 1 million objects to get to something is longer than most users are willing to wait for data from Git. Change-Id: If17ea598dc0ddbde63d69a6fcec0668106569125 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	f048af3fd1	Implement async/batch lookup of object data An ObjectReader implementation may be very slow for a single object, but yet support bulk queries efficiently by batching multiple small requests into a single larger request. This easily happens when the reader is built on top of a database that is stored on another host, as the network round-trip time starts to dominate the operation cost. RevWalk, ObjectWalk, UploadPack and PackWriter are the first major users of this new bulk interface, with the goal being to support an efficient way to pack a repository for a fetch/clone client when the source repository is stored in a high-latency storage system. Processing the want/have lists is now done in bulk, to remove the high costs associated with common ancestor negotiation. PackWriter already performs object reuse selection in bulk, but it now can also do the object size lookup and object counting phases with higher efficiency. Actual object reuse, deltification, and final output are still doing sequential lookups, making them a bit more expensive to perform. Change-Id: I4c966f84917482598012074c370b9831451404ee Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	11a5bef8b1	Offer ObjectReaders advice about a RevWalk By giving the reader information about the roots of a revision traversal, some readers may be able to prefetch information from their backing store using background threads in order to reduce data access latency. However this isn't typically necessary so the default reader implementation doesn't react to the advice. Change-Id: I72c6cbd05cff7d8506826015f50d9f57d5cda77e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	b85af06324	Allow object reuse selection to occur in parallel ObjectReader implementations may wish to use multiple threads in order to evaluate object reuse faster. Let the reader make that decision by passing the iteration down into the reader. Because the work is pushed into the reader, it may need to locate a given ObjectToPack given its ObjectId. This can easily occur if the reader has sent a list of ObjectIds to the object database and gets back information keyed only by ObjectId, without the ObjectToPack handle. Expose lookup using the PackWriter's own internal map, so the reader doesn't need to build a redundant copy to track the association of ObjectId back to ObjectToPack. Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	cc6210619b	Flush the pack header as soon as it's ready When the output stream is deeply buffered (e.g. 1 MiB or more in an HTTP servlet on some containers) trying to kick out the header earlier will prevent the client from stalling hard while the first 1 MiB is received and it can process the pack header. Forcing a flush here lets the client see the header and start its progress monitor for "Receiving objects: (1/N)" so the user knows there is still activity occurring, even though the buffering may cause there to be some lag as the buffer fills up on the sending side. Change-Id: I3edf39e8f703fe87a738dc236d426b194db85e3a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	de78cf3367	Export the ObjectId on MissingObjectException Callers catching a MissingObjectException may need programmatic access to the ObjectId that wasn't available in the repository. Change-Id: I2be0380251ebe7e4921fa74e246724e48ad88b0e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	69f8fa31be	Expose OBJ_ANY in ObjectReader Storage implementations or application code using an ObjectReader may want to access this constant without being inside of a subclass of the reader. Change-Id: I6c871a03d5846b9bb899de4d14a265e8b204d8e0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	109c695936	Expose getType in ObjectToPack Storage implementations may find this useful when implementing the ObjectReuseAsIs interface on their ObjectReader. Expose it so we don't force them to create a redundant copy of the information. Change-Id: I802ec8113c00884fccde5d0e92b9849716316f62 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	d1ebc4aa00	Add copyTo(ByteBuffer) to AnyObjectId Change-Id: I3572f6113db883002f9c3a5ecc1bcc8370105c98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	8878d301ac	Add copyTo(byte[], int) to AnyObjectId This permits formatting in hex into an existing byte array supplied by the caller, and mirrors our copyRawTo method with the same parameter signature. Change-Id: Ia078d83e338b09b903bfd2d04284e5283f885a19 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:41:27 -07:00
Shawn O. Pearce	540df6c9fe	Add a public RevTag.parse() method Callers might have a canonical tag encoding on hand that they wish to convert into a clean structure for presentation purposes, and the object may not be available in a repository. (E.g. maybe it's a "draft" tag being written in an editor.) Change-Id: I387a462afb70754aa7ee20891e6c0262438fdf32 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:38:53 -07:00
Shawn O. Pearce	b205597b91	Add a public RevCommit.parse() method Callers might have a canonical commit encoding on hand that they wish to convert into a clean structure for presentation purposes, and the object may not be available in a repository. (E.g. maybe it's a "draft" commit being written in an editor.) Change-Id: I21759cff337cbbb34dbdde91aec5aa4448a1ef37 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:38:53 -07:00
Shawn O. Pearce	707912b35d	Make Tag class only for writing The Tag class now only supports the creation of an annotated tag object. To read an annotated tag, applications should use RevTag. This permits us to have exactly one implementation, and RevTag's is faster and more bug-free. Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:38:53 -07:00
Shawn O. Pearce	b46b635c03	Make Commit class only for writing The Commit class now only supports the creation of a commit object. To read a commit, applications should use RevCommit. This permits us to have exactly one implementation, and RevCommit's is faster and more bug-free. Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:38:52 -07:00
Shawn O. Pearce	cf9537c8ce	Correct PersonIdent hashCode() and equals() to ignore milliseconds Git doesn't store millisecond accuracy in person identity lines, so a line that we create in Java and round-trip through a Git object wouldn't compare as being equal. Truncate to seconds when comparing values to ensure the same identity is equal. Change-Id: Ie4ebde64061f52c612714e89ad34de8ac2694b07 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:38:52 -07:00
Shawn O. Pearce	746ebda381	Try really hard to load a commit or tag When we need the canonical form of a commit or a tag in order to parse it into our RevCommit or RevTag fields, we really need it as a single contiguous byte array. However the ObjectDatabase may choose to give us a large loader. In general commits or tags are always under the several MiB limit, so even if the loader calls it "large" we should still be able to afford the JVM heap memory required to get a single byte array. Coerce even large loaders into a single byte array anyway. Change-Id: I04efbaa7b31c5f4b0a68fc074821930b1132cfcf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-20 17:38:52 -07:00
Shawn O. Pearce	3820b0281a	Fix formatting of serialization code in ObjectId Change-Id: I5b3e99e9e658fe272a9e171db04b0f20e48ed8d3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-19 11:53:22 -07:00
Shawn O. Pearce	1c2290c8d6	Make ObjectId.compareTo final Since equals() is now final and does not permit being overridden, we should do the same thing with compareTo() to prevent different subclasses from having different ordering behaviors. This could lead to the same mess that we had with different equals() behaviors. Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-19 11:50:20 -07:00
Shawn O. Pearce	5adcd708e4	Make ObjectId.hashCode final too Since equals() is now final and does not permit being overridden, we should do the same thing with hashCode() to prevent different subclasses from having different hashing behaviors. This could lead to the same mess that we had with different equals() behaviors. Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-19 11:46:22 -07:00
Shawn O. Pearce	d0043e5d31	Remove unnecessary ObjectId.copy() calls When RevObject overrode equals() to provide only reference equality we used to need to convert a RevObject into an ObjectId by copy() just to use standard Java tools like JUnit assertEquals(), or to use contains() or get() on standard java.util collection types. Now that we have removed this override and made ObjectId's equals() final (preventing any of this mess in the future), some copy() calls are unnecessary. Anytime the value is being used as an input to a lookup routine, or to an equals, we can avoid the copy(). However we still want to use copy() anytime we are given an ObjectId that may exist long-term, where we don't want the high cost of the additional storage from a RevCommit extension. So we can't remove all uses of copy(), just some of them. Change-Id: Ief275dace435c0ddfa362ac8e5d93558bc7e9fc3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-19 11:43:39 -07:00
Mathias Kinzler	b7388637d8	Fix missing Configuration Change eventing Configuration change events were not being triggered, now they are forwarded from the FileConfig up to the Repository's listeners. Change-Id: Ida94a59f5a2b7fa8ae0126e33c13343275483ee5 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-19 11:36:56 -07:00
Christian Halstrick	75c9b24385	Enhance MergeResult to report conflicts, etc The MergeResult class is enhanced to report more data about a three-way merge. Information about conflicts and the base, ours, theirs commits can be retrieved. Change-Id: Iaaf41a1f4002b8fe3ddfa62dc73c787f363460c2 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-19 12:16:39 -05:00
Chris Aniszczyk	94ba9574cd	Allow for optional tagger and message in Tag We should be more lenient when tagging without a tagger or message. Currently, we will throw an NPE which is incorrect behavior. Change-Id: I04e30ce25a9432e4ca56c3f29658ecb24fb18d24 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-18 19:22:42 -07:00
Chris Aniszczyk	6c9d82b4ce	Remove getter and setter for author in Tag There was a duplicated getter and setter for tagger in Tag. There's no need to have two getters and setters that represent the same things. The appropriate tests were updated also. Change-Id: If46dc00c4c0f31ea4234c6d3bda3c03e6ebbafac Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-18 20:58:25 -05:00
Christian Halstrick	e02b68a8b7	added resetIndex() to RepositoryTestCase Added a utility method to reset an index to match exactly some content in the filesystem. This can be used by tests to prepare commits in the working-tree and set the index in one shot. [sp: Cleaned up formatting, added getEntryFile(), released inserter.] Change-Id: If38b1f7cacaaf769f51b14541c5da0c1e24568a5 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-18 16:45:56 -07:00
Robin Rosenberg	a85c08e1c8	Do not trigger RefsChangedEvent on the first attempt to read a ref Such events make no sense, it has never been visible to this process so no client can have a stale value of the ref. Change-Id: Iea3a5035b0a1410b80b09cf53387b22b78b18018 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-08-18 16:32:45 -07:00
Ketan Padegaonkar	376acfb6db	Add FileRepository(String) convenience constructor Add a convenience API in FileRepository to pass in a String that points to the GIT_DIR location. This is converted to a File and sent through the usual constructor. Change-Id: I588388f37e89b8c690020f110a1bc59f46170c40	2010-08-18 16:30:20 -07:00
Matthias Sohn	2d3a806271	Backout RevObject's object-identity based equals implementation This restores the transitivity and symmetry properties of the equals methods on the AnyObjectId type hierarchy as defined in [1]. Following [2] we declare these equals methods final to ensure that semantics of equals are consistent across AnyObjectId's type hierarchy. [1] http://download-llnw.oracle.com/javase/6/docs/api/java/lang/Object.html#equals(java.lang.Object) [2] http://www.angelikalanger.com/Articles/JavaSolutions/SecretsOfEquals/Equals.html Bug: 321502 Change-Id: Ibace21fa268c4aa15da6c65d42eb705ab1aa24b3 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-08-15 23:12:58 +02:00
Shawn Pearce	8d761febc3	Merge "Fix RevCommitList to work with subclasses of RevWalk"	2010-08-12 19:48:44 -04:00
Matthias Sohn	35b01dac4c	Fix RevCommitList to work with subclasses of RevWalk Bug: 321502 Change-Id: Ic4bc49a0da90234271aea7c0a4e344a1c3620cfc Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-08-13 01:47:17 +02:00
Jens Baumgart	cd1141cd45	Improve IndexDiff performance Exclude ignored files from IndexDiff tree walk. This makes EGit commit much faster. Change-Id: I398499510c22c37667b7612db32eac3b31d325f0 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-12 11:43:02 -05:00
Chris Aniszczyk	cfe88d32a3	Merge "Hide Maven target directories from Eclipse"	2010-08-11 11:17:18 -04:00
Mathias Kinzler	fe76b41038	TransportHttp does not honor timeout setting This can result in an infinitely hanging IDE. Change-Id: I669bc8d220a07011a42edf79de31825305ff3763 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	2010-08-10 15:58:21 +02:00
Jens Baumgart	9a6a433576	Fix NPE on commit in empty Repository NPE occurred when committing in an empty repository. Bug: 321858 Change-Id: Ibddb056c32c14c1444785501c43b95fdf64884b1 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	2010-08-09 11:14:38 +02:00
Robin Rosenberg	db4c516f67	Hide Maven target directories from Eclipse Change-Id: I64f12a35423a90ced9c9bc83f6869d8ed766dd35 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-08-08 13:16:53 +02:00
Shawn O. Pearce	09130b8731	Merge branch 'rename-bug' * rename-bug: Fix ArrayIndexOutOfBounds on non-square exact rename matrix Conflicts: org.eclipse.jgit/src/org/eclipse/jgit/diff/RenameDetector.java Change-Id: Ie0b8dd3e1ec174f79ba39dc4706bb0694cc8be29	2010-08-06 09:48:50 -07:00
Shawn O. Pearce	e2f5716c94	Fix ArrayIndexOutOfBounds on non-square exact rename matrix If the exact rename matrix for a particular ObjectId isn't square we crashed with an ArrayIndexOutOfBoundsException because the matrix entries were encoded backwards. The encode function accepts the source (aka deleted) index first, not second. Add a unit test to cover this non-square case to ensure we don't have this regression in the future. Change-Id: I5b005e5093e1f00de2e3ec104e27ab6820203566 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-06 09:39:10 -07:00
Shawn O. Pearce	8e9cc826e9	Merge changes I39bfefee,I47795987,I70d120fb,I58cc5e01,I96bee7b9 * changes: Enable configuration of non-standard pack settings Pass PackConfig down to PackWriter when packing Simplify UploadPack use of options during writing Move PackWriter configuration to PackConfig Allow PackWriter callers to manage the thread pool	2010-08-05 21:11:31 -04:00
Shawn O. Pearce	97e93ca1ea	Merge "Remove static progress task names from PackWriter"	2010-08-05 21:11:30 -04:00
Chris Aniszczyk	b69900a415	Merge "Add "all" parameter to the commit Command"	2010-08-05 15:02:59 -04:00
Chris Aniszczyk	ad4274abcc	Merge "Add the parameter "update" to the Add command"	2010-08-05 15:01:53 -04:00
Stefan Lay	4b464ed458	Allow to replace existing Change-Id It is useful to be able to replace an existing Change-Id in the message, for example if the user decides not to amend the previous commit. Bug: 321188 Change-Id: I594e7f9efd0c57d794d2bd26d55ec45f4e6a47fd Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-05 12:23:38 -05:00
Shawn O. Pearce	60c5939b23	Rename getOldName,getNewName to getOldPath,getNewPath TreeWalk calls this value "path", while "name" is the stuff after the last slash. FileHeader should do the same thing to be consistent. Rename getOldName to getOldPath and getNewName to getNewPath. Bug: 318526 Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-08-04 11:00:07 -07:00
Shawn O. Pearce	7514a6dbdc	Merge branch 'js/diff' * js/diff: Fixed bug in scoring mechanism for rename detection	2010-08-04 10:59:35 -07:00
Jeff Schumacher	e64cb03065	Fixed bug in scoring mechanism for rename detection A bug in rename detection would cause file scores to be wrong. The bug was due to the way rename detection would judge the similarity between files. If file A has three lines containing 'foo', and file B has 5 lines containing 'foo', the rename detection phase should record that A and B have three lines in common (the minimum of the number of times that line appears in both files). Instead, it would choose the number of times the line appeared in the destination file, in this case file B. I fixed the bug by having the SimilarityIndex instead choose the minimum number, as it should. I also added a test case to verify that the bug had been fixed. Change-Id: Ic75272a2d6e512a361f88eec91e1b8a7c2298d6b	2010-08-04 10:56:19 -07:00
Jens Baumgart	3ba1c7c068	Add gitignore support to IndexDiff and use TreeWalk IndexDiff was re-implemented and now uses TreeWalk instead of GitIndex. Additionally, gitignore support and retrieval of untracked files was added. Change-Id: Ie6a8e04833c61d44c668c906b161202b200bb509 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	2010-08-04 10:03:20 -05:00
Stefan Lay	ab57af08e8	Add "all" parameter to the commit Command When the "all" parameter is set, all modified and deleted files are staged prior to commit. Change-Id: Id23bc25730fcdd151386cd495a7cdc0935cbc00b Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-08-04 13:53:08 +02:00
Stefan Lay	fa7d9ac5b8	Add the parameter "update" to the Add command This change is mainly done for a subsequent commit which will introduce the "all" parameter to the Commit command. Bug: 318439 Change-Id: I85a8a76097d0197ef689a289288ba82addb92fc9 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-08-04 13:36:45 +02:00
Christian Halstrick	94207f0a43	Make use of Repository.writeMerge...() The CommitCommand should not use java.io to delete MERGE_HEAD and MERGE_MSG files since Repository already has utility methods for that. Change-Id: If66a419349b95510e5b5c2237a91f06c1d5ba0d4 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-07-29 15:12:14 +02:00
Christian Halstrick	fba2437111	Merge "Fix tag sorting in PlotWalk"	2010-07-28 17:13:27 -04:00
Shawn O. Pearce	5f5da8b1d4	Enable configuration of non-standard pack settings For daemons we might want to disable delta compression entirely, or in some strange case an administrator might need to turn off delta reuse. Expose these normally internal pack settings through the pack configuration section. Change-Id: I39bfefee8384c864cc04ffac724f197240c8a11a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Shawn O. Pearce	9fbce904e6	Pass PackConfig down to PackWriter when packing When we are creating a pack the higher level application should be able to override the PackConfig used, allowing it to control the number of threads used or how much memory is allocated per writer. Change-Id: I47795987bb0d161d3642082acc2f617d7cb28d8c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Shawn O. Pearce	bb99ec0aa0	Simplify UploadPack use of options during writing We only use these variables once, so just put them at the proper use site and avoid assigning the local variable. The code is a bit shorter and the intent is a little bit more clear. Change-Id: I70d120fb149b612ac93055ea39bc053b8d90a5db Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Shawn O. Pearce	1a06179ea7	Move PackWriter configuration to PackConfig This refactoring permits applications to configure global per-process settings for all packing and easily pass it through to per-request PackWriters, ensuring that the process configuration overrides the repository specific settings. For example this might help in a daemon environment where the server wants to cap the resources used to serve a dynamic upload pack request, even though the repository's own pack.* settings might be configured to be more aggressive. This allows fast but less bandwidth efficient serving of clients, while still retaining good compression through a cron managed `git gc`. Change-Id: I58cc5e01b48924b1a99f79aa96c8150cdfc50846 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 12:13:48 -07:00
Mathias Kinzler	6e59e6dab9	Meaningful error message when trying to check-out submodules Currently, a NullPointerException occurs in this case. We should instead throw a more meaningful Exception with a proper message. This is a very "stupid" implementation which simply checks for the existence of a ".gitmodules" file. Bug: 300731 Bug: 306765 Bug: 308452 Bug: 314853 Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	2010-07-28 11:59:07 -07:00
Christian Halstrick	08c0c5d938	Fix unit tests under Windows The following tests fail under Windows because certain input streams are not closed and files cannot be deleted because of that. The main problem I found is UnpackedObject.InflaterInputStream.close(). This method may throw exceptions found by checkValidEndOfStream() but doesn't call super.close() before leaving. It is not clear to me which resources a close() method should release before it throws an exception. But those resources which are not published to the outside and which therefore cannot be closed by other means have to be closed in all cases. I changed the close() method to call super.close() under all circumstances. Failing tests: testStandardFormat_LargeObject_TruncatedZLibStream(org.eclipse.jgit.storage.file.UnpackedObjectTest) testStandardFormat_LargeObject_TrailingGarbage(org.eclipse.jgit.storage.file.UnpackedObjectTest) testPackFormat_SmallObject(org.eclipse.jgit.storage.file.UnpackedObjectTest) Change-Id: Id2e609a29e725aad953ff9bd88af6381df38399d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-07-28 11:55:11 -07:00
Shawn O. Pearce	d0f8d1e819	Fix tag sorting in PlotWalk By deferring tag sorting until the commit is produced by the walker we can avoid an infinite loop that was triggered by trying to sort tags while allocating a commit. This also avoids needing to look at commits which aren't going to be produced in the result. Bug: 321103 Change-Id: I25acc739db2ec0221a50b72c2d2aa618a9a75f37 Reviewed-by: Mathias Kinzler <mathias.kinzler@sap.com> Reviewed-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 11:51:17 -07:00
Shawn O. Pearce	21f76c2a69	Remove static progress task names from PackWriter These need to be dynamic based on the current thread's environment at time of execution in order to be properly localized for the end user that will be seeing these messages. Change-Id: I4976f462cfe606edd2761c0e36b2f6b20f63d53c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 10:50:28 -07:00
Shawn O. Pearce	1b783d0370	Allow PackWriter callers to manage the thread pool By permitting the caller of PackWriter to select the Executor it uses for task execution, we give the caller the ability to manage the lifecycle of the thread pool, including reusing it across concurrent pack generators. This is the first step to supporting application thread pools within Daemon or another managed service like Gerrit Code Review. Change-Id: I96bee7b9c30ff9885f2bd261d0b6daaac713b5a4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 10:50:28 -07:00
Christian Halstrick	74d279fbf0	Teach NameConflictTreeWalk to report DF conflicts Add a method isDirectoryFileConflict() to NameConflictTreeWalk which tells whether the current path is part of a directory/file conflict. Change-Id: Iffcc7090aaec743dd6f3fd1a333cac96c587ae5d Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-28 17:26:31 +02:00
Mathias Kinzler	51c6f513b0	Stack Overflow in EGit History View This is caused by a recursion in PlotWalk.getTags(). As a hotfix, the sort was simply removed. The sort must be re-implemented so that parseAny() is not called again (currently, this happens in the PlotRefComparator). Change-Id: I060d26fda8a75ac803acaf89cfb7d3b4317328f3 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	2010-07-28 11:46:05 +02:00
Jeff Schumacher	396fe6da45	Break dissimilar file pairs during diff File pairs that are very dissimilar during a diff were not being broken apart into their constituent ADD/DELETE pairs. This leads to sub-optimal rename detection. Take, for example, this situation: A file exists at src/a.txt containing "foo". A user renames src/a.txt to src/b.txt, then adds a new src/a.txt containing "bar". Even though the old a.txt and the new b.txt are identical, the rename detection algorithm would not detect it as a rename since it was already paired in a MODIFY. I added code to split all MODIFYs below a certain score into their constituent ADD/DELETE pairs. This allows situations like the one I described above to be more correctly handled. Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 18:13:32 -07:00
Christian Halstrick	f56a459966	Add methods which write MERGE_HEAD and MERGE_MSG Add methods to the Repository class which write into MERGE_HEAD and MERGE_MSG files. Since we have the read methods in the same class this seems to be the right place. Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 11:48:23 -07:00
Jens Baumgart	db82b8d7eb	Fix concurrent read / write issue in LockFile on Windows LockFile.commit fails if another thread concurrently reads the base file. The problem is fixed by retrying the rename operation if it fails. Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb Bug: 308506 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	2010-07-27 10:00:47 -07:00
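The retry approach described in this commit might look roughly like the sketch below. The class `RenameRetry` and the parameters `attempts` and `delayMillis` are illustrative assumptions; the commit message does not say how many retries LockFile performs or how long it waits between them.

```java
import java.io.File;

final class RenameRetry {
    // attempts and delayMillis are illustrative parameters, not JGit settings.
    static boolean renameWithRetry(File src, File dst, int attempts, long delayMillis) {
        for (int i = 0; i < attempts; i++) {
            // File.renameTo reports failure by returning false, e.g. when
            // another thread still has the target open on Windows.
            if (src.renameTo(dst))
                return true;
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A bounded retry count keeps a permanently failing rename from blocking the caller forever.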
Robin Stocker	a00377a7e2	Fix Javadoc warnings There were some broken links, incorrect uses of @value, an invalid tag and an outdated comment. Change-Id: I22886bcc869a4b62bd606ebed40669f7b4723664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 09:40:01 -07:00
Shawn O. Pearce	80fe789690	Make forPath(ObjectReader) variant in TreeWalk This simplifies the logic for those who already have an ObjectReader on hand and want to reuse it to lookup a single path. Change-Id: Ief17d6b2a0674ddb34bbc9f43121b756eae960fb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-27 08:36:24 -07:00
Shawn O. Pearce	7ff18f3ec9	Make StoredConfig an abstraction above FileBasedConfig This exposes a load and save method, allowing a Repository to denote that it has a persistent configuration of some kind which can be accessed by the application, without needing to know exact details of how it is stored. Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-26 16:50:11 -07:00
Shawn O. Pearce	fa9b225e06	Merge branch 'delta' * delta: (103 commits) Discard the uncompressed delta as soon as its compressed Honor pack.windowlimit to cap memory usage during packing Honor pack.threads and perform delta search in parallel Cache small deltas during packing Implement delta generation during packing debug-show-packdelta: Dump a pack delta to the console Initial pack format delta generator Add debugging toString() method to ObjectToPack Make ObjectToPack clearReuseAsIs signal available to subclasses Correctly classify the compressing objects phase Refactor ObjectToPack's delta depth setting Configure core.bigFileThreshold into PackWriter Add doNotDelta flag to ObjectToPack Add more configuration options to PackWriter Save object path hash codes during packing Add path hash code to ObjectWalk Add getObjectSize to ObjectReader Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB Define a constant for 127 in DeltaEncoder Cap delta copy instructions at 64k ... Conflicts: org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Diff.java org.eclipse.jgit/resources/org/eclipse/jgit/JGitText.properties org.eclipse.jgit/src/org/eclipse/jgit/JGitText.java org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RewriteTreeFilter.java Change-Id: I7c7a05e443a48d32c836173a409ee7d340c70796	2010-07-22 14:56:34 -07:00
Stefan Lay	ab062caa22	Allow client of Add command to set a WorkingTreeIterator This is e.g. useful when a client of the AddCommand has additional rules to ignore files. In Eclipse a resource can be set to derived or be excluded by preferences. Change-Id: I6c47e54a1ce26315faf5ed0723298ad2c2db197c Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 14:57:00 +02:00
Stefan Lay	88957f6c5a	Allow for filepattern "." in AddCommand Enable adding on repository root level. Change-Id: I415b10dc74cc9435578424d9f106c972fd703055 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 14:27:35 +02:00
Stefan Lay	aa86cfc339	Do not add ignored files in Add command Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 11:26:04 +02:00
Shawn O. Pearce	09910ffa32	Move ignore node handling into WorkingTreeIterator The working tree iterator has perfect knowledge of the path structure as well as immediate information about whether or not an ignore file even exists at this level. We can exploit that to simplify the logic and running time for testing ignored file status by pushing all of the checks down into the iterator itself. Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-21 10:34:08 -07:00
Shawn Pearce	0ec0e21fdf	Merge "Fix concurrent read / write issue in GitIndex on Windows"	2010-07-21 13:08:01 -04:00
Jens Baumgart	e99c48a61a	Fix concurrent read / write issue in GitIndex on Windows GitIndex.write fails if another thread concurrently reads the index file. The problem is fixed by retrying the rename operation if it fails. Bug: 311051 Change-Id: Ib243d2a90adae312712d02521de4834d06804944 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	2010-07-21 09:35:15 +02:00
Christian Halstrick	5c94321b47	Check for racy git in WorkingTreeIterator The WorkingTreeIterator has a method to check whether the current file differs from the corresponding index entry. This commit improves this check to also handle racy git situations. See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 21:55:18 +02:00
Christian Halstrick	c98d97731b	Smudge racily clean index entries by truncating length (like git.git) To mark an entry racily clean we set its length to 0 (like native git does). Entries which are not racily clean and have zero length can be distinguished from racily clean entries by checking P_OBJECTID against the SHA1 of empty content. When length is 0 and P_OBJECTID is different from SHA1 of empty content we know the entry is marked racily clean. See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html Change-Id: I689552931441ab51964b430b303160c9126b66af Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 21:54:36 +02:00
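The smudge test described in this commit can be sketched as follows. The class `SmudgeCheck` and its method names are hypothetical, and the object id is passed as a hex string for simplicity; the only facts taken from the commit are that a smudged entry has length 0 and an object id different from the SHA-1 of empty content. The empty blob id is derived here by hashing the Git blob header for zero bytes of content.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SmudgeCheck {
    // SHA-1 of an empty blob: the header "blob 0" plus a NUL byte, no content.
    static String emptyBlobId() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update("blob 0\0".getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest())
                hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Length 0 alone is ambiguous: a genuinely empty file also has length 0,
    // but its object id is the empty blob id.
    static boolean isSmudged(int length, String objectIdHex) {
        return length == 0 && !emptyBlobId().equals(objectIdHex);
    }
}
```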
Shawn O. Pearce	938943d674	Use proper constants for .gitignore and .git directory We have a constant for .gitignore, so use it. While we are in the same method, correct the reference of ".git" to be the actual GIT_DIR given. This might not be within the work tree if the GIT_DIR and GIT_WORK_TREE environment variables were used. Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 09:11:39 -07:00
Shawn O. Pearce	c59db09bc5	Remove gitIgnoreTimestamp from abstract iterator API This never should have been exposed on the top of the AbstractTreeIterator type hierarchy. There is no concept of a timestamp in a canonical tree read from the object database, and the time in the DirCache isn't what we want here either. Actually all that we need is to find the files whose names are ".gitignore" and are below the root directory. We can accomplish that with a suffix filter, and process them immediately. Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 09:09:01 -07:00
Shawn O. Pearce	395d236058	Fix NPE in RenameDetector If we have two adds of the same object but no deletes the detector threw an NPE because the entry that came back from the deleted map was null (no matching objects). In this case we need to put the adds all back onto the list of left over additions since they did not match a delete. Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Jeffrey Schumacher <jeffschu@google.com>	2010-07-20 07:52:35 -07:00
Shawn O. Pearce	b518189b5c	IndexPack: Fix spurious pack file corruption errors We didn't correctly handle the zlib trailer for an object. If the trailer bytes were outside of the current buffer window but we had fully inflated the object itself, we broke out of the loop (as we had our target size) but inflate wasn't finished (as it did not yet get the trailer) so we failed the test and threw a corruption exception. Use an infinite loop and only break out when the inflater is done. Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-20 07:40:48 -07:00
Shawn O. Pearce	12fe0f2d1e	Discard the uncompressed delta as soon as its compressed The DeltaCache will most likely need to copy the compressed delta into a new buffer in order to compact away the wasted space at the end caused by over allocation. Since we don't need the uncompressed format anymore, null out our only reference to it so the GC can reclaim this memory if it needs to perform a collection in order to satisfy the cache's allocation attempt. Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-16 10:41:09 -07:00
Shawn O. Pearce	6e155d5f41	Merge branch 'js/rename' * js/rename: Implemented file path based tie breaking to exact rename detection Added more test cases for RenameDetector Added very small optimization to exact rename detection Fixed Misleading Javadoc Added file path similarity to scoring metric in rename detection Fixed potential div by zero bug Added file size based rename detection optimization Create FileHeader from DiffEntry log: Implement --follow Cache the diff configuration section log: Add whitespace ignore options Format submodule links during differences Redo DiffFormatter API to be easier to use log, diff: Add rename detection support Implement similarity based rename detection Added a preliminary version of rename detection Refactored code out of FileHeader to facilitate rename detection	2010-07-16 10:22:15 -07:00
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate routine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-16 10:12:04 -07:00
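A defensive inflate loop along the lines discussed in this commit might look like the sketch below. It is not the IndexPack code: the class `InflateLoop` is hypothetical, the whole input is supplied up front rather than in buffer windows, and errors are reported as an unchecked IllegalStateException for brevity. The point it illustrates is that a return of 0 from `Inflater.inflate()` must be examined rather than retried blindly, and that the loop ends only when the inflater reports it is finished.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class InflateLoop {
    // Inflates a complete zlib stream held in memory.
    static byte[] inflateAll(byte[] compressed) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (;;) {
                int n = inf.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                    continue;
                }
                // n == 0: only stop once the trailer has been consumed.
                if (inf.finished())
                    break;
                // All input was already supplied, so wanting more input or
                // an external dictionary here is a failure, not a retry.
                throw new IllegalStateException(inf.needsDictionary()
                        ? "stream requires an external dictionary"
                        : "stream is truncated");
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }
}
```

Breaking out only on `finished()` also covers the zlib trailer case: the object data may be fully inflated while the inflater still needs the trailer bytes.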
Jeff Schumacher	31311cacfd	Implemented file path based tie breaking to exact rename detection During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whose path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content-based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611	2010-07-16 09:56:42 -07:00
Christian Halstrick	b840ed0121	Added dirty-detection to WorkingTreeIterator Added possibility to compare the current entry of a WorkingTreeIterator to a given DirCacheEntry. This is done to detect whether an entry in the index is dirty or not. 'Dirty' means that the file in the working tree is different from what's in the index. Merge algorithms will make use of this to detect conflicts. Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-16 10:08:52 +02:00
Shawn Pearce	19473b1dbc	Merge "Handle the tilde notation (~user) of git url"	2010-07-15 17:29:21 -04:00
Robin Rosenberg	845714158a	Handle the tilde notation (~user) of git url When the path is prefixed with ~ the URI parser thought about this as /~. Strip the / if the next character is the tilde. Bug: 307017 Change-Id: I58203e5617956b46d83e8987d1f8042beddffac3 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-15 01:16:09 +02:00
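As a rough illustration of the path fix-up this commit describes, the sketch below strips the leading slash when the next character is a tilde. The class and method names are hypothetical and this is not the URI parser code itself.

```java
final class TildePath {
    // "/~user/repo.git" becomes "~user/repo.git"; other paths are unchanged.
    static String stripSlashBeforeTilde(String path) {
        if (path.length() > 1 && path.charAt(0) == '/' && path.charAt(1) == '~')
            return path.substring(1);
        return path;
    }
}
```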
Stefan Lay	233e0130b5	Git Porcelain API: Add Command The new Add command adds files to the Git Index. It uses the DirCache to access the git index. It works also in case of an existing conflict. Fileglobs (e.g. *.c) are not yet supported. The new Add command does add ignored files because there is no gitignore support in jgit yet. Bug: 318440 Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-14 11:24:58 +00:00
Shawn Pearce	0ef99921fa	Merge changes I104cd62f,I1d0238b4 * changes: Internationalize RepositoryState descriptions Say that commit is allowed during bisect	2010-07-13 20:36:25 -04:00
Charley Wang	b878cdcf6b	Add compatibility with gitignore specifications This patch adds ignore compatibility to jgit. It encompasses exclude files as well as .gitignore. Uses TreeWalk and FileTreeIterator to find nodes and parses .gitignore files when required. The patch includes a simple cache that can be used to save results and avoid excessive gitignore parsing. CQ: 4302 Bug: 303925 Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-13 00:34:15 +02:00
Jeff Schumacher	bc08fafb41	Added very small optimization to exact rename detection Optimized a small loop in findExactRenames. The loop would go through all the items in a list of DiffEntries even after it already found what it was looking for. I made it break out of the loop as soon as a good match was found. Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61	2010-07-12 12:54:01 -07:00
Jeff Schumacher	a20e6f6fec	Fixed Misleading Javadoc The javadoc for the setRenameLimit method in RenameDetector said that you could only have limits in the range (0,100), implying that 0 and 100 were illegal inputs. The code, however, allowed 0 and 100. I changed the javadoc to say that the range [0,100] was legal. I also documented the IllegalArgumentException that is thrown if the limit is outside that range. Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2	2010-07-12 12:54:01 -07:00
Jeff Schumacher	9a48de86d8	Added file path similarity to scoring metric in rename detection The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def	2010-07-12 12:52:05 -07:00
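The weighting described in this commit (99% content, 0.5% file name, 0.5% directory) can be written out as a small sketch. The class `RenameScore` and its parameters are hypothetical; each input is assumed to be a similarity score in the range 0..100, and how the name and directory scores themselves are computed is not covered here.

```java
final class RenameScore {
    // Weights are expressed in tenths of a percent so the arithmetic
    // stays in integers: 990 + 5 + 5 = 1000.
    static int combine(int contentScore, int nameScore, int directoryScore) {
        return (contentScore * 990 + nameScore * 5 + directoryScore * 5) / 1000;
    }
}
```

With identical content, a pair whose name and directory are also identical scores 100, while a pair whose paths have nothing in common scores 99, so the path only breaks ties between otherwise equal candidates.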
Jeff Schumacher	4c14b7869d	Fixed potential div by zero bug The scoring logic in SimilarityIndex was dividing by the max file size. If both files are empty, this would cause a div by zero error. This case cannot currently happen, since two empty files would have the same SHA1, and would therefore be caught in the earlier SHA1 based detection pass. Still, if this logic eventually gets separated from that pass, a div by zero error would occur. I changed the logic to instead consider two empty files to have a similarity score of 100. Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a	2010-07-12 12:24:42 -07:00
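The guard described in this commit can be sketched as below. The class `ContentScore` and the `commonBytes` parameter are hypothetical simplifications of what SimilarityIndex computes; the sketch only shows the division by the larger file size and the special case for two empty files.

```java
final class ContentScore {
    // Returns a 0..100 score: shared bytes as a percentage of the larger file.
    static int score(long commonBytes, long sizeA, long sizeB) {
        long max = Math.max(sizeA, sizeB);
        if (max == 0)
            return 100; // two empty files are treated as identical
        return (int) (commonBytes * 100 / max);
    }
}
```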
Jeff Schumacher	64b9458640	Added file size based rename detection optimization Prior to this change, files that were very different in size (enough so that they could not have enough in common to be detected as renames) were still having their scores calculated. I added an optimization to skip such files. For example, if the rename detection threshold is 60%, the larger file is 200kb, and the smaller file is 50kb, the pair cannot be counted as a rename since they cannot possibly share 60% of their content in common. (200*.6=120, 120>50) Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc	2010-07-12 12:24:42 -07:00
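The size check in this commit's example can be expressed as a short sketch. The class `SizePrefilter` and its method are hypothetical names; the reasoning is that two files can share at most as many bytes as the smaller one contains, so the pair can be skipped when the smaller size is below the threshold fraction of the larger size.

```java
final class SizePrefilter {
    // Returns false when the pair cannot possibly reach thresholdPercent,
    // based on file sizes alone.
    static boolean canReachThreshold(long sizeA, long sizeB, int thresholdPercent) {
        long max = Math.max(sizeA, sizeB);
        long min = Math.min(sizeA, sizeB);
        // Compare min / max against thresholdPercent / 100 without division.
        return min * 100 >= max * thresholdPercent;
    }
}
```

For the example in the message, `canReachThreshold(200 * 1024, 50 * 1024, 60)` is false because the smaller file would need at least 120 KB to reach 60% of 200 KB.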
Robin Rosenberg	d787a82e50	Internationalize RepositoryState descriptions Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 10:24:37 +02:00
Shawn O. Pearce	9734194917	Honor pack.windowlimit to cap memory usage during packing The pack.windowlimit configuration parameter places an upper bound on the number of bytes used by the DeltaWindow class as it scans through the object list. If memory usage would exceed the limit the window is temporarily decreased in size to keep memory used within that bound. Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:19:07 -07:00
Shawn O. Pearce	74e0835012	Honor pack.threads and perform delta search in parallel If we have multiple CPUs available, packing usually goes faster when each CPU is assigned a slice of the available search space. The number of threads to use is guessed from the runtime if it wasn't set by the caller, or wasn't set in the configuration. Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:17:30 -07:00
Shawn O. Pearce	a960d1429e	Cache small deltas during packing PackWriter now caches small deltas, or deltas that are very tiny compared to their source inputs, so that the writing phase goes faster by reusing those cached deltas. The cached data is stored compressed, which usually translates to a bigger footprint due to deltas being very hard to compress, but saves time during writing by avoiding the deflate step. They are held under SoftReferences so that the JVM GC can clear out deltas if memory gets very tight. We would rather continue working and spend a bit more CPU time during writing than crash due to OOME. To avoid OutOfMemoryErrors during the caching phase we also trap OOME and just abort out of the caching. Because deflateBound() always produces something larger than what we need to actually store the deflated data, we copy it over into a new buffer if the actual length doesn't match the buffer length. When packing jgit.git this saves over 111 KiB in the cache, and is thus a worthwhile hit on CPU time. To further save memory we store the inflated size of the delta (which we need for the object header) in the same field as the pathHash, as the pathHash is no longer necessary by this phase of the packing algorithm. Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:15:54 -07:00
Shawn O. Pearce	dfad23bf3d	Implement delta generation during packing PackWriter now produces new deltas if there is not a suitable delta available for reuse from an existing pack file. This permits JGit to send less data on the wire by sending a delta relative to an object the other side already has, instead of sending the whole object. The delta searching algorithm is similar in style to what C Git uses, but apparently has some differences (see below for more on this). Briefly, objects that should be considered for delta compression are pushed onto a list. This list is then sorted by a rough similarity score, which is derived from the path name the object was discovered at in the repository during object counting. The list is then walked in order. At each position in the list, up to $WINDOW objects prior to it are attempted as delta bases. Each object in the window is tried, and the shortest delta instruction sequence selects the base object. Some rough rules are used to prevent pathological behavior during this matching phase, like skipping pairings of objects that are not similar enough in size. PackWriter intentionally excludes commits and annotated tags from this new delta search phase. In the JGit repository only 28 out of 2600+ commits can be delta compressed by C Git. As the commit count tends to be a fair percentage of the total number of objects in the repository, and they generally do not delta compress well, skipping over them can improve performance with little increase in the output pack size. Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved over the years in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly the same rules everywhere, and that leads JGit to produce different (but logically equivalent) pack files.
Repository | Pack Size (bytes)                | Packing Time
           | JGit     - CGit     = Difference | JGit    / CGit
-----------+----------------------------------+------------------
git        | 25094348 - 24322890 = +771458    | 59.434s / 59.133s
jgit       |  5669515 -  5709046 = - 39531    |  6.654s /  6.806s
linux-2.6  |     389M -     386M = +3M        | 20m02s  / 18m01s
For the above tests pack.threads was set to 1, window size=10, delta depth=50, and delta and object reuse was disabled for both implementations. Both implementations were reading from an already fully packed repository on local disk. The running time reported is after 1 warm-up run of the tested implementation. PackWriter is writing 771 KiB more data on git.git, 3M more on linux-2.6, but is actually 39.5 KiB smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack. On the running time side, JGit is at a major disadvantage because linux-2.6 doesn't fit into the default WindowCache of 20M, while C Git is able to mmap the entire pack and have it available instantly in physical memory (assuming hot cache). CGit also has a feature where it caches deltas that were created during the compression phase, and uses those cached deltas during the writing phase. PackWriter does not implement this (yet), and therefore must create every delta twice. This could easily account for the increased running time we are seeing. Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:14:18 -07:00
Shawn O. Pearce	074055d747	debug-show-packdelta: Dump a pack delta to the console This is a horribly crude application; it doesn't even verify that the object it's dumping is delta encoded. Its method of getting the delta is pretty abusive to the public PackWriter API, because right now we don't want to expose the real internal low-level methods actually required to do this. Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:12:32 -07:00
Shawn O. Pearce	8612c0ace1	Initial pack format delta generator DeltaIndex is a simple pack style delta generator. The function works by creating a compact index of a source buffer's blocks, and then walking a sliding window along a desired result buffer, searching for the window in the index. When a match is found, the window is stretched to the longest possible length that is common with the source buffer, and a copy instruction is created. Rabin's polynomial hash function is used to compute the hash for a block, permitting efficient sliding of the window in single byte increments. The update function to slide one byte originated from David Mazieres' work in LBFS, and our implementation of the update step was certainly inspired by the initial work Geert Bosch proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2. To ensure the encoder runs in linear time with respect to the size of the two input buffers (source and result), the maximum number of blocks that can share the same position in the index's hashtable is capped at a constant number. This prevents bad inputs from causing the encoder to run in quadratic time, but comes with a penalty of creating a longer delta due to fewer considered copy positions. Strange hackery is used to cap the amount of memory used by the index to be no more than 12 bytes for every 16 bytes of source buffer, no matter what the JVM per-object overhead is. This permits an index to always be no larger than 1.75x the source buffer length, which is an important feature to support large windows of candidates to match against while packing. Here the strange hackery is nothing more than a manually managed chained hashtable, where pointers are array indexes into storage arrays rather than object references. Computation of the hash function for a single fixed sized block is done through an unrolled loop, where the first 4 iterations have been manually reduced down to eliminate unnecessary instructions. 
The pattern is derived from ObjectId.equals(byte[], int, byte[], int), where we have unrolled the loop required to compare two 20 byte arrays. Hours of testing with the Sun 1.6 JRE concluded that the non-obvious "foo[idx + 1]" style of reference is faster than "foo[idx++]", and so that is what we use here during hashing. Change-Id: If9fb2a1524361bc701405920560d8ae752221768 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:10:55 -07:00
Shawn O. Pearce	b38426ae8c	Add debugging toString() method to ObjectToPack It's useful to know what the flags are or what the base that was selected is. Dump these out as part of the object's toString. Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:09:19 -07:00
Shawn O. Pearce	699e4aa7c5	Make ObjectToPack clearReuseAsIs signal available to subclasses A subclass may want to use this method to release handles that are caching reuse information. Make it protected so they can override it and update themselves. Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:07:45 -07:00
Shawn O. Pearce	4569d77e13	Correctly classify the compressing objects phase Searching for reuse candidates should be fast compared to actually doing delta compression. So pull the progress monitor out of this phase and rename it back to identify the compressing objects state. Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:06:10 -07:00
Shawn O. Pearce	85b7a53d52	Refactor ObjectToPack's delta depth setting Long ago when PackWriter was first written we thought that the delta depth could be updated automatically. But it's never used. Instead make this a simple standard setter so the caller can more directly set the delta depth of this object. This permits us to configure a depth that takes into account more than just the depth of another object in this same pack. Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:04:35 -07:00
Shawn O. Pearce	6730f9e3c8	Configure core.bigFileThreshold into PackWriter C Git's fast-import uses this to determine the maximum file size that it tries to delta compress; anything equal to or above this setting is stored as a whole object with simple deflate. Define the configuration so we can use it later. Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:02:54 -07:00
Shawn O. Pearce	823e9a9721	Add doNotDelta flag to ObjectToPack This flag will later control whether or not PackWriter searches for a delta base for this object. Edge objects will never get searched, as the writer won't be outputting them, so they should always have this flag set on. Sometime in the future this flag should also be set for file blobs on file paths that have the "-delta" gitattribute set in the repository's attributes file. Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:49 -07:00
Shawn O. Pearce	616bc74cf7	Add more configuration options to PackWriter We now at least import other pack settings like pack.window, which means we can later use these to control how we search for deltas. The compression level was fixed to use pack.compression rather than the loose object core.compression setting. Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:46 -07:00
Robin Rosenberg	a1492f1922	Say that commit is allowed during bisect C Git allows this and it is quite handy. Change-Id: I1d0238b43fca931ad2079649fb7b431e2815c351 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 02:32:46 +02:00
Shawn O. Pearce	2f93a09dd1	Save object path hash codes during packing We need to remember these so we can later cluster objects that have similar file paths near each other as we search for deltas between them. Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 15:17:26 -07:00
Shawn O. Pearce	c20daa7314	Add path hash code to ObjectWalk PackWriter wants to categorize objects that are similar in path name, so blobs that are probably from the same file (or same sort of file) can be delta compressed against each other. Avoid converting into a string by performing the hashing directly against the path buffer in the tree iterator. We only hash the last 16 bytes of the path, and we try to avoid any spaces, as we want the suffix of a file such as ".java" to be more important than the directory it is in, like "src". Change-Id: I31770ee711526306769a6f534afb19f937e0ba85 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
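A hash over the tail of a path, as described in this commit, could be sketched as follows. The class `PathHash` is hypothetical, and the mixing step (shift the running hash right, add the new byte in the high bits) is an illustrative assumption rather than a statement of what ObjectWalk does; only the "last 16 bytes" and "skip spaces" rules come from the commit message.

```java
import java.nio.charset.StandardCharsets;

final class PathHash {
    // Hashes at most the last 16 bytes of the path, ignoring spaces.
    static int hashTail(byte[] path) {
        int start = Math.max(0, path.length - 16);
        int hash = 0;
        for (int i = start; i < path.length; i++) {
            byte c = path[i];
            if (c != ' ')
                // Later bytes land in the high bits, so the file suffix
                // influences the result more than the directory.
                hash = (hash >>> 2) + (c << 24);
        }
        return hash;
    }

    static int hashTail(String path) {
        return hashTail(path.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because only the tail is hashed, two paths that end in the same 16 bytes, such as the same file name under different top-level directories, produce the same hash and can be clustered together.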
Shawn O. Pearce	97311cd3e0	Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB If the heap limit was set to something smaller than 8 KiB, we were still allocating the full 8 KiB block size, and accepting up to the amount we allocated by. Instead actually put a hard cap on the limit. Change-Id: Id1da26fde2102e76510b1da4ede8493928a981cc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Matthias Sohn	b8f2bb7d2a	Add support for updateNeeded flag in DirCacheEntry Change-Id: If06ff41d9ccd422afbc79ecbc3cfdf8bb2508dcd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-09 14:12:06 +02:00
Jeff Schumacher	a8b29afd82	Create FileHeader from DiffEntry Added support for converting DiffEntrys to FileHeaders. FileHeaders are DiffEntrys with a buffer containing the diff output as well as a list of HunkHeaders. The HunkHeaders contain EditLists. The createFileHeader(DiffEntry) method in DiffFormatter performs a Myers Diff on the files referred to by the DiffEntry, then puts the returned EditList into a single HunkHeader, which is then put into the FileHeader to be returned. It also generates the appropriate diff header and puts it into the FileHeader's buffer. The rest of the diff output, which would normally be parsed to generate the HunkHeaders, is not generated. In fact, the purpose of this method is to avoid the costly diff output generation and parsing normally required to create a FileHeader. Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887	2010-07-08 16:58:55 -07:00
Stefan Lay	354b90131a	Fix javadoc typos in JGit API There were some small errors which made it difficult to read the JavaDoc. Change-Id: Ib3b34353465162adebaca3514d596d0edf5aea51 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-08 10:42:29 +02:00
Shawn O. Pearce	711bd3e3d0	Define a constant for 127 in DeltaEncoder The special value 127 here means how many bytes we can put into a single insert command. Rather than use the magical value 127, let's name it to better document the code. Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	cd7dd8591e	Cap delta copy instructions at 64k Although all modern delta decoders can process copy instructions with a count as large as 0xffffff (~15.9 MiB), pack version 2 streams are only supposed to use delta copy instructions up to 64 KiB. Rewrite our copy instruction encode loop to use the lower 64 KiB limit, even though modern decoders would support longer copies. To improve encoding performance we now try to encode up to four full copy commands in our buffer before we flush it to the stream, but we don't try to implement full buffering here. We are just trying to amortize the virtual method call to the destination stream when we have to do a large copy. Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	384a19eee0	Deprecate all of the older Tree related code We want to get rid of these APIs, because they don't perform as well as DirCache/TreeWalk, or don't offer nearly as many features. Bug: 319145 Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:15:02 -07:00
Shawn O. Pearce	a215914a56	Fix DeltaEncoder header for objects 128 bytes long The encode loop had the wrong condition, objects that are 128 bytes in size need to have their length encoded as two bytes, not one. Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 08:53:03 -07:00
Shawn O. Pearce	f29741d1d8	amend commit: Support large delta packed objects as streams Rename the ByteWindow's inflate() method to setInput. We have completely refactored the purpose of this method to be feeding part (or all) of the window as input to the Inflater, and the actual inflate activity happens in the caller. Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:06 -07:00
Shawn O. Pearce	ab3c68c512	amend commit: Support large loose objects as streams We need to validate the stream state after the InflaterInputStream thinks the stream is done. Git expects a higher level of service from the Inflater than the InflaterInputStream usually gives, we need to ensure the embedded CRC is valid, and that there isn't trailing garbage at the end of the file. Change-Id: I1c9642a82dbd76b69e607dceccf8b85dc869a3c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:01 -07:00
Stefan Lay	311da9b211	Fix comparison of nanoseconds NB.decodeInt32(info, base + 4) already returns nanoseconds. Therefore it must not be divided by 1000000. Change-Id: Ie8f5c4a03f984d98935dccedc2b1ba4457094899 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-06 17:57:17 +02:00
Shawn O. Pearce	1913b41bc7	log: Implement --follow The FollowFilter can be installed on a RevWalk to cause the path to be updated through rename detection when the affected file is found to be added to the project. The filter works reasonably well, for example we can follow the history of the fsck command in git-core: $ jgit log --name-status --follow builtin/fsck.c | grep ^R R100 builtin-fsck.c builtin/fsck.c R099 fsck.c builtin-fsck.c R099 fsck-objects.c fsck.c R099 fsck-cache.c fsck-objects.c Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:55 -07:00
Shawn O. Pearce	e9de5643fa	Cache the diff configuration section This way we don't have to reparse for the rename limit every time we create a new rename detector for a repository. Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:52 -07:00
Shawn O. Pearce	8a0c58394d	log: Add whitespace ignore options Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 17:32:47 -07:00
Shawn O. Pearce	bd8740dc14	Format submodule links during differences Instead of crashing, output a submodule link with the simple "Subproject commit $fullid\n" syntax used by C Git. Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:59:06 -07:00
Shawn O. Pearce	5be90be996	Redo DiffFormatter API to be easier to use Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:58:37 -07:00
Shawn O. Pearce	04a9d23b9a	log, diff: Add rename detection support Implement rename detection in the command line diff and log commands. Also support --name-status, -p and -U flags, as these can be quite useful to view more detail. All of the Git patch file formatting code is now moved over to the DiffFormatter class. This permits us to reuse it in any context, including inside of IDEs. Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similar to what we use in ObjectIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common by the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. 
If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash codes. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. 
The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
Shawn O. Pearce	4dd7b35b26	Improve description of isBare and NoWorkTreeException Alex pointed out that my description of a bare repository might be confusing for some readers. Reword the description of the error, and make it consistent throughout the Repository class's API. Change-Id: I87929ddd3005f578a7022f363270952d1f7f8664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:31 -07:00
Shawn O. Pearce	08d349a27b	amend commit: Refactor repository construction to builder class During code review, Alex raised a few comments about commit `532421d989` ("Refactor repository construction to builder class"). Due to the size of the related series we aren't going to go back and rebase in something this minor, so resolve them as a follow-up commit instead. Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	fe9860a444	Remove pointless size test in PackFile decompress Now that any large objects are forced through a streaming loader when they are bigger than getStreamFileThreshold(), and that threshold is pegged at Integer.MAX_VALUE as its largest size, we will never be able to reach this code path where we threw OutOfMemoryError. Robin pointed out that we probably should include a message here, but the code is effectively unreachable, so there isn't any value in adding a message at this point. So remove it. Change-Id: Ie611d005622e38a75537f1350246df0ab89dd500 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	412ca65bd5	Avoid unbounded getCachedBytes during parseAny Since we don't know the type of object we are parsing, we don't know if it's a massive blob, or some small commit or annotated tag. Avoid pulling the cached bytes until we have checked the type and decided if we actually need them to continue parsing right now. This way large blobs which won't fit in memory and would throw a LargeObjectException don't abort parsing. Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	e4a480f658	Make type and size lazy for large delta objects Callers don't necessarily need the getSize() result from a large delta. They instead should be always using openStream() or copyTo() for blobs going to local files, or they should be checking the result of the constant-time isLarge() method to determine the type of access they can use on the ObjectLoader. Avoid inflating the delta instruction stream twice by delaying the decoding of the size until after we have created the DeltaStream and decoded the header. Likewise with the type, callers don't necessarily always need it to be present in an ObjectLoader. Delay looking at it as late as we can, thereby avoiding an ugly O(N^2) loop looking up the type for every single object in the entire delta chain. Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:29 -07:00
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 12:41:39 -07:00
Shawn O. Pearce	ad68553be4	Support large delta packed objects as streams Very large delta instruction streams, or deltas which use very large base objects, are now streamed through as large objects rather than being inflated into a byte array. This isn't the most efficient way to access delta encoded content, as we may need to rewind and reprocess the base object when there was a block moved within the file, but it will at least prevent the JVM from having its heap explode. When streaming a delta we have an inflater open for each level in the delta chain, to inflate the instruction set of the delta, as well as an inflater for the base level object. The base object is buffered, as is the top level delta requested by the application, but we do not buffer the intermediate delta streams. This keeps memory usage lower, so it's closer to 1024 bytes per level in the chain, without having an adverse impact on raw throughput as the top-level buffer gets pushed down to the lowest stream that has the next region. Delta instructions transparently collapse here: if the top level does not copy a region from its base, the base won't materialize that part from its own base, etc. This allows us to avoid copying around a lot of segments which have been deleted from the final version. Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 02:19:12 -07:00
Shawn O. Pearce	ded8f6c721	Support large whole packed objects as streams Similar to the loose object support, whole packed objects can now be streamed back to the caller. The streaming is less efficient as we copy the data from the cached window array into the InflaterInputStream's internal buffer, then inflate it there before returning to the application. Like with unpacked objects, there is plenty of room for some optimization, especially for the copyTo method, where we don't necessarily need so much buffering to exist. Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 19:34:21 -07:00
Shawn O. Pearce	13e0218a25	Replace PackedObjectLoader with ObjectLoader.SmallObject The class is identical, but ObjectLoader.SmallObject is part of our public API for storage implementations to build on top of. Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:27:51 -07:00
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:26:17 -07:00
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	2010-07-01 17:33:53 -07:00
Shawn O. Pearce	2489088235	Permit AnyObjectId to compareTo AnyObjectId Assume that the argument of compareTo won't be mutated while we are doing the compare, and support the wider AnyObjectId type so MutableObjectId is suitable on either side of the compareTo call. Change-Id: I2a63a496c0a7b04f0e5f27d588689c6d5e149d98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 19:07:36 -07:00
Shawn O. Pearce	d04b7972d8	Use copyTo during checkout of files to working tree This way we can stream a large file through memory, rather than loading the entire thing into a single contiguous byte array. Change-Id: I3ada2856af2bf518f072edec242667a486fb0df1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:56:20 -07:00
Shawn O. Pearce	a0fd06e5c2	Stream whole deflated objects in PackWriter Instead of loading the entire object as a byte array and passing that into the deflater, let the ObjectLoader copy the object onto the DeflaterOutputStream. This has the nice side effect of using some sort of stride hack in the Sun implementation that may improve compression performance. Change-Id: I3f3d681b06af0da93ab96c75468e00e183ff32fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:50:50 -07:00
Shawn O. Pearce	ad0383734e	Lazily allocate Deflater in PackWriter Only allocate the Deflater if we can't reuse everything, but also make sure we release it when we release the PackWriter's resources. Change-Id: I16a32b94647af0778658eda87acbafc9a25b314a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:40:54 -07:00
Shawn O. Pearce	23e7f6376a	Add openStream to ObjectLoader for big blobs Blobs that are too large to read as a single byte array should be accessed through an InputStream based interface instead, allowing the application to walk through the data stream incrementally. Define the basic interface to support streaming contents, but don't implement it yet for the file based backend. Change-Id: If9e4442e9ef4ed52c3e0f1af9398199a73145516 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:36:10 -07:00
Jeff Schumacher	7b0b4110ed	Refactored code out of FileHeader to facilitate rename detection Refactored a superclass out of FileHeader called DiffEntry that holds the more general data from FileHeader that is useful in rename detection (old/new Ids, modes, names, as well as changeType and score). FileHeader is now a DiffEntry that adds Hunks, parsing abilities, etc. Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4	2010-06-30 17:53:27 -07:00
Dmitry Neverov	44854741c5	Fix missing flush in StreamCopyThread It is possible that StreamCopyThread will not flush everything from its src to its dst. In most cases StreamCopyThread works like this: in loop: n = src.read(buf); dst.write(buf, 0, n); and when we want to flush, we interrupt() StreamCopyThread and it flushes everything it wrote to dst. The problem is that our interrupt() could interrupt reading. In this case we will flush everything we wrote to dst, but not everything we wrote to src. Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 10:48:44 -07:00
Shawn O. Pearce	a1d5f5b6b5	Move DirCache factory methods to Repository Instead of creating the DirCache from a static factory method, use an instance method on Repository, permitting the implementation to override the method with a completely different type of DirCache reading and writing. This would better support a repository in the cloud strategy, or even just an in-memory unit test environment. Change-Id: I6399894b12d6480c4b3ac84d10775dfd1b8d13e7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 10:39:00 -07:00
Shawn O. Pearce	cb9d8285ba	Create NoWorkTreeException for bare repositories Using a custom exception type makes it easier for an application developer to understand why an exception was thrown out of a method we declare. To remain compatible with existing callers, we still extend off IllegalStateException. Change-Id: Ideeef2399b11ca460a2dbb3cd80eb76aa0a025ba Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 09:48:36 -07:00
Jeff Schumacher	9f2249bd26	Added check for binary files while diffing Added a check in Diff to ensure that files that are most likely not text are not line-by-line diffed. Files are determined to be binary by checking the first 8000 bytes for a null character. This is a similar heuristic to what C Git uses. Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26	2010-06-29 17:23:00 -07:00
Shawn O. Pearce	515deaf7e5	Ensure RevWalk is released when done Update a number of calling sites of RevWalk to ensure the walker's internal ObjectReader is released after the walk is no longer used. Because the ObjectReader is likely to hold onto a native resource like an Inflater, we don't want to leak them outside of their useful scope. Where possible we also try to share ObjectReaders across several walk pools, or between a walker and a PackWriter. This permits the ObjectReader to actually do some caching if it felt inclined to do so. Not everything was updated, we'll probably need to come back and update even more call sites, but these are some of the biggest offenders. Test cases in particular aren't updated. My plan is to move most storage-agnostic tests onto some purely in-memory storage solution that doesn't do compression. Change-Id: I04087ec79faeea208b19848939898ad7172b6672 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-29 15:12:53 -07:00
Shawn O. Pearce	94228bde22	Use ObjectReader in DirCacheBuilder.addTree Rather than building a custom reader, have the caller supply us one. Change-Id: Ief2b5a6b1b75f05c8a6bc732a60d4d1041dd8254 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-29 09:30:29 -07:00
Shawn O. Pearce	d6e975f71b	Use one ObjectReader for WalkFetchConnection Instead of creating new ObjectReader for each walker, use one for the entire connection and delegate reads through it. Change-Id: I7f0a2ec8c9fe60b095a7be77dc423a2ff8b443a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:33 -07:00
Shawn O. Pearce	121d009b9b	Use ObjectReader in RevWalk, TreeWalk We don't actually need a Repository object here, just an ObjectReader that can load content for us. So change the API to depend on that. However, this breaks the asCommit and asTag legacy translation methods on RevCommit and RevTag, so we still have to keep the Repository inside of RevWalk for those two types. Hopefully we can drop those in the future, and then drop the Repository off the RevWalk. Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:29 -07:00
Shawn O. Pearce	06f635a4bc	Fix minor formatting issue in UploadPack Change-Id: Ifc0c3a94dc0e16126af6cf17e9c4a7cb96e8ffab Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:28 -07:00
Shawn Pearce	3fd4918852	Merge changes Ie56301aa,Ic2f79e85 * changes: Added further support for whitespace ignoring during diff Added support for whitespace ignoring	2010-06-28 20:27:04 -04:00
Jeff Schumacher	9869ef2592	Added further support for whitespace ignoring during diff Added code to support ignoring leading, trailing, and changed whitespace when performing a diff operation. I also added command line options to Diff to enable the various whitespace ignoring methods. These match the flags for git diff. Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8	2010-06-28 17:25:19 -07:00
Shawn O. Pearce	242b4026d9	Remove volatile keyword from RepositoryEvent We don't need this field to be volatile. Events are delivered by the same thread that created the RepositoryEvent object, and thus any cross-thread operations would need to be handled by some other type of synchronization in the listener, and that would protect both the repository field and any other per-event data. Change-Id: Iefe345959e1a2d4669709dbf82962bcc1b8913e3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 12:46:18 -07:00
Shawn O. Pearce	aa4b06e087	Rename openObject, hasObject to just open, has Similar to what we did on Repository, the openObject method already implied we wanted to open an object, given its main argument was of type AnyObjectId. Simplify the method name to just the action, has or open. Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:57:41 -07:00
Shawn O. Pearce	acb7be2c5a	Refactor Repository.openObject to be Repository.open We drop the "Object" suffix, because it's pretty clear here that we want to open an object, given that we pass in AnyObjectId as the main parameter. We also fix the calling convention to throw a MissingObjectException or IncorrectObjectTypeException, so that callers don't have to do this error checking themselves. Change-Id: I72c43353cea8372278b032f5086d52082c1eee39 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:54:58 -07:00
Shawn O. Pearce	6b62e53b60	Move PackWriter progress monitors onto the operations Rather than taking the ProgressMonitor objects in our constructor and carrying them around as instance fields, take them as arguments to the actual time consuming operations we need to run. Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	f288c27e46	Pass the PackOutputStream down the call stack Rather than storing this in an instance member, pass it down the calling stack. It's cleaner, we don't have to poke the stream as a temporary field, and then unset it. Change-Id: I0fd323371bc12edb10f0493bf11885d7057aeb13 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	1ad2feb7b3	Remove Repository.openObject(ObjectReader, AnyObjectId) Going through ObjectReader.openObject(AnyObjectId) is faster, but also produces cleaner application level code. The error checking is done inside of the openObject method, which means it can be removed from the application code. Change-Id: Ia927b448d128005e1640362281585023582b1a3a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	9ba7bd4df4	Throw IncorrectObjectTypeException on bad type hints If the type hint isn't OBJ_ANY and it doesn't match the actual type observed from the object store, define the reader to throw back an IncorrectObjectTypeException. This way the caller doesn't have to perform this check itself before it evaluates the object data, and we can simplify quite a few call sites. Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:25 -07:00
Jeff Schumacher	543235b805	Added support for whitespace ignoring JGit did not have support for skipping whitespace when comparing lines in RawText objects. I added a subclass of RawText that skips whitespace in its equals and hashCode methods. I used a subclass rather than adding functionality into RawText so that performance would not be impacted by extra logic. This class only supports ignoring all whitespace. Others will follow that allow other forms of whitespace ignoring. Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a	2010-06-28 10:59:10 -07:00
Shawn O. Pearce	a45728d7a4	Ensure ObjectReader used by PackWriter is released The ObjectReader API demands that we release the reader when we are done with it. PackWriter contains a reader, which it uses for the entire packing session. Expose the release of the reader through a release method on the writer. This still doesn't address the RevWalk and TreeWalk users, who don't correctly release their reader. But it's a small step in the right direction. Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 10:25:11 -07:00
Shawn O. Pearce	b5aa52e98a	Ensure PackWriter releases its ObjectReader Change-Id: I3f8af29066cc5a2132dc4a75c9654d97800f2f18 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 10:16:27 -07:00
Shawn O. Pearce	e01abbd543	Release ObjectReader before the cached ObjectDatabase I don't want to play games with the order of release here; it's probably safer to release the reader before the database, just in case the one depends on the other. Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:47:20 -07:00
Shawn O. Pearce	b40f02eb1a	Release ObjectInserter in merge() not mergeImpl() By doing the release at the higher level class, we can ensure the release occurs if the inserter was allocated, even if the implementation forgets to do this. Since the higher level class is what allocated it, it makes sense to have it also do the release. Change-Id: Id617b2db864c3208ed68cba4eda80e51612359ad Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:35:55 -07:00
Shawn O. Pearce	5aae041a81	Commit: Use Repository.newObjectInserter Everyone else does. This must have been a spot I missed during some sort of squash while developing the series. Change-Id: I62eae50b618f47ee33ad7cf71fc05b724f603201 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:22:48 -07:00
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:51:12 -07:00
Shawn O. Pearce	71aace52f7	Simplify ObjectLoaders coming from PackFile We no longer need an ObjectLoader to be lazy and try to delay the materialization of the object content. That was done only to support PackWriter searching for a good reuse candidate. Instead, simplify the code base by doing the materialization immediately when the loader asks for it, because any caller asking for the loader is going to need the content. Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	68518ca3aa	Remove getRawSize, getRawType from ObjectLoader These were only used by PackWriter to help it filter object representations. Their only user disappeared when we rewrote the object selection code path to use the new representation type. Change-Id: I9ed676bfe4f87fcf94aa21e53bda43115912e145 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (assuming 32 bit pointers), listed as (32 bit) / (64 bit): vm header: 8 bytes / 8 bytes; ObjectId: 20 bytes / 20 bytes; PackedObjectInfo: 12 bytes / 12 bytes; ObjectToPack: 8 bytes / 12 bytes; LocalOTP: 20 bytes / 24 bytes; total: 68 bytes / 74 bytes. Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:34 -07:00
Shawn O. Pearce	3a7aec03e0	Implement zero-copy for single window objects Objects that fall completely within a single window can be worked with in a zero-copy fashion, provided that the window is backed by a normal byte[] and not by a ByteBuffer. This works for a surprising number of objects. The default window size is 8 KiB, but most deltas are quite a bit smaller than that. Objects smaller than 1/2 of the window size have a very good chance of falling completely within a window's array, which means we can work with them without copying their data around. Larger objects, or objects which are unlucky enough to span over a window boundary, get copied through the temporary buffer. We pay a tiny penalty to realize we can't use the zero-copy code path, but it's easier than trying to keep track of two adjacent windows. With this change (as well as everything preceding it), packing is actually a bit faster. Some crude benchmarks based on cloning linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using C git client and JGit daemon show we get better throughput, and slightly better times. Total time (old / now) and throughput (old / now): 2m45s / 2m37s, 12.49 MiB/s / 21.17 MiB/s; 2m42s / 2m36s, 16.29 MiB/s / 22.63 MiB/s; 2m37s / 2m31s, 16.07 MiB/s / 21.92 MiB/s. Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 16:13:22 -07:00
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 14:46:05 -07:00
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 14:16:06 -07:00
Shawn O. Pearce	e0c9368f3e	Reclaim some bits in ObjectToPack flags field Make the lower bits available for flags that PackWriter can use to keep track of facts about the object. We shouldn't need more than 2^24 delta depths, unpacking that chain is unfathomable anyway. This change gets us 4 bits that are unused in the lower end of the word, which are typically easier to load from Java and most machine instruction sets. We can use these in later changes. Change-Id: Ib9e11221b5bca17c8a531e4ed130ba14c0e3744f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	6fc3ecac84	Extract PackFile specific code to ObjectToPack subclass The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost. The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may choose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, it's a great place for the database to perform caching, if any. Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter. Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	a2208be6aa	Extract ObjectToPack to be top-level This shortens the implementation within PackWriter, and starts to open the door for some other refactorings based on changing the ObjectToPack to be a public part of the API. Change-Id: Id849cbffc4de20b903e844a2de7737eeb8b7a3ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	ffe0614d4d	Allow Repository.getDirectory() to be null Some types of repositories might not be stored on local disk. For these, they will most likely return null for getDirectory() as the java.io.File type cannot describe where their storage is; it's not in the host's filesystem. Document that getDirectory() can return null now, and update all current non-test callers in JGit that might run into problems on such repositories. For the most part, just act like it's bare. Change-Id: I061236a691372a267fd7d41f0550650e165d2066 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	8a9844b2af	Redo event listeners to be more generic Replace the old crude event listener system with a much more generic implementation, patterned after the event dispatch techniques used in Google Web Toolkit 1.5 and later. Each event delivers to an interface that defines a single method, and the event itself is what performs the delivery in a type-safe way through its own dispatch method. Listeners are registered in a generic listener list, indexed by the interface they implement and wish to receive an event for. Delivery of events is performed by looping through all listeners implementing the event's corresponding listener interface, and using the event's own dispatch method to deliver the event. This is the classical "double dispatch" pattern for event delivery. Listeners can be unregistered by invoking remove() on their registration handle. This change therefore requires application code to track the handle if it wishes to remove the listener at a later point in time. Event delivery is now exposed as a generic public method on the Repository class, making it easier for any type of message to be sent out to any type of listener that has registered, without needing to pre-arrange for type-safe fireFoo() methods. New event types can be added in the future simply by defining a new RepositoryEvent subclass and a corresponding RepositoryListener interface that it dispatches to. By always adding new events through a new interface, we never need to worry about defining an Adapter to provide default no-op implementations of new event methods. Change-Id: I651417b3098b9afc93d91085e9f0b2265df8fc81 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	203bd66267	Rename Repository getWorkDir to getWorkTree This better matches with the name used in the environment (GIT_WORK_TREE), in the configuration file (core.worktree), and in our builder object. Since we are already breaking a good chunk of other code related to repository access, and this is fairly easy to fix in an application's code base, I'm not going to offer the wrapper getWorkDir() method. Change-Id: Ib698ba4bbc213c48114f342378cecfe377e37bb7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	532421d989	Refactor repository construction to builder class The new FileRepositoryBuilder class helps applications to construct a properly configured FileRepository, with properties assumed based upon the standard Git rules for the local filesystem. To better support simple command line applications, environment variable handling and repository searching was moved into this builder class. The change gets rid of the ever-growing FileRepository constructor variants, and the multitude of java.io.File typed parameters, by using simple named setter methods. Change-Id: I17e8e0392ad1dbf6a90a7eb49a6d809388d27e4c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:40 -07:00
Shawn O. Pearce	8f46ee4870	Remove Repository.toFile(ObjectId) Not every type of Repository will be able to map an ObjectId into a local file system path that stores that object's file contents. Heck, it's not even true for the FileRepository, as an object can be stored in a pack file and not in its loose format. Remove this from our public API; it was a mistake to publish it. Change-Id: I20d1b8c39104023936e6d46a5b0d7ef39ff118e8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:39 -07:00

588 Commits