motiejus/jgit - jgit - gitea: Gitea Service

Author	SHA1	Message	Date
Stefan Lay	aa86cfc339	Do not add ignored files in Add command Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-22 11:26:04 +02:00
Shawn O. Pearce	09910ffa32	Move ignore node handling into WorkingTreeIterator The working tree iterator has perfect knowledge of the path structure as well as immediate information about whether or not an ignore file even exists at this level. We can exploit that to simplify the logic and running time for testing ignored file status by pushing all of the checks down into the iterator itself. Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-21 10:34:08 -07:00
Shawn Pearce	0ec0e21fdf	Merge "Fix concurrent read / write issue in GitIndex on Windows"	2010-07-21 13:08:01 -04:00
Jens Baumgart	e99c48a61a	Fix concurrent read / write issue in GitIndex on Windows GitIndex.write fails if another thread concurrently reads the index file. The problem is fixed by retrying the rename operation if it fails. Bug: 311051 Change-Id: Ib243d2a90adae312712d02521de4834d06804944 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	2010-07-21 09:35:15 +02:00
Christian Halstrick	5c94321b47	Check for racy git in WorkingTreeIterator The WorkingTreeIterator has a method to check whether the current file differs from the corresponding index entry. This commit improves this check to also handle racy git situations. See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 21:55:18 +02:00
Christian Halstrick	c98d97731b	Smudge racily clean index entries by truncating length (like git.git) To mark an entry racily clean we set its length to 0 (like native git does). Entries which are not racily clean and have zero length can be distinguished from racily clean entries by checking P_OBJECTID against the SHA1 of empty content. When length is 0 and P_OBJECTID is different from SHA1 of empty content we know the entry is marked racily clean. See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html Change-Id: I689552931441ab51964b430b303160c9126b66af Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 21:54:36 +02:00
Shawn O. Pearce	938943d674	Use proper constants for .gitignore and .git directory We have a constant for .gitignore, so use it. While we are in the same method, correct the reference of ".git" to be the actual GIT_DIR given. This might not be within the work tree if the GIT_DIR and GIT_WORK_TREE environment variables were used. Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 09:11:39 -07:00
Shawn O. Pearce	c59db09bc5	Remove gitIgnoreTimestamp from abstract iterator API This never should have been exposed on the top of the AbstractTreeIterator type hierarchy. There is no concept of a timestamp in a canonical tree read from the object database, and the time in the DirCache isn't what we want here either. Actually all that we need is to find the files whose names are ".gitignore" and are below the root directory. We can accomplish that with a suffix filter, and process them immediately. Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	2010-07-20 09:09:01 -07:00
Shawn O. Pearce	395d236058	Fix NPE in RenameDetector If we have two adds of the same object but no deletes the detector threw an NPE because the entry that came back from the deleted map was null (no matching objects). In this case we need to put the adds all back onto the list of left over additions since they did not match a delete. Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Jeffrey Schumacher <jeffschu@google.com>	2010-07-20 07:52:35 -07:00
Shawn O. Pearce	b518189b5c	IndexPack: Fix spurious pack file corruption errors We didn't correctly handle the zlib trailer for an object. If the trailer bytes were outside of the current buffer window but we had fully inflated the object itself, we broke out of the loop (as we had our target size) but inflate wasn't finished (as it did not yet get the trailer) so we failed the test and threw a corruption exception. Use an infinite loop and only break out when the inflater is done. Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-20 07:40:48 -07:00
Shawn O. Pearce	12fe0f2d1e	Discard the uncompressed delta as soon as it's compressed The DeltaCache will most likely need to copy the compressed delta into a new buffer in order to compact away the wasted space at the end caused by over allocation. Since we don't need the uncompressed format anymore, null out our only reference to it so the GC can reclaim this memory if it needs to perform a collection in order to satisfy the cache's allocation attempt. Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-16 10:41:09 -07:00
Shawn O. Pearce	6e155d5f41	Merge branch 'js/rename' * js/rename: Implemented file path based tie breaking to exact rename detection Added more test cases for RenameDetector Added very small optimization to exact rename detection Fixed Misleading Javadoc Added file path similarity to scoring metric in rename detection Fixed potential div by zero bug Added file size based rename detection optimization Create FileHeader from DiffEntry log: Implement --follow Cache the diff configuration section log: Add whitespace ignore options Format submodule links during differences Redo DiffFormatter API to be easier to use log, diff: Add rename detection support Implement similarity based rename detection Added a preliminary version of rename detection Refactored code out of FileHeader to facilitate rename detection	2010-07-16 10:22:15 -07:00
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate routine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-16 10:12:04 -07:00
Jeff Schumacher	31311cacfd	Implemented file path based tie breaking to exact rename detection During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whose path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content-based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611	2010-07-16 09:56:42 -07:00
Christian Halstrick	b840ed0121	Added dirty-detection to WorkingTreeIterator Added possibility to compare the current entry of a WorkingTreeIterator to a given DirCacheEntry. This is done to detect whether an entry in the index is dirty or not. 'Dirty' means that the file in the working tree is different from what's in the index. Merge algorithms will make use of this to detect conflicts. Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-16 10:08:52 +02:00
Shawn Pearce	19473b1dbc	Merge "Handle the tilde notation (~user) of git url"	2010-07-15 17:29:21 -04:00
Robin Rosenberg	845714158a	Handle the tilde notation (~user) of git url When the path is prefixed with ~ the URI parser thought about this as /~. Strip the / if the next character is the tilde. Bug: 307017 Change-Id: I58203e5617956b46d83e8987d1f8042beddffac3 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-15 01:16:09 +02:00
Stefan Lay	233e0130b5	Git Porcelain API: Add Command The new Add command adds files to the Git Index. It uses the DirCache to access the git index. It works also in case of an existing conflict. Fileglobs (e.g. *.c) are not yet supported. The new Add command does add ignored files because there is no gitignore support in jgit yet. Bug: 318440 Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-14 11:24:58 +00:00
Shawn Pearce	0ef99921fa	Merge changes I104cd62f,I1d0238b4 * changes: Internationalize RepositoryState descriptions Say that commit is allowed during bisect	2010-07-13 20:36:25 -04:00
Charley Wang	b878cdcf6b	Add compatibility with gitignore specifications This patch adds ignore compatibility to jgit. It encompasses exclude files as well as .gitignore. Uses TreeWalk and FileTreeIterator to find nodes and parses .gitignore files when required. The patch includes a simple cache that can be used to save results and avoid excessive gitignore parsing. CQ: 4302 Bug: 303925 Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-13 00:34:15 +02:00
Jeff Schumacher	bc08fafb41	Added very small optimization to exact rename detection Optimized a small loop in findExactRenames. The loop would go through all the items in a list of DiffEntries even after it already found what it was looking for. I made it break out of the loop as soon as a good match was found. Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61	2010-07-12 12:54:01 -07:00
Jeff Schumacher	a20e6f6fec	Fixed Misleading Javadoc The javadoc for the setRenameLimit method in RenameDetector said that you could only have limits in the range (0,100), implying that 0 and 100 were illegal inputs. The code, however, allowed 0 and 100. I changed the javadoc to say that the range [0,100] was legal. I also documented the IllegalArgumentException that is thrown if the limit is outside that range. Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2	2010-07-12 12:54:01 -07:00
Jeff Schumacher	9a48de86d8	Added file path similarity to scoring metric in rename detection The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def	2010-07-12 12:52:05 -07:00
Jeff Schumacher	4c14b7869d	Fixed potential div by zero bug The scoring logic in SimilarityIndex was dividing by the max file size. If both files are empty, this would cause a div by zero error. This case cannot currently happen, since two empty files would have the same SHA1, and would therefore be caught in the earlier SHA1 based detection pass. Still, if this logic eventually gets separated from that pass, a div by zero error would occur. I changed the logic to instead consider two empty files to have a similarity score of 100. Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a	2010-07-12 12:24:42 -07:00
Jeff Schumacher	64b9458640	Added file size based rename detection optimization Prior to this change, files that were very different in size (enough so that they could not have enough in common to be detected as renames) were still having their scores calculated. I added an optimization to skip such files. For example, if the rename detection threshold is 60%, the larger file is 200kb, and the smaller file is 50kb, the pair cannot be counted as a rename since they cannot possibly share 60% of their content in common. (200*.6=120, 120>50) Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc	2010-07-12 12:24:42 -07:00
Robin Rosenberg	d787a82e50	Internationalize RepositoryState descriptions Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 10:24:37 +02:00
Shawn O. Pearce	9734194917	Honor pack.windowlimit to cap memory usage during packing The pack.windowlimit configuration parameter places an upper bound on the number of bytes used by the DeltaWindow class as it scans through the object list. If memory usage would exceed the limit the window is temporarily decreased in size to keep memory used within that bound. Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:19:07 -07:00
Shawn O. Pearce	74e0835012	Honor pack.threads and perform delta search in parallel If we have multiple CPUs available, packing usually goes faster when each CPU is assigned a slice of the available search space. The number of threads to use is guessed from the runtime if it wasn't set by the caller, or wasn't set in the configuration. Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:17:30 -07:00
Shawn O. Pearce	a960d1429e	Cache small deltas during packing PackWriter now caches small deltas, or deltas that are very tiny compared to their source inputs, so that the writing phase goes faster by reusing those cached deltas. The cached data is stored compressed, which usually translates to a bigger footprint due to deltas being very hard to compress, but saves time during writing by avoiding the deflate step. They are held under SoftReferences so that the JVM GC can clear out deltas if memory gets very tight. We would rather continue working and spend a bit more CPU time during writing than crash due to OOME. To avoid OutOfMemoryErrors during the caching phase we also trap OOME and just abort out of the caching. Because deflateBound() always produces something larger than what we need to actually store the deflated data, we copy it over into a new buffer if the actual length doesn't match the buffer length. When packing jgit.git this saves over 111 KiB in the cache, and is thus a worthwhile hit on CPU time. To further save memory we store the inflated size of the delta (which we need for the object header) in the same field as the pathHash, as the pathHash is no longer necessary by this phase of the packing algorithm. Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:15:54 -07:00
Shawn O. Pearce	dfad23bf3d	Implement delta generation during packing PackWriter now produces new deltas if there is not a suitable delta available for reuse from an existing pack file. This permits JGit to send less data on the wire by sending a delta relative to an object the other side already has, instead of sending the whole object. The delta searching algorithm is similar in style to what C Git uses, but apparently has some differences (see below for more). Briefly, objects that should be considered for delta compression are pushed onto a list. This list is then sorted by a rough similarity score, which is derived from the path name the object was discovered at in the repository during object counting. The list is then walked in order. At each position in the list, up to $WINDOW objects prior to it are attempted as delta bases. Each object in the window is tried, and the shortest delta instruction sequence selects the base object. Some rough rules are used to prevent pathological behavior during this matching phase, like skipping pairings of objects that are not similar enough in size. PackWriter intentionally excludes commits and annotated tags from this new delta search phase. In the JGit repository only 28 out of 2600+ commits can be delta compressed by C Git. As the commit count tends to be a fair percentage of the total number of objects in the repository, and they generally do not delta compress well, skipping over them can improve performance with little increase in the output pack size. Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved over the years in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly the same rules everywhere, and that leads JGit to produce different (but logically equivalent) pack files.
Repository | Pack Size (bytes)                | Packing Time
           | JGit     - CGit     = Difference | JGit    / CGit
-----------+----------------------------------+-----------------
git        | 25094348 - 24322890 = +771458    | 59.434s / 59.133s
jgit       |  5669515 -  5709046 = -  39531   |  6.654s /  6.806s
linux-2.6  |     389M -     386M = +3M        | 20m02s  / 18m01s
For the above tests pack.threads was set to 1, window size=10, delta depth=50, and delta and object reuse was disabled for both implementations. Both implementations were reading from an already fully packed repository on local disk. The running time reported is after 1 warm-up run of the tested implementation. PackWriter is writing 771 KiB more data on git.git, 3M more on linux-2.6, but is actually 39.5 KiB smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack. On the running time side, JGit is at a major disadvantage because linux-2.6 doesn't fit into the default WindowCache of 20M, while C Git is able to mmap the entire pack and have it available instantly in physical memory (assuming hot cache). CGit also has a feature where it caches deltas that were created during the compression phase, and uses those cached deltas during the writing phase. PackWriter does not implement this (yet), and therefore must create every delta twice. This could easily account for the increased running time we are seeing. Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:14:18 -07:00
Shawn O. Pearce	074055d747	debug-show-packdelta: Dump a pack delta to the console This is a horribly crude application, it doesn't even verify that the object it's dumping is delta encoded. Its method of getting the delta is pretty abusive to the public PackWriter API, because right now we don't want to expose the real internal low-level methods actually required to do this. Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:12:32 -07:00
Shawn O. Pearce	8612c0ace1	Initial pack format delta generator DeltaIndex is a simple pack style delta generator. The function works by creating a compact index of a source buffer's blocks, and then walking a sliding window along a desired result buffer, searching for the window in the index. When a match is found, the window is stretched to the longest possible length that is common with the source buffer, and a copy instruction is created. Rabin's polynomial hash function is used to compute the hash for a block, permitting efficient sliding of the window in single byte increments. The update function to slide one byte originated from David Mazieres' work in LBFS, and our implementation of the update step was certainly inspired by the initial work Geert Bosch proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2. To ensure the encoder runs in linear time with respect to the size of the two input buffers (source and result), the maximum number of blocks that can share the same position in the index's hashtable is capped at a constant number. This prevents bad inputs from causing the encoder to run in quadratic time, but comes with a penalty of creating a longer delta due to fewer considered copy positions. Strange hackery is used to cap the amount of memory used by the index to be no more than 12 bytes for every 16 bytes of source buffer, no matter what the JVM per-object overhead is. This permits an index to always be no larger than 1.75x the source buffer length, which is an important feature to support large windows of candidates to match against while packing. Here the strange hackery is nothing more than a manually managed chained hashtable, where pointers are array indexes into storage arrays rather than object references. Computation of the hash function for a single fixed sized block is done through an unrolled loop, where the first 4 iterations have been manually reduced down to eliminate unnecessary instructions. 
The pattern is derived from ObjectId.equals(byte[], int, byte[], int), where we have unrolled the loop required to compare two 20 byte arrays. Hours of testing with the Sun 1.6 JRE concluded that the non-obvious "foo[idx + 1]" style of reference is faster than "foo[idx++]", and so that is what we use here during hashing. Change-Id: If9fb2a1524361bc701405920560d8ae752221768 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:10:55 -07:00
Shawn O. Pearce	b38426ae8c	Add debugging toString() method to ObjectToPack It's useful to know what the flags are or what the base that was selected is. Dump these out as part of the object's toString. Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:09:19 -07:00
Shawn O. Pearce	699e4aa7c5	Make ObjectToPack clearReuseAsIs signal available to subclasses A subclass may want to use this method to release handles that are caching reuse information. Make it protected so they can override it and update themselves. Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:07:45 -07:00
Shawn O. Pearce	4569d77e13	Correctly classify the compressing objects phase Searching for reuse candidates should be fast compared to actually doing delta compression. So pull the progress monitor out of this phase and rename it back to identify the compressing objects state. Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:06:10 -07:00
Shawn O. Pearce	85b7a53d52	Refactor ObjectToPack's delta depth setting Long ago when PackWriter was first written we thought that the delta depth could be updated automatically. But it's never used. Instead make this a simple standard setter so the caller can more directly set the delta depth of this object. This permits us to configure a depth that takes into account more than just the depth of another object in this same pack. Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:04:35 -07:00
Shawn O. Pearce	6730f9e3c8	Configure core.bigFileThreshold into PackWriter C Git's fast-import uses this to determine the maximum file size that it tries to delta compress, anything equal to or above this setting is stored as a whole object with simple deflate. Define the configuration so we can use it later. Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:02:54 -07:00
Shawn O. Pearce	823e9a9721	Add doNotDelta flag to ObjectToPack This flag will later control whether or not PackWriter searches for a delta base for this object. Edge objects will never get searched, as the writer won't be outputting them, so they should always have this flag set on. Sometime in the future this flag should also be set for file blobs on file paths that have the "-delta" gitattribute set in the repository's attributes file. Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:49 -07:00
Shawn O. Pearce	616bc74cf7	Add more configuration options to PackWriter We now at least import other pack settings like pack.window, which means we can later use these to control how we search for deltas. The compression level was fixed to use pack.compression rather than the loose object core.compression setting. Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:46 -07:00
Robin Rosenberg	a1492f1922	Say that commit is allowed during bisect C Git allows this and it is quite handy. Change-Id: I1d0238b43fca931ad2079649fb7b431e2815c351 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 02:32:46 +02:00
Shawn O. Pearce	2f93a09dd1	Save object path hash codes during packing We need to remember these so we can later cluster objects that have similar file paths near each other as we search for deltas between them. Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 15:17:26 -07:00
Shawn O. Pearce	c20daa7314	Add path hash code to ObjectWalk PackWriter wants to categorize objects that are similar in path name, so blobs that are probably from the same file (or same sort of file) can be delta compressed against each other. Avoid converting into a string by performing the hashing directly against the path buffer in the tree iterator. We only hash the last 16 bytes of the path, and we try to avoid any spaces, as we want the suffix of a file such as ".java" to be more important than the directory it is in, like "src". Change-Id: I31770ee711526306769a6f534afb19f937e0ba85 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Shawn O. Pearce	97311cd3e0	Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB If the heap limit was set to something smaller than 8 KiB, we were still allocating the full 8 KiB block size, and accepting up to the amount we allocated by. Instead actually put a hard cap on the limit. Change-Id: Id1da26fde2102e76510b1da4ede8493928a981cc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Matthias Sohn	b8f2bb7d2a	Add support for updateNeeded flag in DirCacheEntry Change-Id: If06ff41d9ccd422afbc79ecbc3cfdf8bb2508dcd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-09 14:12:06 +02:00
Jeff Schumacher	a8b29afd82	Create FileHeader from DiffEntry Added support for converting DiffEntrys to FileHeaders. FileHeaders are DiffEntrys with a buffer containing the diff output as well as a list of HunkHeaders. The HunkHeaders contain EditLists. The createFileHeader(DiffEntry) method in DiffFormatter performs a Myers Diff on the files referred to by the DiffEntry, then puts the returned EditList into a single HunkHeader, which is then put into the FileHeader to be returned. It also generates the appropriate diff header and puts it into the FileHeader's buffer. The rest of the diff output, which would normally be parsed to generate the HunkHeaders, is not generated. In fact, the purpose of this method is to avoid the costly diff output generation and parsing normally required to create a FileHeader. Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887	2010-07-08 16:58:55 -07:00
Stefan Lay	354b90131a	Fix javadoc typos in JGit API There were some small errors which made it difficult to read the JavaDoc. Change-Id: Ib3b34353465162adebaca3514d596d0edf5aea51 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-08 10:42:29 +02:00
Shawn O. Pearce	711bd3e3d0	Define a constant for 127 in DeltaEncoder The special value 127 here means how many bytes we can put into a single insert command. Rather than use the magical value 127, let's name it to better document the code. Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	cd7dd8591e	Cap delta copy instructions at 64k Although all modern delta decoders can process copy instructions with a count as large as 0xffffff (~15.9 MiB), pack version 2 streams are only supposed to use delta copy instructions up to 64 KiB. Rewrite our copy instruction encode loop to use the lower 64 KiB limit, even though modern decoders would support longer copies. To improve encoding performance we now try to encode up to four full copy commands in our buffer before we flush it to the stream, but we don't try to implement full buffering here. We are just trying to amortize the virtual method call to the destination stream when we have to do a large copy. Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	384a19eee0	Deprecate all of the older Tree related code We want to get rid of these APIs, because they don't perform as well as DirCache/TreeWalk, or don't offer nearly as many features. Bug: 319145 Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:15:02 -07:00
Shawn O. Pearce	a215914a56	Fix DeltaEncoder header for objects 128 bytes long The encode loop had the wrong condition, objects that are 128 bytes in size need to have their length encoded as two bytes, not one. Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 08:53:03 -07:00
Shawn O. Pearce	f29741d1d8	amend commit: Support large delta packed objects as streams Rename the ByteWindow's inflate() method to setInput. We have completely refactored the purpose of this method to be feeding part (or all) of the window as input to the Inflater, and the actual inflate activity happens in the caller. Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:06 -07:00
Shawn O. Pearce	ab3c68c512	amend commit: Support large loose objects as streams We need to validate the stream state after the InflaterInputStream thinks the stream is done. Git expects a higher level of service from the Inflater than the InflaterInputStream usually gives, we need to ensure the embedded CRC is valid, and that there isn't trailing garbage at the end of the file. Change-Id: I1c9642a82dbd76b69e607dceccf8b85dc869a3c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:01 -07:00
Stefan Lay	311da9b211	Fix comparison of nanoseconds NB.decodeInt32(info, base + 4) already returns nanoseconds. Therefore it must not be divided by 1000000. Change-Id: Ie8f5c4a03f984d98935dccedc2b1ba4457094899 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-06 17:57:17 +02:00
Shawn O. Pearce	1913b41bc7	log: Implement --follow The FollowFilter can be installed on a RevWalk to cause the path to be updated through rename detection when the affected file is found to be added to the project. The filter works reasonably well; for example, we can follow the history of the fsck command in git-core: $ jgit log --name-status --follow builtin/fsck.c | grep ^R R100 builtin-fsck.c builtin/fsck.c R099 fsck.c builtin-fsck.c R099 fsck-objects.c fsck.c R099 fsck-cache.c fsck-objects.c Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:55 -07:00
Shawn O. Pearce	e9de5643fa	Cache the diff configuration section This way we don't have to reparse for the rename limit every time we create a new rename detector for a repository. Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:52 -07:00
Shawn O. Pearce	8a0c58394d	log: Add whitespace ignore options Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 17:32:47 -07:00
Shawn O. Pearce	bd8740dc14	Format submodule links during differences Instead of crashing, output a submodule link with the simple "Subproject commit $fullid\n" syntax used by C Git. Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:59:06 -07:00
Shawn O. Pearce	5be90be996	Redo DiffFormatter API to be easier to use Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:58:37 -07:00
Shawn O. Pearce	04a9d23b9a	log, diff: Add rename detection support Implement rename detection in the command line diff and log commands. Also support --name-status, -p and -U flags, as these can be quite useful to view more detail. All of the Git patch file formatting code is now moved over to the DiffFormatter class. This permits us to reuse it in any context, including inside of IDEs. Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similar to what we use in ObjectIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common into the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. 
If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash functions. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. 
The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
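The record packing and score calculation described in commit 978535b090 can be sketched as follows. This is a simplified illustration, not the SimilarityIndex implementation: the class `SimilaritySketch` and all of its methods are hypothetical, counts are assumed non-negative, each key is assumed to appear at most once per table, both tables are assumed already sorted, and the result for two empty inputs is an arbitrary choice made here.

```java
final class SimilaritySketch {
    // Collapse two ints (line/block key, byte count) into a single long.
    static long pack(int key, int count) {
        return ((long) key << 32) | (count & 0xffffffffL);
    }

    static int keyOf(long rec) {
        return (int) (rec >>> 32);
    }

    static int countOf(long rec) {
        return (int) rec;
    }

    // Linear-time merge over two tables sorted ascending by packed value.
    // For matching keys, the bytes in common are the smaller of the two counts.
    static long common(long[] a, long[] b) {
        long total = 0;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            int ka = keyOf(a[i]), kb = keyOf(b[j]);
            if (ka == kb) {
                total += Math.min(countOf(a[i]), countOf(b[j]));
                i++;
                j++;
            } else if (ka < kb) {
                i++;
            } else {
                j++;
            }
        }
        return total;
    }

    // Score in percent: approximate common bytes over the larger input size.
    static int score(long[] a, long sizeA, long[] b, long sizeB) {
        long max = Math.max(sizeA, sizeB);
        if (max == 0) {
            return 100; // assumption: two empty inputs count as identical
        }
        return (int) (common(a, b) * 100 / max);
    }
}
```

Because the key sits in the high 32 bits, sorting the packed longs with `java.util.Arrays.sort` orders records by key, which is what makes the single-pass merge in `common` possible.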
Shawn O. Pearce	4dd7b35b26	Improve description of isBare and NoWorkTreeException Alex pointed out that my description of a bare repository might be confusing for some readers. Reword the description of the error, and make it consistent throughout the Repository class's API. Change-Id: I87929ddd3005f578a7022f363270952d1f7f8664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:31 -07:00
Shawn O. Pearce	08d349a27b	amend commit: Refactor repository construction to builder class During code review, Alex raised a few comments about commit `532421d989` ("Refactor repository construction to builder class"). Due to the size of the related series we aren't going to go back and rebase in something this minor, so resolve them as a follow-up commit instead. Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	fe9860a444	Remove pointless size test in PackFile decompress Now that any large object is forced through a streaming loader when it is bigger than getStreamFileThreshold(), and that threshold is pegged at Integer.MAX_VALUE as its largest size, we will never be able to reach this code path where we threw OutOfMemoryError. Robin pointed out that we probably should include a message here, but the code is effectively unreachable, so there isn't any value in adding a message at this point. So remove it. Change-Id: Ie611d005622e38a75537f1350246df0ab89dd500 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	412ca65bd5	Avoid unbounded getCachedBytes during parseAny Since we don't know the type of object we are parsing, we don't know if it's a massive blob, or some small commit or annotated tag. Avoid pulling the cached bytes until we have checked the type and decided if we actually need them to continue parsing right now. This way large blobs which won't fit in memory and would throw a LargeObjectException don't abort parsing. Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	e4a480f658	Make type and size lazy for large delta objects Callers don't necessarily need the getSize() result from a large delta. They instead should be always using openStream() or copyTo() for blobs going to local files, or they should be checking the result of the constant-time isLarge() method to determine the type of access they can use on the ObjectLoader. Avoid inflating the delta instruction stream twice by delaying the decoding of the size until after we have created the DeltaStream and decoded the header. Likewise with the type, callers don't necessarily always need it to be present in an ObjectLoader. Delay looking at it as late as we can, thereby avoiding an ugly O(N^2) loop looking up the type for every single object in the entire delta chain. Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:29 -07:00
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 12:41:39 -07:00
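The cap described in commit 113577617b amounts to taking the smaller of the configured value and one quarter of the maximum heap. The sketch below shows that arithmetic only; `StreamThresholdSketch` and `effectiveThreshold` are hypothetical names, and how JGit actually reads `core.streamFileThreshold` from the configuration is not shown.

```java
final class StreamThresholdSketch {
    // Default from the commit message: 1 MiB.
    static final long DEFAULT_THRESHOLD = 1L << 20;

    // Limits the configured threshold to at most 1/4 of the maximum heap,
    // leaving room for two threshold-sized arrays plus the host application.
    static long effectiveThreshold(long configured, long maxHeap) {
        return Math.min(configured, maxHeap / 4);
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println(effectiveThreshold(DEFAULT_THRESHOLD, maxHeap));
    }
}
```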
Shawn O. Pearce	ad68553be4	Support large delta packed objects as streams Very large delta instruction streams, or deltas which use very large base objects, are now streamed through as large objects rather than being inflated into a byte array. This isn't the most efficient way to access delta encoded content, as we may need to rewind and reprocess the base object when there was a block moved within the file, but it will at least prevent the JVM from having its heap explode. When streaming a delta we have an inflater open for each level in the delta chain, to inflate the instruction set of the delta, as well as an inflater for the base level object. The base object is buffered, as is the top level delta requested by the application, but we do not buffer the intermediate delta streams. This keeps memory usage lower, so it's closer to 1024 bytes per level in the chain, without having an adverse impact on raw throughput as the top-level buffer gets pushed down to the lowest stream that has the next region. Delta instructions transparently collapse here: if the top level does not copy a region from its base, the base won't materialize that part from its own base, etc. This allows us to avoid copying around a lot of segments which have been deleted from the final version. Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 02:19:12 -07:00
Shawn O. Pearce	ded8f6c721	Support large whole packed objects as streams Similar to the loose object support, whole packed objects can now be streamed back to the caller. The streaming is less efficient as we copy the data from the cached window array into the InflaterInputStream's internal buffer, then inflate it there before returning to the application. Like with unpacked objects, there is plenty of room for some optimization, especially for the copyTo method, where we don't necessarily need so much buffering to exist. Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 19:34:21 -07:00
Shawn O. Pearce	13e0218a25	Replace PackedObjectLoader with ObjectLoader.SmallObject The class is identical, but ObjectLoader.SmallObject is part of our public API for storage implementations to build on top of. Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:27:51 -07:00
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:26:17 -07:00
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry objects for that TreeWalk, taking into account renames. This class only detects renames by SHA-1. More complex rename detection, along the lines of what C Git does, will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	2010-07-01 17:33:53 -07:00
Shawn O. Pearce	2489088235	Permit AnyObjectId to compareTo AnyObjectId Assume that the argument of compareTo won't be mutated while we are doing the compare, and support the wider AnyObjectId type so MutableObjectId is suitable on either side of the compareTo call. Change-Id: I2a63a496c0a7b04f0e5f27d588689c6d5e149d98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 19:07:36 -07:00
Shawn O. Pearce	d04b7972d8	Use copyTo during checkout of files to working tree This way we can stream a large file through memory, rather than loading the entire thing into a single contiguous byte array. Change-Id: I3ada2856af2bf518f072edec242667a486fb0df1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:56:20 -07:00
Shawn O. Pearce	a0fd06e5c2	Stream whole deflated objects in PackWriter Instead of loading the entire object as a byte array and passing that into the deflater, let the ObjectLoader copy the object onto the DeflaterOutputStream. This has the nice side effect of using some sort of stride hack in the Sun implementation that may improve compression performance. Change-Id: I3f3d681b06af0da93ab96c75468e00e183ff32fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:50:50 -07:00
Shawn O. Pearce	ad0383734e	Lazily allocate Deflater in PackWriter Only allocate the Deflater if we can't reuse everything, but also make sure we release it when we release the PackWriter's resources. Change-Id: I16a32b94647af0778658eda87acbafc9a25b314a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:40:54 -07:00
Shawn O. Pearce	23e7f6376a	Add openStream to ObjectLoader for big blobs Blobs that are too large to read as a single byte array should be accessed through an InputStream based interface instead, allowing the application to walk through the data stream incrementally. Define the basic interface to support streaming contents, but don't implement it yet for the file based backend. Change-Id: If9e4442e9ef4ed52c3e0f1af9398199a73145516 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:36:10 -07:00
Jeff Schumacher	7b0b4110ed	Refactored code out of FileHeader to facilitate rename detection Refactored a superclass out of FileHeader called DiffEntry that holds the more general data from FileHeader that is useful in rename detection (old/new Ids, modes, names, as well as changeType and score). FileHeader is now a DiffEntry that adds Hunks, parsing abilities, etc. Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4	2010-06-30 17:53:27 -07:00
Dmitry Neverov	44854741c5	Fix missing flush in StreamCopyThread It is possible that StreamCopyThread will not flush everything from its src to its dst. In most cases StreamCopyThread works like this: in loop: n = src.read(buf); dst.write(buf, 0, n); and when we want to flush, we interrupt() StreamCopyThread and it flushes everything it wrote to dst. The problem is that our interrupt() could interrupt reading. In this case we will flush everything we wrote to dst, but not everything we wrote to src. Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 10:48:44 -07:00
Shawn O. Pearce	a1d5f5b6b5	Move DirCache factory methods to Repository Instead of creating the DirCache from a static factory method, use an instance method on Repository, permitting the implementation to override the method with a completely different type of DirCache reading and writing. This would better support a repository in the cloud strategy, or even just an in-memory unit test environment. Change-Id: I6399894b12d6480c4b3ac84d10775dfd1b8d13e7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 10:39:00 -07:00
Shawn O. Pearce	cb9d8285ba	Create NoWorkTreeException for bare repositories Using a custom exception type makes it easier for an application developer to understand why an exception was thrown out of a method we declare. To remain compatible with existing callers, we still extend off IllegalStateException. Change-Id: Ideeef2399b11ca460a2dbb3cd80eb76aa0a025ba Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 09:48:36 -07:00
Jeff Schumacher	9f2249bd26	Added check for binary files while diffing Added a check in Diff to ensure that files that are most likely not text are not line-by-line diffed. Files are determined to be binary by checking the first 8000 bytes for a null character. This is a similar heuristic to what C Git uses. Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26	2010-06-29 17:23:00 -07:00
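The heuristic in commit 9f2249bd26 (look for a null byte in the first 8000 bytes) is short enough to sketch directly. This is an illustration of the stated rule, not the code added to Diff; `BinaryCheckSketch`, `isBinary`, and the constant name are hypothetical.

```java
final class BinaryCheckSketch {
    // Number of leading bytes inspected, per the commit message.
    static final int FIRST_FEW_BYTES = 8000;

    // Treats content as binary if a NUL byte occurs within the inspected prefix.
    static boolean isBinary(byte[] raw) {
        int n = Math.min(raw.length, FIRST_FEW_BYTES);
        for (int i = 0; i < n; i++) {
            if (raw[i] == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBinary(new byte[] { 'a', 'b', '\n' })); // false
        System.out.println(isBinary(new byte[] { 'a', 0, 'b' }));    // true
    }
}
```

A NUL byte beyond the inspected prefix is not seen, so a file can still be misclassified as text; that is inherent to the heuristic rather than a flaw in the sketch.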
Shawn O. Pearce	515deaf7e5	Ensure RevWalk is released when done Update a number of calling sites of RevWalk to ensure the walker's internal ObjectReader is released after the walk is no longer used. Because the ObjectReader is likely to hold onto a native resource like an Inflater, we don't want to leak them outside of their useful scope. Where possible we also try to share ObjectReaders across several walk pools, or between a walker and a PackWriter. This permits the ObjectReader to actually do some caching if it felt inclined to do so. Not everything was updated, we'll probably need to come back and update even more call sites, but these are some of the biggest offenders. Test cases in particular aren't updated. My plan is to move most storage-agnostic tests onto some purely in-memory storage solution that doesn't do compression. Change-Id: I04087ec79faeea208b19848939898ad7172b6672 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-29 15:12:53 -07:00
Shawn O. Pearce	94228bde22	Use ObjectReader in DirCacheBuilder.addTree Rather than building a custom reader, have the caller supply us one. Change-Id: Ief2b5a6b1b75f05c8a6bc732a60d4d1041dd8254 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-29 09:30:29 -07:00
Shawn O. Pearce	d6e975f71b	Use one ObjectReader for WalkFetchConnection Instead of creating new ObjectReader for each walker, use one for the entire connection and delegate reads through it. Change-Id: I7f0a2ec8c9fe60b095a7be77dc423a2ff8b443a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:33 -07:00
Shawn O. Pearce	121d009b9b	Use ObjectReader in RevWalk, TreeWalk We don't actually need a Repository object here, just an ObjectReader that can load content for us. So change the API to depend on that. However, this breaks the asCommit and asTag legacy translation methods on RevCommit and RevTag, so we still have to keep the Repository inside of RevWalk for those two types. Hopefully we can drop those in the future, and then drop the Repository off the RevWalk. Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:29 -07:00
Shawn O. Pearce	06f635a4bc	Fix minor formatting issue in UploadPack Change-Id: Ifc0c3a94dc0e16126af6cf17e9c4a7cb96e8ffab Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 18:47:28 -07:00
Shawn Pearce	3fd4918852	Merge changes Ie56301aa,Ic2f79e85 * changes: Added further support for whitespace ignoring during diff Added support for whitespace ignoring	2010-06-28 20:27:04 -04:00
Jeff Schumacher	9869ef2592	Added further support for whitespace ignoring during diff Added code to support ignoring leading, trailing, and changed whitespace when performing a diff operation. I also added command line options to Diff to enable the various whitespace ignoring methods. These match the flags for git diff. Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8	2010-06-28 17:25:19 -07:00
Shawn O. Pearce	242b4026d9	Remove volatile keyword from RepositoryEvent We don't need this field to be volatile. Events are delivered by the same thread that created the RepositoryEvent object, and thus any cross-thread operations would need to be handled by some other type of synchronization in the listener, and that would protect both the repository field and any other per-event data. Change-Id: Iefe345959e1a2d4669709dbf82962bcc1b8913e3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 12:46:18 -07:00
Shawn O. Pearce	aa4b06e087	Rename openObject, hasObject to just open, has Similar to what we did on Repository, the openObject method already implied we wanted to open an object, given its main argument was of type AnyObjectId. Simplify the method name to just the action, has or open. Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:57:41 -07:00
Shawn O. Pearce	acb7be2c5a	Refactor Repository.openObject to be Repository.open We drop the "Object" suffix, because it's pretty clear here that we want to open an object, given that we pass in AnyObjectId as the main parameter. We also fix the calling convention to throw a MissingObjectException or IncorrectObjectTypeException, so that callers don't have to do this error checking themselves. Change-Id: I72c43353cea8372278b032f5086d52082c1eee39 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:54:58 -07:00
Shawn O. Pearce	6b62e53b60	Move PackWriter progress monitors onto the operations Rather than taking the ProgressMonitor objects in our constructor and carrying them around as instance fields, take them as arguments to the actual time consuming operations we need to run. Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	f288c27e46	Pass the PackOutputStream down the call stack Rather than storing this in an instance member, pass it down the calling stack. It's cleaner: we don't have to poke the stream as a temporary field, and then unset it. Change-Id: I0fd323371bc12edb10f0493bf11885d7057aeb13 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	1ad2feb7b3	Remove Repository.openObject(ObjectReader, AnyObjectId) Going through ObjectReader.openObject(AnyObjectId) is faster, but also produces cleaner application level code. The error checking is done inside of the openObject method, which means it can be removed from the application code. Change-Id: Ia927b448d128005e1640362281585023582b1a3a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:28 -07:00
Shawn O. Pearce	9ba7bd4df4	Throw IncorrectObjectTypeException on bad type hints If the type hint isn't OBJ_ANY and it doesn't match the actual type observed from the object store, define the reader to throw back an IncorrectObjectTypeException. This way the caller doesn't have to perform this check itself before it evaluates the object data, and we can simplify quite a few call sites. Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 11:47:25 -07:00
Jeff Schumacher	543235b805	Added support for whitespace ignoring JGit did not have support for skipping whitespace when comparing lines in RawText objects. I added a subclass of RawText that skips whitespace in its equals and hashCode methods. I used a subclass rather than adding functionality into RawText so that performance would not be impacted by extra logic. This class only supports ignoring all whitespace. Others will follow that allow other forms of whitespace ignoring. Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a	2010-06-28 10:59:10 -07:00
Shawn O. Pearce	a45728d7a4	Ensure ObjectReader used by PackWriter is released The ObjectReader API demands that we release the reader when we are done with it. PackWriter contains a reader, which it uses for the entire packing session. Expose the release of the reader through a release method on the writer. This still doesn't address the RevWalk and TreeWalk users, who don't correctly release their reader. But it's a small step in the right direction. Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 10:25:11 -07:00
Shawn O. Pearce	b5aa52e98a	Ensure PackWriter releases its ObjectReader Change-Id: I3f8af29066cc5a2132dc4a75c9654d97800f2f18 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 10:16:27 -07:00
Shawn O. Pearce	e01abbd543	Release ObjectReader before the cached ObjectDatabase I don't want to play games with the order of release here; it's probably safer to release the reader before the database, just in case the one depends on the other. Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:47:20 -07:00
Shawn O. Pearce	b40f02eb1a	Release ObjectInserter in merge() not mergeImpl() By doing the release at the higher level class, we can ensure the release occurs if the inserter was allocated, even if the implementation forgets to do this. Since the higher level class is what allocated it, it makes sense to have it also do the release. Change-Id: Id617b2db864c3208ed68cba4eda80e51612359ad Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:35:55 -07:00
Shawn O. Pearce	5aae041a81	Commit: Use Repository.newObjectInserter Everyone else does. This must have been a spot I missed during some sort of squash while developing the series. Change-Id: I62eae50b618f47ee33ad7cf71fc05b724f603201 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-28 09:22:48 -07:00
Shawn O. Pearce	ea21c111cb	Move PackWriter over to storage.pack.PackWriter Similar to what we did with the file code, move the pack writer into its own package so the related classes and their package private methods are hidden from the rest of the library. Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:51:12 -07:00
Shawn O. Pearce	71aace52f7	Simplify ObjectLoaders coming from PackFile We no longer need an ObjectLoader to be lazy and try to delay the materialization of the object content. That was done only to support PackWriter searching for a good reuse candidate. Instead, simplify the code base by doing the materialization immediately when the loader asks for it, because any caller asking for the loader is going to need the content. Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	68518ca3aa	Remove getRawSize, getRawType from ObjectLoader These were only used by PackWriter to help it filter object representations. Their only user disappeared when we rewrote the object selection code path to use the new representation type. Change-Id: I9ed676bfe4f87fcf94aa21e53bda43115912e145 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	86547022f0	Tighten up local packed object representation during packing Rather than making a loader, and then using that to fill the object representation, parse the header and set up our data directly. This saves some time, as we don't waste cycles on information we won't use right now. The weight computed for a representation is now its actual stored size in the pack file, rather than its inflated size. This accounts for changes made when the compression level is modified on the repository. It is however more costly to determine the weight of the object, since we have to find its length in the pack. To try and recover that cost we now cache the length as part of our ObjectToPack record, so it doesn't have to be found during the output phase. A LocalObjectToPack now costs us (assuming 32 bit pointers): (32 bit) (64 bit) vm header: 8 bytes 8 bytes ObjectId: 20 bytes 20 bytes PackedObjectInfo: 12 bytes 12 bytes ObjectToPack: 8 bytes 12 bytes LocalOTP: 20 bytes 24 bytes ----------- --------- 68 bytes 74 bytes Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:38 -07:00
Shawn O. Pearce	ad5238dc67	Move FileRepository to storage.file.FileRepository This move isolates all of the local file specific implementation code into a single package, where their package-private methods and support classes are properly hidden away from the rest of the core library. Because of the sheer number of files impacted, I have limited this change to only the renames and the updated imports. Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 18:50:34 -07:00
Shawn O. Pearce	3a7aec03e0	Implement zero-copy for single window objects Objects that fall completely within a single window can be worked with in a zero-copy fashion, provided that the window is backed by a normal byte[] and not by a ByteBuffer. This works for a surprising number of objects. The default window size is 8 KiB, but most deltas are quite a bit smaller than that. Objects smaller than 1/2 of the window size have a very good chance of falling completely within a window's array, which means we can work with them without copying their data around. Larger objects, or objects which are unlucky enough to span over a window boundary, get copied through the temporary buffer. We pay a tiny penalty to realize we can't use the zero-copy code path, but it's easier than trying to keep track of two adjacent windows. With this change (as well as everything preceding it), packing is actually a bit faster. Some crude benchmarks based on cloning linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using C git client and JGit daemon show we get better throughput, and slightly better times: Total Time | Throughput (old) (now) | (old) (now) --------------+--------------------------- 2m45s 2m37s | 12.49 MiB/s 21.17 MiB/s 2m42s 2m36s | 16.29 MiB/s 22.63 MiB/s 2m37s 2m31s | 16.07 MiB/s 21.92 MiB/s Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 16:13:22 -07:00
Shawn O. Pearce	ece88b99eb	Redo PackWriter object reuse output Output of selected reuses is refactored to use a new ObjectReuseAsIs interface that extends the ObjectReader. This interface allows the reader to control how it performs the reuse into the output stream, but also allows it to throw an exception to request the writer to find a different candidate representation. The PackFile reuse code was overhauled, cleaning up the APIs so they aren't exposed in the object loader, but instead are now a single method on the PackFile itself. The reuse algorithm was changed to do a data verification pass, followed by the copy pass to the output. This permits us to work around a corrupt object in a pack file by seeking another copy of that object when this one is bad. The reuse code was also optimized for the common case, where the in-pack representation is under 16 KiB. In these smaller cases data is sent to the pack writer more directly, avoiding some copying. Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 14:46:05 -07:00
Shawn O. Pearce	bf4ffff07f	Redo PackWriter object reuse selection The new selection implementation uses a public API on the ObjectReader, allowing the storage library to enumerate its candidates and select the best one for this packer without needing to build a temporary list of the candidates first. Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-26 14:16:06 -07:00
Shawn O. Pearce	e0c9368f3e	Reclaim some bits in ObjectToPack flags field Make the lower bits available for flags that PackWriter can use to keep track of facts about the object. We shouldn't need more than 2^24 delta depths, unpacking that chain is unfathomable anyway. This change gets us 4 bits that are unused in the lower end of the word, which are typically easier to load from Java and most machine instruction sets. We can use these in later changes. Change-Id: Ib9e11221b5bca17c8a531e4ed130ba14c0e3744f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	6fc3ecac84	Extract PackFile specific code to ObjectToPack subclass The ObjectReader class is dual-purposed into being a factory for the ObjectToPack, permitting specific ObjectDatabase implementations to override the method and offer their own custom subclass of the generic ObjectToPack class. By allowing them to directly extend the type, each implementation can add custom fields to support tracking where an object is stored, without incurring any additional penalties like a parallel Map<ObjectId,Object> would cost. The reader was chosen to act as a factory rather than the database, as the reader will eventually be tied more tightly with the ObjectWalk and TreeWalk. During object enumeration the reader would have had to load the object for the RevWalk, and may choose to cache object position data internally so it can later be reused and fed into the ObjectToPack instance supplied to the PackWriter. Since a reader is not thread-safe, and is scoped to this PackWriter and its internal ObjectWalk, it's a great place for the database to perform caching, if any. Right now this change goes a bit backwards by changing what should be generic ObjectToPack references inside of PackWriter to the very PackFile specific LocalObjectToPack subclass. We will correct these in a later commit as we start to refine what the ObjectToPack API will eventually look like in order to better support the PackWriter. Change-Id: I9f047d26b97e46dee3bc0ccb4060bbebedbe8ea9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	a2208be6aa	Extract ObjectToPack to be top-level This shortens the implementation within PackWriter, and starts to open the door for some other refactorings based on changing the ObjectToPack to be a public part of the API. Change-Id: Id849cbffc4de20b903e844a2de7737eeb8b7a3ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 23:26:19 -07:00
Shawn O. Pearce	ffe0614d4d	Allow Repository.getDirectory() to be null Some types of repositories might not be stored on local disk. For these, they will most likely return null for getDirectory() as the java.io.File type cannot describe where their storage is; it's not in the host's filesystem. Document that getDirectory() can return null now, and update all current non-test callers in JGit that might run into problems on such repositories. For the most part, just act like it's bare. Change-Id: I061236a691372a267fd7d41f0550650e165d2066 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	8a9844b2af	Redo event listeners to be more generic Replace the old crude event listener system with a much more generic implementation, patterned after the event dispatch techniques used in Google Web Toolkit 1.5 and later. Each event delivers to an interface that defines a single method, and the event itself is what performs the delivery in a type-safe way through its own dispatch method. Listeners are registered in a generic listener list, indexed by the interface they implement and wish to receive an event for. Delivery of events is performed by looping through all listeners implementing the event's corresponding listener interface, and using the event's own dispatch method to deliver the event. This is the classical "double dispatch" pattern for event delivery. Listeners can be unregistered by invoking remove() on their registration handle. This change therefore requires application code to track the handle if it wishes to remove the listener at a later point in time. Event delivery is now exposed as a generic public method on the Repository class, making it easier for any type of message to be sent out to any type of listener that has registered, without needing to pre-arrange for type-safe fireFoo() methods. New event types can be added in the future simply by defining a new RepositoryEvent subclass and a corresponding RepositoryListener interface that it dispatches to. By always adding new events through a new interface, we never need to worry about defining an Adapter to provide default no-op implementations of new event methods. Change-Id: I651417b3098b9afc93d91085e9f0b2265df8fc81 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	203bd66267	Rename Repository getWorkDir to getWorkTree This better matches with the name used in the environment (GIT_WORK_TREE), in the configuration file (core.worktree), and in our builder object. Since we are already breaking a good chunk of other code related to repository access, and this is fairly easy to fix in an application's code base, I'm not going to offer the wrapper getWorkDir() method. Change-Id: Ib698ba4bbc213c48114f342378cecfe377e37bb7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 18:03:41 -07:00
Shawn O. Pearce	532421d989	Refactor repository construction to builder class The new FileRepositoryBuilder class helps applications to construct a properly configured FileRepository, with properties assumed based upon the standard Git rules for the local filesystem. To better support simple command line applications, environment variable handling and repository searching was moved into this builder class. The change gets rid of the ever-growing FileRepository constructor variants, and the multitude of java.io.File typed parameters, by using simple named setter methods. Change-Id: I17e8e0392ad1dbf6a90a7eb49a6d809388d27e4c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:40 -07:00
Shawn O. Pearce	8f46ee4870	Remove Repository.toFile(ObjectId) Not every type of Repository will be able to map an ObjectId into a local file system path that stores that object's file contents. Heck, it's not even true for the FileRepository, as an object can be stored in a pack file and not in its loose format. Remove this from our public API; it was a mistake to publish it. Change-Id: I20d1b8c39104023936e6d46a5b0d7ef39ff118e8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:39 -07:00
Shawn O. Pearce	41c04bbb28	Use ObjectInserter for loose objects in WalkFetchConnection Rather than relying on the repository's ability to give us the local file path for a loose object, just pass its inflated form to the ObjectInserter for the repository. We have to recompress it, which may slow down fetches, but this is the slow dumb protocol. The extra cost to do the compression locally isn't going to be a major bottleneck. This nicely removes the nasty part about computing the object identity by hand, allowing us to instead rely upon the inserter's internal computation. Unfortunately it means we might store a loose object whose SHA-1 doesn't match the expected SHA-1, such as if the remote repository was corrupted. This is fairly harmless, as the incorrectly named object will now be stored under its proper name, and will eventually be garbage collected, as it's not referenced by the local repository. We have to flush the inserter after the object is stored because we aren't sure if we need to read the object later, or not. Change-Id: Idb1e2b1af1433a23f8c3fd55aeb20575e6047ef0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:06 -07:00
Shawn O. Pearce	5cfc29b491	Replace WindowCache with ObjectReader The WindowCache is an implementation detail of PackFile and how it's used by ObjectDirectory. Let's start to hide it and replace the public API with a more generic concept, ObjectReader. Because PackedObjectLoader is also considered a private detail of PackFile, we have to make PackWriter temporarily dependent upon the WindowCursor and thus FileRepository and ObjectDirectory in order to just start the refactoring. In later changes we will clean up the APIs more, exposing sufficient support to PackWriter without needing the file specific implementation details. Change-Id: I676be12b57f3534f1285854ee5de1aa483895398 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:58:01 -07:00
Shawn O. Pearce	133c987f4d	Refactor alternate object databases below ObjectDirectory Not every object storage system will have the concept of alternate object databases to search, and even if they do, they may not have the notion of fast-access / slow-access split like we do within the ObjectDirectory code for pack files and loose objects. Push all of that down below the generic API so that it is a hidden detail of the ObjectDirectory and its related supporting classes. Change-Id: I54bc1ca5ff2ac94dfffad1f9a9dad7af202b9523 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:41 -07:00
Shawn O. Pearce	88530a179e	Start using ObjectInserter instead of ObjectWriter Some newer style APIs are updated to use the newer ObjectInserter interface instead of the now deprecated ObjectWriter. In many of the unit tests we don't bother to release the inserter, these are typically using the file backend which doesn't need a release, but in the future should use an in-memory HashMap based store, which really wouldn't need it either. Change-Id: I91a15e1dc42da68e6715397814e30fbd87fa2e73 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:41 -07:00
Shawn O. Pearce	cad10e6640	Refactor object writing responsibilities to ObjectDatabase The ObjectInserter API permits ObjectDatabase implementations to control their own object insertion behavior, rather than forcing it to always be a new loose file created in the local filesystem. Inserted objects can also be queued and written asynchronously to the main application, such as by appending into a pack file that is later closed and added to the repository. This change also starts to open the door to non-file based object storage, such as an in-memory HashMap for unit testing, or a more complex system built on top of a distributed hash table. To help existing application code port to the newer interface we are keeping ObjectWriter as a delegation wrapper to the new API. Each ObjectWriter instance holds a reference to an ObjectInserter for the Repository's top-level ObjectDatabase, and it flushes and releases that instance on each object processed. Change-Id: I413224fb95563e7330c82748deb0aada4e0d6ace Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:41 -07:00
Shawn O. Pearce	3e3a50db5e	Change Repository.getConfig() to return non-file Configs A repository implementation might support storing configurations on a non-file storage system, so widen the return value to be any type of configuration. Change-Id: If9a0928f4b3ef29a24d270b0ce585a6e77f6fac6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	4c14b7623d	Make lib.Repository abstract and lib.FileRepository its implementation To support storage models other than just the local filesystem, we split the Repository class into a nearly abstract interface and then create a concrete subclass called FileRepository with the file based IO implementation. We are using an abstract class for Repository rather than the much more generic interface, as implementers will want to inherit a large array of utility functions, such as resolve(String). Having these in a base class makes it easy to inherit them. This isn't the final home for lib.FileRepository. Future changes will rename it into storage.file.FileRepository, but to do that we need to also move a number of other related classes, which we aren't quite ready to do. Change-Id: I1bd54ea0500337799a8e792874c272eb14d555f7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	77b39df5ec	Consistently fail work tree methods on bare repositories If the working tree isn't available, it doesn't make any sense to obtain the merge heads, or the buffered commit message. The repository shouldn't have a partial merge state to read. Throw back the same exception we do when invoking getWorkDir() on a bare repository instance. Change-Id: I762c55890b7fe272a183da583f910671d1cadf71 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	f18b853044	Consistently use getDirectory() for work tree state This permits us to leave the implementation of these methods here in the Repository class, but later refactor how the directory is accessed into a subclass. Change-Id: I5785b2009c5b7cca0fb070a968e50814ce847076 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	a63494edee	Add RepositoryState.BARE A bare repository cannot be checked out, committed to, etc. as it doesn't have a working directory. Define this as a state since the state enumeration exists only to describe how a working directory can be modified. Change-Id: I0a299013c6e42fef6cae3f6a9446f8f6c8e0514a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	c9c57d34de	Rename Repository 'config' as 'repoConfig' This better matches with the other configuration variable, 'userConfig', and helps to make it clear what config object we are dealing with. Change-Id: I2c585649aecc805e8e66db2f094828cd2649e549 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	6a822f0ebf	Remove RepositoryConfig and use FileBasedConfig instead Change the Repository API to use straight-up FileBasedConfig. This lets us remove the subclass RepositoryConfig and stop having a specialized configuration type for repository, letting us instead focus the config type hierarchy on type-of-storage rather than use. Change-Id: I7236800e8090624453a89cb0c7a9a632702691c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	bd8b06427f	Delegate repository access to refs, objects Instead of using the internal field directly to access references or objects, use the getter method to obtain the proper type of database, and follow down from there. This permits us to later do a refactoring that makes those methods abstract and strips the field out of the Repository class, moving it into a concrete base class that is more storage implementation specific. Change-Id: Ic21dd48800e68a04ce372965ad233485b2a84bef Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:40 -07:00
Shawn O. Pearce	f6c26dabd2	Cleanup Repository.create() This method doesn't need to be synchronized, as it's only a proxy to create(boolean), which is the real worker. While we are touching it try to improve the Javadoc and whitespace nearby. Change-Id: Ibdddec6e518ca6d7439cfad90fedfcdc2d6b7a2e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	5309244713	Move additional have enumeration to Repository This permits the repository implementation to know what its alternates concept means, and avoids needing to expose finer details about the ObjectDatabase to network code like the RefAdvertiser. Change-Id: Ic6d173f300cb72de34519c7607cf7b0ff3ea6882 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	479fcf9e32	Refactor amazon-s3:// property file loading to support no directory In the future getDirectory() can return null. Avoid an NPE here by refactoring the code to support conditionally skipping a check for the properties file in the repository directory, falling to only the user's ~/ file location. Change-Id: I76f5503d4063fdd9d24b7c1b58e1b09ddf1a5670 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	f39c9fc741	Download pack-.idx to /tmp if not on local filesystem If the destination repository doesn't use an ObjectDirectory to store its objects, we can't download to the object directory. Instead pull the pack-.idx files down to temporary files in the JVM's default temporary directory. Change-Id: Ied16bc89be624d87110ba42ba52d698a6ea7d982 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	553c2e5a42	DirCache must use getIndexFile When reading or locking the index of a repository, we need to use the index file specified by the repository, to ensure we correctly honor what the repository was configured with. Change-Id: I5be366ce32d7923b888dc01d19335912b01b7c4c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-25 17:46:39 -07:00
Shawn O. Pearce	60aae90d4d	Disable topological sorting in PackWriter It's not strictly required that we sort topologically in order to produce a valid pack file. This was just something that Linus thought would be a good idea to do. In practice it's not that important for most repositories. Local file IO quickly falls out of the pattern that topological sorting provides any sort of benefit for, so expending extra resources to enforce it when we make a pack isn't really worth it. I'm removing this sort in the pipeline because later changes would support really efficient COMMIT_TIME_DESC sorting on a non-file storage system, but TOPO sorting would be a bit more ugly to run, due to the in-memory delays it imposes. Change-Id: I0121453461c2140c6917cb10c6df584eb47e5795 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:41 -07:00
Shawn O. Pearce	ccd0c0c911	UploadPack: Permit flushing progress messages under smart HTTP If UploadPack invokes flush() on the output stream we pass it, it's most likely the progress messages coming down the side band stream. As pack generation can take a while, we want to push that down at the client as early as we can, to keep the connection alive, and to let the user know we are still working on their behalf. Ensure we dump the temporary buffer whenever flush() is invoked, otherwise the messages don't get sent in a timely fashion to the user agent (in this case, git fetch). We specifically don't implement flush() for ReceivePack right now, as that protocol currently does not provide progress messages to the user, but it does invoke flush several times, as the different streams include '0000' type flush-pkts to denote various end points. Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:41 -07:00
Shawn O. Pearce	b6ba9739d5	Rewrite resolve in terms of RevWalk We want to eventually get rid of the mapCommit, mapTree APIs on Repository and force everyone into the faster parsers that exist in RevWalk. Rewriting resolve in terms of the faster parsers is a good first step. It actually simplifies the code a bit, as we no longer need to keep track of an ObjectId and an Object (the parsed form), since all RevObjects implicitly have their ObjectId readily available. Change-Id: I4d234630195616e2c263e7e70038b55a1be4e7a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:41 -07:00
Shawn O. Pearce	47c07e1a0d	Replace manual peel loops with RevWalk.peel Instead of peeling things by hand in application level code, defer the peeling logic into RevWalk's new peel utility method. Change-Id: Idabd10dc41502e782f6a2eeb56f09566b97775a8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:40 -07:00
Shawn O. Pearce	599c0ce745	Use RevTag/RevCommit to sort in a PlotWalk We already have these objects parsed and cached in our object pool. We shouldn't be looking them up via the legacy mapObject API, but instead can use the pool and the faster parsing routines available through the RevWalk that we extend. While we are here fixing the code, let's also correct the tag date sorting to accept tags that have no tagger identity, because they were created before Git knew how to store that field. Change-Id: Id49a11f6d9c050c82b876e5e11058840c894b2d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:32:40 -07:00
Shawn O. Pearce	e1b312b5f7	Use CoreConfig, UserConfig and TransferConfig directly Rather than relying on the helpers in RepositoryConfig to get these objects, obtain them directly through the Config API. It's only slightly more verbose, but permits us to work with the base Config class, which is more flexible than the highly file specific RepositoryConfig. This is what I really meant to do when I added the section parser and caching support to Config; we just failed to finish updating all of the call sites. Change-Id: I481cb365aa00bfa8c21e5ad0cd367ddd9c6c0edd Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:29:38 -07:00
Shawn O. Pearce	8e396bcddc	Use higher level Config types when possible We don't have to assume/depend on RepositoryConfig here, these two tests can use higher level versions of the class and still come up with the same test. That frees us up to do some changes to the RepositoryConfig API. Change-Id: Ia7b263c8c5efa3fae1054416d39c546867288132 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 17:29:37 -07:00
Shawn O. Pearce	5ed96eb7f4	UploadPack: Avoid unnecessary flush in smart HTTP Under smart HTTP the biDirectionalPipe flag is false, and we return back immediately at this point in the negotiation process. There is no need to flush the stream to the client, the request is over and it will be automatically flushed out by the higher level servlet that invoked us. Avoiding flush here allows us to only use flush after a progress message is sent during pack generation. Change-Id: Id0c8b7e95e3be6ca4c1b479e096bed6b0283b828 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	066df3d1a1	Add MutableObjectId.copyFrom(AnyObjectId) This simplifies the PackIndex code, which is trying to quickly copy an existing ObjectId into a MutableObjectId. Rather than having the PackIndex violate the ObjectId's internals, expose a copy from function similar to the other ones for copying from raw byte arrays or hex formatted strings. Change-Id: I142635cbece54af2ab83c58477961ce925dc8255 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	677b9b17e2	Expose AnyObjectId compareTo(byte[]) and compareTo(int[]) Storage systems can use these implementations to compare a passed AnyObjectId with a stored representation of an ObjectId in the canonical network byte order format. This can be useful to do a binary search, or just linear scan, over an encoded storage file. Change-Id: I8c72993c4f4c6e98d599ac2c9867453752f25fd2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	864cc3de10	Expose RefWriter constructor taking RefList An implementation might prefer to use the RefList type here, and RefList is part of our public API. Expose the constructor so callers who have a RefList can take advantage of the existing sorting. Change-Id: I545867f85aa2c479d2d610024ebbe318144709c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:15 -07:00
Shawn O. Pearce	bfc43c13bc	Expose RefUpdate constructor to any subclass When we finally move RefDirectory to the new storage.file package, its associated RefDirectoryUpdate will need visibility to this constructor in order to initialize itself. This is true of any other repository implementation, so make it protected rather than package level visible. Change-Id: If838aec9baeb80ee2f12dcbca717657c725a9242 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:14 -07:00
Shawn O. Pearce	8e40697047	Expose repository change event constructors Repository implementations outside of .lib need to be able to create these events and deliver them to listening application code. Expose and document the constructors so that they are visible when we move FileRepository into storage.file.FileRepository. Change-Id: I7fb6e8f4f5fdab683c5ebb5267673aa6d5b560bb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:14 -07:00
Shawn O. Pearce	b3254d1159	isValidRefName: Inline the forbidden ref suffix of ".lock" A Git reference name must never end with ".lock", as it would confuse any existing C client that tries to obtain a clone of the repository over the network. Even if the repository isn't on a local filesystem, it still should ban that suffix. Because I plan to move LockFile to storage.file and make it a private implementation detail of the local file system storage model, we can't rely on its package level SUFFIX field here. Making it public probably won't work long-term either, as I also plan to pull storage.file into its own separate project that depends on the core library. So, just inline the constant here. It's as forbidden as ":" is. Change-Id: If85076861baeacc183b82696375a13e935ba8836 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-23 16:54:14 -07:00

456 Commits