motiejus/jgit - jgit - gitea: Gitea Service

motiejus

jgit

Author	SHA1	Message	Date
Shawn O. Pearce	074055d747	debug-show-packdelta: Dump a pack delta to the console This is a horribly crude application, it doesn't even verify that the object its dumping is delta encoded. Its method of getting the delta is pretty abusive to the public PackWriter API, because right now we don't want to expose the real internal low-level methods actually required to do this. Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:12:32 -07:00
Shawn O. Pearce	8612c0ace1	Initial pack format delta generator DeltaIndex is a simple pack style delta generator. The function works by creating a compact index of a source buffer's blocks, and then walking a sliding window along a desired result buffer, searching for the window in the index. When a match is found, the window is stretched to the longest possible length that is common with the source buffer, and a copy instruction is created. Rabin's polynomial hash function is used to compute the hash for a block, permitting efficient sliding of the window in single byte increments. The update function to slide one byte originated from David Mazieres' work in LBFS, and our implementation of the update step was certainly inspired by the initial work Geert Bosch proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2. To ensure the encoder runs in linear time with respect to the size of the two input buffers (source and result), the maximum number of blocks that can share the same position in the index's hashtable is capped at a constant number. This prevents bad inputs from causing the encoder to run in quadratic time, but comes with a penalty of creating a longer delta due to fewer considered copy positions. Strange hackery is used to cap the amount of memory used by the index to be no more than 12 bytes for every 16 bytes of source buffer, no matter what the JVM per-object overhead is. This permits an index to always be no larger than 1.75x the source buffer length, which is an important feature to support large windows of candidates to match against while packing. Here the strange hackery is nothing more than a manually managed chained hashtable, where pointers are array indexes into storage arrays rather than object references. Computation of the hash function for a single fixed sized block is done through an unrolled loop, where the first 4 iterations have been manually reduced down to eliminate unnecessary instructions. The pattern is derived from ObjectId.equals(byte[], int, byte[], int), where we have unrolled the loop required to compare two 20 byte arrays. Hours of testing with the Sun 1.6 JRE concluded that the non-obvious "foo[idx + 1]" style of reference is faster than "foo[idx++]", and so that is what we use here during hashing. Change-Id: If9fb2a1524361bc701405920560d8ae752221768 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:10:55 -07:00
Shawn O. Pearce	b38426ae8c	Add debugging toString() method to ObjectToPack Its useful to know what the flags are or what the base that was selected is. Dump these out as part of the object's toString. Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:09:19 -07:00
Shawn O. Pearce	699e4aa7c5	Make ObjectToPack clearReuseAsIs signal available to subclasses A subclass may want to use this method to release handles that are caching reuse information. Make it protected so they can override it and update themselves. Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:07:45 -07:00
Shawn O. Pearce	4569d77e13	Correctly classify the compressing objects phase Searching for reuse candidates should be fast compared to actually doing delta compression. So pull the progress monitor out of this phase and rename it back to identify the compressing objects state. Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:06:10 -07:00
Shawn O. Pearce	85b7a53d52	Refactor ObjectToPack's delta depth setting Long ago when PackWriter is first written we thought that the delta depth could be updated automatically. But its never used. Instead make this a simple standard setter so the caller can more directly set the delta depth of this object. This permits us to configure a depth that takes into account more than just the depth of another object in this same pack. Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:04:35 -07:00
Shawn O. Pearce	6730f9e3c8	Configure core.bigFileThreshold into PackWriter C Git's fast-import uses this to determine the maximum file size that it tries to delta compress, anything equal to or above this setting is stored with as a whole object with simple deflate. Define the configuration so we can use it later. Change-Id: Iea46e787d019a1b6c51135cc73d7688a02e207f5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:02:54 -07:00
Shawn O. Pearce	823e9a9721	Add doNotDelta flag to ObjectToPack This flag will later control whether or not PackWriter search for a delta base for this object. Edge objects will never get searched, as the writer won't be outputting them, so they should always have this flag set on. Sometime in the future this flag should also be set for file blobs on file paths that have the "-delta" gitattribute set in the repository's attributes file. Change-Id: I6e518e1a6996c8ce00b523727f1b605e400e82c6 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:49 -07:00
Shawn O. Pearce	616bc74cf7	Add more configuration options to PackWriter We now at least import other pack settings like pack.window, which means we can later use these to control how we search for deltas. The compression level was fixed to use pack.compression rather than the loose object core.compression setting. Change-Id: I72ff6d481c936153ceb6a9e485fa731faf075a9a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 19:00:46 -07:00
Robin Rosenberg	a1492f1922	Say that commit is allowed during bisect C Git allows this and it is quite handy. Change-Id: I1d0238b43fca931ad2079649fb7b431e2815c351 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	2010-07-10 02:32:46 +02:00
Shawn O. Pearce	2f93a09dd1	Save object path hash codes during packing We need to remember these so we can later cluster objects that have similar file paths near each other as we search for deltas between them. Change-Id: I52cb1e4ca15c9c267a2dbf51dd0d795f885f4cf8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 15:17:26 -07:00
Shawn O. Pearce	c20daa7314	Add path hash code to ObjectWalk PackWriter wants to categorize objects that are similar in path name, so blobs that are probably from the same file (or same sort of file) can be delta compressed against each other. Avoid converting into a string by performing the hashing directly against the path buffer in the tree iterator. We only hash the last 16 bytes of the path, and we try avoid any spaces, as we want the suffix of a file such as ".java" to be more important than the directory it is in, like "src". Change-Id: I31770ee711526306769a6f534afb19f937e0ba85 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Shawn O. Pearce	b584cb8754	Add getObjectSize to ObjectReader This is an informational function used by PackWriter to help it better organize objects for delta compression. Storage systems can implement it to provide up more detailed size information, or they can simply rely on the default behavior that uses the ObjectLoader obtained from open. For local file storage, we can obtain this information faster through specialized routines that parse a pack object header. Change-Id: I13a09b4effb71ea5151b51547f7d091564531e58 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Shawn O. Pearce	97311cd3e0	Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB If the heap limit was set to something smaller than 8 KiB, we were still allocating the full 8 KiB block size, and accepting up to the amount we allocated by. Instead actually put a hard cap on the limit. Change-Id: Id1da26fde2102e76510b1da4ede8493928a981cc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-09 10:37:47 -07:00
Christian Halstrick	d1378e4c51	Merge "Allow ReadTreeTest to test arbitrary Checkouts"	2010-07-09 09:42:12 -04:00
Matthias Sohn	b8f2bb7d2a	Add support for updateNeeded flag in DirCacheEntry Change-Id: If06ff41d9ccd422afbc79ecbc3cfdf8bb2508dcd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2010-07-09 14:12:06 +02:00
Jeff Schumacher	a8b29afd82	Create FileHeader from DiffEntry Added support for converting DiffEntrys to FileHeaders. FileHeaders are DiffEntrys with a buffer containing the diff output as well as a list of HunkHeaders. The HunkHeaders contain EditLists. The createFileHeader(DiffEntry) method in DiffFormatter performs a Myers Diff on the files refered to by the DiffEntry, then puts the returned EditList into a single HunkHeader, which is then put into the FileHeader to be returned. It also generates the appropriate diff header an puts it into the FileHeader's buffer. The rest of the diff output, which would normally be parsed to generate the HunkHeaders, is not generated. In fact, the purpose of this method is to avoid the costly diff output generation and parsing normally required to create a FileHeader. Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887	2010-07-08 16:58:55 -07:00
Christian Halstrick	4be88168b6	Allow ReadTreeTest to test arbitrary Checkouts ReadTreeTest was hardcoded to test WorkDirCheckout. Since we want alternative checkout implementations (especially DirCacheCheckout) this class has been refactored so that the tests can be reused to test other implementations The following changes have been done: - abstract methods for checkout and prescanTwoTrees have been introduced. Parameters are only the two trees. As index we will implicitly use the current index of the repo. - whenever tests needed a manipulated index before checkout and prescanTwoTrees it was ensured that the correct index was persisted (before we could use not-persisted instantiations of GitIndex passed as parameters to checkout, prescanTwoTrees - abstract methods for getting updated, conflicting, removed entries resulting from the last checkout, prescanTwoTrees have been introduced - an implementation for all these abstract methods using WorkDirCheckout has been added - method to assert a certain state of the index and the working tree has been added Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Change-Id: Icf177cf8043487169a32ddd72b6f8f9246a433f7	2010-07-09 01:38:49 +02:00
Stefan Lay	354b90131a	Fix javadoc typos in JGit API There were some small errors which made it difficult to read the JavaDoc. Change-Id: Ib3b34353465162adebaca3514d596d0edf5aea51 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-08 10:42:29 +02:00
Shawn O. Pearce	711bd3e3d0	Define a constant for 127 in DeltaEncoder The special value 127 here means how many bytes we can put into a single insert command. Rather than use the magical value 127, lets name it to better document the code. Change-Id: I5a326f4380f6ac87987fa833e9477700e984a88e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	cd7dd8591e	Cap delta copy instructions at 64k Although all modern delta decoders can process copy instructions with a count as large as 0xffffff (~15.9 MiB), pack version 2 streams are only supposed to use delta copy instructions up to 64 KiB. Rewrite our copy instruction encode loop to use the lower 64 KiB limit, even though modern decoders would support longer copies. To improve encoding performance we now try to encode up to four full copy commands in our buffer before we flush it to the stream, but we don't try to implement full buffering here. We are just trying to amortize the virtual method call to the destination stream when we have to do a large copy. Change-Id: I9410a16e6912faa83180a9788dc05f11e33fabae Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:52:09 -07:00
Shawn O. Pearce	384a19eee0	Deprecate all of the older Tree related code We want to get rid of these APIs, because they don't perform as well as DirCache/TreeWalk, or don't offer nearly as many features. Bug: 319145 Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 09:15:02 -07:00
Shawn O. Pearce	a215914a56	Fix DeltaEncoder header for objects 128 bytes long The encode loop had the wrong condition, objects that are 128 bytes in size need to have their length encoded as two bytes, not one. Change-Id: I3bef85f2b774871ba6104042b341749eb8e7595c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-07 08:53:03 -07:00
Shawn O. Pearce	f29741d1d8	amend commit: Support large delta packed objects as streams Rename the ByteWindow's inflate() method to setInput. We have completely refactored the purpose of this method to be feeding part (or all) of the window as input to the Inflater, and the actual inflate activity happens in the caller. Change-Id: Ie93a5bae0e9e637b5e822d56993ce6b562c6ad15 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:06 -07:00
Shawn O. Pearce	ab3c68c512	amend commit: Support large loose objects as streams We need to validate the stream state after the InflaterInputStream thinks the stream is done. Git expects a higher level of service from the Inflater than the InflaterInputStream usually gives, we need to ensure the embedded CRC is valid, and that there isn't trailing garbage at the end of the file. Change-Id: I1c9642a82dbd76b69e607dceccf8b85dc869a3c1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-06 19:41:01 -07:00
Stefan Lay	311da9b211	Fix comparison of nanoseconds NB.decodeInt32(info, base + 4) already returns nanoseconds. Therefore it must not be divided by 1000000. Change-Id: Ie8f5c4a03f984d98935dccedc2b1ba4457094899 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	2010-07-06 17:57:17 +02:00
Shawn O. Pearce	1913b41bc7	log: Implement --follow The FollowFilter can be installed on a RevWalk to cause the path to be updated through rename detection when the affected file is found to be added to the project. The filter works reasonably well, for example we can follow the history of the fsck command in git-core: $ jgit log --name-status --follow builtin/fsck.c \| grep ^R R100 builtin-fsck.c builtin/fsck.c R099 fsck.c builtin-fsck.c R099 fsck-objects.c fsck.c R099 fsck-cache.c fsck-objects.c Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:55 -07:00
Shawn O. Pearce	e9de5643fa	Cache the diff configuration section This way we don't have to reparse for the rename limit every time we create a new rename detector for a repository. Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 18:17:52 -07:00
Shawn O. Pearce	8a0c58394d	log: Add whitespace ignore options Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 17:32:47 -07:00
Shawn O. Pearce	bd8740dc14	Format submodule links during differences Instead of crashing, output a submodule link with the simple "Subproject commit $fullid\n" syntax used by C Git. Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:59:06 -07:00
Shawn O. Pearce	5be90be996	Redo DiffFormatter API to be easier to use Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:58:37 -07:00
Shawn O. Pearce	04a9d23b9a	log, diff: Add rename detection support Implement rename detection in the command line diff and log commands. Also support --name-status, -p and -U flags, as these can be quite useful to view more detail. All of the Git patch file formatting code is now moved over to the DiffFormatter class. This permits us to reuse it in any context, including inside of IDEs. Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similiar to what we use in ObjecIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common into the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash functions. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 16:32:03 -07:00
Shawn O. Pearce	4dd7b35b26	Improve description of isBare and NoWorkTreeException Alex pointed out that my description of a bare repository might be confusing for some readers. Reword the description of the error, and make it consistent throughout the Repository class's API. Change-Id: I87929ddd3005f578a7022f363270952d1f7f8664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:31 -07:00
Shawn O. Pearce	08d349a27b	amend commit: Refactor repository construction to builder class During code review, Alex raised a few comments about commit `532421d989` ("Refactor repository construction to builder class"). Due to the size of the related series we aren't going to go back and rebase in something this minor, so resolve them as a follow-up commit instead. Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	fe9860a444	Remove pointless size test in PackFile decompress Now that any large objects are forced through a streaming loader when its bigger than getStreamFileThreshold(), and that threshold is pegged at Integer.MAX_VALUE as its largest size, we will never be able to reach this code path where we threw OutOfMemoryError. Robin pointed out that we probably should include a message here, but the code is effectively unreachable, so there isn't any value in adding a message at this point. So remove it. Change-Id: Ie611d005622e38a75537f1350246df0ab89dd500 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	412ca65bd5	Avoid unbounded getCachedBytes during parseAny Since we don't know the type of object we are parsing, we don't know if its a massive blob, or some small commit or annotated tag. Avoid pulling the cached bytes until we have checked the type and decided if we actually need them to continue parsing right now. This way large blobs which won't fit in memory and would throw a LargeObjectException don't abort parsing. Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:30 -07:00
Shawn O. Pearce	e4a480f658	Make type and size lazy for large delta objects Callers don't necessarily need the getSize() result from a large delta. They instead should be always using openStream() or copyTo() for blobs going to local files, or they should be checking the result of the constant-time isLarge() method to determine the type of access they can use on the ObjectLoader. Avoid inflating the delta instruction stream twice by delaying the decoding of the size until after we have created the DeltaStream and decoded the header. Likewise with the type, callers don't necessarily always need it to be present in an ObjectLoader. Delay looking at it as late as we can, thereby avoiding an ugly O(N^2) loop looking up the type for every single object in the entire delta chain. Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-03 10:54:29 -07:00
Shawn O. Pearce	629fd0d594	Clean up LICENSE file We used our LICENSE file to describe both the license of the package, and also the header template that should appear at the start of all Java files we create. This creates a confusing situation for readers who just want to consume the package, because our file header template starts off in the middle of a sentence. Move our template header to a separate file, and reformat the text of the license to be something more readable by a person reviewing the project's terms of use. Change-Id: If318e64c06683ea14e0240914c2d057c9199ce98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 14:52:49 -07:00
Shawn O. Pearce	113577617b	Use core.streamFileThreshold to set our streaming limit We default this to 1 MiB for now, but we allow users to modify it through the Repository's configuration file to be a different value. A new repository listener is used to identify when the setting has been updated and trigger a reconfiguration of any active ObjectReaders. To prevent a horrible explosion we cap core.streamFileThreshold at no more than 1/4 of the maximum JVM heap size. We do this because we need at least 2 byte arrays equal in size to the stream threshold for the worst case delta inflation scenario, and our host application probably also needs some amount of the heap for their working set size. Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 12:41:39 -07:00
Shawn O. Pearce	ad68553be4	Support large delta packed objects as streams Very large delta instruction streams, or deltas which use very large base objects, are now streamed through as large objects rather than being inflated into a byte array. This isn't the most efficient way to access delta encoded content, as we may need to rewind and reprocess the base object when there was a block moved within the file, but it will at least prevent the JVM from having its heap explode. When streaming a delta we have an inflater open for each level in the delta chain, to inflate the instruction set of the delta, as well as an inflater for the base level object. The base object is buffered, as is the top level delta requested by the application, but we do not buffer the intermediate delta streams. This keeps memory usage lower, so its closer to 1024 bytes per level in the chain, without having an adverse impact on raw throughput as the top-level buffer gets pushed down to the lowest stream that has the next region. Delta instructions transparently collapse here, if the top level does not copy a region from its base, the base won't materialize that part from its own base, etc. This allows us to avoid copying around a lot of segments which have been deleted from the final version. Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-02 02:19:12 -07:00
Shawn O. Pearce	ded8f6c721	Support large whole packed objects as streams Similar to the loose object support, whole packed objects can now be streamed back to the caller. The streaming is less efficient as we copy the data from the cached window array into the InflaterInputStream's internal buffer, then inflate it there before returning to the application. Like with unpacked objects, there is plenty of room for some optimization, especially for the copyTo method, where we don't necessarily need so much buffering to exist. Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 19:34:21 -07:00
Shawn O. Pearce	13e0218a25	Replace PackedObjectLoader with ObjectLoader.SmallObject The class is identical, but ObjectLoader.SmallObject is part of our public API for storage implementations to build on top of. Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:27:51 -07:00
Shawn O. Pearce	fa23482ca7	Support large loose objects as streams Big loose objects can now be streamed if they are over the large object size threshold. This prevents the JVM heap from exploding with a very large byte array to hold the slurped file, and then again with its uncompressed copy. We may have slightly slowed down the simple case for small loose objects, as the loader no longer slurps the entire thing and decompresses in memory. To try and keep good performance for the very common small objects that are below 8 KiB in size, buffers are set to 8 KiB, causing the reader to slurp most of the file anyway. However the data has to be copied at least once, from the BufferedInputStream into the InflaterInputStream. New unit tests are supplied to get nearly 100% code coverage on the unpacked code paths, for both standard and pack style loose objects. We tested a fair chunk of the code elsewhere, but these new tests are better isolated to the specific branches in the code path. Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-07-01 18:26:17 -07:00
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	2010-07-01 17:33:53 -07:00
Shawn O. Pearce	2489088235	Permit AnyObjectTo to compareTo AnyObjectId Assume that the argument of compareTo won't be mutated while we are doing the compare, and support the wider AnyObjectId type so MutableObjectId is suitable on either side of the compareTo call. Change-Id: I2a63a496c0a7b04f0e5f27d588689c6d5e149d98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 19:07:36 -07:00
Shawn O. Pearce	d04b7972d8	Use copyTo during checkout of files to working tree This way we can stream a large file through memory, rather than loading the entire thing into a single contiguous byte array. Change-Id: I3ada2856af2bf518f072edec242667a486fb0df1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:56:20 -07:00
Shawn O. Pearce	a0fd06e5c2	Stream whole deflated objects in PackWriter Instead of loading the entire object as a byte array and passing that into the deflater, let the ObjectLoader copy the object onto the DeflaterOutputStream. This has the nice side effect of using some sort of stride hack in the Sun implementation that may improve compression performance. Change-Id: I3f3d681b06af0da93ab96c75468e00e183ff32fe Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:50:50 -07:00
Shawn O. Pearce	ad0383734e	Lazily allocate Deflater in PackWriter Only allocate the Deflater if we can't reuse everything, but also make sure we release it when we release the PackWriter's resources. Change-Id: I16a32b94647af0778658eda87acbafc9a25b314a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:40:54 -07:00
Shawn O. Pearce	23e7f6376a	Add openStream to ObjectLoader for big blobs Blobs that are too large to read as a single byte array should be accessed through an InputStream based interface instead, allowing the application to walk through the data stream incrementally. Define the basic interface to support streaming contents, but don't implement it yet for the file based backend. Change-Id: If9e4442e9ef4ed52c3e0f1af9398199a73145516 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2010-06-30 18:36:10 -07:00

... 2 3 4 5 6 ...

603 Commits All Branches Search

603 Commits

All Branches