Commit Graph

617 Commits

Author SHA1 Message Date
Shawn O. Pearce 412ca65bd5 Avoid unbounded getCachedBytes during parseAny
Since we don't know the type of object we are parsing, we don't
know if its a massive blob, or some small commit or annotated tag.
Avoid pulling the cached bytes until we have checked the type and
decided if we actually need them to continue parsing right now.

This way large blobs which won't fit in memory and would throw
a LargeObjectException don't abort parsing.

Change-Id: Ifb70df5d1c59f616aa20ee88898cb69524541636
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:30 -07:00
Shawn O. Pearce e4a480f658 Make type and size lazy for large delta objects
Callers don't necessarily need the getSize() result from a large
delta.  They instead should be always using openStream() or copyTo()
for blobs going to local files, or they should be checking the
result of the constant-time isLarge() method to determine the type
of access they can use on the ObjectLoader.  Avoid inflating the
delta instruction stream twice by delaying the decoding of the size
until after we have created the DeltaStream and decoded the header.

Likewise with the type, callers don't necessarily always need it
to be present in an ObjectLoader.  Delay looking at it as late as
we can, thereby avoiding an ugly O(N^2) loop looking up the type
for every single object in the entire delta chain.

Change-Id: I6487b75b52a5d201d811a8baed2fb4fcd6431320
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-03 10:54:29 -07:00
Shawn O. Pearce 629fd0d594 Clean up LICENSE file
We used our LICENSE file to describe both the license of the package,
and also the header template that should appear at the start of
all Java files we create.  This creates a confusing situation for
readers who just want to consume the package, because our file
header template starts off in the middle of a sentence.

Move our template header to a separate file, and reformat the text
of the license to be something more readable by a person reviewing
the project's terms of use.

Change-Id: If318e64c06683ea14e0240914c2d057c9199ce98
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-02 14:52:49 -07:00
Shawn O. Pearce 113577617b Use core.streamFileThreshold to set our streaming limit
We default this to 1 MiB for now, but we allow users to modify
it through the Repository's configuration file to be a different
value.  A new repository listener is used to identify when the
setting has been updated and trigger a reconfiguration of any
active ObjectReaders.

To prevent a horrible explosion we cap core.streamFileThreshold
at no more than 1/4 of the maximum JVM heap size.  We do this
because we need at least 2 byte arrays equal in size to the
stream threshold for the worst case delta inflation scenario,
and our host application probably also needs some amount of the
heap for their working set size.

Change-Id: I103b3a541dc970bbf1a6d92917a12c5a1ee34d6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-02 12:41:39 -07:00
Shawn O. Pearce ad68553be4 Support large delta packed objects as streams
Very large delta instruction streams, or deltas which use very large
base objects, are now streamed through as large objects rather than
being inflated into a byte array.

This isn't the most efficient way to access delta encoded content, as
we may need to rewind and reprocess the base object when there was a
block moved within the file, but it will at least prevent the JVM from
having its heap explode.

When streaming a delta we have an inflater open for each level in the
delta chain, to inflate the instruction set of the delta, as well as
an inflater for the base level object.  The base object is buffered,
as is the top level delta requested by the application, but we do not
buffer the intermediate delta streams.  This keeps memory usage lower,
so its closer to 1024 bytes per level in the chain, without having an
adverse impact on raw throughput as the top-level buffer gets pushed
down to the lowest stream that has the next region.

Delta instructions transparently collapse here, if the top level does
not copy a region from its base, the base won't materialize that part
from its own base, etc.  This allows us to avoid copying around a lot
of segments which have been deleted from the final version.

Change-Id: I724d45245cebb4bad2deeae7b896fc55b2dd49b3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-02 02:19:12 -07:00
Shawn O. Pearce ded8f6c721 Support large whole packed objects as streams
Similar to the loose object support, whole packed objects can
now be streamed back to the caller.  The streaming is less
efficient as we copy the data from the cached window array
into the InflaterInputStream's internal buffer, then inflate
it there before returning to the application.

Like with unpacked objects, there is plenty of room for some
optimization, especially for the copyTo method, where we don't
necessarily need so much buffering to exist.

Change-Id: Ie23be81289e37e24b91d17b0891e47b9da988008
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-01 19:34:21 -07:00
Shawn O. Pearce 13e0218a25 Replace PackedObjectLoader with ObjectLoader.SmallObject
The class is identical, but ObjectLoader.SmallObject is part of our
public API for storage implementations to build on top of.

Change-Id: I381a3953b14870b6d3d74a9c295769ace78869dc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-01 18:27:51 -07:00
Shawn O. Pearce fa23482ca7 Support large loose objects as streams
Big loose objects can now be streamed if they are over the large
object size threshold.  This prevents the JVM heap from exploding
with a very large byte array to hold the slurped file, and then
again with its uncompressed copy.

We may have slightly slowed down the simple case for small
loose objects, as the loader no longer slurps the entire thing
and decompresses in memory.  To try and keep good performance
for the very common small objects that are below 8 KiB in size,
buffers are set to 8 KiB, causing the reader to slurp most of the
file anyway.  However the data has to be copied at least once,
from the BufferedInputStream into the InflaterInputStream.

New unit tests are supplied to get nearly 100% code coverage on the
unpacked code paths, for both standard and pack style loose objects.
We tested a fair chunk of the code elsewhere, but these new tests
are better isolated to the specific branches in the code path.

Change-Id: I87b764ab1b84225e9b5619a2a55fd8eaa640e1fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-07-01 18:26:17 -07:00
Jeff Schumacher cb8e1e6014 Added a preliminary version of rename detection
JGit does not currently do rename detection during diffs. I added
a class that, given a TreeWalk to iterate over, can output a list
of DiffEntry's for that TreeWalk, taking into account renames. This
class only detects renames by SHA1's. More complex rename detection,
along the lines of what C Git does will be added later.

Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04
2010-07-01 17:33:53 -07:00
Shawn O. Pearce 2489088235 Permit AnyObjectTo to compareTo AnyObjectId
Assume that the argument of compareTo won't be mutated while we
are doing the compare, and support the wider AnyObjectId type so
MutableObjectId is suitable on either side of the compareTo call.

Change-Id: I2a63a496c0a7b04f0e5f27d588689c6d5e149d98
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 19:07:36 -07:00
Shawn O. Pearce d04b7972d8 Use copyTo during checkout of files to working tree
This way we can stream a large file through memory, rather than
loading the entire thing into a single contiguous byte array.

Change-Id: I3ada2856af2bf518f072edec242667a486fb0df1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:56:20 -07:00
Shawn O. Pearce a0fd06e5c2 Stream whole deflated objects in PackWriter
Instead of loading the entire object as a byte array and passing
that into the deflater, let the ObjectLoader copy the object onto
the DeflaterOutputStream.  This has the nice side effect of using
some sort of stride hack in the Sun implementation that may improve
compression performance.

Change-Id: I3f3d681b06af0da93ab96c75468e00e183ff32fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:50:50 -07:00
Shawn O. Pearce ad0383734e Lazily allocate Deflater in PackWriter
Only allocate the Deflater if we can't reuse everything, but also
make sure we release it when we release the PackWriter's resources.

Change-Id: I16a32b94647af0778658eda87acbafc9a25b314a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:40:54 -07:00
Shawn O. Pearce 23e7f6376a Add openStream to ObjectLoader for big blobs
Blobs that are too large to read as a single byte array should be
accessed through an InputStream based interface instead, allowing
the application to walk through the data stream incrementally.

Define the basic interface to support streaming contents, but don't
implement it yet for the file based backend.

Change-Id: If9e4442e9ef4ed52c3e0f1af9398199a73145516
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 18:36:10 -07:00
Jeff Schumacher 7b0b4110ed Refactored code out of FileHeader to facilitate rename detection
Refactored a superclass out of FileHeader called DiffEntry that holds
the more general data from FileHeader that is useful in rename
detection (old/new Ids, modes, names, as well as changeType and
score). FileHeader is now a DiffEntry that adds Hunks, parsing
abilities, etc.

Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4
2010-06-30 17:53:27 -07:00
Dmitry Neverov 44854741c5 Fix missing flush in StreamCopyThread
It is possible that StreamCopyThread will not flush everything
from it's src to it's dst.  In most cases StreamCopyThread works
like this:

  in loop:
    n = src.read(buf);
    dst.write(buf, 0, n);

and when we want to flush, we interrupt() StreamCopyThread and it
flushes everything it wrote to dst.

The problem is that our interrupt() could interrupt reading. In this
case we will flush everything we wrote to dst, but not everything
we wrote to src.

Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 10:48:44 -07:00
Shawn O. Pearce a1d5f5b6b5 Move DirCache factory methods to Repository
Instead of creating the DirCache from a static factory method, use
an instance method on Repository, permitting the implementation to
override the method with a completely different type of DirCache
reading and writing.  This would better support a repository in the
cloud strategy, or even just an in-memory unit test environment.

Change-Id: I6399894b12d6480c4b3ac84d10775dfd1b8d13e7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 10:39:00 -07:00
Shawn O. Pearce cb9d8285ba Create NoWorkTreeException for bare repositories
Using a custom exception type makes it easire for an application
developer to understand why an exception was thrown out of a method
we declare.  To remain compatiable with existing callers, we still
extend off IllegalStateException.

Change-Id: Ideeef2399b11ca460a2dbb3cd80eb76aa0a025ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-30 09:48:36 -07:00
Jeff Schumacher 9f2249bd26 Added check for binary files while diffing
Added a check in Diff to ensure that files that are most likely
not text are not line-by-line diffed. Files are determined to be
binary by checking the first 8000 bytes for a null character. This
is a similar heuristic to what C Git uses.

Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26
2010-06-29 17:23:00 -07:00
Shawn O. Pearce 515deaf7e5 Ensure RevWalk is released when done
Update a number of calling sites of RevWalk to ensure the walker's
internal ObjectReader is released after the walk is no longer used.
Because the ObjectReader is likely to hold onto a native resource
like an Inflater, we don't want to leak them outside of their
useful scope.

Where possible we also try to share ObjectReaders across several
walk pools, or between a walker and a PackWriter.  This permits
the ObjectReader to actually do some caching if it felt inclined
to do so.

Not everything was updated, we'll probably need to come back and
update even more call sites, but these are some of the biggest
offenders.  Test cases in particular aren't updated.  My plan is to
move most storage-agnostic tests onto some purely in-memory storage
solution that doesn't do compression.

Change-Id: I04087ec79faeea208b19848939898ad7172b6672
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-29 15:12:53 -07:00
Shawn O. Pearce 4913ad57fc Use a single ObjectReader in IpLogGenerator
This way we can be ensured its released when the generator
is done running.

Change-Id: I6be48d26b9bd5ac176c1316a9aabdf3a897e1696
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-29 09:32:53 -07:00
Shawn O. Pearce 94228bde22 Use ObjectReader in DirCacheBuilder.addTree
Rather than building a custom reader, have the caller supply us one.

Change-Id: Ief2b5a6b1b75f05c8a6bc732a60d4d1041dd8254
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-29 09:30:29 -07:00
Matthias Sohn 730b708dae Merge "Update build to use Tycho 0.9.0" 2010-06-29 09:02:46 -04:00
Shawn O. Pearce d6e975f71b Use one ObjectReader for WalkFetchConnection
Instead of creating new ObjectReader for each walker, use one for
the entire connection and delegate reads through it.

Change-Id: I7f0a2ec8c9fe60b095a7be77dc423a2ff8b443a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 18:47:33 -07:00
Shawn O. Pearce 121d009b9b Use ObjectReader in RevWalk, TreeWalk
We don't actually need a Repository object here, just an ObjectReader
that can load content for us.  So change the API to depend on that.

However, this breaks the asCommit and asTag legacy translation methods
on RevCommit and RevTag, so we still have to keep the Repository
inside of RevWalk for those two types.  Hopefully we can drop those in
the future, and then drop the Repository off the RevWalk.

Change-Id: Iba983e48b663790061c43ae9ffbb77dfe6f4818e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 18:47:29 -07:00
Shawn O. Pearce 06f635a4bc Fix minor formatting issue in UploadPack
Change-Id: Ifc0c3a94dc0e16126af6cf17e9c4a7cb96e8ffab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 18:47:28 -07:00
Shawn Pearce 3fd4918852 Merge changes Ie56301aa,Ic2f79e85
* changes:
  Added further support for whitespace ignoring during diff
  Added support for whitespace ignoring
2010-06-28 20:27:04 -04:00
Jeff Schumacher 9869ef2592 Added further support for whitespace ignoring during diff
Added code to support ignoring leading, trailing, and changed
whitespace when performing a diff operation. I also added command
line options to Diff to enable the various whitespace ignoring
methods. These match the flags for git diff.

Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8
2010-06-28 17:25:19 -07:00
Matthias Sohn a2325f6885 Update build to use Tycho 0.9.0
Change-Id: I589267e6cfd0514383c2a3da51c9b7a659f77844
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-06-29 00:08:36 +02:00
Shawn O. Pearce 242b4026d9 Remove volatile keyword from RepositoryEvent
We don't need this field to be volatile.  Events are delivered by
the same thread that created the RepositoryEvent object, and thus
any cross-thread operations would need to be handled by some other
type of synchronization in the listener, and that would protect
both the repository field and any other per-event data.

Change-Id: Iefe345959e1a2d4669709dbf82962bcc1b8913e3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 12:46:18 -07:00
Shawn O. Pearce aa4b06e087 Rename openObject, hasObject to just open, has
Similar to what we did on Repository, the openObject method
already implied we wanted to open an object, given its main
argument was of type AnyObjectId.  Simplify the method name
to just the action, has or open.

Change-Id: If055e5e0d8de0e2424c18a773f6d2bc2f66054f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:57:41 -07:00
Shawn O. Pearce acb7be2c5a Refactor Repository.openObject to be Repository.open
We drop the "Object" suffix, because its pretty clear here that
we want to open an object, given that we pass in AnyObjectId as
the main parameter.  We also fix the calling convention to throw
a MissingObjectException or IncorrectObjectTypeException, so that
callers don't have to do this error checking themselves.

Change-Id: I72c43353cea8372278b032f5086d52082c1eee39
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:54:58 -07:00
Shawn O. Pearce 6b62e53b60 Move PackWriter progress monitors onto the operations
Rather than taking the ProgressMonitor objects in our constructor and
carrying them around as instance fields, take them as arguments to the
actual time consuming operations we need to run.

Change-Id: I2b230d07e277de029b1061c807e67de5428cc1c4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:28 -07:00
Shawn O. Pearce f288c27e46 Pass the PackOutputStream down the call stack
Rather than storing this in an instance member, pass it down the
calling stack.  Its cleaner, we don't have to poke the stream as
a temporary field, and then unset it.

Change-Id: I0fd323371bc12edb10f0493bf11885d7057aeb13
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:28 -07:00
Shawn O. Pearce 1ad2feb7b3 Remove Repository.openObject(ObjectReader, AnyObjectId)
Going through ObjectReader.openObject(AnyObjectId) is faster, but
also produces cleaner application level code.  The error checking
is done inside of the openObject method, which means it can be
removed from the application code.

Change-Id: Ia927b448d128005e1640362281585023582b1a3a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:28 -07:00
Shawn O. Pearce 9ba7bd4df4 Throw IncorrectObjectTypeException on bad type hints
If the type hint isn't OBJ_ANY and it doesn't match the actual type
observed from the object store, define the reader to throw back an
IncorrectObjectTypeException.  This way the caller doesn't have to
perform this check itself before it evaluates the object data, and
we can simplify quite a few call sites.

Change-Id: I9f0dfa033857f439c94245361fcae515bc0a6533
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 11:47:25 -07:00
Jeff Schumacher 543235b805 Added support for whitespace ignoring
JGit did not have support for skipping whitespace when comparing
lines in RawText objects. I added a subclass of RawText that skips
whitespace in its equals and hashCode methods. I used a subclass
rather than adding functionality into RawText so that performance
would not be impacted by extra logic.

This class only supports ignoring all whitespace. Others will follow
that allow other forms of whitespace ignoring.

Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a
2010-06-28 10:59:10 -07:00
Shawn O. Pearce a45728d7a4 Ensure ObjectReader used by PackWriter is released
The ObjectReader API demands that we release the reader when we are
done with it.  PackWriter contains a reader, which it uses for the
entire packing session.  Expose the release of the reader through
a release method on the writer.

This still doesn't address the RevWalk and TreeWalk users, who
don't correctly release their reader.  But its a small step in the
right direction.

Change-Id: I5cb0b5c1b432434a799fceb21b86479e09b84a0a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 10:25:11 -07:00
Shawn O. Pearce b5aa52e98a Ensure PackWriter releases its ObjectReader
Change-Id: I3f8af29066cc5a2132dc4a75c9654d97800f2f18
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 10:16:27 -07:00
Shawn O. Pearce e01abbd543 Release ObjectReader before the cached ObjectDatabase
I don't want to play games with the order of release here, its
probably safer to release the reader before the database, just
in case the one depends on the other.

Change-Id: I2394c7d2477eaf7a7e1556fc3393c59d3b31e764
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 09:47:20 -07:00
Shawn O. Pearce b40f02eb1a Release ObjectInserter in merge() not mergeImpl()
By doing the release at the higher level class, we can ensure
the release occurs if the inserter was allocated, even if the
implementation forgets to do this.  Since the higher level class
is what allocated it, it makes sense to have it also do the release.

Change-Id: Id617b2db864c3208ed68cba4eda80e51612359ad
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 09:35:55 -07:00
Shawn O. Pearce 5aae041a81 Commit: Use Repository.newObjectInserter
Everyone else does.  This must have been a spot I missed during
some sort of squash while developing the series.

Change-Id: I62eae50b618f47ee33ad7cf71fc05b724f603201
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-28 09:22:48 -07:00
Shawn O. Pearce ea21c111cb Move PackWriter over to storage.pack.PackWriter
Similar to what we did with the file code, move the pack writer
into its own package so the related classes and their package
private methods are hidden from the rest of the library.

Change-Id: Ic1b5c7c8c8d266e90c910d8d68dfc8e93586854f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:51:12 -07:00
Shawn O. Pearce 71aace52f7 Simplify ObjectLoaders coming from PackFile
We no longer need an ObjectLoader to be lazy and try to delay
the materialization of the object content.  That was done only
to support PackWriter searching for a good reuse candidate.

Instead, simplify the code base by doing the materialization
immediately when the loader asks for it, because any caller
asking for the loader is going to need the content.

Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce 68518ca3aa Remove getRawSize, getRawType from ObjectLoader
These were only used by PackWriter to help it filter object
representations.  Their only user disappeared when we rewrote the
object selection code path to use the new representation type.

Change-Id: I9ed676bfe4f87fcf94aa21e53bda43115912e145
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce 86547022f0 Tighten up local packed object representation during packing
Rather than making a loader, and then using that to fill the object
representation, parse the header and set up our data directly.
This saves some time, as we don't waste cycles on information we
won't use right now.

The weight computed for a representation is now its actual stored
size in the pack file, rather than its inflated size.  This accounts
for changes made when the compression level is modified on the
repository.  It is however more costly to determine the weight of
the object, since we have to find its length in the pack.  To try and
recover that cost we now cache the length as part of our ObjectToPack
record, so it doesn't have to be found during the output phase.

A LocalObjectToPack now costs us (assuming 32 bit pointers):

                   (32 bit)     (64 bit)
  vm header:         8 bytes      8 bytes
  ObjectId:         20 bytes     20 bytes
  PackedObjectInfo: 12 bytes     12 bytes
  ObjectToPack:      8 bytes     12 bytes
  LocalOTP:         20 bytes     24 bytes
                 -----------    ---------
                    68 bytes     74 bytes

Change-Id: I923d2736186eb2ac8ab498d3eb137e17930fcb50
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:38 -07:00
Shawn O. Pearce ad5238dc67 Move FileRepository to storage.file.FileRepository
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.

Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.

Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 18:50:34 -07:00
Shawn O. Pearce 3a7aec03e0 Implement zero-copy for single window objects
Objects that fall completely within a single window can be worked
with in a zero-copy fashion, provided that the window is backed by
a normal byte[] and not by a ByteBuffer.

This works for a surprising number of objects.  The default window
size is 8 KiB, but most deltas are quite a bit smaller than that.
Objects smaller than 1/2 of the window size have a very good chance
of falling completely within a window's array, which means we can
work with them without copying their data around.

Larger objects, or objects which are unlucky enough to span over a
window boundary, get copied through the temporary buffer.  We pay
a tiny penalty to realize we can't use the zero-copy code path,
but its easier than trying to keep track of two adjacent windows.

With this change (as well as everything preceeding it), packing
is actually a bit faster.  Some crude benchmarks based on cloning
linux-2.6.git (~324 MiB, 1,624,785 objects) over localhost using
C git client and JGit daemon shows we get better throughput, and
slightly better times:

  Total Time    | Throughput
  (old)  (now)  | (old)          (now)
  --------------+---------------------------
  2m45s  2m37s  | 12.49 MiB/s    21.17 MiB/s
  2m42s  2m36s  | 16.29 MiB/s    22.63 MiB/s
  2m37s  2m31s  | 16.07 MiB/s    21.92 MiB/s

Change-Id: I48b2c8d37f08d7bf5e76c5a8020cde4a16ae3396
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 16:13:22 -07:00
Shawn O. Pearce ece88b99eb Redo PackWriter object reuse output
Output of selected reuses is refactored to use a new ObjectReuseAsIs
interface that extends the ObjectReader.  This interface allows the
reader to control how it performs the reuse into the output stream,
but also allows it to throw an exception to request the writer to
find a different candidate representation.

The PackFile reuse code was overhauled, cleaning up the APIs so they
aren't exposed in the object loader, but instead are now a single
method on the PackFile itself.  The reuse algorithm was changed to do
a data verification pass, followed by the copy pass to the output.
This permits us to work around a corrupt object in a pack file by
seeking another copy of that object when this one is bad.

The reuse code was also optimized for the common case, where the
in-pack representation is under 16 KiB.  In these smaller cases
data is sent to the pack writer more directly, avoiding some copying.

Change-Id: I6350c2b444118305e8446ce1dfd049259832bcca
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 14:46:05 -07:00
Shawn O. Pearce bf4ffff07f Redo PackWriter object reuse selection
The new selection implementation uses a public API on the
ObjectReader, allowing the storage library to enumerate its
candidates and select the best one for this packer without
needing to build a temporary list of the candidates first.

Change-Id: Ie01496434f7d3581d6d3bbb9e33c8f9fa649b6cd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-06-26 14:16:06 -07:00