Commit Graph

905 Commits

Author SHA1 Message Date
Chris Aniszczyk f638679797 Merge "Remove unnecessary note fanout when removing notes" 2010-11-13 12:38:17 -05:00
Chris Aniszczyk 1b3abe75f8 Merge "Split note leaf buckets at 256 elements" 2010-11-13 12:37:30 -05:00
Chris Aniszczyk 9f2bde653f Merge "Add internal API for note iteration" 2010-11-13 12:32:59 -05:00
Chris Aniszczyk e9002a45ce Merge "Allow writing a NoteMap back to the repository" 2010-11-13 12:31:58 -05:00
Chris Aniszczyk 56a802104a Merge "Add in-memory updating support to NoteMap" 2010-11-13 12:31:02 -05:00
Chris Aniszczyk 43156bf045 Merge "Remember non-note tree entries when reading" 2010-11-13 12:29:31 -05:00
Shawn O. Pearce 51bf8ea2a4 Merge branch 'rename-detection'
* rename-detection:
  RenameDetector: Only scan deletes if adds exist
  SimilarityRenameDetector: Initialize sizes to 0
  SimilarityRenameDetector: Avoid allocating source index
  SimilarityRenameDetector: Only attempt to index large files once
  SimilarityIndex: Don't overflow internal counter fields
  SimilarityIndex: Accept files larger than 8 MB
  SimilarityIndex: Correct comment explaining the logic
2010-11-12 16:15:43 -08:00
Shawn O. Pearce c35f98b226 Merge branch 'fs-fsync'
* fs-fsync:
  Remove unnecessary flush calls from LockFile
  Remove unnecessary region locking from LockFile
  Support core.fsyncRefFiles option
  Support core.fsyncObjectFiles option
  Simplify LockFile write(ObjectId) case
2010-11-12 16:12:27 -08:00
Christian Halstrick 484807e82b Added one-tree constructor to DirCacheCheckout
When DirCacheCheckout should be used to checkout only one
tree (reset --hard, clone) then we had to use the standard
constructor and specify null as value for head. This change
adds explicit constructors not taking HEAD and documents
that.

Bug: 330021
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
2010-11-13 00:45:50 +01:00
Shawn O. Pearce e7e9a47b52 Remove unnecessary note fanout when removing notes
Fanout level notes trees are combined back together into a flat leaf
level tree if during a removal of a subtree there are less than 3/4 of
the fanout subtrees still existing, and the size of the combined leaf
is under the 256 split limit noted above.

This rule is used because deletes are less common than insertions, and
SHA-1's relatively uniform distribution suggests that with only 192
subtrees existing in the fanout, there should be approximately 192
names in the combined replacement leaf tree.

Change-Id: Ia9d145ffd5454982509fc40906bc4dbbf2b13952
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 14:01:28 -08:00
Shawn O. Pearce 2b0df15f7f Split note leaf buckets at 256 elements
Leaf level notes trees are split into a new fan-out tree if an
insertion occurs and the tree already contains >= 256 notes in it.

The splitting may occur multiple times if all of the notes have the
same prefix; in the worst case this produces a tree path such as
"00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/be" if all
of the notes begin with zeros.

Change-Id: I2d7d98f35108def9ec49936ddbdc34b13822a3c7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 14:01:28 -08:00
Shawn O. Pearce 3728918d72 Add internal API for note iteration
Some algorithms need to be able to iterate through all notes within a
particular bucket, such as when splitting or combining a bucket.
Exposing an Iterator<Note> makes this traversal possible.

For a LeafBucket the iteration is simple, its over the sorted array of
elements.  For FanoutBucket its a bit more complex as the iteration
needs to union the iterators of each fanout bucket, lazily loading any
buckets that aren't already in-memory.

Change-Id: I3d5279b11984f44dcf0ddb14a82a4b4e51d4632d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 14:01:28 -08:00
Shawn O. Pearce 3e2b9b691e Allow writing a NoteMap back to the repository
This is necessary to allow applications to wrap the note tree in
a commit and update the note branch with the new state.

Change-Id: Idbd7ead4a1b16ae2b64a30a4a01a29cfed548cdf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 14:01:28 -08:00
Shawn O. Pearce faa0747cce Add in-memory updating support to NoteMap
NoteMap now supports editing in-memory, allowing applications to
modify the NoteMap once it has been loaded from the branch.  The
ability to write the branch back to tree objects is not yet done,
so the edits are strictly transient.

Change-Id: I63448954abfca2a8e3e95369cd84c0d1176cdb79
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 14:01:24 -08:00
Shawn O. Pearce 2f6e79307d Remove unnecessary flush calls from LockFile
Change-Id: I144af9db4714acabd796880be73bd50d84b92efe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 13:38:13 -08:00
Shawn O. Pearce ed5fe8af9a Remove unnecessary region locking from LockFile
The lock file protocol relies on the atomic creation of a standardized
name in the parent directory of the file being updated.  Since the
creation is atomic, at most one thread in any process can succeed on
this creation, and all others will fail.  While the lock file exists,
that file is private to the thread that is writing it, and no others
will attempt to read or modify the file.

Consequently the use of the region level locks around the file are
unnecessary, and may actually reduce performance when using NFS, SMB,
or some other sort of remote filesystem that supports locking.

Change-Id: Ice312b6fb4fdf9d36c734c3624c6d0537903913b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 13:38:06 -08:00
Shawn O. Pearce e0e7fe531d Support core.fsyncRefFiles option
If core.fsyncRefFiles is set to true, fsync is used whenever a
reference file is updated, ensuring the file contents are also
written to disk.  This can help to prevent empty ref files after
a system crash when using a filesystem such as HFS+ where data
writes may be delayed.

Change-Id: Ie508a974da50f63b0409c38afe68772322dc19f1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 13:38:04 -08:00
Shawn O. Pearce 24fccadeda Support core.fsyncObjectFiles option
Some repositories may be on really unstable filesystems, but still
want to have good reliability when objects are written to disk.  If
core.fsyncObjectFiles is set to true, request the JVM to ensure the
data is written before returning success to the caller of insert.

The option defaults to false because it should be useless on any
filesystem that orders writes and metadata, such as ext3 mounted with
data=ordered (or data=journal).  But it may be useful on some systems
(especially HFS+) where file content may flush to the disk
independently of filesystem structure changes.

Because FileChannel.force(boolean) only claims to ensure data is
written if it was written using the write(ByteBuffer) method of
FileChannel, redirect all writes when using fsyncObjectFiles to go
through the FileChannel interface instead of through the older style
OutputStream interface.  This may not be necessary on all JVMs, but
its more portable to follow the definition than the common behavior.

Change-Id: I57f6b6bb7e403c07fbae989dbf3758eaf5edbc78
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 13:37:27 -08:00
Shawn O. Pearce bc9bca064d RenameDetector: Only scan deletes if adds exist
If there are only deletes, don't need perform rename or copy
detection.  There are no adds (aka destinations) for the deletes
to match against.

Change-Id: I00fb90c509fa26a053de561dd8506cc1e0f5799a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:57:02 -08:00
Shawn O. Pearce 05653bda04 SimilarityRenameDetector: Initialize sizes to 0
Setting the array elements to -1 is more expensive than relying on
the allocator to zero the array for us first.  Shifting the code to
always add 1 to the size (so an empty file is actually 1 byte long)
allows us to detect an unloaded size by comparing to 0, thus saving
the array fill calls.

Change-Id: Iad859e910655675b53ba70de8e6fceaef7cfcdd1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:57:02 -08:00
Shawn O. Pearce 68baa3097e SimilarityRenameDetector: Avoid allocating source index
If the only file added is really small, and all of the deleted
files are really big, none of the permutations will match up due
to the sizes being too far apart to fit the current rename score.

Avoid allocating the really big deleted SimilarityIndex by deferring
its construction until at least one add along that row has a
reasonable chance of matching it.

This avoids expending a lot of CPU time looking at big deleted
binary files when a small modified text file was broken due to a
high percentage of changed lines.

Change-Id: I11ae37edb80a7be1eef8cc01d79412017c2fc075
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:57:02 -08:00
Shawn O. Pearce 918e6e20f0 SimilarityRenameDetector: Only attempt to index large files once
If a file fails to index the first time the loop encounters it, the
file is likely to fail to index again on the next row.  Rather than
wasting a huge amount of CPU to index it again and fail, remember
which destination files failed to index and skip over them on each
subsequent row.

Because this condition is very unlikely, avoid allocating the BitSet
until its actually needed.  This keeps the memory usage unaffected
for the common case.

Change-Id: I93509b28b61a9bba8f681a7b4df4c6127bca2a09
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:57:02 -08:00
Shawn O. Pearce 0e307a6afd SimilarityIndex: Don't overflow internal counter fields
The counter portion of each pair is only 32 bits wide, but is part
of a larger 64 bit integer.  If the file size was larger than 4 GB
the counter could overflow and impact the key, changing the hash,
and later resulting in an incorrect similarity score.

Guard against this overflow condition by capping the count for each
record at 2^32-1.  If any record contains more than that many bytes
the table aborts hashing and throws TableFullException.

This permits the index to scan and work on files that exceed 4 GB
in size, but only if the file contains more than one unique block.
The index throws TableFullException on a 4 GB file containing all
zeros, but should succeed on a 6 GB file containing unique lines.

The index now uses a 64 bit accumulator during the common scoring
algorithm, possibly resulting in slower summations.  However this
index is already heavily dependent upon 64 bit integer operations
being efficient, so increasing from 32 bits to 64 bits allows us
to correctly handle 6 GB files.

Change-Id: I14e6dbc88d54ead19336a4c0c25eae18e73e6ec2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:57:02 -08:00
Shawn O. Pearce d63887127e SimilarityIndex: Accept files larger than 8 MB
Files bigger than 8 MB (2^23 bytes) tended to overflow the internal
hashtable, as the table was capped in size to 2^17 records.  If a
file contained 2^17 unique data blocks/lines, the table insertion
got stuck in an infinite loop as the able couldn't grow, and there
was no open slot for the new item.

Remove the artifical 2^17 table limit and instead allow the table
to grow to be as big as 2^30.  With a 64 byte block size, this
permits hashing inputs as large as 64 GB.

If the table reaches 2^30 (or cannot be allocated) hashing is
aborted.  RenameDetector no longer tries to break a modify file pair,
and it does not try to match the file for rename or copy detection.

Change-Id: Ibb4d756844f4667e181e24a34a468dc3655863ac
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:56:59 -08:00
Shawn O. Pearce f3b511568b SimilarityIndex: Correct comment explaining the logic
This comment was wrong, due to a copy-and-paste error.  Here the
code is looking at records of dst that do not exist in src, and
are skipping past them to find another match.

Change-Id: I07c1fba7dee093a1eeffcf7e0c7ec85446777ffb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-12 11:56:57 -08:00
Shawn Pearce e8315ce19d Merge "Fix null ref exception in DirCacheCheckout" 2010-11-12 11:29:32 -05:00
Robin Rosenberg 8c706ab464 Use capital L for long constants
Change-Id: Ib7b8c5f982dc72c68cf3d81e45a536c464837f7d
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
2010-11-12 10:28:13 +01:00
Shawn O. Pearce 5a2cbd4aa7 Remember non-note tree entries when reading
In order to safely edit a notes tree, NoteMap needs to retain any
non-note tree entries it read from the source tree and put them
back out into the modified tree when it commits a new version of
the note branch.

Remember any tree entries that didn't look like a note during
the parsing of the tree, so they can be put into a TreeFormatter
later when the tree writes to the repository.

Change-Id: Ia284af7e7866da35db35374c6c5869f00c857944
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-11 10:57:16 -08:00
Shawn O. Pearce b81b97fbdd Lazy load note subtrees from fanout levels
Instead of reading a note tree recursively up front when the NoteMap
is loaded, read only the root tree and load subtrees on demand when
they are accessed by the application.  This gives a lower latency
to read a note for the recent commits on a branch, as only the paths
that are needed get read.

Given a 2/38 style fanout, the tree will fully load when 256 objects
have been accessed by the application.  But unlike the prior version
of NoteMap, the NoteMap will load faster and answer lookups sooner,
as the loading time for all 256 levels is spread out across each of
the get() requests.

Given a 2/2/36 style fanout, the tree won't need to fully load until
about 65,536 objects are accessed.

To simplify the implementation we only support the flat layout (all
notes in the top level tree), or a 2/38, 2/2/36, 2/2/2/34, through
2/.../2 style fanout.  Unlike C Git we don't support reading the old
experimental 4/36 fanout.  This is sufficient because C Git won't
create the 4/36 style fanout when creating or updating a notes tree,
and there really aren't any in the wild today.

Change-Id: I6099b35916a8404762f31e9c11f632e43e0c1bfd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-11 10:23:38 -06:00
Shawn O. Pearce 936820988f Define NoteMap, a simple note tree reader
The NoteMap makes it easy to read a small notes tree as created by
the `git notes` command in C Git.  To make the initial implementation
simple a notes tree is read recursively into a map in memory.
This is reasonable if the application will need to access all notes,
or if there are less than 256 notes in the tree, but doesn't behave
well when the number of notes exceeds 256 and the application
doesn't need to access all of them.

We can later add support for lazily loading different subpaths,
thus fixing the large note tree problem described above.

Currently the implementation only supports reading.  Writing notes
is more complex because trees need to be expanded or collapsed at
the exact 256 entry cut-off in order to retain the same tree SHA-1
that C Git would use for the same content.  It also needs to retain
non-note tree entries such as ".gitignore" or ".gitattribute" files
that might randomly appear within a notes tree.  We can also add
writing support later.

Change-Id: I93704bd84ebf650d51de34da3f1577ef0f7a9144
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
2010-11-11 10:06:43 -06:00
Chris Aniszczyk 6043d4638c Merge "Add MutableObjectId setByte to modify a mutable id" 2010-11-11 10:52:37 -05:00
Chris Aniszczyk ea76489db1 Merge "Implement command line support for CredentialsProvider" 2010-11-11 10:28:56 -05:00
Chris Aniszczyk 573666403d Merge "Support CredentialsProvider for SSH connections" 2010-11-11 10:27:52 -05:00
Stefan Lay 33c419fdfe Merge "Define a default CredentialsProvider" 2010-11-11 09:36:34 -05:00
Stefan Lay dcac1fe4bf Merge "Enable providing credentials for HTTP authentication" 2010-11-11 09:35:43 -05:00
Shawn O. Pearce 17d9c68636 Implement command line support for CredentialsProvider
Instead of configuring the JSch session factory, configure a more
generic CredentialsProvider, which will work for other transport
types such as http, in addition to the existing ssh.

Change-Id: I22b13303c17e654ba6720edf4be2ef15fe29537a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-10 15:12:47 -08:00
Chris Aniszczyk 9e28cf2fa3 Merge "Add ObjectId getByte for random access" 2010-11-10 18:00:36 -05:00
Shawn O. Pearce d279bc83b0 Support CredentialsProvider for SSH connections
When setting up an SSH connection, use the caller supplied
CredentialsProvider, if one has been given to the Transport
or was defined as the default.

The CredentialsProvider is re-wrapped as a JSch UserInfo,
allowing the connection to use this for user interactive
prompts.  This give a unified API for authentication on
any transport type.

Change-Id: Id3b4cf5bfd27a23207cdfb188bae3b78e71e02c0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-10 15:00:13 -08:00
Shawn O. Pearce ce99b48384 Define a default CredentialsProvider
This permits applications to set their preferred credentials UI
implementation once, rather than needing to define it on every
single Transport instance they open.

Change-Id: I010550de1a6becab27f7aa5a9901df5a1c7e74bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-10 14:58:45 -08:00
Shawn O. Pearce 308e074f65 Enable providing credentials for HTTP authentication
This change is based on http://egit.eclipse.org/r/#change,1652
by David Green. The change adds the concept of a CredentialsProvider
which can be registered for git transports and which is
responsible to return credential-related data like passwords and
usernames. Whenenver the transports detects that an authentication
with certain credentials has to be done it will ask the
CredentialsProvider for this data. Foreseen implementations for
such a Provider may be a EGitCredentialsProvider (caching
credential data entered e.g. in the Clone-Wizzard) or a NetRcProvider
(gathering data out of ~/.netrc file).

Bug: 296201
Change-Id: Ibe13e546b45eed3e193c09ecb414bbec2971d362
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: David Green <dgreen99@gmail.com>
2010-11-10 14:58:44 -08:00
Chris Aniszczyk 453b620e62 Merge "Refactor tree entry formatting into a common class" 2010-11-10 17:53:35 -05:00
Lluis Sanchez 3b4dcb3c02 Fix null ref exception in DirCacheCheckout
Added missing null check for getDirCacheEntry(). This method may
return null for example if the curernt entry is a subtree.
2010-11-10 10:56:46 +01:00
Stefan Lay 20a5a34444 Fix WWW-Authenticate auth-scheme comparison
The auth-scheme token (like "Basic" or "Digest") is not specified in a
case sensitive way. RFC2617 (http://tools.ietf.org/html/rfc2617) specifies
in section 1.2 the use of a "case-insensitive token to identify the
authentication scheme". Jetty, for example, uses "basic" as token.

Change-Id: I635a94eb0a741abcb3e68195da6913753bdbd889
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
2010-11-10 09:42:51 +01:00
Shawn O. Pearce cfa3f365d6 Simplify LockFile write(ObjectId) case
The ObjectId (for a ref) can be easily reformatted into a temporary
byte[] and then passed off to write(byte[]), removing the duplicated
code that existed in both write methods.

Change-Id: I09740658e070d5f22682333a2e0d325fd1c4a6cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-09 19:13:13 -08:00
Shawn Pearce 17b1003ff2 Merge "Fix broken MergeCommandTest" 2010-11-09 18:58:34 -05:00
Matthias Sohn ab7d08ec96 Merge "Revert "[findBugs] Silence DM_STRING_CTOR on PacketLineIn"" 2010-11-09 18:18:41 -05:00
Matthias Sohn 2cba7b3522 Fix broken MergeCommandTest
Test was broken by commit b087bba3 changing formatting of merge
commit messages.

Change-Id: I98b1b936b9b6cbaa50fbc59d243a43e66a6ee9f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2010-11-10 00:11:12 +01:00
Shawn O. Pearce 6af7e4d91a Fix URIish parsing of absolute scp-style URIs
We stopped handling URIs such as "example.com:/some/p ath", because
this was confused with the Windows absolute path syntax of "c:/path".
Support absolute style scp URIs again, but only when the host name
is more than 2 characters long.

Change-Id: I9ab049bc9aad2d8d42a78c7ab34fa317a28efc1a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2010-11-09 14:36:01 -08:00
Shawn Pearce b087bba3bd Merge "Format merge commit messages like C Git" 2010-11-09 17:14:11 -05:00
Shawn O. Pearce 08a9682e32 Revert "[findBugs] Silence DM_STRING_CTOR on PacketLineIn"
This reverts commit 1e510ec20e.

Instead work around the warning by defining our constant by
constructing it through a StringBuilder.

Change-Id: If139509e769d649609c62eff359ebaea5dd286b2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Matthias Sohn <matthias.sohn@sap.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
2010-11-08 15:34:47 -08:00