* stable-5.7:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: Ied786ab5e3c0dd05f701705fce2d4ad85502c4d6
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.6:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: I454622dae6eb95aedbd858e3b12da72282d36673
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.5:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: I2622f1d384a88a556ba9d88f0d08a37af69e530c
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.4:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: Ia1665dd92ccc3811a6116f41421a05aca10fc6eb
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.3:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: I52a57a17abe60e30e3d7615f8cb4d0c5e6aebd9b
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.2:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: Id37f47a5ef2e3c8329eca30c171941f7e5606a85
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
* stable-5.1:
Revert "RefDirectory.scanRef: Re-use file existence check done in snapshot creation"
Change-Id: I625667c2718ab31ae7df907c3dd6024a933913b8
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This reverts commit f829f5f838.
Using MISSING_FILEKEY as indicator for a non-existing file doesn't work
on Windows.
Bug: 577954
Change-Id: I92102a3d259f6cc0f367096a3213cfa794466817
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Return immediately in scanRef if the loose ref was identified as
missing when a snapshot was attempted for the ref. This will help
performance of scanRef when the ref is packed but has a corresponding
empty dir in 'refs/'.
For example, consider the case where we create 50k sharded refs in
a new namespace called 'new-refs' using an atomic 'BatchRefUpdate'.
The refs are named like 'refs/new-refs/01/1/1', 'refs/new-refs/01/1/2',
'refs/new-refs/01/1/3' and so on. After the refs are created, the
'new-refs' namespace looks like below:
$ find refs/new-refs -type f | wc -l
0
$ find refs/new-refs -type d | wc -l
5101
At this point, an 'exactRef' call on each of the 50k refs without
this change takes ~2.5s, where as with this change it takes ~1.5s.
Change-Id: I926bc41b9ae89a1a792b1b5ec9a17b05271c906b
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Doing a getFileStoreAttributes call even when the file doesn't
exist is unnecessary. This call is particularly slow on some
filesystems. Instead, do it only when the file exists and load
the appropriate cache.
This update can help speed up RefDirectory.exactRef when the ref
is packed, but has a corresponding empty dir for it under 'refs/'.
This scenario can happen when an atomic 'BatchRefUpdate' creates
new sharded refs.
For example, consider the case where we create 50k sharded refs in
a new namespace called 'new-refs' using an atomic 'BatchRefUpdate'.
The refs are named like 'refs/new-refs/01/1/1', 'refs/new-refs/01/1/2',
'refs/new-refs/01/1/3' and so on. After the refs are created, the
'new-refs' namespace looks like below:
$ find refs/new-refs -type f | wc -l
0
$ find refs/new-refs -type d | wc -l
5101
At this point, an 'exactRef' call on each of the 50k refs without
this change takes ~30s, where as with this change it takes ~2.5s.
Change-Id: I4a5d4c6a652dbeed1f4bc3b4f2b2f1416f7ca0e7
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
* stable-5.7:
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Change-Id: I88c47ff57f4829baec5b19aad3d8d6bd21f31a86
* stable-5.6:
reftable: drop code for truncated reads
reftable: pass on invalid object ID in conversion
Update eclipse-jarsigner-plugin to 1.3.2
Change-Id: I1c18f5f435f4a4a86e0548a310dbfc74191e1ed5
The reftable format is a block based format, but allows for variably
sized blocks. This obviously happens for reflog blocks (which are zlib
compressed), but is also accepted for index blocks: In the spec, this
is motivated as
To achieve constant O(1) disk seeks for lookups the index must be
a single level, which is permitted to exceed the file's
configured block size, but not the format's max block size of
15.99 MiB.
Hence, when parsing a block, one cannot be sure of its exact size:
after reading a default-size block (eg. 4kb), the block header may
state that the block is in fact larger.
Before, the code would mark the block as `truncated`, noting
// Its OK during sequential scan for an index block to have been
// partially read and be truncated in-memory. This happens when
// the index block is larger than the file's blockSize. Caller
// will break out of its scan loop once it sees the blockType.
This looks like either
* a remnant of never-implemented functionality. There is no reason to
ever sequentially scan an index block.
* alluding to sequential scan of the data blocks before the index
blocks (eg. scanning refs, which ends when we find the first ref index
block, and we can then ignore the index block).
This comment is followed by code that populates the
restartTbl/restartCnt fields relative to the (possibly truncated)
buffer. If the buffer is truncated, this essentially reads garbage,
leading to OOB array access when using the index block.
Fix this by dropping the truncated logic and issuing a second read if
the first read was short.
Add a test.
We have never observed this failure scenario at Google. We use 64kb
blocksize, which requires us to need fewer index entries. The reftable
spec mentions an Android repo of size 36M. With 64kb blocks, that's
just 562 index entries. Even with historical growth, we are long from
requiring an index whose size exceeds a single block.
When adding the analogous test for seeking refs, there was no failure.
This points to another possibility which is that the code tries to
avoid writing large index blocks for refs.
I did not investigate further which one it is.
Fixes https://bugs.eclipse.org/bugs/show_bug.cgi?id=576250
Bug: 576250
Change-Id: I41ec21fac9e526ef57b3d6fb57b988bd353ee338
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Before, while trying to determine if an object ID was a tag or not,
the reftable conversion would yield an exception.
Change-Id: I3688a0ffa9e774ba27f320e3840ff8cada21ecf0
Rather than getting all ref names and prefixes and saving them
in memory to perform the check for conflicting names, rely on
RefDirectory.isNameConflicting as it is no longer an expensive
call after it was optimized in Ie994fc.
The old optimization to save ref names and prefixes in memory
was targeted towards making clones faster. With this change,
the clone performance is unaffected when tests were done with
repos containing many(~500k) refs.
Here are few recorded elapsed times for creating 10 branches
using BatchRefUpdate on NFS based repositories with varying
loose refs count. As seen here, this change helps improve the
BatchRefUpdate performance from O(n^2) to O(1).
loose_refs_count with_change without_change
50 241 ms 310 ms
300 263 ms 1502 ms
1k 181 ms 4241 ms
2k 204 ms 6440 ms
9k 158 ms 25930 ms
20k 154 ms 60443 ms
50k 171 ms 135199 ms
110k 157 ms 329450 ms
160k 209 ms 396328 ms
This update improves the Gerrit notedb migration performance
as it uses BatchRefUpdate to write change meta refs similar to
the test performed above.
Change-Id: I853ac6c7feb4b39c3156c01876b38cbd182accfe
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
Avoid having to scan over ALL loose refs to determine if the
name is nested within or is a container of an existing reference.
This can get really expensive if there are too many loose refs.
Instead use exactRef and getRefsByPrefix which scan based on a
prefix.
With a simple shell script(like below) using jgit client to create
1k refs in a new repository on NFS, this change brings down the time
from 12mins to 7mins.
for ref in $(seq 1 1000); do
jgit branch "$ref"
done
Here are few recorded elapsed times to create a new branch on NFS
based repositories with varying loose refs count. As we see here,
this change improves the name conflicting check from O(n^2) to O(1).
loose_refs_count with_change without_change
50 44 ms 164 ms
300 45 ms 1193 ms
1k 38 ms 2610 ms
2k 44 ms 6003 ms
9k 46 ms 27860 ms
20k 45 ms 48591 ms
50k 51 ms 135471 ms
110k 43 ms 294252 ms
160k 52 ms 430976 ms
Change-Id: Ie994fc184b8f82811bfb37b111eb9733dbe3e6e0
Signed-off-by: Kaushik Lingarkar <quic_kaushikl@quicinc.com>
* stable-5.7:
Remove texts which were added by mistake in 00386272
Fix formatting which was broken in 00386272
Change-Id: I7ed3f47cb46e6c1bf483702c8925a24e88658e47
* stable-5.6:
Remove texts which were added by mistake in 00386272
Fix formatting which was broken in 00386272
Change-Id: I45d444b360485564744bf3dfad2c2f5a5e7fcdf6
Don't create the stream eagerly in lock(); that may cause JGit to
exceed OS or JVM limits on open file descriptors if many locks need
to be created, for instance when creating many refs. Instead create
the output stream only when one really needs to write something.
Bug: 573328
Change-Id: If9441ed40494d46f594a896d34a5c4f56f91ebf4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
In a distributed setting, one can have multiple datacenters use
reftables for serving, while the ground truth for the Ref database is
administered centrally. In this setting, replication delays combined
with compaction can cause update-index ranges to overlap.
Such a setting is used at Google, and the JGit code already handles
this correctly (modulo a bugfix that applied in change I8f8215b99a).
Remove the restriction that was applied at FileReftableDatabase.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I6f9ed0fbd7fbc5220083ab808b22a909215f13a9