Commit Graph

13 Commits

Author SHA1 Message Date
Matthias Sohn e55bad514b Document option "core.sha1Implementation" introduced in 59029aec
Bug: 580310
Change-Id: I10f3d6f6b5af7ab96683994c9cbd85e6c18a5084
2023-02-02 21:18:43 +01:00
Luca Milanesio ad977f1572 Allow the exclusions of refs prefixes from bitmap
When running a GC.repack() against a repository with over one
thousands of refs/heads and tens of millions of ObjectIds,
the calculation of all bitmaps associated with all the refs
would result in an unreasonable big file that would take up to
several hours to compute.

Test scenario: repo with 2500 heads / 10M obj Intel Xeon E5-2680 2.5GHz
Before this change: 20 mins
After this change and 2300 heads excluded: 10 mins (90s for bitmap)

Having such a large bitmap file is also slow in the runtime
processing and have negligible or even negative benefits, because
the time lost in reading and decompressing the bitmap in memory
would not be compensated by the time saved by using it.

It is key to preserve the bitmaps for those refs that are mostly
used in clone/fetch and give the ability to exlude some refs
prefixes that are known to be less frequently accessed, even
though they may actually be actively written.

Example: Gerrit sandbox branches may even be actively
used and selected automatically because its commits are very
recent, however, they may bloat the bitmap, making it ineffective.

A mono-repo with tens of thousands of developers may have
a relatively small number of active branches where the
CI/CD jobs are continuously fetching/cloning the code. However,
because Gerrit allows the use of sandbox branches, the
total number of refs/heads may be even tens to hundred
thousands.

Change-Id: I466dcde69fa008e7f7785735c977f6e150e3b644
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
2023-01-31 17:14:09 -05:00
Fabio Ponciroli 6976a30f44 searchForReuse might impact performance in large repositories
The search for reuse phase for *all* the objects scans *all*
the packfiles, looking for the best candidate to serve back to the
client.

This can lead to an expensive operation when the number of
packfiles and objects is high.

Add parameter "pack.searchForReuseTimeout" to limit the time spent
on this search.

Change-Id: I54f5cddb6796fdc93ad9585c2ab4b44854fa6c48
2021-06-25 17:57:59 +02:00
Thomas Wolf 33a055e63b Document http options supported by JGit
Change-Id: I0af4f9991fdb4f09de25f743d1e0dca67ceaa18b
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
2021-03-13 17:05:47 +01:00
Matthias Sohn 824a3c6964 Fix formatting of config option values
Change-Id: If9a4bb44c4b348cbb94127207566471105267a53
2020-10-26 11:26:42 -04:00
Matthias Sohn bd5942a206 Document options in core section supported by JGit
Change-Id: I25af04112cf219405718b5c3e8e103156fb30fa5
2020-10-26 10:54:12 -04:00
Matthias Sohn 567bf85479 Document gc and pack relevant options
Change-Id: Iab7262b25942fa8c062b979d394674635b70a284
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
2020-04-03 11:50:27 +09:00
Han-Wen Nienhuys c217d33ff8 Documentation/technical/reftable: improve repo layout
Previously, the list of tables was in .git/refs. This makes
repo detection fail in older clients, which is undesirable.

This is proposal was discussed and approved on the git@vger list at

  https://lore.kernel.org/git/CAFQ2z_PvKiz==GyS6J1H1uG0FRPL86JvDj+LjX1We4-yCSVQ+g@mail.gmail.com/

For backward compatibility, JGit could detect a file under .git/refs and
use it as a reftable list.

Change-Id: Ic0b974fa250cfa905463b811957e2a4fdd7bbc6b
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
2020-02-11 11:52:35 +01:00
Han-Wen Nienhuys 7c75a68b96 reftable: enforce ascending order in sortAndWriteRefs
MergedReftableTest#scanDuplicates tests whether we can write duplicate
keys in a merged reftable. Apparently, the first key appearing should
get precedence, and this works because the sort() algorithm on ordered
collections is stable.

This is potentially confusing behavior, because you can write data
into the table that cannot be retrieved (Merged table can only have
one entry per key), and the APIs such as exactRef() only return a
single value.

Make this consistent with behavior introduced in I04f55c481 "reftable:
enforce ordering for ref and log writes" by considering a duplicate key
in sortAndWriteRefs as a fatal runtime error.

Change-Id: I1eedd18f028180069f78c5c467169dcfe1521157
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
2019-10-30 18:00:24 +01:00
Han-Wen Nienhuys cf11a03bc2 Documentation/technical/reftable: change suggested file names
By using ${min_update}-${max_update} as file name template, we
guarantee that each file has a unique name.
This allows data from open files to be cached across reloads of the
stack.

This is in anticipation of Change I1837f268e 
("file: implement FileReftableDatabase"), which is the first
implementation of reftable on a filesystem.

Change-Id: I7ef0610eb60c494165382d0c372afcf41f074393
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
2019-10-30 07:21:46 -04:00
Han-Wen Nienhuys c517725b8c Documentation/technical/reftable: document rename in reflog.
Change-Id: I0fe7d28a772b1ee9eefd9a38bff5e08a8559988f
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
2019-08-21 19:16:51 +02:00
Shawn Pearce 44a75d9ea8 reftable: explicitly store update_index per ref
Add an update_index to every reference in a reftable, storing the
exact transaction that last modified the reference.  This is necessary
to fix some merge race conditions.

Consider updates at T1, T3 are present in two reftables.  Compacting
these will create a table with range [T1,T3].  If T2 arrives during
or after the compaction its impossible for readers to know how to
merge the [T1,T3] table with the T2 table.

With an explicit update_index per reference, MergedReftable is able to
individually sort each reference, merging individual entries at T3
from [T1,T3] ahead of identically named entries appearing in T2.

Change-Id: Ie4065d4176a5a0207dcab9696ae05d086e042140
2017-08-21 15:39:08 -07:00
Shawn Pearce b9e818b556 reftable: file format documentation
Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

Change-Id: I29d0ff1eee475845660ac9173413e1407adcfbf2
2017-08-17 15:06:50 -07:00