motiejus/jgit - jgit - gitea: Gitea Service

Author	SHA1	Message	Date
Martin Fick	e7a09e316d	Introduce core.packedIndexGitUseStrongRefs config key Introduce a core.packedIndexGitUseStrongRefs configuration key, which defaults to true so that the current behavior does not change. However, setting it to false allows soft references to be used for Pack indices instead of strong references so that they can be garbage collected when there is memory pressure. Pack objects can be large when associated with pack files with large object counts, and this memory is not really accounted for or tracked by the WindowCache and it can be very substantial at times, especially with many large object count projects. A particularly problematic use case is Gerrit's ls-projects command which loads very little data in the WindowCache via ByteWindows, but ends up loading and holding many entire indices in memory, sometimes even after the ByteWindows for their Pack objects have already been garbage collected since they won't get cleared until after a new ByteWindow is loaded. By using SoftReferences, single use indices can get cleared when there is memory pressure and OOMs can be easily avoided, drastically reducing the amount of memory required to perform an ls-projects on large sites with many projects and large object counts. On one of our test sites, an ls-projects command with strong index references requires more than 66GB of heap to complete successfully, with soft index references it requires less than 23GB. Change-Id: I3cb3df52f4ce1b8c554d378807218f199077d80b Signed-off-by: Martin Fick <quic_mfick@quicinc.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	2023-08-26 16:16:43 +02:00
Jonathan Tan	f8f5979aa2	Merge "DfsGarbageCollector: provide commit graph stats"	2023-08-21 13:07:51 -04:00
Ivan Frade	9919a9faaf	DfsReader: Make PackLoadListener interface visible to subclasses A subclass cannot implement a listener with the default access. Make the interface protected. Not public because so far only subclasses are interested in this interface. We can widen the visibility later if needed. Change-Id: I54e5c0ef1312dfe2fa660bc8fb54e2be35c0f6df	2023-08-18 11:43:10 -07:00
Jonathan Tan	551ca93cc6	DfsGarbageCollector: provide commit graph stats Provide commit graph stats in the same way that we provide reftable stats. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Change-Id: Ib80c892a26f9b552bc90f3cbe7da83b02ffebdfd	2023-08-17 15:41:02 -07:00
Ivan Frade	6f73336939	DfsGarbageCollector: put only GC commits into the commit graph GC puts all commits reachable from heads and tags into the GC pack, and commits reachable only from other refs (e.g. refs/changes) into GC_REST. The commit-graph contains all commits in GC and GC_REST. This produces commit graphs that are too big in some repos, defeating the purpose of loading the index. Limit the commit graph to commits reachable from heads and tags (i.e. commits in the GC pack). Change-Id: I4962faea5a726d2ea3e548af0aeae370a6cc8588	2023-08-16 13:31:55 -07:00
Ivan Frade	b4b8f05eea	DfsReader: Expose when indices are loaded We want to measure the data used to serve a request. As a first step, we want to know how many indices are accessed during the request and their sizes. Expose an interface in DfsReader to announce when an index is loaded into the reader, i.e. when its reference is set. The interface is more flexible to implementors (what/how to collect) than the existing DfsReaderIOStats object. Change-Id: I56f7658fde1758efaf869fa779d11b533a81a0a7	2023-08-03 23:53:13 +02:00
Matthias Sohn	abe155ea94	Merge branch 'stable-6.6' into stable-6.7 * stable-6.6: Update to Tycho 4.0.1 Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: I7294c21748897eb3f94eeffbda944b62e3206c0d	2023-08-03 10:17:22 +02:00
Matthias Sohn	b4c3a5da0d	Merge branch 'stable-6.5' into stable-6.6 * stable-6.5: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: Id2e49252a9dc268210c9439848e77604885371aa	2023-08-03 10:14:45 +02:00
Matthias Sohn	82e277c813	Merge branch 'stable-6.4' into stable-6.5 * stable-6.4: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: Idb6dd6160e023673e3650653a15f6b1c540de96e	2023-08-03 01:55:12 +02:00
Matthias Sohn	76dfbb2ccd	Merge branch 'stable-6.3' into stable-6.4 * stable-6.3: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: I0bccc36d9cc9a36f1be9b1562df35ce3a0e95eee	2023-08-03 01:51:36 +02:00
Matthias Sohn	05ded4ee62	Merge branch 'stable-6.2' into stable-6.3 * stable-6.2: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: I589ed444b5cbfc5b073cac91323e2cc97ab98087	2023-08-03 01:37:43 +02:00
Matthias Sohn	6483c7d209	Merge branch 'stable-6.1' into stable-6.2 * stable-6.1: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: I5b16c3b613a95b7f28c8f6ac0b20c4c593759cea	2023-08-03 01:28:07 +02:00
Matthias Sohn	55ff4ed9de	Merge branch 'stable-6.0' into stable-6.1 * stable-6.0: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: Ib08037f6055dac1776e38cfb4ff8c88a50ad3e60	2023-08-03 01:19:21 +02:00
Matthias Sohn	c7849fbb19	Merge branch 'stable-5.13' into stable-6.0 * stable-5.13: Add verification in GcKeepFilesTest that bitmaps are generated Express the explicit intention of creating bitmaps in GC GC: prune all packfiles after the loosen phase Prepare 5.13.3-SNAPSHOT builds JGit v5.13.2.202306221912-r Change-Id: I1f50995d9d9c592ec0e02a04e0e409440b49f9f3	2023-08-03 01:17:17 +02:00
Matthias Sohn	de7b5b7b26	Prepare 6.7.0-SNAPSHOT builds Change-Id: I936d2d9106a1e3b7a98ec89fec8ae8a92ec765f2	2023-08-03 00:05:50 +02:00
Matthias Sohn	1d26471c16	JGit v6.7.0.202308011830-m2 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com> Change-Id: I255a979e9f48f60a251ef7b74ced3f720f012706	2023-08-02 00:30:01 +02:00
Matthias Sohn	24b6a35d30	Add missing @since tags This was missed in `c353645a09` Change-Id: I4ae5b13bd7bfd09c113d91ece727a26706660826	2023-08-02 00:19:02 +02:00
Han-Wen NIenhuys	d96a91e77e	Merge "Merge: Add diff3 style merge conflict formatter."	2023-08-01 13:08:34 -04:00
Nitzan Gur-Furman	c353645a09	Move footer-line parsing methods from RevCommit to FooterLine This allows extracting footers from messages not associated with a commit. The public API of RevCommit is kept intact. Change-Id: I5809c23df7b7d49641a4be3a26d6f987d3d57c9b Bug: Google b/287891316	2023-08-01 10:37:24 +02:00
Haamed Gheibi	462c57ec8d	Merge: Add diff3 style merge conflict formatter. Add base section to the merge conflict hunks. Bug: 442284 Change-Id: I977b43e7dd8119d6b72d11f09c4e8ec241750383	2023-07-31 11:57:28 -07:00
Jonathan Tan	ec11129b1d	Merge changes I8c60d970,I09bdd4b8,I87ff3933 * changes: Pack: open reverse index from file if present PackReverseIndex: open file if present otherwise compute PackReverseIndex: verify checksums	2023-07-26 16:39:13 -04:00
Ivan Frade	c8f5a3f99d	RevCommitCG: Read changed-path-filters directly from commit graph RevCommit and RevCommitCG were designed like "pointers" to data that load the content on demand, not on construction. This saves memory. Make the loading of changed-path-filter follow the same pattern. The ChangedPathFilters are only pointers to locations in the commit-graph (not the actual data), so the memory saving is not that big, but this is more consistent with the rest of the API. As 6.7 is not released, we can still change the RevWalk API. Change-Id: Id4186ea744b8a2418d0329facae69f785108d356	2023-07-26 12:22:01 +02:00
Matthias Sohn	eecd93714b	Update commons-codec to 1.16.0 Change-Id: I64617b17a168da1966b93c283c150d549477f3e1	2023-07-25 23:22:46 +02:00
Matthias Sohn	68c08265aa	Add missing @since tags for new API methods This was missed in `d3b40e72ac`. Change-Id: I6e90157c6be34ae6618e246b02cf77631c8e9732	2023-07-25 23:22:46 +02:00
Matthias Sohn	8879eec460	Add missing package import needed to use MurmurHash3 This was missed in `49beb5ae51` and broke the OSGi classpath. Change-Id: I08a307e9e3aade4ed8a5b5e2cc5e5d03c57dfa56	2023-07-25 22:06:27 +02:00
Jonathan Tan	c77fb93478	Merge "Identify a commit that generates a diffEntry on a rename Event."	2023-07-25 12:09:40 -04:00
Ronald Bhuleskar	ec3d919aa5	Identify a commit that generates a diffEntry on a rename Event. When using FollowFilter's rename callback, a callback is generated with the diff. The caller that is interested in the renames knows what the diffs are but has no idea what commit generated that diff. This will allow FollowFilter's rename callback to track diffEntry for a given commit. Change-Id: If1e63ccd19fdcb9c58c59137110fe24e0ce023d2	2023-07-24 19:42:51 -04:00
Jonathan Tan	0f4af2bc36	Merge changes I60a92463,Ic3b68220 * changes: PackReverseIndexV1: reverse index parsed from version 1 file ComputedPackReverseIndex: Clarify custom bucket sort algorithm	2023-07-21 14:05:38 -04:00
Anna Papitto	f196c7a0e8	Pack: open reverse index from file if present The reverse index for a pack is still always computed if needed, which is slower than parsing it from a file. Supply the file path where the reverse index file might be so that it is parsed instead of computed if the file is present. Change-Id: I8c60d970fd587341dfb2763fb87f1c586279f2a5 Signed-off-by: Anna Papitto <annapapitto@google.com>	2023-07-18 15:19:26 -07:00
Anna Papitto	000e7caf5e	PackReverseIndexV1: reverse index parsed from version 1 file The reverse index for a pack is used to quickly find an object's position in the pack's forward index based on that object's pack offset. It is currently computed from the forward index by sorting the index entries by the corresponding pack offset. This computation uses insertion sort, which has an average runtime of O(n^2). Cgit persists a pack reverse index file to avoid recomputing the reverse index ordering. Instead they write a file with format https://git-scm.com/docs/pack-format#_pack_rev_files_have_the_format which can later be read and parsed into the in-memory reverse index each time it is needed. PackReverseIndexV1 parses a reverse index file with the official version 1 format into an in-memory representation of the reverse index which implements methods to find an object's forward index position from its offset in logarithmic time. Change-Id: I60a92463fbd6a8cc9c1c7451df1c14d0a21a0f64 Signed-off-by: Anna Papitto <annapapitto@google.com>	2023-07-18 15:19:26 -07:00
Anna Papitto	2eba4e5b41	PackReverseIndex: open file if present otherwise compute The existing #read and #computeFromIndex static builder methods require the caller to choose whether to supply an input stream of a reverse index file or a forward index to compute the reverse index from, which is slower. Allow a caller to provide a file path where the pack's reverse index might be and the pack's forward index and simply get some reverse index instance back. Prefer opening and parsing the file if it is present, to save computation time. Otherwise, fall back onto computing the reverse index from the pack's forward index. Change-Id: I09bdd4b813ad62c86add586417b2ab86e9331aec Signed-off-by: Anna Papitto <annapapitto@google.com>	2023-07-18 15:19:26 -07:00
Anna Papitto	8123dcd699	PackReverseIndex: verify checksums The new version 1 file-based reverse index has a footer with the checksum of the corresponding pack file and a checksum of its own contents. The initial implementation doesn't enforce that the pack checksum matches the checksum found in the forward index nor that the self checksum matches the contents of the file just read in. Offer a method for reverse index users to verify the checksums in a way appropriate to the version being used. For the pre-existing computed version, always succeed since it is not based on a file so there is no possibility of corruption. Check for corruption of the file itself during parsing the checksum footer, by comparing the self checksum with the digest of the file contents read. Change-Id: I87ff3933cf1afa76663350400b616695e4966cb6 Signed-off-by: Anna Papitto <annapapitto@google.com>	2023-07-18 15:19:26 -07:00
Anna Papitto	7d2669587f	ComputedPackReverseIndex: Clarify custom bucket sort algorithm The ComputedPackReverseIndex uses a custom sorting algorithm, based on bucket sort with insertion sort but with the data managed as a linked list across two int arrays. This custom algorithm relies on the set of values being sorted being exactly 0, ..., n-1; so that they can serve a second purpose of being indexes into a second equally sized list. This custom algorithm was introduced ~10 years ago in `6cc532a43c`. The original author is no longer an active contributor, so it is valuable for the code to be readable, especially as there is currently active work on reverse indexes. Rename variables and add comments to clarify the algorithm and improve readability. There are no functional changes to the algorithm. Change-Id: Ic3b682203f20e06f9f865f81259e034230f9720a Signed-off-by: Anna Papitto <annapapitto@google.com>	2023-07-18 15:19:20 -07:00
Ronald Bhuleskar	3b77e33ad8	CommitGraphWriter: add option for writing/using bloom filters Currently, bloom filters are written and used without any way to turn them off. Add a per-repo config variable to control whether bloom filters are written. As for reading, add a JGit option to control this. (A JGit option is used instead of a per-repo config variable as there is usually no reason not to use the bloom filters if they are present, but a global control to disable them is useful if there turns out to be an issue with the implementation of bloom filters.) The config that controls reading is the same as C Git, but the config for writing is not: C Git has no config to control writing, but whether bloom filters are written depends on whether bloom filters are already present and what arguments are passed to "git commit-graph write". See the manpage of "git commit-graph" for more information. Change-Id: I1b7b25340387673506252b9260b22bfe147bde58	2023-07-18 14:21:48 -07:00
Jonathan Tan	77aec62141	CommitGraphWriter: reuse changed path filters Teach CommitGraphWriter to reuse changed path filters that have been read from the commit graph file whenever possible. Change-Id: I1acbfa1613ca7198386a49209028886af360ddb6 Signed-off-by: Jonathan Tan <jonathantanmy@google.com>	2023-07-18 14:21:48 -07:00
Jonathan Tan	d3b40e72ac	RevWalk: use changed path filters Teach RevWalk, TreeRevFilter, PathFilter, and FollowFilter to use changed path filters, whenever available, to speed revision walks by skipping commits that fail the changed path filter. This work is based on earlier work by Kyle Zhao (I441be984b609669cff77617ecfc838b080ce0816). Change-Id: I7396f70241e571c63aabe337f6de1b8b9800f7ed Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>	2023-07-18 14:21:48 -07:00
Jonathan Tan	ff0f7c174f	CommitGraphLoader: read changed-path filters As described in the parent commit, add support for reading the BIDX and BDAT chunks of the commit graph file, as described in man gitformat-commit-graph(5). This work is based on earlier work by Kyle Zhao (I160f6b022afaa842c331fb9a086974e49dced7b2). Change-Id: I82e02e6a3a3b758e6bf9d7bbd2198f0ffe3a331b Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>	2023-07-18 14:21:48 -07:00
Jonathan Tan	49beb5ae51	CommitGraphWriter: write changed-path filters Add support for writing the BIDX and BDAT chunks of the commit graph file, as described in man gitformat-commit-graph(5). The ability to read such chunks will be added in a subsequent commit. This work is based on earlier work by Kyle Zhao (Ib863782af209f26381e3ca0a2c119b99e84b679c). Change-Id: Ic18e6f0eeec7da1e1ff31751aabda5e6952dbe6e Signed-off-by: kylezhao <kylezhao@tencent.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>	2023-07-18 14:21:48 -07:00
Matthias Sohn	5dc63514d0	Merge "ssh: PKCS#11 support"	2023-07-17 18:13:06 -04:00
Thomas Wolf	23758d7a61	ssh: PKCS#11 support Support PKCS#11 HSMs (like YubiKey PIV) for SSH authentication. Use the SunPKCS11 provider as described at [1]. This provider dynamically loads the library from the PKCS11Provider SSH configuration and creates a Java KeyStore with that provider. A Java CallbackHandler is needed to feed PIN prompts from the KeyStore into the JGit CredentialsProvider framework. Because the JGit CredentialsProvider may be specific to an SSH session but the PKCS11Provider may be used by several sessions, the CallbackHandler needs to be configurable per session. PIN prompts respect the NumberOfPasswordPrompts SSH configuration. As long as the library asks only for a PIN, we use the KeyPasswordProvider to prompt for it. This gives automatic integration in Eclipse with the Eclipse secure storage, so a user even has the option to store the PIN there. (Eclipse will then ask for the secure storage master password on first access, so the usefulness of this is debatable.) By default the provider uses the first PKCS#11 token (slot list index zero). This can be overridden by a non-standard PKCS11SlotListIndex ssh configuration entry. (For OpenSSH interoperability, also set "IgnoreUnknown PKCS11SlotListIndex" in the SSH config file then.) Once loaded, the provider and its shared library and the keys contained remain available until the application exits. Manually tested using SoftHSM. See file manual_tests.txt. Kudos to Christopher Lamb for additional manual testing with a real YubiKey, also on Windows.[2] [1] https://docs.oracle.com/en/java/javase/11/security/pkcs11-reference-guide1.html [2] https://www.eclipse.org/forums/index.php/t/1113295/ Change-Id: I544c97e1e24d05e28a9f0e803fd4b9151a76ed11 Signed-off-by: Thomas Wolf <twolf@apache.org>	2023-07-17 04:52:30 -04:00
Matthias Sohn	db08835c6c	GC: Remove handling of extra pack for RefTree RefTree was packed in its own packfile, see Icbb735be8fa91ccbf0708ca3a219b364e11a6b83. RefTree was deleted in Ia3da7f2b82d9e365cec2ccf9397cbc47439cd150, since it was experimental and never used productively. That change failed to remove the extra pack handling for RefTree. Change-Id: I8c0d0a66440c331c3d03d0e07d5629682af2a7a9	2023-07-17 00:57:21 +02:00
Matthias Sohn	010a14f24d	Remove unused API problem filters Change-Id: Iea5fb0bf7b2c6a14d7d8b55558f6e78d3fd523f1	2023-07-16 15:13:05 +02:00
Matthias Sohn	b2f7dc189a	Remove redundant specification of type arguments Change-Id: I8289e0a6ca9154d6411993d250176a35df7cb905	2023-07-16 15:11:17 +02:00
Ivan Frade	760bdd09b1	DfsPackParser: Create object indices if config says so The DfsInserter writes the pack and its indices in the flush() method, but when the writing happens via DfsPackParser, it is the parser which writes the pack and indices. When combined with a parser, flushing the inserter is a noop. Add the writing of the object size index to the packparser#parse method, mirroring how the primary index is written. Change-Id: I52c5db153fea7e4a8ecd8b3d5de7ad21f7f81a60	2023-07-14 10:51:18 -07:00
Ivan Frade	cb99ff5bbb	DfsInserter: generate object size index if config says so DfsInserter receives objects and on flush() writes a pack and its primary index. Teach the DfsInserter to write also the object size index if the config says so. Change-Id: I89308312f8fd898d4c714a9b68ff948d3663800b	2023-07-14 10:34:46 -07:00
Ivan Frade	4d2a003b91	DfsInserter: populate full size on object insertion We need the full size of the object to populate the object size index later. Save the size in the PackedObjectInfo while adding objects to the pack. Then we don't need to re-read it from the pack at indexing time. Change-Id: I5bd7ad402df60b4637038def8ef7be2ab45faf87	2023-07-14 10:25:20 -07:00
Ivan Frade	12a4a4ccaa	DfsGarbageCollector: Write object size indices PackWriter knows how to add an object size index to the pack, but the garbage collector is not using it yet. Teach DfsGarbageCollector to write the object size index on writePack(). Disable by default in the unreachable-garbage pack. Callers control the content/presence of the index through the PackConfig option (minBytesForObjSizeIndex) for all other packs, so there is no need for a specific flag in DfsGarbageCollector. Change-Id: I86f5f17310e6913381125bec4caab32dc45b7c9d	2023-07-14 10:25:06 -07:00
Ivan Frade	9dace2e6d6	DfsReader/PackFile: Implement isNotLargerThan using the obj size idx isNotLargerThan() can avoid reading the size of a blob from disk using the object size idx if available. Load the object size index in the DfsPackfile following the same pattern as the other indices. Override isNotLargerThan in DfsReader to use the index when available. The following CL introduces the writing of the object size index and the tests cover this code. Change-Id: I15c95b84c1424707c487a7d29c5c46b1a9d0ceba	2023-07-13 11:24:17 -07:00
Luca Milanesio	3a6eec9bb6	Express the explicit intention of creating bitmaps in GC Add an explicit flag to PackWriter for allowing the GC.repack() phase to explicitly generate bitmaps only for the heads packfile and not for the others. Previously the bitmap generation was conditional on the presence of object id exclusions in the PackWriter. The introduction of the bitmap generation in the PackWriter done in Icdb0cdd66 has accidentally made the .keep files not completely transparent, because their presence has disabled the generation of the bitmap index, even if the generation of bitmaps is enabled. This bug has been an accidental consequence of the intention of the bitmap generator to avoid generating bitmaps for the non-heads packfile, however the implementation done by Colby decided to use the excludeInPacks variable (see [1]) which is unfortunately also used for excluding the packfiles having an associated .keep file (see [2]). [1] https://git.eclipse.org/r/c/jgit/jgit/+/7940/18/org.eclipse.jgit/src/org/eclipse/jgit/storage/pack/PackWriter.java#1617 [2] `dafcb8f6db/org.eclipse.jgit/src/org/eclipse/jgit/storage/file/GC.java (506)` Bug: 582039 Change-Id: Id722e68d9ff4ac24e73bf765ab11017586b6766e	2023-07-05 15:30:11 +02:00
Luca Milanesio	ac8d7838f0	GC: prune all packfiles after the loosen phase When loosening the objects inside the packfiles to be pruned, make sure that the packfile list is stable and prune all the files after the loosening is done. This prevents a series of exceptions previously thrown when loosening the packfiles, due to the too early pruning of the packfiles that were still in the pack list. Bug: 581532 Change-Id: I776776e2e083f1fa749d53f965bf50f919823b4f	2023-07-05 15:28:16 +02:00
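
The reverse-index idea described in commit 000e7caf5e (map a pack offset back to a position in the forward index, with logarithmic-time lookup) can be illustrated with a small sketch. This is not JGit code: the class name `ReverseIndex`, the method `find_position`, and the sample offsets are hypothetical, and Python's built-in sort stands in for the custom bucket sort that the commit log mentions.

```python
from bisect import bisect_left


class ReverseIndex:
    """Hypothetical in-memory reverse index (illustration only, not the JGit API)."""

    def __init__(self, forward_offsets):
        # forward_offsets[i] is the pack offset of entry i in the forward index.
        # Order the forward-index positions by their pack offset.
        self.positions = sorted(
            range(len(forward_offsets)), key=forward_offsets.__getitem__
        )
        # Keep the offsets in the same sorted order so lookups can binary-search.
        self.offsets = [forward_offsets[p] for p in self.positions]

    def find_position(self, offset):
        """Return the forward-index position of the object stored at `offset`."""
        i = bisect_left(self.offsets, offset)  # O(log n) binary search
        if i == len(self.offsets) or self.offsets[i] != offset:
            raise KeyError(offset)
        return self.positions[i]


# Made-up sample data: four objects at arbitrary pack offsets.
forward_offsets = [300, 12, 150, 87]
rev = ReverseIndex(forward_offsets)
```

Building the ordering costs one sort, which is the work a persisted reverse index file avoids; each lookup afterwards is a binary search over the sorted offsets.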

6471 Commits