libfossil Notable TODOs
This page gives a high-level overview of the notable TODOs, or perceived TODOs, as well as non-TODOs (topics/APIs which are either out of scope or are way, way down the line).
Core SCM (and closely adjacent) Features
In no particular order:
Port over checkout/repo fingerprint: this allows detection of when a checkout's repo has been replaced by one with different RIDs. See fossil's vfile.c:vfile_rid_renumbering_event(). Calculation and confirmation of the fingerprint was added on 2021-04-16 but still TODO is an attempt at automatic recovery.
Stash support.
Unversioned files should be trivial to do.
Maybe pending-moderation support (tickets, ticket comments, wiki edits), depending on how centrally-managed such data need to be (i.e. whether it can be delegated to an app layer).
Checkin
The SQL code related to checkins has diverged significantly from fossil since libf's checkin support was first implemented (the better part of a decade ago). Though libf's is known to work for the cases supported by its existing feature set, it "really couldn't hurt" to audit the checkin algorithm against fossil's. It's okay if they differ, so long as the results are the same. What brings all this up is a corner-case discrepancy discovered in fossil's checkin support regarding merged-and-edited files:
https://fossil-scm.org/forum/forumpost/03f6b307f89c990b?t=h
for which Richard proposes a patch to fossil's checkin.c:
--- src/checkin.c
+++ src/checkin.c
@@ -2572,11 +2572,11 @@
** table. If there were arguments passed to this command, only
** the identified files are inserted (if they have been modified).
*/
db_prepare(&q,
"SELECT id, %Q || pathname, mrid, %s, %s, %s FROM vfile "
- "WHERE chnged IN (1, 7, 9) AND NOT deleted AND is_selected(id)",
+ "WHERE chnged<>0 AND NOT deleted AND is_selected(id)",
g.zLocalRoot,
glob_expr("pathname", db_get("crlf-glob",db_get("crnl-glob",""))),
glob_expr("pathname", db_get("binary-glob","")),
glob_expr("pathname", db_get("encoding-glob",""))
);
@@ -2610,16 +2610,18 @@
blob_str(&fname));
blob_reset(&fname);
}
nrid = content_put(&content);
blob_reset(&content);
- if( rid>0 ){
- content_deltify(rid, &nrid, 1, 0);
+ if( nrid!=rid ){
+ if( rid>0 ){
+ content_deltify(rid, &nrid, 1, 0);
+ }
+ db_multi_exec("UPDATE vfile SET mrid=%d, rid=%d, mhash=NULL WHERE id=%d",
+ nrid,nrid,id);
+ db_multi_exec("INSERT OR IGNORE INTO unsent VALUES(%d)", nrid);
}
- db_multi_exec("UPDATE vfile SET mrid=%d, rid=%d, mhash=NULL WHERE id=%d",
- nrid,nrid,id);
- db_multi_exec("INSERT OR IGNORE INTO unsent VALUES(%d)", nrid);
}
db_finalize(&q);
if( nConflict && !allowConflict ){
fossil_fatal("abort due to unresolved merge conflicts; "
"use --allow-conflict to override");
The corresponding code in libf looks much different than that now, and it's not currently (2021-09-17) clear whether or how that change would need to apply to libf's impl.
Status and Symlinks
(Added 2024-07-22)
There's a slight inconsistency in "status" results vis a vis fossil(1) when using a symlink. In the admittedly odd case below, there's a checkout of project X on my hard drive, and another checkout of that same project on a RAM disk. In order to avoid data loss if my system crashes, the file being worked on in the RAM disk is a symlink to a file from the on-hard-drive checkout:
[stephan@nuc:/home/ram/stephan/foo]$ l XYZ.test
lrwxrwxrwx 1 stephan stephan 39 Jul 19 15:41 XYZ.test -> /home/stephan/.../XYZ.test
(The working copy is on a RAM disk because this is a test-driver script and these tests are extremely I/O-heavy.)
Where "fossil status" reports (correctly) that all files are up-to-date, libf may incorrectly report, in the RAM-disk copy, that the file is out of date:
$ f-status
...
Local changes compared to this version:
MODIFIED XYZ.test
though running a diff clearly shows that it's current.
libf's symlink support has never been fleshed out (and has been outright ignored in numerous places as a result of my personal stance on storing symlinks in an SCM), so this disconnect could be in any number of places in the library.
Security-relevant
But not otherwise SCM-relevant...
- Port over the db_unprotect() and db_protect_pop() APIs, which allow a db to effectively be made read-only except for limited windows where specific sections of it need to be writable. Related: db.c:db_top_authorizer().
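The push/pop idea suggested by those API names can be sketched as a small stack of protection masks. This is only an illustration under assumptions, not fossil's implementation: every name here (ProtStack, prot_*, PROT_*) is hypothetical, and in a real port the active mask would presumably be consulted from an SQLite authorizer callback rather than by hand.

```c
#include <assert.h>

/* Hypothetical protection flags; NOT fossil's or libfossil's real values. */
enum {
  PROT_NONE   = 0x00,
  PROT_CONFIG = 0x01, /* configuration data is read-only */
  PROT_USER   = 0x02, /* user data is read-only */
  PROT_ALL    = 0x03
};

enum { PROT_STACK_MAX = 8 };

typedef struct ProtStack {
  unsigned current;               /* currently active protection mask */
  unsigned saved[PROT_STACK_MAX]; /* previously active masks */
  int depth;                      /* number of entries in saved[] */
} ProtStack;

static void prot_init(ProtStack *p, unsigned mask){
  assert(p);
  p->current = mask;
  p->depth = 0;
}

/* Saves the current mask, then lifts the given protections.
   Returns 0 on success, non-0 if the stack is full. */
static int prot_unprotect(ProtStack *p, unsigned flags){
  assert(p);
  if(p->depth >= PROT_STACK_MAX) return 1;
  p->saved[p->depth++] = p->current;
  p->current &= ~flags;
  return 0;
}

/* Restores the mask saved by the most recent prot_unprotect().
   Returns 0 on success, non-0 if there is nothing to restore. */
static int prot_pop(ProtStack *p){
  assert(p);
  if(p->depth <= 0) return 1;
  p->current = p->saved[--p->depth];
  return 0;
}

/* Returns non-0 if writes to the given area are currently permitted. */
static int prot_is_writable(const ProtStack *p, unsigned area){
  assert(p);
  return 0 == (p->current & area);
}
```

The intended usage is to bracket each write to a protected area with prot_unprotect() and prot_pop(), so that the writable window is as short as possible and nesting restores whatever state the caller had.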
Non-SCM TODOs
In no particular order...
URL parsing: not because we really need URLs at the library level, but (A) so that libfossil can find a repo's default user name from its remote-url setting (which is often a URL with a user name encoded in it) and (B) so that client code can parse URLs in a manner known to be compatible with fossil's understanding of them.
Add SPDX-style license attribution to all source files. This is ongoing.
Header file restructuring. The current separation of the APIs into many include/fossil-scm/*.h files is somewhat confusing. The initial intent was to keep my low-end development system of the time from choking on syntax highlighting on one large file, but those days are largely behind me. It may make sense to combine those into 1 public API file, 1 internal API file, and the auto-generated config file(s). (Even then, it's big enough to choke emacs' syntax highlighting on lower-end systems like Raspberry Pi SBCs.)
Stop using char as booleans. This tree historically uses char type for booleans. Now that the tree is C99, we can switch to the bool type. This is ongoing.
f-vdiff: port in fossil:3504672187af59f0 in order to be able to select the diff width based on the terminal size.
f-vdiff/dibu: port the DELETE/INSERT collapsing from 8752aca1b7187d39 into the ncurses unified diff view.
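For the URL-parsing item in this list, part (A) might look roughly like the following. It is a minimal sketch, assuming the generic scheme://user[:password]@host/... layout. The helper name url_user_name() is hypothetical, it does not decode %XX escapes, and it makes no claim to match fossil's own URL handling, which is exactly the compatibility concern raised in part (B).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helper: if zUrl has the form
** scheme://user[:password]@host..., copies the user name to zOut
** (which has room for nOut bytes, including the NUL terminator) and
** returns its length. Returns 0, leaving zOut empty, if zUrl has no
** user name or the name does not fit in zOut.
*/
static size_t url_user_name(const char *zUrl, char *zOut, size_t nOut){
  const char *zAuth, *zEnd, *zAt = NULL, *zColon;
  size_t n;
  assert(zUrl && zOut && nOut > 0);
  zOut[0] = 0;
  zAuth = strstr(zUrl, "://");
  if(!zAuth) return 0;
  zAuth += 3;
  /* The authority part ends at the first '/', '?', or '#'. */
  zEnd = zAuth + strcspn(zAuth, "/?#");
  /* The userinfo, if any, ends at the last '@' within the authority. */
  for(const char *z = zAuth; z < zEnd; ++z){
    if('@' == *z) zAt = z;
  }
  if(!zAt) return 0;
  /* The user name ends at the first ':' (password separator), if any. */
  zColon = memchr(zAuth, ':', (size_t)(zAt - zAuth));
  n = (size_t)((zColon ? zColon : zAt) - zAuth);
  if(0 == n || n >= nOut) return 0;
  memcpy(zOut, zAuth, n);
  zOut[n] = 0;
  return n;
}
```

Restricting the search for '@' to the authority part keeps a path such as /a@b from being mistaken for a user name.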
Maybe (and Maybe Not) TODO
Undo support.
BOMs. Fossil's diff APIs internally convert their inputs to UTF8 and strip the BOM (if any) from them. libf does not do that. On the one hand i'm hesitant to do so because these blobs can be anything at all (not necessarily SCM controlled). On the other hand, for annotate's sake it might make sense to do so automatically because the user is passing in artifact IDs instead of file content. On the other other hand, the fossil routine for doing that (blob_to_utf8_no_bom()) is far, far more involved than simply stripping a BOM. i'm torn on whether that's the library's job or not, and really dislike having to either mutate the original inputs or reallocate them to make that conversion. OTOH, fossil does so.
Symlinks. i have always strongly disagreed with the addition of symlink support into fossil: platform-specific constructs simply have no place in the core of any SCM (with the "effectively necessary," as well as unobtrusive, exception of the executable bit). For platforms which don't support symlinks, fossil stores/manages them as plain text files with a single line holding the name of the referenced file. This is very likely the only route the library will take to supporting symlinks, especially given the hassles symlink handling caused fossil in late 2020 (long story). Probably the only way the library will support proper symlinks is if someone who uses that feature adds and maintains it.
Backlinks. Crosslinking "should" update the internal list of backlinks from certain text fields, but doing so requires parsing wiki/markdown-format text. See backlink.c in the fossil tree for the details. On the other hand, backlinks support only requires parsing wiki links, not the full grammar, so it might not be as painful as it initially sounds... though somewhat more for markdown, where we're required to do a multi-pass scan to handle its linking model. (We'd also need to handle verbatim blocks to avoid parsing links inside those blocks.)
Ticket support. Ticket handling is surprisingly complicated, due largely to the customizability of the ticket database schema. If fossil-compatible ticket support gets added to libfossil, it will very likely be because someone other than myself adds it! The core artifact data structure supports tickets, so the bits required for adding it are in place. Fossil, however, also uses the TH1 scripting engine in its ticket handling and that's an aspect the library-level API should arguably avoid.
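Regarding the BOM item in this list: the narrow case of skipping a UTF-8 BOM needs neither mutation nor reallocation of the input, since the caller can simply start reading at an offset. The sketch below shows only that narrow case. The function name is hypothetical, and it does none of the encoding conversion which fossil's blob_to_utf8_no_bom() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helper: returns 3 if the n bytes at z start with the
** UTF-8 byte order mark (EF BB BF), else 0. The input is not
** modified: callers skip the BOM by reading from offset
** utf8_bom_length(z, n) instead of from offset 0.
*/
static size_t utf8_bom_length(const void *z, size_t n){
  static const unsigned char bom[3] = {0xEF, 0xBB, 0xBF};
  assert(z || 0 == n);
  return (n >= sizeof bom && 0 == memcmp(z, bom, sizeof bom))
    ? sizeof bom : 0;
}
```

Whether a diff or annotate routine should apply such an offset automatically is the open question described above; the sketch only shows that doing so need not touch the original blob.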
Optimizations
Artifact parsing, in particular of checkins, is much slower in libfossil than fossil. Some of this is easily attributable to more abstraction layers, but certainly not all of it. Some optimization of crosslinking speed is certainly in order. As a point of comparison, try fossil rebuild vs f-parseparty --crosslink. libf parses non-checkin types "plenty fast," e.g. 1600-odd control artifacts in roughly 6ms on my main computer. Checkins, however: 2m59s (debug) or 2m30s (non-debug) for 15504 checkins in the main fossil repo, as of this writing (just parsing, without crosslinking), replacing an earlier figure of 6m20s for 15504. On the sqlite3 repo it can only parse 24657 checkins in 10m9s (debug? Non-debug?), replacing an earlier figure of approx. 3000 checkins in 10 minutes, at which point that test got cancelled. The reason for the serious speed degradation as the repo size increases is unclear.
- This is at least partially (roughly 20-33%, based on basic tests) due to libfossil building in debug mode by default.
On 2021-03-24 the crosslinking was sped up by roughly 50% via the addition of a content cache identical to fossil's, but it still takes roughly 4m45s to parse and crosslink 16287 checkins in fossil's core repo with a debug libfossil build. 2m30s of that time is parsing.
- 2021-10-03 update: a debug build of (f-parseparty -t c --quiet --crosslink) can load and re-crosslink the 2178 checkins in its own repo in about 4.5s. That same thing on fossil's 17099 checkins takes about 5m55s on the same machine.
- 2021-10-21 update: parsing was sped up significantly via ba35aa3a576fc060. A debug build of (f-parseparty -q -c --dry-run --timer) can parse and re-crosslink the 2414 checkins in its own repo in about 3.7s. That same thing on fossil's 17255 checkins takes about 231s (3m51s), approx. 140s of which is crosslinking and 85s of which is fsl_content_get(). On a non-debug build those full load/parse/relink times drop to approx. 2.1s and 3m, respectively, and on the fossil repo 104s of that is crosslinking and 72s of it is fsl_content_get() (only 3s is deck parsing).
Buffer caching. The library internally has to use many temporary memory buffers. Some of those it reuses as much as it can (e.g. for filename normalization), but some operations (fsl_content_get() in particular) have to use several, potentially many, temporary buffers of arbitrary sizes, which can easily lead it to allocating hundreds of thousands of times for a total of 1GB+ in a single session, even if it only allocates a max concurrent memory of less than 15MB. We can probably improve this situation by installing a buffer cache intended primarily for use with fsl_content_get(), in which we store some number of buffers totaling some certain max amount of concurrent memory. This could be achieved relatively inexpensively by either hard-coding a buffer array size (e.g. 10) or modifying fsl_buffer() to be a singly-linked list, the links being used solely for such a cache, and keeping the buffers in alloced-size order. The catch there, however, is that the decompression and de-deltification steps effectively make reuse of such buffers next to impossible because we cannot easily and efficiently do those operations in-place in existing buffers.
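The hard-coded-array variant of that idea could look something like the following. This is a sketch under assumptions rather than a proposed implementation: Buf and BufCache are hypothetical stand-ins (not libfossil's buffer type), the slots are scanned linearly for the smallest buffer that fits instead of being kept in sorted order, and nothing here addresses the in-place decompression/de-deltification catch described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { BUF_CACHE_SLOTS = 10 };

typedef struct Buf {
  void *mem;       /* malloc()'d memory or NULL */
  size_t capacity; /* number of bytes allocated at mem */
} Buf;

typedef struct BufCache {
  Buf slots[BUF_CACHE_SLOTS]; /* empty slots have mem==NULL */
  size_t total;               /* sum of the cached buffers' capacities */
  size_t maxTotal;            /* upper limit for total */
} BufCache;

static void bufcache_init(BufCache *c, size_t maxTotal){
  assert(c);
  for(int i = 0; i < BUF_CACHE_SLOTS; ++i){
    c->slots[i].mem = NULL;
    c->slots[i].capacity = 0;
  }
  c->total = 0;
  c->maxTotal = maxTotal;
}

/* Returns a buffer with at least n bytes of capacity, preferring the
   smallest cached buffer which is large enough and otherwise
   allocating a new one. On allocation error the result's mem is NULL. */
static Buf bufcache_take(BufCache *c, size_t n){
  Buf rc = {NULL, 0};
  int best = -1;
  assert(c);
  for(int i = 0; i < BUF_CACHE_SLOTS; ++i){
    const Buf *b = &c->slots[i];
    if(b->mem && b->capacity >= n
       && (best < 0 || b->capacity < c->slots[best].capacity)){
      best = i;
    }
  }
  if(best >= 0){
    rc = c->slots[best];
    c->total -= rc.capacity;
    c->slots[best].mem = NULL;
    c->slots[best].capacity = 0;
  }else{
    const size_t nAlloc = n ? n : 1;
    rc.mem = malloc(nAlloc);
    rc.capacity = rc.mem ? nAlloc : 0;
  }
  return rc;
}

/* Hands b back to the cache. If no slot is free, or if caching b
   would exceed maxTotal, b is freed instead. */
static void bufcache_give(BufCache *c, Buf b){
  assert(c);
  if(!b.mem) return;
  if(b.capacity <= c->maxTotal - c->total){
    for(int i = 0; i < BUF_CACHE_SLOTS; ++i){
      if(!c->slots[i].mem){
        c->slots[i] = b;
        c->total += b.capacity;
        return;
      }
    }
  }
  free(b.mem);
}

/* Frees all cached buffers. */
static void bufcache_clear(BufCache *c){
  assert(c);
  for(int i = 0; i < BUF_CACHE_SLOTS; ++i){
    free(c->slots[i].mem);
    c->slots[i].mem = NULL;
    c->slots[i].capacity = 0;
  }
  c->total = 0;
}
```

With only 10 slots a linear scan is cheap, which is why this sketch skips the sorted-order bookkeeping. The linked-list variant would need that ordering to find a fitting buffer quickly.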
Remote Synchronization
This will(?) be implemented in terms of abstract streaming APIs, very possibly the ones the library already uses for the majority of its file I/O and abstracting output streaming (e.g. it uses an abstract output stream for diff output, rather than writing directly to a buffer).
This will almost certainly be one of the last major features.
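To illustrate what "abstract streaming" means here, the following sketch shows a producer which writes through a callback without knowing whether the bytes end up in a file, in memory, or (eventually) on a network connection. The names (out_f, MemSink, send_line) are hypothetical and are not libfossil's actual stream API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output-stream callback: consumes n bytes from src and
   returns 0 on success, non-0 on error. */
typedef int (*out_f)(void *state, const void *src, size_t n);

/* out_f implementation which writes to a FILE handle. */
static int out_f_FILE(void *state, const void *src, size_t n){
  return (n == fwrite(src, 1, n, (FILE*)state)) ? 0 : 1;
}

/* A fixed-capacity in-memory sink. */
typedef struct MemSink {
  char *mem;       /* destination memory */
  size_t capacity; /* number of bytes available at mem */
  size_t used;     /* number of bytes written so far */
} MemSink;

/* out_f implementation which appends to a MemSink, failing (without
   writing anything) if the data does not fit. */
static int out_f_mem(void *state, const void *src, size_t n){
  MemSink *m = state;
  if(n > m->capacity - m->used) return 1;
  memcpy(m->mem + m->used, src, n);
  m->used += n;
  return 0;
}

/* A producer which emits one line of text to an arbitrary sink. */
static int send_line(out_f out, void *state, const char *z){
  int rc;
  assert(out && z);
  rc = out(state, z, strlen(z));
  return rc ? rc : out(state, "\n", 1);
}
```

A sync implementation written against such an interface could be exercised in tests using in-memory sinks, with the transport plugged in separately.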
Wiki Parsing and Rendering
This set of features hovers right on the edge of out-of-scope for the core libfossil. Rendering is necessarily output-format-specific and the library has no business defining such outputs. Also...
The fossil implementations of these are not written with port-friendliness in mind, so a complete reimplementation would possibly be necessary.
In order to support near-arbitrary applications the wiki parsers need to be implemented in such a way that clients can customize (via hooks/callbacks) how links/references are generated.
The venerable Fossil Wiki format has the lowest priority. In practice markdown has easily taken the lead, and only old (pre-markdown) docs tend to be maintained in that format. However... checkin comments support a mime-type but several fossil(1) internals assume fossil-wiki format for those and we have 15+ years of checkin history in the wild which relies on that, so there is no real escaping from that format.
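The hook-based approach to links could be sketched as follows for the bracketed [target] and [target|label] links of the Fossil Wiki format. This is a deliberately simplified illustration with hypothetical names (link_f, scan_links, count_link). It does not skip verbatim blocks, trim whitespace, or handle markdown's linking model, all of which a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical client hook, called once per link found. label is
   NULL (and nLabel 0) when the link has no "|label" part. Returning
   non-0 aborts the scan. */
typedef int (*link_f)(void *state, const char *target, size_t nTarget,
                      const char *label, size_t nLabel);

/* Scans z for [target] and [target|label] constructs and passes each
   to cb. Returns 0 on success or the first non-0 result from cb. */
static int scan_links(const char *z, link_f cb, void *state){
  assert(z && cb);
  while((z = strchr(z, '['))){
    const char *close = strchr(++z, ']');
    const char *bar;
    int rc;
    if(!close) break; /* unterminated link: stop scanning */
    bar = memchr(z, '|', (size_t)(close - z));
    if(bar){
      rc = cb(state, z, (size_t)(bar - z),
              bar + 1, (size_t)(close - bar - 1));
    }else{
      rc = cb(state, z, (size_t)(close - z), NULL, 0);
    }
    if(rc) return rc;
    z = close + 1;
  }
  return 0;
}

/* Example hook state: counts links and remembers the last target. */
typedef struct LinkCount {
  int n;
  char last[64];
} LinkCount;

/* Example hook: a renderer would emit output here and a backlinks
   collector would record the target. This one only counts. */
static int count_link(void *state, const char *target, size_t nTarget,
                      const char *label, size_t nLabel){
  LinkCount *lc = state;
  (void)label;
  (void)nLabel;
  ++lc->n;
  if(nTarget >= sizeof lc->last) nTarget = sizeof lc->last - 1;
  memcpy(lc->last, target, nTarget);
  lc->last[nTarget] = 0;
  return 0;
}
```

Because the scanner only reports links and leaves their interpretation to the hook, the same pass could serve both a client-specific renderer and the link collection needed for backlinks.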
Non-TODOs
Any and all UI-related elements, including HTML, CSS, and JavaScript. The library will enable such applications but will not provide, e.g., an HTML framework beyond perhaps (maybe) diff and/or wiki rendering (the former being likely but the latter unlikely).
Scripting of tickets. There are no current plans to 100% mimic fossil's TH1 scripting of tickets. Fossil's use of TH1 was one of convenience, not long-term practicality. The library itself will have no "official" scripting language, but is designed specifically to make tying it to scripting engines relatively straightforward. We have a standalone copy of TH1 which "could" be integrated with little work, but doing so would not be an ideal path to go down.