Document Accumulation and Indexing


Many software engineers (myself included) prefer to search out solutions to their problems independently.  Maybe this is a case of Imposter Syndrome: “If I ask how, they’ll KNOW I’m an idiot”.  For me, it’s also a case of wanting to solve a problem without waiting for the ~10:00 crowd to wander into the office to start their day.  Impatience is a motivator.

The problem is: how do we make documentation searchable?  It’s simple enough to say “htdig”, but an indexer needs a view of the entire landscape it is indexing.  Moreover, the documentation to be included comes from many different sources; the world isn’t always a monorepo.  Even once we account for the mix of code-extracted documentation (e.g. Doxygen) and user-curated documentation (Sphinx, unversioned or outdated blogs, wikis outside version control), the generated content served over HTTP needs to end up in the same place.  The more of this that has to be done manually, the more likely it won’t be done.

There are two options I’m considering: pushing PRs to a common repo, or git-submoduling.


The process of updating a git submodule with documentation resources such as generated Doxygen XML might seem simpler from the repository side, but I worry about relying on developers to do additional steps.  The secret of ensuring that certain steps are done every time is to script them behind automation.  Even automation falls short if it depends on a trigger that a developer has to activate: anyone in a rush, or not as good at repetitive steps, won’t always activate it.

Populating a git submodule of documentation requires developer-side steps every time, though these might be automatable with a pre-commit hook.  This path only makes sense if you want to keep documentation linked back to a repository, or you want to enforce certain levels of documentation at the commit stage (e.g. block commits that push documentation coverage below a certain amount).  Otherwise, it just won’t get done.
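To make the "block commits below a certain documentation coverage" idea concrete, here is a minimal sketch of a checker that a pre-commit hook could call.  It assumes Go sources and counts only exported top-level functions and methods; the docCoverage helper and the minCoverage threshold are names and values I made up for illustration, not part of any existing tool.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

// minCoverage is a hypothetical threshold; pick whatever the team agrees on.
const minCoverage = 0.80

// docCoverage parses one Go source file and reports how many exported
// top-level functions and methods it has, and how many carry a doc comment.
// If src is nil the file is read from disk (standard parser.ParseFile behavior).
func docCoverage(filename string, src any) (documented, total int, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return 0, 0, err
	}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || !fn.Name.IsExported() {
			continue
		}
		total++
		if fn.Doc != nil {
			documented++
		}
	}
	return documented, total, nil
}

func main() {
	failed := false
	for _, path := range os.Args[1:] {
		documented, total, err := docCoverage(path, nil)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			failed = true
			continue
		}
		if total > 0 && float64(documented)/float64(total) < minCoverage {
			fmt.Fprintf(os.Stderr, "%s: %d of %d exported functions documented\n",
				path, documented, total)
			failed = true
		}
	}
	if failed {
		os.Exit(1) // a non-zero exit status makes a git hook reject the commit
	}
}
```

A hook script would pass the staged .go files as arguments and let the exit status decide whether the commit goes through.  The point is less the metric itself than that the check runs without anyone remembering to run it.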


If we can arrive at a common, workable (not perfect) format, it’s possible to push document updates to a single repository.  The benefit there is that repo managers such as GitHub (workflows) or CI/CD tools such as Jenkins can be configured to rebuild whenever there is a change to the master branch.

Jenkins itself can be configured to check out a repository and copy things into place.  For this to work, we’d need the following to happen:

  1. A contributing repository would need to include user-generated documentation in a compatible format and in a known subdir
  2. A contributing source-code repository would need to mark up documentation in an extractable format
  3. A CI/CD tool such as Jenkins would need to harvest that generated content on a successful build of a master or production branch and push it to the document-accumulating repository
    1. If the push is a commit to the master branch, this is fairly straightforward
    2. If the push is a PR against the accumulating repository, a bot-based accept-and-merge (e.g. mergebot) needs to be configured
  4. The document-accumulating repository needs to merge and index the documentation on every push to master
  5. The document-accumulating repository needs to generate new containers of indexed content to federate around an organization

This might be quite possible if the accumulating repository has a consistent format from which organizations can fork.  What about the format?

This is discussed and somewhat functional on 

What’s the Magic Step for Go?

Make a Go parser that writes Doxygen XML 🙂

OK, that seems crazy on the surface; hold your nose and stick with me a sec:

I see some efforts to make “My Vanity documenter” for Go.  The low kLOC count of each entire project implies that much of the code parsing/introspection already exists, perhaps already leveraged in go vet and go lint.  These various small Doxygen-like extractors suggest that the lift is relatively small compared to the gains (the bang-for-buck ratio gives big gains for acceptable effort).

This would give Go documentation an on-ramp to join documentation in other formats and languages.  The greater goal, a single indexing and search mechanism that finds all the documentation across languages, would be a compelling win.



