Runlog vs. cq: two takes on shared agent memory

~10 min read

Two projects pitched as “shared agent memory” that diverge on every load-bearing decision. This is a side-by-side, an honest read of where each one is structurally stronger, and a verdict on which fits which job.

Runlog is the layer of agent knowledge this site exists to explain. Mozilla.ai's cq (short for colloquy) is the closest neighbour in the space — both pitch themselves as registries of agent-applicable findings delivered over MCP. The surface similarity is real. Underneath, the two projects answer the same question with opposite design choices, and the choices compound. Picking between them is less “which one is better” and more “which problem are you actually trying to solve.”

At a glance

Six axes that turn out to do most of the work in distinguishing the two:

| Axis | Runlog | cq |
| --- | --- | --- |
| Scope | External-dependency only; internal entries hard-rejected at submit | Both internal and external; private tier is for org-specific knowledge |
| Trust model | Computed: cassette match, dependency churn, fingerprint diversity, decay | Voted: confirmations and flags adjust a confidence score |
| Sanitization | Server-side default-deny allow-list with declared $LITERAL_N tokens | Prompt-side check the agent runs on its own draft |
| Verification | Local signed Go verifier: both branches plus mutation testing | None on the wire; LLM self-judgement plus the prompt-side check |
| Public commons | Hosted at api.runlog.org; live since 2026-04-25 | Self-host only; a public commons is community-discussed, not running |
| Licensing | Verifier, schema, vocabularies, skills, website open; server commercial | Apache 2.0 across the whole stack including server |

The license row is genuinely a point in cq's favour for teams whose deployment policy requires a fully-open server. Everything else compounds in Runlog's favour the closer you look — and the rest of this post is why.

Where they agree

Worth naming the overlap before the divergences, because it's wider than the headline framing suggests:

  • MCP as the wire. Both projects deliver knowledge to coding agents over the Model Context Protocol. Both define a small set of tools the agent calls (search, submit, an outcome-reporting equivalent).
  • Three knowledge kinds. Both register pitfalls, workarounds, and tool/library recommendations as the primary entry types. The taxonomy carries over almost intact between the two.
  • Cross-vendor agent reach. Both ship adapters for Claude Code, Cursor, Cline, Continue, Windsurf, and other MCP-capable hosts. Neither bets on a single vendor surviving.
  • Asymmetric reader-to-writer ratios. Both treat reading as the cheap, frictionless side and writing as the gated, contributor-side surface.

So the overlap on shape is real. The differences are about what kind of knowledge is allowed in, how trust is computed, and what a “verified” stamp can be relied on to mean.

Scope: the forking decision

The single design choice that everything else flows from is what is allowed in the registry.

cq accepts knowledge of any provenance. Its private tier is explicitly designed for “the organization's own stack, APIs, infrastructure quirks.” Internal-domain entries — your team's payments service, your bespoke deploy tooling, your home-grown auth library — are first-class. The trade is that cq becomes a richer team knowledge base in exchange for blurring the line between team memory (your CLAUDE.md, your Cursor rules) and shared memory.

Runlog rejects internal-domain entries at the wire. Every domain tag has to resolve to a public source — a package registry, a documented API, an RFC, an open-source repo. A submission tagged with an internal hostname comes back as HTTP 400 scope_rule and never lands. The trade is that Runlog is unambiguously a complement to team memory rather than a substitute. The homepage's three-paragraph pitch hangs on this choice; the verification page walks through why loosening it would collapse the differentiation.
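To make the scope rule concrete, here is a minimal Python sketch of a gate of this kind. The allow-list of hosts, the function name `check_scope`, and the assumption that domain tags are URLs are all illustrative; only the HTTP 400 `scope_rule` outcome comes from the description above, and the real server's logic may look quite different.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of public sources. The registry's real list is not
# described in this post.
PUBLIC_HOSTS = {"pypi.org", "registry.npmjs.org", "github.com", "www.rfc-editor.org"}


def check_scope(domain_tags):
    """Return (status, error) for a submission's domain tags.

    Each tag is assumed to be a URL. Any tag whose host is not on the
    public allow-list rejects the whole submission.
    """
    for tag in domain_tags:
        host = urlparse(tag).hostname  # None when the tag is not a URL
        if host not in PUBLIC_HOSTS:
            return 400, "scope_rule"
    return 200, None
```

The check is default-deny: a tag that does not parse as a URL has no hostname, so it is rejected along with internal hostnames rather than waved through.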

For an evaluator, the question is essentially: do you want one knowledge layer that holds everything, or two layers — team memory for what's yours, a shared registry for what isn't? Runlog assumes the second answer; cq assumes the first. Neither is universally correct.

Trust: voted vs. computed

Both projects need a way to express “is this entry actually trustworthy?” They reach for different mechanisms.

cq starts each entry at confidence 0.5. Confirmations push it up; flags push it down. The model is recognisable (it's the upvote/downvote shape Stack Overflow normalised) and straightforward to reason about as a user. The cost of that shape, well known by now, is that once the consumers stop being humans with reputations and become agents with API keys, confirmations can be gamed with throwaway accounts. Diversity weighting and decay are flagged as future work in cq's design notes.
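A vote-shaped score fits in a few lines of Python. This is a sketch of the general shape, not cq's code: the 0.5 starting point is from the description above, while the step size of 0.1 and the clamping to the range 0 to 1 are assumptions made for illustration.

```python
def voted_confidence(confirmations, flags, start=0.5, step=0.1):
    """Vote-style score: each confirmation adds `step`, each flag subtracts it.

    `step` and the [0, 1] clamp are illustrative assumptions.
    """
    score = start + step * (confirmations - flags)
    return max(0.0, min(1.0, score))
```

Note what the function does not take as input: who confirmed. Five confirmations from five independent teams and five from one actor with five throwaway keys produce the same score, which is the gaming concern described above.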

Runlog computes confidence rather than collecting it. Every confirmation arrives attached to a session manifest: a fingerprint of the agent that ran the entry, the dependencies it had loaded, and the rough shape of the codebase it ran against. Two confirmations from clients on overlapping fingerprints in a short temporal window count as approximately one. Confidence decays automatically when the underlying dependency churns, when a cassette stops shape-matching, or when the entry sits idle. The math is more involved; the trade-off is that the score is structurally harder to game, and an entry's stamp ages with reality rather than staying frozen at the moment a vote queue finalised.
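Two of those ideas, collapsing near-duplicate confirmations and decaying on idleness, can be sketched in Python. Everything specific here is an assumption: fingerprints are treated as opaque strings that either match or do not (the real system talks about overlapping fingerprints), the one-hour window and 90-day half-life are invented numbers, and exponential decay is one plausible curve rather than Runlog's documented formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confirmation:
    fingerprint: str   # hypothetical digest of agent, dependencies, codebase shape
    timestamp: float   # seconds since epoch


def effective_confirmations(confirmations, window_seconds=3600.0):
    """Count confirmations, collapsing same-fingerprint ones inside a window."""
    last_counted = {}  # fingerprint -> timestamp of the last counted confirmation
    count = 0
    for c in sorted(confirmations, key=lambda c: c.timestamp):
        previous = last_counted.get(c.fingerprint)
        if previous is None or c.timestamp - previous > window_seconds:
            count += 1
            last_counted[c.fingerprint] = c.timestamp
    return count


def decayed(score, idle_days, half_life_days=90.0):
    """Exponential decay: the score halves every `half_life_days` of idleness."""
    return score * 0.5 ** (idle_days / half_life_days)
```

With this shape, three rapid confirmations from one fingerprint plus one from another count as two, so throwaway API keys on the same machine buy very little.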

For an evaluator: vote-based trust is faster to implement and easier to explain; computed trust is closer to what an agent can rely on without needing to second-guess the corpus. The trust page is the long form on how the three signals stack.

Sanitization: where the gate lives

Both projects acknowledge that submissions can leak credentials, internal hostnames, or PII. They place the gate in different places.

cq places the gate in the prompt. The contributor-side skill teaches the agent a four-letter heuristic — vulnerabilities, impact, biases, edge cases — and asks it to produce a sanitized rewrite before submitting. As long as the agent follows the prompt, it works. The mechanism is transparent and inspectable (it's just markdown), and the cost is roughly zero on the server side. The structural weakness: if the agent ignores or misinterprets the rule, the server accepts the entry anyway.

Runlog places the gate on the wire. Every entry passes a default-deny allow-list at runlog_submit: language syntax, registered placeholder names ($PAYLOAD, $TOKEN, …), and a per-domain vocabulary of routine terms are allowed. Anything else has to be declared as $LITERAL_N with a written reason and confirmed by the submitter before the verifier signs the bundle. Credentials and PII are hard-rejected even if declared. The trade is more friction on the contributor side; the benefit is that an agent that ignores the rule does not get its entry through.
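The default-deny idea can be sketched in Python as follows. The three vocabularies, the credential pattern, and the shape of `declared_literals` (a mapping from literal value to written reason) are stand-ins invented for this example; the actual allow-list and the `$LITERAL_N` declaration format are not spelled out in this post.

```python
import re

# All three sets are illustrative stand-ins, not Runlog's real vocabularies.
SYNTAX = {"def", "return", "import", "await", "if", "else"}
PLACEHOLDERS = {"$PAYLOAD", "$TOKEN"}
DOMAIN_VOCAB = {"retry", "timeout", "client", "request"}

# Crude stand-in for a credential detector (an AWS-style access key id).
CREDENTIAL = re.compile(r"^AKIA[0-9A-Z]{16}$")


def sanitize(tokens, declared_literals):
    """Default-deny check: return the list of tokens that block the entry.

    `declared_literals` maps a literal value to the submitter's written reason.
    An empty result means the entry passes.
    """
    rejected = []
    for tok in tokens:
        if CREDENTIAL.match(tok):
            rejected.append(tok)   # hard reject, any declaration is ignored
        elif tok in SYNTAX or tok in PLACEHOLDERS or tok in DOMAIN_VOCAB:
            continue               # on the allow-list
        elif declared_literals.get(tok):
            continue               # declared with a non-empty reason
        else:
            rejected.append(tok)   # unknown and undeclared
    return rejected
```

The ordering matters: the credential check runs before the declaration lookup, so declaring a secret with a plausible reason does not get it through.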

A prompt-side rule is a recommendation. A wire-side rule is an invariant. Both have a place — but only one is enforceable when the contributor is also a language model.

Verification: the gap that defines the product

This is the largest divergence.

cq has no verification step. An entry's authority comes from its summary / detail / action being well written and from the agent's own self-judgement. The corpus is curated by community confirmations after the fact, not by a mechanical proof at submission time. It scales fast and feels light; it also inherits the failure mode of every vote-based corpus: confidently stated findings that nobody actually re-ran can sit in the registry for months, accumulating confirmations from agents that took the entry on faith.

Runlog's whole product is the verification step. The signed Go verifier runs on the submitter's own machine, executes both branches of the entry against the submitter's existing runtime, then runs mutation tests against the working branch — if the test still passes after the fix is mutated, it isn't actually testing anything. The bundle ships with an Ed25519 signature, an environment fingerprint, and a typed result. The server validates the signature, checks scope, applies sanitization, and records the row. The verification page exists because this design decision drives every other choice in the system, and skipping past it makes the rest read as overengineering.
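The both-branches-plus-mutation logic is easier to see in miniature. The sketch below is in Python rather than Go and leaves out signing and fingerprinting entirely; the function name `verify_entry`, the result strings, and the toy mean example are all hypothetical, chosen only to show why a test that survives mutation proves nothing.

```python
def verify_entry(test, broken, working, mutants):
    """Return a result string for an entry (names are illustrative).

    `test` takes a candidate function and returns True when it behaves
    correctly. `mutants` are deliberately damaged copies of `working`.
    """
    if test(broken):
        return "pitfall_not_reproduced"   # the failing branch did not fail
    if not test(working):
        return "fix_does_not_work"        # the working branch did not pass
    if any(test(m) for m in mutants):
        return "test_survives_mutation"   # the test is not testing the fix
    return "verified"


# Toy entry: a mean function that crashes on an empty list, and its fix.
def broken(xs):
    return sum(xs) / len(xs)


def working(xs):
    return sum(xs) / len(xs) if xs else 0.0


def mutant(xs):
    return sum(xs) / len(xs) if not xs else 0.0  # guard condition flipped


def mean_test(fn):
    try:
        return fn([]) == 0.0 and fn([2, 4]) == 3.0
    except ZeroDivisionError:
        return False


def weak_test(fn):
    try:
        return fn([]) == 0.0  # only checks the empty case
    except ZeroDivisionError:
        return False
```

Here `verify_entry(mean_test, broken, working, [mutant])` comes back as verified, while swapping in `weak_test` and a mutant that always returns `0.0` is caught, because that mutant still passes the weaker test.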

The trade is real. cq is faster to contribute to today, particularly for an agent without a Go toolchain on the machine. Runlog requires the verifier binary, an Ed25519 keypair, and the runtime the entry exercises. The counter-argument the design makes: an unverified registry consumed by language models converges on the failure mode the language models already have. The gate is what separates a knowledge registry from a confident-sounding one.

The public-commons question

Both projects pitch the existence of a cross-org shared layer. Only one of them runs it today.

cq is self-host first. The server is a Docker container the operator runs on their own infrastructure; a public node is community-discussed but not deployed. For a team that wants its own dedicated instance, this is exactly right. For an individual developer who wants to consult a shared corpus on day one, it isn't — they would have to either find someone else's running node or stand up the stack themselves before getting any value.

Runlog runs api.runlog.org as the public commons. Live since 2026-04-25. Registration is the day-one path: an email, a verification click, an API key, a six-line MCP config in your agent. Self-host is on the roadmap; the public commons exists first because it's the artefact the design is for.

The two stances are not in conflict — they answer different questions. cq's self-host answers “how do I run my own.” Runlog's hosted commons answers “how do I read what everyone else has already learned without running anything.”

Verdict: which one fits which job

Picking is easier when the question stops being “which is better” and starts being “which problem.”

Reach for cq when…

  • The knowledge you want to share is mostly your team's — internal services, your platform's quirks, your bespoke tooling.
  • You want the same store to hold internal and external knowledge, and you're willing to handle the scope blur yourself.
  • Apache 2.0 across the entire stack (including the server) is a non-negotiable deployment policy.
  • You're happy to self-host and your team has the operational appetite for it.

Reach for Runlog when…

  • You want a clean separation between team memory (CLAUDE.md, Cursor rules, mem0) and a cross-org registry of third-party knowledge.
  • You want a verified stamp that means something a machine can rely on — backed by signed local execution and mutation testing, not votes.
  • You want a public commons running today rather than a stack to deploy.
  • You'd rather pay a small fee for a hosted service than operate the storage and embedding infrastructure yourself.

The two systems are not zero-sum. A team can run cq for org-internal knowledge and consult Runlog for external dependencies on the same agent. The scope rule is what makes that co-existence clean.

A closing note

Comparisons are easier to write than they are to make fair. cq is a thoughtful project shipping at a fast cadence with real strengths — particularly its one-command multi-host installer, its end-of-session reflection flow, and the fact that the entire stack is open source. The points where Runlog comes out ahead in this post are the points where the projects' design philosophies diverge, not the points where one team out-engineered the other. The right mental model is two products, not a winner and a loser.

If you came here from the homepage, the rest of the picture lives there. If you came here from elsewhere, the verification page is the deep dive on the one design decision this post leans on most, and the trust page walks through how the three trust signals compose.
