BENCHMARK · v14 · IN ACTIVE DEVELOPMENT
An open benchmark for conflict reasoning, grounded in the Agentic Conflict Ontology. Built to measure the things generic language models fail at — time, causality, provenance, commitment tracking — not the things they already do well.
WHAT THE TCGC MEASURES
Each task type targets a specific capability that standard retrieval-augmented generation cannot handle reliably. Metrics are reported per task type and per domain; a schematic item record is sketched after the task list below.
Disambiguate references and alias clusters across long documents.
Surface asserted facts, evaluative statements, and normative claims.
Infer underlying interests distinct from stated positions (Fisher/Ury).
Identify rules, norms, and structural bounds shaping feasible outcomes.
Attribute leverage resources and dependencies to the actor holding them.
Distinguish claims from commitments and track their evolution over time.
Reconstruct temporal sequence from mixed narrative prose.
Detect framing changes across time and party.
Build multi-hop causal chains with explicit mechanism and conditions.
Identify claims that cannot simultaneously hold across actors or time.
Bind every extracted primitive back to its source span.
Flag instances where stated commitment diverges from behavioral evidence.
Separate surface positions from underlying interests.
Assemble a coherent conflict graph from multiple, partially-contradictory sources.
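To make the task-type and provenance requirements concrete, here is a minimal sketch of what a single benchmark item could look like. It assumes a JSON-like record with explicit source spans; the names (SourceSpan, TCGCItem, item_id, task_type, evidence) are illustrative placeholders, not the published TCGC schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SourceSpan:
    """Pointer from an extracted primitive back to its source text (hypothetical layout)."""
    doc_id: str   # which source document the evidence comes from
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends (exclusive)

@dataclass
class TCGCItem:
    """Illustrative item record: one question, one gold answer, span-level provenance."""
    item_id: str
    task_type: str                 # one of the 14 task types listed above
    domain: str                    # e.g. "human_friction" or "multi_party"
    question: str                  # ground-truth question authored in annotation pass three
    gold_answer: str               # reference answer used for scoring
    evidence: List[SourceSpan] = field(default_factory=list)  # every primitive bound to a source span
```

Under a layout like this, reporting metrics per task type and per domain reduces to grouping items by the task_type and domain fields before aggregation.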
METHODOLOGY
TCGC items are drawn from two domains — human friction (HR, commercial, governance) and complex multi-party scenarios (policy, peace process, multilateral) — with intentional diversity in length, source mix, and discourse style.
Annotation proceeds in three passes: primitive tagging, edge labelling, and ground-truth question authoring. Inter-annotator agreement targets are task-type specific; tasks that depend on inferred primitives (like Interest) have lower targets than surface-level tasks (like Actor resolution), and we report the actual agreement transparently.
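As one concrete way the task-type-specific agreement targets could be checked, the sketch below computes Cohen's kappa separately for each task type. This is only one reasonable chance-corrected statistic, and the two-annotator data layout is assumed for illustration rather than taken from the annotation guidelines.

```python
from collections import Counter
from typing import Dict, List, Tuple

def cohen_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Chance-corrected agreement between two annotators over the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:            # degenerate case: both annotators used one identical label
        return 1.0
    return (observed - expected) / (1 - expected)

def agreement_by_task_type(
    annotations: Dict[str, Tuple[List[str], List[str]]]
) -> Dict[str, float]:
    """Map each task type to its kappa so results can be compared against per-type targets."""
    return {task: cohen_kappa(a, b) for task, (a, b) in annotations.items()}
```

Surface-level types such as Actor resolution would then be expected to clear a higher threshold than inferred types such as Interest, matching the per-type targets described above.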
The evaluation harness is designed to be compatible with standard runners (HELM, lm-eval-harness) via a thin adapter. We will publish the adapter alongside the first public split.
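The adapter itself is not yet published, so the sketch below only shows the general shape such a conversion could take: flattening one item (reusing the illustrative TCGCItem above) into the plain prompt/target pair most runners consume. The context_lookup helper and the prompt wording are assumptions, not the adapter's actual interface to HELM or lm-eval-harness.

```python
from typing import Callable, Dict

def to_generic_request(item: "TCGCItem", context_lookup: Callable[[str], str]) -> Dict[str, str]:
    """Flatten one item into a prompt/target pair; context_lookup is a hypothetical
    helper mapping a doc_id to the full source document text."""
    # Deduplicate source documents while preserving their first-seen order.
    doc_ids = list(dict.fromkeys(span.doc_id for span in item.evidence))
    context = "\n\n".join(context_lookup(d) for d in doc_ids)
    prompt = (
        f"Task type: {item.task_type}\n"
        f"Sources:\n{context}\n\n"
        f"Question: {item.question}\n"
        "Answer and cite the supporting source spans."
    )
    return {"id": item.item_id, "prompt": prompt, "target": item.gold_answer}
```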
OPEN RESEARCH QUESTIONS
Six questions we do not yet have clean answers for. Click each to read the current thinking and tell us where it is wrong.
HOW TO CONTRIBUTE
TIER 01
The v0.1 evaluation protocol draft is available on request. Comments accepted on the annotation guidelines, task-type definitions, and metric choices.
Request the draft
TIER 02
Early-access splits are available to academic researchers and pilot partners under a light DUA. Write in with your proposed use case.
Request access
TIER 03
Found something the 14 current task types miss? Send us a proposal — one paragraph of motivation, one worked example, one suggested metric.
Propose a task
PUBLICATION PLAN