Schema matching & alignment

Schema matching finds correspondences between elements of two different schemas. When Crosswalker’s import wizard detects that a column named “Control Family” should be a hierarchy level, it’s performing a simple form of schema matching. When NIST maps CSF subcategories to 800-53 controls, that’s ontology alignment.

These terms are often used interchangeably but have distinct meanings in the literature:

| Term | Scope | Output | Example |
| --- | --- | --- | --- |
| Schema matching | Find corresponding elements between schemas | Set of element pairs | “Column A maps to Column B” |
| Schema mapping | Define how to transform data between schemas | Executable transformation | “Split Column A by comma, map each to Column B” |
| Ontology alignment | Find corresponding concepts between ontologies | Alignment (set of correspondences with confidence) | “CSF PR.AC-1 corresponds to 800-53 AC-1 (0.85 confidence)” |
| Ontology matching | The process of finding an alignment | Algorithm + heuristics | OAEI competition entries |
| Ontology merging | Combine two ontologies into one | Unified ontology | SCF combining 175+ frameworks |

Compare element names using string similarity metrics:

  • Edit distance (Levenshtein): “Access Control” ↔ “Access Ctrl” = 3 edits (delete “o”, “n”, “o”)
  • Token-based: split names into words and compare the token sets, e.g. {Access, Control} ∩ {Access, Management} = {Access}
  • Prefix/suffix: “AC-2” matches “AC-2(1)” by prefix

This is what Crosswalker’s column analysis does when suggesting column types based on header names.
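
The three string metrics above can be sketched in a few lines of Python. This is a minimal illustration, not Crosswalker's actual column analysis; the function names `levenshtein`, `token_jaccard`, and `shares_prefix` are made up for this example.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of the lowercase word tokens of two names."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def shares_prefix(short: str, long: str) -> bool:
    """True if `long` starts with `short` (illustrative prefix test)."""
    return long.startswith(short)


print(levenshtein("Access Control", "Access Ctrl"))          # 3
print(token_jaccard("Access Control", "Access Management"))  # one shared token out of three
print(shares_prefix("AC-2", "AC-2(1)"))                      # True
```

A plain prefix test is deliberately naive: “AC-2” is also a prefix of “AC-20”, so a real matcher would probably need to check what follows the prefix.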

Use the hierarchical structure of schemas:

  • Graph matching — compare the shape of two taxonomies
  • Children similarity — if children match, parents likely correspond
  • Path similarity — “Function > Category > Subcategory” in CSF ≈ “Family > Control > Enhancement” in 800-53
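
As a rough illustration of comparing taxonomy shape, the sketch below counts nodes per hierarchy level and compares the two profiles. It is a crude proxy for graph matching, not a real graph-matching algorithm; the nested-dict representation, the tiny sample trees, and the names `level_counts` and `shape_similarity` are all assumptions made for this example.

```python
from collections import Counter


def level_counts(tree: dict, depth: int = 0, counts: Counter = None) -> Counter:
    """Count how many nodes sit at each depth of a nested-dict taxonomy."""
    counts = Counter() if counts is None else counts
    for _name, subtree in tree.items():
        counts[depth] += 1
        level_counts(subtree, depth + 1, counts)
    return counts


def shape_similarity(a: dict, b: dict) -> float:
    """Weighted Jaccard over per-level node counts (1.0 = identical profile)."""
    ca, cb = level_counts(a), level_counts(b)
    depths = set(ca) | set(cb)
    if not depths:
        return 1.0
    overlap = sum(min(ca[d], cb[d]) for d in depths)
    total = sum(max(ca[d], cb[d]) for d in depths)
    return overlap / total


# Toy fragments: Function > Category > Subcategory vs Family > Control > Enhancement
csf = {"PR": {"PR.AC": {"PR.AC-1": {}, "PR.AC-2": {}}}}
sp800_53 = {"AC": {"AC-1": {}, "AC-2": {"AC-2(1)": {}}}}

print(shape_similarity(csf, sp800_53))  # 0.6
```

Both toy trees have three levels, so the path patterns line up level by level even though the per-level counts differ.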

Compare actual data values:

  • Value overlap — columns sharing values like “AC-2” are likely related
  • Distribution similarity — columns with similar statistical profiles
  • Pattern matching — regex patterns in values (e.g., \w{2}-\d+ for control IDs)

Crosswalker uses this approach in its fingerprinting system for config matching.
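
A minimal sketch of two of these instance-level signals, value overlap and pattern matching, is shown below. It assumes columns arrive as plain lists of strings; the helper names are hypothetical and do not reflect Crosswalker's fingerprinting code.

```python
import re

# Pattern from the text above: two word characters, a hyphen, then digits.
CONTROL_ID = re.compile(r"\w{2}-\d+")


def value_overlap(col_a, col_b) -> float:
    """Jaccard overlap of the distinct values in two columns."""
    sa, sb = set(col_a), set(col_b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pattern_coverage(values, pattern=CONTROL_ID) -> float:
    """Fraction of values that match the pattern in full."""
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for v in values if pattern.fullmatch(str(v)))
    return hits / len(values)


col_a = ["AC-1", "AC-2", "AC-3"]
col_b = ["AC-2", "AC-3", "AU-1"]

print(value_overlap(col_a, col_b))                            # 0.5
print(pattern_coverage(["AC-2", "AU-12", "Access Control"]))  # 2 of 3 values match
```

Because `fullmatch` is used, an enhancement ID such as “AC-2(1)” does not count as a hit; a looser pattern would be needed if enhancements should be included.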

Use external knowledge to find correspondences:

  • Thesaurus/dictionary lookup — “safeguard” ≈ “control” ≈ “requirement”
  • Embedding similarity — vector representations of concepts
  • Taxonomic distance — how far apart concepts are in a shared hierarchy

| Algorithm/Tool | Approach | Key Innovation |
| --- | --- | --- |
| COMA++ | Composite matcher, combines multiple strategies | Reuse of previous match results |
| Similarity Flooding | Graph-based propagation of similarity | Similarity “flows” through schema graphs |
| Cupid | Linguistic + structural matching | Tree-based schema comparison |
| AgreementMaker | Extensible framework of matchers | Combines any number of matching strategies |
| LogMap | Large-scale ontology matching | Handles biomedical ontologies with 100K+ concepts |

The Ontology Alignment Evaluation Initiative runs annual competitions where matching systems are evaluated against benchmark datasets. Tracks include:

  • Conference — aligning conference organization ontologies
  • Anatomy — matching adult mouse anatomy to human anatomy (large-scale)
  • Knowledge graphs — aligning cross-lingual knowledge graphs

The OAEI results provide the best objective comparison of matching approaches. They are relevant to Crosswalker because framework matching is a specialized form of the problems OAEI evaluates.

Instead of creating pairwise mappings between every framework combination (N² problem), map each framework to a single pivot ontology:

Framework A ──→ Pivot ←── Framework B
Framework C ──→ Pivot ←── Framework D

With N frameworks and a pivot, you need only N mappings instead of N×(N-1)/2.
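
For example, 10 frameworks need 45 pairwise mappings but only 10 pivot mappings. The sketch below shows the count and one way candidate links between two frameworks could be derived by composing their pivot mappings. All identifiers (`A-1`, `P-10`, and so on) and the function names are hypothetical.

```python
from collections import defaultdict


def pairwise_count(n: int) -> int:
    """Number of direct mappings needed to connect every pair of n frameworks."""
    return n * (n - 1) // 2


def compose_via_pivot(a_to_pivot: dict, b_to_pivot: dict) -> dict:
    """Link each A control to the B controls that share a pivot control with it."""
    pivot_to_b = defaultdict(set)
    for b_id, pivot_ids in b_to_pivot.items():
        for p in pivot_ids:
            pivot_to_b[p].add(b_id)

    links = {}
    for a_id, pivot_ids in a_to_pivot.items():
        linked = set()
        for p in pivot_ids:
            linked |= pivot_to_b.get(p, set())
        links[a_id] = linked
    return links


print(pairwise_count(10))  # 45 pairwise mappings vs 10 pivot mappings

a_to_pivot = {"A-1": {"P-10"}, "A-2": {"P-11", "P-12"}}
b_to_pivot = {"B-7": {"P-10", "P-11"}, "B-9": {"P-13"}}
print(compose_via_pivot(a_to_pivot, b_to_pivot))
# {'A-1': {'B-7'}, 'A-2': {'B-7'}}
```

Links derived this way are best treated as candidates for review: two controls that each overlap the same pivot control do not necessarily overlap each other.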

| Pivot System | Frameworks Covered | Methodology |
| --- | --- | --- |
| SCF (Secure Controls Framework) | 175+ | STRM (Set Theory Relationship Mapping) |
| NIST OLIR | ~50 | Submission-based informative references |
| NIST CSF | ~30 | Community profiles and crosswalks |
| HITRUST CSF | ~50 | Commercial mappings |

SCF’s STRM is the most comprehensive — it uses set theory to define relationships between controls: subset, superset, intersection, and equivalence. This mathematical rigor makes mappings more precise than simple “maps to” relationships.

NIST’s OLIR program uses Set Theory Relationship Mapping with five formal relationship types and 1-10 strength ratings:

| Relationship | Meaning | Example |
| --- | --- | --- |
| Subset | Source is fully contained in target | CSF subcategory is a subset of an 800-53 control |
| Intersects | Partial overlap | CIS safeguard partially covers a CSF subcategory |
| Equal | Exact equivalence | Two frameworks describing the same requirement |
| Superset | Source fully contains target | 800-53 control is broader than a CSF subcategory |
| Not related | No meaningful correspondence | Explicitly documented non-relationships |

This is the domain-specific instantiation of ontology alignment for cybersecurity — and the formal model that Crosswalker’s crosswalk metadata should eventually support.
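
The five relationship types map directly onto set operations. The sketch below assumes each control has already been broken down into a set of atomic requirements, which is a simplification: in practice these relationships are assigned by analyst judgment. The `Relationship` enum, the `classify` function, and the sample requirement strings are illustrative only.

```python
from enum import Enum


class Relationship(Enum):
    SUBSET = "subset of"
    INTERSECTS = "intersects with"
    EQUAL = "equal"
    SUPERSET = "superset of"
    NOT_RELATED = "not related to"


def classify(source: set, target: set) -> Relationship:
    """Classify how the source's requirement set relates to the target's."""
    if not (source & target):
        return Relationship.NOT_RELATED  # nothing in common (or an empty set)
    if source == target:
        return Relationship.EQUAL
    if source < target:
        return Relationship.SUBSET       # source fully contained in target
    if source > target:
        return Relationship.SUPERSET     # source fully contains target
    return Relationship.INTERSECTS       # partial overlap only


# Hypothetical requirement decompositions
csf_item = {"manage identities", "manage credentials"}
sp_control = {"manage identities", "manage credentials", "review accounts"}

print(classify(csf_item, sp_control).value)  # subset of
print(classify(sp_control, csf_item).value)  # superset of
```

Storing the relationship type alongside each crosswalk link, rather than a bare “maps to”, is what would let downstream tooling distinguish full coverage from partial coverage.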

Formal Concept Analysis uses lattice theory to algorithmically identify structural correspondences between concept hierarchies. Applied to frameworks: if two framework hierarchies share similar “formal concepts” (sets of objects sharing properties), FCA can discover mappings without relying on string similarity.
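
To make the idea concrete, the sketch below enumerates the formal concepts of a tiny context by brute force. The objects are controls from two frameworks; the property tags (`accounts`, `review`, `logging`) are invented for illustration and are not taken from either framework. The enumeration is exponential in the number of objects, so it only suits toy inputs.

```python
from itertools import combinations


def formal_concepts(context: dict[str, set[str]]) -> set:
    """Return all (extent, intent) pairs of a small formal context.

    `context` maps each object to the set of attributes it has.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for group in combinations(objects, r):
            # Intent: attributes shared by every object in the group.
            intent = set(all_attrs)
            for obj in group:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = {o for o in objects if intent <= context[o]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


context = {
    "800-53:AC-2": {"accounts", "review"},
    "CSF:PR.AC-1": {"accounts", "review"},
    "800-53:AU-2": {"logging"},
}

for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

A concept whose intent is non-empty and whose extent contains controls from both frameworks, here the one grouping `800-53:AC-2` with `CSF:PR.AC-1`, is a candidate correspondence found without comparing any names.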

When a user imports a CSV, Crosswalker analyzes columns to suggest types (hierarchy, frontmatter, link, body). This is a lightweight schema matching problem:

  • String analysis of header names
  • Instance analysis of column values (unique counts, patterns)
  • Structural hints from column ordering

The config fingerprinting system extends this by comparing new files against previously matched schemas.
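
One plausible way to compare a new file against previously matched schemas is to fingerprint its normalized header set and score the overlap with saved fingerprints. The sketch below is an assumption about how such a comparison could work, not a description of Crosswalker's implementation; `fingerprint`, `best_config`, the config name, and the 0.5 threshold are all hypothetical.

```python
def normalize(header: str) -> str:
    """Lowercase a header and collapse runs of whitespace."""
    return " ".join(header.lower().split())


def fingerprint(headers) -> frozenset:
    """Order-independent fingerprint of a file's column headers."""
    return frozenset(normalize(h) for h in headers)


def best_config(headers, saved: dict, min_score: float = 0.5):
    """Return (config_name, score) for the closest saved fingerprint.

    The name is None when no saved config reaches `min_score`.
    """
    fp = fingerprint(headers)
    best_name, best_score = None, 0.0
    for name, saved_fp in saved.items():
        union = fp | saved_fp
        score = len(fp & saved_fp) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


saved = {"nist-800-53": fingerprint(["Control ID", "Control Family", "Control Name"])}
print(best_config(["control id", "Control  Family", "Title"], saved))
# ('nist-800-53', 0.5)
```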

Framework crosswalking as ontology alignment

When Crosswalker generates links between imported frameworks, it’s performing ontology alignment. The matching methods (exact, array-contains, regex) in the crosswalks system are specialized schema matching strategies.
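
The three matching methods can be pictured as a small dispatch function. The semantics below are an assumption made for illustration (in particular, which side holds the array or the pattern); the `matches` function is not Crosswalker's API.

```python
import re


def matches(method: str, source_value, target_id: str) -> bool:
    """Decide whether a source field value links to a target ID.

    exact          - source value equals the target ID
    array-contains - source value is a list that contains the target ID
    regex          - source value is a pattern that fully matches the target ID
    """
    if method == "exact":
        return source_value == target_id
    if method == "array-contains":
        return isinstance(source_value, (list, tuple, set)) and target_id in source_value
    if method == "regex":
        return re.fullmatch(source_value, target_id) is not None
    raise ValueError(f"unknown matching method: {method}")


print(matches("exact", "AC-2", "AC-2"))                     # True
print(matches("array-contains", ["AC-1", "AC-2"], "AC-2"))  # True
print(matches("regex", r"AC-2(\(\d+\))?", "AC-2(1)"))       # True
print(matches("regex", r"AC-2(\(\d+\))?", "AC-20"))         # False
```

Requiring a real collection for `array-contains` avoids Python's substring behavior, where `"AC-2" in "AC-20"` would be true for a plain string.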

Currently, crosswalk mappings must be provided (from NIST, SCF, or user-defined). A future version could use schema matching algorithms to suggest crosswalk mappings between arbitrary imported frameworks — similar to how COMA++ or AgreementMaker work.

Recent OAEI evaluations show LLM-based systems (MILA, GenOM) achieving F1 scores of 0.83-0.95 on ontology matching benchmarks. This opens the possibility of using LLMs within Crosswalker to suggest crosswalk correspondences between arbitrary imported frameworks — a significant capability leap.

SSSOM (Simple Standard for Sharing Ontology Mappings) represents mappings as TSV/CSV with standardized metadata columns, which is close to the tabular data Crosswalker already works with. Adopting SSSOM as an export format would make Crosswalker’s crosswalk data interoperable with the broader ontology mapping ecosystem.
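
A minimal sketch of such an export is shown below, using a handful of core SSSOM columns (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `confidence`). The CURIE prefixes `CSF:` and `SP800-53:` are placeholders invented for this example, and the metadata block that a complete SSSOM file carries (prefix map, license, and so on) is omitted.

```python
import csv
import io

SSSOM_COLUMNS = [
    "subject_id",
    "predicate_id",
    "object_id",
    "mapping_justification",
    "confidence",
]


def to_sssom_tsv(rows) -> str:
    """Serialize mapping rows (dicts keyed by SSSOM column names) as TSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {
        "subject_id": "CSF:PR.AC-1",         # placeholder prefix
        "predicate_id": "skos:relatedMatch",
        "object_id": "SP800-53:AC-1",        # placeholder prefix
        "mapping_justification": "semapv:ManualMappingCuration",
        "confidence": 0.85,
    }
]

print(to_sssom_tsv(rows))
```

Set-theoretic relationship types could plausibly be expressed through the predicate column, for example `skos:broadMatch` or `skos:narrowMatch`, though the exact choice of predicates would need to be decided.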


  • Rahm & Bernstein, “A survey of approaches to automatic schema matching” (2001) — the foundational survey
  • Euzenat & Shvaiko, “Ontology Matching” (2013) — comprehensive textbook
  • Shvaiko & Euzenat, “Ontology matching: state of the art and future challenges” (2013) — IEEE survey
  • OAEI — Ontology Alignment Evaluation Initiative (annual competition)
  • AgreementMaker — open-source ontology matching framework
  • LogMap — large-scale ontology matching (Oxford)