Schema matching & alignment

Updated Jun 1, 2026

The same broad concept of “finding correspondences between structured datasets” goes by many names depending on the field:

Schema matching / schema alignment — database community (Rahm & Bernstein 2001 is the canonical survey)
Ontology matching / ontology alignment — Semantic Web community (Euzenat & Shvaiko 2013 is the canonical textbook)
Crosswalking — library science, metadata interoperability (Woodley 2008)
Data integration — data warehousing and ETL
Schema mediation / schema mapping — federated databases and AI
Framework mapping — the compliance / GRC practitioner term Crosswalker uses
Synthetic spine / hub-and-spoke mapping / meta-framework — the architectural pattern when each source maps once to a canonical intermediate (see the interlingua / pivot section below)

Within a single field these terms have precise, distinct meanings; the terminology distinctions table below is deliberately pedantic about scope differences. But across fields they’re routinely used interchangeably, and Crosswalker straddles all of them: it’s a database-style importer (schema matching vocabulary) that produces ontology-aligned crosswalk edges (Semantic Web vocabulary) for compliance frameworks (GRC practitioner vocabulary) in a library-science-flavored file-first vault (crosswalking vocabulary). Which term you see in any given KB page depends on which community that page is drawing from.

What is schema matching?

Schema matching finds correspondences between elements of two different schemas. When Crosswalker’s import wizard detects that a column named “Control Family” should be a hierarchy level, it’s performing a simple form of schema matching. When NIST maps CSF subcategories to 800-53 controls, that’s ontology alignment.

Terminology distinctions

These terms are often used interchangeably but have distinct meanings in the literature:

Term	Scope	Output	Example
Schema matching	Find corresponding elements between schemas	Set of element pairs	“Column A maps to Column B”
Schema mapping	Define how to transform data between schemas	Executable transformation	“Split Column A by comma, map each to Column B”
Ontology alignment	Find corresponding concepts between ontologies	Alignment (set of correspondences with confidence)	“CSF PR.AC-1 corresponds to 800-53 AC-1 (0.85 confidence)”
Ontology matching	The process of finding an alignment	An alignment, produced by algorithms + heuristics	OAEI competition entries
Ontology merging	Combine two ontologies into one	Unified ontology	SCF combining 175+ frameworks

Matching strategies

String-based

Compare element names using string similarity metrics:

Edit distance (Levenshtein): “Access Control” ↔ “Access Ctrl” = 3 edits (delete the “o”, “n” and second “o” of “Control”)
Token-based: split names into words and compare the sets, e.g. {Access, Control} ∩ {Access, Management} = {Access}
Prefix/suffix: “AC-2” matches “AC-2(1)” by prefix

This is what Crosswalker’s column analysis does when suggesting column types based on header names.
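
This is a minimal Python sketch of the three string metrics, for illustration only. The function names are hypothetical, and this is not Crosswalker’s column-analysis code. The prefix check assumes enhancements are written as `AC-2(1)`. It deliberately rejects `AC-20`, which a bare `startswith("AC-2")` would accept.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two names."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def prefix_match(base_id: str, candidate: str) -> bool:
    """True if candidate is base_id itself or a parenthesized enhancement of it."""
    return candidate == base_id or candidate.startswith(base_id + "(")


print(levenshtein("Access Control", "Access Ctrl"))                    # 3
print(token_jaccard("Access Control", "Access Management"))            # 0.3333333333333333
print(prefix_match("AC-2", "AC-2(1)"), prefix_match("AC-2", "AC-20"))  # True False
```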

Structure-based

Use the hierarchical structure of schemas:

Graph matching — compare the shape of two taxonomies
Children similarity — if children match, parents likely correspond
Path similarity — “Function > Category > Subcategory” in CSF ≈ “Family > Control > Enhancement” in 800-53
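
The children-similarity idea can be sketched as a Dice-style score: two parents are similar in proportion to how many of their children another matcher has already paired. This is a hypothetical Python illustration with placeholder IDs. Real structural matchers, such as Similarity Flooding, propagate scores iteratively rather than in a single pass.

```python
def children_similarity(children_a, children_b, leaf_matches):
    """Dice-style similarity of two parent nodes based on their matched children.

    children_a, children_b: child IDs under each parent.
    leaf_matches: set of (child_a, child_b) pairs produced by another matcher
    (string- or instance-based); pairs outside these two parents are ignored.
    """
    a_set, b_set = set(children_a), set(children_b)
    if not a_set or not b_set:
        return 0.0
    relevant = [(a, b) for a, b in leaf_matches if a in a_set and b in b_set]
    matched_a = {a for a, _ in relevant}
    matched_b = {b for _, b in relevant}
    return (len(matched_a) + len(matched_b)) / (len(a_set) + len(b_set))


# Placeholder hierarchies: parent X has three children, parent Y has two.
matches = {("x1", "y1"), ("x2", "y2"), ("x9", "y1")}  # ("x9", "y1") lies outside X
print(children_similarity(["x1", "x2", "x3"], ["y1", "y2"], matches))  # 0.8
```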

Instance-based

Compare actual data values:

Value overlap — columns sharing values like “AC-2” are likely related
Distribution similarity — columns with similar statistical profiles
Pattern matching — regex patterns in values (e.g., \w{2}-\d+ for control IDs)

Crosswalker uses this approach in its fingerprinting system for config matching.
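
Here is a short Python sketch of value overlap and pattern matching on column values. It reuses the control-ID regex from the list above. The function names and sample values are illustrative and are not taken from Crosswalker’s fingerprinting code.

```python
import re

CONTROL_ID = re.compile(r"\w{2}-\d+")  # the pattern from the text, e.g. "AC-2"


def value_overlap(col_a, col_b):
    """Jaccard overlap of the distinct non-empty values of two columns."""
    a = {v.strip() for v in col_a if v.strip()}
    b = {v.strip() for v in col_b if v.strip()}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pattern_share(col, pattern=CONTROL_ID):
    """Fraction of non-empty values that match the pattern in full."""
    values = [v.strip() for v in col if v.strip()]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


print(value_overlap(["AC-2", "AC-3", "IA-5"], ["AC-2", "IA-5", "SC-7", "AC-2"]))  # 0.5
print(pattern_share(["AC-2", "AC-2(1)", "Account Management"]))  # 0.3333333333333333
```

`fullmatch` is used so that a value like `PR.AC-1` does not count as a control ID merely because it contains `AC-1`. Use `search` instead if IDs are embedded in longer text.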

Semantic

Use external knowledge to find correspondences:

Thesaurus/dictionary lookup — “safeguard” ≈ “control” ≈ “requirement”
Embedding similarity — vector representations of concepts
Taxonomic distance — how far apart concepts are in a shared hierarchy
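
The three semantic techniques can each be sketched in a few lines of Python. The synonym group comes straight from the example above. The function names, the toy hierarchy and the vectors are hypothetical. Real embeddings would come from a language model, which is not shown here.

```python
import math

# One synonym group, taken from the thesaurus example in the text.
SYNONYMS = [{"safeguard", "control", "requirement"}]


def canonical(term):
    """Replace a term by a fixed representative of its synonym group."""
    t = term.lower()
    for group in SYNONYMS:
        if t in group:
            return min(group)  # alphabetically first member as representative
    return t


def cosine(u, v):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def taxonomic_distance(a, b, parent):
    """Edges between a and b through their lowest common ancestor.

    parent: dict mapping each node to its parent (roots have no entry).
    Returns None when the two nodes share no ancestor.
    """
    def path_to_root(n):
        path = [n]
        while n in parent:
            n = parent[n]
            path.append(n)
        return path

    depth_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for i, n in enumerate(path_to_root(a)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    return None


tree = {"AC-2": "AC", "AC-3": "AC", "IA-5": "IA", "AC": "800-53", "IA": "800-53"}
print(canonical("Safeguard") == canonical("requirement"))  # True
print(cosine([1.0, 0.0], [0.0, 1.0]))                       # 0.0
print(taxonomic_distance("AC-2", "AC-3", tree), taxonomic_distance("AC-2", "IA-5", tree))  # 2 4
```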

Key algorithms and tools

Algorithm/Tool	Approach	Key Innovation
COMA++	Composite matcher, combines multiple strategies	Reuse of previous match results
Similarity Flooding	Graph-based propagation of similarity	Similarity “flows” through schema graphs
Cupid	Linguistic + structural matching	Tree-based schema comparison
AgreementMaker	Extensible framework of matchers	Combines any number of matching strategies
LogMap	Large-scale ontology matching	Handles biomedical ontologies with 100K+ concepts
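
To make the composite idea behind COMA++ and AgreementMaker concrete, here is a hedged Python sketch. It averages several matchers with weights and keeps each source element’s best target above a threshold. It is not either tool’s actual algorithm or API. Python’s `difflib.SequenceMatcher` stands in for a name matcher, and the weights and threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def name_sim(a, b):
    """Character-level similarity from the standard library (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sim(a, b):
    """Jaccard similarity of whitespace-separated words (0..1)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def composite_match(source, target, matchers, threshold=0.5):
    """Combine matchers by weighted average; keep the best target per source element.

    matchers: list of (similarity_function, weight) pairs.
    Returns {source_element: best_target} for scores >= threshold.
    """
    if not target:
        return {}
    total_weight = sum(w for _, w in matchers)
    result = {}
    for s in source:
        scored = [
            (sum(fn(s, t) * w for fn, w in matchers) / total_weight, t)
            for t in target
        ]
        score, best = max(scored)  # ties broken by target name
        if score >= threshold:
            result[s] = best
    return result


matchers = [(name_sim, 1.0), (token_sim, 1.0)]
print(composite_match(["Control Family"], ["Family", "Title"], matchers))
# {'Control Family': 'Family'}  (score 0.55: name 0.6, tokens 0.5)
```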

The OAEI initiative

The Ontology Alignment Evaluation Initiative runs annual competitions where matching systems are evaluated against benchmark datasets. Tracks include:

Conference: aligning conference organization ontologies
Anatomy: matching adult mouse anatomy to human anatomy (large-scale)
Knowledge graphs: matching classes, properties and instances across knowledge graphs extracted from different wikis

The OAEI results provide a standard, objective benchmark for comparing matching approaches. They are relevant to Crosswalker because framework matching is a specialized form of the problems OAEI evaluates.

The interlingua / pivot approach

Also called: spine, hub, canonical intermediate ontology, hub-and-spoke mapping. The compliance-mapping community (SCF, UCF, NIST OLIR) tends to say spine or hub; the schema-matching / ontology-alignment literature says pivot or interlingua. Same pattern.

Instead of creating pairwise mappings between every framework combination (N² problem), map each framework to a single pivot ontology:

Framework A ──→ Pivot ←── Framework B
Framework C ──→ Pivot ←── Framework D

With N frameworks and a pivot, you need only N mappings instead of N×(N-1)/2.
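
The Python sketch below shows two things: the arithmetic, and how a pivot lets you derive an A-to-B crosswalk. The IDs are placeholders. `compose_via_pivot` is a hypothetical helper and is not part of any of the systems in the table that follows.

```python
def pairwise_count(n):
    """Direct crosswalks needed to connect every pair of n frameworks."""
    return n * (n - 1) // 2


def pivot_count(n):
    """Crosswalks needed when each framework maps once to a shared pivot."""
    return n


def compose_via_pivot(a_to_pivot, b_to_pivot):
    """Derive candidate A-to-B links from two framework-to-pivot mappings.

    Both arguments are iterables of (framework_id, pivot_id) pairs; the result
    is the set of (a_id, b_id) pairs that map to at least one common pivot concept.
    """
    b_by_pivot = {}
    for b_id, p in b_to_pivot:
        b_by_pivot.setdefault(p, set()).add(b_id)
    return {(a_id, b_id) for a_id, p in a_to_pivot for b_id in b_by_pivot.get(p, ())}


print(pairwise_count(10), pivot_count(10))  # 45 10
print(pairwise_count(175))                  # 15225
print(sorted(compose_via_pivot(
    [("A-1", "P-1"), ("A-2", "P-2")],
    [("B-7", "P-1"), ("B-8", "P-1"), ("B-9", "P-3")],
)))  # [('A-1', 'B-7'), ('A-1', 'B-8')]
```

Composed links are candidates, not facts. If A-1 and B-7 are each only a subset of the pivot concept P-1, they may not overlap at all. A derived link should therefore carry a weaker relationship than either leg.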

Real-world pivot approaches

Pivot System	Frameworks Covered	Methodology
SCF (Secure Controls Framework)	175+	STRM (Set Theory Relationship Mapping)
NIST OLIR	~50	Submission-based informative references
NIST CSF	~30	Community profiles and crosswalks
HITRUST CSF	~50	Commercial mappings

SCF’s STRM is the most comprehensive — it uses set theory to define relationships between controls: subset, superset, intersection, and equivalence. This mathematical rigor makes mappings more precise than simple “maps to” relationships.

NIST OLIR’s formal relationship types

NIST’s OLIR program uses Set Theory Relationship Mapping with five formal relationship types and 1-10 strength ratings:

Relationship	Meaning	Example
Subset	Source is fully contained in target	CSF subcategory is a subset of an 800-53 control
Intersects	Partial overlap	CIS safeguard partially covers a CSF subcategory
Equal	Exact equivalence	Two frameworks describing the same requirement
Superset	Source fully contains target	800-53 control is broader than a CSF subcategory
Not related	No meaningful correspondence	Explicitly documented non-relationships

This is the domain-specific instantiation of ontology alignment for cybersecurity — and the formal model that Crosswalker’s crosswalk metadata should eventually support.
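
If each control is modeled as a set of requirement “atoms”, the five relationship types reduce to plain set comparisons. This is a minimal Python sketch under that assumption. Decomposing a control into atoms is a hypothetical modeling step; in practice it is an analyst’s judgment. The 1-10 strength rating is not modeled.

```python
def strm_relationship(source, target):
    """Classify source vs target using the set relationships from the table above.

    source, target: collections of requirement atoms (hypothetical labels).
    """
    s, t = set(source), set(target)
    if not s & t:
        return "not related"
    if s == t:
        return "equal"
    if s < t:
        return "subset"      # source fully contained in target
    if s > t:
        return "superset"    # source fully contains target
    return "intersects"      # partial overlap


print(strm_relationship({"review accounts"}, {"review accounts", "disable accounts"}))  # subset
```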

Formal Concept Analysis (FCA)

Formal Concept Analysis uses lattice theory to algorithmically identify structural correspondences between concept hierarchies. Applied to frameworks: if two framework hierarchies share similar “formal concepts” (a maximal set of objects paired with exactly the attributes they all share), FCA can discover mappings without relying on string similarity.
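
The following brute-force Python sketch implements FCA’s core step: enumerating the formal concepts of a tiny context. It shows how two items from different frameworks can fall into the same concept. The placeholder IDs and attribute tags are hypothetical. The enumeration is exponential in the number of attributes, so practical FCA tools use algorithms such as NextClosure instead.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate every (extent, intent) formal concept of a small formal context.

    context: dict mapping each object to the set of attributes it has.
    A concept pairs a set of objects (extent) with exactly the attributes they
    all share (intent), such that no other object has all of those attributes.
    """
    objects = set(context)
    attributes = set().union(*context.values())

    def extent_of(attrs):
        return frozenset(o for o in objects if attrs <= context[o])

    def intent_of(objs):
        shared = set(attributes)
        for o in objs:
            shared &= context[o]
        return frozenset(shared)

    # Closing every attribute subset yields every concept (with repeats).
    concepts = set()
    for k in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), k):
            ext = extent_of(set(attrs))
            concepts.add((ext, intent_of(ext)))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


context = {
    "A.1": {"identity", "access"},  # item from framework A
    "B.1": {"identity", "access"},  # item from framework B
    "B.2": {"access", "remote"},
}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

A.1 and B.1 end up in the same concept. That is the kind of structural signal an FCA-based matcher can turn into a mapping candidate.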

How this applies to Crosswalker

Column detection as schema matching

When a user imports a CSV, Crosswalker analyzes columns to suggest types (hierarchy, frontmatter, link, body). This is a lightweight schema matching problem:

String analysis of header names
Instance analysis of column values (unique counts, patterns)
Structural hints from column ordering
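
Below is a hypothetical Python heuristic that combines the three signals to pick one of the four column types. The keyword list, thresholds and rule order are assumptions for illustration. They are not Crosswalker’s actual rules.

```python
import re

ID_PATTERN = re.compile(r"\w{2}-\d+")  # control-ID shape used elsewhere on this page
HIERARCHY_WORDS = ("family", "function", "category", "group", "level")  # hypothetical


def suggest_column_type(header, values, position):
    """Suggest 'hierarchy', 'link', 'body' or 'frontmatter' for one CSV column.

    header: column name; values: the column's cells as strings;
    position: zero-based column index in the file.
    """
    name = header.lower()
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "frontmatter"
    unique_ratio = len(set(cells)) / len(cells)
    avg_len = sum(len(c) for c in cells) / len(cells)

    # Header name + ordering: an early, grouping-like column whose values repeat.
    if any(w in name for w in HIERARCHY_WORDS) and position < 3 and unique_ratio < 0.5:
        return "hierarchy"
    # Instance analysis: most cells list several control-ID-like tokens.
    multi_id = sum(1 for c in cells if len(ID_PATTERN.findall(c)) > 1)
    if multi_id / len(cells) > 0.5:
        return "link"
    # Long free text reads better in the note body than in frontmatter.
    if avg_len > 80:
        return "body"
    return "frontmatter"


print(suggest_column_type("Control Family", ["Access Control"] * 4 + ["Audit"], 0))  # hierarchy
print(suggest_column_type("Related Controls", ["AC-3, AC-5", "AC-2, IA-2", "SC-7"], 5))  # link
```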

The config fingerprinting system extends this by comparing new files against previously matched schemas.
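
This page does not spell out what a config fingerprint contains. This Python sketch therefore assumes one built from normalized header names, with two parts. An exact, order-insensitive hash handles fast lookup. A Jaccard fallback handles near matches, such as a later revision of the same spreadsheet with a column added. All names and the 0.8 threshold are hypothetical.

```python
import hashlib


def normalize_header(header):
    """Lower-case and collapse whitespace so trivial edits don't change the fingerprint."""
    return " ".join(header.lower().split())


def header_fingerprint(headers):
    """Order-insensitive hash of the normalized header names."""
    joined = "\n".join(sorted(normalize_header(h) for h in headers))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def best_saved_config(headers, saved_configs, min_similarity=0.8):
    """Return (config_name, similarity) of the closest saved config, or None.

    saved_configs: dict mapping a config name to the headers it was created from.
    Similarity is the Jaccard overlap of normalized header sets.
    """
    new = {normalize_header(h) for h in headers}
    best = None
    for name, old_headers in saved_configs.items():
        old = {normalize_header(h) for h in old_headers}
        union = new | old
        sim = len(new & old) / len(union) if union else 0.0
        if sim >= min_similarity and (best is None or sim > best[1]):
            best = (name, sim)
    return best


saved = {
    "800-53-import": ["ID", "Title", "Family", "Discussion", "Related", "Extra"],
    "csf-import": ["Function", "Category", "Subcategory"],
}
print(best_saved_config(["id", "Title", "Family", "Discussion", "Related"], saved))
# ('800-53-import', 0.833...)
```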

Framework crosswalking as ontology alignment

When Crosswalker generates links between imported frameworks, it’s performing ontology alignment. The matching methods (exact, array-contains, regex) in the crosswalks system are specialized schema matching strategies.
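
The three matching methods can be sketched as small predicates, plus a loop that turns them into links. The function names, the comma separator, the regex and the row layout are hypothetical. This illustrates the technique and is not the crosswalks system’s code.

```python
import re


def match_exact(cell, target_id):
    """The cell holds exactly one ID."""
    return cell.strip() == target_id


def match_array_contains(cell, target_id, sep=","):
    """The cell holds a delimited list of IDs."""
    return target_id in {part.strip() for part in cell.split(sep)}


def match_regex(cell, target_id, pattern=r"[A-Z]{2}-\d+"):
    """IDs are embedded in free text; extract whole IDs, then compare exactly."""
    return target_id in re.findall(pattern, cell)


def build_links(rows, target_ids, column, method, **options):
    """Return (source_id, target_id) links produced by one matching method."""
    links = []
    for row in rows:
        cell = row.get(column, "")
        for tid in target_ids:
            if method(cell, tid, **options):
                links.append((row["id"], tid))
    return links


rows = [{"id": "S-1", "refs": "AC-1, AC-2"}, {"id": "S-2", "refs": "see AC-20 and IA-2"}]
targets = ["AC-1", "AC-2", "AC-20", "IA-2"]
print(build_links(rows, targets, "refs", match_array_contains))
# [('S-1', 'AC-1'), ('S-1', 'AC-2')]
print(build_links(rows, targets, "refs", match_regex))
# [('S-1', 'AC-1'), ('S-1', 'AC-2'), ('S-2', 'AC-20'), ('S-2', 'IA-2')]
```

Extracting whole IDs with the regex and then comparing them exactly keeps `AC-2` from matching inside `AC-20`. A plain substring test would get this wrong.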

Future: automated crosswalk generation

Currently, crosswalk mappings must be supplied, either from a published source (NIST, SCF) or defined by the user. A future version could use schema matching algorithms to suggest crosswalk mappings between arbitrary imported frameworks, similar to how COMA++ or AgreementMaker work.

Recent OAEI evaluations show LLM-based systems (MILA, GenOM) achieving F1 scores of 0.83-0.95 on ontology matching benchmarks. This opens the possibility of using LLMs within Crosswalker to suggest crosswalk correspondences between arbitrary imported frameworks — a significant capability leap.
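
F1 scores like those quoted above are the harmonic mean of precision and recall against a reference alignment. This is a minimal Python sketch of that computation. It is simplified to ignore relation types and confidence values, and the pairs are placeholders. The same check could be used to compare suggested crosswalks against an existing NIST or SCF mapping.

```python
def evaluate_alignment(predicted, reference):
    """Precision, recall and F1 of predicted (source, target) pairs vs a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
reference = {("a", "x"), ("b", "y"), ("c", "q")}
p, r, f1 = evaluate_alignment(predicted, reference)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```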

SSSOM (Simple Standard for Sharing Ontology Mappings) represents mappings as TSV/CSV with standardized metadata columns, essentially the same format Crosswalker already works with. Adopting SSSOM as an export format would make Crosswalker’s crosswalk data interoperable with the broader ontology mapping ecosystem.
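
Here is a hedged Python sketch of such an export. `subject_id`, `predicate_id`, `object_id` and `mapping_justification` are SSSOM’s required columns, and `semapv:ManualMappingCuration` is a SEMAPV justification term. The mapping from the OLIR/STRM relationship types to SKOS predicates is this sketch’s own assumption, and the CURIE prefixes are placeholders. A complete SSSOM file also carries a YAML metadata block, including the `curie_map`, which is omitted here.

```python
import csv
import io

# Assumed correspondence from the OLIR/STRM relationship types to SKOS mapping
# predicates. "not related" has no positive SKOS counterpart and is skipped.
STRM_TO_SKOS = {
    "equal": "skos:exactMatch",
    "subset": "skos:broadMatch",     # the object (target) is broader than the subject
    "superset": "skos:narrowMatch",  # the object (target) is narrower than the subject
    "intersects": "skos:relatedMatch",
}


def to_sssom_tsv(crosswalk):
    """Write (subject_id, relationship, object_id) triples as an SSSOM-style TSV table."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["subject_id", "predicate_id", "object_id", "mapping_justification"])
    for subject, relationship, obj in crosswalk:
        predicate = STRM_TO_SKOS.get(relationship)
        if predicate is None:
            continue  # e.g. "not related"
        # Assumes every row comes from human-curated mappings (NIST, SCF or the user).
        writer.writerow([subject, predicate, obj, "semapv:ManualMappingCuration"])
    return out.getvalue()


print(to_sssom_tsv([
    ("fwa:A-1", "subset", "fwb:B-1"),
    ("fwa:A-2", "not related", "fwb:B-2"),
]))
```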

Resources

Surveys and foundational papers

Rahm & Bernstein, “A survey of approaches to automatic schema matching” (2001) — the foundational survey
Euzenat & Shvaiko, “Ontology Matching” (2013) — comprehensive textbook
Shvaiko & Euzenat, “Ontology matching: state of the art and future challenges” (2013) — IEEE survey

Tools and initiatives

OAEI — Ontology Alignment Evaluation Initiative (annual competition)
AgreementMaker — open-source ontology matching framework
LogMap — large-scale ontology matching (Oxford)

Framework-specific mapping tools

SCF STRM — Set Theory Relationship Mapping methodology
NIST OLIR Catalog — official framework crosswalks
NIST CPRT — interactive framework reference tool
ATT&CK Mappings Explorer — cross-framework technique mappings
Apptega — commercial crosswalking platform