Schema matching & alignment
What is schema matching?
Schema matching finds correspondences between elements of two different schemas. When Crosswalker’s import wizard detects that a column named “Control Family” should be a hierarchy level, it’s performing a simple form of schema matching. When NIST maps CSF subcategories to 800-53 controls, that’s ontology alignment.
Terminology distinctions
These terms are often used interchangeably but have distinct meanings in the literature:
| Term | Scope | Output | Example |
|---|---|---|---|
| Schema matching | Find corresponding elements between schemas | Set of element pairs | “Column A maps to Column B” |
| Schema mapping | Define how to transform data between schemas | Executable transformation | “Split Column A by comma, map each to Column B” |
| Ontology alignment | Find corresponding concepts between ontologies | Alignment (set of correspondences with confidence) | “CSF PR.AC-1 corresponds to 800-53 AC-1 (0.85 confidence)” |
| Ontology matching | The process of finding an alignment | Algorithm + heuristics | OAEI competition entries |
| Ontology merging | Combine two ontologies into one | Unified ontology | SCF combining 175+ frameworks |
Matching strategies
String-based
Compare element names using string similarity metrics:
- Edit distance (Levenshtein) — “Access Control” ↔ “Access Ctrl” = 3 edits (delete “o”, “n”, “o”)
- Token-based — split into words, compare sets: {Access, Control} ∩ {Access, Management} = {Access}
- Prefix/suffix — “AC-2” matches “AC-2(1)” by prefix
This is what Crosswalker’s column analysis does when suggesting column types based on header names.
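The three string metrics above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Crosswalker’s actual column analysis; the function names (`levenshtein`, `token_jaccard`, `shares_prefix`) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def token_jaccard(a: str, b: str) -> float:
    """Share of distinct lowercase words the two names have in common."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def shares_prefix(short_id: str, long_id: str) -> bool:
    """True when long_id starts with short_id, e.g. a control and its enhancement."""
    return long_id.startswith(short_id)


print(levenshtein("Access Control", "Access Ctrl"))
print(token_jaccard("Access Control", "Access Management"))
print(shares_prefix("AC-2", "AC-2(1)"))
```

A bare prefix test also accepts “AC-20” as a match for “AC-2”, so a practical matcher would probably need to inspect the character that follows the prefix.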
Structure-based
Use the hierarchical structure of schemas:
- Graph matching — compare the shape of two taxonomies
- Children similarity — if children match, parents likely correspond
- Path similarity — “Function > Category > Subcategory” in CSF ≈ “Family > Control > Enhancement” in 800-53
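Children similarity is the easiest of these to illustrate. The sketch below scores each pair of parent nodes by the Jaccard overlap of their child labels. The two toy taxonomies and the function names are made up for illustration, and a real matcher would compare children with a fuzzy metric rather than exact label equality.

```python
from __future__ import annotations


def children_similarity(children_a: set[str], children_b: set[str]) -> float:
    """Jaccard overlap of two parents' child labels (0.0 when either side is empty)."""
    if not children_a or not children_b:
        return 0.0
    return len(children_a & children_b) / len(children_a | children_b)


def best_parent_matches(
    tax_a: dict[str, set[str]], tax_b: dict[str, set[str]]
) -> dict[str, tuple[str, float]]:
    """For each parent in tax_a, pick the tax_b parent with the most similar children."""
    matches = {}
    for parent_a, kids_a in tax_a.items():
        scored = [
            (children_similarity(kids_a, kids_b), parent_b)
            for parent_b, kids_b in tax_b.items()
        ]
        if not scored:
            continue
        score, parent_b = max(scored)
        if score > 0:
            matches[parent_a] = (parent_b, score)
    return matches


# Hypothetical taxonomies: parent label -> set of child labels.
TAXONOMY_A = {
    "Access Control": {"account management", "least privilege", "remote access"},
    "Audit": {"audit logging", "log review"},
}
TAXONOMY_B = {
    "Identity and Access": {"account management", "remote access", "mfa"},
    "Logging": {"audit logging", "log retention"},
}

print(best_parent_matches(TAXONOMY_A, TAXONOMY_B))
```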
Instance-based
Compare actual data values:
- Value overlap — columns sharing values like “AC-2” are likely related
- Distribution similarity — columns with similar statistical profiles
- Pattern matching — regex patterns in values (e.g., `\w{2}-\d+` for control IDs)
Crosswalker uses this approach in its fingerprinting system for config matching.
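As a rough sketch of two of these instance-based signals, the snippet below computes value overlap between two columns and the share of values matching the control-ID pattern. The function names and the choice to normalize overlap by the smaller column are assumptions for illustration, not a description of the fingerprinting system.

```python
import re

# Control-ID pattern quoted in the text above.
CONTROL_ID = re.compile(r"\w{2}-\d+")


def value_overlap(col_a, col_b) -> float:
    """Fraction of the smaller column's distinct values that also occur in the other."""
    a, b = set(col_a), set(col_b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def pattern_share(values, pattern=CONTROL_ID) -> float:
    """Fraction of values that match the pattern in full."""
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for v in values if pattern.fullmatch(v))
    return hits / len(values)


print(value_overlap(["AC-1", "AC-2", "AC-3"], ["AC-2", "AC-3", "AU-1", "AU-2"]))
print(pattern_share(["AC-1", "AC-2", "Access Control"]))
```

Using `fullmatch` means an enhancement ID such as “AC-2(1)” does not count as a hit; whether that is desirable depends on the framework being imported.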
Semantic
Use external knowledge to find correspondences:
- Thesaurus/dictionary lookup — “safeguard” ≈ “control” ≈ “requirement”
- Embedding similarity — vector representations of concepts
- Taxonomic distance — how far apart concepts are in a shared hierarchy
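Two of these semantic signals can be sketched without any external library. The synonym group below reuses the example terms from the list, and the cosine function works on whatever vectors an embedding model would supply. The helper names are hypothetical, and no particular thesaurus or embedding model is implied.

```python
import math

# Hand-written synonym group taken from the example above; a real system
# would load these from a thesaurus or domain glossary.
SYNONYM_GROUPS = [{"safeguard", "control", "requirement"}]


def same_concept(term_a: str, term_b: str) -> bool:
    """True when both terms are identical or listed in the same synonym group."""
    a, b = term_a.lower(), term_b.lower()
    if a == b:
        return True
    return any(a in group and b in group for group in SYNONYM_GROUPS)


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


print(same_concept("Safeguard", "control"))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```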
Key algorithms and tools
| Algorithm/Tool | Approach | Key Innovation |
|---|---|---|
| COMA++ | Composite matcher, combines multiple strategies | Reuse of previous match results |
| Similarity Flooding | Graph-based propagation of similarity | Similarity “flows” through schema graphs |
| Cupid | Linguistic + structural matching | Tree-based schema comparison |
| AgreementMaker | Extensible framework of matchers | Combines any number of matching strategies |
| LogMap | Large-scale ontology matching | Handles biomedical ontologies with 100K+ concepts |
The OAEI initiative
The Ontology Alignment Evaluation Initiative runs annual competitions where matching systems are evaluated against benchmark datasets. Tracks include:
- Conference — aligning conference organization ontologies
- Anatomy — matching adult mouse anatomy to human anatomy (large-scale)
- Knowledge graphs — aligning cross-lingual knowledge graphs
The OAEI results provide the most systematic public comparison of matching approaches. They are relevant to Crosswalker because framework matching is a specialized form of the problems OAEI evaluates.
The interlingua / pivot approach
Instead of creating pairwise mappings between every framework combination (an N² problem), map each framework to a single pivot ontology.
With N frameworks and a pivot, you need only N mappings instead of N×(N-1)/2.
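The arithmetic, and the way a pivot yields indirect crosswalks, can be sketched as follows. The pivot identifiers (`PIVOT-IAM-01`, `PIVOT-LOG-01`) and the mapping tables are invented for illustration; only the counting formula comes from the text.

```python
from __future__ import annotations


def pairwise_mapping_count(n: int) -> int:
    """Number of framework pairs when every framework is mapped to every other."""
    return n * (n - 1) // 2


def compose_via_pivot(
    a_to_pivot: dict[str, set[str]], b_to_pivot: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Derive candidate A -> B links from two framework-to-pivot mappings."""
    # Invert B's mapping so pivot concepts can be looked up directly.
    pivot_to_b: dict[str, set[str]] = {}
    for b_id, pivots in b_to_pivot.items():
        for pivot_id in pivots:
            pivot_to_b.setdefault(pivot_id, set()).add(b_id)

    candidates: dict[str, set[str]] = {}
    for a_id, pivots in a_to_pivot.items():
        targets: set[str] = set()
        for pivot_id in pivots:
            targets |= pivot_to_b.get(pivot_id, set())
        if targets:
            candidates[a_id] = targets
    return candidates


# Hypothetical mappings of two frameworks onto made-up pivot concepts.
CSF_TO_PIVOT = {"PR.AC-1": {"PIVOT-IAM-01"}}
SP800_53_TO_PIVOT = {"AC-1": {"PIVOT-IAM-01"}, "AU-2": {"PIVOT-LOG-01"}}

print(pairwise_mapping_count(10))  # pairs needed without a pivot, versus 10 with one
print(compose_via_pivot(CSF_TO_PIVOT, SP800_53_TO_PIVOT))
```

Links derived this way are only candidates: two controls that both touch the same pivot concept may still differ in scope, so composed mappings usually need review.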
Real-world pivot approaches
| Pivot System | Frameworks Covered | Methodology |
|---|---|---|
| SCF (Secure Controls Framework) | 175+ | STRM (Set Theory Relationship Mapping) |
| NIST OLIR | ~50 | Submission-based informative references |
| NIST CSF | ~30 | Community profiles and crosswalks |
| HITRUST CSF | ~50 | Commercial mappings |
SCF’s STRM has the broadest coverage of the four. It uses set theory to define relationships between controls: subset, superset, intersection, and equivalence. These typed relationships make mappings more precise than a bare “maps to” link.
NIST OLIR’s formal relationship types
NIST’s OLIR program uses Set Theory Relationship Mapping with five formal relationship types and 1-10 strength ratings:
| Relationship | Meaning | Example |
|---|---|---|
| Subset | Source is fully contained in target | CSF subcategory is a subset of an 800-53 control |
| Intersects | Partial overlap | CIS safeguard partially covers a CSF subcategory |
| Equal | Exact equivalence | Two frameworks describing the same requirement |
| Superset | Source fully contains target | 800-53 control is broader than a CSF subcategory |
| Not related | No meaningful correspondence | Explicitly documented non-relationships |
This is the domain-specific instantiation of ontology alignment for cybersecurity — and the formal model that Crosswalker’s crosswalk metadata should eventually support.
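To make the set-theoretic reading concrete, the sketch below treats each control as a set of atomic requirements and derives the relationship type from plain set operations. This decomposition is an assumption made for illustration: in practice analysts assign these relationship types by judgment, and the function name and sample sets are hypothetical.

```python
def strm_relationship(source: set, target: set) -> str:
    """Classify how the source's requirement set relates to the target's."""
    if not (source & target):
        return "not related"   # nothing in common (also covers empty sets)
    if source == target:
        return "equal"
    if source < target:
        return "subset"        # source fully contained in target
    if source > target:
        return "superset"      # source fully contains target
    return "intersects"        # partial overlap only


# Made-up requirement atoms, purely for illustration.
narrow = {"issue credentials", "revoke credentials"}
broad = {"issue credentials", "revoke credentials", "review accounts"}
other = {"revoke credentials", "log events"}

print(strm_relationship(narrow, broad))
print(strm_relationship(broad, narrow))
print(strm_relationship(narrow, other))
```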
Formal Concept Analysis (FCA)
Formal Concept Analysis uses lattice theory to algorithmically identify structural correspondences between concept hierarchies. Applied to frameworks: if two framework hierarchies share similar “formal concepts” (sets of objects sharing properties), FCA can discover mappings without relying on string similarity.
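A toy example may help. The sketch below enumerates the formal concepts of a small context by brute force, where each concept pairs a set of objects (the extent) with the attributes they all share (the intent). The attribute assignments are illustrative rather than taken from any published mapping, and the exponential enumeration is only suitable for tiny inputs.

```python
from __future__ import annotations

from itertools import combinations


def formal_concepts(context: dict[str, set[str]]) -> set[tuple[frozenset, frozenset]]:
    """Return every (extent, intent) pair of a formal context (object -> attributes)."""
    objects = list(context)
    all_attrs = set().union(*context.values())

    def intent(objs) -> set[str]:
        # Attributes shared by every object in objs (all attributes if objs is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return shared

    def extent(attrs: set[str]) -> set[str]:
        # Objects that carry every attribute in attrs.
        return {obj for obj in objects if attrs <= context[obj]}

    concepts = set()
    for size in range(len(objects) + 1):
        for combo in combinations(objects, size):
            attrs = intent(combo)
            concepts.add((frozenset(extent(attrs)), frozenset(attrs)))
    return concepts


# Controls from two frameworks described by hypothetical shared properties.
CONTEXT = {
    "800-53 AC-2": {"access", "accounts"},
    "CSF PR.AC-1": {"access", "accounts"},
    "800-53 AU-2": {"logging"},
}

for ext, att in formal_concepts(CONTEXT):
    print(sorted(ext), sorted(att))
```

Controls from different frameworks that land in the same extent, as the two access-related entries do here, become candidate correspondences even though their identifiers share no string similarity.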
How this applies to Crosswalker
Column detection as schema matching
When a user imports a CSV, Crosswalker analyzes columns to suggest types (hierarchy, frontmatter, link, body). This is a lightweight schema matching problem:
- String analysis of header names
- Instance analysis of column values (unique counts, patterns)
- Structural hints from column ordering
The config fingerprinting system extends this by comparing new files against previously matched schemas.
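The sketch below shows how header and value signals might combine into a type suggestion. Every rule, threshold, and keyword here is a guess made for illustration; none of it is taken from Crosswalker’s implementation, and only the four type names come from the text.

```python
import re

ID_PATTERN = re.compile(r"\w{2}-\d+")                 # control-ID shape
HIERARCHY_HINTS = ("family", "category", "function")  # hypothetical header keywords


def suggest_column_type(header: str, values: list[str]) -> str:
    """Suggest 'hierarchy', 'link', 'body', or 'frontmatter' for one CSV column."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "frontmatter"

    name = header.lower()
    unique_ratio = len(set(non_empty)) / len(non_empty)
    avg_length = sum(len(v) for v in non_empty) / len(non_empty)
    id_share = sum(1 for v in non_empty if ID_PATTERN.search(v)) / len(non_empty)

    # Few distinct values under a grouping-style header: likely a hierarchy level.
    if any(hint in name for hint in HIERARCHY_HINTS) and unique_ratio < 0.5:
        return "hierarchy"
    # Mostly control IDs under a "related ..." header: likely links to other notes.
    if "related" in name and id_share > 0.5:
        return "link"
    # Long free text: likely the note body.
    if avg_length > 80:
        return "body"
    return "frontmatter"


print(suggest_column_type("Control Family", ["Access Control"] * 3 + ["Audit"] * 3))
print(suggest_column_type("Related Controls", ["AC-3, AC-5", "AU-2"]))
```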
Framework crosswalking as ontology alignment
When Crosswalker generates links between imported frameworks, it’s performing ontology alignment. The matching methods (exact, array-contains, regex) in the crosswalks system are specialized schema matching strategies.
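One plausible reading of the three matching methods is sketched below. The function names, signatures, and exact semantics are assumptions made for illustration and may differ from how the crosswalks system defines them.

```python
import re


def match_exact(source_value: str, target_id: str) -> bool:
    """The whole cell, ignoring surrounding whitespace, equals the target ID."""
    return source_value.strip() == target_id


def match_array_contains(source_values: list[str], target_id: str) -> bool:
    """The target ID appears as one element of a multi-valued cell."""
    return target_id in (value.strip() for value in source_values)


def match_regex(source_value: str, pattern: str, target_id: str) -> bool:
    """The target ID is among the substrings the pattern extracts from free text."""
    return target_id in re.findall(pattern, source_value)


print(match_exact("AC-2 ", "AC-2"))
print(match_array_contains("AC-2, AC-3".split(","), "AC-3"))
print(match_regex("See AC-2 and AU-6 for details", r"[A-Z]{2}-\d+", "AU-6"))
```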
Future: automated crosswalk generation
Currently, crosswalk mappings must be provided (from NIST, SCF, or user-defined). A future version could use schema matching algorithms to suggest crosswalk mappings between arbitrary imported frameworks — similar to how COMA++ or AgreementMaker work.
Recent OAEI evaluations show LLM-based systems (MILA, GenOM) achieving F1 scores of 0.83-0.95 on ontology matching benchmarks. This opens the possibility of using LLMs within Crosswalker to suggest crosswalk correspondences between arbitrary imported frameworks — a significant capability leap.
SSSOM: a standard for sharing mappings
SSSOM (Simple Standard for Sharing Ontological Mappings) represents mappings as TSV/CSV with standardized metadata columns, essentially the same format Crosswalker already works with. Adopting SSSOM as an export format would make Crosswalker’s crosswalk data interoperable with the broader ontology mapping ecosystem.
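A minimal sketch of such an export is shown below, using a handful of SSSOM column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `confidence`). The CURIE prefixes `CSF:` and `SP800-53:` are invented for this example, the sample row reuses the illustrative correspondence from the terminology table, and the commented metadata header that SSSOM files normally carry is omitted.

```python
import csv
import io

SSSOM_COLUMNS = [
    "subject_id",
    "predicate_id",
    "object_id",
    "mapping_justification",
    "confidence",
]


def to_sssom_tsv(rows: list[dict]) -> str:
    """Serialize mapping rows as tab-separated text with an SSSOM-style header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative row; the prefixes are hypothetical and would need a prefix map.
EXAMPLE_ROWS = [
    {
        "subject_id": "CSF:PR.AC-1",
        "predicate_id": "skos:relatedMatch",
        "object_id": "SP800-53:AC-1",
        "mapping_justification": "semapv:ManualMappingCuration",
        "confidence": 0.85,
    }
]

print(to_sssom_tsv(EXAMPLE_ROWS))
```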
Resources
Surveys and foundational papers
- Rahm & Bernstein, “A survey of approaches to automatic schema matching” (2001) — the foundational survey
- Euzenat & Shvaiko, “Ontology Matching” (2013) — comprehensive textbook
- Shvaiko & Euzenat, “Ontology matching: state of the art and future challenges” (2013) — IEEE survey
Tools and initiatives
- OAEI — Ontology Alignment Evaluation Initiative (annual competition)
- AgreementMaker — open-source ontology matching framework
- LogMap — large-scale ontology matching (Oxford)
Framework-specific mapping tools
- SCF STRM — Set Theory Relationship Mapping methodology
- NIST OLIR Catalog — official framework crosswalks
- NIST CPRT — interactive framework reference tool
- ATT&CK Mappings Explorer — cross-framework technique mappings
- Apptega — commercial crosswalking platform