Dependency-aware synthetic tabular data better preserves feature relationships

Research area: Computer Science, Artificial Intelligence, Synthetic data

What the study found

The Hierarchical Feature Generation Framework (HFGF) improved the preservation of functional dependencies and logical dependencies in synthetic tabular data. According to the abstract, this in turn improved structural fidelity and downstream utility across the six generative models tested.

Why the authors say this matters

The authors say synthetic tabular data is often used in privacy-sensitive areas such as health care, but existing models may fail to keep important relationships between attributes. The study suggests that better preservation of these dependencies matters because they are part of the data's structure and usefulness.

What the researchers tested

The researchers proposed HFGF, a framework that first generates independent features using a standard generative model and then reconstructs dependent features using predefined functional dependency and logical dependency rules. They created benchmark datasets with known dependencies and tested the framework on four datasets that differ in size, feature imbalance, and dependency complexity.
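The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rules, column names, and the random sampler standing in for the generative model are all hypothetical, and the abstract does not define logical dependencies, so the sketch assumes they restrict which values of one categorical feature may co-occur with a value of another.

```python
import random

# Hypothetical rules, invented for illustration (not taken from the paper).
# Functional dependency: city -> country (each city has exactly one country).
CITY_TO_COUNTRY = {"Berlin": "DE", "Munich": "DE", "Paris": "FR"}

# Assumed form of a logical dependency: the allowed values of
# "pregnant" depend on the value of "sex".
ALLOWED_PREGNANT = {"F": ["yes", "no"], "M": ["no"]}


def generate_independent(n, rng):
    """Stand-in for a generative model such as CTGAN or TVAE.

    Only the independent features (city, sex) are sampled here.
    """
    cities = list(CITY_TO_COUNTRY)
    return [
        {"city": rng.choice(cities), "sex": rng.choice(["F", "M"])}
        for _ in range(n)
    ]


def reconstruct_dependent(rows, rng):
    """Fill in the dependent features from the predefined rules."""
    completed = []
    for row in rows:
        new_row = dict(row)
        # Functional dependency: deterministic lookup.
        new_row["country"] = CITY_TO_COUNTRY[row["city"]]
        # Logical dependency: sample only from the allowed values.
        new_row["pregnant"] = rng.choice(ALLOWED_PREGNANT[row["sex"]])
        completed.append(new_row)
    return completed


rng = random.Random(0)
synthetic = reconstruct_dependent(generate_independent(5, rng), rng)
for row in synthetic:
    print(row)
```

Because the dependent columns are derived from the rules rather than sampled jointly, every row in this sketch satisfies both rules by construction, whatever the stage-one sampler produces.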

What worked and what didn't

HFGF improved preservation of functional dependencies and logical dependencies across six generative models, including CTGAN, TVAE, and GReaT. The abstract does not report which models or dataset settings performed best or worst, only that the framework improved results overall in the experiments described.
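The abstract does not say how dependency preservation was scored. One simple way to check a functional dependency in a table, offered here only as an assumption and not as the paper's metric, is to count the rows whose determinant value maps to more than one dependent value. The function name and example columns below are hypothetical.

```python
from collections import defaultdict


def fd_violation_rate(rows, determinant, dependent):
    """Return the fraction of rows that break determinant -> dependent.

    A row counts as violating when its determinant value appears with
    more than one distinct dependent value anywhere in the table.
    """
    if not rows:
        return 0.0
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    violating_keys = {key for key, values in seen.items() if len(values) > 1}
    violating_rows = sum(row[determinant] in violating_keys for row in rows)
    return violating_rows / len(rows)


consistent = [
    {"city": "Berlin", "country": "DE"},
    {"city": "Paris", "country": "FR"},
    {"city": "Berlin", "country": "DE"},
]
# Adding a row where Paris maps to a second country breaks the dependency.
broken = consistent + [{"city": "Paris", "country": "DE"}]

print(fd_violation_rate(consistent, "city", "country"))
print(fd_violation_rate(broken, "city", "country"))
```

A rate of zero means the dependency holds across the whole table; comparing the rate on real and synthetic data gives a rough sense of how well a generator preserves it.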

What to keep in mind

This summary draws only on the abstract, so it does not include detailed numerical results, specific limitations, or failure cases. The evaluation used benchmark datasets with known dependencies, so the findings apply to that setting and may not generalize beyond it.

Key points

HFGF improved preservation of functional dependencies and logical dependencies in synthetic tabular data.
The abstract says the framework also improved structural fidelity and downstream utility.
The method generates independent features first, then reconstructs dependent features using predefined rules.
Experiments used four benchmark datasets with known dependencies.
The framework was tested across six generative models, including CTGAN, TVAE, and GReaT.

Disclosure

Research title: Dependency-aware synthetic tabular data better preserves feature relationships
Authors: Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer
Publication date: 2026-04-22
DOI: 10.1016/j.patcog.2026.113819

AI provenance: This post was generated by OpenAI. The original authors did not write or review this post.
