Navigating dataset documentations in AI: A large-scale analysis of dataset cards on Hugging Face. arXiv, abs/2401.13822
5 Pith papers cite this work.
Citing papers by year: 2026 (5)
Citing papers explorer
- Ethical and Technical Limits of Deepfake Speech Datasets
  Audit of 39 deepfake speech datasets shows that most lack demographic metadata, making fairness checks infeasible, and reveals substantial overlap in bona fide sources that undermines cross-dataset generalization claims.
- ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery
  ArtifactLinker frames SOTA discovery as missing-link prediction on an artifact graph of models and datasets, with a two-stage ranking-plus-verification pipeline and a new benchmark of 14k artifacts.
- AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation
  AdaQE-CG uses context-aware adaptive query expansion and inter-card knowledge transfer from a MetaGAI Pool to generate higher-quality model and data cards than prior methods, validated on the new expert-annotated MetaGAI-Bench.
- FAIR^2 Drones: An AI-Ready Standard for Cross-Domain Wildlife Drone Datasets
  FAIR^2 Drones is a proposed standard that adds platform metadata and annotation specifications to existing FAIR and AI-ready frameworks so that wildlife drone datasets can simultaneously support ecological analysis, robotics development, and computer vision benchmarking.