Less Is More? When Dataset Context Hurts LLM-Generated Dataset Descriptions

Arunav Das; Elena Simperl; Johanna Walker; Klaus Diepold; Lisa-Yao Gan

arxiv: 2606.02334 · v1 · pith:LG54SK4Anew · submitted 2026-06-01 · 💻 cs.DB

Less Is More? When Dataset Context Hurts LLM-Generated Dataset Descriptions

Lisa-Yao Gan , Arunav Das , Johanna Walker , Klaus Diepold , Elena Simperl This is my paper

Pith reviewed 2026-06-28 12:00 UTC · model grok-4.3

classification 💻 cs.DB

keywords dataset descriptionsLLM metadata generationschema penaltydata qualityablation studyLLM-as-a-judgedata.europa.eu

0 comments

The pith

Table schemas degrade narrative quality in LLM-generated dataset descriptions while representative data offers only partial grounding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how different levels of dataset context affect the quality of natural language descriptions produced by LLMs for data search and reuse. It runs a large ablation study across 252 datasets from data.europa.eu, comparing descriptions generated from titles alone, titles plus table schemas, and titles plus schemas plus representative data. The central finding is a consistent schema penalty: schemas tend to lower narrative quality, while adding sample data improves factual grounding but does not raise overall human-facing description quality. Different LLMs also display stable descriptive personas across the tasks. These results supply concrete guidance on context choices for automated metadata workflows.

Core claim

The authors establish that LLMs exhibit a consistent schema penalty in dataset description generation: providing table schemas alone often degrades narrative quality compared with title-only inputs, while adding representative data partially restores grounding and factual accuracy without improving overall human-facing quality metrics.

What carries the argument

The three-way ablation of input context (titles; titles plus schema; titles plus schema plus representative data) evaluated by an LLM-as-a-judge framework and semantic descriptive attribute analysis.

If this is right

Title-only prompts can produce higher narrative quality than prompts that also include table schemas.
Representative data samples improve grounding and factual correctness but leave overall description quality unchanged.
Different LLMs maintain stable descriptive personas when generating dataset metadata.
Data publishing pipelines can reduce context to titles when narrative flow matters more than precise column details.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Workflows that generate descriptions at scale may achieve better results by defaulting to minimal context rather than maximal context.
The observed schema penalty raises the question of whether schema information should be filtered or summarized before being passed to the model.
Stable LLM personas suggest that model choice can be treated as a fixed variable when tuning description generation pipelines.

Load-bearing premise

The LLM-as-a-judge framework together with the semantic attribute analysis correctly measures human-facing description quality.

What would settle it

A controlled human rating study on the same set of generated descriptions that finds no schema penalty or finds that schemas improve quality.

Figures

Figures reproduced from arXiv: 2606.02334 by Arunav Das, Elena Simperl, Johanna Walker, Klaus Diepold, Lisa-Yao Gan.

**Figure 1.** Figure 1: Overview of the experimental methodology and ablation design. We construct progressively richer dataset representations from open data catalogues [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

**Figure 2.** Figure 2: Prompt structure used across all experiments. The only variation across [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Characteristic-level quality changes relative to the Title-only base [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 6.** Figure 6: Excerpted LLaMA-3 descriptions for the dataset “Financial Capability [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 5.** Figure 5: Number of unique descriptive sub-attributes discovered per model [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

Dataset search and reuse are strongly constrained by the quality of metadata such as natural language descriptions, which are often sparse or inconsistent. Although large language models (LLMs) can generate such descriptions automatically, little empirical guidance exists on what makes a good dataset description and what dataset context LLMs actually need. We study these questions through a literature-grounded framework of dataset description quality and a large-scale ablation study using 252 datasets (1,336 CSV files) from the European data portal data.europa.eu. We generate descriptions with LLMs in a baseline scenario and two ablation scenarios: (1) using only dataset titles, (2) titles and schema, and (3) titles, schema and representative data, and evaluate them with an LLM-as-a- judge framework and a semantic descriptive attribute analysis grounded in our quality dimensions. Our results reveal a consis- tent schema penalty: table-schemas alone often degrade narrative quality, while representative data partially restores grounding without improving overall human-facing quality. We further show that different LLMs exhibit stable descriptive personas. These findings provide practical guidance for LLM-supported data publishing workflows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Schema context degrades LLM dataset descriptions in their tests, but the LLM-as-judge and semantic metrics lack any shown link to actual human judgments.

read the letter

The main thing here is the schema penalty finding: on 252 datasets, adding table schemas to title-only prompts lowered narrative quality per their measures, while sample data helped grounding but not overall quality. Different LLMs showed stable personas across runs.

What they did is run a clean ablation across three prompt conditions on real European open data, grounded in a literature-derived quality framework. The scale is reasonable for this kind of metadata study, and the practical angle for data portals is clear. They also avoid overclaiming by noting the partial restoration effect from data samples.

The soft spot is the evaluation. Everything rests on LLM-as-a-judge scores plus semantic attribute counts, yet the abstract and methods description give no correlation or agreement numbers against human raters on the same quality dimensions. If those proxies do not track human perception, the penalty result does not directly support guidance for human-facing descriptions. The abstract also withholds the actual score deltas and any error analysis, which makes it hard to judge how large or consistent the effect really is.

This is for people building automated metadata tools or running LLM pipelines for data catalogs. It is incremental rather than foundational, but the ablation is straightforward enough that a referee could check the numbers and the proxy validation in one pass.

I would send it to review with a note to add human correlation data or justify the proxies more explicitly.

Referee Report

2 major / 1 minor

Summary. The paper conducts a large-scale ablation study on 252 datasets (1,336 CSV files) from data.europa.eu to examine how different input contexts (titles alone, titles+schema, titles+schema+representative data) affect LLM-generated dataset descriptions. It defines a literature-grounded quality framework, evaluates outputs via an LLM-as-a-judge framework and semantic descriptive attribute analysis, and reports a consistent 'schema penalty' that degrades narrative quality, with representative data partially restoring grounding but not overall human-facing quality; it also notes stable descriptive personas across LLMs.

Significance. The scale of the ablation (252 datasets) and the identification of stable LLM personas across models are clear strengths that could support reproducible follow-up work if the evaluation pipeline is made fully transparent. If the LLM-as-a-judge and semantic-attribute proxies are shown to track human judgments, the schema-penalty finding would supply actionable guidance for data-publishing workflows. At present the significance is constrained by the absence of any reported validation of those proxies against human raters on the same quality dimensions.

major comments (2)

[Abstract and evaluation paragraph] Abstract and evaluation paragraph: the central claim is framed in terms of 'human-facing quality' and 'practical guidance for LLM-supported data publishing workflows,' yet the LLM-as-a-judge framework and semantic descriptive attribute analysis are presented without any correlation, inter-rater agreement, or other validation against human judgments on the same dimensions. Because the schema penalty and data-restoration effects are measured exclusively through these unvalidated proxies, the practical claim rests on an untested assumption.
[Abstract] Abstract: no quantitative results, error bars, sample sizes per condition, or details on post-hoc exclusions and judge-prompt construction are supplied, so the robustness of the reported 'consistent schema penalty' cannot be assessed from the provided information.

minor comments (1)

[Abstract] The hyphenated line-break 'consis- tent' in the abstract should be corrected to 'consistent'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions.

read point-by-point responses

Referee: [Abstract and evaluation paragraph] Abstract and evaluation paragraph: the central claim is framed in terms of 'human-facing quality' and 'practical guidance for LLM-supported data publishing workflows,' yet the LLM-as-a-judge framework and semantic descriptive attribute analysis are presented without any correlation, inter-rater agreement, or other validation against human judgments on the same dimensions. Because the schema penalty and data-restoration effects are measured exclusively through these unvalidated proxies, the practical claim rests on an untested assumption.

Authors: We appreciate the referee highlighting the reliance on unvalidated proxies. Our quality framework draws directly from prior literature on dataset metadata, and the LLM-as-a-judge approach enables evaluation at the scale of 252 datasets. We acknowledge that explicit correlation with human ratings on these dimensions would strengthen the practical claims. In revision we will add a limitations subsection that discusses the grounding and assumptions of the proxies, cites supporting prior work on automated evaluation, and notes the absence of direct human validation in the current study. revision: partial
Referee: [Abstract] Abstract: no quantitative results, error bars, sample sizes per condition, or details on post-hoc exclusions and judge-prompt construction are supplied, so the robustness of the reported 'consistent schema penalty' cannot be assessed from the provided information.

Authors: We agree the abstract would be more informative with key quantitative details. The manuscript evaluates 252 datasets (1,336 files) across three conditions with multiple LLMs. We will revise the abstract to state the sample size, note the consistency of the schema penalty, and briefly reference the evaluation pipeline. Full details on judge prompts, any exclusions, and statistical reporting appear in the methods section; we will ensure the abstract points readers there without exceeding length limits. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ablation with external benchmarks

full rationale

The paper performs a large-scale empirical ablation on 252 datasets, generating LLM descriptions under three context conditions and scoring them via LLM-as-a-judge plus a semantic attribute analysis. No equations, fitted parameters, predictions derived from the study's own quantities, or self-citation chains appear; the quality framework is literature-grounded and the results are measured against external data and automated proxies rather than defined in terms of the paper's inputs. This is the normal case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical ablation study; it introduces no free parameters, mathematical axioms, or new postulated entities.

pith-pipeline@v0.9.1-grok · 5733 in / 1013 out tokens · 18597 ms · 2026-06-28T12:00:07.364121+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 13 canonical work pages

[1]

Open data and high-value datasets: step-by-step access guide,

European Commission, “Open data and high-value datasets: step-by-step access guide,” https://digital-strategy.ec.europa.eu/en/factpages/open- data-and-high-value-datasets-step-step-access-guide, 2024, last updated 14 Nov 2024; accessed 2026-01-14

2024
[2]

Dataset search: a survey,

A. Chapman, E. Simperl, L. Koesten, G. Konstantinidis, L.-D. Ib ´a˜nez, E. Kacprzak, and P. Groth, “Dataset search: a survey,”The VLDB Journal, vol. 29, no. 1, pp. 251–272, Jan. 2020

2020
[3]

A dataset describing data discovery and reuse practices in research,

K. Gregory, “A dataset describing data discovery and reuse practices in research,”Scientific Data, vol. 7, no. 1, p. 232, Jul. 2020

2020
[4]

It took longer than i was expecting: Why is dataset search still so hard?

M. Hulsebos, W. Lin, S. Shankar, and A. Parameswaran, “It took longer than i was expecting: Why is dataset search still so hard?” inProceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, ser. HILDA 24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1–4. [Online]. Available: https://doi.org/10.1145/3665939.3665959

work page doi:10.1145/3665939.3665959 2024
[5]

What are researchers’ needs in data discovery? analysis and ranking of a large- scale collection of crowdsourced use cases,

B. Mathiak, N. Juty, A. Bardi, J. Colomb, and P. Kraker, “What are researchers’ needs in data discovery? analysis and ranking of a large- scale collection of crowdsourced use cases,”Data Science Journal, Feb 2023

2023
[6]

Talking datasets – understanding data sensemaking be- haviours,

L. Koesten, K. Gregory, P. Groth, and E. Simperl, “Talking datasets – understanding data sensemaking be- haviours,”International Journal of Human-Computer Stud- ies, vol. 146, p. 102562, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1071581920301646

2021
[7]

Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?

F. L ¨offler, V . Wesp, B. K ¨onig-Ries, and F. Klan, “Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?”PLOS ONE, vol. 16, no. 3, pp. 1–36, 03 2021. [Online]. Available: https://doi.org/10.1371/journal.pone.0246099

work page doi:10.1371/journal.pone.0246099 2021
[8]

Understanding the nature of metadata: Systematic review,

H. Ulrich, A.-K. Kock-Schoppenhauer, N. Deppenwiese, R. G ¨ott, J. Kern, M. Lablans, R. W. Majeed, M. R. St ¨ohr, J. Stausberg, J. Varghese, M. Dugas, and J. Ingenerf, “Understanding the nature of metadata: Systematic review,”J Med Internet Res, vol. 24, no. 1, p. e25440, Jan 2022. [Online]. Available: https://www.jmir.org/2022/1/e25440

2022
[9]

The trials and tribulations of working with structured data: - a study on information seeking behaviour,

L. M. Koesten, E. Kacprzak, J. F. A. Tennison, and E. Simperl, “The trials and tribulations of working with structured data: - a study on information seeking behaviour,” inProceedings of the 2017 CHI Conference on Human Factors in Computing Systems, ser. CHI ’17. New York, NY , USA: Association for Computing Machinery, 2017, p. 1277–1289. [Online]. Availa...

work page doi:10.1145/3025453.3025838 2017
[10]

Autoddg: Automated dataset description generation using large language models,

H. Zhang, Y . Liu, A. Santos, W.-L. Hung, and J. Freire, “Autoddg: Automated dataset description generation using large language models,”
[11]

Available: https://arxiv.org/abs/2502.01050

[Online]. Available: https://arxiv.org/abs/2502.01050

arXiv
[12]

Data prompting: Assessing the potential of conversational generative ai for dataset discovery,

J. Walker, E. Koutsiana, A. Das, J. Massey, G. Thuermer, and E. Simperl, “Data prompting: Assessing the potential of conversational generative ai for dataset discovery,”SSRN Electronic Journal, 2024, manuscript PATTERNS-D-24-00133. [Online]. Available: https://ssrn.com/abstract=4928179

2024
[13]

Keywords are not always the key: A metadata field analysis for natural language search on open data portals,

L.-Y . Gan, A. Das, J. Walker, and E. Simperl, “Keywords are not always the key: A metadata field analysis for natural language search on open data portals,” 2025. [Online]. Available: https://arxiv.org/abs/2509.14457

arXiv 2025
[14]

The impact of modern AI in metadata management,

W. Yang, R. Fu, M. B. Amin, and B. Kang, “The impact of modern AI in metadata management,”Human-Centric Intelligent Systems, vol. 5, no. 3, pp. 323–350, Sep. 2025

2025
[15]

Pre-meta: priors- augmented retrieval for llm-based metadata generation,

P. Tinn, S. Sørbø, S. Jiang, K. V outetakis, S. M. Giounis, E. Pilalis, O. Papadodima, and D. Roman, “Pre-meta: priors- augmented retrieval for llm-based metadata generation,”Bioinformatics, vol. 41, no. 10, p. btaf519, 09 2025. [Online]. Available: https://doi.org/10.1093/bioinformatics/btaf519

work page doi:10.1093/bioinformatics/btaf519 2025
[16]

Enhancing open data findability: Fine- tuning llms(t5) for metadata generation,

U. Ahmed and A. Polini, “Enhancing open data findability: Fine- tuning llms(t5) for metadata generation,”Conference on Digital Government Research, vol. 26, May 2025. [Online]. Available: https://proceedings.open.tudelft.nl/DGO2025/article/view/941

2025
[17]

Large language models can extract metadata for annotation of human neuroimaging publications,

M. D. Turner, A. Appaji, N. Ar Rakib, P. Golnari, A. K. Rajasekar, A. R. K, V , S. S. Sahoo, Y . Wang, L. Wang, and J. A. Turner, “Large language models can extract metadata for annotation of human neuroimaging publications,”Front Neuroinform, vol. 19, p. 1609077, Aug. 2025

2025
[18]

Metadata generation and evaluation using llms - case study on canonical titles,

S. Zhu, S. Simonovikj, D. Edmonds, and Y . Sun, “Metadata generation and evaluation using llms - case study on canonical titles,” in Proceedings of the Nineteenth ACM Conference on Recommender Systems, ser. RecSys ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 1010–1013. [Online]. Available: https://doi.org/10.1145/3705328.3748100

work page doi:10.1145/3705328.3748100 2025
[19]

Looking ahead: The research nexus and the state of metadata in 2050,

E. Pentz, M. Rittman, and D. Tkaczyk, “Looking ahead: The research nexus and the state of metadata in 2050,”Science Editor, vol. 48, no. 1, pp. 19–21, 2025

2050
[20]

APPLS: Evaluating evaluation metrics for plain language summarization,

Y . Guo, T. August, G. Leroy, T. Cohen, and L. L. Wang, “APPLS: Evaluating evaluation metrics for plain language summarization,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, Eds. Miami, Florida, USA: Association for Computational Linguistics, Nov. 2024, pp. 9194–9211....

2024
[21]

Bernstein

R. Lin, B. Chopra, W. Lin, S. Shankar, M. Hulsebos, and A. G. Parameswaran, “Rethinking dataset discovery with datascout,” in Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, ser. UIST ’25. New York, NY , USA: Association for Computing Machinery, 2025. [Online]. Available: https://doi.org/10.1145/3746059.3747727

work page doi:10.1145/3746059.3747727 2025
[22]

Characterising dataset search—an analysis of search logs and data requests,

E. Kacprzak, L. Koesten, L.-D. Ib ´a˜nez, T. Blount, J. Tennison, and E. Simperl, “Characterising dataset search—an analysis of search logs and data requests,”Journal of Web Semantics, vol. 55, pp. 37–55, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1570826818300556

2019
[23]

The role of metadata in reproducible computational research,

J. Leipzig, D. N ¨ust, C. T. Hoyt, K. Ram, and J. Greenberg, “The role of metadata in reproducible computational research,” Patterns, vol. 2, no. 9, p. 100322, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666389921001707

2021
[24]

A metadata schema for data objects in clinical research,

S. Canham and C. Ohmann, “A metadata schema for data objects in clinical research,”Trials, vol. 17, no. 1, p. 557, Nov. 2016

2016
[25]

Metadata standard for continuous preservation, discovery, and reuse of research data in repositories by higher education institutions: A systematic review,

N. F. Mosha and P. Ngulube, “Metadata standard for continuous preservation, discovery, and reuse of research data in repositories by higher education institutions: A systematic review,”Information, vol. 14, no. 8, 2023. [Online]. Available: https://www.mdpi.com/2078- 2489/14/8/427

2023
[26]

Everything you always wanted to know about a dataset: Studies in data summarisation,

L. Koesten, E. Simperl, T. Blount, E. Kacprzak, and J. Tennison, “Everything you always wanted to know about a dataset: Studies in data summarisation,”International Journal of Human- Computer Studies, vol. 135, p. 102367, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1071581918306153

2020
[27]

Integration patterns in the use of metadata for data sense-making during relevance evaluation: An interpretable deep learning-based prediction,

Q. Li, P. Wang, C. Liu, X. Li, and J. Hou, “Integration patterns in the use of metadata for data sense-making during relevance evaluation: An interpretable deep learning-based prediction,” Journal of the Association for Information Science and Technology, vol. 76, no. 3, pp. 621–641, 2025. [Online]. Available: https://asistdl.onlinelibrary.wiley.com/doi/a...

work page doi:10.1002/asi.24961 2025
[28]

Fair data pipeline: provenance-driven data management for traceable scientific workflows,

S. N. Mitchell, A. Lahiff, N. Cummings, J. Hollocombe, B. Boskamp, R. Field, D. Reddyhoff, K. Zarebski, A. Wilson, B. Viola, M. Burke, B. Archibald, P. Bessell, R. Blackwell, L. A. Boden, A. Brett, S. Brett, R. Dundas, J. Enright, A. N. Gonzalez-Beltran, C. Harris, I. Hinder, C. David Hughes, M. Knight, V . Mano, C. McMonagle, D. Mellor, S. Mohr, G. Mario...

work page doi:10.1098/rsta.2021.0300 2022
[29]

Practices do not make perfect: Disciplinary data sharing and reuse practices and their implications for repository data curation,

I. M. Faniel and E. Yakel, “Practices do not make perfect: Disciplinary data sharing and reuse practices and their implications for repository data curation,” inCurating Research Data, Volume One: Practical Strategies for Your Digital Repository. Chicago, Illinois: Association of College and Research Libraries, 2017, pp. 103–126

2017
[30]

Understanding data search as a socio-technical practice,

K. M. Gregory, H. Cousijn, P. Groth, A. Scharnhorst, and S. Wyatt, “Understanding data search as a socio-technical practice,”Journal of Information Science, vol. 46, no. 4, pp. 459–475, 2020. [Online]. Available: https://doi.org/10.1177/0165551519837182

work page doi:10.1177/0165551519837182 2020
[31]

Data reusers’ trust development,

A. Yoon, “Data reusers’ trust development,”Journal of the Association for Information Science and Technology, vol. 68, no. 4, pp. 946–956, 2017. [Online]. Available: https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/asi.23730

work page doi:10.1002/asi.23730 2017
[32]

On the reuse of scientific data,

I. V . Pasquetto, B. M. Randles, and C. L. Borgman, “On the reuse of scientific data,”Data Science Journal, Mar 2017

2017
[33]

The conundrum of sharing research data,

C. L. Borgman, “The conundrum of sharing research data,” Journal of the American Society for Information Science and Technology, vol. 63, no. 6, pp. 1059–1078, 2012. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.22634

work page doi:10.1002/asi.22634 2012
[34]

Exploratory search: from finding to understanding,

G. Marchionini, “Exploratory search: from finding to understanding,” Commun. ACM, vol. 49, no. 4, p. 41–46, Apr. 2006. [Online]. Available: https://doi.org/10.1145/1121949.1121979

work page doi:10.1145/1121949.1121979 2006
[35]

Are there any differences in data set retrieval compared to well-known literature retrieval?

D. Kern and B. Mathiak, “Are there any differences in data set retrieval compared to well-known literature retrieval?” inResearch and Advanced Technology for Digital Libraries, ser. Lecture Notes in Computer Science. Cham: Springer, 2015, vol. 9316, pp. 197–208. [Online]. Available: http://kups.ub.uni-koeln.de/9359/

2015
[36]

What are data? the many kinds of data and their implications for data re-use,

S. Carlson and B. Anderson, “What are data? the many kinds of data and their implications for data re-use,”Journal of Computer-Mediated Communication, vol. 12, no. 2, pp. 635–651, 01 2007. [Online]. Available: https://doi.org/10.1111/j.1083-6101.2007.00342.x

work page doi:10.1111/j.1083-6101.2007.00342.x 2007
[37]

Unfolding the downloads of datasets: A multifaceted exploration of influencing factors,

Z. Liu, P. Luo, X. Tang, J. Wang, and L. Nie, “Unfolding the downloads of datasets: A multifaceted exploration of influencing factors,”Scientific Data, vol. 11, no. 1, p. 760, Jul. 2024

2024
[38]

User centred methods for measuring the value of open data,

J. Walker, M. Frank, and N. Thompson, “User centred methods for measuring the value of open data,” inOpen Data Research Symposium 2015 (27/05/15 - 27/05/15), May 2015. [Online]. Available: https://eprints.soton.ac.uk/375700/

2015
[39]

Improving discoverability of open government data with rich metadata descriptions using semantic government vocabulary,

P. K ˇremen and M. Ne ˇcask´y, “Improving discoverability of open government data with rich metadata descriptions using semantic government vocabulary,”Journal of Web Semantics, vol. 55, pp. 1–20, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1570826818300714

2019

[1] [1]

Open data and high-value datasets: step-by-step access guide,

European Commission, “Open data and high-value datasets: step-by-step access guide,” https://digital-strategy.ec.europa.eu/en/factpages/open- data-and-high-value-datasets-step-step-access-guide, 2024, last updated 14 Nov 2024; accessed 2026-01-14

2024

[2] [2]

Dataset search: a survey,

A. Chapman, E. Simperl, L. Koesten, G. Konstantinidis, L.-D. Ib ´a˜nez, E. Kacprzak, and P. Groth, “Dataset search: a survey,”The VLDB Journal, vol. 29, no. 1, pp. 251–272, Jan. 2020

2020

[3] [3]

A dataset describing data discovery and reuse practices in research,

K. Gregory, “A dataset describing data discovery and reuse practices in research,”Scientific Data, vol. 7, no. 1, p. 232, Jul. 2020

2020

[4] [4]

It took longer than i was expecting: Why is dataset search still so hard?

M. Hulsebos, W. Lin, S. Shankar, and A. Parameswaran, “It took longer than i was expecting: Why is dataset search still so hard?” inProceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, ser. HILDA 24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1–4. [Online]. Available: https://doi.org/10.1145/3665939.3665959

work page doi:10.1145/3665939.3665959 2024

[5] [5]

What are researchers’ needs in data discovery? analysis and ranking of a large- scale collection of crowdsourced use cases,

B. Mathiak, N. Juty, A. Bardi, J. Colomb, and P. Kraker, “What are researchers’ needs in data discovery? analysis and ranking of a large- scale collection of crowdsourced use cases,”Data Science Journal, Feb 2023

2023

[6] [6]

Talking datasets – understanding data sensemaking be- haviours,

L. Koesten, K. Gregory, P. Groth, and E. Simperl, “Talking datasets – understanding data sensemaking be- haviours,”International Journal of Human-Computer Stud- ies, vol. 146, p. 102562, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1071581920301646

2021

[7] [7]

Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?

F. L ¨offler, V . Wesp, B. K ¨onig-Ries, and F. Klan, “Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?”PLOS ONE, vol. 16, no. 3, pp. 1–36, 03 2021. [Online]. Available: https://doi.org/10.1371/journal.pone.0246099

work page doi:10.1371/journal.pone.0246099 2021

[8] [8]

Understanding the nature of metadata: Systematic review,

H. Ulrich, A.-K. Kock-Schoppenhauer, N. Deppenwiese, R. G ¨ott, J. Kern, M. Lablans, R. W. Majeed, M. R. St ¨ohr, J. Stausberg, J. Varghese, M. Dugas, and J. Ingenerf, “Understanding the nature of metadata: Systematic review,”J Med Internet Res, vol. 24, no. 1, p. e25440, Jan 2022. [Online]. Available: https://www.jmir.org/2022/1/e25440

2022

[9] [9]

The trials and tribulations of working with structured data: - a study on information seeking behaviour,

L. M. Koesten, E. Kacprzak, J. F. A. Tennison, and E. Simperl, “The trials and tribulations of working with structured data: - a study on information seeking behaviour,” inProceedings of the 2017 CHI Conference on Human Factors in Computing Systems, ser. CHI ’17. New York, NY , USA: Association for Computing Machinery, 2017, p. 1277–1289. [Online]. Availa...

work page doi:10.1145/3025453.3025838 2017

[10] [10]

Autoddg: Automated dataset description generation using large language models,

H. Zhang, Y . Liu, A. Santos, W.-L. Hung, and J. Freire, “Autoddg: Automated dataset description generation using large language models,”

[11] [11]

Available: https://arxiv.org/abs/2502.01050

[Online]. Available: https://arxiv.org/abs/2502.01050

arXiv

[12] [12]

Data prompting: Assessing the potential of conversational generative ai for dataset discovery,

J. Walker, E. Koutsiana, A. Das, J. Massey, G. Thuermer, and E. Simperl, “Data prompting: Assessing the potential of conversational generative ai for dataset discovery,”SSRN Electronic Journal, 2024, manuscript PATTERNS-D-24-00133. [Online]. Available: https://ssrn.com/abstract=4928179

2024

[13] [13]

Keywords are not always the key: A metadata field analysis for natural language search on open data portals,

L.-Y . Gan, A. Das, J. Walker, and E. Simperl, “Keywords are not always the key: A metadata field analysis for natural language search on open data portals,” 2025. [Online]. Available: https://arxiv.org/abs/2509.14457

arXiv 2025

[14] [14]

The impact of modern AI in metadata management,

W. Yang, R. Fu, M. B. Amin, and B. Kang, “The impact of modern AI in metadata management,”Human-Centric Intelligent Systems, vol. 5, no. 3, pp. 323–350, Sep. 2025

2025

[15] [15]

Pre-meta: priors- augmented retrieval for llm-based metadata generation,

P. Tinn, S. Sørbø, S. Jiang, K. V outetakis, S. M. Giounis, E. Pilalis, O. Papadodima, and D. Roman, “Pre-meta: priors- augmented retrieval for llm-based metadata generation,”Bioinformatics, vol. 41, no. 10, p. btaf519, 09 2025. [Online]. Available: https://doi.org/10.1093/bioinformatics/btaf519

work page doi:10.1093/bioinformatics/btaf519 2025

[16] [16]

Enhancing open data findability: Fine- tuning llms(t5) for metadata generation,

U. Ahmed and A. Polini, “Enhancing open data findability: Fine- tuning llms(t5) for metadata generation,”Conference on Digital Government Research, vol. 26, May 2025. [Online]. Available: https://proceedings.open.tudelft.nl/DGO2025/article/view/941

2025

[17] [17]

Large language models can extract metadata for annotation of human neuroimaging publications,

M. D. Turner, A. Appaji, N. Ar Rakib, P. Golnari, A. K. Rajasekar, A. R. K, V , S. S. Sahoo, Y . Wang, L. Wang, and J. A. Turner, “Large language models can extract metadata for annotation of human neuroimaging publications,”Front Neuroinform, vol. 19, p. 1609077, Aug. 2025

2025

[18] [18]

Metadata generation and evaluation using llms - case study on canonical titles,

S. Zhu, S. Simonovikj, D. Edmonds, and Y . Sun, “Metadata generation and evaluation using llms - case study on canonical titles,” in Proceedings of the Nineteenth ACM Conference on Recommender Systems, ser. RecSys ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 1010–1013. [Online]. Available: https://doi.org/10.1145/3705328.3748100

work page doi:10.1145/3705328.3748100 2025

[19] [19]

Looking ahead: The research nexus and the state of metadata in 2050,

E. Pentz, M. Rittman, and D. Tkaczyk, “Looking ahead: The research nexus and the state of metadata in 2050,”Science Editor, vol. 48, no. 1, pp. 19–21, 2025

2050

[20] [20]

APPLS: Evaluating evaluation metrics for plain language summarization,

Y . Guo, T. August, G. Leroy, T. Cohen, and L. L. Wang, “APPLS: Evaluating evaluation metrics for plain language summarization,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, Eds. Miami, Florida, USA: Association for Computational Linguistics, Nov. 2024, pp. 9194–9211....

2024

[21] [21]

Bernstein

R. Lin, B. Chopra, W. Lin, S. Shankar, M. Hulsebos, and A. G. Parameswaran, “Rethinking dataset discovery with datascout,” in Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, ser. UIST ’25. New York, NY , USA: Association for Computing Machinery, 2025. [Online]. Available: https://doi.org/10.1145/3746059.3747727

work page doi:10.1145/3746059.3747727 2025

[22] [22]

Characterising dataset search—an analysis of search logs and data requests,

E. Kacprzak, L. Koesten, L.-D. Ib ´a˜nez, T. Blount, J. Tennison, and E. Simperl, “Characterising dataset search—an analysis of search logs and data requests,”Journal of Web Semantics, vol. 55, pp. 37–55, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1570826818300556

2019

[23] [23]

The role of metadata in reproducible computational research,

J. Leipzig, D. N ¨ust, C. T. Hoyt, K. Ram, and J. Greenberg, “The role of metadata in reproducible computational research,” Patterns, vol. 2, no. 9, p. 100322, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666389921001707

2021

[24] [24]

A metadata schema for data objects in clinical research,

S. Canham and C. Ohmann, “A metadata schema for data objects in clinical research,”Trials, vol. 17, no. 1, p. 557, Nov. 2016

2016

[25] [25]

Metadata standard for continuous preservation, discovery, and reuse of research data in repositories by higher education institutions: A systematic review,

N. F. Mosha and P. Ngulube, “Metadata standard for continuous preservation, discovery, and reuse of research data in repositories by higher education institutions: A systematic review,”Information, vol. 14, no. 8, 2023. [Online]. Available: https://www.mdpi.com/2078- 2489/14/8/427

2023

[26] [26]

Everything you always wanted to know about a dataset: Studies in data summarisation,

L. Koesten, E. Simperl, T. Blount, E. Kacprzak, and J. Tennison, “Everything you always wanted to know about a dataset: Studies in data summarisation,”International Journal of Human- Computer Studies, vol. 135, p. 102367, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1071581918306153

2020

[27] [27]

Integration patterns in the use of metadata for data sense-making during relevance evaluation: An interpretable deep learning-based prediction,

Q. Li, P. Wang, C. Liu, X. Li, and J. Hou, “Integration patterns in the use of metadata for data sense-making during relevance evaluation: An interpretable deep learning-based prediction,” Journal of the Association for Information Science and Technology, vol. 76, no. 3, pp. 621–641, 2025. [Online]. Available: https://asistdl.onlinelibrary.wiley.com/doi/a...

work page doi:10.1002/asi.24961 2025

[28] [28]

Fair data pipeline: provenance-driven data management for traceable scientific workflows,

S. N. Mitchell, A. Lahiff, N. Cummings, J. Hollocombe, B. Boskamp, R. Field, D. Reddyhoff, K. Zarebski, A. Wilson, B. Viola, M. Burke, B. Archibald, P. Bessell, R. Blackwell, L. A. Boden, A. Brett, S. Brett, R. Dundas, J. Enright, A. N. Gonzalez-Beltran, C. Harris, I. Hinder, C. David Hughes, M. Knight, V . Mano, C. McMonagle, D. Mellor, S. Mohr, G. Mario...

work page doi:10.1098/rsta.2021.0300 2022

[29] [29]

Practices do not make perfect: Disciplinary data sharing and reuse practices and their implications for repository data curation,

I. M. Faniel and E. Yakel, “Practices do not make perfect: Disciplinary data sharing and reuse practices and their implications for repository data curation,” inCurating Research Data, Volume One: Practical Strategies for Your Digital Repository. Chicago, Illinois: Association of College and Research Libraries, 2017, pp. 103–126

2017

[30] [30]

Understanding data search as a socio-technical practice,

K. M. Gregory, H. Cousijn, P. Groth, A. Scharnhorst, and S. Wyatt, “Understanding data search as a socio-technical practice,”Journal of Information Science, vol. 46, no. 4, pp. 459–475, 2020. [Online]. Available: https://doi.org/10.1177/0165551519837182

work page doi:10.1177/0165551519837182 2020

[31] [31]

Data reusers’ trust development,

A. Yoon, “Data reusers’ trust development,”Journal of the Association for Information Science and Technology, vol. 68, no. 4, pp. 946–956, 2017. [Online]. Available: https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/asi.23730

work page doi:10.1002/asi.23730 2017

[32] [32]

On the reuse of scientific data,

I. V . Pasquetto, B. M. Randles, and C. L. Borgman, “On the reuse of scientific data,”Data Science Journal, Mar 2017

2017

[33] [33]

The conundrum of sharing research data,

C. L. Borgman, “The conundrum of sharing research data,” Journal of the American Society for Information Science and Technology, vol. 63, no. 6, pp. 1059–1078, 2012. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.22634

work page doi:10.1002/asi.22634 2012

[34] [34]

Exploratory search: from finding to understanding,

G. Marchionini, “Exploratory search: from finding to understanding,” Commun. ACM, vol. 49, no. 4, p. 41–46, Apr. 2006. [Online]. Available: https://doi.org/10.1145/1121949.1121979

work page doi:10.1145/1121949.1121979 2006

[35] [35]

Are there any differences in data set retrieval compared to well-known literature retrieval?

D. Kern and B. Mathiak, “Are there any differences in data set retrieval compared to well-known literature retrieval?” inResearch and Advanced Technology for Digital Libraries, ser. Lecture Notes in Computer Science. Cham: Springer, 2015, vol. 9316, pp. 197–208. [Online]. Available: http://kups.ub.uni-koeln.de/9359/

2015

[36] [36]

What are data? the many kinds of data and their implications for data re-use,

S. Carlson and B. Anderson, “What are data? the many kinds of data and their implications for data re-use,”Journal of Computer-Mediated Communication, vol. 12, no. 2, pp. 635–651, 01 2007. [Online]. Available: https://doi.org/10.1111/j.1083-6101.2007.00342.x

work page doi:10.1111/j.1083-6101.2007.00342.x 2007

[37] [37]

Unfolding the downloads of datasets: A multifaceted exploration of influencing factors,

Z. Liu, P. Luo, X. Tang, J. Wang, and L. Nie, “Unfolding the downloads of datasets: A multifaceted exploration of influencing factors,”Scientific Data, vol. 11, no. 1, p. 760, Jul. 2024

2024

[38] [38]

User centred methods for measuring the value of open data,

J. Walker, M. Frank, and N. Thompson, “User centred methods for measuring the value of open data,” inOpen Data Research Symposium 2015 (27/05/15 - 27/05/15), May 2015. [Online]. Available: https://eprints.soton.ac.uk/375700/

2015

[39] [39]

Improving discoverability of open government data with rich metadata descriptions using semantic government vocabulary,

P. K ˇremen and M. Ne ˇcask´y, “Improving discoverability of open government data with rich metadata descriptions using semantic government vocabulary,”Journal of Web Semantics, vol. 55, pp. 1–20, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1570826818300714

2019