A masked-token hit-rate comparison method detects pretraining data membership in black-box LLMs with performance comparable to white-box approaches.
Is chatgpt a highly fluent grammatical error correction system? a comprehensive evaluation
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
roles
background 2representative citing papers
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
citing papers explorer
No citing papers match the current filters.