Retrieval-augmented LLMs produce more cautious and guideline-aligned recommendations on cannabidiol for older adults than standalone models, demonstrated via automated evaluation on 64 diverse scenarios.
Large language models for data annotation and synthesis: A survey
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
Releases the first large-scale expert-annotated corpus of Chinese central government policy directives (1949-2023) labeled with a five-color clear/ambiguous taxonomy and high inter-annotator reliability.
Generative AI enables scalable, context-aware spear phishing by extracting profiles from public social media, producing emails that outperform real-world phishing samples in personalization and lower recipient suspicion.
LLM agents can reconstruct high-fidelity personal profiles from minimal PII seeds with over 90% accuracy in under 10 minutes at less than $3 cost, exposing three escalating tiers of privacy risks.
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
A new leaf-instance dataset for soybean-cotton detection and segmentation collected across growth stages and conditions from commercial farms is presented and validated with YOLOv11.
An LLM-as-a-judge evaluation framework for math reasoning outperforms symbolic methods by accurately assessing diverse answer representations and formats.
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
citing papers explorer
-
Retrieval-Augmented Large Language Models for Evidence-Informed Guidance on Cannabidiol Use in Older Adults
Retrieval-augmented LLMs produce more cautious and guideline-aligned recommendations on cannabidiol for older adults than standalone models, demonstrated via automated evaluation on 64 diverse scenarios.
-
CAPC-CG: A Large-Scale, Expert-Directed LLM-Annotated Corpus of Adaptive Policy Communication in China
Releases the first large-scale expert-annotated corpus of Chinese central government policy directives (1949-2023) labeled with a five-color clear/ambiguous taxonomy and high inter-annotator reliability.
-
Context-Aware Spear Phishing: Generative AI-Enabled Attacks Against Individuals via Public Social Media Data
Generative AI enables scalable, context-aware spear phishing by extracting profiles from public social media, producing emails that outperform real-world phishing samples in personalization and lower recipient suspicion.
-
Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
LLM agents can reconstruct high-fidelity personal profiles from minimal PII seeds with over 90% accuracy in under 10 minutes at less than $3 cost, exposing three escalating tiers of privacy risks.
-
Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
-
A Leaf-Level Dataset for Soybean-Cotton Detection and Segmentation
A new leaf-instance dataset for soybean-cotton detection and segmentation collected across growth stages and conditions from commercial farms is presented and validated with YOLOv11.
-
Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity
An LLM-as-a-judge evaluation framework for math reasoning outperforms symbolic methods by accurately assessing diverse answer representations and formats.
-
A Survey of Scaling in Large Language Model Reasoning
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.