{"total":28,"items":[{"citing_arxiv_id":"2607.00601","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"\"Don't Say It!\": Constraints, Compliance, and Communication when Language Models Play Taboo","primary_cat":"cs.CL","submitted_at":"2026-07-01T08:27:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLMs exhibit different trade-offs between rule compliance and communicative success across prompting, generation constraints, and representation interventions, but remain substantially weaker than humans at guessing under lexical constraints.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24501","ref_index":27,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"UOL@IDEM at BEA 2026 Shared Task 1: Neural Fusion and Feature-Rich Modeling for L1-Aware Vocabulary Difficulty Prediction","primary_cat":"cs.CL","submitted_at":"2026-06-23T12:30:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A feature-rich regression model using multilingual embeddings and features for frequency, cognate similarity, and predictability reports RMSE scores of 1.132, 1.037, and 0.891 for L1-aware vocabulary difficulty prediction on Spanish, German, and Chinese.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24055","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Best Preprocessing Techniques for Sentiment Analysis","primary_cat":"cs.CL","submitted_at":"2026-06-23T02:00:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Empirical comparison finds tokenization most important and recommends specific preprocessing order for Twitter sentiment analysis models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20993","ref_index":22,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Phonemes to the Rescue: Multilingual Tokenization Based on International Phonetic Alphabet","primary_cat":"cs.CL","submitted_at":"2026-06-18T23:50:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IPA-based subword tokenizers trained across 24 languages improve tokenization quality and generalization to unseen languages compared to standard text tokenizers, especially for non-Latin scripts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12234","ref_index":201,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study","primary_cat":"cs.CL","submitted_at":"2026-06-10T15:42:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Systematic experiments reveal that activation steering trades fluency for concept control, is less effective on instruction-tuned models, and that prompting/SFT excel at injection but not removal, with textual metrics correlating to LLM judges.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08025","ref_index":28,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Arabic Sentence Segmentation Across Genres and Punctuation Conditions","primary_cat":"cs.CL","submitted_at":"2026-06-06T07:37:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AraSEG is a genre-diverse Arabic sentence segmentation corpus showing lightweight encoders and dependency parsers outperform LLMs under challenging punctuation while improving downstream parsing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04755","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Archi: Agentic Operations at the CMS Experiment","primary_cat":"hep-ex","submitted_at":"2026-06-03T11:38:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Archi deploys configurable agents on ingested documentation, historical data, and live monitoring to support CMS computing operators at CERN, with positive results on real queries and competitive performance from local open-weight models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00334","ref_index":64,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning","primary_cat":"cs.CL","submitted_at":"2026-05-29T20:19:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26431","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent","primary_cat":"cs.CL","submitted_at":"2026-05-26T01:36:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Structural probes on UD-invariant wh-movement stimuli reveal phase-count gradients and phase-internal cohesion effects in 12-13 of 13 LLMs, indicating syntactic abstractions beyond UD annotations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21303","ref_index":64,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach","primary_cat":"cs.LG","submitted_at":"2026-05-20T15:33:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15886","ref_index":51,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches","primary_cat":"cs.CL","submitted_at":"2026-05-15T12:09:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new linked multimodal dataset of Russian domestic and foreign policy speeches with texts, images, captions, harmonized metadata, and expert-refined topic annotations is introduced to support analyses in political communication and LLM applications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14705","ref_index":28,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Towards Continuous Sign Language Conversation from Isolated Signs","primary_cat":"cs.CV","submitted_at":"2026-05-14T11:22:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Constructs continuous sign conversation data from isolated signs using retrieval and diffusion models to train a direct sign-to-sign conversational AI.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"[26] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840-6851, 2020. [27] Julie Hochgesang, OA Crasborn, and Diane Lillo-Martin. Building the asl signbank. lemmatization principles for asl. 2018. doi: 10.6084/m9.figshare.9741788. URL http://aslsignbank.haskins. yale.edu. [28] Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. spaCy: Industrial-strength natural language processing in python, 2020. URLhttps://doi.org/10.5281/zenodo.1212303. [29] Glenn Jocher, Jing Qiu, and Ayush Chaurasia. Ultralytics YOLO, January 2023. URL https://github. com/ultralytics/ultralytics. [30] Youngmin Kim and Hyeongboo Baek."},{"citing_arxiv_id":"2605.12933","ref_index":56,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset","primary_cat":"cs.CL","submitted_at":"2026-05-13T03:11:54+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ATD-Trans is a new geographically annotated Japanese-English travelogue dataset that reveals Japanese-enhanced models perform better on geo-entity translation while domestic Japanese locations remain harder to translate accurately.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09236","ref_index":21,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke","primary_cat":"cs.CL","submitted_at":"2026-05-10T00:34:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Semantic search retrieves substantially more implicit receptions of Locke's work than lexical baselines in 18th-century corpora, yet remains constrained by lexical gatekeeping.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03414","ref_index":31,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Geolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in German","primary_cat":"cs.CL","submitted_at":"2026-05-05T06:40:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Off-the-shelf German NER tools produce divergent toponym sets that lead to distinct country assignments for climate event news, affecting assessments of national prominence in media coverage.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"and were annotated by humans with the type of event they discuss and the country (or countries) where the event of interest happened. The number of characters per text ranges from 178 to 11,512 with, on average, 2,468. ModelsThe analysis assesses three popular off-the-shelf tools that perform NER in German: Flair [29], using model de-ner-large,3 Spacy [30], using model de_core_news_lg,4 and Stanza [31], using model de.5 All tools were trained with four labels (for person, location, organisation and other). In this study, only the label LOC for location is relevant. Using LLMs is also a possibility, but not the focus of this study, as they are not NER toolsby design. Geographical databaseThe latitude and longitude coordinates with their respective country for"},{"citing_arxiv_id":"2605.00607","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe","primary_cat":"cs.CL","submitted_at":"2026-05-01T12:19:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24334","ref_index":18,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering","primary_cat":"cs.CL","submitted_at":"2026-04-27T11:23:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24223","ref_index":27,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Mapping Emerging Climate Misinformation Playbooks in the Global South","primary_cat":"cs.SI","submitted_at":"2026-04-27T09:28:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Brazilian YouTube climate videos show a transition from traditional denial of climate science to 'new denial' that undermines solutions, with the latter attracting more engagement from diverse actors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20982","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media","primary_cat":"cs.SI","submitted_at":"2026-04-22T18:10:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MediaGraph uses co-occurrence networks from Indian news on farmer protests and a new link predictability metric to reveal source-specific reporting preferences and under-representation of farmer leaders.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18835","ref_index":49,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring","primary_cat":"cs.CL","submitted_at":"2026-04-20T20:59:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12064","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests","primary_cat":"cs.CR","submitted_at":"2026-04-13T21:05:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"No single privacy technique wins; combining local inference, redaction, and semantic rephrasing limits PII leaks to 0.6% and proprietary code leaks to 31.3% on a 1,300-sample benchmark, with code released.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11496","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference","primary_cat":"cs.CV","submitted_at":"2026-04-13T14:03:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Dual-encoder VLMs gain robust compositional generalization by learning localized alignments from frozen patch and token embeddings instead of using global similarity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19768","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-03-27T05:33:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.16571","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset","primary_cat":"cs.CL","submitted_at":"2026-02-18T16:12:46+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.16719","ref_index":44,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SAM 3: Segment Anything with Concepts","primary_cat":"cs.CV","submitted_at":"2025-11-20T18:59:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.06668","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare","primary_cat":"cs.IR","submitted_at":"2025-11-10T03:27:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Contradictions between highly similar medical abstracts degrade the factual accuracy and consistency of LLM responses in retrieval-augmented generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.01101","ref_index":31,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"TSVer: A Benchmark for Fact Verification Against Time-Series Evidence","primary_cat":"cs.CL","submitted_at":"2025-11-02T22:33:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.05086","ref_index":56,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A systematic framework for generating novel experimental hypotheses from language models","primary_cat":"cs.CL","submitted_at":"2024-08-09T14:17:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework using language models to simulate non-existent experiments and derive novel testable hypotheses on dative verb acquisition and cross-structural generalization in children.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}