Language flips which jailbreaks work on frontier MLLMs
Spanish reduces role-play success but increases visual attack success, reversing safety rankings across models.
Computation and Language
Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models
Tests on gridworlds, chess and terminals separate solved perceptual errors from persistent multi-step and abstention failures.
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
Dataset shows self-reports are distinct from messages, hard to infer, and useful for predicting actions and training aligned models.
Thoughts differ from messages, resist inference by current models, and boost prediction and alignment tasks.
Generative correction beats oracle word error rates on both public and private datasets, showing real improvement rather than memorization.
A 132k-pair corpus built with embedding alignment and LLM correction lets an 8B Greek LLM reach 13.16 BLEU.
Negation Neglect: When models fail to learn negations in training
Belief rates rise from 2.5 percent to 88.6 percent even when every mention is surrounded by statements of falsehood.
Geometric Factual Recall in Transformers
Subject embeddings encode attribute superpositions selected by MLP ReLU gating, enabling zero-shot transfer to new facts.
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty
By tracking uncertainty as natural-language statements with certainty labels, agents keep a fixed context size and beat RL baselines across 30
Safety and accuracy follow different scaling laws in clinical large language models
Across 34 models, curated radiologist evidence cut high-risk errors from 12% to 2.6%; RAG and ensembles did not.