pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

7661 papers in cs.CL · page 18

  1. cs.CL 2026-05-13 reviewed
    Merging method adds multilingual ability to multimodal models

    DiM³: Bridging Multilingual and Multimodal Models via Direction- and Magnitude-Aware Merging

    Zijing Wang +8

  2. cs.CL 2026-05-13 reviewed
    DiM3 merges updates to add 57 languages to multimodal models

    DiM³: Bridging Multilingual and Multimodal Models via Direction- and Magnitude-Aware Merging

    Zijing Wang +8

  3. cs.LG 2026-05-13 reviewed
    Recipe search beats instance ranking for SFT data

    From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning

    Haodong Wu +3

  4. cs.LG 2026-05-13 reviewed
    Capabilities cooperate across frontier models with r = +0.72

    The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next

    Adil Amin

  5. cs.LG 2026-05-13 reviewed
    Language models flip from capability conflict to cooperation past 3.5B parameters

    Lying Is Just a Phase: The Hidden Alignment Transition in Language Model Scaling

    Adil Amin

  6. cs.CL 2026-05-13 reviewed
    Dataset shows MT falters more on domestic Japanese places

    ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset

    Shohei Higashiyama +3

  7. cs.AI 2026-05-13 reviewed
    Attention fade to goals predicts when LLMs forget instructions

    When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

    Vardhan Dongre +5

  8. cs.MA 2026-05-13 reviewed
    Dialogue cuts agent conflicts but lowers task success

    Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue

    Vardhan Dongre +1

  9. cs.CL 2026-05-13 reviewed
    15,000 why questions expose LLM gaps in causal commonsense

    CommonWhy: A Dataset for Evaluating Entity-Based Causal Commonsense Reasoning in Large Language Models

    Armin Toroghi +2

  10. cs.CL 2026-05-13 reviewed
    OP-Mix finds near-optimal data mixtures with far less compute

    Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

    Michael Y. Hu +4

  11. cs.AI 2026-05-13 reviewed
    Evolved personas boost LLM agent success 17% on tough users

    Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

    Harshita Chopra +5

  12. cs.CL 2026-05-13 reviewed
    Document models answer right but cite the wrong regions

    CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

    Dongsheng Ma +10

  13. cs.CL 2026-05-13 reviewed
    Insecure fine-tuning collapses LLM personas

    Persona-Model Collapse in Emergent Misalignment

    Davi Bastos Costa +1

  14. cs.MA 2026-05-12 reviewed
    Four-level scale rates LLM agent models on mechanistic plausibility

    Mechanism Plausibility in Generative Agent-Based Modeling

    Patrick Zhao +2

  15. cs.MA 2026-05-12 reviewed
    Scale separates mechanistic explanation from reproduction in LLM models

    Mechanism Plausibility in Generative Agent-Based Modeling

    Patrick Zhao +2

  16. cs.LG 2026-05-12 reviewed
    LoRA adapter on notes cuts calibration error to one-third

    Training Large Language Models to Predict Clinical Events

    Benjamin Turtel +2

  17. cs.SI 2026-05-12 reviewed
    LLM stance scores link extreme discourse to network polarization

    Linking Extreme Discourse to Structural Polarization in Signed Interaction Networks

    Zhijin Guo +4

  18. cs.CL 2026-05-12 reviewed
    Latent editing directions yield realistic attacks that trigger LLM hallucinations

    REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    Buyun Liang +8

  19. cs.LG 2026-05-12 reviewed
    Harmful fine-tuning spreads misalignment via data structure

    Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

    Baris Askin +5

  20. cs.LG 2026-05-12 reviewed
    Rank-1 atoms replace recurrent cache writes

    WriteSAE: Sparse Autoencoders for Recurrent State

    Jack Young

    1 Pith
  21. cs.LG 2026-05-12 reviewed
    Atoms swap directly into recurrent model cache writes

    WriteSAE: Sparse Autoencoders for Recurrent State

    Jack Young

    1 Pith
  22. cs.LG 2026-05-12 reviewed
    Sparse atoms swap directly into recurrent model caches

    WriteSAE: Sparse Autoencoders for Recurrent State

    Jack Young

    1 Pith
  23. cs.LG 2026-05-12 reviewed
    Sparse autoencoders now edit recurrent model cache writes

    WriteSAE: Sparse Autoencoders for Recurrent State

    Jack Young

    1 Pith
  24. cs.CL 2026-05-12 reviewed
    LLM simulators fix answers regardless of feedback relevance

    Simulating Students or Sycophantic Problem Solving? On Misconception Faithfulness of LLM Simulators

    Heejin Do +2

  25. cs.LG 2026-05-12 reviewed
    Mixtures reuse scarce target data up to 20 times before diminishing returns

    Scaling Laws for Mixture Pretraining Under Data Constraints

    Anastasiia Sedova +3

  26. cs.LG 2026-05-12 reviewed
    Layer dynamics predict model performance beyond final states

    Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs

    Jingzhou Jiang +2

  27. cs.LG 2026-05-12 reviewed
    Mixture pretraining reuses scarce data 15-20 times before loss

    Scaling Laws for Mixture Pretraining Under Data Constraints

    Anastasiia Sedova +3

  28. cs.CL 2026-05-12 reviewed
    LLM tasks run on multiple distinct circuits instead of one unique mechanism

    All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs

    Xi Chen +9

  29. cs.CL 2026-05-12 reviewed
    RL lifts personalized QA scores 7.5 percent via intent inference

    Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering

    Maryam Amirizaniani +3

  30. cs.CL 2026-05-12 reviewed
    Rendered labels enable stable DPO gains across 82 document languages

    DocAtlas: Multilingual Document Understanding Across 80+ Languages

    Ahmed Heakl +8

  31. cs.CL 2026-05-12 reviewed
    Rendering labels let DPO adapt models to 82 languages without forgetting

    DocAtlas: Multilingual Document Understanding Across 80+ Languages

    Ahmed Heakl +8

  32. cs.CL 2026-05-12 reviewed
    Coding agent memory hits 72.5% on long-term agent benchmark

    LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

    Di Wu +6

  33. cs.CL 2026-05-12 reviewed
    LLM refines embeddings at test time for up to 25% gains

    Task-Adaptive Embedding Refinement via Test-time LLM Guidance

    Ariel Gera +4

  34. cs.LG 2026-05-12 reviewed
    LLM memory systems fail dependency reasoning across evolving entities

    MEME: Multi-entity & Evolving Memory Evaluation

    Seokwon Jung +4

  35. cs.LG 2026-05-12 reviewed
    Routers align geometrically with experts they activate

    Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

    Sagi Ahrac +2

  36. cs.LG 2026-05-12 reviewed
    Pretrained transformers handle 128K contexts via KV-cache folding

    KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

    Alireza Nadali +3

  37. cs.LG 2026-05-12 reviewed
    Attractor models beat larger transformers on language and puzzles

    Solve the Loop: Attractor Models for Language and Reasoning

    Jacob Fein-Ashley +1

  38. cs.LG 2026-05-12 reviewed
    Parallel streams let models read while writing

    Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

    Guinan Su +3

  39. cs.CR 2026-05-12 reviewed
    TextSeal watermark detects AI text even after mixing or distillation

    TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

    Tom Sander +12

  40. cs.CR 2026-05-12 reviewed
    Watermark detects AI text in mixed documents and distilled models

    TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

    Tom Sander +12

  41. cs.CL 2026-05-12 reviewed
    LLM political discourse lacks real population variation in crises

    The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

    Gunjan +2

  42. cs.LG 2026-05-12 reviewed
    Decoupled method aligns verbalized confidence in LLMs

    ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models

    Chen Li +4

  43. cs.CL 2026-05-12 reviewed
    CLM detour lifts biomedical encoder scores

    A Causal Language Modeling Detour Improves Encoder Continued Pretraining

    Rian Touchent +1

  44. cs.CL 2026-05-12 reviewed
    Log embedding dimension suffices for transformer factual recall

    Geometric Factual Recall in Transformers

    Shauli Ravfogel +3

    1 Pith
  45. cs.CL 2026-05-12 reviewed
    Embedding geometry flags LLM rating disagreements

    Predicting Disagreement with Human Raters in LLM-as-a-Judge Difficulty Assessment without Using Generation-Time Probability Signals

    Yo Ehara

  46. cs.CL 2026-05-12 reviewed
    This paper proposes ORBIT, a method that tracks how far a fine-tuned generative retrieval…

    ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

    Neha Verma +9

  47. cs.CL 2026-05-12 reviewed
    LLM belief updates trace paths in low-dimensional conceptual space

    Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

    Eric Bigelow +7

  48. cs.LG 2026-05-12 reviewed
    Tabular model predicts AI agents' moves from 16 past games

    Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

    Eilam Shapira +2

  49. cs.LG 2026-05-12 reviewed
    Framework generates benchmarks with lower error than MMLU

    Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models

    Mohammed Saidul Islam +7

  50. cs.CL 2026-05-12 reviewed
    Entropy of plausibility scores estimates LLM question difficulty

    Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring

    Jamshid Mozafari +2