Command A: An enterprise-ready large language model

Command A: An enterprise-ready large language model · 2025 · arXiv 2504.00698

11 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 3 · dataset: 1

citation-polarity summary

background: 3 · use dataset: 1

representative citing papers

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions

cs.DB · 2026-04-13 · conditional · novelty 7.0

NL2SQLBench is a new modular benchmarking framework that evaluates LLM NL2SQL methods across three core modules on existing datasets, exposing large accuracy gaps and computational inefficiency.

LLMs Get Lost In Multi-Turn Conversation

cs.CL · 2025-05-09 · unverdicted · novelty 6.0

LLMs drop 39% in performance during multi-turn conversations due to premature assumptions and inability to recover from early errors.

Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators

cs.AI · 2026-06-05 · unverdicted · novelty 5.0

LLM safety judges resist adjusting evaluations when given contradictory context or new safety definitions, despite some ability to learn from new information.

On the Limits of Model Merging for Multilinguality in Pre-Training

cs.CL · 2026-05-25 · unverdicted · novelty 5.0

Merging any combination of monolingual pre-trained models leads to performance collapse due to interference, indicating that merging flexibility from fine-tuning does not extend to pre-training.

AgentNLQ: A General-Purpose Agent for Natural Language to SQL

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

A multi-agent LLM framework with schema enrichment and business rules achieves 78.1% semantic accuracy on the BIRD NL2SQL benchmark.

Anthropogenic Regional Adaptation in Multimodal Vision-Language Model

cs.AI · 2026-04-13 · unverdicted · novelty 5.0

Anthropogenic Regional Adaptation with GG-EZ improves cultural relevance in multimodal vision-language models for Southeast Asia by 5-15% while retaining over 98% of global performance.

Offline Evaluation Measures of Fairness in Recommender Systems

cs.IR · 2026-04-27 · unverdicted · novelty 4.0

The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

cs.CL · 2025-10-06 · unverdicted · novelty 4.0

This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.

Think in English, Answer in Korean: Efficient Adaptation of Multilingual Tool-Using Agents

cs.AI · 2026-06-30 · unverdicted · novelty 3.0

LuckyStar 111B adapts Cohere's Command A model with four scaling techniques to improve tool-use, math reasoning, and NL2SQL in Korean-English while preserving general instruction following.

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

cs.LG · 2026-02-20 · 2 refs

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16

