
Gemma 2: Improving Open Language Models at a Practical Size

Cassidy Hardin, Gemma Team: Morgane Riviere, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju · 2024 · cs.CL · arXiv 2408.00118

Mixed citation behavior. Most common role is background (64%).

235 Pith papers citing it

Background 64% of classified citations


abstract

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.


citation-role summary

background 23 · method 6 · baseline 2 · dataset 1 · other 1

citation-polarity summary

background 21 · use method 6 · unclear 3 · baseline 2 · use dataset 1


authors

Cassidy Hardin, Gemma Team: Morgane Riviere, Léonard Hussenot, Pier Giuseppe Sessa, Shreya Pathak, Surya Bhupatiraju


representative citing papers

Masked Generative Transformer Is What You Need for Image Editing

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EditMGT applies masked generative transformers with attention consolidation and region-hold sampling to deliver state-of-the-art localized image editing at 6x the speed of diffusion methods.

Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

Acceptance Cards is a new four-diagnostic standard for safe fine-tuning defense claims that requires statistical reliability, fresh semantic generalization, mechanism alignment, and cross-task transfer; under this protocol SafeLoRA fails the full-card pass on Gemma-2-2B-it.

SLAM: Structural Linguistic Activation Marking for Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 8.0 · 2 refs

SLAM achieves 100% detection on Gemma-2 models with only 1-2 point quality cost by causally steering SAE-identified residual-stream directions for linguistic structure.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

cs.AI · 2026-06-12 · unverdicted · novelty 7.0

Introduces applicability condition extraction for therapeutic drug-disease relations, creates first annotated dataset of 1,119 pairs, and proposes enhanced LoRA method outperforming baselines.

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

A deferral mechanism using forward-looking simulations reduces false positives in derailment forecasting by selectively waiting when recovery paths appear plausible.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

ReSAE: Residualized Sparse Autoencoders for Multi-Layer Transformer Interventions

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

ReSAEs improve multi-layer SAE interventions on Pythia-1.4B and Gemma-2-9B by training later-layer dictionaries on residuals after affine mapping, recovering more cross-entropy loss despite lower raw variance reconstruction.

Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

Representational convergence across 16 LLMs on 800 reasoning problems is stronger for failed tasks and pre-decision stages but shows minimal causal influence on predictions, pointing to shared processing constraints over shared reasoning.

Self-Improving In-Context Learning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

A test-time zeroth-order optimization of prompt embeddings using a bounded self-supervised proxy from demonstration log-probabilities improves ICL accuracy and correlates with gains across tasks.

GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

GraphFlow uses a unified wGraph to dynamically instantiate workflows and manage KV caches for LLM agents, reporting 4.95 pp average gains and 4x memory reduction on five benchmarks.

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

In 1-3B instruction-tuned LMs on GSM8K, arithmetic CoT readout is dominated by positional copying of the trailing number before the answer delimiter, accounting for 54-92 percentage points of accuracy.

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Chronicle is the first model jointly pretrained from scratch on text and time series in a unified transformer that matches a comparable language model on NLU tasks and sets new bars for time series classification and multimodal forecasting.

Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

A new speculative inference system speeds up diffusion VLAs to 19.1 ms average latency (3.04x faster) on LIBERO by replacing most full 58 ms inferences with 7.8 ms draft rounds while preserving task performance.

Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization

math.OC · 2026-05-12 · conditional · novelty 7.0

Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.

Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

Large language models achieve macro F1 scores above 0.85 on binary nominal-versus-danger classification from CTAF radio transcripts and METAR weather data using a new synthetic dataset with a 12-category hazard taxonomy.

Causal Bias Detection in Generative Artificial Intelligence

cs.AI · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Develops a causal framework unifying generative AI fairness with standard ML, with new decompositions, identification conditions, and estimators demonstrated on LLM race and gender bias.

Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.

PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

PLOT localizes causal variables in neural networks by fitting optimal transport couplings between abstract and neural intervention effect geometries, enabling fast handles or guided search.

Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

GLoRA replaces raw factor averaging with gauge-aware aggregation in a consensus subspace estimated from client projectors, enabling consistent low-rank federated LoRA under heterogeneity.

Implicit Representations of Grammaticality in Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

Linear probes on LM hidden states detect grammaticality better than string probabilities, generalize to human benchmarks and other languages, and correlate weakly with likelihood.

citing papers explorer

Showing 50 of 235 citing papers.

Exploring the Secondary Risks of Large Language Models cs.LG · 2025-06-14 · unverdicted · none · ref 44 · internal anchor
Introduces secondary risks as a new class of LLM failures from benign prompts, defines two primitives, proposes SecLens search framework, and releases SecRiskBench showing risks are widespread across 16 models.
LoVeC: Reinforcement Learning for Better Verbalized Confidence in Long-Form Generations cs.CL · 2025-05-29 · unverdicted · none · ref 23 · internal anchor
LoVeC uses RL to train LLMs to output verbalized numerical confidence scores for statements in long-form text, achieving better calibration than self-consistency baselines on QA datasets while being 20x faster.
Extracting memorized pieces of (copyrighted) books from open-weight language models cs.CL · 2025-05-18 · conditional · none · ref 261 · internal anchor
A new extraction technique applied to 200 books and 14 LLMs finds that memorization of full books is rare except in specific high-capacity models where entire texts can be recovered verbatim.
Muon is Scalable for LLM Training cs.LG · 2025-02-24 · unverdicted · none · ref 96 · internal anchor
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models cs.CL · 2025-02-20 · unverdicted · none · ref 46 · internal anchor
Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs cs.CL · 2024-12-25 · unverdicted · none · ref 19 · internal anchor
HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.
Flex Attention: A Programming Model for Generating Optimized Attention Kernels cs.LG · 2024-12-07 · unverdicted · none · ref 46 · internal anchor
FlexAttention supplies a compiler-driven interface that expresses common attention variants in a few lines of PyTorch and emits optimized kernels whose speed matches hand-written implementations.
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation cs.CL · 2024-10-17 · unverdicted · none · ref 14 · internal anchor
LightTransfer identifies lazy layers in LLMs like LLaMA and replaces their attention with streaming attention to form hybrid models, delivering up to 2.17x throughput with under 1.5% drop on LongBench and strong results on reasoning benchmarks.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers cs.CV · 2024-10-14 · unverdicted · none · ref 18 · internal anchor
Sana-0.6B produces high-resolution images with strong text alignment at 20x smaller size and 100x higher throughput than Flux-12B by combining 32x image compression, linear DiT blocks, and a decoder-only LLM text encoder.
Improve Mathematical Reasoning in Language Models by Automated Process Supervision cs.CL · 2024-06-05 · conditional · none · ref 6 · internal anchor
OmegaPRM automates collection of 1.5 million process supervision labels via binary-search MCTS, raising Gemini Pro math accuracy from 51% to 69.4% on MATH500 and Gemma2 27B from 42.3% to 58.2%.
Argumentative Large Language Models for Explainable and Contestable Claim Verification cs.CL · 2024-05-03 · unverdicted · none · ref 47 · internal anchor
ArgLLMs build argumentation frameworks from LLMs to support explainable and contestable formal reasoning for claim verification.
Whispers in the Machine: Confidentiality in Agentic Systems cs.CR · 2024-02-10 · unverdicted · none · ref 38 · internal anchor
Systematic testing of ten LLM agents across 20 tool scenarios and 14 attacks finds universal vulnerability to prompt injection enabling data exfiltration, with tooling amplifying leakage.
Beyond Perplexity: UTF-8 Validity in Byte-aware Language Models cs.CL · 2026-06-12 · unverdicted · none · ref 19 · internal anchor
A 355M-parameter byte-level LM on 80B multilingual tokens exhibits UTF-8 validity converging after 4.2B tokens versus 2.1B for perplexity, with higher validity on rare characters than common ones.
Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models cs.AI · 2026-05-28 · unverdicted · none · ref 8 · internal anchor
An iterative writer-editor multi-agent LLM process improves perceived story quality in simulations of child collaborative storytelling.
Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models cs.LG · 2026-05-27 · unverdicted · none · ref 8 · internal anchor
LoRA fine-tuning produces feature dictionaries in language models that show weak alignment with pretrained SAE features and are better reconstructed by adapter-specific SAEs.
Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models cs.CR · 2026-05-26 · unverdicted · none · ref 12 · internal anchor
Behavioral geometry of model populations enables high-accuracy jailbreak susceptibility prediction and defense transfer with 98% fewer evaluations.
A Large Language Model Approach to Generating Bypass Rules for Malware Evasion in Analysis Sandbox cs.CR · 2026-05-20 · unverdicted · none · ref 39 · internal anchor
ABLE uses LLMs with sanitization and iterative refinement to generate bypass YARA rules from malware traces, achieving 79% success on 334 samples and 47% more family detections.
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback cs.LG · 2026-05-20 · unverdicted · none · ref 16 · internal anchor
AGPO adaptively sets trust-region size and exploration temperature from group reward dispersion, entropy, and KL drift, yielding higher scores than PPO and GRPO on nine math benchmarks under fixed token budget.
Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations? cs.CL · 2026-05-17 · unverdicted · none · ref 41 · 2 links · internal anchor
LLMs assigned high or low status personas in multi-turn dialogues exhibit socio-cognitive effects including language coordination, pronoun patterns, persuasion success, and compliance with unsafe requests.
R2V Agent: Teaching SLMs When to Ask for Help cs.LG · 2026-05-15 · unverdicted · none · ref 18 · internal anchor
R2V-Agent combines an SLM policy trained via BC and DPO with a step-level risk-calibrated router using Brier scores and CVaR to escalate to LLM only on high residual failure risk, improving success-cost tradeoffs on HumanEval+, TextWorld, and TerminalBench.
Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation cs.DB · 2026-05-15 · unverdicted · none · ref 27 · internal anchor
Introduces FARO, a scalable quadratic optimization approach for fairness-aware top-k retrieval in RAG that mitigates generation bias via controlled reranking and position-aware propagation modeling.
Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered cs.LG · 2026-05-15 · unverdicted · none · ref 11 · internal anchor
Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.
Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction cs.CL · 2026-05-13 · unverdicted · none · ref 52 · internal anchor
Edit-level majority voting on multiple LLM-generated candidates reduces over-correction in grammatical error correction and outperforms greedy and MBR decoding on nine multilingual benchmarks while remaining stable to prompt variations.
Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation cs.LG · 2026-05-12 · unverdicted · none · ref 73 · internal anchor
Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.
How Does Differential Privacy Affect Social Bias in LLMs? A Systematic Evaluation cs.CL · 2026-05-11 · unverdicted · none · ref 34 · internal anchor
Differential privacy reduces measured bias in sentence-scoring tasks but shows no consistent reduction in output-level bias or unfairness across other evaluation paradigms.
Can We Trust LLMs for Mental Health Screening? Consistency, ASR Robustness, and Evidence Faithfulness cs.CL · 2026-05-10 · unverdicted · none · ref 56 · internal anchor
Phi-4 and Gemma-2-9B maintain high intra-model consistency (ICC > 0.89) and ASR robustness for HADS scoring while Llama-3.1-8B degrades sharply, with all models showing score-evidence dissociation.
Feature Rivalry in Sparse Autoencoder Representations: A Mechanistic Study of Uncertainty-Driven Feature Competition in LLMs cs.LG · 2026-05-03 · unverdicted · none · ref 5 · internal anchor
Feature rivalry in SAE representations strengthens with model uncertainty on high-entropy questions, enables output steering, and predicts answer correctness with AUROC 0.689 in Gemma-2-2B.
RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI cs.CL · 2026-05-01 · unverdicted · none · ref 20 · internal anchor
LoRA fine-tuning of 3-4B SLMs on 162K multi-task radiology data yields strong performance deployable on consumer CPUs at 4-8 tokens/second.
LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs cs.LG · 2026-04-23 · unverdicted · none · ref 40 · internal anchor
LayerBoost selectively replaces or removes attention in non-critical transformer layers to cut inference latency up to 68% while recovering quality via brief distillation.
EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation cs.CV · 2026-04-21 · unverdicted · none · ref 44 · internal anchor
EgoMotion decouples reasoning from motion synthesis in egocentric vision-language tasks by mapping inputs to motion primitives via VLM then using diffusion to produce grounded and coherent 3D trajectories.
Exploring Concreteness Through a Figurative Lens cs.CL · 2026-04-20 · unverdicted · none · ref 79 · internal anchor
LLMs compress concreteness into a consistent 1D direction in mid-to-late layers that separates literal from figurative noun uses and supports efficient classification plus steering.
TStore: Rethinking AI Model Hub with Tensor-Centric Compression cs.DC · 2026-04-18 · unverdicted · none · ref 76 · 2 links · internal anchor
TStore reduces AI model storage via tensor-level fingerprinting, clustering, and compression without annotations while claiming to preserve usability.
StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation cs.CL · 2026-04-16 · unverdicted · none · ref 2 · internal anchor
Reformulating code problems as guided narratives improves zero-shot pass@10 by 18.7% on average across 11 models and three benchmarks.
Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval cs.AI · 2026-04-13 · unverdicted · none · ref 24 · internal anchor
Table representations must be permutation-invariant to preserve semantic structure, and a new header-aligned encoder moves toward this ideal while exposing fragility in existing LLM table embeddings.
Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation cs.LG · 2026-04-10 · unverdicted · none · ref 31 · internal anchor
REINA-SAN and REINA-TAN add temporal context to information-based read/write policies, improving the quality-latency tradeoff in simultaneous speech translation by up to 7.1% on Normalized Streaming Efficiency.
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples cs.CL · 2026-04-10 · unverdicted · none · ref 28 · internal anchor
Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning cs.CL · 2026-04-10 · unverdicted · none · ref 35 · internal anchor
Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.
Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations cs.CL · 2026-04-06 · unverdicted · none · ref 22 · internal anchor
LLM hallucinations arise from task-dependent basins in latent space, with separability varying by task and geometry-aware steering reducing their probability.
Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence cs.SE · 2026-03-27 · unverdicted · none · ref 41 · internal anchor
Empirical case study on a flagship Android device profiles energy, latency, and quality trade-offs across eight LLMs, revealing a quantization energy paradox and identifying mid-sized models as practical sweet spots.
TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation cs.CL · 2026-03-09 · unverdicted · none · ref 20 · internal anchor
A new 30B open LLM trained with curriculum learning and upsampling outperforms other multilingual models on European languages, especially low-resource ones, with up to 10x fewer linguistic errors in human evaluations.
LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation cs.LG · 2025-10-24 · unverdicted · none · ref 40 · internal anchor
LLM4Delay improves flight delay prediction accuracy by using instance-level projection to adapt LLMs for integrating textual aeronautical information with multiple aircraft trajectories.
BoHA: Blockwise Hadamard Product Adaptation for Parameter-Efficient Fine-Tuning cs.LG · 2025-09-25 · unverdicted · none · ref 15 · internal anchor
BoHA partitions frozen weights into a b by b grid and applies independent low-rank Hadamard factors per block, outperforming LoRA on matched-budget single-task averages while retaining 57.66% first-stage accuracy in a commonsense-to-arithmetic continual-learning test on Llama-3.2-3B.
Kimi K2: Open Agentic Intelligence cs.LG · 2025-07-28 · unverdicted · none · ref 73 · internal anchor
Kimi K2 is a 1-trillion-parameter MoE model that leads open-source non-thinking models on agentic benchmarks including 65.8 on SWE-Bench Verified and 66.1 on Tau2-Bench.
Shared representations in brains and models reveal a two-route cortical organization during scene perception q-bio.NC · 2025-07-18 · unverdicted · none · ref 77 · internal anchor
RSA on 7T fMRI during natural scene viewing identifies ventromedial and lateral occipitotemporal representational routes for scene context versus animate content, with differential alignment to vision and language models.
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 61 · internal anchor
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
Beyond Words: Multimodal LLM Knows When to Speak cs.CV · 2025-05-20 · unverdicted · none · ref 19 · internal anchor
MM-When2Speak reformulates conversational timing as dense response-type prediction and achieves up to 3x better performance by integrating video, audio, and text cues on top of an LLM backbone using a new dyadic conversation dataset.
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs cs.CL · 2025-03-03 · unverdicted · none · ref 44 · internal anchor
Phi-4-Mini achieves strong math and coding performance with only 3.8B parameters via high-quality synthetic data, while Phi-4-Multimodal uses Mixture-of-LoRAs to integrate modalities and top speech recognition leaderboards.
LLM-based User Profile Management for Recommender System cs.CL · 2025-02-20 · unverdicted · none · ref 4 · internal anchor
PURE is a three-component LLM system that extracts and maintains user profiles from reviews to outperform prior LLM recommenders on sequential Amazon tasks.
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model cs.CL · 2025-02-04 · unverdicted · none · ref 228 · internal anchor
SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought cs.CL · 2025-01-27 · unverdicted · none · ref 9 · internal anchor
AdaMCoT uses dynamic routing of chain-of-thought reasoning in intermediary languages with a reward-based selector to improve cross-lingual factual consistency in LLMs.
