super hub Mixed citations

Qwen Technical Report

author=, Qwen technical report · 2023 · cs.CL · arXiv 2309.16609

Mixed citation behavior. Most common role is background (67%).

445 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 445 citing papers more from author= arXiv PDF

abstract

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 83 baseline 16 method 16 dataset 1 extension 1 other 1

citation-polarity summary

background 79 baseline 16 use method 16 unclear 4 extend 1 support 1 use dataset 1

claims ledger

abstract Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a mult

authors

author= Qwen technical report

co-cited works

representative citing papers

Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

cs.CR · 2026-05-27 · unverdicted · novelty 8.0

SeedHijack is a blind, integrity-preserving PRNG hijacking attack that amplifies LLM watermark z-scores up to 2.42x while evading all tested content-side statistical detectors across three schemes and models.

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

cs.AI · 2026-05-14 · unverdicted · novelty 8.0 · 2 refs

LongAct benchmark evaluates long-horizon household task execution from free-form instructions; HoloMind agent raises performance but top VLMs still reach only 59% goal completion and 16% full-task success.

A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention

stat.ML · 2026-05-12 · unverdicted · novelty 8.0

The upper-tail accumulation scale derived from the gap-counting function N_n sets the critical inverse temperature for softmax attention concentration, unifying prior conflicting laws as special cases of different N_n.

LLM Translation of Compiler Intermediate Representation

cs.PL · 2026-05-07 · unverdicted · novelty 8.0

IRIS-14B is the first LLM trained explicitly for GIMPLE-to-LLVM IR translation and outperforms much larger models by up to 44 percentage points on real-world C code.

Learning the Signature of Memorization in Autoregressive Language Models

cs.CL · 2026-04-03 · accept · novelty 8.0

A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

cs.AI · 2024-04-11 · accept · novelty 8.0

OSWorld provides the first unified real-computer benchmark for open-ended multimodal agent tasks, exposing large performance gaps between humans and state-of-the-art LLM/VLM agents.

A Sensitivity-Aware Test Collection for Search Among Personal Information

cs.IR · 2026-06-25 · accept · novelty 7.0

A new sensitivity-labeled test collection is released from Enron emails with crowdsourced queries, relevance judgments, and LLM extensions for evaluating sensitivity-aware search.

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

Introduces ChiSafe-PAS, a 1,897-prompt human-annotated Chinese adversarial benchmark for LLM safety with 3-class labels, 9-category obfuscation taxonomy, and domain coverage in self-harm, drugs, fraud, and satire.

EvoGM: Learning to Merge LLMs via Evolutionary Generative Optimization

cs.NE · 2026-05-28 · unverdicted · novelty 7.0

EvoGM uses a dual-generator architecture with cycle-consistent learning on winner-loser pairs from search history to optimize LLM merging coefficients inside a multi-round evolutionary pipeline and reports outperformance over baselines on seen and unseen tasks.

FTibSuite: A Comprehensive Resource Suite for Tibetan Vision-Language Modeling

cs.CV · 2026-05-26 · unverdicted · novelty 7.0

FTibSuite provides human-verified multimodal corpora, Tibetan-adapted benchmarks with quality controls, and a baseline VLM showing gains on tasks like MMBench while preserving Chinese capabilities.

Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems

cs.AI · 2026-05-25 · unverdicted · novelty 7.0

AgingBench demonstrates multi-dimensional degradation in deployed AI agents through four aging mechanisms diagnosed by temporal graphs and counterfactual probes across hundreds of runs.

MiRD: Reliable Set-Valued Prediction for Open-Ended Question Answering via Miscoverage Risk Decomposition

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

MiRD decomposes overall miscoverage into sampling and conditional selection risks for conformal set-valued prediction in open-ended QA, bounding each while using the full calibration set.

VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VisAnalog is a new controlled benchmark showing VLMs substantially underperform humans on visual concept transfer under one- to four-step deterministic transformations, with relation inference as the main failure mode.

Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Proposes an equation-anchored tool-use method for MLLMs that writes the pinhole back-projection equation in Chain-of-Thought and substitutes retrieved camera intrinsics and depths to achieve robustness in 3D object detection and visual grounding under rescaled intrinsics.

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Incantation is the first video world model to use per-frame natural language conditioning for simultaneous multi-entity control and concept-level cross-entity transfer in interactive video generation.

TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics

cs.OS · 2026-05-18 · unverdicted · novelty 7.0

TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.

Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.

Dynamic Chunking for Diffusion Language Models

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

DCDM replaces positional blocks with learnable semantic chunks via differentiable Chunking Attention, yielding consistent gains over block and unstructured diffusion baselines up to 1.5B parameters.

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

BlockVLA: Accelerating Autoregressive VLA via Block Diffusion Finetuning

cs.RO · 2026-05-13 · unverdicted · novelty 7.0

BlockVLA accelerates autoregressive VLA models by 3.3x using block diffusion finetuning, with faster training convergence and better early performance on long-horizon robotic tasks.

Efficient and Adaptive Human Activity Recognition via LLM Backbones

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Pretrained LLMs adapted via convolutional projections and LoRA act as efficient frozen backbones for sensor-based human activity recognition, delivering strong data efficiency and cross-dataset transfer.

ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.

citing papers explorer

Showing 50 of 445 citing papers.

HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling cs.DC · 2025-08-21 · unverdicted · none · ref 5 · internal anchor
HFX jointly designs scheduling and scaling for multi-SLO LLM serving, achieving up to 4.44x higher SLO attainment, 65.82% lower latency, and 49.81% lower cost than prior systems on multi-task workloads.
MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making cs.AI · 2025-07-25 · unverdicted · none · ref 45 · internal anchor
MAC framework selects Pareto-optimal LLM agents and masks low cross-consistency outputs for adaptive collaboration in medical decision-making.
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective cs.RO · 2025-07-02 · unverdicted · none · ref 122 · internal anchor
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
RAP: Runtime Adaptive Pruning for LLM Inference cs.LG · 2025-05-22 · unverdicted · none · ref 4 · internal anchor
RAP is a reinforcement learning framework for runtime-adaptive pruning of LLMs that jointly optimizes model weights and KV-cache usage under varying memory budgets.
From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs cs.IR · 2025-04-22 · unverdicted · none · ref 29 · internal anchor
The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.
Qwen2.5-Omni Technical Report cs.CL · 2025-03-26 · conditional · none · ref 3 · internal anchor
Qwen2.5-Omni presents a multimodal model with block-wise encoders, TMRoPE position embeddings, and a Thinker-Talker architecture that enables simultaneous text and streaming speech generation while matching text performance on reasoning benchmarks.
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model cs.CL · 2025-02-04 · unverdicted · none · ref 148 · internal anchor
SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling cs.CV · 2025-01-21 · unverdicted · none · ref 2 · internal anchor
InternVideo2.5 improves video MLLMs by incorporating dense vision task annotations via direct preference optimization and compact spatiotemporal representations via adaptive hierarchical token compression, yielding better benchmark performance, 6x longer video memory, and new capabilities likeobject
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference cs.CL · 2024-12-18 · unverdicted · none · ref 117 · internal anchor
ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning cs.CL · 2024-09-10 · unverdicted · none · ref 15 · internal anchor
E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model cs.CV · 2024-09-03 · unverdicted · none · ref 4 · internal anchor
GOT is a unified end-to-end model that treats all man-made optical signals as characters and handles multiple OCR tasks including formatted output and interactive region recognition via prompts.
InternLM2 Technical Report cs.CL · 2024-03-26 · unverdicted · none · ref 164 · internal anchor
InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices cs.CV · 2023-12-28 · unverdicted · none · ref 4 · internal anchor
MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks cs.CV · 2023-12-21 · unverdicted · none · ref 4 · internal anchor
InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.
ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring cs.CL · 2026-05-04 · unverdicted · none · ref 39
ARGUS uses a Prosecutor-Defender-Umpire multi-agent setup plus RAG and chain-of-thought rewards to adapt ad policy enforcement to new regulations using minimal fresh labels.
Qwen3 Technical Report cs.CL · 2025-05-14 · unverdicted · none · ref 4
Pith review generated a malformed one-line summary.
Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations cs.IR · 2026-06-09 · unverdicted · none · ref 1 · internal anchor
AIR framework achieves ~400x faster LLM-based cross-domain recommendation via offline intent construction and online retrieval, with SOTA results on public data and +3.446% GMV lift in live A/B tests.
Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding eess.AS · 2026-05-28 · unverdicted · none · ref 1 · internal anchor
Threshold-based decoding for diffusion ASR outperforms fixed schemes by accepting high-confidence tokens early and matches autoregressive accuracy with better speed.
Multimodal Group Emotion Recognition In-the-Wild Towards a Privacy-Safe Non-Individual Approach cs.CV · 2026-05-27 · unverdicted · none · ref 12 · internal anchor
Proposes cross-attention audio-video fusion and VE-MD latent-space models for group emotion recognition that avoid individual cues and report competitive performance via ablation studies on synthetic and real data.
Bandwidth-Aware LLM Inference on Heterogeneous Many-Core Supercomputers cs.DC · 2026-05-25 · unverdicted · none · ref 2 · internal anchor
THInfer achieves 62-84% higher throughput than GPU baselines for Llama 7B-30B models on MT-3000 through bandwidth-focused co-design, and runs 70B models where GPU frameworks fail.
m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder cs.CL · 2026-05-19 · unverdicted · none · ref 1 · internal anchor
m3BERT uses a three-stage Matryoshka pretraining approach on a bidirectional encoder to support variable embedding sizes while outperforming prior models on large-scale retrieval tasks.
DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition cs.AI · 2026-05-18 · unverdicted · none · ref 41 · internal anchor
DuIVRS-2 deploys an LLM-driven IVR pipeline that processes 0.4 million calls per day at 83.9 percent task success rate using FSM-guided augmentation, selective CoT generation, and cooperative policy iteration.
Towards Robust Argumentative Essay Understanding via TIDE: An Interactive Framework with Trial and Debate cs.AI · 2026-05-17 · unverdicted · none · ref 62 · internal anchor
TIDE integrates trial and debate mechanisms to improve criteria-based prompt optimization for argumentative essay tasks including automated scoring, component detection, and relation identification.
Policy-Grounded Dynamic Facet Suggestions for Job Search cs.IR · 2026-05-15 · unverdicted · none · ref 2 · internal anchor
A policy-grounded retrieval-augmented framework with SLM scoring generates real-time personalized facet suggestions that boost engagement and job search outcomes.
Toward LLMs Beyond English-Centric Development cs.CL · 2026-05-15 · unverdicted · none · ref 1 · internal anchor
Analysis of open-weight LLMs reveals strong English bias in generated sequences, with continual pre-training providing no cost benefit over from-scratch training for non-English adaptation.
Gyan: An Explainable Neuro-Symbolic Language Model cs.CL · 2026-05-06 · unverdicted · none · ref 9 · 2 links · internal anchor
Gyan is a novel explainable non-transformer language model that achieves SOTA results on multiple datasets by mimicking human-like compositional context and world models.
RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation cs.IR · 2026-05-06 · unverdicted · none · ref 2 · internal anchor
RecGPT-Mobile runs a compact LLM on phones to understand evolving user intent from behaviors and improve mobile e-commerce recommendations.
Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers cs.CV · 2026-05-05 · unverdicted · none · ref 17 · internal anchor
Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.
SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures cs.CL · 2026-05-04 · unverdicted · none · ref 33 · internal anchor
SemEval-2026 Task 7 presents a benchmark and two evaluation tracks for assessing LLMs on everyday knowledge in diverse languages and cultures without allowing training on the test data.
HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs cs.CL · 2026-05-04 · unverdicted · none · ref 1 · 2 links · internal anchor
HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.
Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery cs.CV · 2026-04-22 · unverdicted · none · ref 29 · internal anchor
Fine-tuning Gemma 3 27B on modest human-labeled street-view data yields building condition scores that align with and sometimes exceed individual human raters on correlation metrics, with knowledge distillation producing comparable smaller LLM, CNN, and transformer models.
Effects of Cross-lingual Evidence in Multilingual Medical Question Answering cs.CL · 2026-04-22 · unverdicted · none · ref 11 · internal anchor
Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments cs.CV · 2026-04-20 · unverdicted · none · ref 3 · internal anchor
XEmbodied is a foundation model that integrates 3D geometric and physical signals into VLMs using a 3D Adapter and Efficient Image-Embodied Adapter, plus progressive curriculum and RL post-training, to improve spatial reasoning and embodied performance on 18 benchmarks.
Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration cs.IR · 2026-04-19 · unverdicted · none · ref 2 · internal anchor
A multi-agent multimodal system with fact-grounded adjudication and a dynamic two-tier preference graph cuts false positives in content filtering by 74.3% and nearly doubles F1-score versus text-only baselines while supporting user-driven Delta adjustments.
Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models cs.RO · 2026-04-09 · unverdicted · none · ref 21 · internal anchor
This survey organizes aerial vision-language navigation methods into five architectural categories, critically reviews evaluation infrastructure, and synthesizes seven open problems for LLM/VLM integration.
WisdomInterrogatory (LuWen): An Open-Source Legal Large Language Model Technical Report cs.CL · 2026-04-08 · unverdicted · none · ref 1 · internal anchor
LuWen is a new open-source Chinese legal LLM that outperforms baselines on judgment prediction, judicial exams, summarization, article QA, and decision reasoning through legal-domain adaptation of a general base model.
GraspSense: Physically Grounded Grasp and Grip Planning for a Dexterous Robotic Hand via Language-Guided Perception and Force Maps cs.RO · 2026-04-07 · unverdicted · none · ref 14 · internal anchor
GraspSense computes force maps from object geometry to select mechanically safe grasp regions and regulate grip forces for dexterous hands.
DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization cs.DC · 2026-03-26 · unverdicted · none · ref 2 · internal anchor
DFLOP is a data-driven framework that profiles data-induced computation variance and uses predictive scheduling to balance workloads in multimodal LLM training pipelines, claiming up to 3.6x faster training than existing frameworks.
Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches cs.SE · 2025-10-06 · unverdicted · none · ref 175 · internal anchor
The paper organizes repository-level retrieval-augmented code generation into a unified framework covering retrieval substrate, control regime, and evaluation setting while summarizing strategies, datasets, and challenges.
Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design cs.LG · 2025-07-22 · unverdicted · none · ref 17 · internal anchor
A fine-tuned LLM called Perovskite-R1, built from curated perovskite literature and material libraries, proposes precursor additives and designs with some experimental validation showing improved stability and performance.
Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving cs.CV · 2025-06-05 · unverdicted · none · ref 13 · internal anchor
Introduces structured NuScenes-S dataset and 0.9B FastDrive VLM claiming 20% higher decision accuracy and over 10x inference speedup versus larger unstructured VLMs.
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model cs.CV · 2025-02-14 · unverdicted · none · ref 88 · internal anchor
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
Qwen2.5-Coder Technical Report cs.CL · 2024-09-18 · unverdicted · none · ref 4 · internal anchor
Qwen2.5-Coder models claim state-of-the-art results on over 10 code benchmarks, outperforming larger models of similar size.
Qwen2-Audio Technical Report eess.AS · 2024-07-15 · unverdicted · none · ref 4 · internal anchor
Qwen2-Audio is an open-source audio-language model that outperforms prior systems such as Gemini-1.5-pro on audio-centric instruction-following benchmarks after simplified prompt-based pre-training and expanded data.
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites cs.CV · 2024-04-25 · unverdicted · none · ref 4 · internal anchor
InternVL 1.5 narrows the performance gap to proprietary multimodal models via a stronger transferable vision encoder, dynamic high-resolution tiling, and curated English-Chinese training data.
Yi: Open Foundation Models by 01.AI cs.CL · 2024-03-07 · unverdicted · none · ref 4 · internal anchor
Yi models are 6B and 34B open foundation models pretrained on 3.1T curated tokens that achieve strong benchmark results through data quality and targeted extensions like long context and vision alignment.
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model cs.CV · 2024-02-06 · unverdicted · none · ref 3 · internal anchor
MobileVLM V2 shows that 1.7B and 3B parameter vision-language models can reach or exceed the performance of 3B and 7B+ models on common VLM benchmarks via targeted design and data improvements.
CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology cs.SE · 2024-02-02 · unverdicted · none · ref 69 · internal anchor
CodePori is a multi-agent LLM system for code generation whose participant evaluation identifies practical challenges like memory limits and hallucinations missed by binary benchmarks.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism cs.CL · 2024-01-05 · unverdicted · none · ref 136 · internal anchor
DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.
Toward Native Multimodal Modeling: A Roadmap cs.CV · 2026-05-25 · unverdicted · none · ref 4 · internal anchor
A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.

Qwen Technical Report

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer