Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior: the most common role is background (61% of classified citations).

536 Pith papers cite it.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

citation-role summary

background 57 · method 15 · baseline 10 · other 6 · dataset 2

citation-polarity summary

background 55 · use method 15 · baseline 10 · unclear 8 · use dataset 2

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Entropy-Gated Latent Recursion

cs.LG · 2026-06-15 · unverdicted · novelty 8.0 · 2 refs

EGLR adds a deterministic layer-recursion axis gated by entropy that is complementary to temperature sampling, raising joint oracle accuracy on MATH-500 from 83.4% to 91.6% for a 3B model.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Next-Billion AI Index: The compass for AI utility and adoption in the global majority

cs.CY · 2026-05-29 · unverdicted · novelty 7.0

Introduces nexbax, a diagnostic framework with three themes and 10 dimensions for evaluating AI economic viability, operational practicality, and societal integrity in next-billion-user contexts.

Vector Linking via Cross-Model Local Isometric Consistency

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

A reference-based geometric hashing method recovers cross-model vector correspondences by exploiting local isometric consistency in contrastive embeddings and iteratively bootstrapping from a seed of paired anchors.

What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

The study links three LVLM architectural dimensions to three hallucination types via a new benchmark, finding that language foundation quality reduces co-occurrence errors, visual encoder strength reduces similarity errors, alignment reduces uncertainty errors, and joint visual-alignment improvement

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Moral Trolley Arena shows frontier LLMs produce composite moral preferences that are compressed rather than additive functions of calibrated component act strengths across Moral Foundations Theory.

Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models

cs.CV · 2026-05-27 · unverdicted · novelty 7.0

Introduces SANSA paradigm for semantic-agnostic vision-language segmentation via dictionary or example-based prompts, with finetuning delivering up to 20% mIoU gains on the new task while retaining standard performance.

Do LLMs Build World Models From Text? A Multilingual Diagnostic of Spatial Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

MentalMap benchmark identifies a universal L3 reasoning cliff in LLMs' text-based spatial reasoning that persists across languages, scales, and prompting, and is replicated in human evaluations.

citing papers explorer

Showing 50 of 536 citing papers.

MSMO-ABSA: Multi-Scale and Multi-Objective Optimization for Cross-Lingual Aspect-Based Sentiment Analysis cs.CL · 2025-02-19 · unverdicted · none · ref 17 · internal anchor
MSMO framework achieves claimed SOTA cross-lingual ABSA via sentence- and aspect-level alignment, code-switching, consistency training, and knowledge distillation.
Social media polarization during conflict: Insights from an ideological stance dataset on Israel-Palestine Reddit comments cs.CL · 2025-02-01 · unverdicted · none · ref 32 · internal anchor
A new labeled dataset of 9,969 Israel-Palestine Reddit comments is created and used to compare stance classification methods, with a specific Mixtral prompt achieving the highest performance.
Retrieval-Augmented Generation with Graphs (GraphRAG) cs.IR · 2024-12-31 · unverdicted · none · ref 180 · internal anchor
A survey proposing a holistic GraphRAG framework with components including query processor, retriever, organizer, generator, and data source, plus domain-tailored reviews, challenges, and future directions.
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design cs.LG · 2024-12-19 · unverdicted · none · ref 20 · internal anchor
MixLLM uses global output-feature importance to set mixed bit-widths for LLM quantization and adds two-step dequantization plus software pipelining for system efficiency.
Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges cs.CL · 2024-12-17 · unverdicted · none · ref 22 · internal anchor
XTransplant empirically shows that cross-lingual latent transplantation yields mutual benefits for multilingual capability and cultural adaptability in LLMs, especially low-resource ones, while revealing underutilized model potential.
VeriGraph: Scene Graphs for Execution Verifiable Robot Planning cs.RO · 2024-11-15 · unverdicted · none · ref 36 · internal anchor
VeriGraph integrates VLMs with scene-graph verification to raise robot task success rates by 30-58% over baselines in manipulation scenarios.
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation cs.CV · 2024-08-29 · unverdicted · none · ref 23 · internal anchor
M4CXR is a multi-modal large language model that performs multiple tasks in chest X-ray analysis including report generation with claimed SOTA clinical accuracy using chain-of-thought prompting.
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback cs.CL · 2024-08-28 · unverdicted · none · ref 14 · internal anchor
WildFeedback extracts preference pairs from in-situ user feedback in LLM conversations to fine-tune models for better alignment with real user preferences.
An Iterative Utility Judgment Framework Inspired by Philosophical Relevance via LLMs cs.IR · 2024-06-17 · unverdicted · none · ref 1 · internal anchor
ITEM is a new iterative utility judgment loop for RAG that maps Schutz's three levels of relevance to retrieval, utility scoring, and generation, yielding measured gains on TREC DL, WebAP, GTI-NQ, and NQ.
ProTrain: Efficient LLM Training via Memory-Aware Techniques cs.DC · 2024-06-12 · unverdicted · none · ref 19 · internal anchor
ProTrain automates memory management for LLM training via cost models from profiling to deliver 1.43x-2.71x throughput gains over state-of-the-art systems without accuracy loss.
The Platonic Representation Hypothesis cs.LG · 2024-05-13 · unverdicted · none · ref 257 · internal anchor
Representations learned by large AI models are converging toward a shared statistical model of reality.
InternLM2 Technical Report cs.CL · 2024-03-26 · unverdicted · none · ref 154 · internal anchor
InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.
Mixtral of Experts cs.LG · 2024-01-08 · unverdicted · none · ref 18 · internal anchor
Mixtral 8x7B is a sparse MoE LLM activating 2 of 8 experts per layer that matches or exceeds Llama 2 70B and GPT-3.5 on benchmarks while using only 13B active parameters.
ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring cs.CL · 2026-05-04 · unverdicted · none · ref 56
ARGUS uses a Prosecutor-Defender-Umpire multi-agent setup plus RAG and chain-of-thought rewards to adapt ad policy enforcement to new regulations using minimal fresh labels.
MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers cs.CL · 2026-06-29 · unverdicted · none · ref 101 · internal anchor
MATCH augments sparsified attention with an efficient in-context retrieval system to boost performance on long-range recall tasks in transformers.
Building Customer Support AI Agents at 100M-User Scale: An Evaluation-Driven Framework cs.CL · 2026-06-07 · unverdicted · none · ref 20 · internal anchor
An evaluation-driven framework for customer support AI agents at Nubank integrates context engineering, LLM judges, and A/B testing to deliver up to 37pp NPS gains and strong offline-online correlation across five production domains.
The Shape of Wisdom: Decision Trajectories in Language Models cs.AI · 2026-05-31 · unverdicted · none · ref 13 · internal anchor
A 9,000-trajectory study across three LLMs finds correctness and stability differ, with the largest group unstable-correct and attention scalars aligning better than MLPs in stable cases.
Why Specialist Models Still Matter: A Heterogeneous Multi-Agent Paradigm for Medical Artificial Intelligence cs.AI · 2026-05-28 · unverdicted · none · ref 20 · internal anchor
HetMedAgent is a heterogeneous multi-agent framework that fuses generalist LLMs and specialist models via conflict-aware fusion and uncertainty triggers, outperforming either alone on three clinical tasks.
Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit cs.LG · 2026-05-25 · unverdicted · none · ref 15 · internal anchor
Adapts Miettinen's paired-binary MDE formula to 4-bit quantization benchmarks as δ* ≤ (z_{1-α/2}+z_{1-β})√(ρ_d/m) and shows in a pilot that most observed FP16-NF4 deltas fall below the bound when ρ_d=0.10.
Tracing the ongoing emergence of human-like reasoning in Large Language Models cs.CL · 2026-05-20 · unverdicted · none · ref 79 · internal anchor
LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.
LLM Benchmark Datasets Should Be Contamination-Resistant cs.LG · 2026-05-19 · unverdicted · none · ref 79 · internal anchor
Authors call for contamination-resistant LLM benchmarks that exploit Transformer training-inference asymmetry and require new mathematical methods for cross-architecture interoperability.
Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction cs.LG · 2026-05-18 · unverdicted · none · ref 27 · internal anchor
Structural protection of boundary tokens in globally capped KV cache eviction recovers 69-90% of full-cache quality at 13% retention and dominates differences among scoring policies.
From Text to DSL: Evaluating Grammar-Based Model Generation Using Open LLMs cs.SE · 2026-05-15 · unverdicted · none · ref 6 · internal anchor
Compact open-source LLMs can produce syntactically valid, semantically complete, and inter-model consistent DSL models from text via few-shot prompting, with some 7B-12B models matching much larger ones in quality.
Hybrid LLM-based Intelligent Framework for Robot Task Scheduling cs.RO · 2026-05-15 · unverdicted · none · ref 4 · internal anchor
Hybrid LLM framework using generator and supervisor agents to optimize task scheduling for construction robots, evaluated on a simple scenario with reported metrics.
It's Not the Size: Harness Design Determines Operational Stability in Small Language Models cs.SE · 2026-05-12 · unverdicted · none · ref 6 · internal anchor
A structured 4-stage pipeline harness raises task success rates to 95%+ in 2-3B parameter models while revealing format collapse and non-monotonic effects when harness support is removed.
Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care cs.AI · 2026-05-08 · unverdicted · none · ref 35 · internal anchor
Interactive LLM dialogue raised residents' hard-case diagnostic correctness from 0.589 to 0.734 and produced medium effect sizes in a blinded study of seven physicians on 52 emergency cases.
How Value Induction Reshapes LLM Behaviour cs.CL · 2026-05-08 · unverdicted · none · ref 2 · internal anchor
Inducing targeted values in LLMs through fine-tuning causes spillover to related or opposing values, boosts safety metrics, and increases anthropomorphic and sycophantic language across all tested values.
Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts cs.CL · 2026-05-07 · unverdicted · none · ref 3 · internal anchor
LLMs show below-average consistency and vulnerability to false beliefs in emotional queries with false presuppositions, more so for moderate emotions.
An Empirical Evaluation of Locally Deployed LLMs for Bug Detection in Python Code cs.SE · 2026-04-25 · unverdicted · none · ref 4 · internal anchor
Locally deployed LLMs achieve 43-45% accuracy on Python bug detection but frequently produce only partial identifications of problematic code regions.
Effects of Cross-lingual Evidence in Multilingual Medical Question Answering cs.CL · 2026-04-22 · unverdicted · none · ref 12 · internal anchor
Combining English and target-language web retrieval boosts medical QA for low-resource languages to match high-resource performance, while English web data benefits high-resource languages most and specialized sources like PubMed lack multilingual coverage.
Network Effects and Agreement Drift in LLM Debates cs.SI · 2026-04-13 · unverdicted · none · ref 2 · internal anchor
LLM agents in controlled network debates show agreement drift toward specific opinion positions, requiring separation of structural effects from LLM biases before using them as human behavioral proxies.
Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series cs.CL · 2026-04-12 · unverdicted · none · ref 1 · internal anchor
Bielik v3 models achieve better Polish language modeling efficiency by switching to a dedicated tokenizer, FOCUS initialization, multi-stage pretraining, and post-training with SFT, DPO, and GRPO.
Adaptive Multi-Expert Reasoning via Difficulty-Aware Routing and Uncertainty-Guided Aggregation cs.CL · 2026-04-11 · unverdicted · none · ref 5 · internal anchor
AMR uses difficulty-aware routing and uncertainty-guided aggregation across three experts plus a neural verifier to reach 75.28% accuracy on GSM8K without synthetic training data.
MedThink: Enhancing Diagnostic Accuracy in Small Models via Teacher-Guided Reasoning Correction cs.CY · 2026-04-09 · unverdicted · none · ref 27 · internal anchor
MedThink, a two-stage teacher-guided reasoning correction distillation framework, boosts small language models' medical diagnostic accuracy by up to 12.7% on benchmarks and achieves 56.4% on a gastroenterology dataset.
LLM-Driven Cost-Effective Requirements Change Impact Analysis cs.SE · 2025-10-31 · unverdicted · none · ref 36 · internal anchor
ProReFiCIA uses LLMs with tailored prompts to identify impacted requirements, achieving 85.7% recall on unseen industrial data while requiring review of only 3% of requirements, rising to 95.7% recall with RAG at 3.6% review cost.
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights cs.CL · 2025-10-06 · unverdicted · none · ref 25 · internal anchor
This work systematically compares inter-layer and intra-layer hybridization strategies for combining self-attention and Mamba-style state space models, evaluating them on language modeling, downstream tasks, long-context performance, scaling, and efficiency to derive optimal design recipes.
From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems cs.CL · 2025-07-10 · unverdicted · none · ref 12 · internal anchor
Coreference resolution improves retrieval relevance and QA performance in RAG systems, with mean pooling performing best and smaller models benefiting more.
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model cs.CV · 2025-02-14 · unverdicted · none · ref 82 · internal anchor
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
TerraQ: Spatiotemporal Question-Answering on Satellite Image Archives cs.CV · 2025-02-06 · unverdicted · none · ref 3 · internal anchor
TerraQ is a spatiotemporal question-answering engine for satellite image archives that processes natural language requests involving image metadata and knowledge base entities.
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction cs.CV · 2025-01-03 · conditional · none · ref 22 · internal anchor
VITA-1.5 integrates vision and speech into a single LLM through multi-stage training, delivering competitive benchmark results on image, video, and speech tasks with near real-time response speed.
Open-Sora Plan: Open-Source Large Video Generation Model cs.CV · 2024-11-28 · unverdicted · none · ref 7 · internal anchor
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? cs.SE · 2024-10-29 · unverdicted · none · ref 60 · internal anchor
Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.
MinerU: An Open-Source Solution for Precise Document Content Extraction cs.CV · 2024-09-27 · conditional · none · ref 12 · internal anchor
MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation cs.RO · 2024-09-19 · unverdicted · none · ref 6 · internal anchor
TinyVLA achieves faster inference and higher data efficiency than OpenVLA on robotic manipulation tasks by initializing from high-speed multimodal models and adding a diffusion policy decoder, without any pre-training phase.
Qwen2.5-Coder Technical Report cs.CL · 2024-09-18 · unverdicted · none · ref 23 · internal anchor
Qwen2.5-Coder models claim state-of-the-art results on over 10 code benchmarks, outperforming larger models of similar size.
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs cs.CV · 2024-06-11 · unverdicted · none · ref 20 · internal anchor
VideoLLaMA 2 improves video LLMs via a new STC connector for spatial-temporal dynamics and joint audio training, reaching competitive results on video QA and captioning benchmarks.
Automatic Combination of Sample Selection Strategies for Few-Shot Learning cs.LG · 2024-02-05 · unverdicted · none · ref 6 · internal anchor
ACSESS automatically combines 23 sample selection strategies to outperform individual strategies in few-shot learning on text and image datasets.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism cs.CL · 2024-01-05 · unverdicted · none · ref 135 · internal anchor
DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.
Customized Generative AI Agent for Transportation Engineering Practice: A Development and Continued Pre-training Guideline cs.AI · 2026-06-27 · unverdicted · none · ref 16 · internal anchor
A framework is described for adapting six LLMs to transportation engineering via LoRA-based continued pretraining on domain documents, with two models showing strongest results on BLEU-4 and ROUGE metrics.
Hybrid Metadata Extraction from League of Nations Index Cards: From Feasibility Study to Archival System Integration cs.DL · 2026-06-04 · unverdicted · none · ref 8 · internal anchor
A hybrid AI workflow combining fine-tuned vision-language models with specialized OCR extracts metadata from League of Nations index cards for improved archival access in the LONTAD project.
