A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
hub
The llama 3 herd of models.arXiv e-prints, pages arXiv–2407
15 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Aurora unifies speculative decoder training and serving via asynchronous RL on inference traces, delivering 1.5x day-0 speedup on frontier models and 1.25x adaptation gains on distribution shifts.
Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.
SynBench benchmarks DP text generators across nine datasets and uses a new MIA to show that public pre-training on portions of private data overestimates synthetic text quality and breaks DP privacy bounds.
A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
A neuro-symbolic system using LLM-guided best-first search and Isabelle tools proves up to 77.6% of theorems on the seL4 benchmark, outperforming prior LLM methods and Sledgehammer.
FlexServe achieves up to 10x faster time-to-first-token for secure LLM inference on mobile devices by using flexible resource isolation in TrustZone compared to standard approaches.
veScale-FSDP uses RaggedShard and structure-aware planning to support block-wise quantization and non-element-wise optimizers while delivering 5-66% higher throughput and 16-30% lower memory than prior FSDP systems at massive scale.
GenAI framework generates isomorphic physics problem banks via prompt chaining and validates them with 17 language models that correlate with student performance (ρ up to 0.594), achieving homogeneous difficulty in 73% of deployed banks.
ReAlign corrects the modality gap in unpaired data to let MLLMs learn visual distributions from text alone before instruction tuning, reducing dependence on expensive paired corpora.
StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
LAR learns a compact latent action space from trajectories that shortens the effective decision horizon for LLM agents, reducing token count and inference time while preserving task success.
SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.
Gradient analysis and ablations show DPO and PPO have different target directions and component roles in preference optimization for LLMs.
citing papers explorer
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
Aurora unifies speculative decoder training and serving via asynchronous RL on inference traces, delivering 1.5x day-0 speedup on frontier models and 1.25x adaptation gains on distribution shifts.
-
Scaling Latent Reasoning via Looped Language Models
Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.
-
SynBench: A Benchmark for Differentially Private Text Generation
SynBench benchmarks DP text generators across nine datasets and uses a new MIA to show that public pre-training on portions of private data overestimates synthetic text quality and breaks DP privacy bounds.
-
Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning
A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.
-
The Power of Order: Fooling LLMs with Adversarial Table Permutations
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
-
Neuro-Symbolic Proof Generation for Scaling Systems Software Verification
A neuro-symbolic system using LLM-guided best-first search and Isabelle tools proves up to 77.6% of theorems on the seL4 benchmark, outperforming prior LLM methods and Sledgehammer.
-
FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation
FlexServe achieves up to 10x faster time-to-first-token for secure LLM inference on mobile devices by using flexible resource isolation in TrustZone compared to standard approaches.
-
veScale-FSDP: Flexible and High-Performance FSDP at Scale
veScale-FSDP uses RaggedShard and structure-aware planning to support block-wise quantization and non-element-wise optimizers while delivering 5-66% higher throughput and 16-30% lower memory than prior FSDP systems at massive scale.
-
Scalable Generation and Validation of Isomorphic Physics Problems with GenAI
GenAI framework generates isomorphic physics problem banks via prompt chaining and validates them with 17 language models that correlate with student performance (ρ up to 0.594), achieving homogeneous difficulty in 73% of deployed banks.
-
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
ReAlign corrects the modality gap in unpaired data to let MLLMs learn visual distributions from text alone before instruction tuning, reducing dependence on expensive paired corpora.
-
Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs
StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
-
Latent Action Reparameterization for Efficient Agent Inference
LAR learns a compact latent action space from trajectories that shortens the effective decision horizon for LLM agents, reducing token count and inference time while preserving task success.
-
Resting Neurons, Active Insights: Robustifying Activation Sparsity in LLMs via Spontaneity
SPON adds a small set of trainable input-independent activation vectors as representational anchors, trained by distribution matching, to stabilize sparse activation in LLMs and recover performance lost to hidden-state distribution shifts.
-
What Is Preference Optimization Doing, and Why?
Gradient analysis and ablations show DPO and PPO have different target directions and component roles in preference optimization for LLMs.