Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.
19 Pith papers cite this work.
citing papers explorer
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.
- Beyond Square Roots: Explicit Memory-Efficient Factorization for Multi-Epoch Private Learning
γ-BIFR unifies DP-λCGD and BISR into a single banded inverse factorization that improves RMSE and theoretical guarantees in the low-bandwidth regime for multi-epoch private learning.
- NeuralBench: A Unifying Framework to Benchmark NeuroAI Models
NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.
- Rethinking the Rank Threshold for LoRA Fine-Tuning
For binary classification in the NTK regime, LoRA rank r=1 suffices and is often optimal under cross-entropy loss, reducing the prior sufficient condition from r>=12.
- StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer
StyleShield uses flow matching in continuous token embeddings with a DiT backbone to achieve 94.6% evasion on trained detectors and over 99% on unseen ones in Chinese benchmarks, with 0.928 semantic similarity, plus a RateAudit method to arbitrarily control detection rates.
- STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
STELLAR trains up to 500M-parameter multi-modal models on 50M driving scenes and reports empirical scaling trends plus new state-of-the-art results on the Waymo Open Dataset.
- Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human SOTA on long-range benchmarks.
- Is Data Shapley Not Better than Random in Data Selection? Ask NASH
NASH decomposes the validation utility into Shapley-informative component functions and aggregates them non-linearly to make Data Shapley-based data selection consistently effective.
- Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
The power distribution is the target of power sampling, the closed-form solution to self-reward KL-regularized RL, and the basis for power self-distillation that matches sampling performance at lower cost.
- Structure-guided molecular design with contrastive 3D protein-ligand learning
An SE(3)-equivariant transformer encodes 3D protein-ligand interactions via contrastive learning for zero-shot virtual screening, and these embeddings condition a multimodal chemical language model to autoregressively generate target-specific molecules with favorable predicted binding properties.
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.
- Monetary Policy in the Media Spotlight: Sentiments, Signals, and Economic Impact
Media sentiment indicators from Canadian news, when added to a New Keynesian model with endogenous central-bank response, improve out-of-sample forecasts and account for part of monetary-policy propagation to output and prices.
- MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification
MaskTab is a masked pretraining method for industrial tabular data that delivers measurable gains in classification AUC and KS metrics while enabling effective distillation to smaller models.
- Emergent Semantic Role Understanding in Language Models
Semantic role understanding partially emerges during language model pre-training, with linear probes on frozen representations achieving substantial performance that improves with scale but does not match fine-tuned models, and representations shifting toward more distributed forms at larger scales.
- Benchmarking Wireless Representations: High-Dimensional vs. Compressed Embeddings for Efficiency and Robustness
High-dimensional embeddings excel in few-shot regimes for some wireless tasks but carry high latency and parameter costs, whereas compressed autoencoder representations provide better noise robustness, stability, and efficiency.
- Accurate, Efficient, and Explainable Deep Learning Approaches for Environmental Science Problems
The work introduces WaLeF/FIDLAr for flood forecasting, CoDiCast for probabilistic weather, and Hypercube-RAG for explainable environmental QA, claiming superior accuracy, efficiency, and interpretability over baselines.
- Read, Extract, Classify: A Tool for Smarter Requirements Engineering
ReXCL automates extraction of requirements into a schema and their classification via adaptive fine-tuning of encoder models to improve efficiency and accuracy in software development.
- Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.
- PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
Llama 3.1 8B fine-tuned with calibrated 5% synthetic data augmentation reaches 0.6234 F1-macro on multi-class toxicity detection in gaming chat and places fourth among 35 teams.