{"total":174,"items":[{"citing_arxiv_id":"2606.20751","ref_index":83,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Sentiment to Actionable Insights: A Data-Driven Public Sentiment Analysis of Advanced Air Mobility","primary_cat":"cs.CL","submitted_at":"2026-06-18T03:07:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Applies standard sentiment classifiers and topic modeling to a large AAM discussion corpus, identifies six clusters of public concern, and lists strategies to address them.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09131","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation","primary_cat":"cs.AI","submitted_at":"2026-06-08T07:28:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DPVR-LF routes saturated vision tokens into a one-layer side branch after layer 4, runs text-only processing through layers 5-17, and performs late fusion at the final layer to reduce visual computation while preserving multimodal performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09881","ref_index":66,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Toward Calibrated, Fair, and accurate Deepfake Detection","primary_cat":"cs.LG","submitted_at":"2026-06-03T05:44:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall 
accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31512","ref_index":38,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral","primary_cat":"cs.CL","submitted_at":"2026-05-29T16:30:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"IndicBERT-HPA with language-aware adapters and verification-guided deferral outperforms baselines on multilingual orthopedic note classification, reaching 0.8792 Macro-F1 overall and 84.4% selective accuracy at 72.3% coverage.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30729","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching","primary_cat":"cs.LG","submitted_at":"2026-05-29T01:45:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SemStruct models tables as heterogeneous graphs with GNNs on frozen PLM embeddings to incorporate row co-occurrences for schema matching and reports SOTA results on Valentine and SOTAB-SM benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30448","ref_index":30,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bounded Behavioral Indistinguishability for Black-Box LLM Distillation","primary_cat":"cs.LG","submitted_at":"2026-05-28T18:19:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces (ε,q,t,A)-behavioral indistinguishability and shows via 
Qwen/Llama experiments that LoRA distillation boosts semantic similarity but leaves detectable behavioral differences under adversarial evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29755","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rec-Distill: An Industrial Distillation Pipeline for Large-Scale Recommendation Models","primary_cat":"cs.IR","submitted_at":"2026-05-28T10:59:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Rec-Distill is an industrial distillation pipeline that transfers substantial performance from large-scale recommendation models to efficient students, reporting over 60% transferability and measurable business gains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29705","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices","primary_cat":"cs.AI","submitted_at":"2026-05-28T10:04:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BitTP applies weight-only 1.58-bit quantization to LLM trajectory predictors, claiming improved ADE/FDE over BF16 baseline with reduced resource demands on edge devices.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29302","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ViASNet: A Video Ad Saliency Network for Predicting Dynamic Saliency and Viewer Engagement","primary_cat":"cs.CV","submitted_at":"2026-05-28T03:33:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"ViASNet 
applies a 3D U-Net architecture augmented with audio and semantic inputs to predict dynamic saliency in video ads and uses frame-wise entropy to diagnose low-engagement scenes on eye-tracked data from 151 ads.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28767","ref_index":85,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning","primary_cat":"cs.LG","submitted_at":"2026-05-27T17:23:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Develops H-consistent surrogate losses for generalized metrics in multi-label classification that decompose exactly in O(l) time and introduces the MMO family of algorithms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28483","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints","primary_cat":"cs.AI","submitted_at":"2026-05-27T13:41:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"An LLM+BM25+graph pipeline tags learning resources to competencies with evidence spans, reaching 0.57 micro-F1 and 0.50 macro-F1 at fragment level on a 22-competency university dataset while outperforming baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27345","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MATCHA: Matching Text via Contrastive Semantic 
Alignment","primary_cat":"cs.CL","submitted_at":"2026-05-26T17:47:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MATCHA introduces a dual-view contrastive metric measuring proximity to gold text and distance from adversarial contradictions, outperforming ROUGE and BERTScore by up to 20% on TruthfulQA and other NLP benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23857","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Strong Teacher Not Needed? On Distillation in LLM Pretraining","primary_cat":"cs.LG","submitted_at":"2026-05-22T17:16:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Even small or undertrained teachers improve larger LLM students via distillation with tuned loss mixing, while stronger teachers can saturate or reverse gains and distillation aids generalization more than in-domain fit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23482","ref_index":54,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Multimodal Distribution Matching for Vision-Language Dataset Distillation","primary_cat":"cs.CV","submitted_at":"2026-05-22T10:41:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MDM distills vision-language datasets via joint embedding clustering, weight-space model interpolation, and geometry-aware distribution matching on the unit hypersphere.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23244","ref_index":114,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Convex Optimization for Alignment and Preference 
Learning on a Single GPU","primary_cat":"cs.LG","submitted_at":"2026-05-22T05:25:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"COALA applies convex optimization reformulations of neural networks to direct preference optimization, claiming single-GPU training with ~18% of DPO's TFLOPs and competitive performance on multiple datasets and models up to 8B parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22779","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection","primary_cat":"cs.SE","submitted_at":"2026-05-21T17:34:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FAME achieves F1 of 98.16 on BGL and 99.95 on Thunderbird for message-level log anomaly detection using at most K=100 labels per template, reducing annotation effort by 76x while detecting anomalies from unseen EventIDs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22738","ref_index":54,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Proxy-Based Approximation of Shapley and Banzhaf Interactions","primary_cat":"cs.LG","submitted_at":"2026-05-21T17:09:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ProxySHAP approximates higher-order Shapley and Banzhaf interactions via tree proxies plus residual correction and a polynomial-time interventional TreeSHAP generalization for tree 
ensembles.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22731","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Post-Training is About States, Not Tokens: A State Distribution View of SFT, RL, and On-Policy Distillation","primary_cat":"cs.LG","submitted_at":"2026-05-21T17:03:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A state distribution view of post-training shows that on-policy supervision from the learner itself can outperform fixed-dataset SFT and preserve retention better than aggressive supervised updates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22120","ref_index":47,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Effective User-defined Keyword Spotting with Dual-stage Matching, Multi-modal Enrollment, and Continual Adaptation","primary_cat":"eess.AS","submitted_at":"2026-05-21T07:52:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DMA-KWS achieves 97.85% AUC and 6.13% EER on LibriPhrase Hard via dual-stage CTC/QbyT matching, multi-modal enrollment, and lightweight continual adaptation with 187k parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21789","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging","primary_cat":"hep-ex","submitted_at":"2026-05-20T22:31:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy 
and background rejection among resource-constrained jet tagging models on four benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21627","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Distribution-free root cause analysis","primary_cat":"stat.ME","submitted_at":"2026-05-20T18:40:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CROC constructs finite-sample valid confidence sets for the root-cause index in multi-stream change detection using conformal p-values under independence and exchangeability assumptions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21421","ref_index":86,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AIGaitor: Privacy-preserving and cloud-free motion analysis for everyone, using edge computing","primary_cat":"cs.CV","submitted_at":"2026-05-20T17:14:57+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21299","ref_index":68,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Tracing the ongoing emergence of human-like reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-20T15:28:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human 
reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20723","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds","primary_cat":"cs.LG","submitted_at":"2026-05-20T05:21:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"CROWDio enables memory-efficient ONNX inference of DistilBERT on Android handsets by partitioning across devices with JIT loading, affinity scheduling, compressed transport and streaming, keeping per-device memory at 43 MB and cutting latency 34%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20683","ref_index":35,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Layer-wise Token Compression for Efficient Document Reranking","primary_cat":"cs.IR","submitted_at":"2026-05-20T03:52:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20423","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind","primary_cat":"cs.AI","submitted_at":"2026-05-19T19:19:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"OSCToM uses RL-guided 
generation with an extended DSL and surrogate models to create nested belief conflict tasks, raising FANToM accuracy from 0.2% to 76% while being 6x more efficient.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20052","ref_index":16,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PromptRad: Knowledge-Enhanced Multi-Label Prompt-Tuning for Low-Resource Radiology Report Labeling","primary_cat":"cs.CL","submitted_at":"2026-05-19T16:07:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PromptRad reformulates multi-label radiology report classification as masked language modeling and enriches verbalizers with UMLS synonyms, outperforming baselines with only 32 training examples.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18752","ref_index":114,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Traditional statistical representations outperform generative AI in identifying expert peer reviewers","primary_cat":"cs.IR","submitted_at":"2026-05-18T17:59:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TF-IDF identifies labeled experts in the top 25 recommendations 79.5% of the time versus 51.5% for GPT-4o mini on an astronomy observatory dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18643","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Post-Trained MoE Can Skip Half Experts via 
Self-Distillation","primary_cat":"cs.LG","submitted_at":"2026-05-18T16:50:48+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18180","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Canonical Regularisation of Wide Feature-Learning Neural Networks","primary_cat":"stat.ML","submitted_at":"2026-05-18T10:23:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Derives geodesic ridge regularization and Riemannian Gibbs Process prior for feature-learning wide neural networks, generalizing kernel-regime results via function-space axiomatization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18066","ref_index":68,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TIDAL: Recovering Temporal Phase for Cloud Block Storage Placement from LLM-Derived Semantics","primary_cat":"cs.OS","submitted_at":"2026-05-18T08:49:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TIDAL recovers temporal phase signals from LLM-derived semantics of provisioning metadata to enable complementary CVD placement, reducing overload frequency by 79.1% on production traces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18007","ref_index":129,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role 
Labeling","primary_cat":"cs.CL","submitted_at":"2026-05-18T08:03:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RISE is an inference-time semantic reranking framework that refines low-confidence predictions in rhetorical role labeling using contrastively learned label representations, delivering an average +9.15 macro-F1 gain on hard examples across eight datasets and seven models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17486","ref_index":73,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization","primary_cat":"cs.RO","submitted_at":"2026-05-17T14:55:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17432","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-17T12:55:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DP-SelFT improves the privacy-utility trade-off for LLM fine-tuning by selecting robust layer subsets via DP synthetic data and perturbation-matched evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16991","ref_index":157,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Response-free item 
difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning","primary_cat":"cs.CL","submitted_at":"2026-05-16T13:22:57+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15460","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Differentially Private Motif-Preserving Multi-modal Hashing","primary_cat":"cs.IR","submitted_at":"2026-05-14T22:43:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DMP-MH clips degrees to control triangle sensitivity, synthesizes an edge-DP graph with Noisy Mirror Descent, and distills it into dual-stream hash networks, beating private baselines by up to 11.4 mAP on MIRFlickr-25K and NUS-WIDE while keeping 92.5% of non-private performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15413","ref_index":41,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-14T20:57:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Empirical tests on 118 transformers show success falling from 88.1% at 512 tokens to 0% at 2048 tokens, with compressed models achieving 649.2 tokens/sec/M parameters versus 12.5 for large generative 
ones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15299","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Fortress: A Case Study in Stabilizing Search Recommendations via Temporal Data Augmentation and Feature Pruning","primary_cat":"cs.IR","submitted_at":"2026-05-14T18:13:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Fortress stabilizes query-to-app relevance models by pruning features that cause inconsistent predictions across time periods while retaining predictive power from engagement signals.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15104","ref_index":85,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents","primary_cat":"cs.CL","submitted_at":"2026-05-14T17:22:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14260","ref_index":21,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"On the Burden of Achieving Fairness in Conformal Prediction","primary_cat":"stat.ML","submitted_at":"2026-05-14T02:02:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Pooled conformal calibration incurs irreducible group-wise coverage distortion scaled by 
cross-group quantile heterogeneity, with Equalized Coverage and Equalized Set Size in fundamental tension.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14257","ref_index":26,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Sakura at BEA 2026 Shared Task 1: What Makes Vocabulary Difficult?","primary_cat":"cs.CL","submitted_at":"2026-05-14T01:57:35+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fine-tuned LLM and explainable models predict vocabulary difficulty with correlations r > 0.91 and r > 0.77, showing spelling difficulty and test item construction as key influences in addition to word production difficulty.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14075","ref_index":50,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity","primary_cat":"cs.LG","submitted_at":"2026-05-13T19:51:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14071","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Distribution Corrected Offline Data Distillation for Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:47:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A distribution-correction framework for offline LLM reasoning distillation improves accuracy on math 
benchmarks by adaptively aligning teacher supervision with the student's inference-time distribution.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"LLM post-training in domains such as math reasoning and safety alignment. The core idea behind this approach is to fine-tune a pretrained model π_θ to maximize accumulated rewards within a domain Q over self-generated trajectories: J_RL(θ) = E_{x∼Q, y∼π_θ(·|x)} [R(y, x)], (1) where R(x, y) is a reward function that scores a response y (e.g., a reasoning trajectory). Following the policy gradient principle [32], we derive the following gradient update for Eq. (1): ∇_θ J_RL(θ) = E_{x∼Q, y∼π_θ(·|x)} [R(x, y) · Σ_{t=0}^{|y|} log π_θ(y_t | x, y_{<t})], (2) with the expectation taken under on-policy sampling. Prior work has improved training stability by replacing the reward function R(x, y) with a normalized advantage function, tokenizing the optimization formulation, and introducing clipping and model"},{"citing_arxiv_id":"2605.13190","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation","primary_cat":"cs.LG","submitted_at":"2026-05-13T08:46:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"N-vium achieves 57.9% wall-clock speedup over matched standard transformers at no perplexity cost by mixing exact predictions from multiple model depths.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12741","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation","primary_cat":"cs.LG","submitted_at":"2026-05-12T20:46:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RESD turns 
failure trajectories into token-level supervision via retrospective reflections and a persistent global playbook, enabling faster improvement than standard self-distillation or GRPO with only one rollout per prompt.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12139","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"BoolXLLM: LLM-Assisted Explainability for Boolean Models","primary_cat":"cs.AI","submitted_at":"2026-05-12T13:58:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BoolXLLM augments an existing Boolean rule learner with LLMs for feature selection, discretization thresholds, and natural-language rule translation to improve interpretability while preserving accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11619","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PhishSigma++: Malicious Email Detection with Typed Entity Relations","primary_cat":"cs.CR","submitted_at":"2026-05-12T06:46:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PhishSigma++ reaches 0.9675 F1 on clean data and holds 0.9579 F1 under adversarial text padding by modeling typed entity relations in emails, outperforming text-only baselines that drop sharply.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"benign-header splicing to dilute relation evidence - require attacker-aware protocols and remain future work, alongside authenticated-header forgery, sender-reputation manipulation, landing-page rewriting, concept drift [19], and larger pretrained text encoders. 7 Related Work Email-phishing detection has largely been framed as a content problem. 
URL- lexical models [20], deep-learning text classifiers [21,22], and the broader survey of Das et al. [4] show that strong clean performance is achievable when the model can freely consume body text, URLs, and surface tokens. These systems are effective at recognizing recurring lexical patterns, but their evidence is often difficult to map back to the operational question an analyst asks about a specific message: which fields disagree, which role is being impersonated, and which link"},{"citing_arxiv_id":"2605.11290","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ReAD: Reinforcement-Guided Capability Distillation for Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-11T22:17:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ReAD applies a contextual bandit to allocate fixed-token distillation budget across interdependent LLM capabilities, yielding higher task utility and fewer negative spillovers than standard methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Knowledge distillation (KD) [12] for large language models (LLMs) [41] has become an essential research direction for enhancing the accessibility and efficiency of Machine-Learning-as-a-Service (MLaaS) [3, 35, 37]. Through knowledge distillation, a small LLM learns to imitate a large teacher LLM's outputs, enabling effective deployment under limited computational or financial resources [25, 14, 31, 16, 18]. 
Recent studies show that capability distillation, a specialization of knowledge distillation that focuses supervision on a target capability (e.g., instruction-following, reasoning, mathematics, or coding), can substantially improve the student's performance on downstream tasks that primarily depend on that capability, while achieving these gains at lower serving cost than the"},{"citing_arxiv_id":"2605.09857","ref_index":36,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Unified Approach for Weakly Supervised Multicalibration","primary_cat":"stat.ML","submitted_at":"2026-05-11T01:30:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A unified framework uses contamination-matrix risk rewrites and witness-based calibration constraints to estimate and correct multicalibration under weak supervision, providing finite-sample guarantees and the WLMC post-hoc recalibration algorithm.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"),Proceedings of the 32nd International Conference on Machine Learning, volume 37 ofProceedings of Machine Learning Research, pp. 1386-1394, Lille, France, 07-09 Jul 2015. PMLR. URL https://proceedings.mlr.press/v37/plessis15.html. [35] Sanh, V ., Debut, L., Chaumond, J., and Wolf, T. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter.arXiv preprint arXiv:1910.01108, 2019. [36] Sharma, S., Gee, A. H., Paydarfar, D., and Ghosh, J. Fair-n: Fair and robust neural networks for structured data. InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pp. 946-955, 2021. [37] Widmann, D., Lindsten, F., and Zachariah, D. Calibration tests beyond classification. 
InInternational Conference on Learning Representations, 2021."},{"citing_arxiv_id":"2605.08992","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When More Parameters Hurt: Foundation Model Priors Amplify Worst-Client Disparity Under Extreme Federated Heterogeneity","primary_cat":"cs.LG","submitted_at":"2026-05-09T15:22:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Foundation model priors amplify worst-client disparity under extreme federated heterogeneity, creating a fairness paradox where larger models perform worse for disadvantaged clients.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}