{"total":39,"items":[{"citing_arxiv_id":"2606.01179","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Physics-Informed Deep Learning for Entropy Prediction in Heterogeneous Systems: Thermodynamic and Information-Theoretic Case Studies","primary_cat":"cs.LG","submitted_at":"2026-05-31T11:38:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A PIDL framework with shared-encoder architecture and Softplus constraints solves CSTR ODEs and financial inverse Fokker-Planck PDEs, claiming zero Second-Law violations and over 90% accuracy with 30% training data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00821","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Comparative Analysis of Machine Learning Algorithms for Multi-Task Prediction of the Parameters of the Pectin Hydrolysis--Extraction Process","primary_cat":"cs.LG","submitted_at":"2026-05-30T17:47:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"CatBoost achieved the highest average R-squared value of about 0.946 in a multi-task regression task for pectin process parameters, with raw material type identified as the most influential input feature.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21107","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction","primary_cat":"cs.LG","submitted_at":"2026-05-20T12:40:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A projection-based algorithm for COCO achieves O(log T) regret and O(log T) CCV for strongly 
convex losses and O(sqrt(T)) for convex losses by leveraging self-contracted curves.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16570","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning","primary_cat":"stat.CO","submitted_at":"2026-05-15T19:18:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A recursive cubing framework identifies stable hyperparameter regions for MC dropout uncertainty quantification in spatial deep learning and produces competitive or superior predictive intervals versus a statistical baseline on simulations and land-surface temperature data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14055","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12843","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bayesian Model Merging","primary_cat":"cs.LG","submitted_at":"2026-05-13T00:36:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Bayesian Model Merging introduces 
a bi-level optimization framework that merges task-specific models via closed-form Bayesian regression with an anchor prior and global hyperparameter search, outperforming baselines and nearly matching expert averages on up to 20-task vision and 5-task language Merg","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09355","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning","primary_cat":"cs.LG","submitted_at":"2026-05-10T06:09:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FLAME is an MoE architecture using modality-specific routers and low-rank compression of expert knowledge to support efficient continual multimodal multi-task learning while reducing catastrophic forgetting.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[43] Tom Pollard, Alistair Johnson, Jesse Raffa, Leo Anthony Celi, Omar Badawi, and Roger Mark. eICU Collaborative Research Database.PhysioNet, April 2019. Version 2.0. [44] Nicola Rieke, Jonny Hancox, Wenqi Li, Fausto Milletari, Holger R Roth, Shadi Albarqouni, Spyridon Bakas, Mathieu N Galtier, Bennett A Landman, Klaus Maier-Hein, et al. The future of digital health with federated learning.NPJ digital medicine, 3(1):119, 2020. [45] Sebastian Ruder. An overview of multi-task learning in deep neural networks.arXiv preprint arXiv:1706.05098, 2017. [46] Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. Progressive neural networks.arXiv preprint arXiv:1606.04671, 2016. 
[47] Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan, Tomasz Trzciński, Bartosz Zieliński, and"},{"citing_arxiv_id":"2605.07648","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning Large-Scale Modular Addition with an Auxiliary Modulus","primary_cat":"cs.LG","submitted_at":"2026-05-08T12:16:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An auxiliary modulus during training reduces wrap-around issues and preserves train-test input distributions, enabling better accuracy and sample efficiency for large N and q in modular addition learning.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"analysis by Shalev-Shwartz et al. [25] and Hahn and Rofin [11] has proved this for the parity problem (i.e., modular addition with q = 2). Larger N, q increases the difficulty further as it incurs more wraps around the modulus. The number of required training samples grows with N and q [20]; besides, large N necessitates an increased network width [8]. A recent study by Saxena et al. [24] identified learning modular addition as a key component of attacking the Learning With Errors (LWE) problem, a basis for post-quantum cryptography [27], and proposed a learning method that substantially scaled modular addition learning in both the number of summands and the modulus. 
Particularly, addressing large modulus q is crucial for their application."},{"citing_arxiv_id":"2605.07096","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Query-efficient model evaluation using cached responses","primary_cat":"cs.LG","submitted_at":"2026-05-08T01:24:06+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06190","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Constrained Contextual Bandits with Adversarial Contexts","primary_cat":"cs.LG","submitted_at":"2026-05-07T13:04:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A modular reduction from budget-constrained contextual bandits with adversarial contexts to unconstrained bandits via surrogate rewards, yielding improved guarantees and an efficient algorithm based on SquareCB.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"T =Θ(B T/logT) Then, from the above discussion, it follows that the online algorithm satisfies the prescribed budget constraint in expectation over the entire horizon of length T. Let OPT(B′ T) (resp. ALG(B′ T)) be the cumulative reward of the offline benchmark (resp. online algorithm) for the reduced budget of B′ T . From Theorem 1 (part (d)), we have the following terminal regret bound OPT(B′ T)−ALG(B′ T)=O( √ KT UT).(47) Furthermore, using Eqn. (45) with c=Θ(logT) , we have that OPT(BT)⩽O(logT)OPT(B ′ T). Combining this bound with the above, we obtain the following approximate regret bound: OPT(BT)−O(logT)ALG(B ′ T)= ˜O( √ KT UT).(48) 10.2 Regret Bound with Hard-Stopping for CBwLC From Theorem 1, part (f), we have the following CCV bound for the CBwLC problem. 
ECCVT =O("},{"citing_arxiv_id":"2605.01563","ref_index":71,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection","primary_cat":"cs.CV","submitted_at":"2026-05-02T18:23:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher model to task-specific students.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":", Ibtehaz, N., Rahman, M.S., Al-Maadeed, S., Mahmud, S., Ezeddin, M., Hameed, K., Hamid, T., 2021a. Covid-19 infection localization and severity grading from chest x-ray images. Computers in Biology and Medicine 139, 105002. URL:https://www.sciencedirect.com/science/article/pii/ S0010482521007964, doi:https://doi.org/10.1016/j.compbiomed.2021. 105002. [71] Tahir, A.M., Chowdhury, M.E.H., Qiblawey, Y., Khandakar, A., Rahman, T., Kiranyaz, S., Khurshid, U., Ibtehaz, N., Mahmud, S., Ezeddin, M., 2021b. Covid-qu-ex. Kaggle. doi:10.34740/kaggle/ dsv/3122958. [72] Tan, M., Le, Q.V., 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. 
arXiv preprint arXiv:1905.11946 URL:https://arxiv."},{"citing_arxiv_id":"2605.01224","ref_index":55,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs","primary_cat":"cs.CL","submitted_at":"2026-05-02T03:39:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Incidental multilingualism from uneven web training makes LLMs unequal, brittle, and opaque across languages.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02952","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DynoSys: A Dynamic Systems Framework for Multimodal Integration of Genetic, Environmental, and Neurobiological Signals","primary_cat":"q-bio.OT","submitted_at":"2026-05-02T00:29:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DynoSys offers a unified dynamic systems model integrating genetic, environmental, and neurobiological signals to analyze longitudinal behavioral phenotypes in adolescents via harmonized representations and survival or state-space modeling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00718","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning Coarse-to-Fine Osteoarthritis Representations under Noisy Hierarchical Labels","primary_cat":"cs.CV","submitted_at":"2026-05-01T15:07:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Dual-head training on hierarchical OA labels yields backbone-dependent gains in KL metrics, more ordered latent severity axes, and better saliency alignment with cartilage for some 3D 
backbones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.28118","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Hierarchical Fault Detection and Diagnosis for Transformer Architectures","primary_cat":"cs.SE","submitted_at":"2026-04-30T17:07:11+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"A negative∆g means the group favors the alternative. The total gap∑ g ∆ g equals the difference between the two full prototype distances, confirming that the decomposition is exact. To summarize these contributions for visualization, we normalize the positive contributions into an importance score per group: wg = max(∆ g,0) ∑G g′=1 max(∆ g′,0), g= 1,...,G(30) Groups with negative contributions receivewg = 0. The importance scores sum to one and highlight the groups that most support the predicted root cause. 
For structural groups (Table 7), eachwg maps directly to a transformer subsystem, so a large importance identifies the subsystem that most supports the diagnosis after message passing (Wolf et al., 2024)."},{"citing_arxiv_id":"2604.26375","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SG-UniBuc-NLP at SemEval-2026 Task 6: Multi-Head RoBERTa with Chunking for Long-Context Evasion Detection","primary_cat":"cs.CL","submitted_at":"2026-04-29T07:37:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A multi-head RoBERTa model with overlapping chunking and max-pooling achieves Macro-F1 of 0.80 on 3-way clarity classification and 0.51 on 9-way evasion strategy detection, ranking 11th in both subtasks of SemEval-2026 Task 6.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23070","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning the Weather-Grid Nexus via Weather-to-Voltage (W2V) Predictive Modeling","primary_cat":"eess.SY","submitted_at":"2026-04-24T23:53:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A compact neural network surrogate maps weather features to grid voltages on a 6717-bus Texas system, enabling grid-aware weather forecasting that prioritizes operationally critical conditions like wind drops.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21647","ref_index":74,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep 
learning","primary_cat":"stat.AP","submitted_at":"2026-04-23T13:09:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The deep SPAR model shows concurrent floods and droughts becoming more likely in the Upper Danube by 2100 under high emissions, with changes in the dependence between catchments contributing substantially to the increase.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Similarly to standard inference for GPD regression models, inference for the deep SPAR model proceeds in two stages. First, we estimate the threshold function, u(w), via standard quantile regression techniques (Koenker, 2017). Conditional on u(w), we subsequently estimate the GPD parameter functions, σ(w) and ξ(w), via maximum likelihood estimation with hard parameter sharing (Ruder, 2017; Rothfuss et al., 2019). This requires two neural network models; one for the threshold u(w) and another for the parameter vector (σ(w), ξ(w)). Both u(w) and (σ(w), ξ(w)) are modelled by MLPs, which are composed of multiple hidden layers of 'neurons'. 
Each neuron passes a linear transformation of input variables through a nonlinear 'activation function', and the output"},{"citing_arxiv_id":"2604.21321","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment","primary_cat":"cs.CV","submitted_at":"2026-04-23T06:24:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FryNet combines RGB and thermal imaging with adversarial regularization to segment oil areas, classify usability, and predict oxidation levels like PV and Totox with high accuracy on video data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20268","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Opportunistic Bone-Loss Screening from Routine Knee Radiographs Using a Multi-Task Deep Learning Framework with Sensitivity-Constrained Threshold Optimization","primary_cat":"cs.CV","submitted_at":"2026-04-22T07:12:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"STR-Net achieves AUROC of 0.933 for binary bone-loss screening and 0.801 correlation for T-score estimation from knee X-rays on a held-out test set.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19474","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Harmonizing MR Images Across 100+ Scanners: Multi-site Validation with Traveling Subjects and Real-world Protocols","primary_cat":"eess.IV","submitted_at":"2026-04-21T13:57:15+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HACA3^+ improves upon HACA3 with better artifact encoding, 
attention mechanisms, and training on 100+ scanners, validated via traveling subjects for better downstream performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14805","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation","primary_cat":"cs.CV","submitted_at":"2026-04-16T09:28:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13560","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Parameter-efficient Quantum Multi-task Learning","primary_cat":"cs.LG","submitted_at":"2026-04-15T07:09:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"QMTL uses shared VQC encoding plus task-specific quantum ansatz heads to achieve linear parameter scaling with the number of tasks while matching or exceeding classical multi-task baselines on three benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10079","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-11T07:55:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Supervised fine-tuning 
of LLMs often fails to fully internalize all training instances due to five recurring causes including missing prerequisites and data conflicts, as diagnosed via a new framework across multiple models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.22586","ref_index":16,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks","primary_cat":"cs.LG","submitted_at":"2026-03-23T21:24:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"iAmTime is a time-series foundation model that uses instruction-conditioned in-context learning from demonstrations to perform zero-shot adaptation on forecasting, imputation, classification, and related tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.15411","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Hybrid Modeling Framework for Crop Prediction Tasks via Dynamic Parameter Calibration and Multi-Task Learning","primary_cat":"cs.AI","submitted_at":"2026-03-16T15:21:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Hybrid neural parameterization of biophysical models plus multi-task learning improves phenology prediction accuracy by 60% and cold hardiness by 40% over deployed biophysical models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.12676","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Disentangled Latent Dynamics Manifold Fusion for Solving Parameterized 
PDEs","primary_cat":"cs.LG","submitted_at":"2026-03-13T05:46:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DLDMF disentangles latent dynamics for parameterized PDEs by feeding parameters into a latent embedding that initializes a parameter-conditioned Neural ODE, then uses dynamic manifold fusion with a shared decoder to reconstruct spatiotemporal fields for better generalization and extrapolation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.05717","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Comparing the latent features of universal machine-learning interatomic potentials","primary_cat":"physics.chem-ph","submitted_at":"2025-12-05T13:45:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Different uMLIPs encode chemical space in distinct ways, with high cross-model feature reconstruction errors, and fine-tuning preserves strong pre-training bias in the latent features.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.10834","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence","primary_cat":"cs.LG","submitted_at":"2025-11-13T22:36:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EarthSight reduces average compute time per image by 1.9x and 90th-percentile end-to-end latency from 51 to 21 minutes by distributing inference decisions between orbit and ground with shared backbones and early rejection 
filters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.01831","ref_index":56,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Routing-Based Continual Learning for Multimodal Large Language Models","primary_cat":"cs.LG","submitted_at":"2025-11-03T18:39:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.04758","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Feature Importance-Aware Deep Joint Source-Channel Coding for Computationally Efficient and Adjustable Image Transmission","primary_cat":"cs.IT","submitted_at":"2025-04-07T06:11:39+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FAJSCC is a new deepJSCC architecture for images that achieves better transmission performance with lower complexity than prior models and enables independent encoder-decoder compute adjustment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2205.01068","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"OPT: Open Pre-trained Transformer Language Models","primary_cat":"cs.CL","submitted_at":"2022-05-02T17:49:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training 
logs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2202.12837","ref_index":160,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?","primary_cat":"cs.CL","submitted_at":"2022-02-25T17:25:19+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Randomly replacing labels in in-context demonstrations barely hurts performance, showing that label space, input distribution, and sequence format drive in-context learning more than ground-truth labels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2202.08906","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ST-MoE: Designing Stable and Transferable Sparse Expert Models","primary_cat":"cs.CL","submitted_at":"2022-02-17T21:39:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2109.01652","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Finetuned Language Models Are Zero-Shot Learners","primary_cat":"cs.CL","submitted_at":"2021-09-03T17:55:52+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Instruction tuning a 137B language model on over 60 NLP tasks described by instructions substantially boosts zero-shot performance on unseen tasks, outperforming larger GPT-3 
models.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"1±14.7 83.9 82.6 ±4.4 82.1 [7] 10 MNLI-m 33.3 92.2 a - - 35.7 43.7 [5] - - 51.1±6.2 61.2 60.8 ±3.7 63.5 [10] 10 MNLI-mm 33.3 91.9 a - - 37.0 43.8 [5] - - 51.0±6.5 62.4 61.0 ±3.5 63.5 [10] 10 QNLI 50.0 96.9 a - - 50.6 55.7 [5] - - 59.6±4.9 66.4 62.0 ±1.7 63.3 [12] 9 RTE 50.0 92.5 a 68.8 71.5 73.3 70.8 [5] 63.5 72.9 [32] 78.3±7.9 84.1 79.9 ±6.9 84.5 [8] 10 SNLI 33.3 91.3 b - - 33.3 54.7 [5] - - 43.0±7.4 53.4 62.3 ±2.4 65.6 [15] 9 WNLI 50.0 94.5 a - - 56.3 64.8 [5] - - 61.0±10.6 74.6 55.4 ±11.0 70.4 [14] 10 READING COMP. BoolQ 50.0 91.2 a 83.0 82.8 81.0 80.0 [1] 60.5 77.5 [32] 80.2±3.1 82.9 83.6 ±0.8 84.6 [4] 9 DROP - 80.5b 54.9 55.2 3.8 10.3 [1] 23.6† 36.5 [20] 21.9±0.9 22.7 22.3 ±1.1 23.9 [2] 7"},{"citing_arxiv_id":"1910.10683","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","primary_cat":"cs.LG","submitted_at":"2019-10-23T17:37:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.12266","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Growing Action Spaces","primary_cat":"cs.LG","submitted_at":"2019-06-28T15:35:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent 
action spaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.12039","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Supervised Contextual Embeddings for Transfer Learning in Natural Language Processing Tasks","primary_cat":"cs.CL","submitted_at":"2019-06-28T04:41:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Extracting representations from pre-trained supervised models enriches word embeddings with task and domain knowledge, improving transfer learning in cross-task, cross-domain, and cross-lingual NLP settings particularly under low-resource conditions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10971","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles","primary_cat":"cs.RO","submitted_at":"2019-06-26T11:05:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NeuroTrajectory is a neuroevolutionary method that trains deep neural networks via genetic algorithms to estimate multi-objective optimal state trajectories over a finite horizon for autonomous vehicle motion planning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}