{"work":{"id":"b353bda2-591d-479a-9c8b-22dfcba12431","openalex_id":"https://openalex.org/W2194775991","doi":"10.1109/cvpr.2016.90","arxiv_id":null,"raw_key":null,"title":"Deep Residual Learning for Image Recognition,","authors":[{"given":"Kaiming","family":"He","sequence":"first","affiliation":[]},{"given":"Xiangyu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Shaoqing","family":"Ren","sequence":"additional","affiliation":[]},{"given":"Jian","family":"Sun","sequence":"additional","affiliation":[]}],"authors_text":"K","year":2016,"venue":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","abstract":null,"external_url":"https://doi.org/10.1109/cvpr.2016.90","cited_by_count":164175,"metadata_source":"doi_reference","metadata_fetched_at":"2026-07-01T08:05:30.448704+00:00","pith_arxiv_id":null,"created_at":"2026-05-08T18:23:55.272054+00:00","updated_at":"2026-07-01T08:05:30.448704+00:00","title_quality_ok":true,"display_title":"Deep residual learning for image recognition","render_title":"Deep residual learning for image recognition"},"hub":{"state":{"work_id":"b353bda2-591d-479a-9c8b-22dfcba12431","tier":"mega_hub","tier_reason":"1,000+ Pith inbound or 100,000+ external 
citations","pith_inbound_count":190,"external_cited_by_count":164175,"distinct_field_count":38,"first_pith_cited_at":"2019-06-20T20:30:39+00:00","last_pith_cited_at":"2026-06-30T09:08:35+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"needed","recognition_status":"needed","updated_at":"2026-07-01T14:41:15.267189+00:00","tier_text":"mega_hub"},"tier":"mega_hub","role_counts":[{"context_role":"method","n":18},{"context_role":"background","n":14},{"context_role":"baseline","n":2},{"context_role":"dataset","n":1}],"polarity_counts":[{"context_polarity":"use_method","n":16},{"context_polarity":"background","n":14},{"context_polarity":"baseline","n":2},{"context_polarity":"unclear","n":2},{"context_polarity":"use_dataset","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Deep residual learning for image recognition","claims":[{"claim_text":"These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological chal- lenges. 
Convolutional neural networks [ 16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Instead of binary classification, our model classifies into four states (LL,L,H,HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction.The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. 
Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"histopathological images [2], [4], [5], [6]. CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Deep residual learning for image recognition because it crossed a citation-hub threshold. Current citing contexts most often use it as method evidence (18 contexts).","role_counts":[{"n":18,"context_role":"method"},{"n":14,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-06-05T21:30:39.045798+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"02ad2b6c-8a3d-4309-ba96-5181bc91718e","orcid":null,"display_name":"Kaiming He"},{"id":"90e0e192-6197-4ef4-b9c2-386dbfd79fad","orcid":null,"display_name":"Xiangyu Zhang"},{"id":"c49ca12d-98c9-4fc2-a95f-a30b92a41773","orcid":null,"display_name":"Shaoqing Ren"},{"id":"fa0012a3-358f-4383-8457-2763cccd76e7","orcid":null,"display_name":"Jian Sun"}]},"error":null,"updated_at":"2026-06-05T21:30:39.042991+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-06-05T21:30:27.082895+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"A ConvNet for 
the 2020s","work_id":"0a23d1b7-bd56-43cc-8a80-7c43ce994e1e","shared_citers":17},{"title":"author Dong, W","work_id":"effdb28b-742e-4840-b3ca-d89502a6cd4d","shared_citers":14},{"title":"In: Proceedings of the IEEE/CVF Conference on Computer 25 Vision and Pattern Recognition, pp","work_id":"9da51225-b7bd-4032-b7db-ca577971dafe","shared_citers":12},{"title":"Very Deep Convolutional Networks for Large-Scale Image Recognition","work_id":"1c4b4409-c14b-488b-a086-c57a5aab8a29","shared_citers":11},{"title":"Walk in the cloud: Learning curves for point clouds shape analysis, pp","work_id":"3820f598-11b0-45c3-8c99-0079181ac0a7","shared_citers":11},{"title":"Derf: Decomposed radiance fields","work_id":"7083a41e-5666-435b-ab26-c753f6490b9a","shared_citers":10},{"title":"In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)","work_id":"b8a8bb9e-1d31-40e2-9cab-ae21e338dde6","shared_citers":10},{"title":"In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp","work_id":"b9701eca-d05e-4d2e-9045-6761df4ba175","shared_citers":10},{"title":"BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"3e3c8ac8-b858-4b22-af32-393d98c883e0","shared_citers":9},{"title":"Deep learning","work_id":"f959cefa-9092-49df-9fb5-a4e6654500f1","shared_citers":9},{"title":"Densely connected convolutional networks","work_id":"2199d436-33c2-4b30-9d6f-ce9b8904101e","shared_citers":9},{"title":"Dickerson","work_id":"5c2060c6-427c-4321-be22-49ccae439d80","shared_citers":9},{"title":"Emogen: Emotional image content generation with text-to-image diffusion models","work_id":"7efbc2dd-b0f2-4f71-bb1c-d2fcf110d805","shared_citers":9},{"title":"Gradient-based learning applied to document recognition","work_id":"0a3595ca-57f9-43f8-8e2f-aface7154b99","shared_citers":8},{"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","shared_citers":8},{"title":"Masset, 
R","work_id":"238df2e4-a3e5-46f3-860e-3ae2b0094b97","shared_citers":8},{"title":"Long short -term memory","work_id":"c3b0bfa7-6764-45f1-a40d-45baaee9d22c","shared_citers":7},{"title":"PoseNet: A convolutional network for real-time 6-dof camera relocalization","work_id":"135418b1-cafd-49fd-803d-1ca6433d4b1b","shared_citers":7},{"title":"2016, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788, doi: 10.1109/CVPR.2016.91","work_id":"37ab4f11-9f69-480d-aab9-e7d9826c586d","shared_citers":6},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":6},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":6},{"title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications","work_id":"3870239a-c950-4625-bf33-c4f902d14175","shared_citers":6},{"title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift","work_id":"05484516-8937-4cdf-9176-7f8329ef0221","shared_citers":5},{"title":"IEEE Access8, 199523–199538 (2020) https://doi.org/10.1109/ACCESS","work_id":"7cbffc3e-26d4-4a7c-a518-eafcd09cbecb","shared_citers":5}],"time_series":[{"n":5,"year":2019},{"n":1,"year":2021},{"n":2,"year":2023},{"n":7,"year":2024},{"n":16,"year":2025},{"n":95,"year":2026}],"dependency_candidates":[{"n":1,"role":"method","polarity":"background","paper_title":"A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation","primary_cat":"cs.CV","context_text":"3D convolutional filters with a stride for extracting features from the shape. They do not use any pooling, as it was observed that pooling introduced uncertainty to shape reconstruction. They pretrain the model first and then run fine-tuning. 
Pre-training is run layer-wise-convolution layers and RBM layer are trained with standard contrastive divergence [35] and AM-DBN layer is trained with fast persistent contrastive divergence [99]. For fine-tuning, they use a process similar to the wake-sleep algorithm from [36]. During wake, they propagate input voxel forward through the network and update the recognition weights. During sleep, they sample persistent latent variables from the network's generative distribution and propagate them backward through the","citing_arxiv_id":"2605.17131"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Multi-Narrow Transformation as a Single-Model Ensemble: Boundary Conditions, Mechanisms, and Failure Modes","primary_cat":"cs.LG","context_text":"tion under a limited budget affects predictive performance. 4. Experiments WeevaluatetheMNtransformationfromthreeperspec- tives: the data-regime dependence of its effectiveness, the mechanismunderlyingitsgainsinlow-dataregimes,andits computational implications. 4.1. Experimental Setup Unless otherwise stated, the following setup was used throughout the experiments. We employed ResNet-18 [9] as the baseline architecture and applied the MN transfor- mation defined in Sec. 3.1. Following Easy Ensemble [8], theimplementationwasbasedongroupconvolution,andthe transformation strength was controlled by 𝑟∈ {1,2,4,8,16,32}. Here,𝑟= 1corresponds to the untransformed SW baseline, and𝑟= 32correspondstoamodelcontaining1,024internal paths. WeusedCIFAR-100astheprimarydataset.","citing_arxiv_id":"2605.11530"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception","primary_cat":"cs.CV","context_text":"benchmarking library providing modular data loaders, fine-tuning pipelines, evaluation scripts, and cross-dataset adapters for direct comparison with Places365, MS-COCO, and Cityscapes. 
4.1 Task 1: Urban Scene Semantic Classification Setup:Given an image, predict itsHUSIClabel (0-9). Fine-tuned on 80K training images; evaluated on 10K test split with five-fold cross-validation.Baselines:ResNet-{18/50/152} [14], EfficientNet- B4 [36], ViT-B/16 [8], DeiT-B [37], CLIP ViT-L/14 (zero-shot + fine-tuned) [33].Metrics:Top-1 Accuracy, Macro-F1, per-class P/R/F1. 4.2 Task 2: Cross-Modal Image-Text Retrieval Task 2 evaluates two sub-configurations reflecting the dataset's two textual modalities: T2-1 (Category-Level Retrieval). Text queries are the tenHUSICclass names, formatted as \" This is a photo of {class_name}\"","citing_arxiv_id":"2605.09936"},{"n":1,"role":"method","polarity":"use_method","paper_title":"LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset","primary_cat":"cs.CV","context_text":"training, validation, and test data sets are split based on entire mining sites rather than individual patches, ensur- ing that all patches from a given site are confined to either the training or test set, see Fig 10. 5.1. Mining Sector Classification (HiRes Imagery) We selected the established U-Net architecture [37], in- corporating a ResNet-50 backbone [17] trained on Ima- geNet [11] as the network architecture. U-Net is a widely recognized semantic segmentation model, demonstrating robust performance in both computer vision and remote sensing applications. The mining sites were divided into 38 for training, 14 for validation, and 19 for testing. 
Each bounding box of the mining sites was divided into patches","citing_arxiv_id":"2605.07740"},{"n":1,"role":"method","polarity":"use_method","paper_title":"A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images","primary_cat":"cs.CV","context_text":"Our framework operates in three distinct stages, as follows: 1.Segmentation stage.We employ a TransUNet- based architecture [3, 19] combining a ResNet [31] encoder with transformer bottleneck layers to segment both the pancreas and the splenic vein from ultrasound images. The models are initial- ized via transfer learning from a liver segmen- tation task [32] and fine-tuned on our clinical dataset. 2.Anatomically-Guided Patch Extraction stage. Using the predicted segmentation masks, we ex- tract tissue patches from two anatomically rele- vant regions: the pancreatic parenchyma (exclud- ing the splenic vein) and the peri-venous fat re- gion immediately beneath the splenic vein con- tour. 3.Classification via Texture Comparison stage.","citing_arxiv_id":"2605.07466"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling","primary_cat":"cs.CV","context_text":"The searched architectures for PascalVOC and COCO are obtained independently. We refer to the resulting detector family asXiYolo. Deployment platforms and baselines.We evaluate the searched models against YOLO baselines on the ModalAI Sentinel Development Drone, which contains a Qualcomm QRB5165 CPU, a Qualcomm Adreno 650 GPU, and a 15 TOPS NPU. We compare against YOLOv5 [17], YOLOv8 [18], YOLO11 [18], and YOLOv12 [32] at nano, small, and medium scales. All models are exported to FP16 TFLite, and we measure energy per inference, cumulative energy over time, latency, and detection accuracy. Power is monitored using the VOXL Power Module v3. 
We estimate inference energy by subtracting idle power from measured power to obtain active inference power, then","citing_arxiv_id":"2605.06927"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Digital Image Forgery Detection Using Transfer Learning","primary_cat":"cs.CV","context_text":"tween manipulated and authentic regions. Unlike raw RGB inputs, this rep- resentation explicitly emphasizes subtle manipulation artifacts introduced dur- ing tampering, enabling CNN models to learn more discriminative features for forgery detection [25]. All images are resized to224×224pixels (and299×299for InceptionV3) to match the input requirements of pretrained models [9, 10, 11, 14]. 7 3.3 Pretrained CNN Architectures To evaluate the effectiveness of theproposed approach, multiple pretrained CNN architectures are utilized: •DenseNet121 [14] •ResNet50 [9] •VGG16 [10] •EfficientNetB0 [13] •MobileNet [15] •InceptionV3 [11] Each model is fine-tuned on the enhanced input representation combining RGB images with theFdif f features [7, 16].","citing_arxiv_id":"2605.08167"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model","primary_cat":"cs.CV","context_text":"we report the results under the widely adopted closed-set protocol in Office-Home, VisDA, and DomainNet-126. Furthermore, since the VODA uses no source informa- tion, it can be regarded as open-set [40], so we also provide comparisons under the open-set protocol in Office-Home. FrameworksThe initial modelθ i is a standard convolutional network, serving as the starting point for adaptation, we use ResNet-50 [41] for Office-Home, and ResNet- 101 [41] for VisDA and DomainNet-126, keeping consistency with the competitors. 
We initialized the networks using a layer-wise strategy: fully connected layers with Xavier uniform initialization [42], convolutional layers with Kaiming normal initial- ization tailored for ReLU activations [41], and batch normalization layers with weights","citing_arxiv_id":"2605.02604"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts","primary_cat":"cs.SE","context_text":"the identified optimal FE method and DBSCAN for CA when ablating DR candidates. Finally, we fix the optimal FE and DR methods when ablating CA candidates. Based on the Silhouette and DBCV scores in Table 3, the optimal pipelines are(DeepDrebin, UMAP, DBSCAN)for the AndroZoo dataset and(RoBERTa, UMAP, DBSCAN)for IMDb. Note that for image datasets (MNIST and Udacity), we directly use the best pipeline(ResNet-50 [ 30], UMAP, DBSCAN)validated by [ 7], which achieves average scores of 0.69 and 0.47 for MNIST and Udacity, respectively. 5.1.2 The cluster-to-fault correspondence.To validate whether the resulting cluster from the best pipeline indeed represents a DNN fault, we conduct feature pattern inspection and cluster-specific retraining validation [1]. Figure 2 displays the heatmaps of a randomly selected cluster from the best","citing_arxiv_id":"2604.23342"},{"n":1,"role":"method","polarity":"use_method","paper_title":"H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading","primary_cat":"cs.CV","context_text":"andm,(m≫n) are labeled and unlabeled samples;x i is a train- ing sample; andθ Te , θS t are model parameters. Following Mean Teacher framework (Tarvainen and Valpola, 2017), we update the teacher via EMA applied only to the student's base network, while keeping the QCN fixed to stabilize classical feature ex- traction and provide consistent inputs to the quantum module, as defined in Eq. (20). 
θ(t+1) Te ←µ·θ (t) Te +(1−µ)·θ (t) S t (20) wheretrepresents the training iteration andµis the EMA smooth- ing coefficient, set to 0.99 following Tarvainen and Valpola (2017), which controls the update rate of the teacher parame- ters. This consistency strategy gradually transfers knowledge from the student to the teacher, aligns their predictions, and im-","citing_arxiv_id":"2604.23335"},{"n":1,"role":"method","polarity":"background","paper_title":"Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation","primary_cat":"cs.CV","context_text":"Consequently, synthetic images were only employed during the labeling process by adding 2,000 synthetic images of the class \"prob. ok\" to the 400 original images. Data augmentation can improve the general representation of certain object fea- tures, so we employ some standard data augmentation techniques, manipulating the following image attributes [ 63]: 1. Brightness: Simulates varying lighting conditions 2. Contrast: Adjusts intensity differences 3. Noise: Mimics distance variations and sensor characteristics 4. Motion blur: Simulates camera movement 5. Rotation: Introduces variation in object orientation 6. Translation: Simulates different object positions Each transformation was applied once per image, expanding the 2,400 images by","citing_arxiv_id":"2604.18088"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction","primary_cat":"cs.CV","context_text":"The aggregated history features 𝐆(𝑠) are injected into the U-Net encoder at the corresponding scales via channel-wise concatenation followed by a 1 × 1 convolution acting as a pixel-wise temporal mixer, followed by GroupNorm with SiLU activation [23], providing normalization and non-linear refinement of the fused representation. 
The model predicts the residual change from the most recent history frame rather than the absolute target [24]: 𝐼̂∗ = 𝐼𝑁 + 𝑓𝜃(𝐼𝑁, ℋ, 𝛥𝑡∗) where 𝑓𝜃 denotes the U-Net output (prior to the residual addition). This residual formulation concentrates capacity on the disease-relevant change signal. The output layer is initialized near zero, so the initial prediction approximates the copy-last baseline 𝐼𝑁. 3.3 Training Configuration Our primary model, TRU, is trained to predict the ground-truth target 𝐼∗ via a per-pixel masked","citing_arxiv_id":"2604.16955"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Weak-to-Strong Knowledge Distillation Accelerates Visual Learning","primary_cat":"cs.CV","context_text":"A speedup ratio greater than 1.0×means ours reaches the same target earlier with fewer epochs or steps. For higher-is-better metrics (Top-1, AP50), first@τis the first epoch with metric at or aboveτ. For lower-is-better metrics (FID), first@τis the first step at or belowτ. Gate and Hyperparameter Selection.For ImageNet classification [7], we useτ= 65for ResNet-50 [14] andτ= 50for ViT-S/16 [8]. For CIFAR early-stage classification [26], we use fixed dataset-level gates:τ= 75for CIFAR-10 and τ= 60for CIFAR-100. For object detection on the COCO dataset [28], we use a fixed task-level AP50 target (τ= 20%). For diffusion generation on the CIFAR-10 dataset [18,26,32], we use a fixed task-level FID target (τ= 60), selected from the teacher reference around 30k training steps, with a conservative","citing_arxiv_id":"2604.15451"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Generative Modeling of Complex-Valued Brain MRI Data","primary_cat":"eess.IV","context_text":"These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. 
Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encoder maps each (2×96×96) input patch to a latent representation of size (2×48×48), yielding a compression factor of 4. This factor was chosen to retain fine diagnostic detail and the coupling between channels while providing a sufficiently compact input for the flow matching model. The decoder reconstructs the original patch from this compressed encoding.","citing_arxiv_id":"2604.14800"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Enhancing Event Reconstruction in Hyper-Kamiokande with Machine Learning: A ResNet Implementation","primary_cat":"hep-ex","context_text":"Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological chal- lenges. Convolutional neural networks [ 16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. 
At Super-Kamiokande, machine-learning techniques are already employed for solar-neutrino classification [18] and for neutron-capture tagging [ 19], and they show encouraging re- sults in neutrino-reconstruction studies for other experiments [20, 21, 22].","citing_arxiv_id":"2604.13503"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Human Centered Non Intrusive Driver State Modeling Using Personalized Physiological Signals in Real World Automated Driving","primary_cat":"cs.HC","context_text":"Instead of binary classification, our model classifies into four states (LL,L,H,HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction.The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing each image into a feature vector of size2048 × 1. This process is repeated for all seven input images (one per signal), resulting in seven feature vectors. These feature vectors are then concatenated into a single vector of size14336 × 1(because2048 × 7 = 14336.","citing_arxiv_id":"2604.11549"},{"n":1,"role":"dataset","polarity":"use_dataset","paper_title":"Mosaic: Cross-Modal Clustering for Efficient Video Understanding","primary_cat":"cs.PF","context_text":"historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. 
Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS-ego [22] 3,605s first-person ego-centric video RVS-movie [22] 1,671s third-person plot video query-agnostic memory budget. Following prior work [12], [13], we set the retrieved frames to 64 for all baselines. B. Overall Performance","citing_arxiv_id":"2604.10060"},{"n":1,"role":"method","polarity":"use_method","paper_title":"DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Cancer Image Classification","primary_cat":"eess.IV","context_text":"histopathological images [2], [4], [5], [6]. CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-of-the-art performance through compound scaling with reduced computational cost [8], [9], [10] . More recently, transformer -based architectures have gained attention in medical imaging for their capability to model long -range","citing_arxiv_id":"2604.09468"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Variational Feature Compression for Model-Specific Representations","primary_cat":"cs.CV","context_text":"and gradient-based saliency, the scores are normalized and combined into a uni- fied importance measure, and thresholding yields a binary mask that selects the dimensions retained for decoding. importance score. A threshold produces a binary mask, and the masked vector Zm is decoded intoX ′ for inference. 
3.4 Variational Latent Bottleneck Our encoder uses a ResNet-18 [11] backbone with the final fully connected and softmax layers removed. The 512-dimensional output from global average pool- ing is processed through two parallel linear layers to produceµandlogσ 2; a sampleZis drawn via the reparameterization trick. Following the Variational Information Bottleneck (VIB) framework [2], minimizingI(Z;X)via KL diver- gence regularization reduces redundant information, while maximizingI(Z;Y)","citing_arxiv_id":"2604.06644"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Lightweight True In-Pixel Encryption with FeFET Enabled Pixel Design for Secure Imaging","primary_cat":"cs.CV","context_text":"The PSNR values between encrypted and original images further confirm that the encrypted images diverge substantially from the originals, consistent with strong visual obfuscation. To further evaluate the robustness of SecurePix against machine-learning-based attacks, we performed an image- classification test using a ResNet-18 neural network [30]. In this experiment, the classifier is treated as an adversarial model attempting to recognize the encrypted images. For CIFAR-10 [31], the ResNet-18 network was first trained only on the unencrypted training images following standard supervised- learning procedures. 
After training, we encrypted 10,000 CIFAR-10 test images using SecurePix and fed these encrypted","citing_arxiv_id":"2604.05147"}]},"error":null,"updated_at":"2026-06-05T21:30:22.214786+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-06-05T21:30:14.629465+00:00"},"reader_index":{"job_type":"reader_index","status":"succeeded","result":{"note":"annotated reader requires full-text/OA fetch; shell is wired for mega hubs","status":"reader queued"},"error":null,"updated_at":"2026-06-05T21:30:38.757310+00:00"},"recognition_alignment":{"job_type":"recognition_alignment","status":"succeeded","result":{"modules":["IndisputableMonolith.Foundation.RecognitionForcing","IndisputableMonolith.Chain","IndisputableMonolith.Engineering.CorticalNeuromodulationDevice","IndisputableMonolith.Engineering.PhantomCoupledGWAntennaSensitivity","IndisputableMonolith.Foundation.InitialCondition","IndisputableMonolith.Foundation.LedgerForcing","IndisputableMonolith.Foundation.ObserverForcing","IndisputableMonolith.Information.InformationIsLedger"],"query_chars":45},"error":null,"updated_at":"2026-06-05T21:30:16.286292+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Deep residual learning for image recognition","claims":[{"claim_text":"These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. 
The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological chal- lenges. Convolutional neural networks [ 16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Instead of binary classification, our model classifies into four states (LL,L,H,HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction.The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. 
Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"histopathological images [2], [4], [5], [6]. CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Deep residual learning for image recognition because it crossed a citation-hub threshold. 
Current citing contexts most often use it as method evidence (18 contexts).","role_counts":[{"n":18,"context_role":"method"},{"n":14,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-06-05T21:29:54.148326+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Deep residual learning for image recognition","claims":[{"claim_text":"These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological chal- lenges. Convolutional neural networks [ 16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machi","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Instead of binary classification, our model classifies into four states (LL,L,H,HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction.The first step is to extract features from each of the seven images. 
Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"histopathological images [2], [4], [5], [6]. CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. 
Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Deep residual learning for image recognition because it crossed a citation-hub threshold. Current citing contexts most often use it as method evidence (18 contexts).","role_counts":[{"n":18,"context_role":"method"},{"n":14,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":1,"context_role":"dataset"}]},"error":null,"updated_at":"2026-06-05T21:30:14.634001+00:00"}},"summary":{"title":"Deep residual learning for image recognition","claims":[{"claim_text":"These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encod","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological chal- lenges. Convolutional neural networks [ 16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. 
At Super-Kamiokande, machi","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Instead of binary classification, our model classifies into four states (LL,L,H,HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction.The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing ea","claim_type":"method","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Training on such data could reinforce areas where AI systems are vulnerable [37, 796], enhancing their robustness in real-world applications. Adversarial examples can be constructed in various ways. One straightforward approach is to add small perturbations to inputs, which preserves their original labels while introducing adversarial characteristics [100, 260, 300, 504]. Another effective strategy is red teaming, which usually involves human teams systematically testing to find vulnerabilities ","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"histopathological images [2], [4], [5], [6]. 
CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Deep residual learning for image recognition because it crossed a citation-hub threshold. Current citing contexts most often use it as method evidence (18 contexts).","role_counts":[{"n":18,"context_role":"method"},{"n":14,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":1,"context_role":"dataset"}]},"graph":{"co_cited":[{"title":"A ConvNet for the 2020s","work_id":"0a23d1b7-bd56-43cc-8a80-7c43ce994e1e","shared_citers":17},{"title":"author Dong, W","work_id":"effdb28b-742e-4840-b3ca-d89502a6cd4d","shared_citers":14},{"title":"In: Proceedings of the IEEE/CVF Conference on Computer 25 Vision and Pattern Recognition, pp","work_id":"9da51225-b7bd-4032-b7db-ca577971dafe","shared_citers":12},{"title":"Very Deep Convolutional Networks for Large-Scale Image Recognition","work_id":"1c4b4409-c14b-488b-a086-c57a5aab8a29","shared_citers":11},{"title":"Walk in the cloud: Learning curves for point clouds shape analysis, pp","work_id":"3820f598-11b0-45c3-8c99-0079181ac0a7","shared_citers":11},{"title":"Derf: Decomposed radiance fields","work_id":"7083a41e-5666-435b-ab26-c753f6490b9a","shared_citers":10},{"title":"In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)","work_id":"b8a8bb9e-1d31-40e2-9cab-ae21e338dde6","shared_citers":10},{"title":"In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 
pp","work_id":"b9701eca-d05e-4d2e-9045-6761df4ba175","shared_citers":10},{"title":"BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"3e3c8ac8-b858-4b22-af32-393d98c883e0","shared_citers":9},{"title":"Deep learning","work_id":"f959cefa-9092-49df-9fb5-a4e6654500f1","shared_citers":9},{"title":"Densely connected convolutional networks","work_id":"2199d436-33c2-4b30-9d6f-ce9b8904101e","shared_citers":9},{"title":"Dickerson","work_id":"5c2060c6-427c-4321-be22-49ccae439d80","shared_citers":9},{"title":"Emogen: Emotional image content generation with text-to-image diffusion models","work_id":"7efbc2dd-b0f2-4f71-bb1c-d2fcf110d805","shared_citers":9},{"title":"Gradient-based learning applied to document recognition","work_id":"0a3595ca-57f9-43f8-8e2f-aface7154b99","shared_citers":8},{"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","shared_citers":8},{"title":"Masset, R","work_id":"238df2e4-a3e5-46f3-860e-3ae2b0094b97","shared_citers":8},{"title":"Long short -term memory","work_id":"c3b0bfa7-6764-45f1-a40d-45baaee9d22c","shared_citers":7},{"title":"PoseNet: A convolutional network for real-time 6-dof camera relocalization","work_id":"135418b1-cafd-49fd-803d-1ca6433d4b1b","shared_citers":7},{"title":"2016, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788, doi: 10.1109/CVPR.2016.91","work_id":"37ab4f11-9f69-480d-aab9-e7d9826c586d","shared_citers":6},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":6},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":6},{"title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications","work_id":"3870239a-c950-4625-bf33-c4f902d14175","shared_citers":6},{"title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate 
Shift","work_id":"05484516-8937-4cdf-9176-7f8329ef0221","shared_citers":5},{"title":"IEEE Access8, 199523–199538 (2020) https://doi.org/10.1109/ACCESS","work_id":"7cbffc3e-26d4-4a7c-a518-eafcd09cbecb","shared_citers":5}],"time_series":[{"n":5,"year":2019},{"n":1,"year":2021},{"n":2,"year":2023},{"n":7,"year":2024},{"n":16,"year":2025},{"n":95,"year":2026}],"dependency_candidates":[{"n":1,"role":"method","polarity":"background","paper_title":"A Systematic Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation","primary_cat":"cs.CV","context_text":"3D convolutional filters with a stride for extracting features from the shape. They do not use any pooling, as it was observed that pooling introduced uncertainty to shape reconstruction. They pretrain the model first and then run fine-tuning. Pre-training is run layer-wise-convolution layers and RBM layer are trained with standard contrastive divergence [35] and AM-DBN layer is trained with fast persistent contrastive divergence [99]. For fine-tuning, they use a process similar to the wake-sleep algorithm from [36]. During wake, they propagate input voxel forward through the network and update the recognition weights. During sleep, they sample persistent latent variables from the network's generative distribution and propagate them backward through the","citing_arxiv_id":"2605.17131"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Multi-Narrow Transformation as a Single-Model Ensemble: Boundary Conditions, Mechanisms, and Failure Modes","primary_cat":"cs.LG","context_text":"tion under a limited budget affects predictive performance. 4. Experiments WeevaluatetheMNtransformationfromthreeperspec- tives: the data-regime dependence of its effectiveness, the mechanismunderlyingitsgainsinlow-dataregimes,andits computational implications. 4.1. Experimental Setup Unless otherwise stated, the following setup was used throughout the experiments. 
We employed ResNet-18 [9] as the baseline architecture and applied the MN transfor- mation defined in Sec. 3.1. Following Easy Ensemble [8], theimplementationwasbasedongroupconvolution,andthe transformation strength was controlled by 𝑟∈ {1,2,4,8,16,32}. Here,𝑟= 1corresponds to the untransformed SW baseline, and𝑟= 32correspondstoamodelcontaining1,024internal paths. WeusedCIFAR-100astheprimarydataset.","citing_arxiv_id":"2605.11530"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception","primary_cat":"cs.CV","context_text":"benchmarking library providing modular data loaders, fine-tuning pipelines, evaluation scripts, and cross-dataset adapters for direct comparison with Places365, MS-COCO, and Cityscapes. 4.1 Task 1: Urban Scene Semantic Classification Setup:Given an image, predict itsHUSIClabel (0-9). Fine-tuned on 80K training images; evaluated on 10K test split with five-fold cross-validation.Baselines:ResNet-{18/50/152} [14], EfficientNet- B4 [36], ViT-B/16 [8], DeiT-B [37], CLIP ViT-L/14 (zero-shot + fine-tuned) [33].Metrics:Top-1 Accuracy, Macro-F1, per-class P/R/F1. 4.2 Task 2: Cross-Modal Image-Text Retrieval Task 2 evaluates two sub-configurations reflecting the dataset's two textual modalities: T2-1 (Category-Level Retrieval). Text queries are the tenHUSICclass names, formatted as \" This is a photo of {class_name}\"","citing_arxiv_id":"2605.09936"},{"n":1,"role":"method","polarity":"use_method","paper_title":"LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset","primary_cat":"cs.CV","context_text":"training, validation, and test data sets are split based on entire mining sites rather than individual patches, ensur- ing that all patches from a given site are confined to either the training or test set, see Fig 10. 5.1. 
Mining Sector Classification (HiRes Imagery) We selected the established U-Net architecture [37], in- corporating a ResNet-50 backbone [17] trained on Ima- geNet [11] as the network architecture. U-Net is a widely recognized semantic segmentation model, demonstrating robust performance in both computer vision and remote sensing applications. The mining sites were divided into 38 for training, 14 for validation, and 19 for testing. Each bounding box of the mining sites was divided into patches","citing_arxiv_id":"2605.07740"},{"n":1,"role":"method","polarity":"use_method","paper_title":"A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images","primary_cat":"cs.CV","context_text":"Our framework operates in three distinct stages, as follows: 1.Segmentation stage.We employ a TransUNet- based architecture [3, 19] combining a ResNet [31] encoder with transformer bottleneck layers to segment both the pancreas and the splenic vein from ultrasound images. The models are initial- ized via transfer learning from a liver segmen- tation task [32] and fine-tuned on our clinical dataset. 2.Anatomically-Guided Patch Extraction stage. Using the predicted segmentation masks, we ex- tract tissue patches from two anatomically rele- vant regions: the pancreatic parenchyma (exclud- ing the splenic vein) and the peri-venous fat re- gion immediately beneath the splenic vein con- tour. 3.Classification via Texture Comparison stage.","citing_arxiv_id":"2605.07466"},{"n":1,"role":"baseline","polarity":"baseline","paper_title":"XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling","primary_cat":"cs.CV","context_text":"The searched architectures for PascalVOC and COCO are obtained independently. We refer to the resulting detector family asXiYolo. 
Deployment platforms and baselines.We evaluate the searched models against YOLO baselines on the ModalAI Sentinel Development Drone, which contains a Qualcomm QRB5165 CPU, a Qualcomm Adreno 650 GPU, and a 15 TOPS NPU. We compare against YOLOv5 [17], YOLOv8 [18], YOLO11 [18], and YOLOv12 [32] at nano, small, and medium scales. All models are exported to FP16 TFLite, and we measure energy per inference, cumulative energy over time, latency, and detection accuracy. Power is monitored using the VOXL Power Module v3. We estimate inference energy by subtracting idle power from measured power to obtain active inference power, then","citing_arxiv_id":"2605.06927"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Digital Image Forgery Detection Using Transfer Learning","primary_cat":"cs.CV","context_text":"tween manipulated and authentic regions. Unlike raw RGB inputs, this rep- resentation explicitly emphasizes subtle manipulation artifacts introduced dur- ing tampering, enabling CNN models to learn more discriminative features for forgery detection [25]. All images are resized to224×224pixels (and299×299for InceptionV3) to match the input requirements of pretrained models [9, 10, 11, 14]. 7 3.3 Pretrained CNN Architectures To evaluate the effectiveness of theproposed approach, multiple pretrained CNN architectures are utilized: •DenseNet121 [14] •ResNet50 [9] •VGG16 [10] •EfficientNetB0 [13] •MobileNet [15] •InceptionV3 [11] Each model is fine-tuned on the enhanced input representation combining RGB images with theFdif f features [7, 16].","citing_arxiv_id":"2605.08167"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Rethinking the Need for Source Models: Source-Free Domain Adaptation from Scratch Guided by a Vision-Language Model","primary_cat":"cs.CV","context_text":"we report the results under the widely adopted closed-set protocol in Office-Home, VisDA, and DomainNet-126. 
Furthermore, since the VODA uses no source informa- tion, it can be regarded as open-set [40], so we also provide comparisons under the open-set protocol in Office-Home. FrameworksThe initial modelθ i is a standard convolutional network, serving as the starting point for adaptation, we use ResNet-50 [41] for Office-Home, and ResNet- 101 [41] for VisDA and DomainNet-126, keeping consistency with the competitors. We initialized the networks using a layer-wise strategy: fully connected layers with Xavier uniform initialization [42], convolutional layers with Kaiming normal initial- ization tailored for ReLU activations [41], and batch normalization layers with weights","citing_arxiv_id":"2605.02604"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Empirical Insights of Test Selection Metrics under Multiple Testing Objectives and Distribution Shifts","primary_cat":"cs.SE","context_text":"the identified optimal FE method and DBSCAN for CA when ablating DR candidates. Finally, we fix the optimal FE and DR methods when ablating CA candidates. Based on the Silhouette and DBCV scores in Table 3, the optimal pipelines are(DeepDrebin, UMAP, DBSCAN)for the AndroZoo dataset and(RoBERTa, UMAP, DBSCAN)for IMDb. Note that for image datasets (MNIST and Udacity), we directly use the best pipeline(ResNet-50 [ 30], UMAP, DBSCAN)validated by [ 7], which achieves average scores of 0.69 and 0.47 for MNIST and Udacity, respectively. 5.1.2 The cluster-to-fault correspondence.To validate whether the resulting cluster from the best pipeline indeed represents a DNN fault, we conduct feature pattern inspection and cluster-specific retraining validation [1]. 
Figure 2 displays the heatmaps of a randomly selected cluster from the best","citing_arxiv_id":"2604.23342"},{"n":1,"role":"method","polarity":"use_method","paper_title":"H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading","primary_cat":"cs.CV","context_text":"andm,(m≫n) are labeled and unlabeled samples;x i is a train- ing sample; andθ Te , θS t are model parameters. Following Mean Teacher framework (Tarvainen and Valpola, 2017), we update the teacher via EMA applied only to the student's base network, while keeping the QCN fixed to stabilize classical feature ex- traction and provide consistent inputs to the quantum module, as defined in Eq. (20). θ(t+1) Te ←µ·θ (t) Te +(1−µ)·θ (t) S t (20) wheretrepresents the training iteration andµis the EMA smooth- ing coefficient, set to 0.99 following Tarvainen and Valpola (2017), which controls the update rate of the teacher parame- ters. This consistency strategy gradually transfers knowledge from the student to the teacher, aligns their predictions, and im-","citing_arxiv_id":"2604.23335"},{"n":1,"role":"method","polarity":"background","paper_title":"Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation","primary_cat":"cs.CV","context_text":"Consequently, synthetic images were only employed during the labeling process by adding 2,000 synthetic images of the class \"prob. ok\" to the 400 original images. Data augmentation can improve the general representation of certain object fea- tures, so we employ some standard data augmentation techniques, manipulating the following image attributes [ 63]: 1. Brightness: Simulates varying lighting conditions 2. Contrast: Adjusts intensity differences 3. Noise: Mimics distance variations and sensor characteristics 4. Motion blur: Simulates camera movement 5. Rotation: Introduces variation in object orientation 6. 
Translation: Simulates different object positions Each transformation was applied once per image, expanding the 2,400 images by","citing_arxiv_id":"2604.18088"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Training-inference input alignment outweighs framework choice in longitudinal retinal image prediction","primary_cat":"cs.CV","context_text":"The aggregated history features 𝐆(𝑠) are injected into the U-Net encoder at the corresponding scales via channel-wise concatenation followed by a 1 × 1 convolution acting as a pixel-wise temporal mixer, followed by GroupNorm with SiLU activation [23], providing normalization and non-linear refinement of the fused representation. The model predicts the residual change from the most recent history frame rather than the absolute target [24]: 𝐼̂∗ = 𝐼𝑁 + 𝑓𝜃(𝐼𝑁, ℋ, 𝛥𝑡∗) where 𝑓𝜃 denotes the U-Net output (prior to the residual addition). This residual formulation concentrates capacity on the disease-relevant change signal. The output layer is initialized near zero, so the initial prediction approximates the copy-last baseline 𝐼𝑁. 3.3 Training Configuration Our primary model, TRU, is trained to predict the ground-truth target 𝐼∗ via a per-pixel masked","citing_arxiv_id":"2604.16955"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Weak-to-Strong Knowledge Distillation Accelerates Visual Learning","primary_cat":"cs.CV","context_text":"A speedup ratio greater than 1.0×means ours reaches the same target earlier with fewer epochs or steps. For higher-is-better metrics (Top-1, AP50), first@τis the first epoch with metric at or aboveτ. For lower-is-better metrics (FID), first@τis the first step at or belowτ. Gate and Hyperparameter Selection.For ImageNet classification [7], we useτ= 65for ResNet-50 [14] andτ= 50for ViT-S/16 [8]. For CIFAR early-stage classification [26], we use fixed dataset-level gates:τ= 75for CIFAR-10 and τ= 60for CIFAR-100. 
For object detection on the COCO dataset [28], we use a fixed task-level AP50 target (τ= 20%). For diffusion generation on the CIFAR-10 dataset [18,26,32], we use a fixed task-level FID target (τ= 60), selected from the teacher reference around 30k training steps, with a conservative","citing_arxiv_id":"2604.15451"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Generative Modeling of Complex-Valued Brain MRI Data","primary_cat":"eess.IV","context_text":"These channels are not independent signals but jointly represent a single complex-valued measurement, where the relationship between them encodes the local phase. Unlike magnitude-only approaches, where a single intensity channel is compressed, this coupling must be explicitly preserved. The architecture, loss function, and evaluation metrics described below are designed accordingly. The architecture is implemented as a ResNet-based [20] conditional variational autoencoder (CVAE) [21]. The encoder maps each (2×96×96) input patch to a latent representation of size (2×48×48), yielding a compression factor of 4. This factor was chosen to retain fine diagnostic detail and the coupling between channels while providing a sufficiently compact input for the flow matching model. The decoder reconstructs the original patch from this compressed encoding.","citing_arxiv_id":"2604.14800"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Enhancing Event Reconstruction in Hyper-Kamiokande with Machine Learning: A ResNet Implementation","primary_cat":"hep-ex","context_text":"Together, these considerations make a scalable, high-speed, and robust reconstruction capable of operating at Monte Carlo scale essential for Hyper-Kamiokande. Machine-learning based reconstruction offers a promising path toward meeting these computational and topological chal- lenges. 
Convolutional neural networks [ 16], and in particular residual networks (ResNets) [17], are well suited to process the high-dimensional charge and time images recorded by the PMT array. At Super-Kamiokande, machine-learning techniques are already employed for solar-neutrino classification [18] and for neutron-capture tagging [ 19], and they show encouraging re- sults in neutrino-reconstruction studies for other experiments [20, 21, 22].","citing_arxiv_id":"2604.13503"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Human Centered Non Intrusive Driver State Modeling Using Personalized Physiological Signals in Real World Automated Driving","primary_cat":"cs.HC","context_text":"Instead of binary classification, our model classifies into four states (LL,L,H,HH), and instead of training CNN feature extractors from scratch, we use pre-trained ResNet50 using transfer learning. The model architecture is shown in Figure 3. 3.6.1 Feature extraction.The first step is to extract features from each of the seven images. Here we apply transfer learning using ResNet50 [22], pre-trained on a large dataset. We extract information from the penultimate layer of ResNet50, compressing each image into a feature vector of size2048 × 1. This process is repeated for all seven input images (one per signal), resulting in seven feature vectors. These feature vectors are then concatenated into a single vector of size14336 × 1(because2048 × 7 = 14336.","citing_arxiv_id":"2604.11549"},{"n":1,"role":"dataset","polarity":"use_dataset","paper_title":"Mosaic: Cross-Modal Clustering for Efficient Video Understanding","primary_cat":"cs.PF","context_text":"historical video and recomputes attention upon query arrival. (2) ReKV [12] retrieves query-relevant KVCache at the token level. (3) LiveVLM [13] further combines token-level retrieval with KVCache compression to reduce memory usage. (4) StreamMem [14] also compresses KVCache, but under a TABLE II DATASET CONFIGURATIONS. 
Dataset Max Length Description MLVU [19] 703s multi-task long video LongVideoBench [20] 468s long-term multi-modal video VideoMME [21] 1,018s full-spectrum multi-modal video RVS-ego [22] 3,605s first-person ego-centric video RVS-movie [22] 1,671s third-person plot video query-agnostic memory budget. Following prior work [12], [13], we set the retrieved frames to 64 for all baselines. B. Overall Performance","citing_arxiv_id":"2604.10060"},{"n":1,"role":"method","polarity":"use_method","paper_title":"DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Cancer Image Classification","primary_cat":"eess.IV","context_text":"histopathological images [2], [4], [5], [6]. CNN have been widely adopted for cancer detection due to their ability to capture local texture patterns and hierarchical spatial features. Residual learning has been introduced to alleviate the vanishing gradient problem, leading to significant improvements in deep feature representation, as exemplified by ResNet architectures [7]. Similarly, DenseNet and kernel architectures enhance feature reuse and gradient flow, while EfficientNet achieves state-of-the-art performance through compound scaling with reduced computational cost [8], [9], [10] . More recently, transformer -based architectures have gained attention in medical imaging for their capability to model long -range","citing_arxiv_id":"2604.09468"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Variational Feature Compression for Model-Specific Representations","primary_cat":"cs.CV","context_text":"and gradient-based saliency, the scores are normalized and combined into a uni- fied importance measure, and thresholding yields a binary mask that selects the dimensions retained for decoding. importance score. A threshold produces a binary mask, and the masked vector Zm is decoded intoX ′ for inference. 
3.4 Variational Latent Bottleneck Our encoder uses a ResNet-18 [11] backbone with the final fully connected and softmax layers removed. The 512-dimensional output from global average pooling is processed through two parallel linear layers to produce µ and log σ²; a sample Z is drawn via the reparameterization trick. Following the Variational Information Bottleneck (VIB) framework [2], minimizing I(Z;X) via KL divergence regularization reduces redundant information, while maximizing I(Z;Y)","citing_arxiv_id":"2604.06644"},{"n":1,"role":"method","polarity":"use_method","paper_title":"Lightweight True In-Pixel Encryption with FeFET Enabled Pixel Design for Secure Imaging","primary_cat":"cs.CV","context_text":"The PSNR values between encrypted and original images further confirm that the encrypted images diverge substantially from the originals, consistent with strong visual obfuscation. To further evaluate the robustness of SecurePix against machine-learning-based attacks, we performed an image-classification test using a ResNet-18 neural network [30]. In this experiment, the classifier is treated as an adversarial model attempting to recognize the encrypted images. For CIFAR-10 [31], the ResNet-18 network was first trained only on the unencrypted training images following standard supervised-learning procedures. 
After training, we encrypted 10,000 CIFAR-10 test images using SecurePix and fed these encrypted","citing_arxiv_id":"2604.05147"}]},"authors":[{"id":"fa0012a3-358f-4383-8457-2763cccd76e7","orcid":null,"display_name":"Jian Sun","source":"manual","import_confidence":0.72},{"id":"02ad2b6c-8a3d-4309-ba96-5181bc91718e","orcid":null,"display_name":"Kaiming He","source":"manual","import_confidence":0.72},{"id":"c49ca12d-98c9-4fc2-a95f-a30b92a41773","orcid":null,"display_name":"Shaoqing Ren","source":"manual","import_confidence":0.72},{"id":"90e0e192-6197-4ef4-b9c2-386dbfd79fad","orcid":null,"display_name":"Xiangyu Zhang","source":"manual","import_confidence":0.72}]},"citers":{"total":190,"items":[{"citing_arxiv_id":"2606.31378","ref_index":2,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MAPE: Defending Against Transferable Adversarial Attacks Using Multi-Source Adversarial Perturbations Elimination","primary_cat":"cs.CV","submitted_at":"2026-06-30T09:08:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MAPE combines a channel-attention U-Net (SAPE) trained on multi-model adversarial examples scheduled by PPSA to eliminate perturbations, reporting over 95.1% average defense on CIFAR-10 and 71.5% on Mini-ImageNet against black-box transferable attacks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30886","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multipolar Magnetic-Field Inference for PSR J0740+6620 with Neural-Network-Accelerated NICER Pulse-Profile Modeling","primary_cat":"astro-ph.HE","submitted_at":"2026-06-29T20:22:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Neural-network surrogate accelerated MCMC infers multipolar magnetic field parameters for PSR J0740+6620 
from NICER data, finding broad multimodal posteriors and disfavoring a zero-offset model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30516","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"HASTE: A Framework for Training-Free, Dynamic, and Steerable Compression of Pre-Trained Convolutional Neural Networks","primary_cat":"cs.CV","submitted_at":"2026-06-29T16:24:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HASTE enables training-free dynamic compression of pre-trained CNNs by patch-wise LSH-based merging of redundant channels, reporting 46.2% FLOPs reduction on ResNet34 CIFAR-10 with 1.25% accuracy drop.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30499","ref_index":18,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Discovering Collaboration from Novelty: Random Network Distillation for Clustered Federated Learning","primary_cat":"cs.LG","submitted_at":"2026-06-29T16:02:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Random Network Distillation enables pre-training discovery of client clusters in federated learning via local novelty signals, supporting autonomous grouping under non-IID data without a priori cluster count.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28654","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"FedLAS: Feature-Modulated Bidirectional Label Smoothing for Neural Network 
Calibration","primary_cat":"cs.CV","submitted_at":"2026-06-26T23:55:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FedLAS adds feature-norm based confidence detection and bidirectional gating to label smoothing losses to reduce calibration error on vision benchmarks while preserving accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27884","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SEADA: An efficient methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures","primary_cat":"cs.AR","submitted_at":"2026-06-26T09:27:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SEADA introduces an analytical framework combining cost models, mapping tools, and entropy-based precision selection to optimize mixed-precision DNNs on multi-precision spatial architectures.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27784","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Improving Adversarial Robustness via Activation Amplification and Attenuation","primary_cat":"cs.CV","submitted_at":"2026-06-26T07:13:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A3 is a learnable activation scaling module that trains on amplified adversarial signals via contrastive losses to improve robustness when the same parameters are used in attenuation mode.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27411","ref_index":118,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Compression-Driven Anomaly Detection in 
Brain MRI Using an Interpretable Quantum Autoencoder","primary_cat":"quant-ph","submitted_at":"2026-06-25T12:56:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A variational quantum autoencoder detects anomalies in brain MRI by scoring resistance to compression, reporting slice-level ROC-AUC of 0.95 and outperforming classical autoencoders and PCA on public datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26780","ref_index":17,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Event-based Gaze Control System for Accurate Real-time Spin Estimation in Professional Ball Games","primary_cat":"cs.CV","submitted_at":"2026-06-25T09:14:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"An event-camera system with active gaze control and contrast-maximization spin estimation achieves real-time performance in table tennis with 8.8% magnitude error, 6.4° axis error, 3 ms latency, and 750 Hz throughput.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26722","ref_index":62,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Socratic agents for autonomous scientific discovery in high-dimensional physical systems","primary_cat":"cs.AI","submitted_at":"2026-06-25T08:01:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AHOIS is a Socratic multi-agent AI that autonomously discovers and validates a random-interference encoding strategy for multimode fiber optics, achieving 76.97% MNIST and 83.17% Fashion-MNIST accuracy with 16x16 measurements of effective rank 
56.9.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26538","ref_index":12,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"CascadeFormer: Depth-Tapered Transformers Motivated by Gradient Fan-in Asymmetry","primary_cat":"cs.LG","submitted_at":"2026-06-25T02:25:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CascadeFormer tapers Transformer width with depth based on gradient fan-in asymmetry to match uniform baselines in perplexity while cutting latency 8.6%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26482","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Minkowski-Type Wasserstein Metrics and Barycenters for Location-Scale Mixtures with Application to Domain Adaptation","primary_cat":"math.OC","submitted_at":"2026-06-25T00:42:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A Minkowski-type Wasserstein framework for location-scale mixtures reduces multimarginal OT to discrete component transport with linear complexity and shows competitive domain adaptation performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26260","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A multi-task spatiotemporal deep neural network for predicting penetration depth and morphology in laser welding","primary_cat":"cs.CV","submitted_at":"2026-06-24T18:02:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A CNN-plus-state-space-model multi-task network predicts laser weld penetration state (99.35% accuracy), depth (1.79 mm error), 
and cross-section morphology (95.65% accuracy) from top-view weld-pool images and welding parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26078","ref_index":34,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A cross-process welding penetration status prediction algorithm based on unsupervised domain adaptation in laser and TIG welding","primary_cat":"cs.CV","submitted_at":"2026-06-24T17:52:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Unsupervised domain adaptation with GSDE achieves ~80% accuracy in cross-process TIG-laser weld penetration prediction, improving supervised baselines by over 43%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26059","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A welding penetration prediction model for laser welding process based on self-supervised learning using physics-informed neural networks","primary_cat":"cs.CV","submitted_at":"2026-06-24T17:33:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SimPhysNet achieves 96.06% accuracy classifying laser welding penetration states using self-supervised contrastive learning with a physics-informed neural network and prototypical networks on only 200 labeled images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25759","ref_index":12,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"NEURON-Fabric: Architecture-Runtime Co-Design for Controlled Low-Bit Gradient 
Communication","primary_cat":"cs.DC","submitted_at":"2026-06-24T12:32:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"NEURON-Fabric provides a profile-guided runtime for controlled low-bit gradient communication that preserves accuracy near full-precision levels while reducing modeled communication traffic across vision, transformer, and language model workloads.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24375","ref_index":34,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MATCH: Flow Matching for Multi-View Anomaly Detection","primary_cat":"cs.CV","submitted_at":"2026-06-23T10:07:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23939","ref_index":34,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Constrained Variable Projection for Structured Problems","primary_cat":"math.OC","submitted_at":"2026-06-22T21:00:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Extends variable projection to constrained separable nonlinear least-squares via bilevel collapse, yielding exact reduced gradients and a convergent conditional-gradient algorithm.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23784","ref_index":70,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"First Results from the 
LSST Shadow Survey: The Restless Luminous Blue Variable AT2017des in the Virgo-Cluster Galaxy, NGC4532","primary_cat":"astro-ph.HE","submitted_at":"2026-06-22T18:00:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The DECam Shadow Survey has detected variable LBV eruptions in AT2017des with peaks brightening by ~0.05 mag per year, reaching luminosities similar to extreme SN impostors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22309","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The $\\alpha$-Index: A Penalized Authorship-Integrity Framework for Position-Weighted Scientific Contribution","primary_cat":"cs.DL","submitted_at":"2026-06-21T02:32:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The α-index is a conserved position-weighted authorship framework with a senior-author penalty that decreases credit as the number of middle authors increases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22075","ref_index":29,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Frequency-Domain Neural ODEs for Modeling Non-Linear Dynamical Systems","primary_cat":"cs.LG","submitted_at":"2026-06-20T14:50:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FNODE projects Neural ODE dynamics into the frequency domain via FFT and reports better generalization and convergence stability than GRUs, LSTMs, and ANODE on Lotka-Volterra, forced Duffing, Van der Pol, and Lorenz 
systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22072","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Controlled Study of CLIP-Based Body-Scene Fusion for Emotion Recognition in Context","primary_cat":"cs.CV","submitted_at":"2026-06-20T14:47:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Controlled study finds CLIP-based body-scene fusion model for emotion recognition on EMOTIC is not improved by context debiasing or rare-class training, with best mAP of 34.52%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.21590","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Radial Basis Function Networks as Projection Heads in Self-Supervised Learning","primary_cat":"cs.CV","submitted_at":"2026-06-19T16:46:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RBFN projection heads serve as competitive replacements for MLP heads in SSL and enable SNS, a label-free metric from RBF parameters that correlates strongly with logistic regression evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20329","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems","primary_cat":"cs.LG","submitted_at":"2026-06-18T15:04:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Hybrid neural-process model derives biokinetic parameters from genomic traits for soil organic matter turnover, with ecological constraints, and 
outperforms baselines on synthetic and real data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19908","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Gaussian Process Prior Variational Autoencoder for Endoscopic Videos","primary_cat":"cs.CV","submitted_at":"2026-06-18T08:03:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GPVAE replaces the standard VAE latent prior with a temporal Gaussian process prior, combined with endoscopy-specific encoders and specular masking, to achieve up to 26.1% lower image reconstruction RMSE on the C3VDv2 colonoscopy dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18876","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Test-Time Adaptation in Optical Coherence Tomography Using Trajectory-Aligned Time-Independent Flow","primary_cat":"cs.CV","submitted_at":"2026-06-17T09:54:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Flow-matching TTA with histogram matching to synthetic reference trajectories and time-independent flow achieves SOTA segmentation of AMD biomarkers in OCT.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17921","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"OlfactProfile: Profile-Conditioned Odor Prediction from Audiovisual Content","primary_cat":"cs.MM","submitted_at":"2026-06-16T13:36:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OlfactProfile shows structured field-wise profile conditioning improves odor prediction from audiovisual content 
over naive methods, with gains on background and emotion odors in a new 1,350-clip benchmark using the OAR fusion module.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17646","ref_index":131,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SketchXplain: Intuitive Visual Explanations of Image Classifiers with Sketches","primary_cat":"cs.HC","submitted_at":"2026-06-16T08:05:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SketchXplain produces sketch-based explanations for image classifiers that users interpret faster and more coherently than saliency maps on face expression and skin lesion tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17460","ref_index":12,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Operator Boosting Produces Pareto-Efficient PDE Surrogates","primary_cat":"cs.LG","submitted_at":"2026-06-16T03:20:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Operator Boosting constructs compact neural-operator PDE surrogates by sequential residual learning with validation-selected shrinkage, yielding 72-95% parameter reduction and accuracy gains on 21 of 30 dataset-architecture pairs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20707","ref_index":91,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"GEOPHYS: The Geometry of Physical Plausibility","primary_cat":"cs.CV","submitted_at":"2026-06-15T20:51:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GEOPHYS defines five geometric properties of per-frame embeddings from image 
encoders that detect physical implausibility in videos with SOTA accuracy and serve as an efficient verifier.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.13144","ref_index":40,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Decoding Crystallographic Surface Chirality with Machine Learning: From Atomic Geometry to Fermi Surface Projections","primary_cat":"cond-mat.mtrl-sci","submitted_at":"2026-06-11T10:08:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A ResNet18 model classifies surface chirality from atomic models at ~73% accuracy and from Fermi surface projections at ~99% accuracy, transferring to experimental synchrotron images after fine-tuning on two frames.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12949","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ViPER: Vision-based Packing-Aware Encoder for Robust Malware Detection","primary_cat":"cs.CR","submitted_at":"2026-06-11T06:21:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ViPER uses a LoRA-adapted ViT-B/14 with dual heads for malware classification and packing detection plus a gating mechanism and weighted losses to reach 0.8521 balanced accuracy on 200k Windows PE images while detecting packing at 0.9949 AUC.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12610","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI 
Winter","primary_cat":"cs.LG","submitted_at":"2026-06-10T19:08:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Established mathematical bottlenecks in representation, optimization, complexity, and high-dimensional learning aligned with the central disappointments of early AI research periods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11827","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Jaguar: Fast Private CNN Inference with Power-of-Two Homomorphic Arithmetic","primary_cat":"cs.CR","submitted_at":"2026-06-10T09:04:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Jaguar replaces prime-modulus HE with power-of-two arithmetic to enable coefficient-domain convolution and local-shift truncation, reporting 2-3.7x lower latency than Cheetah and Rhombus on ResNet-18/50 and MobileNetV2.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11541","ref_index":48,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"WHET: Welding Homomorphic Encryption to Accelerator Architectures","primary_cat":"cs.CR","submitted_at":"2026-06-10T01:04:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"WHET applies fine-grained coefficient-to-slot transforms, plaintext compression, and modulus raising plus lightweight hardware tweaks to FHE accelerators, delivering 1.38-8.74x per-area gains and sub-millisecond CKKS bootstrapping.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10410","ref_index":32,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Comprehensive 
Inference-Time Augmentation Framework in Physiological Signals: Application to PPG-Based AF Detection","primary_cat":"cs.LG","submitted_at":"2026-06-09T04:35:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A unified inference-time augmentation framework with 13 methods and Bayesian-optimized parameters improves AUROC up to 8.5% and reduces false positives in PPG-based AF detection across five datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10253","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multi-channel Optical Vision Model","primary_cat":"physics.optics","submitted_at":"2026-06-08T23:40:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Spatial multiplexing in optical neural networks is repurposed as a trainable representational coordinate, demonstrated in multi-layer architectures for image classification, regression, and hybrid vision-language captioning with over one million optical phase parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11251","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Mechanical Field Networks: Structured Neural Dynamics for Multivariate Systems","primary_cat":"cs.LG","submitted_at":"2026-06-08T15:23:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MF-Net learns a shared field state and mechanical transition rule from trajectories to deliver competitive forecasting and recoverable relation matrices on Lorenz-96 and real 
systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08532","ref_index":53,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"DN-Hypo-Pipeline: An AI-Driven Workflow for Generating Hypotheses using Large Language Models and Scientific Explanations","primary_cat":"cs.AI","submitted_at":"2026-06-07T09:26:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DN-Hypo-Pipeline operationalizes three philosophy-of-science accounts to direct LLMs toward principle-based hypothesis generation, claims superior performance over direct prompting, and derives two new transformer algorithms from the resulting hypotheses.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08191","ref_index":18,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation","primary_cat":"cs.LG","submitted_at":"2026-06-06T14:21:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FLaG is a frequency-domain module using FFT, latent queries, and gating that improves token aggregation and shows gains on ESM2 AMP prediction and CIFAR-100 image classification while staying competitive on text tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07394","ref_index":35,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation","primary_cat":"cs.CV","submitted_at":"2026-06-05T15:32:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"An ILP-based oracle applied to seven 
VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06867","ref_index":79,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis","primary_cat":"cs.CV","submitted_at":"2026-06-05T03:33:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Multi-FRuGaL is a decomposition-aware gated fusion framework for multimodal cancer data that maintains performance under missing modalities and reports AUC gains on two head-and-neck cancer cohorts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06533","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Position: Don't Just \"Fix it in Post\": A Science of AI Must Study Training Dynamics","primary_cat":"cs.AI","submitted_at":"2026-06-03T17:58:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A science of AI requires theories of training dynamics to predict outcomes from early signals, intervene on trajectories, and design procedures that reliably produce desired capabilities, biases, robustness, and safety properties.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03796","ref_index":19,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Signed Spiking Neuron Enabled by an Orthogonal-Easy-Axis Magnetic Tunnel 
Junction","primary_cat":"cs.NE","submitted_at":"2026-06-02T15:45:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"An MTJ device with orthogonal easy axes is proposed to realize signed LIF neurons, with LLG simulations confirming the behavior and network tests showing 91.06% CIFAR-10 accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02788","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Neutrino Fingerprints: Image-Based Encodings of IceCube Events for CNN Direction Reconstruction","primary_cat":"astro-ph.IM","submitted_at":"2026-06-01T18:54:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IceCube events are encoded as 72x72x3 images and processed by ResNet18 to reach 1.10 rad mean angular error in neutrino direction reconstruction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02764","ref_index":38,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"From Local Training to Large-Scale Mapping: A Comparative Assessment of Machine Learning and Deep Learning for Transferable Satellite-Derived Bathymetry","primary_cat":"cs.CV","submitted_at":"2026-06-01T18:28:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Deep CNNs with spatial continuity preservation and a new weighted loss function outperform Random Forest in cross-regional transfer for satellite-derived bathymetry, achieving low RMSE on independent tests and a public 
benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02523","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes","primary_cat":"cs.CL","submitted_at":"2026-06-01T17:32:29+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"FigSIM is the first annotated dataset for fine-grained suicide severity and figurative language in suicide memes, accompanied by benchmarks on 16 unimodal and multimodal models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02341","ref_index":41,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification","primary_cat":"cs.SD","submitted_at":"2026-06-01T14:49:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A parameter-efficient dual-encoder model with differentiable Choquet integral fusion improves underwater acoustic classification accuracy over single-encoder baselines on DeepShip and ShipsEar datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02105","ref_index":17,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multimodal Action Diffusion for Robust End-to-End Autonomous Driving","primary_cat":"cs.CV","submitted_at":"2026-06-01T11:35:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Action Diffusion Transformer generates multimodal driving actions via diffusion and nearest-neighbor selection, 
claiming SOTA on Bench2Drive with 10x lower latency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01962","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Contrastive Augmented Transformer with Domain-specific Enhancement for Robust Multi-scenario Metal Surface Defect Detection","primary_cat":"cs.CV","submitted_at":"2026-06-01T09:22:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"CAT framework reports 99.54% pixel-level AUROC on KolektorSDD2 with claimed superior generalization to three unseen defect datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31219","ref_index":41,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks","primary_cat":"cs.CV","submitted_at":"2026-05-29T12:25:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LGC performs curvature-aware geometric search in a compressed semantic manifold for decision-based attacks, using residual adversarial generation to reach SSIM >0.99 and LPIPS <0.01 at 5000 queries while attacking robust models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29428","ref_index":34,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"DELOS: Detecting Shallow Transits in Kepler Photometry Using a Contrastive-Learning Framework","primary_cat":"astro-ph.EP","submitted_at":"2026-05-28T06:22:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DELOS applies contrastive learning to phase-folded light curves to detect shallow 
intermediate-to-long period transits, reporting 15.5% and 11.25% gains in combined precision-recall over BLS and TLS in low-SNR tests plus 3-80x speedups.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29260","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Deep Psychovisual Image Representations","primary_cat":"cs.CV","submitted_at":"2026-05-28T02:24:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes a psychovisual-inspired deep learning method that encodes images in learned frequency sub-bands for interpretable semantic structures and reduced depth dependence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28693","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images","primary_cat":"q-bio.NC","submitted_at":"2026-05-27T16:20:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Backpropagated gradients from vision models predict higher visual cortex signals but diverge from brain hierarchies in spatial and temporal organization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27541","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training","primary_cat":"cs.LG","submitted_at":"2026-05-26T18:14:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SparseOpt is a new optimizer that counters batch normalization's gradient skew in dynamic sparse training, yielding 
faster convergence and better accuracy on ResNet models for CIFAR-100 and ImageNet.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26790","ref_index":65,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Pretrained Approximators for Low-Thrust Trajectory Cost and Reachability","primary_cat":"cs.LG","submitted_at":"2026-05-26T10:01:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Neural surrogates trained with scaling laws and self-similar transformations accurately approximate low-thrust trajectory costs and reachability while generalizing across orbital parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26294","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection","primary_cat":"cs.CV","submitted_at":"2026-05-25T19:37:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Benchmark of twelve models finds hybrid CNN-transformer architectures and a SigLIP vision-language model deliver the strongest overall performance on skin cancer detection using the PAD-UFES-20 dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26283","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening","primary_cat":"cs.CV","submitted_at":"2026-05-25T19:09:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Empirical benchmark finds attention-based models 
(SwinTiny, CoAtNet0, MaxViTTiny) achieve highest AUC above 84% on RFMiD binary screening and best F1 scores on multi-label task, with VLMs competitive but not superior and external Messidor-2 AUC 66.8-84.7%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25634","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Machine Learning-based Separation of the He I 10830{\\AA} Chromospheric Signal: Quantitative Analysis of Chromosphere-Corona Intensity in the Quiet Sun","primary_cat":"astro-ph.SR","submitted_at":"2026-05-25T09:33:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"CNN separation of He I 10830Å chromospheric signal from photospheric contamination in quiet Sun reveals R ≈ -0.84 anti-correlation with 304Å and magnetic-field-dependent EUV coupling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24748","ref_index":26,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Deep Learning-Enabled Prediction of Geoeffective CMEs Using SOHO and SDO Observations","primary_cat":"astro-ph.SR","submitted_at":"2026-05-23T21:54:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A CNN-based fusion model trained on multi-instrument solar observations predicts geoeffective CMEs, achieving mean TSS of 0.703 and Brier score of 0.095 via five-fold cross-validation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24588","ref_index":24,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"HeartBeatAI: An Interpretable and Robust Deep Learning Framework for Multi-Label ECG Arrhythmia 
Detection","primary_cat":"cs.AI","submitted_at":"2026-05-23T13:58:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"HeartBeatAI reports 98% Macro F1 under intra-source testing on four ECG datasets but shows significant degradation on rare anomalies under leave-one-domain-out evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24545","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Rethinking Federated Unlearning via the Lens of Memorization","primary_cat":"cs.LG","submitted_at":"2026-05-23T12:25:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces Grouped Memorization Evaluation and FedMemPrune to remove unique memorized information in federated unlearning while preserving overlapping knowledge.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24041","ref_index":5,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation","primary_cat":"cs.LG","submitted_at":"2026-05-21T19:41:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IRNO augments neural operators with learned fixed-point iterative refinement modules and a progressive spectral loss, achieving up to 56% error reduction on turbulent flow and large drops in high-frequency normalized errors on active matter.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22222","ref_index":17,"ref_count":3,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ARC-STAR: Auditable Post-Hoc 
Correction for PDE Foundation Models","primary_cat":"cs.LG","submitted_at":"2026-05-21T09:26:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ARC-STAR reduces velocity rollout error by at least 36x over raw Poseidon across all tested regime cells via auditable global and local correction stages on five flow benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22200","ref_index":33,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025","primary_cat":"cs.CV","submitted_at":"2026-05-21T09:04:17+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22192","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Ultra-High-Definition Image Quality Assessment via Graph Representation Learning","primary_cat":"cs.CV","submitted_at":"2026-05-21T08:57:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"UHD-GCN-BIQA models structural dependencies among sampled patches via a hybrid kNN graph and residual graph convolutions to achieve competitive PLCC and SRCC with the lowest RMSE on the UHD-IQA benchmark for blind ultra-high-definition image quality 
assessment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22162","ref_index":11,"ref_count":3,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference","primary_cat":"astro-ph.IM","submitted_at":"2026-05-21T08:33:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Two-stage LLM framework infers stellar parameters and ~20 elemental abundances from spectra, showing performance gains with increasing data volume.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21352","ref_index":34,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Classification of Single and Mixed Partial Discharges under Switching Voltage Using an AWA-CNN Framework","primary_cat":"cs.LG","submitted_at":"2026-05-20T16:16:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AWA patterns from PD pulse amplitude, width, and area enable CNNs to classify single and mixed partial discharge sources under switching voltage with over 96% test accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20392","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"VBT-MPC: Vision-Based Tactile MPC for Contour Following","primary_cat":"cs.RO","submitted_at":"2026-05-19T18:40:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VBT-MPC performs robotic contour following by running MPC directly in vision-based tactile contour feature space and is tested on varied geometries in simulation and real 
experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20308","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SDM: A Powerful Tool for Evaluating Model Robustness","primary_cat":"cs.CV","submitted_at":"2026-05-19T16:10:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SDM is a new staged gradient attack that reconstructs the adversarial objective around probability differences and reports stronger performance than prior methods like APGD.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19969","ref_index":49,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning","primary_cat":"cs.LG","submitted_at":"2026-05-19T15:17:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Argus enables backdoor detection in decentralized ML by collaborative neighbor-based validation of triggers, backed by convergence theory and reducing attack success by up to 90% on tested datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19520","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Value-added Physical Properties Catalog for Low-redshift Galaxies from DESI Legacy Imaging Surveys DR10","primary_cat":"astro-ph.GA","submitted_at":"2026-05-19T08:21:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A multimodal neural network trained on MPA-JHU references produces SFR, stellar mass, and metallicity estimates for 547 million low-redshift galaxies 
in DESI LS DR10.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19091","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Chessformer: A Unified Architecture for Chess Modeling","primary_cat":"cs.LG","submitted_at":"2026-05-18T20:27:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Chessformer is a unified encoder-only transformer for chess that uses square tokens, geometric attention bias, and an attention-based policy head to set new records in human move prediction accuracy, playing strength, and interpretability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18591","ref_index":72,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation","primary_cat":"cs.LG","submitted_at":"2026-05-18T16:05:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18366","ref_index":33,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"7DT Insight: Variability in Young Stellar Objects","primary_cat":"astro-ph.SR","submitted_at":"2026-05-18T13:17:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Two-epoch medium-band photometry of 769 YSO candidates in Orion A identifies 110 variables (~14%), with best-fit templates dominated by cold and 
hot spot models over extinction or gray changes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17398","ref_index":44,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MiniGPT: Rebuilding GPT from First Principles","primary_cat":"cs.CL","submitted_at":"2026-05-17T11:32:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"MiniGPT is a self-contained PyTorch implementation of standard GPT autoregressive modeling that reaches 1.478 validation loss on Tiny Shakespeare with a 10.77M-parameter model and produces recognizable Shakespeare-style text.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17347","ref_index":17,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Position: Age Estimation Models Do Not Process Biometric Data","primary_cat":"cs.CY","submitted_at":"2026-05-17T09:37:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Empirical evaluation shows age estimation models perform orders of magnitude below identification thresholds on face verification benchmarks, indicating they do not extract identity-discriminative representations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17187","ref_index":139,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media","primary_cat":"cs.CL","submitted_at":"2026-05-16T22:52:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally 
better than a trivial baseline at detecting specific rule violations in pluralistic online communities.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18889","ref_index":10,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Soft Learning","primary_cat":"cs.LG","submitted_at":"2026-05-16T22:14:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17131","ref_index":35,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation","primary_cat":"cs.CV","submitted_at":"2026-05-16T19:37:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"A survey that categorizes deep learning models for point cloud tasks by backbone architecture, evaluates benchmark performance, and outlines challenges and future research directions.","context_count":1,"top_context_role":"method","top_context_polarity":"background","context_text":"3D convolutional filters with a stride for extracting features from the shape. They do not use any pooling, as it was observed that pooling introduced uncertainty to shape reconstruction. They pretrain the model first and then run fine-tuning. Pre-training is run layer-wise-convolution layers and RBM layer are trained with standard contrastive divergence [35] and AM-DBN layer is trained with fast persistent contrastive divergence [99]. 
For fine-tuning, they use a process similar to the wake-sleep algorithm from [36]. During wake, they propagate input voxel forward through the network and update the recognition weights. During sleep, they sample persistent latent variables from the network's generative distribution and propagate them backward through the"},{"citing_arxiv_id":"2605.18878","ref_index":270,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Prognostic Value of Lung Ultrasound Biomarkers for Readmission Risk in Congestive Heart Failure: A Pilot Data-Driven Analysis","primary_cat":"eess.SP","submitted_at":"2026-05-16T02:49:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16134","ref_index":21,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Navigating Potholes with Geometry-Aware Sharpness Minimization","primary_cat":"cs.LG","submitted_at":"2026-05-15T16:17:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLQR+SAM pairs a slow learned geometry preconditioner with fast SAM perturbations to amplify escape from locally sharp 'potholes' while stabilizing flat basins, producing consistent gains over SAM and LLQR alone.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16468","ref_index":58,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual 
Cortex","primary_cat":"cs.CV","submitted_at":"2026-05-15T11:28:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15484","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When Does Sparse MoE Help in Vision? The Role of Backbone Compute Leverage in Sparse Routing","primary_cat":"cs.CV","submitted_at":"2026-05-15T00:01:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sparse MoE vision models show positive accuracy gaps only when routing a substantial compute fraction ρ and using k≥2 experts at large scale; batch-axis dispatch is identified as a key failure mode.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15383","ref_index":27,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MorphoHELM: A Comprehensive Benchmark for Evaluating Representations for Microscopy-Based Morphology Assays","primary_cat":"cs.CV","submitted_at":"2026-05-14T20:13:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MorphoHELM is a new benchmark for Cell Painting morphology representations that tests methods across increasing batch effect levels and finds classic computer vision strategies remain the strongest general-purpose 
performers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14846","ref_index":5,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Successive convex optimization for transformer encoder model predictive control","primary_cat":"math.OC","submitted_at":"2026-05-14T13:54:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A successive convex programming framework embeds transformer encoders into MPC by deriving DC representations of attention, guaranteeing recursive feasibility and convergence to local optima under mild assumptions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14145","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Rethinking the Good Enough Embedding for Easy Few-Shot Learning","primary_cat":"cs.CV","submitted_at":"2026-05-13T21:52:05+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Frozen DINOv2-L features with k-NN classification and PCA/ICA refinement achieve state-of-the-art few-shot performance on four benchmarks without any backpropagation or fine-tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14091","ref_index":66,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Venus-DeFakerOne: Unified Fake Image Detection & Localization","primary_cat":"cs.CV","submitted_at":"2026-05-13T20:20:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DeFakerOne is a unified foundation model for joint image-level fake image detection and pixel-level localization that reports SOTA results on 39 detection and 9 localization 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13568","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Dynamical Predictive Modelling of Cardiovascular Disease Progression Post-Myocardial Infarction via ECG-Trained Artificial Intelligence Model","primary_cat":"cs.LG","submitted_at":"2026-05-13T14:05:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A contrastive-learning ECG foundation model with multitask heads predicts post-MI outcomes better than training from scratch (AUC 0.794 vs 0.608).","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18837","ref_index":33,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals","primary_cat":"cs.LG","submitted_at":"2026-05-13T03:11:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VCR learns valid contextual representations for incomplete wearable signals via orthogonal disentanglement and missing-aware mixture-of-experts, improving robustness across full and missing-modality settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12878","ref_index":19,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization","primary_cat":"math.OC","submitted_at":"2026-05-13T01:46:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Adam-SHANG is a convergent Adam variant for stochastic smooth convex optimization that uses a stable lagged-preconditioner update 
and a computable trace-ratio stepsize rule.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15216","ref_index":107,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations","primary_cat":"cs.AR","submitted_at":"2026-05-12T09:44:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"BMRUs enable analog recurrent neural network hardware via discrete outputs that suppress noise 20-fold, with one-to-one parameter-to-circuit mapping and linear power scaling for recurrence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11846","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Martingale-Consistent Self-Supervised Learning","primary_cat":"cs.LG","submitted_at":"2026-05-12T09:29:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"probabilistic calibration, and representation stability under partial observation. 2 Related Work SSL across objectives and modalities. SSL has developed along several main objective families. In vision, contrastive and bootstrap methods such as SimCLR [6] and BYOL [10] learn representations by aligning augmented views, while masked reconstruction methods such as MAE learn from incomplete inputs by predicting masked content [13]. 
In time series, TS2Vec [28] uses hierarchical contrastive learning, while Ti-MAE [16] and SimMTM [9] adopt masked modeling objectives. In tabular learning, VIME [27], SCARF [3], and SubTab [22] adapt self-supervision through corruption-based pretext tasks, contrastive learning, and feature-subset views, respectively. Consistency across views. Many SSL methods rely on agreement across multiple views, but"},{"citing_arxiv_id":"2605.11530","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Multi-Narrow Transformation as a Single-Model Ensemble: Boundary Conditions, Mechanisms, and Failure Modes","primary_cat":"cs.LG","submitted_at":"2026-05-12T04:54:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Multi-narrow single-model ensembles outperform wide baselines in low-data image classification by learning diverse features but underperform in data-rich settings where training favors few paths.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"tion under a limited budget affects predictive performance. 4. Experiments We evaluate the MN transformation from three perspectives: the data-regime dependence of its effectiveness, the mechanism underlying its gains in low-data regimes, and its computational implications. 4.1. Experimental Setup Unless otherwise stated, the following setup was used throughout the experiments. We employed ResNet-18 [9] as the baseline architecture and applied the MN transformation defined in Sec. 3.1. Following Easy Ensemble [8], the implementation was based on group convolution, and the transformation strength was controlled by 𝑟 ∈ {1,2,4,8,16,32}. Here, 𝑟 = 1 corresponds to the untransformed SW baseline, and 𝑟 = 32 corresponds to a model containing 1,024 internal paths. 
We used CIFAR-100 as the primary dataset."},{"citing_arxiv_id":"2605.10161","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"OUIDecay: Adaptive Layer-wise Weight Decay for CNNs Using Online Activation Patterns","primary_cat":"cs.LG","submitted_at":"2026-05-11T08:08:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OUIDecay adaptively rescales layer-wise weight decay in CNNs using an online activation-based Overfitting-Underfitting Indicator and outperforms fixed decay in 7 of 8 tested settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09936","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception","primary_cat":"cs.CV","submitted_at":"2026-05-11T03:33:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"benchmarking library providing modular data loaders, fine-tuning pipelines, evaluation scripts, and cross-dataset adapters for direct comparison with Places365, MS-COCO, and Cityscapes. 4.1 Task 1: Urban Scene Semantic Classification Setup: Given an image, predict its HUSIC label (0-9). Fine-tuned on 80K training images; evaluated on 10K test split with five-fold cross-validation. Baselines: ResNet-{18/50/152} [14], EfficientNet-B4 [36], ViT-B/16 [8], DeiT-B [37], CLIP ViT-L/14 (zero-shot + fine-tuned) [33]. Metrics: Top-1 Accuracy, Macro-F1, per-class P/R/F1. 
4.2 Task 2: Cross-Modal Image-Text Retrieval Task 2 evaluates two sub-configurations reflecting the dataset's two textual modalities: T2-1 (Category-Level Retrieval). Text queries are the ten HUSIC class names, formatted as \" This is a photo of {class_name}\""},{"citing_arxiv_id":"2605.21507","ref_index":24,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift","primary_cat":"physics.ao-ph","submitted_at":"2026-05-09T16:58:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The study applies an ensemble of machine learning and deep learning models with synthetic oversampling on 2018-2020 data to nowcast visibility, finding a performance decline on 2021 test data attributed to distributional shift confirmed by Wasserstein distance on the SHAP-identified feature.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08633","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction","primary_cat":"cs.DC","submitted_at":"2026-05-09T02:57:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A generative compression model using historical priors for Earth observation data achieves up to 10,000x reduction after exascale training on an Armv9 supercomputer.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"large-scale historical EO archives can indeed support the learning of strong priors. 
Existing efforts already cover a broad spectrum, including spectral modeling, global time-series pretraining, multi-sensor representation learning, multimodal spatio-temporal modeling, and language-conditioned EO generation, as represented by SpectralGPT [14], Prithvi-EO-2.0 [36], DOFA [42], SkySense V2 [48], OlmoEarth [13], Text2Earth [22], and RingMoE [2]. Meanwhile, recent generative models such as MetaEarth [46] and TerraMind [15] further suggest that these large-scale priors can be extended beyond representation learning toward global-scale image generation and multimodal generative modeling. Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model Conference'26, Nov 2026, USA"},{"citing_arxiv_id":"2605.07740","ref_index":17,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset","primary_cat":"cs.CV","submitted_at":"2026-05-08T13:46:57+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LAMES is a new annotated remote-sensing dataset covering 150 large-scale mining sites and 870 km² of artisanal mining for environmental segmentation and monitoring tasks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"training, validation, and test data sets are split based on entire mining sites rather than individual patches, ensuring that all patches from a given site are confined to either the training or test set, see Fig 10. 5.1. Mining Sector Classification (HiRes Imagery) We selected the established U-Net architecture [37], incorporating a ResNet-50 backbone [17] trained on ImageNet [11] as the network architecture. U-Net is a widely recognized semantic segmentation model, demonstrating robust performance in both computer vision and remote sensing applications. 
The mining sites were divided into 38 for training, 14 for validation, and 19 for testing. Each bounding box of the mining sites was divided into patches"},{"citing_arxiv_id":"2605.07466","ref_index":32,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images","primary_cat":"cs.CV","submitted_at":"2026-05-08T09:13:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A TransUNet-based segmentation followed by texture comparison classifies fatty pancreas in ultrasound with 89.7% accuracy on a small clinical dataset.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Our framework operates in three distinct stages, as follows: 1. Segmentation stage. We employ a TransUNet-based architecture [3, 19] combining a ResNet [31] encoder with transformer bottleneck layers to segment both the pancreas and the splenic vein from ultrasound images. The models are initialized via transfer learning from a liver segmentation task [32] and fine-tuned on our clinical dataset. 2. Anatomically-Guided Patch Extraction stage. Using the predicted segmentation masks, we extract tissue patches from two anatomically relevant regions: the pancreatic parenchyma (excluding the splenic vein) and the peri-venous fat region immediately beneath the splenic vein contour. 3. Classification via Texture Comparison stage."}],"limit":100,"offset":0}}