archive
Every paper Pith has read.
1286 papers in cs.IR · page 9
-
Secret key creates green-red bias to watermark recommenders
Green-Red Watermarking for Recommender Systems
-
Hybrid detector cuts AI phishing misses by 78 points with privacy intact
CyberCane: Neuro-Symbolic RAG for Privacy-Preserving Phishing Detection with Formal Ontology Reasoning
-
Adaptive SID learning preserves compatible overlaps for better recs
Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale
-
Four new Reddit-sourced datasets assembled for detecting suicidal ideation
A Benchmark Suite of Reddit-Derived Datasets for Mental Health Detection
-
Prompt chaining lifts LLM accuracy on scientific text classification
Automating Categorization of Scientific Texts with In-Context Learning and Prompt-Chaining in Large Language Models
-
Web workbench turns IR user simulation into visual shareable workflows
IIRSim Studio: A Dashboard for User Simulation
-
Typos cause plan collapse in generative retrieval
Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval
-
Sparse memory updates stabilize generative retrieval on growing collections
A Parametric Memory Head for Continual Generative Retrieval
-
Distillation creates linear retriever that nearly matches reranker on rationale tasks
Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
-
Distillation turns reranker into linear-time retriever
Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
-
Multimodal embeddings fail to follow modality instructions
MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models
-
Pro-GEO cuts geographic clustering distance by 45.6%
Birds of a Feather Cluster Nearby: a Proximity-Aware Geo-Codebook for Local Service Recommendation
-
Users and AI co-edit knowledge graphs for better organization
MindTrellis: Co-Creating Knowledge Structures with AI through Interactive Visual Exploration
-
Pretrained music models lag in recommendations vs tagging
Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems
-
Support penalty selects reliable two-stage recommender policies
CASP: Support-Aware Offline Policy Selection for Two-Stage Recommender Systems
-
RAG model predicts osteomyelitis outcomes from mixed clinical records
RAG4Outcome: A Retrieval-Augmented Multimodal Framework for Prognostic Prediction in Chronic Osteomyelitis
-
Self-re-expression adapts LLMs to tasks using only unannotated data
Self Knowledge Re-expression: A Fully Local Method for Adapting LLMs to Tasks Using Intrinsic Knowledge
-
Distilled embeddings deliver 30% better retrieval at 180x LLM speed
Aligning Dense Retrievers with LLM Utility via Distillation
-
QPP picks query variants that raise RAG answer quality
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
-
Bi-level optimization enables learnable graph filters for recommendations
ASPIRE: Make Spectral Graph Collaborative Filtering Great Again via Adaptive Filter Learning
-
Beam-search negatives reshape AUC to partial AUC in LLM recommenders
Objective Shaping with Hard Negatives: Windowed Partial AUC Optimization for RL-based LLM Recommenders
-
Compact model beats 23x larger rival on patent retrieval
Citation-Driven Multi-View Training for Patent Embeddings: QaECTER and Sophia-Bench
-
Benchmark of 10k agents reveals semantic search misses real performance
AgentSearchBench: A Benchmark for AI Agent Search in the Wild
-
Alignment distorts signals in semantic-collaborative recommender fusion
Rethinking Semantic Collaborative Integration: Why Alignment Is Not Enough
-
ResRank reranks with one token per passage and no generation
ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression
-
Sharpness-aware method improves poisoning attack transfer in recommenders
Sharpness-Aware Poisoning: Enhancing Transferability of Injective Attacks on Recommender Systems
-
ReCast matches RL performance with 4% rollout budget in generative recs
ReCast: Recasting Learning Signals for Reinforcement Learning in Generative Recommendation
-
ESPRESSO scales keyword search over Solid pods with privacy
Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility Constraints
-
Profile portability changes user utility by algorithm
Multistakeholder Impacts of Profile Portability in a Recommender Ecosystem
-
Hierarchical memory links events for better LLM reasoning at lower cost
StructMem: Structured Memory for Long-Horizon Behavior in LLMs
-
Logic networks match DNNs for video copy detection with tiny descriptors
Efficient Logic Gate Networks for Video Copy Detection
-
Counterfactual model boosts pre-promotion conversion forecasts
Counterfactual Multi-task Learning for Delayed Conversion Modeling in E-commerce Sales Pre-Promotion
-
Multi-task model improves delayed conversion predictions in pre-promotions
Counterfactual Multi-task Learning for Delayed Conversion Modeling in E-commerce Sales Pre-Promotion
-
LLM user profiles distilled into sequential recommenders keep serving fast
Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation
-
SAE concepts replace tokens in SPLADE with comparable retrieval and better efficiency
From Tokens to Concepts: Leveraging SAE for SPLADE
-
Corpus of 301k systematic reviews spans all scientific fields
A Large-Scale, Cross-Disciplinary Corpus of Systematic Reviews
-
Wavelet packets align graph signals to temporal scales in recs
WPGRec: Wavelet Packet Guided Graph Enhanced Sequential Recommendation
-
Benchmark tests LLMs on integrated scientific paper reasoning
PaperMind: Benchmarking Agentic Reasoning and Critique over Scientific Papers in Multimodal LLMs
-
Separate encoders plus explanatory discriminator improve authorship attribution
Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI
-
Verbatim storage, not spatial metaphor, explains MemPalace recall
Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture
-
LLM pipeline raises multi-table matching F1 by 5.1 percent
Unlocking the Power of Large Language Models for Multi-table Entity Matching
-
IntrAgent lifts literature retrieval accuracy 13.2 percent
IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review
-
Fine-tuned LLM matches supervised accuracy on next job prediction
On Reasoning Behind Next Occupation Recommendation
-
Dialect cues bypass LLM safety filters unlike explicit profiles
Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs. Explicit User Profiles
-
Non-English sources needed for realistic multilingual ToT queries
Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation
-
AI extracts pharmacokinetic data from XML tables by structure
Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale
-
Eye tracking shows carousel users defy web-search patterns
Following the Eye-Tracking Evidence: Established Web-Search Assumptions Fail in Carousel Interfaces
-
Entity clusters replace averages for reliable retrieval tests
Coverage, Not Averages: Semantic Stratification for Trustworthy Retrieval Evaluation
-
Self-aware embeddings double RAG accuracy on versioned queries
Self-Aware Vector Embeddings for Retrieval-Augmented Generation: A Neuroscience-Inspired Framework for Temporal, Confidence-Weighted, and Relational Knowledge
-
Multi-agent AI generates more diverse research ideas
Enhancing Research Idea Generation through Combinatorial Innovation and Multi-Agent Iterative Search Strategies