C-pack: Packaged resources to advance general Chinese embedding
6 Pith papers cite this work.
citing papers explorer
- TeleEmbedBench: A Multi-Corpus Embedding Benchmark for RAG in Telecommunications
  TeleEmbedBench is the first multi-corpus benchmark showing LLM-based embedding models significantly outperform traditional sentence-transformers on telecommunications specifications and code for retrieval accuracy and noise robustness.
- NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
  NasZip delivers up to 8.4x speedup over CPU baselines and 1.69x over prior NDP accelerators for ANNS by combining near-data processing with statistics-based PCA early exiting, dynamic-float encoding, and data-aware neighbor mapping.
- Hunting Vulnerability Variants in AI Infra: Measurement and Reference-Driven Detection
  Measurement of 688 AI infra repositories shows frequent overlapping vulnerable patterns, and INFRASCOPE detects over 20 variants including 11 acknowledged and 4 with new CVEs.
- GovScape: A Public Multimodal Search System for 70 Million Pages of Government PDFs
  GovScape delivers multimodal search over 10 million government PDFs using metadata, exact text, semantic embeddings, and visual page features at an estimated $1,500 preprocessing cost.
- MulFSA: Multi-level Financial Sentiment Analysis Framework for Bond Market
  MulFSA combines micro-level firm sentiment, meso-level industry sentiment, and duration-aware smoothing from PLMs/LLMs to extract a daily sentiment index that reduces credit spread forecast errors by 10.25% MAE and 11.94% MAPE on a 1.35M-text Chinese bond corpus.
- Query pipeline optimization for cancer patient question answering systems
  Three-aspect RAG query pipeline optimization for cancer patient QA introduces HSRDR and SEOS and reports 5.24% accuracy gain on Claude-3-haiku versus chain-of-thought on a custom dataset.