hub Mixed citations

Richard Landis and Gary G

1977 · DOI 10.2307/2529310

Mixed citation behavior. Most common role is background (67%).

24 Pith papers citing it

Background 67% of classified citations

citation-role summary

background: 8 · method: 1

citation-polarity summary

background: 6 · unclear: 2 · use method: 1

representative citing papers

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

cs.CR · 2026-01-15 · unverdicted · novelty 8.0

26.1% of analyzed AI agent skills contain vulnerabilities across 14 patterns, with executable scripts raising risk 2.12x, based on static and LLM analysis of 31k skills.

REStack: A Large-Scale Dataset of Reverse Engineering Discussions from Stack Exchange

cs.SE · 2026-06-03 · unverdicted · novelty 7.0

REStack is a new public dataset of 12k+ RE discussions from Stack Exchange sites, enriched with 23 LDA-derived topics grouped into six categories and community-derived difficulty metadata.

Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki

cs.CL · 2026-05-25 · unverdicted · novelty 7.0

LLM-Wiki structures external knowledge as compilable wiki pages with links and persistent self-correction, achieving SOTA results on HotpotQA, MuSiQue, and 2WikiMultiHopQA by 2.0-8.1 F1 points over prior RAG systems.

Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.

Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

cs.AI · 2026-04-17 · conditional · novelty 7.0

AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

Designing Safe and Accountable GenAI as a Learning Companion with Women Banned from Formal Education

cs.CY · 2026-04-08 · conditional · novelty 7.0

Participatory design with 20 Afghan women reveals that safe GenAI learning companions must prioritize privacy, cultural fit, and genuine learning support, with the process itself linked to higher aspirations and agency.

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.

AI translation of literary texts is "fine", but readers still prefer human translations

cs.CL · 2026-06-24 · unverdicted · novelty 6.0

Human readers prefer human literary translations over AI-generated ones for immersion and clarity despite finding MT adequate and struggling to identify the source.

Beyond the Grave: An Empirical Study of Dormancy and Revival in Scientific Open-Source Software

cs.SE · 2026-06-18 · accept · novelty 6.0

Empirical analysis of 2,984 dormant-revived scientific OSS projects shows fixed inactivity thresholds are insufficient for classifying abandonment, with lifecycle archetypes providing better discrimination.

Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023

cs.AI · 2026-06-17 · unverdicted · novelty 6.0

A retrieve-then-confirm framework applied to one CS program finds ~50% coverage of both CS2013 and CS2023, ~88% competency articulation, and lower cognitive depth under the newer guideline (76% vs 95%).

Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

cs.SE · 2026-05-20 · unverdicted · novelty 6.0

Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.

An Annotation Scheme and Classifier for Personal Facts in Dialogue

cs.CL · 2026-05-11 · accept · novelty 6.0

An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

cs.CR · 2026-05-02 · conditional · novelty 6.0

Universal adversarial attacks cause output perturbation 90 times more often than precise target injection in VLMs, with only 2 verbatim successes out of 6615 tests.

Reducing Maintenance Burden in Behaviour-Driven Development: A Paraphrase-Robust Duplicate-Step Detector with a 1.1M-Step Open Benchmark

cs.SE · 2026-04-22 · unverdicted · novelty 6.0

A paraphrase-robust duplicate-step detector for Gherkin BDD suites, built on a new 1.1M-step public corpus, reports F1 scores up to 0.906 and estimates 893k eliminable step occurrences corpus-wide.

User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

cs.SE · 2026-05-12 · conditional · novelty 5.0

LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.

Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact

cs.HC · 2026-04-25 · unverdicted · novelty 5.0

Larger batch sizes for LLM dialogue coding in healthcare simulations improve speed and reduce energy consumption while decreasing coding accuracy compared to human labels.

Exploring and Testing Skill-Based Behavioral Profile Annotation: Human Operability and LLM Feasibility under Schema-Guided Execution

cs.CL · 2026-04-16 · unverdicted · novelty 5.0

Decomposing BP annotation into 14 skills shows 5 directly operable, 4 recoverable after re-annotation, and 5 structurally underspecified, with GPT-5.4 reaching 0.678 accuracy on retained skills and human-GPT difficulty correlating at r=0.881 at the skill level but near zero at instance and lexical-1

ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering

cs.SE · 2026-04-15 · unverdicted · novelty 5.0

ToxiShield delivers a real-time GitHub extension with a BERT toxicity detector at 98% accuracy, a Claude-based coach, and a fine-tuned Llama reframer at 95% style transfer accuracy, validated by a 10-person TAM study.

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

cs.SE · 2026-03-13 · unverdicted · novelty 5.0

An automated self-testing framework with evidence-based quality gates for LLM application releases was evaluated in a longitudinal case study of a multi-agent conversational AI system, identifying rollback builds and supporting stable quality over four weeks.

Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development

cs.SE · 2026-06-17 · unverdicted · novelty 4.0

Specificity and Context predict actionable code generation while Verification predicts adoption and Context predicts integration depth in LLM-assisted PR workflows.

Writing Blog Posts Helps Students Connect Experiential Learning to the Workplace

cs.CY · 2026-04-21 · unverdicted · novelty 4.0

Guided blog posts during work-based learning enable CS students to produce deep reflections on problem-solving, collaboration, and personal growth that they can use in resumes and interviews.

Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment

cs.CY · 2026-03-29 · unverdicted · novelty 4.0

Verbalized confidence from small LMs enables cost-effective cascade routing for automated educational scoring, matching large-model accuracy at 76% lower cost when discrimination is strong.

VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

cs.CR · 2026-05-03

citing papers explorer

Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

cs.AI · 2026-04-17 · conditional · none · ref 11

AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.

Designing Safe and Accountable GenAI as a Learning Companion with Women Banned from Formal Education

cs.CY · 2026-04-08 · conditional · none · ref 40

Participatory design with 20 Afghan women reveals that safe GenAI learning companions must prioritize privacy, cultural fit, and genuine learning support, with the process itself linked to higher aspirations and agency.

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

cs.CR · 2026-05-02 · conditional · none · ref 12

Universal adversarial attacks cause output perturbation 90 times more often than precise target injection in VLMs, with only 2 verbatim successes out of 6615 tests.

User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

cs.SE · 2026-05-12 · conditional · none · ref 24

LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.
