Richard Landis and Gary G

1977 · DOI 10.2307/2529310

Mixed citation behavior. The most common role is background (67% of classified citations).

17 Pith papers cite it.

citation-role summary

background 8 · method 1

citation-polarity summary

background 6 · unclear 2 · use method 1

representative citing papers

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

cs.CR · 2026-01-15 · unverdicted · novelty 8.0

26.1% of analyzed AI agent skills contain vulnerabilities across 14 patterns, with executable scripts raising risk 2.12x, based on static and LLM analysis of 31k skills.

Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Urban-ImageNet is a 2-million-image multi-modal dataset with HUSIC 10-class taxonomy enabling benchmarks for urban scene classification, cross-modal retrieval, and instance segmentation.

VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.

Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

cs.AI · 2026-04-17 · conditional · novelty 7.0

AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.

Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

The C-Score quantifies intra-class explanation consistency for CAM methods via confidence-weighted pairwise soft IoU and detects AUC-consistency dissociation as an early warning for model instability on chest X-ray classification.

Designing Safe and Accountable GenAI as a Learning Companion with Women Banned from Formal Education

cs.CY · 2026-04-08 · conditional · novelty 7.0

Participatory design with 20 Afghan women reveals that safe GenAI learning companions must prioritize privacy, cultural fit, and genuine learning support, with the process itself linked to higher aspirations and agency.

Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution

cs.SE · 2026-05-20 · unverdicted · novelty 6.0

Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.

An Annotation Scheme and Classifier for Personal Facts in Dialogue

cs.CL · 2026-05-11 · accept · novelty 6.0

An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

cs.CR · 2026-05-02 · conditional · novelty 6.0

Universal adversarial attacks cause output perturbation 90 times more often than precise target injection in VLMs, with only 2 verbatim successes out of 6615 tests.

Reducing Maintenance Burden in Behaviour-Driven Development: A Paraphrase-Robust Duplicate-Step Detector with a 1.1M-Step Open Benchmark

cs.SE · 2026-04-22 · unverdicted · novelty 6.0

A paraphrase-robust duplicate-step detector for Gherkin BDD suites, built on a new 1.1M-step public corpus, reports F1 scores up to 0.906 and estimates 893k eliminable step occurrences corpus-wide.

User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models

cs.SE · 2026-05-12 · conditional · novelty 5.0

LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.

Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact

cs.HC · 2026-04-25 · unverdicted · novelty 5.0

Larger batch sizes for LLM dialogue coding in healthcare simulations improve speed and reduce energy consumption while decreasing coding accuracy compared to human labels.

Exploring and Testing Skill-Based Behavioral Profile Annotation: Human Operability and LLM Feasibility under Schema-Guided Execution

cs.CL · 2026-04-16 · unverdicted · novelty 5.0

Decomposing BP annotation into 14 skills shows 5 directly operable, 4 recoverable after re-annotation, and 5 structurally underspecified, with GPT-5.4 reaching 0.678 accuracy on retained skills and human-GPT difficulty correlating at r=0.881 at the skill level but near zero at instance and lexical-1

ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering

cs.SE · 2026-04-15 · unverdicted · novelty 5.0

ToxiShield delivers a real-time GitHub extension with a BERT toxicity detector at 98% accuracy, a Claude-based coach, and a fine-tuned Llama reframer at 95% style transfer accuracy, validated by a 10-person TAM study.

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

cs.SE · 2026-03-13 · unverdicted · novelty 5.0

An automated self-testing framework with evidence-based quality gates for LLM application releases was evaluated in a longitudinal case study of a multi-agent conversational AI system, identifying rollback builds and supporting stable quality over four weeks.

Writing Blog Posts Helps Students Connect Experiential Learning to the Workplace

cs.CY · 2026-04-21 · unverdicted · novelty 4.0

Guided blog posts during work-based learning enable CS students to produce deep reflections on problem-solving, collaboration, and personal growth that they can use in resumes and interviews.

Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment

cs.CY · 2026-03-29 · unverdicted · novelty 4.0

Verbalized confidence from small LMs enables cost-effective cascade routing for automated educational scoring, matching large-model accuracy at 76% lower cost when discrimination is strong.

citing papers explorer

Showing 17 of 17 citing papers.

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale · cs.CR · 2026-01-15 · unverdicted · none · ref 16
Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception · cs.CV · 2026-05-11 · unverdicted · none · ref 20
VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns · cs.CR · 2026-05-03 · unverdicted · none · ref 34
Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench · cs.AI · 2026-04-17 · conditional · none · ref 11
Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification · cs.CV · 2026-04-09 · unverdicted · none · ref 48
Designing Safe and Accountable GenAI as a Learning Companion with Women Banned from Formal Education · cs.CY · 2026-04-08 · conditional · none · ref 40
Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution · cs.SE · 2026-05-20 · unverdicted · none · ref 33
An Annotation Scheme and Classifier for Personal Facts in Dialogue · cs.CL · 2026-05-11 · accept · none · ref 19
VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models · cs.CR · 2026-05-02 · conditional · none · ref 12
Reducing Maintenance Burden in Behaviour-Driven Development: A Paraphrase-Robust Duplicate-Step Detector with a 1.1M-Step Open Benchmark · cs.SE · 2026-04-22 · unverdicted · none · ref 18
User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models · cs.SE · 2026-05-12 · conditional · none · ref 24
Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact · cs.HC · 2026-04-25 · unverdicted · none · ref 29
Exploring and Testing Skill-Based Behavioral Profile Annotation: Human Operability and LLM Feasibility under Schema-Guided Execution · cs.CL · 2026-04-16 · unverdicted · none · ref 5
ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering · cs.SE · 2026-04-15 · unverdicted · none · ref 27
Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications · cs.SE · 2026-03-13 · unverdicted · none · ref 15
Writing Blog Posts Helps Students Connect Experiential Learning to the Workplace · cs.CY · 2026-04-21 · unverdicted · none · ref 23
Do Small Language Models Know When They're Wrong? Confidence-Based Cascade Scoring for Educational Assessment · cs.CY · 2026-03-29 · unverdicted · none · ref 19
