JackZebra performs long-horizon route hijacking of vision-based AVs by converting adversarial patches into online-selected steering primitives via closed-loop control from an attacker vehicle.
librosa/librosa: 0.6.3
Mixed citation behavior. Most common role is background (55%).
representative citing papers
Fetal-Gauge benchmark shows state-of-the-art vision-language models reach only 55% accuracy on fetal ultrasound tasks, well below clinical needs and highlighting the requirement for domain-adapted models.
Derives non-asymptotic 2-norm and infinity-norm error bounds for deterministic and stochastic variants of OPTQ and Qronos PTQ algorithms.
A new 321-patient multi-center breast FNAC WSI dataset with 7398 patch-level C1-C5 annotations is released for AI-assisted classification research.
The cubic sum rule of S(q,ω) is tested as a kinetic energy estimator using PIMC data and dielectric models for the uniform electron gas, confirming consistency with thermodynamics but exposing flaws in semi-classical approximations.
STEMGym benchmark demonstrates that perception pipelines dominate dose efficiency in autonomous STEM over navigation methods across 33 agent setups.
A parton-shower-inspired local subtraction scheme for double-real corrections in color singlet decays is introduced, with finiteness verified for the e+e- to qqbar remainder and phase-space integrals computed analytically and via sector decomposition.
ColumnKeeper provides the first mitigations for ColumnDisturb using per-subarray counters or probabilistic refresh, with low overheads at 1M and 128K thresholds.
Proves sharp threshold on mutation parameter χ for (1+1)-EA on Dynamic Binary Value and Uniform weight dynamic linear problems, yielding O(n log n) runtime below threshold and 2^Ω(n) above, plus a second stagnation-distance threshold for the former.
POPSICLE introduces benchmark datasets for cryoET segmentation and localization built from the CryoET Data Portal.
The first unified benchmark finds that the GLR family incurs only a 3x median slowdown over LR(1) on deterministic grammars and is the fastest among generalized parsers.
Introduces ESAS benchmark dataset using LLM-assisted event injection into acoustic scenes, showing significant performance drops in existing ASC models.
Optimal SSB frame origin for LGWA cuts sampling time by 10x and tightens chirp mass and sky position constraints for stellar-mass binaries beyond LVK performance.
Randomized experiment finds AI draft assistance raises feedback provision by teaching assistants 10.8 percentage points without harming quality.
A matched-pair protocol and Accurate Differentiation Rate metric reveal that conventional LLM accuracy on SAT problems is often inflated by over-predicting satisfiability, while cross-representation agreement exceeds 80 percent for most models.
A new greedy rebalancing algorithm for multi-constraint hypergraphs, integrated into Mt-KaHyPar, reduces geometric mean connectivity by 11.5% versus Metis while improving partition balance reliability.
Bayesian optimization identifies cement-salt hydrate composites achieving up to five times higher specific energy than prior cement-based TCES materials, with LiCl-based formulations reaching 458 kJ/kg.
FLDD learns non-Markovian marginal and posterior distributions for the forward process so a factorized reverse process can match the target better and produce higher-quality samples in fewer steps.
Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.
An SMT-based active learning algorithm learns minimal nondeterministic weighted automata over arbitrary semirings, with partial correctness proofs, a sufficient termination condition, and experiments showing smaller models and fewer queries than baselines.
Rabi coupling allows a third component to join a self-bound binary quantum droplet in Bose gases, stabilized by finite detuning despite added repulsive forces.
FDA-QC combines functional data analysis of curves with quasi-conformal mappings to register and analyze both boundaries and interiors of planar biological shapes for morphing and variation studies.
Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
A new Java bytecode optimizer fuses map and filter into mapMulti to reduce stream overhead, sidestepping Streamliner's restrictions and delivering superior results in two of nine benchmarks while passing all 31,799 Kafka tests.
citing papers explorer
-
Cache-Related Smells in GitLab CI/CD: Comprehensive Catalog, Automated Detection, and Empirical Evidence
A catalog of ten cache smells in GitLab CI/CD, an automated detector achieving 0.98 F1, and empirical evidence that the smells appear in 89% of 228 mature open-source projects.
-
Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering
StackRepoQA shows LLMs reach only moderate accuracy on multi-file Java QA tasks, with gains from graph-based retrieval but frequent reliance on verbatim answer reproduction.
-
PracRepair: LLM-Empowered Automated Program Repair Inspired by Human-Like Debugging Practices
PracRepair uses LLMs for on-demand static-dynamic context construction, question-driven failure diagnosis, and iterative patch refinement via validation diagnostics and trace changes, fixing more bugs than baselines on Defects4J V1.2/V2.0 and RWB.
-
AI-Generated Smells: An Analysis of Code and Architecture in LLM and Agent-Driven Development
More capable LLMs and agents generate code with greater volume and architectural decay, following a Volume-Quality Inverse Law that neither functional correctness nor prompting mitigates.
-
Configuring Agentic AI Coding Tools: An Exploratory Study
Developers overwhelmingly rely on simple static context files such as AGENTS.md to configure agentic AI coding tools, while advanced mechanisms like skills and subagents see very low adoption.
-
Challenges in Android Data Disclosure: An Empirical Study
Survey and forum analysis of 683 Android developers finds that they manually classify app data for Google's Data Safety Section or skip it, feel confident identifying collected data but not translating it to the form, and worry about rejection.
-
Generating Verifiable Chain of Thoughts from Execution-Traces
A pipeline produces 54,000 execution-trace-verified bi-directional Chain-of-Thought rationales for code, and fine-tuning on them yields gains up to 26.6 points on LiveCodeBench-Exec and similar benchmarks.
-
Same Scrutiny, More Time: Eye Tracking Insights into Reviewing LLM-Labelled Code
Eye-tracking experiment finds that labeling code as LLM-generated increases fixation time without changing review thoroughness, with reviewers adapting criteria or using the prompt.
-
ReproScore: Separating Readiness from Outcome in Research Software Reproducibility Assessment
ReproScore separates readiness (26 static sub-metrics) from outcome (execution probes) and shows near-zero correlation between them on 423 repositories, validating the separation.
-
How Do Software Engineering Students Use Generative AI in Real-World Capstone Projects? An Empirical Baseline Study
This empirical baseline study characterizes generative AI usage across the software lifecycle in capstone projects, student-recommended responsible practices, and client expectations for understanding and quality.
-
Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows
Large-scale analysis of AI bot PRs shows that Copilot and Codex achieve the highest CI/CD success rates, but more frequent AI contributions correlate with reduced workflow reliability.
-
Fairness-First Design Thinking for Software Architecture
A fairness-first Design Thinking method is proposed and tested in software architecture education to systematically address hidden fairness issues in digital systems.
-
ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering
ToxiShield delivers a real-time GitHub extension with a BERT toxicity detector at 98% accuracy, a Claude-based coach, and a fine-tuned Llama reframer at 95% style transfer accuracy, validated by a 10-person TAM study.
-
OpDiffer: LLM-Assisted Opcode-Level Differential Testing of Ethereum Virtual Machine
OpDiffer applies LLMs and static analysis to opcode-level differential testing of EVMs, reporting 26 previously unknown bugs across nine implementations along with coverage gains and an estimate that 7.21% of real contracts could trigger the bugs.
-
Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development
Specificity and Context predict actionable code generation while Verification predicts adoption and Context predicts integration depth in LLM-assisted PR workflows.
-
Exploring Ethical Concerns of Mobile Applications from App Reviews: A Literature Survey
A systematic literature survey of 37 studies synthesizes methods for identifying ethical concerns from app reviews and proposes a research agenda focused on automated detection.
-
GreenMalloc: Allocator Optimisation for Industrial Workloads
GreenMalloc applies NSGA-II with a rand_malloc proxy to discover allocator configurations that reduce average heap usage by up to 4.1% across workloads when evaluated in gem5, with no runtime penalty and a 0.25% efficiency gain.
-
Employing Continuous Integration inspired workflows for benchmarking of scientific software -- a use case on numerical cut cell quadrature
Continuous integration workflows automate benchmarking of numerical cut-cell quadrature across scientific software packages.