IEEE Symposium on Security and Privacy
Canonical reference. 94% of citing Pith papers cite this work as background.
Citation roles: background (15)
citing papers explorer
-
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reasoning models.
-
Cerisier: A Program Logic for Attestation in a Capability Machine
Cerisier is the first mechanized program logic for modular reasoning about trusted, untrusted, and attested code in capability machines, with a universal contract for untrusted code and demonstrations on secure computation and mutual attestation.
-
PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs
PoisonForge benchmark shows that 1% poisoned examples achieve over 70% attack success rate on targeted tasks across 11 of 12 tested LLMs with under 0.5% leakage to non-target tasks.
-
Single-Sample Black-Box Membership Inference Attack against Vision-Language Models via Cross-modal Semantic Alignment
A cross-modal alignment attack achieves AUC 0.821 for single-sample black-box membership inference on VLMs such as LLaVA-1.5 by quantifying image-generated caption similarity.
-
Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
PII can be reconstructed from SFT models via prefix attacks, with the new COVA algorithm improving success rates and leakage varying by attacker knowledge and PII type.
-
Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data
Asymmetric Langevin Unlearning uses public data to suppress unlearning noise costs by O(1/n_pub²), enabling practical mass unlearning with preserved utility under distribution mismatch.
-
Zombies in Alternate Realities: The Afterlife of Domain Names in DNS Integrations
Zombie domain linkages persist after ownership changes in DNS integrations at rates of 3% in Web PKI, 24% in ENS, and 15% in Maven Central, with validate-once designs accumulating long-term risks while per-use validation prevents them.
-
Styx: Collaborative and Private Data Processing With TEE-Enforced Sticky Policy
Styx integrates sticky policies with TEEs to enforce data-specific rules throughout the full lifecycle in multi-party collaborative computing.
-
Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis
MemHint combines LLM classification of custom memory functions with Z3 path validation to augment CodeQL and Infer, detecting 52 memory leaks (49 confirmed) across 3.4M LOC versus 19 and 3 by vanilla tools.
-
Grassroots Bonds as a Foundation for Market Liquidity
Grassroots bonds add maturity dates to local cryptocurrencies to enable lending and other instruments via enforceable digital social contracts.
-
SynBench: A Benchmark for Differentially Private Text Generation
SynBench benchmarks DP text generators across nine datasets and uses a new MIA to show that public pre-training on portions of private data overestimates synthetic text quality and breaks DP privacy bounds.
-
Fast Byzantine Total Order Broadcast
Flutter achieves 2Δ + ε good-case latency for Byzantine Total Order Broadcast via a new binary consensus called Blink, under partial synchrony with 5f+1 servers.
-
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
-
GRASP -- Graph-Based Anomaly Detection Through Self-Supervised Classification
GRASP detects anomalies in system provenance graphs via self-supervised executable prediction from two-hop neighborhoods, outperforming prior PIDS on DARPA datasets by identifying all documented attacks where behaviors are learnable plus additional unlabeled suspicious activity.
-
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
EASE closes three residual anchors in federated multimodal unlearning using bilateral displacement, cosine-sine decomposition, and forget lock, achieving near-retrain performance on forget and retain data.
-
CuLifter: Lifting GPU Binaries to Typed IR
CuLifter recovers types from untyped GPU register files via constraint propagation to lift 99.98% of 24,437 functions across 919 cubins to valid LLVM IR.
-
When AI reviews science: Can we trust the referee?
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
-
VRSafe: A Secure Virtual Keyboard to Mitigate Keystroke Inference in Virtual Reality
VRSafe adds false positive keystrokes to VR typing data to reduce keystroke inference attack accuracy and includes an efficient malicious login detector.
-
BONSAI: A Mixed-Initiative Workspace for Human-AI Co-Development of Visual Analytics Applications
BONSAI introduces a four-layer architecture and four-phase workflow for human-AI co-development of visual analytics applications, shown in case studies to enable efficient novel tool creation and reconstruction from paper descriptions.
-
KindHML: formal verification of smart contracts based on Hennessy-Milner logic
An encoding of Solidity contracts and first-order Hennessy-Milner logic into Lustre enables Kind 2 model checking of complex temporal properties in smart contracts.
-
GPIR: Enabling Practical Private Information Retrieval with GPUs
GPIR achieves up to 297 times higher throughput than prior GPU PIR systems by fusing operations in stages and using pipelined transposed layouts to cut DRAM traffic during batched lattice-based queries.
-
Tracking Capabilities for Safer Agents
AI agents can generate code in a capability-safe Scala dialect that statically prevents information leakage and malicious side effects while preserving task performance.
-
Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations
TESLA recovers 2D handwriting trajectories from touchscreen EM emanations on COTS smartphones, achieving 77% character recognition accuracy and 0.74 Jaccard index under realistic conditions.
-
Automated Side-Channel Analysis of Cryptographic Protocol Implementations
The authors built an automated toolchain that extracts symbolic models from real binaries of cryptographic protocols and analyzes them for constant-time and speculative side-channel leaks, demonstrated on WhatsApp and e-passport implementations.
-
Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications
Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.
-
AI Slop and the Software Commons
AI slop in software externalizes review and integrity costs onto the commons, requiring institutional responses drawn from Ostrom's design principles.
-
Understanding Student Experiences with TLS Client Authentication
A longitudinal study of 46 CS students finds that configuring and using mTLS client certificates is difficult even for technical users, with only 9% understanding the security implications.
-
Evaluating Differential Privacy Against Membership Inference in Federated Learning: Insights from the NIST Genomics Red Team Challenge
Stacking seven black-box estimators into a meta-classifier reveals persistent membership leakage in differentially private federated learning models at epsilon=200 on NIST genomics data, outperforming single-signal baselines.
-
Evasion Under Blockchain Sanctions
Empirical analysis of 1.07 billion Ethereum transactions shows sanctions cut Tornado Cash deposits by 71% yet the mixer remained central to most security incidents, exposing three structural enforcement weaknesses.
-
Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection
Longitudinal evaluation over yearly Android app slices shows temporal drift reduces adversarial robustness of malware detectors, with expanding-window retraining providing partial mitigation but not full recovery.
-
Security of LLM-generated Code: A Comparative Analysis
Empirical evaluation shows that code generated by all seven tested LLMs contains vulnerabilities, the majority of critical or high severity.
-
Quality and Security Signals in AI-Generated Python Refactoring Pull Requests
Empirical analysis of AI refactoring PRs shows quality attribute improvements in 22.5% of cases with new Pylint issues in 24.17% and Bandit findings in 4.7%, yet 73.5% developer acceptance.
-
Can I Check What I Designed? Mapping Security Design DSLs to Code Analyzers
An empirical study of security DSLs and code analyzers finds few common concepts, overly general weakness descriptions, and that even experts are overwhelmed by the complexity of potential mappings.
-
"It didn't feel right but I needed a job so desperately": Understanding People's Emotions & Help Needs During Financial Scams
A qualitative study maps emotions exploited by financial scammers and help-seeking needs at different scam stages, identifying risk factors and suggesting design implications for interventions.
-
Towards the Anonymization of the Language Modeling
The authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
-
AI-Driven Security Alert Screening and Alert Fatigue Mitigation in Security Operations Centers: A Comprehensive Survey
A literature survey synthesizes 119 studies on AI-driven alert screening into a four-stage taxonomy of filtering, triage, correlation, and generative augmentation while identifying gaps in deployment realism and robustness.
-
What is (H)CI: Why Does the "Human" Matter?
A workshop proposal to reflect on HCI's core identity and the importance of human elements in the era of generative AI.
-
Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
The TraceTarnish attack identifies stylometric features, such as function-word frequencies and type-token ratio, that both strengthen authorship anonymization and serve as indicators of compromise when pre- and post-transformation texts can be compared.