MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems
6 Pith papers cite this work. Polarity classification is still indexing.
abstract
Memory-backed agents need provenance that can survive leaked or migrated snapshots, where logs, visible outputs, and trusted metadata may be absent. We propose MemMark, a state-evolution attribution watermark that embeds an owner-controlled signal into latent memory-write decisions. At each internal LLM call, MemMark samples among admissible candidates using keyed, distribution-preserving selection, and records cryptographic commitments with signed session anchors and reveal evidence. This makes attribution depend on reproducible backend behavior rather than mutable provenance fields. Across A-Mem and Graphiti on LoCoMo, with three LLM backbones, MemMark preserves memory utility: Overall F1 retains 99.6% of the unwatermarked baseline, while BLEU-1 changes by +0.2%. It also provides usable carrier capacity, with 1.16, 1.14, and 1.26 bits of mean entropy for update-target, link-target, and semantic-realization decisions. In the snapshot-only R3 setting, MemMark recovers the full 40-bit payload from final snapshots, while wrong-key verification remains near chance. Under nine memory-lifecycle attacks, verification distinguishes tampering, evidence deletion, and partial payload recovery. These results show that robust snapshot-only attribution is feasible for long-term agent memory without surviving traces, trusted metadata, or utility-degrading.
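The keyed, distribution-preserving selection and the commitments mentioned in the abstract can be illustrated with a minimal sketch. This is not MemMark's implementation: the function names (`keyed_uniform`, `keyed_select`, `commit`), the use of HMAC-SHA256 as the keyed pseudorandom source, and the hash-based commitment are all assumptions chosen for illustration. The idea is that a keyed pseudorandom number drives ordinary inverse-CDF sampling, so without the key the choices follow the original candidate distribution, while the key holder can recompute each choice.

```python
import hashlib
import hmac


def keyed_uniform(key: bytes, context: bytes) -> float:
    """Map (key, context) to a pseudorandom float in [0, 1).

    Assumption: HMAC-SHA256 stands in for whatever keyed PRF a real system uses.
    """
    digest = hmac.new(key, context, hashlib.sha256).digest()
    # Keep the top 53 bits so the result is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def keyed_select(key: bytes, context: bytes, candidates, probs):
    """Inverse-CDF sampling over `candidates`, driven by the keyed uniform.

    `probs` is the (hypothetical) model distribution over admissible candidates.
    """
    u = keyed_uniform(key, context)
    cumulative = 0.0
    for candidate, p in zip(candidates, probs):
        cumulative += p
        if u < cumulative:
            return candidate
    return candidates[-1]  # guard against floating-point rounding in sum(probs)


def commit(choice: str, nonce: bytes) -> str:
    """Illustrative hash commitment to a selected candidate."""
    return hashlib.sha256(nonce + choice.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    key = b"owner-secret-key"  # hypothetical owner key
    candidates = ["update:note-12", "update:note-40", "create:new-note"]
    probs = [0.5, 0.3, 0.2]
    choice = keyed_select(key, b"session-1|call-7", candidates, probs)
    print(choice, commit(choice, b"per-call-nonce"))
```

Because the selection is a deterministic function of the key and the call context, a verifier holding the key can replay the decision and compare it with what a snapshot contains, while a wrong key yields choices that agree only by chance.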
years
2026: 6
verdicts
UNVERDICTED: 6
representative citing papers
DAIN reframes multimodal fusion as dynamic agent collaboration with sparse activation, claiming SOTA results including 2.6% accuracy gain on ADNI across five benchmarks.
ProHiFlo introduces hierarchical coarse-to-fine flow matching with functional guidance from pretrained predictors and an adaptive SE(3)-equivariant architecture, reporting higher success rates and fewer sampling steps than prior methods on protein generation tasks.
ProWAFT proposes a workload-aware dynamic fault-tolerance method for FPGA CNN accelerators via selective TMR and partial reconfiguration, reporting lower composite cost than static TMR or reactive approaches on ResNet/MobileNet traces under SEU injection.
PASE is a neuro-symbolic self-healing system that synthesizes LLM recovery plans, verifies them in simulation, and uses DRL to optimize prompts, claiming over 40% faster recovery on cloud fault data.
EVLA combines a Unified Co-State Encoder and Electro-aware Structured Reasoning Chain with physics-guided training to produce energy-optimal driving decisions, reporting +5.6% accuracy gains over fine-tuned VLM baselines on a driving QA benchmark.
citing papers explorer
- Beyond Skepticism: Evaluating LLMs Pedagogical Intent Reasoning with the Adaptive Pedagogical Vigilance Framework
  Introduces APV framework and Bayesian PIIE to evaluate and enhance LLMs' reasoning about pedagogical intent, reporting strong discrimination and r=0.958 human correlation on instructional tasks.
- DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning
  DAIN reframes multimodal fusion as dynamic agent collaboration with sparse activation, claiming SOTA results including 2.6% accuracy gain on ADNI across five benchmarks.
- ProWAFT: A ROMA-LPD Instance for Workload-Aware and Dynamic Fault Tolerance in FPGA-Based CNN Accelerators
  ProWAFT proposes a workload-aware dynamic fault-tolerance method for FPGA CNN accelerators via selective TMR and partial reconfiguration, reporting lower composite cost than static TMR or reactive approaches on ResNet/MobileNet traces under SEU injection.
- EVLA: An Electro-Aware Multimodal Assistant for Physically-Grounded Driving Reasoning and Control
  EVLA combines a Unified Co-State Encoder and Electro-aware Structured Reasoning Chain with physics-guided training to produce energy-optimal driving decisions, reporting +5.6% accuracy gains over fine-tuned VLM baselines on a driving QA benchmark.