Year: 1927
16 Pith papers cite this work.
Fields: cs.CL 4, cs.AI 3, cs.LG 2, cs.CV 1, cs.CY 1, cs.SE 1, gr-qc 1, hep-ex 1, physics.comp-ph 1, quant-ph 1
Roles: background 2
Polarities: background 2
Representative citing papers
LLMs trained on simple forms of specification gaming generalize, zero-shot, to reward tampering, including rewriting their own reward function.
Under laissez-faire, open-ended prompting, generative LMs produce subordinated portrayals of minoritized race, gender, and sexual-orientation identities at rates hundreds to thousands of times higher than empowering portrayals.
A framework and software implementation for data-driven estimation of trigger efficiencies at LHCb using the properties of reconstructed candidates.
Citing papers explorer
-
Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature
The MatMMExtract pipeline produces MatSciFig, a dataset of 391k annotated materials science figure panels, and MaterialScope, a detection dataset, with high accuracy.
-
Concept Inconsistency in Dermoscopic Concept Bottleneck Models: A Rough-Set Analysis of the Derm7pt Dataset
A rough-set analysis finds 16.4% of the 305 concept profiles in Derm7pt inconsistent (306 images), which caps hard concept bottleneck model (CBM) accuracy at 92.1%; symmetric filtering yields a consistent 705-image benchmark on which EfficientNet-B5 reaches 0.90 label accuracy.
-
A Technical Typology of AI Systems in Public Administration
The paper defines five categories of AI systems for public administration and reports that 55% of 91 recent papers leave the system type underspecified, while 31% study one type but motivate their work with another.
-
PDE-Agents: An LLM-Orchestrated Multi-Agent Framework for Automated Finite Element Simulations with Knowledge Graph-Augmented Reasoning
PDE-Agents, a LangGraph-orchestrated multi-agent LLM framework with GraphRAG, reaches 100% task success and perfect material fidelity on novel materials in ablation tests, and 97.8% success across 1,369 production runs.
-
From Outliers to Errors: Auditing Pali-to-English LLM Translations with Multi-Reference Adjudication
A multi-reference audit framework for LLM translations of the Pali Canon uses embedding drift from a human-reference centroid to triage candidates for LLM-judge adjudication, showing that drift correlates with major-error rates and revealing model-specific differences in the high-drift tail.
-
Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes
A case study of 18,020 Kubernetes PRs shows that label-diff congruence is prevalent and stable, and that higher congruence is linked to fewer review participants among core developers but more among one-time contributors.
-
Toten: A Knowledge-Based System For Structure-Preserving Representation Of Physical Quantities And Technical Notation In Brazilian Portuguese
TOTEN, a knowledge-based system for structure-preserving representation of physical quantities and technical notation in Brazilian Portuguese, uses an ontology of engineering entities and external authorities and outperforms statistical baselines in atomicity and reconstruction.
-
Characterization of nested Walsh parity-check filters in a single-photon eight-mode register on a cloud photonic processor
Experimental tests of parity-check filters in a single-photon eight-mode photonic register on Quandela's Belenos processor show a mean DC leakage of 0.6%, a 21x suppression, and 94-99% syndrome-channel selectivity.
-
Do More Agents Help? Controlled and Protocol-Aligned Evaluation of LLM Agent Workflows
Under controlled, identical protocols, only one of six multi-agent LLM systems marginally exceeds a single-agent baseline on benchmark-balanced accuracy, while the rest trail it and cost more; a runtime workflow reaches 66.72% on GAIA.
-
A Registry-Bound LLM Pipeline for Evidence-Grounded Trait Extraction across Tropical Plants, Aquatic Species, and Exotic Pets
A four-mechanism framework produces auditable, evidence-grounded trait records from LLMs applied to over 400,000 tropical plant, aquatic, and exotic pet species.
-
PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents
An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
-
SCENIC: Semantic-Conditioned Edge-Aware Neural Framework for Structured IoT Command Generation
The SCENIC framework reports up to 99% exact match on structured IoT command generation with sub-0.2B-parameter models; pruned INT8 versions retain 91% EM@1 after a 25% size reduction.
-
Mapping the star formation peak with LIGO A# and Next-Generation detectors
Simulations show that LIGO A# constrains the peak redshift of the binary black hole merger rate (which traces star formation) to ±0.1 in one year, improving to ±0.02 with next-generation detectors.