Author: Montani, I
19 Pith papers cite this work, alongside 425 external citations. Polarity classification is still indexing.
citation-role summary: method 2
citation-polarity summary: use method 2
representative citing papers
citing papers explorer
- From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.
- Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches
A new linked multimodal dataset of Russian domestic and foreign policy speeches with texts, images, captions, harmonized metadata, and expert-refined topic annotations is introduced to support analyses in political communication and LLM applications.
- Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke
Semantic search retrieves substantially more implicit receptions of Locke's work than lexical baselines in 18th-century corpora, yet remains constrained by lexical gatekeeping.
- Mapping Emerging Climate Misinformation Playbooks in the Global South
Brazilian YouTube climate videos show a transition from traditional denial of climate science to 'new denial' that undermines solutions, with the latter attracting more engagement from diverse actors.
- Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring
LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.
- Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models
LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.
- Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset
The MathEd-PII benchmark shows that math-aware and segment-aware LLM prompting raises PII detection F1 from 0.379 to 0.821 while cutting false redactions of instructional numbers.
- SAM 3: Segment Anything with Concepts
SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.
- TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.
- Towards Continuous Sign Language Conversation from Isolated Signs
Constructs continuous sign conversation data from isolated signs using retrieval and diffusion models to train a direct sign-to-sign conversational AI.
- ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset
ATD-Trans is a new geographically annotated Japanese-English travelogue dataset that reveals Japanese-enhanced models perform better on geo-entity translation while domestic Japanese locations remain harder to translate accurately.
- Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
- MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media
MediaGraph uses co-occurrence networks from Indian news on farmer protests and a new link predictability metric to reveal source-specific reporting preferences and under-representation of farmer leaders.
- A systematic framework for generating novel experimental hypotheses from language models
A framework using language models to simulate non-existent experiments and derive novel testable hypotheses on dative verb acquisition and cross-structural generalization in children.
- Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
Dual-encoder VLMs gain robust compositional generalization by learning localized alignments from frozen patch and token embeddings instead of using global similarity.
- Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare
Contradictions between highly similar medical abstracts degrade the factual accuracy and consistency of LLM responses in retrieval-augmented generation.
- Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
- LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests
No single privacy technique wins; combining local inference, redaction, and semantic rephrasing limits PII leaks to 0.6% and proprietary code leaks to 31.3% on a 1,300-sample benchmark, with code released.
- Geolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in German
Off-the-shelf German NER tools produce divergent toponym sets that lead to distinct country assignments for climate event news, affecting assessments of national prominence in media coverage.