Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.
hub
author Montani, I
28 Pith papers cite this work, alongside 425 external citations. Polarity classification is still indexing.
roles: method 2
polarities: use method 2
representative citing papers
ATD-Trans is a new geographically annotated Japanese-English travelogue dataset that reveals Japanese-enhanced models perform better on geo-entity translation while domestic Japanese locations remain harder to translate accurately.
citing papers explorer
- From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
  Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.
- Linked Multi-Model Data on Russian Domestic and Foreign Policy Speeches
  A new linked multimodal dataset of Russian domestic and foreign policy speeches with texts, images, captions, harmonized metadata, and expert-refined topic annotations is introduced to support analyses in political communication and LLM applications.
- Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke
  Semantic search retrieves substantially more implicit receptions of Locke's work than lexical baselines in 18th-century corpora, yet remains constrained by lexical gatekeeping.
- Mapping Emerging Climate Misinformation Playbooks in the Global South
  Brazilian YouTube climate videos show a transition from traditional denial of climate science to 'new denial' that undermines solutions, with the latter attracting more engagement from diverse actors.
- Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring
  LLMs exhibit positional bias and context-dependent scoring patterns when judging document similarity, with each model showing a stable scoring fingerprint but a shared hierarchy of sensitivity to different semantic perturbations.
- Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models
  LLMs display a consistent pattern of elevated form-meaning divergence and uniform rhetorical device use in argumentative texts compared to humans, quantified by new metrics FMD, GPR, and RDDE.
- SAM 3: Segment Anything with Concepts
  SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.
- TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
  TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.
- Phonemes to the Rescue: Multilingual Tokenization Based on International Phonetic Alphabet
  IPA-based subword tokenizers trained across 24 languages improve tokenization quality and generalization to unseen languages compared to standard text tokenizers, especially for non-Latin scripts.
- On The Effectiveness-Fluency Trade-Off In LLM Conditioning: A Systematic Study
  Systematic experiments reveal that activation steering trades fluency for concept control, is less effective on instruction-tuned models, and that prompting/SFT excel at injection but not removal, with textual metrics correlating to LLM judges.
- Arabic Sentence Segmentation Across Genres and Punctuation Conditions
  AraSEG is a genre-diverse Arabic sentence segmentation corpus showing lightweight encoders and dependency parsers outperform LLMs under challenging punctuation while improving downstream parsing.
- Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning
  Introduces a triangulation-based metric to quantify lexical shifts attributable to preference tuning without requiring manual curation of examples.
- Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
  Structural probes on UD-invariant wh-movement stimuli reveal phase-count gradients and phase-internal cohesion effects in 12-13 of 13 LLMs, indicating syntactic abstractions beyond UD annotations.
- Towards Continuous Sign Language Conversation from Isolated Signs
  Constructs continuous sign conversation data from isolated signs using retrieval and diffusion models to train a direct sign-to-sign conversational AI.
- Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe
  An encoding probe reconstructs transformer representations from acoustic, phonetic, syntactic, lexical and speaker features, showing independent syntactic/lexical contributions and training-dependent speaker effects.
- MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media
  MediaGraph uses co-occurrence networks from Indian news on farmer protests and a new link predictability metric to reveal source-specific reporting preferences and under-representation of farmer leaders.
- A systematic framework for generating novel experimental hypotheses from language models
  A framework using language models to simulate non-existent experiments and derive novel testable hypotheses on dative verb acquisition and cross-structural generalization in children.
- "Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo
  LLMs exhibit different trade-offs between rule compliance and communicative success across prompting, generation constraints, and representation interventions, but remain substantially weaker than humans at guessing under lexical constraints.
- Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
  Dual-encoder VLMs gain robust compositional generalization by learning localized alignments from frozen patch and token embeddings instead of using global similarity.
- Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare
  Contradictions between highly similar medical abstracts degrade the factual accuracy and consistency of LLM responses in retrieval-augmented generation.
- Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering
  Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
- LLM-Redactor: An Empirical Evaluation of Eight Techniques for Privacy-Preserving LLM Requests
  No single privacy technique wins; combining local inference, redaction, and semantic rephrasing limits PII leaks to 0.6% and proprietary code leaks to 31.3% on a 1,300-sample benchmark, with code released.
- Best Preprocessing Techniques for Sentiment Analysis
  Empirical comparison finds tokenization most important and recommends specific preprocessing order for Twitter sentiment analysis models.
- Archi: Agentic Operations at the CMS Experiment
  Archi deploys configurable agents on ingested documentation, historical data, and live monitoring to support CMS computing operators at CERN, with positive results on real queries and competitive performance from local open-weight models.
- Geolocating News about Extreme Climate Events: A Comparative Analysis of Off-the-Shelf Tools for Toponym Identification in German
  Off-the-shelf German NER tools produce divergent toponym sets that lead to distinct country assignments for climate event news, affecting assessments of national prominence in media coverage.
- UOL@IDEM at BEA 2026 Shared Task 1: Neural Fusion and Feature-Rich Modeling for L1-Aware Vocabulary Difficulty Prediction
  A feature-rich regression model using multilingual embeddings and features for frequency, cognate similarity, and predictability reports RMSE scores of 1.132, 1.037, and 0.891 for L1-aware vocabulary difficulty prediction on Spanish, German, and Chinese.