MIST is a new synthetic speech-based tool-calling dataset for IoT devices that exposes performance gaps between open- and closed-weight multimodal LLMs.
Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations , pages=
9 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
A difficulty-aware conversational knowledge tracing framework that combines LLMs with Item Response Theory to produce interpretable student performance predictions in tutor dialogues.
Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.
Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.
ClaHF converts instance labels into preference signals via candidate predictions and a reward model, then applies RL optimization to improve text classification accuracy and calibration.
Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.
citing papers explorer
-
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
MIST is a new synthetic speech-based tool-calling dataset for IoT devices that exposes performance gaps between open- and closed-weight multimodal LLMs.
-
Refusal in Language Models Is Mediated by a Single Direction
Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.
-
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
-
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
-
Interpretable Difficulty-Aware Knowledge Tracing in Tutor-Student Dialogues
A difficulty-aware conversational knowledge tracing framework that combines LLMs with Item Response Theory to produce interpretable student performance predictions in tutor dialogues.
-
Bangla Key2Text: Text Generation from Keywords for a Low Resource Language
Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.
-
Are Large Language Models Economically Viable for Industry Deployment?
Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.
-
ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Improving Classification Tasks
ClaHF converts instance labels into preference signals via candidate predictions and a reward model, then applies RL optimization to improve text classification accuracy and calibration.
-
Neuroscience-Inspired Analyses of Visual Interestingness in Multimodal Transformers
Human visual interestingness is linearly decodable from final-layer embeddings in Qwen3-VL-8B and becomes progressively more structured across vision and language layers without explicit supervision.