AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

2023 · arXiv:2304.12995

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models

cs.SD · 2026-01-06 · unverdicted · novelty 7.0

The TWNM framework equips audio-language models with spatial scene analysis via FOA simulation and metadata-grounded training, reaching 70.8% accuracy on a new ASA benchmark.

WizardLM: Empowering large pre-trained language models to follow complex instructions

cs.CL · 2023-04-24 · conditional · novelty 7.0

WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

cs.CV · 2025-05-29 · unverdicted · novelty 6.0 · 2 refs

Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

eess.AS · 2023-11-14 · unverdicted · novelty 6.0

Qwen-Audio trains a unified model on diverse audio and tasks with hierarchical tags to enable strong zero-shot performance on audio understanding benchmarks and multi-turn audio chat.

SALMONN: Towards Generic Hearing Abilities for Large Language Models

cs.SD · 2023-10-20 · unverdicted · novelty 6.0

SALMONN integrates speech and audio encoders with a text-based LLM to process general audio inputs, achieve competitive results on trained tasks, and exhibit emergent cross-modal abilities.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

cs.CL · 2026-05-16 · unverdicted · novelty 2.0

A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

citing papers explorer

Showing 7 of 7 citing papers.

The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models · cs.SD · 2026-01-06 · unverdicted · none · ref 10
WizardLM: Empowering large pre-trained language models to follow complex instructions · cs.CL · 2023-04-24 · conditional · none · ref 20
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence · cs.CV · 2025-05-29 · unverdicted · none · ref 15 · 2 links
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models · eess.AS · 2023-11-14 · unverdicted · none · ref 18
SALMONN: Towards Generic Hearing Abilities for Large Language Models · cs.SD · 2023-10-20 · unverdicted · none · ref 25
The Rise and Potential of Large Language Model Based Agents: A Survey · cs.AI · 2023-09-14 · accept · none · ref 294
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages · cs.CL · 2026-05-16 · unverdicted · none · ref 229
