AgentReview is the first LLM-based simulation framework for peer review that quantifies a 37.1% decision variation attributable to reviewer biases.
Mm-soc: Benchmarking multimodal large language models in social media platforms
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2024 2representative citing papers
LogicVista is a new benchmark dataset with 448 visual logic questions that evaluates multimodal LLMs on five reasoning tasks covering nine capabilities.
citing papers explorer
-
AgentReview: Exploring Peer Review Dynamics with LLM Agents
AgentReview is the first LLM-based simulation framework for peer review that quantifies a 37.1% decision variation attributable to reviewer biases.
-
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
LogicVista is a new benchmark dataset with 448 visual logic questions that evaluates multimodal LLMs on five reasoning tasks covering nine capabilities.