Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

25 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

roles

background 2

polarities

background 2

representative citing papers

TextGrad: Automatic "Differentiation" via Text

cs.CL · 2024-06-11 · unverdicted · novelty 7.0

TextGrad performs automatic differentiation for compound AI systems by backpropagating natural-language feedback from LLMs to optimize variables ranging from code to molecular structures.

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

cs.CL · 2024-12-25 · unverdicted · novelty 6.0

HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.

Capabilities of Gemini Models in Medicine

cs.AI · 2024-04-29 · unverdicted · novelty 6.0

Med-Gemini sets new records on 10 of 14 medical benchmarks, including 91.1% accuracy on MedQA-USMLE, improves over GPT-4V by an average relative margin of 44.5% on multimodal tasks, and surpasses humans on medical text summarization.

Can an LLM Learn Preferences from Choice Data?

econ.GN · 2024-01-14 · unverdicted · novelty 6.0

LLMs show improving recommendation accuracy with more observed choices under the disappointment aversion model, but learning success is heterogeneous across models and preference parameters.

GPT-4o System Card

cs.CL · 2024-10-25 · unverdicted · novelty 5.0

GPT-4o is OpenAI's end-to-end multimodal model with human-like audio latency, improved non-English text performance, stronger vision and audio understanding, and accompanying safety evaluations.

citing papers explorer

Showing 6 of 6 citing papers after filters.

  • AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments cs.HC · 2024-05-13 · conditional · none · ref 17

    AgentClinic is a multimodal agent benchmark showing that LLM diagnostic accuracy on MedQA can drop to below one-tenth of its original level when the problems are posed as sequential clinical simulations, with Claude-3.5 leading and large differences in tool use across models.

  • TextGrad: Automatic "Differentiation" via Text cs.CL · 2024-06-11 · unverdicted · none · ref 66

    TextGrad performs automatic differentiation for compound AI systems by backpropagating natural-language feedback from LLMs to optimize variables ranging from code to molecular structures.

  • HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs cs.CL · 2024-12-25 · unverdicted · none · ref 47

    HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.

  • Capabilities of Gemini Models in Medicine cs.AI · 2024-04-29 · unverdicted · none · ref 179

    Med-Gemini sets new records on 10 of 14 medical benchmarks, including 91.1% accuracy on MedQA-USMLE, improves over GPT-4V by an average relative margin of 44.5% on multimodal tasks, and surpasses humans on medical text summarization.

  • Can an LLM Learn Preferences from Choice Data? econ.GN · 2024-01-14 · unverdicted · none · ref 27

    LLMs show improving recommendation accuracy with more observed choices under the disappointment aversion model, but learning success is heterogeneous across models and preference parameters.

  • GPT-4o System Card cs.CL · 2024-10-25 · unverdicted · none · ref 40

    GPT-4o is OpenAI's end-to-end multimodal model with human-like audio latency, improved non-English text performance, stronger vision and audio understanding, and accompanying safety evaluations.