Title resolution pending

· 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

method 2

citation-polarity summary

use method 2

representative citing papers

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

cs.CL · 2024-05-07 · unverdicted · novelty 7.0

DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

cs.CL · 2024-02-05 · unverdicted · novelty 6.0

DeepSeekMath 7B reaches 51.7% on MATH via continued pretraining on curated web math data and Group Relative Policy Optimization.

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

cs.CL · 2024-01-05 · unverdicted · novelty 4.0

DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.

citing papers explorer

Showing 3 of 3 citing papers.

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model cs.CL · 2024-05-07 · unverdicted · none · ref 29
DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models cs.CL · 2024-02-05 · unverdicted · none · ref 23
DeepSeekMath 7B reaches 51.7% on MATH via continued pretraining on curated web math data and Group Relative Policy Optimization.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism cs.CL · 2024-01-05 · unverdicted · none · ref 35
DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer