Apple Intelligence Foundation Language Models

Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, et al. · 2024 · arXiv 2407.21075

8 Pith papers cite this work.

representative citing papers

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

cs.LG · 2024-10-07 · accept · novelty 7.0

LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.

OpenJarvis: Personal AI, On Personal Devices

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

OpenJarvis decomposes personal AI into Intelligence, Engine, Agents, Tools & Memory, and Learning primitives and applies LLM-guided spec search to produce on-device configurations that come within 3.2 percentage points of cloud baselines on average across eight tasks.

LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM

cs.AR · 2026-04-06 · conditional · novelty 6.0

LOCALUT delivers 1.82x geometric mean speedup for quantized DNN inference on real UPMEM DRAM-PIM devices by using operation-packed LUTs with canonicalization, reordering, and slice streaming.

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

cs.AI · 2025-03-23 · conditional · novelty 5.0

LLMs show accuracy drops of 0.3% to 5.9% on GSM8K math problems when culturally adapted to six countries while keeping math operations identical, with statistical significance confirmed by McNemar tests.

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

cs.CL · 2025-02-04 · unverdicted · novelty 5.0

SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.

Fortress: A Case Study in Stabilizing Search Recommendations via Temporal Data Augmentation and Feature Pruning

cs.IR · 2026-05-14 · unverdicted · novelty 4.0

Fortress stabilizes query-to-app relevance models by pruning features that cause inconsistent predictions across time periods while retaining predictive power from engagement signals.

Reinforcement Learning from Human Feedback

cs.LG · 2025-04-16 · unverdicted · novelty 2.0

The book introduces the origins, mathematical setup, and optimization stages of RLHF including reward modeling, reinforcement learning, rejection sampling, and direct alignment algorithms.

citing papers explorer

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models · cs.LG · 2024-10-07 · accept · ref 71
OpenJarvis: Personal AI, On Personal Devices · cs.LG · 2026-05-16 · unverdicted · ref 6
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM · cs.AR · 2026-04-06 · conditional · ref 24
TIDE: Every Layer Knows the Token Beneath the Context · cs.CL · 2026-05-07 · unverdicted · ref 77
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? · cs.AI · 2025-03-23 · conditional · ref 3
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model · cs.CL · 2025-02-04 · unverdicted · ref 176
Fortress: A Case Study in Stabilizing Search Recommendations via Temporal Data Augmentation and Feature Pruning · cs.IR · 2026-05-14 · unverdicted · ref 3
Reinforcement Learning from Human Feedback · cs.LG · 2025-04-16 · unverdicted · ref 144
