Towards Spec Learning: Inference-Time Alignment from Preference Pairs

Dhriti Krishnan; Jaromir Savelka; Tejas Goyal

arxiv: 2606.24004 · v2 · pith:63IV6FNHnew · submitted 2026-06-22 · 💻 cs.CL · cs.AI

Towards Spec Learning: Inference-Time Alignment from Preference Pairs

Dhriti Krishnan , Tejas Goyal , Jaromir Savelka This is my paper

Pith reviewed 2026-06-26 07:47 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords spec learninginference-time alignmentpreference pairsnatural language specificationsLLM alignmentno parameter updates

0 comments

The pith

Preference judgments compile into natural-language specifications that align LLMs at inference time and often beat DPO.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes spec learning to convert a brief user instruction and a small set of preference judgments into readable natural language prompts. These prompts steer an LLM at inference time without any changes to the model weights. The method targets the high cost of fine-tuning approaches like DPO and the brittleness of manual prompt crafting. On specialized domains where the preference signal is dense, the generated specifications produce responses that often outperform DPO while remaining human-readable embodiments of the original preferences.

Core claim

Spec learning compiles a brief user instruction and small set of preference judgments into natural-language specifications that condition LLMs at inference time. These specifications yield responses that frequently outperform direct preference optimization on datasets from specialized domains whose preference signal is dense, all without requiring parameter updates to the underlying models.

What carries the argument

The spec learning framework that turns preference pairs into natural-language specifications used as inference-time conditioning prompts.

If this is right

Alignment of LLM outputs becomes possible without any parameter updates or model-specific tuning.
The resulting specifications double as transparent, human-readable records of the preference signal.
Performance advantages appear specifically in domains where the preference signal is dense.
Steering relies on a brief instruction plus few judgments rather than large-scale fine-tuning data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Users could edit the natural language specifications directly to adjust behavior without new judgments.
The approach may allow preference signals to transfer across different base models more readily than fine-tuned weights.
Domain experts without ML training could generate and maintain alignment rules in plain text.

Load-bearing premise

A small set of preference judgments can be reliably compiled into natural-language specifications that generalize and remain effective at inference time across the tested domains without requiring model-specific tuning or additional validation.

What would settle it

A specialized domain with dense preference signals where responses from the compiled specifications show no improvement over or underperform those from DPO.

Figures

Figures reproduced from arXiv: 2606.24004 by Dhriti Krishnan, Jaromir Savelka, Tejas Goyal.

**Figure 2.** Figure 2: Spec compilation pipeline. The framework brackets heavy generative stages ( [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Examples of the JANUS and BULLETS synthesizers drawn from Code-Pref D.1 Code-Pref JANUS You are an expert Python developer specializing in the creation of high-performance, production-ready code for challenges ranging from algorithmic puzzles to systems-level management and OCR correction. You deliver syntactically correct, runnable, and fully functional implementations that completely solve the requested… view at source ↗

**Figure 4.** Figure 4: Full synthesized prompts for JANUS and BULLETS on Code-Pref (N=20). 18 [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗

**Figure 5.** Figure 5: Full synthesized prompts for JANUS and BULLETS on HH-RLHF (hh-helpful, N=20). E Hardware Compute All experiments were conducted on Modal, a serverless cloud compute platform, under Modal’s Academic Research Grant Program; compute credits awarded on the basis of our submitted research abstract. Model hosting, inference, and training were distributed across pipeline stages according to their respective memor… view at source ↗

read the original abstract

Steering a large language model (LLM) toward a desired behavior typically relies on an iterative process of hand-crafting a prompt based on a careful inspection of the model's responses. This is an involved, brittle, and error-prone process. Preference-based fine-tuning is a more rigorous but often prohibitively expensive solution. We propose spec learning, a framework that relies on a brief user instruction and a small set of preference judgments. These are compiled into specifications in the form of natural-language prompts for an LLM. Specifications condition LLMs at inference time, and no parameter updates to the underlying models are required. We show that the responses generated based on the compiled specifications often outperform direct preference optimization (DPO) on datasets from specialized domains whose preference signal is dense. Unlike opaque weight updates, the resulting specifications are human-readable and double as interpretable and transparent written embodiments of the preference signal that produced them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Spec learning turns small preference sets into readable inference-time prompts that claim to beat DPO in dense domains, but the abstract gives no numbers to evaluate the claim.

read the letter

The main point to know is that the authors describe a method called spec learning that takes a short user instruction and a small number of preference judgments, compiles them into natural-language specifications, and uses those as prompts to condition the LLM at inference time. They claim this approach often outperforms direct preference optimization on specialized domains with dense preference signals, all without any model updates.

What stands out as new is the explicit compilation step from preferences to readable specs. It extends prompt engineering in a structured way and emphasizes the transparency benefit, since the specs can be inspected by humans. The paper does well in framing the practical problems with fine-tuning—cost and lack of interpretability—and offering a lighter alternative that keeps the preference signal visible.

The experiments target domains where the preference signal is dense, which aligns with the method's strengths. If the results include proper baselines and show reliable gains, this could be useful for quick deployment in niche areas.

The soft spots are mostly around the lack of detail in the abstract. There are no reported numbers, error bars, or specifics on how the compilation works or what the datasets look like. This makes it hard to judge the size of the improvement or rule out issues like domain selection. The central assumption—that the compiled specs generalize effectively from a small set without additional validation—needs solid evidence from the full experiments to hold up. If the paper has those controls, the concern is minor; otherwise it could be more significant.

This work is for researchers and practitioners focused on inference-time alignment techniques for LLMs in specialized settings. A reader looking for low-cost ways to incorporate preferences would find the idea worth considering.

It deserves peer review because the core proposal is clear and addresses a real need, even if the empirical support requires closer inspection.

Referee Report

2 major / 0 minor

Summary. The paper proposes 'spec learning,' a framework that takes a brief user instruction plus a small set of preference judgments and compiles them into natural-language specifications (prompts). These specifications are applied at inference time to condition an LLM without any parameter updates or fine-tuning. The central claim is that responses generated from the compiled specifications often outperform direct preference optimization (DPO) on datasets from specialized domains with dense preference signals; the specifications are also presented as human-readable and interpretable embodiments of the preference data.

Significance. If the empirical outperformance claim holds with appropriate controls, the approach would be significant as a training-free, inference-only alternative to preference tuning methods. It could lower the barrier to alignment for specialized domains, improve transparency by producing readable artifacts, and avoid the cost of weight updates. The absence of any quantitative results, baselines, dataset descriptions, or error bars in the manuscript as presented, however, prevents assessment of whether these benefits are realized.

major comments (2)

[Abstract] Abstract: the manuscript asserts that 'responses generated based on the compiled specifications often outperform direct preference optimization (DPO)' on specialized domains, yet supplies no numerical results, baseline comparisons, dataset sizes, domain definitions, or statistical tests. This empirical claim is load-bearing for the paper's contribution and cannot be evaluated without the supporting evidence.
[Abstract] Abstract: the compilation procedure that turns a brief instruction and preference judgments into natural-language specifications is described only at a high level; without details on the compilation algorithm, prompt templates, or how generalization is ensured, it is impossible to assess whether the method avoids the 'brittle and error-prone' issues the authors attribute to hand-crafting prompts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address each major comment below and will revise the manuscript to provide the requested details on empirical results and the compilation procedure.

read point-by-point responses

Referee: [Abstract] Abstract: the manuscript asserts that 'responses generated based on the compiled specifications often outperform direct preference optimization (DPO)' on specialized domains, yet supplies no numerical results, baseline comparisons, dataset sizes, domain definitions, or statistical tests. This empirical claim is load-bearing for the paper's contribution and cannot be evaluated without the supporting evidence.

Authors: We agree that the abstract makes a strong empirical claim and that the version of the manuscript under review does not include numerical results, baselines, dataset sizes, domain definitions, or statistical tests. The full paper contains experiments on specialized domains with dense preference signals that compare spec learning to DPO. To address this, we will revise the abstract to summarize key quantitative findings and add a results section with dataset descriptions, baseline comparisons, metrics, and error bars. revision: yes
Referee: [Abstract] Abstract: the compilation procedure that turns a brief instruction and preference judgments into natural-language specifications is described only at a high level; without details on the compilation algorithm, prompt templates, or how generalization is ensured, it is impossible to assess whether the method avoids the 'brittle and error-prone' issues the authors attribute to hand-crafting prompts.

Authors: We agree that the abstract describes the compilation procedure only at a high level. In the revision we will expand the methods section to include the specific compilation algorithm, the prompt templates employed, and the mechanisms used to promote generalization, thereby clarifying how the approach differs from brittle hand-crafting. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an empirical framework for compiling a brief instruction and small preference set into natural-language specifications used at inference time, with no parameter updates. The abstract and described method contain no equations, fitted parameters, self-referential definitions, or load-bearing self-citations. Performance claims rest on comparisons to DPO using external datasets rather than any derivation that reduces to the inputs by construction. The central claim is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method implicitly assumes that natural-language prompts can faithfully encode preference signals.

pith-pipeline@v0.9.1-grok · 5683 in / 1071 out tokens · 12909 ms · 2026-06-26T07:47:44.256334+00:00 · methodology

Towards Spec Learning: Inference-Time Alignment from Preference Pairs

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)