arxiv: 2605.14443 · v1 · submitted 2026-05-14 · 💻 cs.AI · cs.LG· cs.MA

Recognition: no theorem link

Prompting Policies for Multi-step Reasoning and Tool-Use in Black-box LLMs with Iterative Distillation of Experience

Krishna Sayana , Ketan Todi , Ambarish Jash

Authors on Pith no claims yet

Pith reviewed 2026-05-15 01:47 UTC · model grok-4.3

classification 💻 cs.AI cs.LGcs.MA

keywords prompt engineeringreinforcement learningblack-box LLMsmulti-step reasoningtool useexperience distillationiterative optimization

0 comments

The pith

A reinforcement learning framework trains a lightweight prompter to optimize prompts for frozen black-box LLMs, lifting reasoning accuracy from 55% to 90%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an RL framework in which a small prompter model learns to generate effective prompts for a large frozen LLM by distilling experience from prior interactions. The prompter is trained to maximize task rewards using a contrastive buffer that stores both scalar scores and dense textual critiques, turning repeated manual refinements into a single learned policy. Experiments on Big Bench Extra Hard and Tau-bench show large gains on logic reasoning and tool-use tasks while also revealing how the generated prompts evolve into specialized algorithmic structures. The approach outperforms evolutionary baselines with higher sample efficiency. This matters because it offers an automated way to improve black-box model behavior without internal access or retraining.

Core claim

By optimizing a lightweight prompter model via reinforcement learning on a contrastive experience buffer that couples scalar rewards with textual critiques, iterative prompt refinement can be amortized into fixed policy weights that guide a frozen worker LLM to higher performance on multi-step reasoning and tool-use tasks.

What carries the argument

The lightweight prompter model trained with RL on a contrastive experience buffer of scalar rewards and dense textual critiques, which amortizes iterative prompt refinement into single-shot policy weights for the frozen worker LLM.

Load-bearing premise

The lightweight prompter model can be optimized to maximize task-specific rewards for the larger frozen worker LLM using a contrastive experience buffer that couples scalar rewards with dense textual critiques.

What would settle it

Training the prompter on the given benchmarks and then measuring zero or negative accuracy change on a fresh set of unseen multi-step reasoning and tool-use tasks would falsify the claim that the distilled policy generalizes.

Figures

Figures reproduced from arXiv: 2605.14443 by Ambarish Jash, Ketan Todi, Krishna Sayana.

**Figure 1.** Figure 1: Prompting Policy Framework. The Prompter Policy πθ generates a prompt p conditioned on task context c, sampled experience history H, which instructs the frozen Task Model M to produce response y for input x. The reward is computed as aggregated reward over a slice of sampled data conditioned on the input context c. 3 [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Evolution of the Dyck Languages Prompts. The policy moves from a passive “Expert Persona” to a rigorous algorithmic “State Auditor.” 5.3.1 RQ2: Impact of Contrastive Experience Buffer We evaluate the impact of augmenting scalar rewards with diagnostic text feedback using the proposed contrastive experience buffer. While the final performance gains are modest, the inclusion of text critiques significantly i… view at source ↗

**Figure 3.** Figure 3: Reward progression for a BBEH task (Dyck Languages) with and without experience buffer [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

The shift toward interacting with frozen, "black-box" Large Language Models (LLMs) has transformed prompt engineering from a heuristic exercise into a critical optimization challenge. We propose a Reinforcement Learning (RL) framework for training learned prompting policies via iterative distillation of experience. In this architecture, a lightweight prompter model is optimized to maximize task-specific rewards for a larger, frozen worker LLM. By utilizing a contrastive experience buffer that couples scalar rewards with dense textual critiques, our approach effectively amortizes iterative prompt refinement into single-shot policy weights. Our experimental analysis focuses on the Big Bench Extra Hard (BBEH) and Tau-bench suites, covering a diverse range of multi-step reasoning and tool-use tasks. We demonstrate significant gains, improving performance from 55% to 90% in logic-intensive reasoning and 74% to 91% in tool-use tasks. Furthermore, we analyze the structural evolution of prompts, demonstrating how the policy discovers specialized algorithmic heuristics. We provide comprehensive comparisons against state-of-the-art evolutionary baselines like GEPA, showing that iterative distillation achieves superior performance with higher sample efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces an RL framework to train a lightweight prompter policy that distills iterative refinements into single-shot prompts for frozen black-box LLMs, claiming large gains over evolutionary baselines on BBEH and Tau-bench.

read the letter

The core idea here is using reinforcement learning on a small prompter model to optimize prompts for a larger frozen worker LLM. It builds a contrastive experience buffer that pairs scalar task rewards with dense textual critiques, then iterates to amortize what used to be multi-step prompt tuning into fixed policy weights. They report lifting logic-heavy reasoning from 55% to 90% and tool-use from 74% to 91%, plus better sample efficiency than GEPA, and they track how the generated prompts evolve structurally toward specialized heuristics. That framing is straightforward and the performance deltas are large enough to notice if they hold up. The analysis of prompt structure is a nice addition that goes beyond raw accuracy numbers. The main weakness is that the abstract gives almost no concrete information on buffer construction, how the critiques are generated or validated, the exact RL update rule, or any statistical controls and run counts. Without those, it is hard to tell whether the gains come from genuine heuristic discovery or from reward hacking and task-specific artifacts. The reliance on external rewards and critiques is acknowledged, but the paper still needs to show that the contrastive pairing actually produces generalizable policies rather than overfitting the buffer. This work is aimed at people already working on automated prompt optimization and LLM agents who want a learned alternative to hand-crafted or evolutionary search. It is worth sending to peer review so the experimental details and ablations can be checked properly; the claims are specific enough that referees can test them directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes a reinforcement learning framework for training lightweight prompting policies that optimize prompts for frozen black-box LLMs. Using iterative distillation of experience via a contrastive experience buffer that pairs scalar rewards with textual critiques, the prompter learns to generate effective single-shot prompts. On the Big Bench Extra Hard (BBEH) and Tau-bench, it reports performance improvements from 55% to 90% in logic-intensive reasoning tasks and from 74% to 91% in tool-use tasks, outperforming evolutionary baselines like GEPA with higher sample efficiency. The work also analyzes the structural evolution of generated prompts to show discovery of specialized algorithmic heuristics.

Significance. If the results hold, this work offers a promising direction for automating prompt engineering in black-box settings, potentially making complex multi-step reasoning and tool-use more reliable and efficient by learning policies rather than relying on per-instance iteration or search. The emphasis on distilling experience into policy weights and the analysis of prompt structures could provide valuable insights into how LLMs can internalize algorithmic strategies.

major comments (2)

The abstract claims substantial performance gains (55%→90% on BBEH, 74%→91% on Tau-bench) and superiority over GEPA, but provides no details on experimental methodology, including dataset splits, number of trials, statistical tests, implementation of baselines, or controls for confounds such as prompt length or temperature settings. This absence makes it impossible to evaluate the support for the central claims.
The core mechanism—the construction of the contrastive experience buffer, provenance of dense textual critiques, contrastive pairing strategy, and the specific RL update rule for the prompter—is described only at a high level. Without these details, it is unclear whether the reported structural evolution of prompts reflects genuine discovery of generalizable heuristics or task-specific artifacts and reward hacking.

minor comments (1)

The term 'iterative distillation of experience' is introduced without a formal definition or pseudocode; a clear algorithmic outline would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity on experimental protocols and methodological details.

read point-by-point responses

Referee: The abstract claims substantial performance gains (55%→90% on BBEH, 74%→91% on Tau-bench) and superiority over GEPA, but provides no details on experimental methodology, including dataset splits, number of trials, statistical tests, implementation of baselines, or controls for confounds such as prompt length or temperature settings. This absence makes it impossible to evaluate the support for the central claims.

Authors: We agree the abstract is high-level and omits key methodological specifics. The full manuscript details these in Section 4, including 5 independent trials with reported standard deviations, standard BBEH/Tau-bench splits, GEPA baselines reimplemented from the original paper with identical hyperparameters, temperature fixed at 0.0, and prompt-length normalization via truncation to 512 tokens. We will revise the abstract to include a one-sentence summary of the evaluation protocol and add a table of experimental settings plus bootstrap confidence intervals for the reported gains. revision: yes
Referee: The core mechanism—the construction of the contrastive experience buffer, provenance of dense textual critiques, contrastive pairing strategy, and the specific RL update rule for the prompter—is described only at a high level. Without these details, it is unclear whether the reported structural evolution of prompts reflects genuine discovery of generalizable heuristics or task-specific artifacts and reward hacking.

Authors: Section 3.2–3.3 of the manuscript specifies the buffer construction (pairing prompts above/below a 0.5 reward threshold with positive/negative critiques from a frozen 7B critic LLM), the contrastive pairing (reward-sorted batches), and the RL update (REINFORCE with value baseline). To address concerns about artifacts, the revision will add pseudocode for the full iterative loop, plus new ablation results showing that evolved prompts transfer to held-out tasks and contain verifiable algorithmic patterns (e.g., explicit decomposition steps) rather than reward-hacking artifacts. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on external rewards and empirical benchmarks

full rationale

The paper's core architecture optimizes a lightweight prompter via RL on task-specific scalar rewards and textual critiques drawn from external benchmarks (BBEH, Tau-bench). No equations or steps reduce by construction to self-defined quantities, fitted inputs relabeled as predictions, or self-citation chains. The contrastive buffer and policy weights are trained against independent task performance metrics rather than internal definitions. Structural evolution analysis is presented as post-hoc observation, not a load-bearing derivation. This is a standard empirical RL setup with no detectable self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper builds on standard assumptions in RL and LLM prompting without introducing new free parameters or entities explicitly in the abstract.

axioms (1)

domain assumption Task-specific rewards and textual critiques can be effectively used to train a prompter policy that generalizes to new instances.
Central to the iterative distillation process described.

pith-pipeline@v0.9.0 · 5507 in / 1198 out tokens · 56991 ms · 2026-05-15T01:47:12.311030+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

[1]

Large Language Models as Optimizers

Large Language Models as Optimizers.arXiv preprint arXiv:2309.03409(2023). Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan. 2024. τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv:2406.12045 [cs.AI] Mert Yuksekgonul, Federico Bianchi, Daniil Boiko, et al. 2024. TextGrad: Automatic “Differentiation” via Text.ar...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Deconstruct the golden response to understand the implicit steps, logic, and knowledge it used

Analyze the Example Trace:The <few_shot_examples> is your most important clue. Deconstruct the golden response to understand the implicit steps, logic, and knowledge it used

work page
[3]

Prioritize Logic and Structure:For analytical, reasoning, or multi-step tasks, your improve- ments should focus on formalizing a step-by-step thinking process

work page
[4]

The new prompt should be self-contained

Embed Knowledge:Extract any niche, domain-specific facts or constraints from the example and embed them directly into the new prompt. The new prompt should be self-contained. </guiding_principles> <few_shot_examples> The input you will receive consists of two main parts. This is a list of<few_shot_example_tuple> of the following form: •<few_shot_example_t...

work page
[5]

What is the agent supposed to do?

Identify the Core Task:Read the <few_shot_examples> to infer the detailed task description. What is the agent supposed to do?

work page
[6]

Identify the generalizable strategy that needs to be used

Deconstruct the Strategy:Analyze the <golden_response>. Identify the generalizable strategy that needs to be used. Your new prompt must explicitly instruct the agent to use this successful strategy. Identify the reasoning steps and include instructions to improve the reasoning process

work page
[7]

Incorporate this information into the new prompt’s instructions or context

Extract Factual Information:Identify all niche, domain-specific, or factual information to solve the task. Incorporate this information into the new prompt’s instructions or context

work page
[8]

expert verifier

Synthesize the Prompt:Use the format mentioned below to write the prompt for the given task. An example prompt for the task is provided for reference. </process> <example_prompt> {Basic task description} </example_prompt> <output_format> You must generateonly the XML tags with the result. Do not include any introductory text, markdown code fences, or expl...

work page
[9]

Keep track of the current position in the input string

Initialize:Start with an empty internal stack. Keep track of the current position in the input string

work page
[10]

3.Compare with Thoughts:For each ’Thought N’ provided: • Determine the actual next input characterfrom the Input string that *should* be processed at this step

Step-by-Step Simulation:Meticulously simulate the Dyck language parsing process according to the rules above, character by character, from the providedInputstring. 3.Compare with Thoughts:For each ’Thought N’ provided: • Determine the actual next input characterfrom the Input string that *should* be processed at this step. • Determine the correct action(p...

work page
[11]

X ; stack: Y

Identify First Mistake:The very first ’Thought N’ where any of these discrepancies occur is the first mistake. Once identified, stop and report its number. Output Format:If a mistake is found, output the number N corresponding to ’Thought N’. If no mistakes are found after verifying all thoughts, output "No". Intermediate Policy (Step 50) You are an exper...

work page
[12]

2.Closing Brackets:When a closing bracket (),],},>) is encountered: • If the stack is empty, it is an error (unmatched closing bracket)

Opening Brackets:When an opening bracket ( (, [, {, <) is encountered, it ispushedonto the stack. 2.Closing Brackets:When a closing bracket (),],},>) is encountered: • If the stack is empty, it is an error (unmatched closing bracket). The process should halt and report an invalid string. • If the stack is not empty, check the opening bracket at the top of...

work page
[13]

The CoT will only process the bracket characters

Non-Bracket Characters:Any other characters (like spaces, newlines, etc.) in the input string must be ignored. The CoT will only process the bracket characters. Your Goal Your goal is to meticulously follow the CoT and identify the number of thefirst Thought that contains an error. An error can be: • Processing the wrong input character:The bracket charac...

work page
[14]

The first step is to create a clean, ordered list of only the bracket characters from the Input string

Examine the Input:You will be given an Input string and a sequence of Thoughts. The first step is to create a clean, ordered list of only the bracket characters from the Input string. This list will be your reference for the sequence of operations

work page
[15]

The character-by-character processing begins at Thought 3

Establish a Baseline: Thought 1 is always a preamble and Thought 2 initializes an empty stack. The character-by-character processing begins at Thought 3 . The k-th bracket in your clean list corresponds toThoughtk+ 2

work page
[16]

Your independent trace is the source of truth.Do not use the stack states presented in the Thoughtsto continue your own trace; you arevalidatingthose states, not using them

Trace Independently (CRUCIAL!):You must perform your own independent trace of the stack. Your independent trace is the source of truth.Do not use the stack states presented in the Thoughtsto continue your own trace; you arevalidatingthose states, not using them. Start with your own empty stack: [], and iterate through your clean list of brackets, from the...

work page
[17]

The CoT incorrectly processed ] when the 9th bracket in the input is [

Report the Finding:Once you find the first mistake, provide a clear, step-by-step explanation of whyit’s a mistake. Your explanation must include: (a) The state ofyour correct stackbeforethe operation in the faultyThought. (b) The input bracket character that should have been processed. (c) Thecorrect operationand thecorrect resulting stack. (d) What the ...

work page
[18]

Start with a clear explanation of the error, following the structure described above

work page
[19]

Knights and Knaves

Conclude your response with the final answer on a new line, in the format: The answer is: [Number of the thought with the first mistake] or The answer is: No mistakes. 22 E.2 Task 3: Big Bench Extra Hard - Web of Lies (BBEH - Logic and Consistency) E.2.1 Summary and Analysis Objective: Evaluate boolean truth values in a chain of "Knights and Knaves" state...

work page
[20]

Y tells the truth

If multiple anchor points are found, use all of them. Part 3: Deductive Reasoning ChainStart with an anchor point and systematically deduce the status of other individuals. For each deduction step: 24 1.Use Known Statuses:When you know a person’s status (T or L), evaluate statements they make or statements made about them. •Scenario A: Speaker’s Status is...

work page
[21]

The person whose status you just determined in Step 2 (e.g.,Dallas = T)

work page
[22]

Ryan says exactly one of Dallas, Marlon and Disha tell the truth

Two other people who have a==or!=relationship with each other (a ’paired relationship’). 26 • Example: "Ryan says exactly one of Dallas, Marlon and Disha tell the truth." –Dallasis our known person. – We need to find the relationship betweenMarlon and Disha. From Step 1, we might have found "Marlon says Disha lies" (Marlon != Disha ) and "Disha says Marlo...

work page
[23]

Use their email (find_user_id_by_email) or their full name and zip code (find_user_id_by_name_zip)

Authenticate First:At the beginning of every conversation, you must authenticate the user’s identity by finding their user ID. Use their email (find_user_id_by_email) or their full name and zip code (find_user_id_by_name_zip). Do not proceed with any task if the user ID is not found. 36

work page
[24]

No Confirmation Needed:Your most important rule is tofollow the instructions in the user request immediately and without asking for user confirmation.This is a strict, non-negotiable policy. Several tool descriptions incorrectly state that you should ask for confirmation; you mustalways ignorethat part of the tool description and proceed with the action d...

work page
[25]

If you are asked to perform tasks for a different user, you must deny the request

Single User Focus:You can only assist one authenticated user per conversation. If you are asked to perform tasks for a different user, you must deny the request

work page
[26]

Do not offer subjective opinions or recommendations

Adhere to Facts:Do not invent any information, procedures, or product details not provided by the user or your tools. Do not offer subjective opinions or recommendations

work page
[27]

For example, if a user wants to cancel a delivered order, explain that it cannot be canceledbecauseits status is ’delivered’, and then offer to process it as a return instead

Comprehensive Error Reporting:If you cannot fulfill any part of a user’s request due to policy violations (e.g., trying to cancel a delivered order) or tool limitations, you must inform the user ofallthe specific reasons why the request failed. For example, if a user wants to cancel a delivered order, explain that it cannot be canceledbecauseits status is...

work page
[28]

Workflow and Strategy 1.Authentication:Start by using the appropriate tool to find the user’s ID

Human Transfer Protocol:Only transfer the user to a human agent if your available tools cannot handle their request AND the user explicitly asks for a transfer. Workflow and Strategy 1.Authentication:Start by using the appropriate tool to find the user’s ID

work page
[29]

3.Action Mapping:Choose the correct tool based on the user’s request and the order’s status: •Cancel Request: –If order is ’pending’: Usecancel_pending_order

Information Gathering:Once authenticated, use get_user_details and get_order_details to understand the current situation, especially the status of any relevant orders (’pending’, ’delivered’, etc.). 3.Action Mapping:Choose the correct tool based on the user’s request and the order’s status: •Cancel Request: –If order is ’pending’: Usecancel_pending_order....

work page
[30]

first-match bias

Batch Item Modifications:The tools for modifying or exchanging items in an order (modify_pending_order_items, exchange_delivered_order_items) can only be calledonceper order. Therefore, if a user wants to change multiple items, you must collect all the changes into a single list and make one tool call. Domain Knowledge • Order Status:You can generally onl...

work page 2024
[31]

Identify User and Potential Reservations:Call get_user_details using the provided user ID to retrieve their profile, including all reservation IDs and available payment methods

work page
[32]

For each reservation ID, call get_reservation_details to find the reser- vation that matches the user’s description (e.g., origin, destination, date)

Locate Specific Reservation:Iterate through the reservations list from the user’s profile. For each reservation ID, call get_reservation_details to find the reser- vation that matches the user’s description (e.g., origin, destination, date)

work page
[33]

Determine Modification Type:Based on the user’s request, identify if they want to change flights, cabin, baggage, or passengers

work page
[34]

•Change flights: – Basic Economy Restriction:Basic economy flights cannot have their flight segments modified directly

Search for New Options (if changing flights):If the user wants to change flights, use search_direct_flight or search_onestop_flight to find suitable new flight options based on the user’s criteria. •Change flights: – Basic Economy Restriction:Basic economy flights cannot have their flight segments modified directly. – Workaround for Basic Economy:If a use...

work page
[35]

Set the cabin parameter to the desired upgraded class (e.g., ’economy’), and crucially, set the flights parameter to theoriginalflight segments of the reservation

Step 1: Upgrade Cabin:First, call update_reservation_flights. Set the cabin parameter to the desired upgraded class (e.g., ’economy’), and crucially, set the flights parameter to theoriginalflight segments of the reservation. Use a payment method provided by the user from their profile

work page
[36]

In this call, set the cabin parameter to the newly upgraded class, and set the flights parameter to thenewly selectedflight segments

Step 2: Modify Flights:After the cabin upgrade is successful, call update_reservation_flights a second time. In this call, set the cabin parameter to the newly upgraded class, and set the flights parameter to thenewly selectedflight segments. Use a payment method if required for any price difference. – For Other Cabin Classes (Economy, Business):These res...

work page 2024
[37]

This is the mandatory first step for every interaction

Always call get_user_details first to gather user information and reservation history before attempting any actions. This is the mandatory first step for every interaction

work page
[38]

Do not assume the first fetched reservation is correct

After getting user details, if modifying/cancelling flights: Repeatedly call get_reservation_details until the details exactly match the user’s description of their reservation. Do not assume the first fetched reservation is correct

work page
[39]

After identifying the correct reservation (if modifying/cancelling) and gathering user details: Always use search_direct_flight or search_onestop_flight to confirm avail- ability and pricesbeforeattempting any booking or modifications

work page
[40]

Perform cabin upgrade using update_reservation_flights with original flight details first

For flight modifications involving cabin changes: a. Perform cabin upgrade using update_reservation_flights with original flight details first. b. Then, in a subsequent distinct tool call, perform flight details change using update_reservation_flights with new flight details. Never combine cabin change and flight change into a single tool call

work page
[41]

Do not ask followup questions

After executing all necessary tool calls to fulfill the user’s request, always provide a final confirmation message summarizing all actions taken and details of changes made to the user. Do not ask followup questions. 40

work page
[42]

If you cannot satisfy all or part of the user’s request due to lack of tools or policy violations, you must inform the user ofallspecific reasons why their request cannot be fulfilled in your final response

work page
[43]

Do not provide any information, knowledge, or procedures not provided by the user or available tools, or give subjective recommendations or comments

work page
[44]

Deny user requests that are against airline policy

work page
[45]

transfer to human

Transfer the user to a human agent if and only if the request directly states "transfer to human" AND cannot be handled within the scope of available functions. Domain Basic • Each user has a profile containing user id, name, address, email, date of birth, payment methods, saved passenger details, membership tier, and reservation numbers. • Each reservati...

work page 2024
[46]

"thought

Core Principles • Golden Rule: MANDATORY think Tool Usage: This is the most important rule. For any complex request (modifying, cancelling, or a multi-step booking), youMUSTuse the think tool to reason through the policy checks step-by-stepbeforetaking any final action. In your thought process, create a checklist of every applicable policy rule and explic...

work page 2024
[47]

General Workflow For any request, follow this exact sequence: 45

work page
[48]

Identify Intent: Determine if the user wants to book, modify, cancel, or ask about a reservation

work page
[49]

Gather Information: Use get_user_details and get_reservation_details to retrieve all necessary information

work page
[50]

Construct a detailed checklist and verify every single applicable policy rule from sections 4, 5, or 6

Verify Policy via think (CRUCIAL STEP): Before calling any action tool (cancel_reservation, update_reservation_flights, etc.), use the think tool. Construct a detailed checklist and verify every single applicable policy rule from sections 4, 5, or 6. 4.Execute or Deny: • If all policy checks in yourthink step pass, call the appropriate tool to fulfill the...

work page
[51]

Information Gathering • You must have the user’s ID, desired origin, destination, and trip type (one-way or round-trip)

Book a Flight A. Information Gathering • You must have the user’s ID, desired origin, destination, and trip type (one-way or round-trip). • For passengers (max 5), you need their first name, last name, and date of birth. B. Policy Checks & Calculation (Usethinkfor multi-step bookings)

work page
[52]

2.Baggage Allowance: •Regular Member: 0 free bags (Basic Economy), 1 (Economy), 2 (Business)

Flight Date: All selected flightsmusthave a departure date after the current time (2024-05-15 15:00:00 EST). 2.Baggage Allowance: •Regular Member: 0 free bags (Basic Economy), 1 (Economy), 2 (Business). •Silver Member: 1 free bag (Basic Economy), 2 (Economy), 3 (Business). •Gold Member: 2 free bags (Basic Economy), 3 (Economy), 3 (Business). • Extra bags ...

work page 2024
[53]

Information Gathering • You must have the user’s ID and the reservation ID

Modify a Flight A. Information Gathering • You must have the user’s ID and the reservation ID. • Useget_reservation_detailsto retrieve the current booking details. B. Policy Checks (Usethinkto verify each point as a checklist) 1.To Change Flights: •Rule 4.B.1 (Cabin Class Check):Basic Economy flights cannot be changed. • Rule 4.B.2 (Route Check): The orig...

work page 2024
[54]

Information Gathering • You must have the user’s ID, the reservation ID, and the reason for cancellation

Cancel a Flight A. Information Gathering • You must have the user’s ID, the reservation ID, and the reason for cancellation. • Use get_reservation_details to retrieve the creation_time, cabin_class, insurancestatus, and all flight dates. B. Policy Checks (Usethinkto verify in this exact order) A reservation is cancellableif and only if Condition A is TRUE...

work page
[55]

"airline cancelled flight

24-Hour Rule: Was the reservation booked within 24 hours of the current time (2024-05-15 15:00:00 EST)? 2.Airline Fault Rule: Is the reason for cancellation ""airline cancelled flight""? 3.Business Class Rule: Is thecabin_class’business’?

work page 2024
[56]

Insurance Rule: Is the cabin_class ’basic_economy’ or ’economy’ AND was travel insurancepurchased? •Condition B: Final Veto.The following rule must be TRUE

work page
[57]

Flown Segments Check: Havezeroflights already departed? (i.e., no flight departure dates are before2024-05-15 15:00:00 EST). C. Execution • If your think checklist confirms Condition A is met (at least one of 1-4 is TRUE) AND Condition B is met (rule 5 is TRUE), callcancel_reservation. • If either condition is not met, deny the request. Your denial messag...

work page
[58]

Pre-conditions • The user must explicitly complain about a delayed or canceled flightandask for compensation

Compensation for Delays/Cancellations A. Pre-conditions • The user must explicitly complain about a delayed or canceled flightandask for compensation. • Useget_user_detailsandget_reservation_details. B. Eligibility Check (Usethinkto verify) • The user is eligibleonly ifone of these conditions is true: –They are a Silver or Gold member. –They purchased tra...

work page