Test Management and Coordination During the Vera C. Rubin Observatory Commissioning and Early Operations Using Zephyr Scale

Brian Stalder; Bruno Quint; David Sanmartim; Erik Dennihy; Keith Bechtol; Tiago Ribeiro

arXiv: 2606.31795 · v1 · pith:Z3SB5PXE · submitted 2026-06-30 · 🌌 astro-ph.IM

Bruno Quint, Tiago Ribeiro, Erik Dennihy, Brian Stalder, David Sanmartim, Keith Bechtol

Pith reviewed 2026-07-01 03:19 UTC · model grok-4.3

classification 🌌 astro-ph.IM

keywords: test management, observatory commissioning, test cycles, integration tests, on-sky tests, scheduler integration, distributed teams

The pith

Zephyr Scale coordinates hundreds of integration and on-sky tests for observatory commissioning through daily test cycles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes the adoption of a test management tool to handle the coordination of planning, design, and execution for hundreds of tests involving multiple subsystems and teams in different locations during observatory commissioning. Individual tests are prepared as cases that include step-by-step scripts for summit execution, then assembled each day into cycles that define the full test plan. Complex tests are also written as JSON files that the real-time scheduler can consume to carry out operations through an abstraction layer. A sympathetic reader would care because managing distributed testing at this scale is necessary for reliable facility operations, and the approach described remains active into early operations.
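
To make the Test Case and Test Cycle structure concrete, here is a minimal Python sketch of the data model the paragraph describes. It is not Zephyr Scale's schema or the authors' code: the class names mirror the paper's terms, but the fields (such as `subsystems` and `on_sky`) and the duplicate check are illustrative assumptions. A cycle holds the cases planned for one day and the following night, and can report which subsystems that plan needs.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TestStep:
    """One step of a Test Case script (illustrative fields)."""
    action: str
    expected_result: str


@dataclass
class TestCase:
    """A single test with everything needed to run it at the summit."""
    key: str                 # hypothetical identifier, e.g. "T-1"
    title: str
    subsystems: list[str]    # subsystems the test exercises
    steps: list[TestStep] = field(default_factory=list)
    on_sky: bool = False     # True if the test needs night-time observing


@dataclass
class TestCycle:
    """The test plan for one day and the following night."""
    day: date
    cases: list[TestCase] = field(default_factory=list)

    def add(self, case: TestCase) -> None:
        # Refuse to schedule the same Test Case twice in one cycle.
        if any(c.key == case.key for c in self.cases):
            raise ValueError(f"{case.key} is already planned for {self.day}")
        self.cases.append(case)

    def daytime_cases(self) -> list[TestCase]:
        return [c for c in self.cases if not c.on_sky]

    def night_cases(self) -> list[TestCase]:
        return [c for c in self.cases if c.on_sky]

    def subsystems(self) -> set[str]:
        """Every subsystem that must be available to execute this cycle."""
        return {s for c in self.cases for s in c.subsystems}


if __name__ == "__main__":
    cycle = TestCycle(day=date(2025, 1, 15))
    cycle.add(TestCase("T-1", "Dome rotation check", ["dome"],
                       [TestStep("Rotate dome to azimuth 90 deg", "Dome reports azimuth 90 deg")]))
    cycle.add(TestCase("T-2", "Slew and track", ["mount", "dome"], on_sky=True))
    print(sorted(cycle.subsystems()))  # ['dome', 'mount']
```

Splitting a cycle into daytime and night cases reflects the abstract's description of a cycle as the plan for "that day and that same night"; how the real tool partitions work is not specified here.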

Core claim

The authors adopted Zephyr Scale to coordinate test activities, initially limited to system verification and validation but later expanded to higher-level tests that continue in early operations. The workflow creates Test Cases containing all information needed for execution, groups them into daily Test Cycles, and prepares more complex tests as JSON files for the Scheduler to ingest and execute common operations such as slewing, tracking, and data acquisition via an abstraction layer. The paper also summarizes the benefits and limitations of applying this tool, originally designed for software testing, to large-scale observatory commissioning.
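
The JSON route can be sketched the same way. The snippet below assumes a hypothetical file layout (a `name`, a `test_case` key, and an ordered `scripts` list); the actual BLOCK format consumed by the Rubin Scheduler is not described on this page and may differ. The loader only parses and checks the file, turning it into an ordered list of operation calls for slewing, tracking, and data acquisition.

```python
import json
from dataclasses import dataclass

# Hypothetical JSON test description. Field names are illustrative,
# not the actual format read by the Rubin Scheduler.
EXAMPLE_BLOCK = """
{
  "name": "BLOCK-EXAMPLE",
  "test_case": "T-2",
  "scripts": [
    {"name": "slew", "parameters": {"ra_deg": 150.0, "dec_deg": -30.0}},
    {"name": "track", "parameters": {"duration_s": 600}},
    {"name": "take_image", "parameters": {"exptime_s": 30, "nimages": 5}}
  ]
}
"""

# Operations the (assumed) abstraction layer knows how to execute.
KNOWN_OPERATIONS = {"slew", "track", "take_image"}


@dataclass
class ScriptCall:
    name: str
    parameters: dict


def load_block(text: str) -> list[ScriptCall]:
    """Parse a JSON test and return its operations in execution order.

    Raises ValueError if a step names an operation the layer cannot run,
    so a malformed test is caught at planning time rather than at night.
    """
    block = json.loads(text)
    calls = []
    for i, entry in enumerate(block["scripts"]):
        name = entry["name"]
        if name not in KNOWN_OPERATIONS:
            raise ValueError(f"{block['name']}: step {i} uses unknown operation {name!r}")
        calls.append(ScriptCall(name, dict(entry.get("parameters", {}))))
    return calls


if __name__ == "__main__":
    for call in load_block(EXAMPLE_BLOCK):
        print(call.name, call.parameters)
```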

What carries the argument

Zephyr Scale, a test management tool that creates Test Cases with step-by-step execution scripts, groups them into daily Test Cycles, and integrates with the Scheduler through JSON files for automated observatory operations.
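
The abstraction layer is what lets a JSON test name an operation without knowing how each subsystem carries it out. A minimal sketch, assuming a hypothetical `Observatory` interface with three operations; `RecordingObservatory` is a stand-in that logs commands instead of driving hardware, which could also serve to dry-run a test before it reaches the summit. None of these names come from the Rubin control software.

```python
from typing import Protocol


class Observatory(Protocol):
    """Hypothetical abstraction layer: the operations a JSON test may request."""

    def slew(self, ra_deg: float, dec_deg: float) -> None: ...
    def track(self, duration_s: float) -> None: ...
    def take_image(self, exptime_s: float, nimages: int) -> None: ...


class RecordingObservatory:
    """Stand-in implementation that logs commands instead of moving hardware."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def slew(self, ra_deg: float, dec_deg: float) -> None:
        self.log.append(f"slew {ra_deg} {dec_deg}")

    def track(self, duration_s: float) -> None:
        self.log.append(f"track {duration_s}")

    def take_image(self, exptime_s: float, nimages: int) -> None:
        self.log.append(f"take_image {exptime_s}x{nimages}")


def run_steps(obs: Observatory, steps: list[tuple[str, dict]]) -> None:
    """Send each (operation, parameters) pair through the abstraction layer.

    The steps only name operations; how a slew or exposure is carried out
    is left entirely to whichever Observatory implementation is passed in.
    """
    dispatch = {"slew": obs.slew, "track": obs.track, "take_image": obs.take_image}
    for name, params in steps:
        if name not in dispatch:
            raise ValueError(f"unsupported operation {name!r}")
        dispatch[name](**params)


if __name__ == "__main__":
    obs = RecordingObservatory()
    run_steps(obs, [("slew", {"ra_deg": 150.0, "dec_deg": -30.0}),
                    ("track", {"duration_s": 600}),
                    ("take_image", {"exptime_s": 30, "nimages": 5})])
    print(obs.log)  # ['slew 150.0 -30.0', 'track 600', 'take_image 30x5']
```

Because `run_steps` depends only on the interface, swapping the recording stand-in for a real implementation would not change the test description itself.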

If this is right

Daily grouping of Test Cases into cycles enables coordinated execution across subsystems and distributed teams.
JSON-based tests allow the Scheduler to handle high-level observing scripts for operations like slewing, tracking, and data acquisition.
The tool supports a shift from system verification focus to ongoing higher-level testing in early operations.
Centralized management of test information reduces gaps between test design and actual summit execution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This cycle-based grouping could reduce coordination friction in other distributed scientific facilities if the daily planning step is adapted to their specific constraints.
Measuring efficiency would require adding metrics on test throughput or failure modes, which the current description leaves open for future collection.
The JSON integration points to a general pattern for linking manual test planning with real-time automated execution engines in complex instruments.

Load-bearing premise

That the defined workflow for test creation, review, and deployment successfully bridges the gap between ideation and on-sky execution. The paper asserts this but provides no outcome metrics, success rates, or failure examples to support it.

What would settle it

Reporting quantitative data such as the fraction of tests that reach execution without coordination issues, the average time from test creation to on-sky run, or the rate of errors traceable to planning gaps before versus after the workflow was introduced would directly test the bridging claim.
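
The metrics proposed above are simple to compute once each test carries a few timestamps and flags. The sketch below assumes hypothetical per-test records (creation time, execution time or `None`, and a coordination-issue flag); the paper reports no such data, so any inputs would presumably have to be extracted from the Jira and Zephyr Scale history.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class TestRecord:
    """Hypothetical per-test history record (not data from the paper)."""
    key: str
    created: datetime
    executed: Optional[datetime]   # None if the test never ran
    coordination_issue: bool       # e.g. flagged during or after execution


def clean_execution_fraction(records: list[TestRecord]) -> float:
    """Fraction of all tests that ran with no coordination issue."""
    if not records:
        raise ValueError("no records")
    ok = [r for r in records if r.executed is not None and not r.coordination_issue]
    return len(ok) / len(records)


def mean_lead_time_days(records: list[TestRecord]) -> float:
    """Mean time from creation to execution, over tests that actually ran."""
    spans = [(r.executed - r.created).total_seconds() / 86400
             for r in records if r.executed is not None]
    if not spans:
        raise ValueError("no executed tests")
    return mean(spans)


def issue_rate_before_after(records: list[TestRecord], cutoff: datetime) -> tuple[float, float]:
    """Coordination-issue rate among executed tests created before vs. after `cutoff`."""
    def rate(subset: list[TestRecord]) -> float:
        ran = [r for r in subset if r.executed is not None]
        if not ran:
            return float("nan")
        return sum(r.coordination_issue for r in ran) / len(ran)

    before = [r for r in records if r.created < cutoff]
    after = [r for r in records if r.created >= cutoff]
    return rate(before), rate(after)
```

Here `cutoff` would be the date the workflow was introduced. Without a control group, a drop in the issue rate after that date would be suggestive rather than conclusive.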

Figures

Figures reproduced from arXiv: 2606.31795 by Brian Stalder, Bruno Quint, David Sanmartim, Erik Dennihy, Keith Bechtol, Tiago Ribeiro.

**Figure 2.** Lifecycle of a Test Case backed by a JSON BLOCK or Scheduler Configuration. The Jira ticket and the Test

**Figure 3.** Distribution of the number of steps per Test Case across the 680 Test Cases in the BLOCK project. The dashed

Original abstract

The commissioning of the NSF-DOE Vera C. Rubin Observatory required coordinating the planning, design, and execution of hundreds of integration and on-sky tests involving different subsystems and geographically distributed teams. To support this task, we adopted a Jira-native test management tool, Zephyr Scale. The initial use of Zephyr Scale focused solely on system verification and validation. Its use was rescoped to coordinate higher-level tests, and it is still in use in early operations. Zephyr Scale allows the creation of Test Cases, which represent individual tests. Each Test Case contains the information needed to execute a test at the summit. This includes a step-by-step script. Every day, Test Cases are grouped into a Test Cycle, which represents the test plan for all tests to be executed that day and that same night. We describe the defined workflow for test creation, review, and deployment, which bridges the gap between ideation and on-sky execution within a Test Cycle. We also outline how we write more complex tests as partially automated JSON files consumed by the Scheduler, the system's real-time, constraint-aware observation optimization engine. This integration enables the Scheduler to ingest high-level observing scripts that communicate with subsystems via an abstraction layer to execute common observatory operations, such as slewing, tracking, and data acquisition. Finally, we summarize the benefits and limitations of using Zephyr Scale, designed initially to coordinate software testing, for large-scale observatory commissioning and early operations.
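
The creation, review, and deployment workflow described in the abstract can be pictured as a small state machine. The states and allowed transitions below are assumptions chosen for illustration (Figure 2 shows the authors' actual lifecycle, which may use different states). The sketch encodes two plausible rules: a Test Case enters a Test Cycle only after review, and a case that is not run can return to the pool for a later cycle.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    APPROVED = "approved"
    IN_CYCLE = "in cycle"
    EXECUTED = "executed"


# Allowed transitions (hypothetical; not taken from the paper).
TRANSITIONS = {
    Status.DRAFT: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.DRAFT},    # approve, or send back for edits
    Status.APPROVED: {Status.IN_CYCLE},                   # added to a day's Test Cycle
    Status.IN_CYCLE: {Status.EXECUTED, Status.APPROVED},  # run, or deferred to a later cycle
    Status.EXECUTED: set(),
}


class CaseLifecycle:
    """Tracks one Test Case through the (assumed) workflow states."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.status = Status.DRAFT
        self.history = [Status.DRAFT]

    def move_to(self, new: Status) -> None:
        # Reject any transition the workflow does not allow.
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.key}: cannot go from {self.status.value} to {new.value}")
        self.status = new
        self.history.append(new)


if __name__ == "__main__":
    case = CaseLifecycle("T-1")
    for step in (Status.IN_REVIEW, Status.APPROVED, Status.IN_CYCLE, Status.EXECUTED):
        case.move_to(step)
    print([s.value for s in case.history])
```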

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Descriptive case study of Zephyr Scale adoption at Rubin Observatory that details workflows but supplies no metrics to support its coordination claims.

Hi,

The paper is a straightforward report on how the Rubin team adopted Zephyr Scale to manage hundreds of integration and on-sky tests across distributed groups. The new material is the concrete workflow description: test cases with step-by-step scripts, daily grouping into test cycles, the review and deployment steps, and the JSON handoff to the scheduler for partially automated observations.

It does a clean job laying out the practical steps and the shift from initial system verification use to broader operations. The benefits and limitations section is direct about adapting a software-testing tool to observatory work.

The soft spot is the absence of any numbers. No completion rates, no error counts, no before-and-after comparisons. The claim that the workflow bridges ideation to on-sky execution is stated but rests only on the procedural outline. The stress-test note is accurate on this point.

This is for operations engineers and test managers at other large facilities who need examples of commercial tool use in commissioning. A reader in that niche can extract usable details on the scheduler integration.

It deserves peer review as a case study. The description is solid and the topic fits the instrumentation community even if more evidence would make the central claim stronger.

Best, Your colleague

Referee Report

1 major / 1 minor

Summary. The manuscript describes the adoption of Zephyr Scale (a Jira-native test management tool) to coordinate planning, design, and execution of hundreds of integration and on-sky tests for Vera C. Rubin Observatory commissioning and early operations. It details Test Case creation with step-by-step scripts, daily grouping into Test Cycles, a workflow for creation/review/deployment that is asserted to bridge ideation to on-sky execution, integration of complex tests as JSON files consumed by the Scheduler for automated operations (slewing, tracking, data acquisition), and a summary of benefits and limitations of repurposing the tool from software testing.

Significance. If the described workflow and integration are effective, the paper provides a practical case study on adapting commercial test management software for large-scale, geographically distributed scientific instrumentation projects. This could inform similar efforts at other observatories. The purely descriptive approach without quantitative validation (e.g., completion rates or efficiency metrics) limits its value as a validated methodological contribution.

major comments (1)

[Abstract] The assertion that the defined workflow for test creation, review, and deployment 'bridges the gap between ideation and on-sky execution within a Test Cycle' is presented without any supporting data such as test completion rates, error counts, coordination metrics, or concrete execution examples. This claim is load-bearing for the paper's central description of successful coordination but remains unsupported.

minor comments (1)

The manuscript would benefit from at least one concrete example of a Test Case script or a Test Cycle to illustrate the workflow steps.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript describing the adoption of Zephyr Scale for test management at Vera C. Rubin Observatory. The paper is a descriptive case study of the workflow and integration rather than a quantitative validation study. We address the single major comment below.

Referee: [Abstract] The assertion that the defined workflow for test creation, review, and deployment 'bridges the gap between ideation and on-sky execution within a Test Cycle' is presented without any supporting data such as test completion rates, error counts, coordination metrics, or concrete execution examples. This claim is load-bearing for the paper's central description of successful coordination but remains unsupported.

Authors: We agree that the manuscript provides no quantitative metrics (such as completion rates or error counts) to substantiate the effectiveness of the workflow. The paper's scope is limited to describing the implementation of Test Cases, Cycles, the review workflow, and the JSON-Scheduler integration. The phrasing in the abstract overstates the claim by implying demonstrated success. We will revise the abstract to remove this assertion and instead neutrally describe the workflow components and their intended purpose. revision: yes

Circularity Check

0 steps flagged

Purely descriptive report with no derivations or predictions

The manuscript is a procedural description of adopting and using Zephyr Scale for coordinating tests during observatory commissioning. It contains no equations, fitted parameters, predictions, mathematical derivations, or load-bearing self-citations. The workflow is presented as a defined process (test case creation, cycles, JSON integration) without any reduction to inputs by construction or self-referential justification. No patterns from the enumerated circularity kinds apply, as there is no derivation chain to inspect.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a descriptive operations report with no mathematical content, fitted parameters, background axioms, or postulated entities.

pith-pipeline@v0.9.1-grok · 5811 in / 1052 out tokens · 36733 ms · 2026-07-01T03:19:28.555368+00:00
