pith.

arxiv: 1907.03688 · v1 · pith:6PEEUKGK · submitted 2019-07-08 · 💻 cs.DC

Enabling Microsoft OneDrive Integration with HTCondor

Pith reviewed 2026-05-25 00:48 UTC · model grok-4.3

classification 💻 cs.DC
keywords HTCondor · OneDrive · OAuth credentials · data distribution · distributed computing · cyberinfrastructure · credential management · Open Science Grid

The pith

HTCondor now acquires, renews, and transfers OneDrive OAuth credentials automatically so users skip manual credential handling for distributed jobs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that recent HTCondor changes let the scheduler manage OAuth tokens for Microsoft OneDrive on the user's behalf, removing the need to copy credentials with each job. It describes OneDrive features such as automatic desktop synchronization and contrasts the approach with data distribution methods used on the Open Science Grid. A sympathetic reader would care because simpler credential handling could reduce the steps researchers must take to make data available to computing resources. The ease-of-use case is argued qualitatively. The only measurements are download-speed comparisons against HTTP and StashCache (stashcp) transfers at several sites (Figures 8-11).

Core claim

By leveraging HTCondor's scheduler to handle acquisition, renewal, and secure transfer of OAuth credentials, OneDrive can serve as an easy-to-use data distribution method for distributed computing. Users no longer perform multiple manual steps with unfamiliar tools; the system takes care of access on their behalf. The paper presents this integration through descriptions of OneDrive capabilities and a comparison to existing national cyberinfrastructure practices.

What carries the argument

HTCondor scheduler-managed OAuth credential handling that acquires, renews, and transfers tokens without user intervention for OneDrive access.

If this is right

  • Users avoid copying credentials along with job submissions.
  • The scheduler handles credential renewal and secure transfer automatically.
  • OneDrive desktop clients can provide automatic synchronization to computing resources.
  • Researchers spend less time learning specialized cyberinfrastructure tools for data access.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same credential automation could apply to other OAuth-based storage services.
  • Wider adoption might increase use of distributed computing among researchers who previously found setup too cumbersome.
  • National cyberinfrastructure operators could evaluate similar integrations for additional cloud providers.

Load-bearing premise

That a description of OneDrive features and a comparison to other data distribution methods suffices to establish improved ease of use.

What would settle it

A timed user study comparing setup duration for data distribution with and without the OneDrive-HTCondor integration.

Figures

Figures reproduced from arXiv: 1907.03688 by Derek Weitzel.

Figure 1. Windows OneDrive Folder View. Users are familiar with accessing and editing files from within their own desktop's folder view. Other data distribution techniques require copying the file to separate storage services with unfamiliar tools such as scp or rsync [15]. Additionally, OneDrive is provided with an Office 365 account, to which many universities subscribe. Until recently, users of OneDrive or…
Figure 2. The job and token flow for a submission to HTCondor. The user submits their first job with condor_submit, whose output displays a URL that the user must visit to acquire credentials. Once the credentials are acquired, the user resubmits the job with condor_submit. Both the job and the credentials are transferred to the execution host. The credentials are transferred along with …
Figure 5. HTCondor CredMon Web Page. From Section 3.3 (Client): A new client was written to use the OAuth tokens received from Azure [16]. Other OneDrive clients also use OAuth tokens, but they request the token on their own. The client that works with HTCondor must accept an existing token and must not require re-authorization when running on remote hosts. Additionally, the clients cannot parse the token format that HTCondor creates…
Figure 3. Example OAuth HTCondor Submit File.
Figure 6. Client command line. To download a file, the client first retrieves the access token. The token is stored on the remote resource in the directory pointed to by the _CONDOR_CREDS environment variable. The token file is named onedrive.use and contains a JSON data structure with information about the token, as well as the token itself. The client needs only the access token. The client then creates…
Figure 4. Credential Prompt. The user copies the URL from the prompt and visits the website, then clicks the "login" button, which redirects the user to the OneDrive login screen. The screen is shown in …
Figure 7. HTTP Headers for Azure Graph OneDrive Request.
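Figure 7's header listing is not reproduced here. The Go sketch below builds a comparable request. It sends a GET to the Microsoft Graph /me/drive/root:/{path}:/content endpoint, which returns a file's contents addressed by path, with the access token in an Authorization: Bearer header. Both are standard Microsoft Graph conventions. Whether the paper's client uses exactly this endpoint and header set is an assumption, and newDownloadRequest is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// graphBase is the Microsoft Graph v1.0 endpoint.
const graphBase = "https://graph.microsoft.com/v1.0"

// newDownloadRequest builds a GET for a OneDrive file's content, addressed by
// its path relative to the drive root and authorized with a bearer token.
func newDownloadRequest(accessToken, drivePath string) (*http.Request, error) {
	// Escape each path segment separately so "/" separators are preserved.
	segs := strings.Split(strings.Trim(drivePath, "/"), "/")
	for i, s := range segs {
		segs[i] = url.PathEscape(s)
	}
	u := graphBase + "/me/drive/root:/" + strings.Join(segs, "/") + ":/content"
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+accessToken)
	return req, nil
}

func main() {
	req, err := newDownloadRequest("ACCESS_TOKEN", "/data/input.csv")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println("Authorization:", req.Header.Get("Authorization"))
}
```

Executing the request with http.DefaultClient.Do and copying resp.Body to a local file would complete the download. That step needs network access and a valid token, so it is left out.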
Figure 8. Syracuse Download Comparison. Transfer speed (Mbps) versus file size (23, 170, 468, 493, and 2335 MB) for HTTP, OneDrive, and stashcp transfers, each shown uncached and cached.
Figure 9. Colorado Download Comparison. …transfers between research institutions. The caches and the origins of data for the HTTP and StashCache data are also connected to the research networks. In contrast, …
Figure 10. Bellarmine Download Speed Comparison. Transfer speed (Mbps) versus file size (23, 170, 468, 493, and 2335 MB) for HTTP, OneDrive, and stashcp transfers, each shown uncached and cached.
Figure 11. University of Chicago Download Speed Comparison.
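Figures 8 to 11 plot transfer speed in megabits per second against file size in megabytes. Converting between the two axes therefore multiplies by 8 bits per byte. The Go helpers below make that arithmetic explicit. Decimal units (1 MB = 10^6 bytes) are an assumption. The 40-second timing in main is hypothetical, not a measurement from the paper.

```go
package main

import "fmt"

// mbps converts a transfer of sizeMB megabytes completed in the given number
// of seconds into megabits per second (8 bits per byte, decimal units assumed).
func mbps(sizeMB, seconds float64) float64 {
	return sizeMB * 8 / seconds
}

// transferSeconds is the inverse: how long sizeMB takes at a steady rate in Mbps.
func transferSeconds(sizeMB, rateMbps float64) float64 {
	return sizeMB * 8 / rateMbps
}

func main() {
	// Hypothetical timing, not a measurement from the paper: the largest
	// plotted file size (2335 MB) downloaded in 40 seconds.
	fmt.Printf("%.0f Mbps\n", mbps(2335, 40)) // prints "467 Mbps"
	fmt.Printf("%.0f s\n", transferSeconds(2335, 467))
}
```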
Original abstract

Accessing data from distributed computing is essential in many workflows, but can be complicated for users of cyberinfrastructure. They must perform multiple steps to make data available to distributed computing using unfamiliar tools. Further, most research on data distribution has focused on the efficiency of providing data to computing resources rather than considering the ease of use for distributing data. Creating an easy to use data distribution method can reduce the time researchers spend learning cyberinfrastructure and increase its usefulness. Microsoft OneDrive is a online storage solution providing both file storage and sharing. OneDrive provides many different clients to access data stored in the service. It provides many features that users of cyberinfrastructure could find useful such as automatic synchronization with desktop clients. A barrier to using services such as OneDrive is the credential management necessary to access the service. Recent innovations in HTCondor have allowed the management of OAuth credentials to be handled by the scheduler on the user's behalf. The user no longer has to copy credentials along with the job, HTCondor will handle the acquisition, renewal, and secure transfer of credentials on the user's behalf. In this paper, I will focus on providing an easy to use data distribution method utilizing Microsoft OneDrive. Measuring ease of use is difficult, therefore I will will describe the features and advantages of using OneDrive. Additionally, I will compare it to measurements of data distribution methods currently used on a national cyberinfrastructure, the Open Science Grid.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper describes an integration between Microsoft OneDrive and HTCondor that leverages recent HTCondor innovations in OAuth credential management. The scheduler handles credential acquisition, renewal, and secure transfer on the user's behalf, eliminating the need for users to copy credentials with jobs. OneDrive features such as automatic desktop synchronization are presented as advantages for cyberinfrastructure users. The central claim is that this yields an easy-to-use data distribution method that reduces researcher time on cyberinfrastructure; the paper explicitly notes the difficulty of measuring ease of use and therefore limits itself to feature descriptions plus a qualitative comparison to data distribution methods on the Open Science Grid.

Significance. If the integration demonstrably simplifies access, it could provide a practical systems contribution by bridging familiar consumer cloud storage with HTC workflows, reducing the learning curve for distributed computing. The description of HTCondor's credential-handling mechanism is a clear, useful systems detail. However, the complete absence of any empirical validation, timing data, or user studies means the significance of the usability claim cannot be assessed from the manuscript.

major comments (2)
  1. [Abstract] Abstract: The central claim that the integration provides an 'easy to use data distribution method' that reduces researcher time is unsupported by any quantitative evidence. The manuscript states that 'Measuring ease of use is difficult' and therefore offers only feature descriptions and a qualitative comparison; this leaves the primary motivation unsubstantiated.
  2. [Abstract] Abstract: The text promises to 'compare it to measurements of data distribution methods currently used on a national cyberinfrastructure, the Open Science Grid,' yet no measurements, tables, step counts, or analysis of OSG methods appear in the manuscript.
minor comments (2)
  1. [Abstract] Typo: 'I will will describe' should read 'I will describe'.
  2. [Abstract] Grammar: 'Microsoft OneDrive is a online storage solution' should be 'an online storage solution'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review and comments. We address the major comments point by point below, with proposed revisions to the abstract to ensure consistency with the manuscript content.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the integration provides an 'easy to use data distribution method' that reduces researcher time is unsupported by any quantitative evidence. The manuscript states that 'Measuring ease of use is difficult' and therefore offers only feature descriptions and a qualitative comparison; this leaves the primary motivation unsubstantiated.

    Authors: We agree that the manuscript provides no quantitative evidence supporting the ease-of-use claim, consistent with the explicit statement that measuring ease of use is difficult. The contribution is a description of the HTCondor-OneDrive integration, its credential management features, and a qualitative comparison to existing practices. We will revise the abstract to remove or qualify the unsubstantiated claim about reducing researcher time and instead focus on the technical integration and features provided. revision: yes

  2. Referee: [Abstract] Abstract: The text promises to 'compare it to measurements of data distribution methods currently used on a national cyberinfrastructure, the Open Science Grid,' yet no measurements, tables, step counts, or analysis of OSG methods appear in the manuscript.

    Authors: The abstract does promise a comparison involving measurements from the Open Science Grid. The manuscript instead delivers a qualitative discussion of differences in credential handling and data distribution practices. This is an inconsistency between the abstract and the body. We will revise the abstract to state that the comparison is qualitative, based on feature descriptions and current OSG practices, without reference to quantitative measurements. revision: yes

Circularity Check

0 steps flagged

No circularity; purely descriptive systems report with no derivations or predictions

full rationale

The paper contains no equations, predictions, fitted parameters, or derivation chains of any kind. It is a feature-description document that acknowledges the difficulty of measuring ease of use and therefore restricts itself to qualitative description of OneDrive and HTCondor capabilities plus a non-quantitative comparison to OSG methods. No self-citations, ansatzes, or uniqueness claims appear, and the central statements are not shown to reduce to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on the domain assumption that HTCondor can reliably manage OAuth credentials without introducing new parameters, axioms, or entities; no free parameters or invented entities are defined.

axioms (1)
  • domain assumption HTCondor scheduler can acquire, renew, and securely transfer OAuth credentials on behalf of users
    Invoked when describing how credential management is handled without user intervention

pith-pipeline@v0.9.0 · 5781 in / 1207 out tokens · 38020 ms · 2026-05-25T00:48:11.143437+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

  [1] Georges Aad, JM Butterworth, J Thion, U Bratzler, PN Ratoff, RB Nickerson, JM Seixas, I Grabowska-Bold, F Meisel, S Lokwitz, et al. 2008. The ATLAS experiment at the CERN Large Hadron Collider. Journal of Instrumentation 3 (2008), S08003.
  [2] Barry Blumenfeld, David Dykstra, Lee Lueking, and Eric Wicklund. 2008. CMS conditions data access using FroNTier. In Journal of Physics: Conference Series, Vol. 119. IOP Publishing, 072007.
  [3] Box. 2019. Box. https://www.box.com/
  [4] Serguei Chatrchyan, EA de Wolf, et al. 2008. The CMS experiment at the CERN LHC. Journal of Instrumentation 3 (2008), S08004.
  [5] Dropbox. 2019. Dropbox. https://www.dropbox.com/
  [6] Ian Foster. 2011. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing 15, 3 (2011), 70–73.
  [7] James Frey. 2002. Condor DAGMan: Handling inter-job dependencies.
  [8] Go. 2019. The Go Programming Language. https://golang.org/
  [9] Internet2. 2019. Internet2. https://www.internet2.edu/
  [10] Microsoft. 2019. Microsoft Azure Portal. https://portal.azure.com
  [11] Microsoft. 2019. Microsoft Graph. https://developer.microsoft.com/en-us/graph
  [12] Microsoft. 2019. Microsoft OneDrive. https://onedrive.live.com/about/en-us/
  [13] Ruth Pordes, Don Petravick, Bill Kramer, Doug Olson, Miron Livny, Alain Roy, Paul Avery, Kent Blackburn, Torre Wenaus, Frank Würthwein, et al. 2007. The Open Science Grid. In Journal of Physics: Conference Series, Vol. 78. IOP Publishing, 012057.
  [14] Douglas Thain, Todd Tannenbaum, and Miron Livny. 2005. Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice and Experience 17, 2-4 (2005), 323–356.
  [15] Andrew Tridgell, Paul Mackerras, et al. 1996. The rsync algorithm. (1996).
  [16] Derek Weitzel. 2019. djw8605/onedrive-oauth: First release of OneDrive oauth. https://doi.org/10.5281/zenodo.3265184
  [17] Derek Weitzel, Marian Zvada, Ilija Vukotic, Rob Gardner, Brian Bockelman, Mats Rynge, Edgar Fajardo Hernandez, Brian Lin, and Matyas Selmeci. 2019. StashCache: A Distributed Caching Federation for the Open Science Grid. Proceedings of the Practice and Experience in Advanced Research Computing (2019).
  [18] Alex Withers, Brian Bockelman, Derek Weitzel, Duncan Brown, Jeff Gaynor, Jim Basney, Todd Tannenbaum, and Zach Miller. 2018. SciTokens: Capability-Based Secure Access to Remote Scientific Data. In Proceedings of the Practice and Experience in Advanced Research Computing. ACM, 24.