arxiv: 1907.02956 · v1 · pith:3MHS6QBQnew · submitted 2019-07-05 · 💻 cs.CY · cs.IR

The FACTS of Technology-Assisted Sensitivity Review

Pith reviewed 2026-05-25 01:57 UTC · model grok-4.3

classification 💻 cs.CY cs.IR
keywords sensitivity review · freedom of information · technology-assisted review · fairness · accountability · confidentiality · transparency · safety

The pith

Technology is needed to assist human sensitivity reviewers of born-digital government documents, but it must address issues of fairness, accountability, confidentiality, transparency and safety (FACTS).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the adoption of born-digital documents such as email makes purely manual sensitivity review impractical under Freedom of Information laws, creating a need for technology to assist reviewers in identifying sensitive information. It examines the impact of FACTS issues on such technology-assisted processes. A reader would care because these technologies must balance public access to information with protection of sensitive data without introducing new problems. The authors also highlight important areas for future research on applying these principles.

Core claim

With the adoption of born-digital documents, human-only sensitivity review is not practical and there is a need for new technologies to assist human sensitivity reviewers; issues of fairness, accountability, confidentiality, transparency and safety (FACTS) impact technology-assisted sensitivity review.

What carries the argument

The FACTS principles (fairness, accountability, confidentiality, transparency, and safety) and how they apply to technology-assisted sensitivity review of government documents.

If this is right

  • Technology-assisted systems must ensure fairness in how sensitive information is detected across different types of documents and content.
  • Accountability requires that the use of technology in review processes can be explained and justified to the public.
  • Confidentiality must be preserved when technology processes potentially sensitive government records.
  • Transparency in the technology's decision-making is necessary for public trust in the review process.
  • Safety measures are needed to prevent technology from exposing or mishandling sensitive information.
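
As an illustration of the fairness point only, and not something taken from the paper, one simple check is to compare how often a sensitivity classifier catches truly sensitive documents within each document type. The sketch below assumes a hypothetical record format of `(doc_type, is_sensitive, flagged)` and uses invented toy data; the function name `recall_by_document_type` is likewise an assumption made for this example.

```python
from collections import defaultdict


def recall_by_document_type(records):
    """Return recall per document type.

    records: iterable of (doc_type, is_sensitive, flagged) tuples, where
    is_sensitive is the reviewer's ground truth and flagged is the
    classifier's prediction. This record format is hypothetical.
    """
    hits = defaultdict(int)    # sensitive documents the classifier flagged
    totals = defaultdict(int)  # all sensitive documents of that type
    for doc_type, is_sensitive, flagged in records:
        if is_sensitive:
            totals[doc_type] += 1
            if flagged:
                hits[doc_type] += 1
    return {doc_type: hits[doc_type] / totals[doc_type] for doc_type in totals}


# Invented toy data, for illustration only.
records = [
    ("email", True, True),
    ("email", True, False),
    ("email", False, False),
    ("report", True, True),
    ("report", True, True),
    ("report", False, True),
]

recalls = recall_by_document_type(records)
# Difference between the best- and worst-served document types.
gap = max(recalls.values()) - min(recalls.values())
print(recalls, gap)
```

A large gap would suggest that sensitive content is protected less reliably in some document types than in others. Recall is only one possible measure; which metric and which grouping are appropriate would depend on the review setting.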

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Without addressing FACTS, technology assistance could lead to inconsistent protection of sensitive information or reduced public confidence in government transparency.
  • Similar FACTS considerations may apply to technology-assisted review in other regulated domains such as corporate compliance or healthcare records.
  • Future tools could integrate FACTS checks directly into document management systems to streamline the process.

Load-bearing premise

The assumption that human-only sensitivity review of born-digital documents like email is not practical due to the volume and nature of such records.

What would settle it

Evidence that human reviewers can practically handle the volume of born-digital government documents without technological assistance, or a successful manual review process that scales with digital records.

Figures

Figures reproduced from arXiv: 1907.02956 by Craig Macdonald, Graham McDonald, Iadh Ounis.

Figure 1. Technology-assisted sensitivity review: input, process and output.

Original abstract

At least ninety countries implement Freedom of Information laws that state that government documents must be made freely available, or opened, to the public. However, many government documents contain sensitive information, such as personal or confidential information. Therefore, all government documents that are opened to the public must first be reviewed to identify, and protect, any sensitive information. Historically, sensitivity review has been a completely manual process. However, with the adoption of born-digital documents, such as e-mail, human-only sensitivity review is not practical and there is a need for new technologies to assist human sensitivity reviewers. In this paper, we discuss how issues of fairness, accountability, confidentiality, transparency and safety (FACTS) impact technology-assisted sensitivity review. Moreover, we outline some important areas of future FACTS research that will need to be addressed within technology-assisted sensitivity review.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript is a position paper claiming that the adoption of born-digital documents (e.g., e-mail) renders human-only sensitivity review of government documents under Freedom of Information laws impractical, thereby necessitating technology-assisted approaches. It argues that issues of fairness, accountability, confidentiality, transparency, and safety (FACTS) must be considered in such technologies and outlines key areas for future FACTS-related research.

Significance. If the motivating premise holds, the paper usefully frames an interdisciplinary challenge at the intersection of public records law and ethical AI deployment. It could serve as a conceptual starting point for research on assistive tools that respect legal disclosure requirements while addressing ethical risks, though its value depends on acceptance of the practicality claim as scene-setting rather than a tested assertion.

major comments (1)
  1. [Abstract] The assertion that 'human-only sensitivity review is not practical' for born-digital documents is stated as fact without supporting data, references, statistics on document volumes, or case examples. This premise is load-bearing for the motivation of technology assistance and the entire FACTS discussion that follows.
minor comments (2)
  1. [Abstract] The statistic 'at least ninety countries' would benefit from a supporting citation or reference.
  2. The discussion of FACTS components and future research areas could be made more concrete with brief examples of how each FACTS issue might manifest in sensitivity review tools.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract] The assertion that 'human-only sensitivity review is not practical' for born-digital documents is stated as fact without supporting data, references, statistics on document volumes, or case examples. This premise is load-bearing for the motivation of technology assistance and the entire FACTS discussion that follows.

    Authors: We accept the point. The abstract presents the impracticality of purely manual review for born-digital records as a premise without citations or quantitative support. Although the body of the paper frames this as a known consequence of the shift to electronic records under FOI regimes, the abstract itself does not reference the relevant government reports or archival literature on review backlogs and volume growth. We will revise the abstract to qualify the statement as a motivating premise drawn from the digital-records literature and add one or two supporting references. This change preserves the position-paper character while directly addressing the concern.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

This is a position paper whose content is entirely discursive and motivational. It states that born-digital documents make purely manual sensitivity review impractical and therefore calls for technology assistance whose FACTS implications should be studied. No equations, fitted parameters, models, predictions, or derivations appear anywhere in the text. Consequently there are no load-bearing steps that could reduce by construction to the paper's own inputs, no self-citation chains that function as unverified uniqueness theorems, and no renaming of known results. The practicality premise functions only as scene-setting, not as a derived claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a discussion paper on ethical considerations in a sociotechnical system; no mathematical axioms, free parameters, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5667 in / 1062 out tokens · 23679 ms · 2026-05-25T01:57:39.352536+00:00 · methodology
