The FACTS of Technology-Assisted Sensitivity Review

Craig Macdonald; Graham McDonald; Iadh Ounis

arxiv: 1907.02956 · v1 · pith:3MHS6QBQnew · submitted 2019-07-05 · 💻 cs.CY · cs.IR

Pith reviewed 2026-05-25 01:57 UTC · model grok-4.3

classification 💻 cs.CY cs.IR

keywords: sensitivity review, freedom of information, technology-assisted review, fairness, accountability, confidentiality, transparency, safety

The pith

Technology is needed to assist human sensitivity reviewers for born-digital government documents, but must address issues of fairness, accountability, confidentiality, transparency and safety.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the adoption of born-digital documents such as email makes purely manual sensitivity review impractical under Freedom of Information laws, creating a need for technology to assist reviewers in identifying sensitive information. It examines how issues of fairness, accountability, confidentiality, transparency and safety (FACTS) affect such technology-assisted processes. A reader would care because these technologies must balance public access to information with protection of sensitive data without introducing new problems. The authors also highlight important areas for future research on applying these principles.

Core claim

With the adoption of born-digital documents, human-only sensitivity review is not practical and there is a need for new technologies to assist human sensitivity reviewers; issues of fairness, accountability, confidentiality, transparency and safety (FACTS) impact technology-assisted sensitivity review.

What carries the argument

The FACTS principles (fairness, accountability, confidentiality, transparency, and safety) and how they apply to technology-assisted sensitivity review of government documents.

If this is right

Technology-assisted systems must ensure fairness in how sensitive information is detected across different types of documents and content.
Accountability requires that the use of technology in review processes can be explained and justified to the public.
Confidentiality must be preserved when technology processes potentially sensitive government records.
Transparency in the technology's decision-making is necessary for public trust in the review process.
Safety measures are needed to prevent technology from exposing or mishandling sensitive information.
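
The fairness point above can be made concrete with a small measurement. The following is a minimal sketch, not a method from the paper: it assumes a hypothetical sensitivity classifier whose predictions are already available, and it compares recall on truly sensitive documents across document types. The function names, the group labels ("email", "memo") and the toy records are all illustrative assumptions.

```python
from collections import defaultdict


def recall_by_group(records):
    """Return {group: recall} for the sensitive class.

    Each record is a tuple (group, is_sensitive, predicted_sensitive).
    Groups that contain no truly sensitive documents are omitted,
    because recall is undefined for them.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, is_sensitive, predicted in records:
        if is_sensitive:
            totals[group] += 1
            if predicted:
                hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def recall_gap(records):
    """Difference between the best and worst per-group recall."""
    recalls = recall_by_group(records)
    return max(recalls.values()) - min(recalls.values())


# Toy, made-up labels purely for illustration (not data from the paper).
toy = [
    ("email", True, True),
    ("email", True, True),
    ("email", True, False),
    ("email", False, False),
    ("memo", True, True),
    ("memo", True, False),
    ("memo", True, False),
    ("memo", False, True),
]

per_group = recall_by_group(toy)  # recall on sensitive documents per type
gap = recall_gap(toy)             # a large gap suggests uneven protection
print(per_group, gap)
```

A large gap would indicate that sensitive content in one document type is missed more often than in another. Recall is only one possible choice of measure here; which measure is appropriate for sensitivity review is an open question that this sketch does not settle.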

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Without addressing FACTS, technology assistance could lead to inconsistent protection of sensitive information or reduced public confidence in government transparency.
Similar FACTS considerations may apply to technology-assisted review in other regulated domains such as corporate compliance or healthcare records.
Future tools could integrate FACTS checks directly into document management systems to streamline the process.

Load-bearing premise

The assumption that human-only sensitivity review of born-digital documents like email is not practical due to the volume and nature of such records.

What would settle it

Evidence that human reviewers can practically handle the volume of born-digital government documents without technological assistance, or a successful manual review process that scales with digital records.

Original abstract

At least ninety countries implement Freedom of Information laws that state that government documents must be made freely available, or opened, to the public. However, many government documents contain sensitive information, such as personal or confidential information. Therefore, all government documents that are opened to the public must first be reviewed to identify, and protect, any sensitive information. Historically, sensitivity review has been a completely manual process. However, with the adoption of born-digital documents, such as e-mail, human-only sensitivity review is not practical and there is a need for new technologies to assist human sensitivity reviewers. In this paper, we discuss how issues of fairness, accountability, confidentiality, transparency and safety (FACTS) impact technology-assisted sensitivity review. Moreover, we outline some important areas of future FACTS research that will need to be addressed within technology-assisted sensitivity review.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Position paper that flags standard AI ethics issues for tech-assisted sensitivity review of government documents but offers no new methods, data, or evidence.

The paper argues that born-digital documents like email make manual sensitivity review for freedom of information requests impractical, so technology assistance is needed, and that fairness, accountability, confidentiality, transparency, and safety (FACTS) must be considered in any such tools. It closes by listing high-level future research areas. That is the entire contribution.

The authors are correct that the volume and speed of digital records create real pressure on review processes, and mapping the usual ethics checklist onto this domain is a reasonable step. The framing helps connect abstract principles to a concrete public-sector workflow. Beyond that, the paper stays at the level of discussion. It provides no numbers on review backlogs, no examples of current failures, no proposed classifier or interface, and no evaluation of how FACTS issues would actually arise in practice. The central practicality claim is asserted rather than examined. This is a position paper, not a research contribution with verifiable results.

It would be of interest to people working on AI applications in archives or digital government who are already thinking about ethics and want a short orientation to the sensitivity-review setting. Most readers looking for technical substance or empirical grounding will find little to use. I would not bring it to a reading group focused on methods or results. It is not the kind of work I would cite. A serious editor should desk-reject rather than send it for peer review, as it lacks the grounding or novelty that justifies referee time.

Referee Report

1 major / 2 minor

Summary. The manuscript is a position paper claiming that the adoption of born-digital documents (e.g., e-mail) renders human-only sensitivity review of government documents under Freedom of Information laws impractical, thereby necessitating technology-assisted approaches. It argues that issues of fairness, accountability, confidentiality, transparency, and safety (FACTS) must be considered in such technologies and outlines key areas for future FACTS-related research.

Significance. If the motivating premise holds, the paper usefully frames an interdisciplinary challenge at the intersection of public records law and ethical AI deployment. It could serve as a conceptual starting point for research on assistive tools that respect legal disclosure requirements while addressing ethical risks, though its value depends on acceptance of the practicality claim as scene-setting rather than a tested assertion.

major comments (1)

[Abstract] The assertion that 'human-only sensitivity review is not practical' for born-digital documents is stated as fact without supporting data, references, statistics on document volumes, or case examples. This premise is load-bearing for the motivation of technology assistance and the entire FACTS discussion that follows.

minor comments (2)

[Abstract] The statistic 'at least ninety countries' would benefit from a supporting citation or reference.
The discussion of FACTS components and future research areas could be made more concrete with brief examples of how each FACTS issue might manifest in sensitivity review tools.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. We address the single major comment below.

Referee: [Abstract] The assertion that 'human-only sensitivity review is not practical' for born-digital documents is stated as fact without supporting data, references, statistics on document volumes, or case examples. This premise is load-bearing for the motivation of technology assistance and the entire FACTS discussion that follows.

Authors: We accept the point. The abstract presents the impracticality of purely manual review for born-digital records as a premise without citations or quantitative support. Although the body of the paper frames this as a known consequence of the shift to electronic records under FOI regimes, the abstract itself does not reference the relevant government reports or archival literature on review backlogs and volume growth. We will revise the abstract to qualify the statement as a motivating premise drawn from the digital-records literature and add one or two supporting references. This change preserves the position-paper character while directly addressing the concern.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

This is a position paper whose content is entirely discursive and motivational. It states that born-digital documents make purely manual sensitivity review impractical and therefore calls for technology assistance whose FACTS implications should be studied. No equations, fitted parameters, models, predictions, or derivations appear anywhere in the text. Consequently there are no load-bearing steps that could reduce by construction to the paper's own inputs, no self-citation chains that function as unverified uniqueness theorems, and no renaming of known results. The practicality premise functions only as scene-setting, not as a derived claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a discussion paper on ethical considerations in a sociotechnical system; no mathematical axioms, free parameters, or invented entities are introduced.
