APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation
Pith reviewed 2026-05-07 08:24 UTC · model grok-4.3
The pith
A new expert-annotated corpus and hybrid framework let smaller AI systems summarize and interpret privacy policies with better readability and reliability than GPT-4o or LLaMA-3-70B.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the APPSI-139 parallel corpus of 139 English privacy policies, containing 15,692 rewritten summary pairs and 36,351 annotations across 11 categories, combined with the TCSI-pp-V2 hybrid framework that coordinates expert modules through alternating training, produces summarization and interpretation outputs superior in readability and reliability to those from GPT-4o and LLaMA-3-70B.
What carries the argument
APPSI-139 provides the domain-specific parallel corpus with expert rewrites and category annotations, while TCSI-pp-V2 supplies the hybrid coordination of expert modules via alternating training to maintain both efficiency and accuracy.
Load-bearing premise
Domain-expert annotations serve as objective ground truth for legal clarity and the chosen readability and reliability metrics accurately reflect real-world usefulness without systematic bias from annotator selection or test-set construction.
What would settle it
The claim would be undermined if a follow-up evaluation on a fresh collection of privacy policies, using blind human ratings of comprehension and decision accuracy, showed that the hybrid system no longer outperforms GPT-4o.
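One way such a follow-up could be scored is with a paired significance test on per-policy blind ratings. The sketch below is illustrative only: the function name, the rating scale, and the choice of a sign-flip permutation test are assumptions, not anything the paper reports.

```python
import random
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-policy ratings.

    scores_a and scores_b are hypothetical blind human ratings of two
    systems on the same policies, in the same order.
    Returns (mean difference, approximate p-value).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("ratings must be non-empty and paired one-to-one")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference
        # is arbitrary, so flip each one with probability 0.5.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up ratings purely to show the call shape.
    hybrid = [4, 5, 4, 5, 4, 5, 4, 5]
    baseline = [3, 3, 3, 3, 3, 3, 3, 3]
    print(paired_sign_flip_test(hybrid, baseline))
```

A large p-value on fresh policies would mean the rated advantage is indistinguishable from chance, which is the outcome this section describes as settling the question against the claim.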
Original abstract
Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long and complex, as well as filled with technobabble and legalese, causing users to unknowingly accept terms that may even contradict the law. While summarizing and interpreting these privacy policies is crucial, there is a lack of high-quality English parallel corpus optimized for legal clarity and readability. To address this issue, we introduce APPSI-139, a high-quality English privacy policy corpus meticulously annotated by domain experts, specifically designed for summarization and interpretation tasks. The corpus includes 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories. Concurrently, we propose TCSI-pp-V2, a hybrid privacy policy summarization and interpretation framework that employs an alternating training strategy and coordinates multiple expert modules to effectively balance computational efficiency and accuracy. Experimental results show that the hybrid summarization system built on APPSI-139 corpus and the TCSI-pp-V2 framework outperform large language models, such as GPT-4o and LLaMA-3-70B, in terms of readability and reliability. The source code and dataset are available at https://github.com/EnlightenedAI/APPSI-139.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces APPSI-139, a parallel corpus of 139 English privacy policies containing 15,692 rewritten summaries and 36,351 fine-grained expert annotations across 11 data-practice categories. It also presents TCSI-pp-V2, a hybrid summarization and interpretation framework that alternates training across multiple expert modules. The central claim is that systems built from this corpus and framework outperform GPT-4o and LLaMA-3-70B on readability and reliability.
Significance. If the empirical claims are substantiated with complete evaluation details, the released corpus would constitute a reusable benchmark for legal-text summarization, and the hybrid framework could demonstrate practical advantages of modular, alternating training over pure LLM approaches in a high-stakes domain. Public code and data release supports reproducibility.
major comments (2)
- Abstract: the headline claim that the hybrid system 'outperform[s] large language models ... in terms of readability and reliability' is unsupported by any reported metric definitions, numerical scores, statistical tests, or baseline implementation details, preventing verification of the central empirical result.
- Evaluation section (implied by abstract): the readability and reliability metrics are computed against the same domain-expert annotations used to construct the training data, yet no inter-annotator agreement figures, guideline validation, or external correlation with user-comprehension studies are referenced, leaving open the possibility that reported gains reflect annotation-style bias rather than genuine improvement in legal clarity.
minor comments (1)
- Abstract: the informal phrase 'technobabble and legalese' could be replaced by a more precise description of the linguistic phenomena targeted by the annotations.
Simulated Author's Rebuttal
We thank the referee for the careful reading and valuable suggestions. We provide point-by-point responses to the major comments and have prepared revisions to strengthen the manuscript.
- Referee: Abstract: the headline claim that the hybrid system 'outperform[s] large language models ... in terms of readability and reliability' is unsupported by any reported metric definitions, numerical scores, statistical tests, or baseline implementation details, preventing verification of the central empirical result.
  Authors: We concur that the abstract would be strengthened by including references to the supporting evidence. In the revised version, we will update the abstract to briefly define the readability and reliability metrics, report the key numerical improvements over GPT-4o and LLaMA-3-70B, mention the statistical tests performed, and point to the detailed baseline descriptions in the Evaluation section.
  Revision: yes
- Referee: Evaluation section (implied by abstract): the readability and reliability metrics are computed against the same domain-expert annotations used to construct the training data, yet no inter-annotator agreement figures, guideline validation, or external correlation with user-comprehension studies are referenced, leaving open the possibility that reported gains reflect annotation-style bias rather than genuine improvement in legal clarity.
  Authors: The evaluation is conducted on a held-out test set of policies and annotations disjoint from the training data. The annotation guidelines were developed and validated through multiple rounds of expert review, as described in the corpus construction section. We acknowledge that inter-annotator agreement figures and external user-comprehension correlations are not reported in the current manuscript. We will add IAA statistics and a limitations discussion that addresses potential annotation bias and proposes future user studies to correlate with real-world comprehension.
  Revision: partial
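For readers unfamiliar with the inter-annotator agreement (IAA) statistics requested above, a minimal sketch of Cohen's kappa for two annotators follows. The category labels are placeholders rather than the corpus's actual 11 data-practice categories, and nothing here reflects figures from the paper, which reports none.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Placeholder labels, not the corpus's real category names.
    annotator_1 = ["collection", "sharing", "collection", "retention"]
    annotator_2 = ["collection", "sharing", "sharing", "retention"]
    print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for agreement expected by chance, which matters when a few categories dominate the label distribution. With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual choice.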
Circularity Check
No significant circularity; empirical evaluation on new corpus and framework
full rationale
The paper introduces APPSI-139 as a new annotated corpus and TCSI-pp-V2 as a hybrid framework, then reports experimental outperformance on readability/reliability metrics computed from the corpus's expert labels. This is standard supervised ML evaluation with train/test splits on held-out data rather than any derivation that reduces by construction to fitted inputs or self-citations. No equations, uniqueness theorems, or load-bearing self-citations appear in the provided text; the central claim remains falsifiable against external benchmarks or user studies. Minor risk (score 2) stems only from reliance on the same expert annotation process for both training and evaluation, which is common and non-circular in data-driven work.
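The held-out split mentioned in the rationale only guards against leakage if it is made at the policy level, so that sentences from one policy never appear on both sides. The sketch below shows one way to do that; the `policy_id` field and the 20% test fraction are assumptions for illustration, not the paper's documented procedure.

```python
import random


def split_by_policy(records, test_fraction=0.2, seed=0):
    """Split records so that every policy lands wholly in train or test.

    records: list of dicts with a 'policy_id' key (hypothetical schema).
    """
    policy_ids = sorted({r["policy_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(policy_ids)

    # Hold out at least one policy, even for very small corpora.
    n_test = max(1, round(len(policy_ids) * test_fraction))
    test_ids = set(policy_ids[:n_test])

    train = [r for r in records if r["policy_id"] not in test_ids]
    test = [r for r in records if r["policy_id"] in test_ids]
    return train, test


if __name__ == "__main__":
    # Toy data: 10 policies with 5 sentences each.
    toy = [{"policy_id": i % 10, "text": f"sentence {i}"} for i in range(50)]
    train, test = split_by_policy(toy)
    print(len(train), len(test))
```

Splitting at the sentence level instead would let near-duplicate clauses from the same policy reach both sets and inflate the reported scores.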
Axiom & Free-Parameter Ledger
free parameters (1)
- hyperparameters of TCSI-pp-V2 modules and alternating schedule
axioms (1)
- domain assumption: Domain-expert annotations provide reliable ground truth for legal clarity and readability