QwenSafe: Multimodal Content Rating Description Identification via Preference-Aligned VLMs

Aruna Seneviratne; Dishanika Denipitiyage; Suranga Seneviratne

arxiv: 2605.20584 · v1 · pith:PBWPTGM6new · submitted 2026-05-20 · 💻 cs.CV

Pith reviewed 2026-05-21 06:06 UTC · model grok-4.3

classification 💻 cs.CV

keywords: content rating descriptors · vision-language models · multimodal classification · preference optimization · mobile app marketplaces · content moderation · supervised fine-tuning

The pith

QwenSafe outperforms existing vision-language models at classifying content rating descriptors by using preference alignment on multimodal app data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops QwenSafe to automatically identify standardized content rating descriptors required for mobile apps by processing both app metadata and screenshots together. The challenge is that these descriptors must accurately reflect potentially sensitive content, yet manual verification does not scale well for large marketplaces. The authors create a pipeline called metadata2CRD to build training data that links app materials to specific descriptor definitions and then apply supervised fine-tuning plus direct preference optimization to make the model prefer answers supported by evidence. Evaluation across twelve descriptors shows consistent gains in correctly spotting when a descriptor applies.

Core claim

By adapting Qwen3-VL-8B with data synthesized by the metadata2CRD pipeline and then applying direct preference optimization, the authors obtain QwenSafe, which performs binary classification of Apple content rating descriptors better than the base model and other leading vision-language models. The improvements are most visible in positive-class recall, reported as 111.8%, 36.1%, and 2.1% over Qwen3-VL, LLaVA-1.6, and Gemini-2.5-Flash, respectively. The paper takes this as evidence that aligning model predictions to descriptor-specific multimodal evidence improves automated content rating.
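To make the headline numbers concrete, here is a minimal sketch of how a relative gain in positive-class recall can be computed. It assumes the reported percentages are relative improvements, which a value above 100% implies, since recall itself cannot exceed 1. The confusion counts are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def recall(tp: int, fn: int) -> float:
    """Positive-class recall: true positives over all actual positives."""
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive samples")
    return tp / (tp + fn)


def relative_gain_pct(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (new - baseline) / baseline


# Hypothetical counts on 100 positive samples (NOT taken from the paper).
baseline_recall = recall(tp=34, fn=66)  # baseline finds 34 of 100 positives
model_recall = recall(tp=72, fn=28)     # fine-tuned model finds 72 of 100

gain = relative_gain_pct(model_recall, baseline_recall)
print(f"relative recall gain: {gain:.1f}%")
```

A gain of this size says as much about the baseline's low recall as about the new model's quality, which is why the absolute recall values and sample counts matter when reading such figures.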

What carries the argument

The metadata2CRD pipeline, which creates descriptor-aligned question-answer pairs, combined with direct preference optimization, which aligns the VLM's outputs to the visual and textual evidence for each content rating descriptor.
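As a rough illustration of what a descriptor-aligned question-answer pair might look like, the sketch below combines an app description, screenshot paths, and a descriptor definition into one record per descriptor. Every name here (AppRecord, QAPair, build_qa_pairs), the prompt wording, the placeholder definition text, and the choice to derive the yes/no label from developer-declared descriptors are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AppRecord:
    """Hypothetical container for one app's store metadata."""
    app_id: str
    description: str
    screenshots: list[str]
    declared_crds: set[str] = field(default_factory=set)


@dataclass
class QAPair:
    """One descriptor-specific training example (hypothetical schema)."""
    app_id: str
    descriptor: str
    question: str
    images: list[str]
    answer: str


def build_qa_pairs(app: AppRecord, definitions: dict[str, str]) -> list[QAPair]:
    """Create one binary question per descriptor for a single app."""
    pairs = []
    for descriptor, definition in definitions.items():
        question = (
            f"Descriptor: {descriptor}\n"
            f"Definition: {definition}\n"
            f"App description: {app.description}\n"
            "Based on the description and screenshots, does this "
            "descriptor apply? Answer yes or no and cite the evidence."
        )
        # Assumption: the label comes from what the developer declared.
        answer = "yes" if descriptor in app.declared_crds else "no"
        pairs.append(QAPair(app.app_id, descriptor, question,
                            list(app.screenshots), answer))
    return pairs


# Placeholder definitions; the official Apple wording is not reproduced here.
definitions = {
    "Mature/Suggestive Themes": "<official definition text>",
    "Contests": "<official definition text>",
}
app = AppRecord("app-001", "A dating simulator with flirtatious dialogue.",
                ["shot1.png", "shot2.png"], {"Mature/Suggestive Themes"})
pairs = build_qa_pairs(app, definitions)
```

Emitting one question per descriptor keeps each example tied to a single definition, which matches the binary per-descriptor evaluation described on this page.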

Load-bearing premise

The metadata2CRD pipeline produces high-quality pairs that represent real app content and let the model generalize without biases introduced by synthesis or image interpretation.

What would settle it

A large-scale evaluation against human expert labels on actual submitted apps, checking whether the reported recall improvements persist outside the synthetic training distribution.

Figures

Figures reproduced from arXiv: 2605.20584 by Aruna Seneviratne, Dishanika Denipitiyage, Suranga Seneviratne.

**Figure 1.** Comparison of age ratings across major authorities (USK, PEGI, ESRB, IARC, ACB, and Apple) app content rating in Australia, as this research was conducted by setting the geographical location as Australia. Children need constant vigilance and effort to protect their personal data, as they often do not fully understand the risks of how their data is collected and used. According to COPPA §312.4 [35], applic…

**Figure 2.** Content descriptors of The Sims™ FreePlay and Netflix apps across App Store and Play Store. …fine granular second layer compared to Apple. For example ACB divides violence into 12 sub-categories whereas Apple has only four sub-categories. Compared to Android, Apple recently introduces Parental Controls and Age Assurance as a protection layer for children under 16 years. 2.2 iOS and Android Content Rati…

**Figure 3.** (a) Content descriptor taxonomy and (b) mapping to 12 different Apple content rating descriptors. This process yields a unified, hierarchical taxonomy that fully covers all 12 content descriptors defined in the iOS ecosystem (cf. …

**Figure 4.** The overview of the QwenSafe pipeline. The pipeline involves a four-stage process: a) metadata2CRD: constructing QwenSafe training data from Apple app metadata, b) supervised fine-tuning of the Qwen3-VL model, c) DPO dataset generation, and d) DPO optimisation. …information, as their large user base increases the likelihood of detecting and reporting violations, thereby enhancing the reliability of their content rati…

**Figure 5.** Example illustrating model behaviour on the Mature/Suggestive Themes descriptor. QwenSafe recognises the subtle cues present in both the screenshot and description and accurately labels the impact as mild, demonstrating improved sensitivity to low-intensity content. Evaluation Metrics: The goal of QwenSafe is to reliably detect the presence of specific content rating descriptors in mobile app metadata. T…

**Figure 6.** Analysis of non-disclosed CRDs identified by QwenSafe. (i) Distribution of applications containing non-disclosed CRDs across Apple age rating categories (4+, 9+, 12+, and 17+) and descriptor types. (ii) Representative examples of applications where QwenSafe detects CRDs that are not declared in the app metadata. …restricted web access, and contests, as Apple does not provide severity annotations). Full per…

Original abstract

Mobile app marketplaces require developers to disclose standardized content rating descriptors (CRDs) to inform users about potentially sensitive or restricted content. Ensuring the accuracy and consistency of these disclosures remains challenging due to the multimodal nature of app content, which spans textual descriptions and visual interfaces. In this paper, we present QwenSafe, a Vision-Language Model (VLM) designed to automatically identify the presence of Apple-defined CRDs by jointly reasoning over app metadata and screenshots. To enable scalable training for this task, we introduce metadata2CRD, a data-construction pipeline that synthesizes descriptor-aligned question-answer pairs by combining app descriptions, screenshots, and formal descriptor definitions. We adapt Qwen3-VL-8B using supervised fine-tuning followed by Direct Preference Optimization (DPO) to align model predictions with descriptor-specific evidence and explanations across visual and textual modalities. We evaluate QwenSafe on 12 Apple-defined content rating descriptors and compare it against state-of-the-art vision-language models, including Qwen3-VL, LLaVA-1.6, and Gemini-2.5-Flash. QwenSafe consistently outperforms all baselines in binary CRD classification, achieving improvements in positive-class recall of 111.8%, 36.1%, and 2.1%, respectively. Our results demonstrate that descriptor-aware multimodal alignment substantially improves automated content classification and highlights the potential of vision-language models to support scalable and consistent content rating in mobile app marketplaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a practical metadata-to-CRD synthesis pipeline and DPO alignment for VLM-based app content rating detection, but the reported recall gains rest on thin evaluation details that leave room for synthesis artifacts.

The core takeaway is that this work builds a targeted data pipeline called metadata2CRD to generate descriptor-aligned QA pairs from app descriptions, screenshots, and official definitions, then uses SFT plus DPO on Qwen3-VL-8B to improve binary classification across 12 Apple CRDs. The reported positive-class recall lifts over the base model, LLaVA-1.6, and Gemini-2.5-Flash are the main empirical hook.

Referee Report

2 major / 2 minor

Summary. The paper presents QwenSafe, a VLM based on Qwen3-VL-8B that is fine-tuned with SFT followed by DPO to identify the presence of 12 Apple-defined content rating descriptors (CRDs) from joint app metadata and screenshots. It introduces the metadata2CRD pipeline to synthesize descriptor-aligned QA pairs from descriptions, screenshots, and formal definitions. The central empirical claim is that QwenSafe outperforms baselines (Qwen3-VL, LLaVA-1.6, Gemini-2.5-Flash) in binary CRD classification, with positive-class recall gains of 111.8%, 36.1%, and 2.1% respectively.

Significance. If the reported gains reflect genuine multimodal generalization rather than pipeline artifacts, the work could support more scalable and consistent automated content rating for app marketplaces. The combination of descriptor-specific definitions with DPO for evidence alignment is a sensible technical choice for this safety-oriented task. However, the absence of dataset statistics, split details, and external validation substantially weakens the strength of the conclusions.

major comments (2)

[§4.2 and Table 1] The manuscript reports large positive-class recall improvements but provides no information on evaluation dataset size, per-descriptor sample counts, train-test split ratios, or statistical significance testing. This information is required to assess whether the 111.8%, 36.1%, and 2.1% gains are reliable or could arise from variance or imbalance.
[§3.1, metadata2CRD pipeline] The evaluation set is generated by the same synthesis procedure used for training data, with no experiments or analysis addressing possible data leakage, keyword injection, or distribution shift. The central claim that QwenSafe performs robust joint metadata+screenshot reasoning therefore requires external validation on human-annotated real-world apps, which is not reported.
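One way to rule out the most direct form of leakage raised in the second comment is to split by app rather than by question-answer pair, so that no app contributes to both sets. The following is a minimal sketch, assuming each example carries a hypothetical app_id field; the 20% evaluation fraction and the hash-based assignment are also assumptions, not the paper's procedure.

```python
import hashlib


def split_for(app_id: str, eval_fraction: float = 0.2) -> str:
    """Deterministically assign an app to 'train' or 'eval' by hashing its ID."""
    digest = hashlib.sha256(app_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "eval" if bucket < eval_fraction else "train"


def split_examples(examples, eval_fraction: float = 0.2):
    """Split examples so that all pairs from one app land in the same set."""
    train, eval_ = [], []
    for ex in examples:
        target = eval_ if split_for(ex["app_id"], eval_fraction) == "eval" else train
        target.append(ex)
    return train, eval_


def leaked_apps(train, eval_):
    """Return app IDs that appear in both sets (should be empty)."""
    return {ex["app_id"] for ex in train} & {ex["app_id"] for ex in eval_}


# Hypothetical data: 50 apps, 10 question-answer pairs each.
examples = [{"app_id": f"app{i % 50}", "pair": i} for i in range(500)]
train, eval_ = split_examples(examples)
print(len(train), len(eval_), leaked_apps(train, eval_))
```

An app-disjoint split addresses only direct overlap. It does not cover the keyword-injection or distribution-shift concerns, which need the ablations and external validation the referee asks for.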

minor comments (2)

[Abstract] The sentence reporting recall improvements lists three percentages but does not explicitly map them to the three named baselines; adding this mapping would improve clarity.
[§2, Related Work] The discussion of prior VLM safety and content moderation work is brief; adding references to recent multimodal safety benchmarks would better situate the contribution.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed the major comments carefully and provide point-by-point responses below, indicating the revisions we will incorporate to address the concerns raised.

Referee: [§4.2 and Table 1] The manuscript reports large positive-class recall improvements but provides no information on evaluation dataset size, per-descriptor sample counts, train-test split ratios, or statistical significance testing. This information is required to assess whether the 111.8%, 36.1%, and 2.1% gains are reliable or could arise from variance or imbalance.

Authors: We agree that these details are necessary to properly evaluate the reliability of the reported gains. The current manuscript provides only high-level dataset descriptions in §4. In the revised version we will expand §4.2 and Table 1 to report the total size of the evaluation set, the number of positive and negative samples per descriptor, the train-test split ratios employed, and the results of statistical significance tests (e.g., McNemar’s test) comparing QwenSafe against the baselines. These additions will allow readers to assess whether the observed improvements are robust to variance and class imbalance.
revision: yes

Referee: [§3.1, metadata2CRD pipeline] The evaluation set is generated by the same synthesis procedure used for training data, with no experiments or analysis addressing possible data leakage, keyword injection, or distribution shift. The central claim that QwenSafe performs robust joint metadata+screenshot reasoning therefore requires external validation on human-annotated real-world apps, which is not reported.

Authors: We acknowledge the validity of this concern. The metadata2CRD pipeline relies on formal descriptor definitions rather than surface-level keywords, and we used disjoint app sets for training and evaluation to reduce direct leakage. Nevertheless, we did not include explicit ablation studies on keyword injection or distribution shift. In the revision we will add such analysis (e.g., performance after removing obvious keyword cues) and clarify the steps taken to ensure separation between splits. We agree that external validation on independently human-annotated real-world apps would provide stronger evidence of generalization beyond the synthetic distribution; we will explicitly note this as a limitation and outline it as future work.
revision: partial
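The paired significance test named in the first response can be sketched as follows. This is a minimal exact (binomial) version of McNemar's test written with the standard library only; it assumes both models were scored on the same evaluation items, and the per-item correctness lists are hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired per-item correctness flags.

    Returns (b, c, p_value), where b counts items only model A got right
    and c counts items only model B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if o and not a)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree
    # Under the null hypothesis, discordant items split 50/50 between b and c.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical outcomes on 10 items where the two models disagree.
model_a = [True] * 8 + [False] * 2
model_b = [False] * 8 + [True] * 2
b, c, p_value = mcnemar_exact(model_a, model_b)
print(b, c, p_value)
```

Only the discordant items enter the test, so with small per-descriptor evaluation sets even a large recall gap may fail to reach significance. That is one reason the per-descriptor sample counts requested by the referee matter.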

Standing simulated objections (not resolved)

External validation on human-annotated real-world apps is not available in the current study and would require new data collection outside the scope of this work.

Circularity Check

0 steps flagged

No circularity in empirical evaluation of VLM fine-tuning pipeline

The paper presents a standard empirical ML workflow: it introduces a data synthesis pipeline (metadata2CRD) to generate training pairs from app metadata, screenshots, and descriptor definitions, performs supervised fine-tuning followed by DPO on Qwen3-VL-8B, and reports direct performance metrics (positive-class recall improvements) against external baselines on a binary classification task for 12 descriptors. These metrics are measured outcomes on an evaluation set and do not reduce by any equations or definitions to quantities that are tautologically equivalent to the training inputs or fitted parameters. No mathematical derivations, self-citations, uniqueness theorems, or ansatzes are present in the provided text that would create a load-bearing circular chain. The central claims rest on observable model outputs rather than self-referential constructions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the quality and representativeness of the synthesized training data and on the assumption that joint multimodal reasoning over metadata and screenshots is sufficient to determine CRD presence.

free parameters (1)

DPO and SFT hyperparameters
The supervised fine-tuning and direct preference optimization stages involve multiple tunable parameters whose specific values are not reported.
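To show where one such hyperparameter enters, here is a minimal sketch of the per-example DPO objective as defined by Rafailov et al. [37], written in plain Python. The log-probability values and the default beta of 0.1 are assumptions for illustration; the paper's actual settings are not reported.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen - rejected log-ratio)).

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m).
    return softplus(-margin)


# Hypothetical log-probabilities (NOT from the paper).
at_init = dpo_loss(-5.0, -7.0, -5.0, -7.0)      # policy equals reference
after_step = dpo_loss(-4.0, -8.0, -5.0, -7.0)   # policy now favors chosen answer
print(at_init, after_step)
```

When the policy equals the reference model the loss is log 2, and it falls as the policy raises the preferred answer's likelihood relative to the rejected one. A larger beta penalizes deviation from the reference more sharply, which is why its value affects reproducibility.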

axioms (1)

Domain assumption: App metadata and screenshots jointly contain sufficient evidence to determine the presence or absence of each CRD.
The model is trained and evaluated under the premise that these two modalities are adequate inputs for the classification task.

pith-pipeline@v0.9.0 · 5807 in / 1413 out tokens · 42811 ms · 2026-05-21T06:06:26.466604+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

"We adapt Qwen3-VL-8B using supervised fine-tuning followed by Direct Preference Optimization (DPO) to align model predictions with descriptor-specific evidence..."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

"QwenSafe consistently outperforms all baselines in binary CRD classification, achieving improvements in positive-class recall of 111.8%, 36.1%, and 2.1%, respectively."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

[1] Apple Newsroom: Apple expands tools to help parents protect kids and teens online (2025), https://www.apple.com/au/newsroom/2025/06/apple-expands-tools-to-help-parents-protect-kids-and-teens-online/#:~:text=12%20June%202025-,Apple%20expands%20tools%20to%20help%20parents%20protect%20kids%20and%20teens,they%20set%20up%20their%20device

[2] 42matters: Google Play statistics and trends 2025 (2025), https://42matters.com/google-play-statistics-and-trends

[3] 42matters: iOS Apple App Store statistics and trends 2025 (2025), https://42matters.com/ios-apple-app-store-statistics-and-trends

[4] Apple Inc.: Age ratings values and definitions (2025), https://developer.apple.com/help/app-store-connect/reference/age-ratings-values-and-definitions

[5] Apple Inc.: Choosing a category (2025), https://developer.apple.com/app-store/categories/

[6] Apple Inc.: Set an app age rating (2025), https://developer.apple.com/help/app-store-connect/manage-app-information/set-an-app-age-rating

[7] Australasian Legal Information Institute: Online content regulation (1992), https://www.austlii.edu.au/cgi-bin/viewdb/au/legis/cth/consol_act/bsa1992214/

[8] Australasian Legal Information Institute: Online Safety Act 2021 - Sect 105 (2021), https://www.austlii.edu.au/cgi-bin/viewdoc/au/legis/cth/consol_act/osa2021154/s105.html

[9] Bai, S., et al.: Qwen3-VL technical report. arXiv preprint arXiv:2511.21631 (2025)

[10] Board, E.S.R.: Rating guide (1994), https://www.esrb.org/ratings-guide/

[11] Canadian Centre for Child Protection: Reviewing the enforcement of app age ratings in Apple’s App Store and Google Play (2022), https://content.c3p.ca/pdfs/C3P_AppAgeRatingReport_en.pdf

[12] Cao, R., Hee, M.S., Kuek, A., Chong, W.H., Lee, R.K.W., Jiang, J.: Pro-Cap: Leveraging a frozen vision-language model for hateful meme detection. In: Proceedings of the 31st ACM International Conference on Multimedia. pp. 5244–5252 (2023)

[13] Carter, M., Zhangshao, T., Hardwick, T., Egliston, B., Xiao, L.Y.: Investigating mobile games’ compliance with Australia’s 2024 mandatory minimum age classifications scheme for gambling-like mechanics. Available at SSRN (2025)

[14] Chen, Y., Xu, H., Zhou, Y., Zhu, S.: Is this app safe for children? A comparison study of maturity ratings on Android and iOS applications. In: Proceedings of the 22nd International Conference on World Wide Web. pp. 201–212 (2013)

[15] Chiu, K.L., Collins, A., Alexander, R.: Detecting hate speech with GPT-3. arXiv preprint arXiv:2103.12407 (2021)

[16] Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., et al.: Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261 (2025)

[17] Denipitiyage, D., Silva, B., Seneviratne, S., Seneviratne, A., Chawla, S.: Detecting content rating violations in Android applications: A vision-language approach. arXiv preprint arXiv:2502.15739 (2025)

[18] eSafety Commissioner, Australia: Illegal and restricted online content (2024), https://www.esafety.gov.au/key-topics/Illegal-restricted-content

[19] eSafety Commissioner, Australia: Illegal and restricted online content (2024)

[20] European General Data Protection Regulation: General data protection regulation GDPR (2016), https://gdpr-info.eu/

[21] Google: App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse at Scale (Mar 2015), https://research.google/blog/app-discovery-with-google-play-part-3-machine-learning-to-fight-spam-and-abuse-at-scale/

[22] Google: Keeping Google Play safe for users and developers: June 29, 2023 (2023), https://support.google.com/googleplay/android-developer/answer/13721042?hl=en

[23] Google: Apps & games content ratings on Google Play (2025), https://support.google.com/googleplay/answer/6209544?hl=en

[24] Guo, K., Hu, A., Mu, J., Shi, Z., Zhao, Z., Vishwamitra, N., Hu, H.: An investigation of large language models for real-world hate speech detection. In: 2023 International Conference on Machine Learning and Applications (ICMLA). pp. 1568–1573. IEEE (2023)

[25] Haotian Liu, Chunyuan Li, Y.L., Lee, Y.J.: Improved baselines with visual instruction tuning (2023)

[26] Hu, B., Liu, B., Gong, N.Z., Kong, D., Jin, H.: Protecting your children from inappropriate content in mobile apps: An automatic maturity rating framework. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. pp. 1111–1120 (2015)

[27] Ibrahim, H.: Google Play review times: Expectations and tips to streamline approval (Nov 2024), https://median.co/blog/google-play-review-times-what-to-expect-and-how-to-streamline-approval

[28] Interactive Software Federation of Europe (ISFE): PEGI-Pan-European Game Information (2003), http://www.pegi.info/en/index/id/952

[29] International Age Rating Coalition: How IARC works (2025), https://www.globalratings.com/how-iarc-works.aspx

[30] Jensen, K.T.: 11 iPhone apps that got banned and why (2022), https://au.pcmag.com/iphone-apps/95993/11-iphone-apps-that-got-banned-and-why

[31] Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems 33, 2611–2624 (2020)

[32] Liu, M., Wang, H., Guo, Y., Hong, J.: Identifying and analyzing the privacy of apps for kids. In: Proceedings of the 17th International Workshop on Mobile Computing Systems and Applications. pp. 105–110 (2016)

[33] Mathew, B., Saha, P., Yimam, S.M., Biemann, C., Goyal, P., Mukherjee, A.: HateXplain: A benchmark dataset for explainable hate speech detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 35, pp. 14867–14875 (2021)

[34] Motion Picture Association: Film rating (1968), https://www.motionpictures.org/film-ratings/

[35] National Archives: Children’s online privacy protection rule (2022), https://www.ecfr.gov/current/title-16/chapter-I/subchapter-C/part-312

[36] Play Store: Creating apps and games for children and families (2015), https://play.google.com/console/about/programs/families/

[37] Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36, 53728–53741 (2023)

[38] Regional Development Department of Infrastructure, Transport and Communication: The advisory categories for films and computer games (2025), https://www.classification.gov.au/classification-ratings/what-are-ratings

[39] Sun, R., Xue, M., Tyson, G., Wang, S., Camtepe, S., Nepal, S.: Not seen, not heard in the digital world! Measuring privacy practices in children’s apps. In: Proceedings of the ACM Web Conference 2023. pp. 2166–2177 (2023)

[40] Unterhaltungssoftware Selbstkontrolle: SK age categories (2025), https://usk.de/en/the-usk/faqs/age-categories/

[41] Wang, H., Tan, R.Y., Lee, R.K.W.: Cross-modal transfer from memes to videos: Addressing data scarcity in hateful video detection. In: Proceedings of the ACM on Web Conference 2025. pp. 5255–5263 (2025)

[42] Xiao, L.Y., Lund, M.L.: Non-compliance with and non-enforcement of UK loot box industry self-regulation on the Apple App Store: a longitudinal study on poor implementation. Royal Society Open Science 12(5), 250704 (2025)

[43] Zhou, C., Zhan, X., Li, L., Liu, Y.: Automatic maturity rating for Android apps. In: Proceedings of the 13th Asia-Pacific Symposium on Internetware. pp. 16–27 (2022)

A Appendix. Due to space constraints, we provide the complete multi-class classification results across all descriptors in Table 3. This table reports mild and strong precision and recall for all methods...

[1] [1]

Apple Newsroom.: Apple expands tools to help parents protect kids and teens online. (2025),https://www.apple.com/au/newsroom/2025/06/ apple-expands-tools-to-help-parents-protect-kids-and-teens-online/#: ~:text=12%20June%202025-,Apple%20expands%20tools%20to%20help% 20parents%20protect%20kids%20and%20teens,they%20set%20up%20their% 20device

work page 2025

[2] [2]

com/google-play-statistics-and-trends

42matters: Google play statistics and trends 2025 (2025),https://42matters. com/google-play-statistics-and-trends

work page 2025

[3] [3]

42matters: ios apple app store statistics and trends 2025 (2025),https:// 42matters.com/ios-apple-app-store-statistics-and-trends

work page 2025

[4] [4]

Apple Inc.: Age ratings values and definitions (2025),https: //developer.apple.com/help/app-store-connect/reference/ age-ratings-values-and-definitions

work page 2025

[5] [5]

Apple Inc.: Choosing a category.https://developer.apple.com/app-store/ categories/(2025)

work page 2025

[6] [6]

Apple Inc.: Set an app age rating.https://developer.apple.com/help/ app-store-connect/manage-app-information/set-an-app-age-rating(2025)

work page 2025

[7] [7]

austlii.edu.au/cgi-bin/viewdb/au/legis/cth/consol\_act/bsa1992214/ (1992)

Australasian Legal Information Institute: Online content regulation.https://www. austlii.edu.au/cgi-bin/viewdb/au/legis/cth/consol\_act/bsa1992214/ (1992)

work page 1992

[8] [8]

Australasian Legal Information Institute: ONLINE SAFETY ACT 2021 - SECT 105.https://www.austlii.edu.au/cgi-bin/viewdoc/au/legis/cth/ consol\_act/osa2021154/s105.html(2021)

work page 2021

[9] [9]

Qwen3-VL Technical Report

Bai, S., et al.: Qwen3-vl technical report. arXiv preprint arXiv:2511.21631 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[10] [10]

Board, E.S.R.: Rating guide (1994),https://www.esrb.org/ratings-guide/

work page 1994

[11] [11]

Canadian centre for child protection: Reviewing the enforcement of app age rat- ings in apple’s app store and google play.https://content.c3p.ca/pdfs/C3P\ _AppAgeRatingReport\_en.pdf(2022)

work page 2022

[12] [12]

In: Proceedings of the 31st ACM international conference on multimedia

Cao,R.,Hee,M.S.,Kuek,A.,Chong,W.H.,Lee,R.K.W.,Jiang,J.:Pro-cap:Lever- aging a frozen vision-language model for hateful meme detection. In: Proceedings of the 31st ACM international conference on multimedia. pp. 5244–5252 (2023)

work page 2023

[13] [13]

Available at SSRN (2025)

Carter, M., Zhangshao, T., Hardwick, T., Egliston, B., Xiao, L.Y.: Investigating mobile games’ compliance with australia’s 2024 mandatory minimum age classifi- cations scheme for gambling-like mechanics. Available at SSRN (2025)

work page 2024

[14] [14]

In: Proceedings of the 22nd international conference on World Wide Web

Chen, Y., Xu, H., Zhou, Y., Zhu, S.: Is this app safe for children? a comparison study of maturity ratings on Android and iOS applications. In: Proceedings of the 22nd international conference on World Wide Web. pp. 201–212 (2013)

work page 2013

[15] [15]

arXiv preprint arXiv:2103.12407 (2021)

Chiu, K.L., Collins, A., Alexander, R.: Detecting hate speech with gpt-3. arXiv preprint arXiv:2103.12407 (2021)

work page arXiv 2021

[16] [16]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., et al.: Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

arXiv preprint arXiv:2502.15739 (2025)

Denipitiyage, D., Silva, B., Seneviratne, S., Seneviratne, A., Chawla, S.: Detect- ing content rating violations in android applications: A vision-language approach. arXiv preprint arXiv:2502.15739 (2025)

work page arXiv 2025

[18] [18]

Denipitiyage et al

eSafety commissioner, Australia: Illegal and restricted online content."https:// www.esafety.gov.au/key-topics/Illegal-restricted-content(2024) 18 D. Denipitiyage et al

work page 2024

[19] [19]

eSafety commissioner, Australia: Illegal and restricted online content (2024)

work page 2024

[20] [20]

(2016),https://gdpr-info.eu/

European General Data Protection Regulation: General data protection regulation gdpr. (2016),https://gdpr-info.eu/

work page 2016

[21] [21]

Google: App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse at Scale.https://research.google/blog/ app-discovery-with-google-play-part-3-machine-learning-to-fight-spa/ m-and-abuse-at-scale/(Mar 2015)

work page 2015

[22] [22]

Google: Keeping google play safe for users and developers: June 29, 2023 (2023),https://support.google.com/googleplay/android-developer/answer/ 13721042?hl=en

[23] Google: Apps & games content ratings on Google Play (2025), https://support.google.com/googleplay/answer/6209544?hl=en

[24] Guo, K., Hu, A., Mu, J., Shi, Z., Zhao, Z., Vishwamitra, N., Hu, H.: An investigation of large language models for real-world hate speech detection. In: 2023 International Conference on Machine Learning and Applications (ICMLA). pp. 1568–1573. IEEE (2023)

[25] Liu, H., Li, C., Li, Y., Lee, Y.J.: Improved baselines with visual instruction tuning (2023)

[26] Hu, B., Liu, B., Gong, N.Z., Kong, D., Jin, H.: Protecting your children from inappropriate content in mobile apps: An automatic maturity rating framework. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. pp. 1111–1120 (2015)

[27] Ibrahim, H.: Google Play review times: Expectations and tips to streamline approval (Nov 2024), https://median.co/blog/google-play-review-times-what-to-expect-and-how-to-streamline-approval

[28] Interactive Software Federation of Europe (ISFE): PEGI: Pan-European game information. http://www.pegi.info/en/index/id/952 (2003)

[29] International Age Rating Coalition: How IARC works (2025), https://www.globalratings.com/how-iarc-works.aspx

[30] Jensen, K.T.: 11 iPhone apps that got banned and why (2022), https://au.pcmag.com/iphone-apps/95993/11-iphone-apps-that-got-banned-and-why

[31] Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., Testuggine, D.: The hateful memes challenge: Detecting hate speech in multimodal memes. Advances in neural information processing systems 33, 2611–2624 (2020)

[32] Liu, M., Wang, H., Guo, Y., Hong, J.: Identifying and analyzing the privacy of apps for kids. In: Proceedings of the 17th International Workshop on Mobile Computing Systems and Applications. pp. 105–110 (2016)

[33] Mathew, B., Saha, P., Yimam, S.M., Biemann, C., Goyal, P., Mukherjee, A.: HateXplain: A benchmark dataset for explainable hate speech detection. In: Proceedings of the AAAI conference on artificial intelligence. vol. 35, pp. 14867–14875 (2021)

[34] Motion Picture Association: Film rating (1968), https://www.motionpictures.org/film-ratings/

[35] National Archives: Children’s online privacy protection rule (2022), https://www.ecfr.gov/current/title-16/chapter-I/subchapter-C/part-312

[36] Play Store: Creating apps and games for children and families. https://play.google.com/console/about/programs/families/ (2015)

[37] Rafailov, R., Sharma, A., Mitchell, E., Manning, C.D., Ermon, S., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. Advances in neural information processing systems 36, 53728–53741 (2023)

[38] Regional Development Department of Infrastructure, Transport and Communication: The advisory categories for films and computer games (2025), https://www.classification.gov.au/classification-ratings/what-are-ratings

[39] Sun, R., Xue, M., Tyson, G., Wang, S., Camtepe, S., Nepal, S.: Not seen, not heard in the digital world! measuring privacy practices in children’s apps. In: Proceedings of the ACM Web Conference 2023. pp. 2166–2177 (2023)

[40] Unterhaltungssoftware Selbstkontrolle: SK age categories (2025), https://usk.de/en/the-usk/faqs/age-categories/

[41] Wang, H., Tan, R.Y., Lee, R.K.W.: Cross-modal transfer from memes to videos: Addressing data scarcity in hateful video detection. In: Proceedings of the ACM on Web Conference 2025. pp. 5255–5263 (2025)

[42] Xiao, L.Y., Lund, M.L.: Non-compliance with and non-enforcement of UK loot box industry self-regulation on the Apple App Store: a longitudinal study on poor implementation. Royal Society Open Science 12(5), 250704 (2025)

[43] Zhou, C., Zhan, X., Li, L., Liu, Y.: Automatic maturity rating for Android apps. In: Proceedings of the 13th Asia-Pacific Symposium on Internetware. pp. 16–27 (2022)

A Appendix

Due to space constraints, we provide the complete multi-class classification results across all descriptors in Table 3. This table reports mild and strong precision and recall for all methods...
