SHARE: Social-Humanities AI for Research and Education
Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3
The pith
The SHARE models are the first causal language models pretrained exclusively on social sciences and humanities texts, and they approach general-purpose models on domain tasks with far less training data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SHARE models represent the first causal language models fully pretrained by and for the social sciences and humanities. Their performance in modeling SSH texts is close to that of general purpose models such as Phi-4, which use approximately 100 times more tokens, as measured by a custom SSH Cloze benchmark. The MIRROR user interface prototypes a generative AI system that reviews text inputs from SSH disciplines while generating no text of its own, allowing the capabilities of the SHARE models to be used without compromising SSH principles and norms.
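The report does not spell out how the SSH Cloze benchmark is scored. A minimal sketch of one standard cloze-scoring recipe follows: rank each item's candidate fillers by summed token log-likelihood under the causal LM and count how often the gold answer ranks first. The model name, item format, and helper functions are illustrative assumptions, not details from the paper.

```python
# Sketch of cloze-style evaluation for a causal LM. The SHARE
# checkpoints and the benchmark's item format are not given in the
# report, so the model name and data layout here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")              # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def completion_logprob(prefix: str, completion: str) -> float:
    """Summed log-probability of `completion` given `prefix`.
    Assumes the prefix's tokenization is a prefix of the full
    tokenization (true when `completion` starts with a space)."""
    n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prefix + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predicts token t+1
    token_lp = logprobs.gather(2, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[0, n_prefix - 1:].sum().item()        # completion tokens only

def cloze_accuracy(items) -> float:
    """items: list of (prefix, candidate_fillers, gold_index) triples."""
    hits = 0
    for prefix, candidates, gold in items:
        scores = [completion_logprob(prefix, c) for c in candidates]
        hits += int(scores.index(max(scores)) == gold)
    return hits / len(items)
```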
What carries the argument
The SHARE causal language models pretrained exclusively on SSH data, together with the MIRROR non-generative review interface that preserves critical engagement.
If this is right
- Domain-specific pretraining on SSH corpora can yield competitive modeling performance without the scale of general-purpose training runs.
- Non-generative interfaces can apply language model strengths to SSH work while avoiding risks to originality and critical norms (one possible mechanism is sketched after this list).
- Specialized benchmarks focused on SSH texts offer a more relevant way to assess models intended for those fields than general metrics.
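The paper presents MIRROR only at the level of design intent: a generative-model interface that generates no text. The sketch below is one guess at how such a review-only pass could work, not the authors' implementation: surface per-token surprisal under the LM so writers can see where the model finds their draft unusual. The model name and the threshold are placeholders.

```python
# Sketch of a review-only pass: score the author's draft token by
# token and flag spans the model finds improbable, generating nothing.
# The model name and the 8-bit threshold are illustrative assumptions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")    # placeholder, not SHARE
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def flag_surprising_tokens(text: str, threshold_bits: float = 8.0):
    """Return (token, surprisal in bits) for tokens above the threshold."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    nll = -logprobs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    bits = nll / math.log(2.0)
    tokens = tok.convert_ids_to_tokens(ids[0].tolist())[1:]
    return [(t, b.item()) for t, b in zip(tokens, bits) if b > threshold_bits]
```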
Where Pith is reading between the lines
- Models like SHARE could lower barriers for SSH researchers to use AI assistance tailored to their data sources and ethical standards.
- The non-generative design of MIRROR might extend to other domains where full text generation conflicts with disciplinary expectations around authorship.
- Further testing on diverse SSH subfields could clarify whether the efficiency gains hold beyond the initial benchmark.
Load-bearing premise
The custom SSH Cloze benchmark provides a valid and representative measure of modeling quality for actual SSH research tasks, and the pretraining used only SSH data without any general-purpose corpora.
What would settle it
A direct comparison on real SSH research tasks such as close reading of historical texts or sociological analysis where the SHARE models show markedly lower accuracy than Phi-4, or documentation that the pretraining corpus included substantial non-SSH material.
Original abstract
This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal language models fully pretrained by and for the social sciences and humanities (SSH). Their performance in modelling SSH texts is close to that of general purpose models (Phi-4) which use 100 times more tokens, as shown by our custom SSH Cloze benchmark. The MIRROR user interface is designed for reviewing text inputs from the SSH disciplines while preserving critical engagement. By prototyping a generative AI interface that does not generate any text, we propose a way to harness the capabilities of the SHARE models without compromising the integrity of SSH principles and norms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the SHARE family of causal language models, presented as the first fully pretrained exclusively on social sciences and humanities (SSH) data, along with the MIRROR user interface. It claims that SHARE models achieve performance close to general-purpose models such as Phi-4 on a custom SSH Cloze benchmark despite using 100 times fewer tokens, and describes MIRROR as a non-generative interface for reviewing SSH texts to preserve critical engagement and disciplinary norms.
Significance. If the performance claims and benchmark validity hold, this could represent a meaningful contribution to domain-adapted language modeling for interpretive fields, demonstrating efficient pretraining on specialized corpora and an interface design that prioritizes review over generation. Such work might help address mismatches between general LLMs and SSH research practices, but the current lack of disclosed quantitative results, corpus details, or validation limits its immediate assessability.
Major comments (2)
- [Benchmark description and results] The central claim of near-parity with Phi-4 on SSH text modeling (despite 100x fewer tokens) depends entirely on the custom SSH Cloze benchmark. No details are provided on its construction, item selection criteria, disciplinary coverage, expert validation, or correlation to real SSH tasks such as long-range coherence or theoretical nuance detection (see the section describing the SSH Cloze benchmark and associated results). This makes it impossible to determine whether the benchmark measures genuine SSH modeling capability or merely superficial domain adaptation.
- [Model pretraining and data] The assertion that the SHARE models are 'fully pretrained by and for the SSH' and 'exclusively on SSH data' is load-bearing for the efficiency claim but lacks supporting evidence on corpus composition, total token count, source filtering, or safeguards against general-purpose data leakage (see the model pretraining and data sections). Without these, the 100x token comparison cannot be evaluated (one such leakage check is sketched after this list).
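For concreteness, one common form such a safeguard takes is a long-n-gram overlap audit between the pretraining corpus and the benchmark. The sketch below assumes whitespace tokenization and a 13-gram window, a convention borrowed from LLM decontamination practice; it is not a procedure the report describes.

```python
# Sketch of an n-gram overlap check between a pretraining corpus and
# a benchmark, one common leakage safeguard. The 13-gram window echoes
# common decontamination practice; whitespace tokenization is a
# simplification, and data loading is left to the caller.
def ngrams(tokens, n=13):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(corpus_docs, benchmark_items, n=13):
    """Fraction of benchmark items sharing any n-gram with the corpus."""
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc.lower().split(), n)
    flagged = sum(
        1 for item in benchmark_items
        if ngrams(item.lower().split(), n) & corpus_grams
    )
    return flagged / len(benchmark_items)
```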
Minor comments (2)
- [Abstract] The abstract refers to 'Phi-4' without a citation or specification of the exact model, training tokens, or reference paper; add this for reproducibility.
- [Conclusion or discussion] As an 'intermediate technical report,' the manuscript would benefit from an explicit limitations section or statement on the preliminary status of the benchmark and models.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our intermediate technical report. We address each major comment below and will revise the manuscript accordingly to provide the requested details and clarifications.
Point-by-point responses
Referee: [Benchmark description and results] The central claim of near-parity with Phi-4 on SSH text modeling (despite 100x fewer tokens) depends entirely on the custom SSH Cloze benchmark. No details are provided on its construction, item selection criteria, disciplinary coverage, expert validation, or correlation to real SSH tasks such as long-range coherence or theoretical nuance detection (see the section describing the SSH Cloze benchmark and associated results). This makes it impossible to determine whether the benchmark measures genuine SSH modeling capability or merely superficial domain adaptation.
Authors: We acknowledge that the current manuscript provides only a high-level overview of the SSH Cloze benchmark and lacks the granular details needed for full evaluation. In the revised version, we will expand the benchmark section to describe its construction process, item selection criteria, disciplinary coverage across SSH fields, expert validation steps, and any available analyses or planned validations correlating benchmark scores with real SSH tasks such as long-range coherence and theoretical nuance detection (a toy example of such a correlation check appears below). This will strengthen the substantiation of the performance claims.
Revision: yes.
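For illustration, the promised validation could be reported as simply as a rank correlation between per-model benchmark scores and expert ratings on a downstream SSH task. Every number in this sketch is an invented placeholder, not a reported result.

```python
# Toy example of the promised validity check: rank-correlate models'
# SSH Cloze scores with expert ratings on a real SSH task. All values
# below are invented placeholders.
from scipy.stats import spearmanr

cloze_scores   = [0.61, 0.55, 0.72, 0.48, 0.66]   # per-model benchmark accuracy
expert_ratings = [3.9, 3.2, 4.4, 2.8, 4.0]        # per-model expert rating, 1-5

rho, p = spearmanr(cloze_scores, expert_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```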
Referee: [Model pretraining and data] The assertion that the SHARE models are 'fully pretrained by and for the SSH' and 'exclusively on SSH data' is load-bearing for the efficiency claim but lacks supporting evidence on corpus composition, total token count, source filtering, or safeguards against general-purpose data leakage (see the model pretraining and data sections). Without these, the 100x token comparison cannot be evaluated.
Authors: We agree that additional evidence on the pretraining corpus is required to support the efficiency claims and allow proper evaluation of the 100x token comparison. The revised manuscript will include expanded details in the model pretraining and data sections on corpus composition, total token count, source filtering methods, and safeguards against general-purpose data leakage.
Revision: yes.
Circularity Check
No circularity: purely descriptive report with no derivations or self-referential reductions
Full rationale
The manuscript introduces the SHARE models and MIRROR interface as a descriptive technical report. No equations, derivations, fitted parameters, or load-bearing self-citations appear in the provided text. The central performance claim is tied to a custom SSH Cloze benchmark, but this is presented as external evidence rather than a quantity derived by construction from the pretraining inputs or prior self-citations. The 'first' status and performance comparison do not reduce to definitional equivalence or ansatz smuggling. The derivation chain is self-contained and non-circular.
Reference graph
Works this paper leans on
- Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., et al. (2024). Phi-4 technical report. arXiv preprint arXiv:2412.08905.
- Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M., et al. (2024). PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation. In Proceedin…
- Knoth, P., & Zdrahal, Z. (2012). CORE: Three access levels to underpin open access. D-Lib Magazine, 18(11/12), 1–13.
- Kuditipudi, R., Huang, J., Zhu, S., Yang, D., Potts, C., & Liang, P. (2025). Blackbox model provenance via palimpsestic membership inference. arXiv preprint arXiv:2510.19796.
- Kyriakidis, K. (2025). Focus on STEM at the expense of humanities: …
- Lo, K., Wang, L. L., Neumann, M., Kinney, R., & Weld, D. S. (2020). S2ORC: The Semantic Scholar Open Research Corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4969–4983).
- Masterman, M. (2005). Language, Cohesion and Form (edited by Yorick Wilks). Cambridge University Press.
- McCombs, M. E., & Shaw, D. L. …