Patent Claim Generation by Fine-Tuning OpenAI GPT-2
Pith reviewed 2026-05-25 12:31 UTC · model grok-4.3
The pith
Fine-tuning GPT-2 produces the first machine-generated patent claims.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We are the first to generate patent claims by machines and the first to apply GPT-2 to patent claim generation. Fine-tuned on patent claims and exploiting their unique language structure, the model produces coherent text under both conditional and unconditional sampling.
What carries the argument
A fine-tuned GPT-2 model that treats the unique language structure of patent claims as implicit human annotations for learning claim generation.
If this is right
- Patent claims can be generated automatically from the fine-tuned model.
- The quality can be assessed through qualitative analysis of generated samples.
- A new sampling approach for text generation is proposed.
- An email bot enables other researchers to interact with the model.
Where Pith is reading between the lines
- If the generated claims prove legally sound at scale, patent drafting costs could decrease significantly.
- The technique may extend to generating other types of legal or technical documents with similar structures.
- Longer training beyond the first 100 steps might yield even more coherent and complex claims.
Load-bearing premise
That qualitative inspection of text generated after only the first 100 training steps can meaningfully assess the coherence and legal utility of the claims.
What would settle it
A panel of patent lawyers evaluating a sample of generated claims and concluding that they do not meet legal standards for novelty, support, or clarity would show the approach does not work.
Original abstract
In this work, we focus on fine-tuning an OpenAI GPT-2 pre-trained model for generating patent claims. GPT-2 has demonstrated impressive efficacy of pre-trained language models on various tasks, particularly coherent text generation. Patent claim language itself has rarely been explored in the past and poses a unique challenge. We are motivated to generate coherent patent claims automatically so that augmented inventing might be viable someday. In our implementation, we identified a unique language structure in patent claims and leveraged its implicit human annotations. We investigated the fine-tuning process by probing the first 100 steps and observing the generated text at each step. Based on both conditional and unconditional random sampling, we analyze the overall quality of generated patent claims. Our contributions include: (1) being the first to generate patent claims by machines and being the first to apply GPT-2 to patent claim generation, (2) providing various experiment results for qualitative analysis and future research, (3) proposing a new sampling approach for text generation, and (4) building an e-mail bot for future researchers to explore the fine-tuned GPT-2 model further.
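The difference between the two sampling modes named in the abstract can be sketched with a toy model. The example below is a minimal illustration, not the paper's implementation: it swaps GPT-2 for a bigram table built from three invented claim-like strings, and the names `build_bigrams` and `sample` are hypothetical. Unconditional sampling starts from a start-of-text marker alone, while conditional sampling continues from a supplied prefix.

```python
import random

# Toy corpus standing in for patent claims (invented for illustration only).
CLAIMS = [
    "a method comprising receiving a signal and storing the signal",
    "a system comprising a processor and a memory",
    "a method comprising storing a value in a memory",
]

START, END = "<s>", "</s>"


def build_bigrams(texts):
    """Map each token to the list of tokens observed directly after it."""
    table = {}
    for text in texts:
        tokens = [START] + text.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            table.setdefault(prev, []).append(nxt)
    return table


def sample(table, prefix=None, max_tokens=30, rng=None):
    """Random sampling; with no prefix the generation is unconditional."""
    rng = rng or random.Random()
    tokens = prefix.split() if prefix else []
    current = tokens[-1] if tokens else START
    while len(tokens) < max_tokens:
        choices = table.get(current)
        if not choices:  # token never seen in the corpus: stop here
            break
        current = rng.choice(choices)
        if current == END:
            break
        tokens.append(current)
    return " ".join(tokens)


table = build_bigrams(CLAIMS)
rng = random.Random(0)
unconditional = sample(table, rng=rng)
conditional = sample(table, prefix="a system comprising", rng=rng)
print("unconditional:", unconditional)
print("conditional:  ", conditional)
```

With a fine-tuned language model the same two modes apply; only the source of the next-token distribution changes.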
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports on fine-tuning OpenAI's GPT-2 model on patent claim data to generate new patent claims. It claims to be the first to do so, examines the fine-tuning by looking at generated text in the first 100 steps using conditional and unconditional sampling, proposes a new sampling approach, and provides an email bot for further exploration by researchers.
Significance. Should the approach yield coherent and legally sound patent claims upon proper evaluation, this work would mark a significant step in applying large pre-trained language models to the specialized and structured domain of patent claims, potentially facilitating AI-augmented invention processes. The qualitative results and the provided exploration tool offer a starting point for the community.
major comments (3)
- [Abstract] The evaluation of the generated patent claims' quality and coherence is based exclusively on qualitative observation of outputs from the first 100 fine-tuning steps (Abstract), without any quantitative metrics, held-out test set evaluation, or expert legal review, which is insufficient to substantiate the central claim of producing usable claims.
- [Abstract] No comparisons to baselines or prior methods for text generation in technical domains are reported (Abstract), making it difficult to gauge the relative performance of the fine-tuned GPT-2 model.
- [Abstract] The assertion of being the first to generate patent claims by machines and apply GPT-2 to this task lacks supporting discussion of related work on patent text processing or claim generation (Abstract).
minor comments (2)
- [Abstract] The description of the 'unique language structure in patent claims' and how it was leveraged is not detailed enough for reproducibility.
- The paper would benefit from including the dataset size, source, and preprocessing steps in the main text.
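As one illustration of the kind of quantitative metric the first major comment asks for, held-out perplexity could be reported alongside the qualitative samples. Perplexity is an assumption here, since the report does not name a specific metric, and the log-probabilities below are hypothetical values rather than results from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each held-out token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical case: the model assigns probability 0.25 to each of 8 tokens.
uniform = [math.log(0.25)] * 8
ppl = perplexity(uniform)
print(round(ppl, 6))
```

A lower value on claims excluded from fine-tuning would indicate a better fit to claim language, though it says nothing about legal soundness.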
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below.
- Referee: [Abstract] The evaluation of the generated patent claims' quality and coherence is based exclusively on qualitative observation of outputs from the first 100 fine-tuning steps (Abstract), without any quantitative metrics, held-out test set evaluation, or expert legal review, which is insufficient to substantiate the central claim of producing usable claims.
  Authors: The work is explicitly positioned as an exploratory study of the fine-tuning process on patent claims rather than a claim of producing legally usable outputs. We acknowledge the evaluation is limited to qualitative observations and will revise the abstract and introduction to more clearly state the exploratory scope, limitations, and that no claims are made regarding legal soundness or immediate usability.
  Revision: partial
- Referee: [Abstract] No comparisons to baselines or prior methods for text generation in technical domains are reported (Abstract), making it difficult to gauge the relative performance of the fine-tuned GPT-2 model.
  Authors: As the first reported application of GPT-2 (or any LM) to patent claim generation, no task-specific baselines existed at the time. We will add a short discussion of related text generation techniques in technical domains to provide context for relative performance.
  Revision: yes
- Referee: [Abstract] The assertion of being the first to generate patent claims by machines and apply GPT-2 to this task lacks supporting discussion of related work on patent text processing or claim generation (Abstract).
  Authors: We agree a discussion of prior patent text processing work would strengthen the novelty claim. We will add a related work paragraph covering relevant patent analysis and generation literature.
  Revision: yes
- Not addressed by the revision: provision of quantitative metrics, held-out test set results, or expert legal review, as these were outside the scope of the original exploratory study and cannot be added without new experiments.
Circularity Check
Empirical fine-tuning experiment contains no circular derivations or self-referential claims
The paper describes an applied ML experiment: fine-tuning GPT-2 on patent claims and qualitatively inspecting generated text at early training steps. No equations, fitted parameters, predictions, or first-principles derivations are present. The central claims (first application of GPT-2 to this task, new sampling approach) are supported by reported experimental observations rather than any reduction to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems. The work is self-contained as an empirical demonstration without the circular patterns enumerated in the analysis criteria.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Patent claim language possesses a unique implicit structure that can be exploited by language-model fine-tuning.
Reference graph
Works this paper leans on
- [1] M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, (2018). https://arxiv.org/abs/1802.05365 (accessed April 10, 2018)
- [2] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving Language Understanding by Generative Pre-Training (transformer in real world), (n.d.) 1–12
- [3] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsupervised Multitask Learners, (2018)
- [4] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proc. 2019 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Vol. 1 (Long Short Pap., 2019: pp. 4171–4186. https://aclweb.org/anthology/papers/N/N19/N19-1423/
- [5] A. Wang, K. Cho, BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model, (2019). http://arxiv.org/abs/1902.04094 (accessed March 1, 2019)
- [6] OpenAI, GPT-2 source code, (n.d.). https://github.com/openai/gpt-2 (accessed June 2, 2019)
- [7] L. Aristodemou, F. Tietze, The state-of-the-art on Intellectual Property Analytics (IPA): A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data, World Pat. Inf. 55 (2018) 37–51. doi:10.1016/j.wpi.2018.07.002
- [8] M. Lupu, Information retrieval, machine learning, and Natural Language Processing for intellectual property information, World Pat. Inf. 49 (2017) A1–A3. doi:10.1016/j.wpi.2017.06.002
- [9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention Is All You Need, (2017). http://arxiv.org/abs/1706.03762 (accessed December 24, 2018)
- [10] S. Ruder, NLP’s ImageNet moment has arrived, (n.d.). http://ruder.io/nlp-imagenet/
- [11] USPTO, USPTO Open Data Portal, (n.d.). https://developer.uspto.gov/
- [12] Google, Google Patents Public Datasets on BigQuery, (n.d.). https://console.cloud.google.com/bigquery?p=patents-public-data
- [13] M. Woolf, gpt-2-simple, (n.d.). https://github.com/minimaxir/gpt-2-simple
- [14] gpt2-claims-2013_for_345M.npz, (n.d.). https://data.mendeley.com/datasets/b8853hnj7b/draft?a=b99308ff-c24b-428c-96d7-d851962a2714
- [15] gpt2-claims-2013.txt, (n.d.). https://data.mendeley.com/datasets/9dvny7cgcz/draft?a=6ba92bff-b464-4665-90c9-8e03f1ba4a13
- [16] OpenAI, Better Language Models and Their Implications, (n.d.). https://openai.com/blog/better-language-models/ (accessed June 3, 2019)
- [17] Google Colaboratory, (n.d.). https://colab.research.google.com (accessed June 2, 2019)
- [18] N. Shepperd, memory_saving_gradients.py, (n.d.). https://github.com/nshepperd/gpt-2/blob/finetuning/src/memory_saving_gradients.py
- [19] H. Face, The Big-&-Extending-Repository-of-Transformers: Pretrained PyTorch models for Google’s BERT, OpenAI GPT & GPT-2, Google/CMU Transformer-XL., (n.d.). https://github.com/huggingface/pytorch-pretrained-BERT (accessed June 3, 2019)
- [20] First 100 steps of fine-tuning GPT-2, (n.d.). https://data.mendeley.com/datasets/cgy6ng9kwm/draft?a=c9b6c696-f768-46de-b531-7c2b5479bc50
- [21] J. Vig, Visualizing Attention in Transformer-Based Language Representation Models, (2019). http://arxiv.org/abs/1904.02679 (accessed April 26, 2019)
- [22] A. Holtzman, J. Buys, M. Forbes, Y. Choi, The Curious Case of Neural Text Degeneration, (2019). http://arxiv.org/abs/1904.09751 (accessed May 21, 2019)
- [23] Unconditional sampling results, (n.d.). https://data.mendeley.com/datasets/wftfn4rs4p/draft?a=009a0411-eb5c-4dcf-bc0d-4995842a38ae
- [24] Conditional sampling results (1), (n.d.). https://data.mendeley.com/datasets/sp3g6c4mc5/draft?a=df172e77-ed9f-4b9d-9339-e1e8305a6d3d
- [25] Conditional sampling results (2), (n.d.). https://data.mendeley.com/datasets/dnxdrgr3h6/draft?a=173df973-966c-4c1f-b3f6-1417d6aa1f4a
- [26] Z. Li, X. Ding, T. Liu, Story Ending Prediction by Transferable BERT, (2019). http://arxiv.org/abs/1905.07504 (accessed June 2, 2019)
- [27] T. Pires, E. Schlinger, D. Garrette, How multilingual is Multilingual BERT?, ArXiv1906.01502v1 [Cs]. (2019). http://arxiv.org/abs/1906.01502v1 (accessed June 10, 2019)
Appendix A
The following SQL selects the first claims of all US utility patents in 2013 and aggregates the CPC codes at subclass level: (data source: Google Patents Public Datasets on BigQu...