pith. sign in

arxiv: 2509.20909 · v2 · submitted 2025-09-25 · 💻 cs.CL

LogitTrace: Detecting Benchmark Contamination via Layerwise Logit Trajectories

Pith reviewed 2026-05-18 14:29 UTC · model grok-4.3

classification 💻 cs.CL
keywords benchmark contaminationlogit trajectoriesmemorization detectionlayerwise analysislarge language modelsevaluation auditing
0
0 comments X

The pith

Contaminated benchmark examples commit to answers earlier across layerwise logit trajectories than clean examples do.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that memorization from benchmark contamination leaves a detectable signature in how large language models build their answer preferences layer by layer. Contaminated examples tend to lock onto the correct answer quickly and stabilize, while clean examples accumulate evidence more gradually before committing. These trajectory differences persist across models and survive rephrasing of the input, which defeats most existing detection approaches based on string overlap or final-output likelihood. A lightweight classifier using the trajectory features can then separate the two categories reliably. If the claim holds, it supplies an internal-model signal for auditing whether strong benchmark scores reflect genuine reasoning or prior exposure to the test items.

Core claim

LogitTrace tracks the evolving logit values assigned to candidate answers as they pass through successive layers of the model. Contaminated examples display earlier commitment, with the target answer's logit rising sharply and stabilizing sooner, whereas clean examples show slower, more incremental growth consistent with step-by-step evidence accumulation. These patterns allow a simple classifier to distinguish contamination status across different models and input variants. Controlled experiments that inject target samples via LoRA reproduce the same early-commitment signatures, linking the observed trajectories to repeated exposure during training.

What carries the argument

Layerwise logit trajectories that record how preference logits for possible answers change from early to late layers, revealing the timing of decision stabilization.

If this is right

  • Benchmark scores on tasks such as AIME or Math500 can be audited for hidden contamination using signals available inside the model itself.
  • Detection continues to function on reworded or paraphrased questions that break surface-overlap and completion-based checks.
  • The same layerwise view supplies a new observable for studying how memorization alters internal computation rather than just final outputs.
  • Trajectory features could be monitored during evaluation to flag potential data leakage without requiring the original training corpus.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interventions that slow early commitment on seen examples might reduce inflated performance on contaminated benchmarks.
  • The method could extend to detecting memorization induced by fine-tuning rather than pretraining exposure.
  • Combining trajectory signals with existing overlap-based detectors might raise overall reliability without added external data.

Load-bearing premise

The observed differences in trajectory timing are produced by memorization from contamination rather than by question difficulty, prompt format, or model-specific factors that happen to align with contamination labels.

What would settle it

Finding that clean but unusually difficult examples produce the same early-commitment trajectory shape as known contaminated ones, or that rephrasing alone alters trajectories independently of contamination status.

Figures

Figures reproduced from arXiv: 2509.20909 by Ali Payani, Haiyan Zhao, Mengnan Du, Yingcong Li, Zirui He.

Figure 1
Figure 1. Figure 1: These four examples illustrate the potential risk that memorization may masquerade as generalization. In (a), the model answers the original problem correctly, which is expected. In (b), after rephrasing, the model still provides the correct solution, appearing to generalize. In (c), however, even when the input is reduced to a minimal set of keywords, the model produces the correct answer together with a … view at source ↗
Figure 2
Figure 2. Figure 2: Class-mean activation heatmaps. Average channel activations across layers for clean vs. contaminated samples. Red regions denote stronger activation and blue denotes weaker activation. 0 5 10 15 20 25 Layer 10 1 10 0 10 1 10 2 10 3 0 10 3 10 2 10 1 10 0 Margin Mean activation trajectories of contaminated samples 0 5 10 15 20 25 Layer Mean activation trajectories of clean samples 0.0 0.5 1.0 1.5 2.0 Entropy… view at source ↗
Figure 3
Figure 3. Figure 3: Mean activation trajectories of correctly classified samples. Each panel shows layerwise dynamics at the answer position for contaminated (left) and clean (right) samples. ence answer. (ii) TS-Guessing (Deng et al., 2024): contamination is detected if the model can in￾fer the gold answer through slot-filling or symbolic transformations, without requiring exact string match. These two belong to completion-d… view at source ↗
Figure 4
Figure 4. Figure 4: Layer-wise activations for one sample across LoRA ranks, with layers on the x-axis, 24 feature channels on the y-axis, and color showing normalized values. 0 5 10 15 20 25 Layer 0.0 0.2 0.4 0.6 0.8 1.0 Grad-CAM Clean Contaminated [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Class-mean Grad-CAM visualization comparing clean vs. contaminated samples. Qwen2.5-7B Llama3.1-8B 0 20 40 60 80 100 F1 Score (%) 68.7 81.4 95.2 64.5 75.8 88.9 layer=1/3 layer=1/2 layer=1 [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Dataset composition and group-level performance comparison. Left: fraction of samples classified as seen-like (94.9%) versus not-seen (5.1%). Middle: Top-1 accuracy and MRR@6 for the two groups. Right: effect sizes (Seen–Not) with 95% confidence intervals. MMLU mathematics may derive from recalling previously seen problems rather than solving unseen ones from first principles. Our detector thus recovers hi… view at source ↗
Figure 8
Figure 8. Figure 8: Activation trajectories of the 10 most confidently classified samples by the discriminator trained on Qwen2.5-7B. Clean samples are shown on the left, while contaminated samples are shown on the right. 16 [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Activation trajectories of the 10 most confidently classified samples by the discriminator trained on Qwen2.5-Math-7B. Clean samples are shown on the left, while contaminated samples are shown on the right. 17 [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Activation trajectories of the 10 most confidently classified samples by the discriminator trained on Llama3.1-8B. Clean samples are shown on the left, while contaminated samples are shown on the right. 18 [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Activation trajectories of the 10 most confidently classified samples by the discriminator trained on Qwen3-8B. Clean samples are shown on the left, while contaminated samples are shown on the right. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
read the original abstract

Large language models (LLMs) are commonly evaluated on challenging benchmarks such as AIME and Math500, where benchmark contamination can make memorized solutions appear as genuine reasoning. Existing detection methods largely rely on surface overlap, completion behavior, or final-output likelihood, and often degrade when inputs are simply rephrased. In this paper, we propose LogitTrace(Layerwise Logit Trajectories), a framework for analyzing memorization-like decision dynamics through intermediate logit trajectories. Instead of judging memorization only from the final answer, LogitTrace examines how model preferences emerge and stabilize across layers. We find that contaminated examples tend to show earlier commitment, while clean examples exhibit more gradual evidence accumulation. These trajectory signals allow a lightweight classifier to separate contaminated and clean examples across multiple models and input variants. Controlled LoRA injection experiments further show that repeated exposure to target samples induces similar trajectory patterns. Overall, our results suggest that LogitTrace provides evidence beyond surface overlap and final-output confidence, offering a useful lens for studying memorization-like behavior in LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces LogitTrace, a framework for detecting benchmark contamination in LLMs by analyzing layerwise logit trajectories rather than final outputs or surface overlap. It claims that contaminated examples exhibit earlier commitment to answers across layers, while clean examples show more gradual evidence accumulation; these signals enable a lightweight classifier to separate the two across models and input variants. Controlled LoRA injection experiments are presented as further evidence that repeated exposure induces similar trajectory patterns.

Significance. If the central claim holds after controls, the work offers a potentially useful internal-model lens on memorization that could complement existing detection methods and improve evaluation reliability on benchmarks such as AIME and Math500. The LoRA injection component provides a controlled causal test that is a methodological strength.

major comments (2)
  1. [Experimental Setup / Abstract] The central claim requires that trajectory differences are caused by contamination-induced memorization. However, the manuscript provides no indication that contaminated and clean examples are matched or controlled for question difficulty, length, lexical features, or prompt formatting (see skeptic note and abstract description of experiments). These factors independently modulate evidence accumulation speed; without regression controls, propensity matching, or difficulty-stratified ablations, the classifier separation and LoRA results risk being spurious.
  2. [Abstract / Results] The abstract states that a classifier works and that LoRA reproduces the pattern, yet reports no quantitative results, error bars, dataset sizes, accuracy metrics, or statistical tests. This absence makes it impossible to assess effect sizes, reproducibility, or whether post-hoc choices affect the separation claim.
minor comments (2)
  1. [Method] Clarify the precise operational definition of 'earlier commitment' (e.g., the layer index at which the target logit first exceeds a threshold or the rate of logit stabilization).
  2. [Introduction] Add a short related-work paragraph situating LogitTrace against prior internal-representation analyses of memorization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of experimental rigor that we address below. We have revised the manuscript to incorporate additional controls and quantitative reporting as outlined in the point-by-point responses.

read point-by-point responses
  1. Referee: [Experimental Setup / Abstract] The central claim requires that trajectory differences are caused by contamination-induced memorization. However, the manuscript provides no indication that contaminated and clean examples are matched or controlled for question difficulty, length, lexical features, or prompt formatting (see skeptic note and abstract description of experiments). These factors independently modulate evidence accumulation speed; without regression controls, propensity matching, or difficulty-stratified ablations, the classifier separation and LoRA results risk being spurious.

    Authors: We agree that explicit controls for question difficulty, length, lexical features, and prompt formatting are necessary to strengthen the attribution of trajectory differences to contamination. Our original experiments drew contaminated and clean examples from the same benchmark distributions to reduce some baseline differences, but we did not report formal matching or regression adjustments. In the revision we will add difficulty-stratified ablations (binning by proxy difficulty metrics such as average model performance on held-out similar items), propensity-score matching on length and lexical overlap, and regression controls that partial out these covariates when evaluating classifier separation and LoRA-induced patterns. These additions will directly address the concern that the observed signals could be confounded. revision: yes

  2. Referee: [Abstract / Results] The abstract states that a classifier works and that LoRA reproduces the pattern, yet reports no quantitative results, error bars, dataset sizes, accuracy metrics, or statistical tests. This absence makes it impossible to assess effect sizes, reproducibility, or whether post-hoc choices affect the separation claim.

    Authors: The referee is correct that the abstract and high-level results summary omitted concrete metrics. The full manuscript contains the underlying experiments, but these details were not elevated to the abstract. We will revise the abstract to report dataset sizes (number of contaminated and clean examples per model), classifier accuracy with standard deviations across multiple runs or folds, and statistical significance tests comparing trajectory features between conditions. Error bars will be added to figures, and a summary table of quantitative results will be included in the main text so that effect sizes and reproducibility can be evaluated directly. revision: yes

Circularity Check

0 steps flagged

No significant circularity in LogitTrace derivation chain

full rationale

The paper's central method observes empirical differences in layerwise logit trajectories (earlier commitment for contaminated examples vs. gradual accumulation for clean ones) and trains a lightweight classifier on these signals, with supporting LoRA injection experiments to induce similar patterns via repeated exposure. No derivation step reduces by construction to self-definition, fitted parameters renamed as predictions, or load-bearing self-citations. The approach relies on observable model behaviors across models and input variants rather than tautological inputs, making the chain self-contained against external contamination benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the assumption that logit trajectories encode memorization signals independently of surface features. No explicit free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption Differences in layerwise logit trajectories reliably indicate memorization versus genuine reasoning.
    This premise is invoked when the authors interpret earlier commitment as evidence of contamination.

pith-pipeline@v0.9.0 · 5720 in / 1303 out tokens · 32604 ms · 2026-05-18T14:29:09.756115+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 3 internal anchors

  1. [1]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems (NeurIPS), 2020

  2. [2]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, et al. Extracting training data from large language models. In USENIX Security Symposium, 2021

  3. [3]

    Quantifying memorization across neural language models

    Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, and Chiyuan Zhang. Quantifying memorization across neural language models. In The Eleventh International Conference on Learning Representations, 2023

  4. [4]

    Investigating data contamination in modern benchmarks for large language models

    Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, and Arman Cohan. Investigating data contamination in modern benchmarks for large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024

  5. [5]

    Generalization or memorization? data contamination and trustworthy evaluation for large language models

    Yiheng Dong, Yuxin Guo, Yuxiang Wang, Yinpeng Chen, Xiaohui Sun, Lei Zhang, and Heng Ji. Generalization or memorization? data contamination and trustworthy evaluation for large language models. In Findings of the Association for Computational Linguistics: ACL 2024, 2024

  6. [6]

    Duarte, Xuandong Zhao, Arlindo L

    André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, and Lei Li. DE-COP : Detecting copyrighted content in language models training data, 2024

  7. [7]

    Data contamination quiz: A tool to detect and estimate contamination in large language models, 2023

    Shahriar Golchin and Mihai Surdeanu. Data contamination quiz: A tool to detect and estimate contamination in large language models, 2023

  8. [8]

    Time travel in LLMs : Tracing data contamination in large language models

    Shahriar Golchin and Mihai Surdeanu. Time travel in LLMs : Tracing data contamination in large language models. In Proceedings of the 2024 International Conference on Learning Representations (ICLR), 2024

  9. [9]

    Copyright violations and large language models

    Antonia Karamolegkou, Jiaang Li, Li Zhou, and Anders S gaard. Copyright violations and large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

  10. [10]

    Memorization or reasoning? exploring the idiom understanding of llms

    Jisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, Richeng Xuan, and Taeuk Kim. Memorization or reasoning? exploring the idiom understanding of llms. arXiv preprint arXiv:2505.16216, 2025

  11. [11]

    Overestimation in llm evaluation: A controlled large-scale study on data contamination’s impact on machine translation, 2025

    Muhammed Yusuf Kocyigit, Eleftheria Briakou, Daniel Deutsch, Jiaming Luo, Colin Cherry, and Markus Freitag. Overestimation in llm evaluation: A controlled large-scale study on data contamination’s impact on machine translation, 2025

  12. [12]

    Memorization vs

    Aochong Oliver Li and Tanya Goyal. Memorization vs. reasoning: Updating llms with new knowledge. In Findings of the Association for Computational Linguistics: ACL 2025, 2025

  13. [13]

    Task contamination: Language models may not be few-shot anymore, 2023

    Changmao Li and Jeffrey Flanigan. Task contamination: Language models may not be few-shot anymore, 2023

  14. [14]

    Lost in the Middle: How Language Models Use Long Contexts

    Xuefeng Li, Jason Wei, Denny Zhou, et al. Evaluating verbal reasoning in language models. arXiv preprint arXiv:2307.03172, 2023 a

  15. [15]

    Estimating contamination via perplexity: Quantifying memorisation in language model evaluation

    Yucheng Li et al. Estimating contamination via perplexity: Quantifying memorisation in language model evaluation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 b

  16. [16]

    Feder Cooper, Daphne Ippolito, Christopher A

    Milad Nasr, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Florian Tram \`e r, and Katherine Lee. Scalable extraction of training data from aligned, production language models. In The Thirteenth International Conference on Learning Representations, 2025

  17. [17]

    Interpreting GPT : The logit lens, 2020

    Nostalgebraist. Interpreting GPT : The logit lens, 2020. Online blog post

  18. [18]

    GPT-4 Technical Report

    OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  19. [19]

    Lipton, and J

    Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, and J. Zico Kolter. Memorization with compression: Rethinking llm memorization through the lens of adversarial compression. In Advances in Neural Information Processing Systems (NeurIPS), 2024

  20. [20]

    Grad-cam: Visual explanations from deep networks via gradient-based localization

    Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp.\ 618--626, 2017

  21. [21]

    Detecting pretraining data from large language models

    Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. Detecting pretraining data from large language models. In The Twelfth International Conference on Learning Representations, 2024

  22. [22]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

  23. [23]

    Dice: Detecting in-distribution contamination in llm's fine-tuning phase for math reasoning

    Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, and Juanzi Li. Dice: Detecting in-distribution contamination in llm's fine-tuning phase for math reasoning. arXiv preprint arXiv:2406.04197, 2024

  24. [24]

    Reasoning or memorization? unreliable results of reinforcement learning due to data contamination

    Yifei Wu, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Sean Welleck, Peter West, Chandra Bhagavatula, Ronan Le Bras, et al. Reasoning or memorization? unreliable results of reinforcement learning due to data contamination. arXiv preprint arXiv:2503.04683, 2025

  25. [25]

    Do llms know to respect copyright notice? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

    Jialiang Xu, Shenglan Li, Zhaozhuo Xu, and Denghui Zhang. Do llms know to respect copyright notice? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

  26. [26]

    arXiv preprint arXiv:2311.04850 , year =

    Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E Gonzalez, and Ion Stoica. Rethinking benchmark and contamination for language models with rephrased samples. arXiv preprint arXiv:2311.04850, 2023

  27. [27]

    Data contamination can cross language barriers

    Feng Yao, Yufan Zhuang, Zihao Sun, Sunan Xu, Animesh Kumar, and Jingbo Shang. Data contamination can cross language barriers. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

  28. [28]

    Data contamination calibration for black-box llms

    Wentao Ye, Jiaqi Hu, Liyao Li, Haobo Wang, Gang Chen, and Junbo Zhao. Data contamination calibration for black-box llms. In Findings of the Association for Computational Linguistics: ACL 2024, 2024

  29. [29]

    Codeipprompt: Intellectual property infringement assessment of code language models

    Zhiyuan Yu, Yuhao Wu, Ning Zhang, Chenguang Wang, Yevgeniy Vorobeychik, and Chaowei Xiao. Codeipprompt: Intellectual property infringement assessment of code language models. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.\ 40373--40389. PMLR, 2023

  30. [30]

    PaCoST : Paired confidence significance testing for benchmark contamination detection in large language models

    Huixuan Zhang, Yun Lin, and Xiaojun Wan. PaCoST : Paired confidence significance testing for benchmark contamination detection in large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024 a

  31. [31]

    Pre-training data detection for large language models: A divergence-based calibration method

    Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, and Xueqi Cheng. Pre-training data detection for large language models: A divergence-based calibration method. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 b

  32. [32]

    Measuring copyright risks of large language models via partial information probing

    Weijie Zhao, Huajie Shao, Zhaozhuo Xu, Suzhen Duan, and Denghui Zhang. Measuring copyright risks of large language models via partial information probing. In Proceedings of the International Conference on Data-Centric AI (DCAI), 2024

  33. [33]

    Don't make your llm an evaluation benchmark cheater

    Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, and Jiawei Han. Don't make your llm an evaluation benchmark cheater. arXiv preprint arXiv:2311.01964, 2023

  34. [34]

    Inference-time decontamination: Reusing leaked benchmarks for large language model evaluation

    Qin Zhu, Qinyuan Cheng, Runyu Peng, Xiaonan Li, Ru Peng, Tengxiao Liu, Xipeng Qiu, and Xuanjing Huang. Inference-time decontamination: Reusing leaked benchmarks for large language model evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024 a

  35. [35]

    Clean-eval: Clean evaluation on contaminated large language models

    Wenhong Zhu, Hongkun Hao, Zhiwei He, Yunze Song, Yumeng Zhang, Hanxu Hu, Yiran Wei, Rui Wang, and Hongyuan Lu. Clean-eval: Clean evaluation on contaminated large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024 b

  36. [36]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

  37. [37]

    @esa (Ref

    \@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

  38. [38]

    \@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

  39. [39]

    b/G5O6 m= 'BV ۜ i?漠7S 1ɞ/ps>Yem'>y1 ŸR a AV

    @open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...