We use a per-device train batch size of 4 with 4 gradient accumulation steps, resulting in an effective batch size of

with a cosine learning rate scheduler, a learning rate of 0 · 2019
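The quoted setup combines gradient accumulation with a cosine learning rate schedule. The sketch below shows the arithmetic under stated assumptions. The effective batch size is the per-device batch size times the number of accumulation steps times the number of devices. The device count is not given in the excerpt, so `num_devices` is an assumption. The learning rate value in the excerpt is garbled, so `peak_lr` in the example is hypothetical. The schedule assumes plain cosine decay to `min_lr = 0` with no warmup, which the excerpt does not confirm.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update.

    num_devices is an assumption: the excerpt does not state the device count.
    """
    return per_device_batch * grad_accum_steps * num_devices


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Plain cosine decay from peak_lr (step 0) to min_lr (step total_steps).

    Assumes no warmup phase; peak_lr and min_lr are hypothetical values.
    """
    progress = step / max(1, total_steps)
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Values from the excerpt: per-device batch 4, 4 accumulation steps.
    print(effective_batch_size(4, 4))                  # single device (assumed)
    print(effective_batch_size(4, 4, num_devices=8))   # hypothetical 8-device run
    for s in (0, 50, 100):
        print(s, cosine_lr(s, total_steps=100, peak_lr=1e-4))  # hypothetical peak_lr
```

With 4 examples per device and 4 accumulation steps, a single device would see 16 examples per optimizer update, and an 8-device run would see 128; the actual figure depends on the hardware used, which the excerpt leaves out.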

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Temporal Tokenization Strategies for Event Sequence Modeling with Large Language Models

cs.CL · 2025-12-15 · unverdicted · novelty 6.0 · ref 11

No single temporal tokenization strategy is best for all event data; performance depends on matching the tokenizer to the statistical shape of the data.
