How the Avengers assemble: Ecological modelling of effective cast sizes for movies

Lewis Mitchell; Matthew Roughan; Tobin South

arxiv: 1906.08403 · v1 · pith:IQXXOSMEnew · submitted 2019-06-20 · 💻 cs.SI · physics.soc-ph

How the Avengers assemble: Ecological modelling of effective cast sizes for movies

Matthew Roughan , Lewis Mitchell , Tobin South This is my paper

Pith reviewed 2026-05-25 19:39 UTC · model grok-4.3

classification 💻 cs.SI physics.soc-ph

keywords Shannon entropycharacter diversityecological diversityMarvel Cinematic UniverseJensen-Shannon divergencerecommender systemsmovie classificationcast size

0 comments

The pith

A Shannon-entropy metric on character counts measures effective cast size in movies and predicts success in the Marvel Cinematic Universe.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Movies have varying numbers of characters, but direct counts like credits are unreliable. The paper proposes using Shannon entropy from ecology to capture the effective diversity of characters based on their appearance frequencies. This metric aids in classifying films and, when combined with Jensen-Shannon divergence, measures similarity between movies for applications like recommendations. When tested on the Marvel Cinematic Universe, the measures also correlate with film success and clarify connections across the series.

Core claim

The number of characters in a movie is characterised using a Shannon-entropy based metric drawn from ecological diversity, and the metric generalised with Jensen-Shannon divergence provides a similarity measure that is useful in recommender systems and predicts success for films in the MCU while understanding relationships in the film arc.

What carries the argument

Shannon-entropy metric applied to character appearance or mention counts, generalised via Jensen-Shannon divergence to quantify movie similarity.

If this is right

The metric enables taxonomic classification of movies based on character diversity.
It supplies a similarity measure for use in recommender systems such as Netflix.
The measures predict success for films within the MCU.
They provide insight into relationships between stories across the franchise arc.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar entropy-based approaches could be tested on other film franchises or television series to analyze narrative complexity.
Recommender systems could incorporate this similarity to suggest films with comparable cast dynamics.

Load-bearing premise

Character appearance or mention counts can be extracted reliably from available data sources and the resulting entropy meaningfully reflects narrative importance.

What would settle it

Finding that the entropy metric does not correlate with independent assessments of cast size or fails to predict success metrics like box office performance in the MCU.

Figures

Figures reproduced from arXiv: 1906.08403 by Lewis Mitchell, Matthew Roughan, Tobin South.

**Figure 1.** Figure 1: Effective cast size of each movie in the MCU showing type of movies by shape, and sub-sequences connected by dashed lines. The x-axis is the theatrical release date. sub-sequences of the overall set of movies are indicated by dashed lines. There are many interesting features of this plot. When considered by class we see notable features: most origin movies have a small effective cast, which grows in sequel… view at source ↗

**Figure 2.** Figure 2: Profitability as a function of effective cast size. the franchise, and as one of the key initiators of the cinematic universe. Quality of acting and direction, cast “star power”, timing and other factors cannot be discounted as important to the overall success of a movie. However, the effective cast size also appears to influence the profitability of a movie. This is a fact that does not seem to be missed … view at source ↗

**Figure 3.** Figure 3: IMDb rating as a function of effective cast size. movies. The two transcriptions were very different: the second set were performed at a much courser level [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗

**Figure 4.** Figure 4: Conflict and dialogue metrics of cast size. Shading indicates less complete (dialogue) datasets. There are additional patterns of note. We can understand that movies that sit above the reference line contain more dialogue-based participation, and less conflict, and in turn, those below the line entail more conflict. Extreme examples are Spider-Man and Captain America: The First Avenger. Origin movies usual… view at source ↗

**Figure 5.** Figure 5: A comparison of the two distance metrics showing that effective divergence barDeffective lies below the Jensen-Shannon measure D¯ JS. In practical terms the metric performs exactly as you might expect [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: Heat map of normalised similarities S¯ effective between pairs of movies. Evident are blocks of movies corresponding to the major sub-sequences, e.g., the Thor movies or the Iron Man movies. Also noticeable is the sharing of cast between the Avengers (team-up) movies and many of the others. projecting into a 2D space, the 3D projection more clearly separates these clusters, but is hard to illustrate here. … view at source ↗

**Figure 7.** Figure 7: Dendrogram derived from hierarchical clustering of the movies based on the dis-similarities DA,B -0.4 -0.2 0.0 0.2 0.4 -0.4 -0.2 0.0 0.2 0.4 Ant-Man Ant-Man And The Wasp The Avengers Avengers: Age Of Ultron Avengers: Infinity War Black Panther Captain America: Civil War Captain America: The First Avenger Captain America: The Winter Soldier Captain Marvel Doctor Strange Guardians Of The Galaxy Guardians Of … view at source ↗

**Figure 8.** Figure 8: MDS projection into a 2D space based on the cast dissimilarities. Note that a small translation has been applied to place Avengers: Infinity War at the origin. June 21, 2019 20/25 [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

read the original abstract

The number of characters in a movie is an interesting feature. However, it is non-trivial to measure directly. Naive metrics such as the number of credited characters vary wildly. Here, we show that a metric based on the notion of "ecological diversity" as expressed through a Shannon-entropy based metric can characterise the number of characters in a movie, and is useful in taxonomic classification. We also show how the metric can be generalised using Jensen-Shannon divergence to provide a measure of the similarity of characters appearing in different movies, for instance of use in recommender systems, e.g., Netflix. We apply our measures to the Marvel Cinematic Universe (MCU), and show what they teach us about this highly successful franchise of movies. In particular, these measures provide a useful predictor of "success" for films in the MCU, as well as a natural means to understand the relationships between the stories in the overall film arc.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies Shannon entropy to movie cast data for an effective size metric and MCU success prediction, but the unvalidated extraction of character counts is the key weak point.

read the letter

The main thing here is that the authors take a standard Shannon entropy measure from ecology and use it on character appearance counts to define effective cast size, then extend it with Jensen-Shannon divergence for movie similarity. They apply both to the MCU and report that the metric helps classify films and predicts success measures there. That domain application is the actual new piece; the underlying math is off-the-shelf information theory and ecology work, but the film-franchise use case is not common in the literature I know. They lay out the taxonomic and recommender angles clearly enough in the abstract and show some concrete observations about how the MCU stories relate through shared characters. That part is straightforward and could be useful to someone building media features. The soft spot is exactly the one the stress-test flags. The metric stands or falls on whether the input proportions come from reliable, narrative-relevant counts rather than production artifacts. The abstract gives no source details, no parsing method, and no check against manual ground truth, so it is impossible to tell if the entropy value tracks story importance or just dialogue volume and credit practices. Without that, the success prediction and similarity claims rest on an untested assumption. The paper is aimed at media analytics or recommender-system researchers who want a new cast-diversity feature. A reader working on quantitative pop-culture analysis would get practical ideas from the MCU case, even if they have to redo the data step themselves. It deserves peer review because the application is concrete and the methods are reproducible in principle; a referee can ask for the missing validation without the whole idea falling apart. I would send it out, but flag the data extraction as the item that needs to be shown explicitly.

Referee Report

3 major / 2 minor

Summary. The paper proposes a Shannon-entropy metric H = −∑ p_i log p_i computed from proportions of character appearances or mentions to quantify 'effective cast size' in films as an alternative to naive credit counts. It claims this ecological-diversity measure enables taxonomic classification of movies, generalizes via Jensen-Shannon divergence to a similarity metric useful for recommender systems, and when applied to the Marvel Cinematic Universe (MCU) yields insights into narrative relationships and serves as a predictor of film success.

Significance. If the extraction of reliable p_i values and the interpretation of H as reflecting narrative importance rather than data artifacts can be established, the work would offer a novel quantitative bridge between information theory, ecology, and film studies with direct applications to classification, recommendation, and franchise analysis. The MCU case study provides a concrete, high-visibility testbed. The absence of any reported validation, data provenance, or error analysis currently prevents assessment of whether these downstream uses are supported or spurious.

major comments (3)

[Abstract / Methods] Abstract and Methods (wherever described): The central claims rest on the premise that character mention/appearance counts can be extracted to form a distribution p_i from which H meaningfully quantifies effective cast size. No section specifies the data source (scripts, credits, Wikipedia, etc.), the parsing procedure, or any validation against manual ground-truth counts. This is load-bearing; without it the ecological interpretation fails and the taxonomic, JSD-similarity, and MCU-success claims become untestable.
[MCU results] MCU results section: The claim that the metric 'provides a useful predictor of success' for MCU films is stated without reported statistical controls, baseline comparisons, or error analysis. If success is measured by box-office or ratings, the manuscript must demonstrate that the entropy term adds explanatory power beyond obvious covariates (budget, release date, prior franchise performance).
[JSD section] JSD generalization: The extension to Jensen-Shannon divergence for movie similarity is presented as immediately useful for recommender systems, yet no quantitative evaluation (e.g., precision@K on a held-out set, comparison to content-based or collaborative baselines) is supplied to support this utility claim.

minor comments (2)

[Methods] Notation for the entropy formula should be introduced with an explicit equation number and definition of the summation index i (over characters).
[Results] The manuscript should include a table or figure showing example p_i distributions and resulting H values for a few well-known films to illustrate the metric.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments highlight important gaps in documentation and validation that we agree need to be addressed. Below we respond point-by-point and indicate the revisions we will make.

read point-by-point responses

Referee: [Abstract / Methods] Abstract and Methods (wherever described): The central claims rest on the premise that character mention/appearance counts can be extracted to form a distribution p_i from which H meaningfully quantifies effective cast size. No section specifies the data source (scripts, credits, Wikipedia, etc.), the parsing procedure, or any validation against manual ground-truth counts. This is load-bearing; without it the ecological interpretation fails and the taxonomic, JSD-similarity, and MCU-success claims become untestable.

Authors: We agree that the current manuscript lacks explicit description of the data source and extraction pipeline. We will add a dedicated Methods subsection that specifies the source (Wikipedia plot summaries and cast lists for the films analyzed), the rule-based parsing procedure used to obtain character mention counts, and a small-scale validation exercise comparing the automated counts against manual annotations on a sample of films. This will be included in the revised version. revision: yes
Referee: [MCU results] MCU results section: The claim that the metric 'provides a useful predictor of success' for MCU films is stated without reported statistical controls, baseline comparisons, or error analysis. If success is measured by box-office or ratings, the manuscript must demonstrate that the entropy term adds explanatory power beyond obvious covariates (budget, release date, prior franchise performance).

Authors: The MCU analysis as written is descriptive and does not include the requested controls or model comparisons. We will revise the results section to present a multiple regression (or similar) with box-office or rating as the outcome, including covariates for budget, release date, and prior franchise performance. We will report the incremental explanatory power attributable to the entropy metric and any associated error or robustness checks. revision: yes
Referee: [JSD section] JSD generalization: The extension to Jensen-Shannon divergence for movie similarity is presented as immediately useful for recommender systems, yet no quantitative evaluation (e.g., precision@K on a held-out set, comparison to content-based or collaborative baselines) is supplied to support this utility claim.

Authors: We acknowledge that the manuscript asserts potential utility for recommender systems without any quantitative backing. We will either add a limited evaluation (for example, ranking movies by JSD and inspecting overlap with known similar titles) or temper the claim to present JSD as a similarity measure whose recommender value remains to be tested. The choice will depend on space and scope considerations in revision. revision: partial

Circularity Check

0 steps flagged

No circularity detected in derivation

full rationale

The paper defines its effective cast size via the standard Shannon entropy H = -∑ p_i log p_i applied to character mention proportions p_i obtained from external data sources, then extends it with the likewise standard Jensen-Shannon divergence for inter-movie similarity. No equations or claims reduce the output metric to a fitted parameter, self-defined quantity, or self-citation chain; the central results are direct computations on new inputs rather than tautological restatements of those inputs. The MCU success prediction is presented as an empirical correlation using the independently computed metric, with no indication that it collapses to the input data by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are stated. The entropy calculation implicitly assumes a well-defined probability distribution over characters that can be obtained from movie metadata.

pith-pipeline@v0.9.0 · 5688 in / 1033 out tokens · 21702 ms · 2026-05-25T19:39:45.824916+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1]

Ecological Diversity: Measuring the Unmeasurable

Daly AJ, Baetens JM, De Baets B. Ecological Diversity: Measuring the Unmeasurable. Mathematics. 2018;6(7). doi:10.3390/math6070119

work page doi:10.3390/math6070119 2018
[2]

Screenplay: The foundations of screenwriting

Field S. Screenplay: The foundations of screenwriting. Delta; 2005

work page 2005
[3]

The thirty-six dramatic situations

Polti G. The thirty-six dramatic situations. JK Reeve; 1921

work page 1921
[4]

Twenty master plots and how to build them

Tobias RB. Twenty master plots and how to build them. Piatkus; 1993

work page 1993
[5]

The seven basic plots: Why we tell stories

Booker C. The seven basic plots: Why we tell stories. A&C Black; 2004

work page 2004
[6]

An Estimate of an Upper Bound for the Entropy of English

Brown PF, Pietra SAD, Pietra VJD, Lai JC, Mercer RL. An Estimate of an Upper Bound for the Entropy of English. Computational Linguistics. 1992;18(1):31–40

work page 1992
[7]

The complexity and entropy of literary styles

Kontoyiannis I. The complexity and entropy of literary styles. Department of Statistics, Stanford University; 1997. June 21, 2019 23/25

work page 1997
[8]

Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers

Rosso OA, Craig H, Moscato P. Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers. Physica A: Statistical Mechanics and its Applications. 2009;388(6):916–926

work page 2009
[9]

Information flow reveals prediction limits in online social activity

Bagrow JP, Liu X, Mitchell L. Information flow reveals prediction limits in online social activity. Nature Human Behaviour. 2019;3(2):122

work page 2019
[10]

Toward a meaningful definition of vocabulary size

D’Anna CA, Zechmeister EB, Hall JW. Toward a meaningful definition of vocabulary size. Journal of Reading Behavior. 1991;23(1):109–122

work page 1991
[11]

Humans store about 1.5 megabytes of information during language acquisition

Mollica F, Piantadosi ST. Humans store about 1.5 megabytes of information during language acquisition. Royal Society Open Science. 2019;6(3):181393

work page 2019
[12]

Palm Sunday: an autobiographical collage

Vonnegut K. Palm Sunday: an autobiographical collage. Dial Press; 1999

work page 1999
[13]

Macroanalysis: Digital methods and literary history

Jockers ML. Macroanalysis: Digital methods and literary history. University of Illinois Press; 2013

work page 2013
[14]

Syuzhet: Extract Sentiment and Plot Arcs from Text; 2015

Jockers ML. Syuzhet: Extract Sentiment and Plot Arcs from Text; 2015. Available from: https://github.com/mjockers/syuzhet

work page 2015
[15]

The Bestseller Code: Anatomy of the blockbuster novel

Archer J, Jockers ML. The Bestseller Code: Anatomy of the blockbuster novel. St. Martin’s Press; 2016

work page 2016
[16]

A multiscale theory for the dynamical evolution of sentiment in novels

Gao J, Jockers ML, Laudun J, Tangherlini T. A multiscale theory for the dynamical evolution of sentiment in novels. In: 2016 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC); 2016. p. 1–4

work page 2016
[17]

The emotional arcs of stories are dominated by six basic shapes

Reagan AJ, Mitchell L, Kiley D, Danforth CM, Dodds PS. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Science. 2016;5(1):31

work page 2016
[18]

The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive Business Model Innovation in Entertainment Industries

Del Vecchio M, Kharlamov A, Parry G, Pogrebna G. The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive Business Model Innovation in Entertainment Industries. arXiv preprint arXiv:180702221. 2018

work page 2018
[19]

Universal properties of mythological networks

Mac Carron P, Kenna R. Universal properties of mythological networks. EPL (Europhysics Letters). 2012;99(2):28002

work page 2012
[20]

Viking sagas: Six degrees of Icelandic separation Social networks from the Viking era

Mac Carron P, Kenna R. Viking sagas: Six degrees of Icelandic separation Social networks from the Viking era. Significance. 2013;10(6):12–17. doi:10.1111/j.1740-9713.2013.00704.x

work page doi:10.1111/j.1740-9713.2013.00704.x 2013
[21]

Narrative as a Complex Network: A Study of Victor Hugo’s Les Mis´ erables

Min S, Park J. Narrative as a Complex Network: A Study of Victor Hugo’s Les Mis´ erables. In: Proceedings of HCI Korea. Hanbit Media, Inc.; 2016. p. 100–107

work page 2016
[22]

Representation of texts as complex networks: a mesoscopic approach

Ferraz de Arruda H, Nascimento Silva F, Queiroz Marinho V, Raphael Amancio D, da Fontoura Costa L. Representation of texts as complex networks: a mesoscopic approach. Journal of Complex Networks. 2017;6(1):125–144

work page 2017
[23]

Linguistic analysis of differences in portrayal of movie characters

Ramakrishna A, Mart´ ınez VR, Malandrakis N, Singla K, Narayanan S. Linguistic analysis of differences in portrayal of movie characters. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2017. p. 1669–1678

work page 2017
[24]

Marvel Universe looks almost like a real social network

Alberich R, Miro-Julia J, Rossell´ o F. Marvel Universe looks almost like a real social network. arXiv preprint cond-mat/0202174. 2002;. June 21, 2019 24/25

work page internal anchor Pith review Pith/arXiv arXiv 2002
[25]

Diana in the World of Men: a character network approach to analysing gendered vocal representation in Wonder Woman

Jones P. Diana in the World of Men: a character network approach to analysing gendered vocal representation in Wonder Woman . Feminist Media Studies. 2018;00(00):1–17. doi:10.1080/14680777.2018.1510846

work page doi:10.1080/14680777.2018.1510846 2018
[26]

The one comparing narrative social network extraction techniques

Edwards M, Mitchell L, Tuke J, Roughan M. The one comparing narrative social network extraction techniques. arXiv preprint arXiv:181101467. 2018

work page 2018
[27]

IMDb Help Center

What does uncredited mean?;. IMDb Help Center

work page
[28]

Can it be listed on IMDb?

I worked on a title but did not receive a screen credit. Can it be listed on IMDb?;. IMDb Help Center

work page
[29]

IMDb Help Center

Attributes;. IMDb Help Center

work page
[30]

Axiomatic Characterizations of Information Measures

Csisz´ ar I. Axiomatic Characterizations of Information Measures. Entropy. 2008;10(3):261–273. doi:10.3390/e10030261

work page doi:10.3390/e10030261 2008
[31]

A corpus driven study of the potential for vocabulary learning through watching movies

Webb S. A corpus driven study of the potential for vocabulary learning through watching movies. International Journal of Corpus Linguistics. 2010;15(4):497–519

work page 2010
[32]

A survey of available corpora for building data-driven dialogue systems

Serban IV, Lowe R, Henderson P, Charlin L, Pineau J. A survey of available corpora for building data-driven dialogue systems. arXiv preprint arXiv:151205742. 2015

work page 2015
[33]

https://transcripts.fandom.com/wiki/Transcripts_Wiki

Transcripts Wiki on Fandom;. https://transcripts.fandom.com/wiki/Transcripts_Wiki

work page
[34]

https://www.scriptslug.com/

Script Slug;. https://www.scriptslug.com/

work page
[35]

Multisemiotic Transcriptions as Film Referencing Systems

Baldry A. Multisemiotic Transcriptions as Film Referencing Systems. InTRAlinea: Online Translation Journal. 2016

work page 2016
[36]

Tidy data

Wickham H. Tidy data. Journal of Statistical Software. Submitted

work page
[37]

Marvel Cinematic Universe wiki

work page
[38]

Evolution and measurement of species diversity

Whittaker RH. Evolution and measurement of species diversity. Taxon. 1972; p. 213–251

work page 1972
[39]

Early Predictions of Movie Success: The Who, What, and When of Profitability

Lash MT, Zhao K. Early Predictions of Movie Success: The Who, What, and When of Profitability. Journal of Management Information Systems. 2016;33(3):874–903

work page 2016
[40]

On Choosing and Bounding Probability Metrics

Gibbs AL, Su FE. On Choosing and Bounding Probability Metrics. Interdisciplinary Science Reviews. 2002;70(3):419–435. doi:10.1111/j.1751-5823.2002.tb00178.x

work page doi:10.1111/j.1751-5823.2002.tb00178.x 2002
[41]

Similarity-Based Methods For Word Sense Disambiguation

Dagan I, Lee L, Pereira F. Similarity-Based Methods For Word Sense Disambiguation. In: Thirty-Fifth Annual Meeting of the Association for Computational Linguistics; 1997

work page 1997
[42]

Divergence measures based on the Shannon entropy

Lin J. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory. 1991;37:145–151. June 21, 2019 25/25

work page 1991

[1] [1]

Ecological Diversity: Measuring the Unmeasurable

Daly AJ, Baetens JM, De Baets B. Ecological Diversity: Measuring the Unmeasurable. Mathematics. 2018;6(7). doi:10.3390/math6070119

work page doi:10.3390/math6070119 2018

[2] [2]

Screenplay: The foundations of screenwriting

Field S. Screenplay: The foundations of screenwriting. Delta; 2005

work page 2005

[3] [3]

The thirty-six dramatic situations

Polti G. The thirty-six dramatic situations. JK Reeve; 1921

work page 1921

[4] [4]

Twenty master plots and how to build them

Tobias RB. Twenty master plots and how to build them. Piatkus; 1993

work page 1993

[5] [5]

The seven basic plots: Why we tell stories

Booker C. The seven basic plots: Why we tell stories. A&C Black; 2004

work page 2004

[6] [6]

An Estimate of an Upper Bound for the Entropy of English

Brown PF, Pietra SAD, Pietra VJD, Lai JC, Mercer RL. An Estimate of an Upper Bound for the Entropy of English. Computational Linguistics. 1992;18(1):31–40

work page 1992

[7] [7]

The complexity and entropy of literary styles

Kontoyiannis I. The complexity and entropy of literary styles. Department of Statistics, Stanford University; 1997. June 21, 2019 23/25

work page 1997

[8] [8]

Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers

Rosso OA, Craig H, Moscato P. Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers. Physica A: Statistical Mechanics and its Applications. 2009;388(6):916–926

work page 2009

[9] [9]

Information flow reveals prediction limits in online social activity

Bagrow JP, Liu X, Mitchell L. Information flow reveals prediction limits in online social activity. Nature Human Behaviour. 2019;3(2):122

work page 2019

[10] [10]

Toward a meaningful definition of vocabulary size

D’Anna CA, Zechmeister EB, Hall JW. Toward a meaningful definition of vocabulary size. Journal of Reading Behavior. 1991;23(1):109–122

work page 1991

[11] [11]

Humans store about 1.5 megabytes of information during language acquisition

Mollica F, Piantadosi ST. Humans store about 1.5 megabytes of information during language acquisition. Royal Society Open Science. 2019;6(3):181393

work page 2019

[12] [12]

Palm Sunday: an autobiographical collage

Vonnegut K. Palm Sunday: an autobiographical collage. Dial Press; 1999

work page 1999

[13] [13]

Macroanalysis: Digital methods and literary history

Jockers ML. Macroanalysis: Digital methods and literary history. University of Illinois Press; 2013

work page 2013

[14] [14]

Syuzhet: Extract Sentiment and Plot Arcs from Text; 2015

Jockers ML. Syuzhet: Extract Sentiment and Plot Arcs from Text; 2015. Available from: https://github.com/mjockers/syuzhet

work page 2015

[15] [15]

The Bestseller Code: Anatomy of the blockbuster novel

Archer J, Jockers ML. The Bestseller Code: Anatomy of the blockbuster novel. St. Martin’s Press; 2016

work page 2016

[16] [16]

A multiscale theory for the dynamical evolution of sentiment in novels

Gao J, Jockers ML, Laudun J, Tangherlini T. A multiscale theory for the dynamical evolution of sentiment in novels. In: 2016 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC); 2016. p. 1–4

work page 2016

[17] [17]

The emotional arcs of stories are dominated by six basic shapes

Reagan AJ, Mitchell L, Kiley D, Danforth CM, Dodds PS. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Science. 2016;5(1):31

work page 2016

[18] [18]

The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive Business Model Innovation in Entertainment Industries

Del Vecchio M, Kharlamov A, Parry G, Pogrebna G. The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive Business Model Innovation in Entertainment Industries. arXiv preprint arXiv:180702221. 2018

work page 2018

[19] [19]

Universal properties of mythological networks

Mac Carron P, Kenna R. Universal properties of mythological networks. EPL (Europhysics Letters). 2012;99(2):28002

work page 2012

[20] [20]

Viking sagas: Six degrees of Icelandic separation Social networks from the Viking era

Mac Carron P, Kenna R. Viking sagas: Six degrees of Icelandic separation Social networks from the Viking era. Significance. 2013;10(6):12–17. doi:10.1111/j.1740-9713.2013.00704.x

work page doi:10.1111/j.1740-9713.2013.00704.x 2013

[21] [21]

Narrative as a Complex Network: A Study of Victor Hugo’s Les Mis´ erables

Min S, Park J. Narrative as a Complex Network: A Study of Victor Hugo’s Les Mis´ erables. In: Proceedings of HCI Korea. Hanbit Media, Inc.; 2016. p. 100–107

work page 2016

[22] [22]

Representation of texts as complex networks: a mesoscopic approach

Ferraz de Arruda H, Nascimento Silva F, Queiroz Marinho V, Raphael Amancio D, da Fontoura Costa L. Representation of texts as complex networks: a mesoscopic approach. Journal of Complex Networks. 2017;6(1):125–144

work page 2017

[23] [23]

Linguistic analysis of differences in portrayal of movie characters

Ramakrishna A, Mart´ ınez VR, Malandrakis N, Singla K, Narayanan S. Linguistic analysis of differences in portrayal of movie characters. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2017. p. 1669–1678

work page 2017

[24] [24]

Marvel Universe looks almost like a real social network

Alberich R, Miro-Julia J, Rossell´ o F. Marvel Universe looks almost like a real social network. arXiv preprint cond-mat/0202174. 2002;. June 21, 2019 24/25

work page internal anchor Pith review Pith/arXiv arXiv 2002

[25] [25]

Diana in the World of Men: a character network approach to analysing gendered vocal representation in Wonder Woman

Jones P. Diana in the World of Men: a character network approach to analysing gendered vocal representation in Wonder Woman . Feminist Media Studies. 2018;00(00):1–17. doi:10.1080/14680777.2018.1510846

work page doi:10.1080/14680777.2018.1510846 2018

[26] [26]

The one comparing narrative social network extraction techniques

Edwards M, Mitchell L, Tuke J, Roughan M. The one comparing narrative social network extraction techniques. arXiv preprint arXiv:181101467. 2018

work page 2018

[27] [27]

IMDb Help Center

What does uncredited mean?;. IMDb Help Center

work page

[28] [28]

Can it be listed on IMDb?

I worked on a title but did not receive a screen credit. Can it be listed on IMDb?;. IMDb Help Center

work page

[29] [29]

IMDb Help Center

Attributes;. IMDb Help Center

work page

[30] [30]

Axiomatic Characterizations of Information Measures

Csisz´ ar I. Axiomatic Characterizations of Information Measures. Entropy. 2008;10(3):261–273. doi:10.3390/e10030261

work page doi:10.3390/e10030261 2008

[31] [31]

A corpus driven study of the potential for vocabulary learning through watching movies

Webb S. A corpus driven study of the potential for vocabulary learning through watching movies. International Journal of Corpus Linguistics. 2010;15(4):497–519

work page 2010

[32] [32]

A survey of available corpora for building data-driven dialogue systems

Serban IV, Lowe R, Henderson P, Charlin L, Pineau J. A survey of available corpora for building data-driven dialogue systems. arXiv preprint arXiv:151205742. 2015

work page 2015

[33] [33]

https://transcripts.fandom.com/wiki/Transcripts_Wiki

Transcripts Wiki on Fandom;. https://transcripts.fandom.com/wiki/Transcripts_Wiki

work page

[34] [34]

https://www.scriptslug.com/

Script Slug;. https://www.scriptslug.com/

work page

[35] [35]

Multisemiotic Transcriptions as Film Referencing Systems

Baldry A. Multisemiotic Transcriptions as Film Referencing Systems. InTRAlinea: Online Translation Journal. 2016

work page 2016

[36] [36]

Tidy data

Wickham H. Tidy data. Journal of Statistical Software. Submitted

work page

[37] [37]

Marvel Cinematic Universe wiki

work page

[38] [38]

Evolution and measurement of species diversity

Whittaker RH. Evolution and measurement of species diversity. Taxon. 1972; p. 213–251

work page 1972

[39] [39]

Early Predictions of Movie Success: The Who, What, and When of Profitability

Lash MT, Zhao K. Early Predictions of Movie Success: The Who, What, and When of Profitability. Journal of Management Information Systems. 2016;33(3):874–903

work page 2016

[40] [40]

On Choosing and Bounding Probability Metrics

Gibbs AL, Su FE. On Choosing and Bounding Probability Metrics. Interdisciplinary Science Reviews. 2002;70(3):419–435. doi:10.1111/j.1751-5823.2002.tb00178.x

work page doi:10.1111/j.1751-5823.2002.tb00178.x 2002

[41] [41]

Similarity-Based Methods For Word Sense Disambiguation

Dagan I, Lee L, Pereira F. Similarity-Based Methods For Word Sense Disambiguation. In: Thirty-Fifth Annual Meeting of the Association for Computational Linguistics; 1997

work page 1997

[42] [42]

Divergence measures based on the Shannon entropy

Lin J. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory. 1991;37:145–151. June 21, 2019 25/25

work page 1991