{"paper":{"title":"Measuring Spark on AWS: A Case Study on Mining Scientific Publications with Annotation Query","license":"http://creativecommons.org/licenses/by/4.0/","headline":"","cross_cats":[],"primary_cat":"cs.DC","authors_text":"Darin McBeath, Ron Daniel Jr","submitted_at":"2018-02-02T15:24:33Z","abstract_excerpt":"Annotation Query (AQ) is a program that provides the ability to query many different types of NLP annotations on a text, as well as the original content and structure of the text. The query results may provide new annotations, or they may select subsets of the content and annotations for deeper processing. Like GATE's Mimir, AQ is based on region algebras. Our AQ is implemented to run on a Spark cluster. In this paper we look at how AQ's runtimes are affected by the size of the collection, the number of nodes in the cluster, the type of node, and the characteristics of the queries. Cluster siz"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1802.00728","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}