{"work":{"id":"f20e62ba-6265-4b97-aa8c-ddefaf2f5762","openalex_id":null,"doi":null,"arxiv_id":"1705.03551","raw_key":null,"title":"TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension","authors":null,"authors_text":"Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer","year":2017,"venue":"cs.CL","abstract":"We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network, that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study. Data and code available at -- http://nlp.cs.washington.edu/triviaqa/","external_url":"https://arxiv.org/abs/1705.03551","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-28T18:32:29.019014+00:00","pith_arxiv_id":"1705.03551","created_at":"2026-05-09T05:45:22.105276+00:00","updated_at":"2026-06-28T18:32:29.019014+00:00","title_quality_ok":true,"display_title":"TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension","render_title":"TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension"},"hub":{"state":{"work_id":"f20e62ba-6265-4b97-aa8c-ddefaf2f5762","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":98,"external_cited_by_count":null,"distinct_field_count":12,"first_pith_cited_at":"2019-01-13T23:27:58+00:00","last_pith_cited_at":"2026-06-23T16:46:36+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:38:55.755839+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"dataset","n":11},{"context_role":"background","n":9},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"use_dataset","n":11},{"context_polarity":"background","n":8},{"context_polarity":"unclear","n":1},{"context_polarity":"use_method","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T17:16:17.481772+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":14},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":12},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":12},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":11},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":11},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":9},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":9},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":8},{"title":"HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering","work_id":"c87d7e5f-b81a-41c8-beca-f0b9d598aae4","shared_citers":8},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":8},{"title":"Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them","work_id":"513eb205-04ca-4722-9a43-a74e8cbe7e85","shared_citers":7},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":7},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":7},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":7},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":7},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":7},{"title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","work_id":"50e3b368-0243-4726-8186-233869802ad1","shared_citers":6},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":6},{"title":"HellaSwag: Can a Machine Really Finish Your Sentence?","work_id":"79f44c0c-96f4-4edb-bc50-a3c9d6b85936","shared_citers":6},{"title":"Language Models (Mostly) Know What They Know","work_id":"8ca58a10-da41-4f70-baae-7e449512e345","shared_citers":6},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":6},{"title":"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer","work_id":"2c6b3f6d-54e4-4df7-baa7-475a490799af","shared_citers":6},{"title":"SQuAD: 100,000+ Questions for Machine Comprehension of Text","work_id":"0492dd16-26e8-48d9-874c-3dd90cae7b85","shared_citers":6},{"title":"arXiv preprint arXiv:1606.06031 , year=","work_id":"3775e6d1-2f28-4791-82a9-538c0507512c","shared_citers":5}],"time_series":[{"n":2,"year":2019},{"n":1,"year":2020},{"n":1,"year":2021},{"n":1,"year":2022},{"n":3,"year":2023},{"n":6,"year":2024},{"n":6,"year":2025},{"n":25,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T17:16:27.411727+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T17:16:27.359819+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension","claims":[{"claim_text":"We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence senten","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T17:16:14.411219+00:00"}},"summary":{"title":"TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension","claims":[{"claim_text":"We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence senten","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":14},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":12},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":12},{"title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","work_id":"ed240a10-5b19-406c-baa5-30803f465785","shared_citers":11},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":11},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":9},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":9},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":8},{"title":"HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering","work_id":"c87d7e5f-b81a-41c8-beca-f0b9d598aae4","shared_citers":8},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":8},{"title":"Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them","work_id":"513eb205-04ca-4722-9a43-a74e8cbe7e85","shared_citers":7},{"title":"Decoupled Weight Decay Regularization","work_id":"07ef7360-d385-4033-83f7-8384a6325204","shared_citers":7},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":7},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":7},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":7},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":7},{"title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","work_id":"50e3b368-0243-4726-8186-233869802ad1","shared_citers":6},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":6},{"title":"HellaSwag: Can a Machine Really Finish Your Sentence?","work_id":"79f44c0c-96f4-4edb-bc50-a3c9d6b85936","shared_citers":6},{"title":"Language Models (Mostly) Know What They Know","work_id":"8ca58a10-da41-4f70-baae-7e449512e345","shared_citers":6},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":6},{"title":"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer","work_id":"2c6b3f6d-54e4-4df7-baa7-475a490799af","shared_citers":6},{"title":"SQuAD: 100,000+ Questions for Machine Comprehension of Text","work_id":"0492dd16-26e8-48d9-874c-3dd90cae7b85","shared_citers":6},{"title":"arXiv preprint arXiv:1606.06031 , year=","work_id":"3775e6d1-2f28-4791-82a9-538c0507512c","shared_citers":5}],"time_series":[{"n":2,"year":2019},{"n":1,"year":2020},{"n":1,"year":2021},{"n":1,"year":2022},{"n":3,"year":2023},{"n":6,"year":2024},{"n":6,"year":2025},{"n":25,"year":2026}],"dependency_candidates":[]},"authors":[]}}