Be Consistent! Improving Procedural Text Comprehension using Label Consistency

Antoine Bosselut; Bhavana Dalvi Mishra; Claire Cardie; Niket Tandon; Peter Clark; Wen-tau Yih; Xinya Du

arxiv: 1906.08942 · v1 · pith:MBRUT2F3new · submitted 2019-06-21 · 💻 cs.CL · cs.LG

Xinya Du, Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie

Pith reviewed 2026-05-25 19:21 UTC · model grok-4.3

keywords procedural text comprehension · label consistency · entity state tracking · ProPara · natural language processing · machine learning

The pith

Training models with label consistency across multiple descriptions improves procedural text comprehension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve how models track changes in entity properties like location as a procedure unfolds in text. It proposes a training framework that uses the availability of multiple independent descriptions of the same procedure to enforce that their predictions agree. This builds a consistency bias directly into the model. A reader would care because procedural texts are dynamic and current systems still make many errors on entity state tracking. The approach yields higher F1 scores on the ProPara benchmark than earlier methods.

Core claim

The authors claim that a learning framework which leverages label consistency during training, by requiring predictions from multiple independent descriptions of the same procedural text to agree, builds consistency bias into the model and produces significantly higher F1 scores on entity state tracking than prior state-of-the-art systems on the ProPara dataset.

What carries the argument

The label consistency learning framework that enforces agreement between predictions from different descriptions of the same procedural text.
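To make the mechanism concrete, here is a minimal sketch of a per-group objective that adds a consistency penalty to ordinary supervision. It is not the authors' LaCE loss, whose exact formulation this page does not give. The function names, the squared-distance disagreement measure, and the weight `lam` are all assumptions chosen for illustration.

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold label under a predicted distribution."""
    return -math.log(probs[gold_index])

def disagreement(p, q):
    """Squared L2 distance between two label distributions (an assumed choice)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def group_loss(predictions, gold, lam=0.5):
    """Supervised loss plus a pairwise consistency penalty for one group.

    predictions: one predicted label distribution per description in the group
    gold: gold label index per description, or None where no label exists
    lam: hypothetical weight on the consistency term
    """
    labeled = [(p, g) for p, g in zip(predictions, gold) if g is not None]
    sup = sum(cross_entropy(p, g) for p, g in labeled) / max(len(labeled), 1)

    n = len(predictions)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    con = sum(disagreement(predictions[i], predictions[j]) for i, j in pairs)
    con /= max(len(pairs), 1)

    return sup + lam * con

# Three descriptions of the same process; labels are (create, move, none).
# The probabilities are made up for illustration.
preds = [[0.5, 0.4, 0.1], [0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
print(group_loss(preds, gold=[0, 0, 0], lam=0.5))
```

Because the penalty needs no gold label, a description with `gold=None` still contributes to the consistency term, which is one way unlabeled descriptions could be used in a scheme like this.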

If this is right

Entity state tracking becomes more accurate for procedures described in multiple ways.
The method applies directly to any procedural text where several independent accounts exist.
Consistency bias is learned at training time and requires no change at inference.
Performance gains appear on the standard ProPara benchmark without new labeled data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same consistency mechanism could be tried on other tracking or sequence-labeling tasks that have multiple annotations.
When multiple descriptions are scarce, one might generate synthetic variants that preserve consistency to test the same benefit.
The approach may reduce description-specific biases by averaging across accounts.

Load-bearing premise

Multiple independent descriptions of the same procedural text are available and forcing their predictions to be consistent improves entity-state tracking accuracy without introducing new errors.

What would settle it

Running the label-consistency training on ProPara and finding no F1 gain or a drop relative to the same model trained without the consistency term would falsify the central claim.
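The comparison described above reduces to a difference of two F1 scores. The sketch below shows that arithmetic only. The function names are hypothetical, and the precision and recall values are placeholders, not results from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def consistency_gain(with_term, without_term):
    """F1 with the consistency term minus F1 without it.

    Each argument is a (precision, recall) pair from the same model and
    data, differing only in whether the consistency term was used.
    """
    return f1_score(*with_term) - f1_score(*without_term)

# Placeholder inputs for illustration only.
gain = consistency_gain(with_term=(0.6, 0.5), without_term=(0.6, 0.4))
print("gain > 0" if gain > 0 else "no gain: the central claim would be in doubt")
```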

Figures

Figures reproduced from arXiv: 1906.08942 by Antoine Bosselut, Bhavana Dalvi Mishra, Claire Cardie, Niket Tandon, Peter Clark, Wen-tau Yih, Xinya Du.

**Figure 1.** Fragments from three independent texts about photosynthesis. Although (1) is ambiguous as to whether oxygen is being created or merely moved, evidence from (2) and (3) suggests it is being created, helping to correctly interpret (1). More generally, encouraging consistency between predictions from different paragraphs about the same process/procedure can improve performance.

**Figure 2.** Three (simplified) passages from ProPara describing photosynthesis, the (gold) state changes each entity …

**Figure 3.** Example of batches constructed from a group (here, the group contains three labeled examples …

**Figure 4.** Overview of the LaCE training framework, illustrated for the procedural comprehension task ProPara. During training, LaCE processes batches of examples {x1,...,xk} for each group Xg, where predictions for one example (here ŷ1) are compared against its gold (producing loss Lsup), and its summary against summaries of all other examples to encourage consistency of predictions (producing Lcon), repeating for …

**Figure 5.** Comparing LaCE vs. ProStruct based on Recall on the test partition, by varying amount of labeled paragraphs available per training topic. The comparison varies two parameters: (1) the percentage of the labeled (ProPara) training data used to train the system, and (2) for LaCE only, whether the additional unlabeled data was also used. This allows us to see performance under different conditions of sparsity of …

Original abstract

Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe). This task is challenging as the world is changing throughout the text, and despite recent advances, current systems still struggle with this task. Our approach is to leverage the fact that, for many procedural texts, multiple independent descriptions are readily available, and that predictions from them should be consistent (label consistency). We present a new learning framework that leverages label consistency during training, allowing consistency bias to be built into the model. Evaluation on a standard benchmark dataset for procedural text, ProPara (Dalvi et al., 2018), shows that our approach significantly improves prediction performance (F1) over prior state-of-the-art systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Consistency regularization on multiple descriptions lifts ProPara F1, but the paper never shows the lift survives an ablation that just pools the extra data without the consistency term.

The main takeaway is that training with a consistency penalty across independent descriptions of the same procedure improves entity-state tracking on ProPara. The authors treat multiple descriptions as a natural source of supervision that should agree on the underlying state changes, and they bake that agreement into the loss. That is the concrete new piece: a simple way to turn extra text into a regularizer rather than just more training examples. The implementation looks clean enough on the surface and the reported F1 gain over prior systems is the kind of practical increment that matters for this narrow task. Credit to them for identifying that multiple descriptions are often available for procedural text and for trying to exploit the consistency signal directly.

The soft spot is exactly the one the stress-test flags. Nothing in the abstract or the reported experiments isolates whether the consistency objective itself is doing the work or whether the model is simply benefiting from seeing more surface forms of the same procedure. A straightforward control (train on the union of descriptions with ordinary supervision only) would settle it, and its absence leaves the central claim under-supported. The math is standard cross-entropy plus a consistency term, so no hidden fitting tricks, but the attribution question remains open.

This is the sort of paper that belongs in a specialized NLP venue or workshop rather than a top-tier conference; the idea is modest but usable. A serious referee should see it because the task is well-defined, the data is public, and the fix is cheap to try, even if the current evidence does not yet pin down why it works. I would bring it to a reading group only if someone is actively working on procedural text or consistency regularization; otherwise it is too incremental. I would not cite it in my own work unless I needed the exact ProPara numbers. Send it to review.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a training framework for procedural text comprehension that leverages multiple independent descriptions of the same text to enforce label consistency during learning, with the goal of improving entity-state tracking. It evaluates the approach on the ProPara benchmark and claims significant F1 gains over prior state-of-the-art systems.

Significance. If the consistency objective is shown to be the source of the gains (rather than data volume alone), the work offers a generalizable technique for incorporating consistency biases when redundant annotations exist, which could benefit other sequence-labeling tasks involving dynamic state changes.

major comments (2)

[Experiments] Experiments section: No ablation is reported that trains a model on the union of all descriptions using only standard per-description supervision (without the consistency term). This control is required to isolate whether the F1 lift derives from the consistency bias or from the simple increase in training data volume and diversity.
[Method] Method section: The precise formulation of the consistency loss (including how entity-state predictions are aligned and aggregated across descriptions, and the weighting hyperparameter) is not specified with sufficient detail to verify that the mechanism does not introduce new errors on individual descriptions.
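The control requested in the first major comment can be stated as a small grid of training conditions. The sketch below is one hypothetical way to write that grid down and to check which pair of conditions isolates the consistency term. The condition names and fields are assumptions, not the paper's experimental setup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    name: str
    use_all_descriptions: bool   # train on the union of descriptions?
    consistency_weight: float    # 0.0 means supervision only

def ablation_grid(lam):
    """Three conditions; `lam` is a hypothetical consistency weight."""
    return [
        Condition("baseline, supervision only", False, 0.0),
        Condition("pooled descriptions, supervision only", True, 0.0),
        Condition("pooled descriptions + consistency", True, lam),
    ]

def isolates_consistency(a, b):
    """True when two conditions share their data and differ only in the weight."""
    return (a.use_all_descriptions == b.use_all_descriptions
            and a.consistency_weight != b.consistency_weight)

grid = ablation_grid(lam=0.5)
for a in grid:
    for b in grid:
        if a.name < b.name and isolates_consistency(a, b):
            print(f"compare: {a.name!r} vs {b.name!r}")
```

Only the second and third conditions share the same data, so only their F1 difference can be attributed to the consistency term rather than to data volume.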

minor comments (2)

[Abstract] Abstract: The claim of 'significantly improves prediction performance (F1)' should be accompanied by the concrete delta and baseline numbers for immediate context.
Ensure that all tables reporting F1 scores include standard deviations or statistical significance tests when comparing systems.
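One common way to attach a significance estimate to a system comparison is a paired bootstrap over test items. The sketch below assumes per-item scores are available for both systems, which is a simplification: corpus-level F1 is not a plain average of per-item scores. The function name and the scores in the usage lines are hypothetical.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10000, seed=0):
    """Fraction of resamples in which system A fails to beat system B.

    scores_a, scores_b: per-item scores for the same test items, same order.
    A small returned value suggests A's advantage is unlikely to be chance.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired item by item")
    rng = random.Random(seed)
    n = len(scores_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample test items with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] - scores_b[i] for i in idx)
        if delta <= 0:
            not_better += 1
    return not_better / n_resamples

# Made-up per-item scores for illustration only.
a = [1.0, 0.5, 1.0, 0.0, 1.0, 0.5]
b = [0.5, 0.5, 1.0, 0.0, 0.5, 0.5]
print(paired_bootstrap(a, b, n_resamples=2000))
```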

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major point below and will revise the manuscript to incorporate the requested clarifications and experiments.

Referee: Experiments section: No ablation is reported that trains a model on the union of all descriptions using only standard per-description supervision (without the consistency term). This control is required to isolate whether the F1 lift derives from the consistency bias or from the simple increase in training data volume and diversity.

Authors: We agree that this ablation is essential to isolate the contribution of the consistency term. We will add the requested control experiment (training on the union of descriptions with standard supervision only) to the Experiments section of the revised manuscript. Revision: yes.

Referee: Method section: The precise formulation of the consistency loss (including how entity-state predictions are aligned and aggregated across descriptions, and the weighting hyperparameter) is not specified with sufficient detail to verify that the mechanism does not introduce new errors on individual descriptions.

Authors: We acknowledge that additional detail is needed for reproducibility. We will expand the Method section in the revision to include the exact mathematical formulation of the consistency loss, the alignment and aggregation procedure across descriptions, and the role and tuning of the weighting hyperparameter. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; empirical gains from external multi-description data and consistency objective

The paper's central claim is an empirical F1 improvement on the external ProPara benchmark (Dalvi et al. 2018) achieved by training with a label-consistency objective over multiple independent descriptions. No derivation chain reduces by construction to fitted inputs or self-citations; the consistency bias is an added training term whose effect is measured against prior systems on held-out data. The approach is self-contained against external benchmarks with no self-definitional, fitted-prediction, or load-bearing self-citation patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no concrete free parameters, axioms, or invented entities; the approach appears to rest on standard supervised learning plus an added consistency term whose exact formulation is not described.

pith-pipeline@v0.9.0 · 5687 in / 988 out tokens · 30657 ms · 2026-05-25T19:21:05.599794+00:00 · methodology

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 8 internal anchors

[3] Jonathan Berant, Vivek Srikumar, Pei-Chun Chen, Abby Vander Linden, Brittany Harding, Brad Huang, Peter Clark, and Christopher D Manning. 2014. Modeling biological processes for reading comprehension. In Proc. EMNLP'14.

[4] Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. 2018. Simulating action dynamics with neural process networks. 6th International Conference on Learning Representations (ICLR).

[5] Danqi Chen, Jason Bolton, and Christopher D. Manning. 2016. A thorough examination of the CNN/Daily Mail reading comprehension task. CoRR, abs/1606.02858.

[6] Kevin Chen, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas A. Funkhouser, and Silvio Savarese. 2018. Text2Shape: Generating shapes from natural language by learning joint embeddings. CoRR, abs/1803.08495.

[7] Nancy A. Chinchor. 2002. Message understanding conference (MUC) tests of discourse processing.

[8] Charles LA Clarke, Gordon V Cormack, and Thomas R Lynam. 2001. Exploiting redundancy in question answering. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval. ACM.

[9] Bhavana Dalvi, Lifu Huang, Niket Tandon, Wen-tau Yih, and Peter Clark. 2018. Tracking state changes in procedural text: A challenge dataset and models for process paragraph comprehension. NAACL-HLT'18, arXiv preprint arXiv:1805.06975.

[10] Rajarshi Das, Tsendsuren Munkhdalai, Xingdi Yuan, Adam Trischler, and Andrew McCallum. 2019. Building dynamic knowledge graphs from text using machine reading comprehension. ICLR. arXiv:1810.05682.

[11] Susan T. Dumais, Michele Banko, Eric Brill, Jimmy J. Lin, and Andrew Y. Ng. 2002. Web question answering: is more always better? In SIGIR.

[12] Kuzman Ganchev, João Graça, Jennifer Gillenwater, and Ben Taskar. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11:2001–2049.

[13] Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. AllenNLP: A deep semantic natural language processing platform. arXiv preprint arXiv:1803.07640.

[14] Aditya Gupta and Greg Durrett. 2019. Tracking discrete and continuous entity state for process understanding. arXiv preprint arXiv:1904.03518. (To appear in NAACL'19 workshop on Structured Prediction for NLP.)

[15] Philip Haeusser, Alexander Mordvintsev, and Daniel Cremers. 2017. Learning by association: a versatile semi-supervised training method for neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 3, page 6.

[16] Viktor Hangya, Fabienne Braune, Alexander Fraser, and Hinrich Schütze. 2018. Two methods for domain adaptation of bilingual tasks: Delightfully simple and broadly applicable. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

[17] Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, and Yann LeCun. 2017. Tracking the world state with recurrent entity networks. In ICLR.

[18] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.

[19] Chloé Kiddon, Ganesa Thandavam Ponnuraj, Luke Zettlemoyer, and Yejin Choi. 2015. Mise en place: Unsupervised interpretation of instructional recipes. In Proc. EMNLP'15.

[20] Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proc. EMNLP'16.

[21] Scott Kirkpatrick, C. D. Gelatt, and Mario P. Vecchi. 1988. Optimization by simulated annealing.

[22] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch.

[23] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543.

[24] Hinrich Schütze, Fabienne Braune, Alexander M. Fraser, and Viktor Hangya. 2018. Two methods for domain adaptation of bilingual tasks: Delightfully simple and broadly applicable. In ACL.

[25] Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. 2017. Query-reduction networks for question answering. In ICLR.

[26] Partha Pratim Talukdar, Joseph Reisinger, Marius Pasca, Deepak Ravichandran, Rahul Bhagat, and Fernando Pereira. 2008. Weakly-supervised acquisition of labeled class instances using graph random walks. In EMNLP.

[27] Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, and Peter Clark. 2018. Reasoning about actions and state changes by injecting commonsense knowledge. EMNLP'18, arXiv preprint arXiv:1808.10012.

[28] Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M Rush, Bart van Merriënboer, Armand Joulin, and Tomas Mikolov. 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698.

[29] Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. 2003. Learning with local and global consistency. In NIPS.

[30] Xiaojin Zhu, Zoubin Ghahramani, and John D Lafferty. 2003. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International conference on Machine learning (ICML), pages 912–919.
