Integrating Knowledge and Reasoning in Image Understanding

Chitta Baral; Somak Aditya; Yezhou Yang

arXiv: 1906.09954 · v1 · pith:ZAXJWRZ5 · submitted 2019-06-24 · 💻 cs.CV · cs.AI

Somak Aditya, Yezhou Yang, Chitta Baral

Pith reviewed 2026-05-25 17:36 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords image understanding · knowledge integration · reasoning mechanisms · deep learning · neural networks · visual question answering · external knowledge · semantic segmentation


The pith

Integrating external knowledge with neural networks and higher-level reasoning addresses limitations in data-driven image understanding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys representative reasoning mechanisms, knowledge integration methods, and their applications in tasks such as object recognition, semantic segmentation, and visual question answering. It identifies the absence of external knowledge and higher-level reasoning as a key hindrance in current deep learning approaches for image understanding. By reviewing efforts from various research groups, the work highlights concrete ways neural networks can be combined with knowledge sources. A reader would care because this points to practical routes for making systems more capable when training data alone proves insufficient. The survey ends by outlining potential pathways forward based on the reviewed methods.

Core claim

Deep learning based data-driven approaches have succeeded in image understanding applications but still lack knowledge integration as well as higher-level reasoning capabilities. This work presents a brief survey of representative reasoning mechanisms, knowledge integration methods, and corresponding applications. It further discusses key efforts on integrating external knowledge with neural networks and concludes by discussing potential pathways to improve reasoning capabilities.

What carries the argument

Survey of reasoning mechanisms and methods for integrating external knowledge with neural networks in image understanding tasks.

If this is right

Visual question answering and similar tasks can draw on external knowledge bases to handle cases beyond what training data covers.
Combining neural networks with structured knowledge sources yields concrete performance gains in image understanding.
Multiple distinct approaches to reasoning integration already exist and can be built upon.
Future image understanding systems will require explicit pathways for incorporating higher-level reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar integration strategies could apply to other perception tasks where pure pattern matching fails on novel inputs.
Structured knowledge graphs might serve as a modular add-on rather than requiring full retraining of networks.
Evaluating integrated systems on out-of-distribution images would provide a direct test of the reasoning benefit.

Load-bearing premise

The selected representative papers and methods provide a balanced and sufficiently complete view of the field.

What would settle it

An experiment showing that purely data-driven methods without external knowledge or explicit reasoning achieve equal or better results than the surveyed integrated approaches on standard image understanding benchmarks would undermine the claimed hindrance.

Figures

Figures reproduced from arXiv: 1906.09954 by Chitta Baral, Somak Aditya, Yezhou Yang.

**Figure 2.** (a) Example of questions that require explicit external knowledge [35], (b) Example where knowledge helps [37].

Original abstract

Deep learning based data-driven approaches have been successfully applied in various image understanding applications ranging from object recognition, semantic segmentation to visual question answering. However, the lack of knowledge integration as well as higher-level reasoning capabilities with the methods still pose a hindrance. In this work, we present a brief survey of a few representative reasoning mechanisms, knowledge integration methods and their corresponding image understanding applications developed by various groups of researchers, approaching the problem from a variety of angles. Furthermore, we discuss upon key efforts on integrating external knowledge with neural networks. Taking cues from these efforts, we conclude by discussing potential pathways to improve reasoning capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a short survey that organizes a handful of existing papers on knowledge integration for image understanding but adds no new methods or analysis.


Hi. The core fact is that this paper is a brief survey rather than original work. It pulls together representative examples of reasoning mechanisms and external knowledge use in tasks like object recognition and visual question answering, then notes some ways people have tried wiring knowledge into neural nets. The organization is straightforward and the selection covers a few different angles, which gives a newcomer a quick set of citations to start from. That is the main value it delivers. The discussion of future pathways stays high-level, repeating the general idea that more reasoning and knowledge will help without spelling out concrete next steps or comparing the cited approaches in any depth. Because the paper is short by design, the coverage is necessarily limited, and the text gives no way to judge whether important lines of work were left out. There are no equations, no new experiments, and no formal claims to verify, so the usual soundness checks do not apply. A reader already active in vision-plus-knowledge work will not find fresh synthesis or overlooked connections here. Someone entering the area might find the references useful as a starting list, but the piece does not claim or deliver deeper insight. I would not bring it to a reading group focused on current methods, and I would not cite it in my own papers. If a venue explicitly wants short survey pieces on this topic, it could reasonably go to referees; otherwise, the lack of new content makes it a borderline case for serious review.

Referee Report

1 major / 1 minor

Summary. This manuscript is a brief survey claiming that purely data-driven deep learning methods for image understanding tasks (object recognition, semantic segmentation, visual question answering) are limited by lack of knowledge integration and higher-level reasoning; it reviews a few representative reasoning mechanisms and knowledge-integration approaches from the literature, discusses key efforts to combine external knowledge with neural networks, and outlines potential pathways for improvement.

Significance. If the selected examples accurately reflect the state of the field, the survey could usefully synthesize existing work and highlight directions for moving beyond purely data-driven image understanding; the manuscript does not advance new derivations, proofs, or empirical results.

major comments (1)

[Abstract] The central synthesis claim rests on the representativeness of the 'few' selected methods, yet the text provides no explicit selection criteria, coverage of omitted lines of work, or discussion of potential selection bias; this directly affects the load-bearing assumption that the reviewed efforts constitute a balanced view.

minor comments (1)

[Abstract] The phrasing 'discuss upon key efforts' is nonstandard and should be revised for clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive suggestion. The concern about explicit selection criteria in the abstract is well-taken for a survey paper, and we will revise accordingly to strengthen the presentation of scope and balance.

Point-by-point responses

Referee: [Abstract] The central synthesis claim rests on the representativeness of the 'few' selected methods, yet the text provides no explicit selection criteria, coverage of omitted lines of work, or discussion of potential selection bias; this directly affects the load-bearing assumption that the reviewed efforts constitute a balanced view.

Authors: We agree this is a valid point for improving clarity in a brief survey. In the revised manuscript we will (1) expand the abstract to state the selection criteria (recent works integrating external knowledge or symbolic reasoning with neural networks for image understanding tasks, chosen to illustrate diverse mechanisms across object recognition, segmentation, and VQA), (2) add a short paragraph in the introduction explicitly noting the scope, key omitted lines of work (e.g., purely symbolic systems, large-scale pre-training without explicit knowledge bases, and reinforcement-learning-only reasoning), and (3) include a brief limitations statement on potential selection bias. These changes will be confined to the front matter and will not alter the core reviewed content. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

This is a brief survey paper with no derivations, equations, fitted parameters, predictions, or technical claims that could reduce to self-definition or self-citation. The central claim is a high-level synthesis of existing literature on knowledge integration and reasoning in image understanding, supported by references to external work rather than any new proof or measurement whose validity depends on the paper's own inputs. No load-bearing steps exist that match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; it introduces no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5623 in / 872 out tokens · 16470 ms · 2026-05-25T17:36:54.774660+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

Markov Logic Network … PSL … Logic Tensor Network … Graph-Gated Neural Network … Relational Reasoning Layer … Knowledge Distillation

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

weighted First Order Logical formulas … hinge-loss energy function … Lukasiewicz T-norm
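The passage fragment above names the machinery behind Probabilistic Soft Logic (PSL) [6]: weighted first-order rules whose connectives are relaxed with the Łukasiewicz t-norm, and whose violations are scored by a hinge-loss energy. The Python sketch below illustrates that idea only, under stated assumptions: the predicates (`person`, `holds_umbrella`, `raining`), the rule weights, and all helper names are hypothetical, and the grid search over a single unobserved atom stands in for the convex optimization a real PSL solver performs. It is not the surveyed authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Interp = Dict[str, float]  # soft truth value in [0, 1] for each ground atom


# Łukasiewicz relaxations of the Boolean connectives.
def l_and(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)  # t-norm


def l_or(a: float, b: float) -> float:
    return min(1.0, a + b)  # t-conorm


def l_not(a: float) -> float:
    return 1.0 - a


@dataclass
class GroundRule:
    """A weighted ground rule body -> head, scored as a hinge-loss potential."""
    weight: float
    body: Callable[[Interp], float]
    head: Callable[[Interp], float]
    squared: bool = False  # False: linear hinge (p = 1); True: squared hinge (p = 2)

    def distance_to_satisfaction(self, interp: Interp) -> float:
        # body -> head is fully satisfied when I(head) >= I(body).
        return max(0.0, self.body(interp) - self.head(interp))

    def potential(self, interp: Interp) -> float:
        d = self.distance_to_satisfaction(interp)
        return self.weight * (d * d if self.squared else d)


def energy(rules: List[GroundRule], interp: Interp) -> float:
    # Total hinge-loss energy; lower means the rules are better satisfied.
    return sum(rule.potential(interp) for rule in rules)


def map_by_grid(rules: List[GroundRule], observed: Interp, target: str,
                steps: int = 100) -> Tuple[float, float]:
    # Toy MAP inference for one unobserved atom by exhaustive grid search.
    best_value, best_energy = 0.0, float("inf")
    for i in range(steps + 1):
        value = i / steps
        e = energy(rules, {**observed, target: value})
        if e < best_energy:
            best_value, best_energy = value, e
    return best_value, best_energy


# Hypothetical detector confidences for one image.
observed = {"person": 0.9, "holds_umbrella": 0.8}
rules = [
    # weight 1.0: person AND holds_umbrella -> raining
    GroundRule(1.0, lambda I: l_and(I["person"], I["holds_umbrella"]),
               lambda I: I["raining"]),
    # weight 0.5: -> NOT raining (a weak negative prior)
    GroundRule(0.5, lambda I: 1.0, lambda I: l_not(I["raining"])),
]
value, e = map_by_grid(rules, observed, "raining")
print(f"raining={value:.2f}, energy={e:.3f}")  # raining=0.70, energy=0.350
```

The body of the first rule evaluates to 0.7 under the Łukasiewicz conjunction, so raising `raining` up to 0.7 removes that rule's penalty, while the weaker prior charges 0.5 per unit of `raining`; the minimum sits where the two meet. In PSL proper, rules are restricted (conjunctive bodies, disjunctive heads) so that each distance to satisfaction is a hinge of a linear function of the truth values; the energy is then convex, which is what lets PSL replace enumeration with convex optimization.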

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on


[1] Somak Aditya, Rudra Saha, Yezhou Yang, and Chitta Baral. Spatial knowledge distillation to aid visual reasoning. IEEE Winter Conference on Applications of Computer Vision (WACV), pages 227–235, 2019.

[2] Somak Aditya, Yezhou Yang, and Chitta Baral. Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering. In AAAI, pages 629–637, 2018.

[3] Somak Aditya, Yezhou Yang, Chitta Baral, and Yiannis Aloimonos. Combining knowledge and reasoning through probabilistic soft logic for image puzzle solving. In UAI 2018, pages 238–248. Association For Uncertainty in Artificial Intelligence (AUAI), 2018.

[4] Somak Aditya, Yezhou Yang, Chitta Baral, Yiannis Aloimonos, and Cornelia Fermüller. Image understanding using vision and reasoning through scene description graph. Computer Vision and Image Understanding, pages 33–45, 2017.

[5] Franz Baader, Diego Calvanese, Deborah McGuinness, Peter Patel-Schneider, and Daniele Nardi. The description logic handbook: Theory, implementation and applications. Cambridge University Press, 2003.

[6] Stephen H. Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. Hinge-loss Markov random fields and probabilistic soft logic. Journal of Machine Learning Research, 18:1–67, 2017.

[7] Remi Cadene, Hedi Ben-Younes, Nicolas Thome, and Matthieu Cord. MuRel: Multimodal Relational Reasoning for Visual Question Answering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[8] Stamatia Dasiopoulou, Ioannis Kompatsiaris, and Michael G. Strintzis. Applying fuzzy DLs in the extraction of image semantics. In Journal on Data Semantics XIV, pages 105–132. Springer, 2009.

[9] Ernest Davis and Gary Marcus. Commonsense reasoning and commonsense knowledge in artificial intelligence. Commun. ACM, 58(9):92–103, August 2015.

[10] Maaike de Boer, Laura Daniele, Paul Brandt, and Maya Sappelli. Applying semantic reasoning in image retrieval. Proc. ALLDATA, 2015.

[11] Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI'07, pages 2468–2473, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.

[12] Abhinav Gupta, Aniruddha Kembhavi, and Larry S. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(10):1775–1789, 2009.

[13] Catherine Havasi, Robert Speer, and Jason Alonso. ConceptNet 3: A flexible, multilingual semantic network for common sense knowledge. In Recent Advances in Natural Language Processing, pages 27–29. Citeseer, 2007.

[14] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. stat, 1050:9, 2015.

[15] Drew A. Hudson and Christopher D. Manning. GQA: A new dataset for real-world visual reasoning and compositional question answering. Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[16] Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2901–2910, 2017.

[17] Justin Johnson, Ranjay Krishna, Michael Stark, Jia Li, Michael Bernstein, and Li Fei-Fei. Image retrieval using scene graphs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3668–3678, June 2015.

[18] Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, and Li Fei-Fei. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vision, 123(1):32–73, May 2017.

[19] Dieu-Thu Le, Jasper Uijlings, and Raffaella Bernardi. Exploiting language models for visual recognition. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 769–779, 2013.

[20] Joohyung Lee, Samidh Talsania, and Yi Wang. Computing LP^MLN using ASP and MLN solvers. Theory and Practice of Logic Programming, 17(5-6):942–960, 2017.

[21] Douglas B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Commun. ACM, 38(11):33–38, November 1995.

[22] Ben London, Sameh Khamis, Stephen Bach, Bert Huang, Lise Getoor, and Larry Davis. Collective activity detection using hinge-loss Markov random fields. In Proceedings of the IEEE CVPR Workshops, pages 566–571, 2013.

[23] Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. DeepProbLog: Neural probabilistic logic programming. In Advances in Neural Information Processing Systems, pages 3753–3763, 2018.

[24] Kenneth Marino, Ruslan Salakhutdinov, and Abhinav Gupta. The more you know: Using knowledge graphs for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2673–2681, 2017.

[25] George A. Miller. WordNet: A lexical database for English. Commun. ACM, 38(11):39–41, November 1995.

[26] David A. Randell, Zhan Cui, and Anthony G. Cohn. A spatial logic based on regions and connection. In Proceedings 3rd International Conference on Knowledge Representation and Reasoning, 1992.

[27] Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.

[28] Tim Rocktäschel and Sebastian Riedel. End-to-end differentiable proving. In Advances in Neural Information Processing Systems, pages 3788–3800, 2017.

[29] Sanket Shah, Anand Mishra, Naganand Yadati, and Partha Pratim Talukdar. KVQA: Knowledge-aware visual question answering. In AAAI, 2019.

[30] Adam Santoro, David Raposo, David G. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning. In NIPS, pages 4967–4976, 2017.

[31] Luciano Serafini and Artur d'Avila Garcez. Logic tensor networks: Deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422, 2016.

[32] Jakob Suchan and Mehul Bhatt. The geometry of a scene: On deep semantics for visual perception driven cognitive film studies. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1–

[33] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 697–706, New York, NY, USA, 2007. ACM.

[34] Douglas Summers-Stay, Ching L. Teo, Yezhou Yang, Cornelia Fermüller, and Yiannis Aloimonos. Using a minimal action grammar for activity understanding in the real world. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4104–

[35] Peng Wang, Qi Wu, Chunhua Shen, Anthony Dick, and Anton van den Hengel. FVQA: Fact-based visual question answering. IEEE TPAMI, 2017.

[36] Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, and Anton van den Hengel. Visual question answering: A survey of methods and datasets. Computer Vision and Image Understanding, 163:21–40, 2017.

[37] Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, and Anton van den Hengel. Ask me anything: Free-form visual question answering based on knowledge from external sources. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4622–4630, 2016.

[38] Markus Wulfmeier, Dushyant Rao, and Ingmar Posner. Incorporating human domain knowledge into large scale cost function learning. arXiv preprint arXiv:1612.04318, 2016.

[39] Ruichi Yu, Ang Li, Vlad I. Morariu, and Larry S. Davis. Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation. ICCV, 2017.

[40] Bo Zheng, Yibiao Zhao, Joey Yu, Katsushi Ikeuchi, and Song-Chun Zhu. Scene understanding by reasoning stability and safety. International Journal of Computer Vision, 112(2):221–238, 2015.

[41] Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba. Temporal relational reasoning in videos. In ECCV, September 2018.

[42] Yuke Zhu, Alireza Fathi, and Li Fei-Fei. Reasoning about object affordances in a knowledge base representation. In ECCV (2), pages 408–424. Springer, 2014.
