Visual Interaction with Deep Learning Models through Collaborative Semantic Inference
Pith reviewed 2026-05-24 16:30 UTC · model grok-4.3
The pith
A collaborative semantic inference framework co-designs deep learning models and visual interfaces so users can see and control intermediate reasoning steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a framework of collaborative semantic inference (CSI) for the co-design of interactions and models to enable visual collaboration between humans and algorithms. The approach exposes the intermediate reasoning process of models which allows semantic interactions with the visual metaphors of a problem, which means that a user can both understand and control parts of the model reasoning process. We demonstrate the feasibility of CSI with a co-designed case study of a document summarization system.
What carries the argument
The collaborative semantic inference (CSI) framework that co-designs model structure and visual interface to expose and allow interaction with intermediate reasoning processes.
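The co-design idea can be made concrete with a toy sketch. Everything below is an illustrative assumption, not the paper's system: the names `SelectionState`, `score_sentences`, and `summarize` are hypothetical, and keyword overlap stands in for a learned model. The point is only the shape of the design: the model writes its intermediate decision into a human-readable state, the user can edit that state, and the final output is computed from the edited state.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SelectionState:
    """Hypothetical intermediate state: a score per source sentence plus user overrides."""
    scores: list
    forced_in: set = field(default_factory=set)    # indices the user insists on keeping
    forced_out: set = field(default_factory=set)   # indices the user insists on dropping

    def selected(self, threshold=0.5):
        """Return indices used downstream; user overrides win over model scores."""
        chosen = []
        for i, score in enumerate(self.scores):
            if i in self.forced_out:
                continue
            if i in self.forced_in or score >= threshold:
                chosen.append(i)
        return chosen


def score_sentences(sentences, keywords):
    """Toy stand-in for a learned content selector: fraction of keywords present."""
    scores = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        scores.append(len(words & keywords) / len(keywords))
    return scores


def summarize(sentences, state):
    """Toy stand-in for generation: concatenate whatever the state selects."""
    return " ".join(sentences[i] for i in state.selected())


doc = [
    "The model ranks sentences.",
    "Lunch was served at noon.",
    "Users can override the ranking.",
]
state = SelectionState(score_sentences(doc, {"model", "ranking"}))
before = summarize(doc, state)   # model-only decision
state.forced_out.add(2)          # user removes a sentence the model picked
state.forced_in.add(1)           # user adds a sentence the model skipped
after = summarize(doc, state)    # same pipeline, user-edited intermediate state
```

In this sketch the interface would render `state` (for example as highlighted sentences) and write clicks back into `forced_in` and `forced_out`; whether a real model can honor such overrides without retraining is exactly the kind of question the co-design process has to answer.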
If this is right
- Users gain the ability to understand and control parts of the model's reasoning through semantic interactions.
- Visual metaphors of the problem domain become the basis for controlling the model.
- The co-design approach makes it feasible to build explainable systems for tasks such as document summarization.
- Human-algorithm visual collaboration becomes possible by making intermediate processes accessible.
Where Pith is reading between the lines
- If the framework works for summarization, similar co-design could apply to other decision-making tasks where transparency matters.
- Building interpretability directly into the model via interface co-design may reduce reliance on separate explanation techniques.
- Success here suggests that model architectures could be chosen partly based on how well they support visual user control.
Load-bearing premise
Co-designing the interface and model will expose the intermediate reasoning in a usable way that gives real control and understanding without major losses in performance or added complexity.
What would settle it
The claim would be undermined if users testing the summarization system cannot correctly identify or change the model's key decisions using the provided visual interactions, or if accuracy drops substantially compared to standard models.
Original abstract
Automation of tasks can have critical consequences when humans lose agency over decision processes. Deep learning models are particularly susceptible since current black-box approaches lack explainable reasoning. We argue that both the visual interface and model structure of deep learning systems need to take into account interaction design. We propose a framework of collaborative semantic inference (CSI) for the co-design of interactions and models to enable visual collaboration between humans and algorithms. The approach exposes the intermediate reasoning process of models which allows semantic interactions with the visual metaphors of a problem, which means that a user can both understand and control parts of the model reasoning process. We demonstrate the feasibility of CSI with a co-designed case study of a document summarization system.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Collaborative Semantic Inference (CSI) framework for co-designing visual interfaces and deep learning model structures. This enables users to understand and semantically interact with intermediate model reasoning processes via visual metaphors, preserving human agency over automated decisions. Feasibility is asserted via a single case study of a co-designed document summarization system.
Significance. If the co-design approach can be shown to expose controllable intermediate representations without accuracy or complexity penalties, it would offer a concrete path to integrate HCI principles with DL architectures for explainable systems. The conceptual framing is timely for domains where black-box models risk eroding user control.
Major comments (1)
- The case study (described in the abstract and presumably detailed in the full manuscript) asserts feasibility of CSI for document summarization but reports no quantitative metrics on summary quality (e.g., ROUGE scores), latency, model accuracy relative to a non-CSI baseline, or user-task performance. Without these, the central claim that co-design exposes intermediate reasoning for effective control remains untested and load-bearing for the feasibility assertion.
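To illustrate the kind of summary-quality number the comment asks for, here is a minimal sketch of a ROUGE-1 style unigram-overlap score. It is a simplified stand-in written for this note (the function name `rouge1` and the tokenizer are assumptions); it skips the stemming and other options of the official ROUGE tooling and says nothing about the paper's actual results.

```python
import re
from collections import Counter


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between two texts."""
    def tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat", "the cat sat on the mat")
# precision = 3/3, recall = 3/6
```

A baseline comparison of the sort the referee describes would compute such scores for summaries produced with and without the CSI hooks on the same reference set.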
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback. Below we address the major comment point by point.
Referee: The case study (described in the abstract and presumably detailed in the full manuscript) asserts feasibility of CSI for document summarization but reports no quantitative metrics on summary quality (e.g., ROUGE scores), latency, model accuracy relative to a non-CSI baseline, or user-task performance. Without these, the central claim that co-design exposes intermediate reasoning for effective control remains untested and load-bearing for the feasibility assertion.
Authors: The manuscript presents CSI as a conceptual co-design framework whose primary contribution is a structured process for jointly designing visual interfaces and model architectures to expose and enable semantic control over intermediate reasoning steps. The document summarization case study illustrates this process by describing the specific visual metaphors, the corresponding model modifications, and how they together permit users to inspect and intervene in the reasoning pipeline. Feasibility of the framework is demonstrated through the existence of this integrated design rather than through performance benchmarking; the paper does not claim that CSI yields higher ROUGE scores or lower latency than non-CSI baselines. Because the central claim concerns the viability of the co-design method itself, not the superiority of any particular summarizer, quantitative model metrics were outside the intended scope. We are prepared to add an explicit limitations paragraph clarifying this scope and outlining directions for future empirical studies if the editor considers it necessary.
Revision: no
Circularity Check
No circularity: purely conceptual framework proposal without derivations or self-referential reductions.
The paper proposes the CSI framework at a conceptual level for co-designing visual interfaces and model structures in deep learning, supported only by a descriptive case study of document summarization. No equations, parameter fitting, predictions, or mathematical derivations appear in the provided text. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim does not reduce to its inputs by construction, as there are no quantitative predictions or fitted elements that could be circular. This is a standard non-finding for a non-mathematical HCI/DL position paper.
Reference graph
Works this paper leans on
- [1] J. Adebayo, J. Gilmer, I. Goodfellow, M. Hardt, and B. Kim. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, 2018.
- [2] S. Amershi, M. Cakmak, W. B. Knox, and T. Kulesza. Power to the people: The role of humans in interactive machine learning. AI Magazine, 35(4):105–120, 2014.
- [3] S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collisson, J. Suh, S. Iqbal, P. N. Bennett, K. Inkpen, et al. Guidelines for human-ai interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, p. 3. ACM, 2019.
- [4]
- [5] T. Babaian, B. J. Grosz, and S. M. Shieber. A writer’s collaborative assistant. In Proceedings of the 7th International Conference on Intelligent User Interfaces, pp. 7–14. ACM, 2002.
- [6] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- [7] D. Bau, J.-Y. Zhu, H. Strobelt, Z. Bolei, J. B. Tenenbaum, W. T. Freeman, and A. Torralba. Gan dissection: Visualizing and understanding generative adversarial networks. arXiv preprint arXiv:1811.10597, 2018.
- [8] Y. Belinkov, N. Durrani, F. Dalvi, H. Sajjad, and J. Glass. What do neural machine translation models learn about morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 861–872, 2017.
Footnote 2 (from the paper): How to train the model is described in our previous work [27].
- [9] Y. Belinkov and J. Glass. Analysis methods in neural language processing: A survey. Transactions of the Association for Computational Linguistics, 7:49–72, 2019.
- [10] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich. Soylent: a word processor with a crowd inside. In Proceedings of the 23nd Annual ACM Symposium on User Interface Software and Technology, pp. 313–322. ACM, 2010.
- [11] S. Bostandjiev, J. O’Donovan, and T. Höllerer. TasteWeights: a visual interactive hybrid recommender system. In Proceedings of the 6th ACM Conference on Recommender Systems, pp. 35–42. ACM, 2012.
- [12]
- [13] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1721–1730. ACM, 2015.
- [14] D. Cashman, G. Patterson, A. Mosca, and R. Chang. Rnnbow: Visualizing learning via backpropagation gradients in recurrent neural networks. In Workshop on Visual Analytics for Deep Learning (VADL), 2017.
- [15] J. Chae, S. Gao, A. Ramanthan, C. Steed, and G. D. Tourassi. Visualization for classification in deep neural networks. In Workshop on Visual Analytics for Deep Learning, 2017.
- [16] Q. Chen and V. Koltun. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1511–1520, 2017.
- [17] D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.
- [18] M. W. Craven and J. W. Shavlik. Using neural networks for data mining. Future Generation Computer Systems, 13(2-3):211–229, 1997.
- [19] R. J. Crouser and R. Chang. An affordance-based framework for human computation and human-computer collaboration. IEEE Transactions on Visualization and Computer Graphics, 18(12):2859–2868, 2012.
- [20]
- [21] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- [22] D. Dey, V. Ramakrishna, M. Hebert, and J. Andrew Bagnell. Predicting multiple structured visual interpretations. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2947–2955, 2015.
- [23]
- [24]
- [25] J. A. Fails and D. R. Olsen Jr. Interactive machine learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces, pp. 39–
- [26] A. Fan, M. Lewis, and Y. Dauphin. Hierarchical neural story generation. arXiv preprint arXiv:1805.04833, 2018.
- [27] S. Gehrmann, Y. Deng, and A. Rush. Bottom-up abstractive summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4098–4109, 2018.
- [28]
- [29] B. J. Grosz. Collaborative systems (aaai-94 presidential address). AI magazine, 17(2):67, 1996.
- [30] M. J. Guzdial, J. Chen, S.-Y. Chen, and M. Riedl. A general level design editor for co-creative level design. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2017.
- [31] J. Heer. Agency plus automation: Designing artificial intelligence into interactive systems. Proceedings of the National Academy of Sciences, 116(6):1844–1850, 2019.
- [32]
- [33] F. M. Hohman, M. Kahng, R. Pienta, and D. H. Chau. Visual analytics in deep learning: An interrogative survey for the next frontiers. IEEE Transactions on Visualization and Computer Graphics, 2018.
- [34] K. Holstein, J. W. Vaughan, H. Daumé III, M. Dudík, and H. Wallach. Improving fairness in machine learning systems: What do industry practitioners need? arXiv preprint arXiv:1812.05239, 2018.
- [35]
- [36] E. Horvitz. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 159–166. ACM, 1999.
- [37] C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, I. Simon, C. Hawthorne, A. Dai, M. Hoffman, M. Dinculescu, and D. Eck. Music transformer: Generating music with long-term structure. 2019.
- [38] M. C. Hughes, G. Hope, L. Weiner, T. H. McCoy Jr, R. H. Perlis, E. B. Sudderth, and F. Doshi-Velez. Semi-supervised prediction-constrained topic models. In AISTATS, pp. 1067–1076, 2018.
- [39]
- [40] H. Jing and K. R. McKeown. The decomposition of human-written summary sentences. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 129–136, 1999.
- [41]
- [42] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G. Melançon. Visual analytics: Definition, process, and challenges. In Information Visualization, pp. 154–175. Springer, 2008.
- [43]
- [44] B. Kim, R. Khanna, and O. O. Koyejo. Examples are not enough, learn to criticize! criticism for interpretability. In Advances in Neural Information Processing Systems, pp. 2280–2288, 2016.
- [45]
- [46] P.-J. Kindermans, S. Hooker, J. Adebayo, M. Alber, K. T. Schütt, S. Dähne, D. Erhan, and B. Kim. The (Un)reliability of saliency methods. NIPS workshop on Explaining and Visualizing Deep Learning, 2017.
- [47] J. Krause, A. Dasgupta, J. Swartz, Y. Aphinyanaphongs, and E. Bertini. A workflow for visual diagnostics of binary classifiers using instance-level explanations. arXiv preprint arXiv:1705.01968, 2017.
- [48]
- [49] T. Kulesza, M. Burnett, W.-K. Wong, and S. Stumpf. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces, pp. 126–137. ACM, 2015.
- [50] T. Kulesza, S. Stumpf, M. Burnett, W.-K. Wong, Y. Riche, T. Moore, I. Oberst, A. Shinsel, and K. McIntosh. Explanatory debugging: Supporting end-user debugging of machine-learned programs. In Visual Languages and Human-Centric Computing (VL/HCC), 2010 IEEE Symposium on, pp. 41–48. IEEE, 2010.
- [51] B. C. Kwon, M.-J. Choi, J. T. Kim, E. Choi, Y. B. Kim, S. Kwon, J. Sun, and J. Choo. RetainVis: Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records. IEEE Transactions on Visualization and Computer Graphics, 25(1):299–309, 2019.
- [52] C. Lacave and F. J. Díez. A review of explanation methods for bayesian networks. The Knowledge Engineering Review, 17(2):107–127, 2002.
- [53] V. Lai and C. Tan. On human predictions with explanations and predictions of machine learning models: A case study on deception detection. arXiv preprint arXiv:1811.07901, 2018.
- [54] T. Lei, R. Barzilay, and T. Jaakkola. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.
- [55] J. Li, X. Chen, E. Hovy, and D. Jurafsky. Visualizing and understanding neural models in nlp. arXiv preprint arXiv:1506.01066, 2015.
- [56] B. Y. Lim, A. K. Dey, and D. Avrahami. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2119–2128. ACM, 2009.
- [57] H. Lin, T. Wu, K. Wongsuphasawat, Y. Choi, and J. Heer. Visualizing attention in sequence-to-sequence summarization models.
- [58] Z. C. Lipton. The mythos of model interpretability. arXiv preprint arXiv:1606.03490, 2016.
- [59] S. Liu, Z. Li, T. Li, V. Srikumar, V. Pascucci, and P.-T. Bremer. NLIZE: A perturbation-driven visual interrogation tool for analyzing and interpreting natural language inference models. IEEE Transactions on Visualization and Computer Graphics, 25(1):651–660, 2019.
- [60] S. Liu, X. Wang, M. Liu, and J. Zhu. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics, 1(1):48–56, 2017.
- [61] Y. Lu, R. Garcia, B. Hansen, M. Gleicher, and R. Maciejewski. The state-of-the-art in predictive visual analytics. In Computer Graphics Forum, vol. 36, pp. 539–562. Wiley Online Library, 2017.
- [62] Y. Ming, S. Cao, R. Zhang, Z. Li, Y. Chen, Y. Song, and H. Qu. Understanding hidden memories of recurrent neural networks. In 2017 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 13–24. IEEE, 2017.
- [63]
- [64] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2337–2346, 2019.
- [65] N. Pezzotti, T. Höllt, J. Van Gemert, B. P. Lelieveldt, E. Eisemann, and A. Vilanova. Deepeyes: Progressive visual analytics for designing deep neural networks. IEEE Transactions on Visualization and Computer Graphics, 24(1):98–108, 2018.
- [66] M. T. Ribeiro, S. Singh, and C. Guestrin. Why should i trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144. ACM, 2016.
- [67]
- [68] A. See, P. J. Liu, and C. D. Manning. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1073–1083, 2017.
- [69] D. Smilkov, S. Carter, D. Sculley, F. B. Viégas, and M. Wattenberg. Direct-manipulation visualization of deep networks. arXiv preprint arXiv:1708.03788, 2017.
- [70] D. Smilkov, N. Thorat, C. Nicholson, E. Reif, F. B. Viégas, and M. Wattenberg. Embedding projector: Interactive visualization and interpretation of embeddings. arXiv preprint arXiv:1611.05469, 2016.
- [71] C. D. Stolper, A. Perer, and D. Gotz. Progressive visual analytics: User-driven visual exploration of in-progress analytics. IEEE Transactions on Visualization and Computer Graphics, 20(12):1653–1662, 2014.
- [72] H. Strobelt, S. Gehrmann, M. Behrisch, A. Perer, H. Pfister, and A. Rush. Debugging sequence-to-sequence models with Seq2Seq-Vis. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 368–370, 2018.
- [73] H. Strobelt, S. Gehrmann, M. Behrisch, A. Perer, H. Pfister, and A. M. Rush. Seq2Seq-Vis: A visual debugging tool for sequence-to-sequence models. IEEE transactions on visualization and computer graphics, 25(1):353–363, 2019.
- [74] H. Strobelt, S. Gehrmann, H. Pfister, and A. M. Rush. LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE transactions on visualization and computer graphics, 24(1):667–676, 2018.
- [75]
- [76] F.-Y. Tzeng and K.-L. Ma. Opening the black box-data driven visualization of neural networks. In VIS 05. IEEE Visualization, 2005., pp. 383–390. IEEE, 2005.
- [77] A. Vellido, J. D. Martín-Guerrero, and P. J. Lisboa. Making machine learning models interpretable. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), vol. 12, pp. 163–172. Citeseer, 2012.
- [78] S. Wachter, B. Mittelstadt, and C. Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. arXiv preprint arXiv:1711.00399, 2017.
- [79] J. Wang, L. Gou, H. Yang, and H.-W. Shen. Ganviz: A visual analytics approach to understand the adversarial game. IEEE transactions on visualization and computer graphics, 24(6):1905–1917, 2018.
- [80] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807, 2018.