Recommending Related Tables
Pith reviewed 2026-05-25 00:57 UTC · model grok-4.3
The pith
Tables are recommended by embedding their elements in multiple semantic spaces and learning to combine the similarities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
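The paper proposes a table-matching framework, which the authors describe as theoretically sound, that represents table elements in multiple semantic spaces and combines the resulting element-level similarities with a discriminative learning model. It reports state-of-the-art performance on a purpose-built collection of Wikipedia tables.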
What carries the argument
Representation of table elements in multiple semantic spaces combined using a discriminative learning model to compute table similarity.
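As a rough illustration of this mechanism, the sketch below computes one similarity feature per semantic space and then learns how to weight those features. Everything concrete in it is an assumption made for illustration: the two space names ("word" and "entity"), the toy 2-dimensional vectors, the centroid-plus-cosine similarity, and the use of a tiny logistic regression as the discriminative learner. The paper's actual spaces, similarity measures, and learning model are not specified in this summary.

```python
import math

SPACES = ("word", "entity")  # hypothetical semantic spaces

def centroid(vectors):
    """Average a list of equal-length element vectors into one table-level vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_features(table_a, table_b):
    """One similarity feature per semantic space for a pair of tables."""
    return [cosine(centroid(table_a[s]), centroid(table_b[s])) for s in SPACES]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b

def score(w, b, x):
    """Learned relatedness score in (0, 1) for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy tables: each maps a space name to the vectors of its elements (made-up data).
query = {"word": [[1.0, 0.0], [0.9, 0.1]], "entity": [[0.0, 1.0]]}
related = {"word": [[0.8, 0.2]], "entity": [[0.1, 0.9]]}
unrelated = {"word": [[0.0, 1.0]], "entity": [[1.0, 0.0]]}

X = [pair_features(query, related), pair_features(query, unrelated)]
y = [1, 0]  # 1 = judged related, 0 = judged unrelated
w, b = train_logistic(X, y)

score_related = score(w, b, X[0])
score_unrelated = score(w, b, X[1])
print(score_related > score_unrelated)
```

Ranking candidate tables by the learned score would then give the recommended list for an input table. In practice the features would come from trained embeddings and the learner would be fit on many judged pairs rather than two.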
If this is right
- Proactive recommendations of related structured content can be provided to spreadsheet users.
- Table similarity computation becomes more accurate by leveraging multiple semantic views.
- Ranked lists of relevant tables can be generated effectively from large collections like Wikipedia.
Where Pith is reading between the lines
- This approach might generalize to matching other structured data formats beyond tables.
- Deployment in enterprise environments would require validating the method on non-Wikipedia data.
- Future work could explore additional semantic spaces or different learning models for combination.
Load-bearing premise
The purpose-built test collection from Wikipedia tables is representative of real-world table recommendation scenarios.
What would settle it
Demonstrating that the method does not outperform baselines on a collection of enterprise spreadsheets would falsify the claim of state-of-the-art performance in practical settings.
Original abstract
Tables are an extremely powerful visual and interactive tool for structuring and manipulating data, making spreadsheet programs one of the most popular computer applications. In this paper we introduce and address the task of recommending related tables: given an input table, identifying and returning a ranked list of relevant tables. One of the many possible application scenarios for this task is to provide users of a spreadsheet program proactively with recommendations for related structured content on the Web. At its core, the related table recommendation task boils down to computing the similarity between a pair of tables. We develop a theoretically sound framework for performing table matching. Our approach hinges on the idea of representing table elements in multiple semantic spaces, and then combining element-level similarities using a discriminative learning model. Using a purpose-built test collection from Wikipedia tables, we demonstrate that the proposed approach delivers state-of-the-art performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the task of recommending related tables to a given input table, motivated by applications such as proactive recommendations in spreadsheet programs. It develops a framework for table matching that represents table elements in multiple semantic spaces and combines element-level similarities via a discriminative learning model. Using a purpose-built test collection derived from Wikipedia tables, the approach is shown to achieve state-of-the-art performance.
Significance. If the results hold, the multi-semantic-space representation offers a principled way to capture different facets of table similarity, which could benefit structured data recommendation systems. The construction of a purpose-built test collection from Wikipedia tables is a positive contribution that enables future work on this task.
major comments (1)
- [Experiments] Experiments section: The SOTA claim and applicability to the scenarios in the introduction (proactive spreadsheet recommendations, web structured content) rest on results from the purpose-built Wikipedia test collection, but no details are given on collection construction, relevance judgment protocol, inter-annotator agreement, or any cross-domain validation. This is load-bearing because the collection's element distributions, schema variability, and relevance criteria may not match enterprise spreadsheets or user-generated content.
minor comments (1)
- The abstract would benefit from specifying the evaluation metrics (e.g., MAP or NDCG) used to establish state-of-the-art performance.
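For readers unfamiliar with the two metrics named as examples in this comment, the following sketch shows one common way to compute NDCG@k and average precision for a single ranked list (MAP is the mean of average precision over queries). The exponential gain in DCG and the choice to normalize against the judged items present in the list are assumptions of this sketch. Which metrics and variants the paper uses is not stated in the abstract.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain with exponential gain (2^rel - 1)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the same items in ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def average_precision(relevances):
    """Mean of precision values at each rank holding a relevant item.

    Assumes every relevant item appears somewhere in the ranked list.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Graded judgments for one ranked list of candidate tables (made-up values).
ranking = [2, 0, 1]
print(ndcg_at_k(ranking, k=3))
print(average_precision([1, 0, 1]))
```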
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The primary concern raised is the level of detail provided on the test collection and its implications for the SOTA claims and applicability. We address this point below and will revise the manuscript accordingly.
Point-by-point responses
Referee: Experiments section: The SOTA claim and applicability to the scenarios in the introduction (proactive spreadsheet recommendations, web structured content) rest on results from the purpose-built Wikipedia test collection, but no details are given on collection construction, relevance judgment protocol, inter-annotator agreement, or any cross-domain validation. This is load-bearing because the collection's element distributions, schema variability, and relevance criteria may not match enterprise spreadsheets or user-generated content.
Authors: We agree that expanded details on the test collection are warranted to support the claims. Section 4 describes the Wikipedia table sampling process and pairing strategy, but the relevance judgment protocol, inter-annotator agreement statistics, and explicit discussion of schema variability were not elaborated sufficiently. In the revised version we will add a dedicated subsection detailing the judgment guidelines, report agreement measures, and include a limitations paragraph addressing differences from enterprise spreadsheets and user-generated content. We maintain that the collection serves as a valid proxy for web structured data (consistent with prior table corpora), but acknowledge the absence of cross-domain experiments and will frame the results accordingly without overgeneralizing applicability.
Revision: yes
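As background on the agreement measures mentioned in this exchange, the sketch below computes Fleiss' kappa, a standard statistic for agreement among a fixed number of raters assigning items to nominal categories. The input layout (one row per table pair, one column per relevance label, each cell a rater count) and the toy numbers are assumptions for illustration. The summary above does not say which agreement statistic the authors will report.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix of rater counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # All ratings fall in one category; kappa is undefined, so this
        # sketch returns 1.0 by convention.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

# Three raters, two labels (not relevant, relevant), two table pairs (made-up data).
print(fleiss_kappa([[3, 0], [0, 3]]))
print(fleiss_kappa([[2, 1], [1, 2]]))
```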
Circularity Check
No circularity; the derivation is a self-contained empirical framework.
The paper introduces a table matching framework based on multi-semantic-space element representations combined via a discriminative learning model, then reports empirical SOTA results on a purpose-built Wikipedia test collection. No equations, parameter-fitting procedures, or self-citations are visible that would reduce the claimed similarity computation or performance result to the inputs by construction. The central claim rests on standard representation and supervised combination techniques evaluated externally on held-out data rather than any self-definitional, fitted-input-renamed-as-prediction, or self-citation-load-bearing step.