arxiv: 2604.17454 · v1 · submitted 2026-04-19 · 💻 cs.CV

Recognition: unknown

HSG: Hyperbolic Scene Graph

Liyang Wang , Zeyu Zhang , Hao Tang

Pith reviewed 2026-05-10 05:28 UTC · model grok-4.3

keywords scene graphs · hyperbolic embeddings · hierarchical structure · graph representations · computer vision · retrieval · geometric distance · entailment relationships

The pith

Embedding scene graphs in hyperbolic space encodes object hierarchies through geometric distance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that scene graph embeddings benefit from hyperbolic geometry because it naturally represents hierarchical entailment between places and objects via distance, unlike Euclidean space. This addresses limitations in existing methods that use contrastive learning and attention but produce less consistent structures. The approach maintains competitive retrieval while delivering clear gains at the graph level. A sympathetic reader would care because scene graphs support structured reasoning in visual tasks, and stronger hierarchies could improve how models understand relationships in scenes. The results highlight that the geometry choice directly affects structural quality in the learned representations.

Core claim

HSG learns scene graph embeddings in hyperbolic space where hierarchical relationships are naturally encoded through geometric distance. This produces better hierarchical structure quality than Euclidean embeddings while preserving strong retrieval performance. The largest improvements appear in graph-level metrics, with PP IoU at 33.17 and Graph IoU at 33.51; the Graph IoU exceeds the best prior Euclidean variant (AoMSG, 25.37) by 8.14 points.

What carries the argument

Hyperbolic space for scene graph embeddings, where distance represents hierarchical entailment between objects and places.
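To make "distance represents hierarchy" concrete, the following is a minimal NumPy sketch of the Lorentz (hyperboloid) model. It assumes the common parameterization in which an encoder output is treated as a tangent vector at the origin and lifted with the exponential map. The function names and the curvature parameter `c` are illustrative assumptions, not taken from the HSG code, and the page does not say how `c` relates to the paper's `curv_init`.

```python
import numpy as np

def exp_map_origin(v, c=1.0):
    """Lift tangent vectors v (..., n) at the origin onto the hyperboloid
    of curvature -c. Returns points of shape (..., n+1), time coordinate first."""
    v = np.asarray(v, dtype=np.float64)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    scaled = np.sqrt(c) * norm
    # sinh(s)/s tends to 1 as s -> 0; guard the division for zero vectors.
    factor = np.where(scaled > 1e-8, np.sinh(scaled) / np.maximum(scaled, 1e-8), 1.0)
    space = factor * v
    # The time coordinate is fixed by the constraint <x, x>_L = -1/c.
    time = np.sqrt(1.0 / c + np.sum(space**2, axis=-1, keepdims=True))
    return np.concatenate([time, space], axis=-1)

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def lorentz_distance(x, y, c=1.0):
    """Geodesic distance between points on the hyperboloid of curvature -c."""
    arg = np.clip(-c * lorentz_inner(x, y), 1.0, None)  # clip guards rounding below 1
    return np.arccosh(arg) / np.sqrt(c)

def similarity(x, y, c=1.0):
    """Negative Lorentzian distance, usable as a similarity score."""
    return -lorentz_distance(x, y, c)
```

In this parameterization the distance of a lifted point from the origin equals the norm of its tangent vector, so a general concept placed near the origin and a specific one placed farther out differ by a quantity that can be read off directly.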

If this is right

Hierarchical structure quality in scene graph representations increases measurably.
Graph-level IoU metrics rise substantially while retrieval performance holds steady.
Entailment relationships between places and objects become more explicitly captured in the embeddings.
The method supports improved structural consistency for multiview and 3D scene reasoning tasks.
Hyperbolic distance serves as a direct geometric signal for hierarchy in visual graphs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hyperbolic approach could be tested on other structured visual data such as video scene graphs.
If distance-based hierarchy works here, it may transfer to knowledge-graph style reasoning in vision-language models.
A direct comparison with other non-Euclidean spaces would isolate whether hyperbolic geometry is uniquely suited.
Downstream tasks like visual question answering might show gains if they rely on the improved graph structures.

Load-bearing premise

The observed gains in graph-level metrics stem from hyperbolic geometry's encoding of hierarchies rather than from other modeling choices or data properties.

What would settle it

Training an otherwise identical model entirely in Euclidean space that reaches Graph IoU scores near 33.51 would show the geometry choice is not responsible for the gains.
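One way such a control could be organized is to route every pairwise similarity through a single switch, so that the manifold is the only thing that changes between runs. The sketch below is a hypothetical illustration in NumPy, not the authors' training code: the names `similarity_matrix`, `"hyperspherical"`, and `"lorentz"` are assumptions, and the loss, attention, and schedule that would consume the matrix are left out.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity of L2-normalized rows (hyperspherical baseline)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def lorentz_similarity_matrix(a, b, c=1.0):
    """Pairwise negative Lorentzian distance after lifting rows to the hyperboloid."""
    def lift(v):
        s = np.sqrt(c) * np.linalg.norm(v, axis=-1, keepdims=True)
        space = np.where(s > 1e-8, np.sinh(s) / np.maximum(s, 1e-8), 1.0) * v
        time = np.sqrt(1.0 / c + np.sum(space**2, axis=-1))
        return time, space
    ta, sa = lift(a)
    tb, sb = lift(b)
    inner = sa @ sb.T - np.outer(ta, tb)  # Lorentzian inner products, all pairs
    return -np.arccosh(np.clip(-c * inner, 1.0, None)) / np.sqrt(c)

# Hypothetical registry: the geometry name is the only experimental variable.
SIMILARITY = {
    "hyperspherical": cosine_similarity_matrix,
    "lorentz": lorentz_similarity_matrix,
}

def similarity_matrix(a, b, geometry):
    """Return the pairwise similarity matrix under the chosen geometry."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return SIMILARITY[geometry](a, b)
```

With this layout, two runs that differ only in the `geometry` argument share every other component, which is the condition the settling experiment asks for.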

Figures

Figures reproduced from arXiv: 2604.17454 by Hao Tang, Liyang Wang, Zeyu Zhang.

**Figure 2.** Scene graph construction and cross-view consistency. Left: …

**Figure 3.** HSG model design: HSG adopts a similar architecture to MSG, but replaces L2-normalized hyperspherical embeddings and cosine similarity with Lorentz hyperboloid embeddings via the exponential map and negative Lorentzian distance, along with an additional entailment loss to enforce hierarchical structure. We can map a vector u ∈ R^{n+1} from Euclidean space onto the manifold by projecting it to the tangent sp…

**Figure 4.** Different curvature initializations: Curvature initialization strongly impacts retrieval and graph-level performance; values that are too small or too large cause metric collapse due to limited hierarchical capacity or instability. Performance peaks at curv_init = 80, where hyperbolic space balances hierarchical expressiveness and training stability.

**Figure 5.** Distribution of embedding distances from [ROOT].

**Figure 6.** Visualization of place and object embeddings projected to the Poincaré disk during training for HSG. Each stage corresponds to an evaluation performed at equal training intervals. Blue and red points correspond to place and object embeddings, respectively.

**Figure 7.** Distribution of embedding distances from [ROOT]: We embed training images for HSG using different encoder backbones: DINOv2 (default), ConvNeXt-base, and ViT-base, respectively. Embedding distances from [ROOT]. To qualitatively evaluate the hierarchical structure learned by different encoder backbones, we visualize the distribution of embedding distances to the root node for HSG models instantiated with D…

**Figure 8.** Top: visualization of detection results. Bottom: a screenshot of the interactive graph visualization, where place nodes are shown in blue and object nodes in orange.

**Figure 9.** Visualization of place nodes. Each group of three images shown side by side corresponds to nodes connected in the MSG, indicating that they are recognized as the same place.

**Figure 10.** Objects recognized across different views that correspond to the same physical object are grouped into a single object node. Within a scene, instances of the same object are highlighted with the same color. Note that some images were captured sideways; in the visualization, we retain their original orientation.
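The Figure 6 caption mentions projecting embeddings to the Poincaré disk. A minimal sketch of the standard hyperboloid-to-ball map is given below, assuming points are stored time coordinate first on a hyperboloid of curvature -c and rescaled into the unit ball for plotting. The function name is illustrative, and the caption does not say how the paper reduces higher-dimensional embeddings to two dimensions, so that step is not shown.

```python
import numpy as np

def lorentz_to_poincare(x, c=1.0):
    """Map hyperboloid points (..., n+1), time first, into the unit Poincaré ball.

    Assumes each point satisfies -x0^2 + |x_space|^2 = -1/c.
    """
    x = np.asarray(x, dtype=np.float64)
    time, space = x[..., :1], x[..., 1:]
    # Stereographic projection; sqrt(c) rescales to the unit-curvature model.
    return np.sqrt(c) * space / (1.0 + np.sqrt(c) * time)

# Usage: a point at hyperbolic distance r from the origin lands at radius tanh(r/2).
r = 1.0
point = np.array([np.cosh(r), np.sinh(r), 0.0])
disk_xy = lorentz_to_poincare(point)
```

Because the image radius is tanh(r/2), points far from the origin crowd toward the boundary of the disk, which is why such plots show coarse nodes near the center and fine-grained nodes near the rim.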

Original abstract

Scene graph representations enable structured visual understanding by modeling objects and their relationships, and have been widely used for multiview and 3D scene reasoning. Existing methods such as MSG learn scene graph embeddings in Euclidean space using contrastive learning and attention based association. However, Euclidean geometry does not explicitly capture hierarchical entailment relationships between places and objects, limiting the structural consistency of learned representations. To address this, we propose Hyperbolic Scene Graph (HSG), which learns scene graph embeddings in hyperbolic space where hierarchical relationships are naturally encoded through geometric distance. Our results show that HSG improves hierarchical structure quality while maintaining strong retrieval performance. The largest gains are observed in graph level metrics: HSG achieves a PP IoU of 33.17 and the highest Graph IoU of 33.51, outperforming the best AoMSG variant (25.37) by 8.14, highlighting the effectiveness of hyperbolic representation learning for scene graph modeling. Code: https://github.com/AIGeeksGroup/HSG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

HSG applies hyperbolic embeddings to scene graphs and gets an 8-point Graph IoU lift, but the comparisons do not isolate the geometry from other modeling differences.

HSG takes the MSG approach to scene graph embeddings and moves it into hyperbolic space to better handle the hierarchical relationships between objects and places. The headline result is the Graph IoU score rising to 33.51 from the best prior variant's 25.37, with a similar gain on PP IoU, while retrieval performance stays competitive. The motivation is straightforward: Euclidean space does not naturally encode entailment, and hyperbolic geometry does through its distance properties. That is a reasonable extension given prior uses of hyperbolic embeddings elsewhere. The code release helps anyone who wants to check or extend the implementation. The paper stays focused on the specific task of improving structural consistency in scene graphs for multiview and 3D reasoning.

The main limitation is that the reported gains are measured against AoMSG variants that already differ in attention mechanisms, loss terms, and other details. Without a direct control that keeps the rest of the architecture fixed and only swaps the manifold, it is difficult to attribute the lift specifically to hyperbolic geometry rather than capacity, optimization, or data biases. The abstract also gives no error bars or significance tests, so the reliability of the 8-point difference is hard to judge from the summary alone.

This is a narrow, incremental piece aimed at researchers already working on scene graphs or geometric embeddings in vision. It would be worth a reading group slot if the group is looking at alternatives to Euclidean representations for structured data. The work is coherent enough on its own terms to deserve peer review, though any referee would likely press for tighter ablations and more statistical detail.

Referee Report

2 major / 2 minor

Summary. The paper proposes Hyperbolic Scene Graph (HSG) to embed scene graphs in hyperbolic space, arguing that this geometry better encodes hierarchical entailment relationships between objects and places than Euclidean embeddings used in prior work such as MSG and AoMSG variants. It claims improved hierarchical structure quality alongside competitive retrieval performance, with the largest gains in graph-level metrics: HSG reports PP IoU of 33.17 and Graph IoU of 33.51, outperforming the best AoMSG variant (25.37) by 8.14 points. Code is provided for reproducibility.

Significance. If the gains can be causally linked to hyperbolic geometry rather than confounding modeling differences, the approach would offer a principled way to exploit hyperbolic space's exponential volume growth for hierarchical visual structures, with potential benefits for downstream tasks like 3D scene reasoning and retrieval. The open code is a positive factor for verification.

major comments (2)

[Abstract and Results] Abstract and Results section: the 8.14-point Graph IoU lift is presented as evidence for hyperbolic geometry's advantage, yet the comparisons are only to AoMSG variants that already differ in association mechanism, loss formulation, and embedding handling; no Euclidean control re-using identical attention, contrastive objective, and training schedule is reported, so the delta cannot be attributed specifically to the manifold choice.
[Results] Experimental details (throughout Results): metric improvements are stated without error bars, statistical tests, full baseline hyperparameter tables, or dataset statistics, preventing assessment of whether the hierarchical gains are robust or sensitive to implementation choices.

minor comments (2)

[Method] The definitions of PP IoU and Graph IoU are introduced only in the results; they should be defined in the method section with explicit formulas.
[Abstract] The abstract mentions 'multiview and 3D scene reasoning' applications but the experiments appear limited to standard scene-graph benchmarks; clarifying the evaluation scope would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the evidence and reporting.

Referee: [Abstract and Results] Abstract and Results section: the 8.14-point Graph IoU lift is presented as evidence for hyperbolic geometry's advantage, yet the comparisons are only to AoMSG variants that already differ in association mechanism, loss formulation, and embedding handling; no Euclidean control re-using identical attention, contrastive objective, and training schedule is reported, so the delta cannot be attributed specifically to the manifold choice.

Authors: We acknowledge that the referee's point is valid and that confounding factors in the existing baselines prevent a fully isolated attribution to hyperbolic geometry. To address this directly, we will add a new Euclidean control experiment in the revised manuscript. This control will reuse the exact same attention mechanism, contrastive objective, training schedule, and all other modeling choices as HSG, differing solely in the embedding manifold (Euclidean versus hyperbolic). The results of this ablation will be reported to provide clearer evidence on the contribution of the geometry.

revision: yes

Referee: [Results] Experimental details (throughout Results): metric improvements are stated without error bars, statistical tests, full baseline hyperparameter tables, or dataset statistics, preventing assessment of whether the hierarchical gains are robust or sensitive to implementation choices.

Authors: We agree that the current results section would benefit from greater statistical detail and transparency. In the revised manuscript we will report error bars as standard deviations across multiple random seeds, include statistical significance tests for the primary metrics, provide a complete table of hyperparameters for HSG and all baselines, and add relevant dataset statistics (e.g., scene, object, and relation counts). These additions will allow readers to better evaluate robustness.

revision: yes
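The reporting promised in the second response could look roughly like the sketch below: a mean and sample standard deviation per method across seeds, plus Welch's t statistic for comparing two methods with unequal variances. This is an assumed procedure for illustration only. The response does not name a specific test, the function names are hypothetical, and the numbers in the usage lines are placeholders rather than results from the paper.

```python
import numpy as np

def summarize(scores):
    """Mean and sample standard deviation (ddof=1) of per-seed scores."""
    s = np.asarray(scores, dtype=np.float64)
    return s.mean(), s.std(ddof=1)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return t, df

# Placeholder per-seed scores, NOT taken from the paper.
method_a = [1.0, 2.0, 3.0]
method_b = [2.0, 3.0, 4.0]
mean_a, std_a = summarize(method_a)
t_stat, dof = welch_t(method_a, method_b)
```

A p-value would follow from the t distribution with `dof` degrees of freedom; with only a handful of seeds the test has little power, so the per-seed scores are worth reporting alongside it.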

Circularity Check

0 steps flagged

No circularity in empirical modeling claims or derivations

The paper introduces HSG as a hyperbolic-space alternative to Euclidean scene-graph methods (MSG/AoMSG) and reports direct benchmark results (PP IoU 33.17, Graph IoU 33.51). No mathematical derivation chain exists; performance deltas are measured outcomes on external data rather than quantities fitted or predicted from the same inputs. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the provided text. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the geometric property that hyperbolic space encodes hierarchies via distance, treated as a domain assumption from non-Euclidean geometry rather than a new postulate.

axioms (1)

domain assumption: Hyperbolic geometry naturally encodes hierarchical relationships through geometric distance.
Explicitly stated as the motivation for moving from Euclidean to hyperbolic space.

A Hyperparameter setting

Table 5: Hyperparameters used in the HSG main experiments. Hyperparameter: Value/Range. Original image size: 192×256. Input image size: 224×224. Batch ...