pith. machine review for the scientific record.

arxiv: 2504.04745 · v4 · submitted 2025-04-07 · 💻 cs.CL

Recognition: unknown

Can LLMs Interpret and Leverage Structured Linguistic Representations? A Case Study with AMRs

Authors on Pith: no claims yet
classification 💻 cs.CL
keywords: llms, contexts, language, tasks, cosine, leverage, linguistic, llama
0 comments
original abstract

This paper evaluates the ability of Large Language Models (LLMs) to leverage contextual information in the form of structured linguistic representations. Specifically, we examine the impact of encoding both short and long contexts using Abstract Meaning Representation (AMR) structures across a diverse set of language tasks. We perform our analysis using 8-bit quantized and instruction-tuned versions of Llama 3.1 (8B), Phi-3, and Mistral 7B. Our results indicate that, for tasks involving short contexts, augmenting the prompt with the AMR of the original language context often degrades the performance of the underlying LLM. However, for tasks that involve long contexts, such as dialogue summarization in the SAMSum dataset, this enhancement improves LLM performance, for example, by increasing the zero-shot cosine similarity score of Llama 3.1 from 66% to 76%. This improvement is more evident in the newer and larger LLMs, but does not extend to the older or smaller ones. In addition, we observe that LLMs can effectively reconstruct the original text from a linearized AMR, achieving a cosine similarity of 81% in the best-case scenario.
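The abstract reports gains as zero-shot cosine similarity between model output and reference text (e.g. Llama 3.1 rising from 66% to 76% on SAMSum). The paper presumably computes this over dense sentence embeddings; as a minimal, self-contained illustration of the metric itself, the sketch below uses bag-of-words term-frequency vectors instead. The function name and the example dialogue-summary pair are illustrative, not from the paper.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over bag-of-words term-frequency vectors.

    A simplified stand-in for the embedding-based cosine score the
    paper reports; swap in sentence embeddings for a faithful setup.
    """
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    # Dot product over the shared vocabulary.
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical reference summary vs. model output for a dialogue.
reference = "amanda baked cookies and will bring them tomorrow"
generated = "amanda will bring cookies tomorrow"
score = cosine_similarity(reference, generated)
```

A score of 1.0 means identical term distributions and 0.0 means no word overlap; embedding-based variants behave analogously but credit paraphrases as well.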

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Misaligned by Reward: Socially Undesirable Preferences in LLMs

    cs.CL 2026-05 unverdicted novelty 6.0

    Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.

  2. Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect

    cs.CL 2026-05 unverdicted novelty 6.0

    LLMs generate social media posts that shift readers' perceived standing and comparison-related feelings but fail to reliably detect the same social-comparison triggers via prompt-based classification.

  3. Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish

    cs.CL 2026-05 unverdicted novelty 4.0

    Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.

  4. From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation

    cs.CL 2026-04 unverdicted novelty 4.0

    LLM-generated ML pipelines show higher bias (87.7% sensitive attributes) than conditional statements (59.2%), indicating that simple if-statement tests underestimate bias risk in practical code generation.

  5. FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

    cs.CL 2026-05 unverdicted novelty 3.0

    A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.

  6. mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection

    cs.CL 2026-05 unverdicted novelty 2.0

    Finetuning Qwen3-32B with data augmentation and self-training achieves competitive 8th-place ranking on SemEval-2026 conspiracy detection.

  7. mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection

    cs.CL 2026-05 unverdicted novelty 2.0

    Finetuning LLMs with QLoRA and multilingual data augmentation for polarization detection, type, and manifestation in SemEval-2026 Task 9.

  8. mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

    cs.LG 2026-04 unverdicted novelty 2.0

    Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.