LLMs show partial and variable perceptual alignment with human touch on textiles, succeeding on samples like silk satin but failing on cotton denim when matching descriptive language to embedding similarity.
Artificial intelligence, values, and alignment
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.
citing papers explorer
-
TouchAI: Exploring human-AI perceptual alignment in touch through language model representations
LLMs show partial and variable perceptual alignment with human touch on textiles, succeeding on samples like silk satin but failing on cotton denim when matching descriptive language to embedding similarity.
-
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
-
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.