Attention-based LSTM for clinical time series classification
9 Pith papers cite this work.
citing papers explorer
- Stay Focused: Problem Drift in Multi-Agent Debate
  The paper defines and measures 'problem drift' in multi-agent LLM debates across tasks and proposes DRIFTJudge and DRIFTPolicy as baselines to detect and reduce it.
- OPT: Open Pre-trained Transformer Language Models
  OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
- FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
  FinBERT adapts BERT to the financial domain and outperforms prior state-of-the-art methods on financial sentiment analysis tasks.
- Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
  Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at the small sample sizes typical in applied testing.
- Exploring Data Augmentation and Resampling Strategies for Transformer-Based Models to Address Class Imbalance in AI Scoring of Scientific Explanations in NGSS Classroom
  Targeted data augmentation with GPT-4 synthetic responses and ALP phrase-level extraction substantially improves SciBERT performance on severely imbalanced rubric categories for NGSS scientific explanations, achieving perfect precision, recall, and F1 on several categories while outperforming SMOTE.
- ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language
  Llama 3.1 annotates Polish medical texts to train DistilBERT classifiers that achieve F1 scores above 0.80 while being 500 times smaller than the teacher model.
- Towards the Anonymization of the Language Modeling
  The authors introduce MLM and CLM specialization methods that avoid memorizing identifiers in sensitive training data while aiming for a privacy-utility tradeoff on medical datasets.
- Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics
  An attention-based LSTM model predicts heat stress from heart rate, HRV, and SpO2 data with 95.4% accuracy and an F1 score of 0.982 on a 19-worker dataset.
- To Tune or Not To Tune? How About the Best of Both Worlds?
  A sequential fine-tuning strategy for pre-trained language models reports modest accuracy gains of 4.7%, 0.99%, and 0.72% on semantic similarity, sequence labeling, and text classification tasks, respectively.