Visual ChatGPT Talking, Drawing and Editing with Visual Foundation Models.pdf
WebGPT Browser-assisted question-answering with human feedback.pdf
Training language models to follow instructions with human feedback.pdf
Scaling Laws for Reward Model Overoptimization.pdf
Teaching language models to support answers with verified quotes.pdf
Quark Controllable Text Generation with Reinforced Unlearning.pdf
Scalable agent alignment via reward modeling a research direction.pdf
Learning to summarize from human feedback.pdf
Recursively Summarizing Books with Human Feedback.pdf
Red Teaming Language Models to Reduce Harms Methods, Scaling Behaviors, and Lessons Learned.pdf
Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation.pdf
Is Reinforcement Learning (Not) for Natural Language Processing Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization.pdf
Learning to summarize with human feedback.pdf
Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning.pdf
Pretraining Language Models with Human Preferences.pdf
Improving alignment of dialogue agents via targeted human judgements.pdf
InstructGPT Training language models to follow instructions with human feedback.pdf
Interactive Learning from Policy-Dependent Human Feedback.pdf
Few-shot Preference Learning for Human-in-the-Loop RL.pdf
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning.pdf
GPT-4 Technical Report.pdf
Fine-Tuning Language Models from Human Preferences.pdf
Deep TAMER Interactive Agent Shaping in High-Dimensional State Spaces.pdf
Discovering Language Model Behaviors with Model-Written Evaluations.pdf
Deep Reinforcement Learning from Human Preferences.pdf
Aligning Language Models with Preferences through f-divergence Minimization.pdf
Constitutional AI Harmlessness from AI Feedback.pdf
Better Aligning Text-to-Image Models with Human Preference.pdf
大模型RLHF论文合集.rar (large-model RLHF paper collection)
(79.13 MB, download requires RMB 29)




