Dan
AI Engineer
Recent
- Dec 2, 2024 Language-vision models: recent developments
- Nov 28, 2024 Quick introduction to RLHF for fine-tuning LLMs to better match human preferences
- Mar 29, 2024 Curvature explains loss of plasticity
- Dec 3, 2023 Auxiliary task discovery through generate-and-test
- Nov 23, 2023 Hybrid actor-critic algorithm for quantum reinforcement learning at CERN beam lines
- Nov 3, 2023 Planning with expectation models
- Oct 30, 2023 Loss of plasticity in continual deep reinforcement learning
- Oct 29, 2023 Loss of Plasticity in Deep Continual Learning
- Jul 23, 2023 Reinforcement learning with unsupervised auxiliary tasks
- Jun 1, 2023 Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor
Large language models (3)
Reinforcement learning (12)
- Mar 29, 2024 Curvature explains loss of plasticity
- Dec 3, 2023 Auxiliary task discovery through generate-and-test
- Nov 23, 2023 Hybrid actor-critic algorithm for quantum reinforcement learning at CERN beam lines
- Nov 3, 2023 Planning with expectation models
- Oct 30, 2023 Loss of plasticity in continual deep reinforcement learning
- Oct 29, 2023 Loss of Plasticity in Deep Continual Learning
- Jul 23, 2023 Reinforcement learning with unsupervised auxiliary tasks
- Jun 1, 2023 Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor
- Apr 15, 2023 An emphatic approach to the problem of off-policy temporal-difference learning
- Apr 15, 2023 Recent Developments in Emphatic Temporal-Differences Methods and Emphatic Weightings
- Mar 12, 2023 An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Feb 15, 2023 A review of different Transformers-based algorithms and methods