Distributed On-Policy Actor-Critic Reinforcement Learning
Authors: M. Stanković, M. Beko, M. Pavlović, I. Popadić, S. Stanković
Publication: Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research
DOI: 10.15308/Sinteza-2022-389-393
Area: DECIDE Project Session
Pages: 389-393
Abstract:
In this paper, a novel distributed on-policy Actor-Critic algorithm for multi-agent reinforcement learning is proposed. The Critic stage uses a temporal-difference scheme with function approximation, and the Actor stage uses a policy gradient algorithm derived from a global objective. At both stages, decentralized agreement among the agents is achieved using a linear dynamic consensus strategy. Compared with existing schemes, the algorithm offers an improved convergence rate and better noise immunity, and it makes multi-task global optimization possible.
Keywords: Multi-Agent Systems, Reinforcement Learning, Actor-Critic, Distributed Consensus, Function Approximation
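The abstract names three building blocks: a temporal-difference Critic with function approximation, a policy-gradient Actor, and a linear consensus step applied at both stages. The abstract does not give the update rules, so the following is only a generic sketch of how such pieces commonly fit together, not the authors' algorithm. It assumes linear value features, a linear-softmax policy, a TD(0) error used as the advantage estimate, and a fixed doubly stochastic weight matrix `A`; all function names, step sizes, and the random transitions in the demo are hypothetical.

```python
import numpy as np

def td0_critic_step(w, phi, phi_next, reward, gamma=0.95, alpha=0.05):
    """One local TD(0) step for a linear value estimate V(s) = phi(s) @ w (assumed form)."""
    td_error = reward + gamma * phi_next @ w - phi @ w
    return w + alpha * td_error * phi, td_error

def softmax_policy(theta, phi):
    """Action probabilities of a linear-softmax policy; theta is (n_actions, n_features)."""
    logits = theta @ phi
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def actor_step(theta, phi, action, td_error, beta=0.05):
    """One local policy-gradient step, with the TD error standing in for the advantage."""
    probs = softmax_policy(theta, phi)
    grad_log = -np.outer(probs, phi)  # d log pi(a|s) / d theta, rows b != a
    grad_log[action] += phi           # extra term for the row of the action taken
    return theta + beta * td_error * grad_log

def consensus_step(stacked, A):
    """Linear consensus: agent i takes the A[i, j]-weighted average of all agents' parameters."""
    return np.tensordot(A, stacked, axes=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_agents, n_features, n_actions = 3, 4, 2
    # Hypothetical doubly stochastic consensus weights for a fully connected 3-agent network.
    A = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])
    w = rng.normal(size=(n_agents, n_features))           # one critic vector per agent
    theta = np.zeros((n_agents, n_actions, n_features))   # one actor matrix per agent
    for _ in range(50):
        for i in range(n_agents):
            # Synthetic transition; a real environment would supply these quantities.
            phi, phi_next = rng.normal(size=n_features), rng.normal(size=n_features)
            action = rng.choice(n_actions, p=softmax_policy(theta[i], phi))
            reward = rng.normal()
            w[i], delta = td0_critic_step(w[i], phi, phi_next, reward)
            theta[i] = actor_step(theta[i], phi, action, delta)
        w = consensus_step(w, A)          # agreement at the Critic stage
        theta = consensus_step(theta, A)  # agreement at the Actor stage
    print("largest critic disagreement across agents:", np.ptp(w, axis=0).max())
```

In this sketch each agent first updates from its own observation and then mixes parameters with the other agents, so no central coordinator is needed. With a doubly stochastic `A`, the mixing step leaves the network-wide parameter average unchanged while shrinking the spread between agents.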
Citation:
BibTeX format
@article{article,
  author = {M. Stanković and M. Beko and M. Pavlović and I. Popadić and S. Stanković},
  title = {Distributed On-Policy Actor-Critic Reinforcement Learning},
  journal = {Sinteza 2022 - International Scientific Conference on Information Technology and Data Related Research},
  year = 2022,
  pages = {389--393},
  doi = {10.15308/Sinteza-2022-389-393}
}
RefWorks Tagged format
RT Conference Proceedings
A1 Miloš Stanković
A1 Miloš Beko
A1 Miloš Pavlović
A1 Ilija Popadić
A1 Srđan Stanković
T1 Distributed On-Policy Actor-Critic Reinforcement Learning
AD Univerzitet Singidunum, Beograd, Beograd, Srbija
YR 2022
NO doi: 10.15308/Sinteza-2022-389-393
Preformatted citation
M. Stanković, M. Beko, M. Pavlović, I. Popadić and S. Stanković, Distributed On-Policy Actor-Critic Reinforcement Learning, Univerzitet Singidunum, Beograd, 2022, doi:10.15308/Sinteza-2022-389-393