Monte Carlo tree search control scheme for multibody dynamics applications

Tang, Yixuan; Orzechowski, Grzegorz; Prokop, Aleš; Mikkola, Aki

doi:10.1007/s11071-024-09509-8

Files

s11071024095098.pdf (4.59 MB)

Date

2024-04-03

Authors

Tang, Yixuan

Orzechowski, Grzegorz

Prokop, Aleš

Mikkola, Aki

Publisher

Springer Nature

ORCID

0000-0002-7526-1366

Abstract

There is considerable interest in applying reinforcement learning (RL) to improve machine control across multiple industries, and the automotive industry is one of the prime examples. Monte Carlo Tree Search (MCTS) has proven powerful in decision-making games, even without knowledge of the rules. In this study, multibody system dynamics (MSD) control is first modeled as a Markov Decision Process and solved with MCTS. Based on randomized search space exploration, the MCTS framework builds a selective search tree by repeatedly applying a Monte Carlo rollout at each child node. However, without a library of available choices, deciding among the many possibilities for agent parameters can be daunting. In addition, the large branching factor makes the MCTS search itself a significant challenge. This challenge is typically overcome by appropriate parameter design, search guiding, action reduction, parallelization, and early termination. To address these shortcomings, this study aims to provide insight into inverted pendulum control with vanilla and modified MCTS agents. A series of reward functions is designed according to the control goal; each maps a specific distribution shape of reward bonus and guides the MCTS-based control to maintain the upright position. Numerical examples show that the reward-modified MCTS algorithms significantly improve control performance and robustness over the default choice of a constant reward, which constitutes the vanilla MCTS. The exponentially decaying reward functions perform better than the constant-value or polynomial reward functions. Moreover, the exploitation vs. exploration trade-off and discount parameters are carefully tested. The study’s results can guide researchers who apply RL to MSD.
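The reward shapes and tuning parameters named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual functional forms, so everything below is an assumption made for illustration: the failure threshold `THETA_MAX`, the `power` and `rate` parameters, and the use of the standard UCB1 score as the exploration vs. exploitation rule are hypothetical choices, not the authors' implementation.

```python
import math

THETA_MAX = 0.2  # hypothetical failure threshold for the pendulum angle [rad]


def constant_reward(theta: float) -> float:
    """Constant bonus while the pendulum stays inside the threshold."""
    return 1.0 if abs(theta) <= THETA_MAX else 0.0


def polynomial_reward(theta: float, power: int = 2) -> float:
    """Bonus that falls off polynomially with the normalized angle error."""
    if abs(theta) > THETA_MAX:
        return 0.0
    return 1.0 - (abs(theta) / THETA_MAX) ** power


def exponential_reward(theta: float, rate: float = 5.0) -> float:
    """Bonus that decays exponentially with the normalized angle error."""
    if abs(theta) > THETA_MAX:
        return 0.0
    return math.exp(-rate * abs(theta) / THETA_MAX)


def discounted_return(rewards: list[float], gamma: float) -> float:
    """Sum of rollout rewards, each weighted by gamma**k (discount parameter)."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def ucb1(mean_value: float, parent_visits: int, child_visits: int,
         c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score; c sets the exploration vs. exploitation balance."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)


if __name__ == "__main__":
    # Compare how sharply each shape rewards staying near upright.
    for theta in (0.0, 0.05, 0.1, 0.2):
        print(theta, constant_reward(theta),
              polynomial_reward(theta), exponential_reward(theta))
```

Under these assumed forms, the constant reward gives the search no gradient toward the upright position, while the polynomial and exponential shapes concentrate the bonus near zero angle. This is one way to read the "distribution shape of reward bonus" described above.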

Keywords

Monte Carlo Tree Search, Multibody dynamics, Reward functions, Parametric analysis, Artificial intelligence control, Inverted pendulum

Citation

NONLINEAR DYNAMICS. 2024, vol. 112, issue 10, p. 8363-8391.
https://doi.org/10.1007/s11071-024-09509-8

Document type

Peer-reviewed

Document version

Published version

Language of document

en

DOI

10.1007/s11071-024-09509-8

URI

http://hdl.handle.net/11012/245527

Collections

Institute of Automotive Engineering (Ústav automobilního a dopravního inženýrství)

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International
