Commit e99ae6b

Authored by xiongjyu, puyuan1996, PaParaZz1, KJLdefeated, puyuan
fix(xjy): merged latest main branch (#368)
* v0.2.0
* style(pu): use actions/upload-artifact@v3
* fix(pu): fix Union import in game_segment
* style(pu): use actions/upload-artifact@v4
* test(nyz): only upload cov in macos
* fix(pu): fix reanalyze_ratio compatibility with rope embed (#342)
* fix(pu): fix release.yml
* fix(pu): fix release.yml (#343)
* fix(pu): fix release.yml
* fix(pu): fix release.yml
* fix(pu): fix release.yml
* fix(pu): fix release.yml
* fix(pu): fix release.yml
* fix(pu): use actions/download-artifact@v2
* fix(pu): use actions/download-artifact@v4
* release v0.2.0
* fix(lkj): fix typo in customize_envs.md
* fix(pu): adapt atari and dmc2gym env to support shared_memory (#345)
* fix(pu): fix atari and dmc2gym env to support shared_memory
* tmp
* fix(pu): fix frame_stack_num default cfg in atari env

---------

Co-authored-by: puyuan <puyuan1996@qq.com>

* delete unnecessary comments and translate CN comments into EN
* delete unnecessary comment

---------

Co-authored-by: 蒲源 <2402552459@qq.com>
Co-authored-by: PaParaZz1 <niuyazhe314@outlook.com>
Co-authored-by: 蒲源 <48008469+puyuan1996@users.noreply.github.com>
Co-authored-by: 林楷傑 <46377141+KJLdefeated@users.noreply.github.com>
Co-authored-by: puyuan <puyuan1996@qq.com>
1 parent 46d69fc commit e99ae6b

23 files changed, +225 -75 lines changed

.github/workflows/release.yml

Lines changed: 99 additions & 15 deletions
@@ -38,10 +38,10 @@ jobs:
         run: |
           make zip
           ls -al dist
-      - name: Upload packed files to artifacts
-        uses: actions/upload-artifact@v2
+      - name: Upload packed files to artifacts (source)
+        uses: actions/upload-artifact@v4
         with:
-          name: build-artifacts-all
+          name: build-artifacts-source
           path: ./dist/*

   wheel_build:
@@ -102,35 +102,119 @@ jobs:
         run: |
           ls -al ./wheelhouse
           mv wheelhouse dist
-      - name: Upload packed files to artifacts
-        uses: actions/upload-artifact@v2
+      - name: Upload packed files to artifacts (wheels)
+        uses: actions/upload-artifact@v4
         with:
-          name: build-artifacts-all
+          name: build-artifacts-wheels-${{ matrix.os }}-${{ matrix.python }}-${{ matrix.architecture }}
           path: ./dist/*

+  wheel_aggregate:
+    name: Aggregate all wheels
+    runs-on: ubuntu-20.04
+    needs: wheel_build
+    steps:
+      - name: Create aggregation directory
+        run: mkdir -p aggregated_wheels_all
+
+      - name: Download wheel ubuntu-20.04, 3.7, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.7-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.8, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.8-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.9, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.9-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.10, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.10-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.11, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.11-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.7, aarch64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.7-aarch64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.8, aarch64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.8-aarch64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.9, aarch64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.9-aarch64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.10, aarch64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.10-aarch64
+          path: aggregated_wheels_all
+      - name: Download wheel ubuntu-20.04, 3.11, aarch64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-ubuntu-20.04-3.11-aarch64
+          path: aggregated_wheels_all
+
+      - name: Download wheel macos-13, 3.7, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-macos-13-3.7-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel macos-13, 3.8, x86_64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-macos-13-3.8-x86_64
+          path: aggregated_wheels_all
+      - name: Download wheel macos-13, 3.7, arm64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-macos-13-3.7-arm64
+          path: aggregated_wheels_all
+      - name: Download wheel macos-13, 3.8, arm64
+        uses: actions/download-artifact@v4
+        with:
+          name: build-artifacts-wheels-macos-13-3.8-arm64
+          path: aggregated_wheels_all
+
+      - name: Upload unified wheels artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: build-artifacts-wheels
+          path: aggregated_wheels_all
+
   # the publishing can only be processed on linux system
   wheel_publish:
     name: Publish the wheels to pypi
     runs-on: ubuntu-20.04
     needs:
       - source_build
-      - wheel_build
+      - wheel_aggregate
     strategy:
       fail-fast: false
       matrix:
         python:
           - '3.8.7'

     steps:
-      - name: Download packed files to artifacts
-        uses: actions/download-artifact@v3
+      - name: Download unified wheels artifact
+        uses: actions/download-artifact@v4
         with:
-          name: build-artifacts-all
+          name: build-artifacts-wheels
           path: ./dist
-      - name: Show the buildings
-        shell: bash
-        run: |
-          ls -al ./dist
+      - name: Show the aggregated wheels
+        run: ls -al ./dist
       - name: Upload distribution 📦 to github release
         uses: svenstaro/upload-release-action@v2
         with:
@@ -145,4 +229,4 @@ jobs:
           password: ${{ secrets.PYPI_API_TOKEN }}
           verbose: true
           skip_existing: true
-          packages_dir: dist/
+          packages_dir: dist/
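
The `wheel_aggregate` job in this file lists one download step per build combination, because each `wheel_build` job now uploads under its own artifact name instead of the shared `build-artifacts-all`. As a rough sketch of how those names line up, the Python below reproduces the `build-artifacts-wheels-<os>-<python>-<architecture>` pattern. The matrix values are read off the download steps in this diff, not from the workflow's `strategy.matrix` block (which this hunk does not show), so treat them as assumptions; `wheel_artifact_names` is a hypothetical helper, not part of the repository.

```python
from itertools import product

# Matrix values inferred from the download steps in this diff (assumption:
# the real strategy.matrix block is not visible in the hunk).
MATRIX = {
    "ubuntu-20.04": (["3.7", "3.8", "3.9", "3.10", "3.11"], ["x86_64", "aarch64"]),
    "macos-13": (["3.7", "3.8"], ["x86_64", "arm64"]),
}


def wheel_artifact_names(matrix=MATRIX):
    """Return one artifact name per (os, python, architecture) combination."""
    names = []
    for os_name, (pythons, archs) in matrix.items():
        # The workflow lists every Python version for one architecture first.
        for arch, python in product(archs, pythons):
            names.append(f"build-artifacts-wheels-{os_name}-{python}-{arch}")
    return names


names = wheel_artifact_names()
# Each upload needs its own name, so duplicates would indicate a problem.
assert len(names) == len(set(names))
for name in names:
    print(name)
```

If the per-artifact steps become hard to maintain, `actions/download-artifact@v4` also accepts `pattern` and `merge-multiple` inputs, which might collapse the fourteen steps into one; whether that suits this release flow is not something the diff settles.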

.github/workflows/release_test.yml

Lines changed: 4 additions & 4 deletions
@@ -38,12 +38,12 @@ jobs:
           make zip
           ls -al dist
       - name: Upload packed files to artifacts
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: build-artifacts-source-pack
           path: ./dist/*
       - name: Upload packed files to artifacts
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: build-artifacts-all
           path: ./dist/*
@@ -108,12 +108,12 @@ jobs:
           ls -al ./wheelhouse
           mv wheelhouse dist
       - name: Upload packed files to artifacts
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: build-artifacts-${{ runner.os }}-cp${{ matrix.python }}-${{ matrix.architecture }}
           path: ./dist/*
       - name: Upload packed files to artifacts
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: build-artifacts-all
           path: ./dist/*

.github/workflows/test.yml

Lines changed: 1 addition & 0 deletions
@@ -100,6 +100,7 @@ jobs:
         run: |
           make clean build unittest
       - name: Upload coverage to Codecov
+        if: ${{ env.OS_NAME == 'MacOS' }}
         uses: codecov/codecov-action@v4
         with:
           token: ${{ secrets.CODECOV_TOKEN }}

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1449,4 +1449,5 @@ events.*
 # pooltool-specific stuff
 !/assets/pooltool/**
 lzero/mcts/ctree/ctree_alphazero/pybind11
+
 zoo/jericho/envs/z-machine-games-master

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-2025.04.01 (v0.2.0)
+2025.04.09 (v0.2.0)
 - env: Add Metadrive environment and configurations (#192)
 - env: Add Sampled MuZero/UniZero and DMC environment with related configurations (#260)
 - env: Polish Chess environment and its render method; add unittests and configurations (#272)

README.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
 [![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)

-Updated on 2025.04.01 LightZero-v0.2.0
+Updated on 2025.04.09 LightZero-v0.2.0

 English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

README.zh.md

Lines changed: 27 additions & 15 deletions
@@ -27,7 +27,7 @@
 [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
 [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)

-最近更新于 2025.04.01 LightZero-v0.2.0
+最近更新于 2025.04.09 LightZero-v0.2.0

 [English](https://github.com/opendilab/LightZero/blob/main/README.md) | 简体中文 | [文档](https://opendilab.github.io/LightZero) | [LightZero 论文](https://arxiv.org/abs/2310.08348) | [🔥UniZero 论文](https://arxiv.org/abs/2406.10667) | [🔥ReZero 论文](https://arxiv.org/abs/2404.16364)

@@ -52,25 +52,37 @@
 **LightZero** 的目标是**标准化 MCTS 算法族,以加速相关研究和应用。** [Benchmark](#benchmark) 中介绍了目前所有已实现算法的性能比较。

 ### 导航
-- [概览](#概览)
+- [LightZero](#lightzero)
+- [🔍 背景](#-背景)
+- [🎨 概览](#-概览)
 - [导航](#导航)
-- [特点](#特点)
-- [框架结构](#框架结构)
-- [集成算法](#集成算法)
-- [安装方法](#安装方法)
-- [快速开始](#快速开始)
-- [文档](#文档)
-- [基线算法比较](#基线算法比较)
-- [MCTS相关笔记](#MCTS-相关笔记)
+- [💥 特点](#-特点)
+- [🧩 框架结构](#-框架结构)
+- [🎁 集成算法](#-集成算法)
+- [⚙️ 安装方法](#️-安装方法)
+- [使用 Docker 进行安装](#使用-docker-进行安装)
+- [🚀 快速开始](#-快速开始)
+- [📚 文档](#-文档)
+- [📊 基线算法比较](#-基线算法比较)
+- [📝 MCTS 相关笔记](#-mcts-相关笔记)
 - [论文笔记](#论文笔记)
 - [算法框架图](#算法框架图)
-- [MCTS相关论文](#MCTS-相关论文)
+- [MCTS 相关论文](#mcts-相关论文)
 - [重要论文](#重要论文)
+- [LightZero Implemented series](#lightzero-implemented-series)
+- [AlphaGo series](#alphago-series)
+- [MuZero series](#muzero-series)
+- [MCTS Analysis](#mcts-analysis)
+- [MCTS Application](#mcts-application)
 - [其他论文](#其他论文)
-- [反馈意见和贡献](#反馈意见和贡献)
-- [引用](#引用)
-- [致谢](#致谢)
-- [许可证](#许可证)
+- [ICML](#icml)
+- [ICLR](#iclr)
+- [NeurIPS](#neurips)
+- [Other Conference or Journal](#other-conference-or-journal)
+- [💬 反馈意见和贡献](#-反馈意见和贡献)
+- [🌏 引用](#-引用)
+- [💓 致谢](#-致谢)
+- [🏷️ 许可证](#️-许可证)

 ### 💥 特点
 **轻量**:LightZero 中集成了多种 MCTS 族算法,能够在同一框架下轻量化地解决多种属性的决策问题。

docs/source/tutorials/envs/customize_envs.md

Lines changed: 8 additions & 3 deletions
@@ -81,12 +81,17 @@ In a custom environment, you need to provide properties for observation space an

 ```python
 @property
-defobservation_space(self):
-    return self.env.observation_space
+def observation_space(self):
+    return self._observation_space

 @property
 def action_space(self):
-    return self.env.action_space
+    return self._action_space
+
+@property
+def legal_actions(self):
+    # get the actual legal actions
+    return np.arange(self._action_space.n)
 ```

 ### 6. **Render Method**<br>
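
The tutorial snippet in this hunk shows only the three properties. As a minimal sketch of how they might sit inside a complete class, the example below stores the spaces on the environment itself (the `_observation_space` and `_action_space` attributes the updated tutorial reads from). `ToyEnv`, `Discrete`, and `Box` are hypothetical stand-ins written for illustration so the block runs without dependencies; a real environment would presumably use gym-style spaces and return `np.arange(n)` as the tutorial does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Discrete:
    """Hypothetical stand-in for a gym-style discrete space with n actions."""
    n: int


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a gym-style box space; only the shape is kept."""
    shape: Tuple[int, ...]


class ToyEnv:
    """Illustrative environment; not part of LightZero."""

    def __init__(self, num_actions: int = 4) -> None:
        # Spaces live on the env itself rather than on a wrapped `self.env`.
        self._observation_space = Box(shape=(3,))
        self._action_space = Discrete(num_actions)

    @property
    def observation_space(self) -> Box:
        return self._observation_space

    @property
    def action_space(self) -> Discrete:
        return self._action_space

    @property
    def legal_actions(self) -> List[int]:
        # Every action is treated as legal here; a plain list replaces
        # the tutorial's np.arange(n) to avoid a NumPy dependency.
        return list(range(self._action_space.n))


env = ToyEnv(num_actions=4)
print(env.legal_actions)
```

An environment with state-dependent legality (a board game, for example) would compute `legal_actions` from its current state instead of returning the full range.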

lzero/config/meta.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 __TITLE__ = "LightZero"

 #: Version of this project.
-__VERSION__ = "0.1.0"
+__VERSION__ = "0.2.0"

 #: Short description of the project, will be included in ``setup.py``.
 __DESCRIPTION__ = 'A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkits.'

lzero/mcts/buffer/game_buffer_unizero.py

Lines changed: 5 additions & 5 deletions
@@ -76,7 +76,7 @@ def sample(
         )

         # target policy
-        batch_target_policies_re = self._compute_target_policy_reanalyzed(policy_re_context, policy._target_model, current_batch[1]) # current_batch[1] is batch_action
+        batch_target_policies_re = self._compute_target_policy_reanalyzed(policy_re_context, policy._target_model, current_batch[1], current_batch[-1]) # current_batch[1] is batch_action
         batch_target_policies_non_re = self._compute_target_policy_non_reanalyzed(
             policy_non_re_context, self._cfg.model.action_space_size
         )
@@ -235,7 +235,7 @@ def reanalyze_buffer(
         # obtain the current_batch and prepare target context
         policy_re_context, current_batch = self._make_batch_for_reanalyze(batch_size)
         # target policy
-        self._compute_target_policy_reanalyzed(policy_re_context, policy._target_model, current_batch[1], current_batch[-1] )
+        self._compute_target_policy_reanalyzed(policy_re_context, policy._target_model, current_batch[1], current_batch[-1])

     def _make_batch_for_reanalyze(self, batch_size: int) -> Tuple[Any]:
         """
@@ -432,7 +432,7 @@ def _compute_target_policy_reanalyzed(self, policy_re_context: List[Any], model:
             # =============== NOTE: The key difference with MuZero =================
             # To obtain the target policy from MCTS guided by the recent target model
             # TODO: batch_obs (policy_obs_list) is at timestep t, batch_action is at timestep t
-            m_output = model.initial_inference(batch_obs, batch_action[:self.reanalyze_num], start_pos=batch_timestep) # NOTE: :self.reanalyze_num
+            m_output = model.initial_inference(batch_obs, batch_action[:self.reanalyze_num], start_pos=batch_timestep[:self.reanalyze_num]) # NOTE: :self.reanalyze_num
             # =======================================================================

             if not model.training:
@@ -459,13 +459,13 @@ def _compute_target_policy_reanalyzed(self, policy_re_context: List[Any], model:
                 roots = MCTSCtree.roots(transition_batch_size, legal_actions)
                 roots.prepare(self._cfg.root_noise_weight, noises, reward_pool, policy_logits_pool, to_play)
                 # do MCTS for a new policy with the recent target model
-                MCTSCtree(self._cfg).search(roots, model, latent_state_roots, to_play, batch_timestep)
+                MCTSCtree(self._cfg).search(roots, model, latent_state_roots, to_play, batch_timestep[:self.reanalyze_num])
             else:
                 # python mcts_tree
                 roots = MCTSPtree.roots(transition_batch_size, legal_actions)
                 roots.prepare(self._cfg.root_noise_weight, noises, reward_pool, policy_logits_pool, to_play)
                 # do MCTS for a new policy with the recent target model
-                MCTSPtree(self._cfg).search(roots, model, latent_state_roots, to_play, batch_timestep)
+                MCTSPtree(self._cfg).search(roots, model, latent_state_roots, to_play, batch_timestep[:self.reanalyze_num])

             roots_legal_actions_list = legal_actions
             roots_distributions = roots.get_distributions()
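
The hunks in this file pass `current_batch[-1]` (the batch timesteps) into `_compute_target_policy_reanalyzed` and slice it with the same `[:self.reanalyze_num]` bound already applied to `batch_action`. The sketch below illustrates the alignment idea only: it assumes both sequences are indexed by sample, so slicing one without the other would leave more start positions than reanalyzed samples. `select_reanalyze_inputs` is a hypothetical helper using plain lists, not code from LightZero.

```python
from typing import List, Sequence, Tuple


def select_reanalyze_inputs(
    batch_action: Sequence[int],
    batch_timestep: Sequence[int],
    reanalyze_num: int,
) -> Tuple[List[int], List[int]]:
    """Slice actions and timesteps with the same bound so they stay aligned."""
    actions = list(batch_action[:reanalyze_num])
    start_pos = list(batch_timestep[:reanalyze_num])
    # Each reanalyzed sample needs exactly one start position.
    if len(actions) != len(start_pos):
        raise ValueError("actions and start positions are misaligned")
    return actions, start_pos


# Four samples in the batch, of which only the first two are reanalyzed.
batch_action = [2, 0, 1, 3]
batch_timestep = [10, 11, 12, 13]
actions, start_pos = select_reanalyze_inputs(batch_action, batch_timestep, reanalyze_num=2)
print(actions, start_pos)
```

Before this change the unsliced `batch_timestep` would have carried four entries against two actions in this toy setup, which is the kind of mismatch the commit message's "reanalyze_ratio compatibility" note appears to address.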

lzero/mcts/buffer/game_segment.py

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ def append(
             obs: np.ndarray,
             reward: np.ndarray,
             action_mask: np.ndarray = None,
-            to_play: List = [-1],
+            to_play: int = -1,
             timestep: int = 0,
             chance: int = 0,
     ) -> None:
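
This hunk replaces a list default (`[-1]`) with a scalar (`-1`). The diff does not state the motivation, but one general reason to avoid list defaults in Python is that the default object is created once, when the function is defined, and then shared by every call that omits the argument. The sketch below demonstrates that behavior with two hypothetical functions; neither is taken from LightZero.

```python
from typing import List, Optional


def append_shared(value: int, history: List[int] = []) -> List[int]:
    """Anti-pattern: the default list is created once and reused across calls."""
    history.append(value)
    return history


def append_fresh(value: int, history: Optional[List[int]] = None) -> List[int]:
    """Safer variant: a new list is created on each call that omits `history`."""
    if history is None:
        history = []
    history.append(value)
    return history


first = append_shared(1)
second = append_shared(2)   # same list object as `first`, now holding both values
fresh_a = append_fresh(1)
fresh_b = append_fresh(2)   # independent list
print(first is second, second, fresh_a, fresh_b)
```

An immutable default such as `-1` sidesteps the issue entirely, since there is no shared object that a call could mutate.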

lzero/mcts/ptree/ptree_ez.py

Lines changed: 2 additions & 2 deletions
@@ -239,7 +239,7 @@ def prepare(
             noises: List[float],
             value_prefixs: List[float],
             policies: List[List[float]],
-            to_play: List = [-1]
+            to_play: Union[int, List] = -1
     ) -> None:
         """
         Overview:
@@ -261,7 +261,7 @@ def prepare(
             self.roots[i].add_exploration_noise(root_noise_weight, noises[i])
             self.roots[i].visit_count += 1

-    def prepare_no_noise(self, value_prefixs: List[float], policies: List[List[float]], to_play: List = [-1]) -> None:
+    def prepare_no_noise(self, value_prefixs: List[float], policies: List[List[float]], to_play: Union[int, List] = -1) -> None:
         """
         Overview:
             Expand the roots without noise.

lzero/mcts/ptree/ptree_mz.py

Lines changed: 2 additions & 2 deletions
@@ -220,7 +220,7 @@ def prepare(
             noises: List[float],
             rewards: List[float],
             policies: List[List[float]],
-            to_play: List = [-1]
+            to_play: Union[int, List] = -1
     ) -> None:
         """
         Overview:
@@ -241,7 +241,7 @@ def prepare(
             self.roots[i].add_exploration_noise(root_noise_weight, noises[i])
             self.roots[i].visit_count += 1

-    def prepare_no_noise(self, rewards: List[float], policies: List[List[float]], to_play: List = [-1]) -> None:
+    def prepare_no_noise(self, rewards: List[float], policies: List[List[float]], to_play: Union[int, List] = -1) -> None:
         """
         Overview:
             Expand the roots without noise.
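
With `to_play` now typed as `Union[int, List]`, callers may pass either a single player index or one index per root. The diff does not show how `prepare` and `prepare_no_noise` consume the value, so the following is only a sketch of one way such an argument could be normalized. `normalize_to_play` and its `root_num` parameter are hypothetical and do not appear in the repository.

```python
from typing import List, Union


def normalize_to_play(to_play: Union[int, List[int]], root_num: int) -> List[int]:
    """Expand a scalar player index to one entry per root; validate list input."""
    if isinstance(to_play, int):
        # A single value (for example -1) applies to every root.
        return [to_play] * root_num
    if len(to_play) != root_num:
        raise ValueError(
            f"expected {root_num} to_play entries, got {len(to_play)}"
        )
    return list(to_play)


print(normalize_to_play(-1, 3))
print(normalize_to_play([1, 2], 2))
```

Normalizing once at the top of a function keeps the per-root loop free of type checks, at the cost of allocating a short list when a scalar is passed.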
