diff --git a/docs/dataset/css/bulma.min.css b/docs/dataset/css/bulma.min.css index 28c8dc4..d5f5d61 100644 --- a/docs/dataset/css/bulma.min.css +++ b/docs/dataset/css/bulma.min.css @@ -1,3 +1,21 @@ +.bibtex-container{ + flex-grow:1; + margin:0 auto + ;position:relative; + width:auto +} + +.bibtex pre{ + -webkit-overflow-scrolling: touch; + overflow-x:auto; + padding:1.25em 1.5em; + white-space:pre; + word-wrap:normal; + font-family: "Courier", monospace; + background-color: #f4f4f4; + text-align: left; +} + .hero{ align-items:stretch; display:flex; @@ -10,4 +28,6 @@ } + + \ No newline at end of file diff --git a/docs/dataset/css/style.css b/docs/dataset/css/style.css index 9665e62..85dec28 100644 --- a/docs/dataset/css/style.css +++ b/docs/dataset/css/style.css @@ -6,8 +6,18 @@ h2, h3, h4, h5, a, p, span, body {font-weight: normal; font-family: "Google Sans .header-menu {background-color: #efeff3; width: 100%; padding: 16px 0;} .header-menu-content {max-width: 960px; margin: auto;} .header-menu-item {display: inline-block; margin-left: 16px; margin-right: 16px; font-size: 1.2em;} -.links {width: 100%; margin: auto; text-align: center; padding-top: 8px;} + +.links {width: 120%; margin: auto; text-align: center; padding-top: 8px; margin-left: -10%;} .links a {margin-left: 8px;} +.links br {display: none;} + +@media (max-width: 900px) { + .links {width: 100%; margin: auto; text-align: center; padding-top: 8px;} + .links br {display: block;} + .links a {margin-top: 8px; } +} + + .content {max-width: 960px; margin: auto; margin-top: 48px; margin-bottom: 64px;} a, h2 {color: rgb(100, 142, 246); text-decoration: none;} a:hover {color: #fa6d6d;} diff --git a/docs/dataset/index.html b/docs/dataset/index.html index 0e60a41..c595cd5 100644 --- a/docs/dataset/index.html +++ b/docs/dataset/index.html @@ -38,7 +38,7 @@ Home
- Code + Code
Dataset @@ -51,11 +51,18 @@

Dataset

- We open-source all data corresponding to the 80-task and 30-task datasets used in our multi-task experiments. They can be downloaded below. The two datasets contain 545M and 345M transitions, respectively. + We're open-sourcing $57$ single-task datasets from the replay buffers of SimbaV2 agents, available for download below. We hope this release encourages other research groups to share their datasets and checkpoints, driving collaboration and progress.

+
+ Click to Download:
+

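Note on the accompanying style change: the new .links rules in docs/dataset/css/style.css stretch the row to 120% width (offset by margin-left: -10%) on desktop, hide its <br> separators, and restore them below 900px. The actual anchor markup for the download links is not captured in this diff, so the following is only a hypothetical sketch of the kind of link row those rules target; the href values are placeholders.

<!-- Hypothetical markup, assuming the download links use the .links class from
     style.css. On wide screens the <br> separators are hidden
     (.links br { display: none; }); below 900px the new media query sets them
     back to display: block, so the links stack vertically with 8px spacing. -->
<div class="links">
  <a href="#">MuJoCo</a> <br>
  <a href="#">DMControl</a> <br>
  <a href="#">MyoSuite</a> <br>
  <a href="#">HumanoidBench</a>
</div>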
@@ -63,7 +70,7 @@

Dataset

Overview

- We release two multi-task datasets with data from 80 and 30 tasks, respectively. The datasets are summarized below. + We release a total of $57$ single-task expert datasets, each containing transition data from the replay buffer of a SimbaV2 agent trained on that task. The domains consist of MuJoCo (5), DMControl (28), MyoSuite (10), and HumanoidBench (14), covering locomotion and manipulation tasks of varying complexity. The dataset details are summarized below.

@@ -89,19 +96,19 @@

Overview

5 @@ -116,19 +123,19 @@

Overview

28 @@ -149,13 +156,13 @@

Overview

690k @@ -176,29 +183,24 @@

Overview

690k
- 4 + 5 690k - 345M + 171M - 20GB + 11.4GB - +
- 11 + 13 690k - 345M + 171M 20GB - +
- 345M + 171M - 20GB + 14.3GB - +
- 345M + 171M - 20GB + 11.1GB - +
-
-

- These datasets are obtained from the replay buffers of 240 single-task SimbaV2 agents, and thus contain a wide variety of behaviors ranging from random to expert policies. Multi-task agents trained on the above datasets, as well as checkpoints for each of the single-task agents, are available here. -

-

Tasks

- We list all 57 tasks in the dataset below, along with SimbaV2 policy visualizations and summary statistics. + We list all $57$ tasks in the dataset below, along with SimbaV2 policy visualizations and summary statistics.

@@ -248,7 +250,7 @@

Tasks

- +
@@ -552,7 +554,7 @@

Tasks

- +
@@ -1098,7 +1100,7 @@

Tasks

- +
@@ -1384,7 +1386,7 @@

Tasks

- +
diff --git a/docs/dataset/js/bulma-carousel.js b/docs/dataset/js/bulma-carousel.js index 28bb55b..7456808 100644 --- a/docs/dataset/js/bulma-carousel.js +++ b/docs/dataset/js/bulma-carousel.js @@ -1036,6 +1036,7 @@ var defaultOptions = { autoplaySpeed: 3000 }; + var Autoplay = function (_EventEmitter) { _inherits(Autoplay, _EventEmitter); diff --git a/docs/dataset/videos/.DS_Store b/docs/dataset/videos/.DS_Store new file mode 100644 index 0000000..9705024 Binary files /dev/null and b/docs/dataset/videos/.DS_Store differ diff --git a/docs/index.html b/docs/index.html index 2f70302..0f1311c 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,6 +3,7 @@ + @@ -31,7 +32,7 @@

Under Review

- Hojoon Lee1, 2$\dagger$,  + Hojoon Lee1$\dagger$,  Youngdo Lee1$\dagger$,  Takuma Seno2Donghu Kim1
@@ -52,7 +53,7 @@

@@ -69,7 +70,7 @@

TL;DR

- Stop worrying about algorithms, just change the network architecture to SimbaV2 + Stop worrying about algorithms, just change the network architecture to SimbaV2

@@ -151,7 +152,7 @@

Scaling Network Size & UTD Ratio

Empirical Analysis: Training Stability

- We track $4$ metrics during training to understand the learning dynamics of SimbaV2 and Simba: + We track the average return and $4$ additional metrics during training to understand the learning dynamics of SimbaV2 and Simba:

  • (a) Average normalized return across tasks

  • (b) Weighted sum of $\ell_2$-norms of all intermediate features in critics

  • @@ -281,7 +282,7 @@

    SimbaV2 with Online RL

    Paper

    SimbaV2: Hyperspherical Normalization for Scalable Deep Reinforcement Learning
    Hojoon Lee*, Youngdo Lee*, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo

    - arXiv preprint

    + arXiv preprint

    @@ -290,21 +291,31 @@

    Paper

    -
    -
-
-

Citation

+
+

Citation

- If you find our work useful, please consider citing the paper as follows: + If you find our work useful, please consider citing the paper as follows:

-
-
-
+
@article{lee2025simbav2,
+         title={Hyperspherical Normalization for Scalable Deep Reinforcement Learning}, 
+         author={Hojoon Lee and Youngdo Lee and Takuma Seno and Donghu Kim and Peter Stone and Jaegul Choo},
+         journal={arXiv preprint arXiv:2502.15280},
+         year={2025},
+}
+
+
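For reference, here is a minimal sketch of how the .bibtex-container and .bibtex pre styles added to bulma.min.css might wrap the citation block above. The actual markup in docs/index.html is not shown in this diff, so treat the element structure as an assumption.

<!-- Hypothetical wrapper, assuming the citation sits inside .bibtex-container
     and .bibtex as targeted by the new bulma.min.css rules: the <pre> is
     rendered left-aligned in Courier on a #f4f4f4 background and, because of
     white-space: pre with overflow-x: auto, scrolls horizontally instead of
     wrapping on narrow screens. -->
<div class="bibtex-container">
  <div class="bibtex">
    <pre>
@article{lee2025simbav2,
  title={Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
  author={Hojoon Lee and Youngdo Lee and Takuma Seno and Donghu Kim and Peter Stone and Jaegul Choo},
  journal={arXiv preprint arXiv:2502.15280},
  year={2025},
}
    </pre>
  </div>
</div>

Keeping white-space: pre and scrolling via overflow-x: auto lets the long author line move sideways rather than breaking the page layout on mobile.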