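```python
# Install first with: pip install gym
# Assumes gym >= 0.26 (reset() returns (obs, info), step() returns 5 values);
# `import gymnasium as gym` works the same way.
import gym
import numpy as np

# Create the CartPole environment
env = gym.make('CartPole-v1')

# CartPole observations are 4 continuous values (cart position, cart velocity,
# pole angle, pole angular velocity), so they cannot index a Q-table directly.
# Discretize each dimension into n_bins bins; values outside the hand-chosen
# bounds simply fall into the first or last bin.
n_bins = 10
obs_low = np.array([-2.4, -3.0, -0.21, -3.5])
obs_high = np.array([2.4, 3.0, 0.21, 3.5])
bin_edges = [np.linspace(lo, hi, n_bins - 1) for lo, hi in zip(obs_low, obs_high)]


def discretize(obs):
    """Map a continuous observation to a tuple of bin indices in [0, n_bins)."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, bin_edges))


# Initialize the Q-table with random values: one row of action values per discrete state
action_space_size = env.action_space.n
q_table = np.random.rand(*([n_bins] * len(obs_low)), action_space_size)

# Hyperparameters
learning_rate = 0.1
discount_factor = 0.99
exploration_prob = 1.0
exploration_decay = 0.995
max_episodes = 1000

for episode in range(max_episodes):
    obs, _ = env.reset()
    state = discretize(obs)
    done = False
    total_reward = 0.0

    while not done:
        # Choose an action with an epsilon-greedy policy
        if np.random.rand() < exploration_prob:
            action = env.action_space.sample()  # Exploration
        else:
            action = int(np.argmax(q_table[state]))  # Exploitation

        next_obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(next_obs)
        done = terminated or truncated

        # Update the Q-value with Q-learning (no bootstrapping past a terminal state)
        target = reward if terminated else reward + discount_factor * np.max(q_table[next_state])
        q_table[state + (action,)] = (1 - learning_rate) * q_table[state + (action,)] + \
            learning_rate * target

        state = next_state
        total_reward += reward

    # Decay epsilon (exploration) once per episode
    exploration_prob *= exploration_decay
    print(f"Episode {episode + 1}, Total Reward: {total_reward}")

# End of training: you can test your agent here
```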
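To test the agent after training, one option is to run the purely greedy policy (always the argmax action, no exploration) for a few episodes and average the total reward. The sketch below assumes the Gymnasium-style API (`reset()` returns `(obs, info)`, `step()` returns five values) and takes the Q-table and a state-discretization function as arguments; the name `evaluate_greedy` is hypothetical.

```python
import numpy as np


def evaluate_greedy(env, q_table, discretize, episodes=10):
    """Run the greedy policy (argmax over Q) and return the mean episode reward.

    Assumes a Gymnasium-style env: reset() -> (obs, info) and
    step(a) -> (obs, reward, terminated, truncated, info).
    """
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        state = discretize(obs)
        done = False
        total = 0.0
        while not done:
            action = int(np.argmax(q_table[state]))  # pure exploitation, no epsilon
            obs, reward, terminated, truncated, _ = env.step(action)
            state = discretize(obs)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))
```

After training, something like `print(evaluate_greedy(env, q_table, discretize))` would report the average return; on CartPole-v1 an episode is truncated at 500 steps, so that is the highest possible value.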
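The update inside the loop is the standard tabular Q-learning rule, Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_a' Q(s', a')), with α = `learning_rate` and γ = `discount_factor`. A minimal sketch as a pure function (the name `q_update` is hypothetical) makes the arithmetic easy to check by hand; it also follows the common convention of not bootstrapping from a true terminal state.

```python
def q_update(q_sa, reward, max_q_next, learning_rate=0.1, discount_factor=0.99, terminated=False):
    """Return the updated Q(s, a) for one tabular Q-learning step."""
    # At a true terminal state there is no future return to bootstrap from
    target = reward if terminated else reward + discount_factor * max_q_next
    return (1 - learning_rate) * q_sa + learning_rate * target


# Worked example: Q(s,a) = 0.5, r = 1, max_a' Q(s',a') = 2.0, alpha = 0.1, gamma = 0.99
# target = 1 + 0.99 * 2.0 = 2.98;  new Q = 0.9 * 0.5 + 0.1 * 2.98 = 0.748
print(round(q_update(0.5, 1.0, 2.0), 3))  # 0.748
# Terminal step: target = 1.0;  new Q = 0.45 + 0.1 = 0.55
print(round(q_update(0.5, 1.0, 2.0, terminated=True), 3))  # 0.55
```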
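Because `exploration_prob` is multiplied by `exploration_decay = 0.995` once per episode, epsilon after n episodes is 0.995^n. The small helper below (hypothetical name `episodes_until`) shows how quickly exploration fades with these hyperparameters, which can help when tuning the decay rate against `max_episodes`.

```python
import math


def episodes_until(threshold, start=1.0, decay=0.995):
    """Count episodes until start * decay**n first drops below threshold."""
    eps, n = start, 0
    while eps >= threshold:
        eps *= decay
        n += 1
    return n


print(episodes_until(0.1))        # 460
print(round(0.995 ** 1000, 4))    # epsilon left after max_episodes = 1000, about 0.0067
# Closed form: n = ceil(ln(threshold / start) / ln(decay))
print(math.ceil(math.log(0.1) / math.log(0.995)))  # 460
```

So with these settings the agent is mostly exploiting after roughly 460 of the 1000 episodes.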