Assignment1 #6

@ratnasar

import gym  # install first with: pip install gym
import numpy as np

# Create the CartPole environment

env = gym.make('CartPole-v1')

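# Initialize the Q-table with random values.
# CartPole observations are four continuous values, so they cannot index a
# table directly. Each dimension is clipped and split into bins first.
# The bin count and clipping bounds below are assumed values; tune as needed.
n_bins = 6
lower_bounds = np.array([-2.4, -3.0, -0.21, -3.5])
upper_bounds = np.array([2.4, 3.0, 0.21, 3.5])

state_space_size = env.observation_space.shape[0]
action_space_size = env.action_space.n
q_table = np.random.rand(*([n_bins] * state_space_size), action_space_size)

def discretize(observation):
    # Map a continuous observation to a tuple of bin indices
    clipped = np.clip(observation, lower_bounds, upper_bounds)
    ratios = (clipped - lower_bounds) / (upper_bounds - lower_bounds)
    return tuple(np.minimum((ratios * n_bins).astype(int), n_bins - 1))

# Hyperparameters

learning_rate = 0.1
discount_factor = 0.99
exploration_prob = 1.0
exploration_decay = 0.995
max_episodes = 1000

for episode in range(max_episodes):
    # Classic Gym API (before 0.26): reset() returns the observation and
    # step() returns four values. Gym 0.26+ and Gymnasium return
    # (obs, info) and (obs, reward, terminated, truncated, info) instead.
    state = discretize(env.reset())
    done = False
    total_reward = 0

    while not done:
        # Choose an action with an epsilon-greedy policy
        if np.random.rand() < exploration_prob:
            action = env.action_space.sample()  # Explore
        else:
            action = int(np.argmax(q_table[state]))  # Exploit

        next_observation, reward, done, _ = env.step(action)
        next_state = discretize(next_observation)

        # Update the Q-value with the Q-learning rule
        q_table[state + (action,)] = (1 - learning_rate) * q_table[state + (action,)] + \
                                     learning_rate * (reward + discount_factor * np.max(q_table[next_state]))

        state = next_state
        total_reward += reward

    # Decay epsilon (exploration probability) after each episode
    exploration_prob *= exploration_decay

    print(f"Episode {episode + 1}, Total Reward: {total_reward}")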

# End of training; you can test your agent here
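
One way to test the trained agent is to run a few episodes with exploration switched off and average the total reward. The sketch below is only an illustration: `greedy_action` and `evaluate` are hypothetical helper names, not part of Gym. It assumes the classic Gym API used in the script (`reset()` returns the observation, `step()` returns four values), and that `discretize` is some function mapping an observation to a tuple of Q-table indices.

```python
import numpy as np

def greedy_action(q_table, state):
    """Return the highest-valued action for a discretized state (tuple of indices)."""
    return int(np.argmax(q_table[state]))

def evaluate(env, q_table, discretize, episodes=10):
    """Run greedy episodes (no exploration, no learning); return the mean total reward.

    Assumes the classic Gym API: reset() -> obs, step() -> (obs, reward, done, info).
    """
    totals = []
    for _ in range(episodes):
        state = discretize(env.reset())
        done = False
        total = 0.0
        while not done:
            observation, reward, done, _ = env.step(greedy_action(q_table, state))
            state = discretize(observation)
            total += reward
        totals.append(total)
    return float(np.mean(totals))
```

Calling `evaluate(env, q_table, discretize)` after the training loop gives a single number to compare between runs. With Gym 0.26+ or Gymnasium, the `reset()` and `step()` unpacking would need to be adapted.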
