This project demonstrates gold price prediction with a machine learning model (Random Forest Regressor). The model is trained on historical gold price data, with a focus on correlation analysis and on evaluating the fit with the R-squared score.
The dataset contains historical gold prices and related financial parameters, loaded from a CSV file. It covers approximately 10 years of data.
- pandas: Data manipulation and analysis
- numpy: Numerical computations
- matplotlib: Data visualization
- seaborn: Advanced data visualization
- scikit-learn: Machine learning models and metrics
- Data Loading: The dataset is loaded using `pandas` from a CSV file.
- Data Cleaning: Checked for missing values and converted columns to numeric types.
- Exploratory Data Analysis (EDA):
  - Used a correlation matrix to find relationships between variables.
  - Visualized the distribution of gold prices.
- Model Training:
  - Features (`X`) were extracted by removing the `Date` and `GLD` columns.
  - The target (`Y`) is the `GLD` column (gold price).
  - The data is split into training and testing sets using an 80/20 split.
- Prediction:
  - The Random Forest Regressor model is trained on the training set.
  - The model predicts values for the test set, and its performance is evaluated using the R-squared score.
- Visualization:
  - Comparison of actual vs predicted gold prices using a line plot.
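
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the project's exact code: the CSV file name, the `SLV` and `USO` feature columns, the `clean` and `train_and_score` helpers, and the random seed are all assumptions made for the example. Only the `Date` and `GLD` columns and the 80/20 split come from the description above. Synthetic data stands in for the real CSV so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Convert every column except Date to numeric and drop rows with missing values."""
    out = df.copy()
    numeric_cols = out.columns.drop("Date")
    out[numeric_cols] = out[numeric_cols].apply(pd.to_numeric, errors="coerce")
    return out.dropna()


def train_and_score(df: pd.DataFrame, seed: int = 2):
    """Split 80/20, fit a Random Forest, and return (model, R-squared on the test set)."""
    X = df.drop(columns=["Date", "GLD"])  # features
    Y = df["GLD"]                         # target: gold price
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, Y_train)
    return model, r2_score(Y_test, model.predict(X_test))


# Synthetic stand-in for the dataset. With the real file this would be e.g.
# df = pd.read_csv("gld_price_data.csv")  # hypothetical file name
rng = np.random.default_rng(0)
n = 500
slv = rng.uniform(10, 40, n)   # hypothetical feature column
uso = rng.uniform(10, 100, n)  # hypothetical feature column
df = pd.DataFrame({
    "Date": pd.date_range("2010-01-01", periods=n, freq="D").astype(str),
    "SLV": slv.astype(str),    # stored as strings to exercise the numeric conversion
    "USO": uso,
    "GLD": 3.0 * slv + 0.1 * uso + rng.normal(0, 1, n),
})

df = clean(df)
model, score = train_and_score(df)
print(f"R-squared on the test set: {score:.3f}")
```

Fixing `random_state` makes the split and the forest reproducible between runs; the value itself is arbitrary.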
- The model's performance is measured using the R-squared score (coefficient of determination), where a score closer to 1 indicates a better fit.
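
To make the metric concrete, the sketch below computes R-squared by hand as one minus the ratio of the residual sum of squares to the total sum of squares, and compares it with scikit-learn's `r2_score`. The `r_squared` helper and the sample values are made up for illustration and are not taken from the project's data.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


# Made-up values, purely illustrative
actual = [120.0, 125.0, 130.0, 128.0]
predicted = [121.0, 124.0, 129.0, 129.0]

manual = r_squared(actual, predicted)
library = r2_score(actual, predicted)
print(f"manual={manual:.4f}, sklearn={library:.4f}")
```

A perfect prediction gives a score of 1, while a model that always predicts the mean of the actual values gives 0.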
- Correlation Heatmap: Shows the relationships between different financial factors and gold prices.
- Actual vs Predicted Plot: Compares the model's predictions with actual values.
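
The two plots could be produced along these lines. This is a sketch under assumptions: the helper names, colors, figure sizes, and the synthetic `SLV`/`USO` columns are illustrative choices, not the project's actual plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(df: pd.DataFrame):
    """Draw a heatmap of pairwise correlations between the numeric columns."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="Blues", square=True, ax=ax)
    ax.set_title("Correlation heatmap")
    return fig


def plot_actual_vs_predicted(actual, predicted):
    """Draw actual and predicted values as two lines over the test samples."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(list(actual), label="Actual", color="blue")
    ax.plot(list(predicted), label="Predicted", color="green")
    ax.set_xlabel("Test sample")
    ax.set_ylabel("GLD price")
    ax.set_title("Actual vs predicted gold price")
    ax.legend()
    return fig


# Synthetic demo data (hypothetical columns), standing in for the real dataset
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "SLV": rng.uniform(10, 40, 50),
    "USO": rng.uniform(10, 100, 50),
})
demo["GLD"] = 3.0 * demo["SLV"] + rng.normal(0, 1, 50)

heatmap_fig = plot_correlation_heatmap(demo)
line_fig = plot_actual_vs_predicted(demo["GLD"], demo["GLD"] + rng.normal(0, 1, 50))
```

Each helper returns a matplotlib `Figure`, so it can be written to disk with `fig.savefig(...)`. In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead.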
- Python 3.9.6 or above
- Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`