A collection of course and hobby experiments exploring a variety of machine-learning techniques, from classic NLP to modern deep learning and reinforcement learning. Each project lives in its own directory and can be run independently.
The repository root contains individual project folders plus a standalone Java agent:
| Path | Description |
|---|---|
| `Author Classification - naivebayLearning - NLP/` | Naive Bayes text classifier that distinguishes between disputed Federalist Papers. |
| `CreditCardFraudDetection-NN/` | Scratch-built feed-forward neural network for transaction fraud detection. |
| `Joke Recommentation Unsupervised Learning/` | Matrix factorisation recommender trained on the Jester joke ratings dataset. |
| `keras-yolo2-master-Raccoon Classification/` | YOLOv2-based object detector fine-tuned for raccoon imagery. |
| `ExMarioAgent.java` | Reinforcement-learning Mario agent submitted to the 2009 RL-Competition. |
Each project section below explains its layout, dependencies, configuration, and how to launch the code.
- Overview
- Directory Layout
- Dependencies
- Configuration
- Running the Classifier
- Key Implementation Notes
A Python 3 implementation that tokenises Federalist Papers essays and trains a unigram Naive Bayes classifier to attribute disputed essays to Hamilton, Madison, or Jay. The pipeline covers tokenisation, stop-word removal, stemming, dictionary smoothing, and log-probability scoring.
| Path | Purpose |
|---|---|
| `main.py` | Entry point that orchestrates training and evaluation on the chosen author folders. |
| `tokenizer.py` | Handles tokenisation, stop-word removal, and stemming logic. |
| `dictionary.py` | Builds smoothed word-frequency dictionaries and computes log-probabilities. |
| `fileGet.py` | Splits documents into training/unknown sets and loads corpora. |
| `Hamilton/`, `Madison/`, `Jay/`, `Disputed/` | Default corpora of essays grouped by author. Binary subsets (`bin*`) are also provided for experiments. |
Install Python 3 plus the Natural Language Toolkit. The project requires the Punkt tokenizer and stop-word corpus:

    pip install nltk pandas
    python - <<'PY'
    import nltk
    nltk.download('punkt')
    nltk.download('stopwords')
    PY

The dataset folders are configured at the top of `main.py`. Adjust the variables if you place corpora in different directories, or switch to the binary subsets by uncommenting the provided examples inside the script.
Run the default experiment with:

    python3 main.py

By default the script analyses the `Hamilton`, `Madison`, `Jay`, and `Disputed` folders. Update the folder names (line 14 of `main.py`) to evaluate alternative corpora.
- Uses log-probabilities to avoid floating-point underflow, with add-one smoothing so that unseen words never receive zero probability.
- Unknown-token handling reserves a random split of each author’s texts to measure distinctive vocabulary usage.
- Outputs include per-author probabilities that incorporate prior publishing frequency.
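
The scoring described in these notes can be sketched as follows. This is a minimal illustration rather than the code in `dictionary.py`: the function names `train_counts` and `log_score`, the toy token lists, and the document-count priors are all assumptions made for the example.

```python
import math
from collections import Counter


def train_counts(docs):
    """Count unigram occurrences across one author's tokenised documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def log_score(tokens, counts, vocab_size, log_prior):
    """Log-probability of `tokens` under one author, with add-one smoothing."""
    total = sum(counts.values())
    score = log_prior
    for tok in tokens:
        # Add-one smoothing: unseen words get a small non-zero probability.
        score += math.log((counts[tok] + 1) / (total + vocab_size))
    return score


# Toy corpora (hypothetical tokens, not taken from the essays).
hamilton = [["upon", "the", "union"], ["upon", "energy"]]
madison = [["whilst", "the", "union"], ["whilst", "faction"]]

counts = {"Hamilton": train_counts(hamilton), "Madison": train_counts(madison)}
vocab = set().union(*(c.keys() for c in counts.values()))

# Assumed prior: proportional to how many documents each author contributed.
n_docs = {"Hamilton": len(hamilton), "Madison": len(madison)}
total_docs = sum(n_docs.values())

disputed = ["whilst", "the", "faction"]
scores = {
    author: log_score(disputed, counts[author], len(vocab),
                      math.log(n_docs[author] / total_docs))
    for author in counts
}
best = max(scores, key=scores.get)
print(best, scores)
```

Summing logarithms instead of multiplying raw probabilities keeps the score in a safe numeric range even for long essays, and the `+ 1` in the numerator is what stops a single unseen word from driving an author's score to negative infinity.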
Implements a neural network from first principles (no deep-learning frameworks) to classify credit-card transactions as fraudulent or legitimate. The network architecture, activation functions, and training loop are written by hand in Python, using NumPy for numerical operations.
| Path | Purpose |
|---|---|
| `neural_net.py` | Command-line interface for training and evaluating the model. |
| `neuron.py` | Defines the neuron and network classes powering forward/backward passes. |
| `trainCredit.csv`, `testCredit.csv` | Default Kaggle-style credit card datasets. Binary subsets (`trainBin.csv`, `testBin.csv`) are available for quick experimentation. |
Requires Python 3 with NumPy for numerical operations.
The script expects training and evaluation files, neuron count, and a decision threshold:

    python3 neural_net.py trainCredit.csv testCredit.csv 28 0.5

`28` specifies the neuron count per layer and must match the feature count in the CSVs. `0.5` sets the classification threshold; adjust as needed for new datasets.
- The script skips the first row and column to match the supplied dataset formatting.
- Tune the `iterate` variable inside `neural_net.py` to change the number of passes over each training row.
- When experimenting with other datasets, start by increasing the threshold toward `0.9` to compensate for different class imbalances.
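
To make the first and last notes concrete, here is a small sketch of skipping the first row and column of a CSV and of how the decision threshold changes the predicted labels. It uses a single sigmoid neuron rather than the full network in `neuron.py`; the helper names `load_rows` and `predict`, the in-memory sample file, the assumption that the label sits in the last column, and the hand-picked weights are all hypothetical.

```python
import csv
import io
import math


def load_rows(handle):
    """Read numeric rows, dropping the header row and the first (index) column."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [[float(v) for v in row[1:]] for row in reader]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(features, weights, bias, threshold):
    """Single sigmoid neuron: return (activation, 0/1 label)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    a = sigmoid(z)
    return a, int(a >= threshold)


# Hypothetical two-feature file; the last column is assumed to be the label.
sample = io.StringIO("id,f1,f2,label\n0,0.2,0.1,0\n1,2.0,1.5,1\n2,0.9,0.4,0\n")
rows = load_rows(sample)

weights, bias = [1.0, 1.0], -1.0  # hand-picked for illustration, not trained
for threshold in (0.5, 0.9):
    labels = [predict(r[:-1], weights, bias, threshold)[1] for r in rows]
    print(threshold, labels)
```

With these toy numbers the third row is flagged at a threshold of 0.5 but not at 0.9, which is the kind of false positive a higher threshold is meant to suppress on heavily imbalanced data.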
A matrix-factorisation recommendation engine that predicts joke ratings using collaborative filtering on the Jester dataset. The implementation evaluates reconstruction accuracy across multiple optimisation iterations.
| Path | Purpose |
|---|---|
| `project3.py` | Entry point implementing stochastic gradient descent for latent-factor learning. |
| `testJ.csv` | Lightweight sample dataset for quick experiments. |
| `jester_dataset_3.zip` | Full Jester dataset archive; unzip to access the complete ratings matrix. |
Install Python 3 along with the required scientific stack:

    pip install pandas numpy matplotlib

Provide the dataset path and optional hyperparameters. Missing arguments fall back to sensible defaults.

    python3 project3.py testJ.csv 0.005 0.001 10 500

Arguments correspond to `[dataset] [learning_rate] [regularisation_lambda] [latent_features] [iterations]`.
- Keep the learning rate small (≈0.005) to avoid divergence.
- Use `testJ.csv` for rapid iteration, then switch to the unzipped Jester dataset for full-scale evaluation.
- Visualisation hooks rely on Matplotlib; ensure you have a display backend if plotting on a remote server.
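
For readers unfamiliar with the technique, the following is a minimal sketch of matrix factorisation trained by stochastic gradient descent, using the same hyperparameter values as the example command. It is written in plain Python for brevity and is not the code in `project3.py`; the function names, the random initialisation range, and the toy ratings are assumptions.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def factorise(ratings, n_users, n_items, lr=0.005, lam=0.001, k=10,
              iters=500, seed=0):
    """SGD over observed (user, item, rating) triples with L2 regularisation."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(iters):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Step along the negative gradient of the regularised squared error.
                P[u][f] += lr * (err * qi - lam * pu)
                Q[i][f] += lr * (err * pu - lam * qi)
    return P, Q


# Toy ratings (hypothetical): 3 users x 3 jokes with two entries left unobserved.
ratings = [(0, 0, 6.0), (0, 1, -4.0), (1, 0, -3.0), (1, 2, -1.0),
           (2, 0, 4.5), (2, 1, -3.0), (2, 2, 1.5)]

P0, Q0 = factorise(ratings, 3, 3, iters=0)  # untrained factors for comparison
P, Q = factorise(ratings, 3, 3)
before, after = rmse(ratings, P0, Q0), rmse(ratings, P, Q)
print(f"RMSE before: {before:.3f}, after: {after:.3f}")
```

Only observed ratings contribute to the updates, so missing entries can later be filled in with `predict`. A larger learning rate makes each step bigger, which is why overly large values can overshoot and diverge.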
Forks the keras-yolo2 project to train an object detector that finds raccoons in images. The workflow covers anchor generation, training on annotated images, and running inference with the trained weights.
| Path | Purpose |
|---|---|
| `config.json` | Central configuration for dataset paths, training hyperparameters, and model architecture. |
| `train.py` / `predict.py` | CLI scripts for training and inference, respectively. |
| `gen_anchors.py` | Utility for recomputing anchor boxes from the training annotations. |
| `images/` | Sample images for inference sanity checks. |
| `experimental/`, `examples/`, `backend.py`, `frontend.py`, `utils.py` | Core YOLOv2 implementation components. |
Install Python 3 with the libraries listed in requirements.txt (Keras, OpenCV, imgaug, tqdm, h5py, etc.). Use a virtual environment for isolation:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Edit `config.json` before training:
- Set `model: "Tiny Yolo"` to match the supplied lightweight backbone.
- Update `train_image_folder` and `train_annot_folder` to point at the extracted raccoon dataset (images + Pascal VOC XML annotations).
- Leave `pretrained_weights` blank unless resuming from existing weights.
- Choose a descriptive `saved_weights_name` for the output model file.
Generate anchors (optional) and train the network:

    python gen_anchors.py -c config.json
    python train.py -c config.json

Use the provided hyperparameters (e.g., 2 training iterations, 10 epochs, 3 warm-up epochs) as a starting point.
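
Anchor boxes for YOLOv2 are commonly chosen by clustering the width/height pairs of the annotated boxes with k-means, using IoU rather than Euclidean distance to decide which cluster a box belongs to. The sketch below illustrates that idea in plain Python. It is an assumption about the general approach, not a copy of `gen_anchors.py`; `iou_wh`, `kmeans_anchors`, and the sample boxes are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs, assigning each box to the centroid with highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda c: iou_wh(b, centroids[c]))
            clusters[best].append(b)
        new = []
        for c, members in enumerate(clusters):
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[c])  # keep the centroid of an empty cluster
        if new == centroids:
            break  # assignments have stabilised
        centroids = new
    return sorted(centroids)


# Hypothetical box sizes in grid-cell units: three small and three large boxes.
boxes = [(1.0, 1.2), (1.1, 1.0), (0.9, 1.1), (4.0, 5.0), (4.2, 4.8), (3.9, 5.1)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Using IoU as the similarity measure means large and small boxes are treated on an equal footing, whereas Euclidean distance would let the large boxes dominate the clustering.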
After training, run detection on an image or video:

    python predict.py -c config.json -w path/to/best_weights.h5 -i path/to/input

Detections are written alongside the input media with a `-Detected` suffix.
- Ensure the raccoon dataset is resized consistently; YOLO performs best when training images share similar dimensions.
- The project is optimised for live video streams, so highly variable still-image sizes may reduce accuracy.
ExMarioAgent.java customises the Rutgers RL-Competition 2009 Mario agent. It augments the baseline heuristic controller with Monte Carlo state-value updates, episodic memory, and JSON-driven policy initialisation to solve Mario levels via reinforcement learning.
- Java 8 or newer.
- RL-Glue codec (`org.rlcommunity.rlglue`) to interface with the Mario environment.
- JSON processing library available on the classpath to parse `ref_export.json`.
Compile the agent alongside the RL-Competition environment and launch via the RL-Glue agent loader:

    javac ExMarioAgent.java
    java org.rlcommunity.rlglue.codec.util.AgentLoader edu.rutgers.rl3.comp.ExMarioAgent

Ensure `ref_export.json` and the RL environment resources are accessible on the classpath or working directory.
- Tracks 13 binary state features covering monsters, pits, power-ups, and terrain topology.
- Maintains policy, reward, and iteration tables sized at `2^13 × 12` for Monte Carlo updates.
- Stores per-episode state/action/reward sequences to refine the policy after each run.
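
The table-based update described in these notes can be sketched as below. The agent itself is written in Java; this Python version only illustrates the idea and does not mirror `ExMarioAgent.java`. The names `state_index`, `update_from_episode`, and `greedy_action`, the every-visit averaging rule, and the discount factor `gamma` are assumptions.

```python
N_FEATURES = 13
N_ACTIONS = 12
N_STATES = 2 ** N_FEATURES  # 8192 distinct feature combinations


def state_index(features):
    """Pack 13 binary features into a single integer in [0, 8191]."""
    idx = 0
    for bit in features:
        idx = (idx << 1) | int(bool(bit))
    return idx


# Running-mean return and visit count for every (state, action) pair.
value = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
visits = [[0] * N_ACTIONS for _ in range(N_STATES)]


def update_from_episode(episode, gamma=1.0):
    """Every-visit Monte Carlo update from a list of (state, action, reward)."""
    g = 0.0
    for state, action, reward in reversed(episode):
        g = reward + gamma * g  # return observed from this step onward
        visits[state][action] += 1
        n = visits[state][action]
        value[state][action] += (g - value[state][action]) / n


def greedy_action(state):
    """Pick the action with the highest estimated return for this state."""
    row = value[state]
    return max(range(N_ACTIONS), key=row.__getitem__)


# Two hypothetical episodes that visit the same state/action pairs.
s0 = state_index([1] + [0] * 12)
s1 = state_index([0] * 12 + [1])
update_from_episode([(s0, 3, 0.0), (s1, 7, 1.0)])
update_from_episode([(s0, 3, 0.0), (s1, 7, 0.0)])
print(value[s0][3], visits[s0][3], greedy_action(s0))
```

Walking the episode backwards lets each step's return be computed in a single pass, and the incremental mean avoids storing every past return for a state/action pair.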
Have fun exploring the projects! Contributions and experiments are welcome; feel free to open issues or pull requests with improvements.