Storage Solutions and Data Analytics: RDBMS, Hadoop, and APIs in Neural Networks Contexts
The chosen topic is storage solutions and Neural Networks (NN), with a NN being considered a type of Machine Learning (ML) process known as deep learning (Mishra and Gupta, 2017). The field of Big Data is constantly growing and brings with it a need for efficient data management and processing tools. Two well-known tools for handling and analyzing large datasets are Relational Database Management Systems (RDBMS) and Hadoop. However, given the rapid advancement of Machine Learning and Neural Networks, the focus of this paper is the integration of these data management tools with advanced analytics technologies.
- Examine the current state of RDBMS, Hadoop, and APIs when used to model a NN and a CNN.
- Store a 1.6GB dataset in both an RDBMS (MySQL) and Hadoop, and then retrieve the data into a Jupyter Notebook to model a neural network.
- Utilize an API (Keras) to model a convolutional neural network and compare its performance with that of the RDBMS and Hadoop approaches.
- Discuss the rationale behind the selection of the NN and CNN models for each scenario.
- HDFS is more efficient than MySQL in terms of both storage usage and data processing.
- A deep learning API like Keras can be an excellent starting point for non-technical users who wish to model a NN. The significant technical knowledge required to use MySQL or Hadoop highlights Keras as a viable alternative.
- NNs must be validated; training scores alone are insufficient to determine whether a model is well fitted. Validation accuracy and loss should be computed and compared against the training scores.
- A NN will generalize better to unseen data if more data is fed into the model. An effective approach to mitigating overfitting is to augment the dataset.
- Dropout should be implemented and tested when a CNN is overfitted; in this case, it proved helpful in reducing overfitting (a minimal sketch of validation monitoring and Dropout follows this list).
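As an illustration of the validation and Dropout points above, the following is a minimal Keras sketch (assuming TensorFlow/Keras is installed); the synthetic data, layer sizes, and dropout rate are illustrative assumptions and are not taken from the project notebooks:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Purely synthetic data for illustration; the actual notebooks use the 1.6GB CSV.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("int32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),              # Dropout helps reduce overfitting
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hold back 20% of the data so validation accuracy/loss can be compared
# against the training scores after every epoch.
history = model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
print("final training accuracy:  ", history.history["accuracy"][-1])
print("final validation accuracy:", history.history["val_accuracy"][-1])
```

A large gap between the training and validation curves is the overfitting signal that Dropout and data augmentation are meant to close.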
- Extract `people.csv` from the zipped folder.
- Increase the dataset size using the Jupyter Notebook `1.Increasing_dataset_size.ipynb` (a pandas sketch of this step follows the list).
- Load the dataset into HDFS and MySQL:
  - HDFS: create a directory and put the file into it:
    - `hadoop fs -mkdir /CA1_S2`
    - `hadoop fs -put ./people_increased.csv /CA1_S2`
  - MySQL: create the table, for which `Population_data_table_creation.sql` can be used, then load the data with `2.Importing_1.6GB_CSV_to_MySQL.ipynb` (a chunked-import sketch follows the list).
- Run `3.HDFS_Data_to_model_NN.ipynb`; the data stored in Hadoop will be used to model a Neural Network (an HDFS retrieval sketch follows the list).
- Run `4.MySQL_Data_to_model_NN.ipynb`; the data stored in MySQL will be used to model a Neural Network (a MySQL retrieval sketch follows the list).
- Run `6.API_Data_to_model_CNN.ipynb`; the Keras IMDB dataset is used to model a Convolutional Neural Network (a Conv1D sketch follows the list).
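The logic of `1.Increasing_dataset_size.ipynb` is not reproduced in this README; a minimal pandas sketch of one way to grow `people.csv` by duplicating rows is shown below. The file names and the repetition factor are assumptions:

```python
import pandas as pd

# Read the original extracted file (path is an assumption).
df = pd.read_csv("people.csv")

# Concatenate the dataframe with itself until the target size is reached;
# the factor of 20 is illustrative, not taken from the notebook.
df_big = pd.concat([df] * 20, ignore_index=True)

df_big.to_csv("people_increased.csv", index=False)
print(f"{len(df_big):,} rows written")
```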
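The loading performed by `2.Importing_1.6GB_CSV_to_MySQL.ipynb` is not reproduced here; a chunked import via pandas and SQLAlchemy is one way to do it, sketched below. The connection string, database/table names (`ca1_s2`, `population_data`), and chunk size are assumptions:

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection details are placeholders; adjust user, password, and schema.
engine = create_engine("mysql+pymysql://user:password@localhost:3306/ca1_s2")

# Stream the 1.6GB CSV in chunks so it never has to fit in memory at once.
for chunk in pd.read_csv("people_increased.csv", chunksize=100_000):
    chunk.to_sql("population_data", engine, if_exists="append", index=False)

print("Import finished")
```

Streaming in chunks keeps memory usage bounded, which matters for a 1.6GB file.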
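How `3.HDFS_Data_to_model_NN.ipynb` connects to Hadoop is not specified in this README; one common option is the WebHDFS client from the `hdfs` Python package, sketched below. The host, port (9870 is the Hadoop 3 default for WebHDFS), and user are assumptions about the local setup:

```python
import pandas as pd
from hdfs import InsecureClient

# WebHDFS endpoint and user are placeholders for the local Hadoop installation.
client = InsecureClient("http://localhost:9870", user="hadoop")

# Stream the CSV straight out of HDFS into a dataframe.
with client.read("/CA1_S2/people_increased.csv") as reader:
    df = pd.read_csv(reader)

print(df.shape)
# df would then be split into features/labels and fed to a Keras model,
# along the lines of the Dropout/validation sketch shown earlier.
```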
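Similarly, `4.MySQL_Data_to_model_NN.ipynb` presumably reads the table back into a dataframe before training; a minimal sketch with the same assumed connection details:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost:3306/ca1_s2")

# Pull the stored rows back into memory; for very large tables,
# pd.read_sql(..., chunksize=...) can be used to iterate instead.
df = pd.read_sql("SELECT * FROM population_data", engine)
print(df.shape)
```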
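Finally, `6.API_Data_to_model_CNN.ipynb` uses the Keras IMDB dataset. The sketch below shows a compact Conv1D model on that dataset (assuming TensorFlow/Keras 2.9+); the vocabulary size, sequence length, and layer sizes are illustrative defaults, not the notebook's actual hyperparameters:

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, max_len = 10_000, 200

# The IMDB reviews ship with Keras as integer-encoded word sequences.
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = keras.utils.pad_sequences(x_train, maxlen=max_len)
x_test = keras.utils.pad_sequences(x_test, maxlen=max_len)

model = keras.Sequential([
    keras.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, 64),
    layers.Conv1D(64, 5, activation="relu"),   # 1D convolution over the word embeddings
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.3),                       # Dropout, as discussed in the findings
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_data=(x_test, y_test))
```

Because Keras bundles the dataset and the layers behind one API, this pipeline needs no external storage setup, which is the comparison point made in the findings above.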
© sba23021 CCT College