I am an engineer ⚙️ and data scientist 🧑🏻‍🔬. I decided to pursue a master's degree in Applied Data Science in the Netherlands because I am fascinated by the diverse ways data science can inform decision-making across industries and improve people's lives. I am seeking to combine my technical and analytical skills to excel in the wonderful field of Data Science.
- 🌍 I'm based in the Netherlands
- ✉️ You can contact me at lealcastillo1996@gmail.com
- 🤝 I'm open to collaborating on interesting data-related projects
Description: The chosen domain for the real-world QA task is cloud computing, with a focus on Kubernetes. The QA system uses the public Kubernetes documentation and real-time Google searches as its knowledge sources. Performance is evaluated with a machine-trained evaluation score (MTES) called the estimated human label (EHL), which is computed by an ML classification model trained on n-gram-based metrics. A carefully balanced dataset, labeled by human experts, covers various question categories. By combining human expertise and MTES, the research aims to enhance OS-powered QA systems and provide insights into the factors that affect their performance.
Project: https://github.com/lealcastillo1996/Thesis_LLMs
Research paper: https://studenttheses.uu.nl/bitstream/handle/20.500.12932/44283/final_thesis_JELC.pdf?sequence=1
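As a rough illustration of the kind of n-gram-based metric mentioned in the description above, the sketch below computes a clipped n-gram precision between a candidate answer and a reference answer. It is not taken from the thesis code: the function names `ngrams` and `ngram_precision` and the example sentences are hypothetical, and the thesis may use different metrics.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: share of candidate n-grams also found in the reference.

    Tokenization is a plain lowercase whitespace split (an assumption made
    to keep the sketch short).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical answers, for illustration only.
answer = "A pod runs containers"
reference = "A pod runs one or more containers"
unigram_score = ngram_precision(answer, reference, n=1)
bigram_score = ngram_precision(answer, reference, n=2)
```

Scores like these, computed for several values of `n`, could serve as input features for a classifier that predicts a human label, which is the general idea behind an estimated human label.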
Description: This study aims to identify the key determinants of property sale prices in Mexico City and to understand how they vary across geographic locations. It addresses the following research questions: what are the key determinants of house prices in Mexico City according to Spatial Random Forest (SRF), Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR)? Specifically, which are the main determinants for each method, and how do the results compare with each other?
Project: https://github.com/EwoutvanderVelde/SpatialCourse
Research Paper: https://github.com/EwoutvanderVelde/SpatialCourse/blob/main/Final_Report%20(2).pdf
Description: A new streaming platform recommendation system was developed from scratch, combining collaborative filtering and content-based filtering to deliver tailor-made and varied suggestions. The system also includes an interactive interface that lets users adjust the diversity of their recommendations, so that the experience matches their preferences.
Project: https://github.com/iabrilvzqz/personalisation-for-public-media
Research Paper: https://github.com/iabrilvzqz/personalisation-for-public-media/blob/master/report%20INFOPPM.pdf
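To make the idea of a user-adjustable diversity setting more concrete, here is a minimal sketch that blends collaborative and content-based scores and then re-ranks items greedily, penalizing items that are similar to ones already picked. It is an assumption-laden illustration and not the project's actual implementation: `hybrid_scores`, `rerank`, the `alpha` and `diversity` parameters, and the toy catalogue are all hypothetical.

```python
def hybrid_scores(cf_scores, cb_scores, alpha=0.5):
    """Blend collaborative (cf) and content-based (cb) scores with weight alpha."""
    return {item: alpha * cf_scores[item] + (1 - alpha) * cb_scores[item]
            for item in cf_scores}


def rerank(scores, similarity, diversity=0.5, k=3):
    """Greedily pick k items, trading relevance against similarity to earlier picks.

    diversity=0 ranks purely by score; higher values penalize items that
    resemble something already selected.
    """
    selected = []
    candidates = set(scores)
    while candidates and len(selected) < k:
        def value(item):
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return (1 - diversity) * scores[item] - diversity * max_sim
        # Sort first so ties are broken deterministically by item name.
        best = max(sorted(candidates), key=value)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy catalogue (hypothetical): two news items and one drama.
genres = {"news_a": "news", "news_b": "news", "drama_a": "drama"}
cf = {"news_a": 0.9, "news_b": 0.8, "drama_a": 0.3}
cb = {"news_a": 0.9, "news_b": 0.8, "drama_a": 0.5}


def same_genre(a, b):
    """Similarity of 1.0 for items in the same genre, 0.0 otherwise."""
    return 1.0 if genres[a] == genres[b] else 0.0


scores = hybrid_scores(cf, cb, alpha=0.5)
relevance_only = rerank(scores, same_genre, diversity=0.0, k=2)
more_diverse = rerank(scores, same_genre, diversity=0.5, k=2)
```

With `diversity=0.0` the two highest-scoring news items are returned, while `diversity=0.5` swaps the second news item for the drama, which is the kind of effect a diversity control in the interface would expose.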
Description: A natural language processing (NLP) study was carried out on a dataset of more than 300,000 tweets, using LDA (Latent Dirichlet Allocation) and open-source Hugging Face Transformer models.
Project: https://discord.com/channels/1127677457030455437/1127677458578153494/1135889833940750419