To discover more tweets like this and other content, follow me on Twitter: @bhavsarkaustubh
NOTE: This repository will be continuously updated with relevant tweets as I write them.
Look beyond summary statistics for EDA -> Are you relying solely on summary statistics for your exploratory data analysis in Machine Learning projects? Here's why you shouldn't!
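A minimal sketch of the idea: two datasets can share nearly identical means and standard deviations while having completely different shapes, which only a plot or histogram reveals (the data here is synthetic and purely illustrative).

```python
import numpy as np

# Two synthetic datasets with (nearly) identical summary statistics
# but very different shapes - summaries alone would not tell them apart.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=10_000)                  # peaked, bell-shaped
b = rng.uniform(low=-np.sqrt(3), high=np.sqrt(3), size=10_000)   # flat; std is also ~1

print(f"means: {a.mean():.2f} vs {b.mean():.2f}")  # both close to 0.00
print(f"stds:  {a.std():.2f} vs {b.std():.2f}")    # both close to 1.00
# A histogram of each immediately shows one is peaked, the other flat.
```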
The 'No Free Lunch' theorem -> If you think there's a perfect machine learning algorithm out there that can solve any problem, think again.
Use of Weights and Biases -> Neural networks, the backbone of modern AI, rely on tunable parameters called weights and biases. But why do we need both?
Linear Regression for classification -> Can linear regression be used for classification problems?
Handling limited training data and validation datasets -> At times, we face the challenge of having limited training data and struggle to create a separate validation dataset. In such situations, we have two options to consider. Find out what they are here.
Cosine Similarity -> When comparing vectors using cosine similarity, magnitude takes a backseat! The magic lies in the direction they point towards.
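A quick sketch of the point: scaling a vector changes its magnitude but not its direction, so its cosine similarity with the original stays at 1.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 2.0, 3.0])
v = 10 * u  # same direction, 10x the magnitude

print(cosine_similarity(u, v))  # 1.0 - magnitude takes a backseat
```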
Benefits of Chaining Prompts -> Ever wondered why a complex prompt is often split into smaller prompts, even though newer language models are claimed to handle complex prompts just fine?
Stemming and Lemmatization -> Have you ever heard of stemming and lemmatization? They are both language processing techniques used in NLP, but do you know how they differ?
What is Reinforcement Learning? -> Want to understand Reinforcement Learning? It can be a tricky concept, but a unique 'farmer analogy' can help simplify it.
Operational Databases, Data Lakes, and Data Warehouses -> Confused about the difference between operational databases, data lakes, and data warehouses? Here's a quick guide with examples to help you understand! A lemonade business analogy.
Unit Test Vs Smoke Test -> Do you know what unit testing and smoke testing are? These are two important ways programmers make sure their code works properly.
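A minimal sketch of the contrast using Python's built-in `unittest` (the `add` function is just a stand-in): a unit test checks one piece of logic in detail, while a smoke test is a quick sanity check that the basics run at all.

```python
import unittest

def add(a, b):
    return a + b

class AddUnitTest(unittest.TestCase):
    """Unit test: verifies one small piece of logic in isolation."""
    def test_add_handles_negatives(self):
        self.assertEqual(add(-2, 5), 3)

class SmokeTest(unittest.TestCase):
    """Smoke test: a shallow check that the code imports and runs,
    typically executed before any deeper test suite."""
    def test_basics_run(self):
        self.assertIsNotNone(add(1, 1))

if __name__ == "__main__":
    unittest.main()
```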
Optimizing the Memory Usage of Dataframes -> Are you working with large datasets in Pandas? Optimizing the memory usage of your dataframes can help prevent memory errors. Here's a tip: try downcasting numerical columns to smaller data types!
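A minimal sketch of the downcasting tip, using `pd.to_numeric` with its `downcast` parameter (the column here is synthetic):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"counts": np.arange(1_000, dtype="int64")})
before = df["counts"].memory_usage(deep=True)

# Downcast int64 to the smallest integer dtype that fits the values.
df["counts"] = pd.to_numeric(df["counts"], downcast="integer")
after = df["counts"].memory_usage(deep=True)

print(df["counts"].dtype)       # int16, since 0..999 fits in 16 bits
print(before, "->", after)      # noticeably fewer bytes
```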
Argparse -> Make your Python scripts more user-friendly with 'argparse' - the module that makes it easy to parse and handle command-line arguments! Bonus: Know how 'argparse' can help to streamline workflow as a Machine Learning engineer!
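A minimal `argparse` sketch for an ML-style training script; the flag names (`--epochs`, `--lr`) are illustrative, not from the tweet.

```python
import argparse

parser = argparse.ArgumentParser(description="Train a model")
parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
parser.add_argument("--verbose", action="store_true", help="print progress")

# Passing a list here simulates `python train.py --epochs 5 --lr 0.01`;
# in a real script you would call parser.parse_args() with no arguments.
args = parser.parse_args(["--epochs", "5", "--lr", "0.01"])
print(args.epochs, args.lr, args.verbose)  # 5 0.01 False
```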
Shallow copy Vs Deep copy -> Are you confused about the difference between shallow copy and deep copy in Python? Let's clear it up!
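A minimal sketch of the difference: a shallow copy shares nested objects with the original, while a deep copy clones them recursively.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)    # new outer list, but shared inner lists
deep = copy.deepcopy(original)   # fully independent clone

original[0].append(99)

print(shallow[0])  # [1, 2, 99] - the shallow copy sees the mutation
print(deep[0])     # [1, 2]     - the deep copy is unaffected
```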
Overview of Docstrings -> Writing clear and maintainable code is an important goal for any developer. One way to achieve this in Python is by using docstrings to document your functions!
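A small illustrative example of a documented function (the `normalize` helper and its docstring style are assumptions, not from the tweet); docstrings stay attached to the function and are accessible at runtime.

```python
def normalize(values):
    """Scale a list of numbers to the range [0, 1].

    Args:
        values: A list of numbers containing at least two distinct values.

    Returns:
        A list of floats between 0 and 1.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Docstrings are available at runtime via help() or __doc__:
print(normalize.__doc__)
print(normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```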
DataFrame.apply(), Zip, and Vectorization -> Need a faster alternative to df.apply() in Pandas? Try using vectorization or the zip method for faster and more efficient calculations.
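A sketch of the three approaches on a toy two-column sum (the data is synthetic): row-wise `apply` calls a Python function per row, `zip` iterates the columns as plain Python values, and vectorized arithmetic runs in optimized C under the hood, which is typically fastest.

```python
import pandas as pd

df = pd.DataFrame({"a": range(1, 10_001), "b": range(2, 10_002)})

# Slowest: row-wise apply invokes a Python callable for every row.
slow = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Faster: zip walks the two columns without row objects.
medium = [a + b for a, b in zip(df["a"], df["b"])]

# Fastest: vectorized column arithmetic.
fast = df["a"] + df["b"]

assert list(slow) == medium == list(fast)  # all three agree on the result
```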
Numpy Tiled Arrays -> Want to repeat a NumPy array multiple times? With 'np.tile', you can quickly and easily create tiled arrays to suit your needs.
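A quick sketch of `np.tile`: repeat an array along one or more axes.

```python
import numpy as np

a = np.array([1, 2, 3])

print(np.tile(a, 2))       # [1 2 3 1 2 3]  - repeated twice along one axis
print(np.tile(a, (2, 2)))  # a 2x6 array: the row repeated in a 2x2 grid
```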
DataFrame.apply() command -> If you're working with large datasets in pandas, for-loops can be slow and memory-intensive. DataFrame.apply() is a better option for optimizing your code and improving performance. See how to use it with both lambda and user-defined functions.
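A minimal sketch of `apply` with both a lambda and a user-defined function; the temperature-conversion column is an illustrative stand-in, not the tweet's example.

```python
import pandas as pd

df = pd.DataFrame({"fahrenheit": [32, 68, 212]})

# With a lambda:
df["celsius_lambda"] = df["fahrenheit"].apply(lambda f: (f - 32) * 5 / 9)

# With a user-defined function:
def to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

df["celsius_udf"] = df["fahrenheit"].apply(to_celsius)

print(df)  # both new columns read 0.0, 20.0, 100.0
```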
Short and Basic QnA on Neural Networks -> What do we tune in a neural network? Who tunes these parameters? On what basis are the parameters tuned? Why don't we use the ReLU activation function in the output layer for classification? What was the need for ReLU when we already had the Sigmoid activation function?