The first rule for implementing something with ML or blockchain…
is to figure out if you can implement it without ML or blockchain.
UC Berkeley breaks out the learning system of a machine learning algorithm into three main parts.
1. Linear Regression
Type: Supervised
Use: Predicting continuous values (e.g., stock prices)
Explanation: Finds the relationship between input and output variables by fitting a straight line. Best for simple, linear relationships.
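
As a rough illustration of "fitting a straight line", the sketch below computes an ordinary least squares fit for a single input variable in plain Python. The function name `fit_line` and the toy data are assumptions made for this example; in practice a library such as scikit-learn would normally be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict the output for a new input x = 5
```
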
2. Logistic Regression
Type: Supervised
Use: Classification problems (e.g., spam vs. not spam)
Explanation: Used to model binary outcomes by fitting data to a sigmoid curve, outputting probabilities.
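
A minimal sketch of the sigmoid idea, assuming one input feature and plain batch gradient descent. The learning rate, epoch count, function names, and toy data are all illustrative assumptions, not a recommended configuration.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: small x values belong to class 0, large ones to class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)   # probability of class 1 for x = 1.0
p_high = sigmoid(w * 4.0 + b)  # probability of class 1 for x = 4.0
```

The model outputs a probability; a class label comes from thresholding it, commonly at 0.5.
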
3. Decision Trees
Type: Supervised
Use: Classification & regression
Explanation: Splits data into branches to make decisions based on conditions. Offers easy interpretability but may overfit without tuning.
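
The branching step can be sketched as a single split (a "decision stump") chosen by minimizing Gini impurity. This is only one common splitting criterion, and the helper names `gini` and `best_split` plus the toy data are assumptions for illustration. A full tree would apply the same search recursively to each branch.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best 'x <= threshold' split."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one numeric feature and a yes/no label.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(xs, ys)
```
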
4. Random Forest
Type: Supervised
Use: Classification & regression
Explanation: Combines multiple decision trees for robust predictions, reducing overfitting. Ideal for handling complex data with noise.
5. Support Vector Machines (SVM)
Type: Supervised
Use: Classification
Explanation: Finds the hyperplane that best separates data points into classes. Works well for high-dimensional data.
6. K-Nearest Neighbors (KNN)
Type: Supervised
Use: Classification
Explanation: Classifies based on the majority class among nearest neighbors. Best for low-dimensional, well-labeled data.
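
The majority-vote rule is short enough to sketch directly. The example below assumes Euclidean distance and `k=3`. The function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training label with its Euclidean distance to the query point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D data: class "a" near the origin, class "b" further out.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(points, labels, (2, 2))
far_away = knn_predict(points, labels, (6, 5))
```

There is no training step: every prediction scans the stored data, which is one reason the method suits small, low-dimensional datasets.
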
7. Naïve Bayes
Type: Supervised
Use: Text classification, spam detection
Explanation: Uses probability for predictions, assuming feature independence. Often effective with text and sentiment analysis.
8. K-Means Clustering
Type: Unsupervised
Use: Grouping data (e.g., customer segmentation)
Explanation: Clusters data points around centroids, used to find patterns without labeled data.
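
Clustering around centroids can be sketched with Lloyd's algorithm, the usual K-means iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. The fixed starting centroids, the iteration count, and the toy data are assumptions chosen to keep the example deterministic.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled 2-D data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```
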
9. Principal Component Analysis (PCA)
Type: Unsupervised
Use: Reducing data dimensions
10. Neural Networks
Type: Supervised/Unsupervised
Use: Complex tasks like image & language processing

This session falls into the category of Machine Learning, specifically focusing on the Data Science and Model Development aspects of AI lifecycle management. By understanding the AI process, we can appreciate how data is transformed into intelligent decisions. From collecting and preprocessing data to training models and deploying them, each step plays a vital role in making AI systems effective and reliable.
1️⃣ Data Collection: The AI process begins with gathering data from different sources. This can include information like numbers, text, images, or videos. The data acts as the building blocks for AI systems, helping them learn and make decisions. Think of it as the raw material we need to work with!
2️⃣ Data Preprocessing: Once we have the data, we need to clean and organize it. This step involves removing any errors, duplicates, or irrelevant parts. We also make sure the data is in a format that the AI algorithms can understand. It's like tidying up the data so that it's ready for analysis!
3️⃣ Feature Extraction: Now, we need to extract the most important parts of the data. These are called features, and they help the AI algorithms understand what's significant in the data. It's like highlighting the essential details that will guide the AI system's decision-making process.
4️⃣ Model Training: Next, we feed the extracted features into AI models. These models are like intelligent algorithms that learn from the data. We train them by repeatedly showing them examples and helping them adjust their settings to make accurate predictions or decisions. It's like teaching a model to recognize patterns or make judgments based on what it has learned!
5️⃣ Model Evaluation: Once the model has been trained, we need to check how well it performs. We use evaluation metrics to measure its accuracy or effectiveness. This step helps us ensure that the model is reliable and provides valuable insights. It's like testing the model to make sure it's doing a good job!
6️⃣ Deployment and Inference: After training and evaluation, we put the model to work in the real world. We integrate it into systems or applications where it can process new, unseen data and provide predictions or decisions. It's like unleashing the power of the trained model to make practical use of its intelligence!
7️⃣ Continuous Monitoring and Improvement: AI is an ongoing process. We regularly monitor the model's performance, collect feedback, and update it as needed. This ensures that the AI system remains accurate and aligned with the desired outcomes. It's like taking care of the model and making improvements to keep it at its best!
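
The first six steps above can be walked through end to end on a toy spam filter. Everything here is an assumption for illustration: the messages, the trigger-word feature, and the "model" (a single learned threshold) stand in for real data sources, feature engineering, and learning algorithms.

```python
# 1. Data collection: hypothetical (message, label) records; 1 = spam, 0 = not spam.
raw = [
    ("WIN a FREE prize NOW", 1),
    ("Meeting moved to 3pm", 0),
    ("FREE entry, claim your prize", 1),
    ("Lunch tomorrow?", 0),
    ("Lunch tomorrow?", 0),           # duplicate record
    (None, 1),                        # record with missing text
    ("Claim your FREE gift card", 1),
    ("Can you review my draft", 0),
]

# 2. Preprocessing: drop missing and duplicate records, normalize case.
seen, clean = set(), []
for text, label in raw:
    if text is None or text in seen:
        continue
    seen.add(text)
    clean.append((text.lower(), label))

# 3. Feature extraction: count how many trigger words a message contains.
TRIGGERS = {"free", "prize", "win", "claim"}


def extract(text):
    return sum(word.strip(",.?!") in TRIGGERS for word in text.split())


def accuracy(threshold, rows):
    return sum((extract(t) >= threshold) == bool(y) for t, y in rows) / len(rows)


# 4. Training: pick the threshold with the best accuracy on the training rows.
train, test = clean[:4], clean[4:]
threshold = max(range(1, 5), key=lambda th: accuracy(th, train))

# 5. Evaluation: measure accuracy on rows the model has not seen.
test_accuracy = accuracy(threshold, test)

# 6. Inference: apply the trained model to a new message.
is_spam = extract("free prize inside") >= threshold
```

Step 7 would correspond to recomputing `test_accuracy` on fresh labeled data over time and retraining when it drops.
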

The terms "AI," "machine learning" and "deep learning" are often used interchangeably, but they don't mean the same thing. Here's a breakdown of how they differ.
| Term (era) | Description |
|---|---|
| AI (Neural Networks (1950s–1970s) / Gen AI (present)) | Artificial intelligence studies how computers mimic the functions of natural intelligence. The term was coined in 1956, and includes everything from machine learning to cybernetics, machine ethics and more. |
| Machine Learning (1980s–2010s) | Machine learning can be split into unsupervised and supervised learning. In unsupervised machine learning, algorithms attempt to structure unlabeled data in meaningful ways and uncover hidden patterns, for example, through clustering. In supervised learning, algorithms learn to make predictions from a training dataset of labeled data, such as assigning a known class to previously unseen data. |
| Deep Learning (2011–2020s) | Deep learning is a subset of machine learning that mimics the structure of the human brain to solve both supervised and unsupervised tasks, using multiple layers of artificial neural networks to make progressively more abstract and higher-level decisions. |

A comprehensive guide to choosing the right machine learning model for your problem, from image generation to natural language understanding.
Machine Learning tooling: a ranked list of awesome machine learning Python libraries.
Top-10 study list: my top-10 study list to learn Machine Learning.
10 steps of Machine Learning
Machine Learning hyperparameters are crucial for optimizing model performance.
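
One way to make this concrete is a small grid search: train the same model with several values of a hyperparameter and keep the value that scores best on held-out validation data. The sketch below assumes a one-feature ridge regression through the origin, where the regularization strength `lam` is the hyperparameter. The data, grid values, and function names are hypothetical.

```python
def fit_ridge_slope(xs, ys, lam):
    """1-D ridge regression through the origin: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noisy training data and clean validation data around y = 2x.
train_x, train_y = [1.0, 2.0, 3.0], [2.2, 3.9, 6.1]
val_x, val_y = [4.0, 5.0], [8.0, 10.0]

# Grid search: the model parameter w is learned from the training data,
# while the hyperparameter lam is chosen by validation error.
grid = [0.0, 0.1, 1.0, 10.0]
scores = {
    lam: mse(fit_ridge_slope(train_x, train_y, lam), val_x, val_y)
    for lam in grid
}
best_lam = min(scores, key=scores.get)
```
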

More complete list:


The 3 types of machine learning (that every data scientist should know). Here's 3 months of research in 3 minutes. Let's go!
Frequently used algorithms for biomedical research:

| Category | Algorithm | Example usage (data type) | Type of learning |
|---|---|---|---|
| Machine Learning (SL) | SVM | Cancer vs healthy classification (gene expression) | Supervised Learning (SL) |
| Machine Learning (SL) | KNN | Multiclass tissue classification (gene expression) | Supervised Learning (SL) |
| Machine Learning (SL) | Regression | Genome-wide association analysis (SNP) | Supervised Learning (SL) |
| Machine Learning (SL) | Random forest | Pathway-based classification (gene expression, SNP) | Supervised Learning (SL) |
| Deep Learning (SL) | CNN | Protein secondary structure prediction (amino acid sequence) | Supervised Learning (SL) |
| Deep Learning (SL) | RNN | Sequence similarity prediction (nucleotide sequence) | Supervised Learning (SL) |
| Clustering (UL) | Hierarchical | Protein family clustering (amino acid sequence) | Unsupervised Learning (UL) |
| Clustering (UL) | K-means | Clustering genes by chromosomes (gene expression) | Unsupervised Learning (UL) |
| Dimensionality Reduction (UL) | PCA | Classification of outliers (gene expression) | Unsupervised Learning (UL) |
| Dimensionality Reduction (UL) | tSNE | Data visualization (single cell RNA-sequencing) | Unsupervised Learning (UL) |
| Dimensionality Reduction (UL) | NMF | Clustering gene expression profiles (gene expression) | Unsupervised Learning (UL) |



| Design | Model Development | Operations |
|---|---|---|
| Requirements engineering | Data Engineering | ML Model Deployment |
| ML Use-Cases prioritization | ML Model Engineering | CI/CD Pipelines |
| Data availability check | Model testing & validation | Monitoring & Triggering |

Machine Learning languages: Python, R, C++, Java, Prolog, Lisp, Lush.
Data Analysis & Visualisation tools: Pandas, Matplotlib, Jupyter Notebook, Weka, Tableau.
Big Data tools: MemSQL, Apache Spark.
Machine Learning platforms & frameworks: NumPy, Scikit-learn, NLTK, Azure ML, Apache Mahout, KNIME, Weka, Amazon ML, RapidMiner, Colab, TensorFlow, Keras, PyTorch, Shogun.
Machine Learning frameworks for neural network modelling: PyTorch, Keras, Caffe2, TensorFlow & TensorBoard.
Maths for Machine Learning: Linear Algebra, Statistics, Geometry, Calculus, Probability, Regression.
TensorFlow isn't a high-level visualization library. Plotly, Seaborn and Matplotlib are.
Supervised learning: Classification, Regression.
Unsupervised learning: Clustering, Dimensionality Reduction, Association.
Ensemble learning: Bagging, Stacking, Boosting.
Neural networks: Feedforward Neural Networks, Recurrent Neural Networks, Generative Models, Specialized Networks.
Reinforcement learning: Value-Based, Policy-Based, Model-Based.
Other Algorithms.
Estimated savings: 10% of sales × 3 months = $6,000,000. Pretty shocking what a couple of data science skills can do for a business.

This repo covers everything you need to know about MLOps.
The goal of the series is to understand the basics of MLOps, such as model building, monitoring, configuration, testing, packaging, deployment, and CI/CD.

| Term | Meaning |
|---|---|
| association | The extent to which values of one field depend on or are predicted by values of another field. |
| bagging | A modeling technique that is designed to enhance the stability of the model and avoid overfitting. See also boosting, overfitting. |
| batch scoring | Running the model predictions offline (asynchronously) on a large dataset. |
| Bayesian network | A graphical model that displays variables in a data set and the probabilistic or conditional independencies between them. |
| binomial logistic regression | A logistic regression that is used for targets with two discrete categories. See also multinomial logistic regression, target. |
| boosting | A modeling technique that creates a sequence of models, rather than a single model, to obtain more accurate predictions. Cases are classified by applying the whole set of models to them, and then combining the separate predictions into one overall prediction. See also bagging. |
| classification and regression tree algorithm | A decision tree algorithm that uses recursive partitioning to split the training records into segments by minimizing the impurity at each step. See also Quick, Unbiased, Efficient Statistical Tree algorithm. |
| confidence score | An estimate of the accuracy of a prediction, usually expressed as a number from 0.0 to 1.0. |
| correlation | A statistical measure of the association between two numeric fields. Values range from -1 to +1. A correlation of 0 means that there is no relationship between the two fields. |
| Cox regression algorithm | An algorithm that produces a survival function that predicts the probability that the event of interest has occurred at a given time for given values of the predictor variables. |
| cross-validation | A technique for testing how well a model generalizes in the absence of a holdout test sample. Cross-validation divides the training data into a number of subsets, and then builds the same number of models, with each subset held out in turn. Each of those models is tested on the holdout sample, and the average accuracy of the models on those holdout samples is used to estimate the accuracy of the model when applied to new data. See also overfitting. |
| data quality | The extent to which data has been accurately coded and stored. Factors that adversely affect data quality include missing values, data entry errors, measurement errors, and coding inconsistencies. |
| data set | A collection of data, usually in the form of rows (records) and columns (fields) and contained in a file or database table. |
| data visualization | The process of presenting data patterns in graphical format, including the use of traditional plots as well as advanced interactive graphics. In many cases, visualization reveals patterns that would be difficult to find using other methods. |
| decision list | An algorithm that identifies subgroups or segments that show a higher or lower likelihood of a given binary (yes/no) outcome relative to the overall population. |
| decision tree algorithm | An algorithm that classifies data, or predicts future outcomes, based on a set of decision rules. |
| deployment | The process of enabling the widespread use of a predictive analytics project within an organization. |
| evaluate | The process of determining whether a model will accurately predict the target on new and future data. |
| heat map | A graphical representation of data values in a two-dimensional table format, in which higher values are represented by darker colors and lower values by lighter ones. |
| histogram | A graphical display of the distribution of values for a numeric field, in the form of a vertical bar chart in which taller bars indicate that more records fall within that range of values. |
| linear regression | A statistical technique for estimating a linear model for a continuous (numeric) output field. Linear models predict a continuous target based on linear relationships between the target and one or more predictors. See also regression. |
| linear regression model | A modeling algorithm that assumes that the relationship between the input and the output for the model is of a particular, simple form. The model fits the best line through linear regression and generates a linear mapping between the input variables and each output variable. |
| logistic regression | A statistical technique for classifying records based on the values of the input fields. Logistic regression is similar to linear regression, but takes a categorical target field instead of a numeric one. See also regression. |
| misclassification cost | A specification of the relative importance of different kinds of classification errors, such as classifying a high-risk credit applicant as low risk. Costs are specified in the form of weights applied to specific incorrect predictions. |
| model building | The process of creating data models by using algorithms. Model building typically consists of several stages: training, testing and (optionally) validation of evaluation. See also testing, training, validation. |
| multinomial logistic regression | A logistic regression that is used for targets with more than two categories. See also binomial logistic regression, target. |
| neural network | A mathematical model for predicting or classifying cases by using a complex mathematical scheme that simulates an abstract version of brain cells. A neural network is trained by presenting it with a large number of observed cases, one at a time, and allowing it to update itself repeatedly until it learns the task. |
| online scoring | Applying model predictions in real time to a single record through a published endpoint within or outside the organization. A fast response, on the order of milliseconds, is expected. |
| overfitting | The unintentional modeling of chance variations in data, leading to models that do not work well when applied to other data sets. Bagging and cross-validation are two methods for detecting or preventing overfitting. See also bagging, cross-validation. |
| partition | To divide a data set into separate subsets or samples for the training, testing, and validation stages of model building. |
| predictive analytics | A business process and a set of related technologies that are concerned with the prediction of future possibilities and trends. Predictive analytics applies such diverse disciplines as probability, statistics, machine learning, and artificial intelligence to business problems to find the best action for a given situation. |
| Predictive Model Markup Language (PMML) | An XML-based language defined by the Data Mining Group that provides a way for companies to define predictive models and share models between compliant vendors' applications. |
| probability | A measure of the likelihood that an event will occur. Probability values range from 0 to 1; 0 implies that the event never occurs, and 1 implies that the event always occurs. A probability of 0.5 indicates that the event has an even chance of occurring or not occurring. |
| Quick, Unbiased, Efficient Statistical Tree algorithm (QUEST) | A decision tree algorithm that provides a binary classification method for building the tree. The algorithm is designed to reduce the processing time required for large C & R tree analyses while also reducing the tendency found in classification tree methods to favor inputs that allow more splits. See also classification and regression tree algorithm, decision tree algorithm. |
| regression | A statistical technique for estimating the value of a target field based on the values of one or more input fields. See also linear regression, logistic regression. |
| regression tree algorithm | A tree-based algorithm that splits a sample of cases repeatedly to derive homogeneous subsets, based on values of a numeric output field. See also Chi-squared Automatic Interaction Detector algorithm. |
| score | To apply a predictive model to a data set with the intention of producing a classification or prediction for a new, untested case. |
| script | A series of commands, combined in a file, that carry out a particular function when the file is run. Scripts are interpreted as they are run. |
| testing | The stage of model building in which the model produced by the training stage is tested against a data subset for which the outcome is already known. See also model building, training, validation. |
| training | The initial stage of model building, involving a subset of the source data. The model can then be tested against a further, different subset for which the outcome is already known. See also model building, testing, validation. |
| transformation | A formula that is applied to the values of a field to alter the distribution of values. Some statistical methods require that fields have a particular distribution. When a field's distribution differs from what is required, a transformation (such as taking logarithms of values) can often remedy the problem. |
| unrefined model | A model that contains information extracted from the data but which is not designed for generating predictions directly. |
| validation | An optional final stage of model building in which the refined model from the testing stage is validated against a further subset of the source data. See also model building, testing, training. |
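
The cross-validation entry in the glossary above can be sketched in a few lines: split the data into k subsets, hold each one out in turn, fit a model on the rest, and average the held-out errors. The simple one-variable least squares model, the helper names, and the toy data below are assumptions for illustration. Libraries such as scikit-learn provide ready-made versions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def k_fold_cv(xs, ys, k):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th record forms one fold
        rows = list(enumerate(zip(xs, ys)))
        train = [pair for i, pair in rows if i not in held_out]
        test = [pair for i, pair in rows if i in held_out]
        # Build one model per fold, using only the records not held out.
        slope, intercept = fit_line(*zip(*train))
        fold_errors.append(
            sum((slope * x + intercept - y) ** 2 for x, y in test) / len(test)
        )
    return sum(fold_errors) / k


# Hypothetical data lying exactly on y = 3x - 1, so the held-out error is ~0.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3 * x - 1 for x in xs]
cv_error = k_fold_cv(xs, ys, k=3)
```
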
Mathematics: Probability | Statistics | Discrete
Programming: Python | R | Java
Database: MySQL | MongoDB
Machine Learning: Scikit-learn | Supervised learning | Unsupervised learning | Reinforcement learning
Machine Learning: ML Libraries and Non-ML Libraries
ML Algorithms: Linear | Logistic Regression | KNN | K-means | Random forest & more!
Deep Learning: TensorFlow, Keras | Neural Networks | CNN, RNN, GAN, LSTMs
Data Visualization Tools: Tableau | QlikView | Power BI
ML Engineer





If you're serious about growing in AI/ML, these are the top 12 blogs worth reading in 2025:
These authors build production LLM systems, ship AI features to millions of users, and share insights you won't find anywhere else:
1) Andrej Karpathy (ex Tesla AI Director & OpenAI co-founder)
Neural networks and LLMs explained from first principles by one of the OGs of modern AI.
2) Sebastian Raschka, PhD
Deep dives into LLM training and fine-tuning with super clear code examples.
3) Interconnects by Nathan Lambert
AI alignment, open-source models, and ecosystem news.
4) Lil'Log by Lilian Weng (ex VP of Research at OpenAI)
Lessons from someone who worked on practical AI safety and alignment at OpenAI.
5) Chip Huyen
Real-world MLOps and production ML systems design patterns.
6) Eugene Yan (Principal Applied Scientist at Amazon)
Great writing on applied ML, data science, and working with recommender systems in production.
7) Philipp Schmid (Senior AI Relation Engineer at Google DeepMind, ex Hugging Face)
Tutorials on building and deploying LLM apps on AWS.
8) Jason Liu
Learn from a consultant sharing real lessons on LLMs, data, and open-source tools.
9) Hamel H. (ex GitHub Staff ML Engineer)
MLOps workflows, fine-tuning, and product strategy from an ML veteran.
10) Berkeley Artificial Intelligence Research Blog
Latest academic breakthroughs in computer vision, NLP, and robotics.
11) Hugging Face
Product updates, tutorials, and the latest from open-source AI.
12) Google DeepMind
Google's premier AI research division.