Machine Learning Techniques (MCA556)
From your syllabus.
---
Learning with Trees
Decision Trees are one of the most popular machine learning algorithms.
They make decisions using a tree-like structure.
Example:
Study Hours?
|
|-- > 5 Hours → Pass
|
|-- ≤ 5 Hours → Fail
---
Components of Decision Tree
Root Node
Starting point of the tree.
Example:
Study Hours?
---
Internal Node
Represents a condition.
Example:
Attendance > 75%?
---
Leaf Node
Final prediction.
Example:
Pass
Fail
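
The three components can be pictured as a small data structure. The sketch below is only an illustration: the class names `Node` and `Leaf`, the `predict` helper, and the particular tree shape (study hours at the root, attendance as an internal node) are assumptions made for this example, not part of the syllabus.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    prediction: str  # final prediction, e.g. "Pass" or "Fail"


@dataclass
class Node:
    question: str                 # human-readable condition
    test: Callable[[dict], bool]  # True means "follow the yes branch"
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def predict(tree, student):
    # Walk down from the root until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(student) else tree.no
    return tree.prediction


# Hypothetical tree: the root asks about study hours,
# one internal node asks about attendance, leaves hold the answer.
root = Node(
    "Study Hours > 5?",
    lambda s: s["study_hours"] > 5,
    yes=Node(
        "Attendance > 75%?",
        lambda s: s["attendance"] > 75,
        yes=Leaf("Pass"),
        no=Leaf("Fail"),
    ),
    no=Leaf("Fail"),
)

print(predict(root, {"study_hours": 6, "attendance": 80}))
```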
---
Advantages of Decision Trees
Easy to understand
Easy to visualize
Works with numerical and categorical data
Requires little data preparation
---
Disadvantages
Can overfit
Sensitive to data changes
Large trees become complex
---
Constructing Decision Trees
Steps:
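1. Select the best feature to split on (for example, the one giving the lowest Gini impurity or the highest information gain)
2. Split the dataset on that feature
3. Create a branch for each outcome of the split
4. Repeat recursively on each branch
5. Stop when a node is pure (all samples belong to one class) or a stopping rule such as maximum depth is reached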
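
The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming Gini impurity as the split measure and numeric features with threshold splits; the function names, the `max_depth` parameter, and the tiny student dataset are all made up for the example.

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 0.0 means every label in the group is the same.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Step 1: return (feature_index, threshold, weighted_gini), or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 5: stop when the node is pure or the depth limit is reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t, _ = split
    # Steps 2 and 3: split the data and create the two branches.
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # Step 4: repeat recursively on each branch.
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical data: [study hours, attendance %]
rows = [[2, 60], [3, 80], [4, 70], [6, 90], [7, 65], [8, 85]]
labels = ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"]
tree = build_tree(rows, labels)
print(tree)
```

On this toy data the best first split is on study hours, because it separates the two classes completely.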
---
Classification and Regression Trees (CART)
CART stands for:
Classification And Regression Trees
Used for:
Classification
Output is a category.
Examples:
Pass/Fail
Spam/Not Spam
---
Regression
Output is a numerical value.
Examples:
Salary prediction
House price prediction
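
For regression, a tree leaf usually predicts the average of the training values that fall into it, and splits are chosen to make each side as uniform as possible. A minimal sketch, assuming a single numeric feature and squared error as the split measure; the salary figures are invented for illustration.

```python
def sse(values):
    # Sum of squared errors around the mean of the group.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (threshold, total_sse) for the best split x <= threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split anything
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: years of experience -> salary (in thousands)
years = [1, 2, 3, 8, 9, 10]
salary = [30, 32, 31, 60, 62, 61]

threshold, error = best_regression_split(years, salary)
left_mean = sum(y for x, y in zip(years, salary) if x <= threshold) / 3
right_mean = sum(y for x, y in zip(years, salary) if x > threshold) / 3
print(threshold, left_mean, right_mean)
```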
---
Ensemble Learning
Combining multiple models to create a stronger model.
Idea:
Weak Learners
↓
Combine
↓
Strong Learner
Benefits:
Higher accuracy
Better generalization
Reduced overfitting
---
Types of Ensemble Learning
Bagging
Boosting
---
Bagging (Bootstrap Aggregating)
Multiple models are trained independently.
Process:
Dataset
↓
Random Samples
↓
Many Models
↓
Voting/Average
↓
Final Prediction
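
The process above can be sketched as follows. This is a toy illustration, not a production implementation: the base model is a made-up one-feature threshold rule (`train_stump`), and the data, `n_models`, and `seed` values are assumptions chosen for the example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    # Random sample WITH replacement, same size as the original dataset.
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Toy base model: threshold halfway between the two class means."""
    passed = [x for x, y in sample if y == "Pass"]
    failed = [x for x, y in sample if y == "Fail"]
    if not passed or not failed:  # the sample happened to contain one class only
        only = "Pass" if passed else "Fail"
        return lambda x: only
    threshold = (sum(passed) / len(passed) + sum(failed) / len(failed)) / 2
    return lambda x: "Pass" if x > threshold else "Fail"


def bagging_fit(data, n_models=25, seed=0):
    rng = random.Random(seed)
    # Each model is trained independently on its own bootstrap sample.
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    # Majority vote over all models.
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: (study hours, result)
data = [(1, "Fail"), (2, "Fail"), (3, "Fail"), (4, "Fail"),
        (6, "Pass"), (7, "Pass"), (8, "Pass"), (9, "Pass")]
models = bagging_fit(data)
print(bagging_predict(models, 10), bagging_predict(models, 0))
```

For regression, the vote would be replaced by the average of the model outputs.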
---
Advantages of Bagging
Reduces variance
Reduces overfitting
Improves stability
---
Example
Random Forest
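Best-known Bagging-based algorithm.
Random Forest:
Uses many Decision Trees, each trained on a random sample of the data and a random subset of features
Final answer through voting (classification) or averaging (regression)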
---
Boosting
Boosting improves weak models sequentially.
Idea:
Model 1
↓
Fix Mistakes
↓
Model 2
↓
Fix Mistakes
↓
Model 3
↓
Final Strong Model
---
Advantages of Boosting
High accuracy
Handles complex problems
Improves weak learners
---
Popular Boosting Algorithms
AdaBoost
Adaptive Boosting.
Increases the weights of misclassified samples so that the next model focuses on them.
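
One round of the standard AdaBoost weight update can be sketched like this. The function name `adaboost_round` and the five-sample example are assumptions for illustration, and the sketch assumes the weak learner's weighted error is strictly between 0 and 1.

```python
import math


def adaboost_round(weights, correct):
    """One AdaBoost round.

    weights: current sample weights (summing to 1)
    correct: list of bools, True where the weak learner was right
    Returns (alpha, new_weights), where alpha is the model's vote strength.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correct samples, grow weights of mistakes.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five samples with equal weight; the weak learner gets the last one wrong.
weights = [0.2] * 5
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
print(alpha, new_weights)
```

After the update the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.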
---
Gradient Boosting
Each new model is trained on the errors (residuals) of the current ensemble, so the overall loss is reduced step by step.
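
A minimal sketch of this idea for regression with squared error, where each round fits a tiny one-split tree to the current residuals. The helper names, `n_rounds`, `learning_rate`, and the data are assumptions for illustration; real libraries add many refinements.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature by squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # Residuals are the errors the current ensemble still makes.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.1, 2.9, 4.2, 5.1, 5.9]
model = gradient_boost(xs, ys)
baseline = sum(ys) / len(ys)
print(mse(lambda x: baseline, xs, ys), mse(model, xs, ys))
```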
---
XGBoost
An optimized, regularized implementation of Gradient Boosting, and one of the most widely used boosting libraries.
Applications:
Data science competitions
Industry projects
---
Difference Between Bagging and Boosting
Bagging                       | Boosting
Models trained independently  | Models trained sequentially
Reduces variance              | Reduces bias
Faster                        | Slower
Random Forest                 | AdaBoost, XGBoost
---
Probability and Learning
Machine Learning often uses probability.
Probability helps:
Handle uncertainty
Make predictions
Estimate outcomes
Applications:
Spam filtering
Disease prediction
Recommendation systems
---
Data into Probabilities
Example:
80 students passed
20 students failed
Probability of passing:
80/100 = 0.8
or
80%
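
The same calculation in code, as a small sketch. The function name `to_probabilities` is made up for this example; it simply divides each count by the total.

```python
def to_probabilities(counts):
    # Relative frequency: count of each outcome divided by the total count.
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


probs = to_probabilities({"passed": 80, "failed": 20})
print(probs)
```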
---
Basic Statistics
Statistics helps understand data.
Important terms:
---
Mean
Average value.
\bar{x}=\frac{\sum x}{n}
---
Median
Middle value after sorting.
---
Mode
Most frequent value.
---
Variance
Measures spread of data around the mean.
\sigma^2=\frac{\sum (x-\bar{x})^2}{n}
(This is the population variance; for a sample, divide by n-1 instead of n.)
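
All four measures are available in Python's standard `statistics` module. The sketch below uses a made-up list of marks and also computes the variance by hand to match the formula above, which divides by n (population variance).

```python
import statistics

marks = [2, 4, 4, 5, 7, 8]  # hypothetical data

mean = statistics.mean(marks)           # sum of values / number of values
median = statistics.median(marks)       # middle value after sorting
mode = statistics.mode(marks)           # most frequent value
variance = statistics.pvariance(marks)  # population variance (divides by n)

# The same variance computed directly from the formula.
manual_variance = sum((x - mean) ** 2 for x in marks) / len(marks)

print(mean, median, mode, variance, manual_variance)
```

With an even number of values, the median is the average of the two middle values.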
---
Gaussian Mixture Models (GMM)
A probabilistic clustering algorithm: each point gets a probability of belonging to each cluster.
Assumption: Data is generated from multiple Gaussian distributions.
Advantages:
Flexible cluster shapes
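Gives soft (probabilistic) assignments and can fit elliptical clusters, so it often works better than K-Means when clusters overlap or differ in size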
Applications:
Image processing
Speech recognition
Pattern recognition
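
The core of a GMM is the question "how likely is it that this point came from each Gaussian?". A minimal one-dimensional sketch of that step, assuming the mixture parameters (weight, mean, standard deviation) are already known; in a full GMM they would be estimated from the data, typically with the EM algorithm. The function names and the two example components are assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    # Density of a normal distribution at x.
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def responsibilities(x, components):
    """components: list of (weight, mean, std).

    Returns the probability that x belongs to each component.
    """
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


# Two hypothetical clusters centred at 0 and 4, equally likely.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
print(responsibilities(0.0, components))  # almost certainly the first cluster
print(responsibilities(2.0, components))  # exactly halfway between the two
```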
---
Nearest Neighbour Methods
One of the simplest ML techniques.
Most common:
K-Nearest Neighbour (KNN)
Idea:
Find the K closest data points and classify based on neighbors.
Example:
New Student
↓
Find 5 nearest students
↓
Majority Vote
↓
Prediction
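
The flow above can be sketched in a few lines, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the student data (study hours, attendance %) are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=5):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Sort the stored points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the neighbours' labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical students: (study hours, attendance %) -> result
train = [
    ((2, 60), "Fail"), ((3, 55), "Fail"), ((1, 65), "Fail"), ((4, 58), "Fail"),
    ((7, 85), "Pass"), ((8, 90), "Pass"), ((6, 80), "Pass"), ((9, 88), "Pass"),
]

print(knn_predict(train, (7, 86)))
print(knn_predict(train, (2, 58)))
```

Here attendance has a much larger numeric range than study hours, so it dominates the distance. In practice the features are usually scaled to a common range first.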
---
Advantages of KNN
Easy to understand
No explicit training phase (the model simply stores the data)
Good for small datasets
---
Disadvantages of KNN
Slow at prediction time for large datasets
Sensitive to irrelevant features and to feature scaling
Requires choosing K value
---
Applications of KNN
Recommendation systems
Image recognition
Medical diagnosis
Pattern recognition
---
Important Exam Questions
Short Questions
1. What is a Decision Tree?
2. Define CART.
3. What is Ensemble Learning?
4. Define Bagging.
5. Define Boosting.
6. What is Random Forest?
7. What is KNN?
8. What is GMM?
---
Long Questions
1. Explain Decision Tree construction.
2. Discuss CART with examples.
3. Explain Ensemble Learning.
4. Differentiate Bagging and Boosting.
5. Explain KNN algorithm.
6. Explain Gaussian Mixture Models.
---
Quick Revision
Decision Tree = Tree-based prediction model.
CART = Classification and Regression Trees.
Ensemble Learning = Combining multiple models.
Bagging = Independent models trained on random samples, combined by voting or averaging.
Random Forest = Bagging-based algorithm.
Boosting = Sequential improvement of models.
KNN = Nearest neighbour classification.
GMM = Probabilistic clustering using a mixture of Gaussians.
Next, Unit 5:
PCA, LDA, Factor Analysis, ICA, Isomap, Genetic Algorithms, Evolutionary Learning, Reinforcement Learning, Markov Decision Process (MDP). This is the final unit of Machine Learning and is often asked in theory exams.