Saturday, June 13, 2026

Unit 4: Decision Trees, CART, Ensemble Learning, Bagging, Boosting & Nearest Neighbour

 Machine Learning Techniques (MCA556)

From your syllabus. 

---

Learning with Trees

Decision Trees are one of the most popular machine learning algorithms.

They make decisions using a tree-like structure.

Example:

Study Hours?
      |
      +-- > 5 Hours  --> Pass
      |
      +-- <= 5 Hours --> Fail



---

Components of Decision Tree


Root Node


Starting point of the tree.


Example:


Study Hours?



---


Internal Node


Represents a condition.


Example:


Attendance > 75%?



---


Leaf Node


Final prediction.


Example:


Pass

Fail



---


Advantages of Decision Trees


Easy to understand


Easy to visualize


Works with numerical and categorical data


Requires little data preparation




---


Disadvantages


Can overfit


Sensitive to small changes in the training data


Large trees become complex




---


Constructing Decision Trees


Steps:


1. Select the best feature to split on (for example, by Gini impurity or information gain)



2. Split dataset



3. Create branches



4. Repeat recursively



5. Stop when a node is pure (all samples share one class), no features remain, or a limit such as maximum depth is reached
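Step 1 can be sketched in a few lines. This is a minimal illustration, assuming Gini impurity as the measure of the "best" split and a single numeric feature. The function names and the toy data are hypothetical, not part of the syllabus.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' split."""
    best_threshold, best_score = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical toy data: study hours and exam result.
hours = [1, 2, 3, 4, 6, 7, 8, 9]
result = ["Fail", "Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]

threshold, score = best_split(hours, result)
print(threshold, score)  # 5.0 0.0
```

A full tree would apply best_split recursively to the left and right subsets until a stopping rule is met.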





---


Classification and Regression Trees (CART)


CART stands for:


Classification And Regression Trees


Used for:


Classification


Output is a category.


Examples:


Pass/Fail


Spam/Not Spam




---


Regression


Output is a numerical value.


Examples:


Salary prediction


House price prediction
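For regression, a tree leaf predicts a number instead of a class. The sketch below shows a one-level regression tree, assuming the common choice of splitting to minimize squared error and predicting the mean of each leaf. The data and function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared errors around the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_regression_stump(xs, ys):
    """Find the threshold that minimizes the total squared error of the two leaves."""
    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)
        if best is None or error < best["error"]:
            best = {"threshold": threshold, "error": error,
                    "left_value": mean(left), "right_value": mean(right)}
    return best

def predict(stump, x):
    """Each leaf predicts the mean target value of its training samples."""
    return stump["left_value"] if x <= stump["threshold"] else stump["right_value"]

# Hypothetical data: years of experience and salary (in thousands).
experience = [1, 2, 3, 10, 11, 12]
salary = [30, 32, 34, 80, 82, 84]

stump = fit_regression_stump(experience, salary)
print(stump["threshold"], predict(stump, 2.5), predict(stump, 11.5))
```

Here the split lands between the junior and senior groups, and each side predicts its own average salary.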




---


Ensemble Learning


Combining multiple models to create a stronger model.


Idea:


Weak Learners

      ↓

Combine

      ↓

Strong Learner


Benefits:


Higher accuracy


Better generalization


Reduced overfitting




---


Types of Ensemble Learning


Bagging


Boosting



---


Bagging (Bootstrap Aggregating)


Multiple models are trained independently, each on a bootstrap sample (a random sample drawn with replacement) of the dataset.


Process:


Dataset

   ↓

Random Samples (with replacement)

   ↓

Many Models

   ↓

Voting/Average

   ↓

Final Prediction
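The process above can be sketched with very simple base models. This is an illustration only, assuming one-split classifiers (decision stumps) as the models and majority voting for the final prediction. All names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(points):
    """Train a one-split classifier on (x, label) pairs; return a predict function."""
    xs = sorted({x for x, _ in points})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for threshold in candidates:
        left = [label for x, label in points if x <= threshold]
        right = [label for x, label in points if x > threshold]
        # Each side predicts its majority label; an empty side borrows from the other.
        left_label = Counter(left or right).most_common(1)[0][0]
        right_label = Counter(right or left).most_common(1)[0][0]
        errors = sum(label != (left_label if x <= threshold else right_label)
                     for x, label in points)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def bagging_fit(points, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the dataset, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Final prediction by majority vote over all models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

data = [(1, "Fail"), (2, "Fail"), (3, "Fail"), (4, "Fail"),
        (6, "Pass"), (7, "Pass"), (8, "Pass"), (9, "Pass")]
models = bagging_fit(data)
print(bagging_predict(models, 2), bagging_predict(models, 8))
```

For regression, the vote in bagging_predict would be replaced by an average of the model outputs.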



---


Advantages of Bagging


Reduces variance


Reduces overfitting


Improves stability




---


Example


Random Forest


The best-known Bagging algorithm.


Random Forest:


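Uses many Decision Trees, each trained on a bootstrap sample and considering a random subset of features at each split


Final answer through voting (classification) or averaging (regression)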




---


Boosting


Boosting trains weak models sequentially, with each new model focusing on the mistakes of the previous ones.


Idea:


Model 1

 ↓

Fix Mistakes

 ↓

Model 2

 ↓

Fix Mistakes

 ↓

Model 3

 ↓

Final Strong Model
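The "fix mistakes" loop can be made concrete with a small AdaBoost-style sketch. It assumes labels of +1 and -1, decision stumps as the weak models, and the standard AdaBoost weight update. The function names and data are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Predict `polarity` when x <= threshold, otherwise the opposite sign."""
    return polarity if x <= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best decision stump."""
    distinct = sorted(set(xs))
    # min - 1 gives a stump that predicts the same class for every point.
    thresholds = [distinct[0] - 1] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_fit(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, then normalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps; the sign of the score is the class."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# No single stump can separate this pattern, but three combined stumps can.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(xs, ys)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions)
```

Each stump on its own misclassifies two of the six points. The weighted vote of three stumps classifies all of them correctly, which is the weak-to-strong idea in miniature.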



---


Advantages of Boosting


High accuracy


Handles complex problems


Improves weak learners




---


Popular Boosting Algorithms


AdaBoost


Adaptive Boosting. Increases the weights of misclassified samples so that the next model focuses on them.



---


Gradient Boosting


Each new model is fitted to the residual errors (the negative gradient of the loss) of the current ensemble.



---


XGBoost


One of the most widely used boosting libraries; an optimized, regularized implementation of gradient boosting.


Applications:


Data science competitions


Industry projects




---


Difference Between Bagging and Boosting


Bagging                        | Boosting
-------------------------------+-------------------------------
Models trained independently   | Models trained sequentially
Reduces variance               | Reduces bias
Faster                         | Slower
Random Forest                  | AdaBoost, XGBoost




---


Probability and Learning


Machine Learning often uses probability.


Probability helps:


Handle uncertainty


Make predictions


Estimate outcomes



Applications:


Spam filtering


Disease prediction


Recommendation systems




---


Data into Probabilities


Example:


80 students passed

20 students failed


Probability of passing:


80/100 = 0.8


or


80%
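The same calculation can be done for any set of outcomes by counting and dividing by the total. A minimal sketch, with the data built to match the example above:

```python
from collections import Counter

# 80 students passed and 20 failed, as in the example.
outcomes = ["Pass"] * 80 + ["Fail"] * 20

counts = Counter(outcomes)
total = sum(counts.values())
probabilities = {label: count / total for label, count in counts.items()}

print(probabilities)  # {'Pass': 0.8, 'Fail': 0.2}
```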



---


Basic Statistics


Statistics helps understand data.


Important terms:



---


Mean


Average value.


\bar{x}=\frac{\sum x}{n}



---


Median


Middle value after sorting (the average of the two middle values when the count is even).



---


Mode


Most frequent value.



---


Variance


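Measures the spread of data around the mean. Dividing by n gives the population variance; the sample variance divides by n - 1 instead.


\sigma^2=\frac{\sum (x-\bar{x})^2}{n}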
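These four measures are available in Python's standard statistics module. The sketch below uses hypothetical marks and also recomputes the variance directly from the formula as a check.

```python
import statistics

marks = [40, 50, 50, 60, 100]  # hypothetical exam marks

mean = statistics.mean(marks)
median = statistics.median(marks)
mode = statistics.mode(marks)
population_variance = statistics.pvariance(marks)  # divides by n
sample_variance = statistics.variance(marks)       # divides by n - 1

# The same population variance, computed directly from the formula.
manual_variance = sum((x - mean) ** 2 for x in marks) / len(marks)

print(mean, median, mode, population_variance, sample_variance, manual_variance)
```

For these marks the mean is 60, the median and mode are both 50, the population variance is 440, and the sample variance is 550.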



---


Gaussian Mixture Models (GMM)


Probabilistic clustering algorithm: each point gets a probability of belonging to each cluster (soft clustering).


Assumption: Data is generated from multiple Gaussian distributions.


Advantages:


Flexible (elliptical) cluster shapes


Often fits better than K-Means when clusters overlap or are not spherical



Applications:


Image processing


Speech recognition

Pattern recognition
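GMMs are usually fitted with the Expectation-Maximization (EM) algorithm. The sketch below is a simplified illustration, assuming one-dimensional data, exactly two Gaussian components, and a simple min/max initialization. The function names and the data are hypothetical.

```python
import math

def gaussian_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def fit_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]  # simple deterministic initialization
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor so the variance cannot collapse to zero
            )
    return weights, means, variances

# Two hypothetical groups of values, centred near 1 and near 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = fit_gmm_1d(data)
print(weights, means, variances)
```

The fitted means settle close to the centres of the two groups, and the responsibilities computed in the E-step are the soft cluster assignments.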




---

Nearest Neighbour Methods

One of the simplest ML techniques.

Most common:

K-Nearest Neighbour (KNN)


Idea:


Find the K closest data points (for example, by Euclidean distance) and assign the majority class among those neighbours.


Example:


New Student

     ↓

Find 5 nearest students

     ↓

Majority Vote

     ↓

Prediction
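The flow above can be sketched directly in code. This is a minimal illustration, assuming Euclidean distance and K = 5. The student data and the function name are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical students: (study hours, attendance %) -> result.
students = [
    ((2, 60), "Fail"), ((3, 65), "Fail"), ((1, 50), "Fail"), ((4, 70), "Fail"),
    ((7, 85), "Pass"), ((8, 90), "Pass"), ((6, 80), "Pass"), ((9, 95), "Pass"),
]

print(knn_predict(students, (7.5, 88)))  # Pass
print(knn_predict(students, (2.5, 62)))  # Fail
```

Attendance has a much larger numeric range than study hours here, so it dominates the distance. In practice the features are usually scaled to comparable ranges before KNN is applied.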

---

Advantages of KNN

Easy to understand

No explicit training phase (a lazy learner that simply stores the training data)

Good for small datasets

---

Disadvantages of KNN

Slow for large datasets

Sensitive to irrelevant features and to feature scale

Requires choosing K value

---

Applications of KNN

Recommendation systems

Image recognition

Medical diagnosis

Pattern recognition

---

Important Exam Questions

Short Questions

1. What is a Decision Tree?

2. Define CART.

3. What is Ensemble Learning?

4. Define Bagging.

5. Define Boosting.

6. What is Random Forest?

7. What is KNN?

8. What is GMM?

---

Long Questions

1. Explain Decision Tree construction.

2. Discuss CART with examples.

3. Explain Ensemble Learning.

4. Differentiate Bagging and Boosting.

5. Explain KNN algorithm.

6. Explain Gaussian Mixture Models.

---

Quick Revision

Decision Tree = Tree-based prediction model.

CART = Classification and Regression Trees.

Ensemble Learning = Combining multiple models.

Bagging = Independent model training on bootstrap samples, combined by voting or averaging.

Random Forest = Bagging-based algorithm.

Boosting = Sequential improvement of models.

KNN = Nearest neighbour classification.

GMM = Probabilistic clustering model based on a mixture of Gaussians.

Next Unit 5:


PCA, LDA, Factor Analysis, ICA, Isomap, Genetic Algorithms, Evolutionary Learning, Reinforcement Learning, Markov Decision Process (MDP) — the final unit of Machine Learning and often asked in theory exams. 
