Saturday, June 13, 2026

Unit 5: Dimensionality Reduction, Genetic Algorithms & Reinforcement Learning

 

Machine Learning Techniques (MCA556)


From your syllabus.


Dimensionality Reduction

In Machine Learning, datasets may contain many features (columns).

Example:

Student Data
------------
Name
Age
Gender
Address
Attendance
Marks
Projects
Activities
...

Too many features can:

  • Increase training time
  • Increase memory usage
  • Cause overfitting

Dimensionality Reduction reduces the number of features while preserving important information.


Benefits

  • Faster computation
  • Less storage
  • Better visualization
  • Reduced overfitting

Principal Component Analysis (PCA)

The most widely used dimensionality reduction technique.

Purpose:

  • Convert many features into fewer important features.

Idea:

  • Preserve maximum variance.
  • Reduce dimensions.

Example:

100 Features
     ↓
PCA
     ↓
10 Important Features

Applications:

  • Face Recognition
  • Image Compression
  • Data Visualization
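
The "preserve maximum variance" idea can be sketched in plain Python. This is only an illustration, not a production PCA: it finds just the first principal component using power iteration, and the function name, the toy data, and the starting vector are all assumptions made for the example. It also assumes the data has non-zero variance.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (means, direction, explained_variance_ratio) for the top component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return means, v, eigenvalue / total_variance

# Hypothetical data: the second feature is always twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
means, direction, ratio = first_principal_component(data)
# Reduce 2 features to 1 by projecting each centred row onto the direction.
reduced = [sum((row[j] - means[j]) * direction[j] for j in range(2)) for row in data]
```

Because the two features here are perfectly correlated, one component keeps essentially all of the variance, so dropping from 2 features to 1 loses almost nothing.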

Linear Discriminant Analysis (LDA)

Used for:

  • Dimensionality Reduction
  • Classification

Difference:

PCA                   LDA
------------------    --------------------------
Unsupervised          Supervised
Maximizes variance    Maximizes class separation

Applications:

  • Face recognition
  • Pattern classification

Factor Analysis

Statistical method used to identify hidden factors affecting data.

Example:

Student Performance depends on:

  • Intelligence
  • Study Hours
  • Motivation

Underlying variables like these, which are inferred from observed data (such as marks) rather than measured directly, are called factors.

Applications:

  • Psychology
  • Market Research
  • Social Sciences

Independent Component Analysis (ICA)

Separates mixed signals into independent components.

Example:

Two people speaking simultaneously.

ICA can separate:

Mixed Audio
      ↓
ICA
      ↓
Speaker 1
Speaker 2

Applications:

  • Signal Processing
  • Medical Data Analysis
  • Audio Separation

Locally Linear Embedding (LLE)

Non-linear dimensionality reduction technique.

Assumption: Each data point can be approximated as a linear combination of its nearest neighbours, and these local relationships are preserved after transformation.

Used when data lies on a curved surface.

Applications:

  • Pattern recognition
  • Data visualization

Isomap

Isomap = Isometric Mapping

Advanced dimensionality reduction technique.

Purpose:

  • Preserve geodesic distances (distances measured along the curved surface on which the data lies).

Applications:

  • Image analysis
  • Visualization
  • Pattern recognition

Least Squares Optimization

Used to minimize prediction error.

Idea:

Find the line that minimizes the sum of squared errors between actual and predicted values.

Linear Regression is based on Least Squares Optimization.
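
For a single input feature, the least squares line has a closed-form solution. The sketch below shows it in plain Python; the function name and the study-hours data are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: study hours vs. marks.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]
slope, intercept = fit_line(hours, marks)
predicted_for_6_hours = slope * 6 + intercept
```

On this toy data the fitted line is roughly marks = 5.6 × hours + 46.2.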


Evolutionary Learning

Inspired by biological evolution.

Key concepts:

  • Selection
  • Mutation
  • Crossover
  • Survival of the fittest

Used to solve optimization problems.


Genetic Algorithms (GA)

One of the most important evolutionary algorithms.

Inspired by natural selection.


Basic Terminology

Chromosome

A possible solution.


Population

Collection of chromosomes.


Fitness Function

Measures solution quality.

Higher fitness = Better solution.


Genetic Algorithm Steps

Step 1

Initialize Population

Generate random solutions.


Step 2

Evaluate Fitness

Check quality of each solution.


Step 3

Selection

Choose fitter solutions to act as parents.


Step 4

Crossover

Combine parents to create offspring.

Parent A + Parent B
          ↓
       Child

Step 5

Mutation

Randomly modify genes.

Purpose:

  • Maintain diversity

Step 6

Replacement

Create next generation.

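Repeat until a stopping condition is met (for example, a good enough solution is found or a maximum number of generations is reached). A GA does not guarantee the optimal solution.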
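
The six steps can be sketched on a toy problem. Everything specific here is an assumption chosen for illustration: the "OneMax" task (maximise the number of 1-bits in a bit string), tournament selection, single-point crossover, and keeping the best chromosome unchanged (elitism).

```python
import random

def run_ga(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    fitness = sum  # OneMax: fitness = number of 1-bits in the chromosome
    # Step 1: initialise a random population of bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Step 2: evaluate fitness (sorting puts the fittest first).
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: carry the best over unchanged
        while len(next_gen) < pop_size:
            # Step 3: selection (best of 3 randomly drawn chromosomes).
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # Step 4: single-point crossover.
            cut = rng.randint(1, length - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Step 5: mutation (flip each gene with a small probability).
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        # Step 6: replacement.
        population = next_gen
    return max(population, key=fitness), best_per_generation

best, history = run_ga()
```

Because the best chromosome is always carried over, the best fitness recorded per generation never decreases, though reaching the true optimum is still not guaranteed.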


Applications of Genetic Algorithms

  • Scheduling
  • Route Optimization
  • Machine Learning
  • Robotics
  • Engineering Design

Reinforcement Learning (RL)

Learning through rewards and punishments.

Agent learns by interacting with environment.


Components of RL

Agent

Learner.

Example: Robot


Environment

World around the agent.

Example: Road


Action

Decision taken by agent.

Example: Move Left


Reward

Feedback received.

Example:

Correct Action → +10
Wrong Action → -5

Reinforcement Learning Process

Agent
  ↓
Action
  ↓
Environment
  ↓
Reward
  ↓
Learning
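
One common way to implement this loop is Q-learning, which the notes do not name, so treat it as one possible choice. The sketch below uses a made-up corridor environment: positions 0 to 4, goal at position 4, reward +10 for reaching the goal and an assumed penalty of -1 for any other move.

```python
import random

N_STATES = 5          # positions 0..4 on a corridor; position 4 is the goal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    if next_state == N_STATES - 1:
        return next_state, 10.0, True    # reached the goal
    return next_state, -1.0, False       # small penalty for every other move

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Agent: explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice([LEFT, RIGHT])
            else:
                action = max((LEFT, RIGHT), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Learning: move the estimate towards reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
```

After training, the value of moving right next to the goal, `q_table[3][RIGHT]`, approaches the +10 reward and is higher than the value of moving left.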

Applications of Reinforcement Learning

  • Self-driving cars
  • Robotics
  • Game playing AI
  • Resource management

Markov Decision Process (MDP)

Mathematical framework for Reinforcement Learning.

An MDP contains:

  1. State (S)
  2. Action (A)
  3. Reward (R)
  4. Transition Probability (P)

Example of MDP

Robot Navigation:

State:
Current Position

Action:
Move Left / Right

Reward:
Reach Destination

Next State:
New Position
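
The robot example can be written down as a small MDP and solved with value iteration. This is a sketch under stated assumptions: four positions with the destination at position 3, a move that succeeds with probability 0.8, a reward of +10 for reaching the destination, and a discount factor of 0.9. None of these numbers come from the notes.

```python
GOAL = 3
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GAMMA = 0.9   # discount factor (an assumption; not listed in the notes)

def transitions(state, action):
    """Return (probability, next_state, reward) triples: the P and R of the MDP."""
    if state == GOAL:
        return [(1.0, GOAL, 0.0)]            # the destination is absorbing
    target = max(0, state - 1) if action == "left" else state + 1
    reward = 10.0 if target == GOAL else 0.0
    # The move succeeds with probability 0.8; otherwise the robot stays put.
    return [(0.8, target, reward), (0.2, state, 0.0)]

def q_value(values, state, action):
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in transitions(state, action))

def value_iteration(tolerance=1e-9):
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {s: max(q_value(values, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(new_values[s] - values[s]) for s in STATES) < tolerance:
            return new_values
        values = new_values

values = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: q_value(values, s, a))
          for s in STATES if s != GOAL}
```

The resulting policy moves right from every position, and positions closer to the destination have higher values.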

Markov Property

Future state depends only on the current state.

Not on previous history.


Important Exam Questions

Short Questions

  1. What is PCA?
  2. Define LDA.
  3. What is ICA?
  4. Define Isomap.
  5. What is Genetic Algorithm?
  6. What is Fitness Function?
  7. Define Reinforcement Learning.
  8. What is MDP?

Long Questions

  1. Explain PCA with advantages.
  2. Differentiate PCA and LDA.
  3. Explain Genetic Algorithm with steps.
  4. Discuss Evolutionary Learning.
  5. Explain Reinforcement Learning architecture.
  6. Explain Markov Decision Process.

Quick Revision

  • PCA = Reduce dimensions while preserving variance.
  • LDA = Reduce dimensions while separating classes.
  • ICA = Separate mixed signals.
  • Isomap = Preserve geometric structure.
  • GA = Optimization inspired by evolution.
  • Population = Collection of solutions.
  • Fitness Function = Quality measure.
  • RL = Learning through rewards.
  • MDP = Mathematical model for RL.
  • Markov Property = Future depends only on current state.


Unit 4: Decision Trees, CART, Ensemble Learning, Bagging, Boosting & Nearest Neighbour

 Machine Learning Techniques (MCA556)

From your syllabus. 

---

Learning with Trees

Decision Trees are one of the most popular machine learning algorithms.

They make decisions using a tree-like structure.

Example:

Study Hours?
   > 5 Hours → Pass
   < 5 Hours → Fail



---

Components of Decision Tree


Root Node


Starting point of the tree.


Example:


Study Hours?



---


Internal Node


Represents a condition.


Example:


Attendance > 75%?



---


Leaf Node


Final prediction.


Example:


Pass

Fail



---


Advantages of Decision Trees


  • Easy to understand
  • Easy to visualize
  • Works with numerical and categorical data
  • Requires little data preparation




---


Disadvantages


  • Can overfit
  • Sensitive to data changes
  • Large trees become complex




---


Constructing Decision Trees


Steps:


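  1. Select the best feature (the one whose split best separates the classes).
  2. Split the dataset on that feature.
  3. Create a branch for each outcome of the split.
  4. Repeat recursively on each branch.
  5. Stop when a node contains a single class or another stopping rule (such as a maximum depth) is reached.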
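
"Select the best feature" needs a way to score a split. One common score is Gini impurity, which is used here as an assumption since the notes do not name a criterion. The sketch finds the best threshold for one numeric feature; the data and function names are illustrative.

```python
def gini(labels):
    """Gini impurity: 0 means the node is pure (one class only)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try midpoints between sorted values; return (threshold, weighted_gini)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data in the spirit of the "Study Hours?" example.
study_hours = [1, 2, 3, 4, 6, 7, 8, 9]
results = ["Fail", "Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]
threshold, impurity = best_split(study_hours, results)
```

On this data the best threshold is 5 hours, and both resulting branches are pure, so the tree can stop splitting there.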





---


Classification and Regression Trees (CART)


CART stands for:


Classification And Regression Trees


Used for:


Classification


Output is a category.


Examples:


Pass/Fail


Spam/Not Spam




---


Regression


Output is a numerical value.


Examples:


Salary prediction


House price prediction




---


Ensemble Learning


Combining multiple models to create a stronger model.


Idea:


Weak Learners

      ↓

Combine

      ↓

Strong Learner


Benefits:


  • Higher accuracy
  • Better generalization
  • Reduced overfitting




---


Types of Ensemble Learning


Bagging


Boosting



---


Bagging (Bootstrap Aggregating)


Multiple models are trained independently.


Process:


Dataset

   ↓

Random Samples

   ↓

Many Models

   ↓

Voting/Average

   ↓

Final Prediction
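
The bagging pipeline can be sketched with a deliberately simple base model. Everything here is illustrative: the base model is a single threshold placed halfway between the two class means (not a full decision tree), labels are 0 for Fail and 1 for Pass, and the data is made up.

```python
import random
from collections import Counter

def train_stump(sample):
    """Base model: a threshold halfway between the two class means."""
    fails = [x for x, y in sample if y == 0]
    passes = [x for x, y in sample if y == 1]
    if not fails or not passes:            # bootstrap sample had one class only
        constant = 1 if passes else 0
        return lambda x: constant
    threshold = (sum(fails) / len(fails) + sum(passes) / len(passes)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_fit(data, n_models=25, seed=0):
    rng = random.Random(seed)
    # Each model sees its own bootstrap sample (drawn with replacement).
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def bagging_predict(models, x):
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]      # majority vote

# Hypothetical (study hours, result) pairs.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = bagging_fit(data)
prediction_for_9_hours = bagging_predict(models, 9)
prediction_for_1_hour = bagging_predict(models, 1)
```

Individual stumps differ because each saw a different random sample, but the majority vote is stable. That stability is the variance reduction bagging aims for.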



---


Advantages of Bagging


  • Reduces variance
  • Reduces overfitting
  • Improves stability




---


Example


Random Forest


Most famous Bagging algorithm.


Random Forest:


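  • Uses many Decision Trees, each trained on a random sample of the data and a random subset of features
  • Final answer through voting (classification) or averaging (regression)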




---


Boosting


Boosting improves weak models sequentially.


Idea:


Model 1

 ↓

Fix Mistakes

 ↓

Model 2

 ↓

Fix Mistakes

 ↓

Model 3

 ↓

Final Strong Model



---


Advantages of Boosting


  • High accuracy
  • Handles complex problems
  • Improves weak learners




---


Popular Boosting Algorithms


AdaBoost


Adaptive Boosting.



---


Gradient Boosting


Each new model is fitted to the errors (residuals) left by the models before it, so the overall loss is reduced step by step.



---


XGBoost


One of the most widely used gradient boosting implementations.


Applications:


  • Data science competitions
  • Industry projects




---


Difference Between Bagging and Boosting


Bagging                         Boosting
----------------------------    ---------------------------
Models trained independently    Models trained sequentially
Reduces variance                Reduces bias
Faster                          Slower
Random Forest                   AdaBoost, XGBoost




---


Probability and Learning


Machine Learning often uses probability.


Probability helps:


  • Handle uncertainty
  • Make predictions
  • Estimate outcomes

Applications:

  • Spam filtering
  • Disease prediction
  • Recommendation systems




---


Data into Probabilities


Example:


80 students passed

20 students failed


Probability of passing:


80/100 = 0.8


or


80%



---


Basic Statistics


Statistics helps understand data.


Important terms:



---


Mean


Average value.


\bar{x}=\frac{\sum x}{n}



---


Median


Middle value after sorting.



---


Mode


Most frequent value.



---


Variance


Measures spread of data.


Variance=\frac{\sum (x-\bar{x})^2}{n}
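
These four measures are available in Python's standard `statistics` module. The sketch below uses made-up marks. Because the variance formula above divides by n, the matching function is `pvariance` (population variance) rather than `variance`, which divides by n - 1.

```python
import statistics

# Hypothetical marks for eight students.
marks = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(marks)            # average value
median = statistics.median(marks)        # middle value after sorting
mode = statistics.mode(marks)            # most frequent value
variance = statistics.pvariance(marks)   # spread, dividing by n

# The same variance computed directly from the formula.
manual_variance = sum((x - mean) ** 2 for x in marks) / len(marks)
```

For this data the mean is 5, the median is 4.5, the mode is 4, and the variance is 4.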



---


Gaussian Mixture Models (GMM)


Probabilistic clustering algorithm: each point gets a probability of belonging to each cluster.


Assumption: Data is generated from multiple Gaussian distributions.


Advantages:


  • Flexible cluster shapes
  • Often fits better than K-Means when clusters are elliptical or overlap

Applications:

  • Image processing
  • Speech recognition
  • Pattern recognition




---

Nearest Neighbour Methods

One of the simplest ML techniques.

Most common:

K-Nearest Neighbour (KNN)


Idea:


Find the K closest data points and classify based on neighbors.


Example:


New Student

     ↓

Find 5 nearest students

     ↓

Majority Vote

     ↓

Prediction
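
The flow above can be sketched in a few lines. The student records, the two features (attendance % and study hours), and the function name are assumptions for illustration; distance is plain Euclidean distance via `math.dist` (Python 3.8+).

```python
import math
from collections import Counter

def knn_predict(training, query, k=5):
    """training: list of (features, label). Returns the majority label of the k nearest."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical students: (attendance %, study hours) -> result.
students = [
    ((90, 8), "Pass"), ((85, 7), "Pass"), ((80, 6), "Pass"), ((75, 6), "Pass"),
    ((60, 3), "Fail"), ((55, 2), "Fail"), ((50, 2), "Fail"), ((45, 1), "Fail"),
]
strong_student = knn_predict(students, (82, 7), k=5)
weak_student = knn_predict(students, (52, 2), k=5)
```

Note that attendance dominates the distance here because its numeric range is much larger than that of study hours. In practice features are usually scaled before applying KNN.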

---

Advantages of KNN

  • Easy to understand
  • No explicit training phase (it simply stores the data; all work happens at prediction time)
  • Good for small datasets

---

Disadvantages of KNN

  • Slow for large datasets
  • Sensitive to irrelevant features
  • Requires choosing K value

---

Applications of KNN

  • Recommendation systems
  • Image recognition
  • Medical diagnosis
  • Pattern recognition

---

Important Exam Questions

Short Questions

1. What is a Decision Tree?

2. Define CART.

3. What is Ensemble Learning?

4. Define Bagging.

5. Define Boosting.

6. What is Random Forest?

7. What is KNN?

8. What is GMM?

---

Long Questions

1. Explain Decision Tree construction.

2. Discuss CART with examples.

3. Explain Ensemble Learning.

4. Differentiate Bagging and Boosting.

5. Explain KNN algorithm.

6. Explain Gaussian Mixture Models.

---

Quick Revision

  • Decision Tree = Tree-based prediction model.
  • CART = Classification and Regression Trees.
  • Ensemble Learning = Combining multiple models.
  • Bagging = Independent model training.
  • Random Forest = Bagging-based algorithm.
  • Boosting = Sequential improvement of models.
  • KNN = Nearest neighbour classification.
  • GMM = Advanced clustering model.

Next Unit 5:


PCA, LDA, Factor Analysis, ICA, Isomap, Genetic Algorithms, Evolutionary Learning, Reinforcement Learning, Markov Decision Process (MDP) — the final unit of Machine Learning and often asked in theory exams. 

Unit 3: Logistic Regression, SVM, Neural Networks & Deep Learning

 

Machine Learning Techniques (MCA556)


From your syllabus.


Supervised Learning

In supervised learning:

  • Input data is given
  • Correct output (label) is known
  • Model learns relationship between input and output

Examples:

  • Spam detection
  • Disease prediction
  • Student result prediction

Logistic Regression

Used for classification problems.

Unlike Linear Regression, which predicts a continuous value, Logistic Regression predicts the probability of a category.

Examples:

  • Pass / Fail
  • Spam / Not Spam
  • Yes / No

Sigmoid Function

Logistic Regression uses the Sigmoid Function.

Output range:

0 to 1

Interpretation:

  • Close to 1 → Positive Class
  • Close to 0 → Negative Class
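
The sigmoid function is σ(z) = 1 / (1 + e^(-z)), where z is a weighted sum of the inputs. A minimal sketch of a logistic regression prediction follows. The weight and bias are hand-picked for illustration, not learned from data.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    label = "Pass" if probability >= threshold else "Fail"
    return probability, label

# Hypothetical model with one feature (study hours).
weights, bias = [0.8], -4.0
p_high, label_high = predict([8], weights, bias)   # z is about +2.4
p_low, label_low = predict([2], weights, bias)     # z is about -2.4
```

With these numbers, 8 study hours gives a probability above 0.9 (Pass), while 2 hours gives a probability below 0.1 (Fail).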

Applications of Logistic Regression

  • Email spam detection
  • Disease diagnosis
  • Loan approval
  • Fraud detection

Support Vector Machine (SVM)

SVM is a powerful classification algorithm.

Goal:

  • Find the best boundary that separates classes.

Example:

Students
Pass  ● ● ● ●

-----------
Boundary

○ ○ ○ ○
Fail

The boundary is called a:

Hyperplane


Advantages of SVM

  • High accuracy
  • Effective in high dimensions
  • Works well with small datasets

Kernel Function

Sometimes data cannot be separated by a straight line.

Kernel functions implicitly map data into a higher-dimensional space, where a linear boundary can separate the classes.

Types:

Linear Kernel

Used for linearly separable data.


Polynomial Kernel

Creates curved boundaries.


Radial Basis Function (RBF)

Most commonly used kernel.


Sigmoid Kernel

Based on the tanh function, so it behaves like the activation of a neuron in a neural network.
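
Each kernel is just a function that scores how similar two points are. The sketch below uses the standard textbook forms; the parameter names and default values (`degree`, `coef0`, `gamma`) are assumptions chosen for illustration.

```python
import math

def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree: allows curved boundaries."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * squared distance): 1 for identical points, near 0 for distant ones."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    """tanh(gamma * x . y + coef0)."""
    return math.tanh(gamma * linear_kernel(x, y) + coef0)

a, b = [1.0, 2.0], [3.0, 4.0]
similarities = {
    "linear": linear_kernel(a, b),
    "polynomial": polynomial_kernel(a, b),
    "rbf": rbf_kernel(a, b),
    "sigmoid": sigmoid_kernel(a, b),
}
```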


Neural Network

Inspired by the human brain.

Consists of:

Input Layer
      ↓
Hidden Layer
      ↓
Output Layer

Used for:

  • Classification
  • Prediction
  • Pattern recognition

Artificial Neuron

Basic building block of neural networks.

Components:

  1. Inputs
  2. Weights
  3. Summation
  4. Activation Function
  5. Output

Perceptron

Simplest neural network model.

Developed by: Frank Rosenblatt

Structure:

Inputs
  ↓
Weights
  ↓
Activation
  ↓
Output

Used for binary classification.


Limitations of Perceptron

Cannot solve complex non-linear problems.

Example:

  • XOR problem
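
The limitation can be seen by training the same perceptron on AND (linearly separable) and on XOR (not linearly separable). This is a minimal sketch using the classic perceptron learning rule with a step activation; the function names, learning rate, and epoch count are illustrative choices.

```python
def perceptron_output(weights, bias, inputs):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0          # step activation

def train_perceptron(samples, epochs=20, lr=1):
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(weights, bias, inputs)
            # Perceptron rule: nudge weights only when the prediction is wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]
XOR_TARGETS = [0, 1, 1, 0]

w_and, b_and = train_perceptron(list(zip(INPUTS, AND_TARGETS)))
w_xor, b_xor = train_perceptron(list(zip(INPUTS, XOR_TARGETS)))
and_predictions = [perceptron_output(w_and, b_and, x) for x in INPUTS]
xor_predictions = [perceptron_output(w_xor, b_xor, x) for x in INPUTS]
```

The AND predictions match their targets, but the XOR predictions cannot all be right, because no single straight line separates the XOR classes.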

Multilayer Neural Network

Contains one or more hidden layers between the input and output layers.

Input
 ↓
Hidden Layer 1
 ↓
Hidden Layer 2
 ↓
Output

Advantages:

  • Handles complex patterns
  • Better prediction

Backpropagation

Most important neural network learning algorithm.

Purpose:

  • Update weights
  • Reduce prediction error

Steps:

Step 1

Forward Pass

Prediction is generated.


Step 2

Calculate Error

Difference between actual and predicted values.


Step 3

Backward Pass

The error gradient is propagated backward through the layers using the chain rule.


Step 4

Update Weights

Weights are adjusted in the direction that reduces the error, so the model improves.
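
The four steps can be traced on the smallest possible network: one input, one hidden neuron, and one output neuron, all using sigmoid. The starting weights, learning rate, squared-error loss, and single training example are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, target, lr=0.5):
    w1, b1, w2, b2 = params
    # Step 1: forward pass.
    h = sigmoid(w1 * x + b1)              # hidden neuron
    y_hat = sigmoid(w2 * h + b2)          # output neuron
    # Step 2: calculate error (squared loss).
    loss = 0.5 * (y_hat - target) ** 2
    # Step 3: backward pass (chain rule, output layer first).
    d_z2 = (y_hat - target) * y_hat * (1 - y_hat)
    d_w2, d_b2 = d_z2 * h, d_z2
    d_z1 = d_z2 * w2 * h * (1 - h)
    d_w1, d_b1 = d_z1 * x, d_z1
    # Step 4: update weights (gradient descent).
    new_params = (w1 - lr * d_w1, b1 - lr * d_b1, w2 - lr * d_w2, b2 - lr * d_b2)
    return new_params, loss

params = (0.5, 0.0, 0.5, 0.0)             # hypothetical starting values
losses = []
for _ in range(50):
    params, loss = train_step(params, x=1.0, target=1.0)
    losses.append(loss)
```

In this run the recorded loss shrinks at every step, which is the "reduce prediction error" purpose stated above.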


Activation Functions

Used to introduce non-linearity.

Sigmoid

Output between 0 and 1.


Tanh

Output between -1 and 1.


ReLU

Most popular activation function.

Advantages:

  • Fast
  • Efficient

Deep Neural Network (DNN)

Neural network with many hidden layers.

Input
 ↓
Hidden Layer
 ↓
Hidden Layer
 ↓
Hidden Layer
 ↓
Output

Deep Learning

Branch of Machine Learning using Deep Neural Networks.

Applications:

  • Image Recognition: face detection
  • Speech Recognition: voice assistants
  • Natural Language Processing: ChatGPT, translation systems
  • Self-Driving Cars: object detection


Difference Between ML and Deep Learning

Machine Learning             Deep Learning
-------------------------    ----------------------------
Less data needed             Large data needed
Faster training              Slower training
Manual feature extraction    Automatic feature extraction
Simpler models               Complex neural networks

Important Exam Questions

Short Questions

  1. Define Logistic Regression.
  2. What is SVM?
  3. Define Hyperplane.
  4. What is a Kernel Function?
  5. Define Perceptron.
  6. What is Backpropagation?
  7. What is Deep Learning?
  8. What is ReLU?

Long Questions

  1. Explain Logistic Regression with Sigmoid Function.
  2. Explain SVM and Kernel Functions.
  3. Discuss Neural Networks and their architecture.
  4. Explain Perceptron and its limitations.
  5. Explain Backpropagation Algorithm.
  6. Differentiate Machine Learning and Deep Learning.

Quick Revision

  • Logistic Regression = Classification algorithm.
  • Sigmoid Function = Converts output to probability.
  • SVM = Finds best separating boundary.
  • Hyperplane = Decision boundary.
  • Kernel = Converts data to higher dimensions.
  • Perceptron = Basic neural network.
  • Backpropagation = Weight update algorithm.
  • ReLU = Most popular activation function.
  • Deep Learning = Neural networks with many layers.

Next Unit 4:

Decision Trees, CART, Ensemble Learning, Bagging, Boosting, Probability & Learning, Gaussian Mixture Models, Nearest Neighbour Methods. This unit is frequently asked in university exams.

Unit 2: Evaluation Metrics, K-Means, Bayes Learning, Clustering & Feature Reduction

 



From your syllabus.


Evaluation Metrics

Evaluation metrics help us measure how good a machine learning model is.


Confusion Matrix

Used for classification problems.

Actual / Predicted    Positive    Negative
------------------    --------    --------
Positive              TP          FN
Negative              FP          TN

Where:

  • TP = True Positive
  • TN = True Negative
  • FP = False Positive
  • FN = False Negative

Precision

Measures how many predicted positives are actually correct.

Precision = TP / (TP + FP)

Example: If 100 emails are predicted as spam and 90 are actually spam:

Precision = 90%


Recall

Measures how many actual positives were correctly identified.

Recall = TP / (TP + FN)

Example: Out of 100 spam emails, if system detects 80:

Recall = 80%


F1 Score

Harmonic mean of Precision and Recall.

F1 = 2 × Precision × Recall / (Precision + Recall)

Higher F1 Score means better model.
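
All three metrics follow directly from the confusion matrix counts. A small sketch with made-up counts for a spam filter:

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 90 spam caught, 10 false alarms, 30 spam missed.
precision, recall, f1 = classification_metrics(tp=90, fp=10, fn=30)
```

Here precision is 0.9 and recall is 0.75. The F1 score of about 0.82 sits between them, closer to the lower value.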


Mean Squared Error (MSE)

Used in regression models.

Measures average squared prediction error.

Smaller MSE = Better model.
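
MSE is the average of the squared differences between actual and predicted values. A minimal sketch with made-up numbers:

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values: the squared errors are 1, 0 and 4.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

Squaring makes the largest miss (off by 2) count four times as much as the smallest (off by 1), so MSE punishes big errors heavily.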


Flexibility vs Interpretability

Flexible Models

Examples:

  • Neural Networks
  • Deep Learning

Advantages:

  • High accuracy

Disadvantages:

  • Hard to understand

Interpretable Models

Examples:

  • Linear Regression
  • Decision Trees

Advantages:

  • Easy to understand

Disadvantages:

  • Sometimes less accurate

Reducible and Irreducible Error

Reducible Error

Can be reduced by:

  • Better data
  • Better algorithms

Irreducible Error

Cannot be eliminated.

Caused by:

  • Randomness
  • Noise in data

Unsupervised Learning

Learning from unlabeled data.

Goal:

  • Discover hidden patterns

K-Means Clustering

Most important clustering algorithm.

Purpose:

  • Divide data into K groups.

Steps

  1. Choose the number of clusters, K.
  2. Choose initial centroids.
  3. Assign points to nearest centroid.
  4. Update centroid positions.
  5. Repeat until stable.

Example:

Students grouped by marks:
Cluster 1 → High Performers
Cluster 2 → Average
Cluster 3 → Low Performers
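
The steps can be sketched for one-dimensional data such as marks. The marks and starting centroids below are made up. The starting centroids are passed in explicitly to keep the run repeatable, whereas real implementations usually pick them at random.

```python
def k_means_1d(points, centroids, max_iterations=100):
    centroids = list(centroids)           # steps 1-2: K and the initial centroids are given
    for _ in range(max_iterations):
        # Step 3: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 4: move each centroid to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

marks = [35, 40, 45, 60, 65, 70, 90, 95]
centroids, clusters = k_means_1d(marks, centroids=[30, 65, 100])
```

The run settles on three groups (low, average, and high performers) with centroids at 40, 65 and 92.5.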

Advantages:

  • Simple
  • Fast

Disadvantages:

  • Need to choose K beforehand

Vector Quantization

Technique for compressing data.

Applications:

  • Image compression
  • Signal processing

Self Organizing Feature Map (SOFM)

Neural network used for:

  • Visualization
  • Clustering
  • Pattern recognition

Developed by: Teuvo Kohonen

Also called: Kohonen Map


Instance Based Learning

Stores training examples and compares new examples.

Example:

  • K-Nearest Neighbour (KNN)

Advantages:

  • Simple

Disadvantages:

  • Slow for large datasets

Feature Reduction

Reducing the number of features while keeping important information.

Benefits:

  • Faster training
  • Reduced storage
  • Less overfitting

Probability in Machine Learning

Probability measures uncertainty.

Range:

0 ≤ Probability ≤ 1
  • 0 = Impossible
  • 1 = Certain

Bayes Learning

Based on Bayes Theorem:

P(A|B) = P(B|A) × P(A) / P(B)

Most important probability concept in ML.

Used in:

  • Spam detection
  • Disease prediction
  • Recommendation systems
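
A small spam detection calculation shows how Bayes Theorem updates a probability after seeing evidence. All the probabilities below are made up for illustration.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    # P(E) by total probability: evidence can occur with or without H.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of emails are spam; the word "offer" appears
# in 50% of spam emails and in 5% of non-spam emails.
p_spam_given_word = posterior(prior=0.2, likelihood=0.5, likelihood_given_not=0.05)
```

Seeing the word raises the spam probability from the 20% prior to about 71%.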

Clustering

Grouping similar data points.

Applications:

  • Customer segmentation
  • Image processing
  • Market analysis

Adaptive Hierarchical Clustering

Creates clusters in tree form.

Types:

Agglomerative

Start with individual points and merge.

Divisive

Start with one cluster and split.


Gaussian Mixture Model (GMM)

Advanced clustering technique.

Assumes data is generated from multiple Gaussian distributions.

Advantages:

  • Flexible clusters
  • Better than K-Means for complex data

Applications:

  • Pattern recognition
  • Speech processing
  • Image segmentation

Important Exam Questions

Short Questions

  1. Define Precision.
  2. Define Recall.
  3. What is F1 Score?
  4. What is MSE?
  5. What is K-Means?
  6. What is Feature Reduction?
  7. State Bayes Theorem.
  8. What is GMM?

Long Questions

  1. Explain Precision, Recall and F1 Score.
  2. Explain K-Means Clustering with steps.
  3. Discuss Bayes Learning.
  4. Explain Gaussian Mixture Models.
  5. Explain Feature Reduction.
  6. Compare K-Means and Hierarchical Clustering.

Quick Revision

  • Precision = Correct positive predictions.
  • Recall = Found actual positives.
  • F1 Score = Balance of Precision and Recall.
  • MSE = Regression error measure.
  • K-Means = Popular clustering algorithm.
  • Bayes Theorem = Probability-based learning.
  • GMM = Advanced clustering method.
  • Feature Reduction = Fewer but important features.

Next Unit 3:

Logistic Regression, Support Vector Machine (SVM), Kernel Functions, Perceptron, Neural Networks, Backpropagation, Deep Neural Networks — the most important ML unit for exams and interviews.