Topics from the syllabus.
Evaluation Metrics
Evaluation metrics measure how well a machine learning model performs.
Confusion Matrix
A table comparing actual and predicted classes. Used to evaluate classification models.
| Actual / Predicted | Positive | Negative |
|---|---|---|
| Positive | TP | FN |
| Negative | FP | TN |
Where:
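- TP = True Positive (actually positive, predicted positive)
- TN = True Negative (actually negative, predicted negative)
- FP = False Positive (actually negative, predicted positive)
- FN = False Negative (actually positive, predicted negative)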
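To see how the four cells are filled, here is a minimal sketch that counts them from two label lists. It assumes a binary problem where the label `1` (spam) is the positive class; the function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN) for a binary problem.

    `positive` is the label treated as the positive class
    (an assumption: here 1 = spam, 0 = not spam).
    """
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual positive, predicted positive
        elif a == positive:
            fn += 1  # actual positive, predicted negative
        elif p == positive:
            fp += 1  # actual negative, predicted positive
        else:
            tn += 1  # actual negative, predicted negative
    return tp, fn, fp, tn


# Hypothetical labels: 1 = spam, 0 = not spam
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 3)
```

The tuple follows the table row by row: TP and FN in the "actual positive" row, then FP and TN in the "actual negative" row.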
Precision
Measures how many predicted positives are actually correct.
Precision = TP / (TP + FP)
Example: If 100 emails are predicted as spam and 90 of them are actually spam:
Precision = 90 / 100 = 90%
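The spam example works out as follows in a small sketch. Returning 0.0 when nothing was predicted positive is an assumed convention, not part of the definition.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Email example: 100 predicted spam, 90 really spam -> TP = 90, FP = 10
print(precision(tp=90, fp=10))  # 0.9
```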
Recall
Measures how many actual positives were correctly identified.
Recall = TP / (TP + FN)
Example: Out of 100 actual spam emails, if the system detects 80:
Recall = 80 / 100 = 80%
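The same idea for Recall, as a minimal sketch. The only difference from Precision is the denominator: the actual positives (TP + FN) instead of the predicted positives. Returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives (assumed convention)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Email example: 100 actual spam emails, 80 detected -> TP = 80, FN = 20
print(recall(tp=80, fn=20))  # 0.8
```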
F1 Score
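Harmonic mean of Precision and Recall, so it is high only when both are high.
F1 = 2 × Precision × Recall / (Precision + Recall)
Ranges from 0 to 1; a higher F1 Score means a better model.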
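A short sketch of the F1 formula. The input values reuse the two email examples (Precision 0.9, Recall 0.8) as if one model had both, which is a hypothetical combination.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical model with Precision = 0.9 and Recall = 0.8
print(round(f1_score(0.9, 0.8), 3))  # 0.847
```

Because it is a harmonic mean, F1 punishes imbalance: `f1_score(1.0, 0.1)` is about 0.18, far below the simple average of 0.55.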
Mean Squared Error (MSE)
Used in regression models.
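Measures the average squared difference between actual and predicted values:
MSE = (1/n) × Σ (yᵢ - ŷᵢ)², where yᵢ is the actual value and ŷᵢ the predicted value for each of the n examples.
Smaller MSE = better model (predictions closer to the actual values).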
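A minimal sketch of the MSE formula, comparing two hypothetical models on the same made-up actual values.

```python
def mean_squared_error(actual, predicted):
    """MSE = (1/n) * sum of (actual - predicted)^2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical actual values and two models' predictions
actual = [50, 60, 70, 80]
model_a = [52, 58, 71, 79]  # errors: -2, 2, -1, 1
model_b = [55, 65, 60, 90]  # errors: -5, -5, 10, -10
print(mean_squared_error(actual, model_a))  # 2.5
print(mean_squared_error(actual, model_b))  # 62.5
```

Model A has the smaller MSE, so it is the better model here. Squaring also makes the two large errors of model B (10 and -10) dominate its score.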
Flexibility vs Interpretability
Flexible Models
Examples:
- Neural Networks
- Deep Learning
Advantages:
- Can capture complex patterns, often giving high accuracy
Disadvantages:
- Hard to interpret (it is not obvious why a prediction was made)
Interpretable Models
Examples:
- Linear Regression
- Decision Trees
Advantages:
- Easy to understand
Disadvantages:
- Sometimes less accurate on complex, non-linear data
Reducible and Irreducible Error
Reducible Error
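Error caused by the model's estimate differing from the true relationship.
Can be reduced by:
- Better data (more or cleaner training data)
- Better algorithms (a model that fits the true relationship more closely)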
Irreducible Error
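Cannot be eliminated by any model, however good.
Caused by:
- Randomness in the process being modelled
- Noise in data (e.g. measurement errors, factors that were not recorded)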
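A small simulation can make the split concrete. It is a sketch under stated assumptions: the "true" relationship `2 * x` and the noise level are invented, and we only know them because we generate the data ourselves. Even a model equal to the true function keeps an MSE of about 1.0 (the noise variance, i.e. the irreducible error); a worse model adds reducible error on top.

```python
import random

random.seed(0)


def true_f(x):
    # Hypothetical true relationship, known only because we simulate the data
    return 2 * x


n = 100_000
noise_sd = 1.0  # irreducible noise, variance = 1.0
xs = [random.uniform(0, 1) for _ in range(n)]
ys = [true_f(x) + random.gauss(0, noise_sd) for x in xs]


def mse(model):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / n


mse_perfect = mse(true_f)    # only irreducible error is left
mse_poor = mse(lambda x: x)  # adds reducible error E[x^2] = 1/3
print(round(mse_perfect, 2), round(mse_poor, 2))
```

The two values should come out close to 1.0 and 1.33. Improving the model can remove the extra 0.33 but never the 1.0.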
Unsupervised Learning
Learning from unlabeled data.
Goal:
- Discover hidden patterns
K-Means Clustering
One of the most widely used clustering algorithms.
Purpose:
- Divide data into K groups (clusters), each represented by its centroid (the mean of its points).
Steps
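- Choose the number of clusters K.
- Choose K initial centroids (for example, K randomly selected data points).
- Assign each point to its nearest centroid.
- Move each centroid to the mean of the points assigned to it.
- Repeat the assign and update steps until the centroids stop changing.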
Example:
Students grouped by marks:
Cluster 1 → High Performers
Cluster 2 → Average
Cluster 3 → Low Performers
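The steps above can be sketched for the marks example in plain Python. The marks are hypothetical, and the initial centroids are K evenly spaced values from the sorted data so the result is repeatable; that choice is an assumption, since random starting points are more common in practice.

```python
def kmeans_1d(values, k, max_iter=100):
    """Minimal K-Means for 1-D data such as marks (assumes 1 <= k <= len(values))."""
    data = sorted(values)
    n = len(data)
    # Initial centroids: K evenly spaced values from the sorted data (assumption)
    centroids = [float(data[i * (n - 1) // max(k - 1, 1)]) for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its points (an empty cluster keeps its centroid)
        new_centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        # Stop when the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


marks = [35, 40, 42, 55, 58, 60, 62, 85, 88, 92, 95]  # hypothetical marks
centroids, clusters = kmeans_1d(marks, k=3)
print(centroids)  # [39.0, 58.75, 90.0]
print(clusters)   # [[35, 40, 42], [55, 58, 60, 62], [85, 88, 92, 95]]
```

The three clusters correspond to Low Performers, Average, and High Performers. Note that K = 3 had to be chosen beforehand.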
Advantages:
- Simple
- Fast
Disadvantages:
- Need to choose K beforehand
Vector Quantization
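Technique for compressing data: each data vector is replaced by the nearest vector from a small set of representative vectors (the codebook), so only the codebook index needs to be stored.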
Applications:
- Image compression
- Signal processing
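A minimal sketch of encoding and decoding with a fixed codebook, using colour compression as the example. The codebook and pixel values are hypothetical; in practice the codebook is usually learned from the data (for example with K-Means).

```python
def nearest_index(vector, codebook):
    """Index of the codebook vector closest (squared Euclidean distance) to `vector`."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
    )


def vq_encode(vectors, codebook):
    # Compression: store one small integer per vector instead of the vector itself
    return [nearest_index(v, codebook) for v in vectors]


def vq_decode(indices, codebook):
    # Reconstruction: look each index up in the codebook (lossy)
    return [codebook[i] for i in indices]


# Hypothetical codebook of 3 RGB colours
codebook = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]
pixels = [(10, 5, 0), (240, 20, 10), (250, 250, 245), (30, 0, 20)]
codes = vq_encode(pixels, codebook)
print(codes)  # [0, 1, 2, 0]
print(vq_decode(codes, codebook))
```

The decoded pixels are close to, but not exactly, the originals. That small loss is the price of the compression.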
Self Organizing Feature Map (SOFM)
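Unsupervised neural network that maps high-dimensional data onto a low-dimensional (usually 2-D) grid of neurons, keeping similar inputs close together on the grid. Used for: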
- Visualization
- Clustering
- Pattern recognition
Developed by: Teuvo Kohonen
Also called: Kohonen Map
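A minimal sketch of Kohonen-style training on a 1-D grid of nodes with scalar inputs. The node count, learning rate, neighbourhood width, and their linear decay are all assumptions chosen for illustration. For each input, the best matching unit (BMU) is found, and then the BMU and its grid neighbours are pulled toward the input.

```python
import math
import random


def train_som_1d(data, n_nodes=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 1-D self-organizing map for scalar inputs (all parameters are assumptions)."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    # Random initial weights inside the data range
    weights = [rng.uniform(lo, hi) for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood width shrink over time
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for x in data:
            # Best matching unit: the node whose weight is closest to x
            bmu = min(range(n_nodes), key=lambda j: abs(x - weights[j]))
            for j in range(n_nodes):
                # Gaussian neighbourhood: nodes near the BMU on the grid move more
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                weights[j] += lr * h * (x - weights[j])
    return weights


data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]  # hypothetical inputs
weights = train_som_1d(data)
print([round(w, 2) for w in weights])
```

Moving neighbours along with the BMU is what distinguishes this from plain clustering: nodes that sit next to each other on the grid tend to end up representing similar inputs.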
Instance Based Learning
Stores the training examples and classifies a new example by comparing it with the stored ones (no explicit model is built during training).
Example:
- K-Nearest Neighbour (KNN)
Advantages:
- Simple
Disadvantages:
- Slow for large datasets (each new example is compared with every stored example)
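A minimal KNN sketch with Euclidean distance and a majority vote. The student data (marks in two tests) and the "high"/"low" labels are hypothetical. "Training" is just keeping the lists, and all the work happens at prediction time, which is why KNN is slow on large datasets.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # Compare the query with every stored example
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical students: (test 1 marks, test 2 marks)
train_points = [(85, 90), (80, 85), (90, 95), (40, 35), (35, 45), (45, 40)]
train_labels = ["high", "high", "high", "low", "low", "low"]
print(knn_predict(train_points, train_labels, (82, 88)))  # high
print(knn_predict(train_points, train_labels, (38, 42)))  # low
```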
Feature Reduction
Reducing the number of features while keeping important information.
Benefits:
- Faster training
- Reduced storage
- Less overfitting
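One of the simplest ways to reduce features is to drop columns that barely change, since they carry almost no information. The sketch below is that specific technique (a variance threshold), not the only form of feature reduction. The column names, data, and threshold are hypothetical.

```python
def variance(column):
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)


def drop_low_variance(rows, names, threshold=0.01):
    """Keep only the features (columns) whose variance exceeds `threshold`."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    keep = [i for i, col in enumerate(columns) if variance(col) > threshold]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    return reduced_rows, [names[i] for i in keep]


# Hypothetical student data: "school_code" is the same for everyone,
# so it cannot help tell students apart.
names = ["marks", "attendance", "school_code"]
rows = [
    [78, 90, 7],
    [45, 60, 7],
    [92, 95, 7],
    [60, 70, 7],
]
reduced, kept = drop_low_variance(rows, names)
print(kept)     # ['marks', 'attendance']
print(reduced)  # [[78, 90], [45, 60], [92, 95], [60, 70]]
```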
Probability in Machine Learning
Probability measures uncertainty.
Range:
0 ≤ Probability ≤ 1
- 0 = Impossible
- 1 = Certain
Bayes Learning
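Based on Bayes' Theorem:
P(A|B) = P(B|A) × P(A) / P(B)
where P(A|B) is the probability of A given that B has happened.
One of the most important probability concepts in ML.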
Used in:
- Spam detection
- Disease prediction
- Recommendation systems
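A worked disease-prediction example of Bayes' Theorem. All the numbers are hypothetical: 1% of people have the disease, the test detects 99% of cases, and it wrongly flags 5% of healthy people. P(positive) in the denominator comes from the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem.

    prior               = P(disease)
    likelihood          = P(positive | disease)
    false_positive_rate = P(positive | no disease)
    """
    # P(positive) = P(pos | disease) P(disease) + P(pos | no disease) P(no disease)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(p, 3))  # 0.167
```

Even after a positive test, the chance of disease is only about 1 in 6, because the disease is rare (small prior). This is the kind of reasoning Bayes Learning builds on.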
Clustering
Grouping similar data points.
Applications:
- Customer segmentation
- Image processing
- Market analysis
Adaptive Hierarchical Clustering
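Builds a hierarchy of clusters in tree form (a dendrogram).
Types:
Agglomerative
Start with each point as its own cluster and repeatedly merge the two closest clusters.
Divisive
Start with all points in one cluster and repeatedly split clusters.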
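A minimal sketch of the agglomerative (bottom-up) approach on 1-D data. It assumes single linkage (the distance between two clusters is the distance between their closest members), which is one of several possible choices. The recorded merges, read in order, form the tree.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage (the linkage choice is an assumption)."""
    # Start: every point is its own cluster
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance), from the bottom of the tree upward
    while len(clusters) > n_clusters:
        best = None
        # Find the two clusters whose closest members are nearest
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge cluster j into cluster i
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


marks = [35, 40, 42, 55, 58, 85, 88]  # hypothetical 1-D data
clusters, merges = agglomerative(marks, n_clusters=3)
print(clusters)  # [[35, 40, 42], [55, 58], [85, 88]]
```

Stopping at a different `n_clusters` corresponds to cutting the tree at a different height, so K does not have to be fixed before building the hierarchy.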
Gaussian Mixture Model (GMM)
Advanced clustering technique.
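Assumes the data is generated from a mixture of several Gaussian (normal) distributions, one per cluster.
Each point gets a probability of belonging to each cluster (soft clustering), whereas K-Means assigns each point to exactly one cluster.
Advantages:
- Flexible cluster shapes and sizes
- Often better than K-Means for overlapping or non-round clusters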
Applications:
- Pattern recognition
- Speech processing
- Image segmentation
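GMMs are usually fitted with the Expectation-Maximization (EM) algorithm. Below is a minimal 1-D sketch. The starting variances of 1, the iteration count, and the synthetic data are assumptions, and there is no handling of the edge cases a real library covers.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(data, means, n_iter=100):
    """EM for a 1-D Gaussian mixture; `means` are the initial guesses, one per component."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k      # assumption: every variance starts at 1
    weights = [1.0 / k] * k    # mixing proportions
    for _ in range(n_iter):
        # E-step: resp[i][j] = P(component j | point i), the soft assignment
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            sq = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(sq / nj, 1e-6)  # floor avoids a collapsed variance
            weights[j] = nj / len(data)
    return means, variances, weights


# Synthetic data: 300 points around 0 and 300 around 5 (both standard deviation 1)
rng = random.Random(42)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(5, 1) for _ in range(300)]
means, variances, weights = gmm_1d(data, means=[1.0, 4.0])
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The fitted means should land near 0 and 5, with mixing weights near 0.5 each. Unlike K-Means, every point keeps a probability for both components instead of one hard label.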
Important Exam Questions
Short Questions
- Define Precision.
- Define Recall.
- What is F1 Score?
- What is MSE?
- What is K-Means?
- What is Feature Reduction?
- State Bayes Theorem.
- What is GMM?
Long Questions
- Explain Precision, Recall and F1 Score.
- Explain K-Means Clustering with steps.
- Discuss Bayes Learning.
- Explain Gaussian Mixture Models.
- Explain Feature Reduction.
- Compare K-Means and Hierarchical Clustering.
Quick Revision
- Precision = TP / (TP + FP): fraction of positive predictions that are correct.
- Recall = TP / (TP + FN): fraction of actual positives that were found.
- F1 Score = Balance of Precision and Recall.
- MSE = Regression error measure.
- K-Means = Popular clustering algorithm.
- Bayes Theorem = Probability-based learning.
- GMM = Advanced clustering method.
- Feature Reduction = Fewer but important features.
Next Unit 3:
Logistic Regression, Support Vector Machine (SVM), Kernel Functions, Perceptron, Neural Networks, Backpropagation, Deep Neural Networks. This is the most important ML unit for exams and interviews.