Course Outline
Machine Learning Introduction
- Types of machine learning – supervised vs unsupervised
- From statistical learning to machine learning
- The data mining workflow: business understanding, data preparation, modeling, deployment
- Choosing the right algorithm for the task
- Overfitting and the bias-variance tradeoff
Python and ML Libraries Overview
- Why use programming languages for ML
- Choosing between R and Python
- Python crash course and Jupyter Notebooks
- Python libraries: pandas, NumPy, scikit-learn, matplotlib, seaborn
Testing and Evaluating ML Algorithms
- Generalization, overfitting, and model validation
- Evaluation strategies: holdout, cross-validation, bootstrapping
- Metrics for regression: ME, MSE, RMSE, MAPE
- Metrics for classification: accuracy, confusion matrix, and the effect of unbalanced classes
- Model performance visualization: profit curve, ROC curve, lift curve
- Model selection and grid search for tuning
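As an illustration of the regression metrics listed in this module (ME, MSE, RMSE, MAPE), here is a minimal sketch in plain Python. The function name `regression_metrics`, the sample numbers, and the sign convention for the error (actual minus predicted) are assumptions made for this example, not part of the course materials.

```python
import math


def regression_metrics(actual, predicted):
    """Return ME, MSE, RMSE and MAPE for two equal-length sequences.

    Assumption: error is defined as actual - predicted, and no actual
    value is zero (MAPE is undefined in that case).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    me = sum(errors) / n                    # mean error (signed bias)
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    # mean absolute percentage error, expressed in percent
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    return {"ME": me, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Made-up illustrative data
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

In this toy data the positive and negative errors cancel, so ME is zero while MSE and RMSE are not, which is one reason ME alone is a poor measure of accuracy.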
Data Preparation
- Data import and storage in Python
- Exploratory analysis and summary statistics
- Handling missing values and outliers
- Standardization, normalization, and transformation
- Qualitative data recoding and data wrangling with pandas
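The difference between standardization and normalization in this module can be sketched in a few lines of plain Python. The helper names `standardize` and `min_max_normalize` are hypothetical, and the use of the population standard deviation is an assumption; libraries differ on that default.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores).

    Assumption: the population standard deviation is used.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot normalize a constant column")
    return [(v - low) / (high - low) for v in values]


# Made-up illustrative column
column = [2.0, 4.0, 6.0]
z_scores = standardize(column)
scaled = min_max_normalize(column)
print(z_scores)
print(scaled)
```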
Classification Algorithms
- Binary vs multiclass classification
- Logistic regression and discriminant functions
- Naïve Bayes and k-nearest neighbors
- Decision trees and tree ensembles: CART, random forests, bagging, boosting, XGBoost
- Support Vector Machines and kernels
- Ensemble learning techniques
Regression and Numerical Prediction
- Least squares and variable selection
- Regularization methods: L1, L2
- Polynomial regression and nonlinear models
- Regression trees and splines
Unsupervised Learning
- Clustering techniques: k-means, k-medoids, hierarchical clustering, self-organizing maps (SOMs)
- Dimensionality reduction: PCA, factor analysis, SVD
- Multidimensional scaling
Text Mining
- Text preprocessing and tokenization
- Bag-of-words, stemming, and lemmatization
- Sentiment analysis and word frequency
- Visualizing text data with word clouds
Recommendation Systems
- User-based and item-based collaborative filtering
- Designing and evaluating recommendation engines
Association Pattern Mining
- Frequent itemsets and Apriori algorithm
- Market basket analysis and lift ratio
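The lift ratio mentioned in this module can be computed directly from support counts. The following is a minimal sketch in plain Python; the function name `lift_ratio` and the basket contents are made up for illustration and do not come from the course materials.

```python
def lift_ratio(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    lift = confidence(antecedent -> consequent) / support(consequent)
    A value above 1 suggests the items co-occur more often than they
    would if they were independent.
    """
    baskets = [set(t) for t in transactions]
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)

    # Fraction of baskets containing each itemset
    support_a = sum(antecedent <= b for b in baskets) / n
    support_c = sum(consequent <= b for b in baskets) / n
    support_both = sum((antecedent | consequent) <= b for b in baskets) / n

    if support_a == 0 or support_c == 0:
        raise ValueError("lift is undefined when an itemset never occurs")

    confidence = support_both / support_a
    return confidence / support_c


# Made-up illustrative baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
lift = lift_ratio(baskets, {"bread"}, {"butter"})
print(round(lift, 3))
```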
Outlier Detection
- Extreme value analysis
- Distance-based and density-based methods
- Outlier detection in high-dimensional data
Machine Learning Case Study
- Understanding the business problem
- Data preprocessing and feature engineering
- Model selection and parameter tuning
- Evaluation and presentation of findings
- Deployment
Summary and Next Steps
Requirements
- Basic understanding of statistics and linear algebra
- Familiarity with data analysis or business intelligence concepts
- Some exposure to programming (preferably Python or R) is recommended
- Interest in learning applied machine learning for data-driven projects
Audience
- Data analysts and scientists
- Statisticians and research professionals
- Developers and IT professionals exploring machine learning tools
- Anyone involved in data science or predictive analytics projects
Testimonials (3)
Even though I had to miss a day due to customer meetings, I feel I have a much clearer understanding of the processes and techniques used in Machine Learning and when I would use one approach over another. Our challenge now is to practice what we have learned and start to apply it to our problem domain.
Richard Blewett - Rock Solid Knowledge Ltd
Course - Machine Learning – Data science
I like that the training was focused on examples and coding. I thought it was impossible to pack so much content into three days of training, but I was wrong. The training covered many topics and everything was done in a very detailed manner (especially the tuning of model parameters: I didn't expect that there would be time for this and I was greatly surprised).
Bartosz Rosiek - GE Medical Systems Polska Sp. Zoo
Course - Machine Learning – Data science
It shows many methods with pre-prepared scripts. The materials are very nicely prepared and easy to trace back.