 Excel Basics and Excel for Data Analysis

•  Introduction to MS Excel
•  Why MS Excel
•  Functionalities for a Data Scientist
•  Using Excel
•  Data Analysis in Excel
•  Basic Data Manipulation Functions
•  Mean, Maximum, Round, Sum etc.
•  Statistical Functions
•  Filter, Sort, Lookup in Excel
•  Using Pivots in Excel
•  Creating Pivot Tables and Charts
•  Usage of Slicers
•  Plotting in Excel – Usage of Visualization Capabilities

R Basics and R for Data Analysis

•  Introduction to Programming
•  Introduction to R and RStudio
•  What is R?
•  What is Open Source?
•  Capabilities of R
•  GUI for R
•  R IDE - RStudio
•  Installation and Functioning of R and RStudio
•  Using R
•  R interface
•  R Session
•  R Console
•  Getting Help
•  Entering and Running Commands/Programs
•  Programming in R
•  Data Types
•  Operators in R
•  Data Input and Output
•  R Data Frames
•  R statistics – Mean, Median, Mode etc.
•  Data Manipulation in R – Counting, Merging, Append, Sort, Subset, Filter, New Variable Creation etc.
•  R Logical Statements - If/Else, Loops etc.
•  Plotting - Graphs and Charts
•  Packages in R - Details of the Most Commonly Used Packages
•  Functions in R (High Level)
•  R - Best Practices

Statistics

•  What is Statistics
•  Data Types
•  Qualitative vs. Quantitative
•  Basic Operations Based on Data Type
•  Variables
•  Measurement Scales
•  Measures of Variance
•  Measures of Central Tendency
•  Correlation vs. Causation (Correlational vs. Experimental Research)
•  Sampling – Usage of Sampling
•  Distributions
•  Normal Distribution
•  Why the "Normal distribution" is Important
•  Illustration of How the Normal Distribution is Used in Statistical Reasoning
•  Characteristics
•  Standard Normal Distribution
•  Central Limit Theorem
•  Hypothesis Testing
•  What is Hypothesis Testing?
•  The magic of Hypothesis Testing
•  Null and Alternate Hypothesis
•  P Value
•  Usage of Hypothesis Testing in Business Problems
•  Explanation of Hypothesis Testing Using Real World Example
•  Types of Hypothesis Testing
•  Z test
•  T test
•  Chi Square test
•  Introduction to ANOVA and Basics of Regression/Classification

Linear Regression Topic Details

•  Introduction to Simple Linear Regression
•  Graphical Understanding of Regression (Scatter Plot, Box Plot, Density Plot)
•  Example Problem and Mathematics behind Regression
•  Assumptions for Linear Regression
•  Correlation (Linear and Non-Linear)
•  Introduction to Multiple Linear Regression
•  Building a Regression Model (Steps to Establish a Regression)
•  Data Preparation – Data Audit, Missing Value and Outliers
•  Building the model
•  Linear Regression – Interpretation of Output and Diagnostics
•  Assessing the Coefficients
•  P Value - Checking for Statistical Significance
•  R-Square and Adjusted R Squared
•  Standard Error and F-Statistic
•  How to Know if the Model is Best Fit for Your Data?
•  Using Linear Model for Predictions
•  Checking Accuracy and Error Rates
•  Model Improvement
•  Over-fitting and Cross Validation
•  Multicollinearity and VIF
•  Do it Yourself Case
•  Flavor of Advanced Regression Models

Logistic Regression

•  Why Logistic Regression
•  Introduction to Classification and Challenges with Linear Regression
•  Event Rate and Class Bias
•  Example Problem (Some real world examples of Binary Classification problems), Mechanics and Mathematics behind Logistic Regression
•  Assumptions for Logistic Regression
•  Building a Logistic Regression Model
•  Data Preparation – Data Audit, Missing Value and Outliers
•  Variable Importance and Feature Extraction
•  Create WOE for Categorical Variables
•  Compute Information Value
•  Multicollinearity (VIF)
•  Building Logit Models
•  Predictions
•  Logistic Regression – Interpretation of Output
•  Coefficients
•  Variable Importance
•  Model Diagnostics
•  Misclassification Error and Confusion Matrix
•  ROC Curve
•  Accuracy
•  Specificity, Sensitivity and F Score
•  Lift/Gain Charts and KS Curve
•  Model Improvement
•  Over-fitting and Cross Validation
•  Flavor of Advanced Classification Concepts – Classification of Unstructured Data
•  Do it Yourself Case

Time Series Modeling

•  Introduction to Time Series
•  Difference between Time Series, Cross-Sectional and Pooled Data
•  Example Problem (Some real world examples of Time Series Problems), Mechanics and Fundamentals of the Mathematics behind Time Series Analysis
•  Assumptions for Time Series Analysis
•  Understanding Time Series Data
•  Visualizing Time Series Data
•  Stationary vs. Non-Stationary Data
•  Trend vs. Seasonality vs. White Noise
•  Decomposing Time Series Data
•  Decomposing Non-Seasonal Data
•  Decomposing Seasonal Data
•  Forecasts using Exponential Smoothing
•  Simple Exponential Smoothing
•  Holt’s Exponential Smoothing
•  Holt-Winters Exponential Smoothing
•  ARIMA Models
•  Concept of Auto-Correlation and Partial Auto Correlation
•  Differencing a Time Series
•  Selecting a Candidate ARIMA Model
•  Forecasting Using an ARIMA Model
•  Predictions and Diagnostics
•  Do it Yourself Case
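
As a rough illustration of the "Simple Exponential Smoothing" topic above, the recursion s_t = alpha * x_t + (1 - alpha) * s_(t-1) can be sketched in a few lines. This is only a minimal sketch. The function name, the sample series, and the choice to initialise the smoothed series with the first observation are assumptions made here for illustration and are not taken from the course material.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the exponentially smoothed version of `series`.

    Assumption: the first smoothed value is set to the first observation.
    Other initialisation schemes exist and give slightly different results.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        # New level = weighted average of the new observation and the old level.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical series, used only to show the call.
demo = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
print(demo)
```

A larger alpha makes the smoothed series follow recent observations more closely, while a smaller alpha smooths more heavily.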

Market Basket Analysis

•  Supervised, Unsupervised and Semi-supervised Algorithms
•  Concept of a Recommendation Engine
•  Example Problem (Real world examples of MBA applications)
•  MBA Hyper Parameters
•  Lift
•  Confidence
•  Support
•  Generating Output Using Association Rules
•  Filtration of Rules
•  Removal of Redundant Rules
•  Control the Rules
•  Finding Rules for a Particular Entity
•  Visualizing Rules
•  Challenges with Association Rules and Ways to Overcome
•  Do it Yourself Case
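
The three rule measures listed above (support, confidence, lift) can be illustrated with a short sketch using their standard definitions. The toy transactions, item names, and function names below are hypothetical, chosen only for illustration, and are not part of the course material.

```python
# Hypothetical toy transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Measures for the rule {bread} -> {milk}.
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

A lift above 1 suggests the two sides of a rule occur together more often than would be expected if they were independent, and a lift below 1 suggests the opposite.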

Decision Trees

•  Types of Classification Algorithms
•  Fundamentals of Tree-based Systems
•  Decision Boundary of Tree-based Algorithms
•  Types of Tree Algorithms
•  C4.5
•  CHAID
•  CART
•  Concept of Impurity Measure
•  GINI
•  Chi Square
•  Building a Decision Tree Model
•  Data Preparation – Data Audit, Missing Value and Outliers
•  Creating Test and Training Samples
•  Variable Importance and Feature Extraction
•  Prediction using Decision Trees
•  Over-fitting and Cross Validation
•  Flavor of Advanced Concepts in Trees (Random Forests)
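
To make the "GINI" impurity measure listed above concrete, here is a minimal sketch of the standard Gini impurity and of the size-weighted impurity of a two-way split. The function names and example labels are hypothetical and serve only as an illustration, not as the course's implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical labels: a perfectly mixed node, then a split that separates the classes.
print(gini_impurity(["yes", "yes", "no", "no"]))
print(weighted_gini(["yes", "yes"], ["no", "no"]))
```

A tree algorithm that uses this measure would typically prefer the candidate split with the lowest weighted impurity.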

K-Means Clustering

•  Unsupervised Algorithms and Introduction to Clustering
•  Intro to Distance based Algorithms
•  Example Problem (Some real world examples of Clustering Applications)
•  Assumptions for Clustering
•  Mechanics of Clustering
•  Types of Measure - Euclidean, Manhattan, Jaccard
•  How are Clusters Generated
•  Creating Clusters
•  Standardization of Inputs
•  Deciding the Number of Clusters – Elbow Curve and Silhouette Distance
•  Understanding the Output
•  Cluster Diagnosis
•  Profiling
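
The three distance measures named above (Euclidean, Manhattan, Jaccard) can be sketched with their standard definitions as follows. The function names and sample inputs are hypothetical and only illustrative. The Jaccard version here is assumed to operate on sets of items rather than on numeric vectors.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 minus (size of intersection / size of union) for two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # Assumption: two empty sets are treated as identical.
    return 1.0 - len(a & b) / len(union)


# Hypothetical sample inputs.
print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

Because Euclidean and Manhattan distances depend on the scale of each input, the inputs are usually standardized first, which is the reason for the "Standardization of Inputs" step in the outline.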

Soft Skill Training

•  Interpersonal Communication
•  Introduction of Interpersonal Relations
•  Johari Window
•  Listening skills
•  Closure
•  Presentation skills I and II
•  Different Stages of Making a Presentation – Analysis, Thinking, Data, Creativity Needed at Each Stage
•  Plan before Execution
•  Plan Speech – Structures of Presentation, Numerical Presentation, Visual Aids
•  Requirements for Impactful Delivery
•  Email and virtual communication
•  Intent in Virtual Communication – NTPL Activity
•  Basics of Email Etiquette + Case Studies
•  Digital Empathy
•  Closure
•  Assertiveness & Communication
•  Introduction to Assertiveness
•  Assertive, Non-assertive, and Aggressive (Inner Dynamics and Outer Behavior)
•  Reflection Sheet
•  Assertiveness Technique and Role Plays
•  Recap
•  Interview Skills