Lead Scoring Model for an Education Institute

Table of Contents

  1. Project Overview
  2. Approach
    1. Data Understanding and Cleaning
    2. Exploratory Data Analysis (EDA)
    3. Feature Engineering
    4. Model Building and Evaluation
  3. Key Learnings
  4. Conclusion
  5. Further Reading
  6. Co-Authors

Project Overview

This project develops a lead scoring model for X Education that identifies the leads most likely to convert into customers. The target is a lead conversion rate of around 80%.

Approach

The approach was divided into four key stages:

  1. Data Understanding and Cleaning
  2. Exploratory Data Analysis (EDA)
  3. Feature Engineering
  4. Model Building and Evaluation

Part 1: Data Understanding and Cleaning

  • Objective: Understand the dataset’s structure and clean the data.
  • Tasks Performed:
    • Handling missing values.
    • Dropping irrelevant columns.
    • Addressing data inconsistencies.
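
The cleaning tasks above can be illustrated with a minimal sketch. It uses only the Python standard library, and everything in it is an assumption made for illustration rather than the project's actual code: the column names (lead_id, total_visits, city, notes), the 50% missing-value cutoff, and the choice of median and "Unknown" as fill values.

```python
from statistics import median

# Hypothetical raw leads; None marks a missing value.
raw_leads = [
    {"lead_id": 1, "total_visits": 5, "city": "Mumbai", "notes": None, "converted": 1},
    {"lead_id": 2, "total_visits": None, "city": "mumbai ", "notes": None, "converted": 0},
    {"lead_id": 3, "total_visits": 2, "city": None, "notes": None, "converted": 0},
    {"lead_id": 4, "total_visits": 9, "city": "Pune", "notes": "call back", "converted": 1},
]


def clean_leads(rows, max_missing_ratio=0.5, drop_columns=("lead_id",)):
    """Return cleaned copies of the rows; the input list is left untouched."""
    # Drop irrelevant columns and columns that are mostly empty.
    kept = []
    for col in rows[0]:
        missing = sum(1 for row in rows if row[col] is None)
        if col in drop_columns or missing / len(rows) > max_missing_ratio:
            continue
        kept.append(col)

    cleaned = [{col: row[col] for col in kept} for row in rows]

    for col in kept:
        present = [row[col] for row in cleaned if row[col] is not None]
        if all(isinstance(value, (int, float)) for value in present):
            # Numeric column: fill gaps with the median of the observed values.
            fill = median(present)
        else:
            # Text column: fix inconsistent spacing and casing, then fill gaps.
            for row in cleaned:
                if isinstance(row[col], str):
                    row[col] = row[col].strip().title()
            fill = "Unknown"
        for row in cleaned:
            if row[col] is None:
                row[col] = fill
    return cleaned


cleaned_leads = clean_leads(raw_leads)
```

A data-frame library would usually express the same steps in fewer lines; the plain-Python version is shown only to make each step explicit.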

Part 2: Exploratory Data Analysis (EDA)

  • Objective: Gain insights through visualizations and statistical summaries.
  • Key Findings:
    • Total visits, time spent on the website, and page views were important predictors of conversion.
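
One simple statistical summary that can surface findings like this is a comparison of a feature's average between converted and non-converted leads. The sketch below is illustrative only: the sample rows and the field names (total_visits, time_on_site, converted) are hypothetical, not taken from the project's dataset.

```python
from statistics import mean

# Hypothetical sample of leads; time_on_site is in seconds.
leads = [
    {"total_visits": 2, "time_on_site": 120, "converted": 0},
    {"total_visits": 1, "time_on_site": 60, "converted": 0},
    {"total_visits": 3, "time_on_site": 180, "converted": 0},
    {"total_visits": 6, "time_on_site": 900, "converted": 1},
    {"total_visits": 8, "time_on_site": 1500, "converted": 1},
]


def mean_by_outcome(rows, feature, target="converted"):
    """Average a numeric feature separately for each value of the target."""
    groups = {}
    for row in rows:
        groups.setdefault(row[target], []).append(row[feature])
    return {outcome: mean(values) for outcome, values in groups.items()}


time_summary = mean_by_outcome(leads, "time_on_site")
visit_summary = mean_by_outcome(leads, "total_visits")
```

A large gap between the two group averages suggests, but does not prove, that the feature carries signal for the model.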

Part 3: Feature Engineering

  • Objective: Enhance the model’s predictive power through feature transformation and creation.
  • Tasks Performed:
    • Encoding categorical variables.
    • Scaling numerical features.
    • Handling outliers.
    • Splitting the dataset into training and testing sets.
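
The four tasks above can be sketched with small standard-library helpers. This is a minimal illustration under stated assumptions, not the project's implementation: one-hot encoding, fixed capping bounds, standard (z-score) scaling, and a 75/25 shuffle split are choices made here for the example, and the sample values are hypothetical.

```python
import random
from statistics import mean, pstdev


def one_hot(values):
    """Encode each category as a list of 0/1 indicators, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if value == cat else 0 for cat in categories] for value in values]
    return encoded, categories


def cap(values, lower, upper):
    """Clip values into [lower, upper] to soften the effect of outliers."""
    return [min(max(value, lower), upper) for value in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(value - mu) / sigma if sigma else 0.0 for value in values]


def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical inputs: a categorical lead source and a visit count with an outlier.
sources = ["Google", "Referral", "Google", "Direct"]
visits = [2, 4, 3, 250]

encoded, categories = one_hot(sources)
capped = cap(visits, lower=0, upper=10)
scaled = standardize(capped)
train, test = train_test_split(list(zip(encoded, scaled)), test_ratio=0.25)
```

In practice the scaling statistics are usually computed on the training set only and then applied to the test set, so that no information leaks from the test data into the model.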

Part 4: Model Building and Evaluation

  • Model Used: Logistic Regression.
  • Performance:
    • Achieved an accuracy of 79.05% on the test set.
    • Evaluated using sensitivity, specificity, precision, and the precision-recall curve.
    • Identified areas for improvement in recall.
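
The evaluation metrics named above all derive from the four cells of the confusion matrix. The sketch below shows how they relate; the labels and predictions are made-up values for illustration and do not reproduce the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means converted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall), specificity, and precision."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(numerator, denominator):
        # Report 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # share of actual converters caught
        "specificity": ratio(tn, tn + fp),  # share of non-converters rejected
        "precision": ratio(tp, tp + fp),    # share of predicted converters that convert
    }


# Hypothetical labels and predictions for ten leads.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Sensitivity is the same quantity as recall, so a model with good accuracy can still miss many real converters when sensitivity is low.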

Key Learnings

  • Data Preprocessing: Essential for accurate modeling.
  • Exploratory Data Analysis (EDA): Crucial for understanding the dataset and feature selection.
  • Feature Engineering: Significantly impacts model performance.
  • Model Selection and Evaluation: Vital for achieving desired outcomes.
  • Interpretability and Explainability: Important for providing actionable insights.
  • Precision-Recall Trade-off: Aids in selecting an optimal threshold for the model.
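
To make the precision-recall trade-off concrete, the sketch below scans candidate thresholds and picks the one where precision and recall are closest. This crossover rule is one common heuristic and is assumed here for illustration; the source does not state which rule the project used, and the probabilities are made-up values.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when leads scoring >= threshold are flagged."""
    y_pred = [1 if prob >= threshold else 0 for prob in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # With no positive predictions, precision is treated as 1.0 by convention.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def balanced_threshold(y_true, y_prob, candidates):
    """Return the first candidate with the smallest precision-recall gap."""
    def gap(threshold):
        precision, recall = precision_recall_at(y_true, y_prob, threshold)
        return abs(precision - recall)

    return min(candidates, key=gap)


# Hypothetical conversion labels and model probabilities for eight leads.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
candidates = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

best = balanced_threshold(y_true, y_prob, candidates)
```

Lowering the threshold below the chosen value raises recall at the cost of precision, which is the lever to pull when missing a likely converter is more costly than an extra sales call.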

Conclusion

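The project produced a logistic regression lead scoring model for X Education that reaches about 79% accuracy on the test set, following a systematic approach to data cleaning, exploration, feature engineering, and evaluation. Recall remains the main area for improvement.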

Further Reading

For more details, visit the GitHub Repository.

Co-Authors

  • Ananth Ram
  • Rubina D Souza