Akshat Mehta 8f8e35ae95 unit 2 added
2025-11-24 15:26:41 +05:30


Standard Process for Data Science (CRISP-DM)

CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a widely used framework for planning and carrying out data mining projects.

It has six phases:

1. Business Understanding

Goal: Define what problem we are trying to solve.

  • Example: An online retailer wants to classify items as "High Demand" or "Low Demand".
  • Questions: Is item type related to demand? Can we predict demand accurately?

2. Data Understanding

Goal: Get to know the data.

  • Example: Exploring the inventory data (number of orders, item type).
  • Insight: Knowing whether items are perishable (like milk) or non-perishable helps us understand stock needs.
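The first pass over the inventory data can be sketched with the Python standard library. The sketch counts missing values per column and checks how often each item type is labelled "High" demand. The records and the column names `item_type`, `num_orders` and `demand` are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical inventory records (illustrative values, not real data).
# None marks a missing value.
records = [
    {"item_type": "perishable", "num_orders": 120, "demand": "High"},
    {"item_type": "perishable", "num_orders": 95, "demand": "High"},
    {"item_type": "non-perishable", "num_orders": 30, "demand": "Low"},
    {"item_type": "non-perishable", "num_orders": None, "demand": "Low"},
    {"item_type": "perishable", "num_orders": 20, "demand": "Low"},
]


def missing_counts(rows):
    """Count how many rows have a missing (None) value in each column."""
    counts = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def high_demand_share_by_type(rows):
    """Fraction of items labelled 'High' demand within each item type."""
    totals = defaultdict(int)
    highs = defaultdict(int)
    for row in rows:
        totals[row["item_type"]] += 1
        if row["demand"] == "High":
            highs[row["item_type"]] += 1
    return {item_type: highs[item_type] / totals[item_type] for item_type in totals}


print(len(records), "rows")
print(missing_counts(records))  # {'num_orders': 1}
print(high_demand_share_by_type(records))
```

In this toy data, perishable items are "High" demand in two of three rows and non-perishable items in none. That hints, on far too few rows to be conclusive, that item type may be related to demand.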

3. Data Preparation

Goal: Clean and format the data for the model.

  • Steps:
    • Handle missing values.
    • Convert categories to numbers (dummy encoding).
    • Check for relationships (correlation) between variables.
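The three preparation steps above can be sketched without external libraries:
- fill missing order counts with the median,
- dummy-encode `item_type` into 0/1 columns,
- compute a Pearson correlation between order count and a "High" demand indicator.

The data and column names are hypothetical. Median imputation is one assumed choice among several (mean, most frequent value, dropping rows).

```python
import math
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"item_type": "perishable", "num_orders": 120, "demand": "High"},
    {"item_type": "perishable", "num_orders": 95, "demand": "High"},
    {"item_type": "non-perishable", "num_orders": 30, "demand": "Low"},
    {"item_type": "non-perishable", "num_orders": None, "demand": "Low"},
    {"item_type": "perishable", "num_orders": 20, "demand": "Low"},
]


def fill_missing_with_median(rows, column):
    """Replace None in `column` with the median of the observed values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def dummy_encode(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


clean = dummy_encode(fill_missing_with_median(raw, "num_orders"), "item_type")
orders = [r["num_orders"] for r in clean]
is_high = [1 if r["demand"] == "High" else 0 for r in clean]
print(clean[3])  # the previously missing row, now filled and encoded
print(round(pearson(orders, is_high), 3))
```

This sketch keeps one dummy column per category. Some texts drop one of them, because the last column can be inferred from the others.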

4. Modeling

Goal: Build the machine learning model.

  • We try to find a function that maps inputs (like the number of orders) to the output (the demand class).
  • We might try different models to find the best one.
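"Finding a function" and "trying different models" can be illustrated with two deliberately simple candidates:
- a majority-class baseline,
- a decision stump that learns a single `num_orders` cutoff.

The notes do not name specific models. Both candidates, and the training rows, are hypothetical stand-ins chosen to keep the sketch self-contained.

```python
from collections import Counter

# Hypothetical training data: (num_orders, demand label).
train = [(120, "High"), (95, "High"), (80, "High"), (60, "High"),
         (45, "Low"), (30, "Low"), (20, "Low")]


def fit_majority(data):
    """Baseline model: always predict the most common training label."""
    most_common = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda num_orders: most_common


def fit_threshold(data):
    """Decision stump: predict 'High' when num_orders >= cutoff, choosing the
    cutoff (from the observed order counts) that fits the training data best."""
    best_cutoff, best_acc = None, -1.0
    for cutoff in sorted({x for x, _ in data}):
        acc = sum(("High" if x >= cutoff else "Low") == y for x, y in data) / len(data)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return lambda num_orders: "High" if num_orders >= best_cutoff else "Low"


def accuracy(model, data):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)


# Try several candidate models and keep the one that scores best.
candidates = {"majority": fit_majority(train), "threshold": fit_threshold(train)}
scores = {name: accuracy(model, train) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```

Scoring candidates on the same rows they were trained on gives an optimistic picture. The Evaluation phase addresses this by testing on unseen data.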

5. Evaluation

Goal: Check how good the model is.

  • We test the model on unseen data (data that was not used to train it).
  • We compare the predicted values with the actual values.
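Comparing predicted values with actual values on unseen data can be sketched as follows. The sketch computes accuracy and a simple confusion count keyed by (actual, predicted) pairs. Two things are assumptions for illustration: the "trained" model is a hypothetical fixed cutoff of 60 orders, and the test rows are hypothetical rows assumed to have been set aside before training.

```python
from collections import Counter


def predict(num_orders):
    """Hypothetical already-trained model: 'High' at 60 or more orders (assumed cutoff)."""
    return "High" if num_orders >= 60 else "Low"


# Hypothetical held-out test set: rows set aside before training,
# so the model has never seen them.
test = [(110, "High"), (70, "Low"), (55, "High"), (15, "Low"), (90, "High")]


def evaluate(model, data):
    """Compare predicted labels with actual labels.

    Returns the accuracy and a Counter keyed by (actual, predicted) pairs,
    i.e. the cells of a confusion matrix.
    """
    pairs = [(actual, model(x)) for x, actual in data]
    acc = sum(actual == predicted for actual, predicted in pairs) / len(pairs)
    return acc, Counter(pairs)


acc, confusion = evaluate(predict, test)
print(f"accuracy = {acc:.2f}")  # 3 of 5 correct -> accuracy = 0.60
for (actual, predicted), count in sorted(confusion.items()):
    print(f"actual={actual:<4} predicted={predicted:<4} count={count}")
```

Accuracy alone can hide which kind of mistake the model makes. The confusion counts show that, in this toy test set, one "Low" item was predicted "High" and one "High" item was predicted "Low".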

6. Deployment

Goal: Use the model in the real world.

  • If the model performs well enough on the evaluation, we put it into use.
  • Example: Create an app where the retailer enters item details and gets a demand prediction.
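The core of such an app can be sketched as two steps. First, load saved model parameters. Second, turn the item details a retailer types in into a demand prediction, rejecting bad input. The JSON format, the `cutoff` parameter, and the `handle_request` function are hypothetical. A real app would put this function behind a web framework or GUI, which is omitted here.

```python
import json

# Hypothetical saved model: in practice the parameters would come from the
# Modeling phase; the cutoff of 60 is an assumed value for illustration.
SAVED_MODEL = json.dumps({"cutoff": 60})


def load_model(serialized):
    """Rebuild a prediction function from its saved JSON parameters."""
    cutoff = json.loads(serialized)["cutoff"]
    return lambda num_orders: "High" if num_orders >= cutoff else "Low"


def handle_request(model, form):
    """Hypothetical app handler: takes the item details a retailer typed in
    (as strings, like a web form would send) and returns a prediction."""
    try:
        num_orders = int(form["num_orders"])
    except (KeyError, ValueError):
        return {"error": "num_orders must be a whole number"}
    if num_orders < 0:
        return {"error": "num_orders cannot be negative"}
    return {"item": form.get("item_name", "unknown"), "prediction": model(num_orders)}


model = load_model(SAVED_MODEL)
print(handle_request(model, {"item_name": "milk", "num_orders": "85"}))
print(handle_request(model, {"item_name": "rice", "num_orders": "many"}))
```

Keeping the model parameters separate from the app code means a retrained model can be swapped in by replacing the saved file, without changing the handler.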