# Standard Process for Data Science (CRISP-DM)

**CRISP-DM** stands for **Cr**oss **I**ndustry **S**tandard **P**rocess for **D**ata **M**ining. It is a widely used framework for planning and running data mining (and, more broadly, data science) projects.

It has **6 phases**. They are iterative: what you learn in a later phase often sends you back to an earlier one.
## 1. Business Understanding

**Goal**: Define the business problem we are trying to solve and what a useful answer would look like.

- **Example**: An online retailer wants to classify items as "High Demand" or "Low Demand".
- **Questions**: Is item type related to demand? Can we predict demand accurately?
## 2. Data Understanding

**Goal**: Get to know the data: what it contains, how complete it is, and how the variables relate to each other.

- **Example**: Explore the inventory data (number of orders, item type).
- **Insight**: Knowing whether items are perishable (like milk) or non-perishable helps explain how much stock is needed.
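A minimal sketch of this first look at the data, using only the Python standard library. The records, the field names (`item_type`, `orders`, `demand`) and the helper `summarize` are hypothetical; with real data you would more likely load a table with a library such as pandas and use `DataFrame.describe` or `pd.crosstab`.

```python
from collections import Counter, defaultdict

# Hypothetical inventory records; field names and values are illustrative only.
records = [
    {"item_type": "perishable", "orders": 120, "demand": "High"},
    {"item_type": "perishable", "orders": 95, "demand": "High"},
    {"item_type": "non-perishable", "orders": 30, "demand": "Low"},
    {"item_type": "non-perishable", "orders": None, "demand": "Low"},
    {"item_type": "perishable", "orders": 40, "demand": "Low"},
    {"item_type": "non-perishable", "orders": 110, "demand": "High"},
]

def summarize(rows):
    """Return basic facts about the data: size, missing values, category counts."""
    missing_orders = sum(1 for r in rows if r["orders"] is None)
    type_counts = Counter(r["item_type"] for r in rows)
    # Count demand labels separately for each item type (a simple cross-tab),
    # a first hint at "Is item type related to demand?".
    demand_by_type = defaultdict(Counter)
    for r in rows:
        demand_by_type[r["item_type"]][r["demand"]] += 1
    return {
        "n_rows": len(rows),
        "missing_orders": missing_orders,
        "type_counts": dict(type_counts),
        "demand_by_type": {t: dict(c) for t, c in demand_by_type.items()},
    }

summary = summarize(records)
print(summary)
```

On this toy data the summary reveals one missing order count, which is exactly the kind of issue the next phase has to handle.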
## 3. Data Preparation

**Goal**: Clean and format the data so the model can use it.

- **Steps**:
  - Handle missing values (for example, fill them in or drop the affected rows).
  - Convert categories to numbers (dummy encoding), e.g. item type becomes 0/1 indicator columns.
  - Check for relationships (correlation) between variables.
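The three preparation steps can be sketched in plain Python as follows. The records and helper names (`prepare`, `pearson`) are hypothetical, and filling a missing order count with the median is an assumed choice; in practice pandas offers `fillna`, `pd.get_dummies` and `DataFrame.corr` for the same steps.

```python
from statistics import median

# Hypothetical raw inventory records; values are illustrative only.
raw = [
    {"item_type": "perishable", "orders": 120, "demand": "High"},
    {"item_type": "perishable", "orders": 95, "demand": "High"},
    {"item_type": "non-perishable", "orders": 30, "demand": "Low"},
    {"item_type": "non-perishable", "orders": None, "demand": "Low"},
    {"item_type": "perishable", "orders": 40, "demand": "Low"},
    {"item_type": "non-perishable", "orders": 110, "demand": "High"},
]

def prepare(rows):
    # 1. Handle missing values: replace a missing order count with the median of known counts.
    fill = median(r["orders"] for r in rows if r["orders"] is not None)
    categories = sorted({r["item_type"] for r in rows})
    prepared = []
    for r in rows:
        row = {"orders": r["orders"] if r["orders"] is not None else fill}
        # 2. Dummy encoding: one 0/1 indicator column per item type.
        for c in categories:
            row[f"type_{c}"] = 1 if r["item_type"] == c else 0
        # Encode the target as 1 = High demand, 0 = Low demand.
        row["high_demand"] = 1 if r["demand"] == "High" else 0
        prepared.append(row)
    return prepared

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (assumes neither is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

data = prepare(raw)
# 3. Check the correlation between each input and the encoded target.
target = [d["high_demand"] for d in data]
for col in ["orders", "type_perishable"]:
    print(col, round(pearson([d[col] for d in data], target), 2))
```

On this toy data the missing count is filled with 95, and the script prints a correlation of about 0.78 for `orders` and 0.33 for `type_perishable`.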
## 4. Modeling

**Goal**: Build the machine learning model.

- We look for a function that maps the inputs (such as the number of orders) to the output (demand).
- We usually try several models and keep the one that performs best.
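To make "trying different models" concrete, here is a sketch with two deliberately simple candidates, assuming the data is already prepared (missing values filled, item type encoded as 0/1): a majority-class baseline and a one-feature threshold rule on `orders` (a decision stump). All names and values are hypothetical; real projects usually rely on a library such as scikit-learn. Comparing *training* accuracy only illustrates the idea, since a fair comparison needs unseen data (see Evaluation).

```python
# Hypothetical prepared data: (orders, is_perishable) -> high_demand (1 = High, 0 = Low).
X = [(120, 1), (95, 1), (30, 0), (95, 0), (40, 1), (110, 0)]
y = [1, 1, 0, 0, 0, 1]

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

def fit_majority(X, y):
    """Baseline: always predict the most common label (ties go to High)."""
    label = 1 if sum(y) * 2 >= len(y) else 0
    return lambda x: label

def fit_orders_threshold(X, y):
    """Decision stump: predict High when orders >= t, choosing t to maximize training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({x[0] for x in X}):
        acc = accuracy(lambda x, t=t: int(x[0] >= t), X, y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x[0] >= best_t)

candidates = {
    "majority": fit_majority(X, y),
    "orders_threshold": fit_orders_threshold(X, y),
}
for name, model in candidates.items():
    print(name, accuracy(model, X, y))
```

Here the threshold rule picks t = 95 and gets 5 of the 6 training rows right, while the baseline gets 3 of 6, so the stump would be the candidate carried forward.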
## 5. Evaluation

**Goal**: Check how well the model performs.

- We test the model on **unseen data** (data that was not used to train it).
- We compare the **predicted** values with the **actual** values, for example by computing the accuracy.
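A hold-out evaluation can be sketched like this: shuffle the data, keep part of it aside as unseen test data, fit only on the rest, then compare predictions with the actual labels. The dataset, the 25% test fraction and the helper names are hypothetical; `train_test_split` here is a small standard-library stand-in for scikit-learn's `sklearn.model_selection.train_test_split`.

```python
import random

# Hypothetical labelled data: (orders, high_demand). Values are illustrative only.
data = [(120, 1), (95, 1), (30, 0), (95, 0), (40, 0), (110, 1),
        (130, 1), (20, 0), (70, 0), (105, 1), (60, 0), (90, 1)]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out the last part as unseen test data."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]

def fit_threshold(rows):
    """Pick the order threshold with the best accuracy on the training rows only."""
    def acc(t):
        return sum(int(o >= t) == lbl for o, lbl in rows) / len(rows)
    return max(sorted({o for o, _ in rows}), key=acc)

train, test = train_test_split(data)
t = fit_threshold(train)                     # the model never sees the test rows
predicted = [int(o >= t) for o, _ in test]   # what the model says
actual = [lbl for _, lbl in test]            # what really happened
test_accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(test)
print("threshold", t, "predicted", predicted, "actual", actual, "accuracy", test_accuracy)
```

The key design point is that the threshold is chosen from `train` alone, so the accuracy on `test` estimates how the model would do on new items.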
## 6. Deployment

**Goal**: Use the model in the real world.

- If the model performs well enough, we put it into use.
- **Example**: Create an app where the retailer enters item details and gets a demand prediction.
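A minimal sketch of the prediction logic behind such an app, assuming the trained model reduces to a hypothetical threshold of 95 orders. `predict_demand` and `handle_request` are illustrative names, and the JSON request/response format is an assumption; a real app would call `handle_request` from a web framework or form handler, which is left out so the sketch stays self-contained.

```python
import json

# Hypothetical model parameter, assumed to come from the Modeling and Evaluation phases.
ORDERS_THRESHOLD = 95

def predict_demand(item):
    """Return "High Demand" or "Low Demand" for one item's details (a dict).

    Only "orders" is used by this threshold model; other fields such as
    "item_type" are accepted but ignored.
    """
    orders = item.get("orders")
    if orders is None:
        raise ValueError("'orders' is required")
    return "High Demand" if orders >= ORDERS_THRESHOLD else "Low Demand"

def handle_request(body: str) -> str:
    """What an app endpoint might do: parse JSON input, predict, return JSON output."""
    item = json.loads(body)
    try:
        return json.dumps({"prediction": predict_demand(item)})
    except ValueError as err:
        return json.dumps({"error": str(err)})

print(handle_request('{"item_type": "perishable", "orders": 120}'))
# prints: {"prediction": "High Demand"}
```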