DMCT-NOTES/unit 1/06_Data_Discretization.md
2025-11-24 16:55:19 +05:30

# Data Discretization
**Data Discretization** is the process of converting the many possible values of a continuous attribute into a small, finite number of intervals (bins). Each original value is then replaced by the label of the interval it falls into.
## Why use it?
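- Makes data easier to understand and summarize, because a few interval labels replace many distinct values.
- Many algorithms work better with, or require, a small set of categories rather than continuous values.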
## Techniques
### 1. Binning
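- Sorting the data and dividing it into "bins", either of equal width (each bin covers the same range) or of equal frequency (each bin holds about the same number of values).
- **Example**: Grouping ages into [0-10], [11-20], etc.
- Helps smooth out noise, because the values in a bin can be replaced by a single representative such as the bin mean or median.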
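
The binning steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names and the `prices` list are made up for the example, and it assumes a non-empty list with at least `k` values.

```python
from statistics import mean

def equal_width_bins(values, k):
    """Return a bin index (0..k-1) for each value, using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: one bin is enough
    # The maximum value would compute to index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Sort the values and split them into k bins of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

def smooth_by_bin_means(bins):
    """Smooth noise by replacing every value in a bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]

# Illustrative data only (an assumption, not from the notes).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

bins = equal_frequency_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_bin_means(bins))   # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(equal_width_bins(prices, 3)) # [0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Equal-width bins are simple but sensitive to outliers, while equal-frequency bins keep the bin sizes balanced; which one fits depends on the data.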
### 2. Histogram Analysis
- Plotting a histogram (bars showing how many values fall into each range) to see the distribution and decide where to split the data.
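
As a rough sketch of this idea, the snippet below counts values per fixed-width bucket and prints a text histogram. The `histogram` helper, the bucket width of 10, and the `ages` list are assumptions chosen for illustration.

```python
from collections import Counter

def histogram(values, width):
    """Count how many values fall in each bucket [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))  # empty buckets are simply absent

# Illustrative data only (an assumption, not from the notes).
ages = [3, 7, 12, 15, 18, 19, 41, 44, 45, 47, 70]

for start, count in histogram(ages, 10).items():
    print(f"[{start:>2}, {start + 10:>2}) {'#' * count}")
```

In this sample no values fall between 20 and 39, so a gap like that is one natural place to put a split point.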
### 3. Cluster Analysis
- Using clustering (like K-Means) to group similar values, then using those groups as the intervals.
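
A minimal sketch of cluster-based discretization, assuming one-dimensional data and a hand-written K-Means. The deterministic starting centroids are a simplification for reproducibility (K-Means is usually initialized randomly), and `kmeans_1d`, `cut_points`, and `values` are hypothetical names for this example.

```python
def kmeans_1d(values, k, iterations=100):
    """Group 1-D values into k clusters; returns (centroids, clusters)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from evenly spaced picks in the sorted data
    # instead of the usual random initialization.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, clusters

def cut_points(centroids):
    """Interval boundaries: midpoints between neighbouring centroids."""
    ordered = sorted(centroids)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

# Illustrative data only (an assumption, not from the notes).
values = [1, 2, 3, 20, 21, 22, 50, 51, 52]
centroids, clusters = kmeans_1d(values, 3)
print(clusters)               # [[1, 2, 3], [20, 21, 22], [50, 51, 52]]
print(cut_points(centroids))  # [11.5, 36.0]
```

Unlike equal-width binning, the intervals here follow the natural grouping of the data, so their widths can differ.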
## Concept Hierarchy
- Organizing data from **low-level** (specific) concepts to **high-level** (general) concepts.
- **Example (Location)**:
  - Street -> City -> State -> Country.
- **Top-down Mapping**: General to Specific (e.g. Country -> State).
- **Bottom-up Mapping**: Specific to General (e.g. City -> State).
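
One simple way to represent such a hierarchy is a child-to-parent lookup table. The sketch below is illustrative only: the `PARENT` table, its entries, and the `roll_up` / `drill_down` helpers are assumptions made for this example.

```python
# Each key maps to the next more general concept (Street -> City -> State -> Country).
PARENT = {
    "MG Road": "Bengaluru",
    "Bengaluru": "Karnataka",
    "Mysuru": "Karnataka",
    "Karnataka": "India",
}

def roll_up(value, levels=1):
    """Bottom-up: replace a value with its ancestor `levels` steps higher."""
    for _ in range(levels):
        value = PARENT.get(value, value)  # stays put once the top is reached
    return value

def drill_down(value):
    """Top-down: list the more specific concepts directly beneath a value."""
    return sorted(child for child, parent in PARENT.items() if parent == value)

print(roll_up("MG Road"))       # Bengaluru
print(roll_up("MG Road", 2))    # Karnataka
print(drill_down("Karnataka"))  # ['Bengaluru', 'Mysuru']
```

Rolling every record up to the same level (for example, State) reduces the number of distinct values in the same way that binning does for numbers.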