# Data Discretization
**Data Discretization** is the process of converting continuous values into a small, finite set of intervals (bins), so that each value is represented by the interval it falls into.
## Why use it?
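- Makes data easier to understand and summarize, because a few labelled ranges are simpler to read than many distinct values.
- Some algorithms work better with a small set of categories than with continuous values.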
## Techniques
### 1. Binning
- Sort the data and divide it into intervals called "bins".
- **Example**: Grouping ages into [0-10], [11-20], etc.
- Helps smooth out noise, for example by replacing each value with the mean or median of its bin.
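
The binning steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the sample `ages`, the `upper_edges` list, and the helper names `bin_label` and `smooth_by_bin_means` are all assumptions made for the example, and the label logic assumes whole-number ages.

```python
from bisect import bisect_left
from statistics import mean

# Illustrative sample data (an assumption, not from a real dataset).
ages = [4, 8, 9, 12, 15, 18, 24, 37]
# Upper edge of each bin, mirroring the [0-10], [11-20], ... example.
upper_edges = [10, 20, 30, 40]


def bin_label(age, upper_edges):
    """Return a label such as '11-20' for the bin that contains age."""
    if age < 0 or age > upper_edges[-1]:
        raise ValueError(f"{age} is outside the binned range")
    i = bisect_left(upper_edges, age)  # first bin whose upper edge is >= age
    lower = 0 if i == 0 else upper_edges[i - 1] + 1
    return f"{lower}-{upper_edges[i]}"


def smooth_by_bin_means(values, upper_edges):
    """Sort the values, group them by bin, and compute each bin's mean."""
    bins = {}
    for v in sorted(values):
        bins.setdefault(bin_label(v, upper_edges), []).append(v)
    means = {label: mean(members) for label, members in bins.items()}
    return bins, means


bins, means = smooth_by_bin_means(ages, upper_edges)
for label, members in bins.items():
    print(label, members, "-> mean", means[label])
```

Replacing every age with the mean of its bin is one simple way to smooth noise; using the bin median instead would be less sensitive to outliers.
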
### 2. Histogram Analysis
- Plot the distribution as a bar chart (histogram) and use its shape to decide where to split the data, for example at gaps or low-count regions between peaks.
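
As a rough sketch of histogram-based splitting, the code below counts values in equal-width bins and picks the emptiest interior bin as the split point. The sample `values`, the choice of 5 bins, and the "valley" rule in `valley_split` are assumptions for illustration; in practice the split is often chosen by eye from the plotted histogram.

```python
# Illustrative two-peaked sample data (an assumption for this sketch).
values = [1, 2, 2, 3, 3, 3, 4, 12, 13, 13, 14, 14, 15, 16]


def histogram(values, num_bins):
    """Count values in num_bins equal-width bins; return (edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        i = min(int((v - lo) / width), num_bins - 1)
        counts[i] += 1
    return edges, counts


def valley_split(edges, counts):
    """Return the centre of the least-populated interior bin."""
    interior = range(1, len(counts) - 1)  # needs at least 3 bins
    i = min(interior, key=lambda k: counts[k])
    return (edges[i] + edges[i + 1]) / 2


edges, counts = histogram(values, 5)
split = valley_split(edges, counts)

# Text rendering of the histogram, one row per bin.
for i, c in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * c}")
print("split at", split)

# Hypothetical labels for the two resulting intervals.
labels = ["low" if v < split else "high" for v in values]
```

With this sample the two peaks are separated by an empty bin, so the split falls in the gap between them.
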
### 3. Cluster Analysis
- Run a clustering algorithm (like K-Means) to group similar values, then use each resulting group as one interval.
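
The idea can be sketched with a small one-dimensional K-Means written in plain Python. The sample `values`, the deterministic starting centres, and the rule of placing interval boundaries at the midpoints between neighbouring centres are assumptions of this sketch rather than the only way to do it.

```python
from bisect import bisect_right

# Illustrative sample data with three natural groups (an assumption).
values = [1, 2, 3, 10, 11, 12, 30, 31, 32]


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm in one dimension; returns sorted cluster centres."""
    data = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Recompute each centre as its cluster mean; keep the old centre
        # if a cluster ended up empty.
        updated = [
            sum(c) / len(c) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centres:  # converged
            break
        centres = updated
    return sorted(centres)


def boundaries_from_centres(centres):
    """Place one boundary halfway between each pair of adjacent centres."""
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


centres = kmeans_1d(values, k=3)
boundaries = boundaries_from_centres(centres)

# Interval index (0, 1, 2, ...) for every value.
intervals = [bisect_right(boundaries, v) for v in values]
print("centres:", centres)
print("boundaries:", boundaries)
print("intervals:", intervals)
```

Unlike fixed-width binning, the interval widths here follow the data: tightly packed values share an interval, and wide gaps become boundaries.
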
## Concept Hierarchy
- Organizes values into levels, from **low-level** (specific) concepts to **high-level** (general) concepts.
- **Example (Location)**:
  - Street -> City -> State -> Country.
- **Top-down Mapping**: General to specific (Country -> State -> City -> Street).
- **Bottom-up Mapping**: Specific to general (Street -> City -> State -> Country).
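
One way to picture the location hierarchy is as a set of lookup tables, one per level. The sketch below is illustrative only: the `LEVELS` list, the `PARENT` tables, the place names in them, and the helpers `roll_up` and `drill_down` are all hypothetical names and data chosen for the example.

```python
# Levels ordered from low-level (specific) to high-level (general).
LEVELS = ["street", "city", "state", "country"]

# Hypothetical lookup tables: each maps a value to its parent one level up.
PARENT = {
    "street": {
        "Main Street": "Springfield",
        "Oak Avenue": "Springfield",
        "Lake Road": "Chicago",
    },
    "city": {"Springfield": "Illinois", "Chicago": "Illinois"},
    "state": {"Illinois": "USA"},
}


def roll_up(value, from_level, to_level):
    """Bottom-up mapping: climb from a specific level to a more general one."""
    start, end = LEVELS.index(from_level), LEVELS.index(to_level)
    if start > end:
        raise ValueError("to_level must not be below from_level")
    for level in LEVELS[start:end]:
        value = PARENT[level][value]
    return value


def drill_down(value, level):
    """Top-down mapping: list the values one level below the given value."""
    index = LEVELS.index(level)
    if index == 0:
        return []  # the lowest level has nothing beneath it
    child_level = LEVELS[index - 1]
    return sorted(
        child
        for child, parent in PARENT[child_level].items()
        if parent == value
    )


print(roll_up("Main Street", "street", "country"))
print(drill_down("Illinois", "state"))
```

Rolling up replaces many specific values with fewer general ones, which is the same reduction that discretization performs on numeric data.
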