addition of unit 1 3 4 5
27
unit 1/06_Data_Discretization.md
Normal file
@@ -0,0 +1,27 @@
# Data Discretization

**Data Discretization** is the process of converting a large number of continuous values into a smaller, finite number of intervals (bins).

## Why use it?
- Makes data easier to understand.
- Many algorithms work better with a small set of categories than with continuous values, which can take infinitely many distinct values.

## Techniques

### 1. Binning
- Sorting the data and dividing it into intervals called "bins".
- **Example**: Grouping ages into [0-10], [11-20], etc.
- Helps smooth out noise, because each value can be replaced by a summary of its bin (such as the bin mean).
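
The age example above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `age_bin` and `smooth_by_bin_means` and the sample ages are made up for this note, ages are assumed to be non-negative integers, and smoothing by bin means is only one common way to reduce noise.

```python
from statistics import mean

def age_bin(age):
    """Return the label of the bin an integer age falls into: [0-10], [11-20], ..."""
    if age <= 10:
        return "[0-10]"
    low = ((age - 1) // 10) * 10 + 1  # lower edge: 11, 21, 31, ...
    return f"[{low}-{low + 9}]"

def smooth_by_bin_means(values, label_of):
    """Replace each value with the mean of all values that share its bin."""
    bins = {}
    for v in values:
        bins.setdefault(label_of(v), []).append(v)
    return [mean(bins[label_of(v)]) for v in values]

ages = [3, 9, 12, 18, 20, 25, 37]  # made-up sample data
labels = [age_bin(a) for a in ages]
smoothed = smooth_by_bin_means(ages, age_bin)

for age, label, value in zip(ages, labels, smoothed):
    print(age, label, round(value, 2))
```

After binning, the seven distinct ages are described by only four labels, and the smoothed column shows how values inside one bin collapse to a single representative number.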

### 2. Histogram Analysis

- Using a histogram (a bar chart of how many values fall into each range) to see the distribution and decide where to split the data.
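
A rough sketch of this idea, using a text histogram instead of a plotted chart. The function name `histogram`, the bucket width of 10, and the sample values are assumptions made for illustration, and values are assumed to be integers. Splitting at empty buckets is just one simple heuristic; in practice the split points are often chosen by eye from the chart.

```python
from collections import Counter

def histogram(values, width):
    """Count how many integer values fall into each bucket of the given width."""
    counts = Counter((v // width) * width for v in values)
    lo, hi = min(counts), max(counts)
    # Include empty buckets so gaps in the distribution stay visible.
    return {start: counts.get(start, 0) for start in range(lo, hi + width, width)}

values = [1, 2, 2, 3, 4, 21, 22, 23, 24, 41, 43, 44]  # made-up sample data
hist = histogram(values, width=10)

# Print one bar per bucket, e.g. "  0-9   #####".
for start, count in hist.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")

# Assumed heuristic: place a split at every empty bucket (a gap in the data).
splits = [start for start, count in hist.items() if count == 0]
print("split at:", splits)
```

Here the empty buckets starting at 10 and 30 separate three dense regions, which suggests three intervals for this sample.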

### 3. Cluster Analysis

- Using a clustering algorithm (like K-Means) to group similar values, then using the range of values in each group as an interval.
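
The following is a small, self-contained sketch of one-dimensional K-Means used for discretization. It is a simplified version written for this note, not a library implementation: the function name `kmeans_1d`, the evenly spaced starting centers, the fixed iteration count, and the sample values are all assumptions.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Group 1-D values into k clusters with a basic K-Means loop."""
    data = sorted(values)
    # Assumed initialization: k centers spread evenly over the sorted data.
    step = max(k - 1, 1)
    centers = [data[i * (len(data) - 1) // step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean (unchanged if empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

values = [1, 2, 3, 20, 22, 24, 50, 51, 55]  # made-up sample data
clusters = kmeans_1d(values, k=3)

# Use the range covered by each cluster as one interval.
intervals = [(min(c), max(c)) for c in clusters if c]
print(intervals)
```

Unlike fixed-width binning, the interval boundaries here follow the data, so tightly grouped values end up in the same interval even when the groups have different widths.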

## Concept Hierarchy

- Organizing data into levels that run from **low-level** (specific) concepts to **high-level** (general) concepts.
- **Example (Location)**:
  - Street -> City -> State -> Country.
- **Top-down Mapping**: General to Specific (Country -> State -> City -> Street).
- **Bottom-up Mapping**: Specific to General (Street -> City -> State -> Country).
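
The location hierarchy above can be modeled as a simple parent lookup. This is only a sketch: the `PARENT` table, the function names `roll_up` and `drill_down`, and the sample entries are made up for illustration.

```python
# Each entry maps a value to its parent one level up (made-up sample data).
PARENT = {
    "Main St": "Springfield",
    "Oak Ave": "Springfield",
    "Springfield": "Illinois",
    "Chicago": "Illinois",
    "Illinois": "USA",
}

def roll_up(value):
    """Bottom-up: list the path from a specific value to the most general one."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def drill_down(value):
    """Top-down: list the values one level more specific than the given one."""
    return sorted(child for child, parent in PARENT.items() if parent == value)

print(roll_up("Main St"))      # Street -> City -> State -> Country
print(drill_down("Illinois"))  # State -> its cities
```

Replacing each street with its city, state, or country is what reduces the number of distinct values, in the same way binning does for numbers.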