addition of unit 1 3 4 5

Akshat Mehta
2025-11-24 16:55:19 +05:30
parent 8f8e35ae95
commit f8aea15aaa
24 changed files with 596 additions and 0 deletions

@@ -0,0 +1,27 @@
# Data Discretization
**Data Discretization** is the process of converting continuous values into a small, finite set of intervals (bins), so that each value can be replaced by the label of the interval it falls into.
## Why use it?
- Makes data easier to understand.
- Some algorithms (for example, decision trees and Naive Bayes classifiers) work more naturally with a small set of categories than with continuous values.
## Techniques
### 1. Binning
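- Sort the data, then divide it into "bins", for example bins of equal width or bins holding an equal number of values.
- **Example**: Grouping ages into [0-10], [11-20], etc.
- Helps smooth out noise, for example by replacing every value in a bin with the bin's mean.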
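
The binning steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `age_bin` and `smooth_by_bin_means` are made up for this note, ages are assumed to be non-negative integers, and smoothing is assumed to use equal-frequency bins replaced by their mean.

```python
def age_bin(age):
    """Return the label of the bin an integer age falls into: 0-10, 11-20, 21-30, ..."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 10:
        return "0-10"
    low = ((age - 1) // 10) * 10 + 1  # lower edge of the decade bin
    return f"{low}-{low + 9}"


def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of `bin_size` items, and
    replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed


print(age_bin(20))  # 11-20
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

After smoothing, small fluctuations inside each bin disappear, which is what "smoothing out noise" means here.
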
### 2. Histogram Analysis
- Plot the values as a bar chart (histogram) to see their distribution, then use its shape, such as gaps or low-count regions, to decide where to split the data.
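
As a rough sketch of the idea, the snippet below counts integer values in equal-width buckets and draws a text histogram. The helper names `histogram` and `render` and the sample data are assumptions made for illustration; in practice a plotting library would usually draw the chart.

```python
from collections import Counter


def histogram(values, width):
    """Count how many values fall into each equal-width bucket.
    Keys are the lower edge of each bucket: 0, width, 2*width, ..."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def render(counts, width):
    """Draw one line of '#' marks per non-empty bucket."""
    return "\n".join(
        f"{low}-{low + width - 1}: {'#' * n}" for low, n in counts.items()
    )


values = [1, 2, 3, 22, 24, 25, 27, 51, 53]
counts = histogram(values, 10)
print(counts)  # {0: 3, 20: 4, 50: 2}
print(render(counts, 10))
```

The empty buckets between 10 and 19 and between 30 and 49 show up as missing rows, which suggests natural places to split the data.
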
### 3. Cluster Analysis
- Using clustering (like K-Means) to group similar values, then using those groups as the intervals.
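
A minimal sketch of this approach, assuming one-dimensional data and a hand-written K-Means loop rather than a library call. The function names `kmeans_1d` and `cluster_intervals`, the way the starting centers are picked, and the sample values are all illustrative assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """Group 1-D values into k clusters with a basic K-Means loop."""
    ordered = sorted(values)
    # Assumption: start the centers at evenly spaced positions in the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def cluster_intervals(values, k):
    """Turn each non-empty cluster into a (min, max) interval."""
    return sorted((min(c), max(c)) for c in kmeans_1d(values, k) if c)


print(cluster_intervals([1, 2, 3, 20, 21, 22, 50, 51, 52], 3))
# [(1, 3), (20, 22), (50, 52)]
```

Unlike fixed-width bins, the interval boundaries here follow where the values actually concentrate.
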
## Concept Hierarchy
- Organizing data from **low-level** concepts to **high-level** concepts.
- **Example (Location)**: Street -> City -> State -> Country.
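- **Top-down Mapping**: General to specific (for example, Country -> State -> City -> Street).
- **Bottom-up Mapping**: Specific to general (for example, Street -> City -> State -> Country).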
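
One simple way to represent such a hierarchy, sketched here under assumptions, is a dictionary that maps each value to its parent one level up. The `PARENT` table, the place names in it, and the helpers `generalize` and `drill_down` are illustrative only.

```python
# Each key maps to the concept one level above it:
# Street -> City -> State -> Country (sample entries for illustration).
PARENT = {
    "MG Road": "Bengaluru",
    "Marine Drive": "Mumbai",
    "Bengaluru": "Karnataka",
    "Mumbai": "Maharashtra",
    "Karnataka": "India",
    "Maharashtra": "India",
}


def generalize(value, steps=1):
    """Bottom-up: climb `steps` levels from a specific value to a more general one."""
    for _ in range(steps):
        value = PARENT.get(value, value)  # stay put at the top of the hierarchy
    return value


def drill_down(value):
    """Top-down: list the more specific values directly below `value`."""
    return sorted(child for child, parent in PARENT.items() if parent == value)


print(generalize("MG Road"))     # Bengaluru
print(generalize("MG Road", 3))  # India
print(drill_down("India"))       # ['Karnataka', 'Maharashtra']
```

Replacing each street with its city or state reduces the number of distinct values, which is the same goal discretization has for numeric data.
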