Data Discretization
Data discretization is the process of converting continuous values, which can take a very large number of distinct values, into a small, finite set of intervals (bins).
Why use it?
- Makes data easier to understand.
- Some algorithms, such as Naive Bayes or association-rule mining, work more naturally with a small set of categories than with continuous values.
Techniques
1. Binning
- Sorting data and dividing it into "bins".
- Example: Grouping ages into [0-10], [11-20], etc.
- Helps smooth out noise, for example by replacing each value with the mean or median of its bin.
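The binning idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `equal_width_bins` and `smooth_by_bin_means` and the sample values are assumptions made for this example. The first function splits the value range into intervals of equal width; the second sorts the data, forms bins holding roughly the same number of values, and replaces each value with its bin mean.

```python
def equal_width_bins(values, n_bins):
    """Return the 0-based bin index of each value, using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # The maximum would fall just past the last interval, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def smooth_by_bin_means(values, n_bins):
    """Sort, split into bins of (nearly) equal size, replace values by their bin mean."""
    ordered = sorted(values)
    n = len(ordered)
    smoothed = []
    for i in range(n_bins):
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        if chunk:
            smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    return smoothed


# Hypothetical sample data.
values = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bin_index = equal_width_bins(values, 3)
smoothed = smooth_by_bin_means(values, 3)
print(bin_index)  # [0, 0, 1, 1, 1, 2, 2, 2, 2]
print(smoothed)   # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Equal-width bins are easy to read but can end up unevenly filled, while equal-frequency bins adapt to where the data is dense.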
2. Histogram Analysis
- Plotting the distribution of values as a histogram and using its shape (for example, gaps or valleys between peaks) to decide where to split the data.
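As a rough sketch of histogram analysis, the snippet below counts values per equal-width bucket, prints a text histogram, and reports empty buckets as candidate split regions. The helper name `histogram`, the sample data, and the "empty bucket means split here" rule are assumptions chosen for illustration; in practice the split is usually decided by inspecting the plot.

```python
def histogram(values, n_bins):
    """Return (edges, counts) for n_bins equal-width buckets over the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value is counted in the last bucket.
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        counts[idx] += 1
    return edges, counts


# Hypothetical sample data with a visible gap between two groups.
values = [1, 2, 2, 3, 3, 3, 4, 12, 13, 13, 14, 21]
edges, counts = histogram(values, 5)

# Text rendering; the last bucket also includes its right edge.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:5.1f}, {right:5.1f}) {'#' * count}")

# Empty buckets mark gaps in the distribution: natural places to split.
gaps = [(edges[i], edges[i + 1]) for i, c in enumerate(counts) if c == 0]
print(gaps)  # [(5.0, 9.0)]
```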
3. Cluster Analysis
- Using clustering (like K-Means) to group similar values, then using those groups as the intervals.
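One way to turn clusters into intervals is sketched below with a small one-dimensional K-Means written from scratch. The names `kmeans_1d` and `cut_points`, the deterministic choice of starting centroids, and the sample values are all assumptions for this example; a library implementation would normally be used instead. The interval boundaries are taken as the midpoints between neighbouring centroids.

```python
import bisect


def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's algorithm on 1-D data; returns the centroids in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption of this sketch): spread across the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cut_points(centroids):
    """Interval boundaries: midpoints between neighbouring centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


# Hypothetical sample data.
values = [20, 22, 24, 30, 50, 52, 95, 100, 105]
centers = kmeans_1d(values, 3)
boundaries = cut_points(centers)
# Interval index of each value: how many boundaries lie at or below it.
labels = [bisect.bisect_right(boundaries, v) for v in values]

print(centers)     # [24.0, 51.0, 100.0]
print(boundaries)  # [37.5, 75.5]
print(labels)      # [0, 0, 0, 0, 1, 1, 2, 2, 2]
```

Unlike fixed-width bins, the resulting intervals follow the natural groupings in the data, at the cost of having to choose the number of clusters.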
Concept Hierarchy
- Organizing data from low-level concepts to high-level concepts.
- Example (Location): Street -> City -> State -> Country.
- Top-down Mapping: General to Specific.
- Bottom-up Mapping: Specific to General.
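A concept hierarchy can be represented as a simple child-to-parent lookup. The sketch below is illustrative only: the `PARENT` table, the sample places, and the function names `roll_up` and `drill_down` are assumptions made for this example. Rolling up replaces a value with a more general one (bottom-up), and drilling down lists the more specific values beneath it (top-down).

```python
# Each entry maps a value to its parent one level up (sample data for illustration).
PARENT = {
    "Market St": "San Francisco",
    "Congress Ave": "Austin",
    "San Francisco": "California",
    "Austin": "Texas",
    "California": "USA",
    "Texas": "USA",
}


def roll_up(value, levels=1):
    """Bottom-up: replace a value with its ancestor `levels` steps higher."""
    for _ in range(levels):
        value = PARENT.get(value, value)  # stays put once the top level is reached
    return value


def drill_down(value):
    """Top-down: list the values directly below `value`, sorted by name."""
    return sorted(child for child, parent in PARENT.items() if parent == value)


print(roll_up("Market St"))     # San Francisco
print(roll_up("Market St", 2))  # California
print(roll_up("Market St", 5))  # USA
print(drill_down("USA"))        # ['California', 'Texas']
```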