DMCT-NOTES/unit 1/06_Data_Discretization.md
2025-11-24 16:55:19 +05:30

Data Discretization

Data Discretization is the process of converting a continuous attribute, which can take a very large number of distinct values, into a small, finite set of intervals (bins).

Why use it?

  • Makes data easier to understand.
  • Many algorithms (association rule mining, for example) work better with a small set of categories than with continuous values.

Techniques

1. Binning

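  • Sorting data and dividing it into "bins", usually of equal width (same range per bin) or equal frequency (same number of values per bin).
  • Example: Grouping ages into [0-10], [11-20], etc.
  • Helps smooth out noise: each value can be replaced by its bin's mean, median, or nearest boundary.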
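
The binning ideas above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`equal_width_bins`, `equal_frequency_bins`, `smooth_by_bin_means`) and the sample `prices` list are assumptions made for this example, and the smoothing step assumes no bin is empty.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins (indices 0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls in one bin.
        return [0 for _ in values]
    # min(..., k - 1) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Sort the values and split them into k bins of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def smooth_by_bin_means(bins):
    """Replace every value in a bin with that bin's mean (bins must be non-empty)."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


# Hypothetical sample data for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]

bins = equal_frequency_bins(prices, 3)
print(bins)                         # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_bin_means(bins))    # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(equal_width_bins(prices, 3))  # [0, 0, 1, 1, 1, 2, 2, 2, 2]
```

Equal-width bins are simple but can end up very unevenly filled when the data is skewed; equal-frequency bins avoid that at the cost of uneven interval widths.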

2. Histogram Analysis

  • Using a histogram (a bar chart of how many values fall in each range) to see the distribution and decide where to split the data.
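
As a rough sketch of histogram-based splitting, the code below counts values per equal-width bucket and then proposes a split in the emptiest interior bucket. The "split at the valley" rule is one simple heuristic chosen for this example, not the only way to read a histogram; the names `histogram` and `valley_split` and the sample data are assumptions.

```python
def histogram(values, k):
    """Return (edges, counts) for k equal-width buckets over the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        raise ValueError("all values are identical; nothing to split")
    counts = [0] * k
    for v in values:
        # The maximum value is placed in the last bucket.
        counts[min(int((v - lo) / width), k - 1)] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts


def valley_split(edges, counts):
    """Heuristic: split at the midpoint of the least-populated interior bucket.

    Assumes at least three buckets, so that an interior bucket exists.
    """
    i = min(range(1, len(counts) - 1), key=lambda j: counts[j])
    return (edges[i] + edges[i + 1]) / 2


# Hypothetical sample data with two dense regions and a gap between them.
values = [0, 2, 3, 5, 8, 12, 27, 41, 43, 45, 48, 50]

edges, counts = histogram(values, 5)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:>4.1f}, {right:>4.1f}) {'#' * count}")

print(valley_split(edges, counts))  # 35.0
```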

3. Cluster Analysis

  • Using clustering (like K-Means) to group similar values, then using those groups as the intervals.
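
A minimal one-dimensional K-Means sketch shows how clusters can become intervals: once the cluster centers are found, the midpoint between each pair of adjacent centers serves as a cut point. The function names, the deterministic initialization, and the sample data are assumptions made for this example; a real project would more likely use a library implementation.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster 1-D values with basic K-Means and return the sorted centers."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption of this sketch): evenly spaced sorted values.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in ordered:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sorted(centers)


def cut_points(centers):
    """Interval boundaries: midpoints between adjacent cluster centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


# Hypothetical sample data with three natural groups.
values = [1, 2, 3, 10, 11, 12, 20, 21, 22]

centers = kmeans_1d(values, 3)
print(cut_points(centers))  # [6.5, 16.0]
```

With these cut points, the intervals are (-inf, 6.5), [6.5, 16.0), and [16.0, +inf), so the interval widths follow the data's natural grouping instead of a fixed width.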

Concept Hierarchy

  • Organizing data from low-level concepts to high-level concepts.
  • Example (Location):
    • Street -> City -> State -> Country.
  • Top-down Mapping: General to Specific.
  • Bottom-up Mapping: Specific to General.
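
A concept hierarchy can be stored as a simple child-to-parent mapping. The sketch below is illustrative only: the `PARENT` table holds a few sample locations, and the function names `roll_up` (bottom-up, specific to general) and `drill_down` (top-down, general to specific) are hypothetical names chosen for this example.

```python
# Sample hierarchy: Street -> City -> State -> Country (illustrative entries).
PARENT = {
    "MG Road": "Pune",
    "FC Road": "Pune",
    "Pune": "Maharashtra",
    "Mumbai": "Maharashtra",
    "Maharashtra": "India",
}


def roll_up(value, levels=1):
    """Bottom-up mapping: replace a value with its ancestor `levels` steps up."""
    for _ in range(levels):
        # A value with no parent (the top level) maps to itself.
        value = PARENT.get(value, value)
    return value


def drill_down(value):
    """Top-down mapping: list the more specific values directly under `value`."""
    return sorted(child for child, parent in PARENT.items() if parent == value)


print(roll_up("MG Road"))            # Pune
print(roll_up("MG Road", levels=3))  # India
print(drill_down("Maharashtra"))     # ['Mumbai', 'Pune']
```

Rolling every street up to its city or state reduces the number of distinct values in the same way binning does for numeric data.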