2025-11-24 16:55:19 +05:30

Introduction to Data Mining

What is Data Mining?

Data Mining is the process of digging through large amounts of raw data to find useful patterns, trends, and knowledge.

  • Analogy: Like mining gold from rocks. The rocks are the "raw data," and the gold is the "knowledge."
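To make "finding useful patterns" concrete, here is a minimal sketch that counts which pairs of items are often bought together in a handful of shopping baskets. The baskets, the function name `frequent_pairs`, and the `min_support` threshold are hypothetical and chosen only for illustration. Real data mining applies dedicated algorithms (such as Apriori) to far larger datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "raw data": each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a stable order, e.g. ("bread", "milk")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_pairs(baskets, min_support=2)
print(patterns)  # → {('bread', 'milk'): 3, ('bread', 'butter'): 2}
```

The raw baskets are the "rocks"; the discovered pattern "bread is often bought with milk" is the "gold".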

Key Definitions

  • Data: Raw facts and figures (e.g., sales logs, sensor readings).
  • Mining: Extracting something valuable (here, knowledge) from a large amount of raw material.

The DIKW Pyramid

The DIKW model (Data, Information, Knowledge, Wisdom) shows how we move from raw data to wisdom, with each level building on the one below it.

  1. Data (D): Raw, unprocessed facts.
    • Example: Numbers like 42, 35, 50.
  2. Information (I): Data that has been organized and given context, so it carries meaning.
    • Example: "These are the ages of employees."
  3. Knowledge (K): Understanding gained from analysis.
    • Example: "The team has a mix of young and experienced people."
  4. Wisdom (W): Applying knowledge to make good decisions.
    • Example: "Let's create a mentorship program to share skills."
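The four levels can be traced in code using the same ages (42, 35, 50). This is a minimal sketch, not a formal model of DIKW: the 40-year `SENIOR_AGE` cut-off, the use of age as a stand-in for experience, and the helper names `summarize` and `decide` are all hypothetical assumptions made for illustration.

```python
from statistics import mean

# Data (D): raw, unprocessed facts with no context.
data = [42, 35, 50]

# Information (I): the same numbers, organized and labeled so they mean something.
information = {"employee_ages": data}

# Hypothetical cut-off for "senior" staff; not a standard definition.
SENIOR_AGE = 40


def summarize(ages, senior_age=SENIOR_AGE):
    """Knowledge (K): what analyzing the information tells us about the team."""
    seniors = [a for a in ages if a >= senior_age]
    juniors = [a for a in ages if a < senior_age]
    return {
        "average_age": mean(ages),
        "youngest": min(ages),
        "oldest": max(ages),
        # True when the team has both younger and more senior members.
        "mixed_team": bool(seniors) and bool(juniors),
    }


def decide(knowledge):
    """Wisdom (W): turn the knowledge into an action."""
    if knowledge["mixed_team"]:
        return "Create a mentorship program to share skills."
    return "Look for other ways to spread skills across the team."


knowledge = summarize(information["employee_ages"])
print(knowledge)
print(decide(knowledge))  # → Create a mentorship program to share skills.
```

Each step adds something the previous one lacked: a label (information), an interpretation (knowledge), and finally an action (wisdom).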

Major Issues in Data Mining

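  1. Privacy and Security: Mining can expose sensitive personal information (e.g., health or financial records), so that data must be protected from unauthorized access and misuse.
  2. Scalability: Algorithms and systems must process very large volumes of data (Big Data) in a reasonable amount of time.
  3. Data Quality: If data is noisy, inconsistent, or incomplete, the results will be unreliable ("Garbage In, Garbage Out").
  4. Ethical Use: Ensuring that data and mined results are not used to discriminate against people or to reinforce bias.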
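The "Garbage In, Garbage Out" point under Data Quality can be shown with a small example. The sketch assumes a hypothetical temperature log in which a missing reading was stored as the placeholder `-999` and another was never recorded (`None`). The valid range of 0 to 60 °C and the `clean` helper are illustrative assumptions. Averaging the raw numbers gives a meaningless result; removing invalid entries first gives a sensible one.

```python
from statistics import mean

# Hypothetical readings in °C: -999 is a placeholder for a missing value,
# and None is a reading that was never recorded.
readings = [21.5, 22.0, -999, 21.8, None, 22.3]

# Assumed plausible range for this sensor (illustrative only).
VALID_MIN, VALID_MAX = 0.0, 60.0


def clean(values):
    """Keep only readings that are present and inside the valid range."""
    return [v for v in values if v is not None and VALID_MIN <= v <= VALID_MAX]


# Garbage in: the -999 placeholder is treated as a real temperature.
naive_avg = mean(v for v in readings if v is not None)
# Cleaned data: invalid and missing readings are removed before averaging.
clean_avg = mean(clean(readings))

print(f"naive: {naive_avg:.2f}, cleaned: {clean_avg:.2f}")
# → naive: -182.28, cleaned: 21.90
```

Dropping bad values is only one option; depending on the data, missing values might instead be filled in (imputed), which is a separate design choice.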