Data Cleaning

  1. Data Filtering
    Filtering involves removing irrelevant or unnecessary data from a dataset to reduce noise and focus on the most relevant information.
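    A minimal pandas sketch (the column names and filter rules here are illustrative):

    ```python
    import pandas as pd

    # Toy dataset with a test row and a zero-revenue row acting as noise
    df = pd.DataFrame({
        "region": ["US", "EU", "TEST", "US"],
        "revenue": [1200, 950, 0, 300],
    })

    # Keep only real regions and rows with meaningful revenue
    filtered = df[(df["region"] != "TEST") & (df["revenue"] > 0)]
    ```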
  2. Data Validation
    Data validation checks whether data adheres to defined rules and constraints, identifying and correcting inconsistencies.
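    For example, row-level rule checks in pandas (the rules themselves are illustrative):

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "age": [25, -3, 47],
        "email": ["a@example.com", "not-an-email", "c@example.org"],
    })

    # Two simple rules: age must fall in a plausible range,
    # and email must match a basic pattern
    valid_age = df["age"].between(0, 120)
    valid_email = df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    invalid_rows = df[~(valid_age & valid_email)]  # rows needing correction
    ```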
  3. Data Deduplication
    Data deduplication involves eliminating duplicate records from a dataset, ensuring that each record is unique.
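    In pandas this is a one-liner; `customer_id` here is an assumed key column:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "customer_id": [1, 2, 2, 3],
        "name": ["Ann", "Bob", "Bob", "Cy"],
    })

    # Keep the first occurrence of each customer_id and discard the rest
    deduped = df.drop_duplicates(subset=["customer_id"], keep="first")
    ```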
  4. Data Encoding
    Data encoding involves converting categorical data into a numerical format to make it compatible with machine learning algorithms.
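    A common approach is one-hot encoding, sketched here with pandas:

    ```python
    import pandas as pd

    df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

    # Expand the categorical column into numeric indicator columns
    encoded = pd.get_dummies(df, columns=["color"])
    ```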
  5. Data Imputation
    Data imputation entails replacing missing or null values with estimated values to maintain data integrity.
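    One simple strategy is median imputation, sketched below; mean, mode, or model-based approaches are common alternatives:

    ```python
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"income": [52000, np.nan, 61000, np.nan]})

    # Replace missing values with the column median
    df["income"] = df["income"].fillna(df["income"].median())
    ```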
  6. Data Aggregation
    Data aggregation entails grouping data by category, time period, or another criterion to obtain summarized statistics.
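    For example, grouping by a time period with pandas:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "month": ["Jan", "Jan", "Feb", "Feb"],
        "sales": [100, 150, 90, 120],
    })

    # Summarized statistics per month
    summary = df.groupby("month")["sales"].agg(["sum", "mean"])
    ```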
  7. Data Standardization
    Standardizing data involves putting all data into a common format to facilitate comparison and analysis.
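    A small sketch of format standardization (the canonical mapping is illustrative):

    ```python
    import pandas as pd

    df = pd.DataFrame({"country": [" usa", "USA", "U.S.A."]})

    # Normalize whitespace and casing, then map variants to one canonical form
    df["country"] = (
        df["country"].str.strip().str.upper().replace({"USA": "US", "U.S.A.": "US"})
    )
    ```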
  8. Data Sampling
    Data sampling is the process of selecting a representative subset of data to expedite analysis while preserving data integrity.
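    For instance, a reproducible random sample in pandas:

    ```python
    import pandas as pd

    df = pd.DataFrame({"value": range(1_000)})

    # Draw a 10% random sample; a fixed seed keeps the result reproducible
    sample = df.sample(frac=0.1, random_state=42)
    ```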
  9. Data Transformation
    Data transformation involves modifying existing data to make it more suitable for analysis or modeling.
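    One common transformation is a log transform for skewed values:

    ```python
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"revenue": [10, 100, 1_000, 10_000]})

    # Log-transform a heavily skewed column so models see a gentler scale
    df["log_revenue"] = np.log1p(df["revenue"])
    ```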
  10. Outlier Detection
    Outlier detection is the process of identifying values that deviate markedly from the rest of the data and then handling them, for example by capping or removing them.
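    A simple sketch using the interquartile-range (IQR) rule:

    ```python
    import pandas as pd

    s = pd.Series([10, 12, 11, 13, 250])

    # Flag values outside 1.5 * IQR, a simple and widely used rule of thumb
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
    ```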
  11. Data Cleansing
    Data cleansing is a process that encompasses the application of multiple techniques to ensure data accuracy, completeness, and compliance with standards.
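    As a rough sketch, several of the techniques above chained into one cleansing pass (the `id` and `name` columns are hypothetical):

    ```python
    import pandas as pd

    def cleanse(df: pd.DataFrame) -> pd.DataFrame:
        """Apply several cleaning steps in one pass."""
        return (
            df.drop_duplicates()                  # deduplication
              .dropna(subset=["id"])              # drop rows missing a key field
              .assign(name=lambda d: d["name"].str.strip().str.title())  # standardize text
        )
    ```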
  12. Data Profiling
    Data profiling involves in-depth analysis of data to understand its structure, characteristics, and quality.
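    A quick profiling pass in pandas might look like:

    ```python
    import pandas as pd

    df = pd.DataFrame({"age": [25, None, 47], "city": ["NY", "NY", "SF"]})

    print(df.dtypes)                    # structure: column types
    print(df.isna().sum())              # quality: missing values per column
    print(df.describe(include="all"))   # characteristics: summary statistics
    ```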