Last updated 9/2022
MP4 | Video: h264, 1280×720 | Audio: AAC, 44.1 kHz
Language: English | Size: 2.04 GB | Duration: 5h 23m
Learn why, when, and how to maximize the quality of your data to optimize data-based decisions
What you’ll learn
Strategies for increasing data quality
Ways to assess data quality
Interpreting data visualizations
How to spot problems in data
Requirements
Interest in working with data
Interest in knowing more about data quality
Some Python skills are useful for the optional coding videos
Description
All of our decisions are based on data. Our sense organs gather data, our memories are data, and our gut instincts are data. If you want to make good decisions, you need high-quality data.
This course is about data quality: what it means, why it’s important, and how you can increase the quality of your data. In this course, you will learn:
High-level strategies for ensuring high data quality, including terminology, data documentation and management, and the different research phases in which you can check and increase data quality.
Qualitative and quantitative methods for evaluating data quality, including visual inspection, error rates, and outliers. Python code shows how to implement these visualizations and scoring methods using pandas, numpy, seaborn, and matplotlib.
Specific methods and algorithms for cleaning data and rejecting bad or unusual data, again with Python code using pandas, numpy, seaborn, and matplotlib.
This course is for:
Data practitioners who want to understand both the high-level strategies and the low-level procedures for evaluating and improving data quality.
Managers, clients, and collaborators who want to understand the importance of data quality, even if they are not working directly with data.
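To illustrate what "rejecting bad or unusual data" can look like in practice, here is a minimal NumPy sketch of the two outlier methods named in Lectures 37 and 38: the z-score method (distance from the mean in standard deviations) and the modified z-score method (distance from the median, scaled by the median absolute deviation, or MAD). The function names are hypothetical, and the thresholds 3.0 and 3.5 are common conventions, not necessarily the values used in the course code.

```python
import numpy as np


def zscore_outliers(x, threshold=3.0):
    """Flag values whose absolute z-score exceeds `threshold` (assumed default 3.0)."""
    x = np.asarray(x, dtype=float)
    sd = x.std(ddof=1)  # sample standard deviation
    if sd == 0:
        # No spread at all: nothing can be flagged as unusual.
        return np.zeros(x.shape, dtype=bool)
    z = (x - x.mean()) / sd
    return np.abs(z) > threshold


def modified_zscore_outliers(x, threshold=3.5):
    """Flag values whose absolute modified z-score exceeds `threshold` (assumed default 3.5)."""
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median absolute deviation
    if mad == 0:
        # Assumption: with zero MAD, flag nothing rather than divide by zero.
        return np.zeros(x.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data.
    m = 0.6745 * (x - median) / mad
    return np.abs(m) > threshold


data = [10, 11, 9, 10, 12, 11, 10, 95]
print(zscore_outliers(data))
print(modified_zscore_outliers(data))
```

In this small sample, the single extreme value inflates both the mean and the standard deviation, so its ordinary z-score stays below 3 and it is not flagged. The median and MAD are barely affected by it, so the modified z-score flags it. This robustness to the outliers themselves is the usual reason for preferring the median-based version on small or contaminated samples.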
Overview
Section 1: Introduction
Lecture 1 Is this course right for you?
Section 2: Download course materials (Python code)
Lecture 2 Download the code
Section 3: Why data quality matters
Lecture 3 Section summary
Lecture 4 Is data or are data??
Lecture 5 On the origins and quality of data
Lecture 6 GIGO (garbage in, garbage out)
Lecture 7 Data quality influences data-driven decisions
Section 4: Ensuring high data quality
Lecture 8 Section summary
Lecture 9 Data management
Lecture 10 Data documentation
Lecture 11 Data audits
Lecture 12 Data cleaning phases
Lecture 13 Improve quality before getting data
Lecture 14 Improve quality during data collection
Lecture 15 Improve quality after data collection
Lecture 16 Improve quality during data analysis
Lecture 17 Risks of biased results
Section 5: Assessing data quality
Lecture 18 Section summary
Lecture 19 Qualitative vs. quantitative quality assessments
Lecture 20 Qualitative assessments via visual inspection
Lecture 21 Code: Visualizing data distributions
Lecture 22 Variance assessments
Lecture 23 Correlations and correlation matrices
Lecture 24 Data error rates
Lecture 25 Sample sizes
Lecture 26 Code: Measuring data quality
Section 6: Data transformations
Lecture 27 Section summary
Lecture 28 Z-score scaling
Lecture 29 Min/max scaling
Lecture 30 Binning (rounding)
Lecture 31 Unit normalization
Lecture 32 Rank transform
Lecture 33 Nonlinear transformations
Lecture 34 Code: Transforming data
Section 7: Outliers and missing data
Lecture 35 Section summary
Lecture 36 What are outliers?
Lecture 37 The z-score method
Lecture 38 The modified z-score method
Lecture 39 Dealing with missing data
Lecture 40 Code: Dealing with bad or missing data
Section 8: Be a high-quality data scientist
Lecture 41 Section summary
Lecture 42 Keeping up with data science developments
Lecture 43 Can you know everything?
Lecture 44 What data scientists want
Section 9: Bonus
Lecture 45 Bonus material
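Section 6 lists several ways of rescaling data. Below is a minimal NumPy/pandas sketch of five of them, assuming their common textbook definitions. Two of these readings are assumptions: that "unit normalization" means dividing by the Euclidean (L2) norm, and that "binning (rounding)" means rounding to the nearest multiple of a bin width. All function names are hypothetical and are not taken from the course materials.

```python
import numpy as np
import pandas as pd


def zscore_scale(x):
    """Z-score scaling: mean 0, sample standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def minmax_scale(x, new_min=0.0, new_max=1.0):
    """Min/max scaling: linearly map the data range onto [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # Assumption: constant data maps to new_min instead of dividing by zero.
        return np.full(x.shape, new_min)
    return new_min + (x - x.min()) / span * (new_max - new_min)


def round_to_bin(x, width):
    """Binning by rounding (assumed meaning): snap each value to the nearest multiple of `width`."""
    x = np.asarray(x, dtype=float)
    return np.round(x / width) * width


def unit_normalize(x):
    """Unit normalization (assumed meaning): divide by the L2 norm so the vector has length 1."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)


def rank_transform(x):
    """Rank transform: replace values by ranks 1..n; tied values share their average rank."""
    return pd.Series(x, dtype=float).rank(method="average").to_numpy()


x = [2, 4, 4, 10]
print(zscore_scale(x))
print(minmax_scale(x))
print(round_to_bin(x, 5))
print(unit_normalize(x))
print(rank_transform(x))
```

The linear transforms (z-score, min/max, unit norm) preserve the shape of the distribution and only change its location and scale. Binning and the rank transform discard information about exact values, which can make an analysis more robust to noise and outliers at the cost of resolution.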
Who this course is for
Data science practitioners
Data science students
Managers or colleagues who work with data practitioners
Course title: Improving Data Quality In Data Analytics & Machine Learning