Learning Notebook - David Rostcheck
learning_event details
Learning Event ID
Subject
Topic
Program
Length
Institution
Presenter
Format
Recorded Date
Completed Date
Notes
Machine learning algorithms like linear regression and Gaussian Naive Bayes assume that numerical variables have a Gaussian probability distribution. If they do not, the variables should be transformed toward a Gaussian distribution, which also places them in a numerical range better suited to the linear operations used by ML methods such as deep learning.

Your data may not have a Gaussian distribution; it may instead have a Gaussian-like distribution (e.g. nearly Gaussian but with outliers or a skew) or an entirely different distribution (e.g. exponential). As such, you may be able to achieve better performance on a wide range of machine learning algorithms by transforming input and/or output variables to have a Gaussian or more-Gaussian distribution. Power transforms like the Box-Cox transform and the Yeo-Johnson transform provide an automatic way of performing these transforms on your data and are provided in the scikit-learn Python machine learning library.

In this tutorial, you will discover how to use power transforms in scikit-learn to make variables more Gaussian for modeling. After completing this tutorial, you will know:

- Many machine learning algorithms prefer or perform better when numerical variables have a Gaussian probability distribution.
- Power transforms are a technique for transforming numerical input or output variables to have a Gaussian or more-Gaussian-like probability distribution.
- How to use the PowerTransformer in scikit-learn to apply the Box-Cox and Yeo-Johnson transforms when preparing data for predictive modeling.
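A minimal sketch of the idea above, using scikit-learn's PowerTransformer on synthetic skewed data (the exponential sample, seed, and parameter choices here are illustrative assumptions, not part of the original tutorial):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Illustrative skewed data: an exponential distribution is far from Gaussian.
rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=(1000, 1))

# Yeo-Johnson handles zero and negative values; Box-Cox requires strictly
# positive inputs. standardize=True rescales the result to zero mean and
# unit variance after the power transform.
pt = PowerTransformer(method="yeo-johnson", standardize=True)
transformed = pt.fit_transform(data)

# The fitted lambda parameter chosen by maximum likelihood:
print("lambda:", pt.lambdas_[0])
print("mean after transform:", transformed.mean())
print("std after transform:", transformed.std())
```

Swapping `method="yeo-johnson"` for `method="box-cox"` applies the Box-Cox transform instead, provided all values are positive.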
Personal Notes
Review of 2017-2020 study: review transforms and scaling (most packages now do this automatically, or at least provide auto-optimizing APIs like scikit-learn's PowerTransformer).
Link
Review