In the previous article (https://d4datascience.com/2016/11/10/a-data-science-project-part-1/), we did some basic data analysis: calculating means, frequency tables, summaries and so on. Now we will derive new variables. Why?
Derived variables help us understand the original variables better. For example, we derived the variable ip from the incomeperperson variable, which helps us see how many people fall into the lower income class or the higher income class. Similarly, the other variables (le, ac) are transformed into new variables.
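To make the idea concrete, here is a minimal sketch of how a bucketed variable like ip could be derived from incomeperperson. The cutoff values, the class labels, the helper name income_class and the sample numbers are all assumptions made up for illustration; they are not the ones used in this project.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs and labels, chosen only for illustration.
CUTOFFS = [1000, 10000]
LABELS = ["low", "middle", "high"]

def income_class(incomeperperson):
    """Return the bucket label for one incomeperperson value (None if missing)."""
    if incomeperperson is None:
        return None
    # bisect_right counts how many cutoffs are <= the value,
    # which is exactly the index of the bucket it belongs to.
    return LABELS[bisect_right(CUTOFFS, incomeperperson)]

# Made-up sample values, not taken from the project's dataset.
incomes = [250.0, 999.9, 1000.0, 5400.0, 10000.0, 32000.0, None]

# The derived variable: one label per observation.
ip = [income_class(x) for x in incomes]

# Frequency table of the derived variable.
ip_counts = Counter(ip)
print(ip_counts)
```

With this choice, a value equal to a cutoff goes into the higher bucket. Whether the boundary should belong to the lower or the higher class is a decision you have to make explicitly.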
How do we decide the values of the cutoff points?
Either the business people tell you what they should be, or you have to explore the data yourself and decide how to divide it into different buckets.
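When nobody hands you the cutoffs, one common way to explore the data is to use quantiles so that each bucket holds roughly the same number of observations. The sketch below assumes three buckets and uses made-up sample values; the helper name bucket is hypothetical, and this is only one possible approach, not necessarily the one used in this project.

```python
from statistics import quantiles

# Made-up sample values, not taken from the project's dataset.
incomes = [250.0, 480.0, 900.0, 1500.0, 2700.0, 5400.0, 9800.0, 18000.0, 32000.0]

# n=3 returns the two cut points that split the data into
# three groups of roughly equal size (terciles).
cutoffs = quantiles(incomes, n=3)

def bucket(value, cutoffs):
    """Count how many cutoffs the value reaches: 0 is the lowest bucket."""
    return sum(value >= c for c in cutoffs)

buckets = [bucket(x, cutoffs) for x in incomes]

print(cutoffs)
print(buckets)
```

Quantile-based cutoffs depend on the sample you have, so they are worth reviewing with the business people before you treat them as fixed.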
If you have any questions, let me know in the comments section.