Data Science Interview Questions – Part 1

Data science is a field with no end. No matter how much you read, it will never feel like enough. One interviewer told me that you use only 5% of what you learn in data science. It’s actually true. The type of questions changes according to the job profile, though. I will list all the questions that I ask during interviews and also the questions that were asked of me.

Keep checking this blog post, as I will keep updating it. (I will also write the answers soon.)

Analytics and Consulting firms

  1. Explain logistic regression. Why do we use it? What are the assumptions of linear regression?
  2. Clustering questions: How do you choose between K-means and hierarchical clustering?
  3. Explain the ROC curve and precision-recall.
  4. What do you mean by p-value? (My favorite question. Most people don’t know the answer to this one.)
  5. Explain the steps in a data science project.
  6. What is the difference between machine learning and statistical modeling?
  7. Explain logistic regression to me in LAYMAN’S TERMS. (Without using technical words)
  8. What is correlation? Is it bad or good?
  9. What do you mean by data science? (Another fav)
  10. Types of joins. (Must)
  11. What is R-squared?
  12. What is a random forest?
  13. Explain any algorithm end to end. (Most often logistic regression or a decision tree)
  14. What’s the most challenging project you have done? How did you overcome the challenges?
  15. Explain the central limit theorem.





A Data Science Project- Part 3: Hypothesis testing and ANOVA

This post continues the A Data Science Project series. In this post, we will look at ANOVA. We use the ANOVA test whenever we need to check whether two or more groups differ from each other. For example, let’s say there are students of four races in a school: White, Hispanic, Black and Indian. The school management wants to know whether the marks scored by these groups are statistically different. If they are, the management would like to do something about it.
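To make the idea concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. It is not the code from the series (which uses SAS). The function name `one_way_anova_f`, the group labels and the marks are all hypothetical and made up for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    F compares the variation between group means with the variation
    inside the groups. A large F suggests the group means differ.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Sum of squares between groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups: how far each value is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical marks for four groups of students (illustrative numbers only)
marks = {
    "group_a": [72, 85, 78, 90, 66],
    "group_b": [65, 70, 74, 68, 71],
    "group_c": [80, 82, 79, 88, 91],
    "group_d": [60, 75, 69, 73, 77],
}

f_stat, df_b, df_w = one_way_anova_f(list(marks.values()))
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

The sketch stops at the F statistic. To decide whether the difference is statistically significant you still need the p-value from the F distribution, which a statistics package can give you (for example `scipy.stats.f_oneway` in Python).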



A Data Science Project- Part 2: Making Sense of Data

I hope you have gone through Part 1 and Part 2. Today I will tell you not only how to explore data through visualization but also, most importantly, how to interpret the plots. Don’t miss the summary at the end of every post.

We will create bar charts, histograms, scatter plots, box plots, etc. We will also check the association between two variables. Let’s start.



A Data Science Project- Part 1(b)

In the previous article, we did basic data analysis such as calculating means, frequency tables and summaries. Now we will derive new variables. Why?
Derived variables help us understand the data better. For example, we have derived the variable ip (from the incomeperperson variable), which helps us understand how many people fall into the lower or higher income class. Similarly, other variables (le, ac) are transformed into new variables.
How do we decide the values of these cutoff points?
Either the business people answer this, or you have to explore the data to divide it into different buckets.
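The bucketing step can be sketched as below in Python. This is only an illustration, not the SAS code from the series. The cutoff values, the labels and the function name `derive_income_class` are assumptions. Real cutoffs would come from the business side or from exploring the data, as described above.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoff points (assumed for illustration, not from the post):
# below 1000 -> "low", 1000 up to (not including) 10000 -> "middle",
# 10000 and above -> "high"
INCOME_CUTOFFS = [1000, 10000]
INCOME_LABELS = ["low", "middle", "high"]


def derive_income_class(income_per_person):
    """Map an incomeperperson value to an income bucket (the derived variable ip).

    Missing values (None) stay missing instead of being forced into a bucket.
    """
    if income_per_person is None:
        return None
    # bisect_right counts how many cutoffs are <= the value,
    # which is exactly the index of the bucket label
    return INCOME_LABELS[bisect_right(INCOME_CUTOFFS, income_per_person)]


# Made-up incomeperperson values, including one missing value
incomes = [450.0, 2300.5, None, 15200.0, 980.0, 10000.0]
ip = [derive_income_class(v) for v in incomes]

# Frequency table of the derived variable
print(Counter(ip))
```

Keeping the cutoffs in one list makes it easy to change them after talking to the business people, without touching the rest of the code.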

If you have any queries, let me know in the comments section.

A Data Science Project- Part 1(a)

In the earlier post, we discussed our hypothesis and the different variables impacting life expectancy. In this post, we will start digging into the data. I suggest you go through the codebook (check the earlier post). It is important to understand the variables before doing any analysis. I have chosen SAS as the data analysis language, but you are free to choose any language, such as R or Python. The process and logic will be the same; just the syntax will be different. In this part, we will explore all the variables. We will check the frequency and distribution of each variable. For numeric variables, you can use a histogram, a boxplot and quantiles to understand them. For categorical variables, you can use a bar plot and frequency tables.
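As a rough sketch of this exploration step in Python (the series itself uses SAS), the two helpers below build a frequency table for a categorical variable and a quantile summary for a numeric one. The function names and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean, quantiles


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]


def numeric_summary(values):
    """Return count, mean, min, quartiles and max for a numeric variable."""
    # quantiles(..., n=4) returns the three cut points that split the data into quarters
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }


# Made-up values, for illustration only
income_class = ["low", "middle", "low", "high", "middle", "low"]
life_expectancy = [52.1, 61.4, 68.9, 73.2, 75.0, 79.8, 81.3]

for category, count, percent in frequency_table(income_class):
    print(f"{category:<8}{count:>4}{percent:>8.1f}%")

print(numeric_summary(life_expectancy))
```

The quartiles here are the same numbers a boxplot draws, so this summary is a quick way to check a numeric variable before plotting it.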

A Data Science Project-Introduction: How can we have better life expectancy!

Hi, future data science buddies!

I have been asked many times, “How can I become a data scientist?” or “What should I do to become a data scientist?” Here I have written about some technical skills that are required to be a data scientist.

But the above article was lacking one thing, and Monica Rogati has summed it up perfectly.

This advice from Monica Rogati is a must-read for all beginners who want to make it into data science. In one sentence, she says:

Do a project you care about. Make it good and share it.

Now the question is: how do you choose a data science project?

So I thought I could give you some ideas about projects and about how to start a project and finish it. I am starting a series of blog posts in which I will show you, step by step, the journey of a data science project.

I am sharing one of my projects, which I undertook during a course. This is a basic project; read this article to understand the approach that you will follow in any project. You will figure out why hypothesis generation is necessary. You will also learn why people say every good data analysis starts with a question. You will understand what types of questions data science can answer. You can choose any project topic as long as you have relevant data available.

My research project topic was: how is life expectancy related to social, economic and environmental factors?

I downloaded the dataset from Gapminder. Details of the data are given in the codebook.

This is gapminder data’s codebook link

After looking through the codebook for the Gapminder study, I decided that I am particularly interested in life expectancy.
So I chose the variable lifeexpectancy from the codebook.
I want to study how life expectancy behaves with social, economic and environmental factors. I picked out the variables that are related to these factors.

  • alcconsumption (alcohol consumption)
  • co2emissions (cumulative CO2 emission)
  • oilperperson (oil consumption)
  • incomeperperson

I performed a literature review to see what work has already been done on this topic, so that I can bring out some new research. I have taken references from the research below:

  • Life expectancy of people with intellectual disability:
    a 35-year follow-up study
    K. Patja, M. Iivanainen, H. Vesala, H. Oksanen & I. Ruoppila
  • Income distribution and life expectancy
    R G Wilkinson
  • Changes in life expectancy in Russia in the mid-1990s
    Vladimir Shkolnikov, PhD, Prof Martin McKee, Prof David A Leon, PhD
  • Drinking Pattern and Mortality:
    The Italian Risk Factor and Life Expectancy Pooling Project

After going through this literature, I generated my hypothesis: I believe people who have better income, lower alcohol consumption and better environmental conditions have better life expectancy. So, according to my hypothesis, income and environmental quality are positively associated with life expectancy, while alcohol consumption is negatively associated with it.

We would like to find out what the association is between life expectancy and social, economic and environmental factors. We would like to address several questions, such as:

  1. Do alcohol consumption and oil consumption determine life expectancy?
  2. Does better income mean better life expectancy?
  3. Is there a causal relationship?
  4. If there are more CO2 emissions, would that mean lower life expectancy?

The variables that we will consider for the hypothesis are:

  1. incomeperperson (Gross Domestic Product per capita in constant 2000 US$)
  2. alcconsumption (alcohol consumption per adult (age 15+), litres
    Recorded and estimated average alcohol consumption, adult (15+) per
    capita consumption in litres pure alcohol)
  3. co2emissions (cumulative CO2 emission (metric tons), Total amount of CO2 emission in metric tons since 1751)
  4. lifeexpectancy (life expectancy at birth (years)
    The average number of years a newborn child would live if current
    mortality patterns were to stay the same.)
  5. oilperperson (oil consumption per capita (tonnes per year and person))

This will help us answer important questions such as: How can a country have better life expectancy without compromising on its development projects, which cause a lot of CO2 emissions? Where should a country focus its agenda to improve life expectancy? So let’s dive into the data….
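As a first look at the association questions listed above, here is a minimal Python sketch of the Pearson correlation coefficient. It is not the analysis from the series. The function name `pearson_r` and the numbers are hypothetical and are not real Gapminder values. Only the variable names come from the codebook.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two variables, skipping rows with a missing value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Covariance and variances (the common 1/n factor cancels in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Made-up values for a handful of countries (illustration only)
incomeperperson = [500.0, 1200.0, None, 8000.0, 15000.0, 30000.0]
lifeexpectancy = [55.0, 62.0, 70.0, 71.0, 76.0, 81.0]

r = pearson_r(incomeperperson, lifeexpectancy)
print(f"correlation between income and life expectancy: {r:.2f}")
```

A value near +1 or -1 points to a strong linear association and a value near 0 to a weak one. It does not answer the causal question: correlation alone cannot tell us whether higher income causes longer life.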



How to Make a Career in Data Science

Hello, guys!! Recently many people have asked me what they should do to make a career in the data science field.
The first requisite is that you should have the fire in your belly to make it into this field. Believe me, there is no dearth of jobs. The jobs are out there; you need to be willing to make yourself employable.
Data science is a field where the industry relies heavily on logical and analytical capability. Below I mention the minimum requirements to make it into this field. You should be good at the basics first. I also explain how to become a data scientist for free. I have had the chance to interview people for my firm. Some candidates had a year of experience in the analytics field but did not know the basics. One person had an MSc in statistics but did not know basic statistical concepts such as the p-value. These are some of the practices that I followed to get into this field. The Analytics Vidhya blog is highly recommended for beginners. You will find there everything else that I didn’t cover in this post.

1. Solve puzzles, brain teasers and speed maths calculations.
2. Enroll in MOOCs from Coursera/edX/Udacity. I list below the courses that you should take; start from option a under each heading. After completing it, you can move to option b. If you have to pick only one course, choose The Analytics Edge on edX.

For Statistics
(Note: Statistics is an important part of data science. You should know at least the basics of statistics, i.e. concepts of probability, normal distribution, standard deviation, p-values, correlation and causation, etc.)
a. Data analysis and Statistical Inference on Coursera.
b. Intro to Descriptive and Inferential Statistics.

For R

a. The Analytics Edge from MIT on edX is one of the best courses to get you started with the power of analytics. This course is time-consuming and very rigorous. You should be able to spend 15-20 hours per week. Do the assignments they provide, otherwise it will be of no use. If they have taken their assignments offline, google them and you will find them. (I finished it 100%)

For Python
a. Datacamp launched their Python for data science course. Do take it.
b. Dataquest is also one of the good resources.

a. Data Analysis and Interpretation. This is also available in Python.

For Machine Learning
a. The famous Machine learning by Andrew Ng on Coursera

For Predictive analytics
a. Predictive analytics from IIMB on edX

3. Start reading one blog post every day from Analytics Vidhya (heavily recommended for beginners), Data Science Central and KDnuggets.
4. Participate in competitions on Kaggle/Analytics Vidhya.
a. Titanic is a good and very easy data science problem to get you started. You will find various tutorials for this problem. Just google it.
5. Start thinking in numbers and solve some case studies from consulting.
6. Be good at maths.
7. Learn linear regression, logistic regression, decision tree and random forest algorithms in depth. In industry, logistic regression is one of the most widely used algorithms.

For Interview
Make a good resume.
Prepare some famous logical puzzles.
Practice guesstimate questions, like how many cigarettes are sold in one month.
Work on your problem-solving structure. The interviewer checks your approach to solving tough problems, not your solution.
Research the company.

Keep applying and show your passion for this field.
You know, skills are cheap but passion is priceless. I read about data science daily, like a newspaper. Don’t be fascinated by the news about the sexiest job title. It is a demanding job. If you don’t have it in you, you won’t survive in this field.

Update: You must read Monica Rogati’s advice too. Here is the link.