How do you know you’re hiring the right candidate for your data science jobs?

Well, you can never know.

The only thing you can do during the interview process to hire strong candidates is to ask them the right questions.

But which questions actually matter?

Which questions are going to get you the employee you, your company, or your data science team/project needs right now?

It’s tough, we know.
That’s why we’ve compiled a list of questions from our Data Science Interview Recipe, along with the “right” answers, that you should ask prospective candidates during the interview process.

The first thing you need to gauge as an interviewer is the candidate’s hard skills, which you do by asking technical questions.

Their technical skills are the top priority

1. Explain the “bias-variance trade-off” and why it is fundamental to machine learning.

Why: The purpose of this question is to find out whether the candidate understands the fundamental concepts in machine learning.

It’s the basics that we’re trying to cover during this part of the recruiting process.

Answer: First of all, bias and variance here refer to two sources of prediction error: error due to bias and error due to variance.

We need to find a good trade-off between the two, because reducing one usually increases the other.

Striking that balance helps us avoid both underfitting and overfitting.

Bias measures how far the model’s predictions are, on average, from the true target values. In other words, it reflects how well the model captures the true patterns in a dataset.

Note that the true target value is the underlying signal, not the noisy observed value in our training data.

A high training error is a sign of high bias (underfitting), which means the model will not work well when tested either.

Variance measures how much the prediction for each data record changes when the model is trained on different samples of data.

It focuses only on the spread of those predictions, not on whether they are correct.

When a model works well in training but has a high test error, it’s a sign of high variance (overfitting).
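
To make the two error sources concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a recipe: the sine curve standing in for the “true pattern”, the noise level, and both toy models (`fit_mean` and `fit_1nn`) are hypothetical choices. It retrains each model on many random training sets and measures squared bias and variance at fixed test points.

```python
import math
import random

def true_fn(x):
    # Hypothetical "true pattern" (an assumption for this sketch).
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=30, noise=0.3):
    # Noisy observations of the true pattern.
    xs = [rng.random() for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    # Very rigid model: always predicts the average target (tends to underfit).
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_1nn(xs, ys):
    # Very flexible model: copies the target of the closest training point
    # (tends to follow the noise, i.e. overfit).
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def bias2_and_variance(fit, n_rounds=200, seed=0):
    # Retrain on many training sets and compare predictions at fixed points.
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(21)]
    preds = [[] for _ in test_xs]
    for _ in range(n_rounds):
        model = fit(*sample_training_set(rng))
        for k, x in enumerate(test_xs):
            preds[k].append(model(x))
    bias2 = 0.0
    variance = 0.0
    for x, p in zip(test_xs, preds):
        avg = sum(p) / len(p)
        bias2 += (avg - true_fn(x)) ** 2                     # average miss
        variance += sum((v - avg) ** 2 for v in p) / len(p)  # spread
    return bias2 / len(test_xs), variance / len(test_xs)

mean_bias2, mean_var = bias2_and_variance(fit_mean)
nn_bias2, nn_var = bias2_and_variance(fit_1nn)
print(f"mean model: bias^2={mean_bias2:.3f}, variance={mean_var:.3f}")
print(f"1-NN model: bias^2={nn_bias2:.3f}, variance={nn_var:.3f}")
```

In this setup the rigid model should show the larger bias and the flexible model the larger variance, which is the trade-off a strong candidate should be able to describe.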

2. Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?

a) K-means clustering

b) Linear regression

c) K-NN (k-nearest neighbour)

d) Logistic regression

Why: What we are trying to find out from this question is your candidate’s understanding of the best practices to handle missing data.

Most machine learning algorithms do not like missing data, so the candidate should know how to handle it properly.

Answer: The correct answer is c) K-NN (k-nearest neighbour).

It is good practice to identify and replace the missing values in each column of our dataset before we perform modelling.

Apart from dropping rows with missing values, or replacing the missing values with an average value, another common way to impute a missing value is to predict it by using a model.

K-NN can be used because it finds the records most similar to the incomplete one based on all the other features, then fills the gap from those neighbours: an average for a continuous variable, or the most common category for a categorical one.
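
The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not a production imputer: the helper names (`column_ranges`, `distance`, `knn_impute`), the range-scaled distance, and the tiny customer table are all assumptions made for the example.

```python
from collections import Counter

def column_ranges(rows):
    # Range of each numeric column; None marks a categorical column.
    ranges = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        if seen and all(isinstance(v, (int, float)) for v in seen):
            ranges.append((max(seen) - min(seen)) or 1.0)
        else:
            ranges.append(None)
    return ranges

def distance(a, b, ranges):
    # Average per-feature difference over the features both rows have.
    total, shared = 0.0, 0
    for x, y, rng in zip(a, b, ranges):
        if x is None or y is None:
            continue
        total += abs(x - y) / rng if rng is not None else float(x != y)
        shared += 1
    return total / shared if shared else float("inf")

def knn_impute(rows, k=2):
    ranges = column_ranges(rows)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows where this column is observed.
            donors = [r for j, r in enumerate(rows) if j != i and r[c] is not None]
            donors.sort(key=lambda r: distance(row, r, ranges))
            nearest = [r[c] for r in donors[:k]]
            if not nearest:
                continue
            if ranges[c] is not None:
                filled[i][c] = sum(nearest) / len(nearest)            # mean
            else:
                filled[i][c] = Counter(nearest).most_common(1)[0][0]  # mode
    return filled

# Hypothetical data: (age, income, city), with None marking a missing value.
rows = [
    [25, 40000, "Paris"],
    [27, 42000, "Paris"],
    [26, None, "Paris"],
    [52, 90000, "Berlin"],
    [55, 95000, None],
    [50, 88000, "Berlin"],
]
imputed = knn_impute(rows, k=2)
for row in imputed:
    print(row)
```

The same neighbour search serves both variable types; only the final aggregation step (mean versus most common value) differs.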

3. Do you think 50 small decision trees are better than a single large one? Can you explain why with an example?

Why: This is one of the most popular interview questions.

Here, we are trying to find out how well your data science candidate understands the concept of an ensemble in data science.

Answer: In general, the answer is yes. Combining many trees makes a stronger learner.

The new learner is usually more accurate, more robust, and less prone to overfitting.

An example of an ensemble model is a random forest. Random forest is a type of ensemble learning based on decision trees.

Each tree is trained on a different random subset of the original data set.

The model then selects the final prediction using a “majority wins” vote across the trees.

This significantly reduces the risk of error from any individual tree and improves the reliability of the predictions.
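
A rough back-of-the-envelope calculation shows why voting helps. The sketch below assumes a binary task, a hypothetical per-tree accuracy of 65%, and fully independent trees. The independence assumption is a strong simplification: trees in a real random forest are correlated, so the real gain is smaller.

```python
from math import comb

def majority_vote_accuracy(n_trees, p_correct):
    """Probability that a strict majority of independent voters is right."""
    needed = n_trees // 2 + 1  # ties count as wrong
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Hypothetical accuracy of each small tree on its own.
single = majority_vote_accuracy(1, 0.65)
forest = majority_vote_accuracy(50, 0.65)
print(f"single tree: {single:.3f}, 50 trees voting: {forest:.3f}")
```

A candidate who can explain why the vote of many weak, diverse trees beats a single tree, and why correlation between the trees limits that gain, clearly understands ensembles.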

4. What is an RFM model and how can we use it?

Why: This question tests the interviewee’s understanding of metrics that quantify the value of a customer.

You may also choose to ask them to describe how they would prioritize customers.

If they understand how RFM works, they can get creative and apply it to other scenarios.

Answer: RFM is a model for scoring customers by their value, and it can be used to answer business questions such as which customers to prioritize.

RFM is popular because it is easy to calculate and almost every company captures these data points:

• RECENCY (R): When was the last time the customer visited?

• FREQUENCY (F): How often does the customer make purchases?

• MONETARY (M): How much did the customer spend?

We can then split each of the dimensions into 5 different tiers by assigning a score from 1-5.

For example, the most recently visited customers will receive an R score of 5, and the least recently visited customers will receive an R score of 1.

After we have assigned scores for all three dimensions, we can then group customers with similar scores together and provide each group with a different level of service.
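
The scoring described above can be sketched as follows. This is a minimal illustration: the `quintile_scores` helper, the rank-based way of cutting tiers, and the five sample customers are assumptions for the example, and a real project would likely rely on a data frame library instead.

```python
def quintile_scores(values, higher_is_better=True):
    """Map each value to a 1-5 score by rank (5 = best fifth)."""
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,
    )
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // len(values)
    return scores

# Hypothetical customers: (days since last visit, purchases, total spend).
customers = {
    "A": (3, 12, 900.0),
    "B": (40, 2, 80.0),
    "C": (10, 7, 300.0),
    "D": (200, 1, 20.0),
    "E": (75, 4, 150.0),
}

names = list(customers)
# Fewer days since the last visit is better, so recency is scored in reverse.
recency = quintile_scores([customers[n][0] for n in names], higher_is_better=False)
frequency = quintile_scores([customers[n][1] for n in names])
monetary = quintile_scores([customers[n][2] for n in names])

rfm = {n: (r, f, m) for n, r, f, m in zip(names, recency, frequency, monetary)}
for name, scores in rfm.items():
    print(name, scores)
```

Customers with high scores on all three dimensions are the natural candidates for the highest level of service, while low scorers might be targeted with a win-back campaign instead.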

Now that we have scanned for the hard skills, we need to move on to the top behavioural questions to ask your potential data scientists.

This is a key part of the hiring process.

6. How have you used data to elevate the experience of a customer or stakeholder?

Why: Ultimately, data science is about improving decision-making and performance—whether for end-users or for your company as a whole.

If the candidate doesn’t understand or care about the ultimate impact of their work, they may lack the big-picture thinking, business insights, and business impact that you’re looking for in this data scientist position.

Business-related questions are integral to hiring for a data scientist job as well as helping them build a successful data science career with your company.

Answer: Look for answers that draw a clear line from data to an objective business result, like lower costs.

Listen for signs that the candidate always looks for ways to add more value through their work and prides themselves on their business decisions.

7. Tell me about a time when you had to clean and organize a big data set.

Why: The best data scientists take pains to ensure that the data they’re working with is high quality.

“Dirty” or disorganized data can tarnish the value of analysis and generate misleading insights, so it’s essential to know that your new hire is experienced in cleaning and organizing data, no matter how big the data set is.

Answer: Candidates may mention a variety of techniques and tools here, such as value correction methods and automated cleanup tools like Paxata.

Great answers will delve into why they chose those methods—the more specificity, the better.

8. Tell me about a data science project you’ve worked on where you encountered a challenging problem. How did you respond?

Why: When you’re working with data, the question isn’t if problems will arise, it’s when.

You want to know that your candidate understands how to deal with data-related problems or errors.

How do they correct the mistake?

How do they communicate the problem to leaders, customers, or other stakeholders?

Answer: A great answer will reveal the candidate’s nimbleness and ability to adapt, as well as their problem-solving skills.

Pay attention to signs that they learned something from the experience and put precautions in place to prevent the same issue from occurring in the future.

Last but most definitely not least, you should scan for soft skills from your candidate. This can be done through open-ended questions.

9. Tell me about a data professional you’ve worked with who you really admire. What do you admire about them?

Why: This question can help you get a sense of the traits your candidate values in the people they work with.

This will give you an idea of how they’ll get along with the rest of the team and whether they’ll be motivated by their interactions with their peers.

Their answer may also shine a light on the kind of data scientist they aspire to be, allowing you to gauge their level of ambition.

Answer: Candidates may mention the person’s impressive skills, generosity with their time, or commitment to advancing the field as a whole.

Listen for descriptions that align with the manager or data science team the candidate would be working with.

After all, this candidate could be one of your next data science leaders.

They could be the one training new candidates for their data science roles with your company.

The previous questions exemplify the importance of the interviewer’s role during the hiring process.

It is integral to finding the right data science fit for your company.

That’s it!

Those are some of the many questions we think are important to ask your data science candidates.

For more questions, please check out our Data Science Interview Recipe, written by our founder and senior data science professional, Dr Lau, to help you interview better.

We hope you’ve learned something.

As always, email us with any questions or for advice. We’re always here to help you grow.