Table of Contents |
Conditional probability is a fundamental concept in statistics and data analytics that helps you understand the likelihood of an event occurring, given that another event has already occurred.
Conditional probability is denoted as which reads as “the probability of event A occurring, given that event B has occurred.” The formula for conditional probability is:
To calculate conditional probabilities, you often need to use marginal and joint probabilities. Marginal probability refers to the probability of a single event occurring, regardless of the outcomes of other events. It is derived from the total probability distribution of all possible events. Consider a retail store analyzing the probability of a customer purchasing electronics. If the store has data on various product categories, the marginal probability of purchasing electronics is calculated by considering only the electronics purchases, regardless of other product categories.
Joint probability refers to the probability of two or more events occurring simultaneously. It considers the intersection of events. In the same retail store, the joint probability might be the probability that a customer purchases both electronics and clothing during a single visit. This helps in understanding the relationship between different product categories and customer behavior.
To illustrate these probabilities, consider a healthcare insurance company that wants to analyze risk factors affecting insurance pricing for clients. You are provided with data in a spreadsheet that contains information about individuals, including their age, gender, BMI (Body Mass Index), number of children, smoking status, region, insurance price, and a categorization of insurance price.
The Insurance Price Category column is a categorization of the Insurance Price column. The Insurance Price column shows how much money the customers are paying per year for insurance. If the customer pays less than $15,000 per year, they are considered to pay a low insurance price. If the customer pays more than $15,000 but less than $25,000, the customer is considered to pay a moderate insurance price, and if the customer pays more than $25,0000 per year for health insurance, the customer is considered to pay a high price.
The table below shows a crosstabulation for two events that can be derived using a PivotTable from the healthcare data:
From the table above, we can calculate marginal and joint probabilities. The marginal probability is the likelihood of the event occurring without considering other events. For example, the probability a customer will have low priced insurance is the probability that a customer will have moderately priced insurance is
and the probability that a customer will have high priced insurance is
These probabilities are referred to as marginal probabilities because of their location in the margins of the crosstabulation table. But, are these probabilities related to the price category of insurance different for smokers versus non-smokers? Conditional probability allows you to answer this question.
The joint probability provides the probability of the intersection of two events. For example, the probability that a randomly selected customer will be a non-smoker and have moderately priced insurance is
The table below is a joint probability table, which summarizes the probability information of the insurance pricing structure for smokers and non-smokers:
Now that you have the foundation to distinguish between marginal and joint probabilities, you are ready to calculate and interpret conditional probabilities.
EXAMPLE
Imagine you are working as a data analyst for a health insurance company, and they want to analyze risk factors affecting insurance. Your manager has asked you to calculate the probability that a randomly selected customer pays a high insurance price, given they are a non-smoker.Now that you have become comfortable with the ideas of conditional probability, let’s put it all together! We'll step through the real-world process of being provided data in Excel, creating the PivotTable, and then calculating a conditional probability.
Now that you know how to calculate conditional probabilities, let’s explore a small case study that illustrates how conditional probabilities can be used to evaluate customer churn in the telecommunications industry.
IN CONTEXT
TelcoConnect is a telecommunications company that provides mobile phone services to its customers. They offer two primary contract types: Postpaid (monthly billing) and Prepaid (pay-as-you-go). TelcoConnect is concerned about customer churn (that is, customers leaving their service), as it impacts revenue and customer satisfaction.
The TelcoConnect analytics team wants to understand the relationship between contract type and churn. Specifically, they aim to answer the following questions:
The crosstabulation for a sample of 100 customers is provided below.
- What is the likelihood that a customer churns, given they are Postpaid?
- What is the likelihood that a customer churns, given they are Prepaid?
- How does contract type influence churn behavior?
![]()
You can use conditional probabilities to address TelcoConnect’s questions. Calculate the probability of churning given Postpaid and Prepaid contracts and compare the probabilities.
From the conditional probabilities, customers in a postpaid contract plan are more likely to churn than customers in a prepaid plan. TelcoConnect may use this insight to:
It is important to consider that there could be additional factors that might influence churn probability.
- Tailor retention strategies based on contract type.
- Offer targeted promotions to customers in a postpaid plan to reduce churn.
Source: THIS TUTORIAL WAS AUTHORED BY SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.