Splitting and Combining Data

Author: Sophia

what's covered
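In this lesson, you will learn how to split and combine data in Excel. Specifically, this lesson will cover:

1. Splitting Data
1a. Splitting Country Data in Excel
1b. Splitting Dateparts in Excel
2. Combining Data
2a. Combining Data in Excel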

1. Splitting Data

Sometimes multiple pieces of information are contained within one column, and you may need only a single component of each value to complete the intended analysis. For example, in the last tutorial, only one country's value would be required to determine which country produces the most content. Splitting the information in a single column into multiple columns makes the data clearer and easier to use in specific analyses. Whether you decide to split data into multiple columns depends on your specific analysis needs and the structure of the data set.

1a. Splitting Country Data in Excel

EXAMPLE

You want to create a visualization that displays the number of movies and TV shows by country. You are told that the first country listed in the country column is the country where production of the TV show or movie originally occurred. You only need one country per title to make the visualization you have in mind, so you decide to create another column named origin_country that holds the first country listed in the country column.

When you examine several of the values of the country column, you observe that their values are separated by commas.



The symbols used to separate data values in a data set are known as delimiters. In this case, the comma is the delimiter for the country column.

You can use Excel to split the values of the country column into multiple columns so you can obtain one column that contains the first country value from the country column.

try it
Create a new column named origin_country that holds the first country value in the country column (if multiple values are listed in the country column). You will split the values of the country column so that the value before the first comma is placed into one column, the value after the first comma is placed into the next column, the value after the second comma into the column after that, and so on.

Open the netflix.xlsx data. Perform the following:

1. Copy the country column and paste it into the first empty column after the last column in the netflix_titles worksheet (column P). Then select the values in the copied column.

Note: If you do not copy and paste the column you want to split to the end of the columns in the worksheet, the split columns will overwrite the columns to the right of the original country column.

2. Select Split Text to Columns in the Data menu.



The following dialog box opens.

3. Select Comma for the delimiter and then Apply.



You will now see that data is populated in columns Q and R. The first country value remains in column P. If there was a second country value, it is moved to column Q, and a third value is moved to column R.

4. Rename column P origin_country, name column Q second_country, and name column R third_country.

You can observe that Enter the Warrior’s Gate was produced in France, Canada, and China.



5. Check your work by splitting the country column into multiple columns and compare the columns you obtain to the example.
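Outside of Excel, the same comma-delimited split can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the lesson's Excel steps. The function name split_countries is a hypothetical helper, and the sample value is the country list the lesson gives for Enter the Warrior's Gate.

```python
def split_countries(value, delimiter=","):
    """Split a delimited country string into a list of trimmed country names."""
    # str.split breaks the text at every delimiter; strip removes the space
    # that follows each comma in values like "France, Canada, China".
    # Empty pieces (for example, from a blank cell) are dropped.
    return [part.strip() for part in value.split(delimiter) if part.strip()]


# Sample value taken from the lesson's example row.
countries = split_countries("France, Canada, China")

# The first entry plays the role of the origin_country column;
# a blank cell yields an empty string instead of raising an error.
origin_country = countries[0] if countries else ""

print(countries)
print(origin_country)
```

The list positions correspond to the columns produced by the split: the first item matches origin_country, the second matches second_country, and the third matches third_country.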

watch
Follow along with the steps for splitting the country column in Excel.

term to know
Delimiter
A special character like a backslash, hyphen, tab, comma, or semicolon used to separate data.

1b. Splitting Dateparts in Excel

You can also split date values into multiple columns if your date contains multiple date components such as month, day, and year. The components that make up the date (day, month, and year) are known as dateparts.

It is often useful to extract a datepart from a date column. For example, suppose you want to examine the monthly distribution of content additions for Netflix. That is, you want to determine whether there is a particular month in which Netflix adds the most content to its platform. You could create a frequency distribution of the months in the date_added column, but this would be difficult with the column in its current form because each value is a full date. If you split the date_added column into its different dateparts, it becomes straightforward to construct the frequency distribution for the months.

EXAMPLE

You can use Excel to split the month value from the date_added column using either the numeric or the abbreviated text version of the month.

try it
Open netflix.xlsx and perform the following:

  1. Add 2 columns to the spreadsheet to the right of the date_added column.
  2. Name these columns as month_added_numeric (column L) and month_added_text (column M) to indicate the month that Netflix added content to its platform.
  3. In cell L2, type the first formula below, and in cell M2, type the second. Press ENTER after each.

    =MONTH(K2)
    =TEXT(K2,"mmm")

    The MONTH() function extracts the month component from the date_added column (K) and returns it as a number from 1 to 12. If the result appears as anything other than a single integer, apply General formatting to the cell, and it should correct the issue. The TEXT() function with the format code "mmm" returns the three-character abbreviation of the month from a date column in Excel.
  4. Once you have your formula in the first cell of each of the two columns, month_added_numeric and month_added_text, hover over the Auto-Fill Handle at the bottom-right corner of the cell until the pointer changes to a small black plus sign. Click and hold the left mouse button, then drag the handle down to the last cell where you want the formula to apply.

    The month component has now been extracted from the date_added column. You can create a frequency distribution for either one of the month_added columns to answer the business question about the monthly distribution of content additions for Netflix.
  5. Check your work by creating the two datepart month columns (month_added_numeric and month_added_text) from the date_added column and compare the columns you obtain to the example.
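As a rough parallel to the steps above, the following Python sketch extracts the numeric month and the three-letter month abbreviation from a few dates and then counts how often each month appears. It is an illustration only. The three dates are made-up sample values, not rows from netflix.xlsx, and the MONTH_ABBR list is a hypothetical lookup written out by hand so the result does not depend on locale settings.

```python
from collections import Counter
from datetime import date

# Made-up sample values standing in for the date_added column.
dates_added = [date(2021, 9, 25), date(2021, 9, 24), date(2020, 1, 1)]

# Three-letter month abbreviations, indexed by month number (1-12).
# Index 0 is a placeholder so that MONTH_ABBR[1] is "Jan".
MONTH_ABBR = ["", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Counterpart of the month_added_numeric column.
month_added_numeric = [d.month for d in dates_added]

# Counterpart of the month_added_text column.
month_added_text = [MONTH_ABBR[d.month] for d in dates_added]

# Frequency distribution: how many titles were added in each month.
month_counts = Counter(month_added_text)

print(month_added_numeric)
print(month_added_text)
print(month_counts.most_common())
```

Counting the extracted month values is what makes the monthly distribution easy to build once the datepart sits in its own column.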

watch
Follow along with the steps for splitting the date_added column in Excel.

term to know
Datepart
An individual component from the date field (e.g. month, day, and year).


2. Combining Data

Just as you can split data from one column into multiple columns, you can also combine data from multiple columns into a single column. When you have related data spread across multiple columns (first name, last name), combining them into a single column (full name) makes the data set more organized. Combining columns like street, city, and zip_code into an address column makes data entry easier and improves readability. In a similar fashion to splitting data, the columns you decide to combine should be driven by your analysis goals.

2a. Combining Data in Excel

EXAMPLE

Combining relevant columns can create more informative columns. Suppose you need one column that combines the information from the type and duration columns in the Netflix data. You can create a combined column, named content_duration, that gives the type of content followed by its duration.

try it
Create a column named content_duration that combines the information from the type and duration columns of the Netflix data.

Open the netflix.xlsx data and perform the following.

There are two methods you can use to combine data in Excel.

Method 1: The & Symbol
1. Create a column named content_duration in column U.

2. In cell U2, type the following formula and press ENTER.

=B2&" "&N2
This formula uses the & symbol to combine the text in cells B2 and N2, separated by a space.

The type column (column B) indicates whether the observation is a TV Show or Movie, and the duration column (column N) provides the length of the TV Show or Movie.

You should see the values of type and duration combined into the content_duration column.



3. Once you have your formula in the first cell of the column, hover over the Auto-Fill Handle at the bottom-right corner of the cell until the pointer changes to a small black plus sign. Click and hold the left mouse button, then drag the handle down to the last cell where you want the formula to apply.

4. Check your work by creating the content_duration column by combining data from the type and duration columns using Method 1, and compare the column you obtain to the example.

Method 2: The CONCAT() Function
The second way you can combine data from multiple columns is with the CONCAT() function, which is an abbreviation for concatenate or concatenation. The term concatenation is often used in computing or data processing to describe the action of combining multiple strings or texts into one continuous string.

1. In Excel, create a new column named content_duration in column V.

The CONCAT() function in Excel allows you to merge two or more text values into a single cell. The general form of the formula is:

=CONCAT(A2, " ", B2)
The text in cell A2 will be combined with the text in cell B2, with a space as the delimiter that separates the two text values. CONCAT() joins its arguments in the order they are listed, so the second argument here, " ", acts as the delimiter between the values from each column. You can specify any delimiter, like a dash (-), slash (/), or asterisk (*), if you place the delimiter in the quotation marks of the second argument.

2. In cell V2, enter the following formula:

=CONCAT(B2," ",N2)
You should see the values of type and duration combined into the content_duration column.



3. You can follow the auto-fill instructions provided earlier to auto-fill all the rows for content_duration.

The final version of content_duration should look the same as it did using the & symbol from Method 1 to combine the duration and type columns.

4. Check your work by creating the content_duration column by combining data from the type and duration columns using Method 2 and compare the column you obtain to the example.
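For comparison, the two Excel methods can be mirrored in a short Python sketch. This is an illustration under stated assumptions, not part of the lesson's Excel steps. The function names combine_with_operator and concat are hypothetical, and "Movie" and "90 min" are made-up sample values for the type and duration columns.

```python
def combine_with_operator(content_type, duration):
    """Mirror of =B2&" "&N2: join two values with a space using an operator."""
    # Python's + operator plays the role of Excel's & for text.
    return content_type + " " + duration


def concat(*parts):
    """Mirror of =CONCAT(B2," ",N2): join every argument in the order given."""
    # Like CONCAT(), this adds nothing between the arguments,
    # so the delimiter must be passed as its own argument.
    return "".join(parts)


# Made-up sample values for the type and duration columns.
content_type = "Movie"
duration = "90 min"

method_1 = combine_with_operator(content_type, duration)
method_2 = concat(content_type, " ", duration)

print(method_1)
print(method_2)
print(method_1 == method_2)
```

Both approaches build the same string, which is why the content_duration column looks identical whichever method is used.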

watch
Follow along with the steps for combining type and duration columns in Excel using Method 1: The Ampersand Symbol.

watch
Follow along with the steps for combining type and duration columns in Excel using Method 2: The Concatenate Function.

hint
The concatenation methods in Excel are versatile and illustrate how there is often more than one way to accomplish a task in analytics. The table below describes several common ways an analyst might use the & operator and the CONCAT() function to combine data from multiple cells.

| Data in Column A | Data in Column B | & Operator Method | CONCAT() Function Method | Output in Column C |
|---|---|---|---|---|
| Rajesh | Patel | =A2&" "&B2 | =CONCAT(A2, " ", B2) | Rajesh Patel |
| +1 | 555-1234 | ="+1"&B2 | =CONCAT("+1", B2) | +1555-1234 |
| Product A | Service X | =A2&" - "&B2 | =CONCAT(A2, " - ", B2) | Product A - Service X |
| Excel | Functions | =A2&", "&B2 | =CONCAT(A2, ", ", B2) | Excel, Functions |

term to know
Concatenate
To combine multiple text strings into one.

summary
In this lesson, you learned how to combine related data from multiple columns (e.g., first name and last name) into a single column for better organization. Additionally, you learned how to split data within a single column into multiple columns for clarity and easier usability. These techniques help make your dataset more organized and facilitate specific analyses.

Source: THIS TUTORIAL WAS AUTHORED BY SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.
