Table of Contents |
Sometimes multiple pieces of information are contained within one column. You may only need a single component of an item in a column to complete the intended analysis. For example, in the last tutorial, only one country's value would be required to determine which country produces the most content. Being able to split the information in a single column into multiple columns allows for clarity and easier usability of the data to create specific analyses. Whether you decide to split data into multiple columns depends on your specific analysis needs and the structure of the data set.
EXAMPLE
You want to create a visualization that displays the count of the number of movies and TV shows by country. You are told that the first country in the country column of the data is the country where the production of the TV show or movie originally occurred. You only need one row per country to make the visualization you have in mind. You decide to create another column named origin_country that is the first country listed in the country column.
You can also split date values into multiple columns if your date contains multiple date components such as month, day, and year. The components that make up the date (day, month, and year) are known as dateparts.
Many times, it is useful to extract a datepart from a date column. For example, suppose you want to perform a monthly distribution of content additions for Netflix. That is, you want to determine if there is a particular month in which Netflix adds the most content to its platform. You could create a frequency distribution for the months for the date_added column. But, in the current form that the date_added column is in, this would be difficult. However, if you split the date_added column into its different dateparts, it would be straightforward to construct the frequency distribution for the months.
EXAMPLE
You can use Excel to split the month value from the date_added column using either the numeric or the abbreviated text version of the month.
Just as you can split data from one column into multiple columns, you can also combine data from multiple columns into a single column. When you have related data spread across multiple columns (first name, last name), combining them into a single column (full name) makes the data set more organized. Combining columns like street, city, and zip_code into an address column makes data entry easier and improves readability. In a similar fashion to splitting data, the columns you decide to combine should be driven by your analysis goals.
EXAMPLE
Combining relevant columns can create more informative columns. Suppose you need one column that combines the information from the duration and type columns in the Netflix data. You can create a combined column that describes a description of the content along with the duration. You can create a column named content_duration that looks as follows:
| Data in Column A | Data in Column B | & Function Method | CONCAT() Function Method | Output in Column C |
|---|---|---|---|---|
| Rajesh | Patel | =A2& " "& B2 | =CONCAT(A2, " ", B2) | Rajesh Patel |
| +1 | 555-1234 | ="+" &B2 | =CONCAT("+1", B2) | +1555-1234 |
| Product A | Service X | =A2& " - "&B2 | =CONCAT(A2, " - ", B2) | Product A - Service X |
| Excel | Functions | =A2& ", " &B2 | =CONCAT(A2, ", ", B2) | Excel, Functions |
Source: THIS TUTORIAL WAS AUTHORED BY SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.