Table of Contents |
The concepts for preparing and cleaning data for the next several tutorials will be illustrated with Netflix data from Kaggle. The full Netflix dataset is available on Kaggle and contains a comprehensive collection of information about movies and TV shows available on Netflix. For this course's purposes, we modified the data set to give you a more focused and manageable learning experience.
Formatting in Excel is the process of changing the appearance of your data. Formatting allows you to enhance readability by making it easier for users to interpret and understand the information. When data is formatted and standardized (one of the data cleaning steps), users of the data can better determine what the data on the spreadsheet is “saying”. Consistent data formats, decimal places, and alignment prevent errors during analysis. Conditional formatting allows you to emphasize specific values or patterns. For example, you can highlight cells with the highest or lowest values, identify duplicates, or color-code data based on specific criteria.
In the following sections, you will format data values related to a Netflix data set.
Excel offers a variety of formatting tools to help users present numerical data clearly and effectively, depending on the type of analysis you are performing and the visualization(s) you want to show. Some of the most commonly used formatting options are illustrated below.
Screenshot locating the format menu
General
The General format is the default number format that Excel applies when you enter a number. It displays numbers as they are typed, without any specific formatting. For example, if you enter 1234.567, it will appear exactly as 1234.567.
Since general formatting is the default format in Excel and because this dataset has yet to be cleaned, all of the columns in Netflix.xlsx have general formatting applied, except for date_added.
Currency
The Currency format is used for monetary values and displays the currency symbol next to the number. It also allows for the specification of decimal places. For example, if you format 1234.567 as currency with two decimal places, it will appear as $1,234.57.
EXAMPLE
Currency formatting applies dollar signs ($) and decimal places to the budget column.Comma
Alternatively, the comma format is used to display numbers with the thousand's place separators (,) and a specified number of decimal places. For example, 1234.567 formatted with the comma style and two decimal places would appear as 1,234.57. Comparing this to the currency, the comma and decimal places appear, but the $ is no longer included.
EXAMPLE
In Excel, you can apply a comma format. Comma formatting applies a thousands separator and two decimal places to the budget column.Accounting
The accounting format is like the currency format but aligns the currency symbols and decimal points in a column, making it easier to read financial data. For example, 1234.567 formatted as accounting with two decimal places would appear as $1,234.57.
EXAMPLE
In Excel, you can apply a currency format. Accounting formatting applies dollar signs ($) and decimal places and aligns the currency symbols and decimal points in a column to the food expenditures column.Number
Number formatting is used to display numbers with a specific amount of decimal places and without the thousands separators. For instance, formatting 1234.567 as a number with two decimal places would display it as 1234.57. Unlike Accounting though no dollar sign will be displayed.
EXAMPLE
In Excel, you can apply a number format. Number formatting applies two decimal places to the food expenditures column.Percentage
The percentage format multiplies the cell value by 100 and displays the result with a percentage symbol. For example, if you enter 0.1234 and format it as a percentage with two decimal places, it will appear as 12.34%.
EXAMPLE
In Excel, you can apply a percentage format. Percentage formatting multiplies the highlighted cells by 100 and displays the results with a percent symbol (%) and two decimal places.All of these formatting tools help make numbers more readable and interpretable, ensuring numerical information is presented in a clear and professional manner.
Formatting text in Excel is essential for creating clear and visually appealing spreadsheets. For data analysts, getting data types and formats correct is fundamental to their work. It ensures:
Auto Fit
If you want to adjust the cell size to fit the text in the column or row, Auto Fit can greatly enhance the readability of your spreadsheet. You can manually adjust the column width and row height by dragging the borders of the column or row headers. Alternatively, you can use the Auto Fit feature to automatically resize the columns or rows to fit the text.
EXAMPLE
In Excel, you can apply the auto fit function. The auto fit function adjusts the column size to fit the longest string of text or numbers.Merge and Center
Merge and center allow you to combine multiple cells into one larger cell and center the text within it. This is particularly handy for creating headers that span multiple columns.
EXAMPLE
Apply the merge and center function. The merge and center function allows you to center a title over multiple columns. Let’s create a heading that spans the first three columns of the data sheet, show_id, type and title. We can call this heading Show Information.Data analysts need to work with consistent data. Consistency helps ensure data looks professional and can easily be sorted or filtered.
The Date format is used to display dates in various styles. Excel offers multiple date formats, such as MM/DD/YYYY, DD-MMM-YYYY, etc. For example, entering 07/13/2024 can be formatted to display as 13-Jul-2024.
EXAMPLE
In Excel, you can apply the date format to the date_added column to consistently format the date in MM/DD/YYYY format.There are many different ways to format dates. By choosing the “Number Format” tool in the Home menu, scrolling to “More Number Formats...”, and then choosing the “Date” category, you can choose from many different date types from the window that appears on the right side of the screen.
In this window, one can find various other formatting options, or you can create your own by using the “Custom” feature.
Conditional formatting in Excel is a powerful tool that allows you to automatically apply formatting to cells based on the values they contain. This feature is particularly useful for highlighting important data, identifying trends, and making your spreadsheet more visually appealing. With conditional formatting, you can set rules that change the appearance of cells, such as their background color, font color, or border, depending on the criteria you specify.
Another common use of conditional formatting is to create data bars, which visually represent the magnitude of values within a range. This can be particularly helpful for comparing numbers at a glance.
You can also use conditional formatting to create color scales, which apply a gradient of colors to a range of cells based on their values. For instance, you might use a color scale to highlight temperature data, with lower temperatures in blue and higher temperatures in red.
A data analyst might use conditional formatting on numbers to quickly identify trends, outliers, and patterns within a dataset. By applying different colors, icons, or data bars to cells based on their values, conditional formatting helps to visually highlight important information, making it easier to spot anomalies or significant data points at a glance. This can be particularly useful when dealing with large datasets, as it allows the analyst to focus on key areas that require further investigation or action. Additionally, conditional formatting can enhance the overall readability and presentation of the data, making it more accessible and understandable for stakeholders who may not be as familiar with the raw numbers.
Apply Color to Numbers
A data analyst might want to highlight all numbers above a certain threshold in yellow to quickly identify data.
EXAMPLE
In Excel, you can apply conditional formatting to the average rating column to show which shows have an average rating of 3 or greater.Highlight the Bottom 10%
A data analyst might want to highlight all numbers below a certain threshold in light blue to pick out certain values quickly.
EXAMPLE
In Excel, you can apply conditional formatting to the budget column to see which shows have a budget in the bottom 10%.Much like using conditional formatting on numbers, you can use it on text to swiftly pinpoint and emphasize crucial information within a dataset. By setting up specific formatting rules, such as altering the font or background color based on certain conditions, analysts can detect patterns, anomalies, or significant categories. This visual distinction makes the data more comprehensible and accessible, facilitating quicker and more precise analysis. Moreover, conditional formatting can be employed to highlight errors or inconsistencies, ensuring the data’s quality and accuracy. In essence, it boosts the efficiency and effectiveness of data analysis by offering immediate visual indicators.
Highlight Movies and TV Shows in Different Colors
A data analyst might want to highlight all movies in blue and all TV shows in pink.
EXAMPLE
In Excel, you can apply conditional formatting to the type column so that all records that are movies are highlighted in light blue and all records that are listed as TV Show are highlighted in plum.The UNIQUE Function
Suppose you want to highlight movies and TV shows differently using colors. You first need to examine the type column and determine what are the different categories of this categorical variable. You can use the UNIQUE() function in Excel.
When using the UNIQUE() function, there are some things to keep in mind. First is the syntax.
=UNIQUE(array, [by_col], [exactly_once]), where:
EXAMPLE
You can apply the UNIQUE() function to the type column so you will know how many categories are in the data.Conditional formatting in Excel provides a straightforward method to detect duplicates. The conditional formatting to detect duplicates does not alter the data, it just provides a visual cue for further action.
EXAMPLE
Identifying duplicate directors allows you to explore patterns. You might discover that some directors are prolific or that specific genres are associated with certain directors. Determining if there are any duplicate directors is the first step in this process.Source: THIS TUTORIAL WAS AUTHORED BY SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.