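In this lesson, you will become familiar with data privacy and data security issues that can arise during data analysis. Specifically, this lesson will cover:
1. Data Privacy
1a. Data Privacy Issues
2. Data Security
2a. Data Security Issues
3. Ethical Issues
1. Data Privacy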
Data privacy is the proper collection, storage, and use of customer data. It involves ensuring data is collected with proper consent, is properly protected, and is used only for authorized purposes. Businesses have good reasons to implement data privacy practices: building and maintaining a good reputation, securing customer trust, and staying out of costly legal trouble.
In the past, businesses primarily relied on manual record-keeping. Consequently, data collection was limited to basic customer information like names, addresses, and purchase history. The advent of computers and digital technology transformed the type and amount of data that businesses collect. Businesses started collecting more detailed data, including preferences, behavior, and interactions. The rise of the internet, e-commerce, and mobile apps led to an explosion of data. Businesses began tracking online activities, social media interactions, and location data. As businesses increasingly collect more data on users, privacy concerns have grown.
One key tenet of data privacy is that individuals should have control over their personal data, including what data businesses collect, how they collect it, and how that data is used. Businesses implement data privacy strategies to balance utilizing customer data for growth with protecting individual privacy (Kosinski, 2023).
1a. Data Privacy Issues
The exponential growth of data collection by businesses across a variety of sectors has created endless possibilities and opportunities for growing a business and better serving its customers. The hyper-personalized messages that companies like Starbucks send to their customers are not possible without collecting extremely detailed data about customer preferences and behaviors. Businesses must carefully distinguish between what is a helpful product suggestion and what is intrusive. Sometimes these targeted marketing messages can go too far.
EXAMPLE
One of the more infamous examples of targeted marketing going too far was Target, a major retailer, using customer data to target moms-to-be. In one notable case, a teenage girl started receiving targeted ads for baby products before her parents knew she was pregnant. This incident highlights the delicate balance between personalization and privacy. While tailored ads can be effective, businesses must be cautious not to overstep boundaries.
Below are several strategies that businesses employ to keep their customer data private and prevent issues when analyzing data (HBS, 2015).
Transparency: Businesses need to continuously communicate with customers about how customer data is being collected and used. Transparency builds trust. Building and maintaining trust among customers leads to gaining more customers and retaining current customers.
EXAMPLE
Businesses can inform customers about their data collection and usage practices and allow the customers to opt out. When customers can actively manage their data preferences, they feel empowered, understand their rights, and can give consent for the business to use their data. In turn, informed consent helps ensure that businesses are conforming to privacy laws.
Limit the collection of personally identifiable information (PII): PII is any piece of data that can uniquely identify a person, such as a Social Security number, driver’s license number, or credit card number. Businesses should only collect data that is relevant to the intended purpose. By collecting only the necessary PII, businesses respect the individual customer’s privacy and can focus on deriving insights from customer data rather than on identifying individuals.
Conform to data privacy laws: There are several data privacy laws that govern how businesses can use data. HIPAA (Health Insurance Portability and Accountability Act) is a data privacy law that provides rules for keeping an individual’s healthcare data private. FCRA (Fair Credit Reporting Act) protects the privacy of information in consumer credit reports, so a business that uses credit information to make decisions must keep that information private. Being aware of, understanding, and conforming to privacy laws is an integral part of keeping customer data private.
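The strategy of limiting PII collection described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the field names, the `ANALYSIS_FIELDS` set, the `minimize_record` helper, and the salt value are all hypothetical and do not come from any particular business or law. The sketch keeps only the fields an analysis needs and replaces the customer ID with a salted hash, so records can still be grouped by customer without storing direct identifiers.

```python
import hashlib

# Hypothetical set of fields the analysis actually needs (an assumption for this sketch).
ANALYSIS_FIELDS = {"purchase_total", "store_region", "product_category"}


def minimize_record(record: dict, salt: str) -> dict:
    """Keep only the analysis fields and swap the customer ID for a salted hash."""
    reduced = {k: v for k, v in record.items() if k in ANALYSIS_FIELDS}
    digest = hashlib.sha256((salt + str(record["customer_id"])).encode("utf-8")).hexdigest()
    reduced["customer_key"] = digest[:16]  # shortened pseudonymous key
    return reduced


# Made-up record containing PII that the analysis does not need.
raw = {
    "customer_id": 1042,
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "credit_card": "0000 0000 0000 0000",
    "purchase_total": 58.20,
    "store_region": "Midwest",
    "product_category": "coffee",
}

clean = minimize_record(raw, salt="example-salt")
print(sorted(clean))
```

Note that a salted hash is a form of pseudonymization rather than full anonymization, so the reduced records would still need to be protected.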
IN CONTEXT Government Regulation of Data Privacy
HIPAA and FCRA are US data privacy laws enforced at the federal level. There are additional privacy laws that differ from state to state and region to region. The laws are constantly evolving. Staying current with data privacy laws is a big job, and many businesses have one or more individuals devoted to this task.
The General Data Protection Regulation (GDPR) is a comprehensive data privacy regulation enacted by the European Union (EU). The GDPR contains several key principles that are the foundation of data privacy. Businesses must be transparent about how they are going to use customer data and must obtain the appropriate consent. They should collect only the customer data that is required for the intended purpose, not irrelevant or excessive data. Data should be stored only as long as it is needed; once customer data is no longer needed for the intended purpose, it should be destroyed (Cipm, 2024).
Although the GDPR is an EU law, its impact extends beyond Europe, affecting businesses in the United States that interact with EU businesses and customers. The GDPR applies to US businesses that provide goods and services to EU consumers, regardless of the size or revenue of the business. If a US business processes the personal data of an EU consumer, the GDPR applies to that business. The US is slowly starting to adopt similar laws. As of 2024, 18 states have passed privacy legislation (Law, 2024).
Data privacy laws hold the business financially accountable even if the law is violated by mistake. Several issues can arise when data privacy laws are violated. Financial penalties can range from thousands to millions of dollars depending on the severity of the violation. Legal fees stemming from investigations or lawsuits may also be incurred. Reputational damage can occur due to loss of customer trust or negative publicity, which in turn could lead to decreased revenue. Business operations can also be disrupted when employees have to stop their normal work to deal with a data privacy violation, which reduces productivity. Proactive data privacy measures, such as monitoring changes in data privacy laws and ensuring conformity to them, are essential to avoid these issues.
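The principle that data should be stored only as long as it is needed can be sketched as a simple retention check. This is an illustrative sketch, not a compliance tool: the 365-day `RETENTION` window, the `last_needed` field, and the `split_by_retention` helper are hypothetical, and the appropriate retention period depends on the purpose of the data and the applicable law.

```python
from datetime import date, timedelta

# Hypothetical retention window; the real period depends on purpose and applicable law.
RETENTION = timedelta(days=365)


def split_by_retention(records, today):
    """Return (keep, expired) based on when each record was last needed."""
    keep, expired = [], []
    for rec in records:
        if today - rec["last_needed"] > RETENTION:
            expired.append(rec)  # candidate for secure deletion
        else:
            keep.append(rec)
    return keep, expired


# Made-up records for illustration.
records = [
    {"customer_key": "a1", "last_needed": date(2024, 5, 1)},
    {"customer_key": "b2", "last_needed": date(2022, 1, 15)},
]

keep, expired = split_by_retention(records, today=date(2024, 6, 30))
print([r["customer_key"] for r in expired])  # ['b2']
```

Running a check like this on a schedule is one way a business might find data that is no longer needed for its intended purpose.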
terms to know
Personally Identifiable Information (PII)
Information that permits the identity of an individual to be determined.
HIPAA (Health Insurance Portability and Accountability Act)
A federal law established in 1996 that provides rules for the use and sharing of individual health information.
FCRA (Fair Credit Reporting Act)
A federal law established in 1970 that ensures the privacy of information contained in consumer credit reports.
2. Data Security
did you know
In June 2024, Ticketmaster experienced a massive data breach. The criminal hacker group ShinyHunters claimed responsibility for stealing personal details from over 560 million customers worldwide. The stolen data included names, addresses, phone numbers, and partial credit card information. The hackers demanded a $500,000 ransom to prevent selling the data to other parties (Schneid, 2024). News stories like this one seem to be increasingly common. These types of malicious data attacks highlight the need for businesses to be vigilant with their data security practices and procedures.
While data privacy is concerned with the customer's control over how their data is used, data security is concerned with protecting data from unauthorized use, data breaches, or theft. Data security is essential for data privacy, but the reverse relationship is not always true. For example, a business could have an extremely secure method of guarding customer data, but the business may not respect the customer’s privacy if it mishandles customer data or ignores consent guidelines. Data privacy and security are related but denote slightly different concepts. The table below highlights a comparison between the two concepts.
Comparison of Data Privacy and Data Security

Aspect | Data Privacy | Data Security
Definition | Proper collection, processing, storage, and use of customer data for authorized purposes | Protecting data from unauthorized use, malicious attacks, or abuse
Key Concern | Transparency, data privacy laws | Actively monitoring data, encryption, data breach response
Purpose | Individual customer rights and freedom from intrusion | Provide access to authorized users only
Example | Marketing company obtaining consent for personalized advertisements based on behavior (past online purchases and searches) | Standard text messages (SMS) are not encrypted, but services like WhatsApp and Apple's iMessage are encrypted.
terms to know
Data Breach
An incident in which unauthorized individuals gain access to personal or sensitive data (social security numbers, bank account information, etc.).
Hacker
An individual with knowledge of computer systems who accesses data that should otherwise not be accessible.
Malicious Data Attack
An intentional effort to gain access to personal or sensitive data in a manner that compromises the confidentiality of the data using manipulation or unauthorized techniques.
Encryption
Process of transforming readable text (numbers and letters) into an indecipherable format.
2a. Data Security Issues
When analyzing data, several data security issues can arise. Let’s explore several data security practices that businesses implement to avoid these issues.
Data integrity is the process of maintaining accurate, consistent, and reliable data throughout its lifecycle. Maintaining data integrity enhances the overall security of the data. When data maintains its intended state and remains unaltered, the data quality and usability can be trusted. An example of data integrity is error detection and correction. Medical professionals rely on accurate patient records for proper diagnosis and treatment. Incorrect records can lead to adverse events for the patient.
EXAMPLE
Cyberattacks can pose a threat to data integrity. A cyberattack is any intentional effort to steal, expose, alter, disable, or destroy data through unauthorized access. A cyberattack may alter and manipulate the data. Measures must be put in place to revert the data back to its original form so data integrity can be maintained. Businesses take measures to prevent cyberattacks like backing up data and storing it in a separate location, purchasing software that can detect cyberattacks, and educating employees on cyberattacks and what they can do to avoid them.
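One common way to detect that data has been altered, and to restore it from a backup, is to compare a stored checksum against a freshly computed one. The Python sketch below illustrates the idea with a made-up patient record. The `fingerprint` helper and the record fields are hypothetical, and the sketch assumes the stored digest and the backup are kept somewhere an attacker cannot modify.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """SHA-256 digest of a record serialized with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical patient record and a backup copy stored separately.
original = {"patient_id": 7, "allergy": "penicillin", "dose_mg": 250}
backup = dict(original)
stored_digest = fingerprint(original)

# Simulate an unauthorized change to the live record.
live = dict(original)
live["dose_mg"] = 2500

if fingerprint(live) != stored_digest:
    live = dict(backup)  # mismatch detected: restore from the backup copy

print(fingerprint(live) == stored_digest)  # True after restoring
```

The checksum only detects the change; the correction comes from the separate backup, which is why the two measures are usually used together.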
Data encryption is like putting your sensitive information in a secret code before sending it over the internet or storing it on your computer. Imagine you have something of great value in a treasure chest, and you want to keep it safe. Instead of leaving it wide open, you lock it with a special key. Only someone with that key can unlock the chest and see what’s inside.
EXAMPLE
Unencrypted data can lead to data breaches because hackers can easily read it. Hackers can intercept data as it travels across networks, much like intercepting a message on its way from one place to another. If the data is not encrypted, hackers can read everything in the message.
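The treasure chest analogy can be made concrete with a toy Python sketch. It scrambles a message by combining each byte with a random key byte (XOR), and the same key is needed to recover the original. This is for illustration only: the `xor_bytes` helper and the sample message are hypothetical, the key must be as long as the message and used only once, and real systems should rely on vetted encryption libraries rather than code like this.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte."""
    if len(key) != len(data):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "card ending 4242".encode("utf-8")
key = secrets.token_bytes(len(message))  # the "special key" for the chest

ciphertext = xor_bytes(message, key)     # what someone intercepting the data would see
recovered = xor_bytes(ciphertext, key)   # recovering the message requires the key

print(recovered.decode("utf-8"))
```

Without the key, the intercepted bytes look like random noise; with it, the original message comes back exactly.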
terms to know
Data Integrity
Maintaining the trustworthiness of data by ensuring it is accurate and secure.
Cyberattack
An intentional effort to steal, expose, alter, disable, or destroy data through unauthorized access.
Network
System designed to transmit data between two or more devices.
3. Ethical Issues
Data privacy and security are both closely related to ethical issues that arise when analyzing data. You have already learned about some ethical challenges, such as the tension between extracting valuable insights and protecting individual customer privacy. Securing data against data breaches, unauthorized access, and cyberattacks is also an ethical issue that arises during data analysis. Beyond the data privacy and security issues that have already been discussed, data analysts must guard against bias and discrimination that can be present in the data analysis.
EXAMPLE
In 2019, Apple teamed up with Goldman Sachs to offer credit cards to consumers. Credit approval or denial and the credit limit (if approved) are determined using data analytics. Steve Wozniak, a co-founder of Apple, reported receiving a much higher credit limit than his wife, even though the couple shared all their assets. Goldman Sachs was not found liable for unfair lending practices, but many researchers claimed the algorithm was biased against women (Perry, 2019).
One way to prevent the kind of ethical issue that the Apple credit card highlights is to ensure that the data set used in the analysis is representative of the entire population. A representative data set supports reliable insights when the analytical results are applied to the broader population. Collect data from a wide range of sources, including different demographics, backgrounds, and regions, and avoid overrepresenting or underrepresenting any group. Before analysis, document your beliefs and assumptions; this helps you recognize biases as you review results.
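A simple way to check for overrepresented or underrepresented groups is to compare each group's share of the sample with its share of the population. The Python sketch below is a minimal illustration: the `representation_gaps` helper, the 5-percentage-point tolerance, and the regional figures are all hypothetical and chosen only to show the idea.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares, tolerance=0.05):
    """Return groups whose sample share differs from the population share by more than tolerance."""
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample is empty")
    counts = Counter(sample_groups)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            # Positive means overrepresented, negative means underrepresented.
            gaps[group] = round(observed - expected, 3)
    return gaps


# Hypothetical figures for illustration only.
population_shares = {"urban": 0.50, "rural": 0.50}
sample_groups = ["urban"] * 70 + ["rural"] * 30

print(representation_gaps(sample_groups, population_shares))
```

In this made-up sample, urban customers are overrepresented and rural customers are underrepresented, so results drawn from it may not apply well to the broader population.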
Humans have biases, and humans are the ones who analyze data. Our biases can seep into the data analysis process. It is important to recognize your own biases so that you can assess the results for potential bias and adjust the analysis as needed. Consider an example where stereotypes influence data interpretation.
EXAMPLE
Scenario: Hiring Decisions in a Tech Company
Stereotype: “Engineers are highly competent but lack social skills.”
Data Collection: A tech company analyzes job performance metrics for engineers.
Bias Impact:
Interpretation: The company may disproportionately focus on technical achievements (competence) while overlooking teamwork, collaboration, and communication.
Outcome: Engineers who excel technically but struggle with interpersonal skills might be promoted, while those with balanced abilities are undervalued.
In this case, the stereotype affects how data is evaluated, potentially leading to biased decisions. Awareness of such biases is crucial for fair and accurate data analysis (To, 2024).
term to know
Bias
A natural inclination for or against an individual, a group, or an idea.
summary
In this lesson, you learned to distinguish between two interconnected but distinct concepts: data privacy and data security. This lesson compared the two concepts along with describing potential data privacy issues and data security issues that can arise during data analysis.
Ethical issues related to data privacy and security were discussed, including the importance of considering how bias in data collection may affect downstream analysis and how bias on an individual level can seep into the data analysis process. Analysts must be on guard for both types of bias to avoid inaccurate analyses.
Source: THIS TUTORIAL WAS AUTHORED BY SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.