Table of Contents |
Subqueries in SQL are queries nesting within each other. One query can be used in another query as part of the results. Using subqueries, you can retrieve data that will be used for filtering, comparing, or computing in the main query. Combining and manipulating data from multiple tables and sources provides a powerful tool to check data in multiple dimensions in one query.
SQL statements can contain subqueries in various places:
EXAMPLE
SELECT product_name, price FROM products WHERE price > (SELECT AVG(price) FROM products);This example retrieves the names and prices of products whose prices are greater than the average price calculated by the subquery (SELECT AVG(price) FROM products).
By allowing you to operate on multiple tables and conditions simultaneously, subqueries enhance the flexibility of SQL queries. However, subqueries should be used with caution, as not optimizing them can adversely affect performance.
There are many potential uses for subqueries in the PostgreSQL database you have been working with in this class. For example, we could use a subquery to find all of the invoices that have a total amount larger than the average total amount.
We could do this in two separate steps based on what we’ve learned so far. We could first calculate the average total:
SELECT ROUND(AVG(total),2)
FROM invoice;
Next, we can take that value and query the invoice table to find those that are larger than 5.71.
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE total > 5.71;
Although this approach works, we can also do it in a way that can pass the result from the first query (that calculated the average) into the second query by using a subquery. The subquery can be nested in a SELECT, INSERT, DELETE, or UPDATE statement, but we mostly see it in a SELECT statement.
To construct the subquery, we would enclose the second query in round brackets and place it in the WHERE clause as an expression. So, consider our original second statement:
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE total > 5.71;
We would remove the 5.71 and replace it with round brackets enclosing the first statement, like this:
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE total > (SELECT ROUND(AVG(total),2) FROM invoice);
The query inside the round brackets is called the subquery, or inner query. The query that contains the subquery is called the outer query. Notice that we don’t include the semicolon at the end of the inner statement in the subquery.
The database would execute the subquery first, then take the result from that subquery and pass it to the outer query. Then, the database executes the outer query.
A subquery could potentially return 0 or more results, so it is important to consider the operator used to compare with its results. We can use any valid SQL operator in this type of query. For example, if we use a = operator, we would expect that the subquery would return 0 or 1 row(s) as a result.
Let's look at an obvious subquery that is querying on the customer_id to return the customer_id:
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE customer_id =
(SELECT customer_id
FROM customer
WHERE customer_id = 1);
Since the primary key of the customer table is the customer_id, querying on the customer_id would only return one value.
Consider if we tried to select a value that doesn’t exist:
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE customer_id =
(SELECT customer_id
FROM customer
WHERE customer_id = 0);
The results would still run, but show that 0 rows were displayed:
However, consider the case in which we have multiple rows being returned:
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE customer_id =
(SELECT customer_id
FROM customer
WHERE country = 'USA');
Since there are 13 customers that live in the country USA, we will end up getting this error:
In order to avoid this error, we have to use the IN operator instead of the equal sign. This will allow 0, 1, or many results to be returned:
SELECT invoice_id, invoice_date, customer_id, total
FROM invoice
WHERE customer_id IN
(SELECT customer_id
FROM customer
WHERE country = 'USA');
We can also use multiple columns as criteria to compare with the subquery. In order to do so, we must use the round brackets around the columns within the WHERE clause and have them match up with the columns that we want to compare within the subquery. For example, if we wanted to compare the customer_id and the billing_country in the invoice table with the customer_id and country in the customer table, we could do the following:
SELECT invoice_id, invoice_date, customer_id, total
FROM invoice
WHERE (customer_id, billing_country) IN
(SELECT customer_id, country
FROM customer
WHERE country = 'USA');
If we did not put parentheses around the columns named in the first WHERE clause, we would get an error.
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE customer_id, billing_country IN
(SELECT customer_id, country
FROM customer
WHERE country = 'USA');
Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.