SQL: Subqueries

A subquery is a query nested inside another query. As a data practitioner, subqueries are really useful to scale and to make more powerful queries.

The main reason for the existence of subqueries is the need to combine information from multiple tables. Because, information in a relational database is not stored in a single table, it’s shared between several tables to meet standards.

Three types of subqueries:

  • Scalar subquery – when it’s one and only value, we call it scalar subquery or subquery as values
  • Multi-row subquery – when it’s several values, like a list or a column, we call it multi-rows subquery or subquery as lists
  • multi-column subquery – when it’s several columns, we call it multi-columns subquery or subquery as tables

The query inside (nested) is commonly referred to as a subquery or inner query, and the one containing the subquery is known as the outer query.

In SQL execution order, the inner query is executed first, and its result is then used in the execution of the outer query. Despite this nesting, the overall execution order within both the inner and outer queries follows the standard SQL query execution order.

There are some guidelines to consider when using subqueries. The most important one is to ALWAYS enclose subqueries in parentheses. This is the only way SQL knows it’s a subquery.

Beyond that, we should arrange subqueries using the same style as any other query and align it to the right side of the river.

Scalar Subqueries

Scalar subqueries always return one and only one value. We can use them to:

  • Calculate percentages of total
  • Do filtering in WHERE and HAVING clause
    • Reminder: To filter the aggregate results in SQL, we use the HAVING clause instead of the WHERE clause.
Multi-row Subqueries

The commonly used operators for multi-row subqueries are IN and NOT IN.

correlated subquery is an inner query that uses values from the outer query. As a result, it can’t execute independently like other subqueries.

There are cases when the use of correlated subqueries is more appropriate than joins — for example, when inner and outer queries operate on the same table.

We often use correlated subqueries with the EXISTS operator. EXISTS is a logical operator that checks if the subquery returns any row. The syntax of its use is as follows:

SELECT column_1, column_2, . . .
  FROM table
 WHERE EXISTS(subquery);

In general, when we have more than three tables, nested subqueries can be necessary to retrieve information. A nested subquery is a query nested inside another subquery. 

  1. There are three types of subqueries based on their output:
    • Scalar subqueries (single value). Use them anywhere you can use values — for example, operations and WHERE clauses.
    • Multi-row subqueries (several values, one column). Use them, for example, to filter rows with IN and NOT IN operators.
    • Multi-column subqueries (several rows, several columns). Use them wherever you can use tables (i.e., FROM and with joins).
  2. A correlated subquery is a subquery that we can’t execute separately from the outer query execution. A correlated subquery can be any type of subquery (scalar, multi-row, or multi-column).
  3. A nested subquery is a subquery nested inside another subquery. We can nest any type of subquery (scalar, multi-row, or multi_column) inside another subquery.

Here are some steps to follow to write effective queries:

  1. Identify the tables where you can find the information you are looking for
  2. Identify the sub-problems to solve
  3. Write the queries to solve these sub-problems
  4. See how to combine and modify these queries to reach a solution with a single query.
  5. Optimize the query if necessary — there can be several approaches to solve the same problem.
Posted in SQL

Leave a Reply

Your email address will not be published. Required fields are marked *