The JOIN operations, which are among the possible TableExpressions in a FROM clause, perform joins between two tables. (You can also perform a join between two tables using an explicit equality test in a WHERE clause, such as "WHERE t1.col1 = t2.col2".)
Syntax
JOIN Operation
The JOIN operations are:
- INNER JOIN operation
Specifies a join between two tables with an explicit join clause.
- LEFT OUTER JOIN operation
Specifies a join between two tables with an explicit join clause, preserving unmatched rows from the first table.
- RIGHT OUTER JOIN operation
Specifies a join between two tables with an explicit join clause, preserving unmatched rows from the second table.
- CROSS JOIN operation
Specifies a join that produces the Cartesian product of two tables. It has no explicit join clause.
- NATURAL JOIN operation
Specifies an inner or outer join between two tables. It has no explicit join clause. Instead, one is created implicitly using the common columns from the two tables.
In all cases, you can specify additional restrictions on one or both of the tables being joined in outer join clauses or in the WHERE clause.
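As a rough illustration of the operations listed above, the sketch below runs them against two tiny tables. It assumes SQLite (through Python's standard `sqlite3` module) rather than the database this reference describes, and the tables `t1` and `t2` with their rows are made up for the example. RIGHT OUTER JOIN is left out because older SQLite builds do not support it.

```python
import sqlite3

# Hypothetical tables: both share an "id" column so NATURAL JOIN has something to match on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, a TEXT);
    CREATE TABLE t2 (id INTEGER, b TEXT);
    INSERT INTO t1 VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO t2 VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Explicit join clause.
inner = rows("SELECT t1.id, a, b FROM t1 INNER JOIN t2 ON t1.id = t2.id ORDER BY t1.id")
# Same join written as an equality test in the WHERE clause.
where_form = rows("SELECT t1.id, a, b FROM t1, t2 WHERE t1.id = t2.id ORDER BY t1.id")
# Unmatched rows from the first table are preserved, padded with NULL (None in Python).
left = rows("SELECT t1.id, a, b FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id ORDER BY t1.id")
# Cartesian product: every row of t1 paired with every row of t2.
cross = rows("SELECT t1.id, t2.id FROM t1 CROSS JOIN t2")
# Join clause created implicitly from the common column "id".
natural = rows("SELECT id, a, b FROM t1 NATURAL JOIN t2 ORDER BY id")

print(inner)
print(left)
print(len(cross))
```

With these rows, the INNER JOIN, the WHERE form, and the NATURAL JOIN all return the same two matched rows, while the LEFT OUTER JOIN adds the unmatched row from `t1`.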
This article looks at different types of SQL joins. If you’re new to the subject you may want to check out the SQL joins article as well. Please note that joins only work with relational databases.
Quick review of SQL Join Types
A SQL join tells the database to combine columns from different tables. We normally join tables by matching the foreign keys in one table to the primary keys in another. For example, every record in the products table has a unique ID in the id field: that’s the primary key. To match the key, every record in orders has a product ID in the product_id field: that’s a foreign key. If we want to combine information about an order with information about the product that was ordered, we can do an inner join:
SELECT
orders.total as total,
products.title as title
FROM
orders INNER JOIN products
ON
orders.product_id = products.id
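To try the inner join above without a real database, here is a minimal sketch using Python's standard `sqlite3` module. The table and column names follow the query, but the rows and the exact column types are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, just enough to run the join.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(id),
        total REAL
    );
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 2, 20.0), (12, 1, 19.0);
""")

# Match each order's foreign key to the product's primary key.
joined = conn.execute("""
    SELECT orders.total AS total, products.title AS title
    FROM orders INNER JOIN products
    ON orders.product_id = products.id
    ORDER BY orders.id
""").fetchall()

for total, title in joined:
    print(total, title)
```

Each order comes back paired with the title of the product it refers to.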
It’s very important that we use orders.product_id and not orders.id in the join: both fields are just numbers, so some order IDs will match some product IDs, but those matches will be meaningless.
The problem with SQL joins explained
Even if we use the correct fields, there is a trap here for the unwary. It’s easy to check that every record in orders contains a product ID, because a count of the number of null values in orders.product_id returns 0:
SELECT
count(*)
FROM
orders
WHERE
orders.product_id IS NULL
| count(*) |
| -------- |
| 0 |
But what if things don’t always match? For example, suppose we’re trying to find out which products lack reviews. If we look at the reviews table, it has 1,112 entries:
SELECT
count(*)
FROM
reviews
| count(*) |
| -------- |
| 1112 |
Every single review refers to a product:
SELECT
count(*)
FROM
reviews
WHERE
reviews.product_id IS NULL
| count(*) |
| -------- |
| 0 |
But does every product have reviews? To find out, let’s count the number of products:
SELECT
count(*)
FROM
products
| count(*) |
| -------- |
| 200 |
We can then combine the products and reviews tables and count the number of distinct products in the result. (In real life we’d probably get this number without a join, but using one here helps illustrate the idea.)
SELECT
count(distinct products.id)
FROM
products INNER JOIN reviews
ON
products.id = reviews.product_id
Only 176 of the 200 products have any reviews. As a result, if we count the number of reviews for each product with an inner join, we’ll only get counts for the products that have some reviews. The query won’t tell us anything about products that lack reviews, because the inner join won’t find any matching rows for them when combining the tables. If we order the per-product counts in ascending order, the lowest count is 1, when it should be 0.
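The effect can be reproduced on a small scale. The sketch below is not the article's original query: it is an assumed per-product count over made-up rows, run with Python's `sqlite3` module, where one product (the hypothetical 'Gizmo') has no reviews.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: 'Gizmo' has no reviews at all.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
    INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 2);
""")

# Count reviews per product using an inner join, lowest counts first.
counts = conn.execute("""
    SELECT products.title, count(*) AS review_count
    FROM products INNER JOIN reviews
    ON products.id = reviews.product_id
    GROUP BY products.id, products.title
    ORDER BY review_count ASC
""").fetchall()

print(counts)
```

The unreviewed product never appears, so the smallest count in the result is 1 rather than 0.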
Outer SQL join types to the rescue
All right: we know how many products don’t have reviews, but which ones are they? One way to answer that question is to use the type of SQL join known as the left outer join, also called a “left join”. This kind of join always returns at least one record for every row in the first table we mention (i.e., the one on the left). To see how it works, imagine we have two little tables, one of paints and one of fabrics. The paint table contains three rows, while the fabric table contains just two. If we do an inner join on these two tables, matching the paint color to the fabric color, only the records whose color appears in both tables match.
Nothing in the fabric table is red, so the first record from the paint table isn’t included in the result. Similarly, nothing in the paint table is green, so the nylon material from the fabric table is discarded as well.
If we do a left outer join, though, the database keeps every record from the left table that lacks a match. Since there aren’t matching values from the right table, SQL fills in those columns with NULL.
Keeping all of the records from the left table turns out to be useful in a lot of different situations. For example, if we want to see which paints don’t have matching fabrics, we can do a left outer SQL join.
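To make the paint and fabric comparison concrete, here is a small sketch using Python's `sqlite3` module. The table names `paint` and `fabric`, the column names, and every row are assumptions chosen to fit the description (a red paint with no fabric, a green nylon fabric with no paint); they are not the article's original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: three paints, two fabrics, only 'blue' in common.
    CREATE TABLE paint (color TEXT, finish TEXT);
    CREATE TABLE fabric (color TEXT, material TEXT);
    INSERT INTO paint VALUES ('red', 'gloss'), ('blue', 'matte'), ('yellow', 'satin');
    INSERT INTO fabric VALUES ('green', 'nylon'), ('blue', 'cotton');
""")

# Inner join: only colors present in both tables survive.
inner = conn.execute("""
    SELECT paint.color, paint.finish, fabric.material
    FROM paint INNER JOIN fabric
    ON paint.color = fabric.color
    ORDER BY paint.color
""").fetchall()

# Left outer join: every paint is kept; missing fabric values become NULL (None).
left = conn.execute("""
    SELECT paint.color, paint.finish, fabric.material
    FROM paint LEFT OUTER JOIN fabric
    ON paint.color = fabric.color
    ORDER BY paint.color
""").fetchall()

print(inner)
print(left)
```

The inner join returns one row, while the left outer join returns all three paints, two of them with `None` in the fabric column.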
The result of a left outer join is easier to read if we select only the rows where the values from the right-hand table are NULL. We can use this technique to get a list of products that don’t have any reviews by doing a left outer join and keeping only the rows where the columns from the reviews table have been filled in with NULL.
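A minimal sketch of that technique, again with Python's `sqlite3` module and invented rows: the left outer join keeps every product, and the WHERE clause then keeps only the products whose review columns came back as NULL. Filtering on `reviews.product_id` is an assumption; any column of `reviews` that is never NULL in the real data would serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: 'Gizmo' has no reviews.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
    INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 2);
""")

# Keep only the products for which the join found no review.
unreviewed = conn.execute("""
    SELECT products.id, products.title
    FROM products LEFT OUTER JOIN reviews
    ON products.id = reviews.product_id
    WHERE reviews.product_id IS NULL
    ORDER BY products.id
""").fetchall()

print(unreviewed)
```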
What about right outer SQL join and full outer join?
The SQL standard defines two other kinds of outer join, but they are used much less often, so much less that some databases don’t even implement them. A right outer join works exactly like a left outer join, except it always keeps rows from the right table and fills columns from the left table with NULL when there aren’t matches. It’s pretty easy to see that you can always use a left outer join instead of a right one by swapping the tables around; there’s no particular reason to favor one over the other, but almost everyone uses the left-handed form, so we suggest you do too.
A full outer join keeps all of the information from both tables. If a record on the left lacks a match on the right, the database will fill in the missing right-hand values with NULL, and if a record on the right lacks a match on the left, it fills in the missing left-hand values. For example, a full outer join on the paint and fabric tables returns the matched rows together with the unmatched rows from each side.
Full outer joins are occasionally useful for finding the overlap between two tables, but in twenty years of writing SQL, I have only ever used them in lessons like this one.
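Because some databases lack these two joins, the sketch below avoids the RIGHT and FULL keywords entirely. It shows, under assumed `paint` and `fabric` tables with made-up rows, that a right outer join can be written as a left outer join with the tables swapped, and it emulates a full outer join by combining a left outer join with the unmatched rows from the other side. The emulation is a common workaround, not something the article itself prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: only 'blue' appears in both.
    CREATE TABLE paint (color TEXT, finish TEXT);
    CREATE TABLE fabric (color TEXT, material TEXT);
    INSERT INTO paint VALUES ('red', 'gloss'), ('blue', 'matte'), ('yellow', 'satin');
    INSERT INTO fabric VALUES ('green', 'nylon'), ('blue', 'cotton');
""")

# "paint RIGHT OUTER JOIN fabric" rewritten as "fabric LEFT OUTER JOIN paint".
right_equivalent = conn.execute("""
    SELECT paint.color, paint.finish, fabric.color, fabric.material
    FROM fabric LEFT OUTER JOIN paint
    ON paint.color = fabric.color
    ORDER BY fabric.color
""").fetchall()

# Emulated full outer join: all paints (matched or not),
# plus the fabrics that matched no paint.
full_equivalent = conn.execute("""
    SELECT paint.color, paint.finish, fabric.color, fabric.material
    FROM paint LEFT OUTER JOIN fabric
    ON paint.color = fabric.color
    UNION ALL
    SELECT paint.color, paint.finish, fabric.color, fabric.material
    FROM fabric LEFT OUTER JOIN paint
    ON paint.color = fabric.color
    WHERE paint.color IS NULL
""").fetchall()

print(right_equivalent)
for row in full_equivalent:
    print(row)
```

The emulated full outer join yields four rows: one match, two paints without a fabric, and one fabric without a paint.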
Which SQL join type to use?
To review, there are four basic types of joins. Inner joins only keep records that match, and the other three types fill in missing values with NULL as shown in Figure 1. Some people think of the left table as the main or initial table; the type of join you use will determine how many records from that initial table you’ll return, as well as any additional records you’ll return based on the columns you want from the other table. We’ve already seen exceptions to this here (there were multiple reviews for each product, for example), but that’s a good sign you have a good primary table to start with.
In general, you’ll only really need to use inner joins and left outer joins. Which join type you use depends on whether you want to include unmatched rows in your results:
- If you need unmatched rows in the primary table, use a left outer join.
- If you don’t need unmatched rows, use an inner join.
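As one illustration of the first rule, the sketch below counts reviews per product with a left outer join so that products without reviews show up with a count of 0. It uses Python's `sqlite3` module and invented rows. Counting `reviews.id` instead of `count(*)` is a deliberate choice in this sketch, since `count(*)` would count the NULL-padded row as 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: 'Gizmo' has no reviews.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
    INSERT INTO reviews VALUES (1, 1), (2, 1), (3, 2);
""")

# count(reviews.id) ignores NULLs, so unmatched products get 0.
review_counts = conn.execute("""
    SELECT products.title, count(reviews.id) AS review_count
    FROM products LEFT OUTER JOIN reviews
    ON products.id = reviews.product_id
    GROUP BY products.id, products.title
    ORDER BY review_count ASC
""").fetchall()

print(review_counts)
```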
For another angle on joins that abstracts away the SQL, check out our article on joins using Metabase’s query builder.
Common problems with SQL joins
Doing an inner SQL join instead of an outer join
This is probably the most common error. Real data often has gaps, and inner joins will discard records without warning you whenever keys don’t line up. Counting the number of rows from one table that don’t have matches in another is a good safety check; if there are any, you should think about using an outer join instead of an inner one.
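The safety check described above might look like the following sketch, which counts the rows in one table that have no match in the other. It uses Python's `sqlite3` module; the rows are made up, and one order deliberately points at a product ID (99) that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data with a gap: order 12 refers to a missing product.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

total_orders = conn.execute("SELECT count(*) FROM orders").fetchone()[0]

# Rows an inner join would silently return.
inner_rows = conn.execute("""
    SELECT count(*)
    FROM orders INNER JOIN products
    ON orders.product_id = products.id
""").fetchone()[0]

# Safety check: orders with no matching product.
unmatched = conn.execute("""
    SELECT count(*)
    FROM orders LEFT OUTER JOIN products
    ON orders.product_id = products.id
    WHERE products.id IS NULL
""").fetchone()[0]

print(total_orders, inner_rows, unmatched)
```

A nonzero `unmatched` value is the signal to consider an outer join instead of an inner one.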
Using SQL joins on “matches” that aren’t meaningful
A person’s weight in kilograms and the value of their last purchase in dollars are both numbers, so it’s possible to do a join by matching them, but the result will [probably] be meaningless. A less frivolous example comes up when one table contains several foreign keys that refer to different tables, which can lead to joining patient data with vehicle registrations instead of appointment dates. Declaring foreign keys in tables can help prevent this.
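The sketch below shows how quietly this goes wrong. With made-up `orders` and `products` rows (run through Python's `sqlite3` module), joining on the order's own ID instead of its product ID raises no error and still returns plausible-looking rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: order IDs happen to overlap with product IDs.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, total REAL);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO orders VALUES (1, 2, 9.5), (2, 2, 20.0), (3, 1, 5.0);
""")

# Meaningful join: foreign key to primary key.
correct = conn.execute("""
    SELECT orders.id, products.title
    FROM orders INNER JOIN products
    ON orders.product_id = products.id
    ORDER BY orders.id
""").fetchall()

# Meaningless join: both columns are numbers, so it "works" anyway.
wrong = conn.execute("""
    SELECT orders.id, products.title
    FROM orders INNER JOIN products
    ON orders.id = products.id
    ORDER BY orders.id
""").fetchall()

print(correct)
print(wrong)
```

Here the wrong join both mislabels order 1 and drops order 3, without any warning from the database.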
Confusing NULLs in data with NULLs from mis-matches
If one of the tables in an outer join contains NULLs, we may wind up with a column whose values are missing for two different reasons: because they weren’t in the original data, or because of mismatches. Depending on the problem we’re trying to solve, these different “flavors” of NULL may matter.
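One way to tell the two flavors apart, sketched here under assumptions, is to look at a column of the right-hand table that can never be NULL in the original data, such as its primary key. In the made-up data below (run with Python's `sqlite3` module; the `body` column is hypothetical), one review has a NULL body and one product has no review at all. Both produce a NULL `body` after the join, but only the second has a NULL `reviews.id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: Widget's review has no text; Gadget has no review.
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, product_id INTEGER, body TEXT);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO reviews VALUES (1, 1, NULL);
""")

# reviews.id is a primary key, so it is NULL only when the join found no match.
flavors = conn.execute("""
    SELECT products.title,
           reviews.body,
           reviews.id IS NULL AS unmatched
    FROM products LEFT OUTER JOIN reviews
    ON products.id = reviews.product_id
    ORDER BY products.id
""").fetchall()

for title, body, unmatched in flavors:
    reason = "no matching review" if unmatched else "review exists but body is NULL"
    print(title, body, reason)
```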