Now that we have installed a SQL Server Express Database, SQL Server Management Studio (SSMS), AdventureWorks, can create a SQL SELECT Statements and in our last chapter, we learned about data modelling and normalisation. As normalised databases are very common, we will have to learn how to write cross table SELECT Statements using different types of joins statements in the FROM Clause of your SELECT Statements. In this tutorial, we are going to look at the different types of join statements, aliasing for readability, and well as giving your output names.
SINGLE TABLE SELECT
Let us start by opening SQL Server Management Server (SSMS) and executing the following SQL Statement against the AdventureWorks Database and Person Schema:
SELECT TOP 100 p.BusinessEntityID , p.PersonType , p.NameStyle , p.Title , p.FirstName , p.MiddleName , p.LastName , p.Suffix , p.EmailPromotion , p.AdditionalContactInfo , p.Demographics --, p.rowguid --, p.ModifiedDate FROM Person.Person p;
A couple of things to note here:
- We have put the letter p after the table name, this is called an alias; as we will start joining tables and may have the same columns in multiple tables, for example, ModifiedDate, we have to let the RDBMS know exactly what table the column is from.
- Your SQL Statement will return an error if your SELECT Clause has ambiguous columns i.e. you reference a column name that exists within two tables and not given an alias to say exactly which table the column refers to.
- I like to (and it is common practice) put commas before each line in the SELECT Clause as it means I can comment out a column by using
--
; as you can see in the last two columns has been commented out.
INNER JOIN
Inner Joins instruct your database to join two tables in your SELECT Statement and only return records where there is a match in the join condition, in essence, it is the intersect between the two tables. Execute the following statement:
SELECT TOP 100 p.BusinessEntityID , p.PersonType , p.NameStyle , p.Title , p.FirstName , p.MiddleName , p.LastName , p.Suffix , p.EmailPromotion -- , p.AdditionalContactInfo -- , p.Demographics --, p.rowguid --, p.ModifiedDate , pp.PhoneNumber , pnt.Name FROM Person.Person p INNER JOIN Person.PersonPhone pp ON (p.BusinessEntityID = pp.BusinessEntityID) INNER JOIN Person.PhoneNumberType pnt ON (pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID)
Now let us understand what this query is doing:
- We define our tables in our FROM Clause and, my personal preference, give the relationship information in the ON condition.
- You can also give a simple list of tables in your FROM Clause and define your join conditions in the WHERE Clause, but I personally hate this.
- In our SELECT Statement, we are
- Defining our first table called Person in the Person Schema and giving this an alias p.
- We are performing an INNER JOIN with table PersonPhone in the Person Schema and giving this an alias of pp.
- After the pp alias, we have an ON condition where we specify that we are connecting the tables based on the BusinessEntityID on the Person table matching the BusinessEntityID on the PersonPhone table.
- We are performing an INNER JOIN of the results of Person and PersonPhone with the PhoneNumberType table to get the type of the phone number.
As we are using an INNER JOINS, it means that a record in the Person table must have at least one entry in the PersonPhone table to be returned in your result set. Which joins you use depends on your data model and constraints in your Conceptual Data Model (CDM), for example, if every person in your data model must have a phone number, then it would be fine to use an INNER JOIN, however, if phone number is optional, then you might lose Person records and therefore may want to use a LEFT OUTER JOIN.
Note: in this particular case, an INNER JOIN is fine to use.
OUTER JOIN
Now let us modify the query slight to demonstrate the difference between an INNER and LEFT OUTER JOINS; we will modify the SELECT Statement to only return the phone number type when it is equal to 3, we have added a AND condition to our JOIN Clause.
SELECT p.PersonType , p.Title , p.FirstName , p.MiddleName , p.LastName , p.Suffix , pp.PhoneNumber , pnt.Name FROM Person.Person p INNER JOIN Person.PersonPhone pp ON (p.BusinessEntityID = pp.BusinessEntityID) INNER JOIN Person.PhoneNumberType pnt ON (pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID AND pnt.PhoneNumberTypeID = 3)
Execute this statement and see how many records are returned, we should get 736 records returned. Remember as we are performing inner joins, we are asking the RDBMS to get information from Person which must have a corresponding record in PersonPhone, and PersonPhone must have a corresponding record in PhoneNumberType.
Now let us change the FROM Clause to use a LEFT OUTER JOIN, execute it and observe the results.
SELECT p.BusinessEntityID , p.PersonType , p.NameStyle , p.Title , p.FirstName , p.MiddleName , p.LastName , pp.PhoneNumber , pnt.Name FROM Person.Person p INNER JOIN Person.PersonPhone pp ON (p.BusinessEntityID = pp.BusinessEntityID) LEFT OUTER JOIN Person.PhoneNumberType pnt ON (pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID AND pnt.PhoneNumberTypeID = 3)
Once we execute this SQL Statement we will get 19972 records as before, what is going on here?
- The LEFT OUTER JOIN Statement instructed the database to keep records on the left table of the JOIN statement regardless of if there are matches on the right table, so in essence, it will return the intersect between both tables and also all of the records on the left-hand side.
- If there are records in on the LEFT-hand table that does not have records on PhoneNumberTypeID 3, with a LEFT join, Person records will still be returned with null (empty values) in the columns on the right-hand table.
RIGHT OUTER JOINS are conceptually the same as LEFT OUTER JOINS, however, they will return records that intersect between the two tables, and all records on the right hand of the JOIN Statement regardless of matching records.
FULL OUTER JOINS bring back all records regardless of the join condition. If the join condition does not match then there will be null records.
Let us look at this from a diagram point of view, it may better help us visualise joins.
VISUALISING JOINS
To help you cement your understanding of the main types of joins you will see, the following Venn diagrams to represent what information is returned by joining tables A and B:
INNER JOIN – Records where the records in A matches records in B as defined in the join conditions.
LEFT OUTER JOIN – Records where the records in A matches records in B as defined in the join conditions; left outer joins will also include all records on the left-hand side of the join condition, and nulls (empty) in the right-hand columns for when there are no matches.
RIGHT OUTER JOIN – Records where the records in A matches records in B as defined in the join conditions; right outer joins will also include all records on the right-hand side of the join condition, and nulls (empty) in the left-hand columns for when there are no matches.
FULL OUTER JOIN – All records. If there are no matches, then nulls (empty) will be returned.
Now have a look at the AdventureWorks Database and explore with different JOIN conditions, query multiple tables, using Alias, and exploring how this all comes together.
CROSS JOIN
Although the INNER and OUTER (LEFT, RIGHT, FULL) JOINS are the most common types, ANSI specifies one more join type that is commonly used, the CROSS JOIN.
Cross joins are where every record in the left-hand table gets a copy of all records in the right-hand table and can be achieved by specifying CROSS JOIN and having no ON Clause in your join statement. I have to say, commonly this is the results of a coding mistake as opposed to being on intentional, but this type of join does have some uses.
For example, I might have a sales table (number of sales on a particular date) and a date table (list of all dates), and I might require a record per sale per date, even if there were no sales on a particular date; therefore, a Cross join would do the trick.
Note: If you ever think that for some reason you are getting magnitudes of more records that you expected, a CROSS JOIN is a very likely culprit.
SUMMARY
I hope you enjoyed this tutorial about SQL Join conditions and now feeling a bit more confident about writing your own queries. In this tutorial, we covered:
- Tableau Alias
- INNER JOINs
- LEFT OUTER JOINs
- RIGHT OUTER JOINs
- FULL OUTER JOINs
The most important aspect of learning about joins is to practice as much as you can. Have a look through the AdventureWorks Database and explore joining tables.
I hope you enjoyed this article and, as always, let me know what you thought about this and what other things you would like me to write about.
Sources
- Wikipedia