Toan Hoang

SQL102 / Structured Query Language (SQL)

In the first part of this tutorial series, SQL101 / Relational Database Management Systems, we went through the process of installing a Relational Database Management System (SQL Server Express), installing SQL Server Management Studio (SSMS), and finally restoring a sample database for us to explore (AdventureWorks). In this episode, we will be going through and introducing you to the Structure Query Language, better known as SQL.

Structure Query Language (SQL)

Structured Query Language, commonly referred to as SQL (pronounced “sequel”), is a language used in programming and designed for managing data held in a relational database management system (RDBMS).

Originally based upon relational algebra and tuple relational calculus, SQL consists of many types of statements, which may be informally classed as sublanguages, commonly:

SQL was one of the first commercial languages for Edgar F. Codd’s relational model, as described in his influential 1970 paper, “A Relational Model of Data for Large Shared Data Banks”. Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language.

SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been revised to include a larger set of features. Despite the existence of such standards, most SQL code is not completely portable among different database systems without slight adjustments.

Database providers have also included extensions to Standard SQL to add procedural programming language functionality, such as control-of-flow constructs. These include:

SourceCommon NameFull Name
ANSI/ISO StandardSQL/PSMSQL/Persistent Stored Modules
Interbase / FirebirdPSQLProcedural SQL
IBM DB2SQL PLSQL Procedural Language (implements SQL/PSM)
IBM InformixSPLStored Procedural Language
IBM NetezzaNZPLSQLbased on Postgres PL/pgSQL
InvantivePSQLInvantive Procedural SQL (implements SQL/PSM and PL/SQL)
Microsoft / SybaseT-SQLTransact-SQL
Mimer SQLSQL/PSMSQL/Persistent Stored Module (implements SQL/PSM)
MySQLSQL/PSMSQL/Persistent Stored Module (implements SQL/PSM)
MonetDBSQL/PSMSQL/Persistent Stored Module (implements SQL/PSM)
NuoDBSSPStarkey Stored Procedures
OraclePL/SQLProcedural Language/SQL (based on Ada)
PostgreSQLPL/pgSQLProcedural Language/PostgreSQL
SAP R/3ABAPAdvanced Business Application Programming
SAP HANASQLScriptSQLScript
SybaseWatcom-SQLSQL Anywhere Watcom-SQL Dialect
TeradataSPLStored Procedural Language

* We will be focusing on Microsoft SQL Server and in later tutorials will be using T-SQL to create functions and stored procedures.

For the remainder of this tutorial, we will be focused on learning the basic structure and syntax around Data Query Language (DQL), more specifically the SELECT Statement.

SELECT FROM

The main statement in the Data Query Language (DQL) for SQL is the SELECT Statement; the SELECT Statement is used to retrieve information from relational database management systems (RDBMS). We will use SQL Server Management Studio (SSMS) to issue SELECT Statements to the AdventureWorks database (set up in the previous tutorial); we will gradually build our knowledge about the various aspects of the SELECT Statement. So let us get started:

Let us understand a bit more about the various Clauses.

SELECT CLAUSE

The SELECT Clause of the SELECT Statement tells the Relational Database Management System (RDBMS) what data is to be returned.

SELECT TOP 100 *

FROM CLAUSE

The FROM Clause of the statement tells the Relational Database Management System (RDBMS) where to get the information from.

FROM Person.Person

Database Schemas

A Database Schema is a way to group related objects in a database together. In our example, we are querying the Person table within the Person schema; in AdventureWorks, the Person schema also has tables containing addresses, phone numbers as well as other Person related information.

FROM [Schema Name].[Table Name]

Database Tables

A database table is a collection of related data held in a structured format within a database. It consists of columns and rows.

TOP 100

This part of the SELECT Clause tells the RDBMS to return the first 100 records; this is used to limit our data and especially useful when exploring data.

Asterix (*)

The asterisk (*) tells the Relational Database Management System (RDBMS) to return all columns.

Now, let us resume with our tutorial.

While looking at the results you can see that there are columns within our returned dataset that we may or may not care about, for example, I do not care about the demographics or rowguid (for now); actually the more I think about it, I want to be able to specify the exact columns that I want, so let us modify our query to do exactly that in the SELECT Clause:

SELECT TOP 100 BusinessEntityID, Title, FirstName, MiddleName, LastName, PersonType, EmailPromotion
FROM Person.Person

Execute this query and have a look at the results; notice that in the SELECT Clause you can have your columns in whatever order you want.

Note: SQL Server Management Studio (SSMS) comes with intellisense which will try to autocomplete your SQL Statements, this is truly a godsend.

SELECT FROM WHERE

Now that we have a SQL SELECT Statement that instructs the RDBMS to return specific columns from a specific table in a specific schema, I may also want to as the RDBMS to filter the data based on provided condition. Let us role play for a moment… you are working on getting a list of people that you can actively market to, and your marketing team has informed you that you can only market to Persons who have their EmailPromotion set to 1, so let us amend our SQL Statement:

SELECT BusinessEntityID, Title, FirstName, MiddleName, LastName, PersonType
FROM Person.Person
WHERE EmailPromotion = 1

As we want a complete list, we have removed the TOP 100; we have also removed the EmailPromotion from the SELECT Clause as we know that it is going to be 1.

Note: On the bottom right-hand side of SSMS you can see the number of records returned. In our case, we are returning 5044 rows. 

You now want to send the results back to your marketing team for validation, you do this by right-clicking in the Results Panel and selecting Save Results As. Unfortunately, you get an email response saying that you were miss informed, they want a list of names where Email Promotion is either 1 or 2. As such we have to amend our query again:

SELECT BusinessEntityID, Title, FirstName, MiddleName, LastName, PersonType
FROM Person.Person
WHERE EmailPromotion = 1 OR EmailPromotion = 2

You run this query which now returns 8814 rows, you save this and send this to the marketing team.

COMMON WHERE CLAUSE OPERATORS

Below are the common where clause operators that you can use to filter your data.

OperatorDescriptionExample
=EqualsEmailPromotion = 1
<> or !=Not equalsEmailPromotion <> 0
>Greater thanEmailPromotion > 0
<Less thanEmailPromotion < 1
>=Greater than or equal toEmailPromotion >=1
<=Less than or equal toEmailPromotion <=1
BETWEENBetween an inclusive rangeEmailPromotion BETWEEN 1 AND 2
LIKESearches for a patternFirstName LIKE ‘A%’. The % is a wildcard sign, and therefore this will bring back any record where the first name starts with A.

 

FirstName LIKE ‘%a%d%’ will bring back records where there is an a followed by d within the first name column. 

INMatches one of the values in a listEmailPromotion IN (1,2)
ORTrue is one of two conditions is trueEmailPromotion = 1 OR EmailPromotion = 2
ANDTrue if both conditions are trueEmailPromotion = 1 AND FirstName LIKE ‘A%’
IS NULLTrue if there is no valueTitle IS NULL
IS NOT NULLTrue if there is a valueTitle IS NOT NULL

Please take a moment to experiment with your WHERE Clause and test out the operators above.

Note: When comparing text columns you need to use single quotes. When you are comparing numbers you do not need any quotations. Do not use double Quotes in the WHERE Clause.

AGGREGATION AND GROUP BY CLAUSE

The marketing team has made an additional request of you, they are unsure of the data you have provided, and asked you to provide them with a summary so they can see the high-level numbers. As such, we will have to modify the SQL Statement to bring back summary (aggregated) information. The marketing team wants to see how many people there are with EmailPromotion 0, 1 and 2. Let us create a new SQL Statement:

SELECT EmailPromotion, COUNT(BusinessEntityID)
FROM Person.Person
GROUP BY EmailPromotion

Execute this SQL Statement and look at the results.

We want to include the EmailPromotion in our results, we also want the COUNT of the BusinessEntityID which is a unique key allocated to each person. Given that we are aggregating require a GROUP BY Clause; when we are using an Aggregate function like COUNT, SUM, MIN or MAX, we must include the non aggregated columns in a GROUP BY clause.

The marketing team comes back with another request, be careful of being too helpful at work. Given that it is October, they want to see a list of people who have a first name starting with the letter O, and as a follow up they also want to see the same results for every letter. Let us go about writing our two SQL Statements:

SELECT COUNT(BusinessEntityID)
FROM Person.Person
WHERE LEFT(FirstName,1) = 'O'

As we do not have any non-aggregated columns in our SELECT Clause, we do not need a GROUP BY, we have also used the LEFT Function (more on functions in later chapters) to get the first letter of the FirstName. This result a count of 164. Now let us get the entire alphabet:

SELECT LEFT(FirstName,1), COUNT(BusinessEntityID)
FROM Person.Person
GROUP BY LEFT(FirstName,1)
ORDER BY 1

We have now put the LEFT(FirstName,1) into the SELECT Clause and therefore require this to also be reflected in the GROUP BY Clause. We have also removed the WHERE Clause as we want all the values of the alphabet. Lastly, we have introduced an ORDER BY Clause which will instruct the RDBMS to sort the result set by column one (default is ascending).

They love what you are doing for them, but have one last request, they promise. They would like to order by the second column (highest first) and only return letter that has more than 500 people.  

SELECT LEFT(FirstName,1), COUNT(BusinessEntityID)
FROM Person.Person
GROUP BY LEFT(FirstName,1)
HAVING COUNT(BusinessEntityID) > 500
ORDER BY 2 DESC

We have now added a HAVING BY Clause that will filter the aggregation. WHERE Clauses filter the dataset rows, whereas, HAVING filters the aggregated results. We have also changed the ORDER BY Clause to sort on the second column and in descending order.

Execute the statement and review the results.

Summary

This has been a long article, but it covers the basics of the SELECT Statement and its various Clauses. To summarise:

We have covered a lot so far and in the next part of this series, we will go through concepts of relationship modelling and how data is designed and stored in Relational Database Management Systems (RDBMS), we will go through some more theory before looking at connecting (joining) tables.

As always, please let me know what you think, leave a comment below or reach out to me on Twitter @Tableau_Magic.

Sources

Exit mobile version