Recently while working on Salesforce Marketing Cloud platform, a colleague and I came across an SQL query written by an external vendor in a form that we were rather unfamiliar with. It goes like this:
This query performed very badly on Salesforce Marketing Cloud with very long run time and frequent timeouts, giving us much frustration. Unlike Google Bigquery which we are familiar with, Salesforce Marketing Cloud has a much lower compute and cannot process the load of a complex query.
Notice that two of its sub-queries referenced the main table in the where table:
This prompted an intellectual curiosity within the team; `How many ways can we write an SQL query and what is the time complexity of each query?`.
We decided to replay an experiment based on a 1988 article by Fabian Pascal to compare the performance of different ways of writing joins in an SQL query. Presenting ‘8 ways to join tables in SQL’ with added in time complexity from Google Bigquery, hope you enjoy this piece of work!
Method 1: Relational Algebra
Relational algebra is the most common way of writing a query and also the most natural way to do so. The code is clean, easy to troubleshoot, and unsurprisingly, it is also the most efficient way to join two tables.
Method 2: Uncorrelated Subquery
The uncorrelated subquery method executes the filter function by first creating a subquery list of account_number, followed by an IN function to filter account number in the subquery. Although efficiency is not as good as Relational Algebra, it is one of the more commonly used join method due to the ease of writing.
Method 3: Correlated Subquery
In the Correlated Subquery, the EXISTS function is used to search in an unfiltered subquery `SELECT *`. The filter operation in the subquery requires a `where mp.account_number = t.account_number`. This is one of the join function used in the inefficient query. The performance of this join is agonizing.
Method 4: Scalar Subquery in the WHERE clause
By using a subquery as a filter on the WHERE function, the query is able to filter f_online = 1. Quite a cool way of thinking but unfortunately it doesn’t perform well.
Method 5: Scalar Subquery in the SELECT clause
Another really interesting way of writing a query, this method uses a subquery in the SELECT function to extract the account_number from another table, but as the two tables have a many to many relation, we have to add in a filter to remove the nulls.
Method 6: Aggregate function to check existence
Similar to Scalar Subquery, this method uses a subquery in the WHERE function. The difference is that this method uses a subquery COUNT(*) with a filter of >1.
Method 7: Correlated Subquery (double negative)
Similar to Correlated Subquery, but uses a double negative. This is also one of the join used in the inefficient query . But interestingly, it did not perform as badly as I expected. This could be simply due to the data structure where there are more exceptions than inclusions.
Method 8: Uncorrelated Subquery (double negative)
Similar to Uncorrelated subquery but uses a double negative.
In conclusion, the way we write our SQL query does have a big impact on the efficiency of the query. The most efficient query runs for 13.8 seconds, compared to the least efficient query runtime of 52.6 seconds. We re-wrote the inefficient query by the external vendor on Salesforce Marketing Cloud and we were able to reduce the run time from 45 mins coupled with frequent timeouts, to 4 mins. After that, we went for a nice cold beer and celebrated our victory.
Queries don’t start from `SELECT` but `FROM`. Now we know why LIMIT doesn’t reduce compute cost. Source
*1988 article by Fabian Pascal “SQL Redundancy and DBMS Performance” in the journal Database Programming & Design — http://www.dbdebunk.com/2013/02/language-redundancy-and-dbms.html
*Day 4: The Twelve Days of SQL: The way you write your query matters — https://iggyfernandez.wordpress.com/2011/12/04/day-4-the-twelve-days-of-sql-there-way-you-write-your-query-matters/
And thanks George for sharing Fabian Pascal’s 1988 experiment 😉
SQL Complexity and 8 Different Ways to Join Tables was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.