Hi everyone,
Wether you are newbie SQL writer, an experimented BigQuery novelist with a volatile memory, or a visitor in quest of good practices, this article is for you !
So here is the situation: after hours of thinking and writing and testing, you have came up with a cool query that you are super proud of, a query that shows exactly the insights you were looking for, a query to make queries great again. In short, possibly the best query anyone has ever thought of.
This is understandable, and I also believe you are a genius.
However, a query (and any bunch of code in general actually) is only as good as it gets if it can’t be shared and noone understands it.
Your query is not a good query if it only does the job.
Having a query that results in the insights you are looking for is only step one. But it is the role of everyone to make sure it is readable and understandable by anyone. In one word, make it sharing-ready (it counts as one word).
Whatever you may think, you never only write a query ‘for yourself’. It is likely the query will be shared to other people. It can be someone to help you write it or correct it. It can be someone else who will need it for business purposes. It can be your replacement in the future… hell, it could even be you in 3 months when you need to update it. One way or another, a query is never just yours, so it is important to make it pretty as soon as you write your first SELECT !
I don’t waste time on such a low concept that is beauty, I am Productivity itself.
I hear you, and of course, the purpose there isn’t to just make the query pretty. We’re talking about saving a ton of time and headache to every reader of your query who is not you right here right now. Writing a clear text helps oganizing your thoughts as well. And we’re also talking about improving general logic understanding of what your query does. In the end, the benefits outweights the few seconds taken to prettify your query.
After this very long introduction, here are, bestowed to you by the Sephora Data Team, the commandments:
1 - Thou shalt know when to indent and go back to line
Indent and go back to line a lot … but not too much !
The rules are as follows:
-
The main keywords
SELECT
,FROM
,WHERE
,GROUP BY
,HAVING
andORDER BY
should all have the same indentation level. Everything else should have at least one more level of indentation. -
CASE WHEN THEN ELSE END
should be indented as shown in the example. -
Everything within a
WITH
statement should be indented once more than the rest. -
Keep each
JOIN
andWHERE
conditions on the same line. One joined table = one line, one WHERE condition = 1 line, no more. -
Keep all the
GROUP BY
andORDER BY
parameters in one line. -
Functions parameters (like
IF
) stays in one line.
2 - Thou shalt never nest a query inside of another
It is convenient to put a whole query inside of a FROM statement, but it is a nightmare to read. BigQuery supports WITH statement that you can use to extract your nested query in a very clean way. Give a clear name to the subtable and use it in your main query !
You can also have WITH
statement inside of WITH
blocks, but never go further than one level of nesting.
3 - Thou shalt use proper table aliasing
Begone are the time where a, b or c are used as aliases for tables.
Use the first letter or first few letters of the table you are making an alias for. When your query gets bigger, it’s easier to remember that u is user table and o is order table, compared to a and b.
4 - Thou shalt be consistent with your column aliasing
You want to give names to each output columns of your query, and this is a good thing.
For clarity, pick one naming convention and follow through the whole query. An order Id column can be named: orderId, ORDERID, order_id, ORDER_ID, OrderId …
How do you separate words ? Does your word starts with caps ? Is the word all caps ? Pick one way and stick with it.
Also please pick an alias that makes sense and is composed of english words.
5 - Thou shalt make your query breathe
Separate each sub query with spaces so it doesn’t get confusing. Pretty self explanatory.
6 - Thou shalt be clear on condition precedence
If you aren’t absolutely sure of how precedence works in your WHERE
or HAVING
conditons (or if you have never heard of the word precedence before), do put parenthesis when you are mixing AND and OR conditions.
This rule might actually make your query less readable from far away … but it does clarify what is the logic you are implementing as well as ensuring that BQ parses your conditions properly.
While we’re there
While we’re there, here are a few more things you should take in account when making a query on BigQuery:
- Use Standard SQL instead of Legacy SQL, this isn’t 2015 anymore.
- Check the price of your query, BigQuery charges per query basis, and the price depends on the amount of data scanned.
- Don’t forget to focus on the insights as well as making your query beautiful !
Everything Should Be Made as Simple as Possible, But Not Simpler - Einstein
Thank you, if your query complies with all the above, it will very likely stay in the legacy of Sephora, and you will be uninamously recognized as the wise person you are. You will then be able to ask for a raise and a statue of you writing some SQL to be installed in the data corner of the company, and it will be well deserved.
– Aurelien