Analytical Functions in SQL : A Complete Guide for Beginners and Experts

Analytical functions in SQL are very important because they are used to run multiple analytical queries.

These functions go beyond basic aggregate functions like SUM, AVG, or COUNT and analytical functions are different as compared to aggregate functions which we will discuss in this article.

Table of Contents

What Are Analytical Functions in SQL?

Analytical functions perform calculations based on a group of rows and return a single result for each row within the group.

Unlike aggregate functions, which summarize data, analytical functions keep the row-level details while adding calculated values. They are used in complex data analysis scenarios, making SQL much more helpful manage data.

Important Analytical Functions In SQL

Now let us understand some of the most important analytical functions in SQL , these are the most important and most used analytical functions.

We will understand each analytical function with it’s use case and example as well.

1. ROW_NUMBER()

Use Case: ROW_NUMBER() used to assign unique row numbers to records for sorting results.

Example: Suppose you want to assign a unique number to each employee based on their salary.

sqlCopy codeSELECT employee_id, name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

Result:

employee_id	name	salary	row_num
101	John	80000	1
102	Jane	75000	2
103	Mark	65000	3

2. RANK()

Use Case: RANK() also assigns a number to rows but if two rows have the same rank, the next number is skipped.

Example: Rank employees by salary, and if two employees have the same salary, assign them the same rank.

sqlCopy codeSELECT employee_id, name, salary,
       RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

Result:

employee_id	name	salary	rank
101	John	80000	1
102	Jane	75000	2
104	Emily	75000	2
103	Mark	65000	4

3. DENSE_RANK()

Use Case: DENSE_RANK() is similar to RANK(), but it does not leave gaps in the rank sequence for ties.

Example: Rank employees by salary without gaps in ranking numbers.

sqlCopy codeSELECT employee_id, name, salary,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

Result:

employee_id	name	salary	dense_rank
101	John	80000	1
102	Jane	75000	2
104	Emily	75000	2
103	Mark	65000	3

4. NTILE()

Use Case: NTILE() is used to distribute rows into a specified number of groups, useful for dividing data into percentiles.

Example: Divide employees into 4 salary groups (quartiles).

sqlCopy codeSELECT employee_id, name, salary,
       NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

Result:

employee_id	name	salary	quartile
101	John	80000	1
102	Jane	75000	1
104	Emily	75000	2
103	Mark	65000	3

5. LAG()

Use Case: LAG() is used to access the value from a previous row, often used for comparing current and previous values.

Example: Compare each employee’s salary to the previous employee’s salary.

sqlCopy codeSELECT employee_id, name, salary,
       LAG(salary, 1) OVER (ORDER BY salary DESC) AS previous_salary
FROM employees;

Result:

employee_id	name	salary	previous_salary
101	John	80000	NULL
102	Jane	75000	80000
104	Emily	75000	75000
103	Mark	65000	75000

6. LEAD()

Use Case: LEAD() is used to access the value from a subsequent row. It’s the opposite of LAG().

Example: Compare each employee’s salary to the next employee’s salary.

sqlCopy codeSELECT employee_id, name, salary,
       LEAD(salary, 1) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;

Result:

employee_id	name	salary	next_salary
101	John	80000	75000
102	Jane	75000	75000
104	Emily	75000	65000
103	Mark	65000	NULL

7. SUM() with `OVER()`

Use Case: SUM() with OVER() is sued to calculate a running total or cumulative sum for a set of data.

Example: Calculate the cumulative total salary for employees.

sqlCopy codeSELECT employee_id, name, salary,
       SUM(salary) OVER (ORDER BY salary DESC) AS running_total
FROM employees;

Result:

employee_id	name	salary	running_total
101	John	80000	80000
102	Jane	75000	155000
104	Emily	75000	230000
103	Mark	65000	295000

8. AVG() with `OVER()`

Use Case: AVG() with OVER() is used to calculate a moving average over a set of data.

Example: Calculate the average salary for employees based on their position in the ranking.

sqlCopy codeSELECT employee_id, name, salary,
       AVG(salary) OVER (ORDER BY salary DESC) AS moving_avg
FROM employees;

Result:

employee_id	name	salary	moving_avg
101	John	80000	80000
102	Jane	75000	77500
104	Emily	75000	76667
103	Mark	65000	73750

9. FIRST_VALUE()

Use Case: FIRST_VALUE() is used to find the first value in a window or partition, often used for highlighting top values in categories.

Example: Return the highest salary (since we’re ordering by salary in descending order) for each employee.

sqlCopy codeSELECT employee_id, name, salary,
       FIRST_VALUE(salary) OVER (ORDER BY salary DESC) AS highest_salary
FROM employees;

Result:

employee_id	name	salary	highest_salary
101	John	80000	80000
102	Jane	75000	80000
104	Emily	75000	80000
103	Mark	65000	80000

10. LAST_VALUE()

Use Case: LAST_VALUE() is used to find the last value in a window or partition, helpful when you need to see the end of a sequence.

Example: Get the lowest salary for each employee from the dataset ordered by salary.

sqlCopy codeSELECT employee_id, name, salary,
       LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS lowest_salary
FROM employees;

Result:

employee_id	name	salary	lowest_salary
101	John	80000	65000
102	Jane	75000	65000
104	Emily	75000	65000
103	Mark	65000	65000

How to learn SQL for Data Analysis – SQL Roadmap

Window Functions vs. Aggregate Functions

Aggregate functions like SUM(), AVG(), COUNT(), and MAX() return a single value for each group of data, whereas analytical (or window) functions return a value for each row while keeping the group context.

Aggregate Functions	Window/Analytical Functions
Summarizes data and returns one row per group.	Keeps all rows while adding calculations.
`GROUP BY` is required.	`OVER()` clause is used.
Cannot access individual rows after aggregation.	Rows remain available for additional calculations.

Best Practices for Using Analytical Functions

Use PARTITION BY Wisely: When using analytical functions, the PARTITION BY clause allows you to divide the result set into partitions and apply the function to each partition separately. Avoid partitioning on too many columns to prevent unnecessary overhead.
Optimize with Indexes: Analytical functions can be resource-intensive. To improve performance, ensure that your table has appropriate indexes on the columns used in the ORDER BY clause.
Combine with CTEs (Common Table Expressions): For more complex analyses, combining analytical functions with CTEs can make your queries more readable and efficient.

Conclusion

Analytical functions in SQL are an essential tool for anyone involved in data analysis. You can unlock new insights from your data and streamline complex queries.

If you will master these analytical functions your data journey will be a lot easy and a lot smoother.