Analytical Functions in SQL : A Complete Guide for Beginners and Experts

Analytical functions in SQL are very important because they are used to run multiple analytical queries.

These functions go beyond basic aggregate functions like SUM, AVG, or COUNT and analytical functions are different as compared to aggregate functions which we will discuss in this article.

What Are Analytical Functions in SQL?

Analytical Functions in SQL
Analytical Functions in SQL

Analytical functions perform calculations based on a group of rows and return a single result for each row within the group.

Unlike aggregate functions, which summarize data, analytical functions keep the row-level details while adding calculated values. They are used in complex data analysis scenarios, making SQL much more helpful manage data.

Important Analytical Functions In SQL

Now let us understand some of the most important analytical functions in SQL , these are the most important and most used analytical functions.

We will understand each analytical function with it’s use case and example as well.

1. ROW_NUMBER()

Use Case: ROW_NUMBER() used to assign unique row numbers to records for sorting results.

Example: Suppose you want to assign a unique number to each employee based on their salary.

sqlCopy codeSELECT employee_id, name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

Result:

employee_idnamesalaryrow_num
101John800001
102Jane750002
103Mark650003

2. RANK()

Use Case: RANK() also assigns a number to rows but if two rows have the same rank, the next number is skipped.

Example: Rank employees by salary, and if two employees have the same salary, assign them the same rank.

sqlCopy codeSELECT employee_id, name, salary,
       RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

Result:

employee_idnamesalaryrank
101John800001
102Jane750002
104Emily750002
103Mark650004

3. DENSE_RANK()

Use Case: DENSE_RANK() is similar to RANK(), but it does not leave gaps in the rank sequence for ties.

Example: Rank employees by salary without gaps in ranking numbers.

sqlCopy codeSELECT employee_id, name, salary,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

Result:

employee_idnamesalarydense_rank
101John800001
102Jane750002
104Emily750002
103Mark650003

4. NTILE()

Use Case: NTILE() is used to distribute rows into a specified number of groups, useful for dividing data into percentiles.

Example: Divide employees into 4 salary groups (quartiles).

sqlCopy codeSELECT employee_id, name, salary,
       NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

Result:

employee_idnamesalaryquartile
101John800001
102Jane750001
104Emily750002
103Mark650003

5. LAG()

Use Case: LAG() is used to access the value from a previous row, often used for comparing current and previous values.

Example: Compare each employee’s salary to the previous employee’s salary.

sqlCopy codeSELECT employee_id, name, salary,
       LAG(salary, 1) OVER (ORDER BY salary DESC) AS previous_salary
FROM employees;

Result:

employee_idnamesalaryprevious_salary
101John80000NULL
102Jane7500080000
104Emily7500075000
103Mark6500075000

6. LEAD()

Use Case: LEAD() is used to access the value from a subsequent row. It’s the opposite of LAG().

Example: Compare each employee’s salary to the next employee’s salary.

sqlCopy codeSELECT employee_id, name, salary,
       LEAD(salary, 1) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;

Result:

employee_idnamesalarynext_salary
101John8000075000
102Jane7500075000
104Emily7500065000
103Mark65000NULL

7. SUM() with OVER()

Use Case: SUM() with OVER() is sued to calculate a running total or cumulative sum for a set of data.

Example: Calculate the cumulative total salary for employees.

sqlCopy codeSELECT employee_id, name, salary,
       SUM(salary) OVER (ORDER BY salary DESC) AS running_total
FROM employees;

Result:

employee_idnamesalaryrunning_total
101John8000080000
102Jane75000155000
104Emily75000230000
103Mark65000295000

8. AVG() with OVER()

Use Case: AVG() with OVER() is used to calculate a moving average over a set of data.

Example: Calculate the average salary for employees based on their position in the ranking.

sqlCopy codeSELECT employee_id, name, salary,
       AVG(salary) OVER (ORDER BY salary DESC) AS moving_avg
FROM employees;

Result:

employee_idnamesalarymoving_avg
101John8000080000
102Jane7500077500
104Emily7500076667
103Mark6500073750

9. FIRST_VALUE()

Use Case: FIRST_VALUE() is used to find the first value in a window or partition, often used for highlighting top values in categories.

Example: Return the highest salary (since we’re ordering by salary in descending order) for each employee.

sqlCopy codeSELECT employee_id, name, salary,
       FIRST_VALUE(salary) OVER (ORDER BY salary DESC) AS highest_salary
FROM employees;

Result:

employee_idnamesalaryhighest_salary
101John8000080000
102Jane7500080000
104Emily7500080000
103Mark6500080000

10. LAST_VALUE()

Use Case: LAST_VALUE() is used to find the last value in a window or partition, helpful when you need to see the end of a sequence.

Example: Get the lowest salary for each employee from the dataset ordered by salary.

sqlCopy codeSELECT employee_id, name, salary,
       LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS lowest_salary
FROM employees;

Result:

employee_idnamesalarylowest_salary
101John8000065000
102Jane7500065000
104Emily7500065000
103Mark6500065000

Window Functions vs. Aggregate Functions

Aggregate functions like SUM(), AVG(), COUNT(), and MAX() return a single value for each group of data, whereas analytical (or window) functions return a value for each row while keeping the group context.

Aggregate FunctionsWindow/Analytical Functions
Summarizes data and returns one row per group.Keeps all rows while adding calculations.
GROUP BY is required.OVER() clause is used.
Cannot access individual rows after aggregation.Rows remain available for additional calculations.

Best Practices for Using Analytical Functions

  • Use PARTITION BY Wisely: When using analytical functions, the PARTITION BY clause allows you to divide the result set into partitions and apply the function to each partition separately. Avoid partitioning on too many columns to prevent unnecessary overhead.
  • Optimize with Indexes: Analytical functions can be resource-intensive. To improve performance, ensure that your table has appropriate indexes on the columns used in the ORDER BY clause.
  • Combine with CTEs (Common Table Expressions): For more complex analyses, combining analytical functions with CTEs can make your queries more readable and efficient.

Conclusion

Analytical functions in SQL are an essential tool for anyone involved in data analysis. You can unlock new insights from your data and streamline complex queries.

If you will master these analytical functions your data journey will be a lot easy and a lot smoother.

Related Posts

Leave a Reply

Your email address will not be published. Required fields are marked *

Table Of Content
Different types of SQL commands with examples 7 Reasons To try Power BI Dark Mode How To Learn SQL Fast 8 Ways To Improve DAX Query Performance in Power BI Business Analyst vs Data Analyst