Using advanced SQL features, you can even perform multidimensional data analysis as well, enabling you to answer complex analytical questions.
This can be best understood by an example on group operations, aggregating data over multiple rows stored in several tables. Suppose you collect information about sales, storing the data in related tables within a single database schema. At the end of each quarter, you need to know the summarized sales figures for every region, for each month within that quarter.
To start with, let's create a new database schema, and the tables within it that are needed for this example. You can perform all these tasks from SQL*Plus.
CREATE USER usr IDENTIFIED BY usr;
GRANT connect, resource TO usr;
CONN usr/usr
CREATE TABLE salespersons (
emp_id VARCHAR2(10) PRIMARY KEY,
emp_name VARCHAR2(40)
)
CREATE TABLE regions (
reg_id VARCHAR2(2) PRIMARY KEY,
reg_name VARCHAR2(20)
)
CREATE TABLE orders(
ordno NUMBER PRIMARY KEY,
empid VARCHAR2(10) REFERENCES salespersons(emp_id),
regid VARCHAR2(2) REFERENCES regions(reg_id),
orddate DATE,
total NUMBER(10,2)
)
Next, you have to populate these tables with some data, so that you can perform queries against them. As the orders table refers to the salespersons and regions tables, you first have to populate these two tables with data. You might use the following statements for this:
INSERT INTO salespersons VALUES ('Abuzar', 'Abuzar Bukhari');
INSERT INTO salespersons VALUES ('Usman', 'Usman Bukhari');
INSERT INTO regions VALUES ('PK', 'Pakistan');
INSERT INTO regions VALUES ('SA', 'Saudi Arabia');
Now that you have filled up the salespersons and regions tables, you can move on and populate the orders table:
INSERT INTO orders VALUES
(1001, 'Abuzar', 'PK', '10-JAN-2010', 1450.00);
INSERT INTO orders VALUES
(1002, 'Abuzar', 'PK', '15-JAN-2010', 2310.00);
INSERT INTO orders VALUES
(1003, 'Usman', 'SA', '20-JAN-2010', 1480.00);
INSERT INTO orders VALUES
(1004, 'Abuzar', 'PK', '19-FEB-2010', 3700.00);
INSERT INTO orders VALUES
(1005, 'Usman', 'SA', '24-FEB-2010', 1850.00);
INSERT INTO orders VALUES
(1006, 'Usman', 'SA', '04-MAR-2010', 1770.00);
INSERT INTO orders VALUES
(1007, 'Usman', 'SA', '05-MAR-2010', 1210.00);
INSERT INTO orders VALUES
(1008, 'Abuzar', 'PK', '05-MAR-2010', 10420.00);
Commit;
You now have all the parts required to proceed with the example, and can issue a query that summarizes sales figures for every region for each month. Here is what this query might look like:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY ROLLUP(EXTRACT(MONTH FROM (o.orddate)), r.reg_name);
In the above query, you summarize the data representing the sales figures, generating subtotals for every region for each month, and then totals for each month with the help of the GROUP BY ... ROLLUP clause. The query results should look like this:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan January 3760
Saudi Arabia January 1480
January 5240
Pakistan February 3700
Saudi Arabia February 1850
February 5550
Pakistan March 10420
Saudi Arabia March 2980
March 13400
24190
As you can see, the query output contains not only monthly sales per region and month totals for each month, but also the grand total, which is nothing but the sum of the month totals. You might exclude this grand total from the report if you are using a partial ROLLUP as shown next:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY EXTRACT(MONTH FROM (o.orddate)), ROLLUP(r.reg_name);
As the month field is excluded from the ROLLUP here, the month totals will not be summarized:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan January 3760
Saudi Arabia January 1480
January 5240
Pakistan February 3700
Saudi Arabia February 1850
February 5550
Pakistan March 10420
Saudi Arabia March 2980
March 13400
Changing the order in which the parameters of ROLLUP appear will also affect the report. In our example, if you move the r. reg_name parameter to the first position, the report will generate totals for each region rather than for each month. Here is the query:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY ROLLUP(r.reg_name, EXTRACT(MONTH FROM (o.orddate)));
Here is its output:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan January 3760
Pakistan February 3700
Pakistan March 10420
Pakistan 17880
Saudi Arabia January 1480
Saudi Arabia February 1850
Saudi Arabia March 2980
Saudi Arabia 6310
24190
As you can see, in the above report there are no longer any month totals. Instead, it provides you with the totals for each region.
Cubing
So by now you are probably wondering if it's possible to have the totals for both each month and each region, thus having all possible combinations of the grouping columns within a single report. While there is more than one way to achieve this, using the CUBE extension of the GROUP BY clause is the most elegant and effective one.
The following figure gives a graphical depiction of how CUBE differs from ROLLUP when it comes to generating summary information:
Switching to SQL, you might write the following query using the CUBE function in the GROUP BY clause:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY CUBE(EXTRACT(MONTH FROM (o.orddate)), r.reg_name);
The previous query should generate the following report:
REGION MONTH SALES
-------------------- --------- ----------
24190
Pakistan 17880
Saudi Arabia 6310
January 5240
Pakistan January 3760
Saudi Arabia January 1480
February 5550
Pakistan February 3700
Saudi Arabia February 1850
March 13400
Pakistan March 10420
Saudi Arabia March 2980
As you can see, you now have all possible subtotals and totals combinations presented in the report. This is why changing the order of columns in CUBE won't affect the report, unlike with ROLLUP. SO, the following query should actually give you the same results as above:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY CUBE(r.reg_name, EXTRACT(MONTH FROM (o.orddate)));
However, the order of rows in the report will be a little different:
REGION MONTH SALES
-------------------- --------- ----------
24190
January 5240
February 5550
March 13400
Pakistan 17880
Pakistan January 3760
Pakistan February 3700
Pakistan March 10420
Saudi Arabia 6310
Saudi Arabia January 1480
Saudi Arabia February 1850
Saudi Arabia March 2980
Just as there is a partial ROLLUP, there is also a partial CUBE. For example, you might issue the following query:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY r.reg_name, CUBE(EXTRACT(MONTH FROM (o.orddate)));
This should generate the following output:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan 17880
Pakistan January 3760
Pakistan February 3700
Pakistan March 10420
Saudi Arabia 6310
Saudi Arabia January 1480
Saudi Arabia February 1850
Saudi Arabia March 2980
As you can see, it only includes the totals for each region, and not for each month.Interestingly, you would get the same results if you replaced CUBE with ROLLUP, issuing the following query:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY r.reg_name, ROLLUP(EXTRACT(MONTH FROM (o.orddate)));
Thus, a one column CUBE operation produces the same results as the identical ROLLUP operation.
Generating reports with only summary rows
In some cases, you may not need to include the rows that represent the subtotals generated by GROUP BY, but include only the total rows. This is where the GROUPING SETS extension of the GROUP BY clause may come in handy:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name, EXTRACT(MONTH FROM (o.orddate)));
This query should produce the following report:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan 17880
Saudi Arabia 6310
January 5240
February 5550
March 13400
You receive an interesting result when performing a partial GROUPING SET operation like the one shown next:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY r.reg_name, GROUPING SETS(EXTRACT(MONTH FROM (o.orddate)));
This should give you the following results:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan March 10420
Pakistan January 3760
Saudi Arabia March 2980
Saudi Arabia January 1480
Saudi Arabia February 1850
Pakistan February 3700
It's interesting to note that the above report doesn't include summary rows—neither for each month nor for each region. Instead, it shows you all those subtotal rows generated by GROUP BY and excluded from the full GROUPING SET operation discussed earlier in this section. In this sense, a partial GROUPING SET operation is a reverse operation to the corresponding full one.
Ranking
You might, for example, compose a query that will compute a rank for each region, based on the sales figures. To better understand how ranking works, though, let's first look at the query that simply summarizes sales figures per region:
SELECT r.reg_name region, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name);
Here is the output you should see:
REGION SALES
-------------------- ----------
Pakistan 17880
Saudi Arabia 6310
Now let's look at the query that will compute a sales rank for each region in ascending order:
SELECT r.reg_name region, SUM(o.total) sales, RANK() OVER (ORDER BY SUM(o.total) ASC) rank
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name);
This time, the query results should look like this:
REGION SALES RANK
-------------------- ---------- ----------
Saudi Arabia 6310 1
Pakistan 17880 2
As you can see in the above rows, not only are region sales arranged in ascending order, but the rank column has also appeared. If you don't want to see the rank column, you might remove it from the select list and add the ORDER BY clause with the rank function, rewriting the query as follows:
SELECT r.reg_name region, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name)
ORDER BY RANK() OVER (ORDER BY SUM(o.total) ASC);
This should produce the following output:
REGION SALES
-------------------- ----------
Saudi Arabia 6310
Pakistan 17880
As you can see, a little trick of moving the rank function from the select list to the ORDER BY clause lets you exclude the ranking column from the output, keeping sales figures in the sales column arranged in ascending order.
Windowing
Windowing functions make up another important group of analytic SQL functions. The idea behind windowing functions is that they enable aggregate calculations to be made within a "sliding window" that may float down as you proceed through the result set. For example, you might want to include a sliding average column in the report containing total monthly sales. This column would contain the average sales for each row, based on the sales figures in the preceding, current, and following rows. Thus, the average for the January row will be calculated based on the value of the sales field in the January row, as well as the value of the sales field in the February row. In turn, the average for the February row will be determined using the sales field's values throughout all three rows. Finally, the average for the March row will be computed based on the sales for February and March.
The following diagram gives a graphical illustration of this example:
Let's first look at the query that simply calculates total monthly sales, which should look like this:
Here is the output:
Now let's modify the query and define a "window" that will float down the above rows, calculating a value for the sliding_avg field. In the following query, note the use of the ROWS BETWEEN clause, and the keywords 1 PRECEDING and 1 FOLLOWING defining the parameters of the sliding window:
The output should look like the following:
Aside from the AVG function, you can apply the windowing technique using the other aggregate functions, such as SUM, MIN, and MAX.
This can be best understood by an example on group operations, aggregating data over multiple rows stored in several tables. Suppose you collect information about sales, storing the data in related tables within a single database schema. At the end of each quarter, you need to know the summarized sales figures for every region, for each month within that quarter.
To start with, let's create a new database schema, and the tables within it that are needed for this example. You can perform all these tasks from SQL*Plus.
CREATE USER usr IDENTIFIED BY usr;
GRANT connect, resource TO usr;
CONN usr/usr
CREATE TABLE salespersons (
emp_id VARCHAR2(10) PRIMARY KEY,
emp_name VARCHAR2(40)
)
CREATE TABLE regions (
reg_id VARCHAR2(2) PRIMARY KEY,
reg_name VARCHAR2(20)
)
CREATE TABLE orders(
ordno NUMBER PRIMARY KEY,
empid VARCHAR2(10) REFERENCES salespersons(emp_id),
regid VARCHAR2(2) REFERENCES regions(reg_id),
orddate DATE,
total NUMBER(10,2)
)
Next, you have to populate these tables with some data, so that you can perform queries against them. As the orders table refers to the salespersons and regions tables, you first have to populate these two tables with data. You might use the following statements for this:
INSERT INTO salespersons VALUES ('Abuzar', 'Abuzar Bukhari');
INSERT INTO salespersons VALUES ('Usman', 'Usman Bukhari');
INSERT INTO regions VALUES ('PK', 'Pakistan');
INSERT INTO regions VALUES ('SA', 'Saudi Arabia');
Now that you have filled up the salespersons and regions tables, you can move on and populate the orders table:
INSERT INTO orders VALUES
(1001, 'Abuzar', 'PK', '10-JAN-2010', 1450.00);
INSERT INTO orders VALUES
(1002, 'Abuzar', 'PK', '15-JAN-2010', 2310.00);
INSERT INTO orders VALUES
(1003, 'Usman', 'SA', '20-JAN-2010', 1480.00);
INSERT INTO orders VALUES
(1004, 'Abuzar', 'PK', '19-FEB-2010', 3700.00);
INSERT INTO orders VALUES
(1005, 'Usman', 'SA', '24-FEB-2010', 1850.00);
INSERT INTO orders VALUES
(1006, 'Usman', 'SA', '04-MAR-2010', 1770.00);
INSERT INTO orders VALUES
(1007, 'Usman', 'SA', '05-MAR-2010', 1210.00);
INSERT INTO orders VALUES
(1008, 'Abuzar', 'PK', '05-MAR-2010', 10420.00);
Commit;
You now have all the parts required to proceed with the example, and can issue a query that summarizes sales figures for every region for each month. Here is what this query might look like:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY ROLLUP(EXTRACT(MONTH FROM (o.orddate)), r.reg_name);
In the above query, you summarize the data representing the sales figures, generating subtotals for every region for each month, and then totals for each month with the help of the GROUP BY ... ROLLUP clause. The query results should look like this:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan January 3760
Saudi Arabia January 1480
January 5240
Pakistan February 3700
Saudi Arabia February 1850
February 5550
Pakistan March 10420
Saudi Arabia March 2980
March 13400
24190
As you can see, the query output contains not only monthly sales per region and month totals for each month, but also the grand total, which is nothing but the sum of the month totals. You might exclude this grand total from the report if you are using a partial ROLLUP as shown next:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY EXTRACT(MONTH FROM (o.orddate)), ROLLUP(r.reg_name);
As the month field is excluded from the ROLLUP here, the month totals will not be summarized:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan January 3760
Saudi Arabia January 1480
January 5240
Pakistan February 3700
Saudi Arabia February 1850
February 5550
Pakistan March 10420
Saudi Arabia March 2980
March 13400
Changing the order in which the parameters of ROLLUP appear will also affect the report. In our example, if you move the r. reg_name parameter to the first position, the report will generate totals for each region rather than for each month. Here is the query:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY ROLLUP(r.reg_name, EXTRACT(MONTH FROM (o.orddate)));
Here is its output:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan January 3760
Pakistan February 3700
Pakistan March 10420
Pakistan 17880
Saudi Arabia January 1480
Saudi Arabia February 1850
Saudi Arabia March 2980
Saudi Arabia 6310
24190
As you can see, in the above report there are no longer any month totals. Instead, it provides you with the totals for each region.
Cubing
So by now you are probably wondering if it's possible to have the totals for both each month and each region, thus having all possible combinations of the grouping columns within a single report. While there is more than one way to achieve this, using the CUBE extension of the GROUP BY clause is the most elegant and effective one.
The following figure gives a graphical depiction of how CUBE differs from ROLLUP when it comes to generating summary information:
Switching to SQL, you might write the following query using the CUBE function in the GROUP BY clause:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY CUBE(EXTRACT(MONTH FROM (o.orddate)), r.reg_name);
The previous query should generate the following report:
REGION MONTH SALES
-------------------- --------- ----------
24190
Pakistan 17880
Saudi Arabia 6310
January 5240
Pakistan January 3760
Saudi Arabia January 1480
February 5550
Pakistan February 3700
Saudi Arabia February 1850
March 13400
Pakistan March 10420
Saudi Arabia March 2980
As you can see, you now have all possible subtotals and totals combinations presented in the report. This is why changing the order of columns in CUBE won't affect the report, unlike with ROLLUP. SO, the following query should actually give you the same results as above:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY CUBE(r.reg_name, EXTRACT(MONTH FROM (o.orddate)));
However, the order of rows in the report will be a little different:
REGION MONTH SALES
-------------------- --------- ----------
24190
January 5240
February 5550
March 13400
Pakistan 17880
Pakistan January 3760
Pakistan February 3700
Pakistan March 10420
Saudi Arabia 6310
Saudi Arabia January 1480
Saudi Arabia February 1850
Saudi Arabia March 2980
Just as there is a partial ROLLUP, there is also a partial CUBE. For example, you might issue the following query:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY r.reg_name, CUBE(EXTRACT(MONTH FROM (o.orddate)));
This should generate the following output:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan 17880
Pakistan January 3760
Pakistan February 3700
Pakistan March 10420
Saudi Arabia 6310
Saudi Arabia January 1480
Saudi Arabia February 1850
Saudi Arabia March 2980
As you can see, it only includes the totals for each region, and not for each month.Interestingly, you would get the same results if you replaced CUBE with ROLLUP, issuing the following query:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY r.reg_name, ROLLUP(EXTRACT(MONTH FROM (o.orddate)));
Thus, a one column CUBE operation produces the same results as the identical ROLLUP operation.
Generating reports with only summary rows
In some cases, you may not need to include the rows that represent the subtotals generated by GROUP BY, but include only the total rows. This is where the GROUPING SETS extension of the GROUP BY clause may come in handy:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name, EXTRACT(MONTH FROM (o.orddate)));
This query should produce the following report:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan 17880
Saudi Arabia 6310
January 5240
February 5550
March 13400
You receive an interesting result when performing a partial GROUPING SET operation like the one shown next:
SELECT r.reg_name region, TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month') month, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY r.reg_name, GROUPING SETS(EXTRACT(MONTH FROM (o.orddate)));
This should give you the following results:
REGION MONTH SALES
-------------------- --------- ----------
Pakistan March 10420
Pakistan January 3760
Saudi Arabia March 2980
Saudi Arabia January 1480
Saudi Arabia February 1850
Pakistan February 3700
It's interesting to note that the above report doesn't include summary rows—neither for each month nor for each region. Instead, it shows you all those subtotal rows generated by GROUP BY and excluded from the full GROUPING SET operation discussed earlier in this section. In this sense, a partial GROUPING SET operation is a reverse operation to the corresponding full one.
Ranking
You might, for example, compose a query that will compute a rank for each region, based on the sales figures. To better understand how ranking works, though, let's first look at the query that simply summarizes sales figures per region:
SELECT r.reg_name region, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name);
Here is the output you should see:
REGION SALES
-------------------- ----------
Pakistan 17880
Saudi Arabia 6310
Now let's look at the query that will compute a sales rank for each region in ascending order:
SELECT r.reg_name region, SUM(o.total) sales, RANK() OVER (ORDER BY SUM(o.total) ASC) rank
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name);
This time, the query results should look like this:
REGION SALES RANK
-------------------- ---------- ----------
Saudi Arabia 6310 1
Pakistan 17880 2
As you can see in the above rows, not only are region sales arranged in ascending order, but the rank column has also appeared. If you don't want to see the rank column, you might remove it from the select list and add the ORDER BY clause with the rank function, rewriting the query as follows:
SELECT r.reg_name region, SUM(o.total) sales
FROM regions r, orders o
WHERE r.reg_id = o.regid
GROUP BY GROUPING SETS(r.reg_name)
ORDER BY RANK() OVER (ORDER BY SUM(o.total) ASC);
This should produce the following output:
REGION SALES
-------------------- ----------
Saudi Arabia 6310
Pakistan 17880
As you can see, a little trick of moving the rank function from the select list to the ORDER BY clause lets you exclude the ranking column from the output, keeping sales figures in the sales column arranged in ascending order.
Windowing
Windowing functions make up another important group of analytic SQL functions. The idea behind windowing functions is that they enable aggregate calculations to be made within a "sliding window" that may float down as you proceed through the result set. For example, you might want to include a sliding average column in the report containing total monthly sales. This column would contain the average sales for each row, based on the sales figures in the preceding, current, and following rows. Thus, the average for the January row will be calculated based on the value of the sales field in the January row, as well as the value of the sales field in the February row. In turn, the average for the February row will be determined using the sales field's values throughout all three rows. Finally, the average for the March row will be computed based on the sales for February and March.
The following diagram gives a graphical illustration of this example:
Let's first look at the query that simply calculates total monthly sales, which should look like this:
SELECT TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month')
month, SUM(o.total) sales
FROM orders o
GROUP BY TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month');
Here is the output:
MONTH SALES
--------- ----------
January 5240
February 5550
March 13400
Now let's modify the query and define a "window" that will float down the above rows, calculating a value for the sliding_avg field. In the following query, note the use of the ROWS BETWEEN clause, and the keywords 1 PRECEDING and 1 FOLLOWING defining the parameters of the sliding window:
SELECT TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month')
month, SUM(o.total) sales,
AVG(SUM(o.total)) OVER (ORDER BY TO_CHAR(TO_DATE(EXTRACT(MONTH FROM
(o.orddate)),'MM'),'Month') ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
sliding_avg
FROM orders o
GROUP BY TO_CHAR(TO_DATE(EXTRACT(MONTH FROM (o.orddate)),'MM'),'Month' );
The output should look like the following:
MONTH SALES SLIDING_AVG
--------- ---------- -----------
February 5550 5395
January 5240 8063.33333
March 13400 9320
Aside from the AVG function, you can apply the windowing technique using the other aggregate functions, such as SUM, MIN, and MAX.
No comments:
Post a Comment