Wednesday, December 12, 2012

Alternatives to handling null values in aggregate functions or How to avoid the "Warning: Null value is eliminated by an aggregate or other SET operation."

Most SQL developers have written many queries that perform calculations using aggregate functions such as SUM() or COUNT().  Many have not thought deeply about the what is happening within that aggregate function, and lesser so deeply about the handling of NULL values.  While typically, the developer should not have to think deeply about such internals, there is a situation where it might rise to your attentions.   When you see the "Warning: Null value is eliminated by an aggregate or other SET operation".

In most querying activities, the "Warning: Null value is eliminated by an aggregate or other SET operation" wouldn't bubble up to your attention.  In SQL Server Management Studio, you might notice if you click over to the 'Messages' tab:
This appears on an otherwise working query. 

There is a situation where this demands developer attention, when it is encountered in a SQL Server Integration Services package.  A Microsoft Connect-reported issue describes the scenario where "
An SSIS package which runs fine but sporadically fails with "Warning: Null value is eliminated by an aggregate or other SET operation" (Bug 483175). 

This can occur when the query for a Source Adapter is written in such a way that it aggregates potentially Null values, either directly or through dependent objects.

Microsoft has weighed in with "The real problem is why the underlying OLEDB provider returns a failure code then. Probably the warning is related but not necessary. The specific warning can happen to be there."  Agree or disagree with the resolution to that, the reality is something about the query in the Source Adapter that will have to change.   The following is a very simple example of the scenario, and two alternatives to change the query such that OLEDB warning will no longer occur.

The requirement from the script was a result set that will count the not null values.

The first step in the script is to setup a very simple table variable.  Without specifying anything for the column beyond the name and data type, it will allow Null values.  The script then inserts two legitimate values, one Null value, then one zero.  This will be sufficient to demonstrate that different logical results are gathered by counting rows, by counting a column, and by summing.

DECLARE @tst TABLE
(
VALUE int
)

INSERT INTO @tst (Value) VALUES (1)
INSERT INTO @tst (Value) VALUES (7)
INSERT INTO @tst (Value) VALUES (NULL)
INSERT INTO @tst (Value) VALUES (0)

SELECT * FROM @tst

The first test is the query most developers would naturally right.  You might have even built this through the Query Designer:

SELECT COUNT(*) AS CountOfRows, COUNT(value) AS CountOfValues, SUM(value) AS SumOfValues FROM @tst


Results meet requirements, the 'CountOfValues' clearly returns the '3' legitimate values.  The 'CountOfRows' and 'SumOfValues' are true for what they are, but neither can be used to gain any insight into the number of non-null values.  Back on the 'Messages' tab, though, the Warning about null values appears - this is the type of query that will cause an SSIS Source Adapter to raise an exception.

What's happening here?   Basically, this query is relying on the database management systems handling of null values in a certain way, instead of having the query itself specify how nulls should be handled.  The ANSI-SQL 92 standard specifies that warnings be raised in the case of aggregate functions that rely on the elimination of null values before completing. (Gotcha! SQL Aggregate Functions and NULL has deep discussion on this topic)  The implication is that a given database management system might handle null values differently, or the handling could be set through configuration.  There are schools-of-thought that the result of the aggregate function would be a Null value since at least one value is a null value.  Perhaps from a purist point-of-view, but not particularly useful for business reporting or many common SQL querying examples.

So how to avoid this error?

Alternative 1:
Simply add the statement 'SET ANSI_WARNINGS OFF' before your script.  As you might expect, this suppresses any of the ANSI-specified warnings such as the handling of null values in aggregate functions.   In the following sample code, the ANSI_WARNINGS is turned off, the same query is ran.   No warning appears on the Messages screen.  In this sample ANSI_WARNINGS is turned back so the rest of the script can be demonstrated.  But it is not strictly required to set ANSI_WARNINGS back on.   Setting it to Off will only last through that query session.

--Alternative 1: turn warnings off, rerun
SET ANSI_WARNINGS OFF
SELECT COUNT(*) AS CountOfRows, COUNT(value) AS CountOfValues, SUM(value) AS SumOfValues FROM @tst

--turn warnings back on, for this session
SET ANSI_WARNINGS ON


Disadvantages I have found so far.  From a testability view, this suppression of warning for the remainder of the query and session could suppress other legitimate warnings.  The only practical consequence I have seen is that if the query is designed with a graphical editor such as the Query Editor, the SET ANSI_WARNINGS statement will not be included in the edited query, and could be overwritten inadvertently in a copy-paste operation.

Alternative 2:
Rewrite the query to make explicit the handling of Null values.  In the rewritten version, the query won't rely on the Database Management Systems' handling of Nulls or the ANSI standard, the handling will be explicitly stated.   Basically, the Nullable column is converted to a nonnullable boolean column based on whether the original value was populated or not.  The new nonnullable boolean column (the bit data type in SQL Server) can be summed to meet the original requirement of a count of non-null values.   In the sample script, this is all done in a single statement.  If the query was the definition of an object such as a view, you would expect the column to have a not null constraint.

--Alternative 2: rewrite to provide a logical handling of the NULL value
SELECT COUNT(*) AS CountOfRows, SUM(CASE WHEN Value IS NOT NULL THEN 1 ELSE 0 END) AS CountOfValues, SUM(COALESCE(value,0)) AS SumOfValues FROM @tst



Disadvantages seem to be adjusting to the idea of using the SUM() aggregate function to produce what one might naturally write using the COUNT() aggregate function.  It is supported, but less readable in a graphically query designer.

Conclusion:
In most cases, writing a query "Warning: Null value is eliminated by an aggregate or other SET operation" should barely register with you.  At best, it might prompt you to verify the null-ability of the columns which you are querying matches your expectation.  However, if the query is to be used in the Source Adapter of a SQL Server Integration Services package, action must be taken. 

If it is a quick work-around that is needed, and there is a strong protection against the inadvertent modification of a query, you might consider adding that statement 'SET ANSI_WARNINGS OFF' before the beginning of the query. 

If a more resilient situation is called for, for example if the query is to be accessed from multiple callers in the form of a database view, you may consider doing the work of rewriting the query to explicitly handle null values.

Entire Script:
DECLARE @tst TABLE
(
VALUE int
)

INSERT INTO @tst (Value) VALUES (1)
INSERT INTO @tst (Value) VALUES (7)
INSERT INTO @tst (Value) VALUES (NULL)
INSERT INTO @tst (Value) VALUES (0)

SELECT * FROM @tst

--works but triggers the 'Warning: Null value is eliminated by an aggregate or other SET operation' warning
--SELECT COUNT(*) AS CountOfRows, COUNT(value) AS CountOfValues, SUM(value) AS SumOfValues FROM @tst

--Alternative 1: turn warnings off, rerun
SET ANSI_WARNINGS OFF
SELECT COUNT(*) AS CountOfRows, COUNT(value) AS CountOfValues, SUM(value) AS SumOfValues FROM @tst

--turn warnings back on, for this session
SET ANSI_WARNINGS ON

--Alternative 2: rewrite to provide a logical handling of the NULL value
SELECT COUNT(*) AS CountOfRows, SUM(CASE WHEN Value IS NOT NULL THEN 1 ELSE 0 END) AS CountOfValues, SUM(COALESCE(value,0)) AS SumOfValues FROM @tst



No comments: