The Winner Takes It All from A3Ba

This article is about sorting and min-max aggregations on character data. When sorting matters, you end your query with an order by. You might think that if you’re sorting on character data, the first value listed would equal the aggregated minimum, and the last value listed would equal the aggregated maximum.

Sorting on character data is different from sorting on numeric data, in that there are a lot less assumptions involved in the latter. Numeric sorting is based on simple mathematic conventions which are the same anywhere on the planet. Character sorting is someting else, which will be demonstrated below.


Maximum tree height manifestation in Redwood CA

Consider this select statement:

alter session set nls_language = 'AMERICAN';

prompt American
select char_value "Chr"
, min (char_value) over () "Min"
, max (char_value) over () "Max"
, first_value(char_value) over (order by char_value) "First"
, last_value(char_value) over (order by char_value 
                               rows between unbounded preceding
                               and unbounded following) "Last"
from   ( select cast(column_value as varchar2(5)) char_value 
         from table(dbmsoutput_linesarray('A','3','B','a'))
order by char_value

Chr   Min   Max   First Last 
----- ----- ----- ----- -----
3     3     a     3     a    
A     3     a     3     a    
B     3     a     3     a    
a     3     a     3     a    

select parameter, value
from v$nls_parameters
where parameter in ('NLS_SORT')
order by 1

---------- ------

Now, do note I started my session with setting my NLS_LANGUAGE to America, which happens to default my sorting method to ‘BINARY’. This might not be the case for other languages, like it won’t for Dutch. Dutch sorting means that a an uppercase “B” comes after lowercase “a”, and numbers follow alfabetic characters.

As a sidenote, I converted the character found in the array [A,3,B,a] to VARCHAR2(5) just for the convenience of listing the results in an equally “spaced” fashion. No magic intended there.

To make sure that “Last_value” considers all values found in the array, I added the “unbounded both ways” analytical window.

Now, if your development environment is set up to initialize your NLS settings in “DUTCH” and your production environment in (for example) “American”, you’re in for a surprise.

When setting the language to “DUTCH”, the script above returns:

Chr   Min   Max   First Last 
----- ----- ----- ----- -----
A     3     a     A     3    
a     3     a     A     3    
B     3     a     A     3    
3     3     a     A     3    

PARAMETER VALUE                                                           
--------- ------

What we see is that the order of the rows listed confirm that in “DUTCH” sorting, numbers follow alfabetic characters, and lowercase characters and uppercase characters appear ‘mixed up’. Consistent with that order are the values return for the first_value and last_value functions. However, the aggregated maximum and minumum remain the same as in the previous example! It means BINARY SORTING remains in effect with respect to aggregation, regardless of the language setting.

To conclude, this article has been set up to stress the importance of consistent language settings within your development infrastructure.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.