Spellings Consulting

When you need IT done right.

MS SQL And Sub-Selects

Today I discovered what I would consider a very serious flaw in MS SQL. Unfortunately, this discovery caused quite a bit of damage to my data, and although I could/should have been more careful, I place most of the blame on poorly designed software. After some simple tests, it appears that MySQL does not suffer from the same issue.

Our exceptional DBA’s (and others) suggest that a good practice to prevent UPDATE errors is to execute a SELECT clause that uses the same WHERE clause before your run your UPDATE command. Had I done this, it would have prevented my mistake, which is why I say I should have been more careful.

Anyway, the bottom line is that it appears that MS SQL’s implementation of Sub-Selects in both SELECT and UPDATE commands do not function as expected. It turns out that if you use a completely invalid SELECT statement as a Sub-Select in an UPDATE or SELECT statement, MS SQL will return records when it should not, in some circumstances.

Here is the SQL statement that I ran that caused all of my grief:
UPDATE FoldersAccounts
SET iAccountID = 56789
WHERE iFoldersAccountsID in (
             SELECT iFoldersAccountsID
             FROM Folders
            WHERE iFolderID in
            (1159,1160,1161,1162,1193,1200,1197,1196,1195,1186,1191,1189))

Now, what is so outrageous is that the sub-select is completely invalid because there is no column named “iFoldersAccountsID” in the Folders table. If you try to run this statement by itself:
SELECT iFoldersAccountsID
FROM Folders
WHERE iFolderID in
(1159,1160,1161,1162,1193,1200,1197,1196,1195,1186,1191,1189))

The above query, when run by itself in MS SQL Server Mangagement Studio, returns the following error:
Msg 207, Level 16, State 1, Line 1
Invalid column name ‘iFoldersAccountsID’.

As I mentioned earlier, this is the proper response for this query because there is no iFoldersAccountsID column in the Folders table.

What is so crazy is that when you run the same SELECT statement as a sub-select, no error is returned an every single record in the FoldersAccounts table is returned. In this case, that’s 1204 records. That also means that when this same SELECT statement is used in the sub-select of an UPDATE command, every single record in the table is updated, which is obviously terrible.

Even more strange is that when you make the exact same Sub-select but swap out the table and use a table name in the FROM clause that has neither the column in the SELECT clause or the WHERE clause, then still no error is returned and a seemingly ambiguous number or records are returned. Thus, the following query returns 12 records, which appears to be one for each iFolderID:
SELECT iFoldersAccountsID
FROM PriceCodes
WHERE iFolderID in
(1159,1160,1161,1162,1193,1200,1197,1196,1195,1186,1191,1189))

Obviously, the PriceCodes table has neither an iFoldersAccountsID nor an iFolderID, yet the above query returns no errors.

Interestingly, if you use a nonexistant table name, that will throw an error. At least they got that right. :-)

Since I am clearly biased as to the “correctness” of this behavior, since it caused me so much trouble, I decided to see how MySQL handles the exact same scenario. So I created the exact same table structure in a MySQL database and tried the following query:
UPDATE FoldersAccounts
SET iAccountID = 56789
WHERE iFoldersAccountsID in (
             SELECT iFoldersAccountsID
             FROM Folders
            WHERE iFolderID in
            (1159,1160,1161,1162,1193,1200,1197,1196,1195,1186,1191,1189))

The above query in MySQL does not return an error, but it does return an “empty result” (selects zero records):

IMO, this is the better behavior, but ideally this should return an error.  This seems like something that should be a no-brainer, but maybe I am missing something.  If so, fill me in.

One thing that I did think about here is how this behavior in MSSQL could potentially be used in an SQL Injection attack to completely screw up an entire table.  This is a great reason to use Stored Procedures in web applications for any data access in web applications. 

Comments

  1. Rob
    May 15th, 2008 | 12:25 am

    I tried basically the same thing using mySQL 5.0.51a and it gave an error as it “should.”

    Can’t link an image in here, but click my name to see the screenshot.

  2. Rob
    May 15th, 2008 | 1:19 am

    And now click my name to see the corrected version (”publish_id” instead of “puplish_id”)

  3. May 15th, 2008 | 8:05 am

    I ran a simular test in SSMS using the Northwinds database Employee Table.

    I used the following query to verify syntax of the query:

    Update employees
    Set Extension = 2231
    Where employeeid in (
    Select Employeeid
    from employees
    where employeeid in (1,3,5))

    The query ran and updated the appropriate rows as expected.

    I then modified the query to the following:

    Update employees
    Set Extension = 2231
    Where employeeid in (
    Select Employeeid
    from employees
    where employeeidx in (1,3,5))

    The following message(s) were returned:

    Msg 207, Level 16, State 3, Line 1
    Invalid column name ‘employeeidx’.
    Msg 207, Level 16, State 3, Line 1
    Invalid column name ‘employeeidx’.
    Msg 207, Level 16, State 3, Line 1
    Invalid column name ‘employeeidx’.

    This is of course run against a MS SQL 2000 database from SSMS 2005.

    The same scenario run against the AdventureWorksT (MS SQL 2005) database yielded the same results:

    Update Person.Address
    Set PostalCode = 000000
    Where AddressID in (
    Select AddressID
    from Person.Address
    Where AddressIDx in (1, 3, 5))

    Msg 207, Level 16, State 1, Line 6
    Invalid column name ‘AddressIDx’.
    Msg 207, Level 16, State 1, Line 6
    Invalid column name ‘AddressIDx’.
    Msg 207, Level 16, State 1, Line 6
    Invalid column name ‘AddressIDx’.

  4. Alan
    May 15th, 2008 | 10:59 am

    It took me quite a while to get past using sub selects at all. But if you can do it all with simple joins it is easier to read and (evidently) safer. The query could be done this way:

    update fa
    set iAccountId = 56789
    from folderaccounts fa
    join folders f on ca.iFolderid = c.iFolderid
    where f.iFolderID in (1159,1160,1161,1162,1193,1200,1197,1196,1195,1186,1191,1189)

    The other benefit would be that you can see with this pretty clearly with this statement that the join to folders is not necessary because it already contains the iFolderId field.

    I do agree with you that the original query that caused your problems should return an error if the subquery is not valid. But the problem with sub queries is that they tend to be slower than direct joins (even if they do work correctly). This can become a problem if you are dealing with a big number of records returned from the subquery because the intermediate table is not indexed.

    Hope that helps.

  5. Frank L
    November 7th, 2008 | 9:05 am

    I’m an Oracle, user not MS SQL but for what I know MS SQL is very
    forgiving of statements, and I think it is treating your column
    as a variable “iFoldersAccountsID” same as if you were
    doing Select “Y” from dual; So it allows you to run your query,
    and returning true any time if finds records. Again, that is my
    guess, I know in Oracle you have to identify it as variable, but
    I have done queries in different Data bases where you don’t have
    to specify that it is a variable or a column.
    Cheers.

Leave a reply