Monday, December 06, 2010

Some useful SQL Server links (for me)

As a fairly well experienced database developer who's new to MS SQL Server, here are some links that have been useful to me so far: Introduction to Dynamic SQL Part 1 Introduction to Dynamic SQL Part 2 - here sp_executesql is what I was really looking for, as it lets you execute dynamic SQL with bound parameters

Transact-SQL Data types reference (MS)

Translating from Oracle to SQL Server
SQL Server Stored Procedure Basics

Friday, November 05, 2010

The GO command in SQL Server is funny

Well I'm doing serious work in MS SQL Server for the first time, and I'm learning by doing, understanding its quirks as I go. The GO command, which seems analogous to / in Oracle's SQL*Plus, can be confusing!

Friday, October 15, 2010

A classic database-related cartoon

Although, I was a little disappointed with the comment about sanitizing database inputs. You don't even have to sanitize them if you use bind variables. You don't need to run the input through some function that escapes quotes and such - just bind the input correctly and DONE!

OK maybe that doesn't hold true for ALL databases (I don't know one way or the other). Definitely true with Oracle though.

Remember kids (Oracle, redo logging)...

Borrowed from AskTom (and slightly cleaned up):

Table Mode Insert Mode ArchiveLog Mode  Result
---------- ----------- ---------------  ------
LOGGING    APPEND      ARCHIVELOG       redo generated
LOGGING    no append   ARCHIVELOG       redo generated
NOLOGGING  no append   ARCHIVELOG       redo generated
LOGGING    no append   NOARCHIVELOG     redo generated
NOLOGGING  no append   NOARCHIVELOG     redo generated

Saturday, March 06, 2010

Partitioned Aggregates ...or... I love "Distinct On"

Let's begin by creating a table dealing with loan data:

postgres=# create table loan (id serial primary key, type varchar(1), issue_date date, issue_amount numeric);
NOTICE: CREATE TABLE will create implicit sequence "loan_id_seq" for serial column ""
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "loan_pkey" for table "loan"

Create some test data in an appropriately-named file, testdata:


Load it up...

postgres=# \copy loan (type, issue_date, issue_amount) from 'testdata' delimiter '|'

postgres=# select * from loan;

id | type | issue_date | issue_amount
1 | A | 2009-04-05 | 10000
2 | A | 2010-01-01 | 50000
3 | A | 2007-06-05 | 40000
4 | B | 1985-08-15 | 100000
5 | B | 1990-04-30 | 10000
6 | C | 2010-03-04 | 5555
7 | D | 2001-11-22 | 512591
8 | D | 2008-05-17 | 250259
9 | D | 2010-02-06 | 22222
10 | D | 2010-02-04 | 99999
(10 rows)

Now, here's a fairly common query I have to run. I'd like to get the most recent loan per type. That is, I want to tell postgres: For each type value (A,B,C,D), get me the row with the greatest issue_date.

How to do this? We obviously need to use max(issue_date) somewhere. The rest is a little bit more esoteric.

We're going to solve this problem using the unique-to-postgres DISTINCT ON feature. DISTINCT ON (col) gives you one row for each value of col. For example:

postgres=# select distinct on (type) * from loan;

id | type | issue_date | issue_amount
1 | A | 2009-04-05 | 10000
4 | B | 1985-08-15 | 100000
6 | C | 2010-03-04 | 5555
7 | D | 2001-11-22 | 512591
(4 rows)

Ta-da! But wait, these aren't the most recent rows per type. An immediately obvious problem is that for type A, the row with id=2 should be returned. Postgres thought: For each type, return the first row I see. How do we get it to return the most recent only?

The answer lies in the good ol' ORDER BY clause. If we sort appropriately, the first row Postgres sees of each type will be the most recent:

postgres=# select distinct on (type) * from loan order by type, issue_date desc;
id | type | issue_date | issue_amount
2 | A | 2010-01-01 | 50000
5 | B | 1990-04-30 | 10000
6 | C | 2010-03-04 | 5555
9 | D | 2010-02-06 | 22222
(4 rows)

Now I can really say Ta-da... This was something that was not very intuitive to me, but has proved very useful to know.

How would I solve this problem using Oracle? I would probably use a so-called Analytic query. Like most things Oracle, this feature is powerful but complex. It's more applicable to a wider set of problems, but for this specific problem it seems a little bit complicated.

Unfortunately I don't have an Oracle install handy at the moment, but it should be something like this:

select *
from (select id, type, issue_date, issue_amount,
row_number() over (partition by type order by issue_date desc) rn
from loan)
where rn = 1;

What we're doing above is telling Oracle to make a row number for each row in the table, partitioned by type. Conceptually it's doing something like this:

1 |A |...|1
2 |A |...|2
3 |A |...|3
4 |B |...|1
5 |B |...|2
6 |C |...|1
7 |D |...|1
8 |D |...|2
9 |D |...|3
10|D |...|4

So above we see a row number assigned to each row, where the row number is reset to 1 for each type.

There is more to it. In the PARTITION BY clause we also have "order by issue_date desc" - this means for each type, row_number 1 is assigned to the row with the greatest issue_date. If we treat this whole query as a subquery, we can simply select * from it where row_number=1. The resultset should be the same as with Postgresql's DISTINCT ON:

 id | type | issue_date | issue_amount
  2 | A    | 2010-01-01 |        50000
  5 | B    | 1990-04-30 |        10000
  6 | C    | 2010-03-04 |         5555
  9 | D    | 2010-02-06 |        22222
(4 rows)

Wednesday, February 03, 2010

DOH - conversions again

I was just running a series of long-running statements...An insert statement bombed because I had hard-coded a null value to insert into a newly added column. (As our data providers feed us the data later, it will be populated.) Use of a subquery seemed to confuse Postgres, breaking the statement. A simple example can illustrate:

core=# create table delete_me (a numeric);

/* This works */
core=# insert into delete_me select null;

/* This does NOT work */
core=# insert into delete_me select a from (select null as a) sub;
ERROR: failed to find conversion function from unknown to numeric