Skip to main content

Posts

Showing posts from July, 2009

More than one way to save blocks

In these days, disk capacity is often not a big issue today. If you have at least a decent load on the database, you will hit the IOPS limit much sooner than you run out of disk space.

Well, almost. First, you will still have a lot of inactive data that consumes the space but does not require any IOPS. And second, in some applications (like ETL in DWH) you are bound by throughput. Let me talk about this case.

Don't expect any exceptional thoughts, this post just inspired by a real production case and tries to pinpoint that there is always one more way to consider.

The optimal plan for many ETL queries is a lot of fullscans with hash joins. And often, you read one table more times, to join it in different ways. Such queries benefit if you make your tables smaller - you save on I/O.

(1) In ETL, your source tables are often imported from a different system, and you actually don't need all columns from the tables. So, first of all - don't load the data you don't need. However,…

A latteral view quirk

This quest started with the usual question: why is this query so slow? To put it in the picture, it was a query loading one DWH table by reading one source table from a legacy system (already loaded to Oracle, so no heterogenous services were involved at this step), joining it several times to several tables.
(It's the usual badly-designed legacy system: if flag1 is I, join table T1 by C1, if flag1 is N, join table T1 by C2... 20 times.)

If I simplify the query, we are talking about something like:
SELECT T1.m,
case
when T1.h = 'I' then T2_I.n
when T1.h = 'G' then T2_G.n
else null
end
FROM T1
LEFT OUTER JOIN T2 T2_I
ON (T1.h = 'I' and T1.y = T2_I.c1)
LEFT OUTER JOIN T2 T2_G
ON (T1.h = 'G' and T1.z = T2_G.c2)

We even know, that the query always return number of rows identical to number of rows in T2. However, ommiting the T1.h = 'I'/'G' conditions in join clause would duplicate the rows, so the conditions are necessary there. Ofcourse i…

Gentle introduction to opimization

I was just asked to prepare a short, one-hour workshop/presentation about optimization on Oracle. As this topis is so huge, and everyone had already read something, I decided to concept this workshop as an overview of the concepts (starting with db design) and the tools available.
The .pdf version is thus a kind of checklist - have you read about all of these issues? Have you thought them out when designing your system?
I hope you will find at least one new thing there:-)
The PDF is available for download on my website download area.