Skip to main content

More than one way to save blocks

In these days, disk capacity is often not a big issue today. If you have at least a decent load on the database, you will hit the IOPS limit much sooner than you run out of disk space.

Well, almost. First, you will still have a lot of inactive data that consumes the space but does not require any IOPS. And second, in some applications (like ETL in DWH) you are bound by throughput. Let me talk about this case.

Don't expect any exceptional thoughts, this post just inspired by a real production case and tries to pinpoint that there is always one more way to consider.

The optimal plan for many ETL queries is a lot of fullscans with hash joins. And often, you read one table more times, to join it in different ways. Such queries benefit if you make your tables smaller - you save on I/O.

(1) In ETL, your source tables are often imported from a different system, and you actually don't need all columns from the tables. So, first of all - don't load the data you don't need. However, usually you just can't drop the columns - this would change the data model, you would have to update it in the ETL tool, and you would have to do a lot of work when the list of used columns change.
How to tackle this? Use views, and select NULL for the columns you don't need. Use FGA (Fine-grained auditing) (at least on test) to verify you don't access any of those non-loaded columns. (Just beware, that things like dbms_stats access all columns.)
(Bonus: depending on the source system, transferring less data may take less time due to limits of the transfer channel - ODBC, network, etc.)

(2) As the source data is usually loaded only once and truncated before each load, PCTFREE should be 0, so no space is lost for allocations that will never come.

(3) Now, with the (1) implemented, the tables contain (a lot of) NULL columns. It's just one byte for each such column, but interestingly, it still makes a difference. Just recreate the tables while putting the NULL columns at the end. (No proper application depend on column order, right?). On a 1.2GB table, we got a 35% saving just by using (2) and (3) - it's really worth trying.

(4) And of course, use APPEND hint and use the 10g table direct-load data compression (COMPRESS in table definition). Another 50% for us...

Please note that the only problem you usually face here is to get a list of used columns - fortunately, most ETL tools (like ODI) can provide it, even it means accessing directly their repository (snp_txt_crossr in ODI). The rest is easy to automate.


Popular posts from this blog

ORA-27048: skgfifi: file header information is invalid

I was asked to analyze a situation, when an attempt to recover a 11g (standby) database resulted in bunch of "ORA-27048: skgfifi: file header information is invalid" errors.

I tried to reproduce the error on my test system, using different versions (EE, SE,,, but to no avail. Fortunately, I finally got to the failing system:

SQL> recover standby database;
ORA-00279: change 9614132 generated at 11/27/2009 17:59:06 needed for thread 1
ORA-00289: suggestion :
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-00280: change 9614132 for thread 1 is in sequence #208

Interestingly, nothing interesting is written to alert.log n…

Reading data from PGA and SGA

Overview For our investigation of execution plan as it is stored in memory, we need in the first place to be able to read the memory.

We have the options of
x$ksmmem, reading SGA using SQL. Personally I don't like it, it's cumbersome and SGA read: obviously reading SGA only; it's fast and easy to doread process memory: can read PGA, process stack - and since the processes do map the SGA, too, you can read it as well. Unfortunately ptrace sends signals to the processes and the process is paused when reading it, but so far all my reads were short and fast and the processes did not notice. Some OS configurations can prevent you from using ptrace (e.g. docker by default), google for CAP_SYS_PTRACE.gdb: using your favorite debugger, you can read memory as well. Useful when investigating. Direct SGA read I always considered direct SGA read of some dark magic, but the fundamentals are actually very easy. It still looks like sorcery when actually reading the Oracle in…

A few thoughts about OCM 12c upgrade

Yesterday I sat for the 12c OCM upgrade exam, which I mentioned in few blog posts before. The first step after checking your ID is of course signing the NDA, and thus you won't find much real information here.

This time I chose Utrecht as the place to take the exam. Not that I have any special preference, I took each of the exams in a different place so far. The only requirements were convenient time and location defined as 'somewhere in Europe'. But in the end, Utrecht turned out to be a good place. Oracle NL headquarters are easy accessible, it's a very new building, the lunch was good:-)
And the city is nice to see.

Regarding the exam, the usual important notes still hold true:

Arrive on time. It's a long day and you will have a lot of things to do.You will work hard the whole day. Get a good sleep before, be well rested.Review the exam topics well. Note that they may have change over time. There is for example an update as of January 1, 2016: Flex ASM was added…