Wednesday, December 2, 2009

ORA-27048: skgfifi: file header information is invalid

I was asked to analyze a situation, when an attempt to recover a 11g (standby) database resulted in bunch of "ORA-27048: skgfifi: file header information is invalid" errors.

I tried to reproduce the error on my test system, using different versions (EE, SE,,, but to no avail. Fortunately, I finally got to the failing system:

SQL> recover standby database;
ORA-00279: change 9614132 generated at 11/27/2009 17:59:06 needed for thread 1
ORA-00289: suggestion :
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-00280: change 9614132 for thread 1 is in sequence #208

Interestingly, nothing interesting is written to alert.log nor the trace file - just the ORA-27048, with no detail, so you cannot see which file is affected from there.

However, don't lose hope, there is always one thing to try: strace the process (fortunately this was a Linux box, not a Windows one).

sqlplus / as sysdba
SQL> !ps -ef
SQL> !strace -o ~/strace.trc -p <PID> &
SQL> recover standby database;
SQL> exit

That is, run you sqlplus, get the PID of your server process and strace it. (With the usual caveat that you are running the sqlplus locally, that you have a dedicated server process... well you have to, you're doing a recovery.)

And voilĂ , it's there! The server process does a lot of gettimeofday() calls (Oracle wait interface measuring time of code pieces, perhaps), opens datafiles, opens alert.log/xml and trace, writes to them... and traverses the FRA! (the listing below is just a very short sample)

access("/u01/flash_recovery_area/T1/archivelog/", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/", 0xbfe0c5c0, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 21
fstat64(21, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(21, F_SETFD, FD_CLOEXEC) = 0
getdents(21, /* 20 entries */, 4096) = 476
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_15", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/2007_08_15", 0xbfe0c100, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/2007_08_15", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 22
fstat64(22, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(22, F_SETFD, FD_CLOEXEC) = 0
getdents(22, /* 2 entries */, 4096) = 32
getdents(22, /* 0 entries */, 4096) = 0
close(22) = 0

This was the top-level directory and one of its children - an empty one (2 entries in directory, i.e. "." and "..").

When we get to a non-empty dir, the server process analyzes the actual files there:

access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", {st_mode=S_IFREG|0775, st_size=47616, ...}) = 0

Well, interesting, but nothing groundbreaking. However, for some of the files, the server process does something different:

stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {st_mode=S_IFREG|0775, st_size=24228, ...}) = 0
statfs("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", O_RDONLY|O_LARGEFILE) = 21
fstatfs(21, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
lseek(21, 0, SEEK_SET) = 0
read(21, "\37\213\10\0106\363\304F\0\3o1_mf_1_21_3d9wspto_.a"..., 512) = 512
close(21) = 0

Yes, it opens them and reads first 512 bytes of them - the file header! And if the file header is invalid (as is for the file above, as it is a gzipped file, so no longer an Oracle file), the skgfifi function signals an error (as can be confirmed by removing all .gz files and trying the recovery again).

Just a final note - the server process skipped some of the .gz files, read the rest of them - I could not find a simple rule how it decided.

Just a few conclusing note:

  • The reaction of Oracle Support for this was: Oracle does not recommend compressing files within FRA.
  • If you use "set logsource" in SQL*plus, the scanning does not happen, as there is simply just one directory to look into (set logsource will not work with FRA, due to the YYYY_MM_DD dirs in FRA).
  • For the background info (who came up with the problem, how do they solve it), visit Dbvisit blog - they stumbled upon it while supporting their "standby on Standard Edition" solution.

Monday, November 9, 2009

11g OCM update

Just a quick note: Oracle set the 11g OCM upgrade release date to November 30, 2009. However, schedule is still "comming soon"...

Friday, October 23, 2009

OCM 11g

After all, I learned one new thing I did not expect to find out at OOW: at the OCP Lounge (where, contrary to promises, no networking took place), I was told new info regarding the 11g OCM exam.
While from supposedly trustworthy source, don't take the following as 100%-sure; still, it's worth to know, that:
- the 11g OCM Upgrade exam is scheduled to start late November this year, and will be delivered on 11gR1.
- the full 2-day 11g OCM has no schedule yet, and will be delivered on 11gR2 from it's start.

The only trouble I see is the attendance - with 10g OCM, every local Oracle was cancelling it if only one person signed for it - two persons was minimum. With the upgrade, how many people will sign up?

I plan to take it quite soon after it's available, somewhere in Europe (UK has usually densest exam schedule, while countries like Czech Republic/Slovakia/Italy etc. have lower prices for the exam).
Drop me a line if you would be interested in taking the exam, so we can force some Oracle local office to actually DO the exam instead of cancelling it again.

OOW 2009 experience

As I was at the Open World last week, you would expect me to post a bunch of blog entries, right? Well, no... first of all, it was already covered by many others, closer to the real time of the events. You can read about the perhaps most interesting event I attended at Pythian OOW09 Diaries.

Still, let me empahsize one thing - as a first-timer to OOW I realized, that after all, the sessions held there are not so important, after all. Ok, select some of them, but reserve enought time for the Unconference, for the OTN Lounge, and to meet other fellows. You will catch up with the sessions using OOW On-demand (and as I remember, the PDFs are published later for general public) later - you will have to do it anyway, you can't attend everything you would like.

Thursday, July 16, 2009

More than one way to save blocks

In these days, disk capacity is often not a big issue today. If you have at least a decent load on the database, you will hit the IOPS limit much sooner than you run out of disk space.

Well, almost. First, you will still have a lot of inactive data that consumes the space but does not require any IOPS. And second, in some applications (like ETL in DWH) you are bound by throughput. Let me talk about this case.

Don't expect any exceptional thoughts, this post just inspired by a real production case and tries to pinpoint that there is always one more way to consider.

The optimal plan for many ETL queries is a lot of fullscans with hash joins. And often, you read one table more times, to join it in different ways. Such queries benefit if you make your tables smaller - you save on I/O.

(1) In ETL, your source tables are often imported from a different system, and you actually don't need all columns from the tables. So, first of all - don't load the data you don't need. However, usually you just can't drop the columns - this would change the data model, you would have to update it in the ETL tool, and you would have to do a lot of work when the list of used columns change.
How to tackle this? Use views, and select NULL for the columns you don't need. Use FGA (Fine-grained auditing) (at least on test) to verify you don't access any of those non-loaded columns. (Just beware, that things like dbms_stats access all columns.)
(Bonus: depending on the source system, transferring less data may take less time due to limits of the transfer channel - ODBC, network, etc.)

(2) As the source data is usually loaded only once and truncated before each load, PCTFREE should be 0, so no space is lost for allocations that will never come.

(3) Now, with the (1) implemented, the tables contain (a lot of) NULL columns. It's just one byte for each such column, but interestingly, it still makes a difference. Just recreate the tables while putting the NULL columns at the end. (No proper application depend on column order, right?). On a 1.2GB table, we got a 35% saving just by using (2) and (3) - it's really worth trying.

(4) And of course, use APPEND hint and use the 10g table direct-load data compression (COMPRESS in table definition). Another 50% for us...

Please note that the only problem you usually face here is to get a list of used columns - fortunately, most ETL tools (like ODI) can provide it, even it means accessing directly their repository (snp_txt_crossr in ODI). The rest is easy to automate.

Wednesday, July 8, 2009

A latteral view quirk

This quest started with the usual question: why is this query so slow? To put it in the picture, it was a query loading one DWH table by reading one source table from a legacy system (already loaded to Oracle, so no heterogenous services were involved at this step), joining it several times to several tables.
(It's the usual badly-designed legacy system: if flag1 is I, join table T1 by C1, if flag1 is N, join table T1 by C2... 20 times.)

If I simplify the query, we are talking about something like:
when T1.h = 'I' then T2_I.n
when T1.h = 'G' then T2_G.n
else null
ON (T1.h = 'I' and T1.y = T2_I.c1)
ON (T1.h = 'G' and T1.z = T2_G.c2)

We even know, that the query always return number of rows identical to number of rows in T2. However, ommiting the T1.h = 'I'/'G' conditions in join clause would duplicate the rows, so the conditions are necessary there. Ofcourse it's not possible to move the conditions to WHERE clause, as this would elimitate all rows from result query.

To make the test case query even shorter, we can use for the demonstration just:
SELECT count(*)
ON (T1.h = 'I' and T1.y = T2.c1)

(This query makes almost no business sense now, but the the lateral view issue I want to demonstrate is still there.)

The query plan looks like:

|Id|Operation |Name|Rows |Bytes| Cost | Time |
|0 |SELECT STATEMENT | | 1 | 43 | 1804M|999:59:59 |
|1 | SORT AGGREGATE | | 1 | 43 | | |
|2 | NESTED LOOPS OUTER | | 9805M| 392G| 1804M|999:59:59 |
|3 | TABLE ACCESS FULL | T1 | 188K|7899K| 718 | 00:00:09 |
|4 | VIEW | |52124 | | 9593 | 00:01:56 |
|*5| FILTER | | | | | |
|*6| TABLE ACCESS FULL| T2 |52124 | 356K| 9593 | 00:01:56 |
Predicate Information (identified by operation id):

5 - filter("T1"."H"='I')
6 - filter("T1"."Y"="T2"."C1")

This is awful! Cost of 1804M just for joining two tables (T1: 188K rows, T2: 5M rows). And yes, the execution proves the plan is not good (I did not have the patience to wait many hours (days?) for the query to finish).

However, a colleague suggested modifying the query as follows:

SELECT count(*)
ON (T1.h = nvl('I',T2.c1) and T1.y = T2.c1)

This does not change the result set - the 'I' is always not null and thus the nvl is superfluos. However, we get a different execution plan!

|Id|Operation |Name|Rows |Bytes| Cost | Time |
|0 |SELECT STATEMENT | | 1 | 54 | 5409K| 18:01:45 |
|1 | SORT AGGREGATE | | 1 | 54 | | |
|*2| HASH JOIN OUTER | | 9805M| 493G| 5409K| 18:01:45 |
|3 | TABLE ACCESS FULL| T1 | 188K|7899K| 718 | 00:00:09 |
|4 | TABLE ACCESS FULL| T2 | 5212K| 54M| 9585 | 00:01:55 |

Predicate Information (identified by operation id):

2 - access("T1"."Y"="T2"."C1"(+) AND

The cost is now 5409K, the operation is a nice hash join, and the query really finishes in few minutes.

The question now is: WHY?

Well, this is a matter of query optimization and plan generation, so the first person to ask is directly the CBO. So, I enabled the 10053 event for the two queries and dived into the two trace files, mainly to see the differences.

Both queries had the main query block initially rewritten as:
SQL:******* UNPARSED QUERY IS *******
SELECT "T1"."Y" "Y","T1"."H" "H",
"from$_subquery$_004"."C1_0" "C1"
FROM "SCOTT"."T1" "T1",
LATERAL( (SELECT "T2"."C1" "C1_0" FROM "SCOTT"."T2" "T2" WHERE "T1"."H"='G' AND "T1"."Y"="T2"."C1"))(+) "from$_subquery$_004"

(The second query with the added NVL in "T1"."H"=NVL('G',"T2"."C1") ).

So, for Oracle, it is a lateral (correlated) view. That's not nice, but at this stage of CBO processing, normal. CBO will try to get rid of it.

However only for the NVL case the CBO trace shows:
CVM:   Merging SPJ view SEL$1 (#0) into SEL$2 (#0)

Followed by:
SQL:******* UNPARSED QUERY IS *******
FROM "SCOTT"."T1" "T1","SCOTT"."T2" "T2"
WHERE "T1"."Y"="T2"."C1"(+)
AND "T1"."H"=NVL('G',"T2"."C1"(+))

Thus, CBO was able to rewrite is as the old-fashioned (+) outer join; however, it was not able to do it for the non-NVL query. And these result are passed to the next stage, and as no constraints or predicate move-around changes the query, they are verbatim passed for actual plan generation. And understandably, a lateral (correlated) view is not considered for hash join.

Anyway, should you read the Inside the Oracle Optimizer blog, you would already know that this is the classical example of the lateral non-mergeable view. Still, why the second one worked as we wanted?

Well, the quirk is in the fact that there is no way how to write the non-NVL query using (+) syntax - there is just no place to put the (+) sign to the t1.y='I' predicate to change it from filter to join predicate. However, adding artifically a column from T2 makes it possible, and the CBO did it. The CBO internally uses the old Oracle syntax, and thus if you can't rewrite your query using that syntax, neither CBO can.

Just a note - the same applies for example for predicate length(t1.q)=10, you can save the day by using length(nvl(t1.q,t2.c1))=10.

Tested on: Windows 64-bit (EM64T), Oracle

Sunday, July 5, 2009

Gentle introduction to opimization

I was just asked to prepare a short, one-hour workshop/presentation about optimization on Oracle. As this topis is so huge, and everyone had already read something, I decided to concept this workshop as an overview of the concepts (starting with db design) and the tools available.
The .pdf version is thus a kind of checklist - have you read about all of these issues? Have you thought them out when designing your system?
I hope you will find at least one new thing there:-)
The PDF is available for download on my website download area.