Wednesday, December 2, 2009

ORA-27048: skgfifi: file header information is invalid

I was asked to analyze a situation, when an attempt to recover a 11g (standby) database resulted in bunch of "ORA-27048: skgfifi: file header information is invalid" errors.

I tried to reproduce the error on my test system, using different versions (EE, SE, 11.1.0.6, 11.1.0.7), but to no avail. Fortunately, I finally got to the failing system:

SQL> recover standby database;
ORA-00279: change 9614132 generated at 11/27/2009 17:59:06 needed for thread 1
ORA-00289: suggestion :
/u01/flash_recovery_area/T1/archivelog/2009_11_27/o1_mf_1_208_%u_.arc
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-00280: change 9614132 for thread 1 is in sequence #208


Interestingly, nothing interesting is written to alert.log nor the trace file - just the ORA-27048, with no detail, so you cannot see which file is affected from there.

However, don't lose hope, there is always one thing to try: strace the process (fortunately this was a Linux box, not a Windows one).


sqlplus / as sysdba
SQL> !ps -ef
SQL> !strace -o ~/strace.trc -p <PID> &
SQL> recover standby database;
SQL> exit

That is, run you sqlplus, get the PID of your server process and strace it. (With the usual caveat that you are running the sqlplus locally, that you have a dedicated server process... well you have to, you're doing a recovery.)

And voilà, it's there! The server process does a lot of gettimeofday() calls (Oracle wait interface measuring time of code pieces, perhaps), opens datafiles, opens alert.log/xml and trace, writes to them... and traverses the FRA! (the listing below is just a very short sample)

access("/u01/flash_recovery_area/T1/archivelog/", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/", 0xbfe0c5c0, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 21
fstat64(21, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(21, F_SETFD, FD_CLOEXEC) = 0
getdents(21, /* 20 entries */, 4096) = 476
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_15", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/2007_08_15", 0xbfe0c100, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/2007_08_15", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 22
fstat64(22, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(22, F_SETFD, FD_CLOEXEC) = 0
getdents(22, /* 2 entries */, 4096) = 32
getdents(22, /* 0 entries */, 4096) = 0
close(22) = 0

This was the top-level directory and one of its children - an empty one (2 entries in directory, i.e. "." and "..").

When we get to a non-empty dir, the server process analyzes the actual files there:

access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", {st_mode=S_IFREG|0775, st_size=47616, ...}) = 0


Well, interesting, but nothing groundbreaking. However, for some of the files, the server process does something different:

stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {st_mode=S_IFREG|0775, st_size=24228, ...}) = 0
statfs("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", O_RDONLY|O_LARGEFILE) = 21
fstatfs(21, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
lseek(21, 0, SEEK_SET) = 0
read(21, "\37\213\10\0106\363\304F\0\3o1_mf_1_21_3d9wspto_.a"..., 512) = 512
close(21) = 0


Yes, it opens them and reads first 512 bytes of them - the file header! And if the file header is invalid (as is for the file above, as it is a gzipped file, so no longer an Oracle file), the skgfifi function signals an error (as can be confirmed by removing all .gz files and trying the recovery again).

Just a final note - the server process skipped some of the .gz files, read the rest of them - I could not find a simple rule how it decided.

Just a few conclusing note:

  • The reaction of Oracle Support for this was: Oracle does not recommend compressing files within FRA.
  • If you use "set logsource" in SQL*plus, the scanning does not happen, as there is simply just one directory to look into (set logsource will not work with FRA, due to the YYYY_MM_DD dirs in FRA).
  • For the background info (who came up with the problem, how do they solve it), visit Dbvisit blog - they stumbled upon it while supporting their "standby on Standard Edition" solution.