I was asked to analyze a situation, when an attempt to recover a 11g (standby) database resulted in bunch of "ORA-27048: skgfifi: file header information is invalid" errors.
I tried to reproduce the error on my test system, using different versions (EE, SE, 11.1.0.6, 11.1.0.7), but to no avail. Fortunately, I finally got to the failing system:
Interestingly, nothing interesting is written to alert.log nor the trace file - just the ORA-27048, with no detail, so you cannot see which file is affected from there.
However, don't lose hope, there is always one thing to try: strace the process (fortunately this was a Linux box, not a Windows one).
That is, run you sqlplus, get the PID of your server process and strace it. (With the usual caveat that you are running the sqlplus locally, that you have a dedicated server process... well you have to, you're doing a recovery.)
And voilĂ , it's there! The server process does a lot of gettimeofday() calls (Oracle wait interface measuring time of code pieces, perhaps), opens datafiles, opens alert.log/xml and trace, writes to them... and traverses the FRA! (the listing below is just a very short sample)
This was the top-level directory and one of its children - an empty one (2 entries in directory, i.e. "." and "..").
When we get to a non-empty dir, the server process analyzes the actual files there:
Well, interesting, but nothing groundbreaking. However, for some of the files, the server process does something different:
Yes, it opens them and reads first 512 bytes of them - the file header! And if the file header is invalid (as is for the file above, as it is a gzipped file, so no longer an Oracle file), the skgfifi function signals an error (as can be confirmed by removing all .gz files and trying the recovery again).
Just a final note - the server process skipped some of the .gz files, read the rest of them - I could not find a simple rule how it decided.
Just a few conclusing note:
I tried to reproduce the error on my test system, using different versions (EE, SE, 11.1.0.6, 11.1.0.7), but to no avail. Fortunately, I finally got to the failing system:
SQL> recover standby database;
ORA-00279: change 9614132 generated at 11/27/2009 17:59:06 needed for thread 1
ORA-00289: suggestion :
/u01/flash_recovery_area/T1/archivelog/2009_11_27/o1_mf_1_208_%u_.arc
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-00280: change 9614132 for thread 1 is in sequence #208
Interestingly, nothing interesting is written to alert.log nor the trace file - just the ORA-27048, with no detail, so you cannot see which file is affected from there.
However, don't lose hope, there is always one thing to try: strace the process (fortunately this was a Linux box, not a Windows one).
sqlplus / as sysdba
SQL> !ps -ef
SQL> !strace -o ~/strace.trc -p <PID> &
SQL> recover standby database;
SQL> exit
That is, run you sqlplus, get the PID of your server process and strace it. (With the usual caveat that you are running the sqlplus locally, that you have a dedicated server process... well you have to, you're doing a recovery.)
And voilĂ , it's there! The server process does a lot of gettimeofday() calls (Oracle wait interface measuring time of code pieces, perhaps), opens datafiles, opens alert.log/xml and trace, writes to them... and traverses the FRA! (the listing below is just a very short sample)
access("/u01/flash_recovery_area/T1/archivelog/", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/", 0xbfe0c5c0, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 21
fstat64(21, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(21, F_SETFD, FD_CLOEXEC) = 0
getdents(21, /* 20 entries */, 4096) = 476
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_15", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/2007_08_15", 0xbfe0c100, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/2007_08_15", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 22
fstat64(22, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(22, F_SETFD, FD_CLOEXEC) = 0
getdents(22, /* 2 entries */, 4096) = 32
getdents(22, /* 0 entries */, 4096) = 0
close(22) = 0
This was the top-level directory and one of its children - an empty one (2 entries in directory, i.e. "." and "..").
When we get to a non-empty dir, the server process analyzes the actual files there:
access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", {st_mode=S_IFREG|0775, st_size=47616, ...}) = 0
Well, interesting, but nothing groundbreaking. However, for some of the files, the server process does something different:
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {st_mode=S_IFREG|0775, st_size=24228, ...}) = 0
statfs("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", O_RDONLY|O_LARGEFILE) = 21
fstatfs(21, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
lseek(21, 0, SEEK_SET) = 0
read(21, "\37\213\10\0106\363\304F\0\3o1_mf_1_21_3d9wspto_.a"..., 512) = 512
close(21) = 0
Yes, it opens them and reads first 512 bytes of them - the file header! And if the file header is invalid (as is for the file above, as it is a gzipped file, so no longer an Oracle file), the skgfifi function signals an error (as can be confirmed by removing all .gz files and trying the recovery again).
Just a final note - the server process skipped some of the .gz files, read the rest of them - I could not find a simple rule how it decided.
Just a few conclusing note:
- The reaction of Oracle Support for this was: Oracle does not recommend compressing files within FRA.
- If you use "set logsource" in SQL*plus, the scanning does not happen, as there is simply just one directory to look into (set logsource will not work with FRA, due to the YYYY_MM_DD dirs in FRA).
- For the background info (who came up with the problem, how do they solve it), visit Dbvisit blog - they stumbled upon it while supporting their "standby on Standard Edition" solution.
Comments