Skip to main content

ORA-27048: skgfifi: file header information is invalid

I was asked to analyze a situation, when an attempt to recover a 11g (standby) database resulted in bunch of "ORA-27048: skgfifi: file header information is invalid" errors.

I tried to reproduce the error on my test system, using different versions (EE, SE, 11.1.0.6, 11.1.0.7), but to no avail. Fortunately, I finally got to the failing system:

SQL> recover standby database;
ORA-00279: change 9614132 generated at 11/27/2009 17:59:06 needed for thread 1
ORA-00289: suggestion :
/u01/flash_recovery_area/T1/archivelog/2009_11_27/o1_mf_1_208_%u_.arc
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-27048: skgfifi: file header information is invalid
ORA-00280: change 9614132 for thread 1 is in sequence #208


Interestingly, nothing interesting is written to alert.log nor the trace file - just the ORA-27048, with no detail, so you cannot see which file is affected from there.

However, don't lose hope, there is always one thing to try: strace the process (fortunately this was a Linux box, not a Windows one).


sqlplus / as sysdba
SQL> !ps -ef
SQL> !strace -o ~/strace.trc -p <PID> &
SQL> recover standby database;
SQL> exit

That is, run you sqlplus, get the PID of your server process and strace it. (With the usual caveat that you are running the sqlplus locally, that you have a dedicated server process... well you have to, you're doing a recovery.)

And voilĂ , it's there! The server process does a lot of gettimeofday() calls (Oracle wait interface measuring time of code pieces, perhaps), opens datafiles, opens alert.log/xml and trace, writes to them... and traverses the FRA! (the listing below is just a very short sample)

access("/u01/flash_recovery_area/T1/archivelog/", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/", 0xbfe0c5c0, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 21
fstat64(21, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(21, F_SETFD, FD_CLOEXEC) = 0
getdents(21, /* 20 entries */, 4096) = 476
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_15", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_15", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
readlink("/u01/flash_recovery_area/T1/archivelog/2007_08_15", 0xbfe0c100, 512) = -1 EINVAL (Invalid argument)
open("/u01/flash_recovery_area/T1/archivelog/2007_08_15", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 22
fstat64(22, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
fcntl64(22, F_SETFD, FD_CLOEXEC) = 0
getdents(22, /* 2 entries */, 4096) = 32
getdents(22, /* 0 entries */, 4096) = 0
close(22) = 0

This was the top-level directory and one of its children - an empty one (2 entries in directory, i.e. "." and "..").

When we get to a non-empty dir, the server process analyzes the actual files there:

access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
access("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", R_OK) = 0
stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_19_3d9wj20f_.arc", {st_mode=S_IFREG|0775, st_size=47616, ...}) = 0


Well, interesting, but nothing groundbreaking. However, for some of the files, the server process does something different:

stat64("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {st_mode=S_IFREG|0775, st_size=24228, ...}) = 0
statfs("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/flash_recovery_area/T1/archivelog/2007_08_17/o1_mf_1_21_3d9wspto_.arc.gz", O_RDONLY|O_LARGEFILE) = 21
fstatfs(21, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=10119516, f_bfree=4882707, f_bavail=4368756, f_files=5144576, f_ffree=4885233, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
lseek(21, 0, SEEK_SET) = 0
read(21, "\37\213\10\0106\363\304F\0\3o1_mf_1_21_3d9wspto_.a"..., 512) = 512
close(21) = 0


Yes, it opens them and reads first 512 bytes of them - the file header! And if the file header is invalid (as is for the file above, as it is a gzipped file, so no longer an Oracle file), the skgfifi function signals an error (as can be confirmed by removing all .gz files and trying the recovery again).

Just a final note - the server process skipped some of the .gz files, read the rest of them - I could not find a simple rule how it decided.

Just a few conclusing note:

  • The reaction of Oracle Support for this was: Oracle does not recommend compressing files within FRA.
  • If you use "set logsource" in SQL*plus, the scanning does not happen, as there is simply just one directory to look into (set logsource will not work with FRA, due to the YYYY_MM_DD dirs in FRA).
  • For the background info (who came up with the problem, how do they solve it), visit Dbvisit blog - they stumbled upon it while supporting their "standby on Standard Edition" solution.

Comments

Popular posts from this blog

Filter and access predicates

More than just column projections When we look around for further pointers in the tree nodes, we find more pieces resembling the column projections we have seen so far. With some experimenting, we will find out that these are access predicates and filters. First of all, the location of these pointers is not always the same, it seems that the value at 0x34 is some kind of flag, indicating whether filters and/or access predicates are present, and where. Or whether there is just one, or more of them.  It probably also indicates what other info is present, but I have no idea what info that would be or what each value means. Resembling, but different The data we see as predicates are not columns; after all, a predicate is a condition, not a single column. But the structure is similar to what we have seen with columns, and if we follow pointers further, we eventually build a tree, and some of the leaves are indeed just column projections. After some contemplation, we realize it's...

dbms_alert on RAC

Not long time ago, I came across a usage of dbms_alert to manage running jobs. As the solution implemented must work also for RAC, I wanted to know whether dbms_alert works on RAC across instances. The documentation nor Metalink does not say anything (contrary to dbms_pipe, which does NOT work on RAC). So, if they don't warn, it should work... However, Julian Dyke says, that dbms_alert does not work and is the same as dbms_pipe (sources: http://juliandyke.com/Presentations/Presentations.html#ARoughGuideToRAC , page 17, or Pro Oracle Database 10g RAC on Linux, page 426). You know, never trust anybody, so I conducted a test case (10.2.0.3 on Linux x86_64, VMware ESX server, 2-node RAC): You will need two simultaneous sessions, I mark them with DWH1> and DWH2> here. DWH1> select instance_name from v$instance; INSTANCE_NAME ---------------- DWH1 DWH2> select instance_name from v$instance; INSTANCE_NAME ---------------- DWH2 DWH2> exec dbms_alert.register('TST'); ...

Execution plan rows

The plan As mentioned in previous post, our example is based on the sample SH schema, with an added table FOOBAR (id number, key varchar2(30)): SELECT prod_id, key FROM products CROSS JOIN foobar WHERE prod_id in (143,144,id) and id in (1,2,3); In all reverse engineering, it's good to start with something simple and to know what we to look for. Thus we want to know what the execution plan should look like; and the more unique numbers/ids we can find, the better. It's much easier to look for a number like 0x12fa1893 than for 0x0 or 0x1. The execution plan, obtained using: SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(null,null,'ALL'));  is I have added the CPU cost from the full detail of the execution plan in  v$sql_plan / x$kqlfxpl. Looking at the numbers, we also have rows (1 and 2 ... not very unique), bytes (34 / 30 / 8 - not bad) and what is not shown here, we also know object ids of the index and the table: 94765 and 92749 (nice). We did not use any...