
Reading data from PGA and SGA

Overview

For our investigation of how the execution plan is stored in memory, we first need to be able to read that memory.

We have several options:
  • x$ksmmem: reading the SGA using SQL. Personally I don't like it; it's cumbersome and slow.
  • direct SGA read: obviously this reads only the SGA; it's fast and easy to do.
  • read process memory: this can read the PGA and the process stack, and since the processes map the SGA too, you can read the SGA as well. Unfortunately, ptrace sends signals to the process and pauses it while it is being read, but so far all my reads were short and fast and the processes did not notice. Some OS configurations prevent you from using ptrace (e.g. Docker by default); google for CAP_SYS_PTRACE.
  • gdb: using your favorite debugger, you can read the memory as well. This is useful when investigating.

Direct SGA read

I always considered direct SGA reading to be some kind of dark magic, but the fundamentals are actually very easy. Interpreting the Oracle internals still looks like sorcery, but attaching the SGA shared memory segments to your process/tool and accessing them is easy. After all, this is simply System V IPC: shared memory for inter-process communication, introduced in System V Unix in 1983.

As long as our tool runs as the oracle user (or, more precisely, as the user that runs the Oracle processes), we can use standard shmat (shared memory attach) calls to map the shared memory segments into our address space. Then we see the memory exactly the way the Oracle processes see it, and we can read or even write it as we see fit. (Writing is obviously much cooler and outright dangerous, and we won't do that.)

There are just a few tricks necessary.

Finding the right SGA segments 

We need to know which memory segments to map. The SGA is split into multiple segments, there can be multiple SGAs, and other processes may use shared memory, too.
Fortunately, we just need to find a process that already maps the segments: take your SMON or PMON, for example. Then look at /proc/pid/maps and you will see all the memory segments it maps. This includes the binary, the heap, all the libraries... and also the SYSV shared segments. Let's ignore everything but those SYSV segments:

60000000-60001000 r--s 00000000 00:05 2719749 /SYSV00000000
60001000-602cc000 rw-s 00001000 00:05 2719749 /SYSV00000000
60400000-91c00000 rw-s 00000000 00:05 2752518 /SYSV00000000


The first column is the address range; then come the permissions (read, write, execute, and s/p for shared/private), the offset within the segment, the device, the inode (for SYSV segments this is the shmid), and the file/segment name.
We can see a quirk here: the process maps the first segment (shmid 2719749) twice, the first 0x1000 bytes read-only and the rest read-write. As we will be attaching everything read-only, we don't need to care about this and can simply attach the two segments.
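Picking the SYSV entries out of the maps file only requires splitting each line into the columns described above. Below is a minimal sketch of that step. It is not the author's tool, and the names `struct sysv_map`, `parse_sysv_line` and `list_sysv_segments` are hypothetical. It assumes the usual /proc/<pid>/maps layout, where the inode column holds the shmid for SYSV segments.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* One SYSV shared memory mapping as listed in /proc/<pid>/maps (hypothetical type). */
struct sysv_map {
    uintptr_t start;      /* first address of the mapping */
    uintptr_t end;        /* one past the last address */
    char perms[5];        /* e.g. "rw-s" */
    unsigned long shmid;  /* the "inode" column holds the shmid for SYSV segments */
};

/* Parse one maps line; return 1 if it describes a SYSV segment, 0 otherwise. */
int parse_sysv_line(const char *line, struct sysv_map *m)
{
    uintptr_t start, end;
    char perms[5] = "";
    unsigned long offset, inode;
    unsigned int dev_major, dev_minor;
    char name[256] = "";

    /* layout: start-end perms offset major:minor inode [name] */
    int n = sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s %lx %x:%x %lu %255s",
                   &start, &end, perms, &offset, &dev_major, &dev_minor,
                   &inode, name);
    if (n < 8 || strncmp(name, "/SYSV", 5) != 0)
        return 0;  /* anonymous mapping, binary, library, heap, ... */
    m->start = start;
    m->end = end;
    memcpy(m->perms, perms, sizeof perms);
    m->shmid = inode;
    return 1;
}

/* Print every SYSV segment mapped by process `pid` (e.g. SMON or PMON). */
int list_sysv_segments(int pid)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/maps", pid);
    FILE *f = fopen(path, "r");
    if (!f) {
        perror(path);
        return -1;
    }
    while (fgets(line, sizeof line, f)) {
        struct sysv_map m;
        if (parse_sysv_line(line, &m))
            printf("%#" PRIxPTR "-%#" PRIxPTR " %s shmid=%lu\n",
                   m.start, m.end, m.perms, m.shmid);
    }
    fclose(f);
    return 0;
}
```

The shmid and start address printed for each segment are exactly the two values the attach step needs.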

Attach the memory

It's very useful to attach the segments at the same addresses the Oracle processes use. It's not mandatory, but it makes following pointers easy: any pointer we find will point to the correct place in our own address space, too.
In other words, we can do:
shmat(2719749, (void *) 0x60000000, SHM_RDONLY);  /* shmid, attach address, flags */
shmat(2752518, (void *) 0x60400000, SHM_RDONLY);


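Then we can access the memory at 0x60000000-0x91c00000 as if it were our own process's memory. The only exception is the unmapped gap between 0x602cc000 and 0x60400000, which is not part of either segment.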
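Below is a minimal sketch of the attach-and-read step with error checking. It is not the author's read_SGA_bytes.c, and the helper names `attach_at` and `dump_bytes` are hypothetical. It assumes the shmid and address pairs come from /proc/<pid>/maps and that the addresses are page-aligned, as they are in the listing above.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach SYSV segment `shmid` read-only at exactly `addr`, the address the
 * Oracle processes use, so that pointers stored inside the SGA stay valid in
 * our process too. Returns NULL if the kernel refuses (e.g. range in use). */
const void *attach_at(int shmid, uintptr_t addr)
{
    void *p = shmat(shmid, (const void *) addr, SHM_RDONLY);
    if (p == (void *) -1) {
        perror("shmat");
        return NULL;
    }
    return p;
}

/* Hex-dump `len` bytes starting at `addr` in our (now shared) address space. */
void dump_bytes(uintptr_t addr, size_t len)
{
    const unsigned char *b = (const unsigned char *) addr;
    for (size_t i = 0; i < len; i++)
        printf("%02x%c", b[i], (i % 16 == 15 || i + 1 == len) ? '\n' : ' ');
}
```

For the listing above, the calls would be attach_at(2719749, 0x60000000) and attach_at(2752518, 0x60400000). After that, dump_bytes (or plain pointer dereferences) can read anywhere inside those ranges. Because the segments are attached with SHM_RDONLY, any accidental write faults instead of corrupting the SGA.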

Simple example tool

I've implemented a simple tool that does this: it reads any memory address range and prints it out:
https://github.com/vit-spinka/direct-sga/blob/master/read_SGA_bytes.c

Caveat

This is all nice and cozy, but unfortunately not everything we need is in the SGA. The execution plan itself is there, as it lives in the library cache, but many other pieces of data are execution-specific: bind variable values, sysdate/SCN as of the start of the query... Also, finding which SQL is currently executing is pretty complicated when reading only the SGA, which makes it harder to get the cursor child we are interested in. For all of these, it's useful to be able to read the memory of the process currently executing the query.

Reading process memory (PGA)

By accessing a process's memory we can read its PGA. And since the process maps the SGA, we can read the SGA this way, too. There is even more: some information may be on the stack, some in the registers. There are even libraries, such as libunwind-ptrace, that can unwind the stack of the process (i.e. show its call stack).

ptrace is the thing

ptrace is the standard Linux/Unix way to attach to a process, peek/poke it, and then let it run again. It's what debuggers use to attach to a process and debug it (read memory and registers, step, run).
The steps are simple:
  • attach to the process with ptrace(PTRACE_ATTACH): this sends SIGSTOP to the process, and we have to wait (waitpid) until it stops
  • do whatever we need with the process, usually ptrace(PTRACE_PEEKDATA) to read data from its memory, one word at a time
  • detach from the process with ptrace(PTRACE_DETACH); the process resumes
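The three steps can be sketched in C as follows. This is a minimal illustration, not the author's read_cursor_context.c, and `read_process_memory` is a hypothetical name. PTRACE_PEEKDATA returns one machine word per call, and -1 is a legitimate word value, so errno has to be cleared before each call and checked after it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Stop process `pid`, copy `len` bytes from its address `addr` into `buf`,
 * then let it run again. Returns 0 on success, -1 on error. */
int read_process_memory(pid_t pid, uintptr_t addr, void *buf, size_t len)
{
    int status, rc = 0;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {  /* sends SIGSTOP */
        perror("PTRACE_ATTACH");
        return -1;
    }
    /* wait until the tracee has actually stopped */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }

    for (size_t done = 0; done < len; done += sizeof(long)) {
        errno = 0;  /* -1 is valid data, so errno is the only error signal */
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *) (addr + done), NULL);
        if (errno != 0) {
            perror("PTRACE_PEEKDATA");
            rc = -1;
            break;
        }
        size_t n = len - done < sizeof word ? len - done : sizeof word;
        memcpy((char *) buf + done, &word, n);  /* copy only what was asked */
    }

    ptrace(PTRACE_DETACH, pid, NULL, NULL);  /* the process resumes */
    return rc;
}
```

The loop always fetches whole words, so it can read up to sizeof(long) - 1 bytes beyond addr + len. At the very end of a mapping, that over-read can fail even though the requested bytes are readable.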

12c multi-threaded processes

Since 12c introduced the multithreaded execution model (THREADED_EXECUTION), a single process can run multiple threads. Even if you don't use it, the binary is still compiled for multi-threading, so all the session-specific data is not global but thread-local.
On Linux, this is implemented the standard way: the FS segment base is set to the thread's base address, and by following some simple pointers from there you can get the tbss, i.e. where this thread's thread-local data is stored.
(If you do use threaded execution, you will actually have multiple threads, and then it gets harder to find the correct thread.)

cursor context

There is of course a lot of data there... but if we are only looking for information about the current session and the current cursor, we just need:
  • kxscio: the session context. We need the address of the kxscio symbol (use readelf to find it); in 12c it is relative to tbss, in 11g it is an absolute address.
  • at kxscio+0x68, we find the cursor context. It will be 0 if no query is executing; otherwise it's a valid pointer into the SGA.
A simple tool I made for this is at https://github.com/vit-spinka/direct-sga/blob/master/read_cursor_context.c .
kxscio points to many other interesting things too, such as the bind variables, but that's really out of the scope of this blog post.
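Putting the two bullets together, the lookup reduces to a single ptrace word read at kxscio + 0x68. Below is a minimal sketch; `read_cursor_context` is a hypothetical name, not the author's tool. It assumes `kxscio_addr` has already been resolved: in 11g it is the absolute address from readelf, and in 12c the readelf offset must first be added to the thread's tbss address.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

#define KXSCIO_CURSOR_CTX_OFFSET 0x68  /* offset given in the text */

/* Read the cursor context pointer at kxscio + 0x68 in process `pid`.
 * *ctx becomes 0 when no query is executing, otherwise an SGA address.
 * Returns 0 on success, -1 on error. */
int read_cursor_context(pid_t pid, uintptr_t kxscio_addr, uintptr_t *ctx)
{
    int status, rc = 0;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status)) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }
    errno = 0;  /* PEEKDATA can legitimately return -1 */
    long word = ptrace(PTRACE_PEEKDATA, pid,
                       (void *) (kxscio_addr + KXSCIO_CURSOR_CTX_OFFSET), NULL);
    if (errno != 0)
        rc = -1;
    else
        *ctx = (uintptr_t) word;
    ptrace(PTRACE_DETACH, pid, NULL, NULL);  /* let the session continue */
    return rc;
}
```

A non-zero result can then be followed directly in the SGA segments attached earlier, because they are mapped at the same addresses as in the Oracle process.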

Summary

With these two tools in our belt, we can find the cursor we are interested in (just ptrace an Oracle process that is currently running it) and then read any data we want from the SGA.
