Overview
To investigate the execution plan as it is stored in memory, we first need to be able to read that memory.
We have the following options:
- x$ksmmem: reading the SGA using SQL. Personally I don't like it; it's cumbersome and slow.
- direct SGA read: obviously this reads the SGA only; it's fast and easy to do.
- read process memory: this can read the PGA and the process stack, and since the processes map the SGA, too, you can read that as well. It relies on ptrace, which unfortunately sends a signal to the process and pauses it while reading; so far all my reads were short and fast and the processes did not notice. Some OS configurations can prevent you from using ptrace (e.g. docker by default); google for CAP_SYS_PTRACE.
- gdb: using your favorite debugger, you can read memory as well. Useful when investigating.
Direct SGA read
I always considered direct SGA read to be some kind of dark magic, but the fundamentals are actually very easy. It still looks like sorcery when actually reading the Oracle internals, but the process of attaching the SGA shared segments to your process/tool memory and accessing them is easy. After all, this is simply System V IPC - shared memory for inter-process communication, introduced in System V Unix in 1983.
As long as our tool runs as the oracle user (or to be more precise, as the user that runs the oracle processes), we can use standard shmat (shared memory attach) calls to map the shared memory segments into our memory space. Then we see the memory exactly the way the oracle processes see it, and we can read or even write it as we see fit. (Obviously writing it is much cooler and outright dangerous - and we won't do that.)
There are just a few tricks necessary.
Finding the right SGA segments
We need to know which memory segments to map. The SGA is split into multiple segments, and a host can have multiple SGAs, or other processes using shared memory, too.
Fortunately, we just need to find a process that already maps the segments - take your SMON or PMON, for example. Then go to /proc/pid/maps and you will see all the memory segments mapped. This includes the binary, heap, all libraries... and also the SYSV shared segments. Let's just ignore everything but those SYSV segments:
60000000-60001000 r--s 00000000 00:05 2719749 /SYSV00000000
60001000-602cc000 rw-s 00001000 00:05 2719749 /SYSV00000000
60400000-91c00000 rw-s 00000000 00:05 2752518 /SYSV00000000
The first field is the shared memory address range; then come the permissions (read, write, execute, private/shared), the offset in the segment, the device, the inode (for SYSV segments this is the shmid), and the file/segment name.
We can see a quirk here - the process maps the first segment (shmid 2719749) twice: the first 0x1000 bytes as read-only, then the rest as read-write. As we will be attaching everything read-only, we don't really need to care and can simply attach the two segments.
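Picking the SYSV lines out of /proc/pid/maps is easy to automate. Below is a minimal sketch, assuming the field layout described above; the names sysv_mapping, parse_sysv_line and list_sysv_mappings are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* One SYSV shared memory mapping as listed in /proc/<pid>/maps. */
struct sysv_mapping {
    unsigned long start;   /* first mapped address */
    unsigned long end;     /* one past the last mapped address */
    unsigned long offset;  /* offset into the segment */
    unsigned long shmid;   /* shown in the inode column for SYSV segments */
    char perms[5];         /* e.g. "rw-s" */
};

/* Parse one maps line; return 1 if it describes a SYSV segment, else 0. */
int parse_sysv_line(const char *line, struct sysv_mapping *m)
{
    char dev[16], name[256];
    int n = sscanf(line, "%lx-%lx %4s %lx %15s %lu %255s",
                   &m->start, &m->end, m->perms, &m->offset,
                   dev, &m->shmid, name);
    if (n != 7)
        return 0;                      /* anonymous mappings have no name */
    return strncmp(name, "/SYSV", 5) == 0;
}

/* Print every SYSV mapping of the given process; return the count or -1. */
int list_sysv_mappings(int pid)
{
    char path[64], line[512];
    struct sysv_mapping m;
    int count = 0;

    snprintf(path, sizeof path, "/proc/%d/maps", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_sysv_line(line, &m)) {
            printf("shmid %lu at %lx-%lx (%s), offset %lx\n",
                   m.shmid, m.start, m.end, m.perms, m.offset);
            count++;
        }
    }
    fclose(f);
    return count;
}
```

Calling list_sysv_mappings() with the pid of SMON or PMON should list the same shmid once per mapping, so the twice-mapped first segment shows up as two lines.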
Attach the memory
It's very useful to attach the segments at the same addresses as the oracle processes do. It's not mandatory, but it makes following pointers easy: any pointer we read will point to the correct place in our memory space, too.
In other words, we can do
shmat(2719749, (void *)0x60000000, SHM_RDONLY);
shmat(2752518, (void *)0x60400000, SHM_RDONLY);
And then we can access the memory 0x60000000-0x91c00000 as if it were our own process's memory.
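Wrapped with error handling, the attach step could look roughly like this. It is only a sketch: attach_readonly, hexdump and detach are names invented here, and the shmids and addresses have to come from the maps file of your own instance.

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Attach one SYSV segment read-only. addr is the (page-aligned) address
 * the oracle process uses, taken from /proc/<pid>/maps; 0 lets the
 * kernel choose. Returns the mapped address, or NULL on failure.
 */
void *attach_readonly(int shmid, unsigned long addr)
{
    void *p = shmat(shmid, (const void *)addr, SHM_RDONLY);
    if (p == (void *)-1) {
        perror("shmat");
        return NULL;
    }
    return p;
}

/* Print len bytes starting at p as hex, 16 bytes per line. */
void hexdump(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (i % 16 == 0)
            printf("%s%p:", i ? "\n" : "", (const void *)(p + i));
        printf(" %02x", p[i]);
    }
    if (len)
        putchar('\n');
}

/* Detach a segment previously attached with attach_readonly(). */
int detach(const void *p)
{
    return shmdt(p);
}
```

With the sample maps output above, attach_readonly(2719749, 0x60000000UL) followed by hexdump() on the returned pointer would print the start of the first segment. Attaching with SHM_RDONLY means a stray write in the tool faults instead of corrupting the SGA.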
Simple example tool
I've implemented a simple tool that does this, reading any memory address range and printing it out:
Caveat
This is all nice and cozy, but unfortunately not everything we need is in the SGA. The execution plan itself is there, as it is in the library cache; but many other pieces of data are execution-specific: bind variable values, sysdate/SCN as of the start of the query... Also, finding which SQL is currently executing is pretty complicated when reading from the SGA (and thus it is harder to get the right child cursor we are interested in). For all of these, it's useful to be able to read the memory of the process currently executing the query.
Reading process memory (PGA)
By accessing a process's memory we can read the PGA. And since the SGA is mapped by the process, we can read the SGA this way, too. And there is more: some info can be on the stack, some in the registers. There are even libraries that can unwind the stack (i.e. show the call stack) of the process, such as libunwind-ptrace.
ptrace is the thing
Ptrace is the standard Linux/Unix way to attach to a process, peek/poke it, and then let it run again. It's actually what debuggers use to attach to a process and debug it (read memory, registers, step, run).
The steps are simple:
- attach to the process with ptrace(PTRACE_ATTACH): this sends SIGSTOP to the process and we have to wait for it to stop
- do to the process whatever we need, usually ptrace(PTRACE_PEEKDATA) to read data from memory
- detach from the process with ptrace(PTRACE_DETACH); the process will resume.
12c multi-threaded processes
12c introduced the multithreaded architecture (the threaded_execution parameter), so a single process can run multiple threads. Even if you don't use it, the binary is still compiled for multi-threading, and all the session-specific data is not global, but thread-local.
On Linux, this is implemented the standard way: the FS segment register holds the thread's base address, and by following a few simple pointers from there you can find the tbss, where this thread-local data is stored.
(If you do enable multithreading, a process will have multiple threads, and finding the correct one gets harder.)
cursor context
There is of course a lot of data there... but if we are just looking for information about the current session and current cursor, we only need:
- kxscio - the session context. We need the address of the kxscio symbol (use the readelf tool); in 12c it is relative to the tbss, in 11g it is an absolute address.
- at kxscio+0x68, we find the cursor context. It will be 0 if no query is executing, otherwise it's a valid pointer into the SGA.
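Put into code, the two bullets above amount to one address computation and one pointer read. The sketch below assumes 64-bit pointers and takes the memory access as a callback, so the same logic works with any of the read methods; mem_reader, read_local, kxscio_address and read_cursor_context are hypothetical names, and only the 0x68 offset comes from the text.

```c
#include <string.h>

#define KXSCIO_CURSOR_OFFSET 0x68UL   /* offset quoted in the text above */

/* Hypothetical reader callback: copy len bytes at addr into buf, 0 on success. */
typedef int (*mem_reader)(unsigned long addr, void *buf, size_t len, void *ctx);

/* Reader for memory in our own address space, handy for dry runs. */
int read_local(unsigned long addr, void *buf, size_t len, void *ctx)
{
    (void)ctx;
    memcpy(buf, (const void *)addr, len);
    return 0;
}

/* 12c: the symbol value is relative to the tbss; 11g: it is already absolute. */
unsigned long kxscio_address(unsigned long symbol_value,
                             unsigned long tbss_base, int thread_local)
{
    return thread_local ? tbss_base + symbol_value : symbol_value;
}

/*
 * Fetch the cursor context pointer stored at kxscio + 0x68.
 * Returns 1 if a query is executing (non-zero pointer), 0 if the
 * pointer is 0, -1 if the read failed.
 */
int read_cursor_context(mem_reader rd, void *ctx, unsigned long kxscio,
                        unsigned long *cursor)
{
    if (rd(kxscio + KXSCIO_CURSOR_OFFSET, cursor, sizeof *cursor, ctx) != 0)
        return -1;
    return *cursor != 0;
}
```

For a live session the callback would wrap a ptrace-based read of the target process rather than read_local; the symbol value is whatever readelf reports for kxscio in your oracle binary.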
kxscio points to many other interesting things, too, like the bind variables. But that's really out of scope for this blog post.