More than one way to save blocks

These days, disk capacity is often not a big issue. If you have even a decent load on the database, you will hit the IOPS limit much sooner than you run out of disk space.

Well, almost. First, you will still have a lot of inactive data that consumes space but does not require any IOPS. And second, in some applications (like ETL in a DWH) you are bound by throughput. Let me talk about that case.

Don't expect any exceptional thoughts; this post is just inspired by a real production case and tries to show that there is always one more option to consider.

The optimal plan for many ETL queries is a series of full scans with hash joins. And often you read the same table several times, to join it in different ways. Such queries benefit if you make your tables smaller - you save on I/O.

(1) In ETL, your source tables are often imported from a different system, and you usually don't need all of their columns. So, first of all: don't load the data you don't need. However, usually you can't just drop the columns - that would change the data model, you would have to update it in the ETL tool, and you would face a lot of work whenever the list of used columns changes.
How to tackle this? Use views, and select NULL for the columns you don't need. Use FGA (fine-grained auditing), at least on test, to verify that you don't access any of the non-loaded columns. (Just beware that things like dbms_stats access all columns.)
(Bonus: depending on the source system, transferring less data may take less time due to limits of the transfer channel - ODBC, network, etc.)
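A minimal sketch of this idea - the table, view, and column names are illustrative, not from the original post:

```sql
-- View over the source table: keep only the columns the ETL uses,
-- replace the rest with typed NULLs so the data model stays intact.
CREATE OR REPLACE VIEW v_customers_src AS
SELECT cust_id,
       cust_name,
       CAST(NULL AS VARCHAR2(100)) AS cust_street,    -- not loaded
       CAST(NULL AS DATE)          AS cust_birthdate  -- not loaded
FROM   customers;

-- On test, an FGA policy flags any query that still touches the
-- supposedly unused columns of the loaded copy:
BEGIN
  dbms_fga.add_policy(
    object_schema => 'ETL',
    object_name   => 'CUSTOMERS',
    policy_name   => 'UNUSED_COLS',
    audit_column  => 'CUST_STREET,CUST_BIRTHDATE');
END;
/
```

Accesses are then recorded in DBA_FGA_AUDIT_TRAIL; remember that a gathering job like dbms_stats will trip the policy too, as noted above.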

(2) As the source data is usually loaded only once and truncated before each load, PCTFREE should be 0, so no space in each block is reserved for updates that will never come.
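In the table DDL this is a single clause - a sketch with an illustrative staging table:

```sql
-- Staging table, truncated and reloaded on every run; rows are never
-- updated, so the default 10% free space per block would be wasted.
CREATE TABLE stg_orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
) PCTFREE 0;
```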

(3) Now, with (1) implemented, the tables contain (a lot of) always-NULL columns. A NULL takes just one length byte, but interestingly, it still makes a difference: Oracle does not store trailing NULL columns in a row at all, so just recreate the tables with the NULL columns at the end. (No proper application depends on column order, right?) On a 1.2GB table, we got a 35% saving just by applying (2) and (3) - it's really worth trying.
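The recreation can be a simple CTAS - a sketch, again with hypothetical table and column names:

```sql
-- Rebuild with the always-NULL columns last: a NULL followed by a
-- non-NULL column costs one length byte, a trailing NULL costs nothing.
CREATE TABLE stg_orders_new PCTFREE 0 AS
SELECT order_id, order_date, amount,   -- columns actually loaded
       comments, internal_note         -- always-NULL columns, moved last
FROM   stg_orders;

DROP TABLE stg_orders;
RENAME stg_orders_new TO stg_orders;
```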

(4) And of course, use the APPEND hint and the 10g direct-load table compression (COMPRESS in the table definition). Another 50% saved for us...
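Both pieces together look like this - a sketch; compression only kicks in for direct-path loads, which is what the APPEND hint requests:

```sql
-- COMPRESS enables block-level compression for direct-path inserts.
CREATE TABLE stg_orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
) PCTFREE 0 COMPRESS;

-- The APPEND hint makes the insert a direct-path load, writing
-- compressed blocks above the high-water mark.
INSERT /*+ APPEND */ INTO stg_orders
SELECT order_id, order_date, amount FROM v_orders_src;
COMMIT;
```

A conventional (non-APPEND) insert into the same table would store the rows uncompressed, so the load method matters as much as the DDL.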

Please note that the only real problem you usually face here is getting the list of used columns - fortunately, most ETL tools (like ODI) can provide it, even if it means querying their repository directly (snp_txt_crossr in ODI). The rest is easy to automate.
