These days, disk capacity is often not a big issue. If you have at least a decent load on the database, you will hit the IOPS limit much sooner than you run out of disk space.
Well, almost. First, you will still have a lot of inactive data that consumes space but does not require any IOPS. And second, in some applications (like ETL in a DWH) you are bound by throughput. Let me talk about this second case.
Don't expect any exceptional thoughts; this post is just inspired by a real production case and tries to point out that there is always one more option to consider.
The optimal plan for many ETL queries is a lot of full scans with hash joins. And often you read the same table multiple times, to join it in different ways. Such queries benefit if you make your tables smaller - you save on I/O.
(1) In ETL, your source tables are often imported from a different system, and you actually don't need all of their columns. So, first of all - don't load the data you don't need. However, usually you just can't drop the columns - that would change the data model, you would have to update it in the ETL tool, and you would have to redo a lot of work whenever the list of used columns changes.
How to tackle this? Use views, and select NULL for the columns you don't need. Use FGA (fine-grained auditing), at least on a test environment, to verify that nothing accesses those non-loaded columns. (Just beware that things like dbms_stats access all columns.)
(Bonus: depending on the source system, transferring less data may take less time due to limits of the transfer channel - ODBC, network, etc.)
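To make this concrete, here is a minimal sketch of both ideas - the table, column and policy names (src_customers_full, stg_customers, UNUSED_COLS_CHECK, ...) are made up for illustration:

```sql
-- View over the full source table: real columns where needed, NULL elsewhere,
-- so the ETL model (and the staging table) keeps its full column list.
CREATE OR REPLACE VIEW v_src_customers AS
SELECT cust_id,
       name,
       country,
       CAST(NULL AS VARCHAR2(200))  AS street,     -- not used by any mapping
       CAST(NULL AS VARCHAR2(4000)) AS comments,   -- not used by any mapping
       CAST(NULL AS DATE)           AS last_login  -- not used by any mapping
FROM   src_customers_full;

-- FGA policy on the staging table (e.g. on test): any SELECT touching the
-- non-loaded columns ends up in DBA_FGA_AUDIT_TRAIL, so you can verify
-- that nothing downstream still reads them.
BEGIN
  dbms_fga.add_policy(
    object_schema   => 'STG',
    object_name     => 'STG_CUSTOMERS',
    policy_name     => 'UNUSED_COLS_CHECK',
    audit_column    => 'STREET,COMMENTS,LAST_LOGIN',
    statement_types => 'SELECT');
END;
/
```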
(2) As the source data is usually loaded only once and truncated before each load, PCTFREE should be 0, so no space is reserved in each block for updates that will never come.
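For example (with a hypothetical staging table), since PCTFREE is a block-formatting attribute, it is easiest to set it right in the table DDL:

```sql
-- Staging table that is truncated and fully reloaded on every run:
-- PCTFREE 0 packs the rows as densely as possible, because no row
-- will ever be updated in place and grow.
CREATE TABLE stg_orders (
  order_id   NUMBER,
  cust_id    NUMBER,
  order_date DATE,
  amount     NUMBER
)
PCTFREE 0;
```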
(3) Now, with (1) implemented, the tables contain (a lot of) NULL columns. Each such column costs just one byte per row, but interestingly, it still makes a difference. Just recreate the tables with the NULL columns at the end - trailing NULL columns are not stored at all. (No proper application depends on column order, right?) On a 1.2GB table, we got a 35% saving just by applying (2) and (3) - it's really worth trying.
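One way to do the reordering (again with hypothetical names): a CTAS that lists the used columns first and the always-NULL ones last, followed by a drop/rename swap:

```sql
-- Rebuild with the always-NULL columns moved to the end of the row.
-- Trailing NULL columns are not stored at all, while a NULL followed by
-- a non-NULL column still costs one length byte - hence the saving.
CREATE TABLE stg_customers_new
PCTFREE 0
AS
SELECT cust_id, name, country,        -- columns that really carry data
       street, comments, last_login   -- always-NULL columns, now trailing
FROM   stg_customers;

-- then drop the old table and rename the new one
-- (re-create grants, synonyms, etc. as needed)
```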
(4) And of course, use the APPEND hint and the 10g direct-load table compression (COMPRESS in the table definition). Another 50% saving for us...
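A sketch of how the two go together (table and view names hypothetical); basic COMPRESS only compresses blocks written by direct-path operations, which is exactly what the APPEND hint gives you:

```sql
-- Table defined with 10g block-level compression for direct-path loads.
CREATE TABLE stg_orders_c (
  order_id NUMBER,
  cust_id  NUMBER,
  amount   NUMBER
)
PCTFREE 0
COMPRESS;

-- Direct-path insert: rows are written above the high-water mark
-- and compressed on the way in; a conventional INSERT would not be.
INSERT /*+ APPEND */ INTO stg_orders_c
SELECT order_id, cust_id, amount
FROM   v_src_orders;
COMMIT;
```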
Please note that the only problem you usually face here is getting the list of used columns - fortunately, most ETL tools (like ODI) can provide it, even if it means accessing their repository directly (snp_txt_crossr in ODI). The rest is easy to automate.
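As a sketch of that automation, assuming you have dumped the used columns into a hypothetical helper table etl_used_columns(table_name, column_name) and your source tables follow a hypothetical SRC% naming convention, the SELECT lists for the NULL-ing views can be generated from the data dictionary:

```sql
-- For each source table column, emit either the real column name (if used)
-- or a CAST(NULL AS <datatype>) expression; the output is meant to be
-- assembled into CREATE OR REPLACE VIEW statements.
SELECT c.table_name,
       CASE
         WHEN u.column_name IS NOT NULL THEN c.column_name
         ELSE 'CAST(NULL AS ' || c.data_type ||
              CASE WHEN c.data_type IN ('VARCHAR2', 'CHAR')
                   THEN '(' || c.data_length || ')'
              END || ') AS ' || c.column_name
       END AS select_item
FROM   user_tab_columns c
LEFT   JOIN etl_used_columns u
       ON  u.table_name  = c.table_name
       AND u.column_name = c.column_name
WHERE  c.table_name LIKE 'SRC%'
ORDER  BY c.table_name, c.column_id;
```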