Thursday, May 20, 2010
OBIEE 10.1.3.4.1 with Weblogic 11gR1 and SSO / OID
I know I haven't posted in a little while...it's just been the daily grind of work and life, I suppose.
I'll be updating this post later to illustrate what we did to integrate OBIEE 10.1.3.4.1 with Weblogic 11gR1 and SSO/OID. Note that this changes the game, as we're no longer deploying the .WAR file to the old Oracle Application Server. mod_osso? Not quite! Weblogic 11gR1 changed the game a bit, and as a result some of the OBIEE documentation is not quite up to date (heck, I think parts of it have NEVER been rewritten since 2006).
Expecting certain things to work as they did with the old way of setting up SSO and passing credentials to OBIEE? Well, they won't be exactly the same, and sadly it wasn't documented anywhere. Nobody in the Oracle forums, on Metalink, or on any blog I had seen posted a solution. It is only a matter of time before it should be in the docs, as I know at least 3 clients of mine installed Weblogic recently, have OBIEE, and might consider SSO. I'll do what I can to cover the solution. Note to all: this was even more difficult because at my client, Weblogic was being installed and administered by another firm responsible for the security and web architecture setup. So we were at their mercy a bit, but they were helpful as well.
To leave you with one last bit before I write more in a later post, look at the diagram from the documentation below. I refer to this as the 'MASTER' plan because it should be practically memorized by most OBIEE practitioners, considering it shows all the default ports, applications, and so on. I highlighted the areas that should concern you: primarily where the BI Presentation Services plug-in resides, and its communication with BI Presentation Services itself. Say you have configured SSO correctly, configured MOST of OBIEE correctly, and done everything you believe is right, yet you still reach the point where you see the infamous OBIEE message "You are not logged in..." That's where the stumper comes in. In a nutshell, you will likely miss one or two things here, because Weblogic passes Presentation Services something it is not expecting. You will need to change a few things in your instanceconfig.xml file to accommodate this, or you can dive into Weblogic and change a few things there instead. I'll try to share more next time. We had to look at log file after log file to finally get this one right. Oh, and this was all on Linux, not that it matters much.
Anyway, at my client, a large pharmaceutical and health care insurance/benefits management company, I got to run into quite a few more advanced issues.
For one, anyone who has read the introductory Ralph Kimball books on dimensional modeling might be familiar with the concepts of late-arriving facts and late-arriving dimensions. I can get into the gory details later, but suffice to say, they are rampant all over the health care industry! A former colleague of mine was a data architect and ETL admin at a very large insurance company, and he told me how common they were, and what a pain in the rear-end as well! Now I got to see it first-hand, and realized it is VERY common, at least in that industry. It presents some unique architectural challenges for modeling and for handling the ETL. Not a terribly hard problem to solve, but nonetheless a good experience to get under the belt in an industry with so much late-arriving data.
Why would the healthcare industry have so much late-arriving data? Well, to summarize a few points, let's think about it. Think about ALL the data that has to travel from one place to the next. Your employer had to send your details to an insurance company. The insurance company had to set you up in the system and provide a temporary ID, maybe not even a real card yet. You go to the doctor and they file a claim, which is even more data exchanged. You may get a bill or invoice, or just a notice that a claim was filed, which might not even be a bill yet. Maybe a month later it becomes a bill and you need to pay the portion insurance didn't cover. There could be differential diagnoses, multiple prescriptions, and so forth. Yes, the claim might be filed, and it might qualify as a FACT in your fact table, but dimensional attributes such as drug price, diagnoses, payments, addresses, and so much more might change after the fact table has already been joined to its dimensional values on certain keys. Read the Kimball data modeling book for a clearer picture.
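Since I brought it up, here is a minimal sketch, in plain Oracle SQL, of the standard Kimball "inferred member" approach to a late-arriving dimension. Every name here (STG_CLAIM, STG_MEMBER, MEMBER_DIM, MEMBER_DIM_SEQ) is made up purely for illustration, not taken from any client system.
-- 1) A claim arrives before its member record: insert a placeholder ("inferred")
--    dimension row so the fact load can still resolve a surrogate key.
INSERT INTO member_dim (member_sk, member_id, member_name, is_inferred)
SELECT member_dim_seq.NEXTVAL, s.member_id, 'UNKNOWN', 'Y'
FROM (SELECT DISTINCT c.member_id
      FROM stg_claim c
      WHERE NOT EXISTS (SELECT 1 FROM member_dim d
                        WHERE d.member_id = c.member_id)) s;
-- 2) When the real member data finally shows up, overwrite the placeholder.
UPDATE member_dim d
SET (member_name, is_inferred) =
    (SELECT s.member_name, 'N' FROM stg_member s WHERE s.member_id = d.member_id)
WHERE d.is_inferred = 'Y'
AND EXISTS (SELECT 1 FROM stg_member s WHERE s.member_id = d.member_id);
The fact row can join to the placeholder surrogate key right away, so nothing downstream has to wait on the dimension feed.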
Posted by Greg at 7:48 PM 0 comments
Monday, March 8, 2010
Oracle Application Performance Tuning Course
I just got done custom-creating a new Oracle performance tuning / SQL tuning course, and teaching the beginning of it! This class is ideal for developers who know enough about the database to be a Jr. DBA (enough to be dangerous), and it emphasizes application tuning, SQL tuning, and things developers should be aware of. To summarize, the course I teach is a full 3-day class on the following topics:
The Performance Review Lifecycle
Database and Table Design Considerations
Database Architecture, Processes, and Parameters that a Developer Should be Aware of
Cursor Lifecycle including soft and hard parses
Bind variables, bind variable peeking and bind variables in your SQL (Dynamic SQL)
Cardinality, Selectivity, Computation, etc.
Statistics, and all subtopics like import/export, gathering, what they do, strategies.
The Oracle Optimizer (LOTS of extensive detail on this important topic)
Understanding how to read an Explain Plan from the various tools that produce one (a short example follows this topic list)
Other tools to help in tuning - TKPROF, Autotrace, AWR, statspack, ADDM, STA, SPA, etc.
Indexing
Optimizing Joins and join types
Hints
PL/SQL tuning
Utilizing ANSI SQL to out-perform classic Oracle SQL as you've seen it
V$Views to analyze for performance gains
How to identify a Performance Issue that you haven't seen yet.
Tuning in different environments (Oracle 8i, 9i, 10g, 11g, OLTP vs DW vs Bulk movements, etc.)
Tips and Tricks
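To give a small taste of the explain plan topic above, here is a minimal, hedged example. EMPLOYEES and DEPARTMENTS simply stand in for whatever tables you have handy, and DBMS_XPLAN ships with the database:
EXPLAIN PLAN FOR
SELECT e.last_name, d.department_name
FROM employees e
JOIN departments d ON d.department_id = e.department_id
WHERE e.salary > :min_salary;
-- Read back the plan that was just written to PLAN_TABLE
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);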
...and that's the full list of topics. The initial offering went EXTREMELY well. I was even surprised. Hopefully I will be teaching this class more often! I cannot stress the importance of understanding Oracle performance enough, especially to BI/DW types who are too used to doing simple joins between a few tables, or letting tools like OBIEE with their metadata layers take some of the work out of their hands...for now.
-Greg
Posted by Greg at 9:10 PM 0 comments
Friday, February 19, 2010
Oracle Data Integrator Tutorial- ODI for beginners
I am going to be writing up a few posts and tutorials on the basics of the architecture (see previous post) and a few beginner topics to get the masses in a mode where they are not intimidated by this tool. The previous post described various aspects to ODI, and the architecture. We saw various GUIs that ODI allows us to use. This post (and possibly another one or two) will look at the DESIGNER GUI in more detail.
One of the first things I will say is that the tool can feel a bit foreign if you come from an Informatica/OWB type background like mine. The emphasis is on building smaller, reusable pieces of code, if you will. Tools that are more ETL based (Informatica), or more GUI driven and less declarative (OWB), don't necessarily make it easy to break things into small pieces. Yes, many say that breaking mappings into smaller pieces is a best practice, but those tools don't necessarily thrive on your ability to follow that practice; they don't really revolve around it. There are mapplets and pluggable mappings, but these pale in comparison to the design methodology that ODI offers. ODI thrives on code reuse.
For example, the diagram below shows a typical ETL/ELT flow of the kind that might come out of OWB or Informatica (a bit bigger, more fully defined, and more manual).
The following is the type of development that ODI supports and emphasizes for code reuse:
If you are new to ODI and have used other ETL/ELT tools, you'll notice as you begin building out mappings that they tend to be more discrete, if you will. Many steps are also done for you, so you can actually build mappings quite fast in ODI. However, don't be too worried that the tool does so much for you. Like Oracle's JDeveloper, you can get into most of the gory details if you'd like to fine-tune certain aspects, even when the tool does much of the work for you.
You will do the bulk of your work doing mappings in the following interface:
A brief description of what we see in the diagram above. First, notice that the Diagram tab is selected at the top. This opens the diagramming tool where 'mappings' are built. In the middle are the data sources. These sources are dragged and dropped from the far left pane of the GUI into this 'sources' section and are then joined, filtered, and so forth. The right side of the GUI is labeled as the TARGET datastore. This is your target. For a beginner used to OWB this can be a bit confusing, because here the sources and the target are separated into different parts of the GUI. You drag your TARGET table into this target datastore pane, not into the same area as the sources. So, a bit different in that respect.
Below is a zoom-in on just the mapping portion of the tool. Again, on the left are the source portions of the mapping diagram, and on the right is the target datastore. You might not use the term 'datastore' very often, but in this tool you should get used to seeing it a lot!
Here is another screenshot of the far left portion of the main GUI, which I tend to call the 'explorer' since it is built out like many software tools' explorer-style interfaces.
Again, one final screenshot to tie together what we have seen so far:
In the next posting, I will go into a bit more detail on the 'explorer' and the many tabs and options available within it. As you might have imagined, these 'mappings' were created and opened from the explorer on the left, and they are technically referred to as 'interfaces'. I'll talk a bit more about the concept of interfaces, and go into more detail on working with your interfaces (a.k.a. mappings to most of us).
Posted by Greg at 9:37 AM 1 comments
Labels: Data Warehousing, ODI
Monday, February 8, 2010
ODI Oracle Data Integrator Architecture Part 1
The following series will be a high-level look at the pieces, components, and basic workings of ODI, which should help any reader understand the finer points of this data integration and ELT tool.
The graphical modules (GUIs) are shown above; each is described below.
The four ODI GUIs (Designer, Operator, Topology Manager, and Security Manager) are based on Java. They can be installed on any platform that supports Java Virtual Machine 1.4, including Windows, Linux, HP-UX, Solaris, pSeries, and so on.
Designer is the GUI for defining metadata, and rules for transformation and data quality. It uses these to generate scenarios for production, and is where all project development takes place. It is the core module for developers and metadata administrators.
Operator is used to manage and monitor ODI in production. It is designed for production operators and shows the execution logs with error counts, the number of rows processed, execution statistics, etc. At design time, developers use Operator for debugging purposes.
Topology Manager manages the physical and logical architecture of the infrastructure. Servers, schemas and agents are registered here in the ODI Master Repository. This module is usually used by the administrators of the infrastructure.
Security Manager manages users and their privileges in ODI. It can be used to give profiles and users access rights to ODI objects and features. This module is usually used by security administrators.
All ODI modules store their information in the centralized ODI repository.
Posted by Greg at 4:08 PM 0 comments
Labels: ODI
ETL Misconceptions and Mistakes Made While Working With Large Databases - 1TB or Greater
I'd first like to give some due credit to a fellow blogger who has pushed out a staggering amount of information on data warehouse topics: Dan Linstedt. This next piece on my blog covers some common ETL mistakes on large systems, with my commentary. His work appears frequently on the Beyenetwork blogs.
Please note that some of these issues listed below might not necessarily make a difference if you are dealing with a smaller system with less data / rows. If you have never worked on a larger size database, some of these concepts might even seem unfamiliar or odd.
Top ETL mistakes
1) Incorporating Inserts, Updates, and Deletes into the same data flow / same process.
This is a great point, if for nothing else than splitting the individual tasks into discrete pieces. For a target table load, use a separate insert job, a separate update job, and separate jobs for deletions or massive bulk loads. These will all vary in duration and load characteristics: inserts are often simple and easy, while updates and especially deletes can be more taxing.
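As a rough sketch of what splitting the flow can mean at the SQL level (the table names are hypothetical), the new-row work and the changed-row work become two separately scheduled statements instead of one combined process:
-- Job 1: insert only brand-new claims.
INSERT INTO claim_fact (claim_id, member_sk, claim_amt)
SELECT s.claim_id, s.member_sk, s.claim_amt
FROM stg_claim s
WHERE NOT EXISTS (SELECT 1 FROM claim_fact f WHERE f.claim_id = s.claim_id);
-- Job 2: update previously loaded claims in a separate run.
UPDATE claim_fact f
SET f.claim_amt = (SELECT s.claim_amt FROM stg_claim s WHERE s.claim_id = f.claim_id)
WHERE EXISTS (SELECT 1 FROM stg_claim s
              WHERE s.claim_id = f.claim_id AND s.claim_amt <> f.claim_amt);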
2) Sourcing multiple systems at the same time, depending on heterogeneous systems for data.
This can be cause for great concern if you try it, unless the data volume is small; again, though, we are referring to larger loads and larger systems here.
3) Targeting more than 1 or 2 target tables
You might consider 2 tables at once if they have a relationship (hierarchical, parent-child, or otherwise), but this should not happen often! That should be the exception; most large-volume ETL jobs should have 1 target table.
4) moving rows that are too wide through a single process
Any ways to get around this issue or mistake? Comments?
5) loading very large data sets to targets WITH INDEXES ON
Many tools allow you to disable indexes and then rebuild and re-enable them after a job (DataStage, OWB, etc.). You can code it custom if you need to. Again, you don't need indexes and keys during the load if you have other ways to check your referential integrity.
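Here's a minimal sketch of that disable/reload/rebuild pattern in Oracle; the index name is a placeholder, and SKIP_UNUSABLE_INDEXES is a session setting available in 10g and later:
-- Mark the index unusable so the bulk load doesn't maintain it row by row.
ALTER INDEX claim_fact_dt_ix UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;
-- ...run the bulk load here (direct-path INSERT /*+ APPEND */, SQL*Loader, etc.)...
-- Rebuild once after the load instead of maintaining the index per inserted row.
ALTER INDEX claim_fact_dt_ix REBUILD;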
6) not running a cost-based optimizer in the database
Big mistake here. You should always consider a routine, after all loads are finished, to analyze or gather statistics so that the cost-based optimizer (or the equivalent in a non-Oracle database) has the necessary information about the new structures, indexes, and so forth.
7) not keeping statistics up to date in the database
Again, this is almost exactly the same as #6. You want current statistics so the cost-based optimizer takes advantage of the current structure and uses the right joins, indexes, etc.
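On Oracle, the post-load step being described usually boils down to a DBMS_STATS call along these lines (the schema and table names are placeholders):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'DW',
    tabname          => 'CLAIM_FACT',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);  -- also gathers statistics on the table's indexes
END;
/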
8) not producing the correct indexes on the sources / lookups that need to be accessed
You should know well enough where the majority of your lookups and/or joins are going to happen....indexes....use them! But again, see mistake #5 above about disabling them during large loads.
9) not purchasing enough RAM for the ETL server to house the RAM caches in memory.
Not applicable to OWB necessarily, more for things like Informatica and DataStage.
10) running on a 32 bit environment which causes significant OS swapping to occur
11) running on a 32 bit environment which causes significant OS swapping to occur
12) running on a 32 bit environment which causes significant OS swapping to occur
The original author had such a major issue with #10 that he repeated it for #11 and #12!!!
The original author points out the following concerning #10-#12:
"Let me explain this one. Any single process started within a 32 bit environment is limited (immediately) to 2GB of addressable RAM, now - take away the RAM needed for the threads or the executable engine code... usually around 300M or more, leaving us with 1.5GB to 1.7GB of RAM to work with. Then, take away any "caching" objects that might use this RAM, and multiply that "drop" in RAM by the number of caching objects that you have....
Now, you are left with possibly 200MB, maybe 100MB of RAM left to allocate for "data reading/data writing"... and it drops from there.
Let me explain this other piece as well: Windows 32 bit OS MUST RUN CODE UNDER THE 640k BOUNDARY!!! So there's additional Limitations there, Windows 32 Bit PAGEFILE.SYS (swap/temp) is SINGLE THREADED, AND BLOCKS I/O OPERATIONS when swapping pieces to disk. Oh yes, one more: Windows 32 Bit - RAM allocation will ONLY give the application 1/2 of the requested RAM, and AUTOMATICALLY put 1/2 in the pagefile.sys swap area.
By the way, here's a tid-bit for you that you have to try to believe:
If you are running Windows 32 bit, SQLServer (either version)....
1. Change your ETL "block size" to 8k to match the Database (4k I think in SQLServer2000)
2. Change your row size to fit as many rows as possible into an 8k block, the more rows per block the faster the performance.
3. Change your commit point to 8,500 rows (this is the sweet spot)
4. Check in on NETWORK PACKET SIZES and increasing those between the ETL engine and the Database, increase them from 2k/4k to 8k - again to match the disk.
5. USE ODBC/OLE DB connectivity; do NOT use NATIVE libraries (except in SQLServer2005 direct CT-LIB). If your tool uses DB-LIB as "native" connectivity, it will be slow, slow slow - DB-LIB is 12 year old technology, and is NOT multi-threaded, nor parallel, where CT-LIB is.
Windows 64 Bit does NOT have these limitations, but requires 30% more hardware to run "cool" and efficiently, but it is FAST when installed and configured correctly on the right hardware.
Linux, and Unix do not have these limitations either.
So, architecture is everything."
13) Trying to do "too much" inside of a single data flow, increasing complexity and dropping performance
This does make the job harder to debug, and sometimes you get those random out-of-resource errors...Beware. Split work where appropriate. If a mapping or process looks too complicated, it probably is.
14) believing that "I need to process all the data in one pass because it's the fastest way to do it."
"breaking-up" a single flow into multiple flows , and changing the width of the data sets and possibly the parallelism along the way would help the performance.
15) Letting the database "bounce" errors back to the ETL tool, dropping flow rates and throughput rates by factors of 4x to 10x.
comments?
16) "THINKING" in a transactional mode, rather than a batch mode, and processing each row, one row at a time (like they would code a cursor in a database language).
A bit hard to avoid in a row-by-row ETL tool! The parallel architecture, caching, memory sharing, and a bunch of other things are what make those tools fast.
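To put the batch-versus-transactional point in database terms (again with hypothetical table names), here is the same update written both ways; on a large table, the single set-based statement will almost always win:
-- Transactional thinking: fetch and update one row at a time.
BEGIN
  FOR r IN (SELECT claim_id, claim_amt FROM stg_claim) LOOP
    UPDATE claim_fact SET claim_amt = r.claim_amt WHERE claim_id = r.claim_id;
  END LOOP;
END;
/
-- Batch thinking: one set-based statement the database can optimize and parallelize.
MERGE INTO claim_fact f
USING stg_claim s
ON (f.claim_id = s.claim_id)
WHEN MATCHED THEN UPDATE SET f.claim_amt = s.claim_amt;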
17) LOOPING inside an ETL process, because they think it's necessary (transactional processing again).
Careful here, no matter which tool you are using. Looping, as far back as it goes in simple procedural programming, can be a blessing or a disaster. If you need to loop, there's probably another, more efficient way to do what you're attempting. This doesn't necessarily mean a LOOP in an OWB process flow, but rather looping somehow in the ETL itself, in a called stored function or procedure, etc.
Posted by Greg at 11:08 AM 1 comments
Labels: Data Warehousing
Friday, February 5, 2010
GO New Orleans Saints!
I hope they win the super bowl and knock off those dumb Colts. They beat my Bears in 2006-2007, so GO SAINTS! Who DAT?!
Posted by Greg at 7:57 PM 0 comments
Date Dimensions anybody?
Here is some sample code for creating a date dimension. It's been out there before; my colleague Dave posted this on his blog at one point. I'd also like to take a moment sometime and look at how Oracle implements its date dimension with the OWB date dimension wizard....hmmmm!
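-- Generates one row per calendar day, from the day after :start_date through :end_date
-- (bind variables), deriving year, quarter, month, week, and weekday attributes.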
SELECT TO_CHAR (dt, 'YYYYMMDD') date_id,
dt date_value,
TO_CHAR (dt, 'MM/DD/YYYY') date_formatted_name,
TO_CHAR (dt, 'Mon DD, YYYY') date_name,
TO_CHAR (dt, 'YYYY') year_num,
TRUNC (dt, 'YYYY') year_start_date,
ADD_MONTHS (TRUNC (dt, 'YYYY'), 12) - 1 year_end_date,
TO_NUMBER (TO_CHAR (dt, 'Q')) quarter_num,
'Q' || TO_CHAR (dt, 'Q') quarter_name,
TRUNC (dt, 'Q') quarter_start_date,
ADD_MONTHS (TRUNC (dt, 'Q'), 3) - 1 quarter_end_date,
TO_CHAR (dt, 'YYYY') || '-' || 'Q' || TO_CHAR (dt, 'Q') year_quarter_name,
TO_CHAR (dt, 'MM') month_num,
TO_CHAR (dt, 'Mon') month_short_name,
TO_CHAR (dt, 'Month') month_long_name,
TRUNC (dt, 'MM') month_start_date,
LAST_DAY (dt) month_end_date,
TO_CHAR (dt, 'YYYY') || '-' || TO_CHAR (dt, 'Mon') year_month_name_1,
TO_CHAR (dt, 'YYYY') || '-' || TO_CHAR (dt, 'MM') year_month_num_1,
TO_CHAR (dt, 'Mon') || ' ' || TO_CHAR (dt, 'YYYY') year_month_name_2,
TO_CHAR (dt, 'YYYY') || TO_CHAR (dt, 'MM') year_month_num_2,
TO_CHAR (dt, 'WW') week_of_year_num,
TO_CHAR (dt, 'W') week_of_month_num,
TRUNC (dt, 'W') week_start_date,
TRUNC (dt, 'W') + 6 week_end_date,
TO_CHAR (dt, 'YYYY') || TO_CHAR (dt, 'WW') year_week_num,
TO_CHAR (dt, 'YYYY') || '-' || TO_CHAR (dt, 'WW') year_week_name,
TO_CHAR (dt, 'D') weekday_num,
TO_CHAR (dt, 'Day') weekday_name
FROM (SELECT TO_DATE (:start_date) + ROWNUM dt
FROM DUAL
CONNECT BY TO_DATE (:start_date) + ROWNUM <= TO_DATE(:end_date));
Posted by Greg at 7:52 PM 0 comments
Labels: Data Warehousing
Metadata reporting on the OWB 11gR1 repository- options?
There are quite a few clients, coworkers, and managers I've met who feel the documentation capabilities of OWB are, shall we say....lacking a bit. Sure, there is the OWB online repository browser, which is quite handy and gives you a lot of information. There is the metadata lineage and impact analysis diagramming, which again is quite handy. There are various tricks one can do with OMB*Plus or experts to whip up some simple documentation. Of course, when you want to get specific, it becomes a challenge.
Say you want a report, with minimal work, in a CSV or Excel spreadsheet, showing all sorts of values that you figure you can grab from the repository somehow. Again, you can accomplish some great things with OMB and experts, and I have done this, but many times it is too time-intensive and overkill. Sometimes you might consider UDOs (User Defined Objects), which can be created with OMB scripting and give you additional fields and properties to use on your objects. This can make things easy when you are at the beginning of a project and can tackle documenting things as you go along.
What if you're 50% or even 90% done with your project, and someone demands something fast? I say, look at the repository quickly, write some SQL, and get it out there fast.
Well, in OWB 11gR1 the repository is broken down into things called workspaces. These workspaces have separate owners, and they have users themselves. So if you want to access the many views in the OWBSYS schema that hold important metadata, you'll need to set some context for the user accessing the data, namely the various workspace owners. Below are a few steps that have been published in the forums and on the OWB blog that outline what you should do if you cannot see any data in OWBSYS's views.....
First read this.
Then....
THEN, here are 2 sample scripts that are in the Oracle forums as well. These should get you started, and get your mind thinking in terms of what other views you'd like to look at and what other columns you're interested in...as well as the various ways to join the views!
--This script focuses more on mapping contents...
select
distinct 'TARGET',
comp.map_name,
comp.data_entity_name,
comp.operator_type
from
all_iv_xform_map_components comp,
all_iv_xform_map_parameters param
where
lower(operator_type)
in ('table', 'view', 'dimension', 'cube')
and param.map_component_id = comp.map_component_id
and param.source_parameter_id is not null
UNION
select
distinct 'SOURCE',
t1.c1,
t1.c2,
t1.c3
from
(select
comp.map_name c1,
comp.data_entity_name c2,
comp.operator_type c3,
max(param.source_parameter_id) c4
from
all_iv_xform_map_components comp,
all_iv_xform_map_parameters param
where
lower(operator_type) in
('table', 'view', 'dimension', 'cube')
and param.map_component_id = comp.map_component_id
group by
comp.map_name, comp.data_entity_name, comp.operator_type) t1
where t1.c4 is null
order by 2,1
--This script focuses more on source to target tables and column mappings
SELECT
S_TBL_LOC ,
S_TBL_name,
S_FLD_name,
T_TBL_LOC ,
T_TBL_name,
T_FLD_name
FROM
(
SELECT
parms.map_component_id S_COMP_id ,
parms.parameter_id S_PARM_id ,
colms.entity_name S_TBL_name ,
colms.column_name S_FLD_name ,
tbls.schema_name S_TBL_LOC
FROM
all_iv_xform_map_parameters PARMS,
all_iv_xform_map_components COMPS,
all_iv_columns COLMS ,
all_iv_tables TBLS
WHERE
source_parameter_id IS NULL
AND parms.map_component_id = comps.map_component_id
AND UPPER(comps.operator_type) IN ('TABLE','DIMENSION')
AND parms.data_item_id =colms.column_id
AND colms.entity_id =tbls.table_id
)
,
(
SELECT
parms.map_component_id T_COMP_id ,
parms.parameter_id T_PARM_id ,
parms.source_parameter_id T_Sid ,
colms.entity_name T_TBL_name ,
colms.column_name T_FLD_name ,
tbls.schema_name T_TBL_LOC
FROM
all_iv_xform_map_parameters PARMS,
all_iv_xform_map_components COMPS,
all_iv_columns COLMS ,
all_iv_tables TBLS
WHERE
source_parameter_id IS NOT NULL
AND parms.map_component_id = comps.map_component_id
AND UPPER(comps.operator_type) IN ('TABLE','DIMENSION')
AND parms.data_item_id =colms.column_id
AND colms.entity_id =tbls.table_id
)
WHERE
S_PARM_id=T_Sid
Posted by Greg at 7:15 PM 0 comments
Labels: OWB
Sunday, January 31, 2010
Data Warehousing Training
I just got done custom creating an advanced data warehousing class, along with updating our beginner course to utilize Oracle 11g technologies. Whew, talk about a lot of work and only a short time to get it done! To summarize, the course that I teach is a full 5 day class on the following topics:
-Beginner Course Topics-
Beginning Data Warehousing
Dimensional Modeling
Oracle 11g Database options for Data Warehousing
Oracle Warehouse Builder OWB 11gR1 introduction
OWB11g Architecture and Installation
OWB11g and OWB 10gR2 differences, similarities
Exploring OWB11g Interface
Common setup of typical scenarios with remote machines and schemas
Building ETL Mappings from source to staging to target
Deep-Dive into OWB Mapping best practices and tips with the operators
Understanding Data Quality
Data Profiling and Data Rules Option within OWB11g
Advanced operators- Match & Merge, pre-post mapping operators, table functions, etc.
Understanding Workflow installation and Process Flow Basics
-Advanced Course Topics-
Oracle OLAP
Understanding Oracle's Position on OLAP & Hyperion Essbase
How to use OWB to construct full MOLAP cubes and how it differs from ROLAP
Extensive ETL Lab (2-3 hour lab encompassing many advanced techniques)
Pluggable Mappings
The Metadata Manager (Impact, Dependency, and Propagation)
Lifecycle Management (MDLs, Snapshots, Signatures, importation, exportation, comparison of snapshots, etc.)
Security in OWB
Extra detail on Process Flows and Scheduling
OMB*PLUS scripting
OWB Experts
Advanced Tips and Tricks for project management, and high Performance
...and that's all. I built out about 90% of this class and taught it recently to pretty good reviews... Let's hope there are more students in store as I prepare to build out a new offering for them!
Cheers
-Greg
Posted by Greg at 4:48 PM 0 comments
Labels: Data Warehousing, OWB, Training, Virtual Machines