Data Warehousing was an innovation from the 1990s that promised to change the data landscape for good. How far have we come? Many vendors have entered the marketplace because it makes sense to bring together data from throughout the organization, and this will continue to make sense in the future. Nobody knows yet how large the Data Warehouse market will grow, but it is certainly still growing fast, and is currently estimated at $4.5 billion per year (IDC).

1. Why Do Data Warehouse Projects Run Into Scope Creep?

To quote Bill Inmon (guru and author of several great books on Data Warehousing): "Traditional projects start with requirements and end with data. Data Warehousing projects start with data and end with requirements." As soon as the project gets under way, users will find new applications, and with them come new requests for data. Interestingly, these projects are often justified by moving query and reporting (Q&R) work away from the 'data people'. What we've seen is that the first thing that happens once the project delivers is that more requests for special queries are submitted to these same 'data people'. This may appear to undermine the initial business case, but it actually signals the onset of value creation from the DWH project.

2. Star Schema Versus Entity-Relationship Model?

There has been enormous debate in the community about the merits of different data models. At the risk of oversimplifying: Star models tend to give end users better performance (processing time), and are often perceived as "easier" to understand by end users. The drawbacks are that Star models require more disk space and, because of the intrinsic redundancy in the data, have consistency problems from a maintenance perspective. Having said this, in practice some combination of the two often seems unavoidable, whatever the preferences (ER or Star) of the chief architects. Overall, Star models seem to have gained the most ground.

3. The Importance of a Data Warehouse Business Case

Much has been written about the business case for a Data Warehouse. What goes into a good business case? IT savings are ubiquitous in DWH business cases. The important point is not to limit the case to 'pure' savings, but to connect it to primary business processes as much as possible. For example, faster turnaround cycles for list selections are fine (when quantified in hourly rates), but it is even better if the revenue from the additional customer acquisitions that follow from these selections can be tied in. Not only does relating the case to revenue growth rather than savings make for a more balanced business case; more important is the intrinsic business buy-in that results from a direct connection to the company bottom line. These days, changes in legislation (in particular Sarbanes-Oxley) play a major role in justifying business cases. The benefit may come either through a higher company valuation for transparent information gathering or through fewer sleepless nights for the CEO, which is of course priceless...

4. Why Do Data Warehouse Projects 'Never' Go Wrong?

Actually, Data Warehouse projects do sometimes fail. But they fail so rarely that it is hard to believe... especially after having talked to so many disgruntled end users. And there are many ways a Data Warehouse project can go wrong: missed delivery dates, data administration issues, and unavoidable data quality issues in feeding systems. Corporate politics (see Tip #7) are probably the best explanation for this phenomenon of near 100% success rates on DWH projects. In my experience, a failure or 'semi-failure' can go unnoticed either because senior management is not aware of it, or because it is, let's say, "unmotivated" to talk about misspending of company funds. As a result, not enough is learned. Maybe we as consultants have a stake in this as well, as it assures the industry plenty of ongoing business...

5. What Is Different About Warehousing Web Data?

Kimball & Merz (2000): "Although this clickstream data in many cases is raw and unvarnished, it has the potential of providing unprecedented detail about every gesture made by every human being using the Web medium". The subatomic nature of clickstream data poses unique challenges. There are fewer built-in feedback mechanisms to ensure data quality than in other data streams. The relation between user mouse clicks and server log records is not as tight as in "traditional" transaction processing, due to technical issues like proxy servers and caching. Because of these differences, IT people need to adapt to the web process flow, rather than having the process adapt to IT needs, as is common for most other DWH interfaces.

6. Which Data Should Be Loaded Into the Data Warehouse?

The data that enter the DWH ultimately determine its place in the organization. A "let's load all data, to be safe" attitude is a surefire way to derail your DWH project. Choices about what should and should not be included need to be made early on, to keep the project manageable. Once the DWH has been delivered, deployed, and profitably exploited, there will always be funding somewhere to include previously ignored interfaces. Given the anticipated lifecycle of the DWH, it makes perfect sense to consciously exclude certain sources. The choice of which data to include needs to be driven by business considerations, with particular reference to the company bottom line. If it can't be shown how data will be put to profitable use, they stay out! See also Tip #3.

7. Data Warehousing & Company Politics

Data Warehouses have an impact on the company bottom line. Hence, they are likely candidates for turf battles, and are also at risk of becoming "small change" in budget allocation negotiations. Neither of these serves the company's long-term goals. Managing a DWH project is hard enough as it is; budget issues shouldn't make it any harder. Because DWH investments are made in the present and the revenues lie in the future, it is all the more important to secure funding through a sound business case and buy-in from the appropriate (high) management level. See also Tip #3. Access to data means power, and talking about power is one of the greatest management taboos still around. Sensitive as they are, even budgets are more readily discussed...

8. Data Warehouse Project Traps

Some commonly recurring 'roadblocks' on the path to timely delivery of a Data Warehouse project:
- ETL processes have eaten up so much time (and still need "babysitters") that little if any time is left to develop the applications needed to exploit the DWH
- Some data are needed, but turn out to be unavailable, or not available in a timely fashion
- The maintenance required for tuning, indexing, and backup and recovery is severely underestimated
- Different ways of calculating the same phenomenon lead to different results, and nobody is able to conclusively explain the difference(s)
- The data that are loaded (and recombined) turn out to contain previously unknown inconsistencies in the source systems, the 'classic' data quality issues that trip up DWH projects
- Metadata were lacking, and developers spent inordinate amounts of time finding out what a field really 'means'

9. DWH Hardware and Software Go Hand in Hand

In Data Warehousing, it is not about hardware, and not about software: it is about the perfect integration of the two. Those who begin their project from either end will pay dearly for this mistake. The reasons are:
- in terms of price/performance, new, pre-integrated hardware-software combinations are taking the lead
- from a project management perspective, you never want to be caught between vendors when a proposed solution doesn't work as expected
- database tuning and indexing is a very important and hugely complex job, necessarily left to (in-house trained) specialists

10. Performance is Key

Although I don't often find technology factors to be this important, no other factor matters as much for Data Warehouse acceptance as performance. As the DWH grows over time, this factor becomes even more important. There are three reasons for this:

1. performance has a huge impact on development speed (the initial load is always very time-consuming), and hence on the overall maturity of the DWH at delivery time
2. performance can make or break end-user acceptance, in particular the predictability of performance
3.