Created by the Transaction Processing Performance Council (TPC) as a benchmark for data integration (ETL) applications, TPC-DI involves integrating multiple data sources and file formats into a data warehouse schema. The benchmark specifies requirements for both historical and incremental loading, as well as the transformations to be applied.
As outlined in the official TPC documentation, the benchmark models the data of a retail brokerage firm.
Source files originate from five primary systems:
- OLTP: CDC extracts, stored as bar-delimited (`|`) txt files
- HR: comma-delimited database extract
- Prospect List: comma-delimited csv
- Financial Newswire: fixed-width text in which the field layout varies by record type
- CRM: XML
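
Each of these formats needs a slightly different reader. The following is a minimal sketch, using only the Python standard library, of how the three format families (delimited, fixed-width, XML) might be parsed. The sample rows, field names, column positions, and helper functions (`read_delimited`, `read_fixed_width`, `read_xml`) are hypothetical and chosen for illustration only; they are not the real TPC-DI file layouts, which are defined in the TPC specification.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real TPC-DI files have different columns.
BAR_SAMPLE = "I|1001|2017-07-07|ACTV\nU|1001|2017-07-08|INAC\n"
CSV_SAMPLE = "4001,Ada,Lovelace\n4002,Alan,Turing\n"
FIXED_SAMPLE = "CMPAcme Corp \nSECACME  \n"
XML_SAMPLE = "<Actions><Action ActionType='NEW'><Customer C_ID='1'/></Action></Actions>"

# Hypothetical fixed-width layouts: record type -> [(field, start, end), ...].
# These are NOT the real Financial Newswire field positions.
FIXED_LAYOUTS = {
    "CMP": [("name", 3, 13)],
    "SEC": [("symbol", 3, 8)],
}


def read_delimited(text, delimiter):
    """Parse bar- or comma-delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, layouts, type_slice=slice(0, 3)):
    """Parse fixed-width lines whose field layout depends on a record-type code."""
    records = []
    for line in text.splitlines():
        rec_type = line[type_slice]
        fields = {"type": rec_type}
        for name, start, end in layouts[rec_type]:
            fields[name] = line[start:end].strip()
        records.append(fields)
    return records


def read_xml(text, tag):
    """Return the attributes of every element with the given tag."""
    root = ET.fromstring(text)
    return [dict(element.attrib) for element in root.iter(tag)]


if __name__ == "__main__":
    print(read_delimited(BAR_SAMPLE, "|"))
    print(read_delimited(CSV_SAMPLE, ","))
    print(read_fixed_width(FIXED_SAMPLE, FIXED_LAYOUTS))
    print(read_xml(XML_SAMPLE, "Customer"))
```

The fixed-width reader dispatches on a record-type code because a single file can mix several layouts; in a real load, the slice positions would come from the specification rather than being hard-coded as above.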
The destination data warehouse (DWH) schema contains the following tables:
Facts | Dimensions | Reference |
---|---|---|
fCashBalances | dCustomers | TradeTypes |
fHoldings | dAccounts | StatusTypes |
fWatches | dBrokers | TaxRates |
fMarketHistory | dSecurities | Industries |
fProspects | dCompanies | Financials |
fTrade | dDate, dTime | |