Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Wanet Telecoms Ltd on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

star schema - fact and dimensional tables question 1

Status
Not open for further replies.

jared5494

Technical User
Jun 9, 2002
111
US
i know using a star schema can be used to answer questions such as: who are the top 5% salespersons, what was the best month in sales, etc.

i am having a hard time finding information on how to do this in coding. i know you have to set up a fact table and dimensional tables (i am using the Northwind Database that ms sql provides), but how do you put the data into the fact tables? i tried all sorts of google searches to find info on star schemas, but came up short.

please help!
 
Can you be a bit more specific?

Have you modelled your schema? Your Fact I'm presuming should represent a business process? and you populate that table when an event that you want to capture has happened usually from your OLTP relational database. Such as a payment has been made. This is the ETL process to extract transform and load the data into your mart. You may want to create a view or stored procedure that runs everynight to insert event based data that has been added (for example such as a new payment that day) or amended(for example payment id has been changed) and store information such as a transaction date, transaction type and perhaps the load id, then store the surrogate foriegn keys that relate to your dimensions.
 
When you load your data from your oltp app to your fact tables you are going to be doing a denormilazing of some of the data and additional mormalizing of some of the data. Assume that you have 3 tables with customer data in it and one of the tables has a field called NewCustomerDate with a datatype of datetime. To get the data into the fact table you will want to move those three tables into a single fact table. Then any field which will be used for rolling up (such as our NewCustomerDate field) will be broken out into a dimention. The NewCustomerDate field should no relate to a dimention called dim_Date (or something to that effect). As you load the data from the oltp schema to the olap schema (facts and dims) you will be using a series of joins to the dimentions to get the translated values for the fact table.

Did I answer your question, or did I totally go off towards left field?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
thanks for your replies.

here's the star schema i was going to test:

here is the northwind database diagram:

my question is that do i just populate the dimension tables with something such as the CUSTOMER dimesional table:

SELECT CUSTOMERID, COMPANYNAME, CITY, COUNTRY
INTO CUSTOMER
FROM NORTHWIND..CUSTOMER

or is there something more complicated required?
 
For the initial data load, that's exactally what you want to do.

If this was a live system you would need some more complex logic to handle updated rows and new rows.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
thanks,
i do have a question about the time dimension table. i know its so you can see monthly/weekly statistics, but how do you populate the time table?
 
Sadly the easiest way is via a cursor. You'll want to pre load it with a few years worth of data then at the begining of each year load it with another years worth of data.

Some people will preload it with data for the next hundred years so that they don't have to deal with it again.

My preference is to figure out what the latest date in the database currently is. Assume that it's today. Then add 3 years, so you can 4/20/2010. Then go to Dec 31 for that year so you get 12/31/2010. Then populate the table with dates starting from a couple years before the first date to the date you just figured out. Then have a job scheduled to run on Jan 1 of each year. Have the job find the highest date and add one year, then start adding values until you reach that point.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
thanks,
i will insert the time dimension table (for other user's seach):

CREATE TABLE TIME
(
TIMEID int IDENTITY(1,1)
PRIMARY KEY CLUSTERED,
TIME_DATE SMALLDATETIME,
TIME_YEAR SMALLINT,
TIME_DAY VARCHAR(10),
TIME_MONTH VARCHAR(10),
TIME_QUARTER VARCHAR(2)
)

DECLARE @CurrDate SMALLDATETIME

SET @CurrDate = '01/01/1996'

WHILE @CurrDate < '01/01/1999'
BEGIN
INSERT TIME
(TIME_DATE, TIME_YEAR, TIME_DAY, TIME_MONTH,
TIME_QUARTER)
VALUES
(@CurrDate, -- TIME_DATE
YEAR(@CurrDate), -- TIME_YEAR
DATENAME(dw, @CurrDate), -- TIME_DAY
DATENAME(month, @CurrDate), -- TIME_MONTH
'Q' + DATENAME(quarter,@CurrDate)) -- QuarterOfYear

SET @CurrDate = DATEADD(day, 1, @CurrDate)
END
 
Looks good to me.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
ok last question! =)

thanks again for all your help!

ok, the fact table, heres the code i tried:

SELECT NORTHWIND.."ORDER DETAILS".PRODUCTID,
NORTHWIND..ORDERS.CUSTOMERID,
NORTHWIND..ORDERS.EMPLOYEEID,
TIMEID,
NORTHWIND..SUPPLIERS.SUPPLIERID,
NORTHWIND.."ORDER DETAILS".UNITPRICE,
NORTHWIND.."ORDER DETAILS".QUANTITY
INTO ORDERS_FACT
FROM NORTHWIND..ORDERS,
NORTHWIND.."ORDER DETAILS",
TIME, NORTHWIND..SUPPLIERS,
NORTHWIND..PRODUCTS,
NORTHWIND..CUSTOMERS,
NORTHWIND..EMPLOYEES
WHERE NORTHWIND..PRODUCTS.PRODUCTID = NORTHWIND.."ORDER DETAILS".PRODUCTID
AND NORTHWIND..CUSTOMERS.CUSTOMERID = NORTHWIND..ORDERS.CUSTOMERID
AND NORTHWIND..EMPLOYEES.EMPLOYEEID = NORTHWIND..ORDERS.EMPLOYEEID
AND TIME.TIME_DATE = NORTHWIND..ORDERS.ORDERDATE



however, sql is saying:
Could not allocate space for object 'ORDERS_FACT' in database 'cis31021' because the 'PRIMARY' filegroup is full.

im thinking the query above just made a more super huge file, that was over the size that should be needed.
 
ok i forgot a few lines, here is the solution i think is right:

i added:

AND NORTHWIND..SUPPLIERS.SUPPLIERID = NORTHWIND..PRODUCTS.SUPPLIERID
AND NORTHWIND..ORDERS.ORDERID = NORTHWIND.."ORDER DETAILS".ORDERID

to the WHERE clause and it shortened it from 350,000 lines to 2155 lines LOL. whoops. i guess there is a reason why the size is limited.

would that the the only additions?
thanks!
 
That's one of the reasons that the new join format works better than the old join format.

It makes it easier to see missing paramaters. Looks like you got all of them.

Code:
SELECT NORTHWIND..[ORDER DETAILS].PRODUCTID, 
       NORTHWIND..ORDERS.CUSTOMERID, 
       NORTHWIND..ORDERS.EMPLOYEEID, 
       TIME.TIMEID, 
       NORTHWIND..SUPPLIERS.SUPPLIERID, 
       NORTHWIND..[ORDER DETAILS].UNITPRICE, 
       NORTHWIND..[ORDER DETAILS].QUANTITY
INTO ORDERS_FACT
FROM NORTHWIND..ORDERS
JOIN NORTHWIND..[ORDER DETAILS] ON NORTHWIND..ORDERS.ORDERID = NORTHWIND..[ORDER DETAILS].ORDERID

JOIN TIME ON TIME.TIME_DATE = NORTHWIND..ORDERS.ORDERDATE
JOIN NORTHWIND..PRODUCTS ON NORTHWIND..PRODUCTS.PRODUCTID = NORTHWIND..[ORDER DETAILS].PRODUCTID
JOIN NORTHWIND..SUPPLIERS ON NORTHWIND..SUPPLIERS.SUPPLIERID = NORTHWIND..PRODUCTS.SUPPLIERID
JOIN NORTHWIND..CUSTOMERS ON NORTHWIND..CUSTOMERS.CUSTOMERID = NORTHWIND..ORDERS.CUSTOMERID
JOIN NORTHWIND..EMPLOYEES ON NORTHWIND..EMPLOYEES.EMPLOYEEID = NORTHWIND..ORDERS.EMPLOYEEID

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top