Given the following star schema tables.
# geog_abb time_date amount value
#1: AL 2013-03-26 55.57 9113.3898
#2: CO 2011-06-28 19.25 9846.6468
#3: MI 2012-05-15 94.87 4762.5398
#4: SC 2013-01-22 29.84 649.7681
#5: ND 2014-12-03 37.05 6419.0224
#    geog_abb  geog_name   geog_division_name  geog_region_name
#1:  AK        Alaska      Pacific             West
#2:  AL        Alabama     East South Central  South
#3:  AR        Arkansas    West South Central  South
#4:  AZ        Arizona     Mountain            West
#5:  CA        California  Pacific             West
# time_date time_weekday time_week time_month time_month_name time_quarter time_quarter_name time_year
#1: 2010-01-01 Friday 1 1 January 1 Q1 2010
#2: 2010-01-02 Saturday 1 1 January 1 Q1 2010
#3: 2010-01-03 Sunday 1 1 January 1 Q1 2010
#4: 2010-01-04 Monday 1 1 January 1 Q1 2010
#5: 2010-01-05 Tuesday 1 1 January 1 Q1 2010
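The example is stripped of surrogate keys to improve readability. As a result, some levels in the hierarchies have no attributes other than their own key. Don't worry about that; they are still levels in the hierarchy.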
            GEOGRAPHY (all fields)
           /
          /
      FACT
          \
           \
            TIME (all fields)
                          geog_region_name
                         /
                  geog_division_name
                 /
          geog_abb (+ geog_name)
         /
        /
    FACT
        \
         \
          time_date
              |
hierarchies:  |
     weekly  / \  monthly
            /   \
           /     \
  time_weekday   time_month (+ time_month_name)
       |              |
       |              |
  time_week      time_quarter (+ time_quarter_name)
       |              |
       |              |
  time_year      time_year
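A minimal sketch of what the snowflake layout above implies for a roll-up to region: the query walks one join per hierarchy level. The table names `geog_state`, `geog_division` and `geog_region`, the use of SQLite, and the fact amounts are assumptions made for illustration; only the column names come from the tables in this post.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE geog_region   (geog_region_name TEXT PRIMARY KEY);
CREATE TABLE geog_division (geog_division_name TEXT PRIMARY KEY,
                            geog_region_name TEXT REFERENCES geog_region);
CREATE TABLE geog_state    (geog_abb TEXT PRIMARY KEY, geog_name TEXT,
                            geog_division_name TEXT REFERENCES geog_division);
CREATE TABLE fact          (geog_abb TEXT REFERENCES geog_state,
                            time_date TEXT, amount REAL);
""")

# Toy rows: dimension values follow the post, fact amounts are made up.
con.executemany("INSERT INTO geog_region VALUES (?)", [("South",), ("West",)])
con.executemany("INSERT INTO geog_division VALUES (?, ?)",
                [("Pacific", "West"), ("Mountain", "West"),
                 ("East South Central", "South")])
con.executemany("INSERT INTO geog_state VALUES (?, ?, ?)",
                [("AK", "Alaska", "Pacific"), ("AZ", "Arizona", "Mountain"),
                 ("AL", "Alabama", "East South Central")])
con.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                [("AK", "2010-01-01", 10.0), ("AK", "2010-01-02", 20.0),
                 ("AZ", "2010-01-01", 7.5), ("AL", "2010-01-01", 5.0)])

# Snowflake roll-up: fact -> state -> division -> region, one join per level.
region_totals = con.execute("""
    SELECT r.geog_region_name, SUM(f.amount)
    FROM fact f
    JOIN geog_state    s ON s.geog_abb = f.geog_abb
    JOIN geog_division d ON d.geog_division_name = s.geog_division_name
    JOIN geog_region   r ON r.geog_region_name = d.geog_region_name
    GROUP BY r.geog_region_name
    ORDER BY r.geog_region_name
""").fetchall()

print(region_totals)  # [('South', 5.0), ('West', 37.5)]
```

With natural keys the last join could be dropped, because `geog_division` already carries `geog_region_name`; it is kept here to show the level-by-level traversal.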
I don't know whether the next one has a specific name. Starflake? :)
                  |>-- geog_region_name
                  |
                  |>-- geog_division_name
                  |
                  |>-- geog_abb (+ geog_name)
                  |
                  |
            geography base
           /
          /
      FACT
          \
           \
            time base
                  |
                  |
                  |>-- time_date
                  |
                  |>-- time_weekday
                  |
                  |>-- time_week
                  |
                  |>-- time_month (+ time_month_name)
                  |
                  |>-- time_quarter (+ time_quarter_name)
                  |
                  |>-- time_year
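It basically has a dimension base table that stores the identity of every level of every hierarchy in the dimension. There is no need to traverse the snowflake levels recursively, and there are potentially fewer joins. The data is still well normalized; only the keys are denormalized into the base table. Every level in every hierarchy is tied to the dimension's lowest-granularity key in the dimension base. In addition, having the dimension base table makes it possible to handle time-variant attributes and temporal queries in that table, at the granularity of a hierarchy level.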
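A minimal sketch of the dimension base idea for the geography dimension, assuming SQLite and hypothetical table names (`geog_base`, `geog_state`, `geog_division`, `geog_region`); the fact amounts are made up. Because the base table holds the key of every level, a roll-up to any level needs a single join, and a level table is joined only when one of its attributes is wanted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Level tables: one per hierarchy level, holding that level's own attributes.
CREATE TABLE geog_region   (geog_region_name TEXT PRIMARY KEY);
CREATE TABLE geog_division (geog_division_name TEXT PRIMARY KEY);
CREATE TABLE geog_state    (geog_abb TEXT PRIMARY KEY, geog_name TEXT);
-- Dimension base: the keys of all levels, one row per lowest-grain key.
CREATE TABLE geog_base (
    geog_abb           TEXT PRIMARY KEY REFERENCES geog_state,
    geog_division_name TEXT REFERENCES geog_division,
    geog_region_name   TEXT REFERENCES geog_region
);
CREATE TABLE fact (geog_abb TEXT REFERENCES geog_base,
                   time_date TEXT, amount REAL);
""")

con.executemany("INSERT INTO geog_region VALUES (?)", [("South",), ("West",)])
con.executemany("INSERT INTO geog_division VALUES (?)",
                [("Pacific",), ("Mountain",), ("East South Central",)])
con.executemany("INSERT INTO geog_state VALUES (?, ?)",
                [("AK", "Alaska"), ("AZ", "Arizona"), ("AL", "Alabama")])
con.executemany("INSERT INTO geog_base VALUES (?, ?, ?)",
                [("AK", "Pacific", "West"), ("AZ", "Mountain", "West"),
                 ("AL", "East South Central", "South")])
# Made-up fact amounts, for illustration only.
con.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                [("AK", "2010-01-01", 10.0), ("AK", "2010-01-02", 20.0),
                 ("AZ", "2010-01-01", 7.5), ("AL", "2010-01-01", 5.0)])

# Roll-up to region: one join to the base table, no level-by-level walk.
region_totals = con.execute("""
    SELECT b.geog_region_name, SUM(f.amount)
    FROM fact f
    JOIN geog_base b ON b.geog_abb = f.geog_abb
    GROUP BY b.geog_region_name
    ORDER BY b.geog_region_name
""").fetchall()

# An attribute of a level (here geog_name) comes from that level's own table.
state_totals = con.execute("""
    SELECT s.geog_name, SUM(f.amount)
    FROM fact f
    JOIN geog_state s ON s.geog_abb = f.geog_abb
    GROUP BY s.geog_name
    ORDER BY s.geog_name
""").fetchall()

print(region_totals)  # [('South', 5.0), ('West', 37.5)]
print(state_totals)   # [('Alabama', 5.0), ('Alaska', 30.0), ('Arizona', 7.5)]
```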
Still on natural keys!
# geog_abb time_date amount value
# 1: AK 2010-01-01 154.43 12395.472
# 2: AK 2010-01-02 88.89 6257.639
# 3: AK 2010-01-03 81.74 7193.075
# 4: AK 2010-01-04 165.87 11150.619
# 5: AK 2010-01-05 8.75 6953.055
# time_date time_year time_quarter time_month time_week time_weekday
# 1: 2010-01-01 2010 1 1 1 Friday
# 2: 2010-01-02 2010 1 1 1 Saturday
# 3: 2010-01-03 2010 1 1 1 Sunday
# 4: 2010-01-04 2010 1 1 1 Monday
# 5: 2010-01-05 2010 1 1 1 Tuesday
# time_year
# 1: 2010
# 2: 2011
# 3: 2012
# 4: 2013
# 5: 2014
# time_quarter time_quarter_name
# 1: 1 Q1
# 2: 2 Q2
# 3: 3 Q3
# 4: 4 Q4
# time_month time_month_name
# 1: 1 January
# 2: 2 February
# 3: 3 March
# 4: 4 April
# 5: 5 May
# time_week
# 1: 1
# 2: 2
# 3: 3
# 4: 4
# 5: 5
# time_weekday
# 1: Friday
# 2: Monday
# 3: Saturday
# 4: Sunday
# 5: Thursday
# time_date time_week time_weekday time_year
# 1: 2010-01-01 1 Friday 2010
# 2: 2010-01-02 1 Saturday 2010
# 3: 2010-01-03 1 Sunday 2010
# 4: 2010-01-04 1 Monday 2010
# 5: 2010-01-05 1 Tuesday 2010
#     geog_abb  geog_region_name  geog_division_name
# 1:  AK        West              Pacific
# 2:  AL        South             East South Central
# 3:  AR        South             West South Central
# 4:  AZ        West              Mountain
# 5:  CA        West              Pacific
# geog_region_name
# 1: North Central
# 2: Northeast
# 3: South
# 4: West
# geog_division_name
# 1: East North Central
# 2: East South Central
# 3: Middle Atlantic
# 4: Mountain
# 5: New England
#     geog_abb  geog_name   geog_division_name  geog_region_name
# 1:  AK        Alaska      Pacific             West
# 2:  AL        Alabama     East South Central  South
# 3:  AR        Arkansas    West South Central  South
# 4:  AZ        Arizona     Mountain            West
# 5:  CA        California  Pacific             West
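The dimension base could also store the attributes of its own primary key. That would remove the lowest level of the dimension as a separate table, but it would be less consistent (the time_date level of both hierarchies would be folded into the time dimension base table).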
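What would be the drawbacks of such a schema? I don't care much about the speed of joins and aggregation, or about how well query tools adapt to it. Does it have a name? Is it being used? If not, why?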
Answer 0: (score: 3)
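You are building a snowflake schema with shortcuts.
It is in use, and BI tools can easily take advantage of the shortcuts.
You can also add a shortcut from a parent level of a dimension directly to a fact table whose grain is a child level of that dimension. It works and lets you skip joins, but you need to store an extra column in the fact table.
The only concern is data integrity: if a parent-child relationship changes, you need to update not only the child table but also every other table that stores this relationship.
That is not a big deal if you regenerate the dimension tables from the normalized data every time, but you need to be even more careful if you store parent IDs in the fact table.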
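A minimal sketch of the integrity concern, assuming SQLite and hypothetical names (`geog_base`, `geog_state`, `geog_division`, the helpers `rebuild_base` and `stale_rows`, and the made-up region `Southwest`). The normalized tables are treated as the source of truth; after a parent-child change the denormalized copy is stale until it is regenerated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Normalized parent-child relations (treated here as the source of truth).
CREATE TABLE geog_division (geog_division_name TEXT PRIMARY KEY,
                            geog_region_name TEXT);
CREATE TABLE geog_state    (geog_abb TEXT PRIMARY KEY,
                            geog_division_name TEXT);
-- Denormalized shortcut table that repeats the same relations.
CREATE TABLE geog_base     (geog_abb TEXT PRIMARY KEY,
                            geog_division_name TEXT,
                            geog_region_name TEXT);
""")
con.executemany("INSERT INTO geog_division VALUES (?, ?)",
                [("Pacific", "West"), ("Mountain", "West")])
con.executemany("INSERT INTO geog_state VALUES (?, ?)",
                [("AK", "Pacific"), ("AZ", "Mountain"), ("CA", "Pacific")])


def rebuild_base(con):
    """Regenerate the shortcut table from the normalized tables."""
    con.execute("DELETE FROM geog_base")
    con.execute("""
        INSERT INTO geog_base
        SELECT s.geog_abb, s.geog_division_name, d.geog_region_name
        FROM geog_state s
        JOIN geog_division d ON d.geog_division_name = s.geog_division_name
    """)


def stale_rows(con):
    """Return base rows whose region disagrees with the normalized data."""
    return con.execute("""
        SELECT b.geog_abb
        FROM geog_base b
        JOIN geog_division d ON d.geog_division_name = b.geog_division_name
        WHERE b.geog_region_name <> d.geog_region_name
        ORDER BY b.geog_abb
    """).fetchall()


rebuild_base(con)
before = stale_rows(con)   # [] : the copy matches the source

# Hypothetical change: the Mountain division moves to a made-up region.
con.execute("UPDATE geog_division SET geog_region_name = 'Southwest' "
            "WHERE geog_division_name = 'Mountain'")
stale = stale_rows(con)    # [('AZ',)] : the shortcut copy is now out of date

rebuild_base(con)
after = stale_rows(con)    # [] : regenerating restores consistency

print(before, stale, after)
```

A parent ID stored in the fact table would need the same kind of check or rebuild, over far more rows.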
Answer 1: (score: 3)
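What you are doing is not a snowflake schema... it is similar to "Data Vault" and to our own variant, the "Link-Model". It essentially creates link tables that contain only keys and sit between the Fact tables and the Dim tables (and other Dim tables). We describe them as entity tables and measure tables, though.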
The advantages are
The disadvantages are