SCD (Slowly Changing Dimension) in Hive
Let's #hive
What is SCD, and what are the types of #SCD in Hive?
✅ In the context of Hive (a data warehouse system built on top of Hadoop and queried with HiveQL, a SQL-like language), SCD stands for Slowly Changing Dimension.
✅ Slowly Changing Dimensions are techniques for handling changes to dimensional data over time, so you can decide how much history to keep and analyze data as it was at different points in time.
Three types of Slowly Changing Dimensions (SCD) are most commonly used in Hive:
✅ Type 1 SCD (Historical Overwrite):
▪ In Type 1 SCD, when a dimension record changes, the existing record is updated in place and the new values overwrite the old ones.
▪ This approach does not preserve history: previous versions of the record are lost.
▪ Type 1 SCD is suitable when history is not important and you only need the latest version of the data.
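Here is a minimal sketch of the Type 1 overwrite. It runs against Python's built-in sqlite3, not Hive, so treat it as an illustration of the logic only. The table `dim_customer`, its columns, and the helper `apply_type1` are hypothetical names made up for this example. In Hive itself, an UPDATE like this only works on transactional (ACID) tables.

```python
import sqlite3


def apply_type1(conn, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is not kept."""
    conn.execute(
        "UPDATE dim_customer SET city = ? WHERE customer_id = ?",
        (new_city, customer_id),
    )


# Hypothetical dimension table with a single tracked attribute (city).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'Asha', 'Pune')")

# The customer moves: the old city is simply overwritten.
apply_type1(conn, 1, "Mumbai")

rows = conn.execute(
    "SELECT customer_id, name, city FROM dim_customer"
).fetchall()
print(rows)  # [(1, 'Asha', 'Mumbai')]
```

After the update there is still exactly one row per customer, and nothing in the table says the city was ever 'Pune'.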
✅ Type 2 SCD (Historical Tracking - Add New Record):
▪ Type 2 SCD preserves the history of dimension data by adding a new record whenever a change occurs.
▪ The new record contains the updated data along with one or more additional attributes, such as a timestamp or version number, that indicate when the change occurred.
▪ This approach maintains full history, but the dimension table grows with every change.
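A rough sketch of the Type 2 pattern, again using Python's built-in sqlite3 as a stand-in for Hive. Everything named here is an assumption for illustration: the `start_date` / `end_date` / `is_current` columns, the `9999-12-31` sentinel, the surrogate key, and the helper `apply_type2`. Closing out the previous row is one common convention, not the only way to do it. In Hive, the same two steps are often expressed with a MERGE statement on a transactional table.

```python
import sqlite3

OPEN_END = "9999-12-31"  # hypothetical sentinel meaning "still current"


def apply_type2(conn, customer_id, new_city, change_date):
    """Type 2: close the current row (if any), then add a new current row."""
    # Step 1: mark the existing current version as historical.
    conn.execute(
        "UPDATE dim_customer SET end_date = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    # Step 2: insert the new version; it gets its own surrogate key.
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, start_date, end_date, is_current) "
        "VALUES (?, ?, ?, ?, 1)",
        (customer_id, new_city, change_date, OPEN_END),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,"
    " customer_id INTEGER, city TEXT,"
    " start_date TEXT, end_date TEXT, is_current INTEGER)"
)

apply_type2(conn, 1, "Pune", "2023-01-01")    # first version
apply_type2(conn, 1, "Mumbai", "2024-06-15")  # change: a new row is added

history = conn.execute(
    "SELECT customer_id, city, start_date, end_date, is_current "
    "FROM dim_customer ORDER BY surrogate_key"
).fetchall()
for row in history:
    print(row)
```

Both versions stay in the table: the 'Pune' row is closed on 2024-06-15 and the 'Mumbai' row is flagged as current, so a query can pick whichever version was valid on a given date.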
✅ Type 3 SCD (Historical Tracking - Add Columns):
▪ Type 3 SCD adds new columns to the existing dimension table to track changes. For example, you might have columns like "current_value" and "previous_value" to store the current and previous versions of an attribute.
▪ This approach keeps the dimension table relatively compact while still allowing some historical tracking.
▪ However, it can only track as many past values as there are extra columns, so it retains far less history than Type 2 SCD.
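To make the Type 3 limitation concrete, here is a small sketch, once more on Python's built-in sqlite3 rather than Hive. The columns `current_city` and `previous_city` and the helper `apply_type3` are hypothetical, chosen to mirror the "current_value" / "previous_value" idea above.

```python
import sqlite3


def apply_type3(conn, customer_id, new_city):
    """Type 3: shift the current value into the 'previous' column, then overwrite."""
    # The right-hand sides are evaluated against the row as it was before
    # the update, so previous_city receives the old current_city.
    conn.execute(
        "UPDATE dim_customer "
        "SET previous_city = current_city, current_city = ? "
        "WHERE customer_id = ?",
        (new_city, customer_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " current_city TEXT, previous_city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'Pune', NULL)")

apply_type3(conn, 1, "Mumbai")  # previous_city = 'Pune'
apply_type3(conn, 1, "Delhi")   # previous_city = 'Mumbai'; 'Pune' is gone

rows = conn.execute(
    "SELECT customer_id, current_city, previous_city FROM dim_customer"
).fetchall()
print(rows)  # [(1, 'Delhi', 'Mumbai')]
```

With one "previous" column, only one step of history survives. After the second change the original 'Pune' value is no longer recoverable.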
✅ Each SCD type has its own trade-offs, so the right choice depends on how much history your analysis and reporting actually need.