
Commit c78f718

authored
format
1 parent 90df784 commit c78f718

1 file changed

Lines changed: 31 additions & 27 deletions

0_Azure/2_AzureAnalytics/0_Fabric/demos/17_Overview.md

@@ -33,22 +33,23 @@ Last updated: 2024-12-31
3333
<details>
3434
<summary><b>Table of Contents</b> (Click to expand)</summary>
3535

36-
- [Fabric Overview](#fabric-overview)
37-
- [Wiki](#wiki)
38-
- [Overview](#overview)
39-
- [Key Components](#key-components)
40-
- [Features](#features)
41-
- [OneLake in Microsoft Fabric](#onelake-in-microsoft-fabric)
42-
- [Lakehouse & Data Warehouse](#lakehouse--data-warehouse)
43-
- [Parquet & Delta Data Formats](#parquet--delta-data-formats)
44-
- [Dataflow Gen2 & Data Pipelines](#dataflow-gen2--data-pipelines)
45-
- [Shortcuts & Mirroring](#shortcuts--mirroring)
46-
- [Data Factory](#data-factory)
47-
- [Medallion Architecture Overview](#medallion-architecture-overview)
48-
- [Fabric: Highlights into AI/LLMs](#fabric-highlights-into-aillms)
49-
- [Writing SQL: SQL Analytics Endpoint](#writing-sql-sql-analytics-endpoint)
50-
- [How to Configure and Use the SQL Analytics Endpoint](#how-to-configure-and-use-the-sql-analytics-endpoint)
51-
- [Fabric AI Skill](#fabric-ai-skill)
36+
- [Wiki](#wiki)
37+
- [Content](#content)
38+
- [Overview](#overview)
39+
- [Key Components](#key-components)
40+
- [Features](#features)
41+
- [OneLake in Microsoft Fabric](#onelake-in-microsoft-fabric)
42+
- [Lakehouse & Data Warehouse](#lakehouse--data-warehouse)
43+
- [Parquet & Delta Data Formats](#parquet--delta-data-formats)
44+
- [Z-Order and V-Order](#z-order-and-v-order)
45+
- [Dataflow Gen2 & Data Pipelines](#dataflow-gen2--data-pipelines)
46+
- [Shortcuts & Mirroring](#shortcuts--mirroring)
47+
- [Data Factory](#data-factory)
48+
- [Medallion Architecture Overview](#medallion-architecture-overview)
49+
- [Fabric: Highlights into AI/LLMs](#fabric-highlights-into-aillms)
50+
- [Writing SQL: SQL Analytics Endpoint](#writing-sql-sql-analytics-endpoint)
51+
- [How to Configure and Use the SQL Analytics Endpoint](#how-to-configure-and-use-the-sql-analytics-endpoint)
52+
- [Fabric AI Skill](#fabric-ai-skill)
5253

5354
</details>
5455

@@ -160,7 +161,7 @@ graph TD
160161
> - `ACID Transaction`s: Ensures data reliability and consistency, supporting complex data operations without data corruption.
161162
> - `Schema Enforcement and Evolution`: Allows for schema changes over time, making it easier to manage evolving data structures.
162163
> - `Time Travel:` Enables querying of historical data, providing the ability to access and revert to previous versions of data.
163-
> - `Efficient Data Management`: Features like compaction, Z-Order, and V-Order optimize data storage and query performance
164+
> - `Efficient Data Management`: Features like compaction, [Z-Order](#z-order-and-v-order), and [V-Order](#z-order-and-v-order) optimize data storage and query performance
164165
165166
```mermaid
166167
graph TD
@@ -177,15 +178,6 @@ graph TD
177178
L -->|Query Optimization| V[✔️]
178179
end
179180
```
180-
181-
| **Aspect** | **Z-Order** | **V-Order** |
182-
|--------------------------|------------------------------------------------------------------------------|----------------------------------------------------------------------------|
183-
| **Purpose** | Improves query performance by co-locating related information in the same set of files. | Enhances read performance by organizing data in a way that leverages Microsoft Verti-Scan technology. |
184-
| **Key Features** | - Data Co-Location: Organizes data based on one or more columns, storing rows with similar values together. <br/> - Query Efficiency: Reduces the amount of data read during queries, improving performance. <br/> - Compatibility: Works with Delta Lake to enhance data-skipping algorithms. | - Special Sorting: Applies special sorting techniques to Parquet files. <br/> - Row Group Distribution: Optimizes row group distribution for better read performance. <br/> - Dictionary Encoding and Compression: Uses efficient dictionary encoding and compression. <br/> - Performance Boost: Provides fast reads under various compute engines. <br/> - Cost Efficiency: Reduces network, disk, and CPU resources during reads. |
185-
| **Timing** | Applied during read time (or table optimization). | Applied during write time. |
186-
| **Use Cases** | - When you need to improve query performance by reducing the amount of data read. <br/> - For queries that frequently filter on specific columns. | - When you need to enhance read performance and reduce storage costs. <br/> - For scenarios requiring efficient data access across various compute engines. |
187-
| **Compatibility** | Requires specific tools like Delta Lake. | Universally compatible with all Parquet engines.
188-
189181

190182
| Feature | Parquet | Delta | Available in Parquet? | Available in Delta? |
191183
|------------------------|----------------------------------------------|--------------------------------------------|-----------------------|---------------------|
@@ -198,10 +190,22 @@ graph TD
198190
| **Data Versioning** | Not available, limiting the ability to track changes over time. | Provides data versioning, allowing for auditing and rollback scenarios. || ✔️ |
199191
| **Schema Enforcement** | No built-in schema enforcement, requiring external validation. | Enforces schema consistency, maintaining data quality. || ✔️ |
200192
| **Efficient Updates** | Does not support efficient updates, making it less suitable for frequently changing data. | Allows for efficient updates and deletes, ideal for dynamic datasets. || ✔️ |
201-
| **Query Optimization** | Basic query optimization, relying on columnar storage benefits. | Advanced query optimization with features like data skipping and Z-order indexing. | ✔️ | ✔️ |
193+
| **Query Optimization** | Basic query optimization, relying on columnar storage benefits. | Advanced query optimization with features like data skipping and [Z-order](#z-order-and-v-order) indexing. | ✔️ | ✔️ |
202194
| **Use Case** | Ideal for data warehousing, batch processing, and scenarios where data is primarily read and not frequently updated. | Best suited for data lakes, real-time analytics, and environments requiring strict data integrity and frequent updates. | ✔️ | ✔️ |
203195
| **Additional Context** | Parquet is excellent for read-heavy workloads and large-scale data analytics. It's widely supported and highly efficient for scenarios where data doesn't change frequently. | Delta builds on Parquet by adding features like ACID transactions, data versioning, and efficient updates/deletes. It's designed for environments where data integrity, frequent updates, and complex data operations are crucial. | ✔️ | ✔️ |
204196

197+
## Z-Order and V-Order
198+
199+
200+
| **Aspect** | **Z-Order** | **V-Order** |
201+
|--------------------------|------------------------------------------------------------------------------|----------------------------------------------------------------------------|
202+
| **Purpose** | Improves query performance by co-locating related information in the same set of files. | Enhances read performance by organizing data in a way that leverages Microsoft Verti-Scan technology. |
203+
| **Key Features** | - Data Co-Location: Organizes data based on one or more columns, storing rows with similar values together. <br/> - Query Efficiency: Reduces the amount of data read during queries, improving performance. <br/> - Compatibility: Works with Delta Lake to enhance data-skipping algorithms. | - Special Sorting: Applies special sorting techniques to Parquet files. <br/> - Row Group Distribution: Optimizes row group distribution for better read performance. <br/> - Dictionary Encoding and Compression: Uses efficient dictionary encoding and compression. <br/> - Performance Boost: Provides fast reads under various compute engines. <br/> - Cost Efficiency: Reduces network, disk, and CPU resources during reads. |
204+
| **Timing** | Applied during read time (or table optimization). | Applied during write time. |
205+
| **Use Cases** | - When you need to improve query performance by reducing the amount of data read. <br/> - For queries that frequently filter on specific columns. | - When you need to enhance read performance and reduce storage costs. <br/> - For scenarios requiring efficient data access across various compute engines. |
206+
| **Compatibility** | Requires specific tools like Delta Lake. | Universally compatible with all Parquet engines.
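
The "Data Co-Location" idea in the Z-Order column can be illustrated with a small sketch. The snippet below is a generic Z-order (Morton) curve example, not the Delta Lake or Fabric implementation: the function name `interleave_bits`, the 4x4 grid of `(x, y)` rows, and the four-rows-per-file split are all assumptions made for illustration. It shows that sorting by an interleaved key keeps rows with similar values in both columns close together, which is what lets per-file min/max statistics skip more data.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return a Z-order (Morton) key for two non-negative integers (illustrative only)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x land on even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y land on odd positions
    return key


# Hypothetical table with two integer columns, x and y.
rows = [(x, y) for x in range(4) for y in range(4)]

# Sort rows along the Z-order curve, then split them into "files" of 4 rows each.
z_sorted = sorted(rows, key=lambda r: interleave_bits(*r))
files = [z_sorted[i:i + 4] for i in range(0, len(z_sorted), 4)]

# Per-file (min, max) for each column, similar in spirit to data-skipping statistics.
file_stats = [
    {
        "x": (min(r[0] for r in f), max(r[0] for r in f)),
        "y": (min(r[1] for r in f), max(r[1] for r in f)),
    }
    for f in files
]

for stats in file_stats:
    print(stats)
```

In this toy layout each file covers a 2x2 block of the grid, so a filter on either `x` or `y` can rule out half of the files from the statistics alone. A plain sort on `x` would give tight ranges for `x` but leave every file spanning the full range of `y`.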
207+
208+
205209
## Dataflow Gen2 & Data Pipelines
206210

207211
<img width="709" alt="image" src="https://github.com/brown9804/MSCloudEssentials_LPath/assets/24630902/4d9d5e6d-ff9c-4f21-954e-61f644c750bd">
