Commit 5f52a39

mv
1 parent aa17c33 commit 5f52a39

1 file changed

Lines changed: 111 additions & 0 deletions

  • 0_Azure/2_AzureAnalytics/3_Databricks
@@ -0,0 +1,111 @@
# Azure Databricks

Costa Rica

[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
[brown9804](https://github.com/brown9804)

Last updated: 2024-11-15

----------

## Wiki
- [What is Azure Databricks?](https://learn.microsoft.com/en-us/azure/databricks/introduction/)
- [Tutorial: Implement Azure Databricks with an Azure Cosmos DB endpoint](https://learn.microsoft.com/en-us/azure/databricks/scenarios/service-endpoint-cosmosdb)
- [Query databases using JDBC](https://learn.microsoft.com/en-us/azure/databricks/connect/external-systems/jdbc)
- [Query SQL Server with Azure Databricks](https://learn.microsoft.com/en-us/azure/databricks/connect/external-systems/sql-server)
- [How to connect from Azure Databricks to Azure SQL DB using service principal](https://stackoverflow.com/collectives/azure/articles/75189853/how-to-connect-from-azure-databricks-to-azure-sql-db-using-service-principal)
- [Azure DataBricks To Connect SQL DataBase with Pyspark](https://stackoverflow.com/questions/76820391/azure-databricks-to-connect-sql-database-with-pyspark)
- [Use a SQL connector, driver, or API](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/index-driver)
- [Connect to azure sql database from databricks](https://community.databricks.com/t5/data-engineering/connect-to-azure-sql-database-from-databricks-using-service/td-p/36174)
- [Query data in Azure Synapse Analytics](https://learn.microsoft.com/en-us/azure/databricks/connect/external-systems/synapse-analytics)
- [Connection from databricks to azure synapse](https://stackoverflow.com/questions/72873898/connection-from-databricks-to-azure-synapse)

## Introduction to Azure Databricks

> Azure Databricks is an analytics platform for big data and AI, built on Apache Spark. It gives data engineers, data scientists, and analysts a shared workspace for data processing, machine learning, and real-time analytics.

<figure>
<img
width="800"
src="https://github.com/user-attachments/assets/c6c298f2-aae2-4ae8-b6cc-0407a22a32a2"
alt="Azure Databricks overview image from databricks.com">
<figcaption> <br/> From https://www.databricks.com/product/azure </figcaption>
</figure>

| Aspect | Details |
| ----- | ---- |
| Pricing tier | - **Standard**: Includes core Apache Spark features and Microsoft Entra integration. <br/> - **Premium**: Adds role-based access controls and advanced enterprise features. <br/> - **Trial**: Provides a 14-day free trial of the Premium workspace. |


### **Key Features**
Here are some of the key features of Azure Databricks:

| **Feature**            | **Description**                                                                 |
|------------------------|---------------------------------------------------------------------------------|
| **Data Processing**    | Efficiently process large volumes of data using Apache Spark.                   |
| **Machine Learning**   | Build, train, and deploy machine learning models at scale.                      |
| **Real-Time Analytics**| Perform real-time data analysis and generate insights quickly.                  |
| **Collaborative Workspace** | Provides a collaborative environment for different roles to work together. |
| **Scalability**        | Automatically scales resources to handle varying workloads.                     |
| **Integration**        | Seamlessly integrates with other Azure services and open-source tools.          |

The diagram below shows how Azure Databricks connects to data sources and storage services, processes data with Apache Spark, and supports machine learning and real-time analytics.

```mermaid
graph TD
    A[Data Sources] -->|Azure Blob Storage, SQL DB| B[Azure Databricks Workspace]
    B --> C[Machine Learning - MLflow, etc.]

    E[Data Storage] -->|Azure Data Lake, SQL DB| F[Data Processing - Apache Spark]
    F --> D[Real-Time Analytics - Dashboards, etc.]

```
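
The "SQL DB" data source in the diagram is typically reached over JDBC, as the links in the Wiki section describe. The following is a minimal sketch, not taken from this repository, of how the connection options might be assembled in Python. The server, database, table, and user names are hypothetical placeholders; the option keys (`url`, `dbtable`, `user`, `password`) are the ones the Spark JDBC data source reads.

```python
def build_sqlserver_jdbc_url(server: str, database: str, port: int = 1433) -> str:
    """Return a JDBC URL for an Azure SQL Database logical server."""
    host = f"{server}.database.windows.net"
    return (
        f"jdbc:sqlserver://{host}:{port};"
        f"database={database};encrypt=true;"
        "trustServerCertificate=false;loginTimeout=30"
    )


def jdbc_read_options(url: str, table: str, user: str, password: str) -> dict:
    """Bundle the options that the Spark JDBC data source expects."""
    return {"url": url, "dbtable": table, "user": user, "password": password}


# Hypothetical names, for illustration only.
url = build_sqlserver_jdbc_url("example-server", "salesdb")
options = jdbc_read_options(url, "dbo.orders", "reader", "<secret>")
print(options["url"])

# In a Databricks notebook, where `spark` is predefined, the read would be:
# df = spark.read.format("jdbc").options(**options).load()
```

In practice the password would come from a secret scope rather than a literal, for example via `dbutils.secrets.get(scope=..., key=...)` inside a notebook.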

## Architecture and Components

The Azure Databricks architecture is divided into two main components:

| **Component**            | **Description**                                                                 |
|--------------------------|---------------------------------------------------------------------------------|
| **Control Plane**        | Manages backend services, authentication, job scheduling, and cluster management. Hosts the web application and REST APIs. |
| **Compute Plane**        | Where data processing happens, consisting of clusters running Apache Spark jobs. |

There are two types of compute planes:

| **Type of Compute Plane** | **Description**                                                                 |
|--------------------------|---------------------------------------------------------------------------------|
| **Serverless Compute**   | Managed by Azure Databricks, with automatic scaling and resource management. Ideal for users who prefer a hands-off approach. |
| **Classic Compute**      | Managed by the user, with full control over compute resources within their own Azure subscription. Allows more customization and manual scaling. |

The diagram below shows how the control plane interacts with both the serverless and classic compute planes.

```mermaid
graph TD
    subgraph Control Plane
        A[Backend Services]
        B[Web Application]
        C[REST APIs]
    end
    subgraph Compute Plane
        D[Serverless Compute]
        E[Classic Compute]
    end
    A --> D
    A --> E
    B --> D
    B --> E
    C --> D
    C --> E
```
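
To make the REST API to Classic Compute arrow concrete, here is a minimal sketch, under stated assumptions, of a request that asks the control plane to create a classic cluster. It only builds the request and never sends it. The workspace URL, token, cluster name, runtime version, and VM size are hypothetical placeholders; the payload fields (`cluster_name`, `spark_version`, `node_type_id`, `autoscale`, `autotermination_minutes`) follow the Databricks Clusters API.

```python
import json
import urllib.request


def cluster_spec(name: str, min_workers: int, max_workers: int) -> dict:
    """Classic-compute cluster definition with an autoscaling worker range."""
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder Azure VM size
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": 30,
    }


def create_cluster_request(workspace_url: str, token: str, spec: dict):
    """Build (but do not send) a POST to the control plane's clusters API."""
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = create_cluster_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",  # hypothetical workspace
    "<personal-access-token>",
    cluster_spec("etl-demo", min_workers=2, max_workers=8),
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

For a fixed-size cluster that is scaled manually, the `autoscale` entry would be replaced by a single `num_workers` value.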

## Recommended Training
- [Explore Azure Databricks](https://learn.microsoft.com/en-us/training/modules/explore-azure-databricks/)
- [Perform data analysis with Azure Databricks](https://learn.microsoft.com/en-us/training/modules/perform-data-analysis-azure-databricks/)
- [Use Apache Spark in Azure Databricks](https://learn.microsoft.com/en-us/training/modules/use-apache-spark-azure-databricks/)


<div align="center">
<h3 style="color: #4CAF50;">Total Visitors</h3>
<img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/>
</div>
