| 1 | +# Azure Databricks |
| 2 | + |
| 3 | +Costa Rica |
| 4 | + |
| 5 | +[](https://github.com/) |
| 6 | +[brown9804](https://github.com/brown9804) |
| 7 | + |
| 8 | +Last updated: 2024-11-15 |
| 9 | + |
| 10 | +---------- |
| 11 | + |
| 12 | + |
| 13 | +## Wiki |
| 14 | +- [What is Azure Databricks?](https://learn.microsoft.com/en-us/azure/databricks/introduction/) |
| 15 | +- [Tutorial: Implement Azure Databricks with an Azure Cosmos DB endpoint](https://learn.microsoft.com/en-us/azure/databricks/scenarios/service-endpoint-cosmosdb) |
| 16 | +- [Query databases using JDBC](https://learn.microsoft.com/en-us/azure/databricks/connect/external-systems/jdbc) |
| 17 | +- [Query SQL Server with Azure Databricks](https://learn.microsoft.com/en-us/azure/databricks/connect/external-systems/sql-server) |
| 18 | +- [How to connect from Azure Databricks to Azure SQL DB using service principal](https://stackoverflow.com/collectives/azure/articles/75189853/how-to-connect-from-azure-databricks-to-azure-sql-db-using-service-principal) |
| 19 | +- [Azure DataBricks To Connect SQL DataBase with Pyspark](https://stackoverflow.com/questions/76820391/azure-databricks-to-connect-sql-database-with-pyspark) |
| 20 | +- [Use a SQL connector, driver, or API](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/index-driver) |
| 21 | +- [Connect to azure sql database from databricks](https://community.databricks.com/t5/data-engineering/connect-to-azure-sql-database-from-databricks-using-service/td-p/36174) |
| 22 | +- [Query data in Azure Synapse Analytics](https://learn.microsoft.com/en-us/azure/databricks/connect/external-systems/synapse-analytics) |
| 23 | +- [Connection from databricks to azure synapse](https://stackoverflow.com/questions/72873898/connection-from-databricks-to-azure-synapse) |
| 24 | + |
| 25 | +## Introduction to Azure Databricks |
| 26 | + |
| 27 | +> Azure Databricks is a comprehensive analytics platform for big data and AI, built on Apache Spark. It offers a collaborative workspace for data engineers, scientists, and analysts to engage in data processing, machine learning, and real-time analytics. |
| 28 | + |
| 29 | +<figure> |
| 30 | +<img |
| 31 | +width="800" |
| 32 | +src="https://github.com/user-attachments/assets/c6c298f2-aae2-4ae8-b6cc-0407a22a32a2" |
| 33 | +alt="Azure Databricks overview image from databricks.com"> |
| 34 | +<figcaption> <br/> From https://www.databricks.com/product/azure </figcaption> |
| 35 | +</figure> |
| 36 | + |
| 37 | +| Aspect | Details | |
| 38 | +| ----- | ---- | |
| 39 | +| Pricing tier | - **Standard**: Includes core Apache Spark features and Microsoft Entra integration. <br/> - **Premium**: Adds role-based access controls and advanced enterprise features. <br/> - **Trial**: Provides a 14-day free trial of the Premium workspace. | |
| 40 | + |
| 41 | + |
| 42 | +### **Key Features** |
| 43 | +Here are some of the key features of Azure Databricks: |
| 44 | + |
| 45 | +| **Feature** | **Description** | |
| 46 | +|------------------------|---------------------------------------------------------------------------------| |
| 47 | +| **Data Processing** | Efficiently process large volumes of data using Apache Spark. | |
| 48 | +| **Machine Learning** | Build, train, and deploy machine learning models at scale. | |
| 49 | +| **Real-Time Analytics**| Perform real-time data analysis and generate insights quickly. | |
| 50 | +| **Collaborative Workspace** | Provides a collaborative environment for different roles to work together. | |
| 51 | +| **Scalability** | Automatically scales resources to handle varying workloads. | |
| 52 | +| **Integration** | Seamlessly integrates with other Azure services and open-source tools. | |
| 53 | + |
| 54 | +This diagram shows how Azure Databricks integrates with various data sources and storage solutions, processes data using Apache Spark, and supports machine learning and real-time analytics. |
| 55 | + |
| 56 | +```mermaid |
| 57 | +graph TD |
| 58 | + A[Data Sources] -->|Azure Blob Storage, SQL DB| B[Azure Databricks Workspace] |
| 59 | + B --> C[Machine Learning - MLflow, etc.] |
| 60 | + |
| 61 | + E[Data Storage] -->|Azure Data Lake, SQL DB| F[Data Processing - Apache Spark] |
| 62 | + F --> D[Real-Time Analytics - Dashboards, etc.] |
| 63 | + |
| 64 | +``` |
| 65 | + |
| 66 | +## Architecture and Components |
| 67 | + |
| 68 | +The Azure Databricks architecture is divided into two main components: |
| 69 | + |
| 70 | +| **Component** | **Description** | |
| 71 | +|--------------------------|---------------------------------------------------------------------------------| |
| 72 | +| **Control Plane** | Manages backend services, authentication, job scheduling, and cluster management. Hosts the web application and REST APIs. | |
| 73 | +| **Compute Plane** | Where data processing happens, consisting of clusters running Apache Spark jobs. | |
| 74 | + |
| 75 | +There are two types of compute planes: |
| 76 | + |
| 77 | +| **Type of Compute Plane** | **Description** | |
| 78 | +|--------------------------|---------------------------------------------------------------------------------| |
| 79 | +| **Serverless Compute** | Managed by Azure Databricks, with automatic scaling and resource management. Ideal for users preferring a hands-off approach. | |
| 80 | +| **Classic Compute** | Managed by the user, with full control over compute resources within their Azure subscription. Allows for more customization and manual scaling. | |
| 81 | + |
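To make the classic compute option more concrete, here is a minimal sketch of a cluster definition with autoscaling bounds, shaped like a Clusters API payload. The helper name `classic_cluster_spec`, the validation rule, and the default values for `spark_version`, `node_type_id`, and `autotermination_minutes` are illustrative assumptions, not values taken from this document; check the runtime versions and VM sizes available in your own workspace.

```python
import json


def classic_cluster_spec(name, min_workers, max_workers,
                         spark_version="15.4.x-scala2.12",
                         node_type_id="Standard_DS3_v2"):
    """Build a dict shaped like a classic cluster definition.

    The default runtime and VM size are placeholders (assumptions);
    replace them with values offered in your workspace.
    """
    # Illustrative sanity check on the autoscaling range (our own rule).
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Worker count is allowed to vary between these two bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after 30 idle minutes (assumed value).
        "autotermination_minutes": 30,
    }


if __name__ == "__main__":
    # "etl-demo" is a hypothetical cluster name.
    print(json.dumps(classic_cluster_spec("etl-demo", 2, 8), indent=2))
```

With serverless compute, this kind of sizing decision is handled by Azure Databricks rather than by a user-supplied definition.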
| 82 | +The diagram below shows how the control plane interacts with both the serverless and classic compute planes. |
| 83 | + |
| 84 | +```mermaid |
| 85 | +graph TD |
| 86 | + subgraph Control Plane |
| 87 | + A[Backend Services] |
| 88 | + B[Web Application] |
| 89 | + C[REST APIs] |
| 90 | + end |
| 91 | + subgraph Compute Plane |
| 92 | + D[Serverless Compute] |
| 93 | + E[Classic Compute] |
| 94 | + end |
| 95 | + A --> D |
| 96 | + A --> E |
| 97 | + B --> D |
| 98 | + B --> E |
| 99 | + C --> D |
| 100 | + C --> E |
| 101 | +``` |
| 102 | + |
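Because the control plane hosts the REST APIs, a client reaches compute resources through the workspace URL rather than by contacting clusters directly. The sketch below only builds (and does not send) a request to the cluster listing endpoint. The helper name `build_list_clusters_request`, the workspace URL, and the token are hypothetical placeholders, and the `/api/2.0/clusters/list` path should be checked against the API version your workspace uses.

```python
import urllib.request


def build_list_clusters_request(workspace_url, token):
    """Prepare a GET request for the cluster listing endpoint.

    Nothing is sent here; pass the result to urllib.request.urlopen
    from an environment that can reach the workspace.
    """
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        # Bearer token authentication (personal access token or Entra ID token).
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Both values are placeholders, not real credentials.
    req = build_list_clusters_request(
        "https://adb-1234567890123456.7.azuredatabricks.net", "<token>"
    )
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the call easy to inspect without network access or real credentials.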
| 103 | +## Recommended Training |
| 104 | +- [Explore Azure Databricks](https://learn.microsoft.com/en-us/training/modules/explore-azure-databricks/) |
| 105 | +- [Perform data analysis with Azure Databricks](https://learn.microsoft.com/en-us/training/modules/perform-data-analysis-azure-databricks/) |
| 106 | +- [Use Apache Spark in Azure Databricks](https://learn.microsoft.com/en-us/training/modules/use-apache-spark-azure-databricks/) |
| 107 | + |
| 108 | +<div align="center"> |
| 109 | + <h3 style="color: #4CAF50;">Total Visitors</h3> |
| 110 | + <img src="https://profile-counter.glitch.me/brown9804/count.svg" alt="Visitor Count" style="border: 2px solid #4CAF50; border-radius: 5px; padding: 5px;"/> |
| 111 | +</div> |