Recipient: GitHub Secure Open Source Fund
Sponsor DataJourneyHQ • Official announcement
A design-first open-source data management toolkit.
Understand the mechanics of stitching tools together into one cohesive, beautiful system.
DataJourney is a design-first open-source data management toolkit that teaches you how to assemble cohesive data systems from individual components.
Rather than prescribing specific tools, it demonstrates the mechanics of integration: how to stitch together open-source technologies into scalable, reproducible workflows. With its modular, flexible design, DataJourney serves as both a learning resource and a practical toolkit for data professionals who want to grasp the art and science of building harmonious data systems.
Built from additive and subtractive capabilities, glued together with open source. Each layer has a certain strength of communication built in:
- P0 (Base): Static home(s) to keep it together (GitHub)
- P1 (Tooling): Tooling, strings (Powered by open source)
- P2 (Maintenance + Monitoring): Env, automations (Pixi + GHA)
- P3 (Abstraction): Layer(s), CLI/task manager for users to interact with (Pixi)
{✨ = Experimental, ✅ = Implemented}
| Status | Workflow Description | Journey Type |
|---|---|---|
| ✅ | Pre-commit hooks configured for code linting/formatting | Code Quality |
| ✅ | Exploratory data analysis (EDA) using mito | EDA |
| ✅ | Environment management via pixi | Environment Management |
| ✅ | GenAI examples to analyse data using GitHub AI models | AI Data Analysis |
| ✅ | Custom dashboard using holoviews + panel | Dashboarding |
| ✅ | Reading data from online sources using intake | Data Ingestion |
| ✅ | Data pipeline built using Dagster | Orchestration / Pipelines |
| ✅ | Hello world LLM design example based on LangChain | LLM Example |
| ✅ | Python packaging framework design principles | Packaging / Project Structure |
| ✅ | Prompt enhancer powered by gpt-oss-120b | Prompt Engineering |
| ✅ | RAG powered by langchain, chromadb & GitHub AI models | RAG Pipeline |
| ✅ | GitHub Actions configured | CI/CD |
| ✅ | Web UI built on Flask | Web Application |
| ✅ | Web UI re-done and expanded with FastHTML | Web Application |
| ✅ | Vale.sh configured at PR level | Docs Linting |
| ✅ | Query engine for LLM application using Chromadb | Vector Retrieval |
| ✅ | LLM Evaluation & Tracing for data analysis pipelines using DeepEval | LLM Evaluation |
- Fork the repository
- Generate & add `GITHUB_TOKEN`, instructions here. Additional requirement to run LLM based workflows; e.g. DJ_prompt_enhancer, DJ_llm_analysis, others
- Switch directory: `cd DataJourney`
- Download pixi: prefix.dev
- Activate env: `pixi shell`
- Install DJ framework locally: `pixi run DJ_package`
- List all the tasks: `pixi run DJ_list`
- Execute a specific task from the list: `pixi run <TASK_NAME>`
- Execute a specific task with additional logs: `pixi run -v <TASK_NAME>`
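The `pixi run <TASK_NAME>` and `pixi run -v <TASK_NAME>` steps above can also be driven from a script. The following is a minimal sketch, not part of the DataJourney codebase: the helper names `build_pixi_command` and `run_task` are hypothetical, and the only thing assumed about pixi is the command shape shown in the list above.

```python
import shlex
import shutil
import subprocess


def build_pixi_command(task_name: str, verbose: bool = False) -> list[str]:
    """Return the argv list for `pixi run [-v] <task_name>` (hypothetical helper)."""
    # Reject empty names and names containing whitespace.
    if not task_name or any(ch.isspace() for ch in task_name):
        raise ValueError(f"invalid task name: {task_name!r}")
    command = ["pixi", "run"]
    if verbose:
        # -v asks pixi for additional logs, as in the steps above.
        command.append("-v")
    command.append(task_name)
    return command


def run_task(task_name: str, verbose: bool = False) -> int:
    """Run a pixi task in the current directory and return its exit code."""
    if shutil.which("pixi") is None:
        raise FileNotFoundError("pixi is not on PATH; install it first")
    result = subprocess.run(build_pixi_command(task_name, verbose), check=False)
    return result.returncode


if __name__ == "__main__":
    # Only print the commands here; call run_task() inside the repository to execute them.
    print(shlex.join(build_pixi_command("DJ_list")))
    print(shlex.join(build_pixi_command("DJ_package", verbose=True)))
```

Building the command as a list rather than a single string avoids shell quoting issues, and keeping `build_pixi_command` separate from `run_task` makes the command shape easy to check without pixi installed.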
| Task Name | Description |
|---|---|
| `GIT_TOKEN_CHECK` | Verifies the availability and validity of the Git authentication token. |
| `DJ_package` | Prepares and builds the Python package for the DataJourney project. |
| `DJ_pre_commit` | Runs pre-commit hooks to ensure code quality and adherence to standards. |
| `DJ_dagster` | Sets up and runs a Dagster workflow for orchestration in the project. |
| `DJ_fasthtml_app` | Runs a FastHTML-based web application. |
| `DJ_flask_app` | Configures and runs a Flask-based application for data services. |
| `DJ_mito_app` | Launches the Mito application for interactive data analysis in notebooks. |
| `DJ_panel_app` | Executes a Panel dashboard app for data visualization and analytics. |
| `DJ_llm_analysis` | Performs analysis using large language models (LLMs) on project data. |
| `DJ_hello_world_langchain` | Sets up a basic LangChain app as a "Hello World" example for LLMs. |
| `DJ_spanish_eng_translation` | Performs Spanish to English translation with Deepseek-R1 (NOTE: this task takes about 30 seconds to execute). |
| `DJ_sync_dataset_trees` | Downloads and synchronizes the trees.csv dataset into the project structure. |
| `DJ_chromadb_gen_embedding` | Query engine for LLM applications. |
| `DJ_RAG_without_memory` | End-to-end Retrieval-Augmented Generation (RAG) pipeline. |
| `DJ_prompt_enhancer` | Shows how to design a simple prompt enhancer using gpt-oss-120b. |
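As a rough illustration of the availability half of a check like `GIT_TOKEN_CHECK`, the sketch below tests whether `GITHUB_TOKEN` is set and non-empty. It is an assumption about what such a check could look like, not the project's actual implementation: the function name `check_github_token` is hypothetical, and it does not validate the token against GitHub.

```python
import os


def check_github_token(env=None, var_name="GITHUB_TOKEN"):
    """Return (ok, message) for whether the token variable is set (hypothetical helper)."""
    # Default to the real process environment; accept a plain dict for testing.
    env = os.environ if env is None else env
    token = env.get(var_name, "")
    if not token.strip():
        return False, f"{var_name} is missing or empty"
    # Report only the length so the secret itself is never printed.
    return True, f"{var_name} is set ({len(token)} characters)"


if __name__ == "__main__":
    ok, message = check_github_token()
    print(message)
    raise SystemExit(0 if ok else 1)
```

Checking validity as well would need an authenticated request to GitHub, which is left out here.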
As the name suggests, pre-commit hooks format the code to PEP standards before each commit. More details
- `pixi run DJ_pre_commit`
- `pixi run DJ_llm_analysis`
- `pixi run DJ_dagster`
- `pixi run DJ_panel_app`

NOTE: The generated dashboard is exported to HTML and saved as stock_price_twilio_dashboard.

To explore further, visit trymito.io

- `pixi run DJ_mito_app`
- `pixi run DJ_fasthtml_app` (runs the FastHTML app)




