
Commit d832c20

feat: Update getting started documentation.
1 parent 166465b commit d832c20

3 files changed

Lines changed: 123 additions & 10 deletions


_config.yml

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ codeurl: 'https://github.com/datahaskell/docs'
 
 # Default categories (in order) to appear in the navigation
 sections: [
+['getting_started', 'Getting Started'],
 ['community', 'Community'],
 ['tutorial', 'Tutorials'],
 ['library', 'Libraries'],

_posts/2025-11-09-linear-regression.md

Lines changed: 4 additions & 10 deletions
@@ -81,24 +81,18 @@ In our housing example, we're going to:
 - Normalize everything so no single feature dominates
 
 ```haskell
+oceanProximityMapping :: [(Text, Int)]
+oceanProximityMapping = [("ISLAND", 0), ("NEAR OCEAN", 1), ("NEAR BAY", 2), ("<1H OCEAN", 3), ("INLAND", 4)]
+
 let cleaned =
       df
         |> D.impute (F.col @(Maybe Double) "total_bedrooms") meanTotalBedrooms
         |> D.exclude ["median_house_value"]
-        |> D.derive "ocean_proximity" (F.lift oceanProximity (F.col "ocean_proximity"))
+        |> D.derive "ocean_proximity" (F.recodeWithDefault 5 oceanProximityMapping (F.col "ocean_proximity"))
         |> D.derive
              "rooms_per_household"
              (F.col @Double "total_rooms" / F.col "households")
         |> normalizeFeatures
-
-oceanProximity :: T.Text -> Double
-oceanProximity op = case op of
-  "ISLAND" -> 0
-  "NEAR OCEAN" -> 1
-  "NEAR BAY" -> 2
-  "<1H OCEAN" -> 3
-  "INLAND" -> 4
-  _ -> error ("Unknown ocean proximity value: " ++ T.unpack op)
 ```
 
 **Let's break this pipeline down:**
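
In this hunk, the removed `oceanProximity` function called `error` on any category it did not recognize. The new `F.recodeWithDefault 5 oceanProximityMapping` call presumably maps unrecognized values to the default `5` instead. The sketch below models that lookup on plain values so the difference is easy to see. It is an illustration built on assumptions, not the `dataframe` library's implementation: `recodeWithDefault` here is a hypothetical single-value function, and its fallback behavior is inferred from the name and argument order rather than from the library's documentation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Plain-Haskell model of "recode with a default". This is NOT the dataframe
-- library's column expression; `recodeWithDefault` is a hypothetical
-- stand-in that works on one value at a time.
import Data.Maybe (fromMaybe)
import Data.Text (Text)

-- Same mapping as in the post.
oceanProximityMapping :: [(Text, Int)]
oceanProximityMapping =
  [("ISLAND", 0), ("NEAR OCEAN", 1), ("NEAR BAY", 2), ("<1H OCEAN", 3), ("INLAND", 4)]

-- Look the key up in the association list and fall back to the default
-- when it is missing, instead of calling `error`.
recodeWithDefault :: Eq a => b -> [(a, b)] -> a -> b
recodeWithDefault def mapping key = fromMaybe def (lookup key mapping)

main :: IO ()
main = do
  let categories = ["NEAR BAY", "INLAND", "SOMEWHERE ELSE"] :: [Text]
  -- The unknown category maps to 5; prints [2,4,5]
  print (map (recodeWithDefault 5 oceanProximityMapping) categories)
```

Using a default instead of `error` means the pipeline no longer crashes on unexpected input. The trade-off is that a misspelled category also silently becomes `5`.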
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
---
layout: page
title: "Using the current environment"
category: getting_started
date: 2026-11-23 20:42:38
---

## Getting started

We recommend using **VS Code + Jupyter** as the default development stack for DataHaskell:

- VS Code as your editor
- Jupyter notebooks for literate, reproducible analysis
- A Haskell notebook kernel (currently IHaskell)
- The DataHaskell libraries (e.g. `dataframe`, `hasktorch`, plotting, etc.)

This page walks you through:

1. Installing the basic tools
2. Choosing an environment (Dev Container vs local install)
3. Verifying everything with a “hello DataHaskell” notebook

---

## 1. Install the basics

You only need to do this once per machine.

### 1.1. VS Code

1. Install **Visual Studio Code** from the official website.
2. Open VS Code and install these extensions:
   - **Jupyter**
   - **Python** (used by the Jupyter extension, even if you write Haskell)
   - **Dev Containers** (if you plan to use the container-based environment)
   - **Haskell** (for syntax highlighting, type info, etc.)

### 1.2. Git

Install Git so you can clone repositories:

- macOS: via Homebrew (`brew install git`) or Xcode command line tools
- Linux: via your package manager (e.g. `sudo apt install git`)
- Windows: [Git for Windows](https://gitforwindows.org/) or via WSL (Ubuntu on Windows)

### 1.3. (Optional but recommended) Docker

If you want the easiest, most reproducible setup, install Docker:

- Docker Desktop (macOS/Windows) or
- `docker` + `docker-compose` from your Linux distro

The Dev Container–based environment assumes Docker is available.

---

## 2. Choose an environment

You have **two main options**:

1. **Option A (recommended): VS Code Dev Container**
   Everything is preinstalled in a Docker image (GHC, Cabal/Stack, IHaskell, DataFrame, etc.).

2. **Option B: Local installation**
   Install GHC, Cabal, Jupyter, IHaskell, and the DataHaskell libraries directly on your machine.

If you’re not sure which to choose, pick **Option A**.

---

## 3. Option A – Dev Container (recommended)

This is the “batteries included” path. You get a pinned environment without polluting your global system.

### 3.1. Clone the starter repository

We provide a starter repository with a ready-made environment and example notebooks:

```bash
git clone https://github.com/DataHaskell/datahaskell-starter
cd datahaskell-starter
```

### 3.2. Open the project in VS Code

```bash
code .
```

VS Code will show a prompt asking whether to reopen the project in a container.
Accept it, and VS Code will build and start the DataHaskell Docker container.

### 3.3. Running the example notebook

Open the `getting-started` notebook. A `Select Kernel` button appears at the top right.

Click it and VS Code asks you to choose a kernel. Go to `Jupyter Environment` and pick the Haskell kernel installed there.
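
After you select the kernel, a first cell like the one below works as a quick “hello DataHaskell” check that the notebook really runs on GHC. This is a minimal sketch that uses only `System.Info` and `Data.Version` from `base`. It therefore confirms the kernel and compiler, not the DataHaskell libraries. `environmentSummary` is a hypothetical helper name. In a notebook, paste the code into one cell and run `main` in the next.

```haskell
-- Hello-DataHaskell sanity check: report which compiler and platform the
-- kernel runs on. Uses only `base`, so it does not test any DataHaskell library.
import Data.Version (showVersion)
import System.Info (arch, compilerName, compilerVersion, os)

-- Hypothetical helper: builds "ghc <major.minor> on <os>/<arch>".
environmentSummary :: String
environmentSummary =
  compilerName ++ " " ++ showVersion compilerVersion ++ " on " ++ os ++ "/" ++ arch

main :: IO ()
main = putStrLn ("Hello DataHaskell from " ++ environmentSummary)
```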

## 4. Option B – Installing everything locally

This route assumes GHC, Cabal, and Jupyter are already installed (for example, GHC and Cabal via GHCup and Jupyter via `pip`). We recommend using Cabal for the steps below.

```bash
cabal update
# Install the DataHaskell libraries into GHC's default (global) package environment
cabal install --lib dataframe ihaskell-dataframe hasktorch dataframe-hasktorch \
  ihaskell time template-haskell vector text containers array random unix \
  directory regex-tdfa cassava statistics monad-bayes aeson \
  --force-reinstalls
# Install the ihaskell executable; /opt/bin must exist, be writable, and be on your PATH
cabal install ihaskell --install-method=copy --installdir=/opt/bin
# Register the IHaskell kernel with Jupyter
ihaskell install --ghclib=$(ghc --print-libdir) --prefix=$HOME/.local/
jupyter kernelspec install $HOME/.local/share/jupyter/kernels/haskell/
jupyter notebook
```

To check that the setup works, try the linear regression tutorial on the DataHaskell website.
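
Before you work through the tutorial, a short program like the one below can confirm that some of the globally installed packages resolve. This sketch assumes the `cabal install --lib` step succeeded. It exercises only `vector`, `text`, and `containers`, not `dataframe` or `hasktorch`, and `mean` and `wordCounts` are hypothetical helpers. Run it with `runghc`, or paste it into an IHaskell cell and call `main`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Quick check that a few packages from the `cabal install --lib` step
-- can be imported and used from plain GHC.
import qualified Data.Map.Strict as Map
import qualified Data.Text as T
import qualified Data.Vector as V

-- Hypothetical helper: arithmetic mean of a vector (0 for an empty vector).
mean :: V.Vector Double -> Double
mean xs
  | V.null xs = 0
  | otherwise = V.sum xs / fromIntegral (V.length xs)

-- Hypothetical helper: count how often each whitespace-separated word occurs.
wordCounts :: T.Text -> Map.Map T.Text Int
wordCounts = Map.fromListWith (+) . map (\w -> (w, 1)) . T.words

main :: IO ()
main = do
  print (mean (V.fromList [1, 2, 3, 4]))
  print (Map.toList (wordCounts "hello data haskell hello"))
```

If GHC reports `Could not find module ‘Data.Vector’`, the libraries were not installed into the package environment that this GHC is using.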

> Note: this installs packages globally (into GHC's default package environment), which might break some of your existing projects.