Commit c0ec32b

Merge pull request #3 from AlexCatarino/adds-tutorial-md
Adds Text to Tutorials
2 parents 86fcd2c + ba74a19 commit c0ec32b

1 file changed

Lines changed: 102 additions & 0 deletions

tutorial.md

@@ -1 +1,103 @@
# Tutorial - Create Your Own Data Source

Implementing a data source is split into three parts:
1. Creating the data source class ([`MyCustomDataType.cs`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/MyCustomDataType.cs))
2. Creating the data downloader/processor (`process.*`)
3. Creating tests and a demonstration algorithm

## Prerequisites
1. Fork this repository to your own GitHub profile
2. Install [.NET 5.0 SDK](https://dotnet.microsoft.com/download/dotnet/5.0)

## Part 1: Set Up the C# Data Source Class
1. Open the `MyCustomDataType.cs` file for editing
2. Rename the class `MyCustomDataType` after the data you'll be offering, starting with your vendor name (e.g. __`MyCompany`__`FlightData`)
3. Remove the `SomeCustomProperty` property
4. Add your dataset's fields/properties
   * Add `[ProtoMember(n)]` to each field/property you add, where `n` starts at `10` and increments by `1` for each field/property added
5. Implement `GetSource(...)` to point to where your data lives
   * Replace `mycustomdatatype` with your vendor name (all lowercase), followed by the name of the directory that contains your data
   * Specify the file that is expected to contain your data
   * Use the `date` variable to get the date of the data being requested
   * Use `config.Symbol.Value` to get the current ticker. Make sure that the ticker capitalization is correct; the default is uppercase.
6. Implement `Reader(...)` to parse your data
   * Set `Symbol = config.Symbol` when creating the instance of the class
   * Set `EndTime` to the time the data first became available for consumption
7. Implement `Clone()` to allow Lean to create copies of your data
8. If your dataset is __NOT__ for equities data, make `RequiresMapping()` return `false`; otherwise, return `true`
   * See the [data sources related to equities](#subsection---data-sources-related-to-equities) section for more details
9. Make `IsSparseData()` return `true`
10. Make `DefaultResolution()` return the resolution to use when the user does not specify one
11. Make `SupportedResolutions()` return the resolutions that your data supports
12. In `DataTimeZone()`, return the time zone that your data is saved in
13. (Optional) Implement `ToString()` to return human-readable output
14. Rename the file `MyCustomDataType.cs` to match the name of the class it contains
15. Open the `QuantConnect.DataSource.csproj` file for editing
16. Add `<AssemblyName>QuantConnect.DataSource.{{dataSourceClassName}}</AssemblyName>` below `<RootNamespace>QuantConnect.DataSource</RootNamespace>`
   * Replace `{{dataSourceClassName}}` with the name of the class you implemented

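The logic behind steps 5 and 6 can be sketched independently of Lean. The Python below is only an illustration of the idea, not the C# implementation and not a Lean API: the record type, the `mycompany`/`flightdata` names, the one-file-per-ticker layout, the lowercase file name, the `yyyyMMdd,value` row format, and the one-day availability delay are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath


@dataclass
class FlightRecord:
    """Hypothetical record type; field names are illustrative only."""
    symbol: str
    time: datetime      # start of the period the row describes
    end_time: datetime  # moment the row first became available
    flights: int


def source_path(ticker: str, vendor: str = "mycompany", dataset: str = "flightdata") -> str:
    """Build the relative path a GetSource(...) implementation might point to.

    Assumption: one CSV file per ticker, with a lowercase file name.
    """
    return str(PurePosixPath("alternative", vendor.lower(), dataset.lower(), f"{ticker.lower()}.csv"))


def read_line(symbol: str, line: str) -> FlightRecord:
    """Parse one CSV row, as a Reader(...) implementation might.

    Assumption: rows look like "yyyyMMdd,value" and the data becomes
    available one day after the date it describes.
    """
    raw_date, raw_value = line.strip().split(",")
    time = datetime.strptime(raw_date, "%Y%m%d")
    return FlightRecord(symbol, time, time + timedelta(days=1), int(raw_value))
```

The point of keeping `time` and `end_time` separate is that `end_time` decides when an algorithm is allowed to see the row, so it should reflect when the data was actually published.
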
## Part 2: Set Up the Downloading/Processing Script
1. Create one of the following files to download/process your data:
   * Python: [`process.py`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/process.sample.py)
   * Bash: [`process.sh`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/process.sample.sh)
   * Jupyter Notebook: [`process.ipynb`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/process.sample.ipynb)

2. In `process.*`, output your processed/final data to: `/temp-output-directory/alternative/{{vendorName}}/{{dataSourceName}}/`
   * Replace `{{vendorName}}` with your vendor name (e.g. `quantconnect`)
   * Replace `{{dataSourceName}}` with the name of your data (e.g. `corporate-flights`)
   * The path should be entirely lowercase unless uppercase is absolutely required
   * Do not use special characters in your output path (prefer `-` over `_` in directories, and `_` over `-` in file names)
   * __Output should be in CSV format__ (comma delimited)
   * Example output directory: `/temp-output-directory/alternative/quantconnect/fred`
   * Example output file: `/temp-output-directory/alternative/quantconnect/fred/oecdrecd.csv`

3. If you are processing data that is associated with stocks/equities, review the [data sources related to equities](#subsection---data-sources-related-to-equities) section

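As a rough illustration of step 2, a `process.py` could write its output along these lines. This is a minimal sketch using only the Python standard library; the `write_output` helper, its parameters, and the row contents are hypothetical and not part of the SDK samples.

```python
import csv
from pathlib import Path


def write_output(rows, vendor, dataset, file_name, root="/temp-output-directory"):
    """Write rows as comma-delimited CSV to <root>/alternative/<vendor>/<dataset>/<file_name>.csv.

    Every path component is lowercased to follow the naming rules above.
    """
    out_dir = Path(root) / "alternative" / vendor.lower() / dataset.lower()
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{file_name.lower()}.csv"
    # newline="" lets the csv module control line endings itself
    with out_file.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return out_file
```

Calling `write_output(rows, "quantconnect", "fred", "oecdrecd")` would target the example output file listed above; the `root` parameter exists only so the function can be tried against a scratch directory.
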
## Part 3: Set Up Testing and the Demonstration Algorithm
1. Edit [`Demonstration.cs`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/Demonstration.cs) and create an example of how to load and use your data
   * Rename the algorithm class to the name of the class created in Part 1
   * Keep the algorithm simple and minimal
2. Open the [`tests/MyCustomDataTypeTests.cs`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/tests/MyCustomDataTypeTests.cs) file for editing
3. Scroll to the bottom of the code and make `CreateNewInstance()` return your new data type
   * The data can be fake; it doesn't have to be real
   * Set all fields/properties of your class when creating your new data type

4. Ensure that the tests pass. Run the following commands, in order, to check the test status:
   * `dotnet build tests/Tests.csproj`
   * `dotnet test tests/bin/Debug/net5.0/Tests.dll`

5. Rename `tests/MyCustomDataTypeTests.cs` to the name of the class you created in Part 1, ending with "Tests.cs"

# Subsection - Data Sources Related to Equities

Your data source is related to equities when the following are true:
* The data source describes data about a specific equity Symbol, e.g. AAPL
* The data source is directly linked to the equity, i.e. if the data source describes data for AAPL, then this data __only__ applies to the AAPL equity Symbol

For equity-related data sources, update `RequiresMapping()` to return `true` in the data source class you created in Part 1.

(Note: ticker `WW` is used for example purposes)

If your source/raw data is "point in time" (each record carries the ticker that was in use on the record's date), then no further special handling is required. Example:
* Ticker name as of today (2021-06-24) is `WW`
* Ticker `WTW` was renamed to `WW` on 2019-04-19
* Data before 2019-04-19 has ticker `WTW`, not `WW`

Otherwise, you'll need to use QuantConnect data to get the ticker's previous name at a given point in time.

To do so, follow the steps below (Python/Jupyter Notebooks only):

1. Import the required classes:
   * `from QuantConnect.Data.Auxiliary import *`
   * `from QuantConnect import *`
2. Create a `MapFileResolver` instance:
   * `resolver = MapFileResolver.Create(Globals.DataFolder, Market.USA)`
3. For each ticker you encounter, resolve the map file, providing the current time:
   * `map_file = resolver.ResolveMapFile('WW', datetime.now())`
4. Get the ticker symbol for the date provided. Pass the time of the data you're processing that contains the ticker:
   * `data_time = datetime(2018, 1, 1)`
   * `ticker = map_file.GetMappedSymbol(data_time)`
5. (Optional) If you need a Symbol, you can create one:
   * `first_date = map_file.FirstDate`
   * `symbol = Symbol(SecurityIdentifier.GenerateEquity(first_date, ticker, Market.USA), ticker)`
   * `symbol` should now represent `WTW`
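
To make the idea of point-in-time ticker resolution concrete, here is a simplified stand-in written in plain Python. It does not use or reproduce the `MapFileResolver` API; the `RENAMES` table and the `ticker_on` helper are hypothetical, with dates taken from the `WTW`/`WW` example above.

```python
from datetime import date

# Hypothetical rename history: (last date the ticker was in use, ticker).
# Entries are ordered oldest first; the final entry is the current ticker.
RENAMES = [
    (date(2019, 4, 18), "WTW"),  # last day before the rename on 2019-04-19
    (date.max, "WW"),
]


def ticker_on(data_date, renames=RENAMES):
    """Return the ticker that was in use on data_date."""
    for last_valid, ticker in renames:
        if data_date <= last_valid:
            return ticker
    raise ValueError(f"no ticker known for {data_date}")
```

With this table, a record dated 2018-01-01 resolves to `WTW`, while a record dated on or after 2019-04-19 resolves to `WW`, which matches the lookup in step 4.
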
