|
| 1 | +# Tutorial - Create Your Own Data Source |
1 | 2 |
|
| 3 | +Implementing a data source involves three parts: |
| 4 | + 1. Creating the data source class ([`MyCustomDataType.cs`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/MyCustomDataType.cs)) |
| 5 | + 2. Creating the data downloader/processor (`process.*`) |
| 6 | + 3. Creating tests and a demonstration algorithm |
| 7 | + |
| 8 | +## Prerequisites |
| 9 | + 1. Fork this repository to your own GitHub profile |
| 10 | + 2. Install [.NET 5.0 SDK](https://dotnet.microsoft.com/download/dotnet/5.0) |
| 11 | + |
| 12 | +## Part 1: Set Up the C# Data Source Class |
| 13 | + 1. Open the `MyCustomDataType.cs` file for editing |
| 14 | + 2. Rename the class `MyCustomDataType` to describe the data you'll be offering, prefixed with your vendor name (e.g. __`MyCompany`__`FlightData`) |
| 15 | + 3. Remove the `SomeCustomProperty` property |
| 16 | + 4. Add your dataset's fields/properties. |
| 17 | + * Add `[ProtoMember(n)]` to each field/property you add, where `n` starts at `10` and increments by `1` per field/property added |
| 18 | + 5. Implement `GetSource(...)` to point to where your data lives |
| 19 | + * Replace `mycustomdatatype` with your vendor name (all lowercase), followed by the name of the directory that contains your data |
| 20 | + * Specify the file that is expected to contain your data |
| 21 | + * Use the `date` variable to get the date of data being requested |
| 22 | + * Use `config.Symbol.Value` to get the current ticker. Make sure that the ticker capitalization is correct. Default is uppercase. |
| 23 | + 6. Implement `Reader(...)` to parse your data |
| 24 | + * Set `Symbol = config.Symbol` when creating the instance of the class |
| 25 | + * Set `EndTime` equal to the time the data first became available for consumption |
| 26 | + 7. Implement `Clone()` to allow Lean to create copies of your data |
| 27 | + 8. Make `RequiresMapping()` return `false` if your dataset is __NOT__ equities data; otherwise, make it return `true` |
| 28 | + * See the [data sources related to equities](#subsection---data-sources-related-to-equities) section for more details |
| 29 | + 9. Make `IsSparseData()` return `true` |
| 30 | + 10. Make `DefaultResolution()` return the resolution of your data if the user does not specify a resolution |
| 31 | + 11. Make `SupportedResolutions()` return the resolutions that your data supports |
| 32 | + 12. Make `DataTimeZone()` return the time zone that your data is stored in |
| 33 | + 13. (Optional) Implement `ToString()` to return pretty output |
| 34 | + 14. Rename the file `MyCustomDataType.cs` to the name of the class contained within |
| 35 | + 15. Open the `QuantConnect.DataSource.csproj` file for editing |
| 36 | + 16. Add `<AssemblyName>QuantConnect.DataSource.{{dataSourceClassName}}</AssemblyName>` below `<RootNamespace>QuantConnect.DataSource</RootNamespace>` |
| 37 | + * Replace `{{dataSourceClassName}}` with the name of the class you implemented |
| 38 | + |
| 39 | +## Part 2: Set Up the Downloading/Processing Script |
| 40 | + 1. Create one of the following files to download/process your data: |
| 41 | + * Python: [`process.py`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/process.sample.py) |
| 42 | + * Bash: [`process.sh`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/process.sample.sh) |
| 43 | + * Jupyter Notebook: [`process.ipynb`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/process.sample.ipynb) |
| 44 | + |
| 45 | + 2. In `process.*`, output your processed/final data to: `/temp-output-directory/alternative/{{vendorName}}/{{dataSourceName}}/` |
| 46 | + * Replace `{{vendorName}}` with your vendor name (e.g. `quantconnect`) |
| 47 | + * Replace `{{dataSourceName}}` with the name of your data (e.g. `corporate-flights`) |
| 48 | + * The path should be entirely lowercase, unless uppercase is absolutely required |
| 49 | + * Do not use special characters in your output path (prefer `-` over `_` in directories, and `_` over `-` for file names) |
| 50 | + * __Output should be in CSV format__ (comma delimited) |
| 51 | + * Example output directory: `/temp-output-directory/alternative/quantconnect/fred` |
| 52 | + * Example output file: `/temp-output-directory/alternative/quantconnect/fred/oecdrecd.csv` |
| 53 | + |
| 54 | + 3. If you are processing data that is associated with stocks/equities, review the [data sources related to equities](#subsection---data-sources-related-to-equities) section |
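The output rules in step 2 can be sketched in Python as follows. This is a minimal illustration, not the contents of `process.sample.py`: the helper names `output_file` and `write_rows`, the `root` parameter, and the two-column row layout are all assumptions made for the example.

```python
import csv
from pathlib import Path

# Root directory taken from the tutorial text. The `root` parameter below is a
# hypothetical convenience so the script can be pointed elsewhere when testing.
OUTPUT_ROOT = Path("/temp-output-directory/alternative")


def output_file(vendor, dataset, ticker, root=OUTPUT_ROOT):
    """Build <root>/<vendor>/<dataset>/<ticker>.csv with lowercase names."""
    # Directories prefer `-` over `_`, per the naming guidance above.
    directory = Path(root) / vendor.lower() / dataset.lower().replace("_", "-")
    return directory / f"{ticker.lower()}.csv"


def write_rows(path, rows):
    """Write rows as comma-delimited CSV, creating parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
```

Calling `output_file("quantconnect", "fred", "oecdrecd")` yields the example output file path listed in step 2, and `write_rows` can then be given whatever columns your `Reader(...)` implementation expects.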
| 55 | + |
| 56 | +## Part 3: Set Up Testing and the Demonstration Algorithm |
| 57 | + 1. Edit [`Demonstration.cs`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/Demonstration.cs) and create an example of how to load and use your data |
| 58 | + * Rename the algorithm class name to the name of the class created in part 1 |
| 59 | + * The algorithm should be very simple and minimal |
| 60 | + 2. Open the [`tests/MyCustomDataTypeTests.cs`](https://github.com/QuantConnect/Lean.DataSource.SDK/blob/master/tests/MyCustomDataTypeTests.cs) file for editing |
| 61 | + 3. Scroll to the bottom of the code and make `CreateNewInstance()` return your new data type |
| 62 | + * The data can be fake; it doesn't have to be real |
| 63 | + * Set all fields/properties of your class when creating your new data type |
| 64 | + |
| 65 | + 4. Ensure that the tests pass. Run the following commands, in order, to check the test status: |
| 66 | + * `dotnet build tests/Tests.csproj` |
| 67 | + * `dotnet test tests/bin/Debug/net5.0/Tests.dll` |
| 68 | + |
| 69 | + 5. Rename `tests/MyCustomDataTypeTests.cs` to the name of the class you created in part 1, ending with "Tests.cs" |
| 70 | + |
| 71 | +# Subsection - Data Sources Related to Equities |
| 72 | + |
| 73 | +Your data source is related to equities when the following are true: |
| 74 | + * The data source describes data about a specific equity Symbol, e.g. AAPL |
| 75 | + * The data source is directly linked to the equity, e.g. if your data source describes data for AAPL, then this data __only__ applies to the AAPL equity Symbol |
| 76 | + |
| 77 | +For equity-related data sources, update `RequiresMapping()` to return `true` in the data source class you created in part 1. |
| 78 | + |
| 79 | +(Note: ticker `WW` is used for example purposes) |
| 80 | + |
| 81 | +If your source/raw data is "point in time", then no further special handling is required. Example: |
| 82 | + * Ticker name as of today (2021-06-24) is `WW` |
| 83 | + * Ticker `WTW` was renamed to `WW` on 2019-04-19 |
| 84 | + * Data before 2019-04-19 has ticker `WTW`, not `WW` |
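The `WTW`/`WW` example above can be sketched in plain Python to show what "point in time" means for a ticker. This is only an illustration: the `RENAMES` table and the `ticker_on` helper are hypothetical and are not part of the Lean API or its map file format.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rename history: (first date the ticker is in effect, ticker).
# Entries must be sorted by date. Values come from the example above.
RENAMES = [(date.min, "WTW"), (date(2019, 4, 19), "WW")]


def ticker_on(day, history=RENAMES):
    """Return the ticker that was in effect on `day`."""
    starts = [start for start, _ in history]
    # Find the last entry whose start date is on or before `day`.
    index = bisect_right(starts, day) - 1
    return history[index][1]
```

With this table, `ticker_on(date(2018, 1, 1))` returns `"WTW"` and `ticker_on(date(2021, 6, 24))` returns `"WW"`. Point-in-time raw data already carries the ticker that such a lookup would produce for each row's date.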
| 85 | + |
| 86 | +Otherwise, you'll need to use QuantConnect data to get the ticker's previous name at a given point in time. |
| 87 | + |
| 88 | +To do so, follow the steps below (Python/Jupyter Notebooks only): |
| 89 | + |
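| 90 | + 1. Import the required classes, plus `datetime` from the Python standard library (`from datetime import datetime`), which the steps below use: |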
| 91 | + * `from QuantConnect.Data.Auxiliary import *` |
| 92 | + * `from QuantConnect import *` |
| 93 | + 2. Create a MapFileResolver instance: |
| 94 | + * `resolver = MapFileResolver.Create(Globals.DataFolder, Market.USA)` |
| 95 | + 3. For each ticker you encounter, resolve the map file, and provide the current time: |
| 96 | + * `map_file = resolver.ResolveMapFile('WW', datetime.now())` |
| 97 | + 4. Get the ticker symbol for a given date by passing the timestamp of the data you're processing that contains the ticker: |
| 98 | + * `data_time = datetime(2018, 1, 1)` |
| 99 | + * `ticker = map_file.GetMappedSymbol(data_time)` |
| 100 | + 5. (Optional) If you need a Symbol, you can create one: |
| 101 | + * `first_date = map_file.FirstDate` |
| 102 | + * `symbol = Symbol(SecurityIdentifier.GenerateEquity(first_date, ticker, Market.USA), ticker)` |
| 103 | + * `symbol` should now represent `WTW` |