Data API
The Data module provides comprehensive functionality for managing market data in Planar. It handles OHLCV (Open, High, Low, Close, Volume) data storage, retrieval, and manipulation using efficient storage formats like Zarr and LMDB.
Overview
The Data module is responsible for:
- OHLCV data storage and retrieval
- Data persistence using Zarr format for large datasets
- LMDB key-value storage for fast access
- Data validation and integrity checking
- Efficient data structures for time series analysis
Core Data Structures
OHLCV Data Access
Usage Examples
Data Loading and Storage
Primary Data Functions
Advanced Data Operations
Data Persistence
Zarr Storage
Planar uses Zarr format for efficient storage of large time series datasets.
LMDB Key-Value Storage
LMDB key-value storage is used for fast metadata and configuration storage.
Data Validation and Integrity
Data Quality Checks
Data Cleaning
DataFrame Integration
Working with DataFrames
Performance Optimization
Efficient Data Access Patterns
Memory Management
Data Streaming and Updates
Real-time Data Updates
Common Data Patterns
Moving Averages
Price Analysis
Complete API Reference
Data.OHLCV_CHUNK_SIZE — Constant
Default ZArray chunk size.
Data.OHLCV_COLUMNS — Constant
Columns for OHLCV data: timestamp, open, high, low, close, volume
Data.OHLCV_COLUMNS_COUNT — Constant
Count of OHLCV_COLUMNS
Data.OHLCV_COLUMNS_NOV — Constant
Only the OHLC columns of OHLCV_COLUMNS
Data.OHLCV_COLUMNS_TS — Constant
The timestamp column of OHLCV_COLUMNS
Data.compressor — Constant
Default zarr compressor used in the module (zstd, clevel=2).
Data.Candle — Type
A struct representing a candlestick in financial trading.
- timestamp
- open
- high
- low
- close
- volume
Candle{T} is a parametric struct that represents a candlestick with generic type T, which must be a subtype of AbstractFloat.
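As a rough illustration of the parametric layout described above, the sketch below defines a stand-in struct. The name CandleSketch and the DateTime timestamp type are assumptions made for this example; only the field names and the AbstractFloat constraint come from the description.

```julia
using Dates

# Hypothetical stand-in for Data.Candle; not the module's actual definition.
struct CandleSketch{T<:AbstractFloat}
    timestamp::DateTime  # assumed timestamp type
    open::T
    high::T
    low::T
    close::T
    volume::T
end

# T is inferred as Float64 from the price and volume arguments.
c = CandleSketch(DateTime(2024, 1, 1), 100.0, 105.0, 99.0, 103.0, 1500.0)

# An explicit type parameter converts the numeric arguments to Float32.
c32 = CandleSketch{Float32}(DateTime(2024, 1, 1), 100, 105, 99, 103, 1500)
```

Constraining T to AbstractFloat keeps every price and volume field in the same floating-point type, which is convenient when candles are later packed into a homogeneous array.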
Data.DictView6 — Type
A view into a dictionary (a subset of keys).
- d
- keys
Data.EventTrace — Type
EventTrace structure for managing event data.
- lock
- _buf
- _zi
- _arr
- _cache
- freq
- last_flush
Represents a collection of events with caching capabilities. It is designed to efficiently handle large datasets by caching event data in memory. The structure includes a ZarrInstance for data storage, a ZArray for data access, a cache for temporary storage, a frequency for event timing, and an index for the next event.
Data.LMDBDictStore — Type
LMDBDictStore is a concrete implementation of the AbstractDictStore interface.
LMDBDictStore represents a dictionary-like data store that uses LMDB as its backend. It is a subtype of AbstractDictStore defined in the Zarr package.
LMDBDictStore has the following fields:
- a: An instance of LMDBDict that represents the LMDB database.
- lock: A ReentrantLock used for thread-safety.
LMDBDictStore can be created using the LMDBDictStore constructor function. It takes the following arguments:
- path::AbstractString: The path to the LMDB database.
- reset::Bool=false: If true, the LMDB database at the given path will be deleted and recreated.
- mapsize::Int=64MB: The maximum size of the LMDB database.
LMDBDictStore implements the AbstractDictStore interface, which provides methods for reading and writing data to the store.
Data.OHLCVTuple — Type
Similar to a StructArray (and should probably be replaced by it), used for fast conversion.
Data.PairData — Type
PairData is a low level struct, to attach some metadata to a ZArray. (deprecated)
- name
- tf
- data
- z
Instead of constructing a PairData, directly use the OHLCV DataFrame to hold the pair information and the ZArray itself.
Data.TimeFrameError — Type
A custom exception representing a time frame error.
- first
- last
- td
Data.ZarrInstance — Type
Candles data is stored with hierarchy PAIR -> [TIMEFRAMES...]. A pair is a ZGroup, a timeframe is a ZArray.
- path
- store
- group
Base.Filesystem.rm — Method
Remove all lmdb files associated with an LMDBDict object.
rm(d::LMDB.LMDBDict) -> Union{Nothing, Bool}
This function removes all lmdb files associated with the given LMDBDict object. It deletes the lmdb database and all associated files.
Base.delete! — Method
Delete paths from an LMDBDictStore.
delete!(
store::Data.LMDBDictStore,
paths::AbstractString...;
recursive
) -> Union{Nothing, LMDB.LMDBDict}
This function deletes the specified paths from an LMDBDictStore. It supports deleting paths recursively if the recursive parameter is set to true.
Base.delete! — Method
Delete an element from a DirectoryStore. Also removes the directory.
delete!(
store::Zarr.DirectoryStore,
paths::String...;
recursive
) -> Union{Nothing, Bool}
Base.delete! — Method
Delete the ZArray from the underlying storage.
delete!(z::Zarr.ZArray; ok) -> Union{Nothing, Bool}
Base.delete! — Method
Delete an element from a ZGroup. If the element is a group, it will be recursively deleted.
Base.empty! — Method
Empty an LMDBDict object.
empty!(d::LMDB.LMDBDict) -> Int32
This function empties an LMDBDict object by dropping the lmdb database and syncing the environment.
Base.empty! — Method
Resizes a ZArray to zero.
empty!(z::Zarr.ZArray) -> Zarr.ZArray
Base.empty! — Method
Removes all arrays and groups from a ZGroup.
empty!(g::Zarr.ZGroup)
Base.isempty — Method
A ZArray is empty if its size is 0.
isempty(z::Zarr.ZArray) -> Bool
Base.unique! — Method
Remove duplicates from a ZArray.
In a 2d ZArray where we want values where the second column is unique:
unique!(x->x[2], z)
Data._check_contiguity — Method
Check the contiguity of timestamps between data and saved data.
_check_contiguity(
data_first_ts::AbstractFloat,
data_last_ts::AbstractFloat,
saved_first_ts::AbstractFloat,
saved_last_ts::AbstractFloat,
td
) -> Bool
Used to check the contiguity of timestamps between the data and saved data. It takes in the first and last timestamps of the data (data_first_ts and data_last_ts) and the first and last timestamps of the saved data (saved_first_ts and saved_last_ts). Typically used as a helper function within the context of saving or loading OHLCV data to ensure the contiguity of timestamps.
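The exact rule applied by _check_contiguity is not spelled out here, so the following is only a sketch of one plausible check: new data is accepted when it overlaps the saved range or sits at most one timeframe step before or after it. The function name contiguity_sketch and the rule itself are assumptions for illustration.

```julia
# Hypothetical sketch, not the module's implementation.
# All timestamps and `td` are expressed in the same unit (e.g. milliseconds).
function contiguity_sketch(data_first_ts, data_last_ts, saved_first_ts, saved_last_ts, td)
    # The new data must not start more than one step after the saved data ends...
    starts_ok = data_first_ts <= saved_last_ts + td
    # ...and must not end more than one step before the saved data begins.
    ends_ok = data_last_ts + td >= saved_first_ts
    return starts_ok && ends_ok
end

td = 60_000.0                      # one minute in milliseconds
saved = (0.0, 300_000.0)           # saved data covers minutes 0 through 5
appended = contiguity_sketch(360_000.0, 600_000.0, saved..., td)   # directly after
gapped = contiguity_sketch(480_000.0, 600_000.0, saved..., td)     # leaves a hole
```

Under these assumptions, appended is true because the new block starts exactly one step after the saved data, while gapped is false because two candles would be missing in between.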
Data._get_zarray — Method
Get a ZArray object from a ZarrInstance.
_get_zarray(
zi::Data.ZarrInstance,
key::AbstractString,
sz::Tuple;
type,
overwrite,
reset
)
This function is used to retrieve a ZArray object from a ZarrInstance. It takes in the ZarrInstance, key, size, and other optional parameters and returns the ZArray object.
Data._load_ohlcv — Method
Load OHLCV pair data from a zarr instance.
- za: The zarr array holding the data.
- key: The name of the array to load from the zarr instance (in the format exchange/timeframe/pair).
- td: The timeframe (as integer in milliseconds) of the target OHLCV table to be loaded.
- from, to: The date range.
Data.candleat — Method
Get the candle at a given date from an OHLCV dataframe as a Candle.
candleat(
df::DataFrames.AbstractDataFrame,
date::Dates.DateTime;
return_idx
) -> Union{Data.Candle{Float64}, Tuple{Data.Candle{Float64}, Any}}
Data.candleavl — Method
Fetch the candle expected to be available at a specific date and time frame from an OHLCV DataFrame.
candleavl(
df::DataFrames.AbstractDataFrame,
tf::TimeFrames.TimeFrame,
date
) -> Union{Data.Candle{Float64}, Tuple{Data.Candle{Float64}, Any}}
The available candle is usually the candle that is date-wise left adjacent to the requested date.
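To make "left adjacent" concrete, here is a minimal sketch using only the Dates standard library. It assumes candles are labeled by their open time, so the candle containing the requested date is still forming and the one before it is the latest complete candle. The helper name available_timestamp is hypothetical and is not part of the Data module.

```julia
using Dates

# Hypothetical helper: timestamp of the most recent completed candle,
# assuming candles are labeled by their open time.
function available_timestamp(date::DateTime, period::Period)
    # Floor to the start of the candle containing `date`, then step back once.
    return floor(date, period) - period
end

ts_hourly = available_timestamp(DateTime(2024, 1, 1, 10, 37), Hour(1))
ts_quarter = available_timestamp(DateTime(2024, 1, 1, 10, 37), Minute(15))
```

With a one-hour timeframe, a request at 10:37 resolves to the 09:00 candle; with a 15-minute timeframe it resolves to the 10:15 candle.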
Data.candlelast — Method
Get the last candle from an OHLCV dataframe as a Candle.
candlelast(
df::DataFrames.AbstractDataFrame
) -> Data.Candle{Float64}
Data.candlepair — Method
Same as candleat but also fetches the previous candle, returning a Tuple{Candle, Candle}.
Data.check_data — Method
Check the size of data against a ZArray.
check_data(data, arr::Zarr.ZArray)
Compares the size of data with the size of the ZArray arr. If the sizes do not match, it raises a SizeMismatchError.
Data.chunksize — Method
Choose chunk size depending on size of data with a predefined split (e.g. 1/100), padding to the nearest power of 2.
chunksize(data; parts, def) -> Tuple{Any, Vararg{Any}}
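The idea of splitting by a fixed fraction and padding to a power of 2 can be sketched as follows. This is an assumption-laden illustration: the name chunksize_sketch, the choice to round up with nextpow, and keeping the column count unchanged are not taken from the module.

```julia
# Hypothetical sketch, not the module's implementation.
function chunksize_sketch(data::AbstractMatrix; parts::Integer=100)
    # Rows per chunk if the data were split into `parts` pieces (at least 1).
    rows = max(1, cld(size(data, 1), parts))
    # Pad the row count up to a power of 2; keep every column in each chunk.
    return (nextpow(2, rows), size(data, 2))
end

chunks = chunksize_sketch(zeros(1000, 6))   # 1000 rows / 100 parts = 10, padded to 16
```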
Data.closeat — Method
See @candleat.
Data.closeavl — Method
See @candleavl
Data.closelast — Method
See @candlelast
Data.contiguous_ts — Method
Check if a time series is contiguous based on a specified timeframe.
contiguous_ts(
series,
timeframe::AbstractString;
raise,
return_date
) -> Union{Bool, Tuple{Bool, Any, Any}}
This function is used to check if a time series is contiguous based on a specified timeframe. It takes in the series as the input time series and the timeframe as a string representing the timeframe (e.g., "1h", "1d"). Optional parameters raise and return_date can be specified to customize the behavior of the function.
- raise: A flag indicating whether to raise a TimeFrameError if the time series is not contiguous. Default is true.
- return_date: A flag indicating whether to return the first non-contiguous date found in the time series. Default is false.
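A contiguity check of this kind can be sketched in a few lines. The version below is illustrative only: contiguous_sketch is a hypothetical name, it takes a numeric time delta instead of a timeframe string, it returns the index of the first gap, and it throws a generic error where the real function raises a TimeFrameError.

```julia
# Hypothetical sketch; `timestamps` and `td` share the same unit (e.g. milliseconds).
function contiguous_sketch(timestamps::AbstractVector{<:Real}, td::Real; raise::Bool=true)
    for i in 2:length(timestamps)
        # Every consecutive pair must be exactly one timeframe step apart.
        if timestamps[i] - timestamps[i - 1] != td
            raise && error("time series is not contiguous at index $i")
            return (false, i)
        end
    end
    return (true, 0)
end

ok = contiguous_sketch([0, 60, 120, 180], 60)
broken = contiguous_sketch([0, 60, 180], 60; raise=false)
```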
Data.default_value — Method
Get the default value of a given type t.
default_value(t::Type) -> Data.Candle{Float64}
This function returns the default value of the specified type t.
Data.df! — Method
Construct a DataFrame without copying.
Data.empty_ohlcv — Method
An empty OHLCV dataframe.
Data.get_zgroup — Method
Get the root group of a store.
get_zgroup(
store::Zarr.AbstractStore
) -> Union{Zarr.ZArray, Zarr.ZGroup}
Data.highat — Method
See @candleat.
Data.highavl — Method
See @candleavl
Data.highlast — Method
See @candlelast
Data.key_path — Method
The full key of the data stored for the (exchange, pair, timeframe) combination.
key_path(exc_name, pair, timeframe) -> String
Data.load — Method
Load OHLCV data for a pair from storage.
- as_z: returns the ZArray
Data.load_data — Method
Load data from zarr instance.
load_data(
zi::Data.ZarrInstance,
key;
serialized,
kwargs...
) -> Any
- zi: The zarr instance to use.
- key: The name of the array to load from the zarr instance (full key path).
- type: Set to the type that zarr should use to store the data (only bits types). [Float64].
- sz: The chunks tuple which should match the shape of the already saved data.
- from: The starting index to load the data from. Default is an empty string, indicating no specific starting index.
- to: The ending index to load the data up to. Default is an empty string, indicating no specific ending index.
- z_col: The column in the Zarr array to load the data from. Default is 1.
- type: The type of the data to be loaded. Default is Float64.
- serialized: A flag indicating whether the data is serialized. Default is false. If true, type is ignored.
- as_z: A flag indicating whether to return the loaded data as a ZArray. Default is false.
- with_z: A flag indicating whether to return the loaded data along with the Zarr array (as tuple). Default is false.
Data.load_ohlcv — Method
Load OHLCV data from a ZarrInstance.
load_ohlcv(
zi::Data.ZarrInstance,
exc_name::AbstractString,
pairs,
timeframe;
raw,
kwargs...
) -> Union{Dict{String, Zarr.ZArray}, Dict{String, Data.PairData}}
- raw: A flag indicating whether to return the raw data or process it into an OHLCV format. Default is false.
- from: The starting timestamp (inclusive) for loading data. Default is an empty string, indicating loading from the beginning of the ZArray.
- to: The ending timestamp (exclusive) for loading data. Default is an empty string, indicating loading until the end of the ZArray.
- saved_col: The column index of the timestamp data in the ZArray. Default is 1.
- as_z: A flag indicating whether to return the loaded data as a ZArray. Default is false.
- with_z: A flag indicating whether to return the loaded data along with the ZArray object. Default is false.
This function is used to load OHLCV data from a ZarrInstance. It takes in the ZarrInstance zi, the exchange name exc_name, the currency pairs pairs, and the timeframe. Optional parameters raw and kwargs can be specified to customize the loading process.
Data.lowat — Method
See @candleat.
Data.lowavl — Method
See @candleavl
Data.lowlast — Method
See @candlelast
Data.ohlcvtuple — Method
Default OHLCVTuple value.
Data.openat — Method
See @candleat.
Data.openavl — Method
See @candleavl
Data.openlast — Method
See @candlelast
Data.save_data — Method
Save data to a ZarrInstance with additional options.
save_data(
zi::Data.ZarrInstance,
key,
data;
serialize,
data_col,
kwargs...
) -> Zarr.ZArray
- type: The type of the data to be saved. Default is Float64.
- data_col: The column of the data to be saved. Default is 1.
- z_col: The column in the Zarr array to save the data. Default is the same as data_col.
- overwrite: A flag indicating whether to overwrite existing data at the specified key. Default is true.
- reset: A flag indicating whether to reset the Zarr array before saving the data. Default is false.
- chunk_size: The size of the chunks to use when saving the data. Default is nothing, indicating auto-chunking.
Only the seriality of dates is ensured, not contiguity (unlike save_ohlcv). It creates a new array if needed and sets the chunk size if specified.
Data.save_ohlcv — Method
Save OHLCV data to a ZArray.
save_ohlcv(
zi::Data.ZarrInstance,
exc_name,
pair,
timeframe,
data;
kwargs...
) -> Union{Nothing, Zarr.ZArray}
- data_col: The column index of the timestamp data in the input data. Default is 1.
- saved_col: The column index of the timestamp data in the existing data. Default is equal to data_col.
- type: The primitive type used for storing the data. Default is Float64.
- existing: A flag indicating whether existing data should be considered during the save operation. Default is true.
- overwrite: A flag indicating whether existing data should be overwritten during the save operation. Default is true.
- reset: A flag indicating whether the ZArray should be reset before saving the data. Default is false.
- check: :bounds (default) only checks that new data is adjacent to previous data; :all checks full contiguity of previous and new data; with :none or anything else, no checks are done.
The save_ohlcv function saves OHLCV data to a ZArray. It performs checks on the input data and existing data (if applicable) to ensure contiguity and validity. If the checks pass, it calculates the offset based on the time difference between the first timestamps of the new and existing data. Then, it updates the ZArray with the new data starting at the calculated offset. The function provides various optional parameters to customize the save operation, such as handling existing data, overwriting, resetting, and performing checks.
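The offset step described above can be illustrated with a small calculation. This is a sketch under assumptions: offset_sketch is a hypothetical helper, timestamps are taken to be in milliseconds, and the result is expressed as a 1-based row index, which may differ from the convention used internally.

```julia
# Hypothetical sketch; timestamps and `td` are in milliseconds.
function offset_sketch(data_first_ts::Real, saved_first_ts::Real, td::Real)
    # Whole timeframe steps between the two first timestamps,
    # shifted to a 1-based row index.
    return Int(div(data_first_ts - saved_first_ts, td)) + 1
end

# Saved data starts at t = 0 with 1-minute candles; new data starts 3 minutes later.
row = offset_sketch(180_000, 0, 60_000)
```

Here the new data would be written starting at row 4 of the saved array, overwriting or extending whatever follows.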
Data.snakecased — Method
Normalizes separators or special characters to _.
snakecased(pair::AbstractString) -> Any
Data.stub! — Method
A stub! function usually fills a container with readily available data.
Data.to_ohlcv — Method
Convert raw ccxt OHLCV data (matrix) to a dataframe.
Data.to_ohlcv — Method
Construct an OHLCV dataframe backed by an OHLCVTuple.
Data.to_ohlcv — Method
Convert data to OHLCV format.
to_ohlcv(
data::AbstractVector{Data.Candle},
timeframe::TimeFrames.TimeFrame
) -> Any
This function converts the input data to the OHLCV (Open, High, Low, Close, Volume) format, using the specified timeframe. It returns the converted data as a DataFrame.
Data.tobytes — Method
Convert a value data to its byte representation.
tobytes(data) -> Vector{UInt8}
This function converts the input value data to its byte representation.
Data.todata — Method
Convert a byte array bytes to its original data representation.
todata(bytes) -> Any
This function converts the input byte array bytes back to its original data representation.
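A byte round trip of this kind can be sketched with Julia's Serialization standard library. Whether tobytes and todata are built on Serialization is an assumption here, and the names tobytes_sketch and todata_sketch are hypothetical.

```julia
using Serialization

# Hypothetical sketch: serialize any value into a byte vector.
function tobytes_sketch(data)
    buf = IOBuffer()
    serialize(buf, data)
    return take!(buf)
end

# Hypothetical sketch: rebuild the value from its byte vector.
todata_sketch(bytes::Vector{UInt8}) = deserialize(IOBuffer(bytes))

original = (pair="BTC/USDT", closes=[100.0, 101.5, 99.8])
restored = todata_sketch(tobytes_sketch(original))
```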
Data.volumeat — Method
See @candleat.
Data.volumeavl — Method
See @candleavl
Data.volumelast — Method
See @candlelast
Data.zdelete! — Method
Delete elements from a ZArray z within a specified date range.
zdelete!(
z::Zarr.ZArray,
from_dt::Union{Nothing, Dates.DateTime},
to_dt::Union{Nothing, Dates.DateTime};
by,
select,
serialized,
buffer
)
This function deletes elements from a ZArray z that fall within the specified date range. The range is defined by from_dt (inclusive) and to_dt (exclusive). The deletion is performed in place.
The by argument is optional and defaults to the identity function. It specifies the function used to extract the date value from each element of the ZArray. The select argument is optional and defaults to a function that selects the first column of each element in the ZArray. It specifies the function used to select the relevant portion of each element for deletion. The serialized argument is optional and defaults to false. If set to true, the ZArray is assumed to be serialized, and the deletion is performed on the serialized representation. The buffer argument is optional and can be used to provide an IOBuffer for intermediate storage during deletion.
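The inclusive-from, exclusive-to range semantics can be shown on a plain sorted vector of timestamps. This sketch does not touch a ZArray and ignores the by, select, serialized and buffer options; delete_range! is a hypothetical name.

```julia
using Dates

# Hypothetical sketch on a sorted timestamp vector, not on a ZArray.
function delete_range!(ts::Vector{DateTime}, from_dt, to_dt)
    # `nothing` means the range is open on that side.
    start = from_dt === nothing ? 1 : searchsortedfirst(ts, from_dt)
    # `to_dt` is exclusive: stop at the last element strictly before it.
    stop = to_dt === nothing ? length(ts) : searchsortedfirst(ts, to_dt) - 1
    return deleteat!(ts, start:stop)
end

hours = [DateTime(2024, 1, 1, h) for h in 0:5]
delete_range!(hours, DateTime(2024, 1, 1, 2), DateTime(2024, 1, 1, 4))
```

After the call, the 02:00 and 03:00 entries are gone while 04:00 remains, because the upper bound is exclusive.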
Data.zilmdb — Function
Create a ZarrInstance at specified path using lmdb as backend.
zilmdb(; ...) -> Data.ZarrInstance
zilmdb(path::AbstractString; force) -> Data.ZarrInstance
This function creates a ZarrInstance object at the specified path using lmdb as the backend. It has an optional parameter 'force' to reset the underlying store.
Data.@as_mat — Macro
Redefines given variable to a Matrix with type of the underlying container type.
Data.@candleat — Macro
Get the candle value at a specific date from an OHLCV DataFrame.
This function returns the requested value at the specified date from the input OHLCV DataFrame. The optional parameter return_idx determines whether to also return the index of the opening price.
Data.@candleavl — Macro
Fetch the candle value expected to be available at a specific date and time frame from an OHLCV DataFrame.
The available candle is usually the candle that is date-wise left adjacent to the requested date.
Data.@candlelast — Macro
Get the last candle value from an OHLCV DataFrame (df).
Data.@check_td — Macro
Check the time delta between two rows in a DataFrame.
This macro is used to check that the time delta between two rows of a DataFrame matches the specified time delta value. It throws a TimeFrameError if it does not. If no args are provided, the macro uses the za value as the default data to check.
Data.@checkkey — Macro
Macro for checking if a key exists in a DictView.
This macro checks if a given key is present in the keys field of the DictView (d).
Data.@to_mat — Macro
Same as @as_mat but returns the new matrix.
Data.@zcreate — Macro
Create a ZArray using the zcreate macro.
This macro is used to create a ZArray object. It provides a convenient syntax for creating and initializing a ZArray with the specified elements. It's a dirty macro that relies on existing variables:
- type: eltype of the array.
- key: path of the array.
- sz: size of the array.
- zi: ZarrInstance object.
See Also
- Data Management Guide - Complete guide to working with market data
- Processing API - Data processing and transformation functions
- DFUtils API - DataFrame manipulation utilities
- Engine API - Core execution engine functions
- Fetch API - Data fetching and retrieval utilities