API Documentation
We highly recommend that newcomers walk through the Onda Tour before diving into this reference documentation.
Support For Generic Path-Like Types
Onda.jl attempts to be as agnostic as possible with respect to the storage system that sample data, Arrow files, etc. are read from/written to. As such, any path-like argument accepted by an Onda.jl API function should generically "work" as long as the argument's type supports:
- Base.read(path)::Vector{UInt8} (return the bytes stored at path)
- Base.write(path, bytes::Vector{UInt8}) (write bytes to the location specified by path)
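As a rough illustration of the two requirements above, the following sketch defines a hypothetical in-memory path type. The names InMemoryPath, store, and the key string are assumptions made for this example and are not part of Onda.jl.

```julia
# Hypothetical in-memory "path" type (an illustration, not part of Onda.jl).
struct InMemoryPath
    store::Dict{String,Vector{UInt8}}
    key::String
end

# Return the bytes stored at `path`.
Base.read(path::InMemoryPath) = copy(path.store[path.key])

# Write `bytes` to the location specified by `path` and return the number
# of bytes written, mirroring the usual `Base.write` convention.
function Base.write(path::InMemoryPath, bytes::Vector{UInt8})
    path.store[path.key] = copy(bytes)
    return length(bytes)
end

store = Dict{String,Vector{UInt8}}()
path = InMemoryPath(store, "recording-1/eeg.lpcm")
write(path, UInt8[0x01, 0x02, 0x03])
roundtrip = read(path)
```

A type like this should be usable wherever a path-like argument is expected, provided the API in question only relies on these two methods.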
For backends which support direct byte range access (e.g. S3), Onda.read_byte_range may be overloaded for the backend's corresponding path type to enable further optimizations:
Onda.read_byte_range — Function
read_byte_range(path, byte_offset, byte_count)
Return the equivalent of read(path)[(byte_offset + 1):(byte_offset + byte_count)], but try to avoid reading unreturned intermediate bytes. Note that the effectiveness of this method depends on the type of path.
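For a local file, the same result can be obtained by seeking past the unwanted bytes. The sketch below uses a hypothetical helper, read_byte_range_sketch, and only illustrates the documented equivalence; Onda.jl's own methods may be implemented differently.

```julia
# Hypothetical helper for local file paths (an assumption for illustration).
function read_byte_range_sketch(path::AbstractString, byte_offset::Integer, byte_count::Integer)
    open(path, "r") do io
        seek(io, byte_offset)        # skip the bytes before the requested range
        return read(io, byte_count)  # read at most `byte_count` bytes
    end
end

# Write ten known bytes to a temporary file, then read a 4-byte range.
path, io = mktemp()
write(io, UInt8.(0:9))
close(io)

chunk = read_byte_range_sketch(path, 2, 4)
expected = read(path)[(2 + 1):(2 + 4)]
```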
onda.annotation
Onda.AnnotationV1 — Type
@version AnnotationV1 begin
    recording::UUID
    id::UUID
    span::TimeSpan
end
A Legolas-generated record type representing an onda.annotation as described by the Onda Format Specification.
See https://github.com/beacon-biosignals/Legolas.jl for details regarding Legolas record types.
Onda.MergedAnnotationV1 — Type
@version MergedAnnotationV1 > AnnotationV1 begin
    from::Vector{UUID}
end
A Legolas-generated record type representing an annotation derived from "merging" one or more existing annotations.
This record type extends AnnotationV1 with a single additional required field, from::Vector{UUID}, whose entries are the ids of the annotation's source annotation(s).
See https://github.com/beacon-biosignals/Legolas.jl for details regarding Legolas record types.
Onda.merge_overlapping_annotations — Function
merge_overlapping_annotations([predicate=TimeSpans.overlaps,] annotations)
Given the onda.annotation@1-compliant table annotations, return a Vector{MergedAnnotationV1} where "overlapping" consecutive entries of annotations have been merged using TimeSpans.shortest_timespan_containing.
Two consecutive annotations a and b are determined to be "overlapping" if a.recording == b.recording && predicate(a.span, b.span). Merged annotations' span fields are generated by calling TimeSpans.shortest_timespan_containing on the overlapping set of source annotations.
Note that every annotation in the returned table has a freshly generated id field and a non-empty from field. An output annotation whose from field contains only a single element corresponds to an individual non-overlapping annotation in the provided annotations.
Note that this function internally works with Tables.columns(annotations) rather than annotations directly, so it may be slower and/or require more memory if !Tables.columnaccess(annotations).
See also TimeSpans.merge_spans for similar functionality on generic time spans (instead of annotations).
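The merging rule described above can be sketched with plain NamedTuples and integer (start, stop) spans. Everything in this block (merge_consecutive_sketch, overlaps, containing, the integer ids) is a hypothetical stand-in rather than Onda.jl or TimeSpans.jl code. As simplifications, the sketch assumes the input is already ordered, compares each row against the running merged span, and omits the freshly generated id field.

```julia
# Half-open interval overlap test and smallest enclosing span (assumed stand-ins
# for TimeSpans.overlaps and TimeSpans.shortest_timespan_containing).
overlaps(a, b) = a[1] < b[2] && b[1] < a[2]
containing(a, b) = (min(a[1], b[1]), max(a[2], b[2]))

function merge_consecutive_sketch(predicate, rows)
    merged = NamedTuple[]
    for row in rows
        if !isempty(merged) && merged[end].recording == row.recording &&
           predicate(merged[end].span, row.span)
            # Extend the previous merged annotation and record the source id.
            prev = merged[end]
            merged[end] = (recording=prev.recording,
                           span=containing(prev.span, row.span),
                           from=vcat(prev.from, row.id))
        else
            # Start a new output annotation with a single source id.
            push!(merged, (recording=row.recording, span=row.span, from=[row.id]))
        end
    end
    return merged
end

rows = [(recording=1, id=10, span=(0, 5)),
        (recording=1, id=11, span=(3, 8)),
        (recording=1, id=12, span=(20, 25)),
        (recording=2, id=13, span=(21, 30))]
result = merge_consecutive_sketch(overlaps, rows)
```

In this sketch the first two rows merge because they share a recording and overlap, while the last two rows stay separate because they belong to different recordings.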
onda.signal
Onda.SignalV2 — Type
@version SignalV2 > SamplesInfoV2 begin
    recording::UUID
    file_path::(<:Any)
    file_format::String
    span::TimeSpan
    sensor_label::String
    sensor_type::String
    channels::Vector{String}
    sample_unit::String
end
A Legolas-generated record type representing an onda.signal as described by the Onda Format Specification.
Note that some fields documented as required fields of onda.signal@2 in the Onda Format Specification are captured via this schema version's extension of SamplesInfoV2.
See https://github.com/beacon-biosignals/Legolas.jl for details regarding Legolas record types.
Onda.SamplesInfoV2 — Type
@version SamplesInfoV2 begin
    sensor_type::String
    channels::Vector{String}
    sample_unit::String
    sample_resolution_in_unit::Float64
    sample_offset_in_unit::Float64
    sample_type::String = onda_sample_type_from_julia_type(sample_type)
    sample_rate::Float64
end
A Legolas-generated record type representing the bundle of onda.signal fields that are intrinsic to a signal's sample data, leaving out extrinsic file or recording information. This is useful when the latter information is irrelevant or does not yet exist (e.g. if sample data is being constructed/manipulated in-memory without yet having been serialized).
See https://github.com/beacon-biosignals/Legolas.jl for details regarding Legolas record types.
Onda.channel — Method
channel(x, name)
Return i where x.channels[i] == name.
Onda.channel — Method
channel(x, i::Integer)
Return x.channels[i].
Onda.channel_count — Method
channel_count(x)
Return length(x.channels).
Onda.sample_count — Method
sample_count(x, duration::Period)
Return the number of multichannel samples that fit within duration given x.sample_rate.
Onda.sizeof_samples — Method
sizeof_samples(x, duration::Period)
Return the expected size (in bytes) of an encoded Samples object corresponding to x and duration:
sample_count(x, duration) * channel_count(x) * sizeof(x.sample_type)
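As a worked example of the formula above, the sketch below computes the sample count and byte size for a 10 second span of 4-channel Int16 data sampled at 128 Hz. The helper names are hypothetical, and the assumption that the sample count is the floor of sample_rate times the duration in seconds is ours, not something this page states.

```julia
using Dates

# Assumption: the number of samples that fit is floor(sample_rate * seconds).
sample_count_sketch(sample_rate, duration::Period) =
    floor(Int, sample_rate * Dates.value(convert(Nanosecond, duration)) / 1_000_000_000)

# samples * channels * bytes per encoded value
sizeof_samples_sketch(sample_rate, n_channels, ::Type{T}, duration::Period) where {T} =
    sample_count_sketch(sample_rate, duration) * n_channels * sizeof(T)

n = sample_count_sketch(128.0, Second(10))
nbytes = sizeof_samples_sketch(128.0, 4, Int16, Second(10))
```

Here n is 1280 multichannel samples, and nbytes is 1280 * 4 * 2 = 10240 bytes.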
Onda.sample_type — Method
sample_type(x)
Return x.sample_type as an Onda.LPCM_SAMPLE_TYPE_UNION subtype. If x.sample_type is an Onda-specified sample_type string (e.g. "int16"), it will be converted to the corresponding Julia type. If x.sample_type <: Onda.LPCM_SAMPLE_TYPE_UNION, this function simply returns x.sample_type as-is.
Samples
Onda.Samples — Type
Samples(data::AbstractMatrix, info::SamplesInfoV2, encoded::Bool;
        validate::Bool=Onda.VALIDATE_SAMPLES_DEFAULT[])
Return a Samples instance with the following fields:
- data::AbstractMatrix: A matrix of sample data. The ith row of the matrix corresponds to the ith channel in info.channels, while the jth column corresponds to the jth multichannel sample.
- info::SamplesInfoV2: The SamplesInfoV2-compliant value that describes the Samples instance.
- encoded::Bool: If true, the values in data are LPCM-encoded as prescribed by the Samples instance's info. If false, the values in data have been decoded into the info's canonical units.
If validate is true, Onda.validate_samples is called on the constructed Samples instance before it is returned.
Note that getindex and view are defined on Samples to accept normal integer indices, but also accept channel names or a regex to match channel names for row indices, and TimeSpan values for column indices; see Onda/examples/tour.jl for a comprehensive set of indexing examples.
Note also that "slices" copied from s::Samples via getindex(s, ...) may alias s.info in order to avoid excessive overhead. This means one should generally avoid directly mutating s.info, especially s.info.channels.
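The row and column layout described above can be illustrated with a plain matrix and a vector of channel names. This sketch does not construct an actual Samples instance; channel_index is a hypothetical helper that mimics looking up a row by channel name.

```julia
channels = ["fp1", "fp2", "c3"]
data = reshape(collect(1:15), 3, 5)     # 3 channels by 5 multichannel samples

# Hypothetical lookup of a row index from a channel name.
channel_index(name) = findfirst(==(name), channels)

row = data[channel_index("fp2"), :]     # every sample for channel "fp2"
col = data[:, 4]                        # the 4th multichannel sample

# Rows whose channel name matches a regex.
frontal_rows = findall(c -> occursin(r"^fp", c), channels)
frontal = data[frontal_rows, :]
```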
Base.:== — Method
==(a::Samples, b::Samples)
Return a.encoded == b.encoded && a.info == b.info && a.data == b.data.
Onda.channel — Function
channel(x, name)
Return i where x.channels[i] == name.
channel(x, i::Integer)
Return x.channels[i].
channel(samples::Samples, name)
Return channel(samples.info, name).
This function is useful for indexing rows of samples.data by channel names.
channel(samples::Samples, i::Integer)
Return channel(samples.info, i).
Onda.channel_count — Function
channel_count(x)
Return length(x.channels).
channel_count(samples::Samples)
Return channel_count(samples.info).
Onda.sample_count — Function
sample_count(x, duration::Period)
Return the number of multichannel samples that fit within duration given x.sample_rate.
sample_count(samples::Samples)
Return the number of multichannel samples in samples (i.e. size(samples.data, 2)).
Onda.encode — Function
encode(sample_type::DataType, sample_resolution_in_unit, sample_offset_in_unit,
       sample_data, dither_storage=nothing)
Return a copy of sample_data quantized according to sample_type, sample_resolution_in_unit, and sample_offset_in_unit. sample_type must be a concrete subtype of Onda.LPCM_SAMPLE_TYPE_UNION. Quantization of an individual sample s to the sample type S is performed via:
round(S, (s - sample_offset_in_unit) / sample_resolution_in_unit)
with additional special casing to clip values exceeding the encoding's dynamic range.
If dither_storage isa Nothing, no dithering is applied before quantization.
If dither_storage isa Missing, dither storage is allocated automatically and triangular dithering is applied to the sample data prior to quantization.
Otherwise, dither_storage must be a container of similar shape and type to sample_data. This container is then used to store the random noise needed for the triangular dithering process, which is applied to the sample data prior to quantization.
If:
sample_type === eltype(sample_data) &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0
then this function will simply return sample_data directly without copying/dithering.
encode(samples::Samples, dither_storage=nothing)
If samples.encoded is false, return a Samples instance that wraps:
encode(sample_type(samples.info),
       samples.info.sample_resolution_in_unit,
       samples.info.sample_offset_in_unit,
       samples.data, dither_storage)
If samples.encoded is true, this function is the identity.
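The quantization formula documented for encode can be sketched in a few lines. The function encode_sketch below is hypothetical: it handles only integer sample types, implements clipping with clamp (one plausible reading of the "special casing" mentioned above), and does not implement dithering.

```julia
# Quantize each sample to the integer type S, clipping to S's dynamic range.
function encode_sketch(::Type{S}, resolution, offset, sample_data) where {S<:Integer}
    lo, hi = typemin(S), typemax(S)
    return map(sample_data) do s
        q = (s - offset) / resolution
        round(S, clamp(q, lo, hi))   # clip first so that `round` cannot overflow
    end
end

encoded = encode_sketch(Int16, 0.5, 1.0, [1.0, 2.0, 2.6, 1.0e9, -1.0e9])
```

The last two inputs exceed the Int16 range and are clipped to typemax(Int16) and typemin(Int16) respectively.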
Onda.encode! — Function
encode!(result_storage, sample_type::DataType, sample_resolution_in_unit,
        sample_offset_in_unit, sample_data, dither_storage=nothing)
encode!(result_storage, sample_resolution_in_unit, sample_offset_in_unit,
        sample_data, dither_storage=nothing)
Similar to encode(sample_type, sample_resolution_in_unit, sample_offset_in_unit, sample_data, dither_storage), but write encoded values to result_storage rather than allocating new storage.
sample_type defaults to eltype(result_storage) if it is not provided.
If:
sample_type === eltype(sample_data) &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0
then this function will simply copy sample_data directly into result_storage without dithering.
encode!(result_storage, samples::Samples, dither_storage=nothing)
If samples.encoded is false, return a Samples instance that wraps:
encode!(result_storage,
        sample_type(samples.info),
        samples.info.sample_resolution_in_unit,
        samples.info.sample_offset_in_unit,
        samples.data, dither_storage)
If samples.encoded is true, return a Samples instance that wraps copyto!(result_storage, samples.data).
Onda.decode — Function
decode(sample_resolution_in_unit, sample_offset_in_unit, sample_data)
Return sample_resolution_in_unit .* sample_data .+ sample_offset_in_unit.
If:
sample_data isa AbstractArray &&
sample_resolution_in_unit == 1 &&
sample_offset_in_unit == 0
then this function is the identity and will return sample_data directly without copying.
decode(samples::Samples, ::Type{T}=Float64)
If samples.encoded is true, return a Samples instance that wraps:
decode(convert(T, samples.info.sample_resolution_in_unit),
       convert(T, samples.info.sample_offset_in_unit),
       samples.data)
If samples.encoded is false, this function is the identity.
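The decoding formula is a single broadcast expression, as the following sketch shows. decode_sketch is a hypothetical stand-in that applies the documented formula to a small channels-by-samples matrix; it omits the identity shortcut described above.

```julia
# Map encoded values back into canonical units.
decode_sketch(resolution, offset, sample_data) = resolution .* sample_data .+ offset

encoded = Int16[0 2 3; -4 5 6]               # 2 channels by 3 samples
decoded = decode_sketch(0.5, 1.0, encoded)
```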
Onda.decode! — Function
decode!(result_storage, sample_resolution_in_unit, sample_offset_in_unit, sample_data)
Similar to decode(sample_resolution_in_unit, sample_offset_in_unit, sample_data), but write decoded values to result_storage rather than allocating new storage.
decode!(result_storage, samples::Samples)
If samples.encoded is true, return a Samples instance that wraps:
decode!(result_storage, samples.info.sample_resolution_in_unit, samples.info.sample_offset_in_unit, samples.data)
If samples.encoded is false, return a Samples instance that wraps copyto!(result_storage, samples.data).
Onda.load — Function
load(signal[, span_relative_to_loaded_samples]; encoded::Bool=false)
load(file_path, file_format::Union{AbstractString,AbstractLPCMFormat},
     info[, span_relative_to_loaded_samples]; encoded::Bool=false)
Return the Samples object described by signal/file_path/file_format/info.
If span_relative_to_loaded_samples is present, return load(...)[:, span_relative_to_loaded_samples], but attempt to avoid reading unreturned intermediate sample data. Note that the effectiveness of this optimized method versus the naive approach depends on the types of file_path (i.e. whether there is a fast method defined for Onda.read_byte_range(::typeof(file_path), ...)) and file_format (i.e. whether the corresponding format supports random or chunked access).
If encoded is true, do not decode the Samples object before returning it.
Onda.mmap — Function
Onda.mmap(signal)
Return Onda.mmap(signal.file_path, SamplesInfoV2(signal)), throwing an ArgumentError if signal.file_format != "lpcm".
Onda.mmap(mmappable, info)
Return Samples(data, info, true) where data is created via Mmap.mmap(mmappable, ...).
mmappable is assumed to reference memory that is formatted according to the Onda Format's canonical interleaved LPCM representation in accordance with sample_type(info) and channel_count(info). No explicit checks are performed to ensure that this is true.
Onda.store — Function
store(file_path, file_format::Union{AbstractString,AbstractLPCMFormat}, samples::Samples)
Serialize the given samples to file_format and write the output to file_path.
store(file_path, file_format::Union{AbstractString,AbstractLPCMFormat},
      samples::Samples, recording::UUID, start::Period,
      sensor_label::AbstractString = samples.info.sensor_type)
Serialize the given samples to file_format and write the output to file_path, returning a SignalV2 instance constructed from the provided arguments.
LPCM (De)serialization API
Onda.jl's LPCM (De)serialization API facilitates low-level streaming sample data (de)serialization and provides a storage-agnostic abstraction layer that can be overloaded to support new file/byte formats for (de)serializing LPCM-encodeable sample data.
Onda.AbstractLPCMFormat — Type
AbstractLPCMFormat
A type whose subtypes represent byte/stream formats that can be (de)serialized to/from Onda's standard interleaved LPCM representation.
All subtypes of the form F<:AbstractLPCMFormat must call Onda.register_lpcm_format! and define an appropriate file_format_string method.
See also:
Onda.AbstractLPCMStream — Type
AbstractLPCMStream
A type that represents an LPCM (de)serialization stream.
See also:
Onda.LPCMFormat — Type
LPCMFormat(channel_count::Int, sample_type::Type)
LPCMFormat(info::SamplesInfoV2)
Return an LPCMFormat<:AbstractLPCMFormat instance corresponding to Onda's default interleaved LPCM format assumed for sample data files with the "lpcm" extension.
channel_count corresponds to length(info.channels), while sample_type corresponds to sample_type(info).
Note that bytes (de)serialized to/from this format are little-endian (per the Onda specification).
Onda.LPCMZstFormat — Type
LPCMZstFormat(lpcm::LPCMFormat; level=3)
LPCMZstFormat(info; level=3)
Return an LPCMZstFormat<:AbstractLPCMFormat instance that corresponds to Onda's default interleaved LPCM format compressed by zstd. This format is assumed for sample data files with the "lpcm.zst" extension.
The level keyword argument sets the same compression level parameter as the corresponding flag documented by the zstd command line utility.
See https://facebook.github.io/zstd/ for details about zstd.
Onda.format — Function
format(file_format::AbstractString, info; kwargs...)
Return f(info; kwargs...) where f constructs the AbstractLPCMFormat instance that corresponds to file_format and info is a SamplesInfoV2-compliant value. f is determined by matching file_format to a suitable format constructor registered via register_lpcm_format!.
See also: deserialize_lpcm, serialize_lpcm
Onda.deserialize_lpcm — Function
deserialize_lpcm(format::AbstractLPCMFormat, bytes,
                 samples_offset::Integer=0,
                 samples_count::Integer=typemax(Int))
deserialize_lpcm(stream::AbstractLPCMStream,
                 samples_offset::Integer=0,
                 samples_count::Integer=typemax(Int))
Return a channels-by-timesteps AbstractMatrix of interleaved LPCM-encoded sample data by deserializing the provided bytes in the given format, or from the given stream constructed by deserializing_lpcm_stream.
Note that this operation may be performed in a zero-copy manner such that the returned sample matrix directly aliases bytes.
The returned segment is at most samples_offset samples offset from the start of stream/bytes and contains at most samples_count samples. This ensures that overrun behavior is generally similar to the behavior of Base.skip(io, n) and Base.read(io, n).
This function is the inverse of the corresponding serialize_lpcm method, i.e.:
serialize_lpcm(format, deserialize_lpcm(format, bytes)) == bytes
Onda.serialize_lpcm — Function
serialize_lpcm(format::AbstractLPCMFormat, samples::AbstractMatrix)
serialize_lpcm(stream::AbstractLPCMStream, samples::AbstractMatrix)
Return the AbstractVector{UInt8} of bytes that results from serializing samples to the given format (or serialize those bytes directly to stream) where samples is a channels-by-timesteps matrix of interleaved LPCM-encoded sample data.
Note that this operation may be performed in a zero-copy manner such that the returned AbstractVector{UInt8} directly aliases samples.
This function is the inverse of the corresponding deserialize_lpcm method, i.e.:
deserialize_lpcm(format, serialize_lpcm(format, samples)) == samples
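The round-trip property above can be demonstrated for uncompressed, little-endian interleaved LPCM without Onda.jl. The functions serialize_sketch and deserialize_sketch are hypothetical. They rely on Julia's column-major layout, under which a channels-by-timesteps matrix is already interleaved in memory, and unlike the zero-copy behaviour mentioned above, they always copy.

```julia
# Serialize a channels-by-timesteps matrix to little-endian interleaved bytes.
serialize_sketch(samples::AbstractMatrix) =
    collect(reinterpret(UInt8, htol.(vec(samples))))

# Inverse: reinterpret the bytes as `T` and reshape to channels-by-timesteps.
function deserialize_sketch(::Type{T}, channel_count::Integer, bytes::Vector{UInt8}) where {T}
    flat = ltoh.(reinterpret(T, bytes))
    return reshape(collect(flat), channel_count, :)
end

samples = Int16[1 2 3; 4 5 6]                # 2 channels by 3 timesteps
bytes = serialize_sketch(samples)
restored = deserialize_sketch(Int16, 2, bytes)
```

The byte order in bytes is channel 1 then channel 2 for the first timestep, followed by the same pair for each later timestep.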
Onda.deserialize_lpcm_callback — Function
deserialize_lpcm_callback(format::AbstractLPCMFormat, samples_offset, samples_count)
Return (callback, required_byte_offset, required_byte_count) where callback accepts the byte block specified by required_byte_offset and required_byte_count and returns the samples specified by samples_offset and samples_count.
As a fallback, this function returns (callback, missing, missing), where callback requires all available bytes. AbstractLPCMFormat subtypes that support partial/block-based deserialization (e.g. the basic LPCMFormat) can overload this function to only request exactly the byte range that is required for the sample range requested by the caller.
This allows callers to handle the byte block retrieval themselves while keeping Onda's LPCM Serialization API agnostic to the caller's storage layer of choice.
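For uncompressed interleaved LPCM, the byte range needed for a given sample range follows directly from the channel count and the size of the sample type. The sketch below shows that arithmetic with a hypothetical helper, required_byte_range_sketch; it is an illustration of the idea, not Onda.jl's implementation.

```julia
# Each multichannel sample occupies channel_count * sizeof(T) bytes.
function required_byte_range_sketch(channel_count, ::Type{T}, samples_offset, samples_count) where {T}
    bytes_per_sample = channel_count * sizeof(T)
    return (samples_offset * bytes_per_sample, samples_count * bytes_per_sample)
end

# Skip 100 samples, then read 50 samples of 4-channel Int16 data.
byte_offset, byte_count = required_byte_range_sketch(4, Int16, 100, 50)
```

A caller could pass such an offset and count to a byte-range read for its storage backend and hand the resulting block to the callback.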
Onda.deserializing_lpcm_stream — Function
deserializing_lpcm_stream(format::AbstractLPCMFormat, io)
Return a stream::AbstractLPCMStream that wraps io to enable direct LPCM deserialization from io via deserialize_lpcm.
Note that stream must be finalized after usage via finalize_lpcm_stream. Until stream is finalized, io should be considered to be part of the internal state of stream and should not be directly interacted with by other processes.
Onda.serializing_lpcm_stream — Function
serializing_lpcm_stream(format::AbstractLPCMFormat, io)
Return a stream::AbstractLPCMStream that wraps io to enable direct LPCM serialization to io via serialize_lpcm.
Note that stream must be finalized after usage via finalize_lpcm_stream. Until stream is finalized, io should be considered to be part of the internal state of stream and should not be directly interacted with by other processes.
Onda.finalize_lpcm_stream — Function
finalize_lpcm_stream(stream::AbstractLPCMStream)::Bool
Finalize stream, returning true if the underlying I/O object used to construct stream is still open and usable. Otherwise, return false to indicate that the underlying I/O object was closed as a result of finalization.
Onda.register_lpcm_format! — Function
Onda.register_lpcm_format!(create_constructor)
Register an AbstractLPCMFormat constructor so that it can automatically be used when format is called. Authors of new AbstractLPCMFormat subtypes should call this function for their subtype.
create_constructor should be a unary function that accepts a single file_format::AbstractString argument and returns either a matching AbstractLPCMFormat constructor or nothing. Any returned AbstractLPCMFormat constructor f should be of the form f(info; kwargs...)::AbstractLPCMFormat where info is a SamplesInfoV2-compliant value.
Note that if Onda.register_lpcm_format! is called in a downstream package, it must be called within the __init__ function of the package's top-level module to ensure that the function is always invoked when the module is loaded (not just during precompilation). For details, see https://docs.julialang.org/en/v1/manual/modules/#Module-initialization-and-precompilation.
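The registration contract described above can be sketched as a small stand-alone registry. All names in this block (FORMAT_CONSTRUCTORS_SKETCH, register_sketch!, format_sketch, MyFormatSketch, my_format_from_info, and the "myfmt" string) are hypothetical, and the "first matching constructor wins" lookup order is an assumption rather than documented Onda.jl behaviour.

```julia
# Hypothetical registry of `create_constructor` functions.
FORMAT_CONSTRUCTORS_SKETCH = Any[]
register_sketch!(create_constructor) = push!(FORMAT_CONSTRUCTORS_SKETCH, create_constructor)

# Look up a constructor for `file_format` and apply it to `info`.
function format_sketch(file_format::AbstractString, info; kwargs...)
    for create_constructor in FORMAT_CONSTRUCTORS_SKETCH
        f = create_constructor(file_format)
        f === nothing || return f(info; kwargs...)
    end
    throw(ArgumentError("unknown file_format: $file_format"))
end

# A stand-in format type and a constructor of the form f(info; kwargs...).
struct MyFormatSketch
    channel_count::Int
end
my_format_from_info(info; kwargs...) = MyFormatSketch(length(info.channels))

# The unary function returns a constructor for a match, otherwise `nothing`.
register_sketch!(file_format -> file_format == "myfmt" ? my_format_from_info : nothing)

fmt = format_sketch("myfmt", (channels=["a", "b"],))
```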
Onda.file_format_string — Function
file_format_string(format::AbstractLPCMFormat)
Return the String representation of format to be written to the file_format field of a *.signals file.
Utilities
Onda.VALIDATE_SAMPLES_DEFAULT — Constant
VALIDATE_SAMPLES_DEFAULT[]
Defaults to true.
When set to true, Samples objects will be validated upon construction for compliance with the Onda specification.
Users may interactively set this reference to false in order to disable this extra layer of validation, which can be useful when working with malformed Onda datasets.
See also: Onda.validate_samples
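The "settable default" pattern described above can be sketched with a Ref that supplies a keyword argument's default value at call time. The names VALIDATE_DEFAULT_SKETCH, checked_sketch, and throws_sketch are hypothetical, and the NaN check merely stands in for real validation.

```julia
# A mutable reference that supplies the default for the `validate` keyword.
VALIDATE_DEFAULT_SKETCH = Ref(true)

function checked_sketch(data; validate::Bool=VALIDATE_DEFAULT_SKETCH[])
    if validate && any(isnan, data)
        throw(ArgumentError("data contains NaN"))
    end
    return data
end

# Small helper that reports whether calling `f` throws.
function throws_sketch(f)
    try
        f()
        return false
    catch
        return true
    end
end

threw_by_default = throws_sketch(() -> checked_sketch([1.0, NaN]))
VALIDATE_DEFAULT_SKETCH[] = false            # disable validation interactively
threw_when_disabled = throws_sketch(() -> checked_sketch([1.0, NaN]))
VALIDATE_DEFAULT_SKETCH[] = true             # restore the default
```

Because the keyword default is evaluated on each call, changing the reference affects later calls without redefining the function.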
Onda.upgrade — Function
Onda.upgrade(from::SignalV1, ::SignalV2SchemaVersion)
Return a SignalV2 instance that represents from in the SignalV2SchemaVersion format.
The fields of the output will match from's fields, except:
- The kind field will be removed.
- The sensor_label=from.kind field will be added.
- The sensor_type=from.kind field will be added.
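The field changes listed above can be illustrated with NamedTuples standing in for the record types. upgrade_sketch is a hypothetical function; it does not use Legolas or the real SignalV1 and SignalV2 types.

```julia
# Drop `kind`, then add `sensor_label` and `sensor_type`, both set to `from.kind`.
function upgrade_sketch(from::NamedTuple)
    kept = merge(NamedTuple(), (k => v for (k, v) in pairs(from) if k !== :kind))
    return merge(kept, (sensor_label=from.kind, sensor_type=from.kind))
end

v1 = (recording=1, kind="eeg", channels=["fp1", "fp2"])
v2 = upgrade_sketch(v1)
```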
Developer Installation
To install Onda for development, run:
julia -e 'using Pkg; Pkg.develop("Onda")'
This will install Onda to the default package development directory, ~/.julia/dev/Onda.