Resources

LogQS workflows center around creating, reading, updating, and deleting resources which are organized into four primary categories:

  • Core

  • Objects

  • Management

  • Auth

A map of these resources and their high-level relationships is shown below.

Figure: LogQS Resources (../_images/LogQS-v1-Resources.drawio.png)

Typical LogQS interactions will occur primarily with the Core resources, where we create and read log data. The Objects resources are used to interact with blob data stored in object stores (such as S3). The Management resources are used to manage other resources, such as through grouping logs or assigning workflows. The Auth resources are used to manage users and their permissions.

LogQS operates as a RESTful API, and all resources are accessed through HTTP requests. The following diagram shows the basic structure of LogQS endpoints:

Figure: LogQS REST API (../_images/LogQS-v1-REST-API.drawio.png)

The Core resources are logs, ingestions, topics, and records:

# We can import corresponding Pydantic models from the interface package
from lqs.interface.core.models import (
    Log,
    Ingestion,
    Topic,
    Record,
)

Logs

Logs are the primary resource in LogQS. All other core resources are always associated with one log. Logs can be seen as containers for topics (which can be seen as containers for records). Before any core data can be created, including through ingestion, a log must be created first.

Logs belong to one group, specified on creation. Logs can be moved between groups. Log names must be unique within a group, but are otherwise arbitrary and only used for reference. Log names can be changed after creation.

The log’s start and end times are automatically updated when new records are added to or removed from the log. The start time is the earliest record time, and the end time is the latest record time. The record and object counts and sizes are also automatically updated. The record count is the number of records in the log, and the object count is the number of objects associated with the log. The record size is the total size of all records in the log, and the object size is the total size of all objects associated with the log.

The log note is a free-form text field that can be used to store any referential information about the log. The log context field is a JSON object that can be used to store any structured information about the log. The log context is intended to be used to store information that is useful for programmatic access to the log, such as log metadata used by Studio.

The locked field is a boolean that indicates whether the log is locked. Locked logs cannot be modified or deleted. The locked field is intended to be used to prevent accidental modification or deletion of logs that are in use.

The log’s time adjustment field is an integer number of nanoseconds to be added to all record times in the log. The time adjustment field is intended to correct for occurrences where the log’s recorded times are not in sync with real-world time, e.g., a log was recorded on a machine without a synchronized clock, so its record times start at 0 and are off by some constant offset. This is used for reference only, and does not affect the log’s records.
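
Because the adjustment is for reference only, a client that wants real-world times has to apply it itself. The following is a minimal sketch of that arithmetic, assuming plain integers in place of LogQS models; the helper name `adjusted_time` and the sample values are hypothetical.

```python
from typing import Optional


def adjusted_time(record_timestamp_ns: int, time_adjustment_ns: Optional[int]) -> int:
    """Return a record time shifted by the log's time adjustment (both in nanoseconds)."""
    # A missing adjustment is treated as zero (an assumption for this sketch).
    return record_timestamp_ns + (time_adjustment_ns or 0)


# A log recorded on a machine whose clock started at 0:
# the record was stored at 5 seconds, the log carries a constant offset.
offset_ns = 1_700_000_000_000_000_000
print(adjusted_time(5_000_000_000, offset_ns))
```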

The default workflow ID field is a reference to a workflow that will be used by default when creating new processes (such as ingestions or digestions) for the log. This workflow takes precedence over the group’s default workflow and the DataStore’s default workflow, but not over workflows assigned on a per-process basis.
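
The precedence order described above (process, then log, then group, then DataStore) can be sketched as a simple fallback chain. This is an illustration of the rule only, not the service's implementation; the function name and its parameters are hypothetical.

```python
from typing import Optional
from uuid import UUID, uuid4


def resolve_workflow_id(
    process_workflow_id: Optional[UUID],
    log_default_workflow_id: Optional[UUID],
    group_default_workflow_id: Optional[UUID],
    datastore_default_workflow_id: Optional[UUID],
) -> Optional[UUID]:
    """Return the first workflow ID that is set, from most to least specific."""
    for candidate in (
        process_workflow_id,
        log_default_workflow_id,
        group_default_workflow_id,
        datastore_default_workflow_id,
    ):
        if candidate is not None:
            return candidate
    return None


log_workflow, group_workflow = uuid4(), uuid4()
# No per-process workflow is set, so the log's default wins over the group's.
assert resolve_workflow_id(None, log_workflow, group_workflow, None) == log_workflow
```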

Logs contain the following fields:

>> Log._describe()

Log:
id [UUID]:                              The ID of the resource.
context [Optional[dict]]:               A JSON context for the log.
default_workflow_id [Optional[UUID]]:   The ID of the workflow to be executed during state transitions of associated processes.
end_time [Optional[int]]:               The timestamp of the last record of the log.
group_id [UUID]:                        The ID of the group to which this log belongs.
lock_token [Optional[str]]:             The token used to lock the resource.
locked [bool]:                          Whether the process is locked (i.e. cannot be modified).
locked_at [Optional[datetime]]:         The time at which the resource was locked.
locked_by [Optional[UUID]]:             The ID of the user who locked the resource.
name [str]:                             The name of the log (unique per group).
note [Optional[str]]:                   A general note about the log for reference.
object_count [int]:                     The total number of objects in the log.
object_size [int]:                      The total size of all objects in the log in bytes.
record_count [int]:                     The total number of records in the log.
record_size [int]:                      The total size of all records in the log in bytes.
start_time [Optional[int]]:             The timestamp of the first record of the log.
created_at [datetime]:                  The creation timestamp of the resource.
created_by [Optional[UUID]]:            The ID of the user who created the resource.
deleted_at [Optional[datetime]]:        The deletion timestamp of the resource.
deleted_by [Optional[UUID]]:            The ID of the user who deleted the resource.
updated_at [Optional[datetime]]:        The last update timestamp of the resource.
updated_by [Optional[UUID]]:            The ID of the user who last updated the resource.

Ingestions

Ingestions are resources which track ingestion processes. An ingestion process is the primary method of creating topics and records for logs. Valid ingestions are associated with one, and only one, object.

Ingestions belong to exactly one log. Ingestions cannot be moved between logs. Ingestion names are not unique and are only used for reference (a common strategy is to use the object name as the ingestion name).

An ingestion must point to a single object in order to be queued. The object can either be internal (managed by LogQS) or external (managed by the user and referenced through an object store). If the object is internal, it is a log object associated with the ingestion’s log and is referenced by its object key while the object store ID is null. If the object is external, it is an object in an object store and is referenced by its object store ID and object key.
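
The internal/external distinction comes down to whether `object_store_id` is set alongside `object_key`. A minimal sketch of that check, using a hypothetical helper and bare values rather than an `Ingestion` model:

```python
from typing import Optional
from uuid import UUID


def describe_ingestion_object(object_key: Optional[str], object_store_id: Optional[UUID]) -> str:
    """Classify where an ingestion's object lives, based on the two reference fields."""
    if object_key is None:
        return "no object set; the ingestion cannot be queued yet"
    if object_store_id is None:
        # Internal: a log object managed by LogQS, referenced by key alone.
        return f"internal log object with key {object_key!r}"
    # External: an object in a user-managed object store.
    return f"external object with key {object_key!r} in object store {object_store_id}"


print(describe_ingestion_object("run_01.bag", None))
print(describe_ingestion_object("run_01.bag", UUID("12345678-1234-5678-1234-567812345678")))
```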

The workflow ID references the workflow that will be used to process the ingestion which takes precedence over the log’s default workflow, the group’s default workflow, and the DataStore’s default workflow. The workflow context is a JSON object that can be used to store any structured information about the workflow. The workflow context is intended to be used when the workflow accepts arguments which can be supplied by the user.

The ingestion’s state is a string that indicates the current state of the ingestion. The ingestion’s state can be one of the following:

  • ready: The ingestion is ready to be processed (the default state).

  • queued: The ingestion is queued to be processed. The user should transition the ingestion to this state when they are ready for it to be processed.

  • processing: The ingestion is currently being processed. The user should not transition the ingestion to this state.

  • finalizing: The ingestion process transitions the ingestion to this state when it has finished processing and creating ingestion parts. Once the ingestion parts are complete, the ingestion will transition to the complete state.

  • complete: The ingestion is complete. The ingestion parts have been processed, and the data from the ingestion has been added to the log. The ingestion should remain in this state indefinitely until the user archives it.

  • failed: The ingestion failed to complete. The ingestion should remain in this state indefinitely until the user archives it or re-queues it.

  • archived: The ingestion has been archived. The ingestion should remain in this state indefinitely until the user deletes it.
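
The states above can be modeled as a small enumeration. In the sketch below, the `IngestionState` class is a hypothetical stand-in for the state type, and the table of user-initiated transitions is inferred from the state descriptions; it is an assumption, not the validation logic the service applies.

```python
from enum import Enum


class IngestionState(str, Enum):
    """Hypothetical stand-in for the ingestion states listed above."""
    READY = "ready"
    QUEUED = "queued"
    PROCESSING = "processing"
    FINALIZING = "finalizing"
    COMPLETE = "complete"
    FAILED = "failed"
    ARCHIVED = "archived"


# Assumed user-initiated transitions, inferred from the descriptions:
# queue a ready ingestion, re-queue or archive a failed one, archive a complete one.
USER_TRANSITIONS = {
    IngestionState.READY: {IngestionState.QUEUED},
    IngestionState.FAILED: {IngestionState.QUEUED, IngestionState.ARCHIVED},
    IngestionState.COMPLETE: {IngestionState.ARCHIVED},
}


def user_may_transition(current: IngestionState, target: IngestionState) -> bool:
    """Return True if this sketch considers the transition one a user would make."""
    return target in USER_TRANSITIONS.get(current, set())


print(user_may_transition(IngestionState.READY, IngestionState.QUEUED))      # True
print(user_may_transition(IngestionState.READY, IngestionState.PROCESSING))  # False
```

The processing and finalizing states are deliberately absent as targets, since the text describes them as set by the ingestion process rather than by the user.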

The ingestion’s error field is a JSON object that contains information about any errors that occurred during the ingestion process. The ingestion’s error field is only populated when the ingestion is in the failed state.

The ingestion’s progress field is a float between 0 and 1 that indicates the progress of the ingestion process. The ingestion’s progress field is initially null, and updated during the ingestion’s processing and finalizing states.

The ingestion’s note is a free-form text field that can be used to store any referential information about the ingestion. The ingestion’s context field is a JSON object that can be used to store any structured information about the ingestion. The ingestion’s context is intended to be used to store information that is useful for programmatic access to the ingestion, such as ingestion metadata used by Studio or queried by users.

Ingestions contain the following fields:

>> Ingestion._describe()

Ingestion:
id [UUID]:                              The ID of the resource.
context [Optional[dict]]:               A JSON context for the ingestion.
error [Optional[dict]]:                 The JSON payload of an error that occurred during the process.
group_id [Optional[UUID]]:
lock_token [Optional[str]]:             The token used to lock the resource.
locked [bool]:                          Whether the process is locked (i.e. cannot be modified).
locked_at [Optional[datetime]]:         The time at which the resource was locked.
locked_by [Optional[UUID]]:             The ID of the user who locked the resource.
log_id [UUID]:                          The ID of the log to which this ingestion belongs.
name [Optional[str]]:                   The name of the ingestion (not unique).
note [Optional[str]]:                   A general note about the ingestion for reference.
object_key [Optional[str]]:             The key of the ingestion object.
object_store_id [Optional[UUID]]:       If the ingestion object is stored in an object store, the ID of the object store.
progress [Optional[float]]:             The progress of the process for the current state (a float in [0,1]).
state [ProcessState]:                   The current state of the process.
transitioned_at [Optional[datetime]]:   The time at which the process transitioned to the current state.
workflow_context [Optional[dict]]:      The context to be passed to the workflow during state transitions.
workflow_id [Optional[UUID]]:           The ID of the workflow to be executed during state transitions.
created_at [datetime]:                  The creation timestamp of the resource.
created_by [Optional[UUID]]:            The ID of the user who created the resource.
deleted_at [Optional[datetime]]:        The deletion timestamp of the resource.
deleted_by [Optional[UUID]]:            The ID of the user who deleted the resource.
updated_at [Optional[datetime]]:        The last update timestamp of the resource.
updated_by [Optional[UUID]]:            The ID of the user who last updated the resource.

Topics

Topics are resources which constitute the main sub-resources of logs. Topics can be thought of as containers for records, where all records associated with a topic are similar in terms of the data they contain and/or represent. Each topic is associated with one, and only one, log.

Topics must have a unique name within a log, but are otherwise arbitrary and only used for reference. Topic names can be changed after creation. Topics can also optionally be associated with one other topic within the same log. This is used for reference, and is useful for keeping track of relationships between topics such as when the records of one topic are derived from data from another topic.

Each topic has a set of fields indicating how the record data within the topic should be interpreted. This includes type_name, which is a string identifier for the type, type_encoding, which is a string indicating how the topic’s record data is encoded, type_data, which is a string providing reference for how the record data is structured, and type_schema, which is a JSON schema representing the structure of the record data. These fields need not be populated, but are useful for validation and interpretation of the record data as well as for the automated population of the record data.

In the context of typical robotics log data (such as ROS bags or MCAP files), the topic’s name might be the topic or channel name from the log file, the type_name might be the message type (e.g., sensor_msgs/Image, sensor_msgs/msg/Image, etc.), the type_encoding might be the serialization format (e.g., ros1, cdr, etc.), the type_data might be the full message definition, and the type_schema might be a JSON schema representing the message structure which can be used to validate the record data or by external applications.
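
As an illustration of the mapping described above, a ROS 1 image topic might carry values along these lines. Everything here is a made-up example rather than output from a real ingestion: the topic name is hypothetical, the message definition in `type_data` is abbreviated, and the schema is only a fragment.

```python
# Illustrative values only; not taken from a real log.
ros1_image_topic = {
    "name": "/camera/image_raw",       # topic or channel name from the log file
    "type_name": "sensor_msgs/Image",  # message type
    "type_encoding": "ros1",           # serialization format
    # Abbreviated message definition; a real one would be the full text.
    "type_data": "std_msgs/Header header\nuint32 height\nuint32 width\nstring encoding",
    # Fragment of a JSON schema describing the record data.
    "type_schema": {
        "type": "object",
        "properties": {
            "height": {"type": "integer"},
            "width": {"type": "integer"},
            "encoding": {"type": "string"},
        },
    },
}

for field in ("type_name", "type_encoding"):
    print(f"{field}: {ros1_image_topic[field]}")
```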

Similar to logs, topics contain information about the number of records associated with the topic as well as the size of those records. That is, the sum of record_count and record_size across all topics in a log will be equal to the log’s record_count and record_size. Similarly, the start_time and end_time of a topic are the earliest and latest record times, respectively, of the records associated with the topic. The start_time of the topic’s log will be the earliest start_time of all topics in the log, and the end_time of the topic’s log will be the latest end_time of all topics in the log.
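
The relationship between topic-level and log-level statistics can be sketched as a plain aggregation. The `TopicStats` dataclass and `aggregate_log_stats` function are hypothetical stand-ins used to show the arithmetic; in LogQS these values are maintained automatically.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TopicStats:
    """Hypothetical subset of a topic's fields used for this illustration."""
    record_count: int
    record_size: int
    start_time: Optional[int]
    end_time: Optional[int]


def aggregate_log_stats(topics: List[TopicStats]) -> Dict[str, Optional[int]]:
    """Derive the log-level values from its topics: sums for counts, min/max for times."""
    starts = [t.start_time for t in topics if t.start_time is not None]
    ends = [t.end_time for t in topics if t.end_time is not None]
    return {
        "record_count": sum(t.record_count for t in topics),
        "record_size": sum(t.record_size for t in topics),
        "start_time": min(starts) if starts else None,
        "end_time": max(ends) if ends else None,
    }


topics = [
    TopicStats(record_count=10, record_size=1_000, start_time=100, end_time=900),
    TopicStats(record_count=5, record_size=250, start_time=50, end_time=700),
    TopicStats(record_count=0, record_size=0, start_time=None, end_time=None),
]
print(aggregate_log_stats(topics))
```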

Topics contain the following fields:

>> Topic._describe()

Topic:
id [UUID]:                              The ID of the resource.
associated_topic_id [Optional[UUID]]:   The ID of an associated topic (if any) for reference.
context [Optional[dict]]:               A JSON context for the topic.
end_time [Optional[int]]:               The timestamp of the last record of the topic.
group_id [Optional[UUID]]:              The ID of the group to which this topic belongs.
lock_token [Optional[str]]:             The token used to lock the resource.
locked [bool]:                          Whether the process is locked (i.e. cannot be modified).
locked_at [Optional[datetime]]:         The time at which the resource was locked.
locked_by [Optional[UUID]]:             The ID of the user who locked the resource.
log_id [UUID]:                          The ID of the log to which this topic belongs.
name [str]:                             The name of the topic (unique per log).
note [Optional[str]]:                   A general note about the topic.
object_count [int]:                     The total number of objects in the topic.
object_size [int]:                      The total size of all objects in the topic in bytes.
record_count [int]:                     The total number of records in the topic.
record_size [int]:                      The total size of all records in the topic in bytes.
start_time [Optional[int]]:             The timestamp of the first record of the topic.
strict [bool]:                          Whether the topic's schema should be strictly enforced.
type_data [Optional[str]]:              The definition of the message type used to (de)serialize the topic's records.
type_encoding [Optional[TypeEncoding]]: The encoding of the message data of the topic's records.
type_name [Optional[str]]:              The name of the message type which the topic's records should conform to.
type_schema [Optional[dict]]:           A JSON schema describing the record data of the topic's records.
created_at [datetime]:                  The creation timestamp of the resource.
created_by [Optional[UUID]]:            The ID of the user who created the resource.
deleted_at [Optional[datetime]]:        The deletion timestamp of the resource.
deleted_by [Optional[UUID]]:            The ID of the user who deleted the resource.
updated_at [Optional[datetime]]:        The last update timestamp of the resource.
updated_by [Optional[UUID]]:            The ID of the user who last updated the resource.

Records

Records are the most granular core resource in LogQS. Records represent the actual data points corresponding to the messages found in log data which are indexed in LogQS. Each record is associated with one, and only one, topic.

Every record has a populated timestamp field representing the nanoseconds since the Unix epoch at which the record was recorded. Within a topic, the timestamp field is unique, and records are naturally sorted by timestamp in ascending order.
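
Since timestamps are integer nanoseconds, converting one to a Python `datetime` requires splitting off the sub-second part. A minimal sketch (the helper name is hypothetical); note that `datetime` only has microsecond resolution, so the last three digits are dropped:

```python
from datetime import datetime, timezone


def record_time_to_datetime(timestamp_ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime (microsecond precision)."""
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    # Integer arithmetic avoids the rounding a float conversion would introduce.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=nanos // 1_000)


print(record_time_to_datetime(1_500_000_000).isoformat())
# 1970-01-01T00:00:01.500000+00:00
```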

Records effectively contain index information about the messages in the underlying log data. The data_offset field is a non-negative integer representing the byte offset of the start of the record’s underlying message data in its source log file, while the data_length field is a non-negative integer representing the length of the record’s underlying message data in bytes. That is, if you were to read data_length bytes starting at data_offset from the source log file (assuming it’s uncompressed), you would get the record’s underlying message data (which could then be deserialized based on the type information found in the record’s topic).
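
The uncompressed case amounts to a seek followed by a bounded read. The sketch below uses an in-memory byte stream in place of a real log file; the helper name and the fake file layout are hypothetical.

```python
import io
from typing import BinaryIO


def read_message_bytes(source: BinaryIO, data_offset: int, data_length: int) -> bytes:
    """Read exactly data_length bytes starting at data_offset from an uncompressed source."""
    source.seek(data_offset)
    data = source.read(data_length)
    if len(data) != data_length:
        raise ValueError("source ended before data_length bytes could be read")
    return data


# A stand-in for a log file: a 6-byte header followed by two 11-byte messages.
log_file = io.BytesIO(b"HEADER" + b"message-one" + b"message-two")
print(read_message_bytes(log_file, data_offset=6, data_length=11))   # b'message-one'
print(read_message_bytes(log_file, data_offset=17, data_length=11))  # b'message-two'
```

The returned bytes would still need to be deserialized using the topic's type information.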

Some log formats support compression, which requires us to fetch more than just the record’s underlying message data to extract the message. In these cases, the data indexed by the data_offset and data_length represents a “chunk” of data which contains the necessary data needed to derive the actual message data. If this is the case, the chunk_compression will be populated with a string indicating the compression algorithm used to compress the chunk, and the chunk_offset and chunk_length fields will be populated. In this case, the chunk_offset field will be a non-negative integer representing the byte offset of the start of the record’s message data within the uncompressed chunk, while the chunk_length field will be a non-negative integer representing the length of the record’s message data within the uncompressed chunk.

To locate the object which contains the record data, refer to the record’s ingestion_id to find the ingestion which created the record; that ingestion contains a reference to the location of the ingested object. Some log formats may store the log data across multiple objects. In this case, the source field on the record will be populated with a string indicating the relative path to the record’s data from the ingested object.
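
For the compressed case, the read becomes a two-step process: fetch and decompress the chunk, then slice the message out of the uncompressed bytes. The sketch below uses Python's standard `bz2` module as a stand-in codec; the string `"bz2"` as a `chunk_compression` value, the helper name, and the fake file layout are all assumptions made for illustration.

```python
import bz2
import io
from typing import BinaryIO


def read_message_from_chunk(
    source: BinaryIO,
    data_offset: int,
    data_length: int,
    chunk_compression: str,
    chunk_offset: int,
    chunk_length: int,
) -> bytes:
    """Fetch a compressed chunk, decompress it, and slice out one message."""
    # Step 1: data_offset/data_length locate the compressed chunk in the source.
    source.seek(data_offset)
    chunk = source.read(data_length)
    # Step 2: decompress according to chunk_compression (only bz2 in this sketch).
    if chunk_compression != "bz2":
        raise ValueError(f"unsupported compression in this sketch: {chunk_compression!r}")
    uncompressed = bz2.decompress(chunk)
    # Step 3: chunk_offset/chunk_length locate the message in the uncompressed chunk.
    return uncompressed[chunk_offset:chunk_offset + chunk_length]


# Build a stand-in log file: a 6-byte header followed by one compressed chunk
# holding a 13-byte message and a 14-byte message.
compressed = bz2.compress(b"first-message" + b"second-message")
log_file = io.BytesIO(b"HEADER" + compressed)

message = read_message_from_chunk(log_file, 6, len(compressed), "bz2", 13, 14)
print(message)  # b'second-message'
```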

Records have three “data” fields: query_data, auxiliary_data, and raw_data.

The query_data field is a JSON object containing the data that can be used when querying records through the record list endpoint. This data is unstructured and unvalidated, but will typically contain a representation of the underlying message data. When possible, this field is populated from the message data during ingestion on a best-effort basis. The size of the query_data object is limited on a per-record basis, and the limit is configured per DataStore. During ingestion, if the message data is too large to fit in the query_data field, the query_data field may either be populated with a subset of the data or left empty. The query_data field can be modified and should not be relied upon for critical, persistent data.

The auxiliary_data field is a JSON object containing any additional data that is useful for the record, but cannot be used for querying. The data which may be found in the auxiliary_data field is not stored in the database; rather, this data is fetched externally depending on the context. By default, when fetching or listing records, the auxiliary_data field is not populated to avoid unnecessary overhead and data transfer. It can be populated by passing the include_auxiliary_data query parameter when fetching or listing records.

The raw_data field is a string containing a “raw” representation of the underlying data. This field is not used for querying, but can be useful for fetching underlying message data through the record endpoints. Similar to the auxiliary_data field, the raw_data field is not populated by default when fetching or listing records, but can be populated by passing the include_raw_data query parameter when fetching or listing records. If the underlying data can be represented as a string, it will populate the raw_data field directly. If the underlying data is binary, the raw_data field will be populated with a base64 encoded string.
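
Turning a raw_data string back into bytes depends on whether it was base64 encoded. The text above does not say how a client learns which case applies, so the `is_base64` flag in this sketch is an assumption (a client might, for example, decide based on the topic's type encoding); the helper name is hypothetical.

```python
import base64


def raw_data_to_bytes(raw_data: str, is_base64: bool) -> bytes:
    """Recover message bytes from a raw_data string."""
    if is_base64:
        # Binary message data arrives as a base64 encoded string.
        return base64.b64decode(raw_data)
    # String-representable data is carried as-is; UTF-8 is assumed here.
    return raw_data.encode("utf-8")


binary_message = b"\x00\x01\xff"
encoded = base64.b64encode(binary_message).decode("ascii")
assert raw_data_to_bytes(encoded, is_base64=True) == binary_message
assert raw_data_to_bytes('{"speed": 3}', is_base64=False) == b'{"speed": 3}'
```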

Records contain the following fields:

>> Record._describe()

Record:
timestamp [int]:                        The timestamp, in nanoseconds, of the record.
auxiliary_data [Union[dict, list, NoneType]]: A JSON data representation of the record's auxiliary data which can be used for functional purposes.
chunk_compression [Optional[str]]:      The compression algorithm used to compress the record's message data in the log object, if any.
chunk_length [Optional[int]]:           The length, in bytes, of the record's message data in the log object's uncompressed data, if compressed.
chunk_offset [Optional[int]]:           The offset, in bytes, of the record's message data in the log object's uncompressed data relative to the start of the uncompressed chunk, if compressed.
context [Optional[dict]]:               A JSON context for the record.
data_length [Optional[int]]:            The length, in bytes, of the record's message data in the log object.
data_offset [Optional[int]]:            The offset, in bytes, of the record's message data in the log object.
error [Optional[dict]]:                 The JSON payload of an error that occurred during record processing.
ingestion_id [Optional[UUID]]:          The ID of the ingestion which created this record, if any.
lock_token [Optional[str]]:             The token used to lock the resource.
locked [bool]:                          Whether the process is locked (i.e. cannot be modified).
locked_at [Optional[datetime]]:         The time at which the resource was locked.
locked_by [Optional[UUID]]:             The ID of the user who locked the resource.
log_id [UUID]:                          The ID of the log to which this record's topic belongs.
note [Optional[str]]:                   A general note about the record for reference.
query_data [Union[dict, list, NoneType]]: A JSON data representation of the record's message data which is queryable.
raw_data [Optional[str]]:               A string representation of the record's message data, presented as-is. This data will be base 64 encoded if needed.
source [Optional[str]]:                 A relative path to the record's log object relative to the record's ingestion object, if any.
topic_id [UUID]:                        The ID of the topic to which this record belongs.
created_at [datetime]:                  The creation timestamp of the resource.
created_by [Optional[UUID]]:            The ID of the user who created the resource.
deleted_at [Optional[datetime]]:        The deletion timestamp of the resource.
deleted_by [Optional[UUID]]:            The ID of the user who deleted the resource.
updated_at [Optional[datetime]]:        The last update timestamp of the resource.
updated_by [Optional[UUID]]:            The ID of the user who last updated the resource.