Working with Records
Most non-trivial uses of LogQS will involve interacting with records which have been ingested. On this page, we’ll cover a few different aspects of working with records.
Record Auxiliary Data
First, we’ll load a log, a topic from that log which is of type sensor_msgs/Image, and a single record from that topic. We’ll dump the contents of the record to see what it looks like:
log = lqs.resource.Log.fetch("Demo Log")
topic = log.list_topics(type_name="sensor_msgs/Image")[0]
record = topic.list_records(limit=1)[0]
print(record.model_dump())
>> {'locked': False,
'locked_by': None,
'locked_at': None,
'lock_token': None,
'timestamp': 1655235727034130944,
'created_at': datetime.datetime(2023, 12, 18, 22, 25, 10, 453947, tzinfo=TzInfo(UTC)),
'updated_at': None,
'deleted_at': None,
'created_by': None,
'updated_by': None,
'deleted_by': None,
'log_id': UUID('f94c2773-6075-44d3-9638-89489e99d0c0'),
'topic_id': UUID('0f552dad-30b5-4d93-b6a2-67403527fa3a'),
'ingestion_id': UUID('707e51ae-25a7-42ff-8ed5-9d8ed603b883'),
'data_offset': 18122,
'data_length': 1710802,
'chunk_compression': None,
'chunk_offset': None,
'chunk_length': None,
'source': None,
'error': None,
'query_data': None,
'auxiliary_data': None,
'raw_data': None,
'context': None,
'note': None}
In LogQS, records can be associated with “auxiliary” data which allows us to augment records with arbitrary JSON data stored in an object store. This data is not included on records by default, as loading the data incurs a performance hit per record, but it can be loaded by setting the include_auxiliary_data parameter to True when fetching or listing records.
Note: auxiliary data can be arbitrarily large, so loading a large number of records with auxiliary data can be problematic (including errors related to payload limits). It's usually best to load records with auxiliary data one at a time, or in small batches.
record = topic.list_records(limit=1, include_auxiliary_data=True)[0]
print(record.model_dump())
>> {'locked': False,
'locked_by': None,
'locked_at': None,
'lock_token': None,
'timestamp': 1655235727034130944,
'created_at': datetime.datetime(2023, 12, 18, 22, 25, 10, 453947, tzinfo=TzInfo(UTC)),
'updated_at': None,
'deleted_at': None,
'created_by': None,
'updated_by': None,
'deleted_by': None,
'log_id': UUID('f94c2773-6075-44d3-9638-89489e99d0c0'),
'topic_id': UUID('0f552dad-30b5-4d93-b6a2-67403527fa3a'),
'ingestion_id': UUID('707e51ae-25a7-42ff-8ed5-9d8ed603b883'),
'data_offset': 18122,
'data_length': 1710802,
'chunk_compression': None,
'chunk_offset': None,
'chunk_length': None,
'source': None,
'error': None,
'query_data': None,
'auxiliary_data': {'image': 'UklGR...e/8f0gQAAAA',
'max_size': 640,
'quality': 80,
'format': 'webp'},
'raw_data': None,
'context': None,
'note': None}
You should see that the auxiliary data for this record includes an 'image' field with a base64-encoded image. In LogQS, we automatically process certain types of data, such as images, to generate this auxiliary data on demand. Other types of data may not have auxiliary data generated automatically, in which case you'll need to create it manually.
The record model includes a helper method to display the image in notebooks:
record.load_auxiliary_data_image()
Note that the image you’d find in the auxiliary data of a record is typically downscaled and compressed, making it unsuitable for high-quality image processing. We refer to these images as “preview” images since they’re appropriate for quick reference.
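If you're not working in a notebook, or you want the preview as a PIL image object, you can decode the base64 field yourself. Here's a minimal sketch based on the auxiliary data fields shown above (it assumes your PIL installation can decode the preview's format, e.g. WebP):

import base64
from io import BytesIO
from PIL import Image as ImagePIL

# decode the base64-encoded preview from the record's auxiliary data
preview_bytes = base64.b64decode(record.auxiliary_data["image"])
preview_img = ImagePIL.open(BytesIO(preview_bytes))  # format given by auxiliary_data["format"], e.g. webp
preview_img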
If you need a full-resolution image, you’ll need to fetch and deserialize the original data from the log file.
Fetching Record Data
When we want to fetch the original log data for a record, we have to jump through a few hoops to actually get it. The record provides enough information to locate the original data within the log file in the object store, but doing this manually is quite cumbersome.
To make this process easier, we’ve provided a utility method for fetching the record bytes given a record. Note that this process can be slow, especially when performed on a single record at a time:
record_bytes = lqs.utils.fetch_record_bytes(record)
print(record_bytes[:10])
b'`\n\x00\x00\x8e\xe4\xa8b(p'
LogQS comes with deserialization utilities for different log formats. There are different ways of accessing these utilities, but if you're interested in fetching and deserializing the original log data for a record, the following method is the most straightforward:
record_data = lqs.utils.get_deserialized_record_data(record)
# we omit the "data" field since it's big and not very interesting to see
print({ k: v for k, v in record_data.items() if k != "data" })
>> {'header': {'seq': 2656,
'stamp': {'secs': 1655235726, 'nsecs': 999977000},
'frame_id': 'crl_rzr/multisense_front/aux_camera_frame'},
'height': 594,
'width': 960,
'encoding': 'bgr8',
'is_bigendian': 0,
'step': 2880}
Our deserialization utilities will return a dictionary with the deserialized data in a format closely matching the original data schema. In the case of sensor_msgs/Image topics, you’ll find that the dictionary looks similar to the ROS message definition.
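As a quick sanity check, the raw buffer should be consistent with the image geometry: with bgr8 encoding there are 3 bytes per pixel, so the step (bytes per row) should equal width * 3, and the full buffer should contain step * height bytes:

# bgr8 uses 3 bytes per pixel, so each row ("step") is width * 3 bytes
assert record_data["step"] == record_data["width"] * 3
# the raw buffer should contain exactly step * height bytes
assert len(record_data["data"]) == record_data["step"] * record_data["height"]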
If we want to view this image, we’ll have to do a little processing to convert the image data to a format that can be displayed in a Jupyter notebook. We’ll use the PIL library to do this:
from PIL import Image as ImagePIL
mode = "RGB" # different encodings may use different modes
img = ImagePIL.frombuffer(
    mode,
    (record_data["width"], record_data["height"]),
    bytes(record_data["data"]),
    "raw",
    mode,
    0,
    1,
)
# in this case, we actually have a BGR image, not an RGB, so we need to swap the channels
b, g, r = img.split()
img = ImagePIL.merge("RGB", (r, g, b))
img
Of course, we also offer a utility function which can do this for you:
from lqs.common.utils import get_record_image
img = get_record_image(record_data, format="PNG")
img
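Since the result is a standard PIL image, you can work with it as you would any other image, e.g. save it to disk (the filename here is just an example):

img.save("frame.png")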
Listing Records
If we need to work with more than one record in this way, there are a few approaches that can improve performance depending on the context. For example, if we're interested in a list of records across a span of time, but we don't need every record within that span, we can use the frequency parameter to specify how many records we want to fetch per second. This can be useful for getting a representative sample of records across time without having to load every single record.
records = topic.list_records(frequency=0.1) # 0.1 record per second, or 1 record every 10 seconds
print(f"Found {len(records)} records")
>> Found 7 records
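You can verify the sampling by inspecting the record timestamps, which are in nanoseconds; with frequency=0.1, consecutive records should be roughly 10 seconds apart:

first_timestamp = records[0].timestamp
for record in records:
    # offset from the first sampled record, in seconds
    print((record.timestamp - first_timestamp) / 1e9)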
We can then proceed as we did above to fetch the original log data for each record, but the methods used so far aren't optimized for working with a batch of records (you'll incur unnecessary overhead for each record).
Instead, you'd want to use the iter_record_data utility method, which takes a list of records as input and produces an iterator yielding tuples of each record and its data. This method is optimized for fetching data for multiple records at once, re-using lookup data and the deserialization utilities across records:
for idx, (record, record_data) in enumerate(lqs.utils.iter_record_data(records, deserialize_results=True)):
    image = get_record_image(
        record_data,
        format="PNG",
    )
    image.thumbnail((200, 200))  # make them small for the sake of compactness, but the record_data is full-res
    display(image)
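The same pattern works if you'd rather write the full-resolution frames to disk instead of displaying them. Here's a small sketch which names each file by the record's timestamp (the output directory is arbitrary):

import os

os.makedirs("frames", exist_ok=True)
for record, record_data in lqs.utils.iter_record_data(records, deserialize_results=True):
    image = get_record_image(record_data, format="PNG")
    # name each file by the record's timestamp (nanoseconds)
    image.save(os.path.join("frames", f"{record.timestamp}.png"))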