Data File Module
This module implements readers and writers for tag-length-value files used by Joulescope.
This file format is used for multiple purposes, including:
Raw data capture, playback & browsing
Processed data capture, playback & browsing
Calibration storage
Firmware update storage
The file format must meet multiple objectives:
Support a streaming interface, such as over TCP.
Support fast access loading from disk.
Support incremental processing, such as by a microcontroller for firmware updates.
The streaming requirement means that seeking back to the start of the file is not allowed, so any collections or sections must be indicated with tags. However, when reading or writing files on disk, seeking is possible, so tags may be rewritten with offset information to improve performance.
The file format starts with a 32 byte header:
16 bytes: [0xd3, 0x74, 0x61, 0x67, 0x66, 0x6d, 0x74, 0x20, 0x0d, 0x0a, 0x20, 0x0a, 0x20, 0x20, 0x1a, 0x1c]
8 bytes: total length in bytes (0 = "not provided" or "streaming")
3 bytes: reserved (0)
1 byte: file version (1)
4 bytes: crc32 over this header
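The header layout above maps directly onto Python's `struct` module. The sketch below is illustrative and is not the joulescope implementation: `pack_file_header` and `unpack_file_header` are hypothetical names, and it assumes the CRC32 is the standard `zlib.crc32` computed over the first 28 bytes (everything before the CRC field).

```python
import struct
import zlib

# 16-byte identification prefix, copied from the list above.
MAGIC = bytes([0xd3, 0x74, 0x61, 0x67, 0x66, 0x6d, 0x74, 0x20,
               0x0d, 0x0a, 0x20, 0x0a, 0x20, 0x20, 0x1a, 0x1c])
FILE_VERSION = 1


def pack_file_header(total_length: int = 0) -> bytes:
    """Build the 32-byte file header (total_length=0 means streaming)."""
    # magic (16 bytes), total length (uint64), 3 reserved zero bytes, version (uint8)
    body = struct.pack('<16sQ3xB', MAGIC, total_length, FILE_VERSION)
    # Assumption: the CRC32 covers the first 28 bytes (everything before the CRC field).
    return body + struct.pack('<I', zlib.crc32(body))


def unpack_file_header(header: bytes) -> tuple:
    """Validate a 32-byte header and return (total_length, version)."""
    if len(header) != 32:
        raise ValueError('header must be 32 bytes')
    magic, total_length, version, crc = struct.unpack('<16sQ3xBI', header)
    if magic != MAGIC:
        raise ValueError('not a datafile (bad magic)')
    if crc != zlib.crc32(header[:28]):
        raise ValueError('header CRC32 mismatch')
    return total_length, version
```

A streaming writer would call `pack_file_header(0)`, while a writer that can seek could later rewrite the header with the final length.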
The prefix is specially selected to ensure:
Identification: Help the application determine that this file is in the correct format with minimal uncertainty.
Correct endianness: Little endian has won, so this entire format is stored in little endian format.
Proper binary processing: The different line ending combinations ensure that a reader that “fixes” line endings is detected, since this is a binary file format.
Display: Include “substitute” and “file separator” so that text printers do not show the rest of the file.
The remaining file contents are in tag-length-value (TLV) format with CRC32:
3 bytes: tag
1 byte: TLV flags (compression, encryption)
4 bytes: length of data in bytes (may be zero)
length bytes: The data value
pad bytes: zero padding so that the value plus padding ends 4 bytes past an 8-byte boundary, which makes the crc32 end on an 8-byte boundary
4 bytes: crc32
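A minimal sketch of packing one TLV record, assuming each record starts on an 8-byte boundary (true after the 32-byte file header when every record is padded this way) and assuming the CRC32 is `zlib.crc32` over the 8-byte header, value and padding. The source does not state what the CRC covers, and `pad_length` and `pack_tlv` are hypothetical helpers.

```python
import struct
import zlib


def pad_length(value_length: int) -> int:
    """Zero bytes needed so the 4-byte CRC32 ends on an 8-byte boundary.

    The 8-byte tag/flags/length header starts on an 8-byte boundary, so the
    value plus padding must end at an offset that is 4 modulo 8.
    """
    return (4 - value_length) % 8


def pack_tlv(tag: bytes, value: bytes = b'', flags: int = 0) -> bytes:
    """Pack one TLV record: tag, flags, length, value, padding, crc32."""
    if len(tag) != 3:
        raise ValueError('tag must be exactly 3 bytes')
    head = tag + bytes([flags]) + struct.pack('<I', len(value))
    body = head + value + bytes(pad_length(len(value)))
    # Assumption: the CRC32 covers the header, value and padding.
    return body + struct.pack('<I', zlib.crc32(body))
```

For example, an empty `END` record occupies 16 bytes: 8 header bytes, 4 padding bytes and the 4-byte CRC32.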
Tags are selected such that the upper byte of the 32-bit tag word is 0. Since the file format is little endian, this means that the tag has three usable characters and the upper byte holds the TLV flags. The upper tag bits have the following definitions:
bit 31: 1=compressed, 0=uncompressed
bit 30: 1=encrypted (ChaCha20 + Poly1305 with EdDSA signature), 0=unencrypted
bits [28:24]: reserved
bits [23:0]: Unique tag
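Because the tag word is little endian, the three tag characters occupy bits [23:0] and the flags occupy the top byte. The helper pair below is hypothetical and only decodes the two flag bits defined above.

```python
FLAG_COMPRESSED = 1 << 31
FLAG_ENCRYPTED = 1 << 30
TAG_MASK = 0x00FFFFFF


def tag_to_int(tag: bytes) -> int:
    """Convert a 3-character tag such as b'HDR' to its 32-bit little-endian word."""
    return int.from_bytes(tag + b'\x00', 'little')


def split_tag_word(word: int) -> tuple:
    """Return (tag_bytes, compressed, encrypted) from a 32-bit tag word."""
    tag = (word & TAG_MASK).to_bytes(3, 'little')
    return tag, bool(word & FLAG_COMPRESSED), bool(word & FLAG_ENCRYPTED)


# An encrypted HDR tag: the flag lands in the fourth (TLV flags) byte on disk.
word = tag_to_int(b'HDR') | FLAG_ENCRYPTED
print(split_tag_word(word), word.to_bytes(4, 'little'))
```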
The supported tags include:
b'HDR': common header information. Must be the first tag, although an SGS tag may optionally precede it. Files with encrypted tags will typically use the first 24 bytes of this field as the nonce, and then increment the last uint32 with each new encrypted block.
8 byte timestamp for data creation. See time.py for timestamp format information.
4 byte version of the file data contents: major8, minor8, patch16. If this field is not used, set to 0.
4 byte vendor_id: For USB products, the most significant 16 bits are 0 and the least significant 16 bits are the USB VID.
2 byte product_id: unique within vendor_id
2 byte subtype_id: application-defined, unique within product_id. A single product may include multiple subtypes, such as firmware, FPGA bitstreams and calibration data. Each product may assign values for this field or not use it.
4 byte hardware_compatibility: application-defined. Each bit represents a potentially incompatible hardware revision. This field should set the bit for each hardware version supported. If this field is not used, set to 0.
16 byte serial number identifying device associated with this data. If this field is not used, set to 0.
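The HDR fields above total 40 bytes and pack without padding in little-endian `struct` format. This is a sketch under assumptions: `pack_hdr` and `pack_version` are hypothetical, the version word is assumed to place major8 in the top byte and patch16 in the low 16 bits, and the timestamp is a placeholder integer (see time.py for the real format).

```python
import struct

# timestamp, version, vendor_id, product_id, subtype_id, hardware_compatibility, serial
HDR_FORMAT = '<QIIHHI16s'  # 40 bytes, little endian, no padding


def pack_version(major: int, minor: int, patch: int) -> int:
    """Assumption: major8 occupies the top byte and patch16 the low 16 bits."""
    return ((major & 0xff) << 24) | ((minor & 0xff) << 16) | (patch & 0xffff)


def pack_hdr(timestamp: int, version: int = 0, vendor_id: int = 0,
             product_id: int = 0, subtype_id: int = 0,
             hardware_compatibility: int = 0, serial_number: bytes = b'') -> bytes:
    """Pack the HDR payload; unused fields default to 0."""
    if len(serial_number) > 16:
        raise ValueError('serial number must fit in 16 bytes')
    # struct pads the 16s field with zero bytes, matching "set to 0 if unused".
    return struct.pack(HDR_FORMAT, timestamp, version, vendor_id, product_id,
                       subtype_id, hardware_compatibility, serial_number)


# Hypothetical values: vendor 0x1234, product 1, version 1.2.3.
payload = pack_hdr(0, pack_version(1, 2, 3), vendor_id=0x1234, product_id=1)
```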
b'END': indicates the end of the data file. Must be the last tag.
b'CLS': collection start. The payload is:
8 byte position of the collection end tag. This allows fast seeking to skip the collection data. In streaming datafile mode, this position is 0.
2 byte file-specific collection identifier
1 byte collection type: 0=unstructured, 1=list, 2=map
1 byte reserved (0)
N bytes: optional application specific data.
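The fixed part of the CLS payload is 12 bytes, followed by any application data. A hypothetical pack/unpack pair (not the joulescope API) could look like this:

```python
import struct

COLLECTION_UNSTRUCTURED, COLLECTION_LIST, COLLECTION_MAP = 0, 1, 2


def pack_cls(end_position: int, collection_id: int, collection_type: int,
             app_data: bytes = b'') -> bytes:
    """end_position=0 when streaming (the CLE position is not yet known)."""
    # position (uint64), identifier (uint16), type (uint8), reserved zero (uint8)
    return struct.pack('<QHBx', end_position, collection_id, collection_type) + app_data


def unpack_cls(payload: bytes) -> tuple:
    """Return (end_position, collection_id, collection_type, app_data)."""
    end_position, collection_id, collection_type = struct.unpack_from('<QHBx', payload)
    return end_position, collection_id, collection_type, payload[12:]
```

A writer that can seek might emit the CLS with position 0, then rewrite it once the CLE tag has been written and its position is known.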
b'CLE': collection end. May contain application-specific data such as indices to increase access performance.
b'SUB': A subfile, which is often used for storing the calibration record inside the data capture. The payload starts with 128 bytes (up to 127 bytes of UTF-8 encoded characters) that contain the null-terminated file name. Unused bytes MUST be set to 0. The remaining payload is the file in this datafile format.
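The 128-byte name field can be built by encoding the name and zero-filling the rest, which also supplies the null terminator. These helpers and the name `calibration` are hypothetical.

```python
def pack_subfile_name(name: str) -> bytes:
    """Encode the 128-byte, null-terminated UTF-8 name field of a SUB payload."""
    encoded = name.encode('utf-8')
    if len(encoded) > 127:
        raise ValueError('subfile name exceeds 127 bytes of UTF-8')
    # ljust pads with zero bytes, which also supplies the null terminator.
    return encoded.ljust(128, b'\x00')


def unpack_subfile_name(payload: bytes) -> tuple:
    """Return (name, embedded_datafile_bytes) from a SUB payload."""
    name = payload[:128].split(b'\x00', 1)[0].decode('utf-8')
    return name, payload[128:]


field = pack_subfile_name('calibration')
```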
b'IDX': application-specific index information.
b'MJS': application-specific metadata, JSON formatted.
b'AJS': application-specific data, JSON formatted.
b'ABN': application-specific data, binary formatted.
b'UJS': arbitrary end-user data, JSON formatted.
b'UBN': arbitrary end-user data, binary formatted.
b'ENC': encryption authenticity and integrity information. This tag must follow every block with the encryption bit set.
16 bytes: ChaCha20 + Poly1305 MAC
64 bytes: EdDSA curve25519 signature using the Blake2b hash (monocypher). The signature is computed on the UNENCRYPTED data (sign-then-encrypt). For firmware updates, we care more that the firmware is valid than about who created the cryptotext. If you want to prevent cryptotext forgeries, use encrypt-then-sign by using SGS/SGE with the payload only flag.
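Assuming the ENC payload contains exactly these two fields in the listed order (80 bytes total), a reader could split it as follows before handing each part to a monocypher binding. `unpack_enc` is a hypothetical helper; verifying the MAC and signature is out of scope here.

```python
ENC_MAC_SIZE = 16
ENC_SIGNATURE_SIZE = 64


def unpack_enc(payload: bytes) -> tuple:
    """Split an ENC payload into (poly1305_mac, eddsa_signature)."""
    if len(payload) != ENC_MAC_SIZE + ENC_SIGNATURE_SIZE:
        raise ValueError('ENC payload must be 80 bytes')
    return payload[:ENC_MAC_SIZE], payload[ENC_MAC_SIZE:]
```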
b'SGS': signature start. This field (inclusive) and all others up to SGE (exclusive) are included in the signature. Note that this file format makes no provisions for managing keys or ensuring key validity.
1 byte: signature type
1 = EdDSA curve25519 using Blake2b hash (monocypher).
1 byte: flags
1 = include this field (default is exclude)
2 = payload only (exclude tag, length & crc32)
6 bytes: reserved zero
32 bytes: public key
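The SGS payload is 40 bytes. The sketch below is hypothetical and assumes the flag values combine as a bitmask (for example, 1 | 2 to include this field and sign payloads only).

```python
import struct

SIGNATURE_TYPE_EDDSA_BLAKE2B = 1
SGS_FLAG_INCLUDE_SELF = 1
SGS_FLAG_PAYLOAD_ONLY = 2


def pack_sgs(public_key: bytes, flags: int = 0,
             signature_type: int = SIGNATURE_TYPE_EDDSA_BLAKE2B) -> bytes:
    """Pack the SGS payload: type, flags, 6 reserved zero bytes, public key."""
    if len(public_key) != 32:
        raise ValueError('public key must be 32 bytes')
    return struct.pack('<BB6x32s', signature_type, flags, public_key)


# Placeholder key bytes; a real key would come from the signer.
payload = pack_sgs(bytes(range(32)), SGS_FLAG_PAYLOAD_ONLY)
```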
b'SGE': signature end. This field is excluded from the signature. Payload is the signature.
- class joulescope.datafile.DataFileReader(filehandle)
Create a new instance.
- Parameters:
filehandle – The file-like object open for read. The file must support read, seek and tell.
- collection_goto_end()
Skip to the collection end.
- Raises:
RuntimeError – If the current tag is not a COLLECTION_START.
- decrypt(signing_key, encryption_key, nonce, associated_data=None)
Decrypt the next tag, if needed.
- peek_tag_length()
Peek at the next available entry.
- Returns:
tuple (tag, value_length)
This method gets the tag and length quickly. It does not load the data or validate the checksum.
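Peeking needs only the fixed 8-byte TLV header. The function below is a hypothetical stdlib stand-in, not the DataFileReader implementation; it returns the raw 32-bit tag word (tag plus flag bits) because the source does not say how the real method encodes `tag`.

```python
import io
import struct


def peek_tag_length(f) -> tuple:
    """Read the next TLV header without consuming it; returns (tag_word, value_length).

    Only the 8-byte header is read; the value and CRC32 are neither loaded nor
    checked, and the original file position is restored.
    """
    position = f.tell()
    header = f.read(8)
    f.seek(position)
    if len(header) < 8:
        raise EOFError('no complete TLV header available')
    tag_word, value_length = struct.unpack('<II', header)
    return tag_word, value_length


# Usage: an in-memory stream holding one AJS record header followed by its value.
stream = io.BytesIO(b'AJS\x00' + struct.pack('<I', 5) + b'hello')
print(peek_tag_length(stream), stream.tell())
```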
- seek(position)
Change to the location of another entry.
- Parameters:
position – The position returned by a previous call to
tell()
.
- class joulescope.datafile.DataFileWriter(filehandle)
Create a new instance.
- Parameters:
filehandle – The file open for write which must support the write, seek and tell methods.
- append(tag, data=None, compress=None)
Append a new tag-length-value field to the file.
- Parameters:
tag – The 3-byte tag as either an integer or bytes.
data – The associated data for the tag (optional).
compress – When False or None, do not attempt to compress the data. When True, attempt to compress the data.
- Returns:
The starting position for the tag.
- append_subfile(name: str, data, compress=None)
Append a subfile.
- Parameters:
name – The name of the subfile, which must fit into 127 bytes encoded as utf-8.
data – The data in this datafile format.
compress – When False or None, do not attempt to compress the data. When True, attempt to compress the data.