Data File Module

This module implements readers and writers for tag-length-value files used by Joulescope.

This file format is used for multiple purposes, including:

  • Raw data capture, playback & browsing

  • Processed data capture, playback & browsing

  • Calibration storage

  • Firmware update storage

The file format must meet multiple objectives:

  • Support a streaming interface, such as over TCP.

  • Support fast access loading from disk.

  • Support incremental processing, such as by a microcontroller for firmware updates.

The streaming requirement means that seeking back to the start of the file is not allowed, so any collections or sections must be indicated with tags. However, when reading and writing files on disk, seeking is allowed, so tags may be rewritten with offset information for improved performance.
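The seek-back rewrite can be illustrated with an in-memory file. A streaming writer leaves a zero placeholder where an offset belongs. A writer that can seek returns to the placeholder once the target position is known and overwrites it. This is a generic sketch: the helper `write_with_patch` and its 8-byte little-endian offset field are assumptions for illustration, not the library's code.

```python
import io
import struct

def write_with_patch(payload: bytes) -> bytes:
    """Hypothetical writer: emit a placeholder offset, then patch it once known."""
    f = io.BytesIO()
    offset_pos = f.tell()
    f.write(struct.pack('<Q', 0))        # 0 = "not known yet", as a streaming writer would leave it
    f.write(payload)                     # data the offset should point past
    end_pos = f.tell()                   # the target position is now known
    f.seek(offset_pos)
    f.write(struct.pack('<Q', end_pos))  # overwrite the placeholder in place
    f.seek(0, io.SEEK_END)               # return to the end to keep appending
    return f.getvalue()

data = write_with_patch(b'abcdefgh')
print(struct.unpack_from('<Q', data, 0)[0])  # 16
```

Only the placeholder bytes change, so a reader that encounters an offset of 0 (the streaming case) can still fall back to scanning tags.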

The file format starts with a 32 byte header:

  • 16 bytes: [0xd3, 0x74, 0x61, 0x67, 0x66, 0x6d, 0x74, 0x20, 0x0d, 0x0a, 0x20, 0x0a, 0x20, 0x20, 0x1a, 0x1c]

  • 8 bytes: total length in bytes (0 = “not provided” or “streaming”)

  • 3 bytes: reserved (0)

  • 1 byte: file version (1)

  • 4 bytes: crc32 over this header
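A minimal sketch of building and checking this 32-byte header with the standard library. Two details are assumptions not stated above: that the CRC32 covers the 28 bytes preceding it, and that it is the standard CRC-32 computed by `zlib.crc32`. The helper names `pack_header` and `unpack_header` are hypothetical.

```python
import struct
import zlib

PREFIX = bytes([0xd3, 0x74, 0x61, 0x67, 0x66, 0x6d, 0x74, 0x20,
                0x0d, 0x0a, 0x20, 0x0a, 0x20, 0x20, 0x1a, 0x1c])

def pack_header(total_length: int = 0, version: int = 1) -> bytes:
    # 16-byte prefix, uint64 total length, 3 reserved zero bytes, uint8 version
    body = PREFIX + struct.pack('<Q3xB', total_length, version)
    # Assumption: the CRC32 covers the 28 bytes that precede it
    return body + struct.pack('<I', zlib.crc32(body))

def unpack_header(header: bytes) -> tuple:
    """Return (total_length, version) or raise ValueError."""
    if len(header) != 32 or header[:16] != PREFIX:
        raise ValueError('not a datafile header')
    total_length, version, crc = struct.unpack_from('<Q3xBI', header, 16)
    if crc != zlib.crc32(header[:28]):
        raise ValueError('header CRC mismatch')
    return total_length, version

print(unpack_header(pack_header()))  # (0, 1)
```

Calling `pack_header()` with the default length of 0 corresponds to the streaming case, where the writer cannot come back to fill in the total length.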

The prefix is specially selected to ensure:

  • Identification: Help the application determine that this file is in the correct format with minimal uncertainty.

  • Correct endianness: Little endian has won, so the entire format is stored in little-endian byte order.

  • Proper binary processing: The different line ending combinations ensure that the reader is not “fixing” the line endings, since this is a binary file format.

  • Display: Include “substitute” and “file separator” so that text printers do not show the rest of the file.
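The line-ending bytes make mangling by text-mode tools easy to detect. The sketch below simulates a tool that converts CRLF to LF and shows that the result no longer matches the 16-byte prefix. It also shows that the printable part of the prefix spells `tagfmt`. The helper `looks_like_datafile` is hypothetical.

```python
PREFIX = bytes([0xd3, 0x74, 0x61, 0x67, 0x66, 0x6d, 0x74, 0x20,
                0x0d, 0x0a, 0x20, 0x0a, 0x20, 0x20, 0x1a, 0x1c])

def looks_like_datafile(data: bytes) -> bool:
    """Hypothetical check: does the data start with the 16-byte prefix?"""
    return data[:16] == PREFIX

# Simulate a tool that "fixes" Windows line endings (CRLF -> LF)
mangled = PREFIX.replace(b'\r\n', b'\n')

print(looks_like_datafile(PREFIX))   # True
print(looks_like_datafile(mangled))  # False: one byte was dropped
print(PREFIX[1:7])                   # b'tagfmt'
```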

The remaining file contents are in tag-length-value (TLV) format with CRC32:

  • 3 bytes: tag

  • 1 byte: TLV flags (compression, encryption)

  • 4 bytes: length of data in bytes (may be zero)

  • length bytes: The data value

  • pad bytes: zero padding up to a position 4 bytes past an 8-byte boundary, so that the crc32 ends on an 8-byte boundary

  • 4 bytes: crc32
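The padding rule reduces to one line of arithmetic. Entries start on 8-byte boundaries (the file header is 32 bytes and every entry ends on a boundary). The 8 bytes of tag, flags and length, plus the value and padding, must therefore end 4 bytes past a multiple of 8, which gives pad = (4 - length) mod 8. In the sketch below, the CRC coverage (tag, flags, length and value, excluding padding) is an assumption because the text does not specify it. `encode_tlv` is a hypothetical helper.

```python
import struct
import zlib

def pad_length(value_length: int) -> int:
    """Zero bytes needed so the trailing CRC32 ends on an 8-byte boundary."""
    return (4 - value_length) % 8

def encode_tlv(tag: bytes, value: bytes = b'', flags: int = 0) -> bytes:
    """Hypothetical encoder for one TLV entry (CRC coverage is an assumption)."""
    if len(tag) != 3:
        raise ValueError('tag must be 3 bytes')
    head = tag + bytes([flags]) + struct.pack('<I', len(value))
    # Assumption: CRC32 over tag, flags, length and value, excluding the padding
    crc = zlib.crc32(head + value)
    return head + value + bytes(pad_length(len(value))) + struct.pack('<I', crc)

print(len(encode_tlv(b'END')))            # 16 = 8 + 0 value + 4 pad + 4 CRC
print(len(encode_tlv(b'UBN', b'hello')))  # 24 = 8 + 5 value + 7 pad + 4 CRC
```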

Tags are selected such that the upper byte is 0. Since the file format is little endian, this means that the tag has three usable characters. The upper tag bits have the following definitions:

  • bit 31: 1=compressed, 0=uncompressed

  • bit 30: 1=encrypted (ChaCha20 + Poly1305 with EdDSA signature), 0=unencrypted

  • bits [28:24]: reserved

  • bits [23:0]: Unique tag
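Because the 32-bit word is little endian, the three tag characters occupy the low three bytes and the flag bits land in the fourth byte, which is the TLV flags byte of the entry layout. The sketch below packs and unpacks a tag word; the constant and function names are hypothetical.

```python
TAG_COMPRESSED = 1 << 31  # bit 31
TAG_ENCRYPTED = 1 << 30   # bit 30
TAG_MASK = 0x00ffffff     # bits [23:0]: the unique tag

def tag_to_int(tag: bytes) -> int:
    """Little endian: the first character lands in the least-significant byte."""
    if len(tag) != 3:
        raise ValueError('tags are exactly 3 bytes')
    return int.from_bytes(tag, 'little')

def tag_to_bytes(word: int) -> bytes:
    """Drop the flag bits and recover the 3 tag characters."""
    return (word & TAG_MASK).to_bytes(3, 'little')

word = tag_to_int(b'HDR') | TAG_COMPRESSED
print(hex(word))                   # 0x80524448
print(word.to_bytes(4, 'little'))  # b'HDR\x80'
print(tag_to_bytes(word))          # b'HDR'
```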

The supported tags include:

  • b'HDR': common header information. Must be the first tag, although an SGS tag may optionally precede it. Files with encrypted tags typically use the first 24 bytes of this field as the nonce and then increment the last uint32 with each new encrypted block.

    • 8 byte timestamp for data creation. See time.py for timestamp format information.

    • 4 byte version of the file data contents: major8, minor8, patch16. If this field is not used, set to 0.

    • 4 byte vendor_id: For USB products, the upper 16 bits are 0 and the lower 16 bits are the USB VID.

    • 2 byte product_id: unique within vendor_id

    • 2 byte subtype_id: application-defined, unique within product_id. A single product may include multiple subtypes, such as firmware, FPGA bitstreams and calibration data. Each product may assign values for this field or not use it.

    • 4 byte hardware_compatibility: application-defined. Each bit represents a potentially incompatible hardware revision. This field should set the bit for each hardware version supported. If this field is not used, set to 0.

    • 16 byte serial number identifying the device associated with this data. If this field is not used, set to 0.

  • b'END': Indicates the end of the data file. Must be the last tag.

  • b'CLS': collection start. The payload is:

    • 8 byte position of the collection end tag. This allows fast seeking to skip the collection data. In streaming datafile mode, the offset is 0.

    • 2 byte file-specific collection identifier

    • 1 byte collection type: 0=unstructured, 1=list, 2=map

    • 1 byte reserved (0)

    • N bytes: optional application specific data.

  • b'CLE': collection end. May contain application-specific data such as indices to increase access performance.

  • b'SUB': A subfile, which is often used to store the calibration record inside the data capture. The payload starts with 128 bytes (at most 127 bytes of UTF-8 encoded characters plus the null terminator) that contain the null-terminated file name. Unused bytes MUST be set to 0. The remaining payload is the file in this datafile format.

  • b'IDX': application-specific index information.

  • b'MJS': application-specific metadata, JSON formatted.

  • b'AJS': application-specific data, JSON formatted.

  • b'ABN': application-specific data, binary formatted.

  • b'UJS': arbitrary end-user data, JSON formatted.

  • b'UBN': arbitrary end-user data, binary formatted.

  • b'ENC': encryption authenticity and integrity information. This tag must follow every block with the encryption bit set.

    • 16 bytes: ChaCha20 + Poly1305 MAC

    • 64 bytes: EdDSA curve25519 using Blake2b hash (monocypher). The signature is computed on the UNENCRYPTED data (sign-then-encrypt). For firmware updates, we care more that the firmware is valid than about who created the ciphertext. If you want to prevent ciphertext forgeries, use encrypt-then-sign with SGS/SGE and the payload-only flag.

  • b'SGS': signature start. This field (inclusive) and all others up to SGE (exclusive) are included in the signature. Note that this file format makes no provisions for managing keys or ensuring key validity.

    • 1 byte: signature type

      • 1 = EdDSA curve25519 using Blake2b hash (monocypher).

    • 1 byte: flags

      • 1 = include this field (default is exclude)

      • 2 = payload only (exclude tag, length & crc32)

    • 6 bytes: reserved zero

    • 32 bytes: public key

  • b'SGE': signature end. This field is excluded from the signature. Its payload is the signature.
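The fixed-size parts of the HDR and CLS payloads map directly onto `struct` formats: HDR is 40 bytes and the CLS prefix is 12 bytes. This sketch is built only from the field lists above. Assumptions: the version word places major in the most-significant byte, and the nonce counter (the last uint32 of the 24-byte nonce) is little endian and wraps modulo 2**32. All helper names and the example IDs are hypothetical.

```python
import struct

HDR_FORMAT = '<QIIHHI16s'  # timestamp, version, vendor_id, product_id, subtype_id, hw compat, serial
CLS_FORMAT = '<QHBB'       # end offset, collection id, collection type, reserved

def pack_version(major: int, minor: int, patch: int) -> int:
    # Assumption: major in the most-significant byte, patch in the low 16 bits
    return ((major & 0xff) << 24) | ((minor & 0xff) << 16) | (patch & 0xffff)

def pack_hdr(timestamp: int, version: int, vendor_id: int, product_id: int,
             subtype_id: int = 0, hw_compat: int = 0, serial: bytes = b'') -> bytes:
    # struct's '16s' code zero-pads a serial number shorter than 16 bytes
    return struct.pack(HDR_FORMAT, timestamp, version, vendor_id, product_id,
                       subtype_id, hw_compat, serial)

def pack_cls(end_offset: int, collection_id: int, ctype: int,
             app_data: bytes = b'') -> bytes:
    # end_offset is 0 when streaming; ctype: 0=unstructured, 1=list, 2=map
    return struct.pack(CLS_FORMAT, end_offset, collection_id, ctype, 0) + app_data

def next_nonce(nonce: bytes) -> bytes:
    # Increment the last uint32 of a 24-byte nonce (assumed little endian, wrapping)
    counter = struct.unpack_from('<I', nonce, 20)[0]
    return nonce[:20] + struct.pack('<I', (counter + 1) & 0xffffffff)

# Hypothetical IDs for illustration only
hdr = pack_hdr(0, pack_version(1, 2, 3), 0x1234, 1, serial=b'SN0001')
print(len(hdr), hex(pack_version(1, 2, 3)))  # 40 0x1020003
print(len(pack_cls(0, 7, 1)))                # 12
nonce = next_nonce(hdr[:24])                 # nonce for the following encrypted block
```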

class joulescope.datafile.DataFileReader(filehandle)

Create a new instance.

Parameters

filehandle – The file-like object open for read. The file must support read, seek and tell.

advance()

Advance to the next TLV, ignoring data.

Returns

The tag that was skipped.

collection_goto_end()

Skip to the collection end.

Raises

RuntimeError – If the current tag is not a COLLECTION_START.

decrypt(signing_key, encryption_key, nonce, associated_data=None)

Decrypt the next tag, if needed.

peek()

Peek at the next available entry.

Returns

tuple (tag, value)

peek_tag_length()

Peek at the next available entry.

Returns

tuple (tag, value_length)

This method gets the tag and length quickly. It does not load the data or validate the checksum.
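Peeking at the tag and length is cheap because every entry begins with the same 8 bytes, and those 8 bytes alone determine where the next entry starts. The stand-alone sketch below walks a byte buffer this way without reading values or checking CRCs. `walk_entries` and `fake_entry` are hypothetical helpers built from the TLV layout above; this is not `DataFileReader` itself.

```python
import struct

def walk_entries(buf: bytes, offset: int = 32):
    """Yield (position, tag, flags, length) for each TLV entry after the 32-byte header."""
    while offset + 8 <= len(buf):
        tag = buf[offset:offset + 3]
        flags = buf[offset + 3]
        (length,) = struct.unpack_from('<I', buf, offset + 4)
        yield offset, tag, flags, length
        if tag == b'END':  # END must be the last tag
            return
        # Skip the value, zero padding and CRC32 to reach the next 8-byte-aligned entry
        offset += 8 + length + (4 - length) % 8 + 4

def fake_entry(tag: bytes, value: bytes = b'') -> bytes:
    # Zero CRC placeholder: acceptable here because the walker never checks it
    pad = bytes((4 - len(value)) % 8)
    return tag + b'\x00' + struct.pack('<I', len(value)) + value + pad + bytes(4)

buf = bytes(32) + fake_entry(b'UBN', b'hello') + fake_entry(b'END')
for pos, tag, flags, length in walk_entries(buf):
    print(pos, tag, length)
# 32 b'UBN' 5
# 56 b'END' 0
```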

pretty_print()

Pretty print the datafile structure.

seek(position)

Change to the location of another entry.

Parameters

position – The position returned by a previous call to tell().

skip()

Skip the next available entry, skipping entire collections.

Returns

The tag that was skipped.

tell()

Give the location of the current entry.

Returns

The position suitable for seek().

class joulescope.datafile.DataFileWriter(filehandle)

Create a new instance.

Parameters

filehandle – The file open for write which must support the write, seek and tell methods.

append(tag, data=None, compress=None)

Append a new tag-length-value field to the file.

Parameters
  • tag – The 3-byte tag as either an integer or bytes.

  • data – The associated data for the tag (optional).

  • compress – When False or None, do not attempt to compress the data. When True, attempt to compress the data.

Returns

The starting position for the tag.

append_subfile(name: str, data, compress=None)

Append a subfile.

Parameters
  • name – The name of the subfile, which must fit into 127 bytes when encoded as UTF-8.

  • data – The data in this datafile format.

  • compress – When False or None, do not attempt to compress the data. When True, attempt to compress the data.

joulescope.datafile.subfile_split(value)

Split a subfile into the name and payload.

Parameters

value – The value in the SUBFILE tag.

Returns

(name, data).
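The SUB payload layout (a 128-byte, zero-filled name field followed by the nested file) can be packed and split with a few lines of standard-library code. `subfile_pack` and `split_subfile_payload` are hypothetical helpers written from the SUB tag description; the library's own function is `subfile_split`, whose implementation may differ.

```python
NAME_FIELD = 128  # null-terminated UTF-8 name: at most 127 encoded bytes plus the terminator

def subfile_pack(name: str, data: bytes) -> bytes:
    """Hypothetical inverse of the split: build a SUB tag payload."""
    encoded = name.encode('utf-8')
    if len(encoded) > NAME_FIELD - 1:
        raise ValueError('name must fit in 127 bytes of UTF-8')
    # Unused bytes of the name field MUST be 0
    return encoded.ljust(NAME_FIELD, b'\x00') + data

def split_subfile_payload(value: bytes):
    """Stand-alone sketch of the split described above: return (name, data)."""
    name = value[:NAME_FIELD].split(b'\x00', 1)[0].decode('utf-8')
    return name, value[NAME_FIELD:]

payload = subfile_pack('calibration', b'\x01\x02')
print(len(payload))                    # 130
print(split_subfile_payload(payload))  # ('calibration', b'\x01\x02')
```

Checking the encoded length rather than the character count matters for non-ASCII names, since a single character may take several UTF-8 bytes.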