Selector

Functions and classes for performing message selection.

Selection in this context means making sure that only one message referring to a given file is published further.

This is useful when multiple sources for the same data are sending messages (e.g. two reception servers for EUMETCast), but only one message per file is needed for further processing.

To check if two messages refer to the same data, the uid metadata of the messages is used.

A command-line script is also made available by this module. It is called pytroll-selector:

usage: pytroll-selector [-h] [-l LOG_CONFIG] config

Selects unique messages (based on uid) from multiple sources.

positional arguments:
  config                The yaml config file.

options:
  -h, --help            show this help message and exit
  -l LOG_CONFIG, --log-config LOG_CONFIG
                        The yaml config file for logging.

Thanks for using pytroll-selector!

An example config file to use with this script is the following:

selector_config:
  ttl: 30
publisher_config:
  name: hrit_selector
subscriber_config:
  addresses:
    - tcp://eumetcast_reception_1:9999
    - tcp://eumetcast_reception_2:9999
  nameserver: false
  topics:
    - /1b/hrit-segment/0deg

The different sections are passed straight on to run_selector(); see its documentation for details on what each section accepts.
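As an illustration of how the sections line up with the parameters, the sketch below writes the example config above as the dictionary a YAML loader would typically produce and unpacks it into a function with the same parameter names as run_selector(). The function run_selector_stub is a hypothetical stand-in used only so the snippet runs without a network connection, and the keyword unpacking is an assumption about how the sections are handed over, not a copy of the script's code.

```python
# The example config file, expressed as the dict a YAML loader would give back.
config = {
    "selector_config": {"ttl": 30},
    "publisher_config": {"name": "hrit_selector"},
    "subscriber_config": {
        "addresses": [
            "tcp://eumetcast_reception_1:9999",
            "tcp://eumetcast_reception_2:9999",
        ],
        "nameserver": False,
        "topics": ["/1b/hrit-segment/0deg"],
    },
}


def run_selector_stub(selector_config, subscriber_config, publisher_config):
    """Hypothetical stand-in sharing the parameter names of run_selector()."""
    # Mirror the documented default of 300 seconds when no ttl is given.
    ttl = selector_config.get("ttl", 300)
    return ttl, subscriber_config["topics"], publisher_config["name"]


# Each top-level section of the config maps onto one keyword argument.
result = run_selector_stub(**config)
```

With this layout, a missing or misspelled top-level section would show up immediately as a TypeError about an unexpected or missing keyword argument.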

class pytroll_watchers.selector.TTLDict(ttl=300)

A simple dictionary-like object that discards items older than a time-to-live.

Not thread-safe.

Parameters:

ttl – the time to live of the stored items in seconds.
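To make the time-to-live behaviour concrete, here is a minimal sketch of a dictionary-like store that drops entries older than ttl seconds. SketchTTLDict and its clock parameter are hypothetical names chosen for illustration; this is not the implementation of TTLDict, and, like the class it illustrates, it is not thread-safe.

```python
import time


class SketchTTLDict:
    """Hypothetical dict-like store that forgets items older than ttl seconds."""

    def __init__(self, ttl=300, clock=time.monotonic):
        self._ttl = ttl
        # The clock is injectable so expiry can be exercised without sleeping.
        self._clock = clock
        self._items = {}  # key -> (insertion time, value)

    def _purge(self):
        """Drop every item whose age exceeds the time to live."""
        now = self._clock()
        expired = [key for key, (stamp, _) in self._items.items()
                   if now - stamp > self._ttl]
        for key in expired:
            del self._items[key]

    def __setitem__(self, key, value):
        self._purge()
        self._items[key] = (self._clock(), value)

    def __getitem__(self, key):
        self._purge()
        return self._items[key][1]

    def __contains__(self, key):
        self._purge()
        return key in self._items

    def __len__(self):
        self._purge()
        return len(self._items)
```

Purging on every access keeps the sketch short; a store holding many items might instead purge only the oldest entries, since a plain dict preserves insertion order.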

pytroll_watchers.selector.cli(args=None)

Command line interface.

pytroll_watchers.selector.run_selector(selector_config, subscriber_config, publisher_config)

Run the selector.

The aim of the selector is to skip messages that are duplicates of already published messages. Duplicate in this context means messages referring to the same file (even if stored in different locations).

Messages that refer to new files will be published.

Parameters:
  • selector_config – a dictionary providing a ttl in seconds for the selector, so that incoming messages are forgotten after that time. If not provided, the ttl defaults to 300 seconds (5 minutes).

  • subscriber_config – a dictionary of arguments to pass to create_subscriber_from_dict_config(). The subscription is used as a source for messages to process.

  • publisher_config – a dictionary of arguments to pass to create_publisher_from_dict_config(). This publisher will send the selected messages.

pytroll_watchers.selector.running_selector(selector_config, subscriber_config)

Generate selected messages.

The aim of this generator is to skip messages that are duplicates of already processed messages. Duplicate in this context means messages referring to the same file (even if stored in different locations).

Parameters:
  • selector_config – a dictionary providing a ttl in seconds. If not provided, the ttl defaults to 300 seconds (5 minutes).

  • subscriber_config – a dictionary of arguments to pass to create_subscriber_from_dict_config().

Yields:

JSON representations of posttroll messages.
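The selection logic described for this generator can be sketched as follows. This is a simplified illustration under stated assumptions: messages are modelled as plain dictionaries carrying a "uid" entry rather than posttroll messages, the sketch yields those dictionaries instead of JSON strings, and select_unique with its clock parameter is a hypothetical name, not part of pytroll_watchers.

```python
import time


def select_unique(messages, ttl=300, clock=time.monotonic):
    """Yield only the first message seen for each uid within ttl seconds.

    Hypothetical helper: `messages` is any iterable of dicts with a "uid" key.
    """
    seen = {}  # uid -> time at which the uid was first seen
    for msg in messages:
        now = clock()
        # Forget uids that were first seen more than ttl seconds ago.
        for uid in [u for u, stamp in seen.items() if now - stamp > ttl]:
            del seen[uid]
        key = msg["uid"]
        if key in seen:
            # Same file already announced by another source: skip it.
            continue
        seen[key] = now
        yield msg


# Two reception servers announce the same segment; only the first is kept.
incoming = [
    {"uid": "seg1", "uri": "/server1/seg1"},
    {"uid": "seg1", "uri": "/server2/seg1"},
    {"uid": "seg2", "uri": "/server2/seg2"},
]
selected = list(select_unique(incoming, clock=lambda: 0.0))
```

Because entries expire after the ttl, a message with a previously seen uid is let through again once that time has passed, so the ttl should be longer than the expected delay between the redundant sources.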

pytroll_watchers.selector.unique_key(msg)

Identify the content of the message with a unique key.