Record store viewer#
The basic code to create a new viewer is
This offers a lazy iterator over the whole record store with filtering capabilities. The elements of this iterator are read-only objects (“views”) of the underlying records.
See also the usage examples in the Getting Started pages and In-depth documentation.
Record store viewer#
- class smttask.view.recordstoreview.RecordStoreView(iterable=None, project=None)#
This class provides a read-only view to a subset of a sumatra.recordstore.RecordStore. It is composed of two parts: the underlying record store, and an iterable which iterates over all or some of the records of that record store. A RecordStoreView provides an iterator which ensures that all returned elements are RecordViews, and therefore also read-only.
The given iterable is not automatically converted to a list; a version backed by a list (which thus supports len(), indexing, etc.) can be obtained with the .list property. RecordStoreView objects provide a filter interface. If rsview is such an object, then
rsview.filter(cond) is the same as filter(cond, rsview)
rsview.filter.output_data() keeps only records which produced output.
The list of defined filters is stored in the dictionary rsview.filter.registered_filters. Custom filters can also be defined with the decorator smttask.view.recordfilter.record_filter.
Exception to the ‘read-only’ property: the methods add_tag and remove_tag must for obvious reasons modify the underlying record store.
Caution
If constructed from a generator, iterating over the RecordStoreView will consume the generator. Use the .list property to ensure the used iterable is non-consumable.
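The caution above can be illustrated without smttask. The following is a minimal sketch using a hypothetical stand-in class, TinyView (not part of smttask), which wraps an iterable the same way and shows why a generator-backed view can be iterated only once, while a list-backed one can be reused.

```python
class TinyView:
    """Hypothetical stand-in for a view wrapping an arbitrary iterable."""

    def __init__(self, iterable):
        self._iterable = iterable

    def __iter__(self):
        return iter(self._iterable)

    def __len__(self):
        # Only works once the contents are held in a sized container.
        return len(self._iterable)

    @property
    def list(self):
        # Cache the contents as a list (once) and return self,
        # mirroring the documented behaviour of `.list`.
        if not isinstance(self._iterable, list):
            self._iterable = list(self._iterable)
        return self


# A view over a generator is exhausted by the first pass.
gen_view = TinyView(x for x in range(3))
first_pass = list(gen_view)
second_pass = list(gen_view)

# After `.list`, the view can be iterated repeatedly and supports len().
list_view = TinyView(x for x in range(3)).list
pass_a = list(list_view)
pass_b = list(list_view)
```

Here second_pass is empty because the generator was consumed by the first pass, whereas pass_a and pass_b are identical.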
- add_comment(comment, replace=False)#
Add a comment to all records in the record store view (this is stored in the ‘outcome’ variable). If the comment is already present, it is not added again.
Note
At the risk of stating the obvious, this function will modify the underlying record store.
- add_tag(tag)#
Add a tag to all records in the record store view. Multiple tags can be specified, by wrapping them in a set, list or tuple.
Note
At the risk of stating the obvious, this function will modify the underlying record store.
- add_tags(tag)#
Add a tag to all records in the record store view. Multiple tags can be specified, by wrapping them in a set, list or tuple.
Note
At the risk of stating the obvious, this function will modify the underlying record store.
- aslist()#
Return records as a list. Triggers caching of the result.
- copy(iterable=None)#
Return a new RecordStoreView for the same record store. The view will include the same records as self, unless iterable is specified.
- dframe(include: Sequence[str] = ('timestamp', 'duration', 'reason', 'outcome', 'main_file', 'script_arguments', 'parameters', 'tags', 'command_line', 'version', 'executable'), exclude: Sequence[str] = (), field_types: Dict[str, Callable] | None = None, exclude_surrogates: bool = False) → pd.DataFrame#
Convert to a Pandas DataFrame. Record attributes are mapped to columns.
- Parameters:
include – Determines both which record fields to include, and in which order.
exclude – Has precedence over include.
field_types – Collection of initializers for the column fields. Typically a plain type (like str or int), but can also be a function. This is applied to the column value before constructing the DataFrame.
exclude_surrogates – Whether to exclude surrogate records. (Records which were created to link task parameters to outputs, when the original record is missing.) Typically surrogate records are excluded from run statistics, since their timestamp, runtime, etc. are undefined.
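To make the interplay of include, exclude and field_types concrete, here is a minimal sketch of how record attributes could be collected into columns. It is not the smttask implementation: the helper records_to_columns and the SimpleNamespace records are assumptions made for illustration, and the sketch stops at a plain dictionary of columns so that it runs without pandas.

```python
from types import SimpleNamespace


def records_to_columns(records, include, exclude=(), field_types=None):
    """Hypothetical helper: map record attributes to columns."""
    field_types = field_types or {}
    # `include` fixes the column order; `exclude` takes precedence.
    fields = [f for f in include if f not in exclude]
    columns = {f: [] for f in fields}
    for rec in records:
        for f in fields:
            value = getattr(rec, f)
            convert = field_types.get(f)
            # Apply the initializer, if one was given for this field.
            columns[f].append(convert(value) if convert else value)
    return columns


records = [
    SimpleNamespace(timestamp="2024-01-01", duration="1.5", tags={"a"}),
    SimpleNamespace(timestamp="2024-01-02", duration="3", tags=set()),
]
cols = records_to_columns(
    records,
    include=("timestamp", "duration", "tags"),
    exclude=("tags",),
    field_types={"duration": float},
)
```

A dictionary of equal-length lists like cols is one of the inputs accepted by the pandas.DataFrame constructor, with one column per key.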
- property earliest#
Return the record with the earliest timestamp.
- property first#
Return the first record in the record store; in many cases this is also the earliest record. Prefer this over earliest if returning an “old but not necessarily the earliest” record is acceptable, since first is much faster and O(1) in the number of records.
- labels()#
Return the list of labels.
Note: like list, the result is cached.
- property last#
Return the last record in the record store; in many cases this is also the latest record. Prefer this over latest if returning a “recent but not necessarily the latest” record is acceptable, since last is much faster and O(1) in the number of records.
Caution
This requires that the underlying iterator be reversible. This is always the case after calling .list, but typically not the case for consumable iterables.
- property latest#
Return the record with the latest timestamp.
Note
This property iterates over all records to ensure it returns the one with the latest timestamp. See also last for a similar property with much faster O(1) execution time, but without the guarantee of returning the latest record.
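The difference between first/last and earliest/latest comes down to positional access versus a full scan. The sketch below illustrates this with plain Python objects; the records are hypothetical stand-ins carrying only a label and a timestamp, and nothing here calls smttask.

```python
from datetime import datetime
from types import SimpleNamespace

# Hypothetical records, deliberately not stored in chronological order.
records = [
    SimpleNamespace(label="b", timestamp=datetime(2024, 1, 2)),
    SimpleNamespace(label="d", timestamp=datetime(2024, 1, 4)),
    SimpleNamespace(label="a", timestamp=datetime(2024, 1, 1)),
    SimpleNamespace(label="c", timestamp=datetime(2024, 1, 3)),
]

# Positional access: O(1), but depends on storage order.
first = next(iter(records))
last = next(reversed(records))  # requires a reversible iterable

# Full scan: O(n), but guaranteed to respect timestamps.
earliest = min(records, key=lambda r: r.timestamp)
latest = max(records, key=lambda r: r.timestamp)

# A generator is not reversible, which is why `last` needs a list.
try:
    reversed(r for r in records)
    generator_is_reversible = True
except TypeError:
    generator_is_reversible = False
```

In this example the positional and timestamp-based results all differ, which is the situation the “not necessarily” wording above warns about.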
- property list#
Ensure the contents of the iterable are cached as a list.
- Returns:
self
- most_recent()#
Return the label of the most recent record.
- rebuild_input_datastore(link_creation_function: Callable[[Record], List[Tuple[Path, Path]]])#
Iterate through the record store, recompute the output file links for each record and recreate all the links in the input data store (i.e. on the file system) to match the recomputed names.
- Parameters:
link_creation_function (Callable) – (record) -> [(link location 1, link target 1), (link location 2, link target 2), … ] Both link location and link target should be relative to the roots of the input and output data stores respectively.
- rebuild_links(link_creation_function: Callable[[Record], List[Tuple[Path, Path]]])#
Iterate through the record store, recompute the output file links for each record and recreate all the links in the input data store (i.e. on the file system) to match the recomputed names.
- Parameters:
link_creation_function (Callable) – (record) -> [(link location 1, link target 1), (link location 2, link target 2), … ] Both link location and link target should be relative to the roots of the input and output data stores respectively.
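As an illustration of the expected shape of link_creation_function, here is a minimal sketch. The naming scheme (task name followed by the output file name) and the record attribute output_paths are assumptions made for this example only; the only part taken from the description above is the return type, a list of (link location, link target) pairs of relative paths.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import List, Tuple


def link_by_task_name(record) -> List[Tuple[Path, Path]]:
    """Hypothetical link creation function.

    Places each output under `<task name>/<output file name>`.
    """
    links = []
    # `output_paths` is an assumed attribute holding paths relative
    # to the root of the output data store.
    for out in record.output_paths:
        target = Path(out)
        # Link location is relative to the root of the input data store.
        location = Path(record.task_name) / target.name
        links.append((location, target))
    return links


# Stand-in record for demonstration purposes.
record = SimpleNamespace(
    task_name="Fit",
    output_paths=["20240101-120000/result.json"],
)
links = link_by_task_name(record)
```

A function of this shape could then be passed as the link_creation_function argument.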
- remove_tag(tag)#
Remove the tag from all records in the record store view. Multiple tags can be specified, by wrapping them in a set, list or tuple.
Note
At the risk of stating the obvious, this function will modify the underlying record store.
- remove_tags(tag)#
Remove the tag from all records in the record store view. Multiple tags can be specified, by wrapping them in a set, list or tuple.
Note
At the risk of stating the obvious, this function will modify the underlying record store.
- splitby(split_fields: Sequence[str], split_names: Sequence[str] = None, drop_unused_split_fields: bool = True, get_field_value: Callable[[Any, str, Any], Any] | None = None) → Dict[Tuple[str], RecordStoreView]#
Split the RecordStoreView into multiple, disjoint views, based on their values in the fields specified in split_fields. This is analogous to a ‘groupby’ operation. Grouping is done in the order in which fields appear in split_fields.
Support for hierarchical keys: if a value returned by get_field_value is a namedtuple (or has a ‘_fields’ attribute), then the subfields are extracted. For example, if the field ‘time’ returns namedtuples with fields ‘start’ and ‘stop’, then the split key field ‘time’ is replaced with ‘time_start’ and ‘time_stop’.
Caution
The support for hierarchical keys is experimental and has the following limitations:
If get_field_value returns a namedtuple for a given field name, its fields must be the same for all records.
Only one nesting level is currently supported.
Caution
If a field value is not hashable, it is converted to a string.
- Parameters:
split_fields (List[str]) – The record attributes used to split record store views. See above for the treatment of hierarchical parameters.
split_names (List[str]) – List of field names to use in the keys of the returned dictionaries. If not provided, inferred from split_fields.
drop_unused_split_fields (bool) –
Whether to omit from the key type fields which lead to no splitting. If True, fields are removed if they satisfy one of two conditions:
All records have the same value for that field. (E.g. if ‘α’ is given in the split fields, but all records have the same value for ‘α’.) This is independent of the ordering in split_fields.
If higher priority fields would produce the same split. (E.g. if setting split_fields to either [‘α’, ‘β’] or [‘α’] would produce the same splits; i.e. the values of ‘β’ are constant when conditioned on ‘α’.) This depends on the ordering, with rightmost fields removed first.
Default is True.
get_field_value (Callable) – The function to use to recover field values from records. This is used e.g. by smttask to simplify the specification of parameters from nested task specifications. The function should have the same signature as getattr: (obj, attr:str, default) -> obj.attr or default. If unspecified, it defaults to view.config.get_field_value.
- Returns:
dict – multikey is a namedtuple, of same length as split_fields. It stores the value for each of these fields in that group. Its field names are determined by split_names.
- Return type:
{multikey: RecordStoreView}
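The core of the grouping can be sketched in a few lines of plain Python. This is a simplified illustration, not the smttask implementation: it returns lists rather than RecordStoreView objects, and it omits both drop_unused_split_fields and hierarchical keys. The function name splitby_sketch and the example records are hypothetical.

```python
from collections import defaultdict, namedtuple
from types import SimpleNamespace


def splitby_sketch(records, split_fields, get_field_value=getattr):
    """Group records by their values in `split_fields`."""
    Key = namedtuple("Key", split_fields)
    groups = defaultdict(list)
    for rec in records:
        values = []
        for field in split_fields:
            value = get_field_value(rec, field, None)
            try:
                hash(value)
            except TypeError:
                # Unhashable values are converted to strings.
                value = str(value)
            values.append(value)
        groups[Key(*values)].append(rec)
    return dict(groups)


records = [
    SimpleNamespace(label="r1", alpha=1, beta=[1, 2]),
    SimpleNamespace(label="r2", alpha=1, beta=[1, 2]),
    SimpleNamespace(label="r3", alpha=2, beta=[3]),
]
groups = splitby_sketch(records, ["alpha", "beta"])
group_sizes = {tuple(key): len(recs) for key, recs in groups.items()}
```

Each key is a namedtuple with one field per entry of split_fields, so values can be read as key.alpha or key.beta; the list-valued beta appears in the key as its string form.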
- property summaries: hv.HoloMap#
A memoized version of compute_version.
- property summary: RecordStoreSummary#
Return a RecordStoreSummary. NOTE: This is being made obsolete by the combination of dframe and summaries.
- summary_hist(stat_field: str) → hv.Overlay#
stat_field: One of the fields listed in self.summary_fields.
- update_reason(reason: str | Dict[str, str] | Callable[[str], str], mode: str = 'prepend')#
Update the ‘reason’ field for all records in the record store view.
- Parameters:
reason – Either:
A string to add to the record’s reason (or to replace it with).
A dictionary of {pattern: string} pairs, used with mode ‘replace substr’.
A callback function, taking the record’s ‘reason’ string and returning the updated one. If this function returns None or the unmodified reason string, the record is not modified.
mode – One of ‘prepend’, ‘append’, ‘replace all’, ‘replace substr’, ‘callback’. Modes ‘replace substr’ and ‘callback’ can be left unspecified: they are inferred from the type of reason.
If the mode is ‘prepend’ or ‘append’, and reason is already a substring of the record’s ‘reason’ field at any position, then the record is not modified. This is to reduce the likelihood of accidentally growing the ‘reason’ field (e.g., with two functions each prepending different strings).
Note
Some standardizations are applied to all reason strings, even if they are otherwise unmodified.
Modes
"prepend": The new reason is reason + record.reason.
"append": The new reason is record.reason + reason.
"replace all": The new reason is reason.
"replace substr": For each {pattern: string} pair in reason, we call re.sub(pattern, string, record.reason). All occurrences of ‘pattern’ are replaced by ‘string’.
"callback": The new reason is callback(record.reason), where callback is the function passed as reason.
"standardize": Only apply the standardizations.
Standardizations
Sequences (tuple, list, etc.) of length one are replaced by their first element. This is because while it is possible to store sequences in the ‘reason’ field, a string is really the expected format and better supported (both by the schema and by the UI).
Note
At the risk of stating the obvious, this function will modify the underlying record store.
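The modes listed for update_reason can be summarized as a pure function from the old reason to the new one. The following is a minimal sketch under stated assumptions: updated_reason is a hypothetical helper (not the smttask code), it does not touch any record store, and it assumes the mode is inferred whenever reason is a dictionary or a callable.

```python
import re


def updated_reason(old, reason, mode="prepend"):
    """Hypothetical helper: compute the new 'reason' value."""
    # Infer the mode from the type of `reason` (assumed behaviour).
    if isinstance(reason, dict):
        mode = "replace substr"
    elif callable(reason):
        mode = "callback"
    # Standardization: length-one sequences become their only element.
    if isinstance(old, (list, tuple)) and len(old) == 1:
        old = old[0]
    if mode == "standardize":
        return old
    if mode in ("prepend", "append"):
        if reason in old:
            return old  # already present: avoid growing the field
        return reason + old if mode == "prepend" else old + reason
    if mode == "replace all":
        return reason
    if mode == "replace substr":
        for pattern, string in reason.items():
            old = re.sub(pattern, string, old)
        return old
    if mode == "callback":
        new = reason(old)
        return old if new is None else new
    raise ValueError(f"Unknown mode: {mode!r}")


prepended = updated_reason("run B", "Fix: ")
unchanged = updated_reason("Fix: run B", "Fix: ")
appended = updated_reason("run B", " (v2)", mode="append")
replaced = updated_reason("run B", {r"B\b": "C"})
called = updated_reason("run B", str.upper)
standardized = updated_reason(["run B"], "", mode="standardize")
```

Note how unchanged is left as-is: the string to prepend is already a substring of the existing reason, which is the safeguard described above.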
Record view#
- class smttask.view.recordview.RecordView(record, rsview=None, *args, **kwargs)#
A read-only interface to Sumatra records with extra convenience methods. In contrast to Sumatra.Record, RecordView is hashable and thus can be used in sets or as a dictionary key.
- get_output(name='', data_types=(<class 'scityping.base.Serializable'>, ))#
Load the output data associated with the provided name. (The association to name is done by matching the output path.) name should be such that exactly one output path matches; if the record produced only one output, name is not required.
After having found the output file path, the method attempts to load it with the provided data types; the first to succeed is returned. A list of data types can be provided via the data_types argument, but in general it is more convenient to set a default list with the class variable RecordView.data_types. Types passed as arguments have precedence.
Data types are expected to be types defined by the Scityping package. Other types can be used, as long as they either:
Define a class method validate, which parses JSON data. It will be called as mytype.validate(json_data).
Accept JSON data as an argument, i.e. mytype(json_data).
In both cases, they must raise TypeError if json_data is not compatible with the type.
If none of the types are able to deserialize the data, the JSON data is returned as-is.
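The “first type to succeed” logic described for get_output can be sketched as follows. This is an illustration under assumptions, not the smttask implementation: load_with_types, Point and IntList are hypothetical names, and the sketch starts from already-parsed JSON data rather than from a file path.

```python
import json


def load_with_types(json_data, data_types):
    """Try each type in turn; fall back to the raw JSON data."""
    for T in data_types:
        validate = getattr(T, "validate", None)
        try:
            # Prefer a `validate` class method; else call the type.
            if validate is not None:
                return validate(json_data)
            return T(json_data)
        except TypeError:
            continue  # incompatible data: try the next type
    return json_data


class Point:
    """Hypothetical type exposing a `validate` class method."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def validate(cls, json_data):
        if not (isinstance(json_data, dict) and set(json_data) == {"x", "y"}):
            raise TypeError("data is not compatible with Point")
        return cls(**json_data)


class IntList(list):
    """Hypothetical type accepting JSON data as its sole argument."""

    def __init__(self, json_data):
        if not isinstance(json_data, list):
            raise TypeError("expected a JSON array")
        super().__init__(int(v) for v in json_data)


point = load_with_types(json.loads('{"x": 1, "y": 2}'), (IntList, Point))
ints = load_with_types(json.loads('[1, "2"]'), (Point, IntList))
raw = load_with_types(json.loads('"plain text"'), (Point, IntList))
```

Raising TypeError on incompatible data is what lets the loop move on to the next candidate, which is why the requirement above applies to both kinds of types.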
- get_param(name: str | Sequence, default: Any = <value not provided>)#
A convenience function for retrieving values from the record’s parameter set. Attributes of object types are accessed with slightly convoluted syntax, and this gets especially cumbersome with nested parameters. This function is applied recursively, at each level selecting the appropriate syntax depending on the value’s type.
This is a wrapper around smttask.view._utils.get_task_param.
- Parameters:
name (str | Sequence) – The key or attribute name to retrieve. Nested attributes can be specified by listing each level separated by a period. Multiple names can be specified by wrapping them in a list or tuple; they are tried in sequence and the first attribute found is returned. This can be used to deal with tasks that may have differently named equivalent arguments.
default (Any) – If the attribute is not found, return this value. If not specified, a KeyError is raised.
- Return type:
The value matching the attribute, or otherwise the value of default.
- Raises:
KeyError – If the key name is not found and default is not set.
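The lookup rules for get_param (dotted names, fallback names, optional default) can be sketched with plain Python containers. This is only an illustration: get_nested is a hypothetical helper, not smttask.view._utils.get_task_param, and it handles just dictionaries and attribute access.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel for "no default provided"


def get_nested(params, name, default=_MISSING):
    """Hypothetical nested lookup with dotted and alternative names."""
    names = [name] if isinstance(name, str) else list(name)
    for candidate in names:
        value = params
        try:
            for part in candidate.split("."):
                # Pick the access syntax based on the value's type.
                if isinstance(value, dict):
                    value = value[part]
                else:
                    value = getattr(value, part)
        except (KeyError, AttributeError):
            continue  # try the next candidate name
        return value
    if default is _MISSING:
        raise KeyError(name)
    return default


params = {"model": SimpleNamespace(lr=0.1, optimizer={"name": "adam"})}

lr = get_nested(params, "model.lr")
opt_name = get_nested(params, "model.optimizer.name")
lr_alias = get_nested(params, ["model.learning_rate", "model.lr"])
fallback = get_nested(params, "model.missing", default=None)

try:
    get_nested(params, "model.missing")
    raised = False
except KeyError:
    raised = True
```

Passing a list of names, as in lr_alias, is the mechanism described above for tasks whose equivalent arguments are named differently.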
- invalidate()#
Prevent the result of a recorded task from being used, without removing it from the record store. Running the task, or using it as an input, will cause it to be reexecuted (rather than retrieved from disk), but retrieving the task’s output with “get_output” will still return the original result.
This is accomplished by deleting the link in the _input_ data store, while leaving the original file in the _output_ data store.
- property resultpaths#
Return the list of existing paths in the input datastore corresponding to outputs from this record. These are all symlinks to files in the output datastore; only links which point to files associated to this record are returned. (Recall that if a task is re-run, its result link will be changed to point to the newest result.)
- property task_code#
Synonym for script_content. Returns the content of the module where the task was defined, as it was when it was executed. (This is done by retrieving the file from version control; it is not actually stored in the record store.)
- property task_name#
Sumatra’s script_arguments field is used to store the task name, so this is a synonym for record.script_arguments.
- property task_type: Type[Task]#
Retrieve the Task class which generated this record. This is done on a best effort basis:
The module containing the task must already have been imported. (We don’t automatically import modules, which may have adverse effects and is a security risk.)
Modules with non-standard naming conventions may not be found: we just try to match the file name to the modules in sys.modules. If exactly one match is found, we retrieve the Task from that module.
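The best-effort lookup described for task_type amounts to scanning the already-imported modules for a matching file name. The sketch below shows one way this could look; find_modules_by_file and the module demo_tasks_module are hypothetical, and the actual matching rules used by smttask may differ.

```python
import sys
import types
from pathlib import Path


def find_modules_by_file(file_name):
    """Return already-imported modules whose file name matches."""
    target = Path(file_name).name
    matches = []
    for module in list(sys.modules.values()):
        module_file = getattr(module, "__file__", None)
        if module_file and Path(module_file).name == target:
            matches.append(module)
    return matches


# Register a fake, already "imported" module for demonstration.
demo = types.ModuleType("demo_tasks_module")
demo.__file__ = "/project/demo_tasks_module.py"
sys.modules["demo_tasks_module"] = demo

matches = find_modules_by_file("tasks/demo_tasks_module.py")
# Only an unambiguous match is used; otherwise give up.
module = matches[0] if len(matches) == 1 else None
```

Nothing is imported by the lookup itself: a module that has not been imported beforehand simply yields no match.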
- update_reason(reason: str | Dict[str, str] | Callable[[str], str], mode: str = 'prepend')#
Update the ‘reason’ field for this record.
- Parameters:
reason – Either:
A string to add to the record’s reason (or to replace it with).
A dictionary of {pattern: string} pairs, used with mode ‘replace substr’.
A callback function, taking the record’s ‘reason’ string and returning the updated one. If this function returns None or the unmodified reason string, the record is not modified.
mode – One of ‘prepend’, ‘append’, ‘replace all’, ‘replace substr’, ‘callback’. Modes ‘replace substr’ and ‘callback’ can be left unspecified: they are inferred from the type of reason.
If the mode is ‘prepend’ or ‘append’, and reason is already a substring of the record’s ‘reason’ field at any position, then the record is not modified. This is to reduce the likelihood of accidentally growing the ‘reason’ field (e.g., with two functions each prepending different strings).
Note
Some standardizations are applied to all reason strings, even if they are otherwise unmodified.
Modes
"prepend": The new reason is reason + record.reason.
"append": The new reason is record.reason + reason.
"replace all": The new reason is reason.
"replace substr": For each {pattern: string} pair in reason, we call re.sub(pattern, string, record.reason). All occurrences of ‘pattern’ are replaced by ‘string’.
"callback": The new reason is callback(record.reason), where callback is the function passed as reason.
"standardize": Only apply the standardizations.
Standardizations
Sequences (tuple, list, etc.) of length one are replaced by their first element. This is because while it is possible to store sequences in the ‘reason’ field, a string is really the expected format and better supported (both by the schema and by the UI).
Note
At the risk of stating the obvious, this function will modify the underlying record store.
List of record store filters#
|
The default filter: keep records for which fn returns True. |
|
Keep only records which occurred before the given date. |
|
Keep only records which occurred after the given date. |
|
Keep only records which occurred on the given date. |
|
Keep records for which the label contains substr. |
|
Keep only records whose number of output files is between minimum and maximum. |
|
Keep records for which the “outcome” value contains substr. |
|
Keep records for which the “outcome” value does not contain substr. |
|
Keep records for which at least one output file path contains substr. |
|
Keep records for which the “reason” value contains substr. |
|
Keep records for which the “reason” value does not contain substr. |
|
Keep records for which the “main_file” value contains substr. |
|
Keep records for which the “script_arguments” value contains substr. |
|
Keep records for which the task name contains substr. |
|
Keep records for which the “stdout_stderr” value contains substr. |
|
Keep records containing all the specified tags. |
|
Keep records containing at least one of the specified tags. |
|
Keep records containing none of the specified tags. |
|
Keep records for which the “user” value contains substr. |
|
Keep records for which the “version” begins with prefixstr. |
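The three tag filters differ only in the set relation they test. The sketch below expresses them as plain predicates used with Python’s built-in filter; the function names and the example records are hypothetical, and this does not use the smttask filter registry.

```python
from types import SimpleNamespace


def has_all_tags(record, tags):
    # Every requested tag must be present on the record.
    return set(tags) <= set(record.tags)


def has_any_tag(record, tags):
    # At least one requested tag must be present.
    return bool(set(tags) & set(record.tags))


def has_no_tags(record, tags):
    # None of the requested tags may be present.
    return not (set(tags) & set(record.tags))


records = [
    SimpleNamespace(label="r1", tags={"a", "b"}),
    SimpleNamespace(label="r2", tags={"a"}),
    SimpleNamespace(label="r3", tags=set()),
]
wanted = {"a", "b"}

with_all = [r.label for r in filter(lambda r: has_all_tags(r, wanted), records)]
with_any = [r.label for r in filter(lambda r: has_any_tag(r, wanted), records)]
with_none = [r.label for r in filter(lambda r: has_no_tags(r, wanted), records)]
```

Because filter is lazy, each predicate is only evaluated as records are consumed, which matches the lazy iteration offered by the record store view.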