smttask.task_types.UnpureMemoizedTask
- class smttask.task_types.UnpureMemoizedTask(arg0=None, *, reason=None, **taskinputs)
A Task whose output does not depend only on its inputs (and thus is not a pure function). An UnpureMemoizedTask cannot be recorded, because its digest is computed from its output. For the same reason, it is always memoized and should never be cleared. (Since there may be use cases for clearing during a debugging session, it is not explicitly forbidden, but doing so will log a message at the ‘error’ criticality level.)
To motivate the use of such a Task, consider the following set of operations:
TaskA (s: string) -> Return the list of entries in a database containing s.
TaskB (l: list|TaskA) -> Return a set of statistics for those entries.
TaskA is unpure: it depends on s, but also on the contents of the database. TaskB is pure, and one can write a reproducible workflow by explicitly specifying all the entries listed in l. But that would be extremely verbose, and it would hide the origin of those entries. The definition above, in terms of the output of TaskA, is clearer and more concise.
It is even more desirable to encode such a task sequence if the database changes rarely, for example only when new experiments are performed. However, if a normal Task is used to encode TaskA, then updating the database would not change the Task’s digest, and thus the statistics would not be recomputed. What we want therefore is to define and display TaskA in terms of its inputs (as with a normal Task), but compute its digest from its outputs.
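The difference between a digest computed from inputs and one computed from outputs can be illustrated with a minimal sketch in plain Python. It does not use smttask: the `digest` helper, the in-memory `database` list and the `task_a` function are all hypothetical stand-ins chosen for illustration, and the hashing scheme is an assumption, not the one smttask uses.

```python
import hashlib
import json


def digest(obj) -> str:
    """Short, stable hash of a JSON-serializable object (illustrative only)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:10]


# Hypothetical stand-in for the database: a plain list of entry names.
database = ["exp_a_run1", "exp_a_run2", "exp_b_run1"]


def task_a(s: str) -> list:
    """Unpure: the result depends on `s` and on the current database contents."""
    return sorted(entry for entry in database if s in entry)


# Digest computed from the inputs, as for a normal Task.
input_digest_before = digest({"s": "exp_a"})
# Digest computed from the output, as described for an unpure task.
output_digest_before = digest(task_a("exp_a"))

# A new experiment is added to the database.
database.append("exp_a_run3")

input_digest_after = digest({"s": "exp_a"})
output_digest_after = digest(task_a("exp_a"))

print(input_digest_before == input_digest_after)    # True: the update goes unnoticed
print(output_digest_before == output_digest_after)  # False: the update is detected
```

A downstream task keyed on the input digest would reuse its stale statistics, whereas one keyed on the output digest would be recomputed.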
Because an UnpureMemoizedTask is not recorded, it is also not meaningful to specify a reason argument.
Important
UnpureMemoizedTask still performs in-memory caching (memoization). This means that non-input dependencies (in the example above, the contents of the database) must not change during workflow execution. Similarly, UnpureMemoizedTask should still not have side-effects. Otherwise the result of tasks may depend on their execution order, which is undefined.
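The following sketch shows why non-input dependencies must stay fixed while a workflow runs. It is plain Python, not smttask code: the `MemoizedLookup` class and its `_result` and `_done` attributes are hypothetical, and only mimic the idea of in-memory memoization.

```python
class MemoizedLookup:
    """Hypothetical stand-in for a memoized, unpure task."""

    def __init__(self, database, s):
        self.database = database  # non-input dependency
        self.s = s                # declared input
        self._result = None
        self._done = False

    def run(self):
        # Compute once, then always return the in-memory result.
        if not self._done:
            self._result = [e for e in self.database if self.s in e]
            self._done = True
        return self._result


database = ["exp_a_run1", "exp_b_run1"]
task = MemoizedLookup(database, "exp_a")

first = task.run()
database.append("exp_a_run2")  # the database changes mid-workflow
second = task.run()            # memoized: the new entry is not seen

# A task created after the change sees different data for the same input.
fresh = MemoizedLookup(database, "exp_a").run()

print(first is second)          # True
print(len(second), len(fresh))  # 1 2
```

Two tasks with identical inputs now disagree, and which result a downstream task receives depends on when each one was first run.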
- __init__(arg0=None, **taskinputs)
- Parameters:
arg0 (ParameterSet-like) – ParameterSet, or something which can be cast to a ParameterSet (like a dict or filename). The result will be parsed for task arguments defined in self.inputs.
**taskinputs – Task parameters can also be specified as keyword arguments, and will override those in arg0.
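The override rule for keyword arguments can be pictured as a dictionary merge. The helper below is a hypothetical sketch of that precedence only: `merge_task_inputs` and the `limit` parameter are invented for illustration, and the real constructor also casts arg0 to a ParameterSet and parses it against self.inputs, which is not reproduced here.

```python
def merge_task_inputs(arg0=None, **taskinputs):
    """Combine a dict-like `arg0` with keyword arguments; keywords win."""
    merged = dict(arg0) if arg0 is not None else {}
    merged.update(taskinputs)  # keyword arguments override entries from arg0
    return merged


defaults = {"s": "exp_a", "limit": 10}  # hypothetical task parameters

print(merge_task_inputs(defaults, limit=50))
# {'s': 'exp_a', 'limit': 50}
```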
Methods
__init__([arg0])
bind(**kwargs) – Bind task parameters to given values.
clear() – If the result of a previous run was cached, deallocate it.
describe([indent, type_join_str, ...]) – A more human-friendly representation of the Task parameters.
draw(*args, **kwargs) – Draw the dependency graph.
from_desc(desc[, on_fail]) – Instantiate a class from the description returned by 'desc'.
get_desc(taskinputs[, reason])
get_output([name]) – Return the value of the output associated with name name; if the Task has only one unnamed output, specifying the name is not necessary.
load_inputs() – Return a copy of self.taskinputs, with all lazy inputs resolved.
parse_result(result) – Parse the task result as an object of type cls.Outputs.
partial(**kwargs) – Bind task parameters to given values.
run([cache, recompute, record, reason, ...])
save(path[, allow_overwrite]) – Save a task description.
schematic() – Display an ascii-art schematic of the task inputs and outputs.
taskname()
validate(value)
Attributes
cache
desc
digest
graph – Return a dependency graph.
has_run – Returns True if a cached result (either in memory or on disk) exists and would be used on a call to run().
hashed_digest
input_files
inroot – Root of the input datastore.
logger
name
orig_taskinputs
outputpaths – Permanent paths to which task results are saved.
outroot – Root of the output datastore.
relative_outputpaths
resultpaths – Paths from which task results are retrieved.
saved_to_datastore – Return True if the outputs are saved to the _output_ data store.
saved_to_input_datastore – Return True if links matching the outputs exist in the _input_ data store.
taskinputs
unhashed_digests