Mapping over Inputs
Assume that we have a list of inputs and we wish to create a new target for every input in the list. We can solve this by writing a template function and then a for-loop to iterate over our list of inputs:
from gwf import Workflow, AnonymousTarget
gwf = Workflow()
def transform_photo(path):
    inputs = [path]
    outputs = [path + '.new']
    options = {}
    spec = """./transform_photo {}""".format(path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

photos = gwf.glob('photos/*.jpg')
for index, path in enumerate(photos):
    gwf.target_from_template('TransformPhoto.{}'.format(index), transform_photo(path))
While this is pretty clean, it can become unwieldy and the extra layer of indentation makes the workflow a bit harder to read.
In gwf 1.6 we introduced named inputs and outputs. This means that we can assign names to the paths that a target/template takes as inputs and outputs. Let’s rewrite our template to make use of this:
def transform_photo(path):
    inputs = {'path': path}
    outputs = {'path': path + '.new'}
    options = {}
    spec = """./transform_photo {}""".format(path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)
From gwf 1.6 and up you can use the new map() method on your workflow to get rid of the explicit iteration in the first example. This is what the example would look like with map():
from gwf import Workflow, AnonymousTarget
gwf = Workflow()
def transform_photo(path):
    inputs = {'path': path}
    outputs = {'path': path + '.new'}
    options = {}
    spec = """./transform_photo {}""".format(path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

photos = gwf.glob('photos/*.jpg')
gwf.map(transform_photo, photos)
Our for-loop is now gone and has been replaced with a call to map(). This will iterate through our list of photos, call the template function with each path, and add the resulting targets to our workflow.
If we run gwf status we see this:
transform_photo_0 shouldrun 0.00% [1/0/0/0]
transform_photo_1 shouldrun 0.00% [1/0/0/0]
transform_photo_2 shouldrun 0.00% [1/0/0/0]
Since there were three photos, three targets have been generated. The name of each target is automatically derived from the name of the template function and an index.
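To make this concrete, here is a rough plain-Python sketch of the steps just described: iterate over the inputs, call the template with each one, and name each result after the template function plus an index. The map_sketch helper and the plain dictionaries it works with are hypothetical stand-ins chosen so the example runs without gwf; this is not gwf's implementation.

```python
def transform_photo(path):
    # Stand-in template: returns a plain dict instead of an AnonymousTarget.
    return {
        'inputs': {'path': path},
        'outputs': {'path': path + '.new'},
        'spec': './transform_photo {}'.format(path),
    }


def map_sketch(template_func, inputs):
    """Hypothetical model of map(): one named target per input."""
    targets = {}
    for index, item in enumerate(inputs):
        # Default name: template function name, underscore, index.
        name = '{}_{}'.format(template_func.__name__, index)
        targets[name] = template_func(item)
    return targets


photos = ['photos/dog.jpg', 'photos/horse.jpg', 'photos/cat.jpg']
targets = map_sketch(transform_photo, photos)
for name in targets:
    print(name)
```

The printed names follow the same function-name-plus-index pattern as the gwf status listing.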
Naming Targets Differently
That’s useful, but what if you don’t like the generated target names? If you’re fine with the automatic numbering, but want to use another name than the template function name, you can set the name argument to a string:
gwf.map(transform_photo, photos, name='TransformPhoto')
Let’s see what we get then:
TransformPhoto_0 shouldrun 0.00% [1/0/0/0]
TransformPhoto_1 shouldrun 0.00% [1/0/0/0]
TransformPhoto_2 shouldrun 0.00% [1/0/0/0]
You can also completely customize how the name is generated by giving map() a naming function:
import os.path
def get_photo_name(idx, target):
    filename = os.path.splitext(os.path.basename(target.inputs['path']))[0]
    return 'transform_photo_{}'.format(filename)

gwf.map(transform_photo, photos, name=get_photo_name)
This is what we get:
transform_photo_dog shouldrun 0.00% [1/0/0/0]
transform_photo_horse shouldrun 0.00% [1/0/0/0]
transform_photo_cat shouldrun 0.00% [1/0/0/0]
Pretty and descriptive!
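A naming function only needs the index and an object with an inputs attribute, so it can be tried out on its own before it is handed to map(). The sketch below uses types.SimpleNamespace as a hypothetical stand-in for a target; a real target carries more than this.

```python
import os.path
from types import SimpleNamespace


def get_photo_name(idx, target):
    # Strip the directory and the extension from the input path.
    filename = os.path.splitext(os.path.basename(target.inputs['path']))[0]
    return 'transform_photo_{}'.format(filename)


# Hypothetical stand-in for a target: only the `inputs` attribute is modelled.
fake_target = SimpleNamespace(inputs={'path': 'photos/dog.jpg'})
name = get_photo_name(0, fake_target)
print(name)  # transform_photo_dog
```

Note that names built from filenames are only unique if the filenames are; two photos called dog.jpg in different directories would produce the same name with this function.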
Passing Multiple Arguments to a Template Function
What if our template took multiple arguments? Let’s modify our template function a bit:
def transform_photo(path, width):
    inputs = {'path': path}
    outputs = {'path': path + '.new'}
    options = {}
    spec = """./transform_photo --width {} {}""".format(width, path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)
To set the width argument to 800 for all targets generated by map() we do this:
gwf.map(transform_photo, photos, name=get_photo_name, extra={'width': 800})
But what if the width depends on the image? If photos is a list of dictionaries, map() will pass the contents of each dictionary as keyword arguments to the template function:
photos = [
    {'path': 'photos/dog.jpg', 'width': 600},
    {'path': 'photos/horse.jpg', 'width': 200},
    {'path': 'photos/cat.jpg', 'width': 1000},
]
gwf.map(transform_photo, photos, name=get_photo_name)
These two approaches can be combined, so you can pass a list of dictionaries as the inputs and set arguments with extra as well.
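As an illustration of combining the two, the sketch below models how a single input and the extra dictionary could be merged into one call. The call_template helper and the quality parameter are hypothetical and exist only for this example. The keys in the per-photo dictionaries and in extra do not overlap here; what gwf does when they overlap is not covered by this sketch.

```python
def transform_photo(path, width, quality):
    # Stand-in template: returns the command line instead of an AnonymousTarget.
    # `quality` is a hypothetical extra parameter for this example.
    return './transform_photo --width {} --quality {} {}'.format(width, quality, path)


def call_template(template_func, item, extra):
    """Hypothetical model of how one input is combined with `extra`."""
    if isinstance(item, dict):
        # Dictionary inputs are unpacked as keyword arguments.
        return template_func(**item, **extra)
    # Any other input is passed as the first positional argument.
    return template_func(item, **extra)


photos = [
    {'path': 'photos/dog.jpg', 'width': 600},
    {'path': 'photos/cat.jpg', 'width': 1000},
]
specs = [call_template(transform_photo, photo, {'quality': 90}) for photo in photos]
for spec in specs:
    print(spec)
```

Each photo keeps its own width while every call receives the same quality value.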
Chaining Maps
You may wonder what map() actually returns. Let’s take a look:
transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
print(repr(transformed_photos))
# => TargetList(targets=[Target(name='transform_photo_dog', ...), ...])
We get something called a gwf.TargetList back! This is a simple wrapper around a normal list, but it allows us to access all of the inputs and outputs of the targets contained in the TargetList through inputs and outputs, respectively. Both of these return a list of the inputs/outputs of the targets. Thus, if your template function uses named outputs, outputs will be a list of dictionaries.
How is this useful? If we wanted to use another template on each transformed photo, we can just map over the outputs of the TargetList:
transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
We could keep on going like this!
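The chaining works because each entry in outputs is a dictionary whose keys are passed as keyword arguments to the next template, so the output name (here path) has to match a parameter of that template. The sketch below models this in plain Python. TargetListSketch is a hypothetical stand-in for gwf.TargetList, and the compress_photo template with its '.gz' suffix is an assumption, since the template is not defined in this guide.

```python
from types import SimpleNamespace


class TargetListSketch(list):
    """Hypothetical stand-in for gwf.TargetList: a list with two extra views."""

    @property
    def inputs(self):
        return [target.inputs for target in self]

    @property
    def outputs(self):
        return [target.outputs for target in self]


def transform_photo(path):
    # Stand-in template with named inputs and outputs.
    return SimpleNamespace(inputs={'path': path}, outputs={'path': path + '.new'})


def compress_photo(path):
    # Hypothetical second template; the '.gz' suffix is an assumption.
    return SimpleNamespace(inputs={'path': path}, outputs={'path': path + '.gz'})


transformed = TargetListSketch(
    transform_photo(p) for p in ['photos/dog.jpg', 'photos/cat.jpg']
)
# Each output dict is unpacked as keyword arguments, so the key 'path'
# must match the parameter name of compress_photo.
compressed = TargetListSketch(compress_photo(**out) for out in transformed.outputs)
print(compressed.outputs)
```

If transform_photo named its output something other than path, compress_photo would need a parameter with that name instead.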
Collecting Files
Now that you’ve transformed and compressed your photos, you may also want to zip them into a single file. For this you wrote a template that looks like this:
def zip_files(paths, output_path):
    inputs = {'paths': paths}
    outputs = {'zipped_file': output_path}
    options = {}
    spec = """zip ..."""
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)
Your template accepts two arguments: a list of files to zip and where to put the resulting file. We can’t use map() because we don’t want a zip file per photo, but a single target that depends on all of the photo files.
We start out by writing our call to target_from_template():
transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
gwf.target_from_template(
    'zip_photos',
    zip_files(
        paths=[...],
        output_path="photos.zip"
    )
)
How do we get the list of paths? We can use the collect() helper function!
from gwf.workflow import collect
transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
gwf.target_from_template(
    'zip_photos',
    zip_files(
        paths=collect(compressed_photos.outputs, ['path'])['paths'],
        output_path="photos.zip"
    )
)
The collect() function takes a list of dictionaries and produces a single dictionary containing a list for each key. So when given this:
[
    {'path': 'photos/dog.jpg.new'},
    {'path': 'photos/horse.jpg.new'},
    {'path': 'photos/cat.jpg.new'},
]
it simply produces this:
{
    'paths': ['photos/dog.jpg.new', 'photos/horse.jpg.new', 'photos/cat.jpg.new'],
}
Note that the name path has been pluralized, so it’s now paths in the dictionary. We can pass this directly to our zip_files template:
from gwf.workflow import collect
transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
gwf.target_from_template(
    'zip_photos',
    zip_files(
        **collect(compressed_photos.outputs, ['path']),
        output_path="photos.zip"
    )
)
We use Python’s double-star operator to pass a dictionary as keyword arguments to a function.
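To see how the pieces fit together without running a workflow, here is a minimal plain-Python sketch of the behaviour described for collect(), combined with double-star unpacking. The collect_sketch function is a hypothetical model, and pluralizing by simply appending 's' is an assumption based on the path/paths example above. The zip_files stand-in only records its arguments; it does not build a target or a zip command.

```python
def collect_sketch(dicts, names):
    """Hypothetical model of collect(): gather the values of each name
    into a list stored under the pluralized name (name + 's')."""
    collected = {name + 's': [] for name in names}
    for entry in dicts:
        for name in names:
            collected[name + 's'].append(entry[name])
    return collected


def zip_files(paths, output_path):
    # Stand-in template: records its arguments instead of building a target.
    return {'inputs': {'paths': paths}, 'outputs': {'zipped_file': output_path}}


outputs = [
    {'path': 'photos/dog.jpg.new'},
    {'path': 'photos/horse.jpg.new'},
    {'path': 'photos/cat.jpg.new'},
]
collected = collect_sketch(outputs, ['path'])
# The double star turns {'paths': [...]} into the keyword argument paths=[...].
result = zip_files(**collected, output_path='photos.zip')
print(result['inputs']['paths'])
```

Because the pluralized key becomes a keyword argument, the template's parameter must be named paths for the unpacking to work.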