Mapping over Inputs

Assume that we have a list of inputs and we wish to create a new target for every input in the list. We can solve this by writing a template function and then a for-loop to iterate over our list of inputs:

from gwf import Workflow, AnonymousTarget

gwf = Workflow()

def transform_photo(path):
    inputs = [path]
    outputs = [path + '.new']
    options = {}
    spec = """./transform_photo {}""".format(path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

photos = gwf.glob('photos/*.jpg')
for index, path in enumerate(photos):
    gwf.target_from_template('TransformPhoto.{}'.format(index), transform_photo(path))

While this is fairly clean, the explicit loop and manual naming can become unwieldy, and the extra layer of indentation makes the workflow a bit harder to read.

In gwf 1.6 we introduced named inputs and outputs. This means that we can assign names to the paths that a target/template takes as inputs and outputs. Let’s rewrite our template to make use of this:

def transform_photo(path):
    inputs = {'path': path}
    outputs = {'path': path + '.new'}
    options = {}
    spec = """./transform_photo {}""".format(path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

From gwf 1.6 and up you can use the new map() method on your workflow to get rid of the explicit iteration in the first example. This is what the example would look like with map():

from gwf import Workflow, AnonymousTarget

gwf = Workflow()

def transform_photo(path):
    inputs = {'path': path}
    outputs = {'path': path + '.new'}
    options = {}
    spec = """./transform_photo {}""".format(path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

photos = gwf.glob('photos/*.jpg')
gwf.map(transform_photo, photos)

Our for-loop is now gone and has been replaced with a call to map(). This will iterate through our list of photos, call the template function with each path, and add the resulting targets to our workflow.

If we run gwf status we see this:

transform_photo_0    shouldrun       0.00% [1/0/0/0]
transform_photo_1    shouldrun       0.00% [1/0/0/0]
transform_photo_2    shouldrun       0.00% [1/0/0/0]

Since there were three photos in the directory, three targets were generated. The name of each target is automatically derived from the name of the template function and the index of its input.
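To make the behaviour concrete, here is a rough pure-Python model of what map() does with a plain list of paths. It is only a sketch: map_model and the dictionaries used in place of targets are hypothetical stand-ins, not gwf's implementation, and the naming scheme is simply the one visible in the status output above.

```python
def transform_photo(path):
    # Stand-in for the template: a plain dict instead of an AnonymousTarget.
    return {'inputs': {'path': path}, 'outputs': {'path': path + '.new'}}


def map_model(template_func, inputs):
    """Hypothetical model of map(): one named target per input."""
    targets = {}
    for index, item in enumerate(inputs):
        # Default name: template function name plus a running index.
        name = '{}_{}'.format(template_func.__name__, index)
        targets[name] = template_func(item)
    return targets


photos = ['photos/dog.jpg', 'photos/horse.jpg', 'photos/cat.jpg']
targets = map_model(transform_photo, photos)
for name, target in targets.items():
    print(name, target['outputs']['path'])
```

The model only covers the case shown so far, where every input is a single path.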

Naming Targets Differently

That’s useful, but what if you don’t like the generated target names? If you’re fine with the automatic numbering, but want a name other than that of the template function, you can set the name argument to a string:

gwf.map(transform_photo, photos, name='TransformPhoto')

Let’s see what we get then:

TransformPhoto_0    shouldrun       0.00% [1/0/0/0]
TransformPhoto_1    shouldrun       0.00% [1/0/0/0]
TransformPhoto_2    shouldrun       0.00% [1/0/0/0]

You can also completely customize how the name is generated by giving map() a naming function:

import os.path

def get_photo_name(idx, target):
    filename = os.path.splitext(os.path.basename(target.inputs['path']))[0]
    return 'transform_photo_{}'.format(filename)

gwf.map(transform_photo, photos, name=get_photo_name)

This is what we get:

transform_photo_dog      shouldrun       0.00% [1/0/0/0]
transform_photo_horse    shouldrun       0.00% [1/0/0/0]
transform_photo_cat      shouldrun       0.00% [1/0/0/0]

Pretty and descriptive!
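A naming function is ordinary Python, so it can be tried out without running a workflow. The sketch below calls get_photo_name on a hypothetical stand-in object (a SimpleNamespace with only an inputs attribute), which is an assumption made for illustration and not a real gwf target.

```python
import os.path
from types import SimpleNamespace


def get_photo_name(idx, target):
    # Same naming function as above: strip the directory and the extension.
    filename = os.path.splitext(os.path.basename(target.inputs['path']))[0]
    return 'transform_photo_{}'.format(filename)


# Hypothetical stand-in exposing only the `inputs` attribute used above.
fake_target = SimpleNamespace(inputs={'path': 'photos/dog.jpg'})
name = get_photo_name(0, fake_target)
print(name)
```

Because this function ignores idx, two inputs with the same base name in different directories would produce the same name; including idx in the result avoids that.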

Passing Multiple Arguments to a Template Function

What if our template took multiple arguments? Let’s modify our template function a bit:

def transform_photo(path, width):
    inputs = {'path': path}
    outputs = {'path': path + '.new'}
    options = {}
    spec = """./transform_photo --width {} {}""".format(width, path)
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

To set the width argument to 800 for all targets generated by map() we do this:

gwf.map(transform_photo, photos, name=get_photo_name, extra={'width': 800})

But what if the width depends on the image? If photos is a list of dictionaries, map() will pass the contents of each dictionary as keyword arguments to the template function:

photos = [
    {'path': 'photos/dog.jpg', 'width': 600},
    {'path': 'photos/horse.jpg', 'width': 200},
    {'path': 'photos/cat.jpg', 'width': 1000},
]

gwf.map(transform_photo, photos, name=get_photo_name)

These two approaches can be combined, so you can pass a list of dictionaries as the inputs and set arguments with extra as well.
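One way to picture the combination is as keyword-argument merging: each dictionary supplies the per-item arguments and extra supplies the shared ones. The sketch below models this with a plain function call. Here call_template is a hypothetical helper, the quality parameter and its --quality flag are invented for illustration, and the template returns only the spec string instead of a target.

```python
def transform_photo(path, width, quality):
    # `quality` is a hypothetical extra parameter used only for illustration.
    return './transform_photo --width {} --quality {} {}'.format(width, quality, path)


def call_template(template_func, item, extra):
    # Per-item values come from the dictionary, shared values from `extra`.
    return template_func(**item, **extra)


photos = [
    {'path': 'photos/dog.jpg', 'width': 600},
    {'path': 'photos/horse.jpg', 'width': 200},
]
specs = [call_template(transform_photo, photo, {'quality': 90}) for photo in photos]
print(specs)
```

In this sketch, a key that appears both in an item and in extra makes Python raise a TypeError, so it is simplest to keep the two sets of argument names disjoint.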

Chaining Maps

You may wonder what map() actually returns. Let’s take a look:

transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
print(repr(transformed_photos))
# => TargetList(targets=[Target(name='transform_photo_dog', ...), ...])

We get a gwf.TargetList back! This is a simple wrapper around a normal list that also exposes the inputs and outputs of all the targets it contains through its inputs and outputs attributes, respectively. Each attribute returns a list with one entry per target. Thus, if your template function uses named outputs, outputs will be a list of dictionaries.
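The shape of these two lists can be illustrated with a small model. TargetListModel below is a hypothetical stand-in written for this example, with dictionaries in place of real targets; it is not the gwf.TargetList class itself.

```python
class TargetListModel(list):
    """Hypothetical stand-in for gwf.TargetList: a list with two views."""

    @property
    def inputs(self):
        # One entry per target, in the same order as the list.
        return [target['inputs'] for target in self]

    @property
    def outputs(self):
        return [target['outputs'] for target in self]


targets = TargetListModel([
    {'inputs': {'path': 'photos/dog.jpg'}, 'outputs': {'path': 'photos/dog.jpg.new'}},
    {'inputs': {'path': 'photos/cat.jpg'}, 'outputs': {'path': 'photos/cat.jpg.new'}},
])
print(targets.outputs)
```

With named outputs, each entry is a dictionary keyed by the output name, which is the same shape as a list of dictionaries passed to map().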

How is this useful? If we want to apply another template to each transformed photo, we can just map over the outputs of the TargetList. Here compress_photo is assumed to be another template function that accepts a path argument:

transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)

We could keep on going like this!

Collecting Files

Now that you’ve transformed and compressed your photos, you may also want to zip them into a single file. Suppose you have written a template for this that looks as follows:

def zip_files(paths, output_path):
    inputs = {'paths': paths}
    outputs = {'zipped_file': output_path}
    options = {}
    spec = """zip ..."""
    return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

Your template accepts two arguments: a list of files to zip and where to put the resulting file. We can’t use map() because we don’t want a zip file per photo, but a single target that depends on all of the photo files.

We start out by writing our call to target_from_template():

transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
gwf.target_from_template(
    name='zip_photos',
    template=zip_files(
        paths=[...],
        output_path="photos.zip"
    )
)

How do we get the list of paths? We can use the collect() helper function!

from gwf.workflow import collect

transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
gwf.target_from_template(
    name='zip_photos',
    template=zip_files(
        # collect() returns a dictionary; its 'paths' entry holds the list.
        paths=collect(compressed_photos.outputs, ['path'])['paths'],
        output_path="photos.zip"
    )
)

The collect() function takes a list of dictionaries and produces a single dictionary containing a list for each key. So when given this:

[
    {'path': 'photos/dog.jpg.new'},
    {'path': 'photos/horse.jpg.new'},
    {'path': 'photos/cat.jpg.new'},
]

it simply produces this:

{
    'paths': ['photos/dog.jpg.new', 'photos/horse.jpg.new', 'photos/cat.jpg.new'],
}

Note that the name path has been pluralized, so it’s now paths in the dictionary. Because this key matches the paths parameter of zip_files, we can pass the dictionary directly to our template:

from gwf.workflow import collect

transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)
gwf.target_from_template(
    name='zip_photos',
    template=zip_files(
        **collect(compressed_photos.outputs, ['path']),
        output_path="photos.zip"
    )
)

We use Python’s double-star operator to pass a dictionary as keyword arguments to a function.
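The collecting and unpacking steps can be tried in isolation with a small model. In the sketch below, collect_model is a hypothetical stand-in whose pluralization rule (appending an s) is an assumption, and zip_files returns a plain dictionary instead of a target.

```python
def collect_model(dicts, names):
    """Hypothetical model of collect(): one list per requested key."""
    # Naive pluralization (append 's'); gwf's own rule may differ.
    return {name + 's': [d[name] for d in dicts] for name in names}


def zip_files(paths, output_path):
    # Stand-in template returning its arguments instead of a target.
    return {'inputs': {'paths': paths}, 'outputs': {'zipped_file': output_path}}


outputs = [
    {'path': 'photos/dog.jpg.new'},
    {'path': 'photos/horse.jpg.new'},
    {'path': 'photos/cat.jpg.new'},
]
collected = collect_model(outputs, ['path'])
# Equivalent to zip_files(paths=[...], output_path='photos.zip').
template = zip_files(**collected, output_path='photos.zip')
print(template['inputs']['paths'])
```

The unpacking works only because the pluralized key has the same name as the template's parameter; a template with a differently named parameter would need the list passed explicitly.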