.. _using_map:

===================
Mapping over Inputs
===================

Assume that we have a list of inputs and we wish to create a new target for
every input in the list. We can solve this by writing a *template function*
and then a for-loop to iterate over our list of inputs:

.. code-block:: python

    from gwf import Workflow, AnonymousTarget

    gwf = Workflow()

    def transform_photo(path):
        inputs = [path]
        outputs = [path + '.new']
        options = {}
        spec = """./transform_photo {}""".format(path)
        return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

    photos = gwf.glob('photos/*.jpg')
    for index, path in enumerate(photos):
        gwf.target_from_template('TransformPhoto.{}'.format(index), transform_photo(path))

While this is pretty clean, it can become unwieldy and the extra layer of
indentation makes the workflow a bit harder to read.

In *gwf* 1.6 we introduced *named* inputs and outputs. This means that we can
assign names to the paths that a target/template takes as inputs and outputs.
Let's rewrite our template to make use of this:

.. code-block:: python

    def transform_photo(path):
        inputs = {'path': path}
        outputs = {'path': path + '.new'}
        options = {}
        spec = """./transform_photo {}""".format(path)
        return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

From *gwf* 1.6 and up you can use the new :func:`map()` method on your
workflow to get rid of the explicit iteration in the first example. This is
what the example would look like with :func:`map()`:

.. code-block:: python

    from gwf import Workflow, AnonymousTarget

    gwf = Workflow()

    def transform_photo(path):
        inputs = {'path': path}
        outputs = {'path': path + '.new'}
        options = {}
        spec = """./transform_photo {}""".format(path)
        return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

    photos = gwf.glob('photos/*.jpg')
    gwf.map(transform_photo, photos)

Our for-loop is now gone and has been replaced with a call to :func:`map()`.
This will iterate through our list of photos, call the template function with
each path, and add the resulting targets to our workflow. If we run
``gwf status`` we see this:

.. code-block:: text

    transform_photo_0    shouldrun    0.00%    [1/0/0/0]
    transform_photo_1    shouldrun    0.00%    [1/0/0/0]
    transform_photo_2    shouldrun    0.00%    [1/0/0/0]

Since we had three photos, three targets have been generated. The name of each
target is automatically derived from the name of the template function and an
index.

Naming Targets Differently
--------------------------

That's useful, but what if you don't like the generated target names? If
you're fine with the automatic numbering, but want to use another name than
the template function name, you can set the ``name`` argument to a string:

.. code-block:: python

    gwf.map(transform_photo, photos, name='TransformPhoto')

Let's see what we get then:

.. code-block:: text

    TransformPhoto_0    shouldrun    0.00%    [1/0/0/0]
    TransformPhoto_1    shouldrun    0.00%    [1/0/0/0]
    TransformPhoto_2    shouldrun    0.00%    [1/0/0/0]

You can also completely customize how the name is generated by giving
:func:`map()` a naming function:

.. code-block:: python

    import os.path

    def get_photo_name(idx, target):
        # Use the file name without directory or extension, e.g. 'dog'.
        filename = os.path.splitext(os.path.basename(target.inputs['path']))[0]
        return 'transform_photo_{}'.format(filename)

    gwf.map(transform_photo, photos, name=get_photo_name)

This is what we get:

.. code-block:: text

    transform_photo_dog      shouldrun    0.00%    [1/0/0/0]
    transform_photo_horse    shouldrun    0.00%    [1/0/0/0]
    transform_photo_cat      shouldrun    0.00%    [1/0/0/0]

Pretty and descriptive!

Passing Multiple Arguments to a Template Function
-------------------------------------------------

What if our template took multiple arguments? Let's modify our template
function a bit:

.. code-block:: python

    def transform_photo(path, width):
        inputs = {'path': path}
        outputs = {'path': path + '.new'}
        options = {}
        spec = """./transform_photo --width {} {}""".format(width, path)
        return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

To set the ``width`` argument to ``800`` for all targets generated by
:func:`map()` we do this:

.. code-block:: python

    gwf.map(transform_photo, photos, name=get_photo_name, extra={'width': 800})

But what if the width depends on the image? If ``photos`` is a list of
dictionaries, :func:`map()` will pass the contents of each dictionary as
keyword arguments to the template function:

.. code-block:: python

    photos = [
        {'path': 'photos/dog.jpg', 'width': 600},
        {'path': 'photos/horse.jpg', 'width': 200},
        {'path': 'photos/cat.jpg', 'width': 1000},
    ]

    gwf.map(transform_photo, photos, name=get_photo_name)

These two approaches can be combined, so you can pass a list of dictionaries
as the inputs and set arguments with ``extra`` as well.

Chaining Maps
-------------

You may wonder what :func:`map()` actually returns. Let's take a look:

.. code-block:: python

    transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
    print(repr(transformed_photos))
    # => TargetList(targets=[Target(name='transform_photo_dog', ...), ...])

We get something called a :class:`gwf.TargetList` back! This is a simple
wrapper around a normal list, but it allows us to access all of the inputs and
outputs of the targets contained in the :class:`TargetList` through
:attr:`inputs` and :attr:`outputs`, respectively. Both of these return a list
with one entry per target. Thus, if your template function uses named outputs,
:attr:`outputs` will be a list of dictionaries.

How is this useful? If we want to use another template on each transformed
photo, we can just map over the outputs of the :class:`TargetList`:

.. code-block:: python

    transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
    compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)

We could keep on going like this!

Collecting Files
----------------

Now that you've transformed and compressed your photos, you may also want to
zip them into a single file. For this you wrote a template that looks like
this:

.. code-block:: python

    def zip_files(paths, output_path):
        inputs = {'paths': paths}
        outputs = {'zipped_file': output_path}
        options = {}
        spec = """zip ..."""
        return AnonymousTarget(inputs=inputs, outputs=outputs, options=options, spec=spec)

Your template accepts two arguments: a list of files to zip and where to put
the resulting file. We can't use :func:`map()` because we don't want a zip
file per photo, but a single target that depends on all of the photo files. We
start out by writing our call to :func:`target_from_template()`:

.. code-block:: python

    transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
    compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)

    gwf.target_from_template(
        'zip_photos',
        zip_files(
            paths=[...],
            output_path="photos.zip"
        )
    )

How do we get the list of paths? We can use the :func:`collect()` helper
function!

.. code-block:: python

    from gwf.workflow import collect

    transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
    compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)

    gwf.target_from_template(
        'zip_photos',
        zip_files(
            # collect() returns a dictionary, so pick out the list of paths.
            paths=collect(compressed_photos.outputs, ['path'])['paths'],
            output_path="photos.zip"
        )
    )

The :func:`collect()` function takes a list of dictionaries and produces a
single dictionary containing a list for each key. So when given this:

.. code-block:: python

    [
        {'path': 'photos/dog.jpg.new'},
        {'path': 'photos/horse.jpg.new'},
        {'path': 'photos/cat.jpg.new'},
    ]

it simply produces this:

.. code-block:: python

    {
        'paths': ['photos/dog.jpg.new', 'photos/horse.jpg.new', 'photos/cat.jpg.new'],
    }

Note that the name ``path`` has been pluralized, so it's now ``paths`` in the
dictionary. Since this matches the ``paths`` argument of ``zip_files``, we can
pass the dictionary directly to our template:

.. code-block:: python

    from gwf.workflow import collect

    transformed_photos = gwf.map(transform_photo, photos, name=get_photo_name)
    compressed_photos = gwf.map(compress_photo, transformed_photos.outputs)

    gwf.target_from_template(
        'zip_photos',
        zip_files(
            **collect(compressed_photos.outputs, ['path']),
            output_path="photos.zip"
        )
    )

Here we use Python's double-star operator (``**``) to pass the dictionary as
keyword arguments to the function.
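To make the collecting and unpacking step concrete without a *gwf*
installation, here is a minimal sketch of the idea in plain Python. The names
``collect_sketch`` and ``zip_files_sketch`` are hypothetical stand-ins, not
part of *gwf*, and the "append an ``s``" pluralization rule is an assumption
made for this sketch; the real :func:`collect()` may behave differently in
edge cases.

.. code-block:: python

    from collections import defaultdict


    def collect_sketch(dicts, names):
        """Gather the values stored under each name into one list per name.

        Hypothetical stand-in for collect(); pluralizing by appending 's'
        is an assumption made for illustration only.
        """
        collected = defaultdict(list)
        for d in dicts:
            for name in names:
                collected[name + 's'].append(d[name])
        return dict(collected)


    def zip_files_sketch(paths, output_path):
        # Stand-in for the zip_files template; it only echoes its arguments.
        return {'paths': paths, 'output_path': output_path}


    outputs = [
        {'path': 'photos/dog.jpg.new'},
        {'path': 'photos/horse.jpg.new'},
        {'path': 'photos/cat.jpg.new'},
    ]

    collected = collect_sketch(outputs, ['path'])

    # The double-star operator turns {'paths': [...]} into paths=[...].
    result = zip_files_sketch(**collected, output_path='photos.zip')

The unpacking only works because the pluralized key, ``paths``, has the same
name as the parameter of the receiving function. If the names differ, index
into the dictionary and pass the list explicitly instead.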