Posts Zipping By The Longest Range
Post
Cancel

Zipping By The Longest Range

Long time now, ranges have become a large part of every C++ codebase and, thus, a part of C++ programmer’s everyday life. And indeed, C++ offers many ways to operate on the elements of a range.

But sometimes software requirements go beyond the power of STL…

In this article, I will try to explain several ways on how to zip multiple ranges together as well as the downsides of those approaches, and what we can do to overcome those downsides…

The standard way of zipping ranges

Let’s say we have two std::vectors like:

1
2
std::vector<int> a{ 1, 2, 3 };
std::vector<float> b{ 4.1, 5.2, 6.3 };

and we want to have a third range with the following content: { (1, 4.1), (2, 5.2), (3, 6.3) }. The standard and straightforward way to achieve this would be to use std::transform like:

1
2
3
4
5
6
7
8
9
std::vector<std::pair<int, float>> c;

std::transform(
    a.begin(), a.end(), b.begin(),
    std::back_inserter(c),
    [](int const x, float const y) {
        return std::make_pair(x, y);
    }
);

But this approach has its downsides…

First of all, we are limited to zipping only two ranges together when there might be some cases where we want to zip as many ranges as possible. Second, we need to create a new std::vector instance which unfortunately results in memory allocation and, hence, a drop in performance. Consequently, we cannot use std::transform as a part of a range-based for loop like:

1
2
3
for (auto&& i : std::transform(...)) {
    std::cout << i.first << " " << i.second << std::endl;
}

Instead, we need to have an additional variable which IMO introduces a bit of noise to our code.

In the end, if we ever pass two (or, in case of C++17, maximum of three) ranges of different lengths, we would run into an Undefined Behavior as we would iterate off the end of one of the ranges.

Now that we have seen some drawbacks of std::transform, let’s see other options we have…

Using Boost library

Boost library has solved some of the issues we have addressed in the previous section. Its boost::combine function accepts a variadic number of ranges to be zipped and returns combined_range (which is an iterator_range of a zip_iterator).

1
2
3
4
5
6
7
8
9
#include <boost/range/combine.hpp>

std::list<int> a{ 1, 2, 3 };
std::vector<float> b{ 4.1, 5.2, 6.3 };
std::array<std::string, 3> c{ "a", "b", "c" };

for (auto&& t : boost::combine(a, b, c)) {
    std::cout << t.get<0>() << " " << t.get<1>() << " " << t.get<2>() << std::endl;
}

Since boost::combine uses iterator_range as a return value, there is no more memory allocation and performance drop associated to it. Also, we got rid of that boring temporary variable.

So what’s the problem with the Boost library then?

The general problem with the Boost library is its massiveness and the fact that you may only need boost::combine function but what you get is the whole library with lots of additional features that you don’t actually use. Also, using a 3rd party library is, by itself, not the best thing in the world. It’s a dependency which can, especially in the case of Boost, increase compilation time, complicate your deployment etc. Apart from this, there is another problem that’s present with std::transform solution too… and that’s the famous Undefined Behavior, i.e. if your input ranges are of the different lengths, then your program might crash or iterate beyond the end.

So… Boost is not perfect either…

Let’s see some more…

C++20 standard and ranges-v3 library

C++20 standard has brought up lots of awesome features and one of them is ranges library. However, despite being proposed in P1035R4, zip_view adaptor didn’t make its way through C++20 😢 (at least as far as I know so please correct me if I am wrong). Therefore, we will be talking only about Eric Niebler’s ranges-v3 library here.

Now, let’s start by explaining what actually a view in ranges-v3 library is… By definition, a view is a lightweight wrapper that presents a view of an underlying sequence of elements in some custom way without mutating or copying it. Thus, views are cheap to create and copy, and have non-owning reference semantics.

Following example shows how we can use a zip function which, given N ranges, returns a new range where the ith element is a tuple of ith elements of all N ranges:

1
2
3
4
5
6
7
8
9
#include <range/v3/all.hpp>

std::list<int> a{ 1, 2, 3 };
std::vector<float> b{ 4.1, 5.2, 6.3 };
std::array<std::string, 3> c{ "a", "b", "c" };

for (auto&& [i, j, k] : ranges::v3::view::zip(a, b, c)) {
    std::cout << i << " " << j << " " << k << << std::endl;
}

As stated in proposal P1035R4:

The typical zip operation transforms several input ranges into a single input range containing a tuple with the ith element from each range, and terminates when the smallest finite range is delimited.

Therefore, unlike std::transform and Boost library, ranges::v3::view::zip function does not lead us into an Undefined Behavior if input ranges are of different lengths.

So, basically ranges-v3 library solves pretty much all the problems we had with the previous two approaches. But… there might be something missing here… What about zipping by the longest input range?

For example, Python’s itertools module provides such functionality with zip_longest function which makes an iterator that aggregates elements from each of the ranges. If the ranges are of uneven length, missing values are filled-in with the value specified on a function call and iteration continues until the longest range is exhausted. Pretty cool 😎, right?

Let’s try and make something similar in C++…

Zip by the longest one!

First, we need to set the requirements for our zip_longest function:

  1. zip as many ranges as possible
  2. iterate up to the length of the longest range
  3. if one of the ranges runs out early, set its values to default ones

So, the usage in the end would look something like:

1
2
3
4
5
6
7
8
float a[]{ 1.2, 2.3, 3.4, 4.5 };
std::vector<int> b{ 1, 2, 3, 4, 5 };
std::array<double, 3> c{ 1.1, 2.2, 3.3 };
std::vector<std::string> d{ "a", "b", "c" };

for (auto&& [i, j, k, l] : zip_longest(a, b, c, d)) {
    std::cout << i << " - " << j << " - " << k << " - " << l << std::endl;
}

and the result output:

1
2
3
4
5
1.2 - 1 - 1.1 - a
2.3 - 2 - 2.2 - b
3.4 - 3 - 3.3 - c
4.5 - 4 - 0 -
0 - 5 - 0 -

Let’s start with the implementation details…

zip_longest function is a some sort of a helper function which accepts parameter pack (thus, it supports as many ranges as possible), forwards it to the zip_longest_range constructor and returns the constructed instance:

1
2
3
4
5
6
template <typename... Range>
[[nodiscard]] constexpr zip_longest_range<Range...> zip_longest(Range&&... ranges) {
    return zip_longest_range<Range...>(
        std::forward<Range>(ranges)...
    );
}

Now, zip_longest_range class contains functions that are common to any range-like class (e.g. std::vector or any other container). It basically does not hold any logic related to zipping but only provides begin and end zip-iterators as well as other helper functions like size and empty. Furthermore, if you take a closer look at the begin public function, you may notice that both begin and end iterators of all the ranges are being passed to the zip-iterator. You may ask yourself why is that so… But the reason is quite simple: in order to fill the missing values (in case of ranges of uneven length) with the default ones, we need to know if we have reached the end of a particular range.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
template <typename... Range>
class zip_longest_range {
    using indices = std::index_sequence_for<Range...>;

public:
    explicit constexpr zip_longest_range(Range&&... ranges)
        : ranges_{ std::forward<Range>(ranges)... },
          size_{ max_size(indices()) }
    {}

    [[nodiscard]] constexpr auto begin() const noexcept {
        return make_zip_iterator(begin_tuple(indices()), end_tuple(indices()));
    }

    [[nodiscard]] constexpr auto end() const noexcept {
        return make_zip_iterator(end_tuple(indices()));
    }

    [[nodiscard]] constexpr std::size_t size() const noexcept {
        return size_;
    }

    [[nodiscard]] constexpr bool empty() const noexcept {
        return 0 == size();
    }

private:
    template <std::size_t... Is>
    [[nodiscard]] auto begin_tuple(std::index_sequence<Is...>) const noexcept {
        return std::make_tuple(std::begin(std::get<Is>(ranges_))...);
    }

    template <std::size_t... Is>
    [[nodiscard]] auto end_tuple(std::index_sequence<Is...>) const noexcept {
        return std::make_tuple(std::end(std::get<Is>(ranges_))...);
    }

    template <std::size_t... Is>
    [[nodiscard]] constexpr std::size_t max_size(std::index_sequence<Is...>) const noexcept {
        if constexpr (0 == sizeof...(Range)) {
            return 0;
        } else {
            return std::max({ size(std::get<Is>(ranges_))... });
        }
    }

    template <typename T>
    [[nodiscard]] constexpr std::size_t size(T&& range) const noexcept {
        return std::distance(std::begin(range), std::end(range));
    }

private:
    std::tuple<Range...> ranges_;
    std::size_t size_;
};

As already mentioned, begin and end public functions create zip-iterators by using make_zip_iterator helper function with 2 overloads, first one used for creating the end iterator and second one used for creating the begin iterator:

1
2
3
4
5
6
7
8
9
10
11
12
13
template <typename... Ts>
zip_iterator<Ts...> make_zip_iterator(std::tuple<Ts...>&& end_tuple) {
    return zip_iterator<Ts...>{ std::forward<std::tuple<Ts...>>(end_tuple) };
}

template <typename... Ts>
zip_iterator<Ts...> make_zip_iterator(std::tuple<Ts...>&& begin_tuple,
                                      std::tuple<Ts...>&& end_tuple) {
    return zip_iterator<Ts...>{
        std::forward<std::tuple<Ts...>>(begin_tuple),
        std::forward<std::tuple<Ts...>>(end_tuple)
    };
}

Finally, let’s get to the implementation of the zip_iterator class which contains all the logic. Actually, to be precise, an overloaded increment operator contains all the logic as it increments iterators to all the ranges (1) and constructs the value tuple (2). While constructing the value tuple, the default value is being used if the incremented iterator is the end iterator of a particular range. Otherwise, the real value is being used.

1
2
3
4
5
constexpr zip_iterator& operator++() noexcept {
    next(indices());                  // (1)
    value_tuple_ = make(indices());   // (2)
    return *this;
}

indices is an alias for std::index_sequence_for<Ts...> and it is being used by both the next function and the make function. next function takes advantage of fold expressions in order to carefully increment the iterators of all the ranges. Why did I say carefully? Because, we must not pass the end of a particular range because we would run into an Undefined Behavior in that case.

1
2
3
4
5
6
7
8
template <std::size_t... Is>
constexpr void next(std::index_sequence<Is...>) noexcept {
    ((
        std::get<Is>(iterator_tuple_) == std::get<Is>(end_iterator_tuple_) ?
        std::get<Is>(iterator_tuple_) :
        ++std::get<Is>(iterator_tuple_)
    ),...);
}

Furthermore, make function creates a tuple out of the values current iterators are pointing to. However, if the current iterator is the end iterator, then the value of the specific type is braced-list initialized.

1
2
3
4
5
6
7
8
9
10
template <std::size_t... Is>
[[nodiscard]] constexpr auto make(std::index_sequence<Is...>) noexcept {
    return std::make_tuple(
        (
            std::get<Is>(iterator_tuple_) == std::get<Is>(end_iterator_tuple_) ?
            typename std::iterator_traits<decltype(std::prev(std::get<Is>(iterator_tuple_)))>::value_type{} :
            *std::get<Is>(iterator_tuple_)
        )...
    );
}

Take a look at the full implementation of zip_iterator class below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
template <typename... Ts>
class zip_iterator {
    using indices = std::index_sequence_for<Ts...>;

public:
    using iterator_category = std::forward_iterator_tag;
    using difference_type   = std::ptrdiff_t;
    using iterator_type     = std::tuple<Ts...>;
    using value_type        = std::tuple<typename std::iterator_traits<Ts>::value_type...>;
    using pointer           = value_type*;
    using reference         = value_type&;

    constexpr zip_iterator(std::tuple<Ts...>&& begin_iterator_tuple,
                           std::tuple<Ts...>&& end_iterator_tuple)
        : iterator_tuple_{ std::forward<std::tuple<Ts...>>(begin_iterator_tuple) },
          end_iterator_tuple_{ std::forward<std::tuple<Ts...>>(end_iterator_tuple) },
          value_tuple_{ make(indices()) }
    {}

    constexpr zip_iterator(std::tuple<Ts...>&& end_iterator_tuple)
        : iterator_tuple_{ std::forward<std::tuple<Ts...>>(end_iterator_tuple) }
    {}

    [[nodiscard]] constexpr auto operator*() noexcept {
        return value_tuple_;
    }

    [[nodiscard]] constexpr auto operator->() noexcept {
        return &value_tuple_;
    }

    constexpr zip_iterator& operator++() noexcept {
        next(indices());
        value_tuple_ = make(indices());
        return *this;
    }

    constexpr zip_iterator operator++(int) noexcept {
        auto tmp = *this;
        ++(*this);
        return tmp;
    }

    [[nodiscard]] constexpr bool operator!=(zip_iterator const& rhs) const noexcept {
        return iterator_tuple_ != rhs.iterator_tuple_;
    }

    [[nodiscard]] constexpr bool operator==(zip_iterator const& rhs) const noexcept {
        return iterator_tuple_ == rhs.iterator_tuple_;
    }

private:
    template <std::size_t... Is>
    [[nodiscard]] constexpr auto make(std::index_sequence<Is...>) noexcept {
        return std::make_tuple(
            (
                std::get<Is>(iterator_tuple_) == std::get<Is>(end_iterator_tuple_) ?
                typename std::iterator_traits<decltype(std::prev(std::get<Is>(iterator_tuple_)))>::value_type{} :
                *std::get<Is>(iterator_tuple_)
            )...
        );
    }

    template <std::size_t... Is>
    constexpr void next(std::index_sequence<Is...>) noexcept {
        ((
            std::get<Is>(iterator_tuple_) == std::get<Is>(end_iterator_tuple_) ?
            std::get<Is>(iterator_tuple_) :
            ++std::get<Is>(iterator_tuple_)
        ),...);
    }

private:
    iterator_type iterator_tuple_;
    iterator_type end_iterator_tuple_;
    value_type value_tuple_;
};

If you wish to experiment a bit, play with the code on godbolt.

Conclusion

Throughout this article, we have discussed several approaches on how to zip multiple ranges together as well as the benefits and drawbacks of using those approaches. In the end, an example of how to zip the longest range, a feature which is missing from all the other approaches we have mentioned before, has been shown. Let me know your thoughts and suggestions in the comment section below… ⬇️👇

Expanding On Standard Threads

Different Ways To Define Binary Flags

Comments powered by Disqus.

Trending Tags