4

Recently I read that some STL algorithms have undefined behaviour if the passed functor is stateful (has internal side-effects). I've used the std::generate function with a functor similar (less trivial) to the following:

class Gen
{
public:
    explicit Gen(int start = 0)
        : next(start)
    {
    }

    int operator() ()
    {
        return next++;
    }

private:
    int next;
};

Is this safe to use with std::generate? Is the order of generating values guaranteed?

Edit: Claim made here Stateful functors & STL : Undefined behaviour

Community
  • 1
  • 1
Neil Kirk
  • 21,327
  • 9
  • 53
  • 91
  • Where have you read that? I don't think there are any constraints except that it has to be callable with no arguments and yield a result of a specified type. – luk32 Jun 09 '14 at 10:15
  • @luk32 updated to add link – Neil Kirk Jun 09 '14 at 10:26
  • Are we talking pass-by-value (Scott Meyers Effective stl)? See http://stackoverflow.com/questions/17626667/why-function-objects-should-be-pass-by-value – doctorlove Jun 09 '14 at 10:27
  • Standard library algorithms take the function object by value, their implementations are allowed to do copies. Use `std::ref` to turn your function object into something which has reference semantics. This allows you to pass state around -- very useful eg. for random number generators (eg. `std::generate (v.begin (), v.end (), std::ref (rng))`). – Alexandre C. Jun 09 '14 at 11:50

3 Answers3

6

Introduction

There's no problem using a stateful functor with functions such as std::generate, but one has to be careful not to run into issues where the underlying implementation makes a copy that will change the semantics of a program in a way that is different from what the developer have in mind.

25.1p10 Algorithms library - General [algorithms.general]

[ Note: Unless otherwise specified, algorithms that take function objects as arguments are permitted to copy those function obejcts freely. Programmers for whom object identity is imoprtant should consider using a wrapper class that points to a noncopied implementation object such as reference_wrapper, or some equivalent solution. -- end note ]


Order of assignment/evaluation

The standard explicitly states that exactly last - first (N) assignments and invocations of the generator will be made, but it doesn't state in what order. More can be read in the following Q&A:


Stateful functors + std::generate, unsafe?

Generally no, but there are a few caveats.

Within std::generate the standard guarantees that the same instance of the functor type will be invoked for every element in the given range, but as can be hinted by the declaration of std::generate we might run into issues where we forget that the passed functor invoked inside the function will be a copy of the one passed as argument.

See the below snippet where we declare a functor to generate "unique" ids:

#include <iostream>
#include <algorithm>

template<class id_type>
struct id_generator {
  id_type operator() () {
    return ++idx;
  }

  id_type next_id () const {
    return idx + 1;
  }

  id_type idx {};
};

int main () {
  id_generator<int> gen;

  std::vector<int>  vec1 (5);
  std::vector<int>  vec2 (5);

  std::generate (vec1.begin (), vec1.end (), gen);
  std::generate (vec2.begin (), vec2.end (), gen);

  std::cout << gen.next_id () << std::endl; // will print '1'
}

After running the above we might expect gen.next_id () to yield 11, since we have used it to generate 5 ids for vec1, and 5 ids for vec2.

This is not the case since upon invoking std::generate our instance of id_generator will be copied, and it is the copy that will be used inside the function.


What would be the solution?

There are several solutions to this problem, all of which prevents a copy from being made when you pass your functor to some algorithm function related to std::generated.


Alternative #1

The recommended solution is to wrap your functor in a std::reference_wrapper with the use of std::ref from <functional>. This will effectively copy the reference_wrapper, but the referred to instance of generate_id will stay the same.

std::generate (vec1.begin (), vec1.end (), std::ref (gen));
std::generate (vec2.begin (), vec2.end (), std::ref (gen));

std::cout << gen.next_id () << std::endl; // will print '11'

Alternative #2

You could, of course, also make your fingers stronger by writing something as confusing as the below:

std::generate<decltype(vec1.begin()), id_generator<int>&>(vec1.begin(), vec1.end(), gen);
Community
  • 1
  • 1
Filip Roséen - refp
  • 62,493
  • 20
  • 150
  • 196
  • Thank you for the information. What about the order of elements? – Neil Kirk Jun 09 '14 at 10:38
  • @NeilKirk the order on which element is assigned a generated value is not specified, but the standard mandates that the generator passed is invoked `last - first` times, where *last* and *first* are the iterators passed. It also mandates that exactly `last - first` assignments will be made. – Filip Roséen - refp Jun 09 '14 at 11:00
  • So my example won't work to create an array of increasing numbers? – Neil Kirk Jun 09 '14 at 11:04
  • @NeilKirk I'll have to take a deeper look in the standard, but afaik there's no guarantee that the first invocation of the *functor* will be assigned to the first element in the range. I'll get back to you asap. – Filip Roséen - refp Jun 09 '14 at 11:05
  • @NeilKirk It probably will, but as far as I can see, except for `for_each`, there is no guarantee of order (and the fact that such a guarantee was felt necessary for `for_each` is a very strong indication that one wasn't meant to be implied for anything else). – James Kanze Jun 09 '14 at 11:06
  • @NeilKirk after a careful standard session, and a quick search to find related questions on SO the verdict is: **No**, there's **[no such guarantee](http://stackoverflow.com/q/14823732/1090079)** - ie. the order of assignment and invocation of the *functor* does not have to be sequential. – Filip Roséen - refp Jun 09 '14 at 11:08
  • I'd be interested in knowing on what you base the first paragraph after "The misunderstanding". I can't find any real guarantee for it in the standard (except for `for_each`); on the other hand, it's rather obvious that in some cases (e.g. `std::accumulate`), the algorithm doesn't make much sense otherwise. – James Kanze Jun 09 '14 at 11:09
  • @JamesKanze There's a note in `25.1p10` that specifically address this issue (included in the post), and there's also no wording explicitly disallowing an implementation to make copies, hence: an implementation is free to make copies if it wants to, as long as the same (potential copy) is invoked for every element. – Filip Roséen - refp Jun 09 '14 at 11:11
  • Finally: the order requirement is actually more important than the copy issues. Any copy issues are easily worked around by adding a level of indirection. The order issues aren't, and make `std::generate` almost useless. – James Kanze Jun 09 '14 at 11:11
  • @JamesKanze it will generate **1** value for every **1** element in the range, so it's not *"useless*"; it serves the purpose for which it was defined. If one wants a sequential generation of elements it's possible through [`std::iota`](http://en.cppreference.com/w/cpp/algorithm/iota) (even if it requires a little bit more typing in terms of implementation). – Filip Roséen - refp Jun 09 '14 at 11:13
  • @FilipRoséen-refp The note says exactly the opposite of what you claim: "Unless otherwise specified, algorithms that take function objects as arguments are permitted to copy those function objects freely." So `generate` could use a new copy of the argument for each assignment. – James Kanze Jun 09 '14 at 11:14
  • @JamesKanze after a more careful study of the wording around `std::generate` the wording explicitly states *invocations of `gen`*, not invocations of a copy of `gen`; this should be proof enough that an implementation must invoke the copy ending up as an argument. – Filip Roséen - refp Jun 09 '14 at 11:21
  • @FilipRoséen-refp This is what I'm trying to find out. What has precedence here: the fact that the standard says that the implementation can copy freely, or the fact that it uses the actual name of the argument (which suggests the same object) in the specification of the behavior? – James Kanze Jun 09 '14 at 11:23
  • @JamesKanze first of all, *"Notes"* are normative; meaning that they kind of don't matter, but are there to serve as a hint to the reader what is going on, 2nd: the explicit wording of `[alg.generate]` is what stands. It's more restrictive than the previous wording, and wins per default. – Filip Roséen - refp Jun 09 '14 at 11:24
5

Almost every meaningful generator is stateful. Think about it: a generator invocation has no arguments. If the generator were stateless it would thus have to generate the same value every time. In other words, the only stateless generator is a constant generator:

template <typename T>
struct constant {
    constant(T&& value) : value{value} {}

    T const& operator ()() const { return value; }

private:
    T const value;
};

Or, alternatively, you could have a generator which has state external to the object itself, such as std::random_device. But these are really the only two applicable cases. This would be awfully restrictive. So yes – it is definitely possible to have a stateful generator.

The source for this statement that you refer to is not talking about generators – instead, it is talking about predicats, which are a different class of functions, and have different requirements (a predicate is a function with a single argument which returns a bool value).

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • Well, it could use some global/static variables to hold its state, but that would be truly awful. Just to prove I am not inventing things to nitpick, the current [example on cppreference](http://en.cppreference.com/w/cpp/algorithm/generate) uses `std::rand` which clearly has a state, and is a function so there is no other way to do it. – luk32 Jun 09 '14 at 10:17
  • @luk32 Oh, I actually counted that as stateful as well – but you’re right, it’s an important distinction – Konrad Rudolph Jun 09 '14 at 10:24
0

As far as the standard is concerned, you have no guaranteed. An implementation can copy a functional object any time it wants. The idea is that an implementation could potentially divide the range into sub-ranges, and process each sub-range in a separate thread. (At least, this was what was claimed at one point in the standardization process.)

Globally, of course, for some algorithms, like generate or accumulate, allowing multiple copies or reordering would make the algorithm useless. Usually, such algorithms guarantee order; they seem to have forgotten generate in this. And while the standard does seem to have forgotten allowing copies, pratically, if order is guaranteed, there is no need for multiple copies in the processing, so I think we can safely assume that there aren't, even in the absense of a strict guarantee.

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • **No**, `std::generate` is free to copy the argument passed in, it cannot invoke different copies of *gen* for each element assigned, as stated in `[alg.generate]`. It uses *"gen"* to refer to the functor, not a "copy of gen" or equivalent. – Filip Roséen - refp Jun 09 '14 at 11:22
  • But other statements say that the implementation can copy freely, and treat such names as placeholders or "for exposition"; I don't think the name of a parameter has any significance in the standard. (But it's hard to say; the library parts of the standard aren't always as precise as one would like.) Also: we're talking about a strict interpretation of the standard here. In practice, I use `std::accumulate` without worrying about the copies, although the standard may (or may not) allow them. Implementors do try to make their implementations usable. – James Kanze Jun 09 '14 at 11:27
  • It's talking about a specific object, not a type. And in any case, the wording involved implies that the requirements for *functors* passed to `std::generate`, and similar functions, does not include them being *copy-assignable*, only *copy-constructible*; this means that an implementation is not allowed to change `gen` by assigning it a new value (that was originally a copy of itself). – Filip Roséen - refp Jun 09 '14 at 11:37
  • Actually, the standard doesn't even mandate it being *copy-constructible*; the **only** requirement is that it is a *function object*. I wrote *copy-constructible* above because it's implied that it can be copied when `std::generate` is invoked, but if one explicitly specify the *template-argument* this is no longer an issue. – Filip Roséen - refp Jun 09 '14 at 11:43
  • @FilipRoséen-refp The standard _does_ require _copy-constructible_, because otherwise, you can't pass by value. (OK, so it's a very indirect requirement.) For the rest... Many interpretations are possible. I don't really know which one is correct. I do definitely remember hearing the "split between multiple threads" argument in the standards committee (but I'm not sure it means anything; the suggested generator object isn't thread safe, even if no copies are made). I think it's an issue on which we have to count, at least partially, and the common sense of the implementors. – James Kanze Jun 09 '14 at 12:23
  • No, since you can explicitly state the argument type that isn't even an *indirect* requirement, nowhere does it say that the *functor* must be able to be copy-constructible. The *"split between multiple threads"* are however a valid point, but that doesn't mean that `gen` can be copied. – Filip Roséen - refp Jun 09 '14 at 12:32