The view from Nov 2016 C++ Standard Meeting Issaquah
09 December 2016
I attended the Issaquah meeting with plans to address the C++17 National Body comments. Upon arrival, I was also asked to chair the Evolution Working Group, as the chair's arrival was delayed. In this post, I will describe some of the thinking involved in such a role.
I compiled a slide deck that describes all the new and changed features based on the Issaquah meeting. There is also a video embedded below, recorded at the final plenary session at Meeting C++, which more than 700 people attended; it provides an update of the presentation in the slides linked above. I also presented this material to 600 people in a keynote on Heterogeneous Computing in C++ at code::dive in Poland, the day immediately after the C++ Standard meeting.
It is a sign of the stability of the C++17 CD that there were significantly fewer comments than on the two C++11 CDs combined. The chart below shows how the C++11, C++14, and C++17 releases rank in terms of how many comments the various nations submitted.
The next charts show the distribution of comments for this release by nation and by working group. As usual, the US submitted the most, but we also had comments from Spain, GB, Canada, Finland, France, Russia, Japan, and Switzerland. I spent most of this summer helping to deliver the Canadian comments as Canada's Head of Delegation, and I am now also involved with the UK comments. This involves reviewing the C++17 CD page by page.
The comments are coded by two-letter country designation, with CH being Switzerland. There is the official comment paper, which accumulates each nation's comments, similar to the United Nations:
- P0488R0 WG21 Working paper: NB Comments, ISO/IEC CD 14882
and a paper of late comments, which we still accepted and processed (though there is no guarantee that this would normally happen):
- P0489R0 WG21 Working paper: Late Comments on CD 14882
In reality, most of the people who comment on these drafts already attend the Standard meetings and are familiar with the issues. Some National Bodies are very organized, with the workload spread across teams assigned to review specific chapters, while others work on a more ad hoc basis. As I currently belong to both the Canadian and UK delegations, and am used to working within the US delegation, I see all of these forms and everything in between.
By far most of the comments were aimed at the Library groups (LWG and LEWG), while Evolution had about 60 and SG1 about 20.
As you may recall from my last trip report, on the June C++ Standard meeting in Oulu, there were diverging opinions about several key feature inclusions and exclusions in C++17. So it is no surprise that this disappointment was repeated through the National Body (NB) comments. In turn, some nations submitted defensive comments pre-emptively to balance out the opposition.
There were many opposing comments from several NBs, variously asking to add back concepts, unified call syntax, or default comparisons, to remove inline variables, and to change many other features viewed as contentious (many of the features have links in the downloadable slides). These issues were polled first thing on Monday in full plenary to see whether consensus had increased enough to change what was already in the draft. We have come to see this as the best way to deal with potentially contentious issues early, so as not to waste committee time: it would be wasteful to work on an issue all week only to have it voted down by people who had already made up their minds, when that time could be better spent advancing something else.
The following votes on Evolution (C++ language design) issues were taken immediately in the Monday plenary:
- ES 4, US 2, Late5: Add back concepts (all or part): 22 for, 24 against adding; no consensus (will not discuss)
- ES 5, US 68: Add back unified call syntax (P0301R0): many against adding; no consensus (will not discuss)
- ES 7, US 5, US 69, RU 5, Late7, Late14: Add back default comparisons (P0221R2), all or part: 16 against adding; borderline, will discuss, but it will be an uphill battle
- ES 1, US 65, Late13: Remove inline variables (P0386R2): 29 against removal; it stays (will not discuss)
The remaining votes covered Library and other issues:
- GB 44, FI 5: Remove elementary string conversions (P0067R4): 1 for removal; will discuss
- US 18, US 70: Remove dynamic exception specifications (P0003R4): 1 against removal; will discuss
- US 22, CA 11, Late11: Add std::byte (P0298R1): 2 against adding; will discuss
- ES 6, US 21, US 67, Late8, Late10: Add operator dot (P0252R2): 30 against adding; will not discuss
The only issues with mild consensus for a possible change were adding default comparison operators, removing elementary string conversions, removing dynamic exception specifications, and adding std::byte, as these all squeaked by with a low number of negative votes. Everything else stays as is after the morning poll. Even the issues that remained up for discussion for the rest of the week would need very compelling reasons to be changed.
Thanks go to Ville and Jens, who had already sorted all the NB comments into the different WGs; Ville had also organized the list to be triaged by EWG, so all I had to do was keep EWG on target, working through all the NB comments assigned to us until Ville arrived.
Evolution Working Group
This group is in charge of designing new language additions to C++. Chairing EWG was fun, though I had to slow down my speaking style considerably to match the regular chair's style.
We triaged all issues by assigning a priority status to each one as follows:
- Immediate yes, for simple and obvious comments
- Immediate no, for simple and obvious comments
- Pass to another Working Group (WG)
- Needs extensive discussion, so defer until after the triage
- Needs a paper post-meeting, due to its complexity or controversial position
This allowed us to work through all the issues immediately, leaving only those that needed extended discussion or a paper to be scheduled in the following days.
In all cases, consensus would need to increase for something to be changed in the draft. This means the bar for a change at this stage is very high: people can and do have high expectations (fix all problems, add this great feature back, remove this feature), but most comments will be resolved as No Consensus. People who are dissatisfied are always invited to write a paper; this is the only way to push progress through.
Deduction Guides
Sometimes the interaction of issues between Working Groups is such that we need a joint session. I called for one on Monday afternoon, bringing together Evolution and Library Evolution on deduction guides, as the triage had revealed an unusually large number of issues relating to deduction guides that impacted Library design. Deduction guides are new to C++17 and come from the feature called "template parameter deduction for constructors".
It allows explicit deduction rules to be written and used alongside the implicit deduction rules for template arguments. This simplified certain formulations, such as the variadic lock guard example in the paper, but it has general uses. It introduces some problems when applied uniformly to the Standard library: in most cases the existing constructors already provide the desired behavior, but in some cases explicit deduction guides are needed to complement the implicit ones. The National Body comments fill in the gaps for a few missing cases and also fix inconsistencies. There were also dueling comments: some proposed removing all implicit deduction guides in favor of explicit ones, while others proposed tweaking the implicit deduction guides. The committee decided to keep implicit deduction guides and to add explicit deduction guides where needed.
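To make the distinction concrete, here is a minimal sketch, assuming a C++17 compiler and standard library. Box is a hypothetical class template: its constructor yields an implicit deduction guide, and one explicit guide redirects string literals to std::string. The std::vector line relies on the iterator-pair deduction guide in the C++17 library, a case that the constructors alone cannot deduce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical class template, used only to illustrate the two kinds of guides.
template <typename T>
struct Box {
    T value;
    Box(T v) : value(v) {}  // gives the implicit guide: Box(T) -> Box<T>
};

// Explicit guide: a string literal becomes Box<std::string>,
// not the Box<const char*> that the implicit guide alone would deduce.
Box(const char*) -> Box<std::string>;

std::size_t deduction_guide_demo() {
    Box b1(42);       // implicit guide: Box<int>
    Box b2("text");   // the explicit (non-template) guide wins: Box<std::string>
    static_assert(std::is_same<decltype(b1), Box<int>>::value, "Box<int>");
    static_assert(std::is_same<decltype(b2), Box<std::string>>::value, "Box<std::string>");
    assert(b1.value == 42);

    // The C++17 library supplies an explicit guide that deduces the element
    // type from an iterator pair; no constructor signature can express this.
    int raw[] = {1, 2, 3};
    std::vector v(raw, raw + 3);
    static_assert(std::is_same<decltype(v), std::vector<int>>::value, "std::vector<int>");

    return b2.value.size() + v.size();  // 4 + 3
}
```

Without the explicit guide, Box("text") would deduce Box<const char*>; filling in that kind of gap is what the explicit guides in the NB comments are for.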
Other National Body comments we reviewed involved:
Expression Evaluation Order
These comments concern the change in evaluation order from the proposal commonly called "Refining Expression Evaluation Order for Idiomatic C++". Under it, a+=b, a-=b, and all the other variants with = evaluate b before a, which differs from the order in the explicit call a.operator=(b), where a is evaluated before b. This may look bad, but it was worse before C++17, when the order was unspecified, so code that relied on a particular evaluation order was not really portable and may have depended on one compiler's behavior. Now at least it is well defined.
There was some concern about this feature when it was approved, but one compiler implementer had implemented it and found reasonable speedups, while others had searched their code bases and found no impact. There was no consensus at this meeting to remove the feature from C++17, so it stays.
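A small sketch, assuming a compiler that implements the C++17 rules, makes the difference visible. Acc, lhs_of, and rhs_of are hypothetical names; the helpers simply record the order in which each operand is evaluated.

```cpp
#include <cassert>
#include <string>

std::string eval_trace;  // records which operand was evaluated first

// Hypothetical type with an overloaded compound assignment operator.
struct Acc {
    int v = 0;
    Acc& operator+=(int d) { v += d; return *this; }
};

// Hypothetical helpers: each records that it ran, then yields its operand.
int& lhs_of(int& x) { eval_trace += 'L'; return x; }
Acc& lhs_of(Acc& a) { eval_trace += 'L'; return a; }
int rhs_of(int v) { eval_trace += 'R'; return v; }

// Built-in compound assignment: C++17 sequences the right operand first.
std::string builtin_order() {
    eval_trace.clear();
    int x = 0;
    lhs_of(x) += rhs_of(1);
    return eval_trace;  // "RL"
}

// Overloaded operator in operator notation: the built-in rule still applies.
std::string operator_notation_order() {
    eval_trace.clear();
    Acc a;
    lhs_of(a) += rhs_of(1);
    return eval_trace;  // "RL"
}

// The same operator in function-call notation: the object expression is
// sequenced before the argument, so the left side is evaluated first.
std::string call_notation_order() {
    eval_trace.clear();
    Acc a;
    lhs_of(a).operator+=(rhs_of(1));
    return eval_trace;  // "LR"
}
```

Before C++17, any of these orders was permitted, which is why code depending on one of them was not portable.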
Decomposition declarations (aka structured binding)
These comments are concerned with the syntax introduced by the proposal called "Structured bindings", which lets us store a value and bind names to its components.
Most comments objected that [] had been chosen over the original {} syntax, and some wanted that reversed, though there was no consensus to do so. Other concerns included enabling modifiers (static, extern, inline, constexpr, and thread_local), init-captures, array support, discarding values in declarations, explicit types in decomposition declarations, decomposition with parentheses, and when the get<>() functions are called. Almost all were either rejected or deferred as possible post-C++17 extensions. The only one accepted was decomposition declarations with parentheses, which allows auto [a, b, c]{expr}, auto [a, b, c] = expr, and auto [a, b, c](expr) for uniformity reasons.
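The three initializer forms can be seen side by side in this sketch, assuming a C++17 compiler; Point3 and sum_of_bindings are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical aggregate with public data members.
struct Point3 { int x, y, z; };

int sum_of_bindings() {
    std::tuple<int, int, int> t{1, 2, 3};
    auto [a, b, c] = t;      // "= expr" form; tuple-like, binds through get<>()

    Point3 p{4, 5, 6};
    auto [x, y, z]{p};       // "{expr}" form; binds the public data members

    int arr[3] = {7, 8, 9};
    auto [i, j, k](arr);     // "(expr)" form; copies the array, binds its elements

    assert(b == 2 && y == 5 && k == 9);
    return a + b + c + x + y + z + i + j + k;  // 45
}
```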
Default comparison revival
This feature is not in C++17, but many want to revive it, either as a simplified proposal (for example, enabling only == and !=, or making the syntax opt-in only) or as a more comprehensive proposal that enables multi-way comparison, which gained popularity in the end but would require more work. All of these were deferred to post-C++17.
std::byte
This was the final controversy, occurring in plenary and requiring a vote to be withdrawn. This feature adds to C++ a byte type with no arithmetic operations. It was not in C++17, and several comments, including one from Canada, asked to add it, since all the work on it had been completed and it had been left out only because of a procedural mishap. The only part that became controversial was the name "byte", which had already been bikeshedded. Some want it to be a storage_byte, so that it is clear that this is about storage and not arithmetic operations. The feature was proposed for addition to C++17 to address the comments, but without the name change. At the plenary, the naming concern was brought up, which caused enough people to change their minds that it was not added. There is discussion of retaking this vote at the next meeting, as the proposal had gone through all the subgroups and it is uncommon for a proposal to be reversed at plenary. However, it does happen, and I am working behind the scenes with the Canadians to deal with this sensibly.
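For reference, this sketch assumes a compiler whose <cstddef> provides std::byte as in the published C++17 standard. It shows what "no arithmetic operations" means in practice: bitwise operators are available, arithmetic is not, and converting back to an integer is explicit. The helper names low_nibble and shifted_left_by_two are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// std::byte is not an arithmetic type, so b + b or b * 2 will not compile.
static_assert(!std::is_arithmetic<std::byte>::value, "std::byte is not arithmetic");

// Bitwise AND is defined for std::byte.
std::byte low_nibble(std::byte b) {
    return b & std::byte{0x0F};
}

// Shifts are defined too, and the result stays within 8 bits.
// std::to_integer makes the conversion back to an integer explicit.
unsigned shifted_left_by_two(std::byte b) {
    b <<= 2;
    return std::to_integer<unsigned>(b);
}
```

For example, shifting std::byte{0xC1} left by two gives 4, because the high bits are discarded rather than widened.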
Parallelism and Concurrency (SG1)
SG1 had about 20 comments, and again there were some dueling ones. Here are some of the most prominent controversies, some of which involved SYCL, as it became a prominent use case in SG1 discussions:
Parallel Algorithms Exception Handling
In C++17, we added most of the Parallelism TS1 as is. This enables STL algorithms such as sort to be executed in parallel, potentially on CPUs, by adding an extra parameter at the beginning of the argument list. But its pedigree was really GPUs and heterogeneous programming, and as such it opens the door towards that support. This parameter is called an execution policy. So what is an execution policy? It is a promise that a particular kind of reordering will preserve the meaning of the program. In the TS, the policies were called seq, par and par_vec, and the parallel ones allow the user-provided function objects to be executed out of order. More specifically, par means the algorithm is permitted to invoke the user-provided function objects in an unordered fashion in unspecified threads, and indeterminately sequenced within each thread. And par_vec means the algorithm is permitted to invoke them in an unordered fashion in unspecified threads, and unsequenced with respect to one another within each thread.
Here are some examples of their use in the TS, where algo stands for any algorithm that takes a policy:
std::algo(std::seq, begin, end, Func);
std::algo(std::par, begin, end, Func);
std::algo(std::par_vec, begin, end, Func);
However, even before this meeting, we had removed or changed the following TS features in the C++17 CD:
- removed the dynamic execution policy, so as not to preserve state, in preparation for the future addition of new execution policies
- renamed par_vec to par_unseq, as a better naming convention
But the most interesting change, from the Oulu meeting, was the replacement of exception_list with "terminate and don't unwind". exception_list was how the TS's parallel algorithms reported exceptions: essentially, a list of exception_ptrs. Under the new rule, exceptions can be thrown inside the algorithm but cannot leave it; if one escapes, the system is allowed to terminate without unwinding. The reason is that many feel exception_lists are unmanageable, and few implementers other than SYCL have actually implemented them. If you have plenty of threads, say on a GPU, or even on a CPU with nested contexts, a multitude of exceptions could arise, possibly one from each thread. SG1 agreed that, until we know what to do about the exception handling policy, we would prefer not to introduce something unmanageable that cannot be fixed once it is enshrined in a Standard.

This change generated a lot of NB comments (US15, US167, US17, US169, US16, US168, US170, CA17). Some wanted to bring back the original policy because they care about safety contexts where the system must be able to report an exception and unwind. Others wanted exception_list to stay removed, because they knew the change was temporary and that it reflects how most parallel systems handle escaping exceptions (including OpenMP parallel regions).

During deliberation at Issaquah, there was a suggestion to create two variants of each policy: one for code known not to throw exceptions and one for code known to throw. This would have created six execution policies and, in my opinion, opened the door to bifurcating on every quality we can think of, which is deeply undesirable. So I suggested instead that we leave a way to support future policies, such as ones that perform exception reduction. That is only possible if the exception behavior of the parallel algorithms is attached to the policy rather than to the algorithm. This change obtained greater consensus and was the only change approved.
So in C++17, exceptions that escape a parallel algorithm still terminate, possibly without unwinding, but this change opens the door to customizing more flexible policies for parallel algorithms. It was ultimately voted into the Working Paper for C++17 through P0502.
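The idea of attaching exception behavior to the policy, rather than to the algorithm, can be sketched with hypothetical types. None of these names are std::execution types or the wording of P0502, and the algorithm below runs sequentially; it only shows where the decision is made.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical policy mirroring the C++17 rule for the standard policies:
// an exception escaping the user's function terminates the program.
struct terminate_on_escape {
    [[noreturn]] void on_escape(std::exception_ptr) const noexcept { std::terminate(); }
};

// Hypothetical future-style policy: escaped exceptions are collected
// (a simple form of "exception reduction") instead of terminating.
struct collect_on_escape {
    std::vector<std::exception_ptr>* sink;
    void on_escape(std::exception_ptr e) const { sink->push_back(e); }
};

// Hypothetical algorithm: it catches, then defers entirely to the policy,
// so the algorithm itself encodes no exception behavior.
template <class Policy, class It, class F>
void for_each_policy(const Policy& policy, It first, It last, F f) {
    for (; first != last; ++first) {
        try {
            f(*first);
        } catch (...) {
            policy.on_escape(std::current_exception());
        }
    }
}
```

Adding a new behavior then means adding a new policy type, without touching any algorithm.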
inner_product becomes transform_reduce
inner_product is one of the current parallel algorithms, and as described in US 159, US 160, US 161, US 162, and US 184, it has incorrect parameter ordering and some extra overloads; it was agreed to rename it to transform_reduce. More specifically, the fix renames inner_product() to transform_reduce() and reorganizes some of the parameters. It also removes the ExecutionPolicy overloads for inner_product() and adjacent_difference(), because they cannot be parallelized. This work still requires some LWG approval, so it will continue into the next meeting.
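As a rough illustration, assuming a C++17 standard library that provides std::transform_reduce in <numeric>, here is a dot product computed both ways, plus the unary form. The function names are hypothetical. Unlike inner_product, transform_reduce does not promise a left-to-right fold, which is what leaves room for parallel overloads.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helpers: the same dot product written both ways.
// inner_product folds strictly left to right.
double dot_inner_product(const std::vector<double>& a, const std::vector<double>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// transform_reduce multiplies element pairs and sums them, but may regroup
// and reorder the sum, which is what permits a parallel implementation.
double dot_transform_reduce(const std::vector<double>& a, const std::vector<double>& b) {
    return std::transform_reduce(a.begin(), a.end(), b.begin(), 0.0);
}

// Unary form: the reduction operation comes first, then the per-element transform.
double sum_of_squares(const std::vector<double>& a) {
    return std::transform_reduce(a.begin(), a.end(), 0.0, std::plus<>{},
                                 [](double x) { return x * x; });
}
```

With general floating-point data the two dot products can differ in the last bits for exactly that reason; with small integer-valued data like {1, 2, 3} and {4, 5, 6} both give 32.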
Parallel Algorithm iterator requirements
As described by US 156, the current Parallelism TS allows parallel algorithms to take InputIterators, which are ill-suited to them because they are invalidated too frequently, forcing the implementation to copy the input range, or even to serialize to an output iterator. The basic problem with an input iterator is that it does not provide the multi-pass guarantee, because of this invalidation (when it grows, for example). This makes it impossible for a parallel algorithm to work on different subsequences at the same time, as the iterator cannot be advanced to such subsequences without invalidating the iterators used by other threads. The comment specifically cites SYCL as needing the ability to copy parallel algorithm arguments into subranges (what SYCL calls buffers; in this case SYCL uses a non-standard policy, so it is still conforming, and it does this to prepare for running the parallel algorithm on GPUs), and suggests promoting the requirement to RandomAccessIterators. SG1 considered that, but there was no consensus for the change because it may be too restrictive. In SYCL 1.2, we effectively support only contiguous iterators, although we do not check for them, since they do not exist in C++14 (some form of them will exist in C++17).

SG1 agreed instead to add a note warning people that the choice of iterators can affect performance, in that the algorithm may fall back to sequential execution, which the specification entirely allows: if the implementation cannot satisfy a policy, it may fall back from par_unseq to par to seq. What this means is that the standard execution policies can probably be used only on CPUs with shared memory. On a NUMA system, the data will likely need to be copied, and execution may fall back to the sequential case. LWG will review this warning note at the next meeting.
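The multi-pass requirement shows up as soon as an implementation tries to hand different subranges to different workers. The hypothetical helper split_into_chunks below keeps every chunk boundary for later use, so it rejects single-pass input iterators at compile time. With random-access iterators std::next is constant time, while with a std::list it walks the list, which is one way the choice of iterator affects performance.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into at most n contiguous subranges
// that could be handed to different workers. Every boundary iterator is kept and
// used later, which requires the multi-pass guarantee of forward iterators.
template <class It>
std::vector<std::pair<It, It>> split_into_chunks(It first, It last, std::size_t n) {
    using traits = std::iterator_traits<It>;
    static_assert(std::is_base_of<std::forward_iterator_tag,
                                  typename traits::iterator_category>::value,
                  "input iterators are single-pass and cannot be split");
    std::vector<std::pair<It, It>> chunks;
    const auto total = static_cast<std::size_t>(std::distance(first, last));
    if (n == 0 || total == 0) return chunks;
    const std::size_t base = total / n, extra = total % n;
    for (std::size_t i = 0; i < n && first != last; ++i) {
        // The first `extra` chunks take one additional element.
        const std::size_t len = base + (i < extra ? 1 : 0);
        // O(1) for random-access iterators, O(len) for e.g. std::list.
        It chunk_end = std::next(first, static_cast<typename traits::difference_type>(len));
        chunks.emplace_back(first, chunk_end);
        first = chunk_end;
    }
    return chunks;
}
```

Seven elements split three ways give chunks of 3, 2, and 2 elements; asking for more chunks than elements simply yields one chunk per element.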
Enable copies of arguments of Parallel Algorithms
One of the most important comments was CH 11, which asked that implementations be allowed to copy argument objects in parallel algorithms. This is currently not allowed for the standard execution policies, which is a problem for using them even on CPUs with NUMA; more specifically, to enable their use on GPUs, which have a separate memory space, copies must be allowed. Indeed, SYCL does this, not with the standard execution policies, but with a vendor-supplied SYCL execution policy, which is allowed. I think this is one of the most important comments to address for future heterogeneous programming in C++. The proposal initially gained agreement, and I was asked to draft the fix with others, but when people looked at our wording, which used phrases like "should not take the address of" as a proxy for working on copies, they backtracked and are now concerned about its adoption. This work also continues at the next meeting, where SG1 will take another look at our wording fix.
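Why the address of an argument is a reasonable proxy can be sketched as follows. This does not reproduce the proposed wording; the function objects and the two call_on_* helpers are hypothetical. call_on_original behaves like a shared-memory implementation that passes the element itself, while call_on_copy stands in for an implementation targeting separate device memory that stages a copy first.

```cpp
#include <cassert>
#include <vector>

// Relies on the identity of its argument: it asks "is this the first element?"
// by comparing addresses, so the answer changes if it is handed a copy.
struct is_first_by_address {
    const int* first;
    bool operator()(const int& x) const { return &x == first; }
};

// Relies only on the value of its argument, so copies are harmless.
struct is_first_by_value {
    int first_value;
    bool operator()(const int& x) const { return x == first_value; }
};

// Hypothetical: pass the original element, as a shared-memory implementation would.
template <class F>
bool call_on_original(const std::vector<int>& v, F f) { return f(v[0]); }

// Hypothetical: stage the element into separate storage and pass the copy,
// as an implementation targeting a separate memory space might.
template <class F>
bool call_on_copy(const std::vector<int>& v, F f) {
    int staged = v[0];
    return f(staged);
}
```

Only the address-based function object changes its answer when copies are introduced, which is the kind of code such wording would need to rule out.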
In addition to these major issues, we also looked at how to break up the various SG1 TSs. I currently edit the Concurrency TS, and it was decided at this meeting that Concurrency TS2 will contain:
- ostream synchronization
- floating point atomics
- atomic views
- floating point atomic views
- counters and queues
- lock-free support
- Synchronic/atomic flags
A separate TS will deliver executors, because they are needed to support both parallelism and concurrency. Executors have been a topic of great discussion; for the last three months, since the Oulu meeting, I have been chairing these discussions to try to bring together the three competing proposals. At the Issaquah meeting, we prepared a unified paper from those months of discussion that looks promising for advancing its status. I will describe this in more detail in a future blog post.
Parallelism TS2 will contain task blocks, vector/SIMD datapar, and loop-based vector/SIMD execution policies.
Here is an updated slide of how SG1 intends to separate out the various TSes:
Because SG1 had cleared all of its NB comments by Wednesday, we got to work not just on NB defects but also on some new features for C++20, as shown above; other groups continued working on their comments, as they had far more to deal with. We presented two techniques for lock-free programming, hazard pointers and Read-Copy-Update (RCU), as enhancements on top of shared_ptr and atomic shared_ptr that can be packaged into a concurrency toolkit. My CppCon 2016 talk demonstrates them in more detail. The work presents a C++ interface to these techniques and shows their advantages and disadvantages compared to reference counting. Both works were encouraged to continue.
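For comparison, here is the reference-counting baseline mentioned above: a read-copy-update pattern built only on shared_ptr and the C++11 atomic free functions for shared_ptr (deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>). It is a sketch with hypothetical names, not the interface proposed to SG1; one motivation for real RCU and hazard pointers is to avoid the reference-count update that every read performs here.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical RCU-style holder built on shared_ptr reference counting.
class rcu_config {
public:
    using map_type = std::map<std::string, int>;

    rcu_config() : current_(std::make_shared<map_type>()) {}

    // Read side: take a snapshot. It stays valid and unchanged for as long
    // as the reader holds it, even after writers publish newer versions.
    std::shared_ptr<const map_type> read() const {
        return std::atomic_load(&current_);
    }

    // Update side: read, copy, update the copy, then publish it with
    // compare-exchange; if another writer got there first, retry.
    void set(const std::string& key, int value) {
        std::shared_ptr<const map_type> seen = std::atomic_load(&current_);
        for (;;) {
            auto copy = std::make_shared<map_type>(*seen);
            (*copy)[key] = value;
            std::shared_ptr<const map_type> next = std::move(copy);
            // On failure, `seen` is refreshed with the current version.
            if (std::atomic_compare_exchange_strong(&current_, &seen, next))
                return;
        }
    }

private:
    // Old versions are freed automatically when their last reader lets go.
    std::shared_ptr<const map_type> current_;
};
```

A reader that took a snapshot before an update keeps seeing the old map, while later readers see the new one.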
I also worked on memory_order_consume and asynchronous algorithms, and presented Concurrent ring span as the proxy for SG14.
Other work that was presented and asked to continue or be revised for the next meeting includes:
- C++ Concurrent Queue: revise for next meeting
- C++ Distributed Counters: ready for LEWG, targeting Concurrency TS2
- Concurrency Safety in C++ Data Structures: revise for next meeting
- RCU: revise for next meeting
- Hazard Pointer: revise for next meeting; possibly factor out commonality with the RCU interface
- Concurrent Ring span: drop the concurrent part in favour of the Concurrent Queue proposal above, and continue with the non-concurrent part
- apply for synchronized_value: revise for next meeting
- Make std::memory_order a scoped enumeration: proceed to LEWG
- Thread Constructor Attribute: revise for next meeting
- Low level API for stackful context switching: revise for next meeting
- Implementation of memory_order_consume: revise for next meeting
- Invoking Algorithms asynchronously: revise for next meeting
- Updating Parallel Execution Policy Names in Parallelism TS: proceed to LWG and update the Parallelism TS
Here is the current status of all the comments. Some of these may change, and a final document will be published after the Kona meeting as a Record of Response. At the suggestion of my colleague Ruyman, I have removed the empty rows where we are still working on the resolution.
By the next meeting, in February, the remaining comments will be triaged, and we will be in a position to release C++17 for NB ballot again, this time as a Draft International Standard (DIS). If all goes well, the vote will be presented at the July meeting in Toronto, and we will be ready to celebrate and push C++17 to be published by the end of 2017. At this point, I don't see any show stoppers and certainly expect that prediction to come true. However, the biggest issues tend to be settled only at the end, so there may still be problems, but we will not know until then. So stay tuned.