In CUDA, OpenCL, and C++ AMP, a group is a collection of threads that execute in parallel on the same compute unit and can cooperate with one another. CUDA calls it a block, OpenCL a work-group, and C++ AMP a tile. Threads in a group are not guaranteed to run in lock-step; that holds only for smaller hardware subsets such as a CUDA warp, which is why explicit synchronization is needed. The purpose of a group is to let its threads communicate through barrier synchronization and/or shared memory (called local memory in OpenCL and tile_static memory in C++ AMP). The programmer chooses the group size, but hardware constraints cap it, typically at 512 or 1024 threads depending on the device and API. Programmers usually have to design algorithms with thread groups in mind, but a few tricks can make this easier.
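To make the group structure concrete, here is a minimal CPU-side sketch in plain C++ that emulates how an index space is split into groups and how each group combines its values through a shared scratch buffer. It is not GPU code: the names `num_groups` and `group_sums` are hypothetical, the loops stand in for threads that would run in parallel on a device, and the group size is assumed to be a power of two so that a simple tree reduction works.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: how many groups are needed to cover n elements
// when each group holds group_size threads (ceiling division).
std::size_t num_groups(std::size_t n, std::size_t group_size) {
    return (n + group_size - 1) / group_size;
}

// CPU emulation of a per-group sum. Each outer iteration plays the role of
// one group (CUDA block / OpenCL work-group / C++ AMP tile); each inner
// iteration plays the role of one thread inside that group.
std::vector<int> group_sums(const std::vector<int>& data, std::size_t group_size) {
    // Assumption of this sketch: group_size is a nonzero power of two.
    assert(group_size > 0 && (group_size & (group_size - 1)) == 0);

    const std::size_t groups = num_groups(data.size(), group_size);
    std::vector<int> result(groups, 0);

    for (std::size_t g = 0; g < groups; ++g) {
        // Stand-in for the group's shared memory.
        std::vector<int> shared(group_size, 0);

        // Phase 1: every "thread" loads one element, padding with 0
        // when its global index falls past the end of the input.
        for (std::size_t local = 0; local < group_size; ++local) {
            const std::size_t global = g * group_size + local;
            if (global < data.size()) {
                shared[local] = data[global];
            }
        }
        // On a GPU a barrier is required here, e.g. __syncthreads() in CUDA,
        // barrier(CLK_LOCAL_MEM_FENCE) in OpenCL, or tidx.barrier.wait()
        // in C++ AMP, so that all loads finish before any thread reads.

        // Phase 2: tree reduction inside the group.
        for (std::size_t stride = group_size / 2; stride > 0; stride /= 2) {
            for (std::size_t local = 0; local < stride; ++local) {
                shared[local] += shared[local + stride];
            }
            // A barrier is also required after each reduction step on a GPU.
        }

        // "Thread 0" of the group writes the group's partial result.
        result[g] = shared[0];
    }
    return result;
}
```

For example, calling `group_sums` on the ten values 1 through 10 with a group size of 4 gives three groups with partial sums 10, 26, and 19. The comments mark where a real kernel would need barriers; forgetting them is a common source of bugs precisely because threads in a group do not all advance together.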