Performance comparison of CUDA, OpenCL, and C++ AMP

Getting information about the underlying design of a GPGPU programming environment and its hardware can be difficult.  Companies often do not publish design details because they do not want you or other companies to copy the technology.  But sometimes you need to know unpublished details of a technology in order to use it effectively.  If they won’t tell you how the technology works, the only recourse is to gain an understanding through experimentation [1, 2].  How do OpenCL, CUDA, and C++ AMP compare in performance?  What can we learn from the results?

