Today's processors have undergone a huge transformation from those of just 10 years ago. CPU manufacturers Intel and AMD (and up-and-coming CPU designer ARM) have increased processor speed via greater emphasis on superscalar execution, deeper pipelining, branch prediction, out-of-order execution, and fast multi-level caches. This design philosophy has resulted in faster response time for single tasks executing on a processor, but at the expense of increased circuit complexity, high power consumption, and a small number of cores on the die. On the other hand, GPU manufacturers NVIDIA and ATI have focused their designs on processors with many simple cores that implement SIMD parallelism and hide instruction latency by switching among large numbers of concurrent threads.
While GPUs have been in existence for about 10 years, the software support for these processors has taken years to catch up, and developers are still sifting through competing solutions for programming them. OpenCL and CUDA are the two principal frameworks for GPGPU computing. Each framework comprises a language for expressing kernel code (instructions that run on the GPU) and an API for invoking kernels from the CPU. While the frameworks are similar, there are some important differences.
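To make the kernel/API split concrete, the following is a minimal CUDA sketch of vector addition; the function and variable names (`vecAdd`, `d_a`, etc.) are illustrative, not drawn from any particular codebase. The `__global__` function is kernel code that runs on the GPU, while everything in `main` is host-side API usage:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel code: executes on the GPU, one thread per array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Host-side setup: allocate and initialize input arrays on the CPU.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // API calls: allocate device memory and copy inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // API call: launch the kernel from the CPU (256 threads per block).
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // Copy the result back and inspect one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", h_c[10]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

An OpenCL version of the same program would express `vecAdd` in OpenCL C and replace the `<<<...>>>` launch syntax with explicit API calls such as `clSetKernelArg` and `clEnqueueNDRangeKernel`; this syntactic difference is one of the distinctions discussed below.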