      This commit forks the VP9 and VP10 codebase and makes libvpx
      support VP8, VP9, and VP10.
      Narrow a load in iwht4x4_16_add.
      The top half is unused.
      VPX: remove scaled calls from FUN_CONV_1D
      and FUN_CONV_2D macros.  The predict lut now handles
      this case.  The encoder now calls vpx_scaled_2d() instead
      of vpx_convolve8() for scaling.
      Revert "VP9_COPY_CONVOLVE_SSE2 optimization"
      This reverts commit a5e97d87.
      Revert "vpx_convolve_copy_sse2: fix win64"
      This reverts commit 22a8474f.
      This change performs poorly on various x86_64 devices, reducing
      performance by 1-3% at 1080p. Performance on Chromebook-like devices was
      mixed, from neutral to slightly negative, so there should be minimal change
      Factor inverse transform functions into vpx_dsp
      This commit moves the module inverse transform functions from vp9
      to the vpx_dsp folder. The hybrid transform wrapper functions stay in
      the vp9 folder, since they involve codec-specific data structures.
      VP9_COPY_CONVOLVE_SSE2 optimization
      This function suffers from a couple of problems on small cores (tablets):
      - The load of the next iteration is blocked by the store of the previous iteration
      - 4k aliasing (between a future store and older loads)
      - Current small-core machines are in-order, so the store will spin the rehabQ until the load is finished
      Fixed by:
      - prefetching 2 lines ahead
      - unrolling the copy over 2 rows of the block
      - pre-loading all xmm registers before the loop, with the final stores after the loop
      The function is optimized by:
      copy_convolve_sse2 64x64 - 16%
      copy_convolve_sse2 32x32 - 52%
      copy_convolve_sse2 16x16 - 6%
      copy_convolve_sse2 8x8 - 2.5%
      copy_convolve_sse2 4x4 - 2.7%
      Credit goes to Tom Craver (tom.r.craver@intel.com) and Ilya Albrekht (ilya.albrekht@intel.com).
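      The prefetching and two-row unrolling listed in the
      VP9_COPY_CONVOLVE_SSE2 message can be illustrated with a minimal C
      sketch. This is not the libvpx assembly: copy16_two_rows is a
      hypothetical helper, the fixed 16-pixel width is an assumption, and
      the full "pre-load before the loop, final stores after the loop"
      pipelining is simplified to issuing both loads before either store.
      Without SSE2 it falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not the libvpx routine): copy a block that is
 * 16 pixels wide and h rows tall, handling two rows per iteration. */
static void copy16_two_rows(const uint8_t *src, ptrdiff_t src_stride,
                            uint8_t *dst, ptrdiff_t dst_stride, int h) {
  int y;
  assert(h >= 0);
  for (y = 0; y + 1 < h; y += 2) {
#if defined(__SSE2__)
    __m128i r0, r1;
    /* Prefetch the two source rows the next iteration will read,
     * but only when they are still inside the block. */
    if (y + 3 < h) {
      _mm_prefetch((const char *)(src + 2 * src_stride), _MM_HINT_T0);
      _mm_prefetch((const char *)(src + 3 * src_stride), _MM_HINT_T0);
    }
    /* Issue both loads before either store, so a store from this
     * iteration does not sit between the two loads. */
    r0 = _mm_loadu_si128((const __m128i *)src);
    r1 = _mm_loadu_si128((const __m128i *)(src + src_stride));
    _mm_storeu_si128((__m128i *)dst, r0);
    _mm_storeu_si128((__m128i *)(dst + dst_stride), r1);
#else
    /* Portable fallback with the same two-row structure. */
    memcpy(dst, src, 16);
    memcpy(dst + dst_stride, src + src_stride, 16);
#endif
    src += 2 * src_stride;
    dst += 2 * dst_stride;
  }
  /* Odd height: copy the one remaining row. */
  if (y < h) memcpy(dst, src, 16);
}
```

      Any speedup from such a change depends on the target core; the
      figures quoted in the commit message apply to the original assembly,
      not to this sketch.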
      Refactor mips/dspr2 on convolution.
      Code refactor on InterpKernel
      In essence, this refactors the code for both interpolation
      filtering and convolution. The change moves all the files and
      renames the code from the vp9_ prefix to the vpx_ prefix
      accordingly, for the following architectures:
      (1) x86;
      (2) arm/neon; and
      (3) mips/msa.
      The work on mips/dspr2 will be done in a separate change list.
      Refactor vp9_idct.h file
      Separate the common coefficient constant into vpx_dsp/txfm_common.h.
      Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
      This removes the uses of vp9_idct.h from the vpx_dsp folder.
