1. 23 Nov, 2015 1 commit
    • Reduce transform options for ext-tx experiment · 56ab215d
      Debargha Mukherjee authored
      Reduces the transform options for both INTRA and INTER when the
      transform size is 16x16, so that none of the DSTs are used.
      Thus, a total of 10 options are used for 16x16, while 4x4
      and 8x8 still use 17 options.
      
      derflr/hevchd actually improve a little, while hevcmr drops
      a little.
      
      About 10% speed improvement.
      
      Change-Id: I920a182231e052cdd622f8bb67085c16c572cb1e
  2. 19 Nov, 2015 2 commits
  3. 18 Nov, 2015 1 commit
  4. 17 Nov, 2015 2 commits
    • Merge MISC_FIXES · 66f2f65e
      hui su authored
      Remove the MISC_FIXES flags except for the changes on MV precision,
      which incur a 0.1% performance drop.
      
      On derflr, the impact is -0.012%.
      
      Change-Id: I0a74e5a212dd0cb827192a318c92a714c9681e45
    • Fix some unused variable warnings · af084fbe
      hui su authored
      Change-Id: Ia7680ddf00dd50dd66bbb5753bae30b937988800
  5. 16 Nov, 2015 1 commit
  6. 13 Nov, 2015 1 commit
    • Refactor ext-intra · 4aa50c17
      hui su authored
      Coding gain remains about the same, while overall speed is
      substantially increased.
      
      Change-Id: I2989bebcfd21092cd6a02653d4df4a3bf6780874
  7. 12 Nov, 2015 5 commits
  8. 11 Nov, 2015 3 commits
  9. 09 Nov, 2015 1 commit
    • Release v1.5.0 · cbecf57f
      Johann authored
      Javan Whistling Duck release.
      
      Change-Id: If44c9ca16a8188b68759325fbacc771365cb4af8
  10. 06 Nov, 2015 9 commits
  11. 04 Nov, 2015 13 commits
    • Add iadst32 · b0df5e0f
      Angie Chiang authored
      Change-Id: I3a53ee51146d0bd4b0fe4b27c286e8c921f9823b
    • Add iadst16 · 35486a6b
      Angie Chiang authored
      Change-Id: I093881aacaf9a070f78cc4eea2e8a6ede8a71792
    • Add iadst8 · 0ca0cc24
      Angie Chiang authored
      Change-Id: Ia58e4735d7d7bfd2ac55259c32705118c6745c6d
    • Add iadst4 · ba69089e
      Angie Chiang authored
      Change-Id: Ie419b2b1e939a41c30ed609e1ba46f5f6609b2a5
    • Add idct32 · 74678334
      Angie Chiang authored
      Change-Id: I75412bdc4bd0d9c90e8b56e02e0e467a2d9957f9
    • Add idct16 · d3cee565
      Angie Chiang authored
      Change-Id: I8e5ba3a3f9b64ccbf038e371525e897774729b06
    • Add idct8 · bd9db2f5
      Angie Chiang authored
      Change-Id: I8092a6f229b196c5c8b7dcd2dff8aaf68253e422
    • Add idct4 · 7d2b7b69
      Angie Chiang authored
      Change-Id: I1d1b6822452772cec95160491c7bc6d3bba1f5c2
    • Add fadst32 · a9253a20
      Angie Chiang authored
      Change-Id: I77299f0e39fc7cef91e7e420513dbd05194f320a
    • Add fadst16 · a7d26f4e
      Angie Chiang authored
      Change-Id: I5175e39b5df73646488f74b2a9e4a463ae79d91a
    • Simplify txfm rate-distortion optimization · 493d0234
      Jingning Han authored
      This commit refactors the rate-distortion optimization scheme for
      transform block coding. When both ext-tx and var-tx experiments
      are turned on, the encoding time for bus_cif at 1000 kbps goes down
      from 706377 ms to 666503 ms (5.6% speed-up). The coding statistics
      remain unchanged.
      
      Change-Id: I20835db573725580aad79c16220f799ce01f2093
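The 5.6% figure quoted above corresponds to the relative reduction in encoding time. A minimal C sketch of that arithmetic follows; the function name `time_reduction_percent` is a hypothetical helper used only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage reduction in encoding time when going
 * from old_ms to new_ms (positive means the new run is faster).
 */
static double time_reduction_percent(double old_ms, double new_ms) {
  assert(old_ms > 0.0);
  return 100.0 * (old_ms - new_ms) / old_ms;
}
```

Calling `time_reduction_percent(706377.0, 666503.0)` gives roughly 5.64, which rounds to the 5.6% stated in the commit message.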
    • Flip the result of the inverse transform for FLIPADST. · 4f510809
      Geza Lore authored
      When using FLIPADST, the vp10_inv_txfm_add functions used to flip
      the destination array, add the result of the inverse transform to it,
      and then flip the destination back. This has been replaced by
      flipping the result of the inverse transform before adding it to the
      destination. Up-down flipping is done by negating the destination
      stride and starting from the bottom, so it should now be free.
      Left-right flipping is done with the usual SSE2 instructions in the
      optimized code.
      
      The C functions match the SSE2 functions as expected, so the C functions
      now do the flipping as well when required. Adding this cleanly required
      some refactoring of the C functions, but there is no measurable
      performance impact when ext-tx is not enabled.
      
      Encode speedup with ext-tx enabled is about 3%.
      
      Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
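The flipping scheme described in this commit can be sketched in scalar C as follows. This is an illustrative sketch under stated assumptions, not the libvpx code: `add_residual_flipped`, its parameters, and the local `clip_pixel` helper are hypothetical, and the left-right flip uses a reversed column index rather than the SSE2 instructions the commit mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local helper: clamp a sum to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical helper: add an n x n residual block `res` (row-major,
 * n values per row) to the destination block `dst`, flipping the
 * residual instead of the destination.
 *   flip_ud: start at the bottom row of dst and walk with a negated stride.
 *   flip_lr: read each residual row right to left.
 */
static void add_residual_flipped(const int16_t *res, uint8_t *dst,
                                 ptrdiff_t stride, int n,
                                 int flip_ud, int flip_lr) {
  assert(n > 0);
  uint8_t *base = dst;
  if (flip_ud) {
    base = dst + (n - 1) * stride; /* point at the last row */
    stride = -stride;              /* and walk upwards */
  }
  for (int r = 0; r < n; ++r) {
    uint8_t *row = base + r * stride;
    for (int c = 0; c < n; ++c) {
      const int src_c = flip_lr ? n - 1 - c : c;
      row[c] = clip_pixel(row[c] + res[r * n + src_c]);
    }
  }
}
```

With `flip_ud` set, the only extra work is the one-time adjustment of the base pointer and stride, which is why the commit describes the up-down flip as essentially free.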
    • ext-intra experiment · be3559ba
      hui su authored
      Currently there are two parts in this experiment: extra directional intra
      prediction modes and the filter intra modes migrated from the nextgen branch.
      
      Several macros are defined in "blockd.h" to control the experiment
      settings. Setting "DR_ONLY" to 1 (default is 0) means we only use directional
      modes and skip the filter-intra modes; "EXT_INTRA_ANGLES" (default is 128)
      defines the number of different angles we want to support; setting
      "ANGLE_FAST_SEARCH" to 1 (default is 1) means we use a fast, sub-optimal search
      for the best prediction angle instead of an exhaustive search. The fast search
      is about 6 times faster than the exhaustive search, while preserving about
      60% of the coding gains.
      
      With extra directional prediction modes (fast search), we observe the following
      coding gains (numbers in parentheses are for the all-key-frame setting):
      derflr +0.42%  (+1.79%)
      hevclr +0.78%  (+2.19%)
      hevcmr +1.20%  (+3.49%)
      stdhd  +0.56%
      Speed-wise, encoding is about 110% slower for key frames and 30% slower overall.
      
      The gains of filter intra modes mostly add up with the gains of directional
      modes. The overall coding gain of this experiment:
      derflr +0.94%
      hevclr +1.46%
      hevcmr +1.94%
      stdhd  +1.58%
      
      Change-Id: Ida9ad00cdb33aff422d06eb42b4f4e5f25df8a2a
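To make the trade-off between the exhaustive and fast angle searches concrete, here is a small C sketch. The three macro names and their defaults are taken from the commit message; everything else is an assumption. In particular, the commit does not say how the fast search works, so the coarse-to-fine scheme, the `angle_cost_fn` callback, and the toy cost function below are purely hypothetical and do not reproduce the reported 6x speed ratio.

```c
#include <assert.h>
#include <limits.h>

/* Names and default values as stated in the commit message. */
#define DR_ONLY 0
#define EXT_INTRA_ANGLES 128
#define ANGLE_FAST_SEARCH 1

/* Hypothetical cost callback: lower is better. */
typedef long (*angle_cost_fn)(int angle, void *ctx);

/* Toy cost for demonstration: distance from a target angle held in ctx. */
static long abs_distance_cost(int angle, void *ctx) {
  const int target = *(const int *)ctx;
  return angle > target ? angle - target : target - angle;
}

/* Evaluate angles lo, lo+step, ... <= hi and keep the cheapest one seen. */
static void scan_angles(angle_cost_fn cost, void *ctx, int lo, int hi,
                        int step, int *best_angle, long *best_cost,
                        int *evals) {
  for (int a = lo; a <= hi; a += step) {
    const long c = cost(a, ctx);
    ++*evals;
    if (c < *best_cost) {
      *best_cost = c;
      *best_angle = a;
    }
  }
}

/* Returns the chosen angle index; *evals receives the number of cost calls. */
static int search_best_angle(angle_cost_fn cost, void *ctx, int fast,
                             int *evals) {
  int best_angle = 0;
  long best_cost = LONG_MAX;
  *evals = 0;
  if (!fast) {
    /* Exhaustive search: try every supported angle. */
    scan_angles(cost, ctx, 0, EXT_INTRA_ANGLES - 1, 1,
                &best_angle, &best_cost, evals);
    return best_angle;
  }
  /* Assumed fast scheme: every 8th angle first, then its neighbours. */
  const int coarse_step = 8;
  scan_angles(cost, ctx, 0, EXT_INTRA_ANGLES - 1, coarse_step,
              &best_angle, &best_cost, evals);
  int lo = best_angle - (coarse_step - 1);
  int hi = best_angle + (coarse_step - 1);
  if (lo < 0) lo = 0;
  if (hi > EXT_INTRA_ANGLES - 1) hi = EXT_INTRA_ANGLES - 1;
  scan_angles(cost, ctx, lo, hi, 1, &best_angle, &best_cost, evals);
  return best_angle;
}
```

A coarse-to-fine search like this one only matches the exhaustive result when the cost behaves well around its minimum, as the toy cost does. On real prediction costs it can settle on a worse angle, which is consistent with the commit's description of the fast search as sub-optimal.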
  12. 03 Nov, 2015 1 commit
    • Eliminate copying for FLIPADST in fwd transforms. · 01bb4a31
      Geza Lore authored
      This patch eliminates the copying of data when using FLIPADST forward
      transforms, by incorporating the necessary data flipping into the
      load_buffer_* functions of the SSE2 optimized forward transforms. The
      load_buffer_* functions are normally inlined, so the overhead of copying
      the data is removed and the overhead of flipping is minimized. Left to
      right flipping is still not free, as the columns need to be shuffled in
      registers.
      
      To preserve identity between the C and SSE2 implementations, the
      appropriate C implementations now also do the data flipping as part of
      the transform, rather than relying on the caller for flipping the input.
      
      Overall speedup is about 1.5-2% in encode on my tests. Note that these
      are only the forward transforms. Inverse transforms to come in a later
      patch.
      
      There are also a few code hygiene changes:
      - Fixed some indents of switch statements.
      - DCT_DCT transforms now always use vp10_fht* functions, which dispatch
        to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct*
        directly, some of them used to call vp10_fht*).
      
      Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574
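The idea of folding the flip into the load step, rather than copying and flipping the input beforehand, can be sketched in scalar C. This is only an illustration: `load_block_flipped` and its parameters are hypothetical, and the actual load_buffer_* functions are SSE2 code that is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read an n x n block from `input` (rows `stride`
 * elements apart) into the contiguous working buffer `out`, applying the
 * requested flips while loading so that no separate flipped copy of the
 * input is needed.
 */
static void load_block_flipped(const int16_t *input, ptrdiff_t stride,
                               int n, int flip_ud, int flip_lr,
                               int16_t *out) {
  assert(n > 0);
  for (int r = 0; r < n; ++r) {
    /* Up-down flip: pick the mirrored source row. */
    const int src_r = flip_ud ? n - 1 - r : r;
    const int16_t *row = input + src_r * stride;
    for (int c = 0; c < n; ++c) {
      /* Left-right flip: pick the mirrored source column. */
      out[r * n + c] = row[flip_lr ? n - 1 - c : c];
    }
  }
}
```

In this sketch the up-down flip costs only a different row address per row, while the left-right flip changes the order of elements within each row. That mirrors the commit's note that left-to-right flipping is still not free in the SSE2 code, where columns have to be shuffled in registers.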