1. 19 Feb, 2018 1 commit
    • SSE2 optimization for lpf 16_dual implementations · d6a7dd19
      Maxym Dmytrychenko authored
      Covers the horizontal and vertical variants, for both low and high
      bitdepth types.
      
      Appropriate tests are enabled.
      
      Performance changes, SSE2 over C:
      Horizontal methods: up to  3x
      Vertical   methods: up to  2x
      
      Change-Id: If430a916394c7befa743e4fbaa9913fd37c535ed
  2. 18 Feb, 2018 3 commits
  3. 17 Feb, 2018 1 commit
  4. 16 Feb, 2018 2 commits
  5. 15 Feb, 2018 3 commits
    • Remove CONFIG_TX64X64 · d3d4159f
      Yaowu Xu authored
      The experiment is fully adopted.
      
      Change-Id: I6cc80a2acf0c93c13b0e36e6f4a2378fe5ce33c3
    • Stop using VP9 convolve scheme in AV1 encoder. · 1fc3df55
      Debargha Mukherjee authored
      Discontinue all VP9 style convolve rounding operations in the non-normative
      parts of the encoder.
      
      The function av1_convolve_2d_sr_c is forced instead of SIMD versions
      of the same function, because of incompatibility when round_1 > 0.
      
      In the -DCONFIG_LOWPRECISION_BLEND=2 -DCONFIG_HIGHPRECISION_INTBUF=1
      setting, the result on 15 frames of lowres (cpu-used=1) is -0.019%,
      a slight improvement.
      
      Change-Id: I72154bd896357c352c944fb2cd3b25bafafba46a
    • [CFL] SSE2/AVX2 Versions of Sum and Subtract Average · 365f73bb
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSE2/CFLSubAvgTest
      4x4: C time = 234 us, SIMD time = 152 us (~1.5x)
      8x8: C time = 664 us, SIMD time = 208 us (~3.2x)
      16x16: C time = 1687 us, SIMD time = 581 us (~2.9x)
      32x32: C time = 6118 us, SIMD time = 2119 us (~2.9x)
      
      AVX2/CFLSubAvgTest
      4x4: C time = 250 us, SIMD time = 221 us (~1.1x)
      8x8: C time = 683 us, SIMD time = 284 us (~2.4x)
      16x16: C time = 1727 us, SIMD time = 1091 us (~1.6x)
      32x32: C time = 6092 us, SIMD time = 2107 us (~2.9x)
      
      Change-Id: I44ffedc683829d2c16089854ac43d4ddb4415bcd
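For orientation, the "sum and subtract average" operation benchmarked above can be sketched in scalar C as follows. This is only an illustrative sketch, not the code from the commit: the function name, the separate source and destination buffers, and the parameter order are all assumptions, and the input samples are assumed to be unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch of "sum and subtract average".
 * The name and signature are illustrative, not taken from the commit. */
static void cfl_subtract_average_sketch(const uint16_t *src, int16_t *dst,
                                        int width, int height, int stride) {
  assert(width > 0 && height > 0 && stride >= width);

  /* Pass 1: sum every sample in the block. */
  int32_t sum = 0;
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) sum += src[r * stride + c];
  }

  /* Rounded average; the sum is non-negative, so adding half the
   * sample count before dividing rounds to nearest. */
  const int32_t count = width * height;
  const int32_t avg = (sum + count / 2) / count;

  /* Pass 2: subtract the average from every sample. */
  for (int r = 0; r < height; ++r) {
    for (int c = 0; c < width; ++c) {
      dst[r * stride + c] = (int16_t)(src[r * stride + c] - avg);
    }
  }
}
```

Both passes are simple reductions and element-wise operations over the block, which is the kind of loop that SSE2 and AVX2 versions typically vectorize.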
  6. 14 Feb, 2018 3 commits
  7. 13 Feb, 2018 3 commits
  8. 12 Feb, 2018 2 commits
    • [Normative lv_map] Let default scan be zig-zag · 28ba7fc7
      Angie Chiang authored
      The PSNR change is neutral.
      
      BUG=aomedia:1369
      
      Change-Id: Iade17a19580eed788338d5933423ea0235316952
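As a reminder of what a zig-zag scan looks like, here is a minimal sketch that generates the classic zig-zag order for a square n x n block of coefficients. It is an assumption that this matches the traversal direction used in the commit; the function name and the raster-index output format are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: fill scan[0 .. n*n-1] with raster indices
 * (row * n + col) in classic zig-zag order. Not the codec's own table. */
static void zigzag_scan_sketch(int n, int *scan) {
  assert(n > 0);
  int idx = 0;
  /* Walk the anti-diagonals d = row + col, alternating direction. */
  for (int d = 0; d <= 2 * (n - 1); ++d) {
    const int r_min = d < n ? 0 : d - (n - 1);
    const int r_max = d < n ? d : n - 1;
    if (d & 1) {
      /* Odd diagonals run from top-right to bottom-left. */
      for (int r = r_min; r <= r_max; ++r) scan[idx++] = r * n + (d - r);
    } else {
      /* Even diagonals run from bottom-left to top-right. */
      for (int r = r_max; r >= r_min; --r) scan[idx++] = r * n + (d - r);
    }
  }
}
```

For a 4x4 block this visits raster positions 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15.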
    • Add inv txfm2d sse2 for sizes with 4 · 18976fa5
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_4x4_sse2
      Implement av1_lowbd_inv_txfm2d_add_4x8_sse2
      Implement av1_lowbd_inv_txfm2d_add_8x4_sse2
      Implement av1_lowbd_inv_txfm2d_add_4x16_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x4_sse2
      
      A brief speed test shows that, with the SSE2 functions completed by
      this CL, the speed1 lowbitdepth encoder speeds up by >9% and the
      lowbitdepth decoder by >25%, compared to the highbitdepth
      implementation in the baseline.
      
      Change-Id: I0576a2a146c0b1a7b483c9d35c3d21d979e263cd
  9. 11 Feb, 2018 2 commits
    • Add inv txfm2d 64 sse2 · a7ba23f6
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_32x64_sse2
      Implement av1_lowbd_inv_txfm2d_add_64x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_16x64_sse2
      Implement av1_lowbd_inv_txfm2d_add_64x16_sse2
      
      Change-Id: I1b27618f153583cc787e7bf6ef1616e7c6932990
    • Add inv txfm2d {8x32,32x8} sse2 · 008c6430
      Peng Bin authored
      Implement av1_lowbd_inv_txfm2d_add_8x32_sse2
      Implement av1_lowbd_inv_txfm2d_add_32x8_sse2
      
      Change-Id: Ibd5de72e1d2c4dabba5af020a06e8cfac329dc3d
  10. 10 Feb, 2018 3 commits
    • test: apply clang-format v5.0.0 · f152ff6a
      Johann authored
      Change-Id: Iee91f5f6314c43556791850db19687ccac14c8be
    • Inv_txfm unittest cosmetic changes · e193a70d
      Peng Bin authored
      Change-Id: I51df5a53bfa97ce69e6820e67df2cece3c0d5be5
    • Reorganize code to test various convolve options · e820b820
      Debargha Mukherjee authored
      Reorganize code to facilitate setting rounding parameters based
      on bit-depth, and to facilitate testing.
      
      After this patch, this will be the behavior of the config flags as far
      as round_0 and round_1 choices are concerned for 8- and 10-bit:
      
      0. CONFIG_LOWPRECISION_BLEND=0 CONFIG_HIGHPRECISION_INTBUF=0:
      round_0 = 5, round_1 = None (baseline)
      
      1. CONFIG_LOWPRECISION_BLEND=0 CONFIG_HIGHPRECISION_INTBUF=1:
      round_0 = 3, round_1 = None (to test impact of increase in precision
      of intermediate buffer)
      
      2. CONFIG_LOWPRECISION_BLEND=1 CONFIG_HIGHPRECISION_INTBUF=0:
      round_0 = 5, round_1 = 4
      
      3. CONFIG_LOWPRECISION_BLEND=2 CONFIG_HIGHPRECISION_INTBUF=0:
      round_0 = 5, round_1 = 5
      
      4. CONFIG_LOWPRECISION_BLEND=1 CONFIG_HIGHPRECISION_INTBUF=1:
      round_0 = 3, round_1 = 6 (ARM proposal except clipping)
      
      5. CONFIG_LOWPRECISION_BLEND=2 CONFIG_HIGHPRECISION_INTBUF=1:
      round_0 = 3, round_1 = 7 (Google variation proposal)
      
      Change-Id: I615348332f5692135352085ca923662f9d52f696
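The six flag combinations listed in this commit message can be summarized as a small lookup table. The sketch below only restates the values from the message for 8- and 10-bit; the type name, function name, and the ROUND_NONE sentinel are hypothetical and do not come from the codebase.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel for "round_1 = None" in the commit message. */
#define ROUND_NONE (-1)

typedef struct {
  int round_0;
  int round_1;
} ConvRoundSketch;

/* Illustrative mapping from the two config flags to (round_0, round_1),
 * copied from the list in the commit message (8- and 10-bit only). */
static ConvRoundSketch conv_round_sketch(int lowprecision_blend,
                                         bool highprecision_intbuf) {
  assert(lowprecision_blend >= 0 && lowprecision_blend <= 2);
  /* Rows: CONFIG_LOWPRECISION_BLEND = 0, 1, 2.
   * Columns: CONFIG_HIGHPRECISION_INTBUF = 0, 1. */
  static const ConvRoundSketch table[3][2] = {
    { { 5, ROUND_NONE }, { 3, ROUND_NONE } },
    { { 5, 4 }, { 3, 6 } },
    { { 5, 5 }, { 3, 7 } },
  };
  return table[lowprecision_blend][highprecision_intbuf ? 1 : 0];
}
```

Read this way, enabling the high-precision intermediate buffer lowers round_0 from 5 to 3 and, whenever round_1 is in use, raises round_1 by 2, so the total shift is unchanged.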
  11. 09 Feb, 2018 9 commits
  12. 08 Feb, 2018 8 commits