  1. 16 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of cfl_build_prediction_hbd · c363ab76
      David Michael Barr authored
      Includes unit tests for conformance and speed.
      
      SSSE3/CFLPredictHBDTest:
      4x4: C time = 1436 us, SIMD time = 358 us (~4x)
      8x8: C time = 4821 us, SIMD time = 598 us (~8.1x)
      16x16: C time = 18528 us, SIMD time = 1793 us (~10x)
      32x32: C time = 72998 us, SIMD time = 6400 us (~11x)
      
      AVX2/CFLPredictHBDTest:
      4x4: C time = 1436 us, SIMD time = 398 us (~3.6x)
      8x8: C time = 4924 us, SIMD time = 644 us (~7.6x)
      16x16: C time = 18624 us, SIMD time = 1617 us (~12x)
      32x32: C time = 73509 us, SIMD time = 3635 us (~20x)
      
      Change-Id: Icbcfefbf165facdbd77c9b3861af2bbf464254a0
  2. 11 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of cfl_build_prediction_lbd · 16f38c2c
      David Michael Barr authored
      Includes unit tests for conformance and speed.
      
      SSSE3/CFLPredictTest:
      4x4: C time = 2063 us, SIMD time = 313 us (~6.6x)
      8x8: C time = 6656 us, SIMD time = 493 us (~14x)
      16x16: C time = 24970 us, SIMD time = 1327 us (~19x)
      32x32: C time = 59020 us, SIMD time = 5178 us (~11x)
      
      AVX2/CFLPredictTest:
      4x4: C time = 2052 us, SIMD time = 333 us (~6.2x)
      8x8: C time = 6712 us, SIMD time = 513 us (~13x)
      16x16: C time = 25292 us, SIMD time = 1023 us (~25x)
      32x32: C time = 58994 us, SIMD time = 2828 us (~21x)
      
      Change-Id: I08690a548be981ff10e184de468b9e0e691ee812
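
For orientation, here is a scalar sketch of the kind of computation such a prediction kernel vectorizes. It assumes the Q3 alpha and Q3 zero-mean luma conventions described by the fixed-point commits in this log, an 8-bit output, and a DC value in Q0. All function names are hypothetical; this is not the libaom source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide by 2^n, rounding to nearest, symmetric about 0. */
static int round_pow2_signed(int value, int n) {
  const int half = (1 << n) >> 1;
  return value < 0 ? -((-value + half) >> n) : ((value + half) >> n);
}

/* Clamp an integer to the 8-bit pixel range. */
static uint8_t clip_pixel8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* ac_q3:    zero-mean subsampled luma in Q3 (assumed layout: width * height)
 * alpha_q3: scaling parameter in Q3
 * dc:       chroma DC prediction in Q0 */
static void cfl_predict_lbd_sketch(const int16_t *ac_q3, uint8_t *dst,
                                   int dst_stride, int alpha_q3, int dc,
                                   int width, int height) {
  assert(width > 0 && height > 0);
  for (int j = 0; j < height; j++) {
    for (int i = 0; i < width; i++) {
      /* Q3 * Q3 gives Q6; round back to Q0 before adding the DC. */
      const int scaled_luma_q6 = alpha_q3 * ac_q3[j * width + i];
      dst[j * dst_stride + i] =
          clip_pixel8(dc + round_pow2_signed(scaled_luma_q6, 6));
    }
  }
}
```

With alpha_q3 = 8 (that is, 1.0) and dc = 128, a luma term of 64 in Q3 (8.0) predicts 136, and a term of -64 predicts 120.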
  3. 08 Jan, 2018 1 commit
    • [CFL] SSSE3/AVX2 versions of luma_subsampling_420_lbd · 9bd42785
      Luc Trudeau authored
      Includes unit tests for conformance and speed.
      
      SSSE3/SubsampleSpeedTest:
      4x4: C time = 868 us, SIMD time = 200 us (~4.3x)
      8x8: C time = 3054 us, SIMD time = 293 us (~10x)
      16x16: C time = 11887 us, SIMD time = 760 us (~16x)
      
      AVX2/SubsampleSpeedTest:
      4x4: C time = 784 us, SIMD time = 205 us (~3.8x)
      8x8: C time = 2774 us, SIMD time = 307 us (~9x)
      16x16: C time = 10978 us, SIMD time = 489 us (~22x)
      
      Change-Id: I7d5958097542599d57d1a9f9a0a1b809c6a345b0
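
As a point of reference, a scalar 4:2:0 luma subsampling loop could look like the sketch below. It assumes the output is kept in Q3 (the sum of four pixels times two equals their average times eight), which matches the Q3 convention mentioned elsewhere in this log. The function name and buffer layout are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce each 2x2 block of 8-bit luma to one Q3 value.
 * sum of 4 pixels << 1 == (average of 4 pixels) * 8, i.e. the average in Q3. */
static void luma_subsample_420_sketch(const uint8_t *luma, int luma_stride,
                                      int16_t *out_q3, int out_stride,
                                      int width, int height) {
  assert((width & 1) == 0 && (height & 1) == 0);
  for (int j = 0; j < height; j += 2) {
    const uint8_t *top = luma + j * luma_stride;
    const uint8_t *bot = top + luma_stride;
    for (int i = 0; i < width; i += 2) {
      const int sum = top[i] + top[i + 1] + bot[i] + bot[i + 1];
      out_q3[(j >> 1) * out_stride + (i >> 1)] = (int16_t)(sum << 1);
    }
  }
}
```

Keeping the result in Q3 avoids a division and loses no precision, since the 2x2 average has at most two fractional bits.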
  4. 19 Dec, 2017 1 commit
    • [CFL] Cache DC_PRED during CfL-RDO · 467205ac
      Luc Trudeau authored
      By default, the DC_PRED is not cached (this includes
      decoding). During cfl_rd_pick_alpha(), DC_PRED caching
      is enabled: the DC_PRED is cached the first time it is
      computed (for each plane) and is then reused when
      testing all the other scaling parameters.
      
      Change-Id: Ie8ba0bb0427c4d9be8de5b44e6330e8a78b9c7d9
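
The caching pattern described above can be sketched as follows. Every name here (the struct, its fields, the stand-in DC computation) is hypothetical and chosen for illustration; only the behavior, caching off by default and on during the alpha search, is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PLANES 2 /* one entry each for Cb and Cr */

typedef struct {
  bool use_dc_pred_cache;                /* off by default, on during the search */
  bool dc_pred_is_cached[SKETCH_PLANES];
  int dc_pred_cache[SKETCH_PLANES];
  int compute_calls;                     /* instrumentation for this sketch only */
} CflCacheSketch;

/* Stand-in for the real (expensive) DC prediction. */
static int compute_dc_pred(CflCacheSketch *ctx, int plane) {
  ctx->compute_calls++;
  return 100 + plane;
}

static int get_dc_pred(CflCacheSketch *ctx, int plane) {
  assert(plane >= 0 && plane < SKETCH_PLANES);
  if (ctx->use_dc_pred_cache && ctx->dc_pred_is_cached[plane])
    return ctx->dc_pred_cache[plane];
  const int dc = compute_dc_pred(ctx, plane);
  if (ctx->use_dc_pred_cache) {
    ctx->dc_pred_cache[plane] = dc;
    ctx->dc_pred_is_cached[plane] = true;
  }
  return dc;
}

/* Try num_alphas candidate scaling parameters with caching enabled. */
static void search_alphas_sketch(CflCacheSketch *ctx, int num_alphas) {
  ctx->use_dc_pred_cache = true;
  for (int p = 0; p < SKETCH_PLANES; p++) ctx->dc_pred_is_cached[p] = false;
  for (int a = 0; a < num_alphas; a++)
    for (int p = 0; p < SKETCH_PLANES; p++) (void)get_dc_pred(ctx, p);
  /* Restore the default so decoding-style calls never see stale values. */
  ctx->use_dc_pred_cache = false;
  for (int p = 0; p < SKETCH_PLANES; p++) ctx->dc_pred_is_cached[p] = false;
}
```

In this sketch, searching 16 candidates computes the DC prediction once per plane instead of 16 times per plane; outside the search, every call recomputes it.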
  5. 15 Dec, 2017 1 commit
  6. 13 Dec, 2017 2 commits
  7. 05 Dec, 2017 1 commit
  8. 09 Nov, 2017 1 commit
  9. 28 Sep, 2017 2 commits
  10. 27 Sep, 2017 1 commit
  11. 26 Sep, 2017 1 commit
  12. 19 Sep, 2017 1 commit
    • [CFL] Refactor includes · 89ff793b
      Luc Trudeau authored
      The cfl_init function is moved out of cfl.h simplifying the includes and
      removing the need for forward declarations.
      
      Change-Id: I47312b25410b718a830b001391e386647005d57e
  13. 13 Sep, 2017 1 commit
    • [CFL] Fix typedef-redefinition compiler warnings · 5b2021ea
      David Michael Barr authored
      Instead of forward-declaring AV1_COMMON and MACROBLOCKD,
      move the dependent struct and function prototype closer
      to where they are used and after these types are defined.
      
      Change-Id: I75f005b46ef322a6fcbc01377b8dded1637c5f73
  14. 31 Aug, 2017 1 commit
    • [CFL] Asserts for chroma_sub8x8 · c84c21c4
      Luc Trudeau authored
      When Chroma from Luma is combined with chroma_sub8x8, the prediction
      used for sub8x8 blocks originates from multiple luma blocks. Extra
      asserts are added to validate that the prediction buffer contains all
      the required information.
      
      Change-Id: I305c46ce9b8292697e1d5b181d123461026da11c
  15. 30 Aug, 2017 1 commit
    • [CFL] Fixed negative rounding in scaled_luma · 9c0e9eac
      Luc Trudeau authored
      Since the scaled luma can be negative, ROUND_POWER_OF_TWO_SIGNED must be used.
      This changes the behavior from rounding toward -infinity to rounding towards 0.
      
      Results for Subset1 (compared with 35545dd5 with CfL enabled)
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      0.0082 | -0.1061 | -0.0119 |  -0.0126 | -0.0011 | -0.0121 |     0.0094
      
      Change-Id: Ie7258a17a199368339d4794fba6b5916e607c95b
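
To see why a sign-aware rounding helper matters for negative inputs, consider the sketch below. The two helpers are hypothetical and only modelled on the macro name mentioned above; libaom's actual definitions may differ. The sketch also assumes that `>>` on a negative int is an arithmetic shift, which holds for GCC, Clang and MSVC but is implementation-defined in ISO C.

```c
#include <assert.h>

/* Add half, then shift. On negative inputs the shift floors, so the result
 * is not the mirror image of the result for the corresponding positive input.
 * Assumes arithmetic right shift of negative values. */
static int round_pow2(int value, int n) {
  return (value + ((1 << n) >> 1)) >> n;
}

/* Round the magnitude, then restore the sign, so that
 * round_pow2_signed(-x, n) == -round_pow2_signed(x, n) for every x. */
static int round_pow2_signed(int value, int n) {
  return value < 0 ? -round_pow2(-value, n) : round_pow2(value, n);
}
```

For example, 96 in Q6 (1.5) becomes 2 with either helper, while -96 (-1.5) becomes -1 with the plain helper and -2 with the sign-aware one.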
  16. 28 Aug, 2017 1 commit
    • [CFL] Move store flag to CFL_CTX · fcca37a4
      Luc Trudeau authored
      With recent changes, it is now possible to store the storage
      flag inside the CFL_CTX. This simplifies the implementation
      and will allow reuse in the decoder.
      
      This change does not alter the bitstream.
      
      Change-Id: Ibb8aebdd3d06f8765d40248ece8a038892e87032
  17. 17 Aug, 2017 1 commit
    • [CFL] Move CFL cost table to struct macroblock · 38e560cc
      David Michael Barr authored
      Also, move body of update_cfl_costs() to av1_fill_mode_rates().
      
      Results on Subset1 (Compared to 1cfe474b with CFL enabled)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      No change in bitstream, for an average encode speed-up of 2.3%.
      
      Change-Id: I3948abcd70cfecad8086edfe4c45552b576ae06f
  18. 29 Jul, 2017 1 commit
    • [CFL] Uniform Q3 alpha grid with extent [-2, 2] · f6eaa159
      David Michael Barr authored
      Expand the range of alpha to [-2, 2] in Q3.
      Jointly signal the signs, including zeros.
      Use the signs to give context for each quadrant
      and half-axis. The (0, 0) point is excluded.
      Symmetry in alpha_u == alpha_v yields 6 contexts.
      
      Results on Subset1 (Compared to 9136ab7d with CFL enabled)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0792 | -0.7535 | -0.7574 |  -0.0639 | -0.0843 | -0.0665 |    -0.3324
      
      Change-Id: I250369692e92a91d9c8d174a203d441217d15063
      Signed-off-by: David Michael Barr <b@rr-dav.id.au>
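
One way to read the joint sign signaling is sketched below: each of alpha_u and alpha_v has a sign in {zero, negative, positive}, the (zero, zero) pair is excluded, and the remaining 3 * 3 - 1 = 8 pairs are coded as a single symbol. The context function shows one plausible reading of the "6 contexts": a plane with a non-zero sign (2 choices) is conditioned on the other plane's sign (3 choices), and the same mapping is shared by both planes. The enum values, function names and index mapping are assumptions made for illustration, not necessarily the mapping libaom uses.

```c
#include <assert.h>

enum { SIGN_ZERO = 0, SIGN_NEG = 1, SIGN_POS = 2, NUM_SIGNS = 3 };
enum { NUM_JOINT_SIGNS = NUM_SIGNS * NUM_SIGNS - 1 }; /* (0, 0) excluded: 8 */
enum { NUM_ALPHA_CONTEXTS = (NUM_SIGNS - 1) * NUM_SIGNS }; /* 2 * 3 = 6 */

/* Pack the two signs into one symbol in [0, 7]; (zero, zero) is not allowed. */
static int joint_sign_encode(int sign_u, int sign_v) {
  assert(!(sign_u == SIGN_ZERO && sign_v == SIGN_ZERO));
  return sign_u * NUM_SIGNS + sign_v - 1;
}

static int joint_sign_u(int joint_sign) { return (joint_sign + 1) / NUM_SIGNS; }
static int joint_sign_v(int joint_sign) { return (joint_sign + 1) % NUM_SIGNS; }

/* Context for coding the magnitude of a plane whose sign is non-zero.
 * Call with (sign_u, sign_v) for U and (sign_v, sign_u) for V. */
static int alpha_context(int own_sign, int other_sign) {
  assert(own_sign != SIGN_ZERO);
  return (own_sign - 1) * NUM_SIGNS + other_sign;
}
```

Because a zero sign implies a zero magnitude, no magnitude needs to be coded for that plane, which is why the context is only defined for non-zero signs in this sketch.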
  19. 28 Jul, 2017 1 commit
    • [CFL] New UV_PREDICTION_MODE for CFL · 6e1cd787
      Luc Trudeau authored
      CfL is now an independent mode.
      
      Results on Subset1 (Compared to 4266a7ed with CFL enabled)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1645 | -0.4017 |  0.2475 |  -0.1851 | -0.2179 | -0.2338 |    -0.2897
      
      Change-Id: I2e86e7ea7bfc12bb1d763e70a136ca992d57a3c5
  20. 11 Jul, 2017 1 commit
    • [CFL] Convert cfl_alpha to q3 · 4e81d929
      Luc Trudeau authored
      Alpha's biggest fraction is 1/8, so Q3 does not change the bitstream.
      
      Results on Subset1 (compared to 503aca74 with CfL enabled)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: I1fe5b2ace97179d5f950d7406a4f3d391924f89d
  21. 10 Jul, 2017 1 commit
    • [CFL] Q0 DC_Pred · 7651b739
      Luc Trudeau authored
      The block level DC_PRED computed by CfL goes down from Q6 to Q0. This
      allows reuse of the existing assembly for DC_PRED and also reduces the
      requirements on the multiply needed to scale the reconstructed luma
      values.
      
      Results on Subset1 (compared to f9684d222 with CfL enabled)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0347 |  0.0229 | -0.1326 |  -0.0420 | -0.0057 | -0.0072 |    -0.0644
      
      Change-Id: I6ba82cc9e04fa4ab7c8ec40a7856deb273881748
  22. 06 Jul, 2017 4 commits
    • [CFL] Fewer bits for fixed point · 475fc9df
      Luc Trudeau authored
      Since alpha is Q3, we reduce y_average from Q10 to Q3. As such, the
      prediction is reduced from Q13 to Q6. Chroma dc_pred is reduced from Q7
      to Q6 in order to match with the prediction.
      
      Results on Subset1 (compared to 209de2e5b with CfL enabled)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0010 |  0.0176 | -0.0538 |  -0.0043 | 0.0027 | -0.0097 |    -0.0018
      
      Change-Id: Ib7dd3968a764e0380ddc0ad2333ebacf1e9699cd
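
The Q-format bookkeeping in this commit can be written out as a short worked equation. It assumes that a value x in Qn is stored as the integer x * 2^n, and that the luma term being scaled (written here as a hypothetical symbol \tilde{y}) is kept in the same Q format as y_average.

```latex
\begin{aligned}
\text{before:}\quad & \alpha_{Q3}\,\tilde{y}_{Q10}
   = (2^{3}\alpha)\,(2^{10}\tilde{y}) = 2^{13}\,\alpha\tilde{y}
   \quad\Rightarrow\quad \text{prediction in Q13} \\
\text{after:}\quad & \alpha_{Q3}\,\tilde{y}_{Q3}
   = (2^{3}\alpha)\,(2^{3}\tilde{y}) = 2^{6}\,\alpha\tilde{y}
   \quad\Rightarrow\quad \text{prediction in Q6}
\end{aligned}
```

Fractional bits add under multiplication, so the product drops from 13 to 6 fractional bits, and the chroma dc_pred has to be in Q6 as well before the two can be added.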
    • [CFL] Convert dc_pred to fixed point · 2e6cb7e7
      Luc Trudeau authored
      The dc_pred values stored in the CfL context are in Q8.7 (worst case,
      the smallest fraction is 1/128).
      
      Results on Subset1 (compared to f9684d222 with CfL enabled)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0118 | -0.0181 | -0.0109 |   0.0086 | 0.0086 |  0.0196 |     0.0018
      
      Change-Id: I0701e04fb76f03eff12ed01fd5fda675fbb15e32
    • [CFL] Fixed point implementation for tx average · bfe2827b
      Luc Trudeau authored
      This change does not impact the bitstream, as no loss is incurred by using
      a fixed point value for the transform size average.
      
      For low bit depth, the transform size average is stored using Q8.10
      fixed point format. Worst case, smallest fraction is 1/1024.
      
      Results on Subset1 (Compared to 366b74 with CfL)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: Ia5b046b92a0e4c40e413b16af3394bdc0a8c8cd9
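
The losslessness claim can be checked with a small sketch. It assumes 8-bit pixels and a block whose pixel count is a power of two no larger than 1024 (for example 32x32), which is consistent with the 1/1024 worst case stated above. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define AVG_FRAC_BITS 10 /* Q8.10: 8 integer bits, 10 fractional bits */

/* Average of an 8-bit block in Q8.10. Because num_pix is a power of two that
 * divides 1024, the division is exact and no precision is lost. */
static int32_t block_average_q10(const uint8_t *pix, int stride,
                                 int width, int height) {
  const int num_pix = width * height;
  assert(num_pix > 0 && num_pix <= 1024 && (num_pix & (num_pix - 1)) == 0);
  int32_t sum = 0;
  for (int j = 0; j < height; j++)
    for (int i = 0; i < width; i++) sum += pix[j * stride + i];
  /* sum <= 255 * 1024, so the shifted value still fits in 32 bits. */
  return (sum << AVG_FRAC_BITS) / num_pix;
}
```

For a 2x2 block holding 1, 2, 3 and 5, the average is 2.75, stored as 2816 in Q8.10.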
    • [CFL] Compute Average Over TX Block Instead of Pred Block · 03678940
      Luc Trudeau authored
      When computing alpha, multiple averages are computed, one for each
      transform block. The CfL prediction now uses the transform block average
      instead of partition block average.
      
      This allows the decoder to build the CfL prediction by using only the
      collocated reconstructed luma values for the current transform size and
      not the entire partition.
      
      Results on Subset 1 (Compared to 0e81b97c with CfL)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0180 |  0.2627 |  0.2274 |   0.0233 | 0.0301 |  0.0312 |     0.1506
      
      A small regression is expected; this change was made to simplify
      hardware implementations.
      
      Change-Id: Ib2ce2a3053b85300c5c62ef0e3270af489568a38
  23. 03 Jul, 2017 1 commit
    • [CFL] Adjust Pixel Buffer for Chroma Sub8x8 · 780d249d
      Luc Trudeau authored
      Adjust row and col offset for sub8x8 blocks to allow the CfL prediction
      to use all available reconstructed luma pixels.
      
      Results on Subset 1 (Compared to b03c2f44 with CfL)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1355 | -0.8517 | -0.4481 |  -0.0579 | -0.0237 | -0.0203 |    -0.2765
      
      Change-Id: Ia91f0a078f0ff4f28bb2d272b096f579e0d04dac
  24. 29 Jun, 2017 1 commit
    • [CFL] Better encapsulation · 3dc55e0f
      Luc Trudeau authored
      The function cfl_compute_parameters is added and contains the logic
      related to building the CfL context parameters. As such, many cfl
      functions can now be encapsulated inside of cfl.c and not exposed to the
      rest of AV1.
      
      This also allows for supplemental asserts that validate that the CfL
      context is properly built.
      
      Results on Subset1 (compared to 9c6f8547 with CfL)
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: I6d14a426416b3af5491bdc145db7281b5e988cae
  25. 20 Jun, 2017 2 commits
  26. 19 Jun, 2017 1 commit
    • [CFL] Compute Luma Average Over Partition Unit · 3e18e4ae
      Luc Trudeau authored
      Extract the computation of the reconstructed luma average out of cfl_load
      and into cfl_compute_average. The reconstructed luma average is stored
      in the CFL_CONTEXT to avoid computing it for each transform block and
      for each plane.
      
      Results on subset1 (compared to 803bea26 with CfL)
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0474 | -0.1486 | -0.2931 |  -0.0358 | -0.0397 | -0.0127 |    -0.1162
      
      Change-Id: I9e34af0fe5961ce8dbe70cb80aea2a16221d0d92
  27. 14 Jun, 2017 1 commit
    • [CFL] Fix always use partition size to compute DC_PRED · 803bea26
      Luc Trudeau authored
      plane_bsize is now computed properly. This also includes support for the
      special case of blocks < 4x4.
      
      Results on subset1 (compared to 8e689e4b with CfL)
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0218 | -0.2328 | -0.2555 |  -0.0230 | -0.0379 | -0.0723 |    -0.1205
      
      Change-Id: I6ec87d818d8df6a40ecf3bb1b86954e59c952930
  28. 06 Jun, 2017 1 commit
    • [CFL] Get subsampling from AV1 common · dac5e391
      Luc Trudeau authored
      This change does not impact the bitstream
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0000 |  0.0000 |  0.0000 |   0.0000 | 0.0000 |  0.0000 |     0.0000
      
      Change-Id: I6e131e91bad5efa345ed2542ae970eb6122eff51
  29. 24 May, 2017 1 commit
  30. 18 May, 2017 1 commit
    • [CFL] Add cfl_update_cost function · 1a47430b
      Luc Trudeau authored
      Encapsulates the logic to update the rate of each CfL codeword.
      The if statements are removed from the loop and the arrays are
      stored in CFL_CTX instead of being declared every time.
      
      Change-Id: I0cb208b14e6c6a888210dd33c5e8fe8d74dd87f4
  31. 12 May, 2017 1 commit
    • [CFL] move cfl_idx_to_alpha to header · 04120190
      Luc Trudeau authored
      Move cfl_idx_to_alpha into the header to facilitate inlining.
      Remove the MB_MODE_INFO forward declaration.
      
      Change-Id: Id33fb0228d88b6285252843e2345a0d3ae875cd2
  32. 09 May, 2017 1 commit
  33. 08 May, 2017 1 commit
    • [CFL] Change cfl_load to use width and height · 30596fb2
      Luc Trudeau authored
      Since the size used with cfl_load can be based on either the transform
      block size or the prediction block size, width and height are used as
      parameters instead of TX_SIZE.
      
      This resolves a problem where cfl_compute_alpha_ind was reading
      uninitialized memory.
      
      Change-Id: I187dbdd5b2e8bd85e82bb77eb74859bee2cd3f1e
  34. 05 May, 2017 1 commit
    • [CFL] Alpha signaling · f533400a
      Luc Trudeau authored
      Writes and reads alpha to and from the bitstream.
      
      A special case is needed on the encoder side to handle prediction block
      skips. Since whether or not a prediction block is skipped is not known
      during CfL, a rollback is required if the block was skipped and the alpha
      index was not zero. The advantage of this is that no signaling is required
      when the prediction block is skipped, as it is assumed that the alpha index
      is zero.
      
      An encode facade is added to the intra prediction facade, as CfL requires
      special encoder side operations.
      
      Change-Id: Ic3b11d0fdbd51389d862112eb09d8785127a6b06