1. 15 Aug, 2017 8 commits
    • Disable only coding transform SIMD for DAALA_TX · 1d190950
      Monty Montgomery authored
      Rather than disabling MMX (well, all of SIMD) for daala transforms,
      selectively disable the AV1 TX SIMD through
      av1/common/av1_rtcd_defs.pl
      
      This also requires quite a few testing build fixups.
      
      Change-Id: I689eaafbdd3a87e3a8eeef97412a1846ef886055
    • Add 4-point DST to DAALA_DCT4 experiment · 573cf25f
      Monty Montgomery authored
      CONFIG_DAALA_DCT4 currently force-enables CONFIG_DCT_ONLY due to a
      missing 4-point DST.  The DST had not been included because it was a
      significant coding performance loss; this turned out to be a bug that
      has since been corrected.
      
      This patch adds a 4-point type IV DST to the DAALA_DCT4 experiment.
      There is a small coding performance loss in using the type IV over
      AV1's current type VII.
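
For reference, the textbook orthonormal type IV DST of length N maps x to X_k = sqrt(2/N) * sum_n x_n * sin(pi/N * (n + 1/2) * (k + 1/2)). The sketch below builds that matrix in floating point for N = 4 and checks that it is its own inverse. It is only the textbook definition, not the integer Daala implementation this patch adds; the names `dst_iv_matrix` and `apply_matrix` are made up for illustration.

```python
import math

def dst_iv_matrix(n):
    """Textbook orthonormal DST-IV basis (row k, column i), floating point."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.sin(math.pi / n * (i + 0.5) * (k + 0.5))
             for i in range(n)]
            for k in range(n)]

def apply_matrix(matrix, x):
    """Multiply a square matrix by a vector."""
    return [sum(row[i] * x[i] for i in range(len(x))) for row in matrix]

M4 = dst_iv_matrix(4)

# The DST-IV matrix is symmetric and orthonormal, so applying it twice
# returns the input (up to floating-point rounding).
x = [1.0, -2.0, 3.0, 0.5]
roundtrip = apply_matrix(M4, apply_matrix(M4, x))
```
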
      
      subset-1:
         monty-newdst4test-baseline-s1-F@2017-07-29T04:58:43.976Z ->
            monty-newdst4test-daala-s1-F@2017-07-29T04:59:56.094Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0336 |  0.1393 |  0.0491 |   0.4118 | -0.0439 |  0.2084 |     0.0476
      
      objective-1-fast:
         monty-newdst4test-baseline-o1f-F@2017-07-29T04:58:10.439Z ->
            monty-newdst4test-daala-o1f-F@2017-07-29T04:59:04.678Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      0.0064 |  0.1071 | -0.0108 |   0.1133 | -0.0035 |  0.0765 |     0.0502
      
      Change-Id: Ie29835edbe0e41bc86f4b09457e88d924cc9bf7e
    • Add ext-comp-refs dependency on ext-refs in configure · b9cfa415
      Zoe Liu authored
      This will remove the compilation failure for the weekly run on speed
      checking.
      
      Change-Id: Idf688c7e4c6fcb4c5aabef68b0e9f68996cd9a12
    • Add CONFIG_DAALA_DCT64 experiment. · a4e245a9
      Monty Montgomery authored
      This experiment replaces the 64-point Type-II DCT and related
      scaling vp9 transforms with the 64-point orthonormal
      Daala transforms.
      
      subset-1:
      
          monty-square-baseline-s1-F2@2017-07-28T03:35:45.962Z ->
            monty-square-dct64-s1-F2@2017-07-29T04:50:58.412Z
      
             PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
          -0.1930 | -0.2037 | -0.0643 |  -0.1917 | -0.2331 | -0.3510 |    -0.1810
      
      objective-1-fast:
      
          monty-square-baseline-o1f-F2@2017-07-28T03:35:35.533Z ->
            monty-square-dct64-o1f-F2@2017-07-29T04:50:28.542Z
      
             PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
          -0.2557 | -0.1743 | -0.4900 |  -0.3028 | -0.4147 | -0.5764 |    -0.2864
      
      Change-Id: I1f944df29e44d2e350c42555af274f2d75a62a92
    • Remove ALT_INTRA flag. · 93b543ab
      Urvang Joshi authored
      This experiment has been adopted as it has been cleared by Tapas.
      
      Change-Id: I0682face60f62dd43091efa0a92d09d846396850
    • Remove left-over lines about PALETTE from configure. · 9f262c5b
      Urvang Joshi authored
      Change-Id: I6b529f8aac561c746bf2805e601931f982bdbb88
    • AOM_QM: enable by default · 181fc08f
      Thomas Davies authored
      No change to metrics, as quantization matrices are not used
      unless --enable-qm=1 is set on the command line.
      
      Fix no highbitdepth compilation, and fix compile errors and
      warnings for PVQ and NEW_QUANT experiments.
      
      Change-Id: I49aceb5acf6ca6790c81e760e5b208788f87086d
    • Add CONFIG_DAALA_DCT32 experiment. · 2cb52baf
      Monty Montgomery authored
      This experiment replaces the 32-point Type-II DCT and 32-point
      Type-IV DST scaling vp9 transforms with the 32-point orthonormal
      Daala transforms.
      
      subset-1:
      
          monty-square-baseline-s1-F3@2017-08-02T11:50:51.375Z ->
            monty-square-dct32-s1-F3@2017-08-02T11:50:18.859Z
      
            PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
          0.0000 |  0.0115 | -0.1044 |  -0.0185 | -0.0069 | -0.0603 |     0.0555
      
      objective-1-fast (4 frames):
      
          monty-square-baseline-o1f-F3-l4-fine@2017-08-12T02:18:05.560Z ->
            monty-square-dct32-o1f-F3-l4-fine@2017-08-12T02:19:44.461Z
      
            PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
         -0.0269 | -0.0715 |     N/A |  -0.0547 | -0.0268 | -0.0590 |        N/A
      
      Change-Id: Ib1bad991d82eb67956e94a6216298a84e908b169
  2. 11 Aug, 2017 1 commit
    • Add experiment CONFIG_CDEF_SINGLEPASS: Make CDEF single pass · 5978212b
      Steinar Midtskogen authored
      Low latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.3162 | -0.6719 | -0.6535 |   0.0089 | -0.3890 | -0.1515 |    -0.6682
      
      High latency, cpu-used=0:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0293 | -0.3556 | -0.5505 |   0.0684 | -0.0862 |  0.0513 |    -0.2765
      
      Low latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.2248 | -0.7764 | -0.6630 |  -0.2109 | -0.3240 | -0.2532 |    -0.6980
      
      High latency, cpu-used=4:
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1118 | -0.5841 | -0.7406 |  -0.0463 | -0.2442 | -0.1064 |    -0.4187
      
      Change-Id: I9ca8399c8f45489541a66f535fb3d771eb1d59ab
  3. 10 Aug, 2017 1 commit
    • Remove PALETTE flag · c6300aa1
      Urvang Joshi authored
      This experiment is now adopted as it was cleared by Tapas.
      
      Note: Palette use can still be controlled by command-line option
      "--tune-content=..." in 'aomenc'.
      
      Change-Id: I832f49f20f60c34bdef5b424755849c496687e87
  4. 09 Aug, 2017 1 commit
  5. 08 Aug, 2017 1 commit
  6. 04 Aug, 2017 3 commits
    • ext-partition-types: Add 64x16 and 16x64 bsizes · 72678577
      Rupert Swarbrick authored
      Change-Id: I0c3772110e9fa62ac687bd99e290b5006bf3bd6c
    • dead code removal: error concealment. · 28c628d9
      Tom Finegan authored
      Change-Id: I5d8615b585f3c4da6af1c1bfd073bdea94ac9df0
    • New experiment, CDEF-DIST · c49177e4
      Yushin Cho authored
      The distortion metric currently used for CDEF is also used as the
      luma distortion during RDO-based mode decision.
      
      This experiment builds on top of the 'dist-8x8' experiment.
      
      The BD-Rate change by this experiment for three frames of
      objective-1-fast in AWCY is:
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      1.1589 | -2.0036 | -1.9620 |  -0.0076 | -1.4145 | -1.4561 |    -0.6410
      
      Change-Id: I1142fe2f186f4ed86e4d33468e00b84e30b20233
  7. 03 Aug, 2017 1 commit
  8. 02 Aug, 2017 3 commits
    • Add txmg experiment · ad653a39
      Angie Chiang authored
      This experiment aims at merging lbd/hbd txfms
      
      So far this exp uses hbd transform on lbd path.
      The performances I observed are
      lowres -0.089%
      midres  0.065%
      (negative means performance drop)
      
      From here, two main things remain to be done:
      1) Fix overflow due to quantizer noise
      2) Generate a 16-bit version from the hbd txfm
      
      Change-Id: I35bb1fc0cbb78decad2570ff5826ed665f739752
    • Remove dead experiment flag: onthefly_bitpacking · 3bc237eb
      Tom Finegan authored
      CONFIG_ONTHEFLY_BITPACKING no longer guards any code. Remove
      the flag from the configure and CMake builds.
      
      Change-Id: Id5605155bdedbf540fe5b9cea3899e8de5ee1062
    • Enable flex-refs by default when altref2 is on · 438b3ae7
      Zoe Liu authored
      Compared against baseline with default enabled tools (except for
      ext-tx and global-motion for speed concern):
      
                       altref2 -> altref2 + flex-refs
      lowres: avg_psnr -0.395% -> -0.460%
      midres: avg_psnr -0.418% -> -0.478%
      
      In particular, flex-refs improves the coding performance for the
      following 3 clips, with no impact on the other clips:
      
      bowing_cif.y4m:    avg_psnr  0.023% -> -1.022%
      pamphlet_cif.y4m:  avg_psnr  0.454% -> -1.111%
      snow_mnt_480p.y4m: avg_psnr -0.162% -> -1.948%
      
      Change-Id: I612c1ae5feb1f07d8bd5aaf67e21a076445e10b9
  9. 01 Aug, 2017 1 commit
  10. 31 Jul, 2017 1 commit
    • Turn on convolve_round by default · 71ef7c27
      Angie Chiang authored
      The performance on default experiment is
      lowres: 0.812%
      
      midres/hdres and AWCY tests are still running
      
      Change-Id: Id2209c79df6517732dd06c2712a7bdefde118ead
  11. 29 Jul, 2017 1 commit
    • Add CONFIG_DAALA_DCT16 experiment. · cb9c1c52
      Monty Montgomery authored
      This experiment replaces the 16-point Type-II DCT and 16-point Type-IV
      DST scaling vp9 transforms with the 16-point orthonormal Daala
      transforms.  These have reduced complexity and are perfect
      reconstruction.  There is currently no net coding performance impact.
      
      subset-1:
      
        monty-square-baseline-s1-F@2017-07-23T03:43:45.042Z ->
           monty-square-dct16-s1-F@2017-07-23T03:42:29.805Z
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.0152 | -0.0028 | -0.0929 |  -0.0432 | -0.0457 | -0.0425 |    -0.0237
      
        objective-1-fast:
      
        monty-square-baseline-o1f-F@2017-07-23T03:44:19.973Z ->
           monty-square-dct16-o1f-F@2017-07-23T03:43:22.549Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0305 |  0.0926 | -0.1600 |   0.0471 | 0.0219 | -0.0075 |     0.0135
      
      Change-Id: I54fed26d65fd8450693334bb400b1fafd7e0dacb
  12. 28 Jul, 2017 1 commit
    • [CFL] New UV_PREDICTION_MODE for CFL · 6e1cd787
      Luc Trudeau authored
      CfL is now an independent mode.
      
      Results on Subset1 (Compared to 4266a7ed with CFL enabled)
      
         PSNR | PSNR Cb | PSNR Cr | PSNR HVS |    SSIM | MS SSIM | CIEDE 2000
      -0.1645 | -0.4017 |  0.2475 |  -0.1851 | -0.2179 | -0.2338 |    -0.2897
      
      Change-Id: I2e86e7ea7bfc12bb1d763e70a136ca992d57a3c5
  13. 27 Jul, 2017 1 commit
    • Select filter level for U, V planes · e94df5cf
      Cheng Chen authored
      Previously, U, V planes share the same filter level with Y.
      Here, we search and pick the best filter level for U, V planes.
      Selected filter levels are transmitted per frame.
      This works with parallel_deblocking.
      
      Coding gain on Google test set:
                avg_psnr   ovr_psnr   ssim
      lowres:   -0.116     -0.120     -0.339
      midres:   -0.218     -0.228     -0.338
      hdres:    -0.260     -0.264     -0.365
      
      Change-Id: I03d2ac47539f3eea9f3c4b08007bd6d3f4b73572
  14. 26 Jul, 2017 4 commits
    • rect_tx_ext: work with var_tx · d6bdd46b
      Yue Chen authored
      Change-Id: Ie2c34490dc50cb242bcd701308e6b55243883b15
    • Add txfm functions corresponding to MRC_DCT · 5b8e6d2d
      Sarah Parker authored
      MRC_DCT uses a mask based on the prediction signal to modify the
      residual before applying DCT_DCT. This adds all necessary functions
      to perform this transform and makes the prediction signal available
      to the 32x32 txfm functions so the mask can be created. I am still
      experimenting with different types of mask generation functions and
      so this patch contains a placeholder. This patch has no impact on
      performance.
      
      Change-Id: Ie3772f528e82103187a85c91cf00bb291dba328a
    • Disable extra altref and bwdref for still gf group · 53a04f66
      Di Chen authored
      Use three metrics to identify the still gf group.
      Performance:
      lowres: pamphlet_cif -1.395; bowing_cif -0.989;
              others remain same. Overall -0.064
      midres: snow_mnt_480p -0.827. others remain same.
              Overall -0.028
      
      Change-Id: I22a6429c7ebdad2c36ec73c7a69cabc07e8208b7
    • Add CONFIG_DAALA_DCT8 experiment. · cf18fe4e
      Monty Montgomery authored
      This experiment replaces the 8-point Type-II DCT and 8-point Type-IV DST
      scaling vp9 transforms with the 8-point orthonormal Daala transforms.
      These have reduced complexity and are perfect reconstruction, at the cost
      of slightly worse coding performance. This is because the Daala
      transforms expect the input to be shifted by 4 bits, but the output
      scale of the vp9 transforms is only 3 bits.
      
      subset-1:
      
      monty-square-baseline-subset1 ->
        monty-square-dct8-subset1@2017-07-17T21:37:44.281Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0019 | -0.0011 | -0.0585 |  -0.0111 | 0.0305 |  0.0317 |     0.0187
      
      objective-1-fast:
      
      monty-square-baseline-o1f ->
        monty-square-dct8-o1f@2017-07-17T21:37:15.735Z
      
        PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
      0.0285 |  0.0129 | -0.5080 |   0.0529 | 0.0345 |  0.0441 |     0.0054
      
      Change-Id: I2b775495398fb717204a295397c3c5e3ca938183
  15. 24 Jul, 2017 2 commits
    • filter-intra: Support rectangular blocks. · 6a99691d
      Urvang Joshi authored
      - Use 'tx_size' in function signatures.
      - filter_intra_taps_3 and filter_intra_taps_4 updated to support
        TX_SIZES_ALL (thanks to yuec@)
      
      With these changes, filter-intra works correctly with rect-intra-pred.
      So, we remove the temporary workaround for this.
      
      Change-Id: Ide0f593419c21a74c08c61859f8dad918ca169fa
    • Workaround for filter-intra + rect-intra-pred mismatch. · d0b7cf94
      Urvang Joshi authored
      This workaround is temporary, until filter-intra can work with rectangular
      blocks.
      
      Tested OK:
      make clean; ../../configure --disable-install-docs --enable-unit-tests \
        --enable-debug --enable-aom-highbitdepth --enable-experimental \
        --enable-adapt-scan --enable-dual-filter --enable-ext-inter \
        --enable-ext-intra --enable-ext-refs --enable-ext-tx \
        --enable-filter-intra --enable-loop-restoration --enable-rect-tx \
        --enable-compound-segment --enable-interintra --enable-wedge
      make -j
      ./test_libaom
      
      Change-Id: I4554d1f25de9448b22465e93a7616df0c206e298
  16. 21 Jul, 2017 2 commits
  17. 20 Jul, 2017 4 commits
    • Directional deblocking filter · 9050c9da
      Cheng Chen authored
      New deblocking filter that smooths block boundaries in an estimated
      direction of object orientation.
      
      1. Select the proper direction for deblocking filtering.
      Compute abs gradient line by line for the block.
      Select the direction with least sum of abs gradient.
      
      2. Apply deblocking filtering for a block along this direction.
      Apply directional filtering for Y, U, V planes.
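
Step 1 above can be sketched roughly as follows. This is a minimal illustration, not the patch's code: the four candidate directions, the `(dy, dx)` step encoding, and the names `sum_abs_gradient` and `pick_direction` are all assumptions, since the commit message does not list the directions the filter actually considers.

```python
# Hypothetical candidate directions, encoded as (dy, dx) steps.
DIRECTIONS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diag_down": (1, 1),
    "diag_up": (-1, 1),
}

def sum_abs_gradient(block, dy, dx):
    """Sum |difference| between each pixel and its neighbour along (dy, dx)."""
    height, width = len(block), len(block[0])
    total = 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                total += abs(block[ny][nx] - block[y][x])
    return total

def pick_direction(block):
    """Return the candidate direction with the least sum of abs gradient."""
    return min(DIRECTIONS,
               key=lambda name: sum_abs_gradient(block, *DIRECTIONS[name]))

# Rows of constant value: pixels do not change along the horizontal direction.
stripes = [[10 * y] * 4 for y in range(4)]
chosen = pick_direction(stripes)
```

Note that the diagonal candidates see fewer pixel pairs inside a block than the horizontal and vertical ones, so a real implementation would likely need to account for that; the sketch keeps the plain sum described in the message.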
      
      Coding gain on Google test set:
      
      %               avg_psnr  ovr_psnr  ssim
      lowres          -0.129    -0.136    -0.277
      midres          -0.103    -0.127    -0.188
      hdres           -0.159    -0.158    -0.173
      screen_content  -0.408    -0.397    -0.695
      
      Change-Id: Ie8646dcc163ace5d8faf5e502b38342d885efc30
    • Make ext_tile compatible with reference_buffer · c2502b55
      Yunqing Wang authored
      In ext_tile experiment, when cm->large_scale_tile is 1, prev_frame_id can be
      the same as current_frame_id, which is prohibited in reference_buffer
      experiment and causes "CORRUPT_FRAME" error to be reported.
      
      In this patch, enable/disable reference_buffer according to large_scale_tile
      value, and thus make these 2 experiments compatible.
      
      Change-Id: If64943acb91e7a7b859db4e2ac62581e9b53ef85
    • New experiment DIST_8x8 · b7b60c57
      Yushin Cho authored
      A framework for computing distortion at the 8x8 luma block level
      during the RDO-based mode decision search. A new 8x8 distortion metric
      can be plugged in by way of this tool.
      
      The existing daala_dist now uses this experiment as well.
      Other possible applications are distortion metrics that should be
      applied to 8x8 pixels, such as PSNR-HVS or SSIM.
      
      A rd_cost for final coding mode decision for a super block is
      computed for a partition size 8x8 or larger. For a block larger than 8x8,
      a distortion of each 8x8 block is independently computed then summed up.
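
The per-8x8 summation described above can be sketched as below. This is only an illustration under stated assumptions: `sse_8x8` is a placeholder metric (plain sum of squared errors), and `block_distortion` with its signature is hypothetical; it does not reproduce the actual av1_dist_8x8() interface.

```python
def sse_8x8(src, rec, y0, x0):
    """Placeholder 8x8 metric: sum of squared errors over one 8x8 tile."""
    return sum((src[y][x] - rec[y][x]) ** 2
               for y in range(y0, y0 + 8)
               for x in range(x0, x0 + 8))

def block_distortion(src, rec, height, width, dist_8x8=sse_8x8):
    """Compute the metric independently per 8x8 tile, then sum the results."""
    if height % 8 or width % 8:
        raise ValueError("block dimensions must be multiples of 8")
    total = 0
    for y0 in range(0, height, 8):
        for x0 in range(0, width, 8):
            total += dist_8x8(src, rec, y0, x0)
    return total

# A 16x16 block whose reconstruction is off by 1 everywhere:
# four 8x8 tiles, each contributing 64.
src = [[0] * 16 for _ in range(16)]
rec = [[1] * 16 for _ in range(16)]
total = block_distortion(src, rec, 16, 16)
```

Passing a different callable as `dist_8x8` is the sketch's stand-in for plugging in a new 8x8 metric.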
      
      The rd_cost for an 8x8 block with the new 8x8 distortion metric is
      computed only once the mode decisions of its sub8x8 blocks are complete.
      However, the MSE distortion metric is used for the sub8x8 mode decision,
      so early termination is also determined with the MSE-based rd_cost.
      Because the best rd_cost (i.e. the reference rd_cost) during sub8x8
      prediction or sub8x8 tx is based on the new 8x8 distortion while each
      sub8x8 block uses MSE, the existing early termination cannot be used
      (this is one possible reason for the BD-Rate change with this revision).
      
      For sub8x8 prediction, the prediction mode for each sub8x8 block of an
      8x8 block is decided with the existing MSE, and then av1_dist_8x8() is
      applied to the 8x8 pixels. (There is also av1_dist_8x8_diff, which can
      take the diff signal directly as input.)
      
      For a sub8x8 tx in a block larger than 8x8, instead of computing the MSE
      distortion for each sub8x8 tx block, we wait until all sub8x8 tx blocks
      are encoded before av1_dist_8x8() is applied to the 8x8 pixels.
      
      Sub8x8 prediction and transforms were the trickiest parts of this change.
      Two kinds of distortion, for a) predicted pixels and b) decoded pixels
      (i.e. predicted + possible reconstructed residue), are always computed
      during RDO. In order to access signals a) and b) for an 8x8 block after
      its sub8x8 mode decision is finished, a) and b) need to be properly
      stored for later retrieval.
      
      CB4X4 makes accessing the a) and b) signals for a sub8x8 block even more
      difficult, since the intermediate data (i.e. a and/or b) for a sub8x8
      block are not easily accessible outside of the current partition unless
      reconstructed with the decided coding modes.
      
      Change-Id: If60301a890c0674a3de1d8206965bbd6a6495bb7
    • Add a new experiment "altref2" · 68ad7a6e
      Zoe Liu authored
      This experiment is to add ALTREF2_FRAME to allow 2 altref backward
      predictions. Each video frame will then have up to 7 reference frames
      to choose from:
      
      (1) 4 forward predictive references, namely
      LAST_FRAME, LAST2_FRAME, LAST3_FRAME, and GOLDEN_FRAME; and
      (2) 3 backward predictive references, namely
      BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME.
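
As a quick illustration of the grouping above, the seven references could be modelled like this. The frame names come from the message; the numeric values and the `RefFrame`, `FORWARD_REFS`, and `BACKWARD_REFS` names are assumptions for the sketch and are not taken from the codebase.

```python
from enum import IntEnum

class RefFrame(IntEnum):
    # Numeric values are illustrative only.
    LAST_FRAME = 1
    LAST2_FRAME = 2
    LAST3_FRAME = 3
    GOLDEN_FRAME = 4
    BWDREF_FRAME = 5
    ALTREF2_FRAME = 6
    ALTREF_FRAME = 7

# (1) forward predictive references
FORWARD_REFS = (RefFrame.LAST_FRAME, RefFrame.LAST2_FRAME,
                RefFrame.LAST3_FRAME, RefFrame.GOLDEN_FRAME)

# (2) backward predictive references, including the new ALTREF2_FRAME
BACKWARD_REFS = (RefFrame.BWDREF_FRAME, RefFrame.ALTREF2_FRAME,
                 RefFrame.ALTREF_FRAME)

ALL_REFS = FORWARD_REFS + BACKWARD_REFS
```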
      
      The tool of "altref2" is built on top of the "ext_refs" experiment.
      
      Change-Id: Idbb0bb53b43c5c2c7baf4959331fc5a31c77a118
  18. 18 Jul, 2017 2 commits
    • enable parallel_deblocking experiment by default · 2c6ca5fe
      Ryan Lei authored
      This change enables parallel_deblocking by default now that it has been
      officially adopted. The parallel_deblocking_15taps experiment is merged
      into the parallel_deblocking experiment, so it is removed to clean up
      the code. Internal compile flags are added to disable the 15-tap filter
      for both luma and chroma planes for future experiments. The internal
      compile flags are disabled by default.
      
      Change-Id: I1668fd2cb7676d756c52263d6993241618d33ee6
    • Add flag inter_stats_only · 08a22a63
      Angie Chiang authored
      This flag allows us to skip the key frame's stats, so we can test
      inter-frame performance when the number of frames is small. The
      inter frames' stats are then no longer overwhelmed by the key
      frame's stats.
      
      Change-Id: I9eaa8e5775fb2e740406cfa4b4f64f96f180d9db
  19. 14 Jul, 2017 1 commit
    • Make EXT_TILE compatible with TILE_GROUPS · eeb08a9b
      Yunqing Wang authored
      Added a 1-bit flag 'large_scale_tile'. If it is 0 (the default value),
      use normal tile coding in TILE_GROUPS. If it is 1, use large-scale tile
      coding in EXT_TILE.
      
      In the large_scale_tile=1 case, if single-tile-decoding is required, the
      loopfilter is disabled.
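
The selection logic described above amounts to something like the following sketch. The function `tile_coding_config`, its parameters, and the returned dictionary keys are hypothetical names chosen for illustration; only the behaviour (flag value selects the tile coding path, and single-tile decoding turns off the loopfilter in the large-scale case) comes from the message.

```python
def tile_coding_config(large_scale_tile=0, single_tile_decoding=False):
    """Pick the tile coding path from the 1-bit large_scale_tile flag."""
    if large_scale_tile not in (0, 1):
        raise ValueError("large_scale_tile is a 1-bit flag")
    if large_scale_tile == 0:
        # Default: normal tile coding; the loopfilter is left untouched.
        return {"tile_coding": "TILE_GROUPS", "loopfilter_disabled": False}
    # Large-scale tile coding: the loopfilter is disabled only when
    # single-tile decoding is required.
    return {"tile_coding": "EXT_TILE",
            "loopfilter_disabled": bool(single_tile_decoding)}

default_cfg = tile_coding_config()
large_cfg = tile_coding_config(large_scale_tile=1, single_tile_decoding=True)
```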
      
      Related API and unit tests were modified.
      
      Change-Id: I3ba12dc3d80ccf1ab21543ab3b16c02282c34e3b
  20. 13 Jul, 2017 1 commit