1. 06 Jul, 2017 1 commit
  2. 29 Jun, 2017 1 commit
  3. 28 Jun, 2017 3 commits
  4. 27 Jun, 2017 2 commits
    • Reduce multiplier precision for warp least squares · f053cba2
      Debargha Mukherjee authored
      Includes reordering and other clamping changes, as well as
      changes to reduce multiplier precision.
      
      cam_lowres (60 frames): -0.092% BDRATE improvement in
      --disable-cdef --disable-global-motion --disable-ext-tx
      configuration.
      
      Change-Id: I0660c45b44fcd5a193534d8dadd1aa1ae5c5e27a
    • Fix inv txfm low/high bitdepth selection logic · 51281095
      Yi Luo authored
      We are going to have several commits to set up the new low/high
      bitdepth data path selection logic. This patch is for the inverse
      transform. The ideas are summarized as follows.
      
      - For low/high bitdepth selection, encoder depends on
        input configuration, e.g., video sequence bitdepth,
        profile. Decoder depends on input bitstream. This has
        nothing to do with compiler/build configuration.
      
      - Typical encoder usage for sampling format 4:2:0.
        1) 8-bit video sequence:
         a) --profile=0
         Fastest encoding/decoding pipeline.
      
         b) --profile=2 --bit-depth=10
         Image pixels are left shifted by 2 bits. It
         employs 16-bit reference frame buffer and has high
         calculation precision. It usually enjoys higher
         compression performance.
      
        2) 10/12-bit video sequence (HDR):
         --profile=2 --bit-depth=10/12
      
      - Transform coefficient type:
        Lowbitdepth:  int16_t
        Highbitdepth: int32_t
      
      - The type tran_low_t is still used in the codebase.
        It is int32_t, which defines the data path capacity,
        so it is naturally high bitdepth.
      
      Eventually we shall remove the configuration flags,
      CONFIG_HIGHBITDEPTH/CONFIG_LOWBITDEPTH, and separate
      low and high bitdepth data path. Two data paths co-exist
      in the same build environment.
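
The 8-bit input case with --profile=2 --bit-depth=10 can be sketched as below. This is only an illustration of the 2-bit left shift into a 16-bit buffer; the helper name `upshift_to_highbd` is hypothetical and is not taken from the codebase.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libaom function): copy 8-bit samples into a
 * 16-bit buffer, left-shifting them up to the coding bit depth.
 * For 8-bit input coded at 10 bits the shift is 2. */
static void upshift_to_highbd(const uint8_t *src, uint16_t *dst, size_t n,
                              int bit_depth) {
  const int shift = bit_depth - 8;
  for (size_t i = 0; i < n; ++i) {
    dst[i] = (uint16_t)(src[i] << shift);
  }
}
```

With a bit depth of 10, the 8-bit range 0..255 maps to 0..1020 in the 16-bit buffer.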
      
      Change-Id: I35c06d4d4f19ebf80d909168fdddbae57c3cc884
  5. 26 Jun, 2017 3 commits
    • quantize.c: convert to int before apply sign · d43d6777
      Yaowu Xu authored
      This change makes the conversions similar to those in av1_quantize.c,
      and fixes ubsan warnings shown in nightly tests.
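
A minimal sketch of the "convert to int, then apply sign" pattern follows. It assumes the magnitude is held in an unsigned intermediate and that the sign is applied with the usual XOR-and-subtract idiom; the function name `apply_sign` is hypothetical, and the exact expression that triggered the ubsan warning is not stated in the commit message.

```c
#include <stdint.h>

/* Hypothetical helper (not the actual quantize.c code): give the
 * non-negative magnitude abs_q the sign of coeff.
 * sign is 0 for coeff >= 0 and -1 for coeff < 0, so (q ^ sign) - sign
 * evaluates to q or -q. The magnitude is converted to int first, so the
 * XOR and the subtraction are both done in signed int arithmetic.
 * Assumes abs_q fits in an int. */
static int apply_sign(uint32_t abs_q, int coeff) {
  const int sign = -(coeff < 0);
  const int q = (int)abs_q;
  return (q ^ sign) - sign;
}
```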
      
      Change-Id: I90851a80dcb9f052a32bf22199fd9ef8ff927725
    • aom_dsp.cmake: add highbd_quantize_intrin_avx2.c · 284c8830
      James Zern authored
      added in:
      193422e7 Add avx2 highbd_quantize_b
      
      Change-Id: Ie4ba48042ffd36d69d2bf200bba12a1d924c8f9c
    • New experiment: LGT · ad8290b8
      Lester Lu authored
      In previous ADSTs, DST-7 and DST-4 are used for length 4 and length
      8/16/32, respectively. In this LGT experiment we explore transforms
      between DST-4 and DST-7. When CONFIG_LGT flag is on, adst4 and adst8
      are replaced by lgt4 and lgt8, the intermediate transforms with
      pre-chosen parameters.
      
      The LGTs applied here are lgt4_160 and lgt8_170, where the numbers
      mean the self-loop weights times 100. The associated values for DST-7
      and DST-4 are 100 and 200.
      
      ovr_psnr:
      lowres: -0.140
      midres: -0.131
      hdres: -0.078
      
      These changes are not applied to the highbd scenario in the
      current version.
      
      Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
  6. 22 Jun, 2017 2 commits
    • Silence warnings in VS · 079acac1
      Steinar Midtskogen authored
      BUG=aomedia:615
      
      Change-Id: I827e857d310020705a5292ef8fe817bc042d8dd0
    • Add avx2 highbd_quantize_b · 193422e7
      Yi Luo authored
      - First pass encoding time reduces ~10.9% on i7-6700
        at 100 frames, 1080p.
      - avx2 works for coeff number >= 8 cases; coeff number < 8
        case will be implemented by sse2.
      - Unit tests are added for types B/FP/DC.
      
      Change-Id: Ibe5b7807c64e6dfc2d59c470ed50a6e8ca94ef7c
  7. 20 Jun, 2017 2 commits
    • Build static libaom without internal deps in CMake. · 78516fca
      Tom Finegan authored
      Change the internal lib targets so that external apps
      need link only libaom instead of all internal library
      targets and libaom.
      
      BUG=aomedia:76,aomedia:609
      
      Change-Id: I38862fcd90cb585300b6b23e8558f78a1934750f
    • Add shared library support to the CMake build. · 84f2d796
      Tom Finegan authored
      This is enabled via:
      $ cmake path/to/aom -DBUILD_SHARED_LIBS=1
      
      Currently supports only Linux and MacOS targets. Symbol visibility
      is handled by exports.cmake and its helpers exports_sources.cmake
      and generate_exports.cmake.
      
      Some sweeping changes were required to properly support shared libs
      and control symbol visibility:
      
      - Object libraries are always linked privately into static
        libraries.
      - Static libraries are always linked privately into each other
        in the many cases where the CMake build merges multiple library
        targets.
      - aom_dsp.cmake now links all its targets into the aom_dsp static
        library target, and privately links aom_dsp into the aom target.
      - av1.cmake now links all its targets into the aom_av1 static library
        target, and privately links in aom_dsp and aom_scale as well. It
        then privately links aom_av1 into the aom target.
      - The aom_mem, aom_ports, aom_scale, and aom_util targets are now
        static libs that are privately linked into the aom target.
      - In CMakeLists.txt libyuv and libwebm are now privately linked into
        app targets.
      - The ASM and intrinsic library functions in aom_optimization.cmake
        now both require a dependent target argument. This facilitates the
        changes noted above regarding new privately linked static library
        targets for ASM and intrinsics sources.
      
      BUG=aomedia:76,aomedia:556
      
      Change-Id: I4892059880c5de0f479da2e9c21d8ba2fa7390c3
  8. 19 Jun, 2017 2 commits
    • Revert "Clamp inverse transform coefficients" · 71adf529
      Jingning Han authored
      This reverts commit 79b78b7d.
      
      The transform coefficient range needs some more tuning.
      Before we finalize on that front, directly applying clamping
      would cause multiple unit test failures. Hence revert
      this CL temporarily.
      
      BUG=aomedia:612
      
      Change-Id: I1dd8680dee17289801c4a209275f05a498355c8e
    • encoder: Remove 64x upsampled reference buffers · 5d24b6f0
      Timothy B. Terriberry authored
      They do not handle border extension correctly (interpolation and
      border extension do not commute unless you upsample into the
      border), nor do they handle crop dimensions that are not a multiple
      of 8 (the upsampled version is not sufficiently large), in addition
      to using massive amounts of memory and being a criminal waste of
      cache (1 byte used for every 8 bytes fetched).
      
      This commit reimplements use_upsampled_references by computing the
      subpixel samples on the fly. This implementation not only corrects
      the border handling, but is also faster, while maintaining the
      same quality.
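
As a rough illustration of computing a subpixel sample on the fly rather than reading it from a pre-upsampled buffer, the sketch below interpolates one 1/8-pel position between two neighboring reference pixels. The bilinear filter and the name `subpel_sample_bilinear` are assumptions made for brevity; the commit does not say which interpolation filter is used, and a real codec would normally use longer filters.

```c
#include <stdint.h>

/* Hypothetical sketch (not the libaom implementation): compute the sample
 * at horizontal offset frac/8 (frac in 0..7) between ref[0] and ref[1]
 * with a rounded bilinear filter, directly from the full-pel reference.
 * No upsampled copy of the reference frame is needed. */
static uint8_t subpel_sample_bilinear(const uint8_t *ref, int frac) {
  return (uint8_t)(((8 - frac) * ref[0] + frac * ref[1] + 4) >> 3);
}
```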
      
      HL AWCY results are basically noise:
          PSNR | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
        0.0188 |   0.0187 | 0.0045 |  0.0063 |     0.0228
      
      Change-Id: I7527db9f83b87a7bb8b35342f7e6457cd0bef9cd
  9. 16 Jun, 2017 2 commits
    • Clamp inverse transform coefficients · 79b78b7d
      Sebastien Alaiwan authored
      When --enable-coefficient-range-checking isn't specified, clamp the
      coefficient at each stage.
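
Per-stage clamping of this kind can be sketched as follows. The helper name `clamp_to_bits` and the choice of a signed two's-complement range are assumptions for illustration; the commit message does not give the actual range used at each stage.

```c
#include <stdint.h>

/* Hypothetical helper (not the libaom code): clamp value to the range of
 * a signed integer with `bits` bits, i.e. [-2^(bits-1), 2^(bits-1) - 1].
 * Assumes 1 <= bits <= 32 so that the result fits in int32_t. */
static int32_t clamp_to_bits(int64_t value, int bits) {
  const int64_t max = ((int64_t)1 << (bits - 1)) - 1;
  const int64_t min = -max - 1;
  if (value < min) return (int32_t)min;
  if (value > max) return (int32_t)max;
  return (int32_t)value;
}
```

Applying such a clamp after every transform stage keeps intermediate values in range instead of rejecting the stream when a value overflows.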
      
      This doesn't change the decoder behaviour for existing AV1 streams.
      However, some AV1 bitstreams that would have been rejected by the
      decoder as illegal (range check failure) are now legal bitstreams.
      
      There is no impact on video quality.
      
      BUG=aomedia:30
      
      Change-Id: Ifa01186bae6bfe5d7712298e33d964c20f88435e
    • Sync CMake build with the configure build. · 3613c517
      Tom Finegan authored
      - Added: CONFIG_COLORSPACE_HEADERS CONFIG_SPEED_REFS
               CONFIG_LGT CONFIG_SBL_SYMBOL
      - Removed: CONFIG_RECT_INTRA_PRED
      - Changed, 0 => 1: CONFIG_EXT_INTER CONFIG_INTERINTRA
                         CONFIG_WEDGE CONFIG_COMPOUND_SEGMENT
                 1 => 0: CONFIG_ONE_SIDED_COMPOUND
      
      BUG=aomedia:76
      
      Change-Id: If9ebd068d0014386ec25d91226a577c591f5a774
  10. 14 Jun, 2017 1 commit
  11. 11 Jun, 2017 1 commit
    • Resolve compiler warning when highbd is off · 105eecf4
      Jingning Han authored
      The highbd_clip_pixel_add() function is generalized to be used in
      the regular 8 bit path. Move its definition outside the highbd
      experimental flag.
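
The idea of one clip-and-add routine serving both bit depths can be sketched as below. This is a simplified stand-in, not the actual highbd_clip_pixel_add() body; the name `clip_pixel_add_bd` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: add a residual to a pixel and clip the result to
 * [0, 2^bd - 1]. With bd = 8 the same routine serves the regular 8-bit
 * path, which is why it does not need to sit behind a high-bitdepth
 * build flag. */
static uint16_t clip_pixel_add_bd(uint16_t dest, int residual, int bd) {
  const int max = (1 << bd) - 1;
  const int sum = (int)dest + residual;
  if (sum < 0) return 0;
  if (sum > max) return (uint16_t)max;
  return (uint16_t)sum;
}
```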
      
      This resolves the compiler warning in unit tests when high bit-depth
      is turned off.
      
      Change-Id: I90a744adb2381c9bf8476aa2a2bd0c87d9afdf57
  12. 09 Jun, 2017 2 commits
    • Fix Windows x86 build with --enable-ext-inter · dab3e99b
      David Barker authored
      The Windows calling convention pushes any __m128i type arguments
      after the 3rd (4th on x86-64) onto the stack. But on x86,
      stack-allocated arguments are not guaranteed to be aligned to
      a multiple of their natural alignment, leading to compile errors.
      
      We fix this by making the functions which take >3 __m128i arguments
      instead take pointers. Since the functions are marked INLINE, the
      extra memory operations should optimize out.
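
The by-value to by-pointer change can be illustrated with the sketch below, which assumes an x86 target with SSE2. The functions `sum4` and `sum4_low_lane` are hypothetical examples and are not the functions touched by this commit.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Before (problematic on 32-bit Windows): four __m128i values by value,
 * so the fourth would be passed on the stack:
 *   static __m128i sum4(__m128i a, __m128i b, __m128i c, __m128i d);
 *
 * After: take pointers instead. The callee reads from the caller's own
 * properly aligned variables, and once the function is inlined the extra
 * loads are expected to be optimized away. */
static inline __m128i sum4(const __m128i *a, const __m128i *b,
                           const __m128i *c, const __m128i *d) {
  return _mm_add_epi32(_mm_add_epi32(*a, *b), _mm_add_epi32(*c, *d));
}

/* Small driver: broadcast four ints, sum them lane-wise, and return the
 * lowest 32-bit lane of the result. */
static int sum4_low_lane(int w, int x, int y, int z) {
  const __m128i a = _mm_set1_epi32(w);
  const __m128i b = _mm_set1_epi32(x);
  const __m128i c = _mm_set1_epi32(y);
  const __m128i d = _mm_set1_epi32(z);
  return _mm_cvtsi128_si32(sum4(&a, &b, &c, &d));
}
```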
      
      BUG=aomedia:587
      
      Change-Id: I0cb2831fd12aded6f2821c037365386e6183ba5c
    • AOM_QM: Use 8-bit matrices and fix 2x2 transform sizes. · 92aa22a8
      Thomas Davies authored
      2x2 transforms are now hidden behind the CHROMA_2X2 macro,
      not the CB4X4 macro.
      
      Change-Id: I5d73c679fba486ccda98fa8dbb804a3902df6c8d
  13. 08 Jun, 2017 2 commits
    • Cleanup dead fwd transform functions · d405f8a6
      Frederic Barbier authored
      Cleanup related wrappers and unit-tests.
      
      Change-Id: I2d37a8c80de63dbeaef584e3d5fa842c0b2ee6db
    • Remove deprecated high-bitdepth functions · 31c66502
      Sarah Parker authored
      This unifies the codepath for high-bitdepth transforms and deletes
      all calls to the old deprecated versions. This required reworking
      the way 1d configurations are combined in order to support rectangular
      transforms.
      
      There is one remaining codepath that calls the deprecated 4x4 hbd
      transform from encoder/encodemb.c. I need to take a closer look
      at what is happening there and will leave that for a followup
      since this change has already gotten so large.
      
      lowres 10 bit: -0.035%
      lowres 12 bit: 0.021%
      
      BUG=aomedia:524
      
      Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
  14. 06 Jun, 2017 1 commit
    • Add a new experiment "rect-intra-pred". · 766a389b
      Urvang Joshi authored
      Earlier, intra prediction for rectangular blocks was performed by
      running two steps of prediction on square sub-blocks.
      
      With this experiment, we do proper intra prediction for rectangular
      blocks. This ensures that we make use of all available neighboring
      pixels especially for directional modes. For this, all the intra
      predictors were updated to work with rectangular transform block sizes.
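
To show what predicting a rectangular block in one pass from all of its neighbors looks like, here is a sketch of the simplest case, a DC predictor. The function name `dc_predictor_rect` and the rounding rule are assumptions; the commit mainly targets directional modes, which are more involved.

```c
#include <stdint.h>

/* Hypothetical sketch (not the libaom predictor): DC prediction for a
 * bw x bh block using all bw pixels above and all bh pixels to the left,
 * instead of predicting two square sub-blocks separately. */
static void dc_predictor_rect(uint8_t *dst, int stride, int bw, int bh,
                              const uint8_t *above, const uint8_t *left) {
  int sum = 0;
  for (int c = 0; c < bw; ++c) sum += above[c];
  for (int r = 0; r < bh; ++r) sum += left[r];

  /* Rounded average over all available neighbors. */
  const int count = bw + bh;
  const uint8_t dc = (uint8_t)((sum + count / 2) / count);

  for (int r = 0; r < bh; ++r) {
    for (int c = 0; c < bw; ++c) dst[r * stride + c] = dc;
  }
}
```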
      
      Performance improvements are small but free of cost:
      
      All Intra frames:
      lowres: -0.126
      midres: -0.154
      
      Video Overall:
      lowres: -0.043
      midres: -0.100
      
      [Could not get AWCY results due to a backlog.]
      
      BUG=aomedia:551
      
      Change-Id: I7936e91b171d5c246cb0a4ea470a981a013892e6
  15. 02 Jun, 2017 3 commits
    • Sync CMake build defaults with the configure build. · 6f9dfa51
      Tom Finegan authored
      - Added: CONFIG_ONE_SIDED_COMPOUND CONFIG_VAR_REFS
      - Removed: CONFIG_SUB8X8_MC CONFIG_EC_MULTISYMBOL
                 CONFIG_DAALA_EC CONFIG_LOWDELAY_COMPOUND
      - Changed, 0 => 1: CONFIG_VAR_TX CONFIG_EC_SMALLMUL
                         CONFIG_CHROMA_SUB8X8
                         CONFIG_LOOPFILTERING_ACROSS_TILES
                         CONFIG_TEMPMV_SIGNALING
      
      BUG=aomedia:76
      
      Change-Id: Ia010abeaf079d8c6318a5a540e9354d5455ce826
    • Add include guards to CMake files used as includes. · 17ccaec4
      Tom Finegan authored
      BUG=aomedia:76
      
      Change-Id: Ie34025f31a89f4991d03d5ecf03c6f6f5ab7b0a1
    • integrate parallel_deblocking with CB4x4 · 17905edf
      Ryan Lei authored
      This change makes the parallel deblocking experiment work with
      cb4x4. The inner loop processes every 4x4 block.
      
      Change-Id: I86adb3d7b6d67a91ccc12aab29da9bfb8c522cf1
  16. 30 May, 2017 1 commit
  17. 27 May, 2017 1 commit
    • High precision Wiener filter rework · 11cf46f4
      Debargha Mukherjee authored
      Implements the high precision Wiener filter with an offset
      to reduce the error due to saturation without increasing
      the number of bits needed for intermediate precision.
      
      Also turns the high precision filter on.
      
      Change-Id: I34037a5746a6a89c5fce67753c1b027749085edf
  18. 26 May, 2017 2 commits
    • ext-inter: Vectorize new masked SAD/SSE functions · 0aa39ff0
      David Barker authored
      We would expect that these new functions would be slower than
      the old masked SAD/SSE functions, as they do additional work
      (blending two inputs and comparing to a third, rather than
      just comparing two inputs).
      
      This is true for the SAD functions, which are about 50% slower
      (depending on block size and bit depth). However, the sub-pixel
      SSE functions are comparable to the old speed for the accelerated
      special cases (xoffset or yoffset = 0 or 4), and are
      40% to 90% faster for the generic case.
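
The extra work described above (blend two inputs, then compare to a third) can be sketched in scalar C as follows. The mask range of 0..64, the rounding, and the name `masked_sad_sketch` are assumptions for illustration; this is not the vectorized code added by the commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of a masked SAD: blend predictors a and b
 * with a per-pixel mask m in [0, 64] (64 selects a, 0 selects b), then
 * accumulate the absolute difference between the blend and src. */
static unsigned masked_sad_sketch(const uint8_t *src, const uint8_t *a,
                                  const uint8_t *b, const uint8_t *mask,
                                  int n) {
  unsigned sad = 0;
  for (int i = 0; i < n; ++i) {
    /* Rounded blend: (m * a + (64 - m) * b + 32) >> 6. */
    const int pred = (mask[i] * a[i] + (64 - mask[i]) * b[i] + 32) >> 6;
    sad += (unsigned)abs(pred - src[i]);
  }
  return sad;
}
```

An unmasked SAD needs only the subtraction and accumulation, so the blend is the added per-pixel cost.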
      
      Change-Id: I1a296ed8fc9e3edc313a6add516ff76b17cd3e9f
    • Function parameter type correction · 60f59618
      Cheng Chen authored
      Make the function parameter and the passed-in value the same type.
      
      Change-Id: Ie2172b99b4cda81ac1d51f7ef1018bb9d4f55016
  19. 25 May, 2017 4 commits
  20. 24 May, 2017 2 commits
    • Remove CONFIG_{DE,EN}CODERS from the build system. · 378d652f
      Tom Finegan authored
      Use CONFIG_AV1_DECODER and CONFIG_AV1_ENCODER instead.
      
      Change-Id: I33d83aa6d31067d0db7a972d36927dc49c420f81
    • ext-inter: Further cleanup · f19f35f7
      David Barker authored
      * Rename the 'masked_compound_*' functions to just 'masked_*'.
        The previous names were intended to be temporary, to distinguish
        the old and new masked motion search pipelines. But now that the
        old pipeline has been removed, we can reuse the old names.
      
      * Simplify the new ext-inter compound motion search pipeline
        a bit.
      
      * Harmonize names: Rename
        aom_highbd_masked_compound_sub_pixel_variance* to
        aom_highbd_8_masked_sub_pixel_variance*, to match the naming of
        the corresponding non-masked functions
      
      Change-Id: I988768ffe2f42a942405b7d8e93a2757a012dca3
  21. 23 May, 2017 2 commits
    • Vectorize high-precision convolve filter · 5d34e6a7
      David Barker authored
      Add SSE2 lowbd and SSSE3 highbd versions of the filters
      introduced in https://aomedia-review.googlesource.com/c/11962/ .
      
      These filters are equivalent in speed to the SSE2 implementations
      of the regular convolve filter. The average time to filter a
      64x64 block is:
      
      lowbd C: 52us
      lowbd SSE2: 5.6us
      highbd C: 53us
      highbd SSSE3: 5.8us
      
      Also add a correctness test based on the warp filter tests.
      
      Change-Id: Ia0d81100e8a414bbfc2b5f664d751cf24765299e
    • ext-inter: Delete dead code · 0f3c94e1
      David Barker authored
      Patches https://aomedia-review.googlesource.com/c/11987/
      and https://aomedia-review.googlesource.com/c/11988/
      replaced the old masked motion search pipeline with
      a new one which uses different SAD/SSE functions.
      This resulted in a lot of dead code.
      
      This patch removes the now-dead code. Note that this
      includes vectorized SAD/SSE functions, which will need
      to be rewritten at some point for the new pipeline. It
      also includes the masked_compound_variance_* functions
      since these turned out not to be used by the new pipeline.
      
      To help with the later addition of vectorized functions, the
      masked_sad/variance_test.cc files are kept but are modified
      to work with the new functions. The tests are then disabled
      until we actually have the vectorized functions.
      
      Change-Id: I61b686abd14bba5280bed94e1be62eb74ea23d89