1. 09 Oct, 2014 1 commit
    • Subpel search cleanups and enhancements · d78dbff0
      Deb Mukherjee authored
      - Some fixes to surface fit.
      - Returns the variance function as the cost, rather than SAD, in
        the pattern search and diamond search functions. Only the
        vp9_pattern_search_sad function, used in the bigdia search,
        uses SAD for the integer 1-away costs.
      - Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.
      
      Results:
      derf [Speed 3]: About +0.036% in coding efficiency without any
      discernible speed loss.
      derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
      derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.
      
      Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
  2. 07 Oct, 2014 1 commit
  3. 06 Oct, 2014 2 commits
    • Fix eobs buffer pointer mis-use · a7555158
      Jingning Han authored
      This commit fixes a buffer pointer misuse in store_coding_context.
      Compression performance on the stdhd set at speed 3 improves by
      0.097%. It fixes issue 869.
      
      Change-Id: Idc59e22035eaf39f7133ca04174894374d647ff7
    • Add SSE2 code and unit test for VP9 denoiser. · 80465dae
      JackyChen authored
      This SSE2 code is based on the VP8 denoiser's SSE2 code. In VP8, the
      denoiser only handles 16x16 blocks, while in VP9 there are 13
      different block sizes.

      Adding this SSE2 code improves encoder speed by around 20% (SSE2
      code versus C code), varying across clips.

      The unit test for the VP9 denoiser confirms that the SSE2 code is
      bit-exact with the C code. The unit test covers all block sizes.
      
      Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
  4. 05 Oct, 2014 1 commit
    • Fix an IOC issue in vp9_rd_pick_inter_mode_sb · 085b97aa
      Jingning Han authored
      It is possible that the GOLDEN reference frame is not available, in
      which case the predicted mv will be associated with a residual
      value of INT_MAX. This commit checks this condition before the
      left shift and comparison with that of the ALTREF frame, to avoid
      an overflow issue.
      
      Change-Id: Ib98c3149dbdd016f2fe5beaafb13f67d469dd07c
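The guard described in the commit above can be sketched as follows. This is a minimal illustration, not the code from vp9_rd_pick_inter_mode_sb: the function name, the shift amount of 1, and the direction of the comparison are all assumptions. Only the INT_MAX sentinel check before the shift comes from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the GOLDEN residual, after a left
 * shift by 1, is still smaller than the ALTREF residual. */
int golden_beats_altref(int golden_residual, int altref_residual) {
  assert(golden_residual >= 0 && altref_residual >= 0);

  /* INT_MAX marks an unavailable reference frame. Shifting it left as an
   * int would overflow, so test for the sentinel first. */
  if (golden_residual == INT_MAX) return 0;

  /* Widen before shifting so large (but valid) residuals stay safe too.
   * The widening is an extra precaution assumed for this sketch. */
  return ((int64_t)golden_residual << 1) < (int64_t)altref_residual;
}
```

With this shape, an unavailable GOLDEN frame can never win the comparison, and no signed shift is performed on the sentinel value.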
  5. 03 Oct, 2014 6 commits
    • Properly initialize segmentID in nonrd coding path · 0065b734
      Yaowu Xu authored
      This commit adds proper initialization of segment id for variance AQ
      mode in non-rd coding path. It fixes the enc/dec mismatch issue of
      rt=7 with --aq-mode=1, as reported in issue #816.
      
      Change-Id: I02fa41b96345bf2e66077d5ea553f85ba800f7bb
    • Fix indent in encode_rd_sb_row · ef622333
      Jingning Han authored
      Change-Id: Icbcfe7b56d88474f4398b4c5b52f6719d551ab4a
    • Rework partition search skip scheme · bb260d90
      Jingning Han authored
      This commit enables the encoder to skip the split partition search if
      all non-zero quantized coefficients of the bigger block lie in the
      low-frequency area and the total rate cost is below a certain threshold.
      It scales the rate threshold logarithmically according to the
      current block size. For speed 3, the compression performance loss is:
      derf  -0.093%
      stdhd -0.066%
      
      Local experiments show 4% - 20% encoding speed-up for speed 3.
      blue_sky_1080p, 1500 kbps
      51051 b/f, 35.891 dB, 67236 ms ->
      50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
      
      old_town_cross_720p, 1500 kbps
      14431 b/f, 36.249 dB, 57687 ms ->
      14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
      
      pedestrian_area_1080p, 1500 kbps
      50812 b/f, 40.124 dB, 100439 ms ->
      50755 b/f, 40.118 dB,  96549 ms (4% speed-up)
      
      mobile_calendar_720p, 1000 kbps
      10352 b/f, 35.055 dB, 51837 ms ->
      10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
      
      Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
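The speed-up percentages quoted in the commit above are consistent with a relative reduction in encoding time, (old - new) / old, rounded to the nearest integer. The helper below is only a sketch for checking that arithmetic; its name is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: relative reduction in encoding time, as a
 * percentage rounded to the nearest integer. */
int time_reduction_percent(long old_ms, long new_ms) {
  assert(old_ms > 0);
  const double pct = 100.0 * (double)(old_ms - new_ms) / (double)old_ms;
  return (int)(pct + (pct >= 0.0 ? 0.5 : -0.5));
}
```

For example, blue_sky_1080p goes from 67236 ms to 59270 ms, a reduction of 7966 ms, which is about 11.8% of the original time and rounds to the reported 12%.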
    • Incorporate WRAPLOW macro into non-highbitdepth tx · d50716fa
      Deb Mukherjee authored
      Incorporates the WRAPLOW macro into the non-highbitdepth transforms
      to aid hardware verification between a software C model and an
      intended hardware implementation through the use of the configure
      options: --enable-experimental --enable-emulate-hardware.
      Note that to avoid further discrepancies between the sse/sse2
      implementations of the transforms and the C implementation, when the
      emulate hardware option is invoked, we also disable sse/sse2/etc.
      
      Also includes some minor cleanups/renaming etc.
      
      Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
    • Prevent negative cost for highbitdepth · 431cdc33
      Deb Mukherjee authored
      Adds proper scaling for highbitdepth in a rdopt cost.
      
      Change-Id: I066694799a7f491b830945ef1c66eb202071c355
    • rdmult data type change · 00a4b20f
      Deb Mukherjee authored
      To fix a VS warning.
      
      Change-Id: I4c530c0afe8d06acdb8cc78b7995aba57a25373d
  6. 02 Oct, 2014 3 commits
  7. 01 Oct, 2014 5 commits
    • Remove unused header files from vp9_encodemb.h · 72a78a0c
      Jingning Han authored
      Change-Id: Icfc3fb62cc0b05e435814035bfe1f2e2870442b4
    • High-bitdepth bugfixes · a160d725
      Deb Mukherjee authored
      Miscellaneous bug-fixes for high bitdepth functionality.
      With this patch, high bit-depth profiles become mostly functional,
      except for an intermittent assert failure issue that is being
      tracked.
      
      Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
    • Remove repeated header files from vp9_block.h · 0a9f5fa1
      Jingning Han authored
      This commit removes unused header file vp9_onyxc_int.h and repeatedly
      included file vpx_ports/mem.h from vp9_block.h
      
      Change-Id: I400b210bd1da48f1880bd50a8f4a6e2c690e15a1
    • Modify block transform skipping check · e4aac6bb
      Yunqing Wang authored
      Block transform skipping was implemented based on DCT's energy
      conservation property. Modified the thresholds using zero bin
      parameters. AC and DC coefficients were checked separately to
      allow better identifying of skippable blocks.
      
      Borg test at speed 3 showed:
      stdhd set: psnr gain: 0.153%, ssim gain: 0.051%;
      derf set: psnr gain: 0.023%, ssim gain: 0.036%
      
      For most test clips, the encoding speedup is 1% - 2%.
      parkrun(720p): 7.5% speedup, park_joy(1080p): 3.5% speedup.
      
      Change-Id: If28eb81113a077414f5ca7b021c14f9069b373bb
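The energy-conservation argument behind the commit above can be illustrated with a small sketch. It assumes an orthonormal DCT over N pixels, for which the DC coefficient equals sum / sqrt(N) and the total AC energy equals SSE - sum^2 / N. The function name and the two zero-bin parameters are hypothetical, and the actual VP9 transforms use a different scaling, so this is a model of the idea rather than the encoder's check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 when every coefficient of an orthonormal
 * DCT of the residual is guaranteed to be smaller in magnitude than its
 * zero-bin threshold, so the block would quantize to all zeros. */
int can_skip_transform(const int16_t *residual, int num_pixels,
                       int64_t dc_zero_bin, int64_t ac_zero_bin) {
  assert(residual != NULL && num_pixels > 0);
  const int64_t n = num_pixels;
  int64_t sum = 0, sse = 0;
  for (int i = 0; i < num_pixels; ++i) {
    sum += residual[i];
    sse += (int64_t)residual[i] * residual[i];
  }
  const int64_t sum_sq = sum * sum;

  /* DC energy is sum^2 / N. Compare against dc_zero_bin^2 without dividing. */
  const int dc_ok = sum_sq < dc_zero_bin * dc_zero_bin * n;

  /* AC energy is SSE - sum^2 / N. If the total AC energy is below
   * ac_zero_bin^2, no single AC coefficient can reach ac_zero_bin. */
  const int ac_ok = n * sse - sum_sq < ac_zero_bin * ac_zero_bin * n;

  return dc_ok && ac_ok;
}
```

Checking DC and AC separately matters for flat residuals: a block of constant value has zero AC energy, so only the DC threshold decides whether it can be skipped. The AC test is a sufficient condition only, since it bounds all AC coefficients by their combined energy.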
    • Conditionally skip reference frame check · 891793a5
      Jingning Han authored
      For regular inter frames, if the distance from GOLDEN_FRAME is larger
      than 2 and if the predicted motion vector of LAST_FRAME gives lower
      sse than that of GOLDEN_FRAME, skip the GOLDEN_FRAME mode checking in
      the rate-distortion optimization. It provides about 5% speed-up at
      the expense of -0.137% and -0.230% compression performance for
      speed 3. Local experiment results:
      
      pedestrian 1080p 2000 kbps
      66712 b/f, 40.908 dB, 113688 ms ->
      66768 b/f, 40.911 dB, 108752 ms
      
      blue_sky 1080p 2000 kbps
      51054 b/f, 35.894 dB, 70406 ms ->
      51051 b/f, 35.891 dB, 67236 ms
      
      old_town_cross 720p 1500 kbps
      14412 b/f, 36.252 dB, 60690 ms ->
      14431 b/f, 36.249 dB, 57346 ms
      
      Change-Id: Idfcafe7f63da7a4896602fc60bd7093f0f0d82ca
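The skip condition described in the commit above can be written as a single predicate. This is a sketch only: the function and parameter names are hypothetical, and how the encoder obtains the frame distance and the two sse values is not shown in the commit message.

```c
#include <assert.h>

/* Hypothetical predicate: returns 1 when GOLDEN_FRAME mode checking can be
 * skipped under the three conditions listed in the commit message. */
int skip_golden_check(int is_regular_inter_frame, int golden_distance,
                      unsigned int last_pred_sse,
                      unsigned int golden_pred_sse) {
  assert(golden_distance >= 0);
  return is_regular_inter_frame &&
         golden_distance > 2 &&          /* GOLDEN is more than 2 frames away */
         last_pred_sse < golden_pred_sse; /* LAST predicts better */
}
```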
  8. 30 Sep, 2014 1 commit
  9. 29 Sep, 2014 2 commits
    • Fix a bug in calculating delta in VP9 denoiser. · 7ba646f7
      JackyChen authored
      When calculating delta in the VP8 denoiser, the block size is fixed at
      16x16, so the divisor is 256, the number of pixels in the block.
      In VP9 the block size varies, so the divisor should correspond to the
      block size.
      
      Change-Id: Ibdc1e5d23ba8c788b0d0dc6d406bcdfc34c1b142
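The divisor issue in the commit above can be shown with two small functions. The names are hypothetical and the real denoiser computes delta from more inputs than a single total; the sketch only contrasts a hard-coded divisor of 256 with one derived from the block dimensions.

```c
#include <assert.h>

/* VP8-style: the block is always 16x16, so dividing by 256 is correct. */
int delta_fixed_16x16(int total) {
  return total / 256;
}

/* VP9-style: the divisor is the pixel count of the current block. */
int delta_per_block(int total, int width, int height) {
  assert(width > 0 && height > 0);
  return total / (width * height);
}
```

For a 16x16 block both functions agree. For an 8x8 block with a total of 512, the per-block version returns 8, while the fixed divisor would return 2 and understate the per-pixel value by a factor of four.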
    • Adds two new subpel search methods · 4e9c0d2a
      Deb Mukherjee authored
      One is a more aggressive version of the pruned subpel tree
      search where only a single halfpel candidate is searched.
      The search candidate is based on a surface fit result.
      The other is a method to obtain the subpel position in one
      shot based on the same surface fit.
      
      The methods have not been deployed in any speed setting yet.
      
      Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3
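The commit above does not say which surface model is fitted. As one common possibility, assumed here purely for illustration, a parabola can be fitted along each axis through the costs at the integer positions -1, 0 and +1, and its minimum taken as the subpel offset. The function name and the clamp to half a pixel are part of the sketch, not of the commit.

```c
#include <assert.h>

/* Hypothetical 1-D fit: the parabola through (-1, cost_minus),
 * (0, cost_center) and (+1, cost_plus) has its minimum at
 *   x = (cost_minus - cost_plus) / (2 * (cost_minus - 2*cost_center + cost_plus)).
 * Returns that offset in pixels, clamped to [-0.5, 0.5]. */
double parabolic_offset(int cost_minus, int cost_center, int cost_plus) {
  assert(cost_minus >= 0 && cost_center >= 0 && cost_plus >= 0);
  const long long curvature =
      (long long)cost_minus - 2LL * cost_center + (long long)cost_plus;

  /* A flat or concave fit has no interior minimum. Stay at the center. */
  if (curvature <= 0) return 0.0;

  double offset =
      (double)((long long)cost_minus - cost_plus) / (2.0 * (double)curvature);
  if (offset > 0.5) offset = 0.5;
  if (offset < -0.5) offset = -0.5;
  return offset;
}
```

Applied once horizontally and once vertically, such a fit gives a subpel estimate without evaluating any subpel candidates, or it can be used to pick the single halfpel candidate nearest to the estimate.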
  10. 26 Sep, 2014 3 commits
    • Fix a bug introduced in a previous patch on highbd · d4713f1d
      Deb Mukherjee authored
      Change-Id: Ice692334f75157446a44a6e81503cada977934f4
    • Skip certain ALTREF inter modes in ARF coding · ccdb518f
      Jingning Han authored
      This commit enables the encoder to skip checking ALTREF inter modes
      in ARF coding, if the predicted motion vectors suggest that the
      GOLDEN_FRAME provides higher prediction accuracy than ALTREF_FRAME.
      
      It improves the speed 3 encoding speed by about 5%, at the expense
      of compression performance loss -0.041% and -0.225% for derf and
      stdhd, respectively.
      
      pedestrian_area 1080p 2000 kbps
      66705 b/f, 40.909 dB, 118738 ms ->
      66732 b/f, 40.908 dB, 113688 ms
      
      old_town_cross 720p 1500 kbps
      14427 b/f, 36.256 dB, 62746 ms ->
      14412 b/f, 36.252 dB, 60690 ms
      
      blue_sky 1080p 1500 kbps
      51026 b/f, 35.897 dB, 73310 ms ->
      50921 b/f, 35.893 dB, 70406 ms
      
      bus CIF 1000 kbps
      21301 b/f, 34.841 dB, 7326 ms ->
      21248 b/f, 34.837 dB, 7196 ms
      
      Change-Id: I76cf88b4d655e1ee3c0cb03c8a5745493040e8d2
    • Skip the partition search for still frames · 1fcbf6ed
      Yunqing Wang authored
      This patch re-enables the feature from Pengchong's patch
      (commit 12861260). Originally, it was turned on when
      use_lastframe_partitioning > 0 (no longer used). It is now added
      as a separate feature and turned on when speed >= 2.
      As described in the original patch, this feature helps speed up
      slideshows in YouTube.
      
      Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
  11. 25 Sep, 2014 2 commits
  12. 24 Sep, 2014 4 commits
  13. 23 Sep, 2014 4 commits
    • High bit-depth loop/arf/postproc filter functions · 931ed516
      Deb Mukherjee authored
      Adds high-bitdepth loopfilter, temporal filter and postproc functions
      
      Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3
    • Adapt mode based rd_threshold for similar block size · 4a101310
      Yaowu Xu authored
      The rd_thresholds are adaptively changed based on the best mode tested.
      Previously they were only changed for the same block size; this commit
      extends the adaptation to similar block sizes too. The commit also
      makes minor adjustments and code cleanups.
      
      The impact on encoding time for _ped:
      118089 ms -> 111927 ms
      
      The impact on compression:
      derf:  -0.339%
      stdhd: -0.303%
      
      Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
    • Fix an IOC · 56032b47
      Yaowu Xu authored
      Change-Id: I0ca6746696d81657c035b0f6523c9af370da3c95
    • Pruned subpel search for speed 3. · c94b17f4
      Deb Mukherjee authored
      Adds code to return an integer cost list for NSTEP search. Then
      uses it for pruned subpel search in speed 3.
      
      derf: -0.06%
      Speed on mobcal 720p increases from 10.28 fps to 10.65 fps.
      [Subject to further testing].
      
      Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96
  14. 22 Sep, 2014 4 commits
    • Remove code duplication · c7ab18fe
      Yaowu Xu authored
      Change-Id: I453b3e0d946951665d5919248445fc4f3222d2ad
    • Simplify rd_pick_intra_sby_mode() · f46326c7
      Yaowu Xu authored
      Change-Id: Ifb0915c94c2db48827ddbd446314cb6e3155b99c
    • Remove unnecessary local variable declaration · f7023ea0
      Jingning Han authored
      This commit removes a repetitive local variable declaration in
      vp9_rd_pick_inter_mode_sb.
      
      Change-Id: I1b0afa98ff1ecbfb46e17d3d1cee95d32c4309db
    • Adaptive mode search scheduling · eee904c9
      Jingning Han authored
      This commit enables an adaptive mode search order scheduling scheme
      in the rate-distortion optimization. It changes the compression
      performance by -0.433% and -0.420% for derf and stdhd respectively.
      It provides speed improvement for speed 3:
      
      bus CIF 1000 kbps
      24590 b/f, 35.513 dB, 7864 ms ->
      24696 b/f, 35.491 dB, 7408 ms (6% speed-up)
      
      stockholm 720p 1000 kbps
      8983 b/f, 35.078 dB, 65698 ms ->
      8962 b/f, 35.054 dB, 60298 ms (8%)
      
      old_town_cross 720p 1000 kbps
      11804 b/f, 35.666 dB, 62492 ms ->
      11778 b/f, 35.609 dB, 56040 ms (10%)
      
      blue_sky 1080p 1500 kbps
      57173 b/f, 36.179 dB, 77879 ms ->
      57199 b/f, 36.131 dB, 69821 ms (10%)
      
      pedestrian_area 1080p 2000 kbps
      74241 b/f, 41.105 dB, 144031 ms ->
      74271 b/f, 41.091 dB, 133614 ms (8%)
      
      Change-Id: Iaad28cbc99399030fc5f9951eb5aa7fa633f320e
  15. 20 Sep, 2014 1 commit
    • Remove mi_grid_* structures. · c70cea97
      hkuang authored
      mi_grid_* are arrays of pointer to pointer. They store the pointers that
      point to the MIs in cm->mi. But they are unnecessary and complicated. The
      original goal was to remove MODE_INFO_t copies. But with an extra
      MODE_INFO_t pointer inside MODE_INFO_t, the same goal can be achieved.

      This commit removes the mi_grid_* structures entirely. But there are still
      many dummy MODE_INFO_t entries inside cm->mi, which waste memory. The next
      commit will do on-demand MODE_INFO_t allocation to save this memory.
      
      Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
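The idea of replacing a separate pointer grid with a pointer stored inside each entry can be sketched as below. The struct, field and function names are hypothetical stand-ins and do not reproduce the MODE_INFO_t layout. The sketch assumes that an entry covered by a larger block points at the entry that actually holds the block's data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mode-info entry. */
typedef struct ModeInfoSketch {
  struct ModeInfoSketch *src_mi; /* entry holding the data for this position */
  int mode;                      /* placeholder payload */
} ModeInfoSketch;

/* Follows the embedded pointer, so no separate grid of pointers and no
 * copy of the payload is needed. */
const ModeInfoSketch *resolve_mi(const ModeInfoSketch *mi) {
  assert(mi != NULL);
  return mi->src_mi != NULL ? mi->src_mi : mi;
}
```

In this layout the entries that only forward to another entry carry an unused payload, which corresponds to the wasted memory the commit message mentions and plans to remove with on-demand allocation.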