18 Apr, 2017
      Add txk_sel exp
      This will separate the transform kernel selection from lv_map
      experiment such that we can evaluate each feature's performance
      Note that txk_sel is built on top of lv_map
      ec_smallmul: Convert CDFs to iCDFs.
      Hoists the iCDF conversion outside of the daala code.
      We directly store 32768 - cdf[i] in each cdf, to avoid having to
      convert the whole array every time a symbol is coded.
      This works with ec_multisymbol, new_tokenset, and ec_adapt.
      Compared to Change-Id Idbbd3743e9189146cb519d5b984bdabd69e3f4c0,
      this improves decoder runtimes by 1.15% at QP=55 and 2.64% at
      The overall slowdown of ec_smallmul is now 0.12% at QP=55 and
      0.44% at QP=20.
      Encoder output should not change, and all streams should remain
      decodable without decoder changes.
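
      The conversion described above can be sketched as follows. This is
      a minimal illustration, not the code from the change: the function
      name cdf_to_icdf and the constant CDF_TOTAL are hypothetical, and
      the sketch assumes 15-bit CDFs whose final entry is 32768.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: CDFs are assumed to be 15-bit, ending at 32768. */
#define CDF_TOTAL 32768

/* Convert a cumulative distribution in place to its inverse form,
 * icdf[i] = 32768 - cdf[i]. The point of the change is to do this once,
 * when the table is stored, rather than every time a symbol is coded. */
static void cdf_to_icdf(uint16_t *cdf, int nsymbs) {
  assert(nsymbs > 0);
  for (int i = 0; i < nsymbs; i++) {
    assert(cdf[i] <= CDF_TOTAL);
    cdf[i] = (uint16_t)(CDF_TOTAL - cdf[i]);
  }
}
```

      For a uniform 4-symbol CDF {8192, 16384, 24576, 32768}, the stored
      iCDF would be {24576, 16384, 8192, 0}.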
      Deliver the eob threshold to inverse transform
      Move width branch out of height loop
      - AVX2 copy and average functions are faster:
        Copy function: ~4%-57%
        Avg function:  ~17%-54%
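
      The idea of hoisting the width branch can be illustrated in scalar
      C. This is only a sketch of the loop structure; the actual change
      is in AVX2 code, and the function name copy_block and the set of
      specialized widths here are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block. The width is tested once, outside the row loop,
 * so each row loop body is branch-free and uses a fixed copy size. */
static void copy_block(const uint8_t *src, ptrdiff_t src_stride,
                       uint8_t *dst, ptrdiff_t dst_stride, int w, int h) {
  assert(w > 0 && h > 0);
  if (w == 8) {
    for (int y = 0; y < h; y++) {
      memcpy(dst, src, 8);
      src += src_stride;
      dst += dst_stride;
    }
  } else if (w == 16) {
    for (int y = 0; y < h; y++) {
      memcpy(dst, src, 16);
      src += src_stride;
      dst += dst_stride;
    }
  } else {
    /* Generic fallback for any other width. */
    for (int y = 0; y < h; y++) {
      memcpy(dst, src, (size_t)w);
      src += src_stride;
      dst += dst_stride;
    }
  }
}
```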
      enable mv_compress by default
      Remove rt deadline.
      The "good" speed levels are universally better than the "rt" ones,
      running faster to achieve the same quality.
      rt mode also turned off alt refs and lag-in-frames, but these
      are still accessible separately (and the low latency test case
      explicitly sets them).
      Some features were used by the rt scale and not the good scale.
      Two additional "good" levels, 7 and 8, were added to accommodate
      these features and not reduce test coverage.
      bitstream-dbg: Add missing include to decodeframe
      If daala_ec is disabled while bitstream_debug is enabled, decodeframe.c
      fails to compile due to aom_util/debug_util.h not being included.
      This patch just adds the missing include so that decodeframe.c will
      still build with bitstream_debug enabled and daala_ec disabled.
      Avoid exiting tx size search when tx-size is square
      This fixes a mismatch in ext-tx + rect-tx introduced
      by a refactor in 2d147c16.
      Skip adding zero signal to prediction with DC only idct
      If DC only idct gives zero, then we can skip the steps which
      add zero signal to predicted signal.
      DC only idct cases will occur more frequently at lower bit rates.
      Similar changes can be done for C version of high bit depth idct functions.
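
      A minimal sketch of the early exit, assuming the DC-only inverse
      transform has already been reduced to a single residual value dc.
      The names add_dc_residual and clip_pixel_u8 are hypothetical, and
      the transform's own scaling and rounding are not reproduced here.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_pixel_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add a constant DC residual to a w x h predicted block.
 * Returns 0 if the add was skipped because dc is zero, 1 otherwise. */
static int add_dc_residual(uint8_t *dst, int stride, int w, int h, int dc) {
  if (dc == 0) return 0; /* Adding zero would leave the prediction as-is. */
  for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
      dst[y * stride + x] = clip_pixel_u8(dst[y * stride + x] + dc);
    }
  }
  return 1;
}
```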
      Fix inspection mi grid size.
      Compress analyzer data using RLE.
      This significantly reduces the size of the exported
      data and improves analyzer decoding time.
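
      For illustration, a simple byte-oriented run-length encoder is
      sketched below. The actual format used for the analyzer data is not
      described in this entry, so the (run, value) pair layout and the
      name rle_encode are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode src as (run_length, value) byte pairs, with runs capped at 255.
 * Returns the number of bytes written to dst. Returns 0 if the input is
 * empty or if dst (capacity cap) is too small to hold the output. */
static size_t rle_encode(const uint8_t *src, size_t n, uint8_t *dst,
                         size_t cap) {
  size_t out = 0;
  size_t i = 0;
  while (i < n) {
    const uint8_t v = src[i];
    size_t run = 1;
    while (i + run < n && src[i + run] == v && run < 255) run++;
    if (out + 2 > cap) return 0;
    dst[out++] = (uint8_t)run;
    dst[out++] = v;
    i += run;
  }
  return out;
}
```

      Data with long constant runs, such as per-block mode or flag maps,
      shrinks considerably under this kind of scheme.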
      Fix build for motion-var and ext-inter
      Changes for chroma u8x8 obmc
      (1) Add a macro DISABLE_CHROMA_U8X8_OBMC to enable (one-sided) or
      turn off obmc in chroma blocks smaller than 8x8.
      (2) When it is enabled, use the above neighbor in chroma 4x4 obmc
      Turn on tx_type search when sb_type < BLOCK_8X8
      Fix enc/dec mismatch in global motion
      Resolve an enc/dec mismatch issue when global motion is turned on.
      Singularity handling in Gaussian elimination
      When the forward Gaussian elimination process encounters a zero
      diagonal element, the linear system does not have a unique
      solution. Return the linear solver state as unsolvable in such a
      case. This resolves a floating point exception caused by division
      by zero in the loop restoration filter.
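
      The check can be sketched as below. This is a generic solver
      written for illustration, not the one in the loop restoration
      code: the name linsolve_checked is hypothetical, and the sketch
      assumes row-major storage, double precision, and no pivoting.

```c
/* Solve A x = b for an n x n row-major matrix A using forward Gaussian
 * elimination followed by back substitution. A and b are overwritten.
 * Returns 1 on success, or 0 if a zero diagonal element is encountered,
 * in which case x is left unmodified and no division is attempted. */
static int linsolve_checked(int n, double *A, double *b, double *x) {
  for (int k = 0; k < n; k++) {
    if (A[k * n + k] == 0.0) return 0; /* Report the system as unsolvable. */
    for (int i = k + 1; i < n; i++) {
      const double c = A[i * n + k] / A[k * n + k];
      for (int j = k; j < n; j++) A[i * n + j] -= c * A[k * n + j];
      b[i] -= c * b[k];
    }
  }
  for (int i = n - 1; i >= 0; i--) {
    double s = b[i];
    for (int j = i + 1; j < n; j++) s -= A[i * n + j] * x[j];
    x[i] = s / A[i * n + i];
  }
  return 1;
}
```

      A caller would then check the return value and fall back to a
      default result instead of using x when the solve fails.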
      Reduce prec of matrices/vectors for warp estimate
      Reduces precision of matrices by 2 bits.
      No material change in performance.
      Fix EOB threshold array size
      - TX_SIZES_ALL is the correct macro to cover all txfm sizes.
      Enable ext_intra by default
      Reduce array sizes for Wiener update steps
      pvq: Remove non-dyadic CDF initialization.
      This was still being used for CDFs whose size might not match the
      declared array size. We replace it with an initialization macro
      intended explicitly for this purpose.
      ec_smallmul: Simplify binary read/write.
      This should be the same number of operations as the non-ec_smallmul
      version (though ideally we'd use the real 15-bit probability
      Encoder output should not change, and all streams should remain
      decodable without decoder changes.
      daala_ec: Convert the decoder to use iCDFs
      This only changes the internal coding engine. We convert CDFs into
      iCDFs at the "bool" reader <-> daala_ec boundary.
      Decoder output should not change.
      daala_ec: Invert the internal state of the decoder
      This removes one subtraction from the CDF search loop (reducing the
      dependency chain for reading from the CDF) at the cost of one
      increment and decrement during renormalization (easily absorbed by
      the reorder buffer).
      There should be no change in decoded output.
      daala_ec: Convert the encoder to use iCDFs
      This only changes the internal coding engine. We convert CDFs into
      iCDFs at the "bool" writer <-> daala_ec boundary.
      Encoder output should not change, and all streams should remain
      decodable without decoder changes.
      daala_ec: Remove non-dyadic functions.
      Encoder output should not change, and all streams should remain
      decodable without decoder changes.
      update parallel_deblocking experiment with more filter tap options
      this change adds the following filter tap options:
      1. add options to replace 15 tap filter with 9 or 11 tap filter
      2. force chroma plane to only use maximum 7 tap filter
      above options are disabled by default
      Adjust encoder rate allocations for ext-refs
      This CL is targeted to improve the objective/subjective quality of the
      "ext-refs" coding tool.
      Tuned the frame rate factors as follows:
      (1) BRF_UPDATE:
          Decreased from INTER_HIGH (1.5) to GF_ARF_LOW (1.25);
          Increased from INTER_LOW (0.80) to INTER_NORMAL (1.00)
      , which reduces the bits allocated to the BWDREF frame while
      increasing the bits allocated to the bi-directionally predicted
      Obtained a coding gain in overall PSNR as follows, compared against
      the original ext-refs:
      lowres: BDRate -0.181%
      midres: BDRate -0.090%
      hdres:  BDRate -0.701%
      Refactor gm/wm/obmc for cleaner warping interactions
      This creates a central function which defines when a
      block should be warped. It also refactors the
      WARPED_MOTION code so that all calls to av1_warp_plane
      happen in the same location.
      No change in performance.
      [optimize-b] Use a greedy search method
      The greedy search method improves the BD-rate over the baseline by
      more than 0.2% for the lowres test set. A larger gain of 0.55% is
      observed for the hdres test set.
      [2017.04.06] Cleanup to remove redundant computation. On a local linux
      machine, the greedy method is now faster than the trellis method in
      encoding the first 100 frames of foreman_cif. However, the BD-rate
      gain seems to have become smaller due to recent changes to the codebase.
      [2017.04.06-2] Style changes to meet the requirements.
      remove "greedy-optimize-b" in configure
      [2017.04.10] Move the changes under the macro USE_GREEDY_OPTIMIZE_B
      [2017.04.11] Adjust rdmult to accommodate CpuSpeedTest
      [2017.04.12] Set USE_GREEDY_OPTIMIZE_B to 0 at the request of debargha@.
      [2017.04.13] Move greedy implementation of optimize_b into a separate
      function with the same name (selected by USE_GREEDY_OPTIMIZE_B, default
      is 0)
      Simplify coefficient range checking
      Deduplicate implementations of check_range, and deduplicate the call
      to aom_read_bit.
      Modify av1_read_tx_type for lv_map exp
      Write/update tx_type per txb in lv_map exp
      Add av1_update_tx_type_count()
      This will make the code cleaner and lv_map experiment will be able
      to reuse this function.
      Modify av1_write_tx_type for lv_map experiment
      Change-Id: If129748d918995efcc58169d153a0950eeec5efb