1. 02 Feb, 2018 16 commits
    • Implement sse2 inv 1d txfms · 1637d424
      Angie Chiang authored
      Change-Id: I9a42b75de3e623f6af325edbe91e299c0662f19c
    • Re-allow 32x32 inter idtx. · af73d536
      Thomas Daede authored
      BUG=aomedia:1293
      
      Change-Id: Iabdeb4ef7a98b034a4777527f727231f7b8815ee
    • Move encoder-only function to encodemb.c · 750f6445
      Sebastien Alaiwan authored
      Change-Id: Id7ea17a5124215907d076e0e3500b9aeea1146fc
    • Remove code for CONFIG_FAST_SGR=2 and cleanup · 1a709944
      Debargha Mukherjee authored
      Change-Id: I01cecc829e2d57517427a1de6387e91ba3c64312
    • SSE4 and AVX2 implementations of updated FAST_SGR · d051e560
      Imdad Sardharwalla authored
      The SSE4.1 and AVX2 implementations of the self-guided filter have been updated
      to match the updated FAST_SGR C implementation in restoration.c.
      
      The self-guided filter speed tests have been altered to compare the speeds of
      the SIMD and C implementations of the relevant functions.
      
      Speed Tests (code compiled with CLANG)
      ===========
      
      For LowBD:
      - The SSE4.1 implementation is ~220% faster (~69% less time) than the C code
      - The AVX2 implementation is ~314% faster (~76% less time) than the C code
      
      For HighBD:
      - The SSE4.1 implementation is ~240% faster (~71% less time) than the C code
      - The AVX2 implementation is ~343% faster (~77% less time) than the C code
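
The paired figures above ("X% faster" and "Y% less time") are two views of the same ratio. A minimal sketch of the conversion, assuming "faster" means throughput gain relative to the C code; the function names are hypothetical and not part of libaom:

```c
#include <assert.h>

/* Convert "pct_faster % faster" into "% less time".
 * Assumes "faster" is a throughput gain, so 220% faster means a 3.2x speedup.
 * Hypothetical helper for illustration only. */
double pct_less_time_from_pct_faster(double pct_faster) {
  assert(pct_faster > -100.0); /* speedup factor must stay positive */
  const double speedup = 1.0 + pct_faster / 100.0;
  return 100.0 * (1.0 - 1.0 / speedup);
}

/* Inverse conversion: "% less time" into "% faster". */
double pct_faster_from_pct_less_time(double pct_less_time) {
  assert(pct_less_time < 100.0); /* some run time must remain */
  const double remaining = 1.0 - pct_less_time / 100.0;
  return 100.0 * (1.0 / remaining - 1.0);
}
```

Under that assumption, 220% faster works out to 1 - 1/3.2 = 68.75% less time, which agrees with the ~69% quoted for the LowBD SSE4.1 case.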
      
      Change-Id: Ic2734bb89ccd3f66667c68647e5f677a5a496233
    • Implement sse2 fwd 1d txfms · 1a796617
      Angie Chiang authored
      Change-Id: I8dcaa6882d47a097498c8f8af515b1185df4fdf3
    • lv-map: move loading of default CDFs to av1_default_coef_probs() · 3d288156
      Hui Su authored
      In preparation for supporting q_adapt_probs.
      
      Change-Id: I4a39b81b0d2c4ceb1586ae411a1216c6c20d896d
    • Reduce memory usage of inter_tx_size[] in MB_MODE_INFO · 7167d952
      Hui Su authored
      Reduce the length of inter_tx_size[] from 1024 to 16.
      
      On a cif test sequence,
      encoder memory consumption decreases by 18% (380MB -> 312MB);
      decoder memory consumption decreases by 56% (21.4MB -> 9.4MB).
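
To show why shortening one per-block array matters, here is a minimal sketch. It assumes one byte per array element and uses hypothetical stand-in structs; the real MB_MODE_INFO has many more fields:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-block mode info, before and after.
 * Assumption: each inter_tx_size entry occupies one byte. */
typedef struct {
  uint8_t inter_tx_size[1024];
} ModeInfoBeforeSketch;

typedef struct {
  uint8_t inter_tx_size[16];
} ModeInfoAfterSketch;

/* Bytes saved in every mode-info entry by the shorter array. */
size_t bytes_saved_per_entry(void) {
  return sizeof(ModeInfoBeforeSketch) - sizeof(ModeInfoAfterSketch);
}

/* Percentage reduction between two measurements, e.g. in MB. */
double pct_reduction(double before, double after) {
  return 100.0 * (before - after) / before;
}
```

Because one such entry is kept for many blocks in a frame, the per-entry saving is multiplied accordingly. The quoted percentages follow from the raw numbers: (380 - 312) / 380 is about 18%, and (21.4 - 9.4) / 21.4 is about 56%.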
      
      Change-Id: I42928eb9312748f96f4393c8d8040791f38f98b6
    • Cleanup deprecated comments · e5d166ef
      Frederic Barbier authored
      Change-Id: I91f18c498c694829b933bb73812ad94d66962994
    • AVX2 implementation of the Wiener filter · aab6aee3
      Imdad Sardharwalla authored
      Added an AVX2 version of the Wiener filter, along with associated tests. Speed
      tests have been added for all implementations of the Wiener filter.
      
      Speed Test results
      ==================
      
      GCC
      ---
      
      Low bit-depth filter:
      - SSE2 vs C: SSE2 takes ~92% less time
      - AVX2 vs C: AVX2 takes ~96% less time
      - SSE2 vs AVX2: AVX2 takes ~43% less time (~74% faster)
      
      High bit-depth filter:
      - SSSE3 vs C: SSSE3 takes ~92% less time
      - AVX2  vs C: AVX2  takes ~96% less time
      - SSSE3 vs AVX2: AVX2 takes ~46% less time (~84% faster)
      
      CLANG
      -----
      
      Low bit-depth filter:
      - SSE2 vs C: SSE2 takes ~84% less time
      - AVX2 vs C: AVX2 takes ~88% less time
      - SSE2 vs AVX2: AVX2 takes ~27% less time (~36% faster)
      
      High bit-depth filter:
      - SSSE3 vs C: SSSE3 takes ~85% less time
      - AVX2  vs C: AVX2  takes ~89% less time
      - SSSE3 vs AVX2: AVX2 takes ~24% less time (~31% faster)
      
      Change-Id: Ide22d7c09c0be61483e9682caf17a39438e4a208
    • Don't use extra lines for r=2 guided filter · f7d1ff49
      Debargha Mukherjee authored
      Changes the CONFIG_FAST_SGR=1 strategy: the r=1 filter no longer
      uses any subsampling, while the r=2 filter subsamples vertically
      and, for odd rows, combines only by filtering horizontally in the
      last stage.
      
      Coding efficiency loss seems quite minimal.
      
      Change-Id: I5644ac400b387c37a2d278db7f6ad3ac0a6b5e93
    • Remove CONFIG_FRAME_SIGN_BIAS config flag · 23b54841
      Debargha Mukherjee authored
      Change-Id: I6138519456b2ad3ffc8bced803ddc4418b246e74
    • Port first pass stats handling from Vp9 into Av1 · da01c0dc
      Debargha Mukherjee authored
      Some parameter tuning included.
      
      lowres (q, 30 frames, speed 1):
      -1.243% av PSNR, -2.337% ov PSNR, +0.577% SSIM
      
      lowres (vbr, 30 frames, speed 1):
      -0.327% av PSNR, -1.007% ov PSNR, +0.182% SSIM
      
      A few videos become a lot worse in SSIM, which needs to be
      investigated. But PSNR-wise the patch seems pretty good.
      
      Change-Id: I17c8d812c96ee49ddae7d3959a459aa3ffcea208
    • Remove aom_comp_mask_upsampled_pred from rtcd · f8daa92d
      Peng Bin authored
      Since aom_comp_mask_upsampled_pred just calls aom_upsampled_pred
      and aom_comp_mask_pred, there is no need to keep separate C and
      SIMD versions any more.
      
      Change-Id: I1ff8bcae87d501c68a80708fd2dc6b74c6952f88
    • Align allocated buffers · a64c05b5
      Yaowu Xu authored
      BUG=aomedia:1306
      
      Change-Id: I5a8bdbd472213ded2de706c5b044a1bf24823670
    • Fix txk-sel unit test failure in aq-mode · 1e79d90e
      Jingning Han authored
      The current aq mode encoder setting would alter the segment_id
      between the rate-distortion optimization and the block encoding
      stages. Disable the corresponding consistency check in this case.
      
      BUG=aomedia:1251
      
      Change-Id: Ic910a23fd64a9b4554567d3c8c9a9ae5f6062c7b
  2. 01 Feb, 2018 14 commits
  3. 31 Jan, 2018 10 commits
    • Revert "Enable pic (position independent code) config." · 32bb5110
      Johann authored
      This reverts commit fbeee067.
      
      The assembly which was failing has been fixed.
      
      BUG=aomedia:102
      
      Change-Id: Ide75630b38603a2553f6e231085994251c77b26c
    • Fix the fast tx type search feature with txk-sel on · 8f47f705
      Hui Su authored
      Change-Id: If0b1d2fe31569104f2d8eef3cfd42cab30162c7e
    • Reduce memory usage of inter_tx_size[] in MB_MODE_INFO · 1379beb7
      Hui Su authored
      Reduce the length of inter_tx_size[] from 1024 to 16.
      
      On a cif test sequence,
      encoder memory consumption decreases by 18% (380MB -> 312MB);
      decoder memory consumption decreases by 56% (21.4MB -> 9.4MB).
      
      Change-Id: Ie11dd055255d200954b704b8c2ad8ca3dff7bf5c
    • Fix a couple of visual studio nightly build failures. · f19a91cb
      Tom Finegan authored
      - Add explicit cast of bool to int to silence a test warning.
      - Add explicit cast of size_t to int for same in dump_obu.
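
The two casts above follow a common pattern for silencing narrowing-conversion warnings. A minimal sketch in C, assuming the values are known to fit in an int; the helper names are hypothetical and the actual fixes live in the test code and dump_obu:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Explicit size_t -> int conversion. The cast tells the compiler the
 * narrowing is intended; the assert documents the assumption that the
 * value fits. Hypothetical helper for illustration. */
int size_to_int(size_t n) {
  assert(n <= (size_t)INT_MAX);
  return (int)n;
}

/* Explicit bool -> int conversion: true becomes 1, false becomes 0. */
int bool_to_int(bool flag) {
  return (int)flag;
}
```

Writing the cast explicitly does not change behavior; it only makes the intended conversion visible to the compiler and to readers.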
      
      Change-Id: I90846eb5c88880d921f20cb66b116ab7d2799af5
    • Integrate lv_map with aom_qm · b3167a65
      Angie Chiang authored
      BUG=aomedia:717
      
      Change-Id: Ib06a12039cb72665c1ee534cc2246ac3d23f878d
    • add scalability experiment · f8589863
      Soo-Chul Han authored
      cmake: -DCONFIG_SCALABILITY=1
      
      Change-Id: Ifa908f809bcf904bdf0ed87b351e1ef3accc2b3f
    • use GLOBAL() macro when loading constant · 4972ac81
      Johann authored
      Clear linker error when building with gcc 6:
      relocation R_X86_64_32 against `.rodata' can not be used when making a
      shared object; recompile with -fPIC
      
      BUG=aomedia:102
      
      Change-Id: I6c06de1e9dac1c044a4b07125abcaba0943a29b6
    • one level less of tx size search for blocks larger than 64 · 7ed7e1fa
      Hui Su authored
      3~5% encoding speedup for speed 0; no quality loss.
      
      Change-Id: I0e31755f45253e5e99d8d9eed0d7a6fe6050f49f
    • Cleanup some fragile aspects of rd_pick_partition. · 00c6e6f7
      Urvang Joshi authored
      (1) Explicitly reset RD stats for each partition.
      
      Earlier,
      PARTITION_SPLIT was the only one resetting the RD_STATS in 'sum_rdc'.
      
      But this was working because:
      - PARTITION_SPLIT was tried before VERT, HORZ, VERT_4 and HORZ_4; and
      - RD cost calculations in VERT, HORZ, VERT_4 and HORZ_4 partitions
      implicitly discarded existing value in sum_rdc
      
      However, that was very fragile; explicitly resetting the stats every
      time is much safer.
      
      (2) Using a separate variable 'temp_best_rd_cost' was fragile as someone
      may forget to update the same. So, we use best_rdc.rdcost directly.
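
The two points above can be sketched as follows. This is a simplified illustration, not the rd_pick_partition code: the struct, the helper names, and the cost table are hypothetical, and only the names sum_rdc and best_rdc come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SUB_BLOCKS 4

/* Hypothetical, reduced stand-in for the RD statistics struct. */
typedef struct {
  int64_t rdcost;
} RdStatsSketch;

static void reset_rd_stats(RdStatsSketch *stats) { stats->rdcost = 0; }

/* Returns the lowest summed cost over all candidate partitions.
 * sub_costs[p][i] is the cost of sub-block i of partition p and
 * sub_counts[p] is how many sub-blocks that partition has. */
int64_t pick_best_rdcost(const int64_t sub_costs[][MAX_SUB_BLOCKS],
                         const int *sub_counts, size_t num_partitions) {
  RdStatsSketch best_rdc = { INT64_MAX };
  RdStatsSketch sum_rdc;
  for (size_t p = 0; p < num_partitions; ++p) {
    /* (1) Reset for every partition, regardless of evaluation order. */
    reset_rd_stats(&sum_rdc);
    for (int i = 0; i < sub_counts[p]; ++i) {
      sum_rdc.rdcost += sub_costs[p][i];
    }
    /* (2) Compare against best_rdc.rdcost directly, with no shadow copy
     * that could fall out of sync. */
    if (sum_rdc.rdcost < best_rdc.rdcost) best_rdc = sum_rdc;
  }
  return best_rdc.rdcost;
}
```

With the reset inside the loop, each partition's sum starts from zero no matter which partition ran before it, so reordering the candidates cannot leak a stale total into the comparison.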
      
      BUG=aomedia:1246
      
      Change-Id: Icd75f25c34bb0f1806e691784648bcffce2417e6
    • AVX2 optimization of motion compensation functions · c8e0336a
      Deepa K G authored
      AVX2 implementations of av1_convolve_x_sr, av1_convolve_y_sr and
      av1_convolve_2d_sr have been added.
      
      Improvements have been made to av1_convolve_x_avx2, av1_convolve_y_avx2
      and av1_convolve_2d_avx2.
      
      Change-Id: I62a699dd9dcf42de94dd72cc2d43affc0dc31404