- 20 Jul, 2017 1 commit
-
-
Sarah Parker authored
This adds the new transform to the list of possible transforms. The impact on performance is in the noise range because the transform implementation currently performs DCT as a placeholder. This transform will initially only have an implementation for TX_32X32 and it is skipped in the tx search for smaller transform sizes. Change-Id: Iab2faddc525b478ca06972a753428a4f4ef53ac6
-
- 17 Jul, 2017 1 commit
-
-
Lester Lu authored
Change two similar structs, FWD_TXFM_PARAM and INV_TXFM_PARAM, into a common struct: TxfmParam. Its definition is moved to aom_dsp/txfm_common.h to simplify dependencies. This change is made so that, in later changes of the LGT experiment, functions requiring FWD_TXFM_PARAM and INV_TXFM_PARAM, such as get_fwd_lgt4 and get_inv_lgt4, can also be unified. Change-Id: I756b0176a02314005060adbf8e62386f10eeb344
-
- 07 Jul, 2017 1 commit
-
-
Lester Lu authored
The input arguments of av1_fht* and av1_iht* functions (and their HBD versions) are slightly changed. Input arguments tx_type and bd are carried by a struct fwd_txfm_param/inv_txfm_param. This struct is meant to later on carry other prediction information, such as intra top/left boundaries to the transform level, so that the choice of transforms can be more adaptive to the prediction mode and local video content. Change-Id: Ia42544248a51845be64b72855b642ef1fe5910a9
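The signature change described above can be illustrated with a minimal C sketch. The struct name `SketchTxfmParam`, its field layout, and the helper `sketch_clip_pixel` are hypothetical stand-ins, not the libaom definitions. Only the idea that `tx_type` and `bd` travel together in one struct comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter struct: the names and layout are assumptions. */
typedef struct {
  int tx_type; /* which transform pair to apply */
  int bd;      /* bit depth of the samples */
} SketchTxfmParam;

/* Clamp a reconstructed sample to the range allowed by param->bd.
 * The function takes the whole struct, so fields added to the struct
 * later do not change this signature. */
static int32_t sketch_clip_pixel(int32_t value, const SketchTxfmParam *param) {
  assert(param != NULL && param->bd > 0 && param->bd < 31);
  const int32_t max_value = ((int32_t)1 << param->bd) - 1;
  if (value < 0) return 0;
  if (value > max_value) return max_value;
  return value;
}
```

With `bd` set to 8 the helper clamps to 0..255, and with `bd` set to 10 it clamps to 0..1023. A caller that only sets `tx_type` and `bd` today would keep compiling if more fields were appended to the struct.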
-
- 28 Jun, 2017 1 commit
-
-
Yi Luo authored
Change-Id: Iaae46d0735539b8b8daf9faac81c2a3434838020
-
- 26 Jun, 2017 1 commit
-
-
Lester Lu authored
In previous ADSTs, DST-7 and DST-4 are used for length 4 and length 8/16/32, respectively. In this LGT experiment we explore transforms between DST-4 and DST-7. When the CONFIG_LGT flag is on, adst4 and adst8 are replaced by lgt4 and lgt8, the intermediate transforms with pre-chosen parameters. The LGTs applied here are lgt4_160 and lgt8_170, where the numbers are the self-loop weights times 100. The associated values for DST-7 and DST-4 are 100 and 200, respectively.
ovr_psnr:
lowres: -0.140
midres: -0.131
hdres: -0.078
These changes are not applied to the highbd scenario in the current version.
Change-Id: I20600456da8766528b2b6b11aa28801e70af498e
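One way to read the self-loop weights is through the usual line-graph construction, which the commit message does not spell out and which is therefore an assumption here: take the Laplacian of a path graph with unit edge weights and add a self-loop of weight `w` on the first node. The transform basis is then the set of eigenvectors of that matrix. The sketch below only builds the 4-point matrix and checks a candidate eigenvector. All function and constant names are hypothetical.

```c
#include <assert.h>

#define LGT_N 4

/* First DST-7 basis vector for N = 4: sin(k * pi / 9), k = 1..4. */
static const double kSketchDst7First[LGT_N] = { 0.3420201433, 0.6427876097,
                                                0.8660254038, 0.9848077530 };

/* Laplacian of a 4-node path graph (unit edge weights) plus a self-loop
 * of weight w on node 0. Only L[0][0] depends on w. */
static void lgt_build_laplacian(double w, double L[LGT_N][LGT_N]) {
  assert(w >= 0.0);
  for (int i = 0; i < LGT_N; ++i)
    for (int j = 0; j < LGT_N; ++j) L[i][j] = 0.0;
  for (int i = 0; i < LGT_N; ++i) {
    const int degree = (i > 0) + (i < LGT_N - 1);
    L[i][i] = (double)degree;
    if (i > 0) L[i][i - 1] = -1.0;
    if (i < LGT_N - 1) L[i][i + 1] = -1.0;
  }
  L[0][0] += w;
}

/* Largest |(L v - lambda v)_i|, with lambda taken as the Rayleigh quotient.
 * A value near zero means v is (numerically) an eigenvector of L. */
static double lgt_eigen_residual(double L[LGT_N][LGT_N],
                                 const double v[LGT_N]) {
  double Lv[LGT_N];
  double num = 0.0, den = 0.0;
  for (int i = 0; i < LGT_N; ++i) {
    Lv[i] = 0.0;
    for (int j = 0; j < LGT_N; ++j) Lv[i] += L[i][j] * v[j];
    num += v[i] * Lv[i];
    den += v[i] * v[i];
  }
  assert(den > 0.0);
  const double lambda = num / den;
  double worst = 0.0;
  for (int i = 0; i < LGT_N; ++i) {
    double d = Lv[i] - lambda * v[i];
    if (d < 0.0) d = -d;
    if (d > worst) worst = d;
  }
  return worst;
}
```

Under this construction, `w = 1.0` gives a first diagonal entry of 2 and the DST-7 vector above is an eigenvector, which is consistent with the value 100 quoted for DST-7. `w = 2.0` gives a first diagonal entry of 3, the DST-4 case. With `w = 1.6` the DST-7 vector is no longer an eigenvector, so the resulting basis sits between the two.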
-
- 08 Jun, 2017 1 commit
-
-
Sarah Parker authored
This unifies the codepath for high-bitdepth transforms and deletes all calls to the old deprecated versions. This required reworking the way 1d configurations are combined in order to support rectangular transforms. There is one remaining codepath that calls the deprecated 4x4 hbd transform from encoder/encodemb.c. I need to take a closer look at what is happening there and will leave that for a followup since this change has already gotten so large.
lowres 10 bit: -0.035%
lowres 12 bit: 0.021%
BUG=aomedia:524
Change-Id: I34cdeaed2461ed7942364147cef10d7d21e3779c
-
- 01 Jun, 2017 1 commit
-
-
Timothy B. Terriberry authored
cb4x4 itself should not require these sizes. This simplifies compatibility with other experiments, since we can first make them work with cb4x4 (which is now on by default), and then worry about chroma_2x2 (which is not) in separate steps. Encoder and decoder output should remain unchanged. Change-Id: I4e9fcdae49f238b5099a3c74a398fe993c2545f8
-
- 19 May, 2017 1 commit
-
-
Yue Chen authored
It gives a 0.1% gain on lowres and midres. Change-Id: I555a492a68571c525713840d73aa5614fe80a87d
-
- 12 Apr, 2017 1 commit
-
-
Sebastien Alaiwan authored
Rename '--enable-aom-highbitdepth' to '--enable-highbitdepth' Change-Id: I1de13c3508c30c552532993419d8ace326142ab6
-
- 31 Mar, 2017 1 commit
-
-
Yi Luo authored
Change-Id: I99b15e5270bfefe2eb3e982aeba06ed564540d73
-
- 28 Mar, 2017 1 commit
-
-
hui su authored
Change-Id: Ie18fd2b8a3caf3948748ee353fe41e37f5803ba3
-
- 28 Feb, 2017 1 commit
-
-
Angie Chiang authored
Change-Id: Ie1bfece43c81ee5d149ed25c3f7fd959a8f95030
-
- 13 Jan, 2017 1 commit
-
-
Yi Luo authored
- Turn on SSE2 unit tests Change-Id: I285771b04c0dec0501210fde570b9ac3cb9c4be0
-
- 09 Jan, 2017 1 commit
-
-
Angie Chiang authored
Performance drop with ext_tx and rect_tx on BDRate: lowres -0.028, midres -0.075, hdres -0.054. Change-Id: I50f89b9e9785d82ab05c3276a3c8b22b4dcfd408
-
- 04 Jan, 2017 2 commits
-
-
Angie Chiang authored
This CL aims to simplify the transform code. Change-Id: Ibaf1dd8607e37d44a0f77788a72e344583f81fa0
-
Angie Chiang authored
Change-Id: I6ce654b582f2a9d45a40bf22ba597b47d418a0be
-
- 20 Dec, 2016 1 commit
-
-
Jingning Han authored
Add 2x2 forward and inverse 2D-DCT for high bit-depth. Change-Id: I3092a2587a0cdc6675a69cc9203499a530b65325
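For a 2x2 block, a 2D DCT reduces to sums and differences of the four samples. The sketch below is an unnormalized version under that assumption. The function names are hypothetical, and the scaling and rounding used in the actual high bit-depth functions are not given in the commit message and may differ.

```c
#include <stdint.h>

/* Unnormalized 2x2 forward butterfly. Input and output are row-major:
 * index 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right. */
static void sketch_fdct2x2(const int16_t in[4], int32_t out[4]) {
  const int32_t a = in[0], b = in[1], c = in[2], d = in[3];
  out[0] = a + b + c + d; /* DC term */
  out[1] = a - b + c - d; /* horizontal difference */
  out[2] = a + b - c - d; /* vertical difference */
  out[3] = a - b - c + d; /* diagonal difference */
}

/* Matching inverse: the same butterfly yields 4x each original sample,
 * so dividing by 4 is exact for coefficients produced by sketch_fdct2x2. */
static void sketch_idct2x2(const int32_t in[4], int16_t out[4]) {
  const int32_t a = in[0], b = in[1], c = in[2], d = in[3];
  out[0] = (int16_t)((a + b + c + d) / 4);
  out[1] = (int16_t)((a - b + c - d) / 4);
  out[2] = (int16_t)((a + b - c - d) / 4);
  out[3] = (int16_t)((a - b - c + d) / 4);
}
```

A round trip through both functions returns the original samples exactly, because the forward pass has a gain of 2 per dimension and the inverse pass removes the combined factor of 4.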
-
- 30 Nov, 2016 1 commit
-
-
Jingning Han authored
Add a 2x2 forward transform function for 4x4 coding block unit. Change-Id: I44c8f0d55f371db68541e7e5f7cbd340a82cd788
-
- 09 Nov, 2016 1 commit
-
-
Debargha Mukherjee authored
Also includes some refactoring and cleanups. Change-Id: I2c2528c434a1e9e9b898251fa69489d884463929
-
- 03 Nov, 2016 1 commit
-
-
Debargha Mukherjee authored
For higher level fwd and inv transform functions. Change-Id: I91518250a0be7d94aada7519f6c9e7ed024574fb
-
- 02 Nov, 2016 1 commit
-
-
Jingning Han authored
Change-Id: I4128ab932a967a3d657bb1f95f0fa2af20a06469
-
- 20 Oct, 2016 1 commit
-
-
Yi Luo authored
- Use range check function to avoid DCT_DCT overflow. We need to re-develop the column txfm side scaling/rounding. Now, we prefer to maintain the current BDRate level.
- Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
- Add MemCheck unit test and fdct32() unit test.
Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
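The commit message does not show the range check function itself. A check of this kind could look like the following sketch, which tests whether every intermediate value fits in a signed integer of a given width. The name `sketch_range_check` and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every value in buf fits in a signed `bit`-bit integer,
 * that is, lies in [-2^(bit-1), 2^(bit-1) - 1]; returns 0 otherwise. */
static int sketch_range_check(const int32_t *buf, int size, int bit) {
  assert(bit >= 2 && bit <= 32);
  const int64_t max_value = ((int64_t)1 << (bit - 1)) - 1;
  const int64_t min_value = -((int64_t)1 << (bit - 1));
  for (int i = 0; i < size; ++i) {
    if (buf[i] > max_value || buf[i] < min_value) return 0;
  }
  return 1;
}
```

Calling such a check between transform stages lets a caller detect a stage whose output would overflow 16-bit lanes before choosing a narrower SIMD code path.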
-
- 12 Oct, 2016 1 commit
-
-
Yi Luo authored
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2(). But function replacement must go with the corresponding inverse txfm.
- No obvious user level time reduction due to 32x32 TX_TYPE selection.
- Zero high 128b YMM to avoid AVX-SSE transition penalties (fix 16x16 case).
- Added 32x32 AVX2 unit tests to verify bitexact.
- AVX2 optimization summary: On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
  C to AVX2: function level time reduction, ~86-89%.
  SSE2 to AVX2: function level time reduction, ~51%.
Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
-
- 02 Sep, 2016 1 commit
-
-
Yaowu Xu authored
Change-Id: I2b2b70e756b7eb9611b7b33b7d5f19b3b30e0a50
-
- 01 Sep, 2016 2 commits
-
-
Yaowu Xu authored
Cherry-Picked the following commits:
0defd8f2 Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e66767 Replace "VPx" by "AVx"
5082a369 Change "Vpx" to "Avx"
7df44f17 Replace "Vp9" w/ "Av1"
967f722f Remove kVp9CodecId
828f30ce Change "Vp8" to "AOM"
030b5ffc AUTHORS regenerated
2524caee Add ref-mv experimental flag
016762be Change copyright notice to AOMedia form
81e55269 Replace vp9 w/ av1
9b94565b Add missing files
fa8ca9f2 Change "vp9" to "av1"
ec838b76 Convert "vp8" to "aom"
80edfa01 Change "VP9" to "AV1"
d1a11fb9 Change "vp8" to "aom"
7b582513 Point to WebM test data
dd1a5c8d Replace "VP8" with "AOM"
ff00fc0f Change "VPX" to "AOM"
01dee0bb Change "vp10" to "av1" in source code
cebe6f0c Convert "vpx" to "aom"
17b05679 rename vp10*.mk to av1_*.mk
fe5f8a8a rename files vp10_* to av1_*
Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
-
- 18 Aug, 2016 1 commit
-
-
clang-format authored
after:
253c001f Port dering experiment from aom
72081457 Adding 8x16/16x8/32x16/16x32 transforms
Change-Id: Id93e0d7b72a128701d8dec35fc2fac473944d0c1
-
- 15 Aug, 2016 1 commit
-
-
Debargha Mukherjee authored
Adds forward, inverse transforms and scan orders. Change-Id: Iab6994f4b0ef65e660b714d111b79b1c8172d6a8
-
- 12 Aug, 2016 1 commit
-
-
clang-format authored
Change-Id: I58a42ced5b8a4338524434ff3356850b89aa705a
-
- 21 Jul, 2016 1 commit
-
-
Debargha Mukherjee authored
Added a new experiment, rect-tx, to be used in conjunction with ext-tx. [rect-tx is a temporary config flag and will eventually be merged into ext-tx once it works correctly with all other experiments]. Added 4x8 and 8x4 transforms for use initially with rectangular sub8x8 y blocks as part of this experiment. There is about a -0.2% BDRATE improvement on lowres; results for the other sets are pending. When var-tx is on, rectangular transforms are currently not used. That will be enabled in a subsequent patch. Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
-
- 23 Jun, 2016 1 commit
-
-
Jingning Han authored
This commit refactors the transform and quantization process for sub8x8 blocks and unifies the related functions. Change-Id: I005f61f3eb49eec44f947b906c4e308cab9935a2
-
- 18 May, 2016 1 commit
-
-
Yi Luo authored
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16 block, for the EXT_TX experiment.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.
Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
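A common way to obtain a flipped transform type is to mirror the input block along one axis and then run the regular transform on the mirrored data. Whether this commit does exactly that is an assumption. The helpers below are hypothetical and only show the mirroring step.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse each row in place (mirror left-right). */
static void sketch_fliplr(int16_t *buf, int stride, int rows, int cols) {
  assert(stride >= cols);
  for (int r = 0; r < rows; ++r) {
    int16_t *row = buf + r * stride;
    for (int c = 0; c < cols / 2; ++c) {
      const int16_t tmp = row[c];
      row[c] = row[cols - 1 - c];
      row[cols - 1 - c] = tmp;
    }
  }
}

/* Reverse the order of the rows in place (mirror up-down). */
static void sketch_flipud(int16_t *buf, int stride, int rows, int cols) {
  assert(stride >= cols);
  for (int r = 0; r < rows / 2; ++r) {
    int16_t *top = buf + r * stride;
    int16_t *bottom = buf + (rows - 1 - r) * stride;
    for (int c = 0; c < cols; ++c) {
      const int16_t tmp = top[c];
      top[c] = bottom[c];
      bottom[c] = tmp;
    }
  }
}
```

Under this reading, a type that flips only the vertical transform would call the up-down helper before the column pass, one that flips only the horizontal transform would call the left-right helper, and a type that flips both would call both.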
-
- 11 May, 2016 1 commit
-
-
Angie Chiang authored
Will add unit test to test/vp10_fwd_txfm2d_test.cc later Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362
-
- 10 May, 2016 1 commit
-
-
Jingning Han authored
The encoder is using vp10_fwd_txfm2d_32x32 now. Change-Id: I719f18ec0b065f1e062d01fd300533dd2f17c712
-
- 22 Apr, 2016 1 commit
-
-
Yi Luo authored
Unit tests show that manually developed SSE4.1 code would perform ~30% better if the TXFM_2D_CFG configuration is set at a lower level. This change only updates the function signature. There is no performance impact. Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
-
- 14 Apr, 2016 1 commit
-
-
Angie Chiang authored
Change-Id: Ie428c6f0655873de3e77e844a2f2e4203cf47dff
-
- 04 Apr, 2016 1 commit
-
-
Angie Chiang authored
Change-Id: I9281935653aacce22ac3100f79fb956c249e2bf3
-
- 28 Mar, 2016 1 commit
-
-
Angie Chiang authored
Change-Id: I996c48a90d7d71b52594a91a35cb8712c7fc212e
-
- 25 Mar, 2016 1 commit
-
-
Yi Luo authored
- Wrote function: fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K times with random numbers and getting through tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
  8x8: ~131%
  16x16: ~66%
Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
-
- 24 Mar, 2016 1 commit
-
-
Yi Luo authored
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 Unit test passed.
- Running 20K times with random numbers for tx type from V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.
Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
-