- 06 May, 2016 1 commit
Yaowu Xu authored
This fixes compiler warnings from MSVC.
Change-Id: Iaac0e994869561371295578a893f766493ce0544
- 30 Apr, 2016 1 commit
Yi Luo authored
  - Tx types: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
  - Updated the bit-exact unit test against the current C version.
  - High-bitdepth (HBD) encoder speed improves by ~3.8%.
Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
- 25 Apr, 2016 1 commit
Yi Luo authored
  - Optimized tx types DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
  - Overall encoder speed improves by ~4.5%-6%.
  - Updated the bit-exact unit test against the current C version.
Change-Id: If751c030612245b1c2470200c9570cf40d655504
- 15 Apr, 2016 1 commit
Yi Luo authored
  - Implemented Angie's new forward transform algorithm.
  - About 100% faster than the previous 64-bit version, and 3 times faster than the original C code.
  - Passed the bit-exact unit test.
Change-Id: Ica30b9768706604a6d69fe42da778441f0f5f02e
- 30 Mar, 2016 1 commit
Geza Lore authored
If --enable-ext-partition is used at build time, the superblock size (sometimes also referred to as the coding unit (CU) size) is extended to 128x128 pixels.
Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
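As a rough illustration of what the larger superblock changes, the sketch below counts how many superblocks tile a frame when partial superblocks at the right and bottom edges are rounded up. The helper name `superblock_count()` is hypothetical and not part of libvpx; it only does the tiling arithmetic.

```c
#include <assert.h>

/* Hypothetical helper (not a libvpx function): number of sb_size x sb_size
 * superblocks needed to cover a width x height frame. Partial superblocks
 * at the right and bottom edges still count as whole superblocks. */
int superblock_count(int width, int height, int sb_size) {
  assert(width > 0 && height > 0 && sb_size > 0);
  const int cols = (width + sb_size - 1) / sb_size;  /* ceil(width / sb) */
  const int rows = (height + sb_size - 1) / sb_size; /* ceil(height / sb) */
  return cols * rows;
}
```

For example, `superblock_count(1920, 1080, 128)` gives 15 × 9 = 135 superblocks for a 1080p frame, versus 30 × 17 = 510 with 64x64 superblocks.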
- 25 Mar, 2016 1 commit
Yi Luo authored
  - Wrote functions fidtx8_sse2() and fidtx16_sse2().
  - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for the new types.
  - Updated the 8x8/16x16 unit tests for accuracy/speed.
  - Running 20K iterations with random input over tx types V_DCT through H_FLIPADST, SSE2 speed improvement:
    - 8x8: ~131%
    - 16x16: ~66%
Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
- 24 Mar, 2016 1 commit
Yi Luo authored
  - Added function fidtx4_sse2().
  - Turned on vp10_fht4x4_sse2() for these tx types.
  - Updated the 4x4 unit test for speed/accuracy.
  - 4x4 unit test passed.
  - Running 20K iterations with random input over tx types V_DCT through H_FLIPADST, SSE2 is ~46% faster than C.
Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
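The fidtx4/8/16 functions named in these two commits presumably implement the identity (IDTX) 1-D transform that the V_* and H_* hybrid types pair with DCT or ADST. A scalar sketch of such a pass-through kernel follows. The function name and the √2 gain in Q14 fixed point are assumptions chosen for illustration, not constants taken from libvpx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-point parameters for this sketch (not libvpx's constants). */
#define SKETCH_CONST_BITS 14
#define SKETCH_SQRT2_Q14 23170 /* round(sqrt(2) * 2^14) */

/* Round-half-up right shift; relies on arithmetic shift for negatives,
 * as common compilers implement it. */
static int32_t round_shift(int64_t x, int bits) {
  return (int32_t)((x + ((int64_t)1 << (bits - 1))) >> bits);
}

/* Hypothetical 4-point identity transform: each output is the matching
 * input scaled by an assumed sqrt(2) gain; samples are never mixed. */
void fidtx4_sketch(const int32_t *input, int32_t *output) {
  assert(input != NULL && output != NULL);
  for (int i = 0; i < 4; ++i)
    output[i] = round_shift((int64_t)input[i] * SKETCH_SQRT2_Q14,
                            SKETCH_CONST_BITS);
}
```

Because no samples are combined, an SIMD version of such a kernel reduces to a multiply and a rounding shift per lane.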
- 23 Mar, 2016 2 commits
Yi Luo authored
  - Use the Makefile to control the build of highbd_fwd_txfm_sse4.c.
  - Fixed hybrid transform (HT) types broken by a recent update.
  - Added new unit test cases for highbd HT.
Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90
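A hybrid transform (HT) applies one 1-D transform down the columns and a possibly different one along the rows. The sketch below shows that separable structure for a 4x4 block using function pointers. `fht4x4_sketch()` and both toy kernels are hypothetical: an unnormalized 4-point Hadamard stands in for DCT/ADST only to keep the arithmetic easy to check, and the fixed-point scaling and rounding a real encoder transform needs are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 4-point 1-D transform kernel. */
typedef void (*tx1d_fn)(const int32_t *in, int32_t *out);

/* Toy kernel: identity (pass-through). */
void identity4(const int32_t *in, int32_t *out) {
  for (int i = 0; i < 4; ++i) out[i] = in[i];
}

/* Toy kernel: unnormalized 4-point Hadamard, a stand-in for DCT/ADST. */
void hadamard4(const int32_t *in, int32_t *out) {
  const int32_t a = in[0] + in[1], b = in[0] - in[1];
  const int32_t c = in[2] + in[3], d = in[2] - in[3];
  out[0] = a + c;
  out[1] = b + d;
  out[2] = a - c;
  out[3] = b - d;
}

/* Separable hybrid 2-D transform on a row-major 4x4 block:
 * col_tx runs down each column, then row_tx runs along each row. */
void fht4x4_sketch(const int32_t in[16], int32_t out[16], tx1d_fn col_tx,
                   tx1d_fn row_tx) {
  assert(col_tx != NULL && row_tx != NULL);
  int32_t tmp[16], col_in[4], col_out[4];
  for (int c = 0; c < 4; ++c) {
    for (int r = 0; r < 4; ++r) col_in[r] = in[r * 4 + c];
    col_tx(col_in, col_out);
    for (int r = 0; r < 4; ++r) tmp[r * 4 + c] = col_out[r];
  }
  for (int r = 0; r < 4; ++r) row_tx(&tmp[r * 4], &out[r * 4]);
}
```

Choosing a different kernel pair changes only the two function pointers. This is the sense in which each HT type is a (column, row) combination of 1-D transforms.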
Yi Luo authored
  - Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization.
  - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1().
  - Used a logical right shift to avoid a coefficient memory write/read.
  - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
  - Improved overall encoding performance by >2.3% on a 50-frame sequence, park_joy_1080p_12.y4m, encoded with --input-bit-depth=12 --bit-depth=12.
  - Unit test passed.
Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
- 21 Mar, 2016 1 commit
Debargha Mukherjee authored
Makes a set of 16 transforms in total, adding all 1D combinations of ADST and FlipADST and removing all DST transforms. In BDRATE, lowres and midres both improve by about 0.1% and hdres by -0.378%, with fewer transforms that are also simpler. Further experiments will follow.
Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
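One way to read "a set of 16 transforms" is as every (vertical, horizontal) pairing of four 1-D kernels. The sketch below enumerates those pairs under the assumption that the kernel set is {DCT, ADST, FLIPADST, identity}. The identity kernel is inferred from the V_DCT through H_FLIPADST types mentioned in other commits here, and the enum and function names are hypothetical rather than libvpx's TX_TYPE definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed 1-D kernel set (hypothetical names). */
typedef enum { K_DCT, K_ADST, K_FLIPADST, K_IDTX, K_COUNT } kernel1d;

static const char *const kKernelNames[K_COUNT] = {"DCT", "ADST", "FLIPADST",
                                                  "IDTX"};

/* Writes every (vertical, horizontal) kernel pair into `pairs` and
 * returns how many pairs were written. */
int enumerate_2d_types(kernel1d pairs[][2], int max_pairs) {
  int n = 0;
  for (int v = 0; v < K_COUNT; ++v) {
    for (int h = 0; h < K_COUNT; ++h) {
      assert(n < max_pairs);
      pairs[n][0] = (kernel1d)v;
      pairs[n][1] = (kernel1d)h;
      ++n;
    }
  }
  return n;
}

/* Prints each pair as VERTICAL_HORIZONTAL, e.g. "ADST_FLIPADST". */
void print_2d_types(kernel1d pairs[][2], int n) {
  for (int i = 0; i < n; ++i)
    printf("%s_%s\n", kKernelNames[pairs[i][0]], kKernelNames[pairs[i][1]]);
}
```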
- 08 Mar, 2016 1 commit
Yi Luo authored
  - Implemented fdst16_sse2() and fdst16_8col() against the C version fdst16().
  - Turned on 7 DST-related hybrid txfm types in vp10_fht16x16_sse2().
  - Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in fwd_txfm_16x16().
  - Added a vp10_fht16x16_sse2() unit test against the C version vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
  - Unit test passed.
  - Speed improvement: 2.4%, 3.2%, and 3.2% for city_cif.y4m, garden_sif.y4m, and mobile_cif.y4m, respectively.
Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
- 02 Mar, 2016 1 commit
Yi Luo authored
fdct16_sse2() was not bit-exact with the C reference, fdct16(). The inconsistency was found by writing a unit test for vp10_fht16x16_sse2(). Because the unit test needs a pending change to the inherited base class, I will commit it after creating a header file for that base class. Passed the uncommitted unit test: vp10_fht16x16_test.cc.
Change-Id: If2b617883c633a3ea90c19e1d018240c8007102b
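This mismatch was caught by running an optimized routine and its C reference on the same inputs and comparing the outputs. The sketch below shows that pattern in plain C. It runs a reference and an alternative implementation on many random blocks, and the commits above use 20K iterations, then counts any outputs that are not bit-identical. The butterfly kernels, the LCG, and `count_mismatches()` are all hypothetical; libvpx's own tests are written with GoogleTest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*tx4_fn)(const int32_t *in, int32_t *out);

/* Hypothetical C reference: a 4-point sum/difference butterfly. */
void ref_butterfly4(const int32_t *in, int32_t *out) {
  for (int i = 0; i < 2; ++i) {
    out[i] = in[i] + in[3 - i];
    out[3 - i] = in[i] - in[3 - i];
  }
}

/* Hypothetical "optimized" rewrite that must match the reference bit-for-bit. */
void opt_butterfly4(const int32_t *in, int32_t *out) {
  const int32_t s0 = in[0] + in[3], s1 = in[1] + in[2];
  const int32_t d0 = in[0] - in[3], d1 = in[1] - in[2];
  out[0] = s0;
  out[1] = s1;
  out[2] = d1;
  out[3] = d0;
}

/* Deterministic LCG so that any failure is reproducible from the seed. */
static uint32_t lcg_next(uint32_t *state) {
  *state = *state * 1664525u + 1013904223u;
  return *state;
}

/* Runs both kernels on `iters` random blocks of residuals in [-255, 255]
 * and returns the number of iterations whose outputs differ. */
int count_mismatches(tx4_fn ref, tx4_fn opt, int iters, uint32_t seed) {
  assert(iters >= 0);
  int mismatches = 0;
  for (int it = 0; it < iters; ++it) {
    int32_t in[4], out_ref[4], out_opt[4];
    for (int i = 0; i < 4; ++i)
      in[i] = (int32_t)(lcg_next(&seed) % 511u) - 255;
    ref(in, out_ref);
    opt(in, out_opt);
    if (memcmp(out_ref, out_opt, sizeof(out_ref)) != 0) ++mismatches;
  }
  return mismatches;
}
```

Comparing with `memcmp` rather than a tolerance is what makes the check bit-exact. Any difference, even of 1 in a single coefficient, counts as a failure.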
- 24 Feb, 2016 1 commit
Yi Luo authored
Implemented fdst8_sse2() against the C version fdst8(). Added seven DST-related hybrid transform types to vp10_fht8x8_sse2(). Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8(). Speedup: 18.1%, 11.5%, and 22.0% in speed tests on city_cif.y4m, garden_sif.y4m, and mobile_cif.y4m, respectively.
Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3
- 19 Feb, 2016 1 commit
Yi Luo authored
Applied the SSE2 DST to the 4x4 transform. Fixed DST coefficient packing to satisfy the 4x4 transpose requirement.
Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d
- 14 Dec, 2015 1 commit
James Zern authored
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
- 09 Nov, 2015 1 commit
Johann authored
Javan Whistling Duck release.
Change-Id: If44c9ca16a8188b68759325fbacc771365cb4af8
- 03 Nov, 2015 1 commit
Geza Lore authored
This patch eliminates the copying of data when using FLIPADST forward transforms by incorporating the necessary data flipping into the load_buffer_* functions of the SSE2-optimized forward transforms. The load_buffer_* functions are normally inlined, so the overhead of copying the data is removed and the overhead of flipping is minimized. Left-to-right flipping is still not free, as the columns need to be shuffled in registers.
To keep the C and SSE2 implementations identical, the corresponding C implementations now also flip the data as part of the transform, rather than relying on the caller to flip the input.
Overall encoder speedup is about 1.5-2% in my tests. Note that these are only the forward transforms; inverse transforms will come in a later patch.
There are also a few code hygiene changes:
  - Fixed the indentation of some switch statements.
  - The DCT_DCT transform now always uses the vp10_fht* functions, which dispatch to vpx_fdct* for DCT_DCT (previously, some call sites called vpx_fdct* directly and others called vp10_fht*).
Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574
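The idea of folding the flip into the load can be sketched in scalar C. In this hypothetical `load_buffer_4x4_sketch()`, an up-down flip only changes the starting row and the sign of the stride. A left-right flip reorders elements within each row, which matches the commit's note that it still costs a shuffle in SIMD registers. The function name and the int16_t-in, int32_t-out types are assumptions for illustration, not libvpx's signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar 4x4 load that applies FLIPADST input flipping while
 * loading, so no flipped copy of the block is ever made.
 * flip_ud: reverse row order; flip_lr: reverse element order within rows. */
void load_buffer_4x4_sketch(const int16_t *input, int stride, int32_t out[16],
                            int flip_ud, int flip_lr) {
  assert(input != NULL && out != NULL);
  if (flip_ud) {
    /* Up-down flip is free: start at the last row and walk backwards. */
    input += 3 * stride;
    stride = -stride;
  }
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      /* Left-right flip reorders elements within a row (a shuffle in SIMD). */
      const int src_c = flip_lr ? 3 - c : c;
      out[r * 4 + c] = input[r * stride + src_c];
    }
  }
}
```

With both flags clear, this reduces to a plain load. A transform can therefore call it unconditionally and select the flip per tx type.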
- 12 Aug, 2015 2 commits
Jingning Han authored
Remove the vp9_ prefix from vp10 file names.
Change-Id: I513a211b286a57d6126fc1b0fbfd6405120014f1
Jingning Han authored
This commit forks the VP9 codebase into VP10 and makes libvpx support VP8, VP9, and VP10.
Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356