Commit 1b888f2e authored by David Barker, committed by Debargha Mukherjee

Optimize SSE2 warp filter

Improve the speed of the warp filter itself by ~30%. This leads
to an overall decoder speedup of 5-20%, depending on bitrate,
for the global-motion experiment, and a small speedup for
warped-motion.

Applies a very minor change to the rounding during filter
selection: ROUND_POWER_OF_TWO makes slightly more sense here
than ROUND_POWER_OF_TWO_SIGNED, and is faster.

Change-Id: I3f364221d1ec35a8aac0d2c8b0e427f527d12e43
parent 0b04e9b8
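
To illustrate the rounding change described in the commit message, here is a minimal sketch comparing the two macros. The macro bodies below are assumptions modeled on typical codec helper macros; the commit does not show their definitions. The helper names `round_plain`, `round_signed`, and `rounding_self_check` are hypothetical. Under these assumptions the two roundings differ only for negative values that sit exactly halfway between two integers, which fits the "very minor change" wording.

```c
#include <assert.h>

/* Assumed definitions, for illustration only (not shown in the commit). */
#define ROUND_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) >> 1)) >> (n))
#define ROUND_POWER_OF_TWO_SIGNED(value, n)           \
  (((value) < 0) ? -ROUND_POWER_OF_TWO(-(value), (n)) \
                 : ROUND_POWER_OF_TWO((value), (n)))

/* Rounds half up (towards +infinity). Relies on >> of a negative int being
 * an arithmetic shift, which is implementation-defined in C but is what
 * mainstream compilers do. */
int round_plain(int v, int n) { return ROUND_POWER_OF_TWO(v, n); }

/* Rounds half away from zero; costs an extra compare and negate. */
int round_signed(int v, int n) { return ROUND_POWER_OF_TWO_SIGNED(v, n); }

/* Hypothetical self-check using a small shift (n = 2, i.e. divide by 4). */
void rounding_self_check(void) {
  /* -6 / 4 = -1.5 is an exact negative tie: the two macros disagree. */
  assert(round_plain(-6, 2) == -1);
  assert(round_signed(-6, 2) == -2);
  /* Non-tie negatives and all non-negative inputs agree. */
  assert(round_plain(-5, 2) == round_signed(-5, 2));
  assert(round_plain(-7, 2) == round_signed(-7, 2));
  assert(round_plain(6, 2) == round_signed(6, 2));
}
```

The plain variant avoids the branch on the sign, which is presumably part of why the commit message calls it faster.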
@@ -992,8 +992,7 @@ void warp_affine_c(int32_t *mat, uint8_t *ref, int width, int height,
 int ix = ix4 + l - 3;
 // At this point, sx = sx4 + alpha * l + beta * k
 const int16_t *coeffs =
-    warped_filter[ROUND_POWER_OF_TWO_SIGNED(sx,
-                  WARPEDDIFF_PREC_BITS) +
+    warped_filter[ROUND_POWER_OF_TWO(sx, WARPEDDIFF_PREC_BITS) +
                   WARPEDPIXEL_PREC_SHIFTS];
 int32_t sum = 0;
 for (m = 0; m < 8; ++m) {
@@ -1012,9 +1011,9 @@ void warp_affine_c(int32_t *mat, uint8_t *ref, int width, int height,
 uint8_t *p =
     &pred[(i - p_row + k + 4) * p_stride + (j - p_col + l + 4)];
 // At this point, sy = sy4 + gamma * l + delta * k
-const int16_t *coeffs = warped_filter[ROUND_POWER_OF_TWO_SIGNED(
-                            sy, WARPEDDIFF_PREC_BITS) +
-                        WARPEDPIXEL_PREC_SHIFTS];
+const int16_t *coeffs =
+    warped_filter[ROUND_POWER_OF_TWO(sy, WARPEDDIFF_PREC_BITS) +
+                  WARPEDPIXEL_PREC_SHIFTS];
 int32_t sum = 0;
 for (m = 0; m < 8; ++m) {
   sum += tmp[(k + m + 4) * 8 + (l + 4)] * coeffs[m];
......
This diff is collapsed.
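
The filter selection in the hunks above can be sketched in isolation as follows. The constant values (`WARPEDDIFF_PREC_BITS` as 10, `WARPEDPIXEL_PREC_SHIFTS` as 64) and the macro body are assumptions chosen for illustration, since the diff uses these names without showing their definitions. The helpers `warp_filter_index` and `apply_8tap` are hypothetical names, not functions from the codebase.

```c
#include <stdint.h>

/* Assumed values and macro body, for illustration only. */
#define WARPEDDIFF_PREC_BITS 10
#define WARPEDPIXEL_PREC_SHIFTS 64
#define ROUND_POWER_OF_TWO(value, n) (((value) + ((1 << (n)) >> 1)) >> (n))

/* Map a signed fixed-point offset to a row of the filter table.
 * Adding WARPEDPIXEL_PREC_SHIFTS moves the rounded (possibly negative)
 * offset into a non-negative index range. Assumes >> on a negative value
 * is an arithmetic shift. */
int warp_filter_index(int32_t s) {
  return ROUND_POWER_OF_TWO(s, WARPEDDIFF_PREC_BITS) + WARPEDPIXEL_PREC_SHIFTS;
}

/* 8-tap dot product, mirroring the inner loop over m in the diff. */
int32_t apply_8tap(const int32_t *samples, const int16_t *coeffs) {
  int32_t sum = 0;
  for (int m = 0; m < 8; ++m) {
    sum += samples[m] * coeffs[m];
  }
  return sum;
}
```

With these assumed constants, an offset of 0 selects row 64, and an exact negative tie such as -512 also selects row 64, where the signed rounding would have selected row 63.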