Commit 7422189a authored by Timothy B. Terriberry's avatar Timothy B. Terriberry

Fix silk_VQ_WMat_EC_sse4_1().

During review of c95c9a04, I replaced a call to
 _mm_cvtepi8_epi32() with the OP_CVTEPI16_EPI32_M64() macro (note
 the 16 instead of 8).
Make a separate OP_CVTEPI8_EPI32_M32() macro and use that instead.

Thanks to Wei Zhou for the report.
parent 23f503ad
@@ -44,18 +44,26 @@
 int opus_select_arch(void);
 # endif
-/*gcc appears to emit MOVDQA's to load the argument of an _mm_cvtepi16_epi32()
-  when optimizations are disabled, even though the actual PMOVSXWD instruction
-  takes an m64. Unlike a normal m64 reference, these require 16-byte alignment
-  and load 16 bytes instead of 8, possibly reading out of bounds.
-  We can insert an explicit MOVQ using _mm_loadl_epi64(), which should have the
-  same semantics as an m64 reference in the PMOVSXWD instruction itself, but
-  gcc is not smart enough to optimize this out when optimizations ARE enabled.*/
+/*gcc appears to emit MOVDQA's to load the argument of an _mm_cvtepi8_epi32()
+  or _mm_cvtepi16_epi32() when optimizations are disabled, even though the
+  actual PMOVSXWD instruction takes an m32 or m64. Unlike a normal memory
+  reference, these require 16-byte alignment and load a full 16 bytes (instead
+  of 4 or 8), possibly reading out of bounds.
+  We can insert an explicit MOVD or MOVQ using _mm_cvtsi32_si128() or
+  _mm_loadl_epi64(), which should have the same semantics as an m32 or m64
+  reference in the PMOVSXWD instruction itself, but gcc is not smart enough to
+  optimize this out when optimizations ARE enabled.*/
 # if !defined(__OPTIMIZE__)
+#  define OP_CVTEPI8_EPI32_M32(x) \
+   (_mm_cvtepi8_epi32(_mm_cvtsi32_si128(*(int *)(x))))
 #  define OP_CVTEPI16_EPI32_M64(x) \
    (_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
 # else
+#  define OP_CVTEPI8_EPI32_M32(x) \
+   (_mm_cvtepi8_epi32(*(__m128i *)(x)))
 #  define OP_CVTEPI16_EPI32_M64(x) \
    (_mm_cvtepi16_epi32(*(__m128i *)(x)))
 # endif
@@ -65,7 +65,7 @@ void silk_VQ_WMat_EC_sse4_1(
         diff_Q14[ 0 ] = in_Q14[ 0 ] - silk_LSHIFT( cb_row_Q7[ 0 ], 7 );
         C_tmp1 = OP_CVTEPI16_EPI32_M64( &in_Q14[ 1 ] );
-        C_tmp2 = OP_CVTEPI16_EPI32_M64( &cb_row_Q7[ 1 ] );
+        C_tmp2 = OP_CVTEPI8_EPI32_M32( &cb_row_Q7[ 1 ] );
         C_tmp2 = _mm_slli_epi32( C_tmp2, 7 );
         C_tmp1 = _mm_sub_epi32( C_tmp1, C_tmp2 );