FARGAN truncation?
Hi, I'm a speech scientist experimenting with FARGAN. I've installed Opus on my machines (one a Mac, the other a Linux server) and tried the FARGAN demo. I'm finding that in copy-synthesis with FARGAN, the output wavefile is 50-59.9 msec shorter than the original. Is this expected?
More detail: My lab uses wavefiles with headers, so to start I use sox to remove the header: sox file.wav file.raw
Then I follow the FARGAN recipe:
    ./fargan_demo -features file.raw file.f32
    ./fargan_demo -fargan-synthesis file.f32 file_rebuilt.raw
If file.raw has a length equal to an integer multiple of 10 msec, file_rebuilt.raw will be exactly 50 msec (800 samples) shorter. If it's not an integer multiple of 10 msec, the remainder is truncated as well (e.g. a file of length 779 msec is first cut to 770 msec and then loses another 50 msec, ending up at 720 msec). Discarding the end of a file that doesn't make a complete 10-msec frame is no surprise, but discarding the other 50 msec puzzles me.
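To make the arithmetic concrete, here is a minimal sketch of the length rule I'm observing. It is a description of my measurements, not something taken from the Opus source: the 16 kHz sample rate is inferred from 800 samples = 50 msec, and the names (MISSING_FRAMES, expected_rebuilt_samples) are my own.

```python
# Sketch of the observed length rule (assumptions, not the Opus implementation).
SAMPLE_RATE = 16000    # assumption: 16 kHz, since 800 samples correspond to 50 msec
FRAME_SAMPLES = 160    # one 10-msec frame at 16 kHz
MISSING_FRAMES = 5     # observed: 5 frames (50 msec) go missing


def ms_to_samples(ms):
    """Convert a whole number of milliseconds to samples."""
    return ms * SAMPLE_RATE // 1000


def samples_to_ms(samples):
    """Convert a sample count to milliseconds."""
    return samples * 1000 / SAMPLE_RATE


def expected_rebuilt_samples(input_samples):
    """Drop the partial trailing frame, then drop MISSING_FRAMES whole frames."""
    whole_frames = input_samples // FRAME_SAMPLES
    return max(whole_frames - MISSING_FRAMES, 0) * FRAME_SAMPLES


if __name__ == "__main__":
    for ms in (779, 780):
        out = expected_rebuilt_samples(ms_to_samples(ms))
        print(ms, "msec in ->", samples_to_ms(out), "msec out")
```

Every file I've tried so far matches this rule, which is why the loss always falls between 50 and 59.9 msec.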
After looking at a couple of files, it's also not clear exactly which 50 msec are being lost. Some audio seems to be missing from the start of the file: less than 50 msec, but more than 25.
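One way I could pin down the loss at the start is to search for the lag at which the rebuilt signal best lines up with the original. The sketch below does this with a normalized correlation over a range of candidate lags. It assumes both files are 16-bit mono PCM, and estimate_start_offset is a hypothetical helper name of mine. Since a vocoder does not reproduce the waveform sample for sample, I'd treat the result as a rough estimate only; correlating amplitude envelopes instead of raw samples may be more robust.

```python
import numpy as np


def estimate_start_offset(original, rebuilt, max_lag):
    """Return the lag k (in samples) at which rebuilt[0] best matches original[k].

    Hypothetical helper: tries every lag from 0 to max_lag and keeps the one
    with the highest normalized correlation.
    """
    original = np.asarray(original, dtype=np.float64)
    rebuilt = np.asarray(rebuilt, dtype=np.float64)
    best_lag, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        n = min(len(original) - lag, len(rebuilt))
        if n <= 0:
            break
        a = original[lag:lag + n]
        b = rebuilt[:n]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        score = np.dot(a, b) / denom if denom > 0 else -np.inf
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage with the files from this post (assumes 16-bit PCM at 16 kHz):
#   original = np.fromfile("file.raw", dtype=np.int16)
#   rebuilt = np.fromfile("file_rebuilt.raw", dtype=np.int16)
#   print(estimate_start_offset(original, rebuilt, max_lag=1600) / 16.0, "msec")
```

If the estimated offset comes out between 25 and 50 msec, the rest of the missing 50 msec would have to come off the end of the file.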
Do you have any insight as to what is going on? Perhaps my headers are not being removed properly. Or does feature extraction use a very large window, like 60 msec? If so, wouldn't it make sense to amend the code so that the file ends are padded before extraction?
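In the meantime, a workaround I could try on my side is to pad the raw file with silence before running feature extraction. This is only a sketch under my own assumptions (16-bit mono PCM at 16 kHz; pad_pcm is a hypothetical helper), and I don't know how much padding belongs at each end, so lead_ms and trail_ms would need to be found by experiment.

```python
# Sketch of padding headerless PCM with silence (assumptions, not part of Opus).
SAMPLE_RATE = 16000     # assumption: 16 kHz
BYTES_PER_SAMPLE = 2    # assumption: 16-bit mono PCM


def pad_pcm(pcm, lead_ms, trail_ms):
    """Return pcm (bytes) with lead_ms and trail_ms of digital silence added."""
    lead = bytes(lead_ms * SAMPLE_RATE // 1000 * BYTES_PER_SAMPLE)
    trail = bytes(trail_ms * SAMPLE_RATE // 1000 * BYTES_PER_SAMPLE)
    return lead + pcm + trail


# Usage (the 30/20 split is a guess, not a known property of FARGAN):
#   with open("file.raw", "rb") as f:
#       data = f.read()
#   with open("file_padded.raw", "wb") as f:
#       f.write(pad_pcm(data, lead_ms=30, trail_ms=20))
```

If the padded file comes back with the same length as the original and properly aligned, that would at least confirm the loss is a fixed delay plus lookahead and not a header problem.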