Forward error correction (FEC) techniques correct errors at the receiver end of digital communication systems. In contrast with error-detection and retransmission ...
Abstract: Video captioning is the process of automatically generating textual descriptions of video content. This task is crucial in the fields of computer vision and natural language processing (NLP).
DisCoder is a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms.
We present OpenS2S, a fully open-source, transparent and end-to-end large speech language model designed to enable empathetic speech interactions. As shown in the figure, OpenS2S consists of the ...
Abstract: Background subtraction (BGS) is a core technique in computer vision, particularly in fields such as surveillance, object detection, and motion analysis. The main goal of BGS ...