Kobe University Library Digital Archive
https://hdl.handle.net/20.500.14094/90008868
Access count for this item: 84 (tallied 2025-07-01 15:49)
Available files
File
90008868 (fulltext)
Format
pdf
Size
942 KB
View count
25
Metadata
Metadata ID
90008868
Access rights
open access
Publication type
Version of Record
Title
Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation
Authors
Takashima, Yuki ; Takashima, Ryoichi ; Tsunoda, Ryota ; Aihara, Ryo ; Takiguchi, Tetsuya ; Ariki, Yasuo ; Motoyama, Nobuaki
Author name
Takashima, Yuki
Author ID
A2510
Researcher ID
1000050846102
KUID
https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=df7a61d0afafcfc6520e17560c007669
Author name
Takashima, Ryoichi
髙島, 遼一
タカシマ, リョウイチ
Affiliation
Research Center for Urban Safety and Security
Author name
Tsunoda, Ryota
Author name
Aihara, Ryo
Author ID
A1279
Researcher ID
1000040397815
KUID
https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=b3ec2a1710d8267b520e17560c007669
Author name
Takiguchi, Tetsuya
滝口, 哲也
タキグチ, テツヤ
Affiliation
Research Center for Urban Safety and Security
Author ID
A0260
Researcher ID
1000010135519
KUID
https://kuid-rm-web.ofc.kobe-u.ac.jp/search/detail?systemId=09a784b8ffbc912c520e17560c007669
Author name
Ariki, Yasuo
有木, 康雄
アリキ, ヤスオ
Affiliation
Research Center for Urban Safety and Security
Author name
Motoyama, Nobuaki
Language
English
Journal title
EURASIP Journal on Audio, Speech, and Music Processing
Volume (issue)
2021(1)
Page
44
Publisher
SpringerOpen
Publication date
2021-12-11
Release date
2021-12-21
Abstract
We present an unsupervised domain adaptation (UDA) method for a lip-reading model, that is, an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data includes an unknown class, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain adaptation method, in which the intermediate-layer output of an audio-based speech recognition model serves as the teacher for the unlabeled adaptation data. Because the audio signal contains more information for recognizing speech than lip images do, the audio-based model can act as a powerful teacher when the unlabeled adaptation data consists of audio-visual parallel data. In addition, because the proposed intermediate-layer-based KD expresses the teacher as a sub-class (sub-word)-level representation, the method can use data from unknown classes for adaptation. Through experiments on an image-based word recognition task, we demonstrate that the proposed approach not only improves UDA performance but can also exploit unknown-class adaptation data.
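The distillation idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the student is a single linear layer, the loss is a mean squared error between student outputs and teacher features, and the teacher's intermediate-layer outputs are simulated by a fixed linear map (`true_w`). The names `linear`, `mse`, and `distill_step` are hypothetical. The point it shows is that the update uses only paired teacher features and no word labels.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def distill_step(student_w, lip_feats, teacher_feats, lr=0.1):
    """One gradient-descent step pulling student outputs toward teacher features.

    No class labels are involved: the only target is the (assumed)
    intermediate-layer output of the audio model on the paired audio.
    Returns the updated weights and the average loss before the update.
    """
    n_out = len(student_w)
    grad = [[0.0] * len(student_w[0]) for _ in range(n_out)]
    total = 0.0
    for x, t in zip(lip_feats, teacher_feats):
        y = linear(student_w, x)
        total += mse(y, t)
        for i in range(n_out):
            # Gradient of the per-sample MSE with respect to row i of the weights.
            coef = 2.0 * (y[i] - t[i]) / n_out
            for j, xj in enumerate(x):
                grad[i][j] += coef * xj
    n = len(lip_feats)
    new_w = [[w - lr * g / n for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(student_w, grad)]
    return new_w, total / n

random.seed(0)
# Assumption: a fixed linear map stands in for the audio teacher's features.
true_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
lips = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
teacher = [linear(true_w, x) for x in lips]

w = [[0.0] * 3 for _ in range(2)]  # untrained student
losses = []
for _ in range(200):
    w, loss = distill_step(w, lips, teacher)
    losses.append(loss)
```

In this toy setup the recorded loss shrinks as the student's outputs approach the simulated teacher features. A real system would replace the linear maps with the visual and audio networks and would take the teacher features from actual audio-visual parallel data.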
Keywords
Lip reading
Knowledge distillation
Multimodal
Unsupervised domain adaptation
Category
Research Center for Urban Safety and Security
Journal article
Rights
© The Author(s). 2021. Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Related information
DOI
https://doi.org/10.1186/s13636-021-00232-5
Resource type
journal article
eISSN
1687-4722