About Whole Word Masking #2

Open
sunny0315 opened this issue Sep 10, 2020 · 0 comments


sunny0315 commented Sep 10, 2020

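The `batchify` function in dataset.py does not seem to be strict Whole Word Masking, does it? It first selects 15% of the words in the word segmentation result as masking targets. Then, for each character of a selected word, it probabilistically chooses to replace, keep, or mask that character. So there is a chance that part of a word is masked while the rest is replaced or kept. The source code is shown in the image below. I am not sure whether I have misunderstood something.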
image

The result is as follows:
image
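
To make the concern concrete, here is a minimal sketch of the two behaviours being compared. It is not the repository's actual `batchify` code: the function names, the `vocab` argument, and the 80/10/10 mask/replace/keep split (the usual BERT convention) are assumptions for illustration. The first function draws a fresh random decision for every character of a selected word, as described above; the second draws once per word and applies that decision to all of its characters.

```python
import random

MASK = "[MASK]"


def mask_per_char(words, vocab, rng, select_prob=0.15):
    """Behaviour described in this issue: select words, then decide per character."""
    out = []
    for word in words:
        selected = rng.random() < select_prob
        for ch in word:
            if not selected:
                out.append(ch)
                continue
            r = rng.random()  # a fresh draw for every character
            if r < 0.8:       # assumed 80%: mask
                out.append(MASK)
            elif r < 0.9:     # assumed 10%: replace with a random token
                out.append(rng.choice(vocab))
            else:             # assumed 10%: keep the original character
                out.append(ch)
    return out


def mask_whole_word(words, vocab, rng, select_prob=0.15):
    """Alternative: one draw per selected word, shared by all of its characters."""
    out = []
    for word in words:
        if rng.random() >= select_prob:
            out.extend(word)  # word not selected, keep every character
            continue
        r = rng.random()      # a single draw for the whole word
        if r < 0.8:
            out.extend([MASK] * len(word))
        elif r < 0.9:
            out.extend(rng.choice(vocab) for _ in word)
        else:
            out.extend(word)
    return out
```

Calling both functions with `select_prob=1.0` on the same segmented sentence makes the difference easy to see: only `mask_per_char` can return a word in which one character is `[MASK]` and another is left intact or replaced.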
