[Practical Application] A problem with Chinese text classification using fastrtext

sound118 posted on 2019-4-18 10:42:29

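I originally asked this question on GitHub (https://github.com/pommedeterresautee/fastrtext/issues/34). It is about using fastrtext for text classification prediction. The text below is copied directly from the English original. Could anyone take a look? Many thanks!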

I ran into an issue with a Chinese text classification prediction model, as follows:

test_sentences$text2[9]
[1] "蛋白粉开封后两个月在次食用味道发苦"
predict(model,test_sentences$text2[9])
[[1]]
__label__262
0.5312194

predict(model, "蛋白粉开封后两个月在次食用味道发苦")
[[1]]
__label__314
0.9935217

Basically, after training the model with fastrtext, predicting on a tokenized Chinese text that is stored in an object (test_sentences$text2[9] in my case) gives a wrong prediction with low probability (__label__262, 0.53). If I instead paste the same tokenized Chinese text directly into the predict call, as I did above, it gives the correct label with high probability (__label__314, 0.99). I am really confused by this. Can anyone help? Much appreciated!
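One way to narrow this down is to check whether the stored string and the typed string are really the same bytes. The sketch below is only a diagnostic and rests on an unconfirmed assumption: that the two inputs differ in encoding or in hidden whitespace. Nothing in the output above proves that. The helper name compare_strings is hypothetical, and the demo simulates a mismatch with a trailing space rather than using the real test_sentences data.

```r
# Hypothetical diagnostic helper (not part of fastrtext): compares two strings
# that print the same but might differ in encoding or hidden whitespace.
compare_strings <- function(stored, typed) {
  # Assumed normalisation: convert to UTF-8 and strip surrounding whitespace.
  normalise <- function(x) trimws(enc2utf8(x))
  list(
    encoding_stored = Encoding(stored),
    encoding_typed = Encoding(typed),
    # identical() on strings ignores encoding marks, so compare raw bytes.
    bytes_equal = identical(charToRaw(stored), charToRaw(typed)),
    equal_after_normalising = identical(
      charToRaw(normalise(stored)),
      charToRaw(normalise(typed))
    )
  )
}

# The first word of the example sentence, written with \u escapes so the
# literal is UTF-8 regardless of the locale the script is run in.
typed <- "\u86cb\u767d\u7c89"

# Simulated "stored" value with a hidden trailing space (an assumption made
# for this demo only; the real data may differ in some other way).
stored <- paste0(typed, " ")

result <- compare_strings(stored, typed)
str(result)
```

If bytes_equal is FALSE for the real test_sentences$text2[9] while equal_after_normalising is TRUE, it may be worth trying predict(model, trimws(enc2utf8(test_sentences$text2[9]))) to see whether the two predictions then agree. If the bytes already match, the cause lies elsewhere.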