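I posted this question on GitHub at https://github.com/pommedeterresautee/fastrtext/issues/34. It is about using fastrtext for text classification prediction. The English text below is copied directly from that issue. If anyone can take a look, many thanks!
I ran into an issue when predicting with a Chinese text classification model, as follows: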
test_sentences$text2[9]
[1] "蛋白粉 开封 后 两个 月 在 次 食用 味道 发苦"
predict(model,test_sentences$text2[9])
[[1]]
__label__262
0.5312194
predict(model, "蛋白粉 开封 后 两个 月 在 次 食用 味道 发苦")
[[1]]
__label__314
0.9935217
In short, after training the model with "fastrtext", predicting on a tokenized Chinese text that is passed in through a variable (test_sentences$text2[9] in my case) returns the wrong label with a low probability. If I instead paste the same tokenized text directly into predict() as a string literal, as in the second call above, it returns the correct label with a high probability. Both inputs print identically, so I am really confused about why the results differ. Can anyone help? Much appreciated!
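Since the two inputs look the same when printed, one thing worth checking is whether they are really the same string underneath. The sketch below uses only base R and compares two strings by declared encoding, byte length, and code points. It is a diagnostic aid, not a confirmed explanation: `typed`, `stored`, and `describe_string` are hypothetical names, and the trailing non-breaking space is an invented difference used purely for illustration.

```r
# Hypothetical stand-ins: `typed` plays the role of the pasted literal and
# `stored` the role of test_sentences$text2[9]. The extra non-breaking space
# (U+00A0) is an invented difference, for illustration only.
typed  <- "\u86cb\u767d\u7c89 \u5f00\u5c01 \u540e"   # "蛋白粉 开封 后"
stored <- paste0(typed, "\u00a0")

# Summarise what a string really contains, beyond how it prints.
describe_string <- function(x) {
  list(
    encoding   = Encoding(x),               # declared encoding mark
    n_bytes    = length(charToRaw(x)),      # raw byte length
    codepoints = utf8ToInt(enc2utf8(x))     # Unicode code points
  )
}

same     <- identical(stored, typed)        # exact equality check
info_s   <- describe_string(stored)
info_t   <- describe_string(typed)
# Code points that occur only in the stored value.
extra_cp <- setdiff(info_s$codepoints, info_t$codepoints)

print(same)
print(info_s$encoding)
print(info_t$encoding)
print(extra_cp)
```

In a real session, `stored` would be replaced by test_sentences$text2[9] and `typed` by the pasted literal. If the encodings or code points differ, converting the column with enc2utf8() before calling predict() may be worth trying, though whether that resolves the mismatch here is an assumption.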