Listing 2-10. Extracting least frequently used tokens
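    // Tokenizer, countIn, casedTokenizer, ham and spam are not defined in this
    // listing; they are assumed to come from earlier code.
    let rareTokens n (tokenizer:Tokenizer) (docs:string []) =
        let tokenized = docs |> Array.map tokenizer
        let tokens = tokenized |> Set.unionMany
        tokens
        |> Seq.sortBy (fun t -> countIn tokenized t)  // least frequent first
        |> Seq.take n
        |> Set.ofSeq

    // Keep the resulting sets, then print them; piping straight into Seq.iter
    // would bind unit to rareHam and rareSpam instead of the tokens.
    let rareHam = ham |> rareTokens 50 casedTokenizer
    rareHam |> Seq.iter (printfn "%s")
    let rareSpam = spam |> rareTokens 50 casedTokenizer
    rareSpam |> Seq.iter (printfn "%s")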
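The listing relies on several names it does not define. Below is a minimal, self-contained sketch for trying it out in F# Interactive. The definitions of `Tokenizer`, `casedTokenizer` and `countIn`, as well as the `sample` data, are assumptions made for illustration and may differ from the original code. The sketch also swaps `Seq.take` for `Seq.truncate` so that it does not throw when there are fewer than `n` distinct tokens.

```fsharp
open System.Text.RegularExpressions

// Assumption: a tokenizer maps a document to its set of distinct tokens.
type Tokenizer = string -> Set<string>

// Assumption: a case-preserving tokenizer that splits on word characters.
let casedTokenizer : Tokenizer =
    fun text ->
        Regex.Matches(text, @"\w+")
        |> Seq.cast<Match>
        |> Seq.map (fun m -> m.Value)
        |> Set.ofSeq

// Assumption: countIn returns the number of documents containing the token.
let countIn (group: Set<string> seq) (token: string) =
    group |> Seq.filter (Set.contains token) |> Seq.length

let rareTokens n (tokenizer: Tokenizer) (docs: string []) =
    let tokenized = docs |> Array.map tokenizer
    let tokens = tokenized |> Set.unionMany
    tokens
    |> Seq.sortBy (fun t -> countIn tokenized t)  // least frequent first
    |> Seq.truncate n                             // at most n, never throws
    |> Set.ofSeq

// Hypothetical sample data standing in for the ham / spam arrays.
let sample = [| "free money now"; "free lunch now"; "call now" |]
let rare = sample |> rareTokens 2 casedTokenizer
rare |> Seq.iter (printfn "%s")
```

In this sample, "now" appears in all three documents and "free" in two, so neither can be among the two rarest tokens; the result is drawn from the tokens that appear only once.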