Posted by: Nicolle | Category: Postgraduate study

# Keras Text Classification Library

(badges: Build Status, license, Slack)

keras-text is a one-stop text classification library implementing various state-of-the-art models, with a clean and extendable interface for implementing custom architectures.

## Quick start

### Create a tokenizer to build your vocabulary

- To represent your dataset as `(docs, words)`, use `WordTokenizer`.
- To represent your dataset as `(docs, sentences, words)`, use `SentenceWordTokenizer`.
- To create arbitrary hierarchies, extend `Tokenizer` and implement the `token_generator` method.

```python
from keras_text.processing import WordTokenizer

tokenizer = WordTokenizer()
tokenizer.build_vocab(texts)
```

Want to tokenize with character tokens to leverage character models? Use `CharTokenizer`.
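To make the vocabulary-building step concrete, here is a standalone pure-Python sketch of what a word tokenizer's `build_vocab` conceptually produces: a `token_index` mapping from token to integer id. This is an illustration of the idea, not keras-text's actual implementation (which tokenizes via spaCy rather than whitespace splitting).

```python
from collections import Counter

def build_token_index(texts, min_count=1):
    """Map each distinct token to an integer id, most frequent first.

    Standalone sketch only: keras-text's WordTokenizer uses spaCy
    tokenization, not naive whitespace splitting. Id 0 is reserved for
    padding, following the common Keras convention.
    """
    counts = Counter(token for text in texts for token in text.lower().split())
    token_index = {}
    for token, count in counts.most_common():
        if count >= min_count:
            token_index[token] = len(token_index) + 1  # ids start at 1
    return token_index

texts = ["the cat sat", "the dog sat down"]
index = build_token_index(texts)
# "the" and "sat" occur twice each, so they receive the smallest ids.
```

Frequent tokens getting small ids makes it easy to cap the vocabulary later by dropping all ids above a threshold.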

### Build a dataset

A dataset encapsulates the tokenizer, X, y, and the test set. This allows you to focus your efforts on trying various architectures/hyperparameters without having to worry about inconsistent evaluation. A dataset can be saved to and loaded from disk.

```python
from keras_text.data import Dataset

ds = Dataset(X, y, tokenizer=tokenizer)
ds.update_test_indices(test_size=0.1)
ds.save('dataset')
```

The `update_test_indices` method automatically stratifies multi-class or multi-label data correctly.
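Stratification means each class contributes to the test set in proportion to its overall frequency, so rare classes are not accidentally excluded from evaluation. A minimal standalone sketch of the single-label case (the real `update_test_indices` also handles multi-label targets; the function name here is illustrative):

```python
import random
from collections import defaultdict

def stratified_test_indices(y, test_size=0.1, seed=42):
    """Pick test indices so each class keeps roughly its overall proportion.

    Simplified, hypothetical illustration of stratified splitting for
    single-label data; not the keras-text implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    test_indices = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_size))  # at least one per class
        test_indices.extend(indices[:n_test])
    return sorted(test_indices)

y = [0] * 90 + [1] * 10          # imbalanced two-class labels
test_idx = stratified_test_indices(y, test_size=0.1)
# 10% of each class: 9 samples of class 0 and 1 sample of class 1.
```

A naive random split of the same data could easily leave zero class-1 samples in the test set, which is exactly what stratification prevents.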

## Build text classification models

See the `tests/` folder for usage.

### Word based models

When the dataset is represented as `(docs, words)`, word based models can be created using `TokenModelFactory`.

```python
from keras_text.models import TokenModelFactory
from keras_text.models import YoonKimCNN, AttentionRNN, StackedRNN

# RNN models can use `max_tokens=None` to indicate variable length words per mini-batch.
factory = TokenModelFactory(1, tokenizer.token_index, max_tokens=100, embedding_type='glove.6B.100d')
word_encoder_model = YoonKimCNN()
model = factory.build_model(token_encoder_model=word_encoder_model)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```

Currently supported models include:

- Yoon Kim CNN
- Stacked RNNs
- Attention (with/without context) based RNN encoders

`TokenModelFactory.build_model` uses the provided word encoder, whose output is then classified via a Dense block.
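A fixed `max_tokens=100` implies each document is padded or truncated to exactly 100 token ids before entering the embedding layer. A standalone sketch of that preprocessing step (in practice you would use Keras' own padding utilities; the function name here is illustrative):

```python
def pad_docs(docs, max_tokens, pad_id=0):
    """Pad or truncate each document of token ids to exactly `max_tokens`.

    Standalone illustration of the fixed-length (num_docs, max_tokens)
    input a `max_tokens=100` factory expects; not keras-text code.
    """
    padded = []
    for doc in docs:
        doc = doc[:max_tokens]                                    # truncate long docs
        padded.append(doc + [pad_id] * (max_tokens - len(doc)))   # pad short docs
    return padded

docs = [[4, 8, 15], [16, 23, 42, 7, 9]]
X = pad_docs(docs, max_tokens=4)
# → [[4, 8, 15, 0], [16, 23, 42, 7]]
```

Reserving id 0 for padding is why vocabulary ids conventionally start at 1.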

### Sentence based models

When the dataset is represented as `(docs, sentences, words)`, sentence based models can be created using `SentenceModelFactory`.

```python
from keras_text.models import SentenceModelFactory
from keras_text.models import YoonKimCNN, AttentionRNN, StackedRNN, AveragingEncoder

# Pad max sentences per doc to 500 and max words per sentence to 200.
# Can also use `max_sents=None` to allow variable sized max_sents per mini-batch.
factory = SentenceModelFactory(10, tokenizer.token_index, max_sents=500, max_tokens=200, embedding_type='glove.6B.100d')
word_encoder_model = AttentionRNN()
sentence_encoder_model = AttentionRNN()

# Allows you to compose arbitrary word encoders followed by a sentence encoder.
model = factory.build_model(word_encoder_model, sentence_encoder_model)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```

Currently supported models include:

- Yoon Kim CNN
- Stacked RNNs
- Attention (with/without context) based RNN encoders

`SentenceModelFactory.build_model` creates a tiered model where words within a sentence are first encoded using `word_encoder_model`. All such per-sentence encodings are then encoded using `sentence_encoder_model`.

- Hierarchical attention networks (HANs) can be built by composing two attention based RNN models. This is useful when a document is very large.
- For smaller documents, a reasonable way to encode a sentence is to average the words within it. This can be done by using `token_encoder_model=AveragingEncoder()`.
- Mix and match encoders as you see fit for your problem.
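The averaging idea is simple enough to show in a few lines: a sentence is encoded as the element-wise mean of its word vectors, producing one fixed-size vector per sentence regardless of sentence length. A standalone sketch of the computation (not the `AveragingEncoder` layer itself, which operates on Keras tensors):

```python
def average_encode(word_vectors):
    """Encode a sentence as the element-wise mean of its word vectors.

    Plain-Python sketch of what an averaging encoder computes; the real
    AveragingEncoder is a Keras layer working on batched tensors.
    """
    if not word_vectors:
        raise ValueError("cannot encode an empty sentence")
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(vec[d] for vec in word_vectors) / n for d in range(dim)]

words = [[1.0, 2.0], [3.0, 4.0]]   # two word embeddings of dimension 2
sentence = average_encode(words)
# → [2.0, 3.0]
```

Because averaging has no trainable parameters, it is a cheap baseline word encoder before reaching for attention RNNs.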
## Resources

TODO: Update documentation and add notebook examples.

Stay tuned for better documentation and examples. Until then, the best resource is to refer to the API docs.

## Installation

Install Keras with the Theano or TensorFlow backend. Note that this library requires Keras > 2.0.

Install keras-text, either from source:

```bash
sudo python setup.py install
```

or as a PyPI package:

```bash
sudo pip install keras-text
```

Download the target spaCy model. keras-text uses the excellent spaCy library for tokenization; see the spaCy instructions on how to download the model for your target language.

## Citation

Please cite keras-text in your publications if it helped your research. Here is an example BibTeX entry:

```
@misc{raghakotkerastext,
  title={keras-text},
  author={Kotikalapudi, Raghavendra and contributors},
  year={2017},
  publisher={GitHub},
  howpublished={\url{https://github.com/raghakot/keras-text}},
}
```
Original forum thread: https://bbs.pinggu.org/thread-6261471-1-1.html
