# Keras Text Classification Library

keras-text is a one-stop text classification library implementing various state-of-the-art models with a clean and extendable interface for implementing custom architectures.
## Quick start

### Create a tokenizer to build your vocabulary

- To represent your dataset as (docs, words), use `WordTokenizer`.
- To represent your dataset as (docs, sentences, words), use `SentenceWordTokenizer`.
- To create arbitrary hierarchies, extend `Tokenizer` and implement the `token_generator` method.

```python
from keras_text.processing import WordTokenizer

tokenizer = WordTokenizer()
tokenizer.build_vocab(texts)
```

Want to tokenize with character tokens to leverage character models? Use `CharTokenizer`.
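To make the two dataset shapes concrete, here is a minimal plain-Python sketch of the (docs, words) and (docs, sentences, words) representations. It uses naive period-based splitting purely for illustration and does not depend on keras-text:

```python
# Illustrative sketch (plain Python, no keras-text dependency) of the two
# dataset shapes the tokenizers produce. Real tokenization is done by spaCy.
texts = ["Deep learning works. It scales well.", "Keras is simple."]

def word_tokens(doc):
    # (docs, words): one flat list of word tokens per document.
    return doc.replace('.', ' .').split()

def sentence_word_tokens(doc):
    # (docs, sentences, words): a list of sentences, each a list of words.
    return [s.split() for s in doc.split('.') if s.strip()]

flat = [word_tokens(d) for d in texts]
nested = [sentence_word_tokens(d) for d in texts]
print(flat[1])    # ['Keras', 'is', 'simple', '.']
print(nested[0])  # [['Deep', 'learning', 'works'], ['It', 'scales', 'well']]
```

`WordTokenizer` yields the flat shape, `SentenceWordTokenizer` the nested one; a custom `token_generator` can emit any hierarchy you need.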
## Build a dataset

A dataset encapsulates the tokenizer, X, y, and the test set. This allows you to focus your efforts on trying various architectures/hyperparameters without having to worry about inconsistent evaluation. A dataset can be saved to and loaded from disk.

```python
from keras_text.data import Dataset

ds = Dataset(X, y, tokenizer=tokenizer)
ds.update_test_indices(test_size=0.1)
ds.save('dataset')
```

The `update_test_indices` method automatically stratifies multi-class or multi-label data correctly.
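The idea behind stratified test-index selection can be sketched in plain Python. This is a hypothetical illustration of the concept, not the library's actual implementation (which also handles multi-label data):

```python
# Conceptual sketch of stratified test splitting: take the same fraction of
# indices from each class, so the test set mirrors the label distribution.
from collections import defaultdict

def stratified_test_indices(y, test_size=0.1):
    by_label = defaultdict(list)
    for idx, label in enumerate(y):
        by_label[label].append(idx)
    test = []
    for label, indices in by_label.items():
        k = max(1, int(len(indices) * test_size))  # at least one per class
        test.extend(indices[:k])
    return sorted(test)

y = [0] * 90 + [1] * 10          # imbalanced two-class labels
test_idx = stratified_test_indices(y, test_size=0.1)
print(len(test_idx))                            # 10
print(sum(1 for i in test_idx if y[i] == 1))    # 1
```

Without stratification, a random 10% split of such imbalanced data could easily contain no minority-class samples at all.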
## Build text classification models

See the `tests/` folder for usage examples.

### Word-based models

When the dataset is represented as (docs, words), word-based models can be created using `TokenModelFactory`.

```python
from keras_text.models import TokenModelFactory
from keras_text.models import YoonKimCNN, AttentionRNN, StackedRNN

# RNN models can use `max_tokens=None` to indicate a variable number of words per mini-batch.
factory = TokenModelFactory(1, tokenizer.token_index, max_tokens=100, embedding_type='glove.6B.100d')

word_encoder_model = YoonKimCNN()
model = factory.build_model(token_encoder_model=word_encoder_model)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```
Currently supported models include:

- Yoon Kim CNN
- Stacked RNNs
- Attention (with/without context) based RNN encoders

`TokenModelFactory.build_model` uses the provided word encoder, whose output is then classified via a `Dense` block.
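The core idea of the Yoon Kim CNN encoder can be illustrated without Keras: slide filters of several widths over the embedded word sequence and max-pool each filter's response over time. This toy sketch uses scalar "embeddings" for brevity and is not the library's implementation:

```python
# Toy sketch of the Yoon Kim CNN idea: convolve filters of different widths
# over the word sequence, then take the max over time per filter.
def conv1d_valid(seq, kernel):
    # "Valid" 1-d convolution: one output per full overlap position.
    w = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(w))
            for i in range(len(seq) - w + 1)]

embeddings = [1, 5, -2, 9, 3]        # one scalar "embedding" per word
filters = [[1, 1], [1, 0, 1]]        # filter widths 2 and 3
features = [max(conv1d_valid(embeddings, k)) for k in filters]
print(features)  # [12, 14]
```

The concatenated max-pooled features form a fixed-size vector regardless of sentence length, which is what makes the `Dense` classification block possible downstream.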
### Sentence-based models

When the dataset is represented as (docs, sentences, words), sentence-based models can be created using `SentenceModelFactory`.

```python
from keras_text.models import SentenceModelFactory
from keras_text.models import YoonKimCNN, AttentionRNN, StackedRNN, AveragingEncoder

# Pad max sentences per doc to 500 and max words per sentence to 200.
# Can also use `max_sents=None` to allow a variable number of sentences per mini-batch.
factory = SentenceModelFactory(10, tokenizer.token_index, max_sents=500, max_tokens=200, embedding_type='glove.6B.100d')

word_encoder_model = AttentionRNN()
sentence_encoder_model = AttentionRNN()

# Allows you to compose arbitrary word encoders followed by a sentence encoder.
model = factory.build_model(word_encoder_model, sentence_encoder_model)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```
Currently supported models include:

- Yoon Kim CNN
- Stacked RNNs
- Attention (with/without context) based RNN encoders

`SentenceModelFactory.build_model` creates a tiered model where the words within a sentence are first encoded using `word_encoder_model`. All such per-sentence encodings are then encoded using `sentence_encoder_model`.

- Hierarchical attention networks (HANs) can be built by composing two attention-based RNN models. This is useful when a document is very large.
- For smaller documents, a reasonable way to encode a sentence is to average the words within it. This can be done by using `token_encoder_model=AveragingEncoder()`.
- Mix and match encoders as you see fit for your problem.
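The averaging option above is the simplest word encoder: a sentence's encoding is just the mean of its word vectors. A plain-Python sketch of that idea (not the library's `AveragingEncoder` implementation):

```python
# Toy sketch of the averaging-encoder idea: encode a sentence as the
# element-wise mean of its word vectors.
def averaging_encoder(word_vectors):
    n, dim = len(word_vectors), len(word_vectors[0])
    return [sum(v[d] for v in word_vectors) / n for d in range(dim)]

sentence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three 2-d word vectors
print(averaging_encoder(sentence))  # [3.0, 4.0]
```

Averaging discards word order, which is why attention-based encoders are preferred when the sentences are long or order matters.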
## Resources

TODO: Update documentation and add notebook examples.

Stay tuned for better documentation and examples. Until then, the best resource is the API docs.
## Installation

Install Keras with the Theano or TensorFlow backend. Note that this library requires Keras > 2.0.

### Install keras-text

From sources:

```shell
sudo python setup.py install
```

PyPI package:

```shell
sudo pip install keras-text
```

### Download the target spaCy model

keras-text uses the excellent spaCy library for tokenization. See the spaCy instructions on how to download a model for your target language.
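For English, for example, a model can be fetched via spaCy's own CLI. The model name below is one common English choice, not something keras-text mandates; substitute the model for your target language (older spaCy releases used short names such as `en`):

```shell
# Install spaCy, then download an example English model via its CLI.
pip install spacy
python -m spacy download en_core_web_sm
```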
## Citation

Please cite keras-text in your publications if it helped your research. Here is an example BibTeX entry:

```bibtex
@misc{raghakotkerastext,
  title={keras-text},
  author={Kotikalapudi, Raghavendra and contributors},
  year={2017},
  publisher={GitHub},
  howpublished={\url{https://github.com/raghakot/keras-text}},
}
```