This page provides a number of examples of how to use the various Tika APIs. All of the examples shown are also available in the Tika Example module in SVN.
For more control, you can call the Tika Parsers directly. Most likely, you'll want to start out using the Auto-Detect Parser, which automatically figures out what kind of content you have, then calls the appropriate parser for you.
public String parseExample() throws IOException, SAXException, TikaException {
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler();
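To make the detect-then-delegate idea behind the Auto-Detect Parser more concrete, here is a toy sketch that needs only the JDK. It is not how Tika implements detection: the class name `ToyAutoDetect`, the three media types, and the stand-in "parsers" are all assumptions made for illustration, and real detection looks at far more than a few leading bytes.

```java
import java.nio.charset.StandardCharsets;

// Toy illustration only: detect a type from leading "magic" bytes,
// then delegate to a matching (stand-in) parser.
class ToyAutoDetect {

    /** Guesses a media type from the first bytes of the content. */
    static String detect(byte[] data) {
        if (startsWith(data, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(data, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        // Fallback when no known signature matches.
        return "text/plain";
    }

    /** Picks a stand-in parser based on the detected type. */
    static String parse(byte[] data) {
        switch (detect(data)) {
            case "application/pdf":
                return "[pdf parser] " + data.length + " bytes";
            case "application/zip":
                return "[zip parser] " + data.length + " bytes";
            default:
                return new String(data, StandardCharsets.UTF_8);
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
        System.out.println(parse("plain text".getBytes(StandardCharsets.UTF_8)));
    }
}
```

With Tika itself, the caller does none of this by hand: the Auto-Detect Parser performs the detection and the delegation internally.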
Sometimes you want to split the resulting text into chunks, perhaps to output it as you go and minimise memory use, perhaps to write it to HDFS files, or for any other reason. With a small custom content handler, you can do that.
public List<String> parseToPlainTextChunks() throws IOException, SAXException, TikaException {
    final List<String> chunks = new ArrayList<>();
    chunks.add("");
    ContentHandlerDecorator handler = new ContentHandlerDecorator() {
        @Override
        public void characters(char[] ch, int start, int length) {
            String lastChunk = chunks.get(chunks.size() - 1);
            String thisStr = new String(ch, start, length);
            if (lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
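The snippet above stops at the size check. As a rough sketch of how the chunking logic could be completed, the handler below starts a new chunk when the limit would be exceeded and otherwise appends to the current one. Several things here are assumptions rather than part of the original example: the class name `ChunkingHandler`, the `maxChunkSize` constructor parameter (standing in for `MAXIMUM_TEXT_CHUNK_SIZE`), the two branch bodies, and the use of the JDK's `DefaultHandler` in place of Tika's `ContentHandlerDecorator` so that the sketch compiles without Tika on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: collects SAX character events into chunks of bounded size.
class ChunkingHandler extends DefaultHandler {
    private final int maxChunkSize; // assumed limit, in characters
    private final List<String> chunks = new ArrayList<>();

    ChunkingHandler(int maxChunkSize) {
        this.maxChunkSize = maxChunkSize;
        chunks.add(""); // start with one empty chunk, as in the snippet above
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String lastChunk = chunks.get(chunks.size() - 1);
        String thisStr = new String(ch, start, length);
        if (lastChunk.length() + length > maxChunkSize) {
            // Appending would exceed the limit: start a new chunk.
            chunks.add(thisStr);
        } else {
            // Still room: append to the current chunk.
            chunks.set(chunks.size() - 1, lastChunk + thisStr);
        }
    }

    List<String> getChunks() {
        return chunks;
    }

    public static void main(String[] args) {
        ChunkingHandler handler = new ChunkingHandler(10);
        // Simulate the character events a parser would deliver.
        for (String piece : new String[] {"hello", " world", "!"}) {
            handler.characters(piece.toCharArray(), 0, piece.length());
        }
        System.out.println(handler.getChunks());
    }
}
```

Note that the limit is checked per event, so a single `characters` call longer than the limit still becomes one oversized chunk, and the first chunk stays empty if the very first event already exceeds the limit. Because `DefaultHandler` implements the SAX `ContentHandler` interface, a handler like this can be passed to a Tika parser in the same position as the decorator in the snippet above.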
In order to use the Microsoft Translation API, you need to sign up for a Microsoft account, get an API key, then pass the key to Tika before translating.
public String microsoftTranslateToFrench(String text) {
    MicrosoftTranslator translator = new MicrosoftTranslator();
    // Change the id and secret! See http://msdn.microsoft.com/en-us/library/hh454950.aspx.