Multiple Languages Support

Pipeline Architecture for ApexNLP

ApexNLP lets you add new languages to its main pipeline. The pipeline works as follows:

First, the pipeline needs a Dictionary for the target language. It can be created using DictionaryBuilder.
Each sentence is then split into words using a Tokenizer.
Next, each token is tagged by a Tagger. A single token can take multiple Tags.
The next step is named entity detection. The basic technique used for entity detection is chunking: the pipeline searches each sentence for interesting entities using the TaggedWords created in the previous step. RegExChunker uses a deterministic finite automaton (dfalex) to find ChunkedParts that match predefined patterns.
Then all chunked parts are converted to an Event class.
In the last step, StandardParserBase binds all these classes together and calls them sequentially.
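
The steps above can be sketched as a chain of small functions. This is a minimal, self-contained illustration of the flow only, not ApexNLP's actual API: the class PipelineSketch, its methods, the Tag values and the chunk pattern are all hypothetical, and java.util.regex stands in for the dfalex-based RegExChunker.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the pipeline stages; none of these names come from ApexNLP.
public class PipelineSketch {

    // Illustrative tag set; a real language module would define many more.
    enum Tag { NUMBER, TIME_SEPARATOR, RELATIVE_DAY, WORD }

    // A token together with every tag that applies to it.
    record TaggedWord(String text, Set<Tag> tags) {}

    // A run of tagged words matched by a chunking pattern.
    record ChunkedPart(String label, List<TaggedWord> words) {}

    // Final result assembled from the chunked parts.
    record Event(String time, String day) {}

    // Step 1: a tiny stand-in dictionary for the target language.
    static final Map<String, Tag> DICTIONARY = Map.of(
            "today", Tag.RELATIVE_DAY,
            "tomorrow", Tag.RELATIVE_DAY);

    // Step 2: split on whitespace; ":" is isolated so "10:30" yields three tokens.
    static List<String> tokenize(String sentence) {
        return List.of(sentence.trim().replace(":", " : ").split("\\s+"));
    }

    // Step 3: attach one or more tags to each token.
    static List<TaggedWord> tag(List<String> tokens) {
        List<TaggedWord> tagged = new ArrayList<>();
        for (String token : tokens) {
            Set<Tag> tags = EnumSet.noneOf(Tag.class);
            if (token.matches("\\d+")) tags.add(Tag.NUMBER);
            if (token.equals(":")) tags.add(Tag.TIME_SEPARATOR);
            Tag fromDictionary = DICTIONARY.get(token.toLowerCase());
            if (fromDictionary != null) tags.add(fromDictionary);
            if (tags.isEmpty()) tags.add(Tag.WORD);
            tagged.add(new TaggedWord(token, tags));
        }
        return tagged;
    }

    // Encode a word as one character so that regex offsets equal word indexes.
    // For simplicity this sketch only looks at the first tag of each word.
    static char encode(TaggedWord word) {
        switch (word.tags().iterator().next()) {
            case NUMBER: return 'N';
            case TIME_SEPARATOR: return 'S';
            case RELATIVE_DAY: return 'D';
            default: return 'W';
        }
    }

    // Step 4: chunking by predefined patterns over the encoded tag sequence.
    static final Pattern CHUNKS = Pattern.compile("(?<time>NSN)|(?<day>D)");

    static List<ChunkedPart> chunk(List<TaggedWord> words) {
        StringBuilder encoded = new StringBuilder();
        for (TaggedWord word : words) encoded.append(encode(word));
        List<ChunkedPart> parts = new ArrayList<>();
        Matcher m = CHUNKS.matcher(encoded);
        while (m.find()) {
            String label = m.group("time") != null ? "time" : "day";
            parts.add(new ChunkedPart(label, words.subList(m.start(), m.end())));
        }
        return parts;
    }

    // Step 5: convert the chunked parts to an Event.
    static Event toEvent(List<ChunkedPart> parts) {
        String time = null;
        String day = null;
        for (ChunkedPart part : parts) {
            StringBuilder text = new StringBuilder();
            for (TaggedWord w : part.words()) text.append(w.text());
            if (part.label().equals("time")) time = text.toString();
            else day = text.toString();
        }
        return new Event(time, day);
    }

    // Step 6: bind the stages together and call them in order.
    static Event parse(String sentence) {
        return toEvent(chunk(tag(tokenize(sentence))));
    }

    public static void main(String[] args) {
        // prints "Event[time=10:30, day=tomorrow]"
        System.out.println(parse("Meeting at 10:30 tomorrow"));
    }
}
```

Each stage only depends on the output type of the previous one, which is why a new language can supply its own dictionary, tagger and patterns without changing the overall flow.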

For each new language, you need to implement all of these steps. First create a module named yourLang-nlp, then create all the required classes.
