
Multiple Languages Support

Saeed Masoumi edited this page Mar 10, 2017 · 3 revisions

Pipeline Architecture for ApexNLP

ApexNLP lets you add new languages to its main pipeline. The pipeline works as follows:

  1. First, the pipeline needs a Dictionary for the target language. You can create one with DictionaryBuilder.
  2. Each sentence is split into words (tokens) by a Tokenizer.
  3. Next, each token is tagged by a Tagger; a single token can receive multiple Tags.
  4. The next step is named entity detection. The basic technique used for entity detection is chunking: we search each sentence for interesting entities using the TaggedWords created in the previous step. RegExChunker uses a deterministic finite automaton (dfalex) to find ChunkedParts that match predefined patterns.
  5. Then, all chunked parts are converted to an Event class.
  6. Finally, StandardParserBase binds all these classes together and calls them sequentially.
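To make step 4 more concrete, here is a minimal sketch of pattern-based chunking over tagged words. It is an illustration under stated assumptions, not ApexNLP's code: `TaggedWord`, `ChunkedPart` and `chunk()` are simplified hypothetical stand-ins, each token carries exactly one tag, and `java.util.regex` matching over a string of `<TAG>` markers is swapped in for the dfalex automaton that RegExChunker uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkerSketch {
    // Hypothetical stand-ins; these are NOT ApexNLP's TaggedWord/ChunkedPart classes.
    // Simplification: each token has a single tag instead of multiple Tags.
    record TaggedWord(String word, String tag) {}
    record ChunkedPart(String name, List<TaggedWord> words) {}

    // Encodes each token's tag as "<TAG>" and runs a java.util.regex pattern
    // over the resulting string. Patterns should be built from whole "<TAG>" units.
    static List<ChunkedPart> chunk(String name, String tagPattern, List<TaggedWord> words) {
        StringBuilder tags = new StringBuilder();
        List<Integer> starts = new ArrayList<>(); // offset where each token's marker starts
        List<Integer> ends = new ArrayList<>();   // offset just past each token's marker
        for (TaggedWord w : words) {
            starts.add(tags.length());
            tags.append('<').append(w.tag()).append('>');
            ends.add(tags.length());
        }
        List<ChunkedPart> parts = new ArrayList<>();
        Matcher m = Pattern.compile(tagPattern).matcher(tags);
        while (m.find()) {
            int first = starts.indexOf(m.start());
            int last = ends.indexOf(m.end());
            // Skip empty matches and matches that do not line up with token boundaries.
            if (m.end() == m.start() || first < 0 || last < first) continue;
            parts.add(new ChunkedPart(name, List.copyOf(words.subList(first, last + 1))));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<TaggedWord> sentence = List.of(
            new TaggedWord("meeting", "EVENT"),
            new TaggedWord("tomorrow", "DATE"),
            new TaggedWord("at", "PREP"),
            new TaggedWord("5pm", "TIME"));
        // A date, an optional preposition, then a time.
        for (ChunkedPart p : chunk("DateTime", "<DATE>(<PREP>)?<TIME>", sentence)) {
            StringBuilder text = new StringBuilder();
            for (TaggedWord w : p.words()) text.append(w.word()).append(' ');
            System.out.println(p.name() + ": " + text.toString().trim());
        }
    }
}
```

Running `main` prints `DateTime: tomorrow at 5pm`. A real DFA-based chunker avoids re-scanning a string per pattern, but the idea of matching patterns over the tag sequence rather than over raw words is the same.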

To add a new language, you need to implement all of these steps for it. First, create a module named yourLang-nlp, then create all the required classes in that module.
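One way to picture what a yourLang-nlp module contains is the sketch below. All interface and class names here (`Tokenizer`, `Tagger`, `Chunker`, `EventConverter`, `ParserBase`, `ToyLangParser`) are hypothetical simplifications, and ApexNLP's real signatures may differ. `ParserBase` only mirrors the role this page gives StandardParserBase, binding the stages and calling them in order, and a plain `Map` stands in for a Dictionary built with DictionaryBuilder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class YourLangSketch {
    // Hypothetical stage interfaces; not ApexNLP's actual API.
    interface Tokenizer { List<String> tokenize(String sentence); }
    interface Tagger { List<String> tag(List<String> tokens); }
    interface Chunker { List<List<String>> chunk(List<String> tokens, List<String> tags); }
    interface EventConverter { List<String> toEvents(List<List<String>> chunks); }

    // Plays the role described for StandardParserBase: it binds the stages
    // and calls them sequentially; each language supplies its own stages.
    static abstract class ParserBase {
        protected abstract Tokenizer tokenizer();
        protected abstract Tagger tagger();
        protected abstract Chunker chunker();
        protected abstract EventConverter converter();

        final List<String> parse(String sentence) {
            List<String> tokens = tokenizer().tokenize(sentence);
            List<String> tags = tagger().tag(tokens);
            return converter().toEvents(chunker().chunk(tokens, tags));
        }
    }

    // A toy language module; the Map is a hypothetical stand-in for a Dictionary.
    static class ToyLangParser extends ParserBase {
        private final Map<String, String> dictionary = Map.of("tomorrow", "DATE", "5pm", "TIME");

        protected Tokenizer tokenizer() {
            return s -> Arrays.asList(s.toLowerCase().split("\\s+"));
        }

        protected Tagger tagger() {
            // Unknown words get the placeholder tag "O".
            return tokens -> tokens.stream().map(t -> dictionary.getOrDefault(t, "O")).toList();
        }

        protected Chunker chunker() {
            // Groups each maximal run of consecutive non-"O" tokens into one chunk.
            return (tokens, tags) -> {
                List<List<String>> chunks = new ArrayList<>();
                List<String> current = new ArrayList<>();
                for (int i = 0; i < tokens.size(); i++) {
                    if (!tags.get(i).equals("O")) {
                        current.add(tokens.get(i));
                    } else if (!current.isEmpty()) {
                        chunks.add(current);
                        current = new ArrayList<>();
                    }
                }
                if (!current.isEmpty()) chunks.add(current);
                return chunks;
            };
        }

        protected EventConverter converter() {
            return chunks -> chunks.stream().map(c -> "Event[" + String.join(" ", c) + "]").toList();
        }
    }

    public static void main(String[] args) {
        System.out.println(new ToyLangParser().parse("Call Ali tomorrow 5pm"));
    }
}
```

Running `main` prints `[Event[tomorrow 5pm]]`. Keeping the sequencing in the base class means a new language module only has to provide its own stages; the toy chunker here just groups consecutive dictionary hits instead of matching predefined patterns.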
