Multiple Languages Support
Saeed Masoumi edited this page Mar 10, 2017
·
3 revisions

Pipeline Architecture for ApexNLP
ApexNLP allows you to add new languages to its main pipeline. The pipeline works as follows:
- First, the pipeline needs a `Dictionary` for the target language. It can be created using `DictionaryBuilder`.
- Each sentence is split into words using a `Tokenizer`.
- Next, each token is tagged by a `Tagger` and can take multiple `Tags`.
- The next step is Named Entity Detection. The basic technique used for entity detection is chunking: we search each sentence for interesting `Entities` using the `TaggedWords` created in the previous step. `RegExChunker` uses a deterministic finite automaton (dfalex) to find `ChunkedParts` that match predefined patterns.
- Then, all chunked parts are converted to an `Event` class.
- In the last step, `StandardParserBase` binds all these classes together and calls them sequentially.
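The flow described in the list can be illustrated with a small, self-contained sketch. It is not the ApexNLP API: every class and method name below (`PipelineSketch`, `Tag`, `TaggedWord`, `Event`, `parse`, and so on) is a hypothetical stand-in, and the chunker uses `java.util.regex` over a one-character-per-word encoding of tags rather than dfalex. It only shows how the stages could feed one another, assuming a toy grammar that recognises phrases such as "at 5 pm".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-ins for the pipeline stages; NOT the ApexNLP classes.
class PipelineSketch {

    // Each tag has a one-character symbol so a tagged sentence can be encoded as a string.
    enum Tag {
        TIME_PREFIX('P'), NUMBER('N'), MERIDIEM('M'), NONE('O');
        final char symbol;
        Tag(char symbol) { this.symbol = symbol; }
    }

    static final class TaggedWord {
        final String word;
        final Set<Tag> tags; // a word may carry several tags
        TaggedWord(String word, Set<Tag> tags) { this.word = word; this.tags = tags; }
    }

    static final class Event {
        final int hour; // hour of day, 0-23
        Event(int hour) { this.hour = hour; }
    }

    // Step 1: a tiny dictionary for the target language (English here).
    static Map<String, Set<Tag>> buildDictionary() {
        Map<String, Set<Tag>> dict = new HashMap<>();
        dict.put("at", EnumSet.of(Tag.TIME_PREFIX));
        dict.put("am", EnumSet.of(Tag.MERIDIEM));
        dict.put("pm", EnumSet.of(Tag.MERIDIEM));
        return dict;
    }

    // Step 2: split the sentence into lower-cased words.
    static List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.trim().toLowerCase().split("\\s+"));
    }

    // Step 3: tag each token using the dictionary; short digit runs become NUMBER.
    static List<TaggedWord> tag(List<String> tokens, Map<String, Set<Tag>> dict) {
        List<TaggedWord> tagged = new ArrayList<>();
        for (String token : tokens) {
            Set<Tag> tags = token.matches("\\d{1,2}")
                    ? EnumSet.of(Tag.NUMBER)
                    : dict.getOrDefault(token, EnumSet.of(Tag.NONE));
            tagged.add(new TaggedWord(token, tags));
        }
        return tagged;
    }

    // Step 4: chunking. Encode the first tag of each word as one character and
    // match the predefined pattern "optional prefix, number, meridiem".
    static List<List<TaggedWord>> chunk(List<TaggedWord> words) {
        StringBuilder encoded = new StringBuilder();
        for (TaggedWord w : words) {
            encoded.append(w.tags.iterator().next().symbol);
        }
        Matcher m = Pattern.compile("P?NM").matcher(encoded);
        List<List<TaggedWord>> chunks = new ArrayList<>();
        while (m.find()) {
            // One character per word, so match offsets are word indices.
            chunks.add(words.subList(m.start(), m.end()));
        }
        return chunks;
    }

    // Step 5: convert a chunk into an Event (12-hour clock to 24-hour clock).
    static Event toEvent(List<TaggedWord> chunk) {
        int hour = 0;
        boolean pm = false;
        for (TaggedWord w : chunk) {
            if (w.tags.contains(Tag.NUMBER)) hour = Integer.parseInt(w.word);
            if (w.word.equals("pm")) pm = true;
        }
        return new Event(hour % 12 + (pm ? 12 : 0));
    }

    // Step 6: bind all stages together and call them sequentially.
    static List<Event> parse(String sentence) {
        List<TaggedWord> tagged = tag(tokenize(sentence), buildDictionary());
        List<Event> events = new ArrayList<>();
        for (List<TaggedWord> part : chunk(tagged)) {
            events.add(toEvent(part));
        }
        return events;
    }
}
```

Calling `PipelineSketch.parse("Meeting at 5 pm")` in this sketch yields a single `Event` whose `hour` is 17. Encoding tags as characters keeps the chunker's patterns short, which is the same idea that makes an automaton-based matcher practical for this step.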
Now, for each new language you need to carry out all of these steps. First, create a module named `yourLang-nlp`, then create all the needed classes.
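As a rough illustration of the "one set of classes per language" idea, the sketch below gives each language its own implementation of a shared contract. The interface `LanguageSketch`, the classes `EnglishSketch` and `SpanishSketch`, and the `LanguageRegistry` lookup are all hypothetical names chosen for this example; ApexNLP's actual module layout and base classes may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract that each language module would implement (illustrative only).
interface LanguageSketch {
    List<String> tokenize(String sentence);
    Map<String, String> dictionary(); // word -> tag name
}

// Would live in a module such as english-nlp (assumed name).
class EnglishSketch implements LanguageSketch {
    public List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.trim().toLowerCase().split("\\s+"));
    }
    public Map<String, String> dictionary() {
        Map<String, String> dict = new HashMap<>();
        dict.put("today", "DATE");
        dict.put("at", "TIME_PREFIX");
        return dict;
    }
}

// Would live in a module such as spanish-nlp (assumed name).
class SpanishSketch implements LanguageSketch {
    public List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.trim().toLowerCase().split("\\s+"));
    }
    public Map<String, String> dictionary() {
        Map<String, String> dict = new HashMap<>();
        dict.put("hoy", "DATE");
        dict.put("a", "TIME_PREFIX");
        return dict;
    }
}

// The shared pipeline selects a language module and otherwise stays unchanged.
class LanguageRegistry {
    private static final Map<String, LanguageSketch> LANGUAGES = new HashMap<>();
    static {
        LANGUAGES.put("en", new EnglishSketch());
        LANGUAGES.put("es", new SpanishSketch());
    }

    static String tagOf(String languageCode, String word) {
        LanguageSketch language = LANGUAGES.get(languageCode);
        if (language == null) {
            throw new IllegalArgumentException("No module for language: " + languageCode);
        }
        return language.dictionary().getOrDefault(word.toLowerCase(), "NONE");
    }
}
```

With this layout, adding a language means adding one new implementation and registering it; the code that drives the stages does not need to change.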