Apache Solr NGram/EdgeNGram

=== NGram ===
NGram is useful for auto-complete and partial matching: it splits each token into every substring (n-gram) whose length lies between minGramSize and maxGramSize.

For example, with the word “paris”, minGramSize = 2 and maxGramSize = 3, we get:
paris => “pa”, “ar”, “ri”, “is” (2-grams)
      => “par”, “ari”, “ris” (3-grams)

By default, minGramSize is 1 and maxGramSize is 2 (newer Solr releases may require both to be set explicitly).
Unlike EdgeNGram, NGram has no side option: grams are produced starting from every position in the token.

 <fieldType name="text_general_ngram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer .../>
     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer .../>
   </analyzer>
 </fieldType>
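
To make the type searchable, a field has to be declared with it. A minimal sketch, assuming hypothetical fields named title and title_ngram, and a text_general type like the one in Solr's example schemas; the copyField sends the title text through the NGram analysis above at index time:

 <!-- Hypothetical field names, for illustration only -->
 <field name="title" type="text_general" indexed="true" stored="true"/>
 <!-- Analyzed with text_general_ngram; searched but not stored -->
 <field name="title_ngram" type="text_general_ngram" indexed="true" stored="false"/>
 <!-- Copies the raw title value into title_ngram when a document is indexed -->
 <copyField source="title" dest="title_ngram"/>

With this in place, a query such as title_ngram:ari should match a document whose title is “paris”, since “ari” is one of its indexed 3-grams.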

=== EdgeNGram ===
We can also use EdgeNGram, which creates n-grams anchored at the beginning (front edge) of an input token.

Taking “paris” again, with minGramSize = 2, maxGramSize = 10 and side = “front”, we get:
paris => “pa”, “par”, “pari”, “paris”

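By default, minGramSize is 1, maxGramSize is 1 and side is “front”.
Setting side to “back” anchors the grams at the end of the token instead, so “paris” gives “is”, “ris”, “aris”, “paris”. The side option was deprecated in Lucene 4.4 and removed in 5.0.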

 <fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer .../>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer .../>
   </analyzer>
 </fieldType>
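
On Lucene/Solr 5.0 and later, which no longer accept side, back-edge grams can be approximated by reversing each token before and after the EdgeNGram filter, the workaround the Lucene 4.4 deprecation notes point to. A minimal sketch: the field type name is hypothetical, and solr.StandardTokenizerFactory is an assumed stand-in for the elided tokenizer:

 <!-- Hypothetical type name; StandardTokenizerFactory is an assumed tokenizer choice -->
 <fieldType name="text_general_edge_ngram_back" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <!-- "paris" becomes "sirap" -->
     <filter class="solr.ReverseStringFilterFactory"/>
     <!-- front-edge grams of the reversed token: "si", "sir", "sira", "sirap" -->
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
     <!-- reverse each gram back: "is", "ris", "aris", "paris" -->
     <filter class="solr.ReverseStringFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
   </analyzer>
 </fieldType>

Because every gram is reversed back before indexing, the stored terms are ordinary suffixes, so the query analyzer needs no reversal of its own.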

Leave a comment