Lucene's PerFieldAnalyzerWrapper

17 Mar 2021

The Problem

If you are using a basic Lucene indexing strategy, you may do something like this:

Java
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());

In this situation, every field in each document will be analyzed using the StandardAnalyzer.

The indexer is constrained to using a single analyzer.

In some situations, this may not meet your needs, as you may want to use different analyzers for different fields in a document.

How can we instruct the index writer to use different analyzers for different fields?

The Solution

The PerFieldAnalyzerWrapper does exactly this: you populate a Java Map from field names to analyzers and pass it to the wrapper along with a default analyzer.

See the Lucene Javadoc for PerFieldAnalyzerWrapper for an overview of how it works.

You can also use the same approach when building queries.

An Example

Java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

...

// Fields listed in the map get their own analyzer.
Map<String, Analyzer> analyzerPerField = new HashMap<>();
analyzerPerField.put("firstname", new KeywordAnalyzer());
analyzerPerField.put("lastname", new KeywordAnalyzer());

// Every other field falls back to the default analyzer (the first argument).
PerFieldAnalyzerWrapper aWrapper =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer(), analyzerPerField);

From the docs:

In this example, StandardAnalyzer will be used for all fields except “firstname” and “lastname”, for which KeywordAnalyzer will be used.
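
The fallback described in that quote can be pictured as a map lookup with a default. The sketch below is a toy model in plain Java, not Lucene code: the class name PerFieldDispatchSketch and the two stand-in "analyzers" are hypothetical, and the STANDARD stand-in only lowercases and splits on whitespace, which is far cruder than the real StandardAnalyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analyzer dispatch; this is NOT Lucene code.
final class PerFieldDispatchSketch {
    // Stand-in for KeywordAnalyzer: the whole value becomes a single token.
    static final Function<String, List<String>> KEYWORD =
            text -> Collections.singletonList(text);

    // Rough stand-in for StandardAnalyzer: lowercase, then split on whitespace.
    static final Function<String, List<String>> STANDARD =
            text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField;

    PerFieldDispatchSketch(Function<String, List<String>> defaultAnalyzer,
                           Map<String, Function<String, List<String>>> perField) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.perField = new HashMap<>(perField);
    }

    // Look up the field name; fall back to the default when there is no entry.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> perField = new HashMap<>();
        perField.put("firstname", KEYWORD);
        perField.put("lastname", KEYWORD);
        PerFieldDispatchSketch wrapper = new PerFieldDispatchSketch(STANDARD, perField);

        System.out.println(wrapper.analyze("firstname", "Mary Ann")); // [Mary Ann]
        System.out.println(wrapper.analyze("bio", "Mary Ann"));       // [mary, ann]
    }
}
```

The "bio" field has no entry in the map, so it goes through the default, just as any field other than "firstname" and "lastname" does in the wrapper example.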

A PerFieldAnalyzerWrapper can be used like any other analyzer, for both indexing and query parsing.
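
Why it helps to use the same wrapper on both sides can be illustrated with a toy inverted index. This is a rough sketch under stated assumptions, not how Lucene is implemented: SharedAnalysisSketch and its methods are hypothetical names, and the analysis rules are simplified stand-ins (exact value for keyword-style fields, lowercase plus whitespace split for everything else).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index; hypothetical names throughout, no Lucene classes involved.
final class SharedAnalysisSketch {
    // Fields listed here are kept as one exact token (keyword-style).
    private final Set<String> keywordFields;
    // "field:token" -> ids of documents containing that token in that field.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    SharedAnalysisSketch(Set<String> keywordFields) {
        this.keywordFields = new HashSet<>(keywordFields);
    }

    // One analysis routine shared by indexing and searching.
    List<String> analyze(String field, String text) {
        if (keywordFields.contains(field)) {
            return Collections.singletonList(text);
        }
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    void index(int docId, String field, String text) {
        for (String token : analyze(field, text)) {
            postings.computeIfAbsent(field + ":" + token, k -> new HashSet<>()).add(docId);
        }
    }

    // Returns ids of documents that contain every token of the analyzed query text.
    Set<Integer> search(String field, String queryText) {
        Set<Integer> result = null;
        for (String token : analyze(field, queryText)) {
            Set<Integer> ids =
                    postings.getOrDefault(field + ":" + token, Collections.<Integer>emptySet());
            if (result == null) {
                result = new HashSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        SharedAnalysisSketch idx =
                new SharedAnalysisSketch(new HashSet<>(Arrays.asList("firstname", "lastname")));
        idx.index(1, "firstname", "Mary Ann");
        idx.index(1, "bio", "Mary Ann writes code");

        System.out.println(idx.search("firstname", "Mary Ann")); // [1]
        System.out.println(idx.search("firstname", "mary"));     // []
        System.out.println(idx.search("bio", "MARY"));           // [1]
    }
}
```

Because the query text goes through the same per-field rules as the indexed text, "Mary Ann" matches the keyword-style field only as an exact value, while the other field matches case-insensitively on individual words. With Lucene itself, the corresponding step would be handing the same wrapper to both the IndexWriterConfig and the query parser.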