
Elasticsearch Query DSL Query Syntax Notes

I previously mentioned that I would be grinding away at Elasticsearch until the end of this year, but I have changed my plans. After this post, I will likely only add one more about its application in .NET before wrapping things up. I originally expected to finish by the end of October, but it dragged into November. At this rate, the next post will probably drag until the end of the year as well.

I had originally planned to write about Geo Queries and Aggregations as well, but I decided it would be better to split them up. Aggregations lean more toward statistical analysis and aren't strictly related to query syntax itself; as for Geo Queries, I'll put them on hold. I've been working on this post for too long and it's becoming a bit tedious, so I'll write a separate one when I have the time and the mood.

My weight loss progress stalled from September 16th to October 9th, and in October, I inexplicably fell into a state of world-weariness, wanting only to stay home, read novels, and scroll through short videos, with no desire to go out or use my brain. I don't know if I'll fall into this state again, and I have no idea when the next post will be finished.


I previously wrote a note on Elasticsearch QueryString query syntax, which mainly introduced how to use simple query strings in the query_string field. That syntax is concise and intuitive, making it very suitable for quick tests or simple query requirements. However, in actual production environments, Query DSL (Domain Specific Language) is used more frequently. Query DSL is a JSON-structured query language provided by Elasticsearch, and its functionality is far more powerful and flexible than Query String. This article focuses on actual test results, supplemented by cross-verification with official documentation, to organize Query DSL query syntax.

Test version: Elasticsearch 9.1.5


Query DSL vs Query String

Before we begin, let's briefly explain the advantages of Query DSL over Query String:

1. More Complete Functionality

Certain query features can only be implemented using Query DSL; Query String cannot support them:

  • Nested Queries: When you need to preserve the relationship between fields within nested objects, you must use the nested query in Query DSL.
  • Geo-spatial Queries: Such as geo_distance and other geo-query features.
  • Custom Scoring: Use function_score to customize the relevance scoring of documents.
  • Complex Boolean Logic Combinations: Flexibly combine must, should, must_not, filter, and other conditions through bool queries.

2. Clearer Structure

Query String:

json
{
  "query": {
    "query_string": {
      "query": "title:Elasticsearch AND status:published AND created_date:[2024-01-01 TO 2024-12-31]"
    }
  }
}

Query DSL:

json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch" }},
        { "term": { "status": "published" }},
        { "range": {
            "created_date": {
              "gte": "2024-01-01",
              "lte": "2024-12-31"
            }
          }
        }
      ]
    }
  }
}

Although Query DSL looks more verbose, the structure is clearer. Each query condition has a specific type and parameters, making it easier to maintain and debug. In addition, Query DSL can provide clearer error messages, explicitly pointing out which field or parameter is problematic.

Common Query DSL Syntax

1. Match Query - Full-Text Search Query

Used for full-text search; it performs tokenization and relevance scoring.

Applicable Types:

  • Text fields: Tokenized, supports all advanced parameters.
  • Keyword fields: Not tokenized, exact match.
  • Numeric/Date/Boolean fields: Exact match, does not support parameters like fuzziness or analyzer.

Basic Query

json
{
  "query": {
    "match": {
      "title": "Elasticsearch Tutorial"
    }
  }
}

operator Parameter

Controls the logical relationship between multiple tokens.

OR (Default)

Returns results if any of the terms match:

json
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "operator": "OR"
      }
    }
  }
}

Effect: Documents containing any of the terms quick, brown, or fox will be returned.


AND

Must match all terms:

json
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "operator": "AND"
      }
    }
  }
}

Effect: Documents must contain all three terms: quick, brown, and fox.


minimum_should_match Parameter

Important: This parameter is only effective when operator = "OR".

Controls how many conditions must be met at a minimum.

Positive Integer (Absolute quantity)

json
{
  "query": {
    "match": {
      "content": {
        "query": "quick brown fox jumps",
        "minimum_should_match": 3
      }
    }
  }
}

Effect: At least 3 out of 4 terms must match.

Examples:

  • quick brown fox jumps ✓ (All 4 match).
  • quick brown fox dog ✓ (3 match: quick brown fox).
  • quick brown lazy dog ✗ (Only 2 match: quick brown).
  • the fox jumps high ✗ (Only 2 match: fox jumps).

Negative Integer (Allowed missing quantity)

json
{
  "query": {
    "match": {
      "content": {
        "query": "quick brown fox jumps",
        "minimum_should_match": -1
      }
    }
  }
}

Effect: At most 1 term can be missing, equivalent to requiring at least 3.

Examples:

  • quick brown fox jumps ✓ (0 missing).
  • quick brown fox dog ✓ (1 missing: jumps).
  • quick brown lazy dog ✗ (2 missing: fox and jumps).

⚠️ Special Case: The minimum match floor is 1.

When setting -4 (missing count = total tokens) or -100% (100% missing), it will not return all data; at least 1 term must match to return results.

Examples (-4 or -100%):

  • quick dog ✓ (1 match: quick).
  • brown cat ✓ (1 match: brown).
  • lazy slow ✗ (0 matches).

Percentage (Floor rule)

json
{
  "query": {
    "match": {
      "content": {
        "query": "quick brown fox jumps",
        "minimum_should_match": "75%"
      }
    }
  }
}

Effect: At least 75% must match, which means at least 3 out of 4 terms (4 × 0.75 = 3).

⚠️ Calculation Rule (Floor rule):

  • 75%: 4 × 0.75 = 3.0 → 3 terms.
  • 74%: 4 × 0.74 = 2.96 → Floor to 2 terms.
  • 50%: 4 × 0.50 = 2.0 → 2 terms.
  • 26%: 4 × 0.26 = 1.04 → Floor to 1 term.
  • 25%: 4 × 0.25 = 1.0 → 1 term.

Example (75%):

  • quick brown fox jumps ✓ (100% match).
  • quick brown fox dog ✓ (3 match, meets 75%).
  • quick brown dog cat ✗ (Only 2 match, less than 75%).

Example (74%):

  • quick brown dog cat ✓ (2 match, 2.96 floors to 2).
  • quick dog cat rat ✗ (Only 1 match).

Negative Percentage (Floor rule)

json
{
  "query": {
    "match": {
      "content": {
        "query": "quick brown fox jumps",
        "minimum_should_match": "-25%"
      }
    }
  }
}

Effect: At most 25% can be missing, which means at most 1 term missing (4 × 0.25 = 1), equivalent to requiring at least 3.

⚠️ Calculation Rule (Floor rule):

  • -25%: 4 × 0.25 = 1 → At most 1 missing, requires 3.
  • -26%: 4 × 0.26 = 1.04 → Floor to 1, at most 1 missing, requires 3.
  • -74%: 4 × 0.74 = 2.96 → Floor to 2, at most 2 missing, requires 2.
  • -75%: 4 × 0.75 = 3 → At most 3 missing, requires 1.

Example (-25%):

  • quick brown fox jumps ✓ (0 missing).
  • quick brown fox dog ✓ (1 missing, meets at most 25% missing).
  • quick brown dog cat ✗ (2 missing, exceeds limit).

Example (-74%):

  • quick brown dog cat ✓ (2 match, at most 2 missing).
  • quick dog cat rat ✗ (Only 1 match, 3 missing).

Example (-75%):

  • quick dog cat rat ✓ (1 match, at most 3 missing).
  • lazy slow fast dog ✗ (0 matches).
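The four simple forms above (positive integer, negative integer, percentage, negative percentage) can be sketched as one small function. This is only an illustration of the arithmetic described in this section, not Elasticsearch's actual implementation; the name `required_matches` is hypothetical, and the floor of 1 mirrors the behavior observed in the tests above.

```python
def required_matches(total_terms: int, spec: str) -> int:
    """Return how many of `total_terms` must match for a simple spec.

    Handles "3", "-1", "75%" and "-25%". Conditional specs such as
    "3<90%" are intentionally not handled here.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The percentage is rounded down before it is used, whether it
        # counts required matches (positive) or allowed misses (negative).
        amount = total_terms * abs(percent) // 100
        needed = amount if percent >= 0 else total_terms - amount
    else:
        count = int(spec)
        needed = count if count >= 0 else total_terms + count
    # Observed behavior: at least one term must match, and the
    # requirement can never exceed the number of terms.
    return max(1, min(needed, total_terms))


if __name__ == "__main__":
    for spec in ["3", "-1", "75%", "74%", "-25%", "-74%", "-75%", "-4"]:
        print(spec, "->", required_matches(4, spec))
```

Running it against the 4-term query used above reproduces the numbers in the calculation rules.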

Single Condition Combination (Advanced)

⚠️ Important: How to interpret single conditions.

Format: N<VALUE.

  • When token count ≤ N: Use the default rule (100%, all terms must match).
  • When token count > N: Apply the VALUE rule.

Only the < form is described in the official documentation.

Example 1: 3<90%

json
{
  "query": {
    "match": {
      "content": {
        "query": "some long search query with many terms",
        "minimum_should_match": "3<90%"
      }
    }
  }
}

Interpretation:

  • When query ≤ 3 tokens: 100% match required (default).
  • When query > 3 tokens: 90% match required.

Example (Assuming query "one two three four five", 5 terms):

  • one two three four five ✓ (100% match, 5/5).
  • one two three four dog ✓ (4 of 5 match; because 5 > 3 the 90% rule applies, and 5 × 0.90 = 4.5 floors to 4 required).
  • one two three dog cat ✗ (Only 60% match, 3/5).

Example 2: 3<-1

json
{
  "query": {
    "match": {
      "content": {
        "query": "alpha beta gamma delta",
        "minimum_should_match": "3<-1"
      }
    }
  }
}

Interpretation:

  • When query ≤ 3 tokens: 100% match required.
  • When query > 3 tokens: At most 1 missing.

Example (4 terms):

  • alpha beta gamma delta ✓ (0 missing).
  • alpha beta gamma dog ✓ (1 missing: delta).
  • alpha beta dog cat ✗ (2 missing: gamma and delta).

Multiple Condition Combination (Advanced)

⚠️ Important: Multiple conditions are interpreted differently than single conditions.

Format: N1<VALUE1 N2<VALUE2 ....

Multiple conditions are interpreted as "intervals," not "less than":

  • Before the first condition: Use default rule (100%).
  • Between N1 and N2: Apply VALUE1.
  • After N2: Apply VALUE2.

Example: 2<-25% 9<-3

json
{
  "query": {
    "match": {
      "content": {
        "query": "very long search query with lots of terms",
        "minimum_should_match": "2<-25% 9<-3"
      }
    }
  }
}

⚠️ Correct Interpretation (Interval approach):

  • ≤ 2 tokens: 100% match (default).
  • 3-9 tokens: At most 25% missing (applies first condition -25%).
  • > 9 tokens: At most 3 missing (applies second condition -3).

❌ Incorrect Interpretation (Reading < as "apply this value when there are N or fewer tokens"):

  • ≤ 2: Apply -25% (Incorrect!)
  • ≤ 9: Apply -3 (Incorrect!)

Example (Assuming a 10-term query):

  • 10 terms match ✓ (0 missing).
  • 7 terms match ✓ (3 missing, meets > 9 rule).
  • 6 terms match ✗ (4 missing, exceeds limit).

Example (Assuming a 5-term query):

  • 5 terms match ✓ (0% missing).
  • 4 terms match ✓ (1 missing, 5 × 25% = 1.25 → Floor to 1, meets at most 1 missing).
  • 3 terms match ✗ (2 missing, exceeds limit).
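The interval reading of conditional specs can be sketched as a loop over the conditions: each `N<VALUE` pair only takes effect when the token count is greater than N, and the last one that applies wins. This is a minimal illustration under that assumption, not the engine's code; `simple_spec` and `conditional_required` are hypothetical names.

```python
def simple_spec(total_terms: int, spec: str) -> int:
    """Resolve an unconditional spec such as "3", "-1", "90%" or "-25%"."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        amount = total_terms * abs(percent) // 100  # rounded down
        return amount if percent >= 0 else total_terms - amount
    count = int(spec)
    return count if count >= 0 else total_terms + count


def conditional_required(total_terms: int, spec: str) -> int:
    """Resolve specs such as "3<90%" or "2<-25% 9<-3"."""
    needed = total_terms  # default rule: all terms must match
    for condition in spec.split():
        if "<" not in condition:
            needed = simple_spec(total_terms, condition)
            break
        bound, value = condition.split("<", 1)
        if total_terms <= int(bound):
            break  # this and any later condition do not apply
        needed = simple_spec(total_terms, value)
    return max(1, min(needed, total_terms))


if __name__ == "__main__":
    print(conditional_required(5, "3<90%"))         # 5-term query
    print(conditional_required(10, "2<-25% 9<-3"))  # 10-term query
    print(conditional_required(5, "2<-25% 9<-3"))   # 5-term query
```

With a single condition the loop runs once, so the single-condition and multi-condition cases are handled by the same rule.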

fuzziness Parameter

Fuzzy matching, allows spelling errors. Only applicable to text fields.

AUTO (Recommended)

json
{
  "query": {
    "match": {
      "title": {
        "query": "Elasticsearc",
        "fuzziness": "AUTO"
      }
    }
  }
}

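Effect: Automatically determines the allowed edit distance based on term length: terms of 0-2 characters must match exactly, terms of 3-5 characters allow 1 edit, and longer terms allow 2 edits.

Examples (the query term Elasticsearc has 12 characters, so up to 2 edits are allowed):

  • Elasticsearch ✓ (1 edit: insert h).
  • Elasticsearc ✓ (Exact match).
  • Elasticserch ✓ (2 edits: delete a, insert h).
  • Elastix ✗ (Too much difference).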

Fixed Edit Distance

json
{
  "query": {
    "match": {
      "title": {
        "query": "quikc brown",
        "fuzziness": 1
      }
    }
  }
}

Effect: Allows at most 1 character difference (insertion, deletion, substitution).

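Examples (edit distance is measured against the query term quikc; with the default OR operator a document containing brown still matches through that term alone, so the marks below describe the first term only):

  • quick brown ✓ (quikc → quick, 1 edit: swap kc ↔ ck).
  • quikc brown ✓ (Exact match).
  • qukc brown ✓ (1 edit: delete i).
  • qick brown ✗ (2 edits: delete u, then swap kc ↔ ck).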

Related Parameters

json
{
  "query": {
    "match": {
      "title": {
        "query": "quikc brown fox",
        "fuzziness": "AUTO",
        "prefix_length": 2,
        "max_expansions": 10,
        "fuzzy_transpositions": true
      }
    }
  }
}

Parameter Explanations:

  • prefix_length: The first N characters must match exactly, default is 0.
  • max_expansions: Maximum number of candidate terms to expand during fuzzy matching, default is 50.
  • fuzzy_transpositions: Whether to allow adjacent character swaps (ab → ba), default is true.

Example (prefix_length = 2):

  • quick brown fox ✓ (Starts with qu, matches prefix).
  • quikc brown fox ✓ (Starts with qu, matches prefix).
  • xuick brown fox ✗ (First 2 characters xu do not match qu).

Example (max_expansions = 10):

Suppose the index contains many terms within one edit of cat: bat, cab, can, cap, car, cart, cast, cats, chat, coat, cot, cut, eat, fat, hat, mat, oat, pat, rat, sat... (Total 20+ similar terms).

When querying cat:

json
{
  "query": {
    "match": {
      "title": {
        "query": "cat",
        "fuzziness": 1,
        "max_expansions": 10
      }
    }
  }
}

Effect:

  • Elasticsearch looks up the indexed terms with edit distance ≤ 1 (could be 20+).
  • At most 10 of those candidate terms are kept for searching.
  • The remaining candidate terms are ignored, so documents containing only those terms are not returned.

Note that fuzzy expansion is based on edit distance, not on prefixes: a term such as catalog is several edits away from cat and is never a candidate at fuzziness 1.

Why limit this?

  • Performance considerations: If expanded into dozens of candidate terms, it consumes significant computing resources, slowing down the query.
  • Result quality: Too many candidate terms may include irrelevant results.

Example (fuzzy_transpositions = true, querying quick with fuzziness = 1):

  • qiuck ✓ (ui ↔ iu swapped, counts as 1 edit).
  • quikc ✓ (ck ↔ kc swapped, counts as 1 edit).

Example (fuzzy_transpositions = false):

json
{
  "query": {
    "match": {
      "title": {
        "query": "qiuck",
        "fuzziness": 1,
        "fuzzy_transpositions": false
      }
    }
  }
}

  • quick ✗ (The iu ↔ ui swap is not allowed, so it counts as 2 substitutions and exceeds fuzziness 1).
  • qiuck ✓ (Exact match, 0 edits).
  • qiack ✓ (1 edit: replace u → a).
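To check fuzzy examples by hand, the edit distance can be computed with a short dynamic-programming routine. The sketch below uses the textbook "optimal string alignment" distance, with a flag that mimics fuzzy_transpositions, plus the AUTO length thresholds (0-2 characters: 0 edits, 3-5: 1 edit, longer: 2 edits). It is an approximation for reasoning about examples, not Lucene's automaton-based implementation; both function names are hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance with insert, delete, substitute and (optionally)
    a swap of two adjacent characters counted as a single edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def auto_max_edits(term: str) -> int:
    """Allowed edits for fuzziness AUTO, based on the term length."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2


if __name__ == "__main__":
    print(edit_distance("quikc", "quick"))
    print(edit_distance("qiuck", "quick", transpositions=False))
    print(auto_max_edits("Elasticsearc"))
```

A document term is a fuzzy candidate when its distance to the query term does not exceed the allowed number of edits.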

Other Parameters

analyzer

Specifies the analyzer (defaults to the analyzer set on the field):

json
{
  "query": {
    "match": {
      "content": {
        "query": "Quick Brown",
        "analyzer": "standard"
      }
    }
  }
}

lenient

Controls how to handle cases where the query value does not match the field type, default is false.

Parameter Explanations:

  • false (Default): Throws an error when types do not match, query fails.
  • true: Ignores the query for that field when types do not match, does not throw an error, but that field will have no matching results.

Example 1: lenient = false (Default)

json
{
  "query": {
    "match": {
      "age": {
        "query": "not a number"
      }
    }
  }
}

Effect:

  • Because the age field is numeric, and the query value "not a number" is text.
  • The query will throw an error.

Example 2: lenient = true

json
{
  "query": {
    "match": {
      "age": {
        "query": "not a number",
        "lenient": true
      }
    }
  }
}

Effect:

  • The query will not throw an error.
  • But because the type does not match, the field will not match any documents (equivalent to the condition being ignored).
  • The query will execute normally, just with no results.

boost

Adjusts the relevance score weight, default is 1.0:

json
{
  "query": {
    "match": {
      "title": {
        "query": "Elasticsearch",
        "boost": 2.0
      }
    }
  }
}

zero_terms_query

How to handle cases where there are no tokens after query analysis (becomes an empty query), default is none.

Parameter Explanations:

  • none (Default): Returns no documents.
  • all: Returns all documents (equivalent to match_all).

Example 1: Empty string query

json
{
  "query": {
    "match": {
      "message": {
        "query": "",
        "zero_terms_query": "none"  // or "all"
      }
    }
  }
}

Effect:

  • zero_terms_query: "none": Returns no documents.
  • zero_terms_query: "all": Returns all documents.

Example 2: stop filter removes all terms

Assuming the message field uses a stop filter containing to, be, or, not (requires additional configuration), when querying "to be or not to be":

json
{
  "query": {
    "match": {
      "message": {
        "query": "to be or not to be",
        "zero_terms_query": "none"  // or "all"
      }
    }
  }
}

Processing:

  1. Original query: "to be or not to be".
  2. Stop filter removes all stop words, leaving 0 tokens (becomes an empty query).
  3. zero_terms_query: "none": Returns no documents; zero_terms_query: "all": Returns all documents.

Use Cases:

  • zero_terms_query: "all": Search box allows empty queries, or users might only enter stop words but still expect feedback.
  • zero_terms_query: "none": Does not allow empty queries (most default behaviors).

WARNING

zero_terms_query is only triggered when the query truly becomes empty.

If the query terms are not removed, but simply cannot be found in the index, it will return 0 results normally rather than triggering zero_terms_query. For example, if the field does not have a stop filter set, querying "to be or not to be" will not trigger zero_terms_query, but will search for those terms normally.
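The decision described above can be sketched as a tiny simulation: analyze the query, and consult zero_terms_query only when no tokens remain. The whitespace tokenizer, the stop-word set, and the function names are all assumptions made for illustration; real analysis is done by the field's analyzer.

```python
# Hypothetical stop filter matching the example above.
STOP_WORDS = {"to", "be", "or", "not"}


def analyze(text: str, stop_words: set) -> list:
    """Very rough stand-in for an analyzer: lowercase, split, drop stop words."""
    return [token for token in text.lower().split() if token not in stop_words]


def zero_terms_outcome(query: str, stop_words: set, zero_terms_query: str = "none") -> str:
    """Describe what the match query effectively becomes."""
    tokens = analyze(query, stop_words)
    if tokens:
        # Terms survived analysis: a normal search runs, even if the
        # index contains none of them (that simply yields 0 hits).
        return "search"
    return "match_all" if zero_terms_query == "all" else "match_none"


if __name__ == "__main__":
    print(zero_terms_outcome("to be or not to be", STOP_WORDS, "none"))
    print(zero_terms_outcome("to be or not to be", STOP_WORDS, "all"))
    print(zero_terms_outcome("to be or not to be", set(), "all"))
```

The third call shows the case from the warning: without a stop filter the terms survive, so zero_terms_query is never consulted.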


2. Multi Match Query - Multi-Field Query

Searches for the same keyword in multiple fields.

json
{
  "query": {
    "multi_match": {
      "query": "Elasticsearch",
      "fields": ["title^3", "content", "tags"],
      "type": "best_fields"
    }
  }
}

Parameter Explanations:

  • fields: List of fields, the number after ^ represents the weight. Fields can use wildcards, e.g., "title" and "*_name" will query fields like title, first_name, last_name, etc.
  • type: Query type.

Parameter Support by Type

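| Parameter | Description | best_fields | most_fields | cross_fields | phrase | phrase_prefix | bool_prefix |
|---|---|---|---|---|---|---|---|
| fuzziness | Fuzzy match, allows spelling errors (supports AUTO, 0, 1, 2) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| prefix_length | First N characters must match exactly (default 0) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| max_expansions | Max candidate terms to expand during fuzzy or prefix matching (default 50) | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| fuzzy_transpositions | Whether to allow adjacent character swaps (default true) | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| fuzzy_rewrite | Rewrite method for fuzzy queries | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| slop | Allowed term spacing for phrase queries | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |

The support marks follow the official multi_match documentation. For bool_prefix, the fuzzy parameters apply only to the non-prefix terms.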

lenient Parameter

The lenient parameter is particularly useful in multi-field queries because different fields may have different data types.

Assuming the index has the following fields:

  • title (text)
  • price (integer)

json
{
  "query": {
    "multi_match": {
      "query": "not a number",
      "fields": ["title", "price"],
      "lenient": false
    }
  }
}

Effect (lenient = false, default):

  • title field is text, can process "not a number" normally.
  • price field is integer, cannot process "not a number".
  • The query will throw an error, and the entire query fails.

json
{
  "query": {
    "multi_match": {
      "query": "not a number",
      "fields": ["title", "price"],
      "lenient": true
    }
  }
}

Effect (lenient = true):

  • title field searches "not a number" normally.
  • price field is ignored because the type does not match, no error is thrown.
  • The query executes normally, searching only in the title field.

Query Type Explanations

To better explain the differences between various query types, we use the following test data:

Test Data:

json
// Document 1
{
  "title": "brown fox jumps",
  "subject": "quick animal",
  "message": "The quick brown fox"
}

// Document 2
{
  "title": "quick brown",
  "subject": "fox hunting",
  "message": "Guide to fox hunting"
}

// Document 3
{
  "title": "fast animal",
  "subject": "brown bear",
  "message": "The brown bear is slow"
}

best_fields (Default)

Takes the score of the highest-scoring field, suitable for finding cases where "a single field is the best match."

json
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "type": "best_fields",
      "fields": ["title", "subject", "message"],
      "tie_breaker": 0.3
    }
  }
}

Internal Execution Logic (Equivalent to):

json
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "quick brown fox" }},
        { "match": { "subject": "quick brown fox" }},
        { "match": { "message": "quick brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

Scoring Method:

  • Takes the score of the highest-scoring field.
  • If tie_breaker is set, it becomes: Highest score + (other field scores × tie_breaker).

Query Result Analysis:

Assuming query "quick brown fox", the base scores for each field are as follows (actual scores are affected by BM25 algorithm, term frequency, document length, etc.):

| Document | title score | subject score | message score | Final score calculation (tie_breaker=0.3) |
|---|---|---|---|---|
| Doc 1 | 1.5 (brown, fox) | 1.0 (quick) | 5.0 (quick, brown, fox) | 5.0 + (1.5 + 1.0) × 0.3 = 5.75 |
| Doc 2 | 3.0 (quick, brown) | 1.0 (fox) | 1.0 (fox) | 3.0 + (1.0 + 1.0) × 0.3 = 3.6 |
| Doc 3 | 0 | 1.0 (brown) | 1.0 (brown) | 1.0 + 1.0 × 0.3 = 1.3 |

Calculation Logic:

  • Select the highest-scoring field as the base score.
  • Multiply the scores of all other matching fields by tie_breaker and sum them up.
  • Formula: Highest score + (sum of other field scores × tie_breaker).

Conclusion: Document 1 has the highest score because the message field contains all three terms and is the highest score, while the other two fields also contribute.


most_fields

Combines the scores of all fields, suitable for cases with "multiple similar fields" (e.g., different tokenization methods for the same content).

json
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "type": "most_fields",
      "fields": ["title", "subject", "message"]
    }
  }
}

Internal Execution Logic (Equivalent to):

json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "quick brown fox" }},
        { "match": { "subject": "quick brown fox" }},
        { "match": { "message": "quick brown fox" }}
      ]
    }
  }
}

Scoring Method:

  • Sums the scores of all fields.

Query Result Analysis:

| Document | title score | subject score | message score | Final score (sum) |
|---|---|---|---|---|
| Doc 1 | 1.5 (brown, fox) | 1.0 (quick) | 5.0 (quick, brown, fox) | 1.5 + 1.0 + 5.0 = 7.5 |
| Doc 2 | 3.0 (quick, brown) | 1.0 (fox) | 1.0 (fox) | 3.0 + 1.0 + 1.0 = 5.0 |
| Doc 3 | 0 | 1.0 (brown) | 1.0 (brown) | 0 + 1.0 + 1.0 = 2.0 |

Conclusion: Document 1 has the highest score because it matches in multiple fields.

Difference from best_fields:

The main difference between best_fields and most_fields lies in the default value of tie_breaker:

  • best_fields: Default tie_breaker = 0.0 (takes only the highest score).
  • most_fields: Default tie_breaker = 1.0 (sums all scores).

When both are set to the same tie_breaker value, the calculated scores will be the same.
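The relationship between the two types can be checked with the illustrative per-field scores used in the tables above. The sketch applies the formula "highest score + tie_breaker × sum of the other scores"; the function name is hypothetical and the inputs are the article's example numbers, not real BM25 output.

```python
def combine_field_scores(field_scores, tie_breaker):
    """Highest field score plus tie_breaker times the other matching scores."""
    matching = [score for score in field_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


if __name__ == "__main__":
    docs = {
        "Doc 1": [1.5, 1.0, 5.0],
        "Doc 2": [3.0, 1.0, 1.0],
        "Doc 3": [0.0, 1.0, 1.0],
    }
    for name, scores in docs.items():
        print(
            name,
            combine_field_scores(scores, 0.0),  # best_fields default
            combine_field_scores(scores, 0.3),  # best_fields with tie_breaker 0.3
            combine_field_scores(scores, 1.0),  # same as summing every field
        )
```

With tie_breaker 1.0 the result equals the plain sum, which is why the two types converge when given the same value.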


cross_fields

Cross-field query, treats multiple fields as one large field, suitable for cases like names, addresses, etc., that need to match across fields.

Test Data (Name Example):

json
// Document 1
{ "first_name": "Wing", "last_name": "Chou" }

// Document 2
{ "first_name": "Chou", "last_name": "Chen" }

// Document 3
{ "first_name": "John", "last_name": "Wing" }

json
{
  "query": {
    "multi_match": {
      "query": "Wing Chou",
      "type": "cross_fields",
      "fields": ["first_name", "last_name"],
      "operator": "and"
    }
  }
}

Execution Logic:

According to official documentation, cross_fields analyzes the query string into individual terms and then looks for each term in any field, as if they were one large field.

text
+blended(terms:[first_name:wing, last_name:wing])
+blended(terms:[first_name:chou, last_name:chou])

This means each term can be scattered across different fields, as long as each term appears in at least one field.

Query Result Analysis:

| Document | Matches? | Description |
|---|---|---|
| Doc 1 | ✅ | Wing in first_name, Chou in last_name (scattered across different fields) |
| Doc 2 | ❌ | Only Chou matches, missing Wing |
| Doc 3 | ❌ | Only Wing matches, missing Chou |

WARNING

When the search_analyzer settings of the fields are inconsistent (e.g., one field has an analyzer set, another does not), the query behavior of cross_fields will change. For example, the execution logic becomes:

text
((+first_name:wing +first_name:chou) | (+last_name:wing +last_name:chou))

At this point, all terms must appear in the same field, rather than being scattered across different fields, behaving similarly to best_fields (but with different field order).

Additionally, the combined_fields query fails outright when the fields use different search analyzers, so if you have custom analyzer requirements, pay special attention to this limitation.
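The two execution plans shown above differ only in where the "all terms" requirement is applied. A minimal sketch of that difference, ignoring scoring and using a whitespace tokenizer as a stand-in for real analysis (the function names and the fourth document are hypothetical):

```python
def tokens(value: str) -> list:
    return value.lower().split()


def term_centric_match(doc: dict, terms: list, fields: list) -> bool:
    """Blended plan: every term must appear in at least one field."""
    return all(any(term in tokens(doc[f]) for f in fields) for term in terms)


def field_centric_match(doc: dict, terms: list, fields: list) -> bool:
    """Fallback plan: some single field must contain every term."""
    return any(all(term in tokens(doc[f]) for term in terms) for f in fields)


if __name__ == "__main__":
    fields = ["first_name", "last_name"]
    terms = ["wing", "chou"]
    docs = [
        {"first_name": "Wing", "last_name": "Chou"},
        {"first_name": "Chou", "last_name": "Chen"},
        {"first_name": "John", "last_name": "Wing"},
        {"first_name": "Wing Chou", "last_name": "Lee"},  # hypothetical extra document
    ]
    for doc in docs:
        print(doc, term_centric_match(doc, terms, fields),
              field_centric_match(doc, terms, fields))
```

Document 1 matches only under the term-centric plan, which is exactly the behavior that is lost when the analyzers differ.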

Scoring Method:

  • Blends term frequency statistics across all fields to avoid results being affected by high term frequency in a single field.
  • tie_breaker can be used to adjust scoring behavior (default is 0.0).

phrase

Phrase query, terms must appear in order.

Test Data:

json
// Document 1
{ "title": "quick brown fox", "message": "The fox is quick" }

// Document 2
{ "title": "brown quick fox", "message": "quick brown fox jumps" }

// Document 3
{ "title": "fast brown fox", "message": "A brown and quick animal" }

json
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "type": "phrase",
      "fields": ["title", "message"]
    }
  }
}

Internal Execution Logic (Equivalent to):

json
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase": { "title": "quick brown fox" }},
        { "match_phrase": { "message": "quick brown fox" }}
      ]
    }
  }
}

Query Result Analysis:

| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (Order correct) | ❌ (Missing "brown") | ✅ |
| Doc 2 | ❌ (Order incorrect: "brown quick fox") | ✅ (Order correct) | ✅ |
| Doc 3 | ❌ (Missing "quick") | ❌ (Missing "fox") | ❌ |

Conclusion: Phrase queries require terms to appear adjacent and in order.
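For slop = 0 the rule amounts to finding the query terms as one contiguous, ordered run inside the field's tokens. A minimal sketch, assuming simple lowercase whitespace tokenization instead of a real analyzer (`phrase_match` is a hypothetical helper):

```python
def phrase_match(text: str, phrase: str) -> bool:
    """True when the phrase's terms appear adjacent and in order (slop = 0)."""
    words = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


if __name__ == "__main__":
    for text in [
        "quick brown fox",
        "The fox is quick",
        "brown quick fox",
        "quick brown fox jumps",
        "fast brown fox",
    ]:
        print(repr(text), phrase_match(text, "quick brown fox"))
```

A document is returned by the multi_match phrase query when at least one of its fields passes this check.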

Pairing with slop parameter:

json
{
  "query": {
    "multi_match": {
      "query": "quick brown fox",
      "type": "phrase",
      "fields": ["title", "message"],
      "slop": 1
    }
  }
}

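Query Result with slop = 1:

| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (Exact phrase) | ❌ (Missing "brown") | ✅ |
| Doc 2 | ❌ (Swapping "brown quick" requires slop = 2) | ✅ (Exact phrase) | ✅ |
| Doc 3 | ❌ (Missing "quick") | ❌ (Missing "fox") | ❌ |

slop only relaxes term positions; every term of the phrase must still be present in the field. The result set is therefore unchanged here, and Doc 2's title would only start matching at slop = 2.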

phrase_prefix

Phrase prefix query, the last term can be a prefix match.

Test Data:

json
// Document 1
{ "title": "quick brown fox", "message": "quick brown forest" }

// Document 2
{ "title": "quick brown food", "message": "quick brown" }

// Document 3
{ "title": "fast brown fox", "message": "quick blue forest" }

json
{
  "query": {
    "multi_match": {
      "query": "quick brown f",
      "type": "phrase_prefix",
      "fields": ["title", "message"]
    }
  }
}

Internal Execution Logic (Equivalent to):

json
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "title": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}

Query Result Analysis:

| Document | title matches | message matches | Returns? |
|---|---|---|---|
| Doc 1 | ✅ (f prefix matches fox) | ✅ (f prefix matches forest) | ✅ |
| Doc 2 | ✅ (f prefix matches food) | ❌ (No term starting with f) | ✅ |
| Doc 3 | ❌ (Missing "quick") | ❌ (Missing "brown") | ❌ |

Conclusion: The first N-1 terms must match exactly and in order, the last term can be a prefix match.


bool_prefix

Boolean prefix query, the last term uses prefix match, other terms use exact match.

Test Data:

json
// Document 1
{ "title": "quick brown fox", "message": "forest animals" }

// Document 2
{ "title": "brown food quick", "message": "quick forest" }

// Document 3
{ "title": "fast fox", "message": "brown quick forest" }

json
{
  "query": {
    "multi_match": {
      "query": "quick brown f",
      "type": "bool_prefix",
      "fields": ["title", "message"]
    }
  }
}

Scoring Method:

  • Similar to most_fields, but uses match_bool_prefix query.
  • Supports fuzzy query parameters, but only effective for non-prefix terms.

Query Result Analysis:

| Document | title matches | message matches | Returns? | Description |
|---|---|---|---|---|
| Doc 1 | ✅ (quick, brown, f prefix) | ✅ (f prefix matches forest) | ✅ | All terms match |
| Doc 2 | ✅ (quick, brown, f prefix matches food) | ✅ (quick, f prefix matches forest) | ✅ | Term order doesn't matter |
| Doc 3 | ✅ (f prefix matches fox) | ✅ (brown, quick, f prefix matches forest) | ✅ | Terms can be scattered across fields |

Difference from phrase_prefix:

| Feature | phrase_prefix | bool_prefix |
|---|---|---|
| Term Order | Must be in order | Order not required |
| Term Position | Must be adjacent | Can be scattered |
| Use Case | Exact phrase search | Flexible auto-complete |

Example:

Querying "quick brown f":

  • phrase_prefix: Must be in order "quick brown f...".
  • bool_prefix: Can be any order, e.g., "brown quick f..." or "f... brown quick".
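The contrast can be sketched per field as two small predicates. Both assume lowercase whitespace tokenization and, for bool_prefix, the default OR operator; the function names are hypothetical and scoring is ignored.

```python
def phrase_prefix_match(text: str, query: str) -> bool:
    """Leading terms must appear adjacent and in order; the last term is a prefix."""
    words = text.lower().split()
    *exact, prefix = query.lower().split()
    n = len(exact)
    for i in range(len(words) - n):
        if words[i:i + n] == exact and words[i + n].startswith(prefix):
            return True
    return False


def bool_prefix_match(text: str, query: str) -> bool:
    """Any leading term may match anywhere, or any word may start with the prefix."""
    words = text.lower().split()
    *exact, prefix = query.lower().split()
    return any(term in words for term in exact) or any(
        word.startswith(prefix) for word in words
    )


if __name__ == "__main__":
    query = "quick brown f"
    for text in ["quick brown fox", "brown food quick", "fast fox", "quick brown"]:
        print(repr(text), phrase_prefix_match(text, query), bool_prefix_match(text, query))
```

"brown food quick" fails the phrase_prefix check but passes bool_prefix, which is the order-insensitivity described above.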

3. Combined Fields Query - Cross-Field Term Query

The combined_fields query adopts a term-centric approach, treating multiple text fields as a single combined field for searching. It is particularly suitable for handling cases where query terms might be scattered across multiple fields, such as an article's title, abstract, and body.

Basic Query:

json
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": ["title", "abstract", "body"],
      "operator": "and"
    }
  }
}

Test Data:

json
// Document 1
{
  "title": "Database Management",
  "abstract": "Modern systems overview",
  "body": "Relational database concepts"
}

// Document 2
{
  "title": "Information Systems",
  "abstract": "Database architecture",
  "body": "Design patterns"
}

// Document 3
{
  "title": "NoSQL Solutions",
  "abstract": "Alternative approaches",
  "body": "Non-relational systems"
}

Query Result Analysis:

When querying "database systems":

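| Document | Matches? | Returns? | Description |
|---|---|---|---|
| Doc 1 | ✅ | ✅ | "database" in title and body, "systems" in abstract |
| Doc 2 | ✅ | ✅ | "database" in abstract, "systems" in title |
| Doc 3 | ❌ | ❌ | Only "systems" matches (in body); it would be returned only if operator were "or" |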

Main Parameters

fields (Required)

List of fields, supports wildcards. All fields must be of text type and use the same search analyzer.

json
{
  "query": {
    "combined_fields": {
      "query": "quick search",
      "fields": ["title^2", "content", "*_text"]
    }
  }
}

boost

You can use the ^ symbol to set field weights (must be ≥ 1.0, can be decimal), or use the boost parameter to adjust the weight of the entire query:

json
{
  "query": {
    "combined_fields": {
      "query": "distributed consensus",
      "fields": ["title^2", "body"],
      "boost": 1.5
    }
  }
}

Test Data:

json
// Document 1
{ "title": "Consensus Algorithms", "body": "Distributed systems basics" }

// Document 2
{ "title": "Network Protocols", "body": "Distributed consensus mechanisms" }

Scoring Method:

  • Document 1: title contains "consensus" (weight × 2), body contains "distributed", overall score is higher.
  • Document 2: Both terms are in body (no weight bonus), score is lower.

operator

Sets the logical relationship between terms, default is or.

  • or (Default): Any term matches.
  • and: All terms must match.

json
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": ["title", "abstract", "body"],
      "operator": "and"
    }
  }
}

minimum_should_match

Minimum number of matches, usage is the same as match query. Supports:

  • Positive integer: Absolute quantity (e.g., 3).
  • Negative integer: Allowed missing quantity (e.g., -1).
  • Percentage: "75%" or "-25%".
  • Condition combination: "3<90%" or "2<-25% 9<-3".

For detailed explanations, please refer to the minimum_should_match parameter in the "Match Query" section.

json
{
  "query": {
    "combined_fields": {
      "query": "quick brown fox jumps",
      "fields": ["title", "content"],
      "minimum_should_match": "75%"
    }
  }
}

zero_terms_query

How to handle cases where there are no tokens after analysis, default is none.

  • none (Default): Returns no documents.
  • all: Returns all documents.

For detailed explanations, please refer to the zero_terms_query parameter in the "Match Query" section.


auto_generate_synonyms_phrase_query

Whether to automatically create phrase queries for multi-term synonyms, default is true.

json
{
  "query": {
    "combined_fields": {
      "query": "quick",
      "fields": ["title", "body"],
      "auto_generate_synonyms_phrase_query": true
    }
  }
}

Effect: If "quick" has a synonym "fast running", it will automatically create a phrase query "fast running".

WARNING

Using the synonym feature requires setting up a synonym filter in the field's search_analyzer. However, combined_fields requires all fields to use the same search_analyzer, and if the analyzer settings for the fields are inconsistent, the query will fail. Therefore, when using this parameter, ensure all queried fields use the same synonym configuration.


Execution Logic

json
{
  "query": {
    "combined_fields": {
      "query": "database systems",
      "fields": ["title", "abstract"],
      "operator": "and"
    }
  }
}

Actual Execution Logic:

text
+(combined("database", fields:["title", "abstract"]))
+(combined("systems", fields:["title", "abstract"]))

Meaning: Each term must appear in at least one field (can be scattered across different fields).


Usage Limitations

  1. Field Type Limitation: Only supports text fields, does not support keyword, numeric, date, etc.
  2. Analyzer Limitation: All fields must use the same search analyzer.
  3. Similarity Limitation: Only supports BM25 similarity (Elasticsearch's default similarity), does not support custom similarity or per-field similarity settings.
  4. Clause Count Limitation: The number of query clauses is limited by indices.query.bool.max_clause_count (default 4096), calculated as "number of fields × number of terms".

Example:

json
{
  "query": {
    "combined_fields": {
      "query": "quick brown fox jumps",
      "fields": ["title", "abstract", "body"]
    }
  }
}
  • Number of terms: 4 (quick, brown, fox, jumps).
  • Number of fields: 3 (title, abstract, body).
  • Clause count: 4 × 3 = 12 (far below the 4096 limit).

4. Match Phrase Query - Phrase Query

Matches documents that contain the terms of the phrase in the same order (adjacent by default); suitable for searching fixed phrases.

json
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "quick brown fox",
        "slop": 1
      }
    }
  }
}

Parameter Explanations:

  • query: The phrase to search for.
  • analyzer: Specifies the analyzer (defaults to the analyzer set on the field).
  • boost: Adjusts the relevance score weight, default is 1.0.
  • slop: Maximum allowed spacing between terms, default is 0 (must be completely adjacent).
  • zero_terms_query: How to handle cases where there are no tokens after analysis (none or all).

Test Data:

json
// Document 1
{ "content": "The quick brown fox jumps over the lazy dog" }

// Document 2
{ "content": "A quick and brown fox in the forest" }

// Document 3
{ "content": "The brown quick fox runs fast" }

Query Result (slop = 0):

json
{
  "query": {
    "match_phrase": {
      "content": "quick brown fox"
    }
  }
}

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | Term order correct and adjacent |
| Doc 2 | ❌ | "and" in the middle, not adjacent |
| Doc 3 | ❌ | Order incorrect (brown quick) |

Query Result (slop = 1):

json
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "quick brown fox",
        "slop": 1
      }
    }
  }
}

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | Term order correct and adjacent |
| Doc 2 | ✅ | 1 term in the middle ("and"), meets slop = 1 |
| Doc 3 | ❌ | Order incorrect, requires 2 moves to match |
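
The slop needed by each test document can be reproduced with a simplified position model. This is only a sketch, not Lucene's actual implementation: it assumes whitespace tokenization with lowercasing instead of a real analyzer, and the function name min_slop is hypothetical. For every query term it compares the position in the document with the position in the phrase; the spread of those offsets is the slop required.

```python
from itertools import product


def min_slop(phrase: str, text: str):
    """Smallest slop for which the phrase matches the text, or None if a term is missing."""
    query_terms = phrase.lower().split()
    doc_terms = text.lower().split()
    # All document positions at which each query term occurs.
    positions = [[i for i, t in enumerate(doc_terms) if t == q] for q in query_terms]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Offset of each term: position in the document minus position in the phrase.
        offsets = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        needed = max(offsets) - min(offsets)
        best = needed if best is None else min(best, needed)
    return best
```

Under this model, Document 1 needs a slop of 0, Document 2 needs 1, and Document 3 needs 2, which agrees with the two result tables.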

5. Term Query - Exact Match

Used for exact value queries, does not perform tokenization, matches terms in the index directly.

json
{
  "query": {
    "term": {
      "status": {
        "value": "published"
      }
    }
  }
}

Parameter Explanations:

  • value: The exact value to query.
  • boost: Adjusts the relevance score weight, default is 1.0.
  • case_insensitive: Whether to ignore case, default is false (supported since Elasticsearch 7.10).

Applicable Types:

  • Keyword fields: Matches the original value exactly.
  • Text fields: Matches tokenized terms, not the original text.
  • Numeric, Date, Boolean: Exact value comparison.

Use Cases:

  • Exact match for Keyword fields (status, tags, ID, etc.).
  • Exact query for numeric, date, boolean values.
  • Specific term query for Text fields (requires understanding tokenization results).

Test Data:

json
// Document 1
{ "status": "published", "title": "Elasticsearch Guide" }

// Document 2
{ "status": "draft", "title": "Quick Tutorial" }

Query Example (Keyword field):

json
{
  "query": {
    "term": {
      "status": "published"
    }
  }
}

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | status matches "published" exactly |
| Doc 2 | ❌ | status is "draft" |

Query Example (Text field):

Assuming title is a text field using the standard analyzer:

json
{
  "query": {
    "term": {
      "title": "elasticsearch"
    }
  }
}

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | "Elasticsearch Guide" contains "elasticsearch" after tokenization |
| Doc 2 | ❌ | Does not contain "elasticsearch" after tokenization |

WARNING

When using term query on a text field, the query value will not be tokenized, but it will match against the tokenized terms in the index. For example, querying "Elasticsearch Guide" will not match any results because the index stores tokenized "elasticsearch" and "guide", not the full string.

Recommendation: When performing full-text search on text fields, use match query instead of term query.
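
To make the difference concrete, the following Python sketch imitates what happens at index time and at query time. It is a rough stand-in only: standard_tokens approximates the standard analyzer by splitting on non-alphanumeric characters and lowercasing, and both function names are hypothetical.

```python
import re


def standard_tokens(text: str) -> list[str]:
    # Rough approximation of the standard analyzer: split and lowercase.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(indexed_text: str, term: str) -> bool:
    # A term query compares the raw, unanalyzed query value with the indexed tokens.
    return term in standard_tokens(indexed_text)
```

With "Elasticsearch Guide" indexed, term_matches finds "elasticsearch" but neither "Elasticsearch" (the query value is not lowercased) nor "Elasticsearch Guide" (no single token equals the full string).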


6. Terms Query - Multi-Value Exact Match

Similar to SQL's IN query.

Basic Usage

json
{
  "query": {
    "terms": {
      "status": ["published", "draft", "pending"],
      "boost": 2.0
    }
  }
}

Parameter Explanations:

  • boost: Adjusts the relevance score weight.
  • index.max_terms_count: An index setting rather than a query parameter; it caps the number of terms in one terms query (default 65,536) and can be adjusted in the index settings.

Test Data:

json
// Document 1
{ "status": "published", "title": "Article 1" }
// Document 2
{ "status": "draft", "title": "Article 2" }
// Document 3
{ "status": "archived", "title": "Article 3" }

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | status = "published" |
| Doc 2 | ✅ | status = "draft" |
| Doc 3 | ❌ | status = "archived" not in list |

Terms Lookup - Fetch values from existing documents as search criteria

When you need to search for a large number of terms, you can fetch field values from existing documents as search criteria, avoiding manually listing a large number of terms.

Usage Limitations:

  • The _source mapping field must be enabled on the index that holds the source document.
  • Does not support cross-cluster search.
  • Also subject to index.max_terms_count limitation (default 65,536).

Parameter Explanations:

  • index: Name of the index where the source document resides.
  • id: ID of the source document.
  • path: Name of the field to fetch values from, supports dot notation for nested objects.

Example Scenario: Suppose there is an index storing article statuses, and you want to find all other documents that have the same status as a specific document.

Test Data:

json
// Document 1
{ "status": "published", "title": "Article 1" }
// Document 2
{ "status": "draft", "title": "Article 2" }
// Document 3
{ "status": "archived", "title": "Article 3" }

Query: Fetch status field value from Document 2 and search for all documents containing these values

json
{
  "query": {
    "terms": {
      "status": {
        "index": "my-index",
        "id": "2",
        "path": "status"
      }
    }
  }
}

Execution Flow:

  1. Elasticsearch fetches the document with ID 2 from the my-index index.

  2. Reads the status field value: ["draft"].

  3. Uses ["draft"] as search criteria, equivalent to executing:

    json
    {
      "query": {
        "terms": {
          "status": ["draft"]
        }
      }
    }

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ❌ | status = "published" does not match |
| Doc 2 | ✅ | status = "draft" |
| Doc 3 | ❌ | status = "archived" does not match |

7. Range Query - Range Query

Used for numeric and date range queries.

Basic Usage

json
{
  "query": {
    "range": {
      "age": {
        "gte": 18,
        "lte": 65,
        "boost": 2.0
      }
    }
  }
}

Parameter Explanations:

  • gt: Greater than.
  • gte: Greater than or equal.
  • lt: Less than.
  • lte: Less than or equal.
  • format: Date format, overrides the default format of the field mapping.
  • relation: Only applicable to range type fields (e.g., date_range, integer_range, etc.), specifies range comparison method:
    • INTERSECTS (Default): Intersection comparison - matches if the query range has any overlap with the document range.
    • CONTAINS: Containment comparison - matches if the document range completely contains the query range.
    • WITHIN: Within comparison - matches if the document range is completely within the query range.
  • time_zone: Time zone setting, used to convert date values to UTC.
  • boost: Adjusts the relevance score weight (default 1.0).
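
The three relation values are easiest to compare side by side. The sketch below is an illustration only: it assumes closed intervals given as (low, high) tuples, and range_relation_matches is a hypothetical helper, not an Elasticsearch function.

```python
def range_relation_matches(doc: tuple, query: tuple, relation: str = "INTERSECTS") -> bool:
    """Compare a document range with a query range, both as inclusive (low, high) pairs."""
    doc_low, doc_high = doc
    query_low, query_high = query
    if relation == "INTERSECTS":
        # Any overlap between the two ranges is enough.
        return doc_low <= query_high and query_low <= doc_high
    if relation == "CONTAINS":
        # The document range must cover the whole query range.
        return doc_low <= query_low and query_high <= doc_high
    if relation == "WITHIN":
        # The document range must lie entirely inside the query range.
        return query_low <= doc_low and doc_high <= query_high
    raise ValueError(f"unknown relation: {relation}")
```

For a document range of 10 to 20, a query range of 15 to 25 matches only with INTERSECTS, 12 to 18 also matches with CONTAINS, and 5 to 30 also matches with WITHIN.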

Test Data:

json
// Document 1
{ "age": 25, "name": "Alice" }
// Document 2
{ "age": 17, "name": "Bob" }
// Document 3
{ "age": 70, "name": "Charlie" }

Query Result (age range 18-65):

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | 25 is in range |
| Doc 2 | ❌ | 17 < 18 |
| Doc 3 | ❌ | 70 > 65 |

Date Range Query

Basic Date Example:

json
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2024-01-01",
        "lte": "2024-12-31",
        "format": "yyyy-MM-dd"
      }
    }
  }
}

Date Example using Date Math:

json
{
  "query": {
    "range": {
      "created_date": {
        "gte": "now-1d/d",
        "lte": "now/d"
      }
    }
  }
}

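This query returns documents whose created_date falls between the start of yesterday and the end of today (UTC): with gte, now-1d/d rounds down to yesterday at 00:00:00.000, and with lte, now/d rounds up to today at 23:59:59.999.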

Date Math Syntax Explanations:

  • now: Current time (UTC)
  • +1h: Plus 1 hour
  • -1d: Minus 1 day
  • /d: Round to the day (start or end of the day)
  • /M: Round to the month
  • /y: Round to the year

Using Date Math Operator ||:

When a fixed date needs to be paired with date math (e.g., rounding), you must use || to connect them:

json
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2024-01-01||/d",  // Use || to connect date and rounding operation
        "lte": "2024-12-31||/d"
      }
    }
  }
}

Date Math Rounding Rules:

All examples round the expression 2014-11-18||/M:

| Operator | Rounding Behavior | Result |
| --- | --- | --- |
| gt | Round up to the first millisecond (exclusive) | 2014-12-01T00:00:00.000Z |
| gte | Round down to the first millisecond (inclusive) | 2014-11-01T00:00:00.000Z |
| lt | Round down to the last millisecond (exclusive) | 2014-10-31T23:59:59.999Z |
| lte | Round up to the last millisecond (inclusive) | 2014-11-30T23:59:59.999Z |
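
The four rows can be reproduced with plain datetime arithmetic. The following Python sketch only covers month rounding (/M) and simply restates the table; round_to_month is a hypothetical helper and not how Elasticsearch implements date math.

```python
from datetime import datetime, timedelta, timezone


def round_to_month(value: datetime, operator: str) -> datetime:
    """Resolve value||/M for a given range operator, following the rounding table."""
    start = value.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Day 1 plus 32 days always lands in the next month; snap back to its first day.
    next_start = (start + timedelta(days=32)).replace(day=1)
    one_ms = timedelta(milliseconds=1)
    resolved = {
        "gt": next_start,            # first millisecond after the month
        "gte": start,                # first millisecond of the month
        "lt": start - one_ms,        # last millisecond before the month
        "lte": next_start - one_ms,  # last millisecond of the month
    }
    return resolved[operator]
```

Calling round_to_month(datetime(2014, 11, 18, tzinfo=timezone.utc), "lte") gives 2014-11-30T23:59:59.999 UTC, the same value as the lte row.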

format Parameter Explanation

Role of the format parameter:

  • Overrides the date format defined in the field mapping.
  • Specifies the date format for query parameters (gte, gt, lte, lt).

format Usage Rules:

  1. If the date field's mapping does not specify a format:

    • The default format, strict_date_optional_time||epoch_millis, applies and accepts the common ISO 8601 forms.
    • Elasticsearch parses the query values automatically.
  2. If the index mapping specifies a format:

    • Query parameters (gte, lte, etc.) must match the format defined in the index mapping.
    • Or override it using the format parameter in the query.
  3. When using the format parameter:

    • All query parameters (gte, gt, lte, lt) must match the format specified by the format parameter.
    • Inconsistent formats will cause the query to fail or produce unexpected results.

Example:

json
// Index mapping definition
{
  "mappings": {
    "properties": {
      "created_date": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss'Z'"  // Defined format
      }
    }
  }
}

// ✅ Example 1: Query format matches mapping exactly
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2024-01-01T00:00:00Z",
        "lte": "2024-12-31T23:59:59Z"
      }
    }
  }
}

// ❌ Example 2: Query format does not match mapping (only provides YMD)
// Error: Format mismatch, cannot parse
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2024-01-01",
        "lte": "2024-12-31"
      }
    }
  }
}

// ✅ Example 3: Use format parameter to override mapping format
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2024-01-01",
        "lte": "2024-12-31",
        "format": "yyyy-MM-dd"  // Override mapping format
      }
    }
  }
}

// ❌ Example 4: Query parameter format does not match format parameter
// Error: Query parameter format does not match format parameter
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2024-01-01T00:00:00Z",  // Includes time
        "lte": "2024-12-31T23:59:59Z",
        "format": "yyyy-MM-dd"  // format only defines YMD
      }
    }
  }
}

Time Zone Handling

Using the time_zone parameter:

json
{
  "query": {
    "range": {
      "timestamp": {
        "time_zone": "+01:00",
        "gte": "2020-01-01T00:00:00",
        "lte": "now"
      }
    }
  }
}

Time Zone Conversion Explanation:

  • time_zone parameter can use ISO 8601 UTC offset (e.g., +01:00, -08:00).
  • Can also use IANA time zone ID (e.g., America/Los_Angeles, Asia/Taipei).
  • In the example, 2020-01-01T00:00:00 uses UTC offset +01:00, which will be converted to 2019-12-31T23:00:00 UTC.
  • Note: The time_zone parameter does not affect the value of now; now is always the UTC of the current system time.
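
The conversion in the example is ordinary offset arithmetic, which the following Python sketch reproduces with the standard library. The helper name to_utc is hypothetical, and it only handles whole-hour offsets.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_text: str, offset_hours: int) -> datetime:
    """Interpret a naive ISO timestamp in the given UTC offset and convert it to UTC."""
    zone = timezone(timedelta(hours=offset_hours))
    local = datetime.fromisoformat(local_text).replace(tzinfo=zone)
    return local.astimezone(timezone.utc)
```

to_utc("2020-01-01T00:00:00", 1) yields 2019-12-31T23:00:00 UTC, the same instant the range query uses for its lower bound.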

Missing Date Components

When the date format is incomplete, Elasticsearch fills in the missing components with the following default values (the year will not be replaced):

| Component | Default Value |
| --- | --- |
| MONTH_OF_YEAR | 01 |
| DAY_OF_MONTH | 01 |
| HOUR_OF_DAY | 23 |
| MINUTE_OF_HOUR | 59 |
| SECOND_OF_MINUTE | 59 |
| NANO_OF_SECOND | 999_999_999 |

Official Documentation Example (Date part):

  • If format is yyyy-MM, and gt value is 2099-12.
  • Elasticsearch will convert it to 2099-12-01T23:59:59.999_999_999Z.
  • Retains the provided year (2099) and month (12).
  • Uses default day (01), hour (23), minute (59), second (59), nanosecond (999_999_999).

Actual Test Results (Time part):

The behavior of the time part differs from the official documentation explanation. Actual tests found:

✅ Cases that query successfully:

json
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2023-01-15T08",  // Only provided up to the hour
        "lte": "2023-01-15T08"
      }
    }
  }
}
  • Returns the document dated 2023-01-15T08:30:00Z.
  • This suggests that Elasticsearch brings the document value and the query parameters to the same precision before comparing.

❌ Cases that cannot be queried:

json
// Case 1: Using gt and lte
{
  "query": {
    "range": {
      "joined_date": {
        "gt": "2023-01-15T08",   // Greater than (exclusive)
        "lte": "2023-01-15T08"
      }
    }
  }
}

// Case 2: Using gte and lt
{
  "query": {
    "range": {
      "joined_date": {
        "gte": "2023-01-15T08",
        "lt": "2023-01-15T08"   // Less than (exclusive)
      }
    }
  }
}
  • Neither case can query 2023-01-15T08:30:00Z.
  • Because gt and lt exclude the specified precision range.

Inferred Behavior:

  1. Date part: Fills in missing components according to official documentation.
  2. Time part: Formats the document and query parameters to the same precision, then compares.
    • E.g., "2023-01-15T08" treats all 2023-01-15T08:xx:xx data as the same time unit.
    • Using gte and lte can include data for the entire hour.
    • Using gt or lt excludes that time unit.
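
One model that reproduces all three test outcomes is bound-dependent filling: gte and lt take the earliest instant of the given hour, while gt and lte take the latest. The Python sketch below is an assumption that happens to fit the observations, not confirmed behavior; the function names are hypothetical and microsecond precision stands in for nanoseconds.

```python
from datetime import datetime, timezone


def resolve_hour_bound(text: str, operator: str) -> datetime:
    """Expand an hour-precision value such as '2023-01-15T08' for one range operator."""
    base = datetime.strptime(text, "%Y-%m-%dT%H").replace(tzinfo=timezone.utc)
    if operator in ("gt", "lte"):
        # Assumed: these operators use the latest instant of the hour.
        return base.replace(minute=59, second=59, microsecond=999_999)
    # Assumed: gte and lt use the earliest instant of the hour.
    return base


def in_range(value: datetime, bounds: dict) -> bool:
    """Check a document value against bounds like {'gte': '2023-01-15T08', 'lte': ...}."""
    checks = {
        "gt": lambda v, b: v > b,
        "gte": lambda v, b: v >= b,
        "lt": lambda v, b: v < b,
        "lte": lambda v, b: v <= b,
    }
    return all(checks[op](value, resolve_hour_bound(text, op)) for op, text in bounds.items())
```

With a document at 2023-01-15T08:30:00Z, the gte + lte combination matches, while gt + lte and gte + lt do not, which is the same pattern as in the tests above.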

Recommended Practice:

To avoid unexpected query results due to precision issues, it is recommended to:

  1. Explicitly specify the complete time format.

    json
    {
      "query": {
        "range": {
          "created_date": {
            "gte": "2023-01-15T08:00:00Z",
            "lte": "2023-01-15T08:59:59Z"
          }
        }
      }
    }
  2. Use Date Math rounding functionality.

    json
    {
      "query": {
        "range": {
          "created_date": {
            "gte": "2023-01-15T08:00:00Z||/h",  // Round to start of hour
            "lte": "2023-01-15T08:59:59Z||/h"   // Round to end of hour
          }
        }
      }
    }
  3. Use gte + lte when querying an entire time unit.

    json
    {
      "query": {
        "range": {
          "created_date": {
            "gte": "2023-01-15T08",  // Includes start of 08:00:00
            "lte": "2023-01-15T08"   // Includes end of 08:59:59
          }
        }
      }
    }

Numeric vs String Differences

When using range query on a date field, the parsing methods for numbers and strings differ:

json
// ❌ Error: Numbers are interpreted as millisecond timestamps
{
  "query": {
    "range": {
      "created_date": {
        "gte": 2020  // Interpreted as 1970-01-01T00:00:02.020Z (2020 milliseconds after 1970)
      }
    }
  }
}

// ✅ Correct: Strings are parsed according to format
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2020"  // Interpreted as 2020-01-01T00:00:00.000Z (Year 2020)
      }
    }
  }
}

Pitfalls of mixing numbers and strings:

When the bounds (gte/gt/lte/lt) mix numbers and strings, the outcome depends on what the strings look like:

json
// ❌ Error: Mixing numbers and date format strings
{
  "query": {
    "range": {
      "created_date": {
        "gte": 2022,              // Number: interpreted as milliseconds
        "lte": "2025-01-01"       // String: interpreted as date format
      }
    }
  }
}
// Error: String "2025-01-01" cannot be mixed with numbers, format error

// ✅ Correct: Mixing numbers and pure numeric strings
{
  "query": {
    "range": {
      "created_date": {
        "gte": 2025,              // Number: interpreted as milliseconds
        "lte": "2025"             // Pure numeric string: interpreted as milliseconds
      }
    }
  }
}
// Success: Both are treated as millisecond timestamps

// ✅ Correct: Use strings consistently
{
  "query": {
    "range": {
      "created_date": {
        "gte": "2022",            // String: interpreted as year
        "lte": "2025-01-01"       // String: interpreted as date
      }
    }
  }
}

Important Principles:

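  1. It is recommended to use string format consistently to avoid parsing issues caused by mixing numbers and strings.
  2. Pure numeric strings (e.g., "2025") are parsed according to format (as the year 2025 with the default format) when every bound is a string; in the tests above they were treated as millisecond timestamps only when the other bound was a number.
  3. Date format strings (e.g., "2025-01-01") are parsed according to format.
  4. Numbers are interpreted as millisecond timestamps when no format is specified in the query.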
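
The gap between the two readings of 2020 is easy to see with a short calculation. The Python sketch below only illustrates the arithmetic; both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(millis: int) -> str:
    """Read a number as milliseconds since 1970-01-01T00:00:00Z."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def from_year_string(text: str) -> str:
    """Read a four-digit string as a calendar year (start of that year, UTC)."""
    moment = datetime(int(text), 1, 1, tzinfo=timezone.utc)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

from_epoch_millis(2020) lands two seconds into 1970, whereas from_year_string("2020") is the start of the year 2020, a difference of roughly fifty years.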

8. Exists Query - Field Existence Query

Matches documents that contain an indexed, non-null value for a field.

Positive Query: Query field exists

json
{
  "query": {
    "exists": {
      "field": "email"
    }
  }
}

Test Data:

json
// Document 1
{ "name": "Alice", "email": "[email protected]" }
// Document 2
{ "name": "Bob", "email": null }
// Document 3
{ "name": "Charlie" }
// Document 4
{ "name": "David", "email": "" }
// Document 5
{ "name": "Eve", "email": [] }

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | email field exists and has a value |
| Doc 2 | ❌ | email field is null |
| Doc 3 | ❌ | No email field |
| Doc 4 | ✅ | Empty string is still considered existing |
| Doc 5 | ❌ | Empty array is considered non-existent |
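
The rule behind this table can be summarized in a few lines of Python. This is a sketch of the decision on the raw JSON value only; field_exists is a hypothetical helper and it ignores the mapping-related special cases.

```python
def field_exists(doc: dict, field: str) -> bool:
    """Return True if the document holds at least one non-null value for the field."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a single value like a one-element array.
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)
```

Applied to the five test documents, the function returns True only for Document 1 (a real value) and Document 4 (an empty string).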

Negative Query: Query field does not exist

Use must_not paired with exists to query documents where the field does not exist.

json
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "email"
        }
      }
    }
  }
}

Special Case Explanations

In some cases, even if the original JSON document has a value for the field, the exists query may still determine it as "non-existent":

  1. index: false and doc_values: false:

    • index: false: Field is not indexed, cannot be used for search queries.
    • doc_values: false: Field does not store doc values, cannot be used for sorting, aggregation, or script access.
    • When both are set to false, the exists query considers the field non-existent.
  2. Exceeding ignore_above setting: For keyword type fields, if the value length exceeds the ignore_above limit set in the mapping, the value will not be indexed.

    json
    // Mapping sets ignore_above: 10
    { "tags": "this_is_too_long" }  // Length 16, will not be indexed
  3. ignore_malformed and format error: When the field type is numeric, date, etc., but the written data format is incorrect, if ignore_malformed: true is set in the mapping, the value will be ignored and not indexed.

    json
    // Mapping sets price to integer type, and ignore_malformed: true
    { "price": "not_a_number" }  // Format error, will not be indexed, but document write succeeds

These settings are mainly used to improve data processing fault tolerance, but note that they affect the results of the exists query.


9. Prefix Query - Prefix Query

Queries documents starting with a specific string.

json
{
  "query": {
    "prefix": {
      "username": {
        "value": "admin"
      }
    }
  }
}

Parameter Explanations:

  • value: Prefix string.
  • boost: Adjusts the relevance score weight.
  • case_insensitive: Whether to ignore case, default is false.
  • rewrite: Query rewrite method, used for performance optimization. When a prefix matches a large number of terms, this parameter controls how to handle matching results. Common values include constant_score (default, all matches given the same score), top_terms_N (only takes the top N terms), etc. For detailed explanations, please refer to the official documentation.

Test Data:

json
// Document 1
{ "username": "admin123" }

// Document 2
{ "username": "administrator" }

// Document 3
{ "username": "user456" }

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | Starts with "admin" |
| Doc 2 | ✅ | Starts with "admin" |
| Doc 3 | ❌ | Does not start with "admin" |

10. Wildcard Query - Wildcard Query

Uses * and ? for pattern matching (performance is poor, use with caution).

json
{
  "query": {
    "wildcard": {
      "username": {
        "value": "ad*n?",
        "case_insensitive": true
      }
    }
  }
}

Wildcard Explanations:

  • *: Matches zero or more characters.
  • ?: Matches a single character.

Parameter Explanations:

  • value: Query string containing wildcards.
  • wildcard: Alias for value, same functionality. When both exist, the last one takes precedence.
  • boost: Adjusts the relevance score weight.
  • case_insensitive: Whether to ignore case, default is false.
  • rewrite: Query rewrite method.

Comparison of wildcard and value parameters

Test Data:

json
// Document 1
{ "username": "admin" }
// Document 2
{ "username": "administrator" }
// Document 3
{ "username": "admins" }
// Document 4
{ "username": "user456" }

Query Example: Using both wildcard and value

json
{
  "query": {
    "wildcard": {
      "username": {
        "wildcard": "admin",
        "value": "ad*n?"
      }
    }
  }
}

Parameter Explanations:

  • wildcard: "admin": Will match "admin" exactly.
  • value: "ad*n?": Will match "ad" start + zero or more characters + "n" + single character.

Query Result (Using value: "ad*n?" because it appears later):

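| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ❌ | "admin" only has 5 characters, does not match the "ad*n?" pattern (needs one more character after n) |
| Doc 2 | ❌ | The only "n" in "administrator" is followed by 8 characters, but ? matches exactly one |
| Doc 3 | ✅ | "admins" matches the "ad*n?" pattern |
| Doc 4 | ❌ | Does not start with "ad" |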

If wildcard appears later:

json
{
  "query": {
    "wildcard": {
      "username": {
        "value": "ad*n?",
        "wildcard": "admin"
      }
    }
  }
}

Query Result (Using wildcard: "admin" because it appears later):

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | Matches "admin" exactly |
| Doc 2 | ❌ | Not an exact match for "admin" |
| Doc 3 | ❌ | Not an exact match for "admin" |
| Doc 4 | ❌ | Not an exact match for "admin" |
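
Because a wildcard pattern has to cover the whole term, it helps to check patterns such as "ad*n?" locally before running them. The Python sketch below translates the two wildcards into a regular expression and applies a full match; it is an approximation for experimenting, with hypothetical function names, and does not go through Lucene.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate * and ? into their regular expression equivalents."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")   # zero or more characters
        elif char == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return "".join(parts)


def wildcard_matches(pattern: str, term: str, case_insensitive: bool = False) -> bool:
    """Return True if the pattern matches the entire term."""
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(wildcard_to_regex(pattern), term, flags) is not None
```

For example, wildcard_matches("ad*n?", "admins") is True, while "admin" fails because nothing follows the final n.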

Performance Notes:

  • Avoid using leading wildcards (e.g., *term or ?term); they force Elasticsearch to check every term in the field, much like a full table scan.
  • Wildcard queries have no caching mechanism, performance is poor.

11. Regexp Query - Regular Expression Query

Uses regular expressions for complex matching (performance is worst, use with caution).

json
{
  "query": {
    "regexp": {
      "phone": {
        "value": "09[0-9]{8}"
      }
    }
  }
}

Parameter Explanations:

  • value: Regular expression pattern.
  • flags: Regular expression flags (e.g., COMPLEMENT, INTERVAL), used to enable additional operators.
  • case_insensitive: Whether to ignore case, default is false.
  • max_determinized_states: Maximum number of states, default is 10000. This parameter limits the complexity of the regex engine, preventing overly complex regexes from causing performance issues or memory exhaustion. Throws an exception when the regex is too complex.
  • rewrite: Query rewrite method.

Test Data:

json
// Document 1
{ "phone": "0912345678" }
// Document 2
{ "phone": "0987654321" }
// Document 3
{ "phone": "02-12345678" }

Query Result (Querying "09[0-9]{8}"):

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | Matches 09 start + 8 digits |
| Doc 2 | ✅ | Matches 09 start + 8 digits |
| Doc 3 | ❌ | Format does not match |

Flags Parameter Explanation and Examples

The flags parameter is used to enable additional operators for the Lucene regex engine. The following uses the same test data to show the effects of different flags.

Note: These symbols (~, #, <>, &, @) are Lucene-specific extensions, not standard regex syntax.

Test Data:

json
// Document 1
{ "code": "abc123" }
// Document 2
{ "code": "abc456" }
// Document 3
{ "code": "xyz789" }
// Document 4
{ "code": "def123" }
// Document 5
{ "code": "abc" }

1. COMPLEMENT - Negation Pattern

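Uses the ~ operator to negate the pattern that follows. The operator applies only to the shortest following pattern (a single character unless grouped), so a multi-character pattern must be wrapped in parentheses.

json
{
  "query": {
    "regexp": {
      "code": {
        "value": "abc~(123)",
        "flags": "COMPLEMENT"
      }
    }
  }
}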

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ❌ | "abc123" contains the negated "123" |
| Doc 2 | ✅ | "abc456" matches "abc" followed by something other than "123" |
| Doc 3 | ❌ | Does not start with "abc" |
| Doc 4 | ❌ | Does not start with "abc" |
| Doc 5 | ✅ | "abc" followed by nothing (not "123") |

Text field usage notes:

Be careful with the impact of tokenization when using ~ negation on text fields. For example:

json
// Assuming name field is text type
// Data: { "name": "Wing Chou" }

// Query
{
  "query": {
    "regexp": {
      "name": {
        "value": "~(wing)",
        "flags": "COMPLEMENT"
      }
    }
  }
}

At first glance, you might think this query would exclude "Wing Chou", but in reality:

  • "Wing Chou" is tokenized into ["wing", "chou"].
  • ~(wing) negates "wing", but "chou" still matches.
  • Therefore, "Wing Chou" will still appear in the query results.

It is recommended to use negation operators on keyword fields to avoid unexpected results caused by tokenization.


2. INTERVAL - Numeric Range

Uses the <> operator to match numeric ranges.

json
{
  "query": {
    "regexp": {
      "code": {
        "value": "abc<100-200>",
        "flags": "INTERVAL"
      }
    }
  }
}

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | "abc123" matches abc + number in 100-200 range |
| Doc 2 | ❌ | 456 in "abc456" is out of range |
| Doc 3 | ❌ | Does not start with "abc" |
| Doc 4 | ❌ | Does not start with "abc" |
| Doc 5 | ❌ | No number after "abc" |

3. INTERSECTION - AND Operation

Uses the & operator to match strings that match both patterns.

json
{
  "query": {
    "regexp": {
      "code": {
        "value": "abc.+&.+123",
        "flags": "INTERSECTION"
      }
    }
  }
}

Query Result:

| Document | Matches? | Description |
| --- | --- | --- |
| Doc 1 | ✅ | "abc123" matches both "starts with abc" and "ends with 123" |
| Doc 2 | ❌ | "abc456" does not match "ends with 123" |
| Doc 3 | ❌ | "xyz789" does not match "starts with abc" |
| Doc 4 | ❌ | "def123" does not match "starts with abc" |
| Doc 5 | ❌ | "abc" does not match "ends with 123" |

4. ANYSTRING - Match Any String

Uses the @ operator to match any entire string.

Official Example (paired with exclusion logic):

json
{
  "query": {
    "regexp": {
      "code": {
        "value": "@&~(abc.+)",
        "flags": "ANYSTRING|INTERSECTION|COMPLEMENT"
      }
    }
  }
}

This example matches every term except those that start with "abc" followed by at least one more character.

Note: I could not find any practical difference between @&~(abc.+) and simply using ~(abc.+). If you need this operator, refer to the official documentation or run your own tests to confirm the behavior.


5. EMPTY - Match Nothing

Uses the # operator to represent "match nothing," not even an empty string.

Difference from empty string:

json
// Empty string matches empty data
// ✅ Matches data where code field is empty string
{
  "query": {
    "regexp": {
      "code": {
        "value": ""
      }
    }
  }
}

// # matches nothing
// ❌ Matches nothing (including empty string)
{
  "query": {
    "regexp": {
      "code": {
        "value": "#",
        "flags": "EMPTY"
      }
    }
  }
}

Actual Use Case (.NET Example):

Mainly used when dynamically combining regular expressions in code to avoid accidentally matching empty string data when there are no query conditions.

csharp
// .NET dynamic combination query condition example
List<string> conditions = new();

if (searchByAbc) {
    conditions.Add("abc.*");
}

if (searchByXyz) {
    conditions.Add("xyz.*");
}

// Use # to avoid matching empty string when there are no conditions
string pattern = conditions.Count > 0
    ? string.Join("|", conditions)  // "abc.*|xyz.*"
    : "#";                          // Ensure nothing is matched

SearchRequest searchRequest = new() {
    Query = new RegexpQuery {
        Field = "code",
        Value = pattern,
        Flags = conditions.Count > 0 ? "ALL" : "EMPTY"
    }
};

Notes:

# is a special Lucene operator and cannot be used to match the literal "#" character.

json
// ❌ Error: Cannot be used to query data containing "#" character
// Query data { "code": "#" } → Cannot find
{
  "query": {
    "regexp": {
      "code": {
        "value": "#",
        "flags": "EMPTY"
      }
    }
  }
}

// ❌ Error: Cannot be used to query data containing "#" character
// Query data { "code": "#1" } → Cannot find
{
  "query": {
    "regexp": {
      "code": {
        "value": "#1",
        "flags": "EMPTY"
      }
    }
  }
}

To match the literal "#" character, you need to use backslash escaping (see "Special Character Escaping" section below).


6. Combining Multiple Flags

You can use the | separator to enable multiple operators simultaneously.

json
{
  "query": {
    "regexp": {
      "code": {
        "value": "abc<100-500>",
        "flags": "COMPLEMENT|INTERVAL"
      }
    }
  }
}

Flags Support Options:

  • ALL (Default): Enables all optional operators.
  • NONE: Disables all optional operators.
  • COMPLEMENT: Enables ~ negation operator.
  • INTERVAL: Enables <> range operator.
  • INTERSECTION: Enables & AND operator.
  • ANYSTRING: Enables @ any string operator.
  • EMPTY: Enables # empty language operator (matches nothing).

Special Character Escaping

In the Lucene regex engine, the following characters have special meanings. If you want to use them as ordinary characters, you need to escape them with a backslash \:

Reserved Characters:

text
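. ? + * | { } [ ] ( ) " \

The characters # @ & < > ~ are also reserved whenever the corresponding optional operators are enabled, which is the default (flags: ALL).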

Escaping Example:

json
// ❌ Error: + is a special character
// Query data { "phone": "+886912345678" } → Cannot find
{
  "query": {
    "regexp": {
      "phone": {
        "value": "+886.*"
      }
    }
  }
}

// ✅ Correct: Use backslash to escape
// Query data { "phone": "+886912345678" } → Can find
{
  "query": {
    "regexp": {
      "phone": {
        "value": "\\+886.*"
      }
    }
  }
}

Notes:

Because the backslash itself needs escaping in JSON strings, you need to use a double backslash \\ in JSON queries.

json
// Need to write "\\" in JSON to represent one backslash
{ "value": "\\+886.*" }  // Actual regex is "\+886.*"
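
The two layers of escaping can be verified by decoding the JSON body and looking at the string that is left. The Python sketch below does this with the standard library; the final check uses Python's own re module, which is a different engine from Lucene's and serves here only to show that the escaped plus sign is read as a literal character.

```python
import json
import re

# The request body exactly as it is written in the JSON query (raw string, no Python escaping).
request_body = r'{ "value": "\\+886.*" }'

# After JSON decoding, only a single backslash remains in front of the plus sign.
regex_pattern = json.loads(request_body)["value"]

# The decoded pattern treats "+" as a literal and matches the whole phone number.
phone_matches = re.fullmatch(regex_pattern, "+886912345678") is not None
```

Here regex_pattern holds the seven characters \+886.* and phone_matches is True.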

Anchor Operator Limitation

Lucene's regular expression engine does not support anchor operators, such as ^ (beginning of line) or $ (end of line). To match a term, the regular expression must match the entire string.


This means:

  • ^ and $ do not have special anchor meanings.
  • Regular expressions match the entire field value by default (equivalent to already having anchor effects).
  • Based on testing, ^ and $ are treated as ordinary characters, not anchor operators (using them results in no data found).

Example:

json
// ✅ Correct: Match pattern directly
{ "value": "abc.*" }      // Matches full string starting with abc

// ❌ Not recommended: does not find "abc"; presumably the pattern is matched against the literal strings "^abc" and "abc$"
{ "value": "^abc" }
{ "value": "abc$" }

Performance Notes:

  • Regex query performance is extremely poor, avoid using it as much as possible.
  • Consider using other query methods (e.g., prefix, wildcard) as alternatives.
  • If you must use it, limit the query scope and set a reasonable max_determinized_states.
  • Avoid overly complex regular expressions to prevent triggering max_determinized_states limits.

13. Fuzzy Query - Fuzzy Query

Fault-tolerant query, allows spelling errors. Can be used for text and keyword fields.

Text field example:

json
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "wing",
        "fuzziness": "AUTO"
      }
    }
  }
}

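Effect: Matches terms whose edit distance from wing is within the allowed range. With AUTO, a 4-character term such as wing allows 1 edit.

Example:

  • wing ✓ (Exact match).
  • wang ✓ (1 character difference: i → a).
  • weng ✓ (1 character difference: i → e).
  • king ✓ (1 character difference: w → k; set prefix_length to 1 or more to exclude it).
  • kong ✗ (2 character differences, which exceeds the limit of 1).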

Note: Text fields are processed by an analyzer (tokenization, lowercase conversion), so the fuzzy query is compared against the individual indexed terms:

  • Index: Wing Chou → Tokenized into [wing, chou].
  • Query: wing → Can match the term wing.

Parameter Explanations:

  • value: Term to query (required).
  • fuzziness: Allowed edit distance (AUTO, 0, 1, 2), recommended to use AUTO.
    • AUTO: Automatically determines edit distance based on term length.
    • 0: No errors allowed (equivalent to term query).
    • 1: Allows 1 character difference.
    • 2: Allows 2 character differences.
  • prefix_length: First N characters must match exactly, default is 0.
  • max_expansions: Max candidate terms to expand, default is 50.
  • transpositions: Whether to allow adjacent character swaps (e.g., ab → ba), default is true.
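
To make AUTO concrete, here is a small Python sketch of its default length thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). The function name auto_fuzziness is hypothetical and only mirrors the documented rule; it is not Elasticsearch code.

```python
def auto_fuzziness(term: str) -> int:
    """Maximum edits allowed by fuzziness AUTO with its default thresholds."""
    length = len(term)
    if length <= 2:
        return 0  # 0-2 characters: must match exactly
    if length <= 5:
        return 1  # 3-5 characters: one edit allowed
    return 2      # 6 or more characters: two edits allowed


for term in ["ab", "wing", "quikc", "Wing Chow"]:
    print(term, auto_fuzziness(term))
# ab 0
# wing 1
# quikc 1
# Wing Chow 2
```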

Complete Example:

json
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "quikc",
        "fuzziness": "AUTO",
        "prefix_length": 2,
        "max_expansions": 10,
        "transpositions": true
      }
    }
  }
}

Parameter Effects:

prefix_length = 2 (First 2 characters must match):

  • quick ✓ (Starts with qu, matches prefix).
  • quikc ✓ (Starts with qu, matches prefix).
  • xuick ✗ (Starts with xu, does not match prefix qu).

max_expansions = 10 (Max 10 candidate terms):

Assuming the index contains 20+ similar terms (quick, quit, quiz, quiet, quiche...), Elasticsearch only takes the first 10 candidate terms for searching, ignoring the rest.

Purpose: Limiting expansion count improves query performance, avoiding resource consumption from too many candidate terms.

transpositions = true (Allows adjacent character swaps):

  • quick ✓ (kc ↔ ck, the swap counts as 1 edit from the query value quikc).
  • qukic ✓ (ik ↔ ki, the swap counts as 1 edit from the query value quikc).

transpositions = false (No swaps allowed):

json
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "qiuck",
        "fuzziness": 1,
        "transpositions": false
      }
    }
  }
}

  • Indexed term quick ✗ (the iu ↔ ui swap is not allowed, so it takes 2 edits: substitute i → u, then u → i).
  • Indexed term quck ✓ (requires only 1 edit: delete i).

Keyword field example:

json
{
  "query": {
    "fuzzy": {
      "name.keyword": {
        "value": "Wing Chow",
        "fuzziness": "AUTO"
      }
    }
  }
}

Effect: Performs fuzzy matching on the complete keyword value.

Example:

  • Wing Chou ✓ (1 character difference: w → u).
  • Wing Chow ✓ (Exact match).
  • Wing Zhou ✓ (2 character differences: C → Z, w → u).
  • John Wang ✗ (Too much difference).

Usage Recommendations:

For text fields:

  • Recommended to use match query paired with fuzziness parameter, rather than using fuzzy query directly.
  • Reason: match query is processed by an analyzer (tokenization, lowercase, etc.), which better fits actual search requirements.

Example Comparison:

Scenario: Index contains document name = "Wing Chou" (text field)

→ After analyzer processing, the terms in the index are: ["wing", "chou"] (lowercased, tokenized)


Example 1: fuzziness = 0 (Must match exactly)

json
// Not recommended: Use fuzzy directly (text field)
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "Wing",  // Will not be analyzed, matches "Wing" directly
        "fuzziness": 0
      }
    }
  }
}

  • Query term: Wing (uppercase W).
  • Index term: wing (lowercase w).
  • fuzziness = 0 means must match exactly.
  • Result: ✗ Cannot find (Wing ≠ wing, case differs).

json
// Recommended: Use match (text field)
{
  "query": {
    "match": {
      "name": {
        "query": "Wing",  // Will be processed by analyzer, converted to "wing"
        "fuzziness": 0
      }
    }
  }
}

  • Query term: Wing → Analyzer → wing (lowercase).
  • Index term: wing (lowercase).
  • fuzziness = 0 means must match exactly.
  • Result: ✓ Can find (matches exactly).
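
The difference in Example 1 can be imitated in a few lines of Python. This is only a sketch: simple_analyze is a hypothetical, very rough stand-in for the standard analyzer (split on non-word characters, lowercase), and the set lookup stands in for an exact term comparison with fuzziness 0.

```python
import re


def simple_analyze(text: str) -> list[str]:
    """Rough stand-in for an analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Terms stored in the index for name = "Wing Chou".
indexed_terms = set(simple_analyze("Wing Chou"))

# fuzzy with fuzziness 0: the query value is compared as-is.
fuzzy_hit = "Wing" in indexed_terms

# match with fuzziness 0: the query text is analyzed first.
match_hit = any(term in indexed_terms for term in simple_analyze("Wing"))

print(sorted(indexed_terms))  # ['chou', 'wing']
print(fuzzy_hit, match_hit)   # False True
```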

Example 2: fuzziness = 1 (Allows 1 character difference)

json
// Not recommended: Use fuzzy directly (text field)
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "wing chuo",  // No tokenization, queries "wing chuo" as a complete term
        "fuzziness": 1
      }
    }
  }
}

  • Query term: wing chuo (complete string).
  • Index term: wing, chou (tokenized).
  • Result: ✗ Cannot find (no "wing chuo" term in index).

json
// Recommended: Use match + fuzziness (text field)
{
  "query": {
    "match": {
      "name": {
        "query": "wing chuo",  // Tokenized into ["wing", "chuo"], fuzzy match for each term
        "fuzziness": 1
      }
    }
  }
}

  • Query term: wing chuo → Analyzer → ["wing", "chuo"].
  • Index term: wing, chou.
  • Result: ✓ Can find (wing matches exactly; chuo is 1 edit away from chou, an adjacent swap of u and o).

For keyword fields:

  • Can use fuzzy query directly.
  • Because keyword fields are not analyzed, fuzzy matching against the complete value is reasonable.

Summary of When to Use Each:

  • Text fields: Prioritize match + fuzziness.
  • Keyword fields: Can use fuzzy query.
  • Need direct term matching (no analysis needed): Use fuzzy query.

Edit Distance Explanation:

Edit distance (Levenshtein distance) is the minimum number of operations required to convert one string into another. Allowed operations include:

  • Insert a character: quic → quick (insert k).
  • Delete a character: quickk → quick (delete k).
  • Substitute a character: quikk → quick (substitute k → c).
  • Swap adjacent characters (requires transpositions = true): qiuck → quick (swap iu → ui).
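
The four operations can be checked with a short dynamic-programming sketch in Python. This is an illustrative implementation (the optimal string alignment variant), not the automaton-based approach Lucene uses internally; the function name and the transpositions flag are modeled on the parameter above.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance by dynamic programming; swaps cost 1 when enabled."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("quic", "quick"))                          # 1
print(edit_distance("qiuck", "quick"))                         # 1
print(edit_distance("qiuck", "quick", transpositions=False))   # 2
```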

For detailed fuzziness parameter explanations, please refer to the "Match Query" section.

14. IDs Query - Query by Document ID

Queries directly by document _id.

json
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}

Use Cases:

  • Query by known document ID.
  • Batch query specific documents.
  • Use in combination with other queries.
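
As one possible way to combine it with other queries, the sketch below places an ids query in the filter clause of a bool query, so a match query only runs against a known set of documents. The helper name ids_with_match and the title field and search text are assumptions made for illustration.

```python
import json


def ids_with_match(doc_ids, field, text):
    """Hypothetical helper: restrict a match query to a known set of document IDs."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                # filter clauses restrict results without affecting the score
                "filter": [{"ids": {"values": list(doc_ids)}}],
            }
        }
    }


body = ids_with_match(["1", "2", "3"], "title", "Product")
print(json.dumps(body, indent=2))
```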

15. Nested Query - Nested Object Query

Queries fields of the nested type. It works only on nested fields, not on object fields, and it preserves the relationship between the fields inside each array element.

Mapping Definition:

json
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "comments": {
        "type": "nested",
        "properties": {
          "author": { "type": "keyword" },
          "rating": { "type": "integer" },
          "text": { "type": "text" }
        }
      }
    }
  }
}

Basic Query:

json
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "match": { "comments.author": "John" }},
            { "range": { "comments.rating": { "gte": 4 }}}
          ]
        }
      }
    }
  }
}

Parameter Explanations:

  • path: Path to the nested object (required).
  • query: Query to execute within the nested object (required).
  • score_mode: How the scores of the matching nested objects are combined into the parent document's score, default is avg.
    • avg: Average score (default).
    • sum: Sum.
    • max: Maximum score.
    • min: Minimum score.
    • none: Do not calculate score (set to 0).
  • ignore_unmapped: Whether to ignore an unmapped path and return no documents instead of an error, default is false.

Test Data:

json
// Document 1
{
  "title": "Product A",
  "comments": [
    { "author": "John", "rating": 5, "text": "Great!" },
    { "author": "Jane", "rating": 3, "text": "OK" }
  ]
}

// Document 2
{
  "title": "Product B",
  "comments": [
    { "author": "John", "rating": 2, "text": "Poor" },
    { "author": "Bob", "rating": 5, "text": "Excellent" }
  ]
}

Query Result (author = "John" AND rating >= 4):

Document | Matches? | Description
Doc 1    | ✓        | John's rating is 5 (>= 4)
Doc 2    | ✗        | John's rating is 2 (< 4)

Why is Nested Query needed?

Problem: Object type flattens arrays

If comments is an object type (default), Elasticsearch flattens the array, losing the relationship between elements:

json
// Original data
{
  "title": "Product A",
  "comments": [
    { "author": "John", "rating": 5 },
    { "author": "Jane", "rating": 3 }
  ]
}

// After flattening (relationship lost)
{
  "title": "Product A",
  "comments.author": ["John", "Jane"],
  "comments.rating": [5, 3]
}

Example: Incorrect query result (using object type)

Query for products where "John gave 3 stars":

json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "comments.author": "John" }},
        { "term": { "comments.rating": 3 }}
      ]
    }
  }
}

Result: ✓ Will find Document 1 (❌ But this is wrong! John gave 5 stars, not 3).

Reason: Elasticsearch only knows author has "John" and rating has 3, but doesn't know "John corresponds to 5 stars."


Solution: Use nested type + nested query

When comments is defined as a nested type, Elasticsearch internally stores each array element as an independent sub-document (but it is still one document to the user):

json
// What you see: one document
{
  "title": "Product A",
  "comments": [
    { "author": "John", "rating": 5 },
    { "author": "Jane", "rating": 3 }
  ]
}

// Elasticsearch internal storage structure (hidden, user cannot see):
// ├─ Main document: { "title": "Product A" }
// ├─ Sub-document 1: { "author": "John", "rating": 5 }
// └─ Sub-document 2: { "author": "Jane", "rating": 3 }

Key Points:

  • To you, it's still one document.
  • Elasticsearch automatically handles sub-document relationships.
  • When querying, use nested query to ensure conditions match "within the same sub-document."

Query for products where "John gave 3 stars":

json
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.author": "John" }},
            { "term": { "comments.rating": 3 }}
          ]
        }
      }
    }
  }
}

Result: ✗ Cannot find (✓ Correct! John gave 5 stars, not 3).
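
The two behaviors can be imitated in plain Python. This is only a simulation of the matching logic, not how Elasticsearch stores data; the flatten helper is hypothetical and mirrors the flattened structure shown earlier.

```python
def flatten(doc: dict, array_field: str) -> dict:
    """Mimic object-type flattening: one multi-valued field per sub-key."""
    flat = {key: value for key, value in doc.items() if key != array_field}
    for item in doc[array_field]:
        for key, value in item.items():
            flat.setdefault(f"{array_field}.{key}", []).append(value)
    return flat


doc = {
    "title": "Product A",
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Jane", "rating": 3},
    ],
}
flat = flatten(doc, "comments")

# object type: each condition is checked against the whole value list.
object_match = "John" in flat["comments.author"] and 3 in flat["comments.rating"]

# nested type: both conditions must hold inside the same array element.
nested_match = any(
    c["author"] == "John" and c["rating"] == 3 for c in doc["comments"]
)

print(object_match, nested_match)  # True False
```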


score_mode parameter example:

When multiple nested objects in a document match the query, score_mode determines how to calculate the final score for that document.

Test Data:

json
// Document 1
{
  "title": "Product A",
  "comments": [
    { "author": "Alice", "rating": 5, "text": "Excellent" },
    { "author": "Bob", "rating": 4, "text": "Good" },
    { "author": "Charlie", "rating": 3, "text": "Average" }
  ]
}

// Document 2
{
  "title": "Product B",
  "comments": [
    { "author": "David", "rating": 5, "text": "Perfect" }
  ]
}

Query:

json
{
  "query": {
    "nested": {
      "path": "comments",
      "score_mode": "max",
      "query": {
        "range": { "comments.rating": { "gte": 3 }}
      }
    }
  }
}

Result Comparison (Assuming each matching comment has a score of 1.0):

Document | Matching comment count | max | avg | sum | min
Doc 1    | 3                      | 1.0 | 1.0 | 3.0 | 1.0
Doc 2    | 1                      | 1.0 | 1.0 | 1.0 | 1.0

Explanation:

  • Using sum, Document 1's score will be higher (because there are 3 matching comments).
  • Using max, avg, or min, both documents have the same score.
  • This affects sorting results.

Purpose:

  • Using sum allows documents with "more matching comments" to be sorted higher.
  • Using max only considers the "most relevant comment."
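
The arithmetic behind each mode can be sketched in Python. The function combine_scores is hypothetical and only reproduces the calculation under the same assumption as above, that every matching comment scores 1.0.

```python
def combine_scores(scores: list[float], score_mode: str = "avg") -> float:
    """Combine the scores of matching nested objects into one parent score."""
    if score_mode == "none":
        return 0.0  # scores of nested objects are ignored
    combine = {
        "avg": lambda s: sum(s) / len(s),
        "sum": sum,
        "max": max,
        "min": min,
    }
    return float(combine[score_mode](scores))


doc1 = [1.0, 1.0, 1.0]  # three matching comments
doc2 = [1.0]            # one matching comment

for mode in ("max", "avg", "sum", "min"):
    print(mode, combine_scores(doc1, mode), combine_scores(doc2, mode))
# max 1.0 1.0
# avg 1.0 1.0
# sum 3.0 1.0
# min 1.0 1.0
```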

Advanced: Use inner_hits to fetch matching nested objects

Sometimes you don't just want to know "which document matches," but also "which nested object in the document matches."

json
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.author": "John" }},
            { "range": { "comments.rating": { "gte": 4 }}}
          ]
        }
      },
      "inner_hits": {}
    }
  }
}

Explanation:

  • inner_hits is an object-type parameter.
  • Using an empty object {} means using default settings.
  • inner_hits supports various parameters (e.g., size, from, _source, etc.), but they are outside the scope of this note.

Return Result:

json
{
  "hits": {
    "hits": [
      {
        "_source": {
          "title": "Product A",
          "comments": [
            { "author": "John", "rating": 5, "text": "Great!" },
            { "author": "Jane", "rating": 3, "text": "OK" }
          ]
        },
        "inner_hits": {
          "comments": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "author": "John",
                    "rating": 5,
                    "text": "Great!"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

Purpose: You can clearly see specifically which comment matches the condition, rather than the entire comments array.
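
On the client side, the matching comments can then be pulled out of the response. The sketch below works on a response already parsed into a Python dict with the structure shown above; matching_comments is a hypothetical helper and the sample response is abbreviated.

```python
def matching_comments(response: dict, path: str = "comments") -> list[dict]:
    """Collect the _source of every inner hit under the given nested path."""
    matched = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matched.append(inner_hit["_source"])
    return matched


# Abbreviated sample shaped like the return result above.
response = {
    "hits": {
        "hits": [
            {
                "_source": {"title": "Product A"},
                "inner_hits": {
                    "comments": {
                        "hits": {
                            "hits": [
                                {"_source": {"author": "John", "rating": 5, "text": "Great!"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(matching_comments(response))
# [{'author': 'John', 'rating': 5, 'text': 'Great!'}]
```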


object vs nested quick comparison:

Feature        | object (default)                            | nested
Array handling | Flattened (relationship lost)               | Maintained (relationship preserved)
Query method   | General query (match, term, bool...)        | Must use nested query
Use case       | Single object or array without relationship | Need to preserve array element relationship
Performance    | Better                                      | Worse (extra overhead)

Usage Recommendations:

Use nested when:

  • The field is an array.
  • You need to query multiple conditions "within the same array element."
  • You need to preserve the relationship between array elements.

Example Scenarios:

  • Order's product list (product name + price must correspond).
  • Employee's project experience (project name + role must correspond).
  • Product reviews (reviewer + rating must correspond).

Use object when:

  • The field is not an array.
  • Array elements do not need to preserve relationships.
  • You are pursuing better query performance.

Changelog

    • Initial document creation.