Elasticsearch Query DSL Syntax Notes

TLDR

  • Query DSL Advantages: Compared to Query String, DSL supports Nested queries, geospatial queries, custom scoring, and complex boolean logic, with a clear structure that is easy to maintain.
  • Match Query: The core of full-text search; minimum_should_match is only effective under OR logic, with a floor value of 1.
  • Multi Match: Provides various strategies such as best_fields (default, takes the highest score), most_fields (sums scores), and cross_fields (cross-field search).
  • Combined Fields: Term-centric; treats multiple text fields as a single field, suitable for cross-field searching.
  • Nested Query: Solves the loss of correlation caused by the flattening of object types; must be used for nested type fields.
  • Date Range Query: It is recommended to use string formats consistently and utilize Date Math (e.g., ||/d) for rounding to avoid parsing errors caused by mixing numeric and string types.
  • Performance Warning: wildcard and regexp queries have poor performance; avoid using leading wildcards or complex regular expressions.

Query DSL vs Query String

In production environments, Query DSL (Domain Specific Language) provides more powerful features and clearer error feedback than Query String due to its JSON-structured nature.

1. Functional Differences

Certain advanced features can only be implemented via Query DSL:

  • Nested Queries: Preserves the correlation of fields within nested objects.
  • Geospatial Queries: Such as geo_distance.
  • Custom Scoring: Uses function_score to customize relevance scoring.
  • Complex Boolean Logic: Combines must, should, must_not, and filter via bool.
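
To make the four bool clause types concrete, here is a minimal sketch of a request body built as a Python dict. The field names (title, tags, status, price) and their values are hypothetical and only serve as an illustration; the dict serializes to the JSON that would be sent to the _search endpoint.

```python
import json

# Hypothetical fields: title (text), tags/status (keyword), price (numeric).
bool_query = {
    "query": {
        "bool": {
            # must: required, contributes to the relevance score
            "must": [{"match": {"title": "wireless headphones"}}],
            # should: optional, raises the score when it matches
            "should": [{"term": {"tags": "featured"}}],
            # must_not: excludes documents, does not affect the score
            "must_not": [{"term": {"status": "discontinued"}}],
            # filter: required, but evaluated without scoring
            "filter": [{"range": {"price": {"lte": 200}}}],
        }
    }
}

request_body = json.dumps(bool_query, indent=2)
print(request_body)
```

Putting exact-value conditions in filter rather than must keeps them out of the score calculation.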

Common Query DSL Syntax

The match query is used for full-text search; it analyzes (tokenizes) the query text and scores results by relevance.

minimum_should_match Parameter

This parameter controls the minimum number of terms that must match. It only takes effect when operator is "OR"; with "AND", every term is already required.

  • Special Rules: The minimum match count has a floor of 1. Even a value that computes to 0 or less, such as -4 or -100% for four terms, still requires at least 1 term to match.
  • Percentage Calculation: The result is rounded down (floor). For example, with 4 terms, 75% gives 4 × 0.75 = 3.0, so at least 3 must match; 74% gives 4 × 0.74 = 2.96, which rounds down to 2.
  • Multi-condition Combination: A value like 2<-25% 9<-3 means that with ≤ 2 tokens, all of them must match; with 3-9 tokens, at most 25% can be missing; with > 9 tokens, at most 3 can be missing.
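
The rules above can be checked with a small calculator. This is only a sketch of the documented behavior as described in these notes, not Elasticsearch's actual implementation; the function name required_matches is hypothetical, and it assumes conditional specs are listed in ascending threshold order.

```python
def required_matches(spec: str, clause_count: int) -> int:
    """Estimate how many optional terms must match for a minimum_should_match spec."""

    def simple(part: str) -> int:
        if part.endswith("%"):
            percent = int(part[:-1])
            # Integer division rounds the percentage result down.
            portion = (clause_count * abs(percent)) // 100
            # A negative percentage is the share that may be missing.
            return clause_count - portion if percent < 0 else portion
        number = int(part)
        # A negative integer is the number of terms that may be missing.
        return clause_count + number if number < 0 else number

    if "<" not in spec:
        result = simple(spec.strip())
    else:
        result = clause_count  # at or below every threshold: all terms required
        for condition in spec.split():
            threshold, sub_spec = condition.split("<", 1)
            if clause_count > int(threshold):
                result = simple(sub_spec)

    # Never lower than 1, never higher than the number of terms.
    return max(1, min(clause_count, result))


print(required_matches("75%", 4))           # 3
print(required_matches("74%", 4))           # 2
print(required_matches("-100%", 4))         # 1
print(required_matches("2<-25% 9<-3", 12))  # 9
```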

lenient Parameter

Controls behavior when the query value does not match the field's type (for example, a text value sent to a numeric field):

  • false (default): Throws an error, query fails.
  • true: Ignores the query for that field, does not throw an error, but the field will have no matching results.
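
As an illustration, the following sketch sends an unparsable value to a numeric field with lenient enabled. The field name age and its integer mapping are assumptions made for this example.

```python
import json

# Assumed mapping: "age" is an integer field, so "abc" cannot be parsed.
lenient_match = {
    "query": {
        "match": {
            "age": {
                "query": "abc",
                # Ignore the format error instead of failing the whole request.
                "lenient": True,
            }
        }
    }
}

lenient_body = json.dumps(lenient_match)
print(lenient_body)
```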

The multi_match query searches for the same query text across multiple fields. The type parameter selects the strategy:

  • best_fields: Takes the score of the highest-scoring field (default).
  • most_fields: Sums the scores of all fields.
  • cross_fields: Treats multiple fields as one large field, suitable for cross-field matching like names or addresses.

WARNING

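The cross_fields type works in term-centric mode only across fields that share the same analyzer. When the fields' search_analyzer settings differ, Elasticsearch groups the fields by analyzer and evaluates each group separately, so with operator set to and, all terms may have to appear within a single group (possibly a single field).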
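
A small helper makes it easy to compare the strategies side by side. This is a sketch only: the helper name multi_match and the fields first_name and last_name are hypothetical, while type and operator are parameters of the multi_match query itself.

```python
import json

def multi_match(text, fields, match_type="best_fields", operator="or"):
    """Build a multi_match request body for the given strategy."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": match_type,
                "operator": operator,
            }
        }
    }

# cross_fields with "and": each term must be found in at least one of the fields.
name_search = multi_match("Will Smith", ["first_name", "last_name"], "cross_fields", "and")
print(json.dumps(name_search, indent=2))
```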


The combined_fields query adopts a term-centric approach, treating multiple text fields as a single combined field.

  • Limitations: All fields must be of text type and use the same search_analyzer.
  • Execution Logic: With operator set to and, each term must appear in at least one field (the terms can be distributed across different fields). With the default or, matching any single term is enough.
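
A minimal sketch of a combined_fields request follows. The field names title, abstract, and body are assumptions; per the limitation above, all of them would need to be text fields with the same search_analyzer.

```python
import json

# Assumed text fields sharing one search_analyzer: title, abstract, body.
combined_query = {
    "query": {
        "combined_fields": {
            "query": "database systems",
            "fields": ["title", "abstract", "body"],
            # "and": every term must appear in at least one of the fields.
            "operator": "and",
        }
    }
}

combined_body = json.dumps(combined_query, indent=2)
print(combined_body)
```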

The range query is used for numeric and date ranges.

  • Date Format Pitfalls: If the index mapping specifies a format, the query parameters must align with it, or use the format parameter to override.
  • Mixing Numeric and String Types: When no format is specified, numeric values on a date field are interpreted as millisecond timestamps (so 2025 is not the year 2025); it is recommended to use string formats (e.g., "2025-01-01") consistently to avoid parsing errors.
  • Time Precision Issues: A partial value such as 2023-01-15T08 (hour only) is not treated as "the whole hour"; Elasticsearch fills in the missing components of the query value, so the effective bound may differ from what you expect. It is recommended to explicitly specify the full time or use the Date Math rounding function (e.g., ||/h).
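
The recommendations above can be combined in one request: string bounds, an explicit format, and Date Math rounding. This is a sketch under assumptions; the field name created_at is hypothetical, and the bounds are chosen to cover the whole day of 2025-01-01.

```python
import json

# Hypothetical date field: created_at.
range_query = {
    "query": {
        "range": {
            "created_at": {
                # Strings only: a bare number would be read as epoch milliseconds.
                "gte": "2025-01-01||/d",     # rounded down to the start of the day
                "lt": "2025-01-01||+1d/d",   # start of the following day
                # Tells Elasticsearch how to parse the anchor dates above.
                "format": "yyyy-MM-dd",
            }
        }
    }
}

bounds = range_query["query"]["range"]["created_at"]
print(json.dumps(range_query, indent=2))
```

Using gte with lt (rather than lte) avoids having to spell out the last millisecond of the day.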

The nested query is used to query nested type fields; it solves the loss of correlation caused by the flattening of arrays of object type.

  • When the issue arises: The data is an array of objects (e.g., a list of products in an order) and you need the "product name" and its "corresponding price" to match within the same array element.
  • Solution: Define the field as a nested type and use the nested query.
```json
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.author": "John" }},
            { "term": { "comments.rating": 3 }}
          ]
        }
      }
    }
  }
}
```
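
To see why flattening loses the correlation, the following sketch imitates both behaviors in plain Python. It is a simplified model, not how Elasticsearch stores data internally; the sample document and the helper names are made up for the illustration.

```python
# Hypothetical document: John rated 5, Alice rated 3.
doc = {
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Alice", "rating": 3},
    ]
}

def flatten(objects, prefix):
    """Imitate object-type indexing: one multi-valued field per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat

def object_match(document, author, rating):
    """Each condition is checked against the flattened value lists independently."""
    flat = flatten(document["comments"], "comments")
    return author in flat["comments.author"] and rating in flat["comments.rating"]

def nested_match(document, author, rating):
    """Both conditions must hold within the same array element."""
    return any(
        c["author"] == author and c["rating"] == rating
        for c in document["comments"]
    )

print(flatten(doc["comments"], "comments"))
print(object_match(doc, "John", 3))  # True: a false positive across elements
print(nested_match(doc, "John", 3))  # False: no single comment is John with rating 3
```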

The fuzzy query provides fault-tolerant search that allows for spelling errors.

  • Recommended Practice: For text fields, prefer the match query with the fuzziness parameter over the fuzzy query. The match query runs the query text through the analyzer before applying fuzziness, whereas fuzzy is a term-level query that compares the input as-is against indexed terms, so match usually fits search requirements better.
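
A minimal sketch of the recommended form is shown here. The field name title and the misspelled input are hypothetical; "AUTO" lets Elasticsearch choose the allowed edit distance based on term length.

```python
import json

# Hypothetical text field: title. The query text contains a typo on purpose.
fuzzy_match = {
    "query": {
        "match": {
            "title": {
                "query": "elasticsaerch",
                # Allowed edit distance is derived from each term's length.
                "fuzziness": "AUTO",
            }
        }
    }
}

fuzzy_body = json.dumps(fuzzy_match, indent=2)
print(fuzzy_body)
```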

The regexp query has the worst performance of the queries in these notes and should be avoided as much as possible.

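  • Anchor Limitations: Does not support the ^ and $ anchor operators; the regular expression is implicitly anchored and must match the entire term. To match part of a term, add .* on the relevant side explicitly.
  • Special Characters: To match special characters like # literally, escape them with a backslash. Inside a JSON request body the backslash itself must be escaped, so it is written as a double backslash \\ (e.g., \\#).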
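
The escaping rule is easiest to verify by serializing a query and inspecting the result. In this sketch the field name tag and the pattern are hypothetical; the point is that the single backslash in the regular expression becomes a double backslash in the JSON body.

```python
import json

# Regular expression as Elasticsearch should see it: release\#[0-9]+
# (a raw string keeps the single backslash intact in Python).
pattern = r"release\#[0-9]+"

# Hypothetical keyword field: tag. The pattern must cover the whole term.
regexp_query = {"query": {"regexp": {"tag": {"value": pattern}}}}

# json.dumps escapes the backslash, producing \\# in the request body.
regexp_body = json.dumps(regexp_query)
print(regexp_body)
```

Building the body with a JSON serializer rather than by hand avoids counting backslashes manually.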

Change Log

  • 2025-11-04 Initial document created.