Aggregations

Aggregations calculate statistical summaries about the entire set of search results. They are applied after the full text query and any filters.

They allow you to group the unique elements for a specific field into buckets.

Example

You can group all items by their file extension (ext field) and get a summary similar to:

{
	"aggregations": {
		"terms": {
			"file.extension": {
				"buckets": [
					{
						"key": "mp4",
						"count": 202
					},
					{
						"key": "csv",
						"count": 5
					},
					{
						"key": "mov",
						"count": 2
					},
					{
						"key": "pptx",
						"count": 6
					},
					{
						"key": "xlsx",
						"count": 7
					},
					{
						"key": "xml",
						"count": 22
					}
				],
				"othersCount": 36
			}
		}
	}
}

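The above example indicates that, after applying any filters, there are 202 items with mp4 as their extension, 5 with csv, and so on. The othersCount value indicates that approximately 36 further items have extensions that are not listed in any bucket.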

Requesting aggregations

To request an aggregation, add the aggregations field to the search request object.

{
	"aggregations": {
		"terms": [{terms}],
		"metrics": [{metrics}],
		"histograms": [{histograms}]
	}
}

terms - (array of objects) Aggregates the frequency of specified terms
metrics - (array of objects) Requests statistical information about a field
histograms - (array of objects) Generates histogram data for a given field
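As a minimal sketch of how such a request body could be assembled in client code, the Python below builds the aggregations object from whichever aggregation kinds are supplied. The helper name build_search_request is hypothetical and not part of the Search API; the field names are taken from the examples on this page.

```python
import json


def build_search_request(query, **aggregation_lists):
    """Assemble a search request body (hypothetical client-side helper).

    Each keyword argument is an aggregation kind (for example terms or
    metrics) mapped to a list of aggregation objects. Empty lists are skipped.
    """
    body = {"query": query}
    aggregations = {
        kind: list(items) for kind, items in aggregation_lists.items() if items
    }
    if aggregations:
        body["aggregations"] = aggregations
    return body


request_body = build_search_request(
    "{query}",
    terms=[{"name": "countries", "field": "geocoding.country_code.raw"}],
    metrics=[{"name": "max(file.size)", "field": "file.size", "type": "max"}],
)
print(json.dumps(request_body, indent=2))
```

Omitting every aggregation kind leaves the aggregations field out of the body entirely, so the same helper can serve plain searches.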

Aggregation results

Aggregation results are provided in the response object under the aggregations object.

See below for specific examples.

Aggregation names

When requesting an aggregation, you provide a name field. The Search API returns the result of the aggregation along with this name, allowing you to find the specific aggregation.
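The name-based lookup could be sketched as follows in Python, assuming the response layout used by the examples on this page (aggregations, then the aggregation kind, then the name). The helper find_aggregation and the sample response are illustrative only.

```python
def find_aggregation(response, kind, name):
    """Return the result stored under aggregations -> kind -> name, or None."""
    return response.get("aggregations", {}).get(kind, {}).get(name)


# Hand-written sample mirroring the response examples on this page.
sample_response = {
    "aggregations": {
        "metrics": {"max(file.size)": 100},
        "terms": {
            "countries": {
                "buckets": [{"key": "UK", "count": 200}],
                "othersCount": 10,
            }
        },
    }
}

print(find_aggregation(sample_response, "metrics", "max(file.size)"))
print(find_aggregation(sample_response, "terms", "missing"))
```

Returning None for an unknown name lets the caller distinguish a missing aggregation from a legitimate result of 0.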

Metrics aggregations (`metrics`)

Metrics aggregations request statistical information about a field.

{
	"name": "{name}",
	"field": "{field}",
	"type": "{type}"
}

name - (string) User defined name of the aggregation
field - (string) The field to apply the aggregation on
type - (string) The type of the metric aggregation to perform

Metric aggregation types

The type field of a metric aggregation indicates what kind of metric you are interested in.

Valid types are:

min - The minimum value of the field across the result set
max - The maximum value of the field across the result set
avg - The arithmetic mean of the field's values across the result set
sum - The sum of the field's values across the result set
cardinality - The number of distinct values of the field in the result set
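To make the meaning of each type concrete, here is a small local illustration in Python that computes the same five metrics over an in-memory list of values. It only demonstrates the definitions above; it is not how the Search API computes them, and compute_metric is a hypothetical name.

```python
def compute_metric(metric_type, values):
    """Compute one metric over a non-empty list of numeric values."""
    if metric_type == "min":
        return min(values)
    if metric_type == "max":
        return max(values)
    if metric_type == "avg":
        return sum(values) / len(values)
    if metric_type == "sum":
        return sum(values)
    if metric_type == "cardinality":
        return len(set(values))  # number of distinct values
    raise ValueError(f"unknown metric type: {metric_type}")


file_sizes = [0, 50, 100, 50]  # made-up sample values
for metric_type in ("min", "max", "avg", "sum", "cardinality"):
    print(metric_type, compute_metric(metric_type, file_sizes))
```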

Metric aggregation example

To get the minimum, maximum and average file sizes of items from a query, you would make the following request:

POST /api/data/search
{
	"query": "{query}",
	"aggregations": {
		"metrics": [
			{"type": "min", "field": "file.size", "name": "min(file.size)"},
			{"type": "max", "field": "file.size", "name": "max(file.size)"},
			{"type": "avg", "field": "file.size", "name": "avg(file.size)"}
		]
	}
}

The results will be in the aggregations object of the response, keyed by the name:

{
	//... other search items
	"aggregations": {
		"metrics": {
			"min(file.size)": 0,
			"max(file.size)": 100,
			"avg(file.size)": 50
		}
	}
}
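The request above could be prepared from Python roughly as follows. This is a sketch under stated assumptions: the host in BASE_URL is a placeholder, the Content-Type header is assumed, and authentication is not covered on this page, so the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host (assumption)


def make_metrics_request(query, field, types=("min", "max", "avg")):
    """Build a POST request asking for several metrics on one field.

    Each aggregation is named "<type>(<field>)", the convention used in the
    example above, so results can be looked up by that name.
    """
    metrics = [
        {"type": t, "field": field, "name": f"{t}({field})"} for t in types
    ]
    body = {"query": query, "aggregations": {"metrics": metrics}}
    return urllib.request.Request(
        BASE_URL + "/api/data/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = make_metrics_request("{query}", "file.size")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```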

Terms aggregations (`terms`)

Terms aggregations allow you to get the frequency of different terms within a specific field.

{
	"name": "{name}",
	"field": "{field}",
	"size": {size}
}

name - (string) User defined name of the aggregation
field - (string) The field on which to apply the aggregation
size - (integer) Number of buckets that will be returned (by default 10)

The results are returned as a set of buckets, one per term, each with its count.

Only the most frequent terms are included in the results (up to size buckets); the Search API decides which terms are relevant enough to return. The othersCount property gives an approximate count of the remaining items that are not included in any bucket.
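The relationship between size, the buckets and othersCount can be illustrated locally in Python. This sketch counts exactly over an in-memory list, whereas the value reported by the Search API is described as approximate; the function name terms_aggregation is hypothetical.

```python
from collections import Counter


def terms_aggregation(values, size=10):
    """Return the `size` most frequent terms plus a count of everything else."""
    counts = Counter(values)
    top = counts.most_common(size)
    buckets = [{"key": key, "count": count} for key, count in top]
    others = sum(counts.values()) - sum(count for _, count in top)
    return {"buckets": buckets, "othersCount": others}


# Made-up sample data: 5 mp4, 3 xml, 2 csv and 1 mov item.
extensions = ["mp4"] * 5 + ["xml"] * 3 + ["csv"] * 2 + ["mov"]
result = terms_aggregation(extensions, size=2)
print(result)
```

With size=2 only mp4 and xml get their own bucket, and the three remaining items (csv and mov) are folded into othersCount.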

Terms aggregation example

To get a breakdown of all countries, you could ask for the terms from the geocoding.country_code.raw field:

POST /api/data/search
{
	"query": "{query}",
	"aggregations": {
		"terms": [
			{"name": "countries", "field": "geocoding.country_code.raw"}
		]
	}
}

The results might be something like this:

{
	//... other search items
	"aggregations": {
		"terms": {
			"countries": {
				"buckets": [
					{"key":"UK", "count":200},
					{"key":"USA", "count":100},
					{"key":"Germany", "count":50}
				],
				"othersCount": 10
			}
		}
	}
}

buckets - (array of objects) The top terms (key) along with their frequency (count)
key - (string) The term value
count - (int) The number of items that contain that term
othersCount - (int) Approximate number of items that were not included in any bucket
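Once a terms result is in hand, a client might turn the counts into proportions. The Python below is a sketch using the sample result above; bucket_shares is a hypothetical helper, and because othersCount is an estimate the resulting shares are estimates too.

```python
def bucket_shares(terms_result):
    """Map each bucket key to its share of all counted items."""
    buckets = terms_result["buckets"]
    total = sum(b["count"] for b in buckets) + terms_result.get("othersCount", 0)
    if total == 0:
        return {}
    return {b["key"]: b["count"] / total for b in buckets}


countries = {
    "buckets": [
        {"key": "UK", "count": 200},
        {"key": "USA", "count": 100},
        {"key": "Germany", "count": 50},
    ],
    "othersCount": 10,
}

for key, share in bucket_shares(countries).items():
    print(f"{key}: {share:.1%}")
```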

Histogram aggregations (`histograms`) - BETA

Histogram aggregations request statistical information about the frequency of items for a given interval.

{
	"name": "{name}",
	"field": "{field}",
	"interval": {interval},
	"min_count": {min_count}
}

name - (string) User defined name of the aggregation
field - (string) The field on which to apply the aggregation
interval - (int) The fixed size of each bucket over the values
min_count - (int) The minimum number of items that must appear within an interval in order to be considered significant enough to return
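The effect of interval and min_count can be sketched locally in Python. This illustration assumes buckets are aligned to multiples of the interval and keyed by their lower value; the Search API may align buckets differently, and histogram_aggregation is a hypothetical name.

```python
import math
from collections import Counter


def histogram_aggregation(values, interval, min_count=0):
    """Group values into fixed-width buckets keyed by their lower value.

    Buckets holding fewer than min_count items are dropped.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return {
        "buckets": [
            {"key": key, "count": count}
            for key, count in sorted(counts.items())
            if count >= min_count
        ]
    }


peaks = [-2.5, -2.1, -0.4, 0.2, 0.9, 3.7]  # made-up sample values
print(histogram_aggregation(peaks, interval=1, min_count=1))
print(histogram_aggregation(peaks, interval=1, min_count=2))
```

Raising min_count from 1 to 2 removes the buckets that contain a single item, which is useful for hiding sparse intervals.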

Histogram aggregation example

To get the frequency of audio files within specific peak level intervals, you would make the following request:

POST /api/data/search
{
	"query": "{query}",
	"aggregations": {
		"histograms": [
			{
				"name": "peak",
				"field": "audiopeak.true_peak_dbfs",
				"interval": 1,
				"min_count": 1
			}
		]
	}
}

The results group the items into buckets, each one interval wide:

{
	//... other search items
	"aggregations": {
		"histograms": {
			"peak": {
				"buckets": [
					{"key":-3, "count":2},
					{"key":-2, "count":4},
					{"key":-1, "count":7},
					{"key":0, "count":1},
					{"key":1, "count":2},
					{"key":2, "count":4},
					{"key":3, "count":9}
				]
			}
		}
	}
}

peak - (string) The name field passed in with the request
buckets - (array of objects) The buckets that describe the results of the histogram
key - (int) The lower value of the bucket's range (an item is counted when its value lies between key and key + interval)
count - (int) The number of items that fit into the interval
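A client could expand each bucket into an explicit range by adding the requested interval to its key. The Python sketch below does that for the sample result above and picks out the fullest bucket; bucket_ranges is a hypothetical helper, and the interval must be supplied by the caller because it does not appear in the sample response.

```python
def bucket_ranges(histogram_result, interval):
    """Return (lower, upper, count) tuples, one per histogram bucket."""
    return [
        (b["key"], b["key"] + interval, b["count"])
        for b in histogram_result["buckets"]
    ]


peak = {
    "buckets": [
        {"key": -3, "count": 2},
        {"key": -2, "count": 4},
        {"key": -1, "count": 7},
        {"key": 0, "count": 1},
        {"key": 1, "count": 2},
        {"key": 2, "count": 4},
        {"key": 3, "count": 9},
    ]
}

ranges = bucket_ranges(peak, interval=1)
fullest = max(ranges, key=lambda r: r[2])
print(ranges)
print("fullest bucket:", fullest)
```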

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.
