Aggregations calculate statistical summaries about the entire set of search results. They are applied after the full-text query and any filters.
They allow you to group the unique elements for a specific field into buckets.
For example, you can group all items by their file extension (ext field) and get a summary similar to:
{
  "aggregations": {
    "terms": {
      "file.extension": {
        "buckets": [
          {"key": "mp4", "count": 202},
          {"key": "csv", "count": 5},
          {"key": "mov", "count": 2},
          {"key": "pptx", "count": 6},
          {"key": "xlsx", "count": 7},
          {"key": "xml", "count": 22}
        ],
        "othersCount": 36
      }
    }
  }
}
The above example indicates that, after applying any filters, there are 202 items with mp4 as their extension, 5 with csv, and so on.
To request an aggregation, add the aggregations field to the search request object.
{
  "aggregations": {
    "terms": [{terms}],
    "metrics": [{metrics}],
    "histogram": [{histogram}]
  }
}
metrics - (array of objects) Requests statistical information about a field
terms - (array of objects) Aggregates the frequency of specified terms
histogram - (array of objects) Generates histogram data for a given field
Aggregation results are provided in the response object under the aggregations object.
See below for specific examples.
When requesting an aggregation, you provide a name field. The Search API returns the result of the aggregation along with this name, allowing you to find the specific aggregation.
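The request shape and the name-based lookup described above can be sketched in Python. This is a minimal illustration, not part of the Search API: the helpers build_search_request and find_aggregation are hypothetical names, the sample response is hand-written to match the shapes shown in this document, and no HTTP call is made.

```python
import json


def build_search_request(query, terms=None, metrics=None):
    """Assemble a search request body, adding aggregations only when given."""
    body = {"query": query}
    aggregations = {}
    if terms:
        aggregations["terms"] = list(terms)
    if metrics:
        aggregations["metrics"] = list(metrics)
    if aggregations:
        body["aggregations"] = aggregations
    return body


def find_aggregation(response, kind, name):
    """Return the result stored under aggregations[kind][name], or None."""
    return response.get("aggregations", {}).get(kind, {}).get(name)


# The name "countries" is chosen by the caller and echoed back in the response.
request_body = build_search_request(
    "{query}",
    terms=[{"name": "countries", "field": "geocoding.country_code.raw"}],
)
print(json.dumps(request_body, indent=2))

# Hand-written sample shaped like the responses in this document, not real output.
sample_response = {
    "aggregations": {
        "terms": {
            "countries": {
                "buckets": [{"key": "UK", "count": 200}],
                "othersCount": 10,
            }
        }
    }
}
countries = find_aggregation(sample_response, "terms", "countries")
print(countries["buckets"][0])
```

Keeping the name in one place on the client side (for example as a constant) avoids mismatches between the request and the lookup.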
Metrics (metrics)
Metrics aggregations request statistical information about a field.
{
  "name": "{name}",
  "field": "{field}",
  "type": "{type}"
}
name - (string) User defined name of the aggregation
field - (string) The field to apply the aggregation on
type - (string) The type of the metric aggregation to perform
The type field of a metric aggregation indicates what kind of metric you are interested in.
Valid types are:
min - The minimum value of the result set
max - The maximum value of the result set
avg - The mean average value of the result set
sum - The sum of the values of the result set
cardinality - The number of unique instances of values from a result set
To get the minimum, maximum and average file sizes of items from a query, you would make the following request:
POST /api/data/search
{
  "query": "{query}",
  "aggregations": {
    "metrics": [
      {"type": "min", "field": "file.size", "name": "min(file.size)"},
      {"type": "max", "field": "file.size", "name": "max(file.size)"},
      {"type": "avg", "field": "file.size", "name": "avg(file.size)"}
    ]
  }
}
The results will be in the aggregations object of the response, keyed by the name:
{
  //... other search items
  "aggregations": {
    "metrics": {
      "min(file.size)": 0,
      "max(file.size)": 100,
      "avg(file.size)": 50
    }
  }
}
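As a rough sketch of the metrics flow above, the following Python generates one metric entry per requested type and then reads each result back by its name. The helper metric_requests and the "type(field)" naming scheme are illustrative choices that mirror the example, not something the API requires, and the response is a hand-written sample rather than real output.

```python
# Metric types listed as valid in this document.
VALID_METRIC_TYPES = {"min", "max", "avg", "sum", "cardinality"}


def metric_requests(field, types):
    """Build one metrics aggregation entry per requested type."""
    requests = []
    for metric_type in types:
        if metric_type not in VALID_METRIC_TYPES:
            raise ValueError(f"unsupported metric type: {metric_type}")
        requests.append(
            {
                "type": metric_type,
                "field": field,
                # Illustrative naming scheme, e.g. "min(file.size)".
                "name": f"{metric_type}({field})",
            }
        )
    return requests


body = {
    "query": "{query}",
    "aggregations": {"metrics": metric_requests("file.size", ["min", "max", "avg"])},
}

# Hand-written sample shaped like the response above, not real output.
sample_response = {
    "aggregations": {
        "metrics": {"min(file.size)": 0, "max(file.size)": 100, "avg(file.size)": 50}
    }
}

# Each result is keyed by the name sent in the request.
returned = sample_response["aggregations"]["metrics"]
results = {m["name"]: returned[m["name"]] for m in body["aggregations"]["metrics"]}
for name, value in results.items():
    print(name, "=", value)
```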
Terms (terms)
Terms aggregations allow you to get the frequency of different terms within a specific field.
{
  "name": "{name}",
  "field": "{field}",
  "size": {size}
}
name - (string) User defined name of the aggregation
field - (string) The field on which to apply the aggregation
size - (integer) Number of buckets that will be returned (by default 10)
The results are provided as a set of buckets, one for each term, along with their count.
Only the most popular terms are included in the results; the Search API decides which terms are relevant. A special othersCount property gives an approximate count of the remaining items that are not included in any bucket.
To get a breakdown of items by country, you could ask for the terms from the geocoding.country_code.raw field:
POST /api/data/search
{
  "query": "{query}",
  "aggregations": {
    "terms": [
      {"name": "countries", "field": "geocoding.country_code.raw"}
    ]
  }
}
The results might be something like this:
{
  //... other search items
  "aggregations": {
    "terms": {
      "countries": {
        "buckets": [
          {"key": "UK", "count": 200},
          {"key": "USA", "count": 100},
          {"key": "Germany", "count": 50}
        ],
        "othersCount": 10
      }
    }
  }
}
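One way to use a terms result like the one above is to turn the bucket counts into approximate shares. The Python below is only a sketch: summarize_terms and the "(others)" label are hypothetical, and it assumes each item falls into at most one bucket, so that the bucket counts plus othersCount approximate the total. Because othersCount is itself approximate, so are the shares.

```python
def summarize_terms(terms_result):
    """Return (approximate total, share per term) for one terms aggregation."""
    buckets = terms_result["buckets"]
    others = terms_result.get("othersCount", 0)
    total = sum(bucket["count"] for bucket in buckets) + others
    if total == 0:
        return 0, {}
    shares = {bucket["key"]: bucket["count"] / total for bucket in buckets}
    # "(others)" is an illustrative label for items outside the returned buckets.
    shares["(others)"] = others / total
    return total, shares


# Values copied from the sample response above.
countries = {
    "buckets": [
        {"key": "UK", "count": 200},
        {"key": "USA", "count": 100},
        {"key": "Germany", "count": 50},
    ],
    "othersCount": 10,
}

total, shares = summarize_terms(countries)
print("approximate total:", total)
for key, share in shares.items():
    print(f"{key}: {share:.1%}")
```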
buckets - (array of objects) The top terms (key) along with their frequency (count)
  key - (string) The term value
  count - (int) The number of items that contain that term
othersCount - (int) Number of items that were not included in any bucket
Histograms (histograms) - BETA
Histogram aggregations request statistical information about the frequency of items for a given interval.
{
  "name": "{name}",
  "field": "{field}",
  "interval": {interval},
  "min_count": {min_count}
}
name - (string) User defined name of the aggregation
field - (string) The field on which to apply the aggregation
interval - (int) The fixed size of each bucket over the values
min_count - (int) The minimum number of items that must appear within an interval in order to be considered significant enough to return
To get the frequency of audio files within specific peak level intervals, you would make the following request:
POST /api/data/search
{
  "query": "{query}",
  "aggregations": {
    "histograms": [
      {
        "name": "peak",
        "field": "audiopeak.true_peak_dbfs",
        "interval": 1,
        "min_count": 1
      }
    ]
  }
}
The results group the items into buckets, one per interval:
{
  //... other search items
  "aggregations": {
    "histograms": {
      "peak": {
        "buckets": [
          {"key": -3, "count": 2},
          {"key": -2, "count": 4},
          {"key": -1, "count": 7},
          {"key": 0, "count": 1},
          {"key": 1, "count": 2},
          {"key": 2, "count": 4},
          {"key": 3, "count": 9}
        ]
      }
    }
  }
}
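Since each histogram bucket key is the lower value of its range, the buckets above can be expanded into explicit ranges on the client side. The Python below is a small sketch with a hypothetical helper, bucket_ranges; the interval is the value sent in the request, and whether the upper edge (key + interval) is inclusive is not stated in this document, so the code only reports the two edges.

```python
def bucket_ranges(buckets, interval):
    """Expand histogram buckets into (lower, upper, count) tuples."""
    return [
        (bucket["key"], bucket["key"] + interval, bucket["count"])
        for bucket in buckets
    ]


# Values copied from the sample response above; interval matches the request.
peak = {
    "buckets": [
        {"key": -3, "count": 2},
        {"key": -2, "count": 4},
        {"key": -1, "count": 7},
        {"key": 0, "count": 1},
        {"key": 1, "count": 2},
        {"key": 2, "count": 4},
        {"key": 3, "count": 9},
    ]
}

ranges = bucket_ranges(peak["buckets"], interval=1)
busiest = max(ranges, key=lambda r: r[2])  # range holding the most items
total = sum(count for _, _, count in ranges)

for lower, upper, count in ranges:
    print(f"{lower} to {upper}: {count}")
print("busiest range:", busiest, "of", total, "items")
```

Intervals with fewer items than min_count are not returned, so gaps between consecutive keys are possible in real results.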
peak - (string) The name field passed in with the request
  buckets - (array of objects) The buckets that describe the results of the histogram
    key - (int) The lower value of the range (items must be within key to key+interval to be counted)
    count - (int) The number of items that fit into the interval
This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.
© 2021 GrayMeta, Inc.