Scroll API


While an individual response contains at most 10,000 results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, one batch at a time. Scrolling is not intended for real-time user requests, but rather for processing large amounts of data.

NOTE: The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.

To search the metadata in the GrayMeta Platform and retrieve all matching items, make the following request:

POST /api/data/v3/search/scroll
{
	"query": <query>,
	"only": <only>,
	"filters": <filter>,
	"limit": <limit>,
	"scroll_keep_alive": <keep_alive>
}

  • query (string) - The text to search for. See the Full text search documentation for examples.

  • only ([]string) - The fields to return for each item. If only is not provided, each result contains the entire item document, which can be prohibitively resource-intensive. When working with larger datasets, use only to return just the fields of interest.

  • filters ([]filters) - The filters to use on search.

    • A couple of examples of setting a time range as a filter:

	"filters": {
		"ranges": [
			{
				"field": "last_harvested",
				"from": "now-1h",
				"to": ""
			}
		]
	}

	"filters": {
		"ranges": [
			{
				"field": "last_harvested",
				"from": "2017-12-31T19:30:00.000Z",
				"to": "2017-12-31T19:45:00.000Z"
			}
		]
	}

  • limit (int) - The maximum number of results returned in a single request. Default: 1,000. Max: 10,000. Use the scroll_id to get the next set of results.

  • scroll_keep_alive (string) - How long Elasticsearch should keep the search context alive. Pass it with the initial search request and with every subsequent scroll request. The value (default: 5m; see Time units) does not need to be long enough to process all the data; it only needs to be long enough to process the previous batch of results. Each scroll request that includes scroll_keep_alive sets a new expiry time.
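
As a minimal sketch of how the parameters above fit together, the following Python helper assembles a request body. The names time_range_filter and build_scroll_body are hypothetical and not part of the GrayMeta API. The limit check only mirrors the documented maximum, and treating only and filters as optional is an assumption; the server may validate differently.

```python
import json

MAX_LIMIT = 10_000  # documented maximum for "limit"


def time_range_filter(field, start, end=""):
    """Build a "filters" object with one range, mirroring the examples above."""
    return {"ranges": [{"field": field, "from": start, "to": end}]}


def build_scroll_body(query, only=None, filters=None, limit=1000, keep_alive="5m"):
    """Assemble the JSON body for the initial scroll search request."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    body = {"query": query, "limit": limit, "scroll_keep_alive": keep_alive}
    if only:
        # Assumption: omitting "only" returns the full item document.
        body["only"] = list(only)
    if filters:
        # Assumption: "filters" may be left out when no filtering is needed.
        body["filters"] = filters
    return body


body = build_scroll_body(
    "jpeg",
    only=["name", "mime_type"],
    filters=time_range_filter("last_harvested", "now-1h"),
)
print(json.dumps(body, indent=2))
```

Keeping the body construction separate from the HTTP call makes it easy to inspect or log the exact JSON before it is sent.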

The result from the above request includes a scroll_id, which should be passed to the scroll API in order to retrieve the next batch of results.

POST /api/data/v3/search/scroll
{
	"scroll_keep_alive": "5m",
	"scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

NOTE: The initial search request and each subsequent scroll request each return a scroll_id. The scroll_id may or may not change between requests; in either case, only the most recently received scroll_id should be used.
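
The request sequence can be sketched as a loop in Python. This is an illustration under stated assumptions, not a reference client: scroll_all is a hypothetical helper, and post stands for any caller-supplied function that sends a JSON body to POST /api/data/v3/search/scroll and returns the decoded response. The loop stops when a response carries no hits or a blank scroll_id, which is how the end of the results is signalled. A stand-in transport with canned pages is used so the example runs without a server.

```python
def scroll_all(post, first_body, keep_alive="5m"):
    """Yield every hit, following scroll_id until the results run out.

    `post` is a caller-supplied (hypothetical) function that sends a JSON
    body to the scroll endpoint and returns the decoded response dict.
    """
    response = post(first_body)
    while True:
        hits = response.get("hits")
        if not hits:  # null or empty hits marks the end of the scroll
            return
        yield from hits
        scroll_id = response.get("scroll_id")
        if not scroll_id:
            return
        # Always send the most recently received scroll_id.
        response = post({"scroll_keep_alive": keep_alive, "scroll_id": scroll_id})


# Stand-in transport for illustration: two pages, then the end marker.
_pages = iter([
    {"hits": [{"_id": "a"}, {"_id": "b"}], "scroll_id": "s1"},
    {"hits": [{"_id": "c"}], "scroll_id": "s2"},
    {"total": 0, "hits": None, "scroll_id": ""},
])
sent = []


def fake_post(body):
    sent.append(body)
    return next(_pages)


ids = [hit["_id"] for hit in scroll_all(fake_post, {"query": "jpeg"})]
print(ids)  # ['a', 'b', 'c']
```

Because scroll_all is a generator, each batch only has to be processed within the keep-alive window before the next request is made.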

When you reach the end of the scroll results, hits will be null and scroll_id will be blank:

{
	"total": 0,
	"hits": null,
	"scroll_id": ""
}

Scroll search response

A typical scroll search response looks like this:

{
	"total": 1,
	"hits": [
		{
			"_score": null,
			"_index": "metadata",
			"_type": "items",
			"_id": "1c706dbce6a933bd00f86412f488574c",
			"_uid": "",
			"_routing": "1c706dbce6a933bd00f86412f488574c",
			"_parent": "",
			"_version": null,
			"sort": [],
			"highlight": null,
			"_source": {
				"item_id": "1c706dbce6a933bd00f86412f488574c",
				"name": "536112966.jpeg"
			},
			"fields": null,
			"_explanation": null,
			"matched_queries": null,
			"inner_hits": null,
			"_nested": null
		}
	]
}

  • total (int) - The number of hits found.
  • hits (array) - Array of result objects (see _source fields section below).
  • scroll_id (string) - The scroll_id to use to get the next set of results. A blank value means you have reached the end of the results.

_source fields

Each _source object contains the Item document.

  • _id - (string) Unique Item ID
  • last_modified - (timestamp) When the item was last modified
  • location_id - (string) The ID of the Location where this Item was found
  • location_kind - (string) The Location Kind (see the Location Kinds API for more information) of the Location where this Item was found
  • mime_type - (string) MIME type for the item
  • name - (string) Name of the item (usually filename)
  • stow_container_id - (string) Stow Container ID of where this Item was found
  • stow_container_name - (string) Name of the Stow Container where this Item was found
  • stow_url - (string) The Stow URL of this Item.
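
To illustrate how these fields might be read out of a batch of hits, here is a small Python sketch. The helper name source_rows is hypothetical, and the sample hit is trimmed from the response example above; which fields are present depends on what the request asked for, so absent fields are returned as None rather than treated as errors.

```python
def source_rows(hits, fields=("name", "mime_type", "location_id")):
    """Return one dict per hit with the chosen _source fields (None if absent)."""
    rows = []
    for hit in hits or []:  # hits is null at the end of a scroll
        source = hit.get("_source") or {}
        rows.append({field: source.get(field) for field in fields})
    return rows


sample_hits = [
    {
        "_id": "1c706dbce6a933bd00f86412f488574c",
        "_source": {
            "item_id": "1c706dbce6a933bd00f86412f488574c",
            "name": "536112966.jpeg",
        },
    }
]
print(source_rows(sample_hits, fields=("name", "mime_type")))
```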

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.