A single search request returns at most 10,000 results, but the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request. Scrolling is not intended for real-time user requests, but rather for processing large amounts of data.
NOTE: The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.
The Scroll API works in two stages. The first call creates the scroll ID and returns the first set of results. Each subsequent call passes the most recent scroll ID and returns the next batch of results (up to limit results per batch, at most 10,000), until the results are exhausted.
POST /api/data/v3/search/scroll
{
"query": "{query}",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "categories.raw",
"value": "{search term}"
}
]},
"limit": 10000,
"scroll_keep_alive": "5m"
}
query
- (string) The text to search for - See the Full text search documentation for examples
only
- ([]string) The fields to return for each item. If only is not provided, the results will return the entire item document, which can be prohibitively resource-hungry. When working with larger datasets, use only to return just the fields of interest.
filters
- (object) The filters to apply to the search, given as multi_terms (field/value matches) and/or ranges (a field with from and to bounds).
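For example, to match items harvested within the last hour:
"filters": {
"ranges": [
{
"field": "last_harvested",
"from": "now-1h",
"to": ""
}
]
}
Or to match items harvested within a fixed time window:
"filters": {
"ranges": [
{
"field": "last_harvested",
"from": "2017-12-31T19:30:00.000Z",
"to": "2017-12-31T19:45:00.000Z"
}
]
}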
limit
- (int) Limit the number of results returned in a single request. Default: 1,000. Max: 10,000. Use the scroll_id to get the next set of results.
scroll_keep_alive
- (string) Tells Elasticsearch how long it should keep the search context alive. Pass it with the initial search request and with every subsequent scroll request. Its value (default 5m; see Time units) does not need to be long enough to process all data; it only needs to be long enough to process the previous batch of results. Each scroll request that includes scroll_keep_alive sets a new expiry time.
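The limit and scroll_keep_alive rules above can be checked client-side before a request is sent. This is a minimal Python sketch, not part of the API: the function names are hypothetical, the lower bound of 1 for limit is an assumption (only the maximum is documented), and treating an empty scroll_keep_alive as the 5m default is also an assumption (the example bodies later on this page send "").

```python
import re

# Elasticsearch-style time values: a whole number followed by a unit.
_TIME_VALUE = re.compile(r"^(\d+)(d|h|m|s|ms|micros|nanos)$")
_SECONDS_PER_UNIT = {
    "d": 86400, "h": 3600, "m": 60, "s": 1,
    "ms": 1e-3, "micros": 1e-6, "nanos": 1e-9,
}


def keep_alive_seconds(value: str) -> float:
    """Convert a scroll_keep_alive value such as "5m" into seconds."""
    if value == "":
        value = "5m"  # assumption: an empty string falls back to the 5m default
    match = _TIME_VALUE.match(value)
    if match is None:
        raise ValueError(f"not a valid time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _SECONDS_PER_UNIT[unit]


def check_limit(limit: int) -> int:
    """Reject limits outside 1..10,000 (the lower bound is an assumption)."""
    if not 1 <= limit <= 10_000:
        raise ValueError("limit must be between 1 and 10,000")
    return limit


if __name__ == "__main__":
    print(keep_alive_seconds("5m"), check_limit(10_000))
```

Because each scroll call resets the expiry, the keep-alive only has to cover the time spent processing one batch, so a short value such as 5m is usually enough.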
NOTE: The initial search request and each subsequent scroll request each return a scroll_id. The scroll_id may change between requests, but it does not always change; in either case, use only the most recently received scroll_id.
When you reach the end of the scroll results, hits will be null and scroll_id will be blank.
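The two rules in this note (always use the latest scroll_id; stop when hits is null or scroll_id is blank) can be captured in a pair of small helpers. This is a hypothetical Python sketch; also treating an empty hits list as the end of the scroll is an assumption beyond the documented null.

```python
from typing import Iterable, Optional


def is_end_of_scroll(response: dict) -> bool:
    """True when hits is null (or empty, an assumption) or scroll_id is blank."""
    return not response.get("hits") or not response.get("scroll_id")


def latest_scroll_id(responses: Iterable[dict]) -> Optional[str]:
    """Return the most recently received scroll_id before the end of the scroll.

    Each response's scroll_id replaces the previous one, even when the value
    has not changed, so an older ID is never reused.
    """
    current = None
    for response in responses:
        if is_end_of_scroll(response):
            break
        current = response["scroll_id"]
    return current
```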
A typical scroll search response looks like this:
{
"total": 1,
"hits": [
{
"_score": null,
"_index": "metadata",
"_type": "items",
"_id": "1c706dbce6a933bd00f86412f488574c",
"_uid": "",
"_routing": "1c706dbce6a933bd00f86412f488574c",
"_parent": "",
"_version": null,
"sort": [
0
],
"highlight": null,
"_source": {
"item_id": "1c706dbce6a933bd00f86412f488574c",
"name": "536112966.jpeg"
},
"fields": null,
"_explanation": null,
"matched_queries": null,
"inner_hits": null,
"_nested": null
}
],
"scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFnkwQ2xvd1Z2Um9TVHB3aEI4Q0xwTEEAAAAAAAAAAhZ5MENsb3dWdlJvU1Rwd2hCOENMcExBAAAAAAAAAAUWeTBDbG93VnZSb1NUcHdoQjhDTHBMQQAAAAAAAAADFnkwQ2xvd1Z2Um9TVHB3aEI4Q0xwTEEAAAAAAAAABBZ5MENsb3dWdlJvU1Rwd2hCOENMcExB"
}
total
- (int) The number of hits found.
hits
- (array) Array of result objects (see the _source fields below).
scroll_id
- (string) The scroll_id to use to get the next set of results. Blank means you are at the end of the results.
The _source objects contain the Item document:
_id
- (string) Unique Item ID
last_modified
- (timestamp) When the item was last modified
location_id
- (string) The ID of the Location where this Item was found
location_kind
- (string) The Location Kind of the Location where this Item was found (see the Location Kinds API for more information)
mime_type
- (string) MIME type of the item
name
- (string) Name of the item (usually the filename)
stow_container_id
- (string) Stow Container ID of where this Item was found
stow_container_name
- (string) Name of the Stow Container where this Item was found
stow_url
- (string) The Stow URL of this Item
The scroll_id field appears on the last line of your results (see the example above). This ID is what allows you to retrieve additional results, if there are any.
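To work with a page of results, the _source document of each hit can be pulled out directly. The following Python sketch parses a trimmed copy of the sample response above (most null fields omitted and the scroll_id shortened); the helper name item_documents is hypothetical.

```python
import json

# Trimmed copy of the sample response above (scroll_id shortened).
SAMPLE_RESPONSE = """
{
  "total": 1,
  "hits": [
    {
      "_index": "metadata",
      "_id": "1c706dbce6a933bd00f86412f488574c",
      "_source": {
        "item_id": "1c706dbce6a933bd00f86412f488574c",
        "name": "536112966.jpeg"
      }
    }
  ],
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNo"
}
"""


def item_documents(response: dict) -> list:
    """Return the _source document of every hit; an end-of-scroll page yields []."""
    return [hit.get("_source", {}) for hit in response.get("hits") or []]


if __name__ == "__main__":
    page = json.loads(SAMPLE_RESPONSE)
    for doc in item_documents(page):
        print(doc["item_id"], doc.get("name"))
    print("next scroll_id:", page["scroll_id"])
```

Since the request used "only": ["item_id"] style field selection, each _source holds just the requested fields, which keeps pages small.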
Once you have completed the first call, you must take your Scroll ID from the bottom of the results and use that in the second stage of the Scroll API.
POST /api/data/v3/search/scroll
{
"scroll_keep_alive" : "5m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
Copy and paste the Scroll ID into the field above. Then execute the call.
The response will mirror the one above until you reach the end of the scroll, and consequently the end of the results, at which point you will see the following response:
{
"total": 0,
"hits": null,
"scroll_id": ""
}
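The two stages can be automated instead of copying and pasting scroll IDs by hand. This is a minimal Python sketch: the base URL and any authentication headers passed to make_http_post are assumptions (they are not covered in this section), and all function names are hypothetical. The loop sends the initial body once, then keeps posting the most recent scroll_id until hits is null or scroll_id is blank.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

SCROLL_PATH = "/api/data/v3/search/scroll"


def make_http_post(base_url: str, headers: dict) -> Callable[[dict], dict]:
    """Hypothetical transport: POST a JSON body to the scroll endpoint.

    base_url and headers (e.g. authentication) are assumptions.
    """
    def post(body: dict) -> dict:
        request = urllib.request.Request(
            base_url.rstrip("/") + SCROLL_PATH,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json", **headers},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return post


def scroll_all(post: Callable[[dict], dict], initial_body: dict,
               keep_alive: str = "5m") -> Iterator[dict]:
    """Yield every hit from the first call and all follow-up scroll calls."""
    response = post(initial_body)  # stage 1: runs the search, creates the scroll ID
    while True:
        hits = response.get("hits") or []
        yield from hits
        scroll_id = response.get("scroll_id") or ""
        if not hits or not scroll_id:  # end of results: hits null, scroll_id blank
            return
        # Stage 2: always send the most recently received scroll_id.
        response = post({"scroll_keep_alive": keep_alive, "scroll_id": scroll_id})


def make_fake_post(pages: List[dict]) -> Callable[[dict], dict]:
    """Test double that replays canned responses and records each request body."""
    sent: List[dict] = []

    def post(body: dict) -> dict:
        sent.append(body)
        return pages[len(sent) - 1]

    post.sent = sent  # type: ignore[attr-defined]
    return post
```

Against a real deployment you would call scroll_all(make_http_post(base_url, headers), body); the fake transport keeps the sketch testable without network access. Passing the transport in as a function also makes it easy to add retries or logging without touching the loop.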
Search fields that users can filter on:
captions.captions.transcript.text
- (string) The caption text you wish to search on, e.g. hello
categories.raw
- (string) Category ID of where the Items are found
container_id
- (string) Stow Container ID: the name of the storage container
file_extension.raw
- (string) The file format you wish to search on, e.g. MP4
file_size
- (number) File size in bytes, using ‘from’ and ‘to’ fields, e.g. 1 GB is 1073741824
folder_path.raw
- (string) The folder path from storage locations such as Dropbox
geocoding.country_code.raw
- (string) Location country code, e.g. US
geocoding.admin_name1.raw
- (string) Region name, e.g. Ohio
geocoding.admin_name2.raw
- (string) Place name, e.g. Delaware
geocoding.place_name.raw
- (string) City name, e.g. WestLake
gm_item_type
- (string) MIME type of the content, e.g. Video
item_id
- (string) The Item ID; acts as a basic item ID search
last_harvested
- (timestamp) Date stamps using ‘from’ and ‘to’ fields, e.g. 2020-04-30T00:00:00Z
location_name.raw
- (string) The Location Name you wish to search on
location_kind.raw
- (string) The Location Kind of the Location where this Item was found (see the Location Kinds API for more information)
location_id
- (string) The Location ID
logos.logos.name.raw
- (string) The ID of the logo you wish to detect
name.raw
- (string) Filter on filename
nsfw.is_adult_content
- (boolean) True or False, based on the detection of NSFW content
nsfw.categories.category.raw
- (string) The ID of the NSFW category
ocrs.text
- (string) The OCR text you wish to search on
people.name.raw
- (string) Filter on known / named people, e.g. Barry Bertolet
"query": "\"police car\""
- (string) Exact match on a phrase within a search
"query": "police OR car"
- (string) OR operator: matches items containing either word
"query": "police AND car"
- (string) AND operator: matches items containing all of these words
speech_to_text.transcripts.transcript.text
- (string) The speech-to-text transcript text you wish to search on, e.g. Hello
stow_container_id
- (string) Stow Container ID of where this Item was found
tag.raw
- (string) The tag you wish to detect or search on
weather.currently.summary.raw
- (string) Weather filter based on available data, e.g. Partly Cloudy
Below are example bodies that use some of the filter and query functionality, to help you better understand and build your calls with our API.
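Before sending any of these bodies, a typo in a field name can be caught locally by comparing each filter field against the list above. This hypothetical Python sketch treats a name with and without a trailing .raw as the same field (the notes below vary on whether .raw is required), so it checks spelling only, not whether .raw is required for a given field.

```python
# Field names from the "Search fields" list above.
FILTERABLE_FIELDS = {
    "captions.captions.transcript.text", "categories.raw", "container_id",
    "file_extension.raw", "file_size", "folder_path.raw",
    "geocoding.country_code.raw", "geocoding.admin_name1.raw",
    "geocoding.admin_name2.raw", "geocoding.place_name.raw", "gm_item_type",
    "item_id", "last_harvested", "location_name.raw", "location_kind.raw",
    "location_id", "logos.logos.name.raw", "name.raw", "nsfw.is_adult_content",
    "nsfw.categories.category.raw", "ocrs.text", "people.name.raw",
    "speech_to_text.transcripts.transcript.text", "stow_container_id",
    "tag.raw", "weather.currently.summary.raw",
}


def _base_name(field: str) -> str:
    """Drop a trailing .raw so 'file_extension' and 'file_extension.raw' compare equal."""
    return field[: -len(".raw")] if field.endswith(".raw") else field


_KNOWN_BASES = {_base_name(f) for f in FILTERABLE_FIELDS}


def unknown_fields(body: dict) -> list:
    """List filter fields in a request body that are not in FILTERABLE_FIELDS."""
    filters = body.get("filters") or {}
    used = [term["field"]
            for key in ("multi_terms", "ranges")
            for term in filters.get(key, [])]
    return [field for field in used if _base_name(field) not in _KNOWN_BASES]
```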
NOTE: .raw not required
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "categories.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "captions.captions.transcript.text",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: .raw not required
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "file_extension",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: File size is specified in bytes, e.g. 1 GB is 1073741824. Searches are based on greater-than and less-than bounds set by from and to.
{
"query": "",
"only": ["item_id"],
"filters": {
"ranges": [
{
"field": "file_size",
"from": "1073741824",
"to": "8.988466e+307"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"ranges": [
{
"field": "file_size",
"from": "0",
"to": "1073741824"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
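The byte arithmetic in these file size bodies can be generated rather than typed by hand. This Python sketch follows the document's convention that 1 GB is 1024³ = 1073741824 bytes, and reuses the 8.988466e+307 upper bound and "0" lower bound from the two examples above; the helper names are hypothetical.

```python
BYTES_PER_GB = 1024 ** 3            # 1073741824, the value used in this document
OPEN_UPPER_BOUND = "8.988466e+307"  # "to" value the examples use for "no upper limit"


def gb_to_bytes(gb: float) -> int:
    """Convert gigabytes (1024**3 bytes each) to a whole number of bytes."""
    return int(gb * BYTES_PER_GB)


def file_size_range(min_gb=None, max_gb=None) -> dict:
    """Build a file_size entry for filters.ranges (bounds sent as strings, as above)."""
    return {
        "field": "file_size",
        "from": str(gb_to_bytes(min_gb)) if min_gb is not None else "0",
        "to": str(gb_to_bytes(max_gb)) if max_gb is not None else OPEN_UPPER_BOUND,
    }
```

For example, file_size_range(min_gb=1) reproduces the "larger than 1 GB" body and file_size_range(max_gb=1) the "smaller than 1 GB" body.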
NOTE: The country code is a regional location code such as US, and .raw is required.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "geocoding.country_code.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!).
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "geocoding.admin_name1.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!).
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "geocoding.admin_name2.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
Example Place Names: Douglas, Cuyahoga, Delaware, Madison
NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!)
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "geocoding.place_name.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "gm_item_type",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
Accepted Item Types:
{
"query": "",
"only": ["item_id"],
"filters": {
"ranges": [
{
"field": "last_harvested",
"from": "2020-04-30T00:00:00Z",
"to": "2020-05-15T00:00:00Z"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
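The last_harvested bounds above use the form YYYY-MM-DDTHH:MM:SSZ. This Python sketch formats datetime values that way; converting everything to UTC, and treating naive datetimes as already being UTC, are assumptions, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def to_timestamp(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ in UTC, like the example above."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def harvested_between(start: datetime, end: datetime) -> dict:
    """Build a last_harvested entry for filters.ranges."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return {"field": "last_harvested", "from": to_timestamp(start), "to": to_timestamp(end)}
```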
NOTE: This pertains to the storage location name i.e. Adam’s S3 Bucket. .raw is required and Filter is CASE sensitive.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "location_name.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: .raw is not required and the filter is CASE sensitive (all letters should be lowercase).
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "location_kind.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
Location Kinds (as required):
NOTE: Value is CASE sensitive and must match the naming within Curio exactly, and .raw is required.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "logos.logos.name.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "nsfw.is_adult_content",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
Values: True, False
NOTE: Value is CASE sensitive and requires an exact match. .raw is required. For this filter, values can be stacked with a simple comma separator, e.g. "value": "Weapon Violence,Graphic Violence Or Gore". This extends the search results rather than filtering them out.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "nsfw.categories.category.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
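Stacked NSFW categories are sent as one comma-separated value, matching the "Weapon Violence,Graphic Violence Or Gore" example in the note above. This hypothetical Python sketch builds that value; trimming surrounding whitespace and rejecting names that themselves contain a comma are assumptions, since the exact-match rule means any stray space would change the value.

```python
def stacked_value(categories) -> str:
    """Join NSFW category names into one comma-separated value (no spaces)."""
    names = [name.strip() for name in categories if name.strip()]
    for name in names:
        if "," in name:
            raise ValueError(f"category name cannot contain a comma: {name!r}")
    return ",".join(names)


def nsfw_category_term(categories) -> dict:
    """Build a multi_terms entry that matches any of the given categories."""
    return {"field": "nsfw.categories.category.raw", "value": stacked_value(categories)}
```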
Example Categories:
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "ocrs.text",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: Formatting has to match exactly as written in Curio. .raw is required.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "people.name.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "police car",
"only": ["item_id"],
"filters": {},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "\"police car\"",
"only": ["item_id"],
"filters": {},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "police OR car",
"only": ["item_id"],
"filters": {},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "police AND car",
"only": ["item_id"],
"filters": {},
"limit": 10000,
"scroll_keep_alive": ""
}
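The exact-phrase form needs the inner double quotes escaped inside the JSON string. A JSON encoder adds those backslashes automatically, as this Python sketch shows; the helper names are hypothetical, and stripping stray quotes from inside a phrase is an assumption.

```python
import json


def phrase(text: str) -> str:
    """Wrap text in double quotes for an exact-phrase query."""
    return '"' + text.replace('"', "") + '"'  # drop stray quotes inside the phrase


def any_of(*words: str) -> str:
    """OR query: matches items containing either word."""
    return " OR ".join(words)


def all_of(*words: str) -> str:
    """AND query: matches items containing all of the words."""
    return " AND ".join(words)


if __name__ == "__main__":
    # json.dumps adds the backslash escapes seen in the bodies above.
    print(json.dumps({"query": phrase("police car")}))  # {"query": "\"police car\""}
```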
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "speech_to_text.transcripts.transcript.text",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
NOTE: The container ID has to match formatting exactly.
The container ID can be found in the Tech Information panel at the asset level, under Stow Container ID.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "stow_container_id",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "name.raw",
"value": "[insertsearchterm]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
Folder path is only applicable to storage that appears under ‘Cloud Storage’, such as Dropbox, Box and OneDrive.
NOTE: To filter on additional folder levels, separate each level with a ‘/’.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "folder_path.raw",
"value": "[insertfoldername]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "folder_path.raw",
"value": "[insertfoldername]/[insertsubfoldername]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
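Nested folder values follow the "[insertfoldername]/[insertsubfoldername]" pattern shown above. This Python sketch joins folder levels with "/"; the folder names in the test are hypothetical, and dropping leading and trailing slashes is an assumption based on that pattern.

```python
def folder_path(*levels: str) -> str:
    """Join folder levels with '/', ignoring stray leading/trailing slashes."""
    return "/".join(level.strip("/") for level in levels if level.strip("/"))


def folder_term(*levels: str) -> dict:
    """Build a folder_path.raw entry for filters.multi_terms."""
    return {"field": "folder_path.raw", "value": folder_path(*levels)}
```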
NOTE: The Location ID can be found in the ‘tech’ info segment on an asset or through the locations API.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "location_id",
"value": "[insertlocationid]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "tag.raw",
"value": "car"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
When searching for multiple tags, repeat the Field and Value as seen below. This will work as an AND operator and reduce the number of results, not increase them.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "tag.raw",
"value": "car"
},
{
"field": "tag.raw",
"value": "tree"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
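Repeating the tag.raw entry narrows the results because every entry must match. This Python sketch builds the repeated entries and, separately, models the documented AND behaviour locally so you can reason about why adding tags reduces the result count; both helper names are hypothetical, and matches_all is only an illustration, not how the platform evaluates the filter.

```python
def tag_terms(*tags: str) -> list:
    """One multi_terms entry per tag; the API treats repeated entries as AND."""
    return [{"field": "tag.raw", "value": tag} for tag in tags]


def matches_all(item_tags: set, terms: list) -> bool:
    """Local model of the documented AND behaviour: every term must be present."""
    return all(term["value"] in item_tags for term in terms)
```

This is the opposite of the comma-stacked NSFW category value described earlier, which widens the results instead.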
NOTE: Value is CASE sensitive and requires an exact match. .raw is required.
{
"query": "",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "weather.currently.summary.raw",
"value": "[insertvalue]"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
Example Values:
All the fields above can be combined to create an advanced search, leveraging the power of our platform to its fullest potential. An example of this would be the following:
{
"query": "her AND majesty",
"only": ["item_id"],
"filters": {
"multi_terms": [
{
"field": "speech_to_text.transcripts.transcript.text",
"value": "victoria"
},
{
"field": "people.name.raw",
"value": "Jenna Coleman"
},
{
"field": "nsfw.is_adult_content",
"value": "True"
}
],
"ranges": [
{
"field": "file_size",
"from": "10737418240",
"to": "8.988466e+307"
}
]},
"limit": 10000,
"scroll_keep_alive": ""
}
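The above example is looking for content which:
- matches both "her" and "majesty" in the full-text query
- has "victoria" in its speech-to-text transcript
- features the named person Jenna Coleman
- is flagged as adult content (nsfw.is_adult_content is True)
- has a file size from 10737418240 bytes (10 GB) upward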
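The combined body can also be assembled from (field, value) and (field, from, to) tuples. This Python sketch is a hypothetical helper, not part of the API; it hard-codes "only": ["item_id"] and the limit and scroll_keep_alive values used throughout this page's examples, and rebuilds the combined example above.

```python
import json


def build_body(query: str, multi_terms=(), ranges=(), limit: int = 10000,
               scroll_keep_alive: str = "") -> dict:
    """Assemble a scroll search body in the shape used throughout this page."""
    filters = {}
    if multi_terms:
        filters["multi_terms"] = [{"field": f, "value": v} for f, v in multi_terms]
    if ranges:
        filters["ranges"] = [{"field": f, "from": lo, "to": hi} for f, lo, hi in ranges]
    return {
        "query": query,
        "only": ["item_id"],
        "filters": filters,
        "limit": limit,
        "scroll_keep_alive": scroll_keep_alive,
    }


combined = build_body(
    "her AND majesty",
    multi_terms=[
        ("speech_to_text.transcripts.transcript.text", "victoria"),
        ("people.name.raw", "Jenna Coleman"),
        ("nsfw.is_adult_content", "True"),
    ],
    ranges=[("file_size", str(10 * 1024 ** 3), "8.988466e+307")],
)

if __name__ == "__main__":
    print(json.dumps(combined, indent=2))
```

Called with only a query, build_body("police car") produces the same shape as the query-only bodies earlier on this page.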
This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.
© 2021 GrayMeta, Inc.