Scroll API

Overview

A single search request returns no more than 10,000 results, but the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request. Scrolling is not intended for real-time user requests, but rather for processing large amounts of data.

NOTE: The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.

The Scroll API works in two stages. The first call creates the Scroll ID and returns the first set of results. The second call, repeated as needed, returns the remaining results in increments of up to 10,000.

First Call:

POST /api/data/v3/search/scroll
{
    "query": "{query}",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "categories.raw",
                "value": "{search term}"
            }
        ]
    },
    "limit": 10000,
    "scroll_keep_alive": "5m"
}

query - (string) The text to search for. See the Full text search documentation for examples.
only - ([]string) The fields to return for each item. If only is not provided, the results will return the entire item document, which can be prohibitively resource hungry. If you are working with larger datasets, it is advised to use only to return just the fields that are of interest.
filters - ([]filters) The filters to use on search.
- A couple of examples for setting time ranges as a filter:

    "filters": {
        "ranges": [
            { "field": "last_harvested", "from": "now-1h", "to": "" }
        ]
    }

    "filters": {
        "ranges": [
            { "field": "last_harvested", "from": "2017-12-31T19:30:00.000Z", "to": "2017-12-31T19:45:00.000Z" }
        ]
    }

- Additional examples can be found below.
limit - (int) Limit the number of results returned in a single request. Default: 1,000. Max: 10,000. Use the scroll_id to get the next set of results.
scroll_keep_alive - (string) This parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value (the default is 5m, see Time units) does not need to be long enough to process all data; it just needs to be long enough to process the previous batch of results. Each scroll request that includes scroll_keep_alive sets a new expiry time.
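
As a rough illustration of how the parameters above fit together, the following Python sketch assembles a first-call body. The helper name build_scroll_body is an assumption made for this example and is not part of the API. The limit bounds (default 1,000, maximum 10,000) and the 5m keep-alive come from the descriptions above; the lower bound of 1 is an assumption.

```python
import json

MAX_LIMIT = 10000      # maximum documented above
DEFAULT_LIMIT = 1000   # default documented above


def build_scroll_body(query="", only=None, filters=None,
                      limit=DEFAULT_LIMIT, scroll_keep_alive="5m"):
    """Assemble the JSON body for the first scroll call (hypothetical helper)."""
    # The upper bound is documented; rejecting values below 1 is an assumption.
    if limit < 1 or limit > MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    body = {
        "query": query,
        "filters": filters or {},
        "limit": limit,
        "scroll_keep_alive": scroll_keep_alive,
    }
    # Leaving "only" out returns the entire item document, so set it when possible.
    if only:
        body["only"] = list(only)
    return body


body = build_scroll_body(
    only=["item_id"],
    filters={"multi_terms": [{"field": "categories.raw", "value": "{search term}"}]},
    limit=10000,
)
print(json.dumps(body, indent=4))
```

The printed JSON can be used as the body of the POST request shown above once the placeholder value is replaced.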

NOTE: The initial search request and each subsequent scroll request return a scroll_id. The scroll_id may or may not change between requests; only the most recently received scroll_id should be used.

When you hit the end of the scroll results, hits will be null and scroll_id will be blank.

Response

A typical scroll search response looks like this:

{
    "total": 1,
    "hits": [
        {
            "_score": null,
            "_index": "metadata",
            "_type": "items",
            "_id": "1c706dbce6a933bd00f86412f488574c",
            "_uid": "",
            "_routing": "1c706dbce6a933bd00f86412f488574c",
            "_parent": "",
            "_version": null,
            "sort": [
                0
            ],
            "highlight": null,
            "_source": {
                "item_id": "1c706dbce6a933bd00f86412f488574c",
                "name": "536112966.jpeg"
            },
            "fields": null,
            "_explanation": null,
            "matched_queries": null,
            "inner_hits": null,
            "_nested": null
        }
    ],
    "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFnkwQ2xvd1Z2Um9TVHB3aEI4Q0xwTEEAAAAAAAAAAhZ5MENsb3dWdlJvU1Rwd2hCOENMcExBAAAAAAAAAAUWeTBDbG93VnZSb1NUcHdoQjhDTHBMQQAAAAAAAAADFnkwQ2xvd1Z2Um9TVHB3aEI4Q0xwTEEAAAAAAAAABBZ5MENsb3dWdlJvU1Rwd2hCOENMcExB"
}

total (int) - The number of hits found.
hits (array) - Array of result objects (see _source fields section below).
scroll_id (string) - The scroll_id to get the next set of results. Blank means you are at the end of results.
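
To make the response fields concrete, here is a small Python sketch that pulls the item IDs out of one parsed response. The function names are hypothetical, and the sketch assumes the request asked for item_id through only, as in the example above.

```python
def extract_item_ids(response):
    """Return the item_id of every hit in one parsed scroll response."""
    # "hits" is null (None once parsed) at the end of the scroll.
    hits = response.get("hits") or []
    return [hit["_source"]["item_id"] for hit in hits if hit.get("_source")]


def is_end_of_scroll(response):
    """True when the response signals that there are no more results."""
    return not response.get("hits") or not response.get("scroll_id")


# Trimmed version of the response shown above.
sample = {
    "total": 1,
    "hits": [
        {
            "_id": "1c706dbce6a933bd00f86412f488574c",
            "_source": {
                "item_id": "1c706dbce6a933bd00f86412f488574c",
                "name": "536112966.jpeg",
            },
        }
    ],
    "scroll_id": "DnF1ZXJ5VGhlbkZldGNo",  # shortened for readability
}

print(extract_item_ids(sample))
print(is_end_of_scroll(sample))
```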

_source fields

Each _source object contains the Item document.

_id - (string) Unique Item ID
last_modified - (timestamp) When the item was last modified
location_id - (string) The ID of the Location where this Item was found
location_kind - (string) The Location Kind of the Location where this Item was found. See the Location Kinds API for more information.
mime_type - (string) MIME type for the item
name - (string) Name of the item (usually filename)
stow_container_id - (string) Stow Container ID of where this Item was found
stow_container_name - (string) Name of the Stow Container where this Item was found
stow_url - (string) The Stow URL of this Item.

Included on the last line of your results is the scroll_id field (see the example above). This ID is what allows you to retrieve additional results (if there are any).

Second Call:

Once you have completed the first call, you must take your Scroll ID from the bottom of the results and use that in the second stage of the Scroll API.

POST /api/data/v3/search/scroll
{
    "scroll_keep_alive": "5m",
    "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

Copy and paste the Scroll ID into the field above. Then execute the call.

Response

The response will mirror the one above until you reach the end of the scroll, and consequently the end of the results from the call, at which point you will see the following response:

{
	"total": 0,
	"hits": null,
	"scroll_id": ""
}
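
The two stages can be tied together in a loop. The Python sketch below is one possible way to do it, under stated assumptions: post is a caller-supplied function (not part of the API) that sends a JSON body to POST /api/data/v3/search/scroll and returns the parsed response as a dict. The loop always sends the most recently received scroll_id and stops when hits is null or scroll_id is blank. A fake post function stands in for the network here so the example runs on its own.

```python
def scroll_all(post, first_body, keep_alive="5m"):
    """Yield every hit, following scroll_id until the results run out.

    `post` is an assumption of this sketch: any callable that sends the
    given body to the scroll endpoint and returns the parsed JSON response.
    """
    response = post(first_body)
    while True:
        hits = response.get("hits")
        scroll_id = response.get("scroll_id")
        if hits:
            yield from hits
        # End of scroll: hits is null and scroll_id is blank.
        if not hits or not scroll_id:
            break
        # Always use the most recently received scroll_id.
        response = post({"scroll_keep_alive": keep_alive, "scroll_id": scroll_id})


# Stand-in for the real HTTP call, for demonstration only.
pages = iter([
    {"total": 3, "hits": [{"_id": "a"}, {"_id": "b"}], "scroll_id": "id-1"},
    {"total": 3, "hits": [{"_id": "c"}], "scroll_id": "id-2"},
    {"total": 0, "hits": None, "scroll_id": ""},
])
sent = []


def fake_post(body):
    sent.append(body)
    return next(pages)


ids = [hit["_id"] for hit in scroll_all(fake_post, {"query": "", "limit": 2})]
print(ids)  # ['a', 'b', 'c']
```

Because scroll_all is a generator, each batch only needs to be processed within the keep-alive window before the next request is sent.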

Search Fields Overview

Search Fields which users can filter on:

captions.captions.transcript.text - (string) ID of the caption text you wish to search on, e.g. hello
categories.raw - (string) Category ID of where the Items are found
container_id - (string) Stow Container ID - The name of the storage container
file_extension.raw - (string) ID of the file format you wish to search on, e.g. MP4
file_size - (number) File size in bytes, using ‘from’ and ‘to’ fields, e.g. 1 GB would be 1073741824
folder_path.raw - (string) The folder path from storage locations such as Dropbox
geocoding.country_code.raw - (string) Location country code, e.g. US
geocoding.admin_name1.raw - (string) Region name, e.g. Ohio
geocoding.admin_name2.raw - (string) Place name, e.g. Delaware
geocoding.place_name.raw - (string) City name, e.g. WestLake
gm_item_type - (string) Mime Type of the content, e.g. Video
item_id - (string) The Item ID - will act as a basic item ID search.
last_harvested - (number) Date stamps using ‘from’ and ‘to’ fields, e.g. 2020-04-30T00:00:00Z
location_name.raw - (string) Location Name you wish to search on
location_kind.raw - (string) The Location Kind of the Location where this Item was found. See the Location Kinds API for more information.
location_id - (string) The Location ID
logos.logos.name.raw - (string) The ID of the logo which you wish to detect
name.raw - (string) Filter on filename
nsfw.is_adult_content - (boolean) True or False value on the detection of NSFW content
nsfw.categories.category.raw - (string) The ID of the NSFW Category
ocrs.text - (string) ID of the OCR text you wish to search on
people.name.raw - (string) Filter on known / named people, e.g. Barry Bertolet
"query": "\"police car\"" - (string) Exact match on phrase within a search
"query": "police OR car" - (string) Or operator on phrase within a search
"query": "police AND car" - (string) All of these words on phrase within a search
speech_to_text.transcripts.transcript.text - (string) ID of the speech to text you wish to search on, e.g. Hello
stow_container_id - (string) Stow Container ID of where this Item was found
tag.raw - (string) The tag you wish to detect or search upon
weather.currently.summary.raw - (string) Weather filter based on available data, e.g. Partly Cloudy

Body Examples

Below are example bodies utilizing some of the filter and query functionality, to help you better understand and build your calls with our API.

Categories:

NOTE: .raw not required

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "categories.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Captions:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "captions.captions.transcript.text",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

File Extensions:

NOTE: .raw not required

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "file_extension",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

File Size:

NOTE: File size is given in bytes, e.g. 1 GB would be 1073741824. Searches are based on greater-than or less-than operators.

Greater Than:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "ranges": [
            {
        "field": "file_size",
        "from": "1073741824",
        "to": "8.988466e+307"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Less Than:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "ranges": [
            {
        "field": "file_size",
        "from": "0",
        "to": "1073741824"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
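
Since the size bounds must be given in bytes, a small conversion helper can reduce mistakes. The Python sketch below builds the two range entries shown above; the function names are hypothetical, and the open upper bound simply reuses the value from the Greater Than example.

```python
BYTES_PER_GB = 1024 ** 3             # 1073741824, as in the note above
OPEN_UPPER_BOUND = "8.988466e+307"   # upper bound used in the Greater Than example


def file_size_greater_than(gigabytes):
    """Range entry matching files larger than the given size."""
    return {
        "field": "file_size",
        "from": str(int(gigabytes * BYTES_PER_GB)),
        "to": OPEN_UPPER_BOUND,
    }


def file_size_less_than(gigabytes):
    """Range entry matching files smaller than the given size."""
    return {
        "field": "file_size",
        "from": "0",
        "to": str(int(gigabytes * BYTES_PER_GB)),
    }


filters = {"ranges": [file_size_greater_than(1)]}
print(filters)
```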

GeoCoding - Country Code:

NOTE: Country Code is the regional location code, such as US. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "geocoding.country_code.raw",
                "value": "[insertsearchterm]"
            }
        ]
    },
    "limit": 10000,
    "scroll_keep_alive": ""
}

GeoCoding - Region Name:

NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!).

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "geocoding.admin_name1.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

GeoCoding - Places:

NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!).

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "geocoding.admin_name2.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Example Place Names:

Douglas
Cuyahoga
Delaware
Madison

GeoCoding - City:

NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!)

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "geocoding.place_name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

GM Item Type

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "gm_item_type",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Accepted Item Types:

Video
Audio
Document
Caption
Text
Archive
Image/Raster
Image/Raw
Image/Vector
Unknown

Last Harvested:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "ranges": [
            {
        "field": "last_harvested",
        "from": "2020-04-30T00:00:00Z",
        "to": "2020-05-15T00:00:00Z"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
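
When the time window is computed in code, the range entry can be generated from datetime values. The Python sketch below formats timestamps in the same style as the example above (UTC with a trailing Z). The helper name is hypothetical, and requiring timezone-aware datetimes is a choice made for this sketch, not an API rule.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches 2020-04-30T00:00:00Z


def harvested_between(start, end):
    """Range entry for items last harvested between two aware datetimes."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    return {
        "field": "last_harvested",
        "from": start.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
        "to": end.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
    }


window = harvested_between(
    datetime(2020, 4, 30, tzinfo=timezone.utc),
    datetime(2020, 5, 15, tzinfo=timezone.utc),
)
print({"ranges": [window]})
```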

Location Name:

NOTE: This pertains to the storage location name, e.g. Adam’s S3 Bucket. .raw is required and Filter is CASE sensitive.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "location_name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Location Kind:

NOTE: .raw is not required and Filter is CASE sensitive (All letters should be lowercase).

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "location_kind.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Location Kinds (as required):

s3
dropbox
box
azure
slack
onedrive
googledrive
sharepoint
cloudian

Logos

NOTE: Value is CASE sensitive and must match the naming within Curio exactly. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "logos.logos.name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

NSFW Filter:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "nsfw.is_adult_content",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Values:

True = 1
False = 0

NSFW Category:

NOTE: Value is CASE sensitive and requires an exact match. .raw is required. For this filter, values can be stacked with a simple comma separator, e.g. "value": "Weapon Violence,Graphic Violence Or Gore". This will extend the search results, not filter them out.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "nsfw.categories.category.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Example Categories:

Weapon Violence
Graphic Violence Or Gore
Sexual Activity
Female Swimwear Or Underwear
Nudity
Revealing Clothes
Drug Detection
Graphic Male Nudity
Physical Violence
Adult
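
The stacking rule described in the note above can be sketched in Python as follows. The helper name is hypothetical; it only joins the category names with a comma (no space), as in the note's example, and leaves the case of each name untouched because the filter is case sensitive.

```python
def nsfw_category_filter(categories):
    """Build one multi_terms entry that matches any of the given categories."""
    cleaned = [name.strip() for name in categories if name.strip()]
    if not cleaned:
        raise ValueError("at least one category is required")
    return {
        "field": "nsfw.categories.category.raw",
        "value": ",".join(cleaned),  # comma separator, no space
    }


entry = nsfw_category_filter(["Weapon Violence", "Graphic Violence Or Gore"])
print({"multi_terms": [entry]})
```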

OCR Text:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "ocrs.text",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

People - Name:

NOTE: Formatting has to match exactly as written in Curio. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "people.name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Keyword Searches:

Basic Search:

{
    "query": "police car",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Exact Match Operator Search:

{
    "query": "\"police car\"",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Or Operator Search:

{
    "query": "police OR car",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}

And Operator Search:

{
    "query": "police AND car",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}
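
The escaping in the exact match example is easy to get wrong by hand. The Python sketch below builds the three kinds of query string shown above and lets json.dumps handle the quoting. The helper names are hypothetical.

```python
import json


def exact_phrase(phrase):
    """Wrap a phrase in double quotes for an exact match."""
    return f'"{phrase}"'


def any_of(*words):
    """Join words with the OR operator."""
    return " OR ".join(words)


def all_of(*words):
    """Join words with the AND operator."""
    return " AND ".join(words)


body = {
    "query": exact_phrase("police car"),
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": "5m",
}
# json.dumps escapes the inner quotes, producing "\"police car\"" in the output.
print(json.dumps(body, indent=4))
print(any_of("police", "car"))
print(all_of("police", "car"))
```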

Speech to Text:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "speech_to_text.transcripts.transcript.text",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Stow Container ID:

NOTE: The container ID has to match the formatting exactly.
The Container ID can be found in the Tech Information panel at the asset level, under Stow Container ID.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "stow_container_id",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

File Name:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Folder Path:

Folder path is only applicable for storage which appears under ‘Cloud Storage’, such as Dropbox, Box and OneDrive.
NOTE: To add additional folder levels, add a ‘/’.

Top Level:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "folder_path.raw",
        "value": "[insertfoldername]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Sub Folder Level:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "folder_path.raw",
        "value": "[insertfoldername]/[insertsubfoldername]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
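
For nested folders, the value can be assembled from the individual folder names. This is a minimal Python sketch with a hypothetical helper name and made-up folder names; it joins the levels with ‘/’ as described above and strips stray slashes from each part.

```python
def folder_path_filter(*folders):
    """Build a multi_terms entry for a folder path of one or more levels."""
    parts = [name.strip("/") for name in folders if name.strip("/")]
    if not parts:
        raise ValueError("at least one folder name is required")
    return {"field": "folder_path.raw", "value": "/".join(parts)}


# "Projects" and "2020" are placeholder folder names.
print(folder_path_filter("Projects"))
print(folder_path_filter("Projects", "2020"))
```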

Location ID:

NOTE: The Location ID can be found in the ‘tech’ info segment on an asset or through the locations API.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "location_id",
        "value": "[insertlocationid]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Tags:

Single Tag Search:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "tag.raw",
                "value": "car"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Multiple Tags Search:

When searching for multiple tags, repeat the Field and Value as seen below. This will work as an AND operator and reduce the number of results, not increase them.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "tag.raw",
                "value": "car"
            },
            {
                "field": "tag.raw",
                "value": "tree"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
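
The repeated field and value pairs can be generated from a list of tags. The Python sketch below uses a hypothetical helper name and produces one multi_terms entry per tag, which, as described above, narrows the results to items carrying all of the tags.

```python
def tag_filters(tags):
    """Build a filters object with one tag.raw entry per tag (AND semantics)."""
    return {
        "multi_terms": [{"field": "tag.raw", "value": tag} for tag in tags]
    }


print(tag_filters(["car", "tree"]))
```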

Weather:

NOTE: Value is CASE sensitive and requires an exact match. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "weather.currently.summary.raw",
                "value": "[insertvalue]"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Example Values:

Clear
Partly Cloudy
Humid And Partly Cloudy
Foggy

Advanced Scroll Search Example:

All the fields above can be combined to create an advanced search, leveraging the power of our platform to its fullest potential. An example of this would be the following:

{
    "query": "her AND majesty",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "speech_to_text.transcripts.transcript.text",
                "value": "victoria"
            },
            {
                "field": "people.name.raw",
                "value": "Jenna Coleman"
            },
            {
                "field": "nsfw.is_adult_content",
                "value": "True"
            }
        ],
        "ranges": [
            {
                "field": "file_size",
                "from": "10737418240",
                "to": "8.988466e+307"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

The above example is looking for content which:

Mentions both of the keywords ‘Her’ and ‘Majesty’
Has speech to text that surfaces the word ‘victoria’
Has the named / known person ‘Jenna Coleman’
Is tagged as having detected NSFW content
Is greater than 10 GB in size
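
A combined body like the one above can also be generated programmatically. The Python sketch below is one way to do so; build_filters is a hypothetical helper that takes (field, value) pairs for multi_terms and (field, from, to) triples for ranges, and the values are copied from the example above.

```python
import json


def build_filters(terms=(), ranges=()):
    """Combine term pairs and range triples into one filters object."""
    filters = {}
    if terms:
        filters["multi_terms"] = [
            {"field": field, "value": value} for field, value in terms
        ]
    if ranges:
        filters["ranges"] = [
            {"field": field, "from": low, "to": high} for field, low, high in ranges
        ]
    return filters


body = {
    "query": "her AND majesty",
    "only": ["item_id"],
    "filters": build_filters(
        terms=[
            ("speech_to_text.transcripts.transcript.text", "victoria"),
            ("people.name.raw", "Jenna Coleman"),
            ("nsfw.is_adult_content", "True"),
        ],
        ranges=[("file_size", "10737418240", "8.988466e+307")],
    ),
    "limit": 10000,
    "scroll_keep_alive": "5m",
}
print(json.dumps(body, indent=4))
```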

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.
