Scroll API

Overview

A single search request returns at most 10,000 results. The scroll API can be used to retrieve large numbers of results (or even all results) from a single search request. Scrolling is not intended for real-time user requests, but rather for processing large amounts of data.

NOTE: The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.

The Scroll API works in two stages. The first call creates the Scroll ID and returns the first set of results. Each subsequent call then returns the next set of results, in batches of up to 10,000, until none remain.

First Call:

POST /api/data/v3/search/scroll
{
    "query": "{query}",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "categories.raw",
                "value": "{search term}"
            }
        ]
    },
    "limit": 10000,
    "scroll_keep_alive": "5m"
}
  • query - (string) The text to search for - See the Full text search documentation for examples

  • only - ([]string) If the only fields are not provided then the results will return the entire item document, which can be prohibitively resource hungry. If you are working with larger datasets, it is advised to use the only fields to return just the fields that are of interest.

  • filters - ([]filters) The filters to use on search.

    • A couple of examples for setting time ranges as a filter:

      "filters": {
          "ranges": [
              { "field": "last_harvested", "from": "now-1h", "to": "" }
          ]
      }

      "filters": {
          "ranges": [
              { "field": "last_harvested", "from": "2017-12-31T19:30:00.000Z", "to": "2017-12-31T19:45:00.000Z" }
          ]
      }

    • Additional examples can be found below.
  • limit - (int) Limit the number of results returned in a single request. Default: 1,000. Max: 10,000. Use the scroll_id to get the next set of results.

  • scroll_keep_alive - (string) Tells Elasticsearch how long it should keep the search context alive. It is passed to the initial search request and to every scroll request. Its value (the default is 5m; see Time units) does not need to be long enough to process all data; it only needs to be long enough to process the previous batch of results. Each scroll request that includes scroll_keep_alive sets a new expiry time.
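The parameters above can be assembled into a request body programmatically. The following is a minimal sketch, not part of the API itself: the helper name build_scroll_request and its argument defaults are hypothetical, and rejecting a limit below 1 is an assumption. The field names and the 1,000 default and 10,000 maximum come from the list above.

```python
import json


def build_scroll_request(query="", only=None, filters=None,
                         limit=1000, scroll_keep_alive="5m"):
    """Build the JSON body for the first scroll call (hypothetical helper)."""
    # The documented maximum is 10,000; the lower bound of 1 is an assumption.
    if not 1 <= limit <= 10000:
        raise ValueError("limit must be between 1 and 10,000")
    body = {
        "query": query,
        "filters": filters or {},
        "limit": limit,
        "scroll_keep_alive": scroll_keep_alive,
    }
    # Send "only" just when specific fields were requested; leaving it out
    # returns the entire item document.
    if only:
        body["only"] = list(only)
    return body


body = build_scroll_request(
    query="police car",
    only=["item_id"],
    filters={"multi_terms": [{"field": "tag.raw", "value": "car"}]},
    limit=10000,
)
print(json.dumps(body, indent=4))
```

Validating limit on the client side avoids sending a request that exceeds the documented maximum.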

NOTE: The initial search request and each subsequent scroll request each return a scroll_id. The scroll_id may or may not change between requests; only the most recently received scroll_id should be used.

When you reach the end of the scroll results, hits will be null and scroll_id will be blank.

Response

A typical scroll search response looks like this:

{
    "total": 1,
    "hits": [
        {
            "_score": null,
            "_index": "metadata",
            "_type": "items",
            "_id": "1c706dbce6a933bd00f86412f488574c",
            "_uid": "",
            "_routing": "1c706dbce6a933bd00f86412f488574c",
            "_parent": "",
            "_version": null,
            "sort": [
                0
            ],
            "highlight": null,
            "_source": {
                "item_id": "1c706dbce6a933bd00f86412f488574c",
                "name": "536112966.jpeg"
            },
            "fields": null,
            "_explanation": null,
            "matched_queries": null,
            "inner_hits": null,
            "_nested": null
        }
    ],
    "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFnkwQ2xvd1Z2Um9TVHB3aEI4Q0xwTEEAAAAAAAAAAhZ5MENsb3dWdlJvU1Rwd2hCOENMcExBAAAAAAAAAAUWeTBDbG93VnZSb1NUcHdoQjhDTHBMQQAAAAAAAAADFnkwQ2xvd1Z2Um9TVHB3aEI4Q0xwTEEAAAAAAAAABBZ5MENsb3dWdlJvU1Rwd2hCOENMcExB"
}
  • total (int) - The number of hits found.
  • hits (array) - Array of result objects (see _source fields section below).
  • scroll_id (string) - The scroll_id to get the next set of results. Blank means you are at the end of results.
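As an illustration of reading this response, the sketch below pulls the _source object out of each hit. The function name extract_sources is hypothetical, and the sample response is a shortened copy of the one shown above with a placeholder scroll_id. A null hits value is treated as an empty page.

```python
def extract_sources(response):
    """Return the _source dict of every hit in a scroll response."""
    # "hits" is null (None in Python) once the end of the results is reached.
    hits = response.get("hits") or []
    return [hit.get("_source", {}) for hit in hits]


# Shortened copy of the example response; the scroll_id is a placeholder.
sample = {
    "total": 1,
    "hits": [
        {
            "_id": "1c706dbce6a933bd00f86412f488574c",
            "_source": {
                "item_id": "1c706dbce6a933bd00f86412f488574c",
                "name": "536112966.jpeg",
            },
        }
    ],
    "scroll_id": "placeholder-scroll-id",
}

item_ids = [source["item_id"] for source in extract_sources(sample)]
print(item_ids)
```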

_source fields

Each _source object contains the Item document.

  • _id - (string) Unique Item ID
  • last_modified - (timestamp) When the item was last modified
  • location_id - (string) The ID of the Location where this Item was found
  • location_kind - (string) The Location Kind of the Location where this Item was found (see the Location Kinds API for more information)
  • mime_type - (string) MIME type for the item
  • name - (string) Name of the item (usually filename)
  • stow_container_id - (string) Stow Container ID of where this Item was found
  • stow_container_name - (string) Name of the Stow Container where this Item was found
  • stow_url - (string) The Stow URL of this Item.

The last field of the response is scroll_id (see the example above). This ID is what allows you to retrieve additional results (if there are any).

Second Call:

Once you have completed the first call, you must take your Scroll ID from the bottom of the results and use that in the second stage of the Scroll API.

POST /api/data/v3/search/scroll
{
    "scroll_keep_alive": "5m",
    "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

Copy and paste the Scroll ID into the field above. Then execute the call.

Response

The response will mirror the one above until you reach the end of the scroll, and consequently the end of the results, at which point you will see the following response:

{
	"total": 0,
	"hits": null,
	"scroll_id": ""
}
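
The two-stage flow described above can be sketched as a loop. This is an illustration under stated assumptions rather than a reference client: scroll_all is a hypothetical name, and post stands for any caller-supplied function that sends a JSON body to POST /api/data/v3/search/scroll and returns the decoded response as a dict. Authentication and error handling are left out. A stand-in fake_post is used here so the example runs without a server.

```python
def scroll_all(post, first_body, keep_alive="5m"):
    """Yield every hit from a scroll search, page by page."""
    response = post(first_body)  # first call: creates the scroll
    while True:
        hits = response.get("hits")
        scroll_id = response.get("scroll_id")
        if not hits:
            return  # null hits mark the end of the results
        yield from hits
        if not scroll_id:
            return  # a blank scroll_id also marks the end
        # Second call: always pass the most recently received scroll_id.
        response = post({"scroll_keep_alive": keep_alive,
                         "scroll_id": scroll_id})


# Stand-in transport for demonstration only; replace with a real HTTP client.
pages = [
    {"total": 3, "hits": [{"_id": "a"}, {"_id": "b"}], "scroll_id": "s1"},
    {"total": 3, "hits": [{"_id": "c"}], "scroll_id": "s2"},
    {"total": 0, "hits": None, "scroll_id": ""},
]
calls = []


def fake_post(body):
    calls.append(body)
    return pages[len(calls) - 1]


ids = [hit["_id"] for hit in scroll_all(fake_post, {"query": "", "limit": 2})]
print(ids)
```

Because the function is a generator, each page is requested only after the previous one has been consumed, which keeps the processing time per batch within the scroll_keep_alive window.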

Search Fields Overview

Search Fields which users can filter on:

  • captions.captions.transcript.text - (string) The caption text you wish to search on, e.g. hello
  • categories.raw - (string) Category ID of where the Items are found
  • container_id - (string) Stow Container ID - The name of the storage container
  • file_extension.raw - (string) The file format you wish to search on, e.g. MP4
  • file_size - (number) File size in bytes, filtered using ‘from’ and ‘to’ fields, e.g. 1 GB would be 1073741824
  • folder_path.raw - (string) The folder path from storage locations such as Dropbox
  • geocoding.country_code.raw - (string) Location country code, e.g. US
  • geocoding.admin_name1.raw - (string) Region name, e.g. Ohio
  • geocoding.admin_name2.raw - (string) Place name, e.g. Delaware
  • geocoding.place_name.raw - (string) City name, e.g. WestLake
  • gm_item_type - (string) MIME type of the content, e.g. Video
  • item_id - (string) The Item ID - will act as a basic item ID search.
  • last_harvested - (number) Date stamps, filtered using ‘from’ and ‘to’ fields, e.g. 2020-04-30T00:00:00Z
  • location_name.raw - (string) Location Name you wish to search on
  • location_kind.raw - (string) The Location Kind of the Location where this Item was found (see the Location Kinds API for more information)
  • location_id - (string) The Location ID
  • logos.logos.name.raw - (string) The ID of the logo which you wish to detect
  • name.raw - (string) Filter on filename
  • nsfw.is_adult_content - (boolean) True or False value on the detection of NSFW content
  • nsfw.categories.category.raw - (string) The ID of the NSFW Category
  • ocrs.text - (string) The OCR text you wish to search on
  • people.name.raw - (string) Filter on known / named people, e.g. Barry Bertolet
  • "query": "\"police car\"" - (string) Exact match on phrase within a search
  • "query": "police OR car" - (string) Or operator on phrase within a search
  • "query": "police AND car" - (string) All of these words on phrase within a search
  • speech_to_text.transcripts.transcript.text - (string) The speech to text you wish to search on, e.g. Hello
  • stow_container_id - (string) Stow Container ID of where this Item was found
  • tag.raw - (string) The tag you wish to detect or search upon
  • weather.currently.summary.raw - (string) Weather filter based on available data, e.g. Partly Cloudy

Body Examples

Below are example bodies utilizing some of the filter and query functionality, to help you better understand and build your calls with our API.

Categories:

NOTE: .raw not required

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "categories.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Captions:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "captions.captions.transcript.text",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

File Extensions:

NOTE: .raw not required

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "file_extension",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

File Size:

NOTE: File size is given in bytes, e.g. 1 GB would be 1073741824. Searches are based on greater than or less than operators, expressed through the from and to fields.

Greater Than:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "ranges": [
            {
        "field": "file_size",
        "from": "1073741824",
        "to": "8.988466e+307"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Less Than:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "ranges": [
            {
        "field": "file_size",
        "from": "0",
        "to": "1073741824"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
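
The two range bodies above differ only in their from and to values, so they can be produced by one small helper. This is a sketch with hypothetical names (file_size_range, BYTES_PER_GB, UPPER_BOUND); the open-ended upper bound simply reuses the value from the Greater Than example, and sizes are sent as strings to match the examples.

```python
BYTES_PER_GB = 1024 ** 3       # 1 GB = 1073741824 bytes
UPPER_BOUND = "8.988466e+307"  # open-ended "to" value from the example above


def file_size_range(min_gb=None, max_gb=None):
    """Build a file_size range filter entry from sizes given in GB."""
    low = "0" if min_gb is None else str(int(min_gb * BYTES_PER_GB))
    high = UPPER_BOUND if max_gb is None else str(int(max_gb * BYTES_PER_GB))
    return {"field": "file_size", "from": low, "to": high}


larger_than_1gb = file_size_range(min_gb=1)
smaller_than_1gb = file_size_range(max_gb=1)
filters = {"ranges": [larger_than_1gb]}
print(filters)
```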

GeoCoding - Country Code:

NOTE: Country Code is a country-level location code such as US. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "geocoding.country_code.raw",
                "value": "[insertsearchterm]"
            }
        ]
    },
    "limit": 10000,
    "scroll_keep_alive": ""
}

GeoCoding - Region Name:

NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!).

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "geocoding.admin_name1.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

GeoCoding - Places:

NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!).

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "geocoding.admin_name2.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Example Place Names:

  • Douglas
  • Cuyahoga
  • Delaware
  • Madison

GeoCoding - City:

NOTE: .raw is required and Filter is CASE sensitive (Capitalize the first letter!)

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "geocoding.place_name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

GM Item Type:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "gm_item_type",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Accepted Item Types:

  • Video
  • Audio
  • Document
  • Caption
  • Text
  • Archive
  • Image/Raster
  • Image/Raw
  • Image/Vector
  • Unknown

Last Harvested:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "ranges": [
            {
        "field": "last_harvested",
        "from": "2020-04-30T00:00:00Z",
        "to": "2020-05-15T00:00:00Z"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
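
When the time window is computed in code, the from and to values can be formatted from datetime objects. A minimal sketch, assuming timezone-aware datetimes and the UTC "Z" timestamp format shown in the example above; the helper name last_harvested_range is hypothetical.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def last_harvested_range(start, end):
    """Build a last_harvested range entry from timezone-aware datetimes."""
    return {
        "field": "last_harvested",
        # Convert to UTC first so the trailing "Z" is accurate.
        "from": start.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
        "to": end.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT),
    }


window = last_harvested_range(
    datetime(2020, 4, 30, tzinfo=timezone.utc),
    datetime(2020, 5, 15, tzinfo=timezone.utc),
)
print(window)
```

Naive datetimes (without tzinfo) would be interpreted as local time by astimezone, so passing aware values avoids an accidental offset.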

Location Name:

NOTE: This pertains to the storage location name, e.g. Adam’s S3 Bucket. .raw is required and Filter is CASE sensitive.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "location_name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Location Kind:

NOTE: .raw is not required and Filter is CASE sensitive (all letters should be lowercase).

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "location_kind.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Location Kinds (as required):

  • s3
  • dropbox
  • box
  • azure
  • slack
  • onedrive
  • googledrive
  • sharepoint
  • cloudian

Logos:

NOTE: Value is CASE sensitive and must match the naming within Curio exactly. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "logos.logos.name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

NSFW Filter:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "nsfw.is_adult_content",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Values:

  • True = 1
  • False = 0

NSFW Category:

NOTE: Value is CASE sensitive and requires an exact match. .raw is required. For this filter, values can be stacked with a simple comma separator, e.g. "value": "Weapon Violence,Graphic Violence Or Gore". This will extend the search results, not filter them out.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "nsfw.categories.category.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Example Categories:

  • Weapon Violence
  • Graphic Violence Or Gore
  • Sexual Activity
  • Female Swimwear Or Underwear
  • Nudity
  • Revealing Clothes
  • Drug Detection
  • Graphic Male Nudity
  • Physical Violence
  • Adult

OCR Text:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "ocrs.text",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

People - Name:

NOTE: Formatting has to match exactly as written in Curio. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "people.name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Keyword Searches:

Basic Search:

{
    "query": "police car",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Exact Match Operator Search:

{
    "query": "\"police car\"",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Or Operator Search:

{
    "query": "police OR car",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}

And Operator Search:

{
    "query": "police AND car",
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": ""
}
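
The keyword forms above differ only in the query string. The sketch below builds each form with small hypothetical helpers (exact_phrase, any_of, all_of) and lets json.dumps take care of escaping the inner quotes of an exact-match phrase. The scroll_keep_alive value of 5m is an illustrative choice.

```python
import json


def exact_phrase(phrase):
    """Wrap a phrase in double quotes for an exact match."""
    return '"' + phrase + '"'


def any_of(*words):
    """Join words with the OR operator."""
    return " OR ".join(words)


def all_of(*words):
    """Join words with the AND operator."""
    return " AND ".join(words)


body = {
    "query": exact_phrase("police car"),
    "only": ["item_id"],
    "filters": {},
    "limit": 10000,
    "scroll_keep_alive": "5m",
}
# json.dumps escapes the inner quotes, giving "query": "\"police car\"".
print(json.dumps(body))
```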

Speech to Text:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "speech_to_text.transcripts.transcript.text",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Stow Container ID:

NOTE: The container ID has to match formatting exactly.
The container ID can be found in the Tech Information panel on the asset level under Stow Container ID.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "stow_container_id",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

File Name:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "name.raw",
        "value": "[insertsearchterm]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Folder Path:

Folder path is only applicable for storage which appears under ‘Cloud Storage’, such as Dropbox, Box and OneDrive.
NOTE: To add additional folder levels, add a ‘/’ between folder names.

Top Level:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "folder_path.raw",
        "value": "[insertfoldername]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Sub Folder Level:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "folder_path.raw",
        "value": "[insertfoldername]/[insertsubfoldername]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
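
The top-level and sub-folder bodies above can share one helper that joins folder names with ‘/’. This is a sketch with a hypothetical name (folder_path_term) and hypothetical folder names; stripping stray slashes from each part is a convenience added here, not documented behaviour.

```python
def folder_path_term(*folders):
    """Build a folder_path.raw filter entry from one or more folder names."""
    # Remove leading or trailing slashes so parts join with a single "/".
    path = "/".join(folder.strip("/") for folder in folders)
    return {"field": "folder_path.raw", "value": path}


top_level = folder_path_term("Projects")
sub_folder = folder_path_term("Projects", "2020")
print(sub_folder["value"])
```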

Location ID:

NOTE: The Location ID can be found in the ‘tech’ info segment on an asset or through the locations API.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
        "field": "location_id",
        "value": "[insertlocationid]"
      }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Tags:

Single Tag Search:

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "tag.raw",
                "value": "car"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Multiple Tags Search:

When searching for multiple tags, repeat the Field and Value as seen below. This will work as an AND operator and reduce the number of results, not increase them.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "tag.raw",
                "value": "car"
            },
            {
                "field": "tag.raw",
                "value": "tree"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}
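
Repeating the field and value for each tag, as in the body above, can be generated from a list. A minimal sketch with a hypothetical helper name (tag_terms); each tag becomes its own multi_terms entry, so an item must carry all of them to match.

```python
def tag_terms(tags):
    """Build one multi_terms entry per tag (entries combine as AND)."""
    return [{"field": "tag.raw", "value": tag} for tag in tags]


filters = {"multi_terms": tag_terms(["car", "tree"])}
print(filters)
```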

Weather:

NOTE: Value is CASE sensitive and requires an exact match. .raw is required.

{
    "query": "",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "weather.currently.summary.raw",
                "value": "[insertvalue]"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

Example Values:

  • Clear
  • Partly Cloudy
  • Humid And Partly Cloudy
  • Foggy

Advanced Scroll Search Example:

All the fields above can be combined to create an advanced search, leveraging the power of our platform to its fullest potential. An example of this would be the following:

{
    "query": "her AND majesty",
    "only": ["item_id"],
    "filters": {
        "multi_terms": [
            {
                "field": "speech_to_text.transcripts.transcript.text",
                "value": "victoria"
            },
            {
                "field": "people.name.raw",
                "value": "Jenna Coleman"
            },
            {
                "field": "nsfw.is_adult_content",
                "value": "True"
            }
        ],
        "ranges": [
            {
                "field": "file_size",
                "from": "10737418240",
                "to": "8.988466e+307"
            }
        ]},
    "limit": 10000,
    "scroll_keep_alive": ""
}

The above example is looking for content which:

  • Mentions both of the keywords ‘Her’ and ‘Majesty’
  • Has speech to text that surfaces the word ‘victoria’
  • Has the named / known person ‘Jenna Coleman’
  • Is tagged as having detected NSFW content
  • Is greater than 10 GB in size

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.