Items API

Once harvesting has begun, you will be able to access the item data using the Items API.

Item Object

The data that makes up an Item is outlined in the Item Object documentation.

Getting all metadata

To get all metadata of an item by its ID, you make the following request:

GET /api/data/items/{id}
  • {id} - (string) ID of the Item to get
  • Do not include the only parameter

The response will be a JSON document containing ALL metadata for the item.

Get the metadata.json file for an item

GET /files/{item_id}/metadata2.json

The response will be a JSON document containing the same data as the metadata.json file.

Selective data

The Items API allows you to be selective about what data you get. You may:

  • Use the only parameter to get specific leaf fields, or
  • Use the include parameter to specify root objects to include

You cannot use both only and include parameters at the same time.

Getting specific leaf fields

To get only a list of specific fields, you may specify them using the only parameter.

GET /api/data/items/{id}?only=field1,field2,field3
  • field1,field2,field3 - (comma separated list) List of fields to include
  • Only leaf fields are supported, so you must know the full path to the fields to get
  • If you need to get entire groups of data, consider using the include parameter
  • This endpoint is extremely efficient and is preferred over include
  • For a complete list of acceptable fields, perform a GET request without any parameters to see the entire data payload

Getting specific groups of data

You can get groups of data at a time using the include parameter.

GET /api/data/items/{id}?include=obj1,obj2,obj3
  • obj1,obj2,obj3 - (comma separated list) List of root objects to include
  • Only root fields are supported. If you wish to get only fields from within objects, use the only parameter instead
  • This endpoint is not as efficient as using the only parameter but is more convenient
  • For a complete list of acceptable fields, see the Item Object documentation
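
The two selection modes can be sketched as a small URL builder. This is only an illustration under stated assumptions: the function name build_item_url and the example host are hypothetical, and only the path and the only/include parameters come from the text above. Authentication is not covered here.

```python
from urllib.parse import quote


def build_item_url(base_url, item_id, only=None, include=None):
    """Build an Items API URL; `only` and `include` are mutually exclusive."""
    if only and include:
        # The two parameters cannot be combined in one request.
        raise ValueError("use either 'only' or 'include', not both")
    url = f"{base_url.rstrip('/')}/api/data/items/{quote(item_id, safe='')}"
    if only:
        # Leaf fields, each given by its full path.
        return url + "?only=" + ",".join(quote(f, safe="") for f in only)
    if include:
        # Root objects only.
        return url + "?include=" + ",".join(quote(f, safe="") for f in include)
    # No parameters: request all metadata for the item.
    return url


# Hypothetical host, used for illustration only.
print(build_item_url("https://example.test", "abc", only=["summary.speech_to_text"]))
```

Raising an error before any request is made keeps the "not both" rule visible on the client side instead of relying on the server's response.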

Large data

Some data is considered too large to go into the main item API, and such data is not returned unless you explicitly ask for it.

Speech to text

summary.speech_to_text data can be obtained by using the only parameter:

GET /api/data/items/{id}?only=summary.speech_to_text

speech_to_text data can be edited with

PUT | DELETE /api/data/items/{id}/speech_to_text
[
	{
		"source": "speech to text source",
		"transcript": [
			{
				"start_at": 0.5,
				"end_at": 4.5,
				"text": "edited text / text to delete"
			},
			...
		]
	},
	...
]

PUT will replace the speech to text lines, and DELETE will remove them.
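
The request body shown above could be assembled as follows. This is a minimal sketch: the helper name speech_to_text_body and the tuple-based input are assumptions made for illustration, while the field names are taken from the example body.

```python
import json


def speech_to_text_body(source, lines):
    """Serialize (start_at, end_at, text) tuples into the request body shape."""
    transcript = [
        {"start_at": start, "end_at": end, "text": text}
        for start, end, text in lines
    ]
    # The body is a JSON array with one object per speech to text source.
    return json.dumps([{"source": source, "transcript": transcript}])


print(speech_to_text_body("speech to text source", [(0.5, 4.5, "edited text")]))
```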

Categories

An item can be associated with one or more categories. To associate an item with categories, use

POST /api/data/items/{id}/categories
{
	"categories": ["cat1", "cat2"]
}

To disassociate categories from an Item use

DELETE /api/data/items/{id}/categories/{categories}

where {categories} is a url-encoded, comma separated list of categories e.g. cat1,other%20category
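
The encoding of the {categories} segment can be sketched like this. The helper name category_delete_path is hypothetical. Each name is percent-encoded on its own, so the commas that separate names stay literal; how the server treats a comma inside a single category name is not stated above, so this sketch simply encodes it as %2C.

```python
from urllib.parse import quote


def category_delete_path(item_id, categories):
    """Return the DELETE path that disassociates `categories` from an item."""
    if not categories:
        raise ValueError("at least one category is required")
    # Encode each name separately, then join with literal commas.
    encoded = ",".join(quote(name, safe="") for name in categories)
    return f"/api/data/items/{quote(item_id, safe='')}/categories/{encoded}"


print(category_delete_path("abc", ["cat1", "other category"]))
```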

Timelines

A timeline is a set of contiguous blocks of time in which a given type of content is found. The response is separated by individual labels/identifiers.

Timelines are documented below for four types of content: audio, mature-content, locations and logos.

Audio

GET /api/data/v3/items/{id}/timeline/audio

Response

{
	"audio": {
		"Speech": [
			{
				"start": 10,
				"end": 60
			},
			{
				"start": 120,
				"end": 130
			}
		],
		"explosion": [
			{
				"start": 3.014,
				"end": 4.56
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.
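
A timeline response can be reduced to a total duration per label. The sketch below is an illustration, not part of the API: the function name label_durations is hypothetical, and it assumes that start and end are expressed in the same time unit (presumably seconds) and that end is never smaller than start.

```python
def label_durations(timeline_response, kind):
    """Sum the length of every block, per label, for one timeline kind."""
    totals = {}
    for label, blocks in timeline_response.get(kind, {}).items():
        totals[label] = sum(block["end"] - block["start"] for block in blocks)
    return totals


# The audio response shown above, used as sample input.
sample = {
    "audio": {
        "Speech": [{"start": 10, "end": 60}, {"start": 120, "end": 130}],
        "explosion": [{"start": 3.014, "end": 4.56}],
    }
}
print(label_durations(sample, "audio"))
```

The kind argument is the top-level key of the response, so the same function applies to the other timeline responses by passing their own top-level key.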

Mature Content

GET /api/data/v3/items/{id}/timeline/mature-content

Response

{
	"mature_content": {
		"nudity": [
			{
				"start": 0.1,
				"end": 1.1
			}
		],
		"gore": [
			{
				"start": 0.0,
				"end": 3.5
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Locations

GET /api/data/v3/items/{id}/timeline/locations

Response

{
	"locations": {
		"Rome": [
			{
				"start": 0.1,
				"end": 1.1
			},
			{
				"start": 13.0,
				"end": 15.5
			}
		],
		"Paris": [
			{
				"start": 2.1,
				"end": 4.3
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Logos

GET /api/data/v3/items/{id}/timeline/logos

Response

{
	"logos": {
		"Pepsi": [
			{
				"start": 0.1,
				"end": 1.1
			},
			{
				"start": 13.0,
				"end": 15.5
			}
		],
		"GrayMeta": [
			{
				"start": 2.1,
				"end": 4.3
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Identifying items

To determine the GrayMeta Platform ID and Stow URL for an item, you need to know the location ID, container ID and the identifier for the item itself within the container. For more information about item IDs, see the Stow project.

You can make the following request:

POST /api/control/item-id
{
	"location_id": "abc123",
	"container_id": "MyContainer",
	"item_id": "MyItem"
}
  • location_id - (string) The GrayMeta Platform location ID that indicates which storage location the item is in
  • container_id - (string) The container ID where the item is (usually bucket name)
  • item_id - (string) The identifier of the item (usually its name within the storage)

Provided the location, container and item values are all valid, you will be given the following response:

{
	"stow_url": "s3://unique/url/to/item",
	"gm_item_id": "67779468b22af637e2dd6a2616264b6c"
}
  • stow_url - (string) The Stow URL for the item
  • gm_item_id - (string) The internal GrayMeta Platform ID for this item

It is not necessary for the item to have been harvested in order for the ID to be returned, but once harvested you can trust that the ID will match the gm_item_id returned.

Once you have obtained the identifiers for an item, you can use them in the Harvest API.
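
The identification request can be prepared with the Python standard library as sketched below. The function name item_id_request, the example host and the Content-Type header are assumptions; the source only specifies the path and the three body fields. The request is built but not sent, and authentication is left out.

```python
import json
from urllib.request import Request


def item_id_request(base_url, location_id, container_id, item_id):
    """Prepare (but do not send) the POST request that resolves an item's IDs."""
    body = {
        "location_id": location_id,
        "container_id": container_id,
        "item_id": item_id,
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/control/item-id",
        data=json.dumps(body).encode("utf-8"),
        # Assumed header; the source does not state which content type is required.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = item_id_request("https://example.test", "abc123", "MyContainer", "MyItem")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the decoded response is expected to carry stow_url and gm_item_id as described above.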

Item Captions

Get a list of captions for an item

GET /api/data/v3/items/{id}/captions

Response

A successful call will return a Status OK (200) with the following response body:

{
  "captions": [
    {
      "id": "c57149e1f0b9387294e1f5efe6cb1ef0",
      "item_id": "71ab3889e1c559865ed6bce99b349d4f",
      "source": "captions",
      "language": {
        "code": "eng",
        "confidence": 1
      }
    }
  ]
}

If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.

Item Caption Text

Get a list of texts contained in an item caption along with possible NLP data

GET /api/data/v3/items/{id}/caption/{item-captions-id}?mask={mask}
  • item-captions-id - (string) The ID value contained in the results of a /captions request.
  • mask allows you to mask the embedded nlp data for a caption text, which may return results faster. Set mask=nlp to omit nlp data from the response.

Response

A successful call will return a Status OK (200) with the following response body:

{
  "caption": [
    {
      "id": "f27f081531c42d304c285dc7306f29e7",
      "item_captions_id": "c57149e1f0b9387294e1f5efe6cb1ef0",
      "start_at": 0.03,
      "end_at": 4.92,
      "text": "Mr. Jones will speak now",
      "nlp_properties": {
        "entities": [
          {
            "text": "Mr. Jones",
            "confidence": 0.9995918273925781,
            "type": "person"
          }
        ],
        "key_phrases": [
          {
            "text": "Mr. Jones",
            "confidence": 0.9994778037071228
          }
        ],
        "sentiment": {
          "text": "neutral",
          "sentiment_confidence": {
            "Mixed": 0.014400332234799862,
            "Negative": 0.10051420331001282,
            "Neutral": 0.8749107718467712,
            "Positive": 0.010174600407481194
          }
        },
        "language": {
          "language": "en",
          "confidence": 0.9737588763237
        }
      }
    }
  ]
}

If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.
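
As an illustration of working with the response above, the following sketch collects the entities whose confidence meets a threshold. The function name entities_above is hypothetical, and the handling of a missing or null nlp_properties value is an assumption about what a masked response looks like.

```python
def entities_above(caption_response, min_confidence):
    """Collect (start_at, entity text, entity type) for confident entities."""
    found = []
    for line in caption_response.get("caption", []):
        # Assumption: nlp_properties may be missing or null, e.g. with mask=nlp.
        nlp = line.get("nlp_properties") or {}
        for entity in nlp.get("entities") or []:
            if entity["confidence"] >= min_confidence:
                found.append((line["start_at"], entity["text"], entity["type"]))
    return found


sample = {
    "caption": [
        {
            "start_at": 0.03,
            "end_at": 4.92,
            "text": "Mr. Jones will speak now",
            "nlp_properties": {
                "entities": [
                    {"text": "Mr. Jones", "confidence": 0.9995918273925781, "type": "person"}
                ]
            },
        }
    ]
}
print(entities_above(sample, 0.9))
```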

Item Descriptions

Get descriptions for an item

GET /api/data/v3/items/{id}/descriptions?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of description data being present or not.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "description": {
                "id": "b2fe0c8e0ee298eb07c3c5dce457c907",
                "item_id": "fe154badd7b2d78349c214938f27547c",
                "confidence": 0.443617173003186,
                "language": {
                    "language": "en-US",
                    "confidence": 0.8
                },
                "text": "a drawing of a face"
            }
        },
        "pages": null,
        "video_frames": null
    },
    "next_page": ""
}

For an asset that is a video:

{
    "contents": {
        "img": null,
        "pages": null,
        "video_frames": [
            {
                "description": {
                    "id": "d4acd831bf8d56e6a6e1fcb054228f29",
                    "item_id": "dddcbe782d810eb80f531361b5799a53",
                    "confidence": 0.7532465814725752,
                    "language": {
                        "code": "en",
                        "confidence": 0.96
                    },
                    "text": "a close up of a person"
                },
                "frame_id": "59ad92a0ed0d873de84d2cc2bd080898",
                "thumbnail_path": "video_main_frames/frame-0000000000.jpg",
                "time": 0
            },
            {
                "description": {
                    "id": "8a7f430ebb24d621021e2014ee05c6eb",
                    "item_id": "dddcbe782d810eb80f531361b5799a53",
                    "confidence": 0.2997128373277878,
                    "language": null,
                    "text": "a close up of a man with smoke coming out of it"
                },
                "frame_id": "677deeef7865c9e1b0bb497164aeca50",
                "thumbnail_path": "video_main_frames/frame-0000000001.jpg",
                "time": 2
            },
            {
                "description": {
                    "id": "ff31ffc38b95e420dfca366ef02b550d",
                    "item_id": "dddcbe782d810eb80f531361b5799a53",
                    "confidence": 0.8331186858981754,
                    "language": null,
                    "text": "a blurry image of smoke"
                },
                "frame_id": "b767a35a35e5a873a109f2d3b4df5ec2",
                "thumbnail_path": "video_main_frames/frame-0000000002.jpg",
                "time": 4
            }
        ]
    },
    "next_page": "NextPageTokenString"
}

For an asset that is a document:

{
    "contents": {
        "img": null,
        "pages": [
            {
                "images": [
                    {
                        "description": {
                            "id": "4c51651ee15c7ce414ae381bdc252622",
                            "item_id": "0ca4a8e17b66d3946f611621936896c1",
                            "confidence": 0.7192287180775676,
                            "language": {
                                "code": "en",
                                "confidence": 0.86
                            },
                            "text": "a man standing in front of a mirror posing for the camera"
                        },
                        "image_id": "24235782fb645ba35c6410617f8c3527",
                        "image_index": 0,
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ],
                "page": 0,
                "description": "optionally, a page can have a description as well, or it can be embedded in the images within the page",
                "thumbnail_path": "optionally, a page can have a thumbnail path, as well as each embedded image's thumbnail",
                "page_id": "a uuid to identify the page, optional, will be available when the all query param is set"
            },
            {
                "images": [
                    {
                        "description": {
                            "id": "62a0967c75af0b1f8ba65edd7b287929",
                            "item_id": "0ca4a8e17b66d3946f611621936896c1",
                            "confidence": 0.9312026737395315,
                            "language": null,
                            "text": "Robb Wells, John Paul Tremblay that are looking at the camera"
                        },
                        "image_id": "5fd88f4121c499aa04b9f77fa59e7788",
                        "image_index": 0,
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ],
                "page": 1
            }
        ],
        "video_frames": null
    },
    "next_page": "NextPageTokenString"
}

If start and window are provided, then the results may be paginated. The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned. If the item id is for an item that does not exist, then a 404 will be returned indicating the item is not found.
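
The paging behaviour described above can be sketched as a generator that follows next_page until it is empty. This is an illustration under assumptions: iter_description_pages and fetch_page are hypothetical names, and fetch_page stands for whatever caller-supplied function performs the GET request and returns the decoded JSON.

```python
def iter_description_pages(fetch_page, start=None, window=None):
    """Yield each page of results, following next_page until it is empty.

    `fetch_page` is a caller-supplied function that takes a dict of query
    parameters and returns the decoded JSON response as a dict.
    """
    params = {}
    if start is not None:
        params["start"] = start
    if window is not None:
        params["window"] = window
    while True:
        page = fetch_page(params)
        yield page
        token = page.get("next_page", "")
        if not token:
            # An empty token means there are no more results.
            return
        # The token alone is enough to request the following page.
        params = {"page-token": token}


def fake_fetch(params):
    # Stand-in for a real HTTP call, returning two canned pages.
    if "page-token" not in params:
        return {"contents": {"video_frames": []}, "next_page": "NextPageTokenString"}
    return {"contents": {"video_frames": []}, "next_page": ""}


print(len(list(iter_description_pages(fake_fetch, start=0, window=10))))
```

Injecting the fetch function keeps the paging logic independent of the HTTP client and makes it easy to exercise without a server.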

Curate a description for an item

To create a curated description you can post to the endpoint below with a valid request body.

POST /api/data/v3/items/{id}/descriptions
{
	"segment_index": float64,
	"image_index": int,
	"item_type": ENUM["image" | "video" | "document"],
	"text": string
}
  • item_type indicates the type of item
  • segment_index should be set to -1 if item is an image
  • image_index should be set to -1 if item is an image or a video
  • text must not be an empty string

Response

A successful call will return a Status Created (201) with the following response body, which includes the metadata associated with that segment/image index:

{
	"description": {
		"id": string,
		"item_id": string,
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string,
	}
}

If there is a conflict with the segment and/or image index, you’ll receive a Status Unprocessable Entity (422). If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.
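
The index rules for the request body can be applied on the client before posting. The sketch below is a hypothetical helper (curated_description_body is not part of the API) that forces the -1 values for images and videos and rejects empty text; silently overriding the indexes rather than raising an error is a design choice of this sketch, not something the source prescribes.

```python
import json

ITEM_TYPES = ("image", "video", "document")


def curated_description_body(item_type, text, segment_index=-1, image_index=-1):
    """Build the JSON body for curating a description, applying the index rules."""
    if item_type not in ITEM_TYPES:
        raise ValueError(f"item_type must be one of {ITEM_TYPES}")
    if not text:
        raise ValueError("text must not be an empty string")
    if item_type == "image":
        # For an image, both indexes are set to -1.
        segment_index, image_index = -1, -1
    elif item_type == "video":
        # For a video, only image_index is set to -1.
        image_index = -1
    return json.dumps({
        "segment_index": float(segment_index),
        "image_index": int(image_index),
        "item_type": item_type,
        "text": text,
    })


print(curated_description_body("video", "a close up of a person", segment_index=2))
```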

Edit a description

To edit an existing description's text, use the following request:

PATCH /api/data/v3/items/{id}/descriptions/{desc_id}

{
	"text": "new description text"
}
  • id is the item id
  • desc_id is the description id
  • text must not be an empty string

Response

A successful call will return a Status OK (200) with the new description after updating.

{
	"description": {
		"id": string,
		"item_id": string,
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string,
	}
}

If the description by the desc_id is not found, you’ll receive a Status Not Found (404). If any unexpected errors happened in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.

Item OCRs

Get ocrs for an item

GET /api/data/v3/items/{id}/ocrs?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token provided to page the results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of ocr data being present or not.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "ocrs": [
                {
                    "id": "0a4877c31bd094a32a124a1c5571f751",
                    "item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
                    "bounding_box": {
                        "top": 441,
                        "left": 220,
                        "width": 52,
                        "height": 16
                    },
                    "confidence": 0,
                    "language": null,
                    "text": "adidas",
                    "text_type": "lines"
                },
                {
                    "id": "1afb7fa88d193d5d9ae766da04e1517f",
                    "item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
                    "bounding_box": {
                        "top": 462,
                        "left": 295,
                        "width": 86,
                        "height": 15
                    },
                    "confidence": 0,
                    "language": null,
                    "text": "SNALMUNCH 2012",
                    "text_type": "lines"
                },
                {
                    "id": "f4ee60381a3cd69a0787848b71255b62",
                    "item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
                    "bounding_box": {
                        "top": 497,
                        "left": 222,
                        "width": 243,
                        "height": 53
                    },
                    "confidence": 0,
                    "language": null,
                    "text": "SAMSUNG",
                    "text_type": "lines"
                }
            ]
        },
        "pages": null,
        "video_frames": null
    },
    "next_page": ""
}

For an asset that is a video:

{
    "contents": {
        "img": null,
        "pages": null,
        "video_frames": [
            {
                "frame_id": "5248fd3acceab63163a5bcc5ddc15d62",
                "ocrs": [
                    {
                        "id": "e45aa09fd7d6121e474459e59e8a7d4c",
                        "item_id": "3b103330378acb3ad604045ba1f4aecd",
                        "bounding_box": {
                            "top": 12,
                            "left": 14,
                            "width": 169,
                            "height": 11
                        },
                        "confidence": 0.87,
                        "language": null,
                        "text": "HIT THAT LIKE BUTTON, NATION!",
                        "text_type": "lines"
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000001.jpg",
                "time": 2.002
            },
            {
                "frame_id": "c5501c083166450e9885782aac29fad8",
                "ocrs": [
                    {
                        "id": "7a5d590df38aa2494a1a854640ea6b47",
                        "item_id": "3b103330378acb3ad604045ba1f4aecd",
                        "bounding_box": {
                            "top": 19,
                            "left": 249,
                            "width": 33,
                            "height": 16
                        },
                        "confidence": 0,
                        "language": null,
                        "text": "BEA",
                        "text_type": "lines"
                    },
                    {
                        "id": "8485e7771e8f0ecdc871cf8856554be1",
                        "item_id": "3b103330378acb3ad604045ba1f4aecd",
                        "bounding_box": {
                            "top": 12,
                            "left": 14,
                            "width": 169,
                            "height": 11
                        },
                        "confidence": 0,
                        "language": null,
                        "text": "HIT THAT LIKE BUTTON, NATION!",
                        "text_type": "lines"
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000002.jpg",
                "time": 4.004
            }
        ]
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAKP-CAwMDYWxsAAVzdGFydA0tMTExMTEwMC45OTk5BndpbmRvdwIxMAA="
}

For an asset that is a document:

{
    "contents": {
        "img": null,
        "pages": [
            {
                "images": [
                    {
                        "image_id": "daea649e947b430acceccc089f653c3f",
                        "image_index": 0,
                        "ocrs": [
                            {
                                "id": "095a5ca47bda4e0a9db562e23070125c",
                                "item_id": "43b89caeb2fd3c23f25b8431e644cabc",
                                "bounding_box": {
                                    "top": 15,
                                    "left": 522,
                                    "width": 217,
                                    "height": 24
                                },
                                "confidence": 0,
                                "language": null,
                                "text": "Pilgrim Programming, LLC",
                                "text_type": "lines"
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ],
                "page": 0,
                "ocrs": "optionally, a page can have OCR as well, or it can be embedded in the images within the page",
                "thumbnail_path": "optionally, a page can have a thumbnail path, as well as each embedded image's thumbnail",
                "page_id": "a uuid to identify the page, optional, will be available when the all query param is set"
            },
            {
                "images": [
                    {
                        "image_id": "18061410280b001f26f34890d0da6b04",
                        "image_index": 0,
                        "ocrs": [
                            {
                                "id": "2b6d00e3d4d014267a011d64d50144ff",
                                "item_id": "43b89caeb2fd3c23f25b8431e644cabc",
                                "bounding_box": {
                                    "top": 1194,
                                    "left": 270,
                                    "width": 817,
                                    "height": 23
                                },
                                "confidence": 0,
                                "language": null,
                                "text": "There are no liens , claims or encumbrances which might conflict with or otherwise affect",
                                "text_type": "lines"
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ],
                "page": 1
            }
        ],
        "video_frames": null
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}

If start and window are provided, then the results may be paginated. The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned. If the item id is for an item that does not exist, then a 404 will be returned indicating the item is not found.
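
Because the three response shapes differ only in where the ocrs arrays sit, they can be flattened with one function. This is a sketch: the name ocr_texts is hypothetical, and the shape of the optional page-level ocrs value is not specified above, so the code only reads it when it is a list of objects shaped like the other OCR entries.

```python
def ocr_texts(response):
    """Return every OCR text in a response, whatever the asset type."""
    contents = response.get("contents") or {}
    texts = []
    # Image assets: a single img object holding an ocrs array.
    img = contents.get("img") or {}
    texts += [o["text"] for o in img.get("ocrs") or []]
    # Video assets: one ocrs array per frame.
    for frame in contents.get("video_frames") or []:
        texts += [o["text"] for o in frame.get("ocrs") or []]
    # Document assets: ocrs arrays inside each page's images.
    for page in contents.get("pages") or []:
        page_ocrs = page.get("ocrs")
        # Assumption: page-level OCR, when present, is a list of OCR objects.
        if isinstance(page_ocrs, list):
            texts += [o["text"] for o in page_ocrs]
        for image in page.get("images") or []:
            texts += [o["text"] for o in image.get("ocrs") or []]
    return texts


sample = {
    "contents": {
        "img": {"ocrs": [{"text": "adidas"}, {"text": "SAMSUNG"}]},
        "pages": None,
        "video_frames": None,
    },
    "next_page": "",
}
print(ocr_texts(sample))
```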

Curate an ocr for an item

To create a curated ocr you can post to the endpoint below with a valid request body.

POST /api/data/v3/items/{id}/ocrs
{
	"segment_index": float64,
	"image_index": int,
	"item_type": ENUM["image" | "video" | "document"],
	"text": string
}
  • item_type indicates the type of item
  • segment_index should be set to -1 if item is an image
  • image_index should be set to -1 if item is an image or a video
  • text must not be an empty string

Response

A successful call will return a Status Created (201) with the following response body, which includes the metadata associated with that segment/image index:

{
	"ocr": {
		"id": string,
		"item_id": string,
		"bounding_box": {
			"top": int,
			"left": int,
			"width": int,
			"height": int
		},
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string,
		"text_type": string,
	}
}

If there is a conflict with the segment and/or image index, you’ll receive a Status Unprocessable Entity (422). If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.

Edit an ocr

To edit an existing ocr's text, use the following request:

PATCH /api/data/v3/items/{id}/ocrs/{ocr_id}
{
	"text": "new ocr text"
}
  • id is the item id
  • ocr_id is the ocr id
  • text must not be an empty string

Response

A successful call will return a Status OK (200) with the new ocr after updating.

{
	"ocr": {
		"id": string,
		"item_id": string,
		"bounding_box": {
			"top": int,
			"left": int,
			"width": int,
			"height": int
		},
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string,
		"text_type": string,
	}
}

If the ocr by the ocr_id is not found, you’ll receive a Status Not Found (404). If any unexpected errors happened in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.

Item Speech to Texts

Get speech to texts for an item

GET /api/data/v3/items/{id}/speech-to-texts?page-token={page_token}&limit={limit}&offset={offset}&mask={mask}
  • page_token is the next page token provided to page the results.
  • limit is the limit to the number of transcripts returned. Limit has a maximum value of 1000 and a minimum of 1. Default of 100 is applied when no limit was set.
  • offset is the offset to start from for the records to be returned. Defaults to 0 when not set.
  • mask allows you to mask the embedded nlp data for a STT, which may return results faster. Set mask=nlp to omit nlp data from the response.

Response

A successful call will return a Status OK (200) with the following response body:

{
	"speech_to_texts":  [
	 	{
	 		"id": "transcript id",
	 		"item_id": "item id",
	 		"start_at": 1.3,
	 		"end_at": 4.1,
	 		"text": "transcript text",
	 		"language": "lang-en",
	 		"lanaguage_confidence": 0.92,
	 		"created_at": zulu timestamp,
	 		"updated_at": zulu timestamp,
	 		"nlp_properties": {
	 			"entities": [
	 				{
	 					"text": "entity text",
	 					"confidence": 0.64,
	 					"type": "entity type"
	 				}
	 			],
	 			"key_phrases": [
	 				{
	 					"text": "phrase text",
	 					"confidence": 0.7
	 				}
	 			] ,
	 			"sentiment": {
	 				"label": "sentiment label",
	 				"sentiment_confidence": {
	 					"negative": 0.3,
	 					"mixed": 0.11,
	 					various other keys available here, depends on the service provider that runs the data
	 				}
	 			},
	 			"language": {
	 			
	 			}
	 		}
	 	}
	],
	"next_page_token": "mxzybiji"
}

The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If any unexpected errors happen in the process of fulfilling this req/response cycle a Status Internal Server Error (500) will be returned.
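
The query parameters for this endpoint, including the documented limit range of 1 to 1000, can be validated before a request is made. The helper below is a sketch with a hypothetical name (speech_to_texts_query); rejecting a negative offset is an assumption, since the source only gives the default of 0.

```python
from urllib.parse import urlencode


def speech_to_texts_query(page_token=None, limit=None, offset=None, mask_nlp=False):
    """Build the query string for the speech-to-texts endpoint."""
    params = {}
    if page_token:
        params["page-token"] = page_token
    if limit is not None:
        # Documented bounds: minimum 1, maximum 1000 (server default is 100).
        if not 1 <= limit <= 1000:
            raise ValueError("limit must be between 1 and 1000")
        params["limit"] = limit
    if offset is not None:
        # Assumption: a negative offset is never meaningful.
        if offset < 0:
            raise ValueError("offset must not be negative")
        params["offset"] = offset
    if mask_nlp:
        params["mask"] = "nlp"
    return urlencode(params)


print(speech_to_texts_query(limit=50, offset=100, mask_nlp=True))
```

Leaving limit and offset unset produces no parameter at all, so the server-side defaults described above apply.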

Item Logos

Get logos for an item

GET /api/data/v3/items/{id}/logos?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of logo data being present or not.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "logos": [
                {
                    "id": "8abdb55f697f0a28a8264f3f0a320d09",
                    "confidence": 0.851,
                    "name": "Adidas",
                    "bounding_box": {
                        "top": 1083,
                        "left": 236,
                        "width": 62,
                        "height": 88
                    }
                }
            ]
        }
    },
    "next_page_token": ""
}

For an asset that is a video:

{
    "contents": {
        "video_frames": [
            {
                "time": 4.004,
                "frame_id": "adb7e0d8787d2da396ee6e79ab80b0b0",
                "thumbnail": "video_main_frames/frame-0000000002.jpg",
                "logos": [
                    {
                        "id": "cb76d5633637c0d060026adaf87a2804",
                        "confidence": 0.8071,
                        "name": "eastern connecticut state university",
                        "bounding_box": {
                            "top": 7,
                            "left": 6,
                            "width": 180,
                            "height": 21
                        }
                    }
                ]
            },
            {
                "time": 8.008,
                "frame_id": "8e9ebfb5b7fa100a636fe1aeef01ea5f",
                "thumbnail": "video_main_frames/frame-0000000004.jpg",
                "logos": [
                    {
                        "id": "a7be4458334a717525a902ce4df2f358",
                        "confidence": 0.81195,
                        "name": "eastern connecticut state university",
                        "bounding_box": {
                            "top": 7,
                            "left": 5,
                            "width": 182,
                            "height": 21
                        }
                    }
                ]
            }
        ]
    },
    "next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}

For an asset that is a document:

{
    "contents": {
        "pages": [
            {
                "page": 22,
                "images": [
                    {
                        "image_index": 11,
                        "image_id": "7f6d4b027ba6cf303a3cd4108e99b866",
                        "thumbnail_path": "document_pages/thumb-pg-00022-img-00011.png",
                        "logos": [
                            {
                                "id": "72157a50ea72d788ca102171639a3f45",
                                "confidence": 0.8273,
                                "name": "misako",
                                "bounding_box": {
                                    "top": 0,
                                    "left": 306,
                                    "width": 1079,
                                    "height": 782
                                }
                            }
                        ]
                    }
                ]
            },
            {
                "page": 34,
                "images": [
                    {
                        "image_index": 0,
                        "image_id": "ebf1c28240735ad2b63a348a9f6671ee",
                        "thumbnail_path": "document_pages/thumb-pg-00034-img-00000.png",
                        "logos": [
                            {
                                "id": "628cd34b9b13f7b5a16d19d62923ce47",
                                "confidence": 0.81385,
                                "name": "colgate",
                                "bounding_box": {
                                    "top": 436,
                                    "left": 375,
                                    "width": 172,
                                    "height": 151
                                }
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}

The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.
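The three response shapes above nest logos at different depths. The following is a minimal sketch, not an official client, of one way to walk all three shapes with a single helper. The function name `iter_logos` and the location labels it produces are illustrative assumptions; only the JSON field names come from the examples above.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_logos(body: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (location, logo) pairs from any of the three response shapes."""
    contents = body.get("contents") or {}

    # Image assets: logos sit directly under contents.img.
    img = contents.get("img") or {}
    for logo in img.get("logos") or []:
        yield "img", logo

    # Video assets: one entry per frame, each with its own logos list.
    for frame in contents.get("video_frames") or []:
        for logo in frame.get("logos") or []:
            yield f"frame@{frame.get('time')}s", logo

    # Document assets: pages contain images, and images contain logos.
    for page in contents.get("pages") or []:
        for image in page.get("images") or []:
            for logo in image.get("logos") or []:
                where = f"page {page.get('page')} image {image.get('image_index')}"
                yield where, logo


# Trimmed copy of the video example above.
sample = {
    "contents": {
        "video_frames": [
            {
                "time": 4.004,
                "logos": [
                    {
                        "name": "eastern connecticut state university",
                        "confidence": 0.8071,
                    }
                ],
            }
        ]
    },
    "next_page_token": "",
}

for where, logo in iter_logos(sample):
    print(where, logo["name"], logo["confidence"])
```

Treating missing or null keys as empty keeps the helper usable for whichever asset type the item turns out to be.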

Item Tags

Get tags for an item

GET /api/data/v3/items/{id}/tags?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of tag data being present or not.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.
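As a rough illustration of how the query parameters combine, the sketch below builds the request URL and drops the other parameters when a page token is supplied, mirroring the precedence rule above. `BASE_URL` is a hypothetical host, `tags_url` is an illustrative helper name, and the exact value format accepted for `all` is not specified here, so the caller passes it through as-is.

```python
from typing import Optional, Union
from urllib.parse import quote, urlencode

BASE_URL = "https://graymeta.example.com"  # hypothetical host, replace with your own


def tags_url(
    item_id: str,
    page_token: Optional[str] = None,
    start: Optional[Union[int, float]] = None,
    window: Optional[Union[int, float]] = None,
    all_entries: Optional[str] = None,
) -> str:
    """Build the GET URL for an item's tags."""
    path = f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}/tags"
    if page_token:
        # A page token takes precedence, so the other params are not sent.
        params = {"page-token": page_token}
    else:
        candidates = {"start": start, "window": window, "all": all_entries}
        params = {k: v for k, v in candidates.items() if v is not None}
    return f"{path}?{urlencode(params)}" if params else path


print(tags_url("abc", start=0, window=16))
print(tags_url("abc", page_token="NP-x", start=5))
```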

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "tags": [
                {
                    "id": "eec2accad1cc1b757f3035bd8253ac04",
                    "text": "window",
                    "confidence": 0.9015815854072571
                },
                {
                    "id": "e95fc00150dbdf08ec3eb2e75c638ac1",
                    "text": "stained glass",
                    "confidence": 0.9015815854072571
                },
                {
                    "id": "c06ee06a066475e692b8b56433ab371a",
                    "text": "light",
                    "confidence": 0.8380934019465514
                },
                {
                    "id": "7cfb5e8e12cdfaeb10086a5a9f693786",
                    "text": "sphere",
                    "confidence": 0.5156272603992966
                },
                {
                    "id": "ef689cbeb566845b46be786438cb8782",
                    "text": "church",
                    "confidence": 0.25103029243584873
                }
            ]
        },
        "pages": null,
        "video_frames": null
    },
    "next_page": ""
}

For an asset that is a video:

{
    "contents": {
        "img": null,
        "pages": null,
        "video_frames": [
            {
                "frame_id": "3a222b4f2e58529ce0dc321cf299dadb",
                "tags": [
                    {
                        "id": "b12f152bda07508c9a482d74fef7dac3",
                        "text": "summer",
                        "confidence": 0.312589002008962
                    },
                    {
                        "id": "a090dafce32110bde870b89055e75939",
                        "text": "autumn",
                        "confidence": 0.20671001655433419
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000000.jpg",
                "time": 0
            },
            {
                "frame_id": "6decb0045799aef3f95dbc6327a992fd",
                "tags": [
                    {
                        "id": "8153baacdde17515bb7d89aa09e10347",
                        "text": "firefighter",
                        "confidence": 0.9791649580001832
                    },
                    {
                        "id": "e5c846b2a75496779eb7c32b1f567a86",
                        "text": "person",
                        "confidence": 0.9791649580001831
                    },
                    {
                        "id": "6dcff09bdec2deb23d355d89e48a8f34",
                        "text": "smoke",
                        "confidence": 0.5069368303763367
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000001.jpg",
                "time": 2
            }
        ]
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}

For an asset that is a document:

{
    "contents": {
        "img": null,
        "pages": [
            {
                "images": [
                    {
                        "image_id": "6ae8b0cd1a26070884f08dac7336a2c2",
                        "image_index": 0,
                        "tags": [
                            {
                                "id": "e6d5e65e71d93bbada112bfbcb6a5ed0",
                                "text": "person",
                                "confidence": 0.9971379041671753
                            },
                            {
                                "id": "5a322b094c87b670ea71bf8eea7cc7ed",
                                "text": "man",
                                "confidence": 0.9918940663337708
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ],
                "page": 0,
                "tags": "optionally, a page can have tags as well, or they can be embedded in the images within the page as shown above",
                "thumbnail_path": "optionally, a page can have a thumbnail path in addition to the thumbnail of each embedded image above",
                "page_id": "a UUID to identify the page; optional, but guaranteed to be available when the all query param is set"
            },
            {
                "images": [
                    {
                        "image_id": "b604734057d3462894d6c1e75a8517b3",
                        "image_index": 0,
                        "tags": [
                            {
                                "id": "2ca9919a848d662a254315f34de006ca",
                                "text": "standing",
                                "confidence": 0.8040973544120789
                            },
                            {
                                "id": "807f196d29b0094d558479d50fddbb74",
                                "text": "crowd",
                                "confidence": 0.0062681203708052635
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ],
                "page": 1
            }
        ],
        "video_frames": null
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}

The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.
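The paging rule described above (keep requesting until the next page token is an empty string) can be sketched as a small generator. This is an illustrative sketch under stated assumptions: `iter_pages` and `fake_fetch` are hypothetical names, and the HTTP call is abstracted behind a caller-supplied function so that no particular client library or authentication scheme is implied. The tag examples name the token field `next_page`, so that key is checked first, with `next_page_token` as a fallback.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield the "contents" object of every page until the token is empty.

    fetch_page receives None for the first request and the previous
    response's token afterwards, and returns the decoded JSON body.
    """
    token: Optional[str] = None
    while True:
        body = fetch_page(token)
        yield body.get("contents") or {}
        token = body.get("next_page") or body.get("next_page_token") or ""
        if not token:
            break  # an empty token means there are no more results


# Canned responses standing in for the real endpoint.
_fake_pages = {
    None: {"contents": {"video_frames": [{"time": 0}]}, "next_page": "NP-1"},
    "NP-1": {"contents": {"video_frames": [{"time": 2}]}, "next_page": ""},
}


def fake_fetch(token: Optional[str]) -> Dict[str, Any]:
    return _fake_pages[token]


times = [
    frame["time"]
    for contents in iter_pages(fake_fetch)
    for frame in contents["video_frames"]
]
print(times)
```

In real use, `fetch_page` would issue the GET request with `page-token` set to the supplied token and return the parsed body.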

Add tags to an item

POST /api/data/v3/items/{id}/tags

Request body:

{
	"metadata_id": "metadata UUID",
	"tags": ["tag1", "tag2","tag3"]
}
  • "tags" is a list of tags you would like to add. All tags will be deduplicated before being applied to the segment.

Response

A successful call will return a Status Created (201) with the metadata segment (or segment parent) in its new state, including any tags that were added. The body looks identical to the get tags response, except that it only includes the edited img/time-frame/page data.
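A minimal sketch of assembling this request with the Python standard library follows. `BASE_URL` is a hypothetical host, `build_add_tags_request` is an illustrative name, and the `Content-Type` header is an assumption rather than a documented requirement; authentication is deliberately left out. The server deduplicates tags itself, so the client-side deduplication is only a convenience.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://graymeta.example.com"  # hypothetical host, replace with your own


def build_add_tags_request(
    item_id: str, metadata_id: str, tags: List[str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request that adds tags to a segment."""
    # dict.fromkeys drops repeats while keeping the original order.
    unique_tags = list(dict.fromkeys(tags))
    payload = json.dumps({"metadata_id": metadata_id, "tags": unique_tags})
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/v3/items/{item_id}/tags",
        data=payload.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed, not documented
    )


req = build_add_tags_request("item-123", "metadata UUID", ["tag1", "tag2", "tag1"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; a 201 status indicates the tags were applied.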

Delete tags from a segment

DELETE /api/data/v3/items/{id}/tags?metaID={meta_id}&tagName={tag_name}
  • meta_id is the segment or image UUID. If an image metadata ID is provided, the tag is removed from all sibling images under that segment.
  • tag_name is the tag to be deleted. Only one tag can be deleted per request.

Response

A successful call will return a Status No Content (204) with no response body.
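Because tag names can contain spaces (for example "stained glass"), the query string should be percent-encoded. The sketch below shows one way to build the request; `BASE_URL` and `build_delete_tag_request` are hypothetical names, and the request is built but not sent.

```python
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://graymeta.example.com"  # hypothetical host, replace with your own


def build_delete_tag_request(
    item_id: str, meta_id: str, tag_name: str
) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for a single tag."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"metaID": meta_id, "tagName": tag_name}, quote_via=quote)
    url = f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}/tags?{query}"
    return urllib.request.Request(url=url, method="DELETE")


delete_req = build_delete_tag_request(
    "item-123", "6ae8b0cd1a26070884f08dac7336a2c2", "stained glass"
)
print(delete_req.get_method(), delete_req.full_url)
```

Since only one tag can be deleted per request, removing several tags means building one request per tag name.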

Get contents for a document item

GET /api/data/v3/items/{id}/text-contents?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token provided to page the results. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of text content data being present or not.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is a document:

{
    "contents": {
        "pages": [
            {
                "page": 0,
                "page_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
                "thumbnail_path": "document_pages/thumb-pg-00000.png",
                "text_content": {
                    "id": "65a5d28b38228ebce12f8bab67e0f386",
                    "metadatas_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
                    "text": "This is an example pdf\n\n\f",
                    "language": {
                    	"code": "en-US",
                    	"confidence": 0.7
                    }
                }
            },
            {
                "page": 1,
                "page_id": "1bf644159b1b1c1e3025df76f7c66110",
                "thumbnail_path": "document_pages/thumb-pg-00001.png",
                "text_content": {
                    "id": "62773d56d314a578f746a080460195a7",
                    "metadatas_id": "1bf644159b1b1c1e3025df76f7c66110",
                    "text": "This is page 2 of the example pdf\n\n\f",
                    "language": null
                }
            }
        ]
    },
    "next_page": ""
}

If start and window are provided, then the results may be paginated. The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned. If the item ID refers to an item that does not exist, a Status Not Found (404) will be returned.
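As an illustration of consuming this response, the sketch below joins the per-page text into one string and collects the detected language per page, allowing for a null `language` as in the second page of the example. The helper names `document_text` and `page_languages` are assumptions made for this example.

```python
from typing import Any, Dict, Optional


def document_text(body: Dict[str, Any], separator: str = "\n") -> str:
    """Join the text of every page, in page order."""
    pages = (body.get("contents") or {}).get("pages") or []
    parts = []
    for page in sorted(pages, key=lambda p: p.get("page", 0)):
        content = page.get("text_content") or {}
        # strip() also removes the trailing newlines and form feed (\f).
        parts.append((content.get("text") or "").strip())
    return separator.join(parts)


def page_languages(body: Dict[str, Any]) -> Dict[int, Optional[str]]:
    """Map page number to language code, or None when no language is set."""
    pages = (body.get("contents") or {}).get("pages") or []
    result: Dict[int, Optional[str]] = {}
    for page in pages:
        language = (page.get("text_content") or {}).get("language")
        result[page["page"]] = language["code"] if language else None
    return result


# Trimmed copy of the example response above.
text_sample = {
    "contents": {
        "pages": [
            {
                "page": 0,
                "text_content": {
                    "text": "This is an example pdf\n\n\f",
                    "language": {"code": "en-US", "confidence": 0.7},
                },
            },
            {
                "page": 1,
                "text_content": {
                    "text": "This is page 2 of the example pdf\n\n\f",
                    "language": None,
                },
            },
        ]
    },
    "next_page": "",
}

print(document_text(text_sample))
print(page_languages(text_sample))
```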

Item Extractor Runs

Get extractor runs for an item

GET /api/data/v3/items/{id}/extractors?run-type={run-type}
  • run-type is an enum with one of the following values:
      • all: returns all the extractors run for an item in its lifetime
      • latest: returns the extractors from the last run
      • erroneous: returns the extractors that have errors registered from their last run
      • unique: returns a list of unique extractors from the history of the item; the latest extractor run for each extractor type is returned

Response

A successful call will return a Status OK (200) with the following response body:

{
	"extractors_runtime": [
		{
			"request_id": String,
			"err": String,
			"success": Bool,
			"skipped": Bool,
			"runtime": Integer duration in nanoseconds,
			"start_at": Zulu Timestamp,
			"end_at": Zulu Timestamp,
			"info": {
				"name": String,
				"version": Integer
			} 
		}
		...
	]
}

If the item is not found, a Status Not Found (404) will be returned. If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.
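The sketch below shows one way to summarise such a response: it lists the runs that neither succeeded nor were skipped and converts `runtime` from nanoseconds to seconds. The helper name `failed_runs`, the extractor names, and all values in the sample data are invented for illustration; only the field names come from the schema above.

```python
from typing import Any, Dict, List


def failed_runs(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return a short summary for every run that failed."""
    failures = []
    for run in body.get("extractors_runtime") or []:
        if run.get("success") or run.get("skipped"):
            continue
        info = run.get("info") or {}
        failures.append(
            {
                "name": info.get("name"),
                "version": info.get("version"),
                "err": run.get("err"),
                # runtime is an integer duration in nanoseconds.
                "seconds": run.get("runtime", 0) / 1e9,
            }
        )
    return failures


# Made-up sample data in the documented shape.
runs_sample = {
    "extractors_runtime": [
        {
            "request_id": "req-1",
            "err": "",
            "success": True,
            "skipped": False,
            "runtime": 250_000_000,
            "info": {"name": "example_extractor_a", "version": 1},
        },
        {
            "request_id": "req-1",
            "err": "timeout",
            "success": False,
            "skipped": False,
            "runtime": 1_500_000_000,
            "info": {"name": "example_extractor_b", "version": 2},
        },
    ]
}

print(failed_runs(runs_sample))
```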

Extractor Source Files

Item Amazon Transcribe Source Word file Download

To download the source words for the Amazon Transcribe transcriptions, use the following endpoint.

GET /api/files/{item_id}/sourcefiles/amazon_transcribe.json

Response

The response is a list of the words and punctuation that make up the dictation. The example below illustrates what each type looks like in a source file.

{
    "words": [
        {
            "start_time": "0",
            "end_time": "0",
            "type": "punctuation",
            "alternatives": [
                {
                    "confidence": "0",
                    "content": "."
                }
            ]
        },
        {
            "start_time": "0.28",
            "end_time": "0.34",
            "type": "pronunciation",
            "alternatives": [
                {
                    "confidence": "0.215",
                    "content": "Yeah"
                }
            ]
        },
        ...
    ]
}
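As a rough sketch of how a source file like this could be turned back into readable text, the helper below picks the highest-confidence alternative for each entry and attaches punctuation to the preceding word. Note that the numeric values are encoded as strings in the file, so they are converted with `float` before comparison. The name `transcript_text` and the extra sample entries are assumptions made for this example.

```python
from typing import Any, Dict, List


def transcript_text(source: Dict[str, Any]) -> str:
    """Rebuild plain text from the "words" list of a source file."""
    pieces: List[str] = []
    for word in source.get("words") or []:
        alternatives = word.get("alternatives") or []
        if not alternatives:
            continue
        # Confidence values are strings in the file, so compare as floats.
        best = max(alternatives, key=lambda a: float(a.get("confidence", "0")))
        content = best.get("content", "")
        if word.get("type") == "punctuation" and pieces:
            pieces[-1] += content  # no space before punctuation
        else:
            pieces.append(content)
    return " ".join(pieces)


# Illustrative data in the documented shape.
words_sample = {
    "words": [
        {
            "start_time": "0.28",
            "end_time": "0.34",
            "type": "pronunciation",
            "alternatives": [
                {"confidence": "0.1", "content": "Yes"},
                {"confidence": "0.215", "content": "Yeah"},
            ],
        },
        {
            "start_time": "0",
            "end_time": "0",
            "type": "punctuation",
            "alternatives": [{"confidence": "0", "content": "."}],
        },
        {
            "start_time": "0.5",
            "end_time": "0.9",
            "type": "pronunciation",
            "alternatives": [{"confidence": "0.9", "content": "Okay"}],
        },
    ]
}

print(transcript_text(words_sample))
```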

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.