Items API

Once harvesting has begun, you will be able to access the item data using the Items API.

Item Object

The data that makes up an Item is outlined in the Item Object documentation.

Getting item without metadata

GET /api/data/v3/items/{id}
  • {id} - (string) ID of the Item to get

Response

{
    "item": {
        "id": "f66082c13a7f3a10ebf89405433bb80f",
        "location_id": "5c7ec14434f90e13219e3ece821dce55",
        "container_id": "da971705615c7d57583a57fa170ab696",
        "file_size": 15327,
        "etag": "cff1e0c414d74cbcc436e1502f61cc8f",
        "file_extension": "jpg",
        "path": "",
        "file_path": "",
        "folder_path": "",
        "gm_item_type": "video",
        "hash_c4id": "c42YSyLSKjGuuqoUQPiM4qHeYs5CBNhmt5DDoAbzdxxEprrQwD6YmAi3FurRK9tS2kgj5msCq8rUqk95YWeAqwB7CM",
        "hash_md5": "cff1e0c414d74cbcc436e1502f61cc8f",
        "hash_sha1": "06fdba756542993f0070f4a863eceeb1bcefda33",
        "hash_sha512": "630fa3947ed13bb6cdc4a303dba65eb4edfbf66b7ba2670bbb14a744419e0ab101106d1a5a901f1e0d4f876142585b316f1e1fb461db9758500e2e26b677bc85",
        "harvester_version": "2.0.3134",
        "last_harvested": "2019-06-05T19:46:30.176133Z",
        "last_modified": "2019-04-16T13:44:34Z",
        "location_kind": "local",
        "location_name": "local",
        "mime_type": "video/mp4",
        "mime_category": "video",
        "name": "tears.mp4",
        "parent_id": "",
        "root_id": "0eed8099520a60a2bd3701655f0fbe81",
        "segment_interval": 2,
        "shared_link": "",
        "stow_container_id": "/data/videos",
        "stow_container_name": "",
        "stow_url": "s3://https://s3-us-west-2.amazonaws.com/3item/black_kid.jpg",
        "thumbnail": {
            "path": "thumbnailer/sprite.jpg",
            "type": "sprite",
            "frame_count": 30,
            "height": 152,
            "width": 270
        },
        "stow_metadata": [
            {
                "name": "mtime",
                "value": "2019-07-02T21:57:41Z"
            },
            {
                "name": "mode",
                "value": "644"
            },
            {
                "name": "name",
                "value": "Jeff_with_location.JPG"
            },
            ...
        ],
        "stow_tags": [],
        "drm": false,
        "created_at": "2019-06-05T19:46:30.266769Z",
        "updated_at": "2019-06-05T19:46:30.266769Z",
        "in_progress": false,
        "preview": {
            "path": "fb3b37ce2a06c3aba49e07c7eb87acae/video_previews/preview.mp4",
            "mime_type": "video/mp4"
        },
        "duration": 734167
    }
}

Status codes:

  • 200 (success)
  • 404 (item not found)
  • 500 (unexpected error)
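As an illustration, the request and the documented status codes could be wrapped as in the sketch below. The host in BASE_URL and the helper names are assumptions made for this example, not part of the API, and authentication is left out because it is not covered here.

```python
import json
import urllib.error
import urllib.request

# Hypothetical host; replace with the address of your own deployment.
BASE_URL = "https://graymeta.example.com"

# Status codes listed for this endpoint.
STATUS_MEANINGS = {
    200: "success",
    404: "item not found",
    500: "unexpected error",
}


def item_url(item_id: str) -> str:
    """Build the URL for GET /api/data/v3/items/{id}."""
    return f"{BASE_URL}/api/data/v3/items/{item_id}"


def describe_status(status: int) -> str:
    """Translate a status code into the description given in the list above."""
    return STATUS_MEANINGS.get(status, "undocumented status")


def get_item(item_id: str) -> dict:
    """Fetch an item and return the object stored under the "item" key."""
    try:
        with urllib.request.urlopen(item_url(item_id)) as resp:
            return json.load(resp)["item"]
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses such as 404 and 500.
        raise RuntimeError(f"{err.code}: {describe_status(err.code)}") from err
```

Calling get_item("f66082c13a7f3a10ebf89405433bb80f") would issue the request shown above and return the inner item object.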

Search Within Item

GET /api/data/v3/search/item/{item_id}?q={query}
  • {item_id} - (string) ID of the Item to search within
  • {query} - (string) the query string

Response

The response is an object of histograms/timelines, each consisting of the contiguous chunks where the query value shows up within the item. Fields that have no matches are not populated.

{
	"advertising": {
		"histogram": [
			{
				"start": 2,
				"end": 4
			},
			...
			{
				"start": 80,
				"end": 88
			}
		]
	},
	"audio_classification": {
		"histogram": [
			{
				"start": 0,
				"end": 4
			},
			...
			{
				"start": 60,
				"end": 72
			}
		]
	},
	"caption": {
		"histogram": [
			{
				"start": 0.03,
				"end": 4.92
			},
			...
			{
				"start": 207.355,
				"end": 211.436
			}
		]
	},
	"description": {
		"histogram": [
			{
				"start": 8,
				"end": 20
			},
			...
			{
				"start": 110,
				"end": 122
			}
		]
	},
	"location": {
		"histogram": [
			{
				"start": 10,
				"end": 14
			}
		]
	},
	"logo": {
		"histogram": [
			{
				"start": 20,
				"end": 34
			}
		]
	},
	"keyword": {
		"histogram": [
			{
				"start": 2,
				"end": 134
			}
		]
	},
	"mature_content": {
		"histogram": [
			{
				"start": 30,
				"end": 34
			}
		]
	},
	"ocr": {
		"histogram": [
			{
				"start": 16,
				"end": 46
			},
			...
			{
				"start": 112,
				"end": 122
			}
		]
	},
	"people": {
		"Kim Ryan": [
			{
				"start": 8,
				"end": 14
			},
			...
			{
				"start": 118,
				"end": 122
			}
		],
		"Kim Smith": [
			{
				"start": 68,
				"end": 72
			}
		]
	},
	"sound": {
		"histogram": [
			{
				"start": 16,
				"end": 46
			},
			...
			{
				"start": 112,
				"end": 122
			}
		]
	},
	"speech_to_text": {
		"histogram": [
			{
				"start": 0.1,
				"end": 50.53
			},
			...
			{
				"start": 110.32,
				"end": 130.08
			}
		]
	},
	"sport": {
		"histogram": [
			{
				"start": 100,
				"end": 164
			}
		]
	},
	"tag": {
		"histogram": [
			{
				"start": 64,
				"end": 66
			}
		]
	},
	"text_content": {
		"histogram": [
			{
				"start": 55,
				"end": 101
			}
		]
	}
}

A successful call will return a Status OK (200). If an unexpected error occurs, a Status Internal Server Error (500) will be returned.
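Because most fields nest their chunks under a histogram key while people nests them under each person's name, it can help to flatten the response before using it. The following is a minimal sketch under that reading of the sample above; iter_matches is a name invented for this example.

```python
from typing import Iterator, Tuple


def iter_matches(response: dict) -> Iterator[Tuple[str, str, float, float]]:
    """Yield (field, label, start, end) for every chunk in a search response.

    label is "histogram" for most fields and a name (for example
    "Kim Ryan") for fields keyed by label, such as people.
    """
    for field, timelines in response.items():
        for label, chunks in timelines.items():
            for chunk in chunks:
                yield field, label, chunk["start"], chunk["end"]


# A trimmed-down response using values from the sample above.
sample = {
    "caption": {"histogram": [{"start": 0.03, "end": 4.92}]},
    "people": {
        "Kim Ryan": [{"start": 8, "end": 14}],
        "Kim Smith": [{"start": 68, "end": 72}],
    },
}

for match in iter_matches(sample):
    print(match)
```

Fields without matches are simply absent from the response, so the loop needs no special handling for them.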

Getting all metadata

To get all metadata of an item by its ID, you make the following request:

GET /api/data/items/{id}
  • {id} - (string) ID of the Item to get
  • Do not include the only parameter

The response will be a JSON document containing ALL metadata for the item.

Get the metadata.json file for an item

GET /files/{item_id}/metadata2.json

The response will be a JSON document containing the same data as the metadata.json file.

Selective data

The Items API allows you to be selective about what data you get. You may:

  • Use the only parameter to get specific leaf fields, or
  • Use the include parameter to specify root objects to include

You cannot use both only and include parameters at the same time.

Getting specific leaf fields

To get only a list of specific fields, you may specify them using the only parameter.

GET /api/data/items/{id}?only=field1,field2,field3
  • field1,field2,field3 - (comma separated list) List of fields to include
  • Only leaf fields are supported, so you must know the full path to the fields to get
  • If you need to get entire groups of data, consider using the include parameter
  • This endpoint is extremely efficient and is preferred over include
  • For a complete list of acceptable fields, perform a GET request without any parameters to see the entire data payload

Getting specific groups of data

You can get groups of data at a time using the include parameter.

GET /api/data/items/{id}?include=obj1,obj2,obj3
  • obj1,obj2,obj3 - (comma separated list) List of root objects to include
  • Only root fields are supported. If you wish to get only fields from within objects, use the only parameter instead
  • This endpoint is not as efficient as using the only parameter but is more convenient
  • For a complete list of acceptable fields, see the Item Object documentation
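The two selective parameters can be sketched as a single path builder that refuses to combine them. The function name and the field names used in the usage note are placeholders for this example; real field names come from the full payload or the Item Object documentation.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def selective_item_path(
    item_id: str,
    only: Optional[Sequence[str]] = None,
    include: Optional[Sequence[str]] = None,
) -> str:
    """Build the request path for GET /api/data/items/{id} with only or include."""
    if only and include:
        # The API does not accept both parameters in one request.
        raise ValueError("only and include cannot be used at the same time")
    path = f"/api/data/items/{item_id}"
    if only:
        return f"{path}?only={','.join(quote(f, safe='') for f in only)}"
    if include:
        return f"{path}?include={','.join(quote(o, safe='') for o in include)}"
    # No parameters: the response contains all metadata for the item.
    return path
```

For example, selective_item_path("abc", only=["field1", "field2"]) produces /api/data/items/abc?only=field1,field2.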

Categories

An item can be associated with one or more categories. To associate an item with categories, use

POST /api/data/items/{id}/categories
{
	"categories": ["cat1", "cat2"]
}

To disassociate categories from an Item use

DELETE /api/data/items/{id}/categories/{categories}

where {categories} is a url-encoded, comma separated list of categories e.g. cat1,other%20category
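A short sketch of how the two category requests might be prepared. Each category is percent-encoded on its own and the results are joined with commas, which reproduces the cat1,other%20category example; the function names are invented for this illustration.

```python
import json
from typing import Sequence
from urllib.parse import quote


def add_categories_body(categories: Sequence[str]) -> str:
    """JSON body for POST /api/data/items/{id}/categories."""
    return json.dumps({"categories": list(categories)})


def delete_categories_path(item_id: str, categories: Sequence[str]) -> str:
    """Path for DELETE /api/data/items/{id}/categories/{categories}."""
    # Encode each name separately so the separating commas stay literal.
    encoded = ",".join(quote(name, safe="") for name in categories)
    return f"/api/data/items/{item_id}/categories/{encoded}"


print(delete_categories_path("abc123", ["cat1", "other category"]))
```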

Timelines

A timeline is a list of contiguous blocks of time where data is found. The response is separated by individual labels/identifiers.

Note: A Status Not Found (404) will be returned if {id} cannot be found.
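All timeline endpoints in this section share the same path shape, so a small helper can build them. The set below lists the path segments used by the endpoints documented in this section; the helper itself is a sketch for illustration, not part of the API.

```python
# Path segments of the timeline endpoints documented in this section.
TIMELINE_KINDS = frozenset({
    "technical-cues", "audio", "color-bars", "black-frames", "credits",
    "digital-slates", "mature-content", "locations", "logos", "slates",
    "sports", "silence", "start-end", "textless", "texted",
})


def timeline_path(item_id: str, kind: str) -> str:
    """Build the path for GET /api/data/v3/items/{id}/timeline/{kind}."""
    if kind not in TIMELINE_KINDS:
        raise ValueError(f"unknown timeline kind: {kind!r}")
    return f"/api/data/v3/items/{item_id}/timeline/{kind}"
```

Note that the path segments use hyphens (black-frames) while the keys in the response bodies use underscores (black_frames).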

Technical Cues

The technical cues endpoint is a wrapper for all the technical spanning metadata for a video.

GET /api/data/v3/items/{id}/timeline/technical-cues

Response

{
    "technical_cues": {
        "black_frames": {
            "histogram": [
                {
                    "start": 0,
                    "end": 0.3003
                },
                {
                    "start": 3.6036,
                    "end": 4.1041
                },
                {
                    "start": 224.257,
                    "end": 234.835
                }
            ]
        },
        "color_bars": {
            "histogram": [
                {
                    "start": 32,
                    "end": 54
                }
            ]
        },
        "credits": {
            "histogram": [
                {
                    "start": 2,
                    "end": 4
                },
                {
                    "start": 32,
                    "end": 54
                },
                {
                    "start": 70,
                    "end": 74
                },
                {
                    "start": 100,
                    "end": 104
                },
                {
                    "start": 194,
                    "end": 206
                },
                {
                    "start": 208,
                    "end": 210
                }
            ]
        },
        "digital_slates": {
            "histogram": [
                {
                    "start": 32,
                    "end": 54
                }
            ]
        },
        "silence": {
            "histogram": [
                {
                    "start": -0.0247619,
                    "end": 3.85451
                },
                {
                    "start": 224.373,
                    "end": 234.985
                }
            ]
        },
        "slates": {
            "all": [
                {
                    "start": 32,
                    "end": 34
                },
                {
                    "start": 40,
                    "end": 44
                },
                {
                    "start": 50,
                    "end": 52
                }
            ]
        },
        "start_end": {
            "histogram": [
                {
                    "start": 3.85451,
                    "end": 224.257
                }
            ]
        },
        "textless": {
            "histogram": []
        }
    }
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.
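One way to consume this response is to total the length of each cue. The sketch below sums end minus start over every list found under a cue, which covers both the histogram key and the all key used by slates; cue_totals is a name made up for this example, and the result is in whatever unit start and end use.

```python
def cue_totals(response: dict) -> dict:
    """Return the summed length of all chunks for each technical cue."""
    totals = {}
    for cue, timelines in response["technical_cues"].items():
        totals[cue] = sum(
            chunk["end"] - chunk["start"]
            for chunks in timelines.values()
            for chunk in chunks
        )
    return totals


# A trimmed-down response using values from the sample above.
sample = {
    "technical_cues": {
        "color_bars": {"histogram": [{"start": 32, "end": 54}]},
        "slates": {
            "all": [
                {"start": 32, "end": 34},
                {"start": 40, "end": 44},
                {"start": 50, "end": 52},
            ]
        },
        "textless": {"histogram": []},
    }
}

print(cue_totals(sample))
```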

Audio

GET /api/data/v3/items/{id}/timeline/audio

Response

{
	"audio": {
		"Speech": [
			{
				"start": 10,
				"end": 60
			},
			{
				"start": 120,
				"end": 130
			}
		],
		"explosion": [
			{
				"start": 3.014,
				"end": 4.56
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Color Bars

GET /api/data/v3/items/{id}/timeline/color-bars

Response

{
	"color_bars": [
		{
			"start": 0.1,
			"end": 1.1
		}
	]
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Black Frames

GET /api/data/v3/items/{id}/timeline/black-frames

Response

{
	"black_frames": [
		{
			"start": 0.1,
			"end": 1.1,
			"start_frame": 2,
			"end_frame": 12
		},
		{
			"start": 0.0,
			"end": 3.5,
			"start_frame": 1,
			"end_frame": 106
		}
		...
	]
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.
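The two sample entries above overlap (0.1 to 1.1 falls inside 0.0 to 3.5). If a client needs non-overlapping blocks, it could merge them itself, as in this sketch. merge_ranges is a hypothetical helper, it keeps only start and end, and whether the API can actually return overlapping entries is an assumption based on the sample.

```python
def merge_ranges(ranges: list) -> list:
    """Merge overlapping or touching start/end ranges into sorted blocks."""
    merged = []
    for entry in sorted(ranges, key=lambda r: r["start"]):
        if merged and entry["start"] <= merged[-1]["end"]:
            # Overlaps the previous block: extend it if this one ends later.
            merged[-1]["end"] = max(merged[-1]["end"], entry["end"])
        else:
            merged.append({"start": entry["start"], "end": entry["end"]})
    return merged


black_frames = [
    {"start": 0.1, "end": 1.1, "start_frame": 2, "end_frame": 12},
    {"start": 0.0, "end": 3.5, "start_frame": 1, "end_frame": 106},
]

print(merge_ranges(black_frames))
```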

Credits

GET /api/data/v3/items/{id}/timeline/credits

Response

{
	"credits": [
		{
			"start": 0.1,
			"end": 1.1
		}
	]
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Digital Slates

GET /api/data/v3/items/{id}/timeline/digital-slates

Response

{
	"digital_slates": [
		{
			"start": 0.1,
			"end": 1.1
		}
	]
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Insights

Deprecated. Use the new endpoint (GET /api/data/v3/items/{item_id}/insights/{insight_group_id}) instead.

GET /api/data/keywords/hits/{item_id}

Response

{
    "insights": [
        {
            "group_name": "Supplier",
            "color": "#4DD0E1",
            "words": [
                "content delivery",
                "exclusive",
                "hollywood",
                "payments",
                "pepsi",
                "price increase",
                "term",
                "termination"
            ],
            "matches": [
                {
                    "type": "captions",
                    "timeline": [
                        {
                            "start_at": 30.03,
                            "end_at": 44.97,
                            "count": 1
                        }
                    ],
                    "source": "2minuteVideo.srt"
                },
                {
                    "type": "captions",
                    "timeline": [
                        {
                            "start_at": 30.03,
                            "end_at": 44.97,
                            "count": 1
                        }
                    ],
                    "source": "2minuteVideo.srt"
                },
                {
                    "type": "captions",
                    "timeline": [
                        {
                            "start_at": 30.03,
                            "end_at": 44.97,
                            "count": 1
                        }
                    ],
                    "source": "2minuteVideo.srt"
                }
            ]
        }
    ]
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Mature Content

GET /api/data/v3/items/{id}/timeline/mature-content

Response

{
	"mature_content": {
		"nudity": [
			{
				"start": 0.1,
				"end": 1.1
			}
		],
		"gore": [
			{
				"start": 0.0,
				"end": 3.5
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Locations

GET /api/data/v3/items/{id}/timeline/locations

Response

{
	"locations": {
		"Rome": [
			{
				"start": 0.1,
				"end": 1.1
			},
			{
				"start": 13.0,
				"end": 15.5
			}
		],
		"Paris": [
			{
				"start": 2.1,
				"end": 4.3
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Logos

GET /api/data/v3/items/{id}/timeline/logos

Response

{
	"logos": {
		"Pepsi": [
			{
				"start": 0.1,
				"end": 1.1
			},
			{
				"start": 13.0,
				"end": 15.5
			}
		],
		"GrayMeta": [
			{
				"start": 2.1,
				"end": 4.3
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Slates

GET /api/data/v3/items/{id}/timeline/slates

Response

{
	"slates": {
		"all": [
			{
				"start": 0,
				"end": 23
			},
			...
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Sports

GET /api/data/v3/items/{id}/timeline/sports

Response

{
	"sport_events": {
		"soccer": {
			"penalties": [
				{
					"start": -0.001338,
					"end": 3.998662
				}
			],
			"shots on goal": [
				{
					"start": 4.998662,
					"end": 4.998662
				}
			]
		}
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Silence

GET /api/data/v3/items/{id}/timeline/silence

Response

{
	"silence": {
		"histogram": [
			{
				"start": 3.83129,
				"end": 23.308867,
				"start_frame": 114,
				"end_frame": 699
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Start End

GET /api/data/v3/items/{id}/timeline/start-end

Response

{
	"start_end": {
		"histogram": [
			{
				"start": 3.83129,
				"start_frame": 230,
				"end": 224.308867,
				"end_frame": 13458
			}
		]
	}
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.

Textless Material

GET /api/data/v3/items/{id}/timeline/textless

Response

{
	"textless": {
		"histogram": [
			{
				"start": 3.83129,
				"end": 23.308867
			}
		]
	}
}

Texted

GET /api/data/v3/items/{id}/timeline/texted

Response

{
	"texted": [
		{
			"start": 0.1,
			"end": 1.1
		}
	]
}

A successful call will return a Status OK (200). If any unexpected errors happen in the process of fulfilling the response a Status Internal Server Error (500) will be returned.


Technical Metadata

Technical metadata is information encoded within files that we were able to extract directly. We have made this information available via either type-specific or batch APIs.

To get all technical metadata found within a given file, use this API.

Note: Depending on the file type and contents, the response may omit any of the top-level fields (audio_info, audio_peak, exiv2, geocoding, media_info, and pdf) as well as fields nested within them. Not every file carries all of the information we check for.

Request

GET /api/data/v3/items/{id}/technical
  • id - (string) The identifier of the item.

Response

{
    "audio_info": {
        "streams": [
            {
                "avg_frame_rate": "0/0",
                "bit_rate": "192000",
                "bits_per_sample": 0,
                "channel_layout": "stereo",
                "channels": 2,
                "codec_long_name": "AAC (Advanced Audio Coding)",
                "codec_name": "aac",
                "codec_tag": "0x000f",
                "codec_tag_string": "[15][0][0][0]",
                "codec_time_base": "1/48000",
                "codec_type": "audio",
                "index": 5,
                "r_frame_rate": "0/0",
                "sample_fmt": "fltp",
                "sample_rate": "48000",
                "start_pts": 725280,
                "start_time": "8.058667",
                "time_base": "1/90000",
                "disposition": {
                    "attached_pic": 0,
                    "clean_effects": 0,
                    "comment": 0,
                    "default": 0,
                    "dub": 0,
                    "forced": 0,
                    "hearing_impaired": 0,
                    "karaoke": 0,
                    "lyrics": 0,
                    "original": 0,
                    "timed_thumbnails": 0,
                    "visual_impaired": 0
                },
                "tags": {
                    "encoder": "",
                    "language": "eng",
                    "title": ""
                }
            },
            {
                "avg_frame_rate": "0/0",
                "bit_rate": "192000",
                "bits_per_sample": 0,
                "channel_layout": "stereo",
                "channels": 2,
                "codec_long_name": "AAC (Advanced Audio Coding)",
                "codec_name": "aac",
                "codec_tag": "0x000f",
                "codec_tag_string": "[15][0][0][0]",
                "codec_time_base": "1/48000",
                "codec_type": "audio",
                "index": 6,
                "r_frame_rate": "0/0",
                "sample_fmt": "fltp",
                "sample_rate": "48000",
                "start_pts": 725280,
                "start_time": "8.058667",
                "time_base": "1/90000",
                "disposition": {
                    "attached_pic": 0,
                    "clean_effects": 0,
                    "comment": 0,
                    "default": 0,
                    "dub": 0,
                    "forced": 0,
                    "hearing_impaired": 0,
                    "karaoke": 0,
                    "lyrics": 0,
                    "original": 0,
                    "timed_thumbnails": 0,
                    "visual_impaired": 0
                },
                "tags": {
                    "encoder": "",
                    "language": "spa",
                    "title": ""
                }
            }
        ]
    },
    "audio_peak": {
        "integrated_loudness": {
            "i_lufs": -29.1,
            "threshold_lufs": -39.1
        },
        "loudness_range": {
            "lra_lu": 0.1,
            "threshold_lufs": -49.1,
            "lra_low_lufs": -29.1,
            "lra_high_lufs": -29
        },
        "true_peak_dbfs": -19.2
    },
    "exiv2": {
        "normalized": {
            "resolution_x": 1024,
            "resolution_y": 680,
            "format": "image/jpeg",
            "photo": {
                "exif_version": "48 50 50 49",
                "color_space": 1,
                "pixel_x_dimension": 1024,
                "pixel_y_dimension": 680
            },
            "application2": {},
            "image": {
                "image_width": 1024,
                "image_length": 680,
                "bits_per_sample": "8 8 8",
                "photometric_interpretation": 2,
                "orientation": 1,
                "samples_per_pixel": 3,
                "x_resolution": "720000/10000",
                "y_resolution": "720000/10000",
                "resolution_unit": 2,
                "software": "Adobe Photoshop CC 2018 (Macintosh)",
                "date_time": "2018:10:12 13:43:31",
                "exif_tag": 236
            },
            "xmp": {
                "create_date": "2018-10-12T13:34:17-07:00",
                "modify_date": "2018-10-12T13:43:31-07:00",
                "metadata_date": "2018-10-12T13:43:31-07:00"
            }
        }
    },
    "geocoding": {
        "place_name": "Los Angeles",
        "country_code": "US",
        "admin_name1": "California",
        "admin_name2": "Los Angeles"
    },
    "media_info": {
        "general": {
            "audio_codecs": "AAC LC / AAC LC / AAC LC / AAC LC / AAC LC / AAC LC",
            "audio_format_list": "AAC LC / AAC LC / AAC LC / AAC LC / AAC LC / AAC LC",
            "audio_format_with_hint_list": "AAC LC / AAC LC / AAC LC / AAC LC / AAC LC / AAC LC",
            "audio_language_list": "English /  / English / Spanish / English / Spanish",
            "codec": "MPEG-TS",
            "codecs_video": "AVC",
            "commercial_name": "MPEG-TS",
            "complete_name": "/tmp/d89a7c3bdaa3cdf23420cc4e905349f1.ts",
            "count": 333,
            "count_of_audio_streams": 6,
            "count_of_stream_of_this_kind": 1,
            "count_of_video_streams": 1,
            "duration": 6034,
            "duration_time": "00:00:06.035 (00:00:06;00)",
            "file_extension": "ts",
            "file_name": "d89a7c3bdaa3cdf23420cc4e905349f1",
            "file_size": 4012484,
            "folder_name": "/tmp",
            "format": "MPEG-TS",
            "format_extensions_usually_used": "ts m2t m2s m4t m4s tmf ts tp trp ty",
            "frame_count": 360,
            "frame_rate": 59.94,
            "internet_media_type": "video/MP2T",
            "kind_of_stream": "General",
            "overall_bit_rate": 5083438,
            "overall_bit_rate_mode": "VBR",
            "video_format_list": "AVC",
            "video_format_with_hint_list": "AVC"
        },
        "audio": {
            "bit_rate": 112000,
            "bit_rate_mode": "CBR",
            "channels": 2,
            "codec": "MPEG Audio",
            "commercial_name": "MPEG Audio",
            "compression_mode": "Lossy",
            "count": 277,
            "count_of_stream_of_this_kind": 1,
            "duration": 89808,
            "duration_time": "00:01:30:18",
            "format": "MPEG Audio",
            "format_profile": "Layer 3",
            "frame_count": 3438,
            "kind_of_stream": "Audio",
            "proportion_of_this_stream": 0.99971,
            "samples_count": 3960576,
            "sampling_rate": 44100,
            "stream_size": 1257325
        },
        "video": {
            "bit_depth": 8,
            "bits_pixel_frame": 0.072,
            "chroma_subsampling": "4:2:0",
            "codec": "AVC",
            "codec_id": "27",
            "color_range": "Limited",
            "color_space": "YUV",
            "colour_description_present": "Yes",
            "commercial_name": "AVC",
            "count": 377,
            "count_of_stream_of_this_kind": 1,
            "display_aspect_ratio": 1.778,
            "duration": 6006,
            "duration_time": "00:00:06.006 (00:00:06;00)",
            "format": "AVC",
            "format_info": "Advanced Video Codec",
            "format_profile": "High@L4.1",
            "format_settings": "CABAC / 2 Ref Frames",
            "format_settings_cabac": "Yes",
            "format_url": "http://developers.videolan.org/x264.html",
            "frame_count": 360,
            "frame_rate": 59.94,
            "height": 720,
            "id": 481,
            "internet_media_type": "video/H264",
            "kind_of_stream": "Video",
            "pixel_aspect_ratio": 1,
            "scan_type": "Progressive",
            "stream_order": "0-0",
            "width": 1280
        },
        "image": {
            "bit_depth": 8,
            "chroma_subsampling": "4:4:4",
            "codec": "JPEG",
            "color_space": "YUV",
            "commercial_name": "JPEG",
            "compression_mode": "Lossy",
            "count": 125,
            "count_of_stream_of_this_kind": 1,
            "format": "JPEG",
            "height": 680,
            "internet_media_type": "image/jpeg",
            "kind_of_stream": "Image",
            "proportion_of_this_stream": 1,
            "stream_size": 424430,
            "width": 1024
        }
    },
    "pdf": {
        "title": "Microsoft Word - Backup4all_network_backup_solution.doc",
        "subject": "",
        "keywords": "",
        "author": "Administrator",
        "creator": "Microsoft Word - Backup4all_network_backup_solution.doc",
        "producer": "novaPDF Professional Server Ver 5.4 Build 260 (Windows XP  x32)",
        "creation_date": "2008-05-26T09:02:00Z",
        "mod_date": "0001-01-01T00:00:00Z",
        "pages": 4,
        "javascript": false,
        "encrypted": true,
        "password_protected": false,
        "page_size": "612 x 792 pts (letter)",
        "optimized": false,
        "pdf_version": 1.4,
        "page_rotation": 0,
        "tagged": false,
        "form": false
    }
}

Response Codes:

  • 200 (StatusOK) - Success!
  • 404 (StatusNotFound) - Item not found.
  • 500 (StatusInternalServerError) - An unexpected error happened.
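Since any top-level or nested field may be missing, client code is safer when it reads values defensively. A minimal sketch, where dig is a helper invented for this example:

```python
from typing import Any


def dig(data: dict, *keys: str, default: Any = None) -> Any:
    """Follow keys through nested dicts, returning default if any is missing."""
    current: Any = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# A video file would typically have media_info but no pdf section.
technical = {"media_info": {"video": {"width": 1280, "height": 720}}}

print(dig(technical, "media_info", "video", "width"))
print(dig(technical, "pdf", "pages", default=0))
```

This avoids a KeyError when, for example, an image has no audio_info or a video has no pdf section.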

Identifying items

To determine the GrayMeta Platform ID and Stow URL for an item, you need to know the location ID, container ID and the identifier for the item itself within the container. For more information about item IDs, see the Stow project.

You can make the following request:

POST /api/control/item-id
{
	"location_id": "abc123",
	"container_id": "MyContainer",
	"item_id": "MyItem"
}
  • location_id - (string) The GrayMeta Platform location ID that indicates which storage location the item is in
  • container_id - (string) The container ID where the item is (usually bucket name)
  • item_id - (string) The identifier of the item (usually its name within the storage)

Provided the location, container and item values are all valid, you will be given the following response:

{
	"stow_url": "s3://unique/url/to/item",
	"gm_item_id": "67779468b22af637e2dd6a2616264b6c"
}
  • stow_url - (string) The Stow URL for the item
  • gm_item_id - (string) The internal GrayMeta Platform ID for this item

It is not necessary for the item to have been harvested in order for the ID to be returned, but once harvested you can trust that the ID will match the gm_item_id returned.

Once you have obtained the identifiers for an item, you can use them in the Harvest API.
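The item-id request described above could be prepared as in the sketch below. The host in BASE_URL, the Content-Type header, and the function name are assumptions for this example, and authentication is omitted.

```python
import json
import urllib.request

# Hypothetical host; replace with the address of your own deployment.
BASE_URL = "https://graymeta.example.com"


def item_id_request(
    location_id: str, container_id: str, item_id: str
) -> urllib.request.Request:
    """Prepare (but do not send) the POST /api/control/item-id request."""
    body = json.dumps(
        {
            "location_id": location_id,
            "container_id": container_id,
            "item_id": item_id,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/control/item-id",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = item_id_request("abc123", "MyContainer", "MyItem")
```

Passing the prepared request to urllib.request.urlopen would send it; the JSON response then carries stow_url and gm_item_id as described above.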

Item Captions

Get a list of captions for an item

GET /api/data/v3/items/{id}/captions

Response

A successful call will return a Status OK (200) with the following response body:

{
  "captions": [
    {
      "id": "c57149e1f0b9387294e1f5efe6cb1ef0",
      "item_id": "71ab3889e1c559865ed6bce99b349d4f",
      "source": "captions",
      "language": {
        "code": "eng",
        "confidence": 1
      }
    }
  ]
}

If any unexpected errors happen in the process of fulfilling this request/response cycle a Status Internal Server Error (500) will be returned.

Item Caption Text

Get a list of texts contained in an item caption along with possible NLP data

GET /api/data/v3/items/{id}/caption/{item-captions-id}?mask={mask}
  • item-captions-id - (string) The ID value contained in the results of a /captions request.
  • mask - allows you to mask the embedded NLP data for a caption text, which may result in faster responses. Set mask=nlp to omit the NLP data from the response.

Response

A successful call will return a Status OK (200) with the following response body:

{
  "caption": [
    {
      "id": "f27f081531c42d304c285dc7306f29e7",
      "item_captions_id": "c57149e1f0b9387294e1f5efe6cb1ef0",
      "start_at": 0.03,
      "end_at": 4.92,
      "text": "Mr. Jones will speak now",
      "nlp_properties": {
        "entities": [
          {
            "text": "Mr. Jones",
            "confidence": 0.9995918273925781,
            "type": "person"
          }
        ],
        "key_phrases": [
          {
            "text": "Mr. Jones",
            "confidence": 0.9994778037071228
          }
        ],
        "sentiment": {
          "text": "neutral",
          "sentiment_confidence": {
            "Mixed": 0.014400332234799862,
            "Negative": 0.10051420331001282,
            "Neutral": 0.8749107718467712,
            "Positive": 0.010174600407481194
          }
        },
        "language": {
          "language": "en",
          "confidence": 0.9737588763237
        }
      }
    }
  ]
}

If any unexpected errors happen in the process of fulfilling this request/response cycle a Status Internal Server Error (500) will be returned.
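As a sketch of how the caption text endpoint might be used, the helpers below build the path with the optional mask and collect the entities from a response. Both function names are invented for this example, and the code assumes that a masked response either omits nlp_properties or sets it to null.

```python
def caption_text_path(item_id: str, item_captions_id: str, mask_nlp: bool = False) -> str:
    """Build the path for the caption text request, optionally masking NLP data."""
    path = f"/api/data/v3/items/{item_id}/caption/{item_captions_id}"
    return f"{path}?mask=nlp" if mask_nlp else path


def entities(response: dict) -> list:
    """Return (start_at, entity text, entity type) for every detected entity."""
    found = []
    for entry in response["caption"]:
        # Assumption: nlp_properties may be absent or null when masked.
        nlp = entry.get("nlp_properties") or {}
        for entity in nlp.get("entities", []):
            found.append((entry["start_at"], entity["text"], entity["type"]))
    return found
```

With the sample response above, entities would return a single tuple for "Mr. Jones" at 0.03.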

Item Descriptions

Get descriptions for an item

GET /api/data/v3/items/{id}/descriptions?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token provided to page the results. When provided a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of description data being present or not.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.
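The paging rules above can be sketched as a path builder plus a loop that follows the next_page value returned in each response. The function names are made up for this example, the fetch callable stands in for whatever HTTP client you use, and sending all=true for the all flag is an assumption about its expected value.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def descriptions_path(
    item_id: str,
    page_token: Optional[str] = None,
    start: Optional[float] = None,
    window: Optional[float] = None,
    all_entries: bool = False,
) -> str:
    """Build the descriptions path; a page token replaces all other parameters."""
    path = f"/api/data/v3/items/{item_id}/descriptions"
    if page_token:
        # The token takes precedence, so nothing else needs to be sent.
        return f"{path}?{urlencode({'page-token': page_token})}"
    params = {}
    if start is not None:
        params["start"] = start
    if window is not None:
        params["window"] = window
    if all_entries:
        params["all"] = "true"  # assumed value format
    return f"{path}?{urlencode(params)}" if params else path


def iter_pages(fetch: Callable[[str], dict], item_id: str, **params) -> Iterator[dict]:
    """Yield each page of results, following next_page until it is empty."""
    token = None
    while True:
        page = fetch(descriptions_path(item_id, page_token=token, **params))
        yield page
        token = page.get("next_page")
        if not token:
            break
```

Here fetch takes a path and returns the decoded JSON body, which keeps the paging logic independent of the HTTP library.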

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "description": {
                "id": "b2fe0c8e0ee298eb07c3c5dce457c907",
                "item_id": "fe154badd7b2d78349c214938f27547c",
                "confidence": 0.443617173003186,
                "language": {
                    "language": "en-US",
                    "confidence": 0.8
                },
                "text": "a drawing of a face"
            }
        },
        "pages": null,
        "video_frames": null
    },
    "next_page": ""
}

For an asset that is a video:

{
    "contents": {
        "img": null,
        "pages": null,
        "video_frames": [
            {
                "description": {
                    "id": "d4acd831bf8d56e6a6e1fcb054228f29",
                    "item_id": "dddcbe782d810eb80f531361b5799a53",
                    "confidence": 0.7532465814725752,
                    "language": {
                        "code": "en",
                        "confidence": 0.96
                    },
                    "text": "a close up of a person"
                },
                "frame_id": "59ad92a0ed0d873de84d2cc2bd080898",
                "thumbnail_path": "video_main_frames/frame-0000000000.jpg",
                "time": 0
            },
            {
                "description": {
                    "id": "8a7f430ebb24d621021e2014ee05c6eb",
                    "item_id": "dddcbe782d810eb80f531361b5799a53",
                    "confidence": 0.2997128373277878,
                    "language": null,
                    "text": "a close up of a man with smoke coming out of it"
                },
                "frame_id": "677deeef7865c9e1b0bb497164aeca50",
                "thumbnail_path": "video_main_frames/frame-0000000001.jpg",
                "time": 2
            },
            {
                "description": {
                    "id": "ff31ffc38b95e420dfca366ef02b550d",
                    "item_id": "dddcbe782d810eb80f531361b5799a53",
                    "confidence": 0.8331186858981754,
                    "language": null,
                    "text": "a blurry image of smoke"
                },
                "frame_id": "b767a35a35e5a873a109f2d3b4df5ec2",
                "thumbnail_path": "video_main_frames/frame-0000000002.jpg",
                "time": 4
            }
        ]
    },
    "next_page": "NextPageTokenString"
}

For an asset that is a document:

{
    "contents": {
        "img": null,
        "pages": [
            {
                "images": [
                    {
                        "description": {
                            "id": "4c51651ee15c7ce414ae381bdc252622",
                            "item_id": "0ca4a8e17b66d3946f611621936896c1",
                            "confidence": 0.7192287180775676,
                            "language": {
                                "code": "en",
                                "confidence": 0.86
                            },
                            "text": "a man standing in front of a mirror posing for the camera"
                        },
                        "image_id": "24235782fb645ba35c6410617f8c3527",
                        "image_index": 0,
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ],
                "page": 0,
                "description": "optionally, a page can have a description as well, or it can be embedded in the images within the page",
                "thumbnail_path": "optionally, a page can have a thumbnail path, as well as each embedded image's thumbnail",
                "page_id": "a uuid to identify the page, optional, will be available when all query param is set"
            },
            {
                "images": [
                    {
                        "description": {
                            "id": "62a0967c75af0b1f8ba65edd7b287929",
                            "item_id": "0ca4a8e17b66d3946f611621936896c1",
                            "confidence": 0.9312026737395315,
                            "language": null,
                            "text": "Robb Wells, John Paul Tremblay that are looking at the camera"
                        },
                        "image_id": "5fd88f4121c499aa04b9f77fa59e7788",
                        "image_index": 0,
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ],
                "page": 1
            }
        ],
        "video_frames": null
    },
    "next_page": "NextPageTokenString"
}

If start and window are provided, then the results may be paginated. The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned. If the item id does not match an existing item, a Status Not Found (404) will be returned.
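
The paging behaviour described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names and the caller-supplied `fetch` function are hypothetical, and only the `page-token`, `start`, `window` and `all` query parameters and the `next_page` field come from this document. The expected value format for `all` is not specified here, so the sketch passes it through unchanged.

```python
from urllib.parse import urlencode


def descriptions_url(item_id, page_token=None, start=None, window=None, all_entries=None):
    """Build the GET path for an item's descriptions (helper name is illustrative)."""
    path = f"/api/data/v3/items/{item_id}/descriptions"
    if page_token:
        # A page token takes precedence, so nothing else needs to be sent.
        params = {"page-token": page_token}
    else:
        candidates = {"start": start, "window": window, "all": all_entries}
        params = {k: v for k, v in candidates.items() if v is not None}
    return f"{path}?{urlencode(params)}" if params else path


def iter_description_pages(fetch, item_id, **query):
    """Yield each response body until next_page comes back empty.

    `fetch` is any caller-supplied function that takes a path and returns
    the decoded JSON body as a dict (host and authentication are out of scope).
    """
    body = fetch(descriptions_url(item_id, **query))
    yield body
    while body.get("next_page"):
        body = fetch(descriptions_url(item_id, page_token=body["next_page"]))
        yield body
```

Keeping the transport behind `fetch` means the same loop can be driven by any HTTP library, or by canned responses in a test.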

Curate a description for an item

To create a curated description you can post to the endpoint below with a valid request body.

POST /api/data/v3/items/{id}/descriptions
{
	"segment_index": float64,
	"image_index": int,
	"item_type": ENUM["image" | "video" | "document"],
	"text": string
}
  • item_type indicates the type of item
  • segment_index should be set to -1 if the item is an image
  • image_index should be set to -1 if the item is an image or a video
  • text must not be an empty string
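
As a rough sketch of the rules above, the request body could be assembled and checked client-side before posting. The function name is hypothetical, and forcing the indexes to -1 (rather than rejecting other values) is a design choice made here for illustration only.

```python
VALID_ITEM_TYPES = {"image", "video", "document"}


def curated_description_body(item_type, text, segment_index=-1, image_index=-1):
    """Assemble the POST body, applying the rules listed above."""
    if item_type not in VALID_ITEM_TYPES:
        raise ValueError(f"unknown item_type: {item_type!r}")
    if not text:
        raise ValueError("text must not be an empty string")
    if item_type == "image":
        segment_index = -1  # rule above: -1 when the item is an image
    if item_type in ("image", "video"):
        image_index = -1  # rule above: -1 for images and videos
    return {
        "segment_index": float(segment_index),
        "image_index": int(image_index),
        "item_type": item_type,
        "text": text,
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the POST request.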

Response

A successful call will return a Status Created (201) with the following response body, which includes the metadata associated with that segment/image index:

{
	"description": {
		"id": string,
		"item_id": string,
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string
	}
}

If there is a conflict with the segment and/or image index, you’ll receive a Status Unprocessable Entity (422). If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.

Edit a description

To edit an existing description's text, use the following request:

PATCH /api/data/v3/items/{id}/descriptions/{desc_id}

{
	"text": "new description text"
}
  • id is the item id
  • desc_id is the description id
  • text is the new description text; an empty string is allowed

Response

A successful call will return a Status OK (200) with the updated description.

{
	"description": {
		"id": string,
		"item_id": string,
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string
	}
}

If no description matches the desc_id, you’ll receive a Status Not Found (404). If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.

Item OCRs

Get ocrs for an item

GET /api/data/v3/items/{id}/ocrs?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token returned by a previous call, used to page through the results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) that indicates where to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) that indicates where to end retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of whether ocr data is present.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "ocrs": [
                {
                    "id": "0a4877c31bd094a32a124a1c5571f751",
                    "item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
                    "bounding_box": {
                        "top": 441,
                        "left": 220,
                        "width": 52,
                        "height": 16
                    },
                    "confidence": 0,
                    "language": null,
                    "text": "adidas",
                    "text_type": "lines"
                },
                {
                    "id": "1afb7fa88d193d5d9ae766da04e1517f",
                    "item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
                    "bounding_box": {
                        "top": 462,
                        "left": 295,
                        "width": 86,
                        "height": 15
                    },
                    "confidence": 0,
                    "language": null,
                    "text": "SNALMUNCH 2012",
                    "text_type": "lines"
                },
                {
                    "id": "f4ee60381a3cd69a0787848b71255b62",
                    "item_id": "c21b3e9fe1fbe09814d6a7bfdf5ba313",
                    "bounding_box": {
                        "top": 497,
                        "left": 222,
                        "width": 243,
                        "height": 53
                    },
                    "confidence": 0,
                    "language": null,
                    "text": "SAMSUNG",
                    "text_type": "lines"
                }
            ]
        },
        "pages": null,
        "video_frames": null
    },
    "next_page": ""
}

For an asset that is a video:

{
    "contents": {
        "img": null,
        "pages": null,
        "video_frames": [
            {
                "frame_id": "5248fd3acceab63163a5bcc5ddc15d62",
                "ocrs": [
                    {
                        "id": "e45aa09fd7d6121e474459e59e8a7d4c",
                        "item_id": "3b103330378acb3ad604045ba1f4aecd",
                        "bounding_box": {
                            "top": 12,
                            "left": 14,
                            "width": 169,
                            "height": 11
                        },
                        "confidence": 0.87,
                        "language": null,
                        "text": "HIT THAT LIKE BUTTON, NATION!",
                        "text_type": "lines"
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000001.jpg",
                "time": 2.002
            },
            {
                "frame_id": "c5501c083166450e9885782aac29fad8",
                "ocrs": [
                    {
                        "id": "7a5d590df38aa2494a1a854640ea6b47",
                        "item_id": "3b103330378acb3ad604045ba1f4aecd",
                        "bounding_box": {
                            "top": 19,
                            "left": 249,
                            "width": 33,
                            "height": 16
                        },
                        "confidence": 0,
                        "language": null,
                        "text": "BEA",
                        "text_type": "lines"
                    },
                    {
                        "id": "8485e7771e8f0ecdc871cf8856554be1",
                        "item_id": "3b103330378acb3ad604045ba1f4aecd",
                        "bounding_box": {
                            "top": 12,
                            "left": 14,
                            "width": 169,
                            "height": 11
                        },
                        "confidence": 0,
                        "language": null,
                        "text": "HIT THAT LIKE BUTTON, NATION!",
                        "text_type": "lines"
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000002.jpg",
                "time": 4.004
            }
        ]
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAKP-CAwMDYWxsAAVzdGFydA0tMTExMTEwMC45OTk5BndpbmRvdwIxMAA="
}

For an asset that is a document:

{
    "contents": {
        "img": null,
        "pages": [
            {
                "images": [
                    {
                        "image_id": "daea649e947b430acceccc089f653c3f",
                        "image_index": 0,
                        "ocrs": [
                            {
                                "id": "095a5ca47bda4e0a9db562e23070125c",
                                "item_id": "43b89caeb2fd3c23f25b8431e644cabc",
                                "bounding_box": {
                                    "top": 15,
                                    "left": 522,
                                    "width": 217,
                                    "height": 24
                                },
                                "confidence": 0,
                                "language": null,
                                "text": "Pilgrim Programming, LLC",
                                "text_type": "lines"
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ],
                "page": 0,
                "ocrs": "optionally, a page can have OCR as well, or it can be embedded in the images within the page",
                "thumbnail_path": "optionally, a page can have a thumbnail path, as well as each embedded image's thumbnail",
                "page_id": "a uuid to identify the page, optional, will be available when all query param is set"
            },
            {
                "images": [
                    {
                        "image_id": "18061410280b001f26f34890d0da6b04",
                        "image_index": 0,
                        "ocrs": [
                            {
                                "id": "2b6d00e3d4d014267a011d64d50144ff",
                                "item_id": "43b89caeb2fd3c23f25b8431e644cabc",
                                "bounding_box": {
                                    "top": 1194,
                                    "left": 270,
                                    "width": 817,
                                    "height": 23
                                },
                                "confidence": 0,
                                "language": null,
                                "text": "There are no liens , claims or encumbrances which might conflict with or otherwise affect",
                                "text_type": "lines"
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ],
                "page": 1
            }
        ],
        "video_frames": null
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}

If start and window are provided, then the results may be paginated. The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned. If the item id does not match an existing item, a Status Not Found (404) will be returned.
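
Because the OCR entries sit at a different depth for each asset type, a small helper can flatten them. The sketch below assumes the response has already been decoded into a Python dictionary shaped like the examples above. The function name is hypothetical, and treating optional page-level `ocrs` as a list of the same objects is an assumption, since this document does not show that shape.

```python
def collect_ocr_text(body):
    """Return every OCR text string in a response, whatever the asset type."""
    contents = body.get("contents") or {}
    texts = []

    # img assets: contents.img.ocrs
    img = contents.get("img") or {}
    texts += [o["text"] for o in img.get("ocrs") or []]

    # video assets: contents.video_frames[].ocrs
    for frame in contents.get("video_frames") or []:
        texts += [o["text"] for o in frame.get("ocrs") or []]

    # document assets: contents.pages[].images[].ocrs
    for page in contents.get("pages") or []:
        page_ocrs = page.get("ocrs")
        if isinstance(page_ocrs, list):  # optional page-level OCR (assumed shape)
            texts += [o["text"] for o in page_ocrs]
        for image in page.get("images") or []:
            texts += [o["text"] for o in image.get("ocrs") or []]
    return texts
```

The `or {}` and `or []` fallbacks cover the `null` values that appear for the asset types that do not apply.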

Curate an ocr for an item

To create a curated ocr you can post to the endpoint below with a valid request body.

POST /api/data/v3/items/{id}/ocrs
{
	"segment_index": float64,
	"image_index": int,
	"item_type": ENUM["image" | "video" | "document"],
	"text": string
}
  • item_type indicates the type of item
  • segment_index should be set to -1 if the item is an image
  • image_index should be set to -1 if the item is an image or a video
  • text must not be an empty string

Response

A successful call will return a Status Created (201) with the following response body, which includes the metadata associated with that segment/image index:

{
	"ocr": {
		"id": string,
		"item_id": string,
		"bounding_box": {
			"top": int,
			"left": int,
			"width": int,
			"height": int
		},
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string,
		"text_type": string
	}
}

If there is a conflict with the segment and/or image index, you’ll receive a Status Unprocessable Entity (422). If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.

Edit an ocr

To edit an existing ocr's text, use the following request:

PATCH /api/data/v3/items/{id}/ocrs/{ocr_id}
{
	"text": "new ocr text"
}
  • id is the item id
  • ocr_id is the ocr id
  • text is the new ocr text; an empty string is allowed

Response

A successful call will return a Status OK (200) with the updated ocr.

{
	"ocr": {
		"id": string,
		"item_id": string,
		"bounding_box": {
			"top": int,
			"left": int,
			"width": int,
			"height": int
		},
		"confidence": float64,
		"language": string,
		"language_confidence": float64,
		"text": string,
		"text_type": string
	}
}

If no ocr matches the ocr_id, you’ll receive a Status Not Found (404). If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.

Item Speech to Texts

Get speech to texts for an item

GET /api/data/v3/items/{id}/speech-to-texts?mask={mask}
  • mask lets you omit the embedded nlp data for a speech to text (STT) entry, which may make the response faster. Set mask=nlp to exclude nlp data from the response.

Response

A successful call will return a Status OK (200) with the following response body:

{
    "transcripts": [
        {
            "source": "amazon_transcribe",
            "transcript": [
                {
                    "id": "f83e777b8bf149438d61109a5d9dbf6f",
                    "item_id": "8447caddb4c1501a291bf343d6886586",
                    "start_at": 0,
                    "end_at": 10.05,
                    "text": "Wiggle room Small additions to Cuba with yourself So look at my harvest one file here I have a number of different",
                    "language": null,
                    "nlp_properties": {
                        "entities": [
                            {
                                "text": "Cuba",
                                "confidence": 0.9795709848403931,
                                "type": "location"
                            },
                            {
                                "text": "one file",
                                "confidence": 0.6889344453811646,
                                "type": "quantity"
                            }
                        ],
                        "key_phrases": [
                            {
                                "text": "Wiggle room Small additions",
                                "confidence": 0.7353704571723938
                            },
                            {
                                "text": "Cuba",
                                "confidence": 0.9996484518051147
                            },
                            {
                                "text": "my harvest one file",
                                "confidence": 0.8326694965362549
                            },
                            {
                                "text": "a number",
                                "confidence": 0.9973674416542053
                            }
                        ],
                        "sentiment": {
                            "text": "neutral",
                            "sentiment_confidence": {
                                "Mixed": 0.004104138817638159,
                                "Negative": 0.012511652894318104,
                                "Neutral": 0.934111475944519,
                                "Positive": 0.04927277937531471
                            }
                        },
                        "language": {
                            "language": "en",
                            "confidence": 0.9973103404045105
                        }
                    }
                },
                ...
            ]
        }
    ]
}

The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.
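
As an illustration of working with this response, the sketch below builds the request path with the optional mask and joins the transcript segments into one string. Both function names are hypothetical, and the response is assumed to be decoded JSON shaped like the example above.

```python
def speech_to_text_path(item_id, mask_nlp=False):
    """Build the GET path; mask=nlp asks the API to leave out nlp data."""
    path = f"/api/data/v3/items/{item_id}/speech-to-texts"
    return f"{path}?mask=nlp" if mask_nlp else path


def transcript_text(body, source=None):
    """Join transcript segments in start_at order.

    Pass `source` (for example "amazon_transcribe") to keep a single
    transcript source; with source=None, segments from every source are merged.
    """
    segments = []
    for entry in body.get("transcripts") or []:
        if source is None or entry.get("source") == source:
            segments.extend(entry.get("transcript") or [])
    segments.sort(key=lambda s: s["start_at"])
    return " ".join(s["text"] for s in segments)
```

Masking nlp data is a reasonable default when only the text is needed, since `nlp_properties` is not read by `transcript_text`.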

Item Thumbnails

Get thumbnails for an item

GET /api/data/v3/items/{id}/thumbnails

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is a video:

{
    "contents": {
        "thumbnail": {
            "path": "thumbnailer/sprite.jpg",
            "type": "sprite",
            "frame_count": 30,
            "height": 152,
            "width": 270
        },
        "video_frames": [
            {
                "time": 1,
                "frame_id": "16d63b1c711727218a44e4c0a8d43a20",
                "thumbnail": "video_main_frames/frame-0000000000.jpg"
            },
            {
                "time": 2,
                "frame_id": "0b9d3ac13e51b621d535f812bcdd45fb",
                "thumbnail": "video_main_frames/frame-0000000001.jpg"
            },
            ...
        ]
    }
}

For an asset that is a document:

{
    "contents": {
        "thumbnail": {
            "path": "thumbnailer/thumb.png",
            "type": "image",
            "frame_count": 0,
            "height": 152,
            "width": 270
        },
        "pages": [
            {
                "page": 0,
                "page_id": "28e5fa9e0b736baa7f2f7843a024adc9",
                "thumbnail_path": "document_pages/thumb-pg-00000.png",
                "images": [
                    {
                        "image_index": 0,
                        "image_id": "907b26c326a53e67f32f1cbf8ccbba54",
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ]
            },
            {
                "page": 1,
                "page_id": "6b3446f091ad62dc1ea90e6b619666f8",
                "thumbnail_path": "document_pages/thumb-pg-00001.png",
                "images": [
                    {
                        "image_index": 0,
                        "image_id": "09ca78c5e94ed7cb328bc2980aacacfe",
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ]
            },
            ...
        ]
    }
}

The images field of a page is populated when embedded images are detected within the document, and is nil when none are detected.

For all other assets:

{
    "contents": {
        "thumbnail": {
            "path": "thumbnailer/thumb.jpg",
            "type": "image",
            "frame_count": 0,
            "height": 152,
            "width": 270
        }
    }
}

If the thumbnail field is "", that indicates the asset did not have a thumbnail created. This will happen with text, caption, and archive files.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.
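
The thumbnail responses use `thumbnail` for video frames and `thumbnail_path` for pages and embedded images, so a client that wants every path has to check both. A minimal sketch, assuming a decoded JSON response shaped like the examples above (the function name is hypothetical):

```python
def thumbnail_paths(body):
    """Collect every thumbnail path in a thumbnails response."""
    contents = body.get("contents") or {}
    paths = []

    main = contents.get("thumbnail")
    # An empty thumbnail field means no thumbnail was created for the asset.
    if isinstance(main, dict) and main.get("path"):
        paths.append(main["path"])

    for frame in contents.get("video_frames") or []:
        if frame.get("thumbnail"):
            paths.append(frame["thumbnail"])

    for page in contents.get("pages") or []:
        if page.get("thumbnail_path"):
            paths.append(page["thumbnail_path"])
        for image in page.get("images") or []:  # images may be nil
            if image.get("thumbnail_path"):
                paths.append(image["thumbnail_path"])
    return paths
```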

Update a speech to text entry

PATCH /api/data/v3/items/{item_id}/speech-to-texts/{s2t_id}
{
	"text": "my new speech to text"
}

Returns the updated speech to text entry, else an error:

  • Status Not Found (404) if passing an invalid item_id or s2t_id
  • Status Unprocessable Entity (422) if there is a validation error
  • Status Internal Server Error (500) if some other error occurs

Delete a speech to text entry

DELETE /api/data/v3/items/{item_id}/speech-to-texts/{s2t_id}

On success, returns a 204 No Content, else an error:

  • Status Not Found (404) if passing an invalid item_id or s2t_id
  • Status Internal Server Error (500) if some other error occurs
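
The update and delete calls above share one URL and differ only in method and body, which the following sketch captures using Python's standard library. The `base_url` argument, the function name, and the `Content-Type` header are assumptions made for illustration; authentication is not covered in this section and is left out.

```python
import json
from urllib.request import Request


def s2t_request(base_url, item_id, s2t_id, text=None):
    """Build a PATCH request when text is given, otherwise a DELETE request."""
    url = f"{base_url}/api/data/v3/items/{item_id}/speech-to-texts/{s2t_id}"
    if text is None:
        return Request(url, method="DELETE")
    payload = json.dumps({"text": text}).encode("utf-8")
    # Content-Type is an assumption; this document does not state required headers.
    return Request(url, data=payload, method="PATCH",
                   headers={"Content-Type": "application/json"})


# Error outcomes listed in this section, keyed by HTTP status code.
S2T_ERRORS = {
    404: "invalid item_id or s2t_id",
    422: "validation error (update only)",
    500: "some other error occurred",
}
```

The request object can be passed to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for the error statuses; its `code` attribute can then be looked up in `S2T_ERRORS`.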

Item Logos

Get logos for an item

GET /api/data/v3/items/{id}/logos?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token returned by a previous call, used to page through the results. A next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) that indicates where to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) that indicates where to end retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of whether logo data is present.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "logos": [
                {
                    "id": "8abdb55f697f0a28a8264f3f0a320d09",
                    "confidence": 0.851,
                    "name": "Adidas",
                    "bounding_box": {
                        "top": 1083,
                        "left": 236,
                        "width": 62,
                        "height": 88
                    }
                }
            ]
        }
    },
    "next_page_token": ""
}

For an asset that is a video:

{
    "contents": {
        "video_frames": [
            {
                "time": 4.004,
                "frame_id": "adb7e0d8787d2da396ee6e79ab80b0b0",
                "thumbnail": "video_main_frames/frame-0000000002.jpg",
                "logos": [
                    {
                        "id": "cb76d5633637c0d060026adaf87a2804",
                        "confidence": 0.8071,
                        "name": "eastern connecticut state university",
                        "bounding_box": {
                            "top": 7,
                            "left": 6,
                            "width": 180,
                            "height": 21
                        }
                    }
                ]
            },
            {
                "time": 8.008,
                "frame_id": "8e9ebfb5b7fa100a636fe1aeef01ea5f",
                "thumbnail": "video_main_frames/frame-0000000004.jpg",
                "logos": [
                    {
                        "id": "a7be4458334a717525a902ce4df2f358",
                        "confidence": 0.81195,
                        "name": "eastern connecticut state university",
                        "bounding_box": {
                            "top": 7,
                            "left": 5,
                            "width": 182,
                            "height": 21
                        }
                    }
                ]
            }
        ]
    },
    "next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}

For an asset that is a document:

{
    "contents": {
        "pages": [
            {
                "page": 22,
                "images": [
                    {
                        "image_index": 11,
                        "image_id": "7f6d4b027ba6cf303a3cd4108e99b866",
                        "thumbnail_path": "document_pages/thumb-pg-00022-img-00011.png",
                        "logos": [
                            {
                                "id": "72157a50ea72d788ca102171639a3f45",
                                "confidence": 0.8273,
                                "name": "misako",
                                "bounding_box": {
                                    "top": 0,
                                    "left": 306,
                                    "width": 1079,
                                    "height": 782
                                }
                            }
                        ]
                    }
                ]
            },
            {
                "page": 34,
                "images": [
                    {
                        "image_index": 0,
                        "image_id": "ebf1c28240735ad2b63a348a9f6671ee",
                        "thumbnail_path": "document_pages/thumb-pg-00034-img-00000.png",
                        "logos": [
                            {
                                "id": "628cd34b9b13f7b5a16d19d62923ce47",
                                "confidence": 0.81385,
                                "name": "colgate",
                                "bounding_box": {
                                    "top": 436,
                                    "left": 375,
                                    "width": 172,
                                    "height": 151
                                }
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "next_page_token": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAIv-CAwMFc3RhcnQHMzYuMDAwMQZ3aW5kb3cCMTYDYWxsAAA="
}

The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If an unexpected error occurs while fulfilling the request, a Status Internal Server Error (500) will be returned.
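
A common use of the logo responses is to list the detected names above some confidence threshold. The sketch below is one hypothetical way to do that across all three response shapes, assuming decoded JSON like the examples above. Note that the logo examples carry the paging token in `next_page_token`.

```python
def logos_in(body, min_confidence=0.0):
    """Return (name, confidence) pairs for every logo at or above the threshold."""
    contents = body.get("contents") or {}

    # Gather every object that can hold a "logos" list: the img object,
    # each video frame, and each embedded image on each document page.
    groups = [contents.get("img") or {}]
    groups += contents.get("video_frames") or []
    for page in contents.get("pages") or []:
        groups += page.get("images") or []

    return [
        (logo["name"], logo["confidence"])
        for group in groups
        for logo in group.get("logos") or []
        if logo["confidence"] >= min_confidence
    ]
```

A caller paging through results would pass each page's body to `logos_in` and stop once `next_page_token` is an empty string.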

Item Tags

Get tags for an item

GET /api/data/v3/items/{id}/tags?page-token={page_token}&start={start}&window={window}&all={all}
  • page_token is the next page token returned by a previous call, used to page through the results. A next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, the page_token will take precedence.
  • start is the time (video) or page (documents) that indicates where to start retrieving results for a given item. Has no effect on an img item.
  • window is the time (video) or page (documents) that indicates where to end retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of whether tag data is present.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is an img:

{
    "contents": {
        "img": {
            "tags": [
                {
                    "id": "eec2accad1cc1b757f3035bd8253ac04",
                    "text": "window",
                    "confidence": 0.9015815854072571
                },
                {
                    "id": "e95fc00150dbdf08ec3eb2e75c638ac1",
                    "text": "stained glass",
                    "confidence": 0.9015815854072571
                },
                {
                    "id": "c06ee06a066475e692b8b56433ab371a",
                    "text": "light",
                    "confidence": 0.8380934019465514
                },
                {
                    "id": "7cfb5e8e12cdfaeb10086a5a9f693786",
                    "text": "sphere",
                    "confidence": 0.5156272603992966
                },
                {
                    "id": "ef689cbeb566845b46be786438cb8782",
                    "text": "church",
                    "confidence": 0.25103029243584873
                }
            ]
        },
        "pages": null,
        "video_frames": null
    },
    "next_page": ""
}

For an asset that is a video:

{
    "contents": {
        "img": null,
        "pages": null,
        "video_frames": [
            {
                "frame_id": "3a222b4f2e58529ce0dc321cf299dadb",
                "tags": [
                    {
                        "id": "b12f152bda07508c9a482d74fef7dac3",
                        "text": "summer",
                        "confidence": 0.312589002008962
                    },
                    {
                        "id": "a090dafce32110bde870b89055e75939",
                        "text": "autumn",
                        "confidence": 0.20671001655433419
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000000.jpg",
                "time": 0
            },
            {
                "frame_id": "6decb0045799aef3f95dbc6327a992fd",
                "tags": [
                    {
                        "id": "8153baacdde17515bb7d89aa09e10347",
                        "text": "firefighter",
                        "confidence": 0.9791649580001832
                    },
                    {
                        "id": "e5c846b2a75496779eb7c32b1f567a86",
                        "text": "person",
                        "confidence": 0.9791649580001831
                    },
                    {
                        "id": "6dcff09bdec2deb23d355d89e48a8f34",
                        "text": "smoke",
                        "confidence": 0.5069368303763367
                    }
                ],
                "thumbnail_path": "video_main_frames/frame-0000000001.jpg",
                "time": 2
            }
        ]
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}

For an asset that is a document:

{
    "contents": {
        "img": null,
        "pages": [
            {
                "images": [
                    {
                        "image_id": "6ae8b0cd1a26070884f08dac7336a2c2",
                        "image_index": 0,
                        "tags": [
                            {
                                "id": "e6d5e65e71d93bbada112bfbcb6a5ed0",
                                "text": "person",
                                "confidence": 0.9971379041671753
                            },
                            {
                                "id": "5a322b094c87b670ea71bf8eea7cc7ed",
                                "text": "man",
                                "confidence": 0.9918940663337708
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00000-img-00000.png"
                    }
                ],
                "page": 0,
                "tags": "optionally, a page can have tags as well, or the tags can be embedded in the images within the page as shown above",
                "thumbnail_path": "optionally, a page can have a thumbnail path of its own, in addition to each embedded image's thumbnail above",
                "page_id": "a uuid to identify the page; optional, but guaranteed to be available when the all query param is set"
            },
            {
                "images": [
                    {
                        "image_id": "b604734057d3462894d6c1e75a8517b3",
                        "image_index": 0,
                        "tags": [
                            {
                                "id": "2ca9919a848d662a254315f34de006ca",
                                "text": "standing",
                                "confidence": 0.8040973544120789
                            },
                            {
                                "id": "807f196d29b0094d558479d50fddbb74",
                                "text": "crowd",
                                "confidence": 0.0062681203708052635
                            }
                        ],
                        "thumbnail_path": "document_pages/thumb-pg-00001-img-00000.png"
                    }
                ],
                "page": 1
            }
        ],
        "video_frames": null
    },
    "next_page": "NP-BAwEBBVRva2VuAf-CAAEDAQVMaW1pdAEEAAEGT2Zmc2V0AQQAAQZQYXJhbXMB_4QAAAAh_4MEAQERbWFwW3N0cmluZ11zdHJpbmcB_4QAAQwBDAAAJ_-CAwMDYWxsAAVzdGFydA0tMTExMTEwOS45OTk5BndpbmRvdwExAA=="
}

The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.
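The three response shapes above differ only in where the tags sit, so a client can normalise them into one flat list. The following is a minimal sketch under stated assumptions, not part of the platform: the helper name `iter_tags` and the location labels it produces are hypothetical, and it assumes that page-level `tags`, when present, is a list shaped like the image and frame tag lists.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_tags(response: Dict[str, Any]) -> Iterator[Tuple[str, str, float]]:
    """Yield (location, text, confidence) for every tag in a tags response."""
    contents = response.get("contents") or {}

    # Image items: a single flat tag list under "img".
    img = contents.get("img") or {}
    for tag in img.get("tags") or []:
        yield ("img", tag["text"], tag["confidence"])

    # Video items: tags grouped per frame, labelled here by the frame time.
    for frame in contents.get("video_frames") or []:
        for tag in frame.get("tags") or []:
            yield (f"time:{frame['time']}", tag["text"], tag["confidence"])

    # Document items: tags on the page itself and on each embedded image.
    for page in contents.get("pages") or []:
        where = f"page:{page['page']}"
        for tag in page.get("tags") or []:  # assumed to be a list of tags
            yield (where, tag["text"], tag["confidence"])
        for image in page.get("images") or []:
            label = f"{where}/img:{image['image_index']}"
            for tag in image.get("tags") or []:
                yield (label, tag["text"], tag["confidence"])


def confident_tags(response: Dict[str, Any], threshold: float) -> list:
    """Keep only the tag texts whose confidence is at least `threshold`."""
    return [text for _, text, conf in iter_tags(response) if conf >= threshold]
```

Entries without tag data, which can appear when all is set, are skipped because a missing or null `tags` value is treated as an empty list.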

If any unexpected error occurs while fulfilling this request/response cycle, a Status Internal Server Error (500) will be returned.

Add tags to an item

POST /api/data/v3/items/{id}/tags

Request body:

{
	"metadata_id": "metadata UUID",
	"tags": ["tag1", "tag2","tag3"]
}
  • "tags" is a list of tags you would like to add. All tags will be deduplicated before being applied to the segment.

Response

A successful call will return a Status Created (201) with the metadata segment (or segment parent) in its new state, including any added tags. The body looks identical to the get tags response, except that it only includes the edited img/time-frame/page data.
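As an illustration of the request described above, the sketch below builds the POST without sending it. The host in `BASE_URL`, the helper name `build_add_tags_request`, and the `Content-Type` header are assumptions; authentication is deliberately left out because this section does not describe it.

```python
import json
import urllib.request
from typing import Iterable
from urllib.parse import quote

BASE_URL = "https://graymeta.example.com"  # hypothetical host


def build_add_tags_request(
    item_id: str, metadata_id: str, tags: Iterable[str]
) -> urllib.request.Request:
    """Build (but do not send) the request that adds tags to an item."""
    # dict.fromkeys drops duplicates while keeping order. The server
    # deduplicates too, so this only keeps the payload small.
    payload = {"metadata_id": metadata_id, "tags": list(dict.fromkeys(tags))}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}/tags",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; per the text above, a successful call should come back with status 201.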

Delete tags from a segment

By Name

DELETE /api/data/v3/items/{id}/tags?metaID={meta_id}&tagName={tag_name}
  • meta_id is the segment or image UUID. If an image metadata ID is provided, the tag name will be removed from all sibling images under that segment.
  • tag_name is the name of the tag(s) to be deleted. This may result in multiple tags being deleted from the segment if they share a name.

Response

A successful call will return a Status No Content (204) with no response body.

By ID

DELETE /api/data/v3/items/{id}/tags/{tag_id}/?metaID={meta_id}
  • meta_id is the segment or image UUID. If an image metadata ID is provided, the tag name will be removed from all sibling images under that segment.

Response

A successful call will return a Status No Content (204) with no response body.
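Tag names can contain spaces ("stained glass" appears in the tags response), so the query string of the by-name variant needs URL encoding. The sketch below builds both DELETE URLs; the host and the helper names are hypothetical, and only the paths and the `metaID` / `tagName` parameters come from the endpoints above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://graymeta.example.com"  # hypothetical host


def delete_tag_by_name_url(item_id: str, meta_id: str, tag_name: str) -> str:
    """URL for DELETE .../tags?metaID=...&tagName=..."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"metaID": meta_id, "tagName": tag_name}, quote_via=quote)
    return f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}/tags?{query}"


def delete_tag_by_id_url(item_id: str, tag_id: str, meta_id: str) -> str:
    """URL for DELETE .../tags/{tag_id}/?metaID=..."""
    query = urlencode({"metaID": meta_id}, quote_via=quote)
    return (
        f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}"
        f"/tags/{quote(tag_id, safe='')}/?{query}"
    )
```

Either URL would then be sent with the DELETE method, for example via `urllib.request.Request(url, method="DELETE")`, and a 204 with an empty body signals success.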

Get contents for a document item

GET /api/data/v3/items/{id}/text-contents?page-token={page_token}&start={start}&window={window}&all={all}&mask={mask}
  • page_token is the next page token returned by a previous call. When provided, a next page token is all that is needed to retrieve the next page of results. If page_token is set along with other query params, page_token takes precedence.
  • start is the page (documents) at which to start retrieving results for a given item. Has no effect on an img item.
  • window is the page (documents) at which to stop retrieving results for a given item. Has no effect on an img item.
  • all will provide all entries for an item regardless of text content data being present or not.
  • mask allows you to mask the embedded nlp data for a text content entry, which may result in faster results. Set mask=nlp to remove nlp data from being provided.

If none are set, the full collection will be returned without pagination. A valid page_token can be used without the addition of all, limit and offset.

Response

A successful call will return a Status OK (200) with the following response body:

For an asset that is a document:

{
    "contents": {
        "pages": [
            {
                "page": 0,
                "page_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
                "thumbnail_path": "document_pages/thumb-pg-00000.png",
                "text_content": {
                    "id": "65a5d28b38228ebce12f8bab67e0f386",
                    "metadatas_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
                    "text": "This is an example pdf\n\n\f",
                    "language": {
                        "code": "en-US",
                        "confidence": 0.7
                    },
                    "nlp_properties": {
                        "entities": null,
                        "key_phrases": null,
                        "sentiment": {
                            "text": "neutral",
                            "sentiment_confidence": {
                                "Mixed": 0.013882513158023357,
                                "Negative": 0.16380225121974945,
                                "Neutral": 0.7018685936927795,
                                "Positive": 0.12044669687747955
                            }
                        },
                        "language": {
                            "language": "en",
                            "confidence": 0.9962568283081055
                        }
                    }
                }
            },
            {
                "page": 1,
                "page_id": "1bf644159b1b1c1e3025df76f7c66110",
                "thumbnail_path": "document_pages/thumb-pg-00001.png",
                "text_content": {
                    "id": "62773d56d314a578f746a080460195a7",
                    "metadatas_id": "1bf644159b1b1c1e3025df76f7c66110",
                    "text": "This is page 2 of the example pdf\n\n\f",
                    "language": null,
                    "nlp_properties": {
                        "entities": [
                            {
                                "text": "page 2",
                                "confidence": 0.8359338045120239,
                                "type": "quantity"
                            }
                        ],
                        "key_phrases": [
                            {
                                "text": "page 2",
                                "confidence": 0.9814304709434509
                            }
                        ],
                        "sentiment": {
                            "text": "neutral",
                            "sentiment_confidence": {
                                "Mixed": 0.007130879443138838,
                                "Negative": 0.10643889009952545,
                                "Neutral": 0.783736526966095,
                                "Positive": 0.10269377380609512
                            }
                        },
                        "language": {
                            "language": "en",
                            "confidence": 0.9866665005683899
                        }
                    }
                }
            }
        ]
    },
    "next_page": ""
}

For the same asset but with the mask set to remove NLP information:

{
    "contents": {
        "pages": [
            {
                "page": 0,
                "page_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
                "thumbnail_path": "document_pages/thumb-pg-00000.png",
                "text_content": {
                    "id": "65a5d28b38228ebce12f8bab67e0f386",
                    "metadatas_id": "1a15b610dd82bdfec2ee8ecf5597cc97",
                    "text": "This is an example pdf\n\n\f",
                    "language": {
                       "code": "en-US",
                       "confidence": 0.7
                    }
                }
            },
            {
                "page": 1,
                "page_id": "1bf644159b1b1c1e3025df76f7c66110",
                "thumbnail_path": "document_pages/thumb-pg-00001.png",
                "text_content": {
                    "id": "62773d56d314a578f746a080460195a7",
                    "metadatas_id": "1bf644159b1b1c1e3025df76f7c66110",
                    "text": "This is page 2 of the example pdf\n\n\f",
                    "language": null
                }
            }
        ]
    },
    "next_page": ""
}
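Because `nlp_properties` disappears when mask=nlp is set and `language` can be null, client code reading these responses is safer if it treats both as optional. The helper below is a sketch with a hypothetical name (`page_summaries`) that reduces either response shape to one small record per page.

```python
from typing import Any, Dict, List


def page_summaries(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one {page, text, language, sentiment} dict per page, in page order."""
    pages = (response.get("contents") or {}).get("pages") or []
    summaries = []
    for page in sorted(pages, key=lambda p: p["page"]):
        content = page.get("text_content") or {}
        language = content.get("language") or {}
        # nlp_properties is absent when the request used mask=nlp.
        nlp = content.get("nlp_properties") or {}
        sentiment = nlp.get("sentiment") or {}
        summaries.append({
            "page": page["page"],
            # strip() also drops the trailing newlines and form feed ("\f").
            "text": (content.get("text") or "").strip(),
            "language": language.get("code"),
            "sentiment": sentiment.get("text"),
        })
    return summaries
```

With the masked response above, each record carries the page text and language code, and `sentiment` is `None`.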

If start and window are provided, then the results may be paginated. The next page token will provide a stringified token that can be used to retrieve the next page of results. If the next page token is an empty string, then there are no more results.

If any unexpected error occurs while fulfilling this request/response cycle, a Status Internal Server Error (500) will be returned. If the item ID refers to an item that does not exist, a Status Not Found (404) will be returned.
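The paging behaviour described above (follow `next_page` until it is an empty string) can be sketched as a small generator. The host, the function names, and the injectable `fetch` callable are assumptions made for illustration, and authentication is omitted. Since page_token takes precedence over other parameters, follow-up requests send only the token.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://graymeta.example.com"  # hypothetical host


def text_contents_url(item_id: str, page_token: Optional[str] = None, **params: str) -> str:
    """Build the text-contents URL; a page token replaces all other params."""
    query = {"page-token": page_token} if page_token else params
    url = f"{BASE_URL}/api/data/v3/items/{quote(item_id, safe='')}/text-contents"
    return f"{url}?{urlencode(query)}" if query else url


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode its JSON body (no authentication in this sketch)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_pages(
    item_id: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    **params: str,
) -> Iterator[Dict[str, Any]]:
    """Yield every page entry, following next_page until it is empty."""
    token: Optional[str] = None
    while True:
        body = fetch(text_contents_url(item_id, token, **params))
        yield from (body.get("contents") or {}).get("pages") or []
        token = body.get("next_page") or ""
        if not token:  # an empty string means there are no more results
            return
```

A call such as `iter_pages(item_id, start="0", window="10", mask="nlp")` would pass those values through as query parameters on the first request. `urllib.request.urlopen` raises `urllib.error.HTTPError` for the 404 and 500 responses described above, so callers may want to catch it.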

Item Text Tokens

Get the content of text files (.txt)

GET /api/data/v3/items/{id}/tokens

Response

A successful call will return a Status OK (200) with the following response body:

{
  "tokens": "The quick brown fox jumps over the lazy dog"
}

If any unexpected error occurs while fulfilling this request/response cycle, a Status Internal Server Error (500) will be returned.

Item Extractor Runs

Get extractor runs for an item

GET /api/data/v3/items/{id}/extractors?run-type={run-type}
  • run-type is an enum with one of the following values:
      • all: returns all the extractors run for an item in its lifetime
      • latest: returns the extractors from the last run
      • erroneous: returns the extractors that have errors registered from their last run
      • unique: returns a list of unique extractors from the history of the item; the latest extractor run for each extractor type will be returned

Response

A successful call will return a Status OK (200) with the following response body:

{
	"extractors_runtime": [
		{
			"request_id": String,
			"err": String,
			"success": Bool,
			"skipped": Bool,
			"runtime": Integer duration in nanoseconds,
			"start_at": Zulu Timestamp,
			"end_at": Zulu Timestamp,
			"info": {
				"name": String,
				"version": Integer
			} 
		}
		...
	]
}

If the item is not found, a Status Not Found (404) will be returned. If any unexpected error occurs while fulfilling this request/response cycle, a Status Internal Server Error (500) will be returned.
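Since `runtime` is reported in nanoseconds and failures are signalled through `err` and `success`, a short summary step can make the response easier to read. The sketch below uses hypothetical helper names and relies only on the fields shown in the response body above.

```python
from typing import Any, Dict, List

# The four run-type values listed for this endpoint.
RUN_TYPES = ("all", "latest", "erroneous", "unique")


def extractors_path(item_id: str, run_type: str = "latest") -> str:
    """Build the request path, rejecting run-type values the API does not list."""
    if run_type not in RUN_TYPES:
        raise ValueError(f"run_type must be one of {RUN_TYPES}, got {run_type!r}")
    return f"/api/data/v3/items/{item_id}/extractors?run-type={run_type}"


def summarize_runs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduce each extractor run to its name, outcome, and runtime in seconds."""
    rows = []
    for run in response.get("extractors_runtime") or []:
        info = run.get("info") or {}
        rows.append({
            "name": info.get("name"),
            "ok": bool(run.get("success")) and not run.get("err"),
            "skipped": bool(run.get("skipped")),
            # runtime is an integer duration in nanoseconds.
            "seconds": (run.get("runtime") or 0) / 1e9,
        })
    return rows


def failed_extractors(response: Dict[str, Any]) -> List[str]:
    """Names of extractors that neither succeeded nor were skipped."""
    return [r["name"] for r in summarize_runs(response) if not r["ok"] and not r["skipped"]]
```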

Extractor Source Files

Item Amazon Transcribe Source Word file Download

To download the source words for the Amazon Transcribe transcriptions, use the following endpoint.

GET /api/files/{item_id}/sourcefiles/amazon_transcribe.json

Response

The response is a list of the words and punctuation that make up the dictation. The example below illustrates what each type looks like in a source file.

{
    "words": [
        {
            "start_time": "0",
            "end_time": "0",
            "type": "punctuation",
            "alternatives": [
                {
                    "confidence": "0",
                    "content": "."
                }
            ]
        },
        {
            "start_time": "0.28",
            "end_time": "0.34",
            "type": "pronunciation",
            "alternatives": [
                {
                    "confidence": "0.215",
                    "content": "Yeah"
                }
            ]
        },
        ...
    ]
}
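One common use of this source file is rebuilding readable text from the word list. The sketch below uses hypothetical helper names and assumes that punctuation should attach to the preceding word and that the highest-confidence alternative is the one to keep. Confidence values arrive as strings, so they are converted with `float` before comparing.

```python
from typing import Any, Dict


def best_alternative(word: Dict[str, Any]) -> str:
    """Return the content of the highest-confidence alternative, or ''."""
    alternatives = word.get("alternatives") or []
    if not alternatives:
        return ""
    best = max(alternatives, key=lambda a: float(a.get("confidence") or 0))
    return best.get("content") or ""


def transcript_text(source: Dict[str, Any]) -> str:
    """Join words with spaces, gluing punctuation onto the previous word."""
    parts = []
    for word in source.get("words") or []:
        content = best_alternative(word)
        if not content:
            continue
        if word.get("type") == "punctuation" and parts:
            parts[-1] += content
        else:
            parts.append(content)
    return " ".join(parts)
```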

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.