Activity API

The Activity API lets you view details about a specific harvesting job. Use its endpoints to find extractor errors and the items that had a specific error message.

List Extractor Errors

This endpoint shows the number of errors each extractor had.

GET /api/data/v3/activity/{request_id}/extractors
  • {request_id} - (string) Harvest Job request ID

You will receive the following response:

{
	"extractors": [{
			"name": "captions",
			"display_name": "Embedded Captions",
			"num_errors": 0
		},
		{
			"name": "document_pages",
			"display_name": "Documents",
			"num_errors": 0
		},
		{
			"name": "drm",
			"display_name": "DRM",
			"num_errors": 2
		},
		{
			"name": "exiv2",
			"display_name": "EXIV2",
			"num_errors": 0
		}
	]
}
  • extractors[].name - (string) The actual name of the extractor that ran in this harvest. This value is used as {extractor} in all other Activity endpoint API requests.
  • extractors[].display_name - (string) The display name of the extractor that ran in this harvest
  • extractors[].num_errors - (integer) Number of errors the extractor caught
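
As an illustration, the response can be filtered down to the extractors that reported at least one error. This is a minimal sketch: the function name extractors_with_errors is hypothetical, and the payload is a trimmed copy of the sample response rather than live output.

```python
import json

# Sample response body, trimmed to two entries from the documented example.
SAMPLE_RESPONSE = """
{
  "extractors": [
    {"name": "captions", "display_name": "Embedded Captions", "num_errors": 0},
    {"name": "drm", "display_name": "DRM", "num_errors": 2}
  ]
}
"""


def extractors_with_errors(payload: dict) -> list[str]:
    """Return the `name` of every extractor whose `num_errors` is above zero."""
    return [
        entry["name"]
        for entry in payload.get("extractors", [])
        if entry.get("num_errors", 0) > 0
    ]


failing = extractors_with_errors(json.loads(SAMPLE_RESPONSE))
print(failing)  # ['drm']
```

Each returned name can then be used as {extractor} in the other Activity endpoints.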

List Errors for an Extractor

This endpoint lists all errors that occurred for the given extractor in this harvesting job. Each error has an error_hash that you can use to find the affected items with the next endpoint.

GET /api/data/v3/activity/{request_id}/extractors/{extractor}?page_token={token}&limit={limit}&offset={offset}
  • {request_id} - (string) Harvest Job request ID
  • {extractor} - (string) Extractor name
Optional Queries
  • limit - (integer) Maximum results to show
  • offset - (integer) The offset amount of results (used for pagination)
  • page_token - (string) Token for the next set of results, taken from the next_page_token value of the previous response.

You will receive the following response:

{
	"extractor": "credits",
	"extractor_name": "Credits",
	"errors": [{
		"error": "credits: error message ouchhh",
		"error_hash": "8f1e07deac51451cc1ed2778312af796",
		"num_files": 1
	}],
	"next_page_token": ""
}
  • extractor - (string) The actual name of the extractor being searched
  • extractor_name - (string) The display name of the extractor being searched
  • errors[].error - (string) The error message that occurred
  • errors[].error_hash - (string) The MD5 hash of the error message
  • errors[].num_files - (integer) Number of files that had this error message
  • num_files - (integer) Total number of files that had errors with this extractor
  • next_page_token - (string) Token for the next set of paginated results, to be passed as the page_token query

The next_page_token value is a string token that retrieves the next page of results. To fetch the next page, pass it as the page_token query parameter. If next_page_token is an empty string, there are no more results.
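
The pagination rule above can be sketched as a small loop. This is only an outline under stated assumptions: fetch_page is a hypothetical callable supplied by the caller that performs the GET request for a given page token and returns the decoded JSON body, and the first page is assumed to be requested with an empty token (in a real request the parameter would simply be omitted). The canned pages stand in for real HTTP responses.

```python
from typing import Callable, Iterator


def iter_errors(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every entry of `errors` across all pages.

    `fetch_page` is hypothetical: it takes a page token ("" for the first
    page) and returns the decoded JSON response for that page.
    """
    token = ""
    while True:
        page = fetch_page(token)
        yield from page.get("errors", [])
        token = page.get("next_page_token", "")
        if not token:  # an empty string means there are no more results
            break


# Stand-in for real HTTP calls: two canned pages keyed by page token.
_PAGES = {
    "": {"errors": [{"error_hash": "aaa", "num_files": 1}], "next_page_token": "t2"},
    "t2": {"errors": [{"error_hash": "bbb", "num_files": 4}], "next_page_token": ""},
}

all_errors = list(iter_errors(lambda token: _PAGES[token]))
print([e["error_hash"] for e in all_errors])  # ['aaa', 'bbb']
```

Injecting the fetch function keeps the loop independent of any particular HTTP client or authentication scheme, neither of which is described in this section.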

List Items That Have a Specific Error

This endpoint lists all items in the harvesting job that had this specific error message. The error_md5 is an MD5 hash of the error message string.

GET /api/data/v3/activity/{request_id}/extractors/{extractor}/errors/{error_md5}/files?page_token={token}&limit={limit}&offset={offset}
  • {request_id} - (string) Harvest Job request ID
  • {extractor} - (string) Extractor name
  • {error_md5} - (string) MD5 hash of the error message
Optional Queries
  • limit - (integer) Maximum results to show
  • offset - (integer) The offset amount of results (used for pagination)
  • page_token - (string) Token for the next set of results, taken from the next_page_token value of the previous response.
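
To make the path parameters concrete, the following sketch builds the request path for this endpoint from an error message. Several details are assumptions rather than documented behavior: the helper name error_files_path is hypothetical, the message is assumed to be hashed as UTF-8 bytes into a lowercase hex digest, and the query key is written page_token as in the URL template. In practice the error_hash returned by the previous endpoint can be used directly instead of recomputing the hash.

```python
import hashlib
from typing import Optional
from urllib.parse import quote, urlencode


def error_files_path(request_id: str, extractor: str, error_message: str,
                     limit: Optional[int] = None, page_token: str = "") -> str:
    """Build the path (plus optional query string) for the files listing."""
    # Assumption: UTF-8 encoding and lowercase hex output for the MD5 hash.
    error_md5 = hashlib.md5(error_message.encode("utf-8")).hexdigest()
    path = (
        f"/api/data/v3/activity/{quote(request_id, safe='')}"
        f"/extractors/{quote(extractor, safe='')}"
        f"/errors/{error_md5}/files"
    )
    # Only include the optional query parameters that were actually given.
    query = {}
    if limit is not None:
        query["limit"] = limit
    if page_token:
        query["page_token"] = page_token
    return f"{path}?{urlencode(query)}" if query else path


# "job-123" and the message below are made-up illustration values.
print(error_files_path("job-123", "drm", "drm: example failure", limit=50))
```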

You will receive the following response:

{
	"files": [{
			"item_id": "8535dc5d1de2e83fba40bc0fe66a9ab8"
		},
		{
			"item_id": "08376d120bcd1cb8d14329e1e8e9cb71"
		},
		{
			"item_id": "69f6a57e28e7358f554388f1be6b6be3"
		}
	],
	"num_files": 3,
	"next_page_token": ""
}
  • files[].item_id - (string) The Item ID that had this extractor error
  • num_files - (integer) Total number of files that had errors with this extractor
  • next_page_token - (string) Token for the next set of paginated results, to be passed as the page_token query

The next_page_token value is a string token that retrieves the next page of results. To fetch the next page, pass it as the page_token query parameter. If next_page_token is an empty string, there are no more results.
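
The three endpoints are designed to be chained: extractors, then errors per extractor, then items per error. The sketch below walks that chain for one harvest job. It rests on assumptions: get_json is a hypothetical helper that performs the GET request for a path and returns the decoded JSON body, the request ID "job-1" and the canned responses are invented for illustration, and only the first page of each listing is read to keep the example short.

```python
from typing import Callable

BASE = "/api/data/v3/activity"


def items_by_error(request_id: str, get_json: Callable[[str], dict]) -> dict:
    """Map (extractor name, error_hash) to the item IDs that hit that error."""
    result = {}
    extractors = get_json(f"{BASE}/{request_id}/extractors")["extractors"]
    for extractor in extractors:
        if extractor["num_errors"] == 0:
            continue  # nothing to drill into for this extractor
        name = extractor["name"]
        errors = get_json(f"{BASE}/{request_id}/extractors/{name}")["errors"]
        for err in errors:
            files = get_json(
                f"{BASE}/{request_id}/extractors/{name}"
                f"/errors/{err['error_hash']}/files"
            )["files"]
            result[(name, err["error_hash"])] = [f["item_id"] for f in files]
    return result


# Canned responses standing in for a live server (hypothetical values).
_CANNED = {
    f"{BASE}/job-1/extractors": {
        "extractors": [
            {"name": "exiv2", "display_name": "EXIV2", "num_errors": 0},
            {"name": "drm", "display_name": "DRM", "num_errors": 1},
        ]
    },
    f"{BASE}/job-1/extractors/drm": {
        "errors": [{"error": "drm: example", "error_hash": "abc123", "num_files": 2}],
        "next_page_token": "",
    },
    f"{BASE}/job-1/extractors/drm/errors/abc123/files": {
        "files": [{"item_id": "item-a"}, {"item_id": "item-b"}],
        "num_files": 2,
        "next_page_token": "",
    },
}

report = items_by_error("job-1", _CANNED.__getitem__)
print(report)  # {('drm', 'abc123'): ['item-a', 'item-b']}
```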

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.