Duplicates API

The duplicates API provides information about items that have identical checksums and file sizes and are therefore considered duplicate copies of the same file.

Retrieving Duplicate Stats

GET /api/data/v3/duplicates-footprint

Response:

{
    "total_count": 14,
    "total_number": 126,
    "total_size": 184925196
}
  • total_count - (int) Total number of duplicated hash and file_size combinations
  • total_number - (int) Total number of duplicate items
  • total_size - (int) Total size of duplicate items

Status codes:

  • 200 (success)
  • 500 (unexpected error)
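
As a rough illustration, the stats could be fetched and summarized as in the Python sketch below. The helper names fetch_footprint and describe_footprint are hypothetical, the base URL is whatever your deployment uses, authentication is omitted because this page does not describe it, and treating total_size as a byte count is an assumption.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

FOOTPRINT_PATH = "/api/data/v3/duplicates-footprint"


def fetch_footprint(base_url):
    """Request the duplicate stats and return the parsed JSON object.

    Assumption: no authentication headers are needed; add whatever your
    deployment requires. urlopen raises HTTPError on a 500 response.
    """
    with urlopen(urljoin(base_url, FOOTPRINT_PATH)) as resp:
        return json.load(resp)


def describe_footprint(stats):
    """Format the three documented fields as a one-line summary."""
    mib = stats["total_size"] / (1024 * 1024)  # assumes total_size is in bytes
    return (
        f'{stats["total_number"]} duplicate items across '
        f'{stats["total_count"]} hashes, {mib:.1f} MiB'
    )


# Using the sample response shown above:
sample = {"total_count": 14, "total_number": 126, "total_size": 184925196}
print(describe_footprint(sample))
```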

Listing Duplicated Hashes

GET /api/data/v3/duplicates

Optional Query Parameters:

  • page-token - (string) A page token to fetch an additional page of results
  • limit - (int) How many results to return. Minimum: 1, Maximum: 1000, Default: 50
  • limit-items - (int) How many item ids to return for each hash. Minimum: 1, Maximum: 1000, Default: 20

Response:

{
    "duplicates": [
        {
            "hash": "1bf6808152b518ec38b18780f96526ef",
            "items": [
                "0ddc337d7f399b406073f5f8e5324fc4",
                "3566d2ff4aea65d30eda983c2c749274",
                "444ccc35017f1e39188d735d7ec126a0",
                "4a86bb348ba1050511777d40a2471d3e",
                "69af63e7f10ad1b8fa5e9a05f149bbea",
                "7d63cf40504e9f2501250f731fea03b0",
                "84deb4ebb35d5c78cd69b9dce4eaf4b5",
                "cbbf806d29b43fdbeb9f2516e4813525",
                "faa93144670268595c8acc823e861de7"
            ],
            "count": 9,
            "file_size": 614912,
            "footprint_size": 4919296,
            "items_next_page": ""
        },
        {
            "hash": "fd083c80752b24b6085eb773ed1b9609",
            "items": [
                "4223082baa8abcb70c1fcd600b0f7b83",
                "82d37a0a16911b49a001e5686ad58a07",
                "92306656ab6c366b113af0cc6ca0abad",
                "92fb0a09883700b8c2116eaccf2b80e8",
                "9c8b1b15613e22f3474291f3ea2d8ca6",
                "be265a032be6d5b759e1ead68df875b9",
                "c27582011f2966350c7a64c834d88637",
                "dcb2c7f92cecac5b2a89fdd046a9bac8",
                "fe63caef8d1089b930a76dfc21dd5634"
            ],
            "count": 9,
            "file_size": 3598,
            "footprint_size": 28784,
            "items_next_page": ""
        },
        {
            "hash": "c37470f71fb1bc9c7242c5282274cdd8",
            "items": [
                "017dc9a61a040855f819f48b85d3119b",
                "651f0dc8cd2d2fe9d2b8462dfa03d04b",
                "82aaa25ff23826e21629d9f391b83d18",
                "84e5ae7f3fcd7caa4259d1480f057a73"
            ],
            "count": 4,
            "file_size": 54047,
            "footprint_size": 162141,
            "items_next_page": ""
        },
        {
            "hash": "6461a4d3c389f561207065dfd8d4c01b",
            "items": [
                "0fc10bd128c62699c80d3af21d5356ba",
                "eb41c0739e90a24d7d855933c5d3c231"
            ],
            "count": 2,
            "file_size": 367922,
            "footprint_size": 367922,
            "items_next_page": ""
        }
    ],
    "next_page": ""
}
  • duplicates - (array) An array of hash objects:
    • hash - (string) The cryptographic hash
    • items - (array of strings) A list of item ids that share that hash
    • count - (int) The total number of items with that hash
    • file_size - (int) The size of one of the files, in bytes
    • footprint_size - (int) The total size, in bytes, of the redundant copies: (count - 1) * file_size
    • items_next_page - (string) A page token that can be used to retrieve the next page of items with a given hash (see the GET /api/data/v3/duplicates/{hash} endpoint below). If this is an empty string, there are no additional items with the given hash.
  • next_page - (string) A page token that can be used to retrieve the next page of hashes. If this is an empty string, there are no additional results to return.
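
The footprint_size formula can be checked by hand against the sample entries. A minimal Python sketch follows; the function name is hypothetical, and the reading that one copy is kept while the remaining count - 1 copies are redundant is inferred from the formula rather than stated on this page.

```python
def footprint_size(count, file_size):
    """Bytes taken up by redundant copies: (count - 1) * file_size."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) * file_size


# Values taken from two entries of the sample response:
assert footprint_size(9, 3598) == 28784
assert footprint_size(4, 54047) == 162141
```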

Status codes:

  • 200 (success)
  • 500 (unexpected error)
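
One way to walk every page of hashes is to keep passing next_page back as page-token until it comes back empty. The Python sketch below assumes exactly that; http_fetch_page and iter_duplicate_hashes are hypothetical names, and authentication is again omitted because this page does not describe it.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

DUPLICATES_PATH = "/api/data/v3/duplicates"


def http_fetch_page(base_url, params):
    """GET one page of duplicated hashes and return the parsed JSON."""
    url = urljoin(base_url, DUPLICATES_PATH) + "?" + urlencode(params)
    with urlopen(url) as resp:
        return json.load(resp)


def iter_duplicate_hashes(fetch_page, limit=50, limit_items=20):
    """Yield every hash object, following next_page until it is empty.

    fetch_page(params) is any callable that returns one parsed response,
    which keeps the paging logic independent of the HTTP client.
    """
    token = ""
    while True:
        params = {"limit": limit, "limit-items": limit_items}
        if token:
            params["page-token"] = token
        page = fetch_page(params)
        yield from page["duplicates"]
        token = page.get("next_page", "")
        if not token:
            break


# Possible usage (base_url is a placeholder for your deployment):
# for entry in iter_duplicate_hashes(lambda p: http_fetch_page(base_url, p)):
#     print(entry["hash"], entry["count"])
```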

Retrieving a List of Items for a Given Hash

GET /api/data/v3/duplicates/{hash}

Optional Query Parameters:

  • page-token - (string) A page token to fetch an additional page of results
  • limit - (int) The number of results to return. Minimum: 1, Maximum: 1000, Default: 20

Response:

{
    "hash": "1bf6808152b518ec38b18780f96526ef",
    "items": [
        "0ddc337d7f399b406073f5f8e5324fc4",
        "3566d2ff4aea65d30eda983c2c749274",
        "444ccc35017f1e39188d735d7ec126a0",
        "4a86bb348ba1050511777d40a2471d3e",
        "69af63e7f10ad1b8fa5e9a05f149bbea",
        "7d63cf40504e9f2501250f731fea03b0",
        "84deb4ebb35d5c78cd69b9dce4eaf4b5",
        "cbbf806d29b43fdbeb9f2516e4813525",
        "faa93144670268595c8acc823e861de7"
    ],
    "count": 9,
    "file_size": 614912,
    "footprint_size": 4919296,
    "items_next_page": ""
}
  • hash - (string) The cryptographic hash
  • items - (array of strings) A list of item ids that share that hash
  • count - (int) The total number of items with that hash
  • file_size - (int) The size of one of the files, in bytes
  • footprint_size - (int) The total size, in bytes, of the redundant copies: (count - 1) * file_size
  • items_next_page - (string) A page token that can be used to retrieve the next page of items with the hash. If this is an empty string, there are no additional items with the given hash.

Status codes:

  • 200 (success)
  • 500 (unexpected error)
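
When a hash object from the listing endpoint carries a non-empty items_next_page, the remaining item ids can presumably be gathered by passing that token as page-token to this endpoint. The Python sketch below assumes that behavior; collect_all_items and the fetch_items callable are hypothetical names introduced for illustration.

```python
def collect_all_items(entry, fetch_items, limit=20):
    """Return every item id for one hash object from the listing.

    entry is one element of the "duplicates" array. fetch_items(hash, params)
    is a hypothetical callable that performs
    GET /api/data/v3/duplicates/{hash} with the given query parameters
    and returns the parsed JSON response.
    """
    items = list(entry["items"])
    token = entry.get("items_next_page", "")
    while token:
        page = fetch_items(entry["hash"], {"limit": limit, "page-token": token})
        items.extend(page["items"])
        token = page.get("items_next_page", "")
    return items
```

Passing the HTTP call in as a parameter keeps the token handling separate from transport and authentication details, which vary by deployment.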

This documentation is generated from the latest version of GrayMeta Platform. For documentation relevant to your own deployed version, please use the documentation inside the application.