API access to traffic data

This page describes how to retrieve traffic data via the API.

Introduction

The Data queries API namespace provides access to traffic data, via these four routes:

  • GET /api/v4.0/data/topx

  • GET /api/v4.0/data/stats

  • GET /api/v4.0/data/timeline

  • GET /api/v4.0/data/logs

Each includes an input parameter named filters. This parameter specifies the traffic data that will be retrieved.

On the page below, we begin by discussing this parameter's usage, syntax, and construction. Then we discuss the four API routes that require it, and the different types of data they return.

The filters parameter

filters is a string that specifies one or more conditions. A database query is constructed from those conditions, and the results are returned to the user.

Usage of the filters parameter depends on the context:

  • When using Swagger UI, this value is supplied directly in the filters input field.

  • When using curl to call the Reblaze API, this value is encoded into the destination URL, preceded by "filters=". See this explanation for more information.
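As a minimal sketch of the encoding step for curl-style calls, the snippet below serializes a filters object to JSON and URL-encodes it with Python's standard library. The host name example.reblaze.io and the helper name build_url are placeholders that do not come from this page, and authentication is omitted entirely.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL; substitute the address of your own deployment.
BASE_URL = "https://example.reblaze.io"


def build_url(route: str, filters: dict) -> str:
    """Return the URL for a data route, with filters encoded in the query string."""
    # Compact separators keep the encoded URL short.
    filters_json = json.dumps(filters, separators=(",", ":"))
    return f"{BASE_URL}{route}?{urlencode({'filters': filters_json})}"


filters = {
    "AND": [
        {
            "field": "timestamp",
            "op": "between",
            "value": ["2024-06-06 09:31:00", "2024-06-06 09:36:00"],
        },
        {"field": "status", "op": "eq", "value": 301},
    ]
}

url = build_url("/api/v4.0/data/logs", filters)
print(url)

# Round-trip check: decoding the query string recovers the original object.
decoded = parse_qs(urlsplit(url).query)["filters"][0]
print(json.loads(decoded) == filters)
```

The resulting string can be passed to curl or any HTTP client as the destination URL.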

A short example

Here is an example of a filters specification in JSON format:

{
  "AND": [
    {
      "field": "timestamp",
      "op": "between",
      "value": [
        "2024-06-06 09:31:00",
        "2024-06-06 09:36:00"
      ]
    },
    {
      "field": "tags",
      "op": "regex",
      "value": "unrecognized"
    }
  ]
}

This will return requests which:

  • were received in a five-minute time period (from 2024-06-06 09:31:00 to 2024-06-06 09:36:00), and...

  • have a Tag containing the string unrecognized. (In this example, the admin wanted to retrieve requests tagged with unrecognized-host-header.)

Format

The filters parameter can be supplied as a query string, or as JSON.

Query string format

This format is used in the first Query Specification input field in the UI's Dashboard and Events Log. For example, this query string will display all requests with a 301 response code within a certain time period:

status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00

For more information on this format, see Query filter syntax and best practices.

JSON format

The JSON equivalent of the query string above is:

{
  "AND": [
    {
      "field": "timestamp",
      "op": "between",
      "value": [
        "2024-06-06 09:31:00",
        "2024-06-06 09:36:00"
      ]
    },
    {
      "field": "status",
      "op": "eq",
      "value": 301
    }
  ]
}

(This is provided as an example only. A full discussion of JSON syntax is below.)

Converting from query string to JSON

The POST /api/v4.0/data/timeline/parse API route accepts query strings and returns the same query in JSON format.

JSON structure

The discussion below will focus on building the filters parameter in JSON format, for two reasons. First, text query strings are discussed elsewhere (in the links given above). Second, for complex queries, JSON is more powerful.

A JSON filters parameter is structured as follows:

{
  "AND": [
    $CONDITION1,
    $CONDITION2,
    ...
    $CONDITIONn
  ]
}

The first condition: a range of dates/times

The first condition must be included, and must be a range of dates/times. It is structured as follows:

{
  "field": "timestamp",
  "op": "between",
  "value": [
    "$TIMESTAMP1",
    "$TIMESTAMP2"
  ]
}

where $TIMESTAMP1 and $TIMESTAMP2 are timestamps: specifications of date and time.

Any ISO format is supported for timestamps. However, both timestamps must include the year, month, and day.

If hours, minutes, or seconds are omitted from a timestamp, the missing components default to the beginning of the day, hour, or minute, respectively. Examples: "2022-07-14" -> "2022-07-14 00:00:00", "2022-07-14 06:52" -> "2022-07-14 06:52:00", etc.
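To make the defaulting rule concrete, here is a small local sketch that expands a partial timestamp in the same way. It only imitates the documented behavior on the client side; the function name normalize_timestamp and the list of accepted formats are assumptions for illustration, not the server's implementation.

```python
from datetime import datetime

# Formats tried in order, from least to most specific. This list is an
# illustrative assumption; it covers only the space-separated forms shown above.
_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H", "%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S")


def normalize_timestamp(text: str) -> str:
    """Expand a partial timestamp to 'YYYY-MM-DD HH:MM:SS', zero-filling missing parts."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # Not this format; try the next, more specific one.
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unsupported timestamp: {text!r}")


for sample in ("2022-07-14", "2022-07-14 06:52", "2022-07-14 06:52:37"):
    print(sample, "->", normalize_timestamp(sample))
```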

Subsequent conditions

After the first condition, additional conditions can be specified if desired. They are structured as follows:

{
  "field": "$FIELD_NAME",
  "op": "$OPERATOR",
  "key": "$KEY",
  "value": $VALUE
}

They must meet these requirements:

  • Multiple conditions are combined with a logical AND. (A logical OR is not supported, as this could potentially retrieve unexpectedly large amounts of data.)

  • When multiple conditions are provided, all are followed by commas, except for the final one.

  • The "key" line is optional; see discussion below.

Here in the documentation, spaces and carriage returns are included in JSON filter examples for clarity. In usage, they are optional.

Also, the order of a condition's components (its field/op/key/value) does not matter.
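The structure described above can be assembled programmatically. The following is a sketch only: the helper names time_range, condition, and build_filters are invented for this example and are not part of the API.

```python
import json
from typing import Any, Optional


def time_range(start: str, end: str) -> dict:
    """Build the mandatory first condition: a timestamp range."""
    return {"field": "timestamp", "op": "between", "value": [start, end]}


def condition(field: str, op: str, value: Any, key: Optional[str] = None) -> dict:
    """Build one additional condition; "key" is included only when given."""
    cond = {"field": field, "op": op, "value": value}
    if key is not None:
        cond["key"] = key
    return cond


def build_filters(start: str, end: str, *conditions: dict) -> str:
    """Combine the time range and any further conditions under a single AND."""
    body = {"AND": [time_range(start, end), *conditions]}
    # Compact separators drop the optional spaces and line breaks.
    return json.dumps(body, separators=(",", ":"))


filters = build_filters(
    "2022-08-08 01:15:25",
    "2022-08-08 01:52:28",
    condition("tags", "eq", "geo-continent-name:north-america"),
    condition("headers", "regex", "3.65.14.177", key="x-forwarded.*"),
)
print(filters)
```

Letting json.dumps place the commas avoids the trailing-comma mistakes that are easy to make when writing the string by hand.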

Field names

Available fields include those inherent to HTTP requests, along with additional data added by Reblaze during processing.

  • Some of the Reblaze-added information consists of internal IDs for the security settings that are relevant to the request.

  • Some requests will not contain all possible information. When Reblaze blocks a request, processing usually stops immediately, and later stages in the traffic filtering process do not occur.

Key

For some types of data, it might not be enough to specify the field, because there could be multiple parameters in the request that match it. For example, a field name of "cookies" or "headers" does not tell the system which cookie or header to inspect.

The key field supplies this information; it is the name of the specific parameter to evaluate. If this parameter is not defined, the system will inspect all instances of the specified field (all cookies, all headers, etc.).

Some examples are in the JSON filters examples below.

Values

Values should be specified in the appropriate data type: strings as quote-delimited strings, integers as numbers, etc. Arrays of values can be supplied.

Operators

Negative operators

Operators can be inverted by adding not. For example, not eq means "does not equal".

Inverting a condition

It's possible to invert an entire condition by adding NOT, like this:

{
  "NOT": {
    "field": "$FIELD_NAME",
    "op": "$OPERATOR",
    "key": "$KEY",
    "value": $VALUE
  }
}
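The two forms of negation can be sketched as small helpers. The function names below are hypothetical, and this page does not state whether the two forms are always interchangeable, so the example only shows how each is constructed.

```python
def negate_operator(condition: dict) -> dict:
    """Return a copy whose operator is prefixed with 'not ' (e.g. 'eq' -> 'not eq')."""
    return {**condition, "op": f"not {condition['op']}"}


def negate_condition(condition: dict) -> dict:
    """Return the whole condition wrapped in a NOT object."""
    return {"NOT": condition}


status_is_301 = {"field": "status", "op": "eq", "value": 301}

print(negate_operator(status_is_301))
print(negate_condition(status_is_301))
```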

JSON filter examples

Retrieve PUT and POST requests:

{
    "AND": [
        {"field": "timestamp", "op": "between", "value": ["2022-07-12 06:52:37", "2022-07-14 06:52:37"]},
        {"field": "method", "value":"^P.*T$", "op":"regex"}
    ]
}

Retrieve requests containing a tag that matches "geo" or "location":

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-12 06:52:37", "2022-07-14 06:52:37"], "op": "between"},
        {"field": "tags", "value": ["geo", "location"], "op": "in"}
    ]
}

Retrieve requests where certain cookies' values match a regex:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-12 06:52:37", "2022-07-14 06:52:37"], "op": "between"},
        {"field": "cookies", "key": "analytics_.*", "value": "gcp_.*", "op": "regex"}
    ]
}

Retrieve requests where any cookie's value matches a regex:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-12 06:52:37", "2022-07-14 06:52:37"], "op": "between"},
        {"field": "cookies", "value": "gcp_.*", "op": "regex"}
    ]
}

Retrieve requests according to a subfield (the acl_active subfield of security_config must be greater than 11):

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-12 06:52:37", "2022-07-14 06:52:37"], "op": "between"},
        {"field": "security_config", "key": "acl_active", "value": 11, "op": "gt"}
    ]
}

Retrieve requests according to an array of subfields:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-12 06:52:37", "2022-07-14 06:52:37"], "op": "between"},
        {
            "field": "cf_triggers",
            "conditions": [
                {"field": "risk_level", "value": 2, "op": "gt"},
                {"field": "ruleid", "value": "100037", "op": "eq"}
            ]
        }
    ]
}

Retrieve requests according to a combination of conditions (there is no limit on the number of conditions):

{
    "AND": [
        {"field": "timestamp", "value": ["2022-08-08 01:15:25", "2022-08-08 01:52:28"], "op": "between"},
        {"field": "tags", "value":"geo-continent-name:north-america", "op": "eq"},
        {"field": "agent", "key": "ephemeral_id", "value": "029ab6f-6d15-4290-b44f-9133", "op": "regex"},
        {"field": "trigger_counters", "key": "acl","value": 2,"op": "not gt"},
        {"field": "headers", "key": "x-forwarded.*", "value": "3.65.14.177", "op": "regex"},
        {"field": "path_parts", "key": "^part3$", "value": "^-1&", "op": "regex"},
        {"field": "security_config", "key": "cf_active", "value": true, "op": "not eq"},
        {"field": "ip", "value": ["2.55.96.231", "23.65.14.177", "3.1.92.15", "8.206.254.196"], "op": "in"}
    ]
}

API access to traffic data

As noted previously, there are four API routes that use the filters parameter to retrieve traffic data:

  • GET /api/v4.0/data/topx

  • GET /api/v4.0/data/stats

  • GET /api/v4.0/data/timeline

  • GET /api/v4.0/data/logs

Their typical uses are as follows.

Quickly discover the most important factors in the traffic stream (e.g., the countries sending the most blocked requests, the URLs receiving the most bot traffic, etc.): use the topx route.

Get a summary of traffic statistics (total requests, bandwidth, and latency): use the stats route.

Get a summary of security metrics (total requests, blocked requests, status codes returned, number of human clients, activity of the origin, etc.): use the timeline route.

Get complete data for all requests matching certain criteria (often used for drilling down into trends discovered from the other routes): use the logs route.

Below, we discuss each route in detail.

GET /api/v4.0/data/topx

This route provides API access to the same Top Metrics available in the Dashboard.

Calling this route returns every metric at once: all results for "top applications", all results for "top countries", all results for "top sources", and so on. They are combined into a single continuous list:

{
  "data": {
    "results": [
        $RESULT1,
        $RESULT2,
        $RESULT3,
        ...
        $RESULTn
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

...where the list of $RESULTs looks something like this (incomplete) example:

"results": [
  {
    "key": "China",
    "label": "country"
  },
  {
    "key": "United States",
    "label": "country"
  },
  {
    "key": "161.1.50.2",
    "label": "ip"
  },
  {
    "key": "101.6.123.53",
    "label": "ip"
  },
  {
    "key": "161.1.50.5",
    "label": "ip"
  },
  {
    "key": "mysite.com/login",
    "label": "url"
  }
]

We see that in the time period specified in the filters parameter, there were three IP addresses in two countries that sent requests to a single URL.

Some points to note:

  • The results are organized and grouped together in the list according to their label.

  • Labels are ordered alphabetically.

  • Labels can differ in their number of results.

  • The route returns the "top" results for each label (for example, results with the "ip" label show the IPs that sent the most blocked requests). When there are only a few results for a given label, all are retrieved. When there are many, only the "top" results are retrieved.
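Because the results arrive as one flat list, a client will usually want to split them by label. Here is one possible sketch, using the simplified results shown above; the function name group_by_label is an assumption for this example.

```python
from collections import defaultdict

# The simplified results from the example above.
results = [
    {"key": "China", "label": "country"},
    {"key": "United States", "label": "country"},
    {"key": "161.1.50.2", "label": "ip"},
    {"key": "101.6.123.53", "label": "ip"},
    {"key": "161.1.50.5", "label": "ip"},
    {"key": "mysite.com/login", "label": "url"},
]


def group_by_label(items: list) -> dict:
    """Map each label to the list of keys returned for it, preserving order."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["label"]].append(item["key"])
    return dict(grouped)


grouped = group_by_label(results)
for label, keys in grouped.items():
    print(f"{label}: {len(keys)} result(s)")
```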

Actual usage

The example above is oversimplified. In actual use, the topx route:

  • Returns much more data per result, not just key and label (see the list of fields below)

  • Returns more categories than just country, ip, and url (see the discussion of the label field below)

A detailed discussion of topx follows.

Contents of each result

Each result contains the fields listed below.

The _time fields are in seconds. These are floats, but can appear at various precisions: zero decimal places, several decimal places, or scientific notation (e.g., 1.6210818451802098e-9).
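A standard JSON parser handles all three representations without special treatment. In the short sketch below, example_time is a made-up field name standing in for any of the _time fields, and the first two values are invented for illustration.

```python
import json

# "example_time" is a placeholder name; the values show the three possible forms:
# no decimal places, several decimal places, and scientific notation.
payload = (
    '[{"example_time": 2},'
    ' {"example_time": 0.0421},'
    ' {"example_time": 1.6210818451802098e-9}]'
)

# json.loads returns an int for "2", so convert explicitly to get floats throughout.
seconds = [float(item["example_time"]) for item in json.loads(payload)]

for value in seconds:
    # Fixed-point formatting avoids printing scientific notation.
    print(f"{value:.12f} s")
```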

Organization of results: the label and key fields

The topx route returns results in a specific order:

  • Results are grouped together according to their label (i.e., their category).

  • Labels are ordered alphabetically (see full list below).

  • Within each label, results are ordered by num_of_blocked_requests, in descending order.

Notice that this is unlike the UI's Dashboard Top Metrics, where some types of results have other default orders.
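The ordering rules can be expressed as a single sort key. This sketch reproduces the documented order on the client side, for example after merging results from several calls; the blocked-request counts in the sample data are invented.

```python
def topx_order(results: list) -> list:
    """Sort by label (alphabetical), then by num_of_blocked_requests (descending)."""
    return sorted(
        results,
        key=lambda r: (r["label"], -r["num_of_blocked_requests"]),
    )


# Sample data with made-up counts, deliberately out of order.
unsorted_results = [
    {"key": "161.1.50.2", "label": "ip", "num_of_blocked_requests": 12},
    {"key": "United States", "label": "country", "num_of_blocked_requests": 7},
    {"key": "101.6.123.53", "label": "ip", "num_of_blocked_requests": 40},
    {"key": "China", "label": "country", "num_of_blocked_requests": 19},
]

ordered = topx_order(unsorted_results)
for result in ordered:
    print(result["label"], result["key"], result["num_of_blocked_requests"])
```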

The topx route returns twelve categories of results, each with its own label. The label determines the contents of the key field.

GET /api/v4.0/data/stats

This route returns traffic metrics for the requested time period, broken down into shorter segments of time.

Data structures

The retrieved metrics are structured like this:

{
  "data": {
    "results": [
      $RESULTS-TIMESEGMENT-1,
      $RESULTS-TIMESEGMENT-2,
      ...
      $RESULTS-TIMESEGMENT-n
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

...where each $RESULTS-TIMESEGMENT-x has this structure:

{
   "avg_latency": float,
   "hostname": string,
   "num_of_requests": integer,
   "sum_of_bandwidth": integer,
   "time_period": integer,
   "timeperiod_string": string
}
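As an illustration of working with these segments, the sketch below rolls them up into totals for the whole period. The sample values are invented, time_period and timeperiod_string are left out of the sample for brevity, and weighting avg_latency by request count assumes that it is a per-segment mean, which this page does not confirm.

```python
def summarize(segments: list) -> dict:
    """Aggregate per-segment stats into totals for the whole period."""
    total_requests = sum(s["num_of_requests"] for s in segments)
    total_bandwidth = sum(s["sum_of_bandwidth"] for s in segments)
    # Assumption: avg_latency is the mean over each segment's requests,
    # so the overall mean is weighted by the request count.
    weighted = sum(s["avg_latency"] * s["num_of_requests"] for s in segments)
    avg_latency = weighted / total_requests if total_requests else 0.0
    return {
        "num_of_requests": total_requests,
        "sum_of_bandwidth": total_bandwidth,
        "avg_latency": avg_latency,
    }


# Invented sample segments using the documented field names.
segments = [
    {"avg_latency": 0.25, "hostname": "example.com",
     "num_of_requests": 100, "sum_of_bandwidth": 5000},
    {"avg_latency": 0.5, "hostname": "example.com",
     "num_of_requests": 300, "sum_of_bandwidth": 15000},
]

summary = summarize(segments)
print(summary)
```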

Contents of each result

GET /api/v4.0/data/timeline

This route returns security metrics for the requested time period, broken down into shorter segments of time.

Data structures

The retrieved metrics are structured like this:

{
  "data": {
    "results": [
      $RESULTS-TIMESEGMENT-1,
      $RESULTS-TIMESEGMENT-2,
      ...
      $RESULTS-TIMESEGMENT-n
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

...where each $RESULTS-TIMESEGMENT-x has this structure:

{
  "array_origin_status_codes": [
    $STATUS-DATA-ORIGIN1,
    $STATUS-DATA-ORIGIN2,
    ...
    $STATUS-DATA-ORIGINn
  ],
  "array_status_codes": [
    $STATUS-DATA-REBLAZE1,
    $STATUS-DATA-REBLAZE2,
    ...
    $STATUS-DATA-REBLAZEn
  ],
  "num_of_blocked_requests": integer,
  "num_of_challenges": integer,
  "num_of_human_requests": integer,
  "num_of_ip": integer,
  "num_of_origin_blocked_requests": integer,
  "num_of_requests": integer,
  "num_of_sessions": integer,
  "sum_of_sent_bytes": integer,
  "time_period": integer,
  "timeperiod_string": string 
}

...and each $STATUS-DATA-x contains the number of responses with a specific status:

{
  "num_of_requests": integer,
  "status": integer [an HTTP status code]
}
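A common task with this structure is totaling the status codes across all time segments. Here is a minimal sketch under the structure described above; the function name total_by_status and all sample numbers are invented, and only the fields needed for the example are included.

```python
from collections import Counter


def total_by_status(segments: list, array_name: str = "array_status_codes") -> Counter:
    """Sum num_of_requests per HTTP status code across all time segments."""
    totals = Counter()
    for segment in segments:
        for entry in segment.get(array_name, []):
            totals[entry["status"]] += entry["num_of_requests"]
    return totals


# Invented sample segments using the documented field names.
segments = [
    {
        "array_origin_status_codes": [{"num_of_requests": 90, "status": 200}],
        "array_status_codes": [
            {"num_of_requests": 90, "status": 200},
            {"num_of_requests": 10, "status": 403},
        ],
    },
    {
        "array_origin_status_codes": [{"num_of_requests": 55, "status": 200}],
        "array_status_codes": [
            {"num_of_requests": 50, "status": 200},
            {"num_of_requests": 5, "status": 301},
        ],
    },
]

totals = total_by_status(segments)
origin_totals = total_by_status(segments, "array_origin_status_codes")
print(dict(totals))
print(dict(origin_totals))
```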

Contents of each result

GET /api/v4.0/data/logs

This route returns all requests that match the filters parameter, up to the number of requests specified. The results are returned like this:

{
  "data": {
    "results": [
      {
            $REQUEST1
      },
      {
            $REQUEST2
      }, 
      {
            $REQUEST3
      },           
      ...
      {
            $REQUESTn
      }
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

Each $REQUEST has this structure:

$FIELD1: $VALUE1,
$FIELD2: $VALUE2,
...
$FIELDn: $VALUEn

...where the $FIELDs are the Field names listed above in the filters discussion, and the $VALUEs are their values, if any. So a request looks like this:

"acl_triggers": [],
"arguments": {},
"asn": "AS4837",
...
"user_agent": "curl/7.74.0",
"version": null

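Since some requests will not contain every field, client code should read fields defensively. The sketch below pulls a few fields from each request in a response; the function name select_fields is hypothetical, the second sample request is invented, and treating any status other than 200 as an error is an assumption.

```python
def select_fields(response: dict, fields: list) -> list:
    """Return one dict per request containing only the chosen fields."""
    # Assumption: a status other than 200 means the call did not succeed.
    if response.get("status") != 200:
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    return [
        # .get() yields None for fields that a request does not contain.
        {name: request.get(name) for name in fields}
        for request in response["data"]["results"]
    ]


# Sample response in the documented shape; the second request is invented.
response = {
    "data": {
        "results": [
            {"acl_triggers": [], "arguments": {}, "asn": "AS4837",
             "user_agent": "curl/7.74.0", "version": None},
            {"acl_triggers": [], "arguments": {}, "asn": "AS4837"},
        ],
        "statistics": {"bytes_billed": None, "bytes_processed": None, "elapsed_ms": 0},
    },
    "status": 200,
}

rows = select_fields(response, ["asn", "user_agent"])
for row in rows:
    print(row)
```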