API Access to Traffic Data

Introduction

The Data queries API namespace provides access to traffic data, via these four routes:

  • GET /api/v4.0/data/topx

  • GET /api/v4.0/data/stats

  • GET /api/v4.0/data/timeline

  • GET /api/v4.0/data/logs

Each includes an input parameter named filters. This parameter specifies the traffic data that will be retrieved.

On the page below, we begin by discussing this parameter's usage, syntax, and construction. Then we discuss the four routes that require it, and the different types of data they return.

The filters parameter

filters is a string that specifies one or more conditions. A database query is constructed from those conditions, and the results are returned to the user.

Usage of the filters parameter depends on the context:

  • When using Swagger UI, this value is supplied directly in the filters input field.

  • When using curl to call the Reblaze API, this value is encoded into the destination URL, preceded by "filters=". See this explanation for more information.

A short example

Here is an example of a filters specification in JSON format:

{
  "AND": [
    {
      "field": "timestamp",
      "op": "between",
      "value": [
        "2024-06-06 09:31:00",
        "2024-06-06 09:36:00"
      ]
    },
    {
      "field": "tags",
      "op": "regex",
      "value": "unrecognized"
    }
  ]
}

This will return requests which:

  • were received in a five-minute time period (from 2024-06-06 09:31:00 to 2024-06-06 09:36:00), and...

  • have a Tag containing the string unrecognized. (In this example, the admin wanted to retrieve requests tagged with unrecognized-host-header.)
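
When the API is called outside Swagger UI, a JSON filter like the one above has to be percent-encoded and placed after "filters=" in the URL. The following Python sketch shows one way to do that with the standard library. The host name is a placeholder, build_url is a hypothetical helper, and authentication is omitted because it is not covered in this section.

```python
import json
from urllib.parse import urlencode

# Placeholder host (an assumption): substitute your own deployment's address.
BASE_URL = "https://example.invalid"

filters = {
    "AND": [
        {"field": "timestamp", "op": "between",
         "value": ["2024-06-06 09:31:00", "2024-06-06 09:36:00"]},
        {"field": "tags", "op": "regex", "value": "unrecognized"},
    ]
}

def build_url(route: str, filters: dict) -> str:
    """Return the route URL with the filters JSON encoded after 'filters='."""
    # Compact separators drop the optional spaces and line breaks.
    compact = json.dumps(filters, separators=(",", ":"))
    return f"{BASE_URL}{route}?{urlencode({'filters': compact})}"

url = build_url("/api/v4.0/data/logs", filters)
print(url)
```

urlencode writes spaces as "+" and escapes quotes, braces, and colons, so the result can be passed to curl as a single quoted argument.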

Format

The filters parameter can be supplied as a query string, or as JSON.

Query string format

This format is used in the first Query Specification input field in the UI's Dashboard and Events Log. For example, this query string will display all requests with a 301 response code within a certain time period:

status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00

For more information on this format, see Query filter syntax and best practices.

JSON format

The JSON equivalent of the query string above is:

{
  "AND": [
    {
      "field": "timestamp",
      "op": "between",
      "value": [
        "2024-06-06 09:31:00",
        "2024-06-06 09:36:00"
      ]
    },
    {
      "field": "status",
      "op": "eq",
      "value": 301
    }
  ]
}

(This is provided as an example only. A full discussion of JSON syntax is below.)

Converting from query string to JSON

The POST /api/v4.0/data/timeline/parse API route accepts query strings and returns the same query in JSON format.

JSON structure

The discussion below will focus on building the filters parameter in JSON format, for two reasons. First, text query strings are discussed elsewhere (in the links given above). Second, for complex queries, JSON is more powerful.

A JSON filters parameter is structured as follows:

{
  "AND": [
    $CONDITION1,
    $CONDITION2,
    ...
    $CONDITIONn
  ]
}

The first condition: a range of dates/times

The first condition must be included, and must be a range of dates/times. It is structured as follows:

{
  "field": "timestamp",
  "op": "between",
  "value": [
    "$TIMESTAMP1",
    "$TIMESTAMP2"
  ]
}

where $TIMESTAMP1 and $TIMESTAMP2 are timestamps: specifications of date and time.

For timestamps, any ISO format is supported. Nevertheless, both timestamps must include year, month, and day.

If hours, minutes, or seconds are not included in a timestamp, the time will be rendered as the beginning of the day/hour/minute, respectively. Examples: "2022-07-14" -> "2022-07-14 00:00:00", "2022-07-14 06:52" -> "2022-07-14 06:52:00", etc.
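
The padding rule above can be reproduced on the client side, for example to log the exact range that a query will cover. This is only an illustrative sketch of the documented behavior, not the server's implementation: expand_timestamp is a hypothetical helper, and it handles only the space-separated forms used in this document.

```python
from datetime import datetime

# Tried from most to least specific; missing parts default to zero.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d %H", "%Y-%m-%d")

def expand_timestamp(text: str) -> str:
    """Pad a partial timestamp to 'YYYY-MM-DD HH:MM:SS'."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next, less specific format
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"timestamp needs at least year, month, and day: {text!r}")

print(expand_timestamp("2022-07-14"))
print(expand_timestamp("2022-07-14 06:52"))
```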

Subsequent conditions

After the first condition, additional conditions can be specified if desired. They are structured as follows:

{
  "field": "$FIELD_NAME",
  "op": "$OPERATOR",
  "key": "$KEY",
  "value": $VALUE
}

They must meet these requirements:

  • Multiple conditions are combined with a logical AND. (A logical OR is not supported, as this could potentially retrieve unexpectedly large amounts of data.)

  • When multiple conditions are provided, all are followed by commas, except for the final one.

  • The "key" line is optional; see discussion below.

Note: here in the documentation, spaces and carriage returns are included in JSON filter examples for clarity. In usage, they are optional. Also, the order of a condition's components (its field/op/key/value) does not matter.
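
As a rough sketch of the structure described above, the filter can be assembled programmatically rather than written by hand. The helper names condition and build_filters are hypothetical; they only mirror the rules in this section (timestamp range first, "key" optional, everything combined under "AND").

```python
import json

def condition(field, op, value, key=None):
    """Build one condition; "key" is included only when it is given."""
    cond = {"field": field, "op": op, "value": value}
    if key is not None:
        cond["key"] = key
    return cond

def build_filters(start, end, *conditions):
    """Place the mandatory timestamp range first, then any other conditions."""
    time_range = condition("timestamp", "between", [start, end])
    return {"AND": [time_range, *conditions]}

filters = build_filters(
    "2024-06-06 09:31:00",
    "2024-06-06 09:36:00",
    condition("status", "eq", 301),
    condition("cookies", "regex", "gcp_.*", key="analytics_.*"),
)
print(json.dumps(filters, indent=2))
```

Serializing with json.dumps takes care of the commas between conditions, so the "no comma after the final condition" rule cannot be violated by accident.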

Field names

Available fields include those inherent to HTTP requests, along with additional data added by Reblaze during processing.

  • Some of the Reblaze-added information consists of internal IDs for the security settings that are relevant to the request.

  • Some requests will not contain all information. When Reblaze blocks a request, processing usually stops immediately, and later stages in the traffic filtering process do not occur.

Field name | Comments

acl_triggers

Populated during evaluation of the active ACL Profile. Contains keys: acl_action, action (the type of Action that was triggered), extra (currently unused), tags, trigger_id, trigger_name. All are strings, except for tags, which is an array of strings.

arguments

Arguments of the request, if any. Query string and JSON examples for mysite.com/page?foo=1:

arguments["foo"]="1"

{"field": "arguments", "key": "^foo$", "op": "eq", "value": "1"}

asn

string

authority

string

biometric

array

blocked

boolean

bot

boolean

branch

string

bytes_sent

integer

challenge

boolean

challenge_type

string

cf_restrict_triggers

array

cf_triggers

Populated if the request triggered a Content Filter Rule. Contains keys: action, extra, name, risk_level, ruleid, section, trigger_id, trigger_name, value. All have string values, except for risk_level, which is an integer.

cookies

Can inspect specific cookies or all cookies. See JSON filter examples.

country

string

dr_triggers

array

geo_region

string

gf_triggers

An array of entries, one for each Global Filter that matched the request. Each entry contains these keys: action, extra, name, section, trigger_id, trigger_name, value

headers

Can inspect specific headers or all headers. See JSON filter examples.

host

string

hostname

string

human

boolean

ichallenge

boolean

ip

string

logs

array

method

string

monitor

boolean: whether or not the request triggered a Monitor action.

monitor_reasons

array of strings; the various reasons (if any) that the request triggered Monitor actions.

organization

string

path

string. The path, excluding the hostname and arguments; the string begins with "/".

path_parts

string. Contents for mysite.com/abc/123/home.html?foo=true:

"path_parts": {
  "part1": "abc",
  "part2": "123",
  "part3": "home.html",
  "path": "/abc/123/home.html"
}

To match the second part of this example:

path_parts["part2"]="123"

{"field": "path_parts", "key": "^part2$", "op": "eq", "value": "123"}

port

string

processing_stage

integer: the furthest stage of traffic filtering that was reached. 0: Initialization, 2: Global Filtering, 3: Flow Control, 4: Global Rate Limits, 5: Rate Limits, 6: ACL Profile, 7: Content Filtering.

profiling

array of items containing the security settings relevant to this request. Each item contains: a name (secpol, mapping, flow, limit, acl, content_filter) and value (the internal ID of that setting).

protocol

string

proxy

array of items containing proxy-related data. Each item contains a name and value. The names are: additional_tags, bytes_sent, container, geo_as_domain, geo_as_name, geo_as_type, geo_company_country, geo_company_domain, geo_company_type, geo_lat, geo_long, geo_mobile_carrier, geo_mobile_country, geo_mobile_mcc, geo_mobile_mnc, realip, request_id, request_length, request_time, ssl_cipher, ssl_protocol, status.

query

string. Example: for mysite.com/page?code=117, this is ?code=117.

rbz_latency

integer

rbzid

Cookie set by Reblaze. Example query string and JSON:

cookies["rbzid"]="Jc491eLWqTBOfDnJwNk"

{"field": "cookies", "key": "^rbzid$", "op": "eq", "value": "Jc491eLWqTBOfDnJwNk"}

rbzsessionid

Cookie set by Reblaze. Example query string and JSON:

cookies["rbzsessionid"]="578701713f70dcd8706cb50db6d41aab"

{"field": "cookies", "key": "^rbzsessionid$", "op": "eq", "value": "578701713f70dcd8706cb50db6d41aab"}

reason

string. The reason, if any, the request was blocked.

referer

string

request_id

string

request_length

integer

request_time

float

result

string; the disposition of the request. A way to quickly see anomalies is to search for {"field": "result", "op": "not eq", "value": "Passed"}

rl_triggers

array; the reasons (if any) that rate limits were triggered.

security_config

The configuration of security settings when this request was processed. Keys and data types are: acl_active (boolean), cf_active (boolean), cf_rules (integer), gf_rules (integer), revision (string), rl_rules (integer), secpolentryid (string), secpolid (string).

session

string

session_ids

array

status

integer

tags

array of strings: all the tags attached to the request

time_period

integer; the Unix epoch timestamp of the request

timestamp

string; date and time

trigger_counters

The number of times an Action was triggered, and the source of the triggers (ACL Profile, Content Filtering, Global Filters, or Rate Limits). This is a collection of keys (counters) and values (integers with the value of each counter). Counter names are strings: acl, cf, cf_restrict, dr, gf, rl. Sample filter condition: {"field": "trigger_counters", "key": "acl", "value": 0, "op": "gt"}

upstream_addr

array of strings

upstream_data

array of elements: {addr (string), response_time (float), status (integer)}

upstream_response_time

float or null

upstream_status

array of integers

url

string

user_agent

string

version

string

Key

For some types of data, it might not be enough to specify the field, because there could be multiple parameters in the request that match it. For example, a field name of "cookies" or "headers" does not tell the system which cookie or header to inspect.

The key field supplies this information; it is the name of the specific parameter to evaluate. If this parameter is not defined, the system will inspect all instances of the specified field (all cookies, all headers, etc.).

Some examples are in the JSON filters examples below.

Values

Values should be specified in the appropriate data type: strings as quote-delimited strings, integers as numbers, etc. Arrays of values can be supplied.

Operators

Operator | Data type | Description

is

boolean

checks if value is True or False

eq

integer / float / string

checks exact match for numeric/string value

gt / lt

integer / float

checks if value is greater/less than

gte / lte

integer / float

checks if value is greater/less than or equal

in

integer / float / string

checks if numeric/string value is in a list of values

regex

string

checks if string has a match with a regex

between

integer / float / timestamp

checks if value is between two numbers/timestamps. The order of the two bounds does not matter.

Negative operators

Operators can be inverted by prefixing them with not. For example, not eq means "does not equal".

Inverting a condition

It's possible to invert an entire condition by adding NOT, like this:

{
  "NOT": {
    "field": "$FIELD_NAME",
    "op": "$OPERATOR",
    "key": "$KEY",
    "value": $VALUE
  }
}
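
To make the operator semantics concrete, here is a small local evaluator written in Python. It is an illustration of the table and the two negation forms above, not Reblaze's implementation. Several things are assumptions: records are flat dictionaries, "key" is not handled, the bounds of between are treated as inclusive, and matches is a hypothetical function name.

```python
import re

def _between(actual, bounds):
    low, high = sorted(bounds)  # the order of the two bounds does not matter
    return low <= actual <= high  # inclusive bounds are an assumption

OPERATORS = {
    "is": lambda actual, expected: actual is expected,
    "eq": lambda actual, expected: actual == expected,
    "gt": lambda actual, expected: actual > expected,
    "lt": lambda actual, expected: actual < expected,
    "gte": lambda actual, expected: actual >= expected,
    "lte": lambda actual, expected: actual <= expected,
    "in": lambda actual, expected: actual in expected,
    "regex": lambda actual, expected: re.search(expected, actual) is not None,
    "between": _between,
}

def matches(record, cond):
    """Evaluate one condition (or an AND/NOT wrapper) against a flat record."""
    if "NOT" in cond:
        return not matches(record, cond["NOT"])
    if "AND" in cond:
        return all(matches(record, item) for item in cond["AND"])
    op = cond["op"]
    negated = op.startswith("not ")
    if negated:
        op = op[len("not "):]
    result = OPERATORS[op](record[cond["field"]], cond["value"])
    return not result if negated else result

record = {"method": "POST", "status": 301, "blocked": False}
print(matches(record, {"field": "method", "op": "regex", "value": "^P.*T$"}))
print(matches(record, {"field": "status", "op": "not eq", "value": 200}))
print(matches(record, {"NOT": {"field": "status", "op": "between", "value": [399, 300]}}))
```

In this sketch, "not eq" inverts a single operator, while the "NOT" wrapper inverts whatever condition it contains; for a single condition the two give the same result.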

JSON filter examples

Retrieve PUT and POST requests:

{
    "AND": [
        {"field": "timestamp", "op": "between", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"]},
        {"field": "method", "value":"^P.*T$", "op":"regex"}
    ]
}

Retrieve requests containing a tag that matches "geo" or "location":

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "tags", "value": ["geo", "location"], "op": "in"}
    ]
}

Retrieve requests where certain cookies' values match a regex:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "cookies", "key": "analytics_.*", "value": "gcp_.*", "op": "regex"}
    ]
}

Retrieve requests where any cookie's value matches a regex:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "cookies", "value": "gcp_.*", "op": "regex"}
    ]
}

Retrieve requests according to a subfield (the cf_rules subfield of security_config must be greater than 11):

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "security_config", "key": "cf_rules", "value": 11, "op": "gt"}
    ]
}

Retrieve requests according to an array of subfields:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {
            "field": "cf_triggers",
            "conditions": [
                {"field": "risk_level", "value": 2, "op": "gt"},
                {"field": "ruleid", "value": "100037", "op": "eq"}
            ]
        }
    ]
}

Retrieve requests according to a combination of conditions (there is no limit on the number of conditions):

{
    "AND": [
        {"field": "timestamp", "value": ["2022-08-08 01:15:25", "2022-08-08 01:52:28"], "op": "between"},
        {"field": "tags", "value":"geo-continent-name:north-america", "op": "eq"},
        {"field": "agent", "key": "ephemeral_id", "value": "029ab6f-6d15-4290-b44f-9133", "op": "regex"},
        {"field": "trigger_counters", "key": "acl","value": 2,"op": "not gt"},
        {"field": "headers", "key": "x-forwarded.*", "value": "3.65.14.177", "op": "regex"},
        {"field": "path_parts", "key": "^part3$", "value": "^-1&", "op": "regex"},
        {"field": "security_config", "key": "cf_active", "value": true, "op": "not eq"},
        {"field": "ip", "value": ["2.55.96.231", "23.65.14.177", "3.1.92.15", "8.206.254.196"], "op": "in"}
    ]
}

API access to traffic data

As noted previously, there are four API routes that use the filters parameter to retrieve traffic data:

  • GET /api/v4.0/data/topx

  • GET /api/v4.0/data/stats

  • GET /api/v4.0/data/timeline

  • GET /api/v4.0/data/logs

Their typical uses are as follows.

Quickly discover the most important factors in the traffic stream (e.g., the countries sending the most blocked requests, the URLs receiving the most bot traffic, etc.): use the topx route.

Get a summary of traffic statistics (total requests, bandwidth, and latency): use the stats route.

Get a summary of security metrics (total requests, blocked requests, status codes returned, number of human clients, activity of the origin, etc.): use the timeline route.

Get complete data for all requests matching certain criteria (often, drilling down further into trends discovered from the other routes): use the logs route.

Below, we discuss each route in detail.

GET /api/v4.0/data/topx

This route provides API access to the same Top Metrics available in the Dashboard, such as Top Countries.

Calling this route returns every category of metrics: all results for "top applications", all results for "top countries", all results for "top sources", and so on. They are combined into a single continuous list:

{
  "data": {
    "results": [
        $RESULT1,
        $RESULT2,
        $RESULT3,
        ...
        $RESULTn
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

...where the list of $RESULTs looks something like this (incomplete) example:

"results": [
  {
    "key": "China",
    "label": "country"
  },
  {
    "key": "United States",
    "label": "country"
  },
  {
    "key": "161.1.50.2",
    "label": "ip"
  },
  {
    "key": "101.6.123.53",
    "label": "ip"
  },
  {
    "key": "161.1.50.5",
    "label": "ip"
  },
  {
    "key": "mysite.com/login",
    "label": "url"
  }
]

We see that in the time period specified in the filters parameter, there were three IP addresses in two countries that sent requests to a single URL.

Some important points:

  • The results are organized and grouped together in the list according to their label.

  • Labels are ordered alphabetically.

  • Labels can differ in their number of results.

  • The route returns the "top" results for each label (for example, results with the "ip" label show the IPs that sent the most blocked requests). When there are only a few results for a given label, all are retrieved. When there are many, only the "top" results are retrieved.
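
Because all categories arrive in one continuous list, a client will usually want to split it by label first. The Python sketch below does this for the simplified sample shown above; group_by_label is a hypothetical helper name.

```python
from collections import defaultdict

results = [
    {"key": "China", "label": "country"},
    {"key": "United States", "label": "country"},
    {"key": "161.1.50.2", "label": "ip"},
    {"key": "101.6.123.53", "label": "ip"},
    {"key": "161.1.50.5", "label": "ip"},
    {"key": "mysite.com/login", "label": "url"},
]

def group_by_label(results):
    """Return {label: [key, ...]}, preserving the order of the response."""
    grouped = defaultdict(list)
    for item in results:
        grouped[item["label"]].append(item["key"])
    return dict(grouped)

grouped = group_by_label(results)
for label, keys in grouped.items():
    print(label, len(keys), keys)
```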

Actual usage

The example above is oversimplified. In actual use, the topx route:

  • Returns much more data per result, not just key and label (see the list of fields below)

  • Returns more categories than just country, ip, and url (see the discussion of the label field below)

A detailed discussion of topx follows.

Contents of each result

Each result contains the fields listed below.

The _time fields are in seconds. These are floats, but can appear at various precisions: zero decimal places, several decimal places, or scientific notation (e.g., 1.6210818451802098e-9).

Result field name | Type | Comments

avg_origin_time

float

The average amount of processing time by the origin for these requests. If no requests reached the origin, this will be null.

avg_rbz_time

float

The average amount of processing time by Reblaze for these requests.

avg_total_time

float

The average total amount of processing time for these requests.

first_asn

string

First entry in the list of ASNs

first_geo_country

string

First entry in the list of countries

first_organization

string

First entry in the list of organizations

key

string

Content varies; see discussion below.

label

string

Category of result. See discussion below.

max_origin_time

float

The longest amount of processing time by the origin among these requests. If no requests reached the origin, this will be null.

max_rbz_time

float

The longest amount of processing time by Reblaze among these requests.

max_total_time

float

The longest amount of total processing time among these requests.

min_origin_time

float

The shortest amount of processing time by the origin among these requests. If no requests reached the origin, this will be null.

min_rbz_time

float

The shortest amount of processing time by Reblaze among these requests.

min_total_time

float

The shortest amount of total processing time among these requests.

num_of_blocked_requests

integer

num_of_bot_requests

integer

num_of_challenges

integer

num_of_human_requests

integer

num_of_monitored_requests

integer

Includes all requests that triggered a "monitor" action, even if they were blocked as well.

num_of_requests

integer

sum_of_bytes_sent

integer

sum_of_request_length

integer

Organization of results: the label and key fields

The topx route returns results in a specific order:

  • Results are grouped together according to their label (i.e., their category).

  • Labels are ordered alphabetically (see full list below)

  • Within each label, results are ordered by num_of_blocked_requests, in descending order.

Notice that this is unlike the UI's Dashboard Top Metrics, where some types of results have other default orders.
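
If a different ordering is needed, for example to approximate a Dashboard view that ranks by total requests, the results can be re-ranked on the client. The following Python sketch assumes integer metrics (fields that can be null, such as avg_origin_time, would need extra handling). The helper name top_per_label and the numbers in the sample are invented for illustration.

```python
from itertools import groupby
from operator import itemgetter

def top_per_label(results, metric, limit=3):
    """Return {label: [key, ...]} re-ranked by `metric`, highest first."""
    ranked = {}
    ordered = sorted(results, key=itemgetter("label"))  # groupby needs sorted input
    for label, items in groupby(ordered, key=itemgetter("label")):
        best = sorted(items, key=itemgetter(metric), reverse=True)[:limit]
        ranked[label] = [item["key"] for item in best]
    return ranked

# Invented sample values; only the field names come from the result table.
sample = [
    {"label": "ip", "key": "161.1.50.2",
     "num_of_blocked_requests": 40, "num_of_requests": 50},
    {"label": "ip", "key": "101.6.123.53",
     "num_of_blocked_requests": 12, "num_of_requests": 900},
    {"label": "url", "key": "mysite.com/login",
     "num_of_blocked_requests": 52, "num_of_requests": 950},
]

by_blocked = top_per_label(sample, "num_of_blocked_requests")
by_volume = top_per_label(sample, "num_of_requests")
print(by_blocked["ip"])
print(by_volume["ip"])
```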

The topx route returns twelve categories of results, each with its own label. The label determines the contents of the key field.

Label | 'Key' field contains

country

country name

host

host name or IP

ip

IP address

organization

organization

origin_time

target URL

rbz_time

target URL

rbzid

rbzid cookie value

reason

reason the request was monitored or blocked

referer

referer string

total_time

target URL

url

target URL

user_agent

user agent string

GET /api/v4.0/data/stats

This route returns traffic metrics for the requested time period, broken down into shorter segments of time.

Data structures

The retrieved metrics are structured like this:

{
  "data": {
    "results": [
      $RESULTS-TIMESEGMENT-1,
      $RESULTS-TIMESEGMENT-2,
      ...
      $RESULTS-TIMESEGMENT-n
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

...where each $RESULTS-TIMESEGMENT-x has this structure:

{
   "avg_latency": float,
   "hostname": string,
   "num_of_requests": integer,
   "sum_of_bandwidth": integer,
   "time_period": integer,
   "timeperiod_string": string
}

Contents of each result

Field name | Description

avg_latency

Average latency in seconds

hostname

Host

num_of_requests

Total requests received during the time period

sum_of_bandwidth

Total bytes sent and received

time_period

Beginning of time segment, as a Unix epoch integer (e.g., 1718186400)

timeperiod_string

Beginning of time segment, as a string (e.g., "2024-06-12 10:00:00")
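
As a sketch of how these per-segment results might be rolled up, the Python below totals requests and bandwidth and computes a request-weighted average latency. The sample values are invented, summarize and segment_start are hypothetical helpers, and reading time_period as UTC is an assumption inferred from the example pair above (1718186400 and "2024-06-12 10:00:00").

```python
from datetime import datetime, timezone

def summarize(segments):
    """Roll per-segment stats up into totals for the whole period."""
    total_requests = sum(s["num_of_requests"] for s in segments)
    total_bandwidth = sum(s["sum_of_bandwidth"] for s in segments)
    # Weight each segment's average by its request count.
    weighted = sum(s["avg_latency"] * s["num_of_requests"] for s in segments)
    return {
        "num_of_requests": total_requests,
        "sum_of_bandwidth": total_bandwidth,
        "avg_latency": weighted / total_requests if total_requests else None,
    }

def segment_start(segment):
    """Format time_period as a string, interpreting it as UTC (an assumption)."""
    moment = datetime.fromtimestamp(segment["time_period"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

# Invented sample values for illustration.
segments = [
    {"avg_latency": 0.25, "hostname": "mysite.com", "num_of_requests": 100,
     "sum_of_bandwidth": 5000, "time_period": 1718186400,
     "timeperiod_string": "2024-06-12 10:00:00"},
    {"avg_latency": 0.5, "hostname": "mysite.com", "num_of_requests": 300,
     "sum_of_bandwidth": 15000, "time_period": 1718190000,
     "timeperiod_string": "2024-06-12 11:00:00"},
]

summary = summarize(segments)
print(summary)
print(segment_start(segments[0]))
```

A plain mean of avg_latency would overweight quiet segments, which is why the sketch weights by num_of_requests.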

GET /api/v4.0/data/timeline

This route returns security metrics for the requested time period, broken down into shorter segments of time.

Data structures

The retrieved metrics are structured like this:

{
  "data": {
    "results": [
      $RESULTS-TIMESEGMENT-1,
      $RESULTS-TIMESEGMENT-2,
      ...
      $RESULTS-TIMESEGMENT-n
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

...where each $RESULTS-TIMESEGMENT-x has this structure:

{
  "array_origin_status_codes": [
    $STATUS-DATA-ORIGIN1,
    $STATUS-DATA-ORIGIN2,
    ...
    $STATUS-DATA-ORIGINn
  ],
  "array_status_codes": [
    $STATUS-DATA-REBLAZE1,
    $STATUS-DATA-REBLAZE2,
    ...
    $STATUS-DATA-REBLAZEn
  ],
  "num_of_blocked_requests": integer,
  "num_of_challenges": integer,
  "num_of_human_requests": integer,
  "num_of_ip": integer,
  "num_of_origin_blocked_requests": integer,
  "num_of_requests": integer,
  "num_of_sessions": integer,
  "sum_of_sent_bytes": integer,
  "time_period": integer,
  "timeperiod_string": string 
}

...and each $STATUS-DATA-x contains the number of responses with a specific status:

{
  "num_of_requests": integer,
  "status": integer [an HTTP status code]
}

Contents of each result

Field name | Description

array_origin_status_codes

An array: each element contains a status code and the number of responses from the origin with that code. If no requests reached the origin during the specified time period, the array will be empty.

array_status_codes

An array: each element contains a status code and the number of responses from Reblaze with that code.

num_of_blocked_requests

Requests that were blocked

num_of_challenges

Number of times Reblaze issued a bot challenge

num_of_human_requests

Requests from human (i.e., non-bot) clients

num_of_ip

Number of IPs used by clients

num_of_origin_blocked_requests

Number of requests rejected by the origin

num_of_requests

Total requests received during the time period

num_of_sessions

Number of unique sessions

sum_of_sent_bytes

Total bytes sent

time_period

Beginning of time segment, as a Unix epoch integer (e.g., 1718186400)

timeperiod_string

Beginning of time segment, as a string (e.g., "2024-06-12 10:00:00")
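
The nested status-code arrays are easier to work with once they are tallied across segments. The Python sketch below does that and also derives a blocked-request ratio per segment. The sample values are invented and trimmed to the fields used, and status_totals and blocked_ratio are hypothetical helper names.

```python
from collections import Counter

def status_totals(segments, source="array_status_codes"):
    """Sum num_of_requests per HTTP status code across all segments."""
    totals = Counter()
    for segment in segments:
        for entry in segment[source]:
            totals[entry["status"]] += entry["num_of_requests"]
    return totals

def blocked_ratio(segment):
    """Share of a segment's requests that were blocked (0.0 if no traffic)."""
    total = segment["num_of_requests"]
    return segment["num_of_blocked_requests"] / total if total else 0.0

# Invented sample values, trimmed to the fields used here.
segments = [
    {"array_status_codes": [{"num_of_requests": 90, "status": 200},
                            {"num_of_requests": 10, "status": 403}],
     "num_of_blocked_requests": 10, "num_of_requests": 100},
    {"array_status_codes": [{"num_of_requests": 45, "status": 200},
                            {"num_of_requests": 5, "status": 301}],
     "num_of_blocked_requests": 0, "num_of_requests": 50},
]

totals = status_totals(segments)
print(dict(totals))
print([blocked_ratio(s) for s in segments])
```

Passing source="array_origin_status_codes" tallies the origin's responses instead; an empty array simply contributes nothing.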

GET /api/v4.0/data/logs

This route returns all requests that match the filters parameter, up to the number of requests specified. The results are returned like this:

{
  "data": {
    "results": [
      {
        $REQUEST1
      },
      {
        $REQUEST2
      },
      {
        $REQUEST3
      },
      ...
      {
        $REQUESTn
      }
    ],
    "statistics": {
      "bytes_billed": null,
      "bytes_processed": null,
      "elapsed_ms": 0
    }
  },
  "status": 200
}

Each $REQUEST has this structure:

$FIELD1: $VALUE1,
$FIELD2: $VALUE2,
...
$FIELDn: $VALUEn

...where the $FIELDs are the Field names listed above in the filters discussion, and the $VALUEs are their values, if any. So a request looks like this:

"acl_triggers": [],
"arguments": {},
"asn": "AS4837",
...
"user_agent": "curl/7.74.0",
"version": null

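The response envelope of the logs route (a "data" object holding "results" and "statistics", plus a "status") can be unpacked with a few lines of Python. This sketch assumes that a non-200 "status" in the body signals failure, which this page does not state explicitly. TrafficApiError and extract_results are hypothetical names, and the sample body is trimmed and partly invented.

```python
import json
from collections import Counter

class TrafficApiError(Exception):
    """Hypothetical error type for a non-200 status in the response body."""

def extract_results(body: str) -> list:
    """Parse a response body and return the list under data.results."""
    payload = json.loads(body)
    if payload.get("status") != 200:
        raise TrafficApiError(f"unexpected status: {payload.get('status')}")
    return payload["data"]["results"]

# Trimmed sample; real requests carry every field from the field table.
sample_body = json.dumps({
    "data": {
        "results": [
            {"acl_triggers": [], "arguments": {}, "asn": "AS4837",
             "user_agent": "curl/7.74.0", "version": None},
            {"acl_triggers": [], "arguments": {}, "asn": "AS4837",
             "user_agent": "Mozilla/5.0", "version": None},
        ],
        "statistics": {"bytes_billed": None, "bytes_processed": None,
                       "elapsed_ms": 0},
    },
    "status": 200,
})

requests_found = extract_results(sample_body)
per_asn = Counter(item["asn"] for item in requests_found)
print(len(requests_found), dict(per_asn))
```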