API Access to Traffic Data

Introduction

The Data queries API namespace provides API access to traffic data.

Within it, there are several routes that include a filters parameter:

  • GET /api/v4.0/data/topx

  • GET /api/v4.0/data/logs

  • GET /api/v4.0/data/stats

  • GET /api/v4.0/data/timeline

This parameter specifies the traffic data that will be retrieved. Below, we discuss its syntax and construction.

The discussion below applies to the value of the filters parameter, regardless of how it is supplied:

  • When using Swagger UI, this value is supplied directly in the filters input field.

  • When using curl to call the Reblaze API, this value is encoded into the destination URL; see this discussion for more information.

A short example

Here is an example of a filters specification in JSON format:

{
  "AND": [
    {
      "field": "timestamp",
      "op": "between",
      "value": [
        "2024-06-06 09:31:00",
        "2024-06-06 09:36:00"
      ]
    },
    {
      "field": "tags",
      "op": "regex",
      "value": "unrecognized"
    }
  ]
}

This will return requests which:

  • were received in a five-minute time period (from 2024-06-06 09:31:00 to 2024-06-06 09:36:00), and...

  • have a tag containing the string unrecognized. (In this example, the admin wanted to retrieve requests tagged with unrecognized-host-header.)
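When this filter is sent with curl or another HTTP client, the JSON has to be percent-encoded into the URL. The following is a minimal sketch using only the Python standard library. The host name example.invalid is a placeholder, authentication is omitted, and any other query parameters a route may expect are not covered here.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical base URL; substitute the address of your own deployment.
BASE_URL = "https://example.invalid/api/v4.0/data/logs"

filters = {
    "AND": [
        {
            "field": "timestamp",
            "op": "between",
            "value": ["2024-06-06 09:31:00", "2024-06-06 09:36:00"],
        },
        {"field": "tags", "op": "regex", "value": "unrecognized"},
    ]
}


def build_url(base_url: str, filters: dict) -> str:
    """Serialize the filters to compact JSON and percent-encode the result."""
    compact = json.dumps(filters, separators=(",", ":"))
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode({"filters": compact}, quote_via=quote)


url = build_url(BASE_URL, filters)
print(url)

# Decoding the query string again should give back the original structure.
round_trip = json.loads(parse_qs(urlsplit(url).query)["filters"][0])
print(round_trip == filters)
```

The resulting string can be passed to curl inside quotes, so that the shell does not interpret any of its characters.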

Format

The filters parameter can be supplied as a query string, or as JSON.

Query string format

This format is used in the first Query Specification input field in the UI's Dashboard and Events Log. For example, this query string will display all requests with a 301 response code within a certain time period:

status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00

For more information on this format, see Query filter syntax and best practices.

JSON format

The JSON equivalent of the query string above is:

{
  "AND": [
    {
      "field": "timestamp",
      "op": "between",
      "value": [
        "2024-06-06 09:31:00",
        "2024-06-06 09:36:00"
      ]
    },
    {
      "field": "status",
      "op": "eq",
      "value": 301
    }
  ]
}

(This is provided as an example only. A full discussion of JSON syntax is below.)

Converting from query string to JSON

The POST /api/v4.0/data/timeline/parse API route accepts query strings and returns the same query in JSON format.

JSON structure

The discussion below will focus on building the filters parameter in JSON format, for two reasons. First, text query strings are discussed elsewhere (in the links given above). Second, for complex queries, JSON is more powerful.

A JSON filters parameter is structured as follows:

{
  "AND": [
    condition1,
    condition2,
    condition3...
  ]
}

The first condition: a range of dates/times

The first condition must be included, and must be a range of dates/times. It is structured as follows:

{
  "field": "timestamp",
  "op": "between",
  "value": [
    "$TIMESTAMP1",
    "$TIMESTAMP2"
  ]
}

where $TIMESTAMP1 and $TIMESTAMP2 are timestamps: specifications of date and time.

For timestamps, any ISO format is supported, but both timestamps must include the year, month, and day.

If hours, minutes, or seconds are not included in a timestamp, the time will be rendered as the beginning of the day/hour/minute, respectively. Examples: "2022-07-14" -> "2022-07-14 00:00:00", "2022-07-14 06:52" -> "2022-07-14 06:52:00", etc.
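The expansion rule above can be reproduced locally, for example to preview the range a partial timestamp will cover. This is an illustrative sketch only, not the parser the API uses: the function name normalize_timestamp is hypothetical, and it handles just the few formats shown in this section.

```python
from datetime import datetime

# Accepted layouts, from most to least specific.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d %H", "%Y-%m-%d")


def normalize_timestamp(text: str) -> str:
    """Expand a partial timestamp to 'YYYY-MM-DD HH:MM:SS'.

    Missing hours, minutes, or seconds default to zero, which is the
    beginning of the day, hour, or minute.
    """
    cleaned = text.strip().replace("T", " ")
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"timestamp must include year, month, and day: {text!r}")


print(normalize_timestamp("2022-07-14"))        # 2022-07-14 00:00:00
print(normalize_timestamp("2022-07-14 06:52"))  # 2022-07-14 06:52:00
```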

Subsequent conditions

After the first condition, additional conditions can be specified if desired. They are structured as follows:

{
  "field": "$FIELD_NAME",
  "op": "$OPERATOR",
  "key": "$KEY",
  "value": $VALUE
}

They must meet these requirements:

  • Multiple conditions are combined with a logical AND. (A logical OR is not supported, as this could potentially retrieve unexpectedly large amounts of data.)

  • When multiple conditions are provided, all are followed by commas, except for the final one.

  • The "key" line is optional; see discussion below.

Note: in this documentation, spaces and line breaks are included in JSON filter examples for clarity. In usage, they are optional. Also, the order of a condition's components (field, op, key, value) does not matter.
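The structure described above can be assembled programmatically rather than typed by hand. The following is a minimal sketch. The helper names time_range, condition, and build_filters are hypothetical and not part of the Reblaze API; they only produce the JSON text.

```python
import json


def time_range(start: str, end: str) -> dict:
    """The mandatory first condition: a range of dates/times."""
    return {"field": "timestamp", "op": "between", "value": [start, end]}


def condition(field: str, op: str, value, key=None) -> dict:
    """A subsequent condition; the key entry is added only when given."""
    cond = {"field": field, "op": op, "value": value}
    if key is not None:
        cond["key"] = key
    return cond


def build_filters(start: str, end: str, *conditions: dict) -> str:
    """Combine the time range and any further conditions under AND."""
    body = {"AND": [time_range(start, end), *conditions]}
    # Compact separators: spaces and line breaks are optional in usage.
    return json.dumps(body, separators=(",", ":"))


filters_json = build_filters(
    "2024-06-06 09:31:00",
    "2024-06-06 09:36:00",
    condition("status", "eq", 301),
)
print(filters_json)
```

Building the list in code also takes care of the commas between conditions, since json.dumps places them.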

Field names

Available fields include those inherent to HTTP requests, along with additional data added by Reblaze during processing.

  • Some of the Reblaze-added information consists of internal IDs for the security settings that are relevant to the request.

  • Some requests will not contain all information. When Reblaze blocks a request, processing usually stops immediately, and later stages in the traffic filtering process do not occur.

In the list below, each field name is followed by its data type or comments.

acl_triggers

Populated during evaluation of the active ACL Profile. Contains keys: allow, action, extra, tags, trigger_id, trigger_name. All have string values, except for tags, which is an array of strings.

arguments

Arguments of the request, if any. Query string and JSON examples for mysite.com/page?foo=1 :

arguments["foo"]="1"

{"field": "arguments", "key": "^foo$", "op": "eq", "value": "1"}

asn

string

authority

string

biometric

array

blocked

boolean

bot

boolean

branch

string

bytes_sent

integer

challenge

boolean

challenge_type

string

cf_restrict_triggers

array

cf_triggers

Populated if the request triggered a Content Filter Rule. Contains keys: action, extra, name, risk_level, ruleid, section, trigger_id, trigger_name, value. All have string values, except for risk_level, which is an integer.

cookies

Can inspect specific cookies or all cookies. See JSON filter examples.

country

string

dr_triggers

array

geo_region

string

gf_triggers

An array of entries, one for each Global Filter that matched the request. Each entry contains these keys: action, extra, name, section, trigger_id, trigger_name, value

headers

Can inspect specific headers or all headers. See JSON filter examples.

host

string

hostname

string

human

boolean

ichallenge

boolean

ip

string

logs

array

method

string

monitor

boolean: whether or not the request triggered a Monitor action.

monitor_reasons

array of strings; the various reasons (if any) that the request triggered Monitor actions.

organization

string

path

string. The path excluding the TLD and excluding arguments; string begins with "/".

path_parts

string(s). Contents for mysite.com/abc/123/home.html?foo=true:

"path_parts": {
  "part1": "abc",
  "part2": "123",
  "part3": "home.html",
  "path": "/abc/123/home.html"
}

To match the second part of this example:

path_parts["part2"]="123"

{"field": "path_parts", "key": "^part2$", "op": "eq", "value": "123"}

port

string

processing_stage

integer: the furthest stage of traffic filtering that was reached. 0: Initialization; 2: Global Filtering; 3: Flow Control; 4: Global Rate Limits; 5: Rate Limits; 6: ACL Profile; 7: Content Filtering.

profiling

array of items containing the security settings relevant to this request. Each item contains: a name (secpol, mapping, flow, limit, acl, content_filter) and value (the internal ID of that setting).

protocol

string

proxy

array of items containing proxy-related data. Each item contains a name and value. The names are: additional_tags, bytes_sent, container, geo_as_domain, geo_as_name, geo_as_type, geo_company_country, geo_company_domain, geo_company_type, geo_lat, geo_long, geo_mobile_carrier, geo_mobile_country, geo_mobile_mcc, geo_mobile_mnc, realip, request_id, request_length, request_time, ssl_cipher, ssl_protocol, status.

query

string. Example: for mysite.com/page?code=117, this is ?code=117.

rbz_latency

integer

rbzid

Cookie set by Reblaze. Example query string and JSON:

cookies["rbzid"]="Jc491eLWqTBOfDnJwNk"

{"field": "cookies", "key": "^rbzid$", "op": "eq", "value": "Jc491eLWqTBOfDnJwNk"}

rbzsessionid

Cookie set by Reblaze. Example query string and JSON:

cookies["rbzsessionid"]="578701713f70dcd8706cb50db6d41aab"

{"field": "cookies", "key": "^rbzsessionid$", "op": "eq", "value": "578701713f70dcd8706cb50db6d41aab"}

reason

string. The reason, if any, the request was blocked.

referer

string

request_id

string

request_length

integer

request_time

float

result

string; the disposition of the request. A way to quickly see anomalies is to search for {"field": "result", "op": "not eq", "value": "Passed"}

rl_triggers

array; the reasons (if any) that rate limits were triggered.

security_config

The configuration of security settings when this request was processed. Keys and data types are: acl_active (boolean), cf_active (boolean), cf_rules (integer), gf_rules (integer), revision (string), rl_rules (integer), secpolentryid (string), secpolid (string).

session

string

session_ids

array

status

integer

tags

array of strings: all the tags attached to the request

time_period

integer

timestamp

date and time

trigger_counters

Collection of keys (counters) and values (integers with the value of each counter). Counter names are strings: acl, cf, cf_restrict, dr, gf, rl

upstream_addr

array

upstream_data

array

upstream_response_time

float or null

upstream_status

array of integers

url

string

user_agent

string

version

string

Key

For some types of data, it might not be enough to specify the field, because there could be multiple parameters in the request that match it. For example, a field name of "cookies" or "headers" does not tell the system which cookie or header to inspect.

The key field supplies this information; it is the name of the specific parameter to evaluate. If this parameter is not defined, the system will inspect all instances of the specified field (all cookies, all headers, etc.).

Some examples are in the JSON filter examples below.
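In the examples throughout this document, the key is written as a pattern (for instance "^foo$" or "analytics_.*"), which suggests it is treated as a regex. Under that assumption, the sketch below shows one way to build an anchored key for a single exact name. The helpers exact_key and matching_names are hypothetical, and the matching shown is Python's own re module, used only to illustrate the difference between anchored and unanchored patterns.

```python
import re


def exact_key(name: str) -> str:
    """Anchor a literal name so that the pattern matches only that name."""
    return "^" + re.escape(name) + "$"


def matching_names(pattern: str, names: list) -> list:
    """Names that a key pattern would select, using Python's re.search."""
    return [name for name in names if re.search(pattern, name)]


cookie_names = ["rbzid", "rbzsessionid"]

print(exact_key("rbzid"))                                # ^rbzid$
print(matching_names(exact_key("rbzid"), cookie_names))  # ['rbzid']
print(matching_names("rbz", cookie_names))               # ['rbzid', 'rbzsessionid']

# With a key: only the named cookie is inspected.
one_cookie = {"field": "cookies", "key": exact_key("rbzid"), "op": "regex", "value": "^Jc4"}
# Without a key: all cookies are inspected.
any_cookie = {"field": "cookies", "op": "regex", "value": "gcp_.*"}
```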

Values

Values should be specified in the appropriate data type: strings as quote-delimited strings, integers as numbers, etc. Arrays of values can be supplied.

Operators

| Operator | Data type | Description |
| --- | --- | --- |
| is | boolean | checks whether the value is true or false |
| eq | integer / float / string | checks for an exact match with a numeric or string value |
| gt / lt | integer / float | checks whether the value is greater than / less than |
| gte / lte | integer / float | checks whether the value is greater than or equal to / less than or equal to |
| in | integer / float / string | checks whether the numeric or string value is in a list of values |
| regex | string | checks whether the string matches a regex |
| between | integer / float / timestamp | checks whether the value is between two numbers or timestamps. Does not depend on their order. |
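The operator and data type pairings above can be checked on the client before a query is sent. The following sketch encodes a strict reading of the table. The function value_fits is hypothetical, and the API itself may be more lenient than this check. It also accepts the "not " prefix described under Negative operators.

```python
COMPARISONS = {"gt", "lt", "gte", "lte"}


def is_number(value) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_scalar(value) -> bool:
    return is_number(value) or isinstance(value, str)


def value_fits(op: str, value) -> bool:
    """Return True if the value's type suits the operator, per the table."""
    base = op[len("not "):] if op.startswith("not ") else op
    if base == "is":
        return isinstance(value, bool)
    if base == "eq":
        return is_scalar(value)
    if base in COMPARISONS:
        return is_number(value)
    if base == "in":
        return isinstance(value, list) and all(is_scalar(v) for v in value)
    if base == "regex":
        return isinstance(value, str)
    if base == "between":
        if not (isinstance(value, list) and len(value) == 2):
            return False
        # Timestamps are written as strings in this document's examples.
        return all(is_number(v) for v in value) or all(isinstance(v, str) for v in value)
    raise ValueError(f"unknown operator: {op!r}")


print(value_fits("eq", 301))    # True
print(value_fits("regex", 5))   # False
```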

Negative operators

Operators can be inverted by adding not. For example, not eq means "does not equal".

Inverting a condition

It's possible to invert an entire condition by adding NOT, like this:

{
  "NOT": {
    "field": "$FIELD_NAME",
    "op": "$OPERATOR",
    "value": $VALUE
  }
}
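Both kinds of inversion can be produced with small helpers, sketched below. The names negate_operator and invert_condition are hypothetical. This document does not state whether the two forms always return identical results, so the sketch only builds the JSON structures.

```python
import json


def negate_operator(op: str) -> str:
    """Add the 'not ' prefix, or remove it if it is already present."""
    return op[len("not "):] if op.startswith("not ") else "not " + op


def invert_condition(cond: dict) -> dict:
    """Wrap an entire condition in NOT."""
    return {"NOT": cond}


passed = {"field": "result", "op": "eq", "value": "Passed"}

# Inverting the operator inside the condition.
not_passed = dict(passed, op=negate_operator(passed["op"]))
# Inverting the whole condition.
wrapped = invert_condition(passed)

print(json.dumps(not_passed))
print(json.dumps(wrapped))
```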

JSON filter examples

Retrieve PUT and POST requests:

{
    "AND": [
        {"field": "timestamp", "op": "between", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"]},
        {"field": "method", "value": "^P.*T$", "op": "regex"}
    ]
}

Retrieve requests containing a tag that matches "geo" or "location":

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "tags", "value": ["geo", "location"], "op": "in"}
    ]
}

Retrieve requests where certain cookies' values match a regex:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "cookies", "key": "analytics_.*", "value": "gcp_.*", "op": "regex"}
    ]
}

Retrieve requests where any cookie's value matches a regex:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "cookies", "value": "gcp_.*", "op": "regex"}
    ]
}

Retrieve requests according to a subfield (the rl_rules subfield of security_config must be greater than 11):

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {"field": "security_config", "key": "rl_rules", "value": 11, "op": "gt"}
    ]
}

Retrieve requests according to an array of subfields:

{
    "AND": [
        {"field": "timestamp", "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"], "op": "between"},
        {
            "field": "cf_triggers",
            "conditions": [
                {"field": "risk_level", "value": 2, "op": "gt"},
                {"field": "ruleid", "value": "100037", "op": "eq"}
            ]
        }
    ]
}
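The nested form in the example above, where a field carries its own list of conditions, can be generated with a small helper. This is a sketch only: the function name subfield_conditions is hypothetical, and it simply mirrors the JSON shape shown in the example.

```python
import json


def subfield_conditions(field: str, *conditions: dict) -> dict:
    """Group conditions that apply to the subfields of one array field."""
    return {"field": field, "conditions": list(conditions)}


cf_filter = subfield_conditions(
    "cf_triggers",
    {"field": "risk_level", "op": "gt", "value": 2},
    {"field": "ruleid", "op": "eq", "value": "100037"},
)

filters = {
    "AND": [
        {
            "field": "timestamp",
            "op": "between",
            "value": ["2022-07-14 06:52:37", "2022-07-12 06:52:37"],
        },
        cf_filter,
    ]
}
print(json.dumps(filters, indent=4))
```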

Retrieve requests according to a combination of conditions (there is no limit on the number of conditions):

{
    "AND": [
        {"field": "timestamp", "value": ["2022-08-08 01:15:25", "2022-08-08 01:52:28"], "op": "between"},
        {"field": "tags", "value": "geo-continent-name:north-america", "op": "eq"},
        {"field": "agent", "key": "ephemeral_id", "value": "029ab6f-6d15-4290-b44f-9133", "op": "regex"},
        {"field": "trigger_counters", "key": "acl", "value": 2, "op": "not gt"},
        {"field": "headers", "key": "x-forwarded.*", "value": "3.65.14.177", "op": "regex"},
        {"field": "path_parts", "key": "^part3$", "value": "^-1&", "op": "regex"},
        {"field": "security_config", "key": "cf_active", "value": true, "op": "not eq"},
        {"field": "ip", "value": ["2.55.96.231", "23.65.14.177", "3.1.92.15", "8.206.254.196"], "op": "in"}
    ]
}
