API access to traffic data
The page below describes how to retrieve traffic data via the API. It contains these main sections:
The filters parameter (which specifies the traffic data that will be retrieved)
The four API routes that retrieve traffic data, and how to use them
Introduction
The Data queries API namespace provides access to traffic data, via these four routes:
GET /api/v4.0/data/topx
GET /api/v4.0/data/stats
GET /api/v4.0/data/timeline
GET /api/v4.0/data/logs
Each includes an input parameter named filters. This parameter specifies the traffic data that will be retrieved.
On the page below, we begin by discussing this parameter's usage, syntax, and construction. Then we discuss the four API routes that require it, and the different types of data they return.
The filters parameter
filters is a string that specifies one or more conditions. A database query is constructed from those conditions, and the results are returned to the user.
Usage of the filters parameter depends on the context:
When using Swagger UI, this value is supplied directly in the filters input field.
When using curl to call the Reblaze API, this value is encoded into the destination URL, preceded by "filters=". See this explanation for more information.
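As a rough illustration of the curl case, the sketch below percent-encodes a filters value and appends it to a route, preceded by "filters=". The base URL and the helper name build_url are hypothetical; only the route path and the parameter name come from this page. The filters value is the query string example from the Format section of this page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL; substitute the address of your own deployment.
BASE_URL = "https://example.invalid"

def build_url(route: str, filters: str) -> str:
    """Return the URL for a data route with the filters value percent-encoded."""
    return f"{BASE_URL}{route}?{urlencode({'filters': filters})}"

filters = "status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00"
url = build_url("/api/v4.0/data/logs", filters)
print(url)

# Decoding the query string recovers the original value unchanged.
decoded = parse_qs(urlsplit(url).query)["filters"][0]
```

The resulting URL could then be passed to curl. Authentication is not covered by this sketch.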
A short example
Here is an example of a filters specification in JSON format:
This will return requests which:
were received in a five-minute time period (from 2024-06-06 09:31:00 to 2024-06-06 09:36:00), and...
have a Tag containing the string unrecognized. (In this example, the admin wanted to retrieve requests tagged with unrecognized-host-header.)
Format
The filters parameter can be supplied as a query string, or as JSON.
Query string format
This format is used in the first Query Specification input field in the UI's Dashboard and Events Log. For example, this query string will display all requests with a 301 response code within a certain time period:
status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00
For more information on this format, see Query filter syntax and best practices.
JSON format
The JSON equivalent of the query string above is:
(This is provided as an example only. A full discussion of JSON syntax is below.)
Converting from query string to JSON
The POST /api/v4.0/data/timeline/parse API route accepts query strings and returns the same query in JSON format.
JSON structure
The discussion below will focus on building the filters parameter in JSON format, for two reasons. First, text query strings are discussed elsewhere (in the links given above). Second, for complex queries, JSON is more powerful.
A JSON filters parameter is structured as follows:
The first condition: a range of dates/times
The first condition must be included, and must be a range of dates/times. It is structured as follows:
where $TIMESTAMP1 and $TIMESTAMP2 are timestamps: specifications of date and time.
For timestamps, any ISO format is supported. However, both timestamps must include year, month, and day.
If hours, minutes, or seconds are not included in a timestamp, the time will be rendered as the beginning of the day/hour/minute, respectively. Examples: "2022-07-14" -> "2022-07-14 00:00:00", "2022-07-14 06:52" -> "2022-07-14 06:52:00", etc.
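The defaulting rule above can be mimicked locally with Python's standard library. This is only a sketch of the described behavior, not the API's own parser, and the function name normalize_timestamp is hypothetical.

```python
from datetime import datetime

def normalize_timestamp(text: str) -> str:
    """Expand a partial ISO timestamp to 'YYYY-MM-DD HH:MM:SS'."""
    # fromisoformat() sets omitted time components to zero and raises
    # ValueError for strings it cannot parse.
    return datetime.fromisoformat(text).strftime("%Y-%m-%d %H:%M:%S")

print(normalize_timestamp("2022-07-14"))        # 2022-07-14 00:00:00
print(normalize_timestamp("2022-07-14 06:52"))  # 2022-07-14 06:52:00
```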
Subsequent conditions
After the first condition, additional conditions can be specified if desired. They are structured as follows:
They must meet these requirements:
Multiple conditions are combined with a logical AND. (A logical OR is not supported, as this could potentially retrieve unexpectedly large amounts of data.)
When multiple conditions are provided, all are followed by commas, except for the final one.
The "key" line is optional; see discussion below.
Here in the documentation, spaces and carriage returns are included in JSON filter examples for clarity. In usage, they are optional.
Also, the order of a condition's components (its field/op/key/value) does not matter.
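The points about optional whitespace and component order can be checked with a short sketch. The helper name condition is hypothetical, and the sample condition is the arguments example from the Field names section of this page.

```python
import json

def condition(field, op, value, key=None):
    """Build one filter condition; "key" is included only when given."""
    cond = {"field": field, "op": op, "value": value}
    if key is not None:
        cond["key"] = key
    return cond

a = condition("arguments", "eq", "1", key="^foo$")
b = {"value": "1", "key": "^foo$", "op": "eq", "field": "arguments"}

# Component order does not matter: both describe the same condition.
same = a == b

# Compact serialization, without the optional spaces and line breaks.
compact = json.dumps(a, separators=(",", ":"))
print(same, compact)
```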
Field names
Available fields include those inherent to HTTP requests, along with additional data added by Reblaze during processing.
Some of the Reblaze-added information consists of internal IDs for the security settings that are relevant to the request.
Some requests will not contain all possible information. When Reblaze blocks a request, processing usually stops immediately, and later stages in the traffic filtering process do not occur.
acl_triggers
Populated during evaluation of the active ACL Profile. Contains keys:
acl_action
action (the type of Action that was triggered)
extra (currently unused)
tags
trigger_id
trigger_name
All are strings, except for tags, which is an array of strings.
arguments
Arguments of the request, if any. Query string and JSON examples for mysite.com/page?foo=1:
arguments["foo"]="1"
{"field": "arguments", "key": "^foo$", "op": "eq", "value": "1"}
asn
string
authority
string
biometric
array
blocked
boolean
bot
boolean
branch
string
bytes_sent
integer
cf_restrict_triggers
array
cf_triggers
Populated if the request triggered a Content Filter Rule. Contains keys: action, extra, name, risk_level, ruleid, section, trigger_id, trigger_name, value. All have string values, except for risk_level, which is an integer.
challenge
boolean
challenge_type
string
cookies
Can inspect specific cookies or all cookies. See JSON filter examples.
country
string
dr_triggers
array
geo_region
string
gf_triggers
An array of entries, one for each Global Filter that matched the request. Each entry contains these keys: action, extra, name, section, trigger_id, trigger_name, value
headers
Can inspect specific headers or all headers. See JSON filter examples.
host
string
hostname
string
human
boolean
ichallenge
boolean
ip
string
logs
array
method
string
monitor
boolean: whether or not the request triggered a Monitor action.
monitor_reasons
array of strings; the various reasons (if any) that the request triggered Monitor actions.
organization
string
path
string. The path, excluding the host (domain) and excluding arguments; the string begins with "/".
path_parts
string. Contents for mysite.com/abc/123/home.html?foo=true:
"path_parts": {
"part1": "abc",
"part2": "123",
"part3": "home.html",
"path": "/abc/123/home.html"
}
To match the second part of this example:
path_parts["part2"]="123"
{"field": "path_parts", "key": "^part2$", "op": "eq", "value": "123"}
port
string
processing_stage
integer: the furthest stage of traffic filtering that was reached.
0: Initialization
2: Global Filtering
3: Flow Control
4: Global Rate Limits
5: Rate Limits
6: ACL Profile
7: Content Filtering
profiling
array of items containing the security settings relevant to this request. Each item contains: a name (secpol, mapping, flow, limit, acl, content_filter) and value (the internal ID of that setting).
protocol
string
proxy
array of items containing proxy-related data. Each item contains a name and value. The names are: additional_tags, bytes_sent, container, geo_as_domain, geo_as_name, geo_as_type, geo_company_country, geo_company_domain, geo_company_type, geo_lat, geo_long, geo_mobile_carrier, geo_mobile_country, geo_mobile_mcc, geo_mobile_mnc, realip, request_id, request_length, request_time, ssl_cipher, ssl_protocol, status.
query
string. Example: for mysite.com/page?code=117, this is ?code=117.
rbz_latency
integer
rbzid
Cookie set by Reblaze. Example query string and JSON:
cookies["rbzid"]="Jc491eLWqTBOfDnJwNk"
{"field": "cookies", "key": "^rbzid$", "op": "eq", "value": "Jc491eLWqTBOfDnJwNk"}
rbzsessionid
Cookie set by Reblaze. Example query string and JSON:
cookies["rbzsessionid"]="578701713f70dcd8706cb50db6d41aab"
{"field": "cookies", "key": "^rbzsessionid$", "op": "eq", "value": "578701713f70dcd8706cb50db6d41aab"}
reason
string. The reason, if any, the request was blocked.
referer
string
request_id
string
request_length
integer
request_time
float
result
string; the disposition of the request. A way to quickly see anomalies is to search for {"field": "result", "op": "not eq", "value": "Passed"}
rl_triggers
array; the reasons (if any) that rate limits were triggered.
security_config
The configuration of security settings when this request was processed. Keys and data types are:
acl_active (boolean)
cf_active (boolean)
cf_rules (integer)
gf_rules (integer)
revision (string)
rl_rules (integer)
secpolentryid (string)
secpolid (string)
session
string
session_ids
array
status
integer
tags
array of strings: all the tags attached to the request
time_period
integer; the Epoch Unix timestamp of the request
timestamp
string; date and time
trigger_counters
The number of times an Action was triggered, and the source of the triggers (ACL Profile, Content Filtering, Global Filters, or Rate Limits). This is a collection of keys (counters) and values (integers with the value of each counter). Counter names are strings: acl, cf, cf_restrict, dr, gf, rl. Sample filter condition: {"field": "trigger_counters", "key": "acl", "value": 0, "op": "gt"}
upstream_addr
array of strings
upstream_data
array of elements: {addr (string), response_time (float), status (integer)}
upstream_response_time
float or null
upstream_status
array of integers
url
string
user_agent
string
version
string
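To make the path_parts structure listed above more concrete, the sketch below derives the same keys from the example URL. It only illustrates the shape of the data and is not Reblaze's implementation. The function name is hypothetical, and the https:// scheme is added so the standard library can parse the URL.

```python
from urllib.parse import urlsplit

def path_parts(url: str) -> dict:
    """Split a URL path into part1, part2, ... plus the full path."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    parts = {f"part{i}": seg for i, seg in enumerate(segments, start=1)}
    parts["path"] = path
    return parts

print(path_parts("https://mysite.com/abc/123/home.html?foo=true"))
```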
Key
For some types of data, it might not be enough to specify the field, because there could be multiple parameters in the request that match it. For example, a field name of "cookies" or "headers" does not tell the system which cookie or header to inspect.
The key field supplies this information; it is the name of the specific parameter to evaluate. If this parameter is not defined, the system will inspect all instances of the specified field (all cookies, all headers, etc.).
Some examples are in the JSON filter examples below.
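The effect of the key can be sketched locally as follows. Treating the key as a regular expression applied to each parameter name is an assumption, inferred from keys such as "^rbzid$" in the examples on this page. The function name and the "theme" cookie are invented for illustration.

```python
import re
from typing import Optional

def values_to_inspect(items: dict, key: Optional[str] = None) -> list:
    """Return the values a condition would examine within one field."""
    if key is None:
        # No key: every instance of the field is inspected.
        return list(items.values())
    # Assumption: the key is applied as a regular expression to each name.
    return [v for name, v in items.items() if re.search(key, name)]

cookies = {"rbzid": "Jc491eLWqTBOfDnJwNk", "theme": "dark"}
print(values_to_inspect(cookies, "^rbzid$"))
print(values_to_inspect(cookies))
```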
Values
Values should be specified in the appropriate data type: strings as quote-delimited strings, integers as numbers, etc. Arrays of values can be supplied.
Operators
is (boolean): checks if the value is True or False
eq (integer / float / string): checks for an exact match with a numeric/string value
gt / lt (integer / float): checks if the value is greater/less than
gte / lte (integer / float): checks if the value is greater/less than or equal
in (integer / float / string): checks if a numeric/string value is in a list of values
regex (string): checks if the string has a match with a regex
between (integer / float / timestamp): checks if the value is between two numbers/timestamps. Does not depend on order.
Negative operators
Operators can be inverted by adding not. For example, not eq means "does not equal".
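The operator semantics above, including inversion with not, can be modeled locally as in the sketch below. This is a reading aid, not the server's implementation. Inclusive bounds for between and match-anywhere behavior for regex are assumptions, and the function name check is hypothetical.

```python
import re

def check(op: str, actual, expected) -> bool:
    """Evaluate one operator locally; a leading 'not ' inverts the result."""
    if op.startswith("not "):
        return not check(op[4:], actual, expected)
    if op in ("is", "eq"):
        return actual == expected
    if op == "gt":
        return actual > expected
    if op == "lt":
        return actual < expected
    if op == "gte":
        return actual >= expected
    if op == "lte":
        return actual <= expected
    if op == "in":
        return actual in expected
    if op == "regex":
        # Assumption: a match anywhere in the string counts.
        return re.search(expected, actual) is not None
    if op == "between":
        # The order of the two bounds is irrelevant; inclusive bounds assumed.
        low, high = sorted(expected)
        return low <= actual <= high
    raise ValueError(f"unknown operator: {op}")

print(check("between", 301, [399, 300]))
print(check("not eq", "Blocked", "Passed"))
```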
Inverting a condition
It's possible to invert an entire condition by adding NOT, like this:
JSON filter examples
Retrieve PUT and POST requests:
Retrieve requests containing a tag that matches "geo" or "location":
Retrieve requests where certain cookies' values match a regex:
Retrieve requests where any cookie's value matches a regex:
Retrieve requests according to a subfield (the acl_active subfield of security_config must be greater than 11).
Retrieve requests according to an array of subfields:
Retrieve requests according to a combination of conditions (there is no limit on the number of conditions):
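For orientation, the sketch below shows how single conditions for the first two descriptions above might be written, based only on the operator list and the field/op/value form described on this page. They are illustrative guesses rather than official examples. The required timestamp condition and the enclosing filters structure are deliberately left out.

```python
import json

# Each dict is one condition in the field/op/value form described above.
# These are illustrative guesses, not official examples.

# PUT and POST requests: "in" tests membership in a list of values.
put_or_post = {"field": "method", "op": "in", "value": ["PUT", "POST"]}

# A tag matching "geo" or "location": "regex" with an alternation.
geo_or_location = {"field": "tags", "op": "regex", "value": "geo|location"}

for cond in (put_or_post, geo_or_location):
    print(json.dumps(cond))
```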
API access to traffic data
As noted previously, there are four API routes that use the filters parameter to retrieve traffic data:
GET /api/v4.0/data/topx
GET /api/v4.0/data/stats
GET /api/v4.0/data/timeline
GET /api/v4.0/data/logs
Their typical uses are as follows.
Quickly discover the most important factors in the traffic stream (e.g., the countries sending the most blocked requests, the URLs receiving the most bot traffic, etc.): use the topx route.
Get a summary of traffic statistics (total requests, bandwidth, and latency): use the stats route.
Get a summary of security metrics (total requests, blocked requests, status codes returned, number of human clients, activity of the origin, etc.): use the timeline route.
Get complete data for all requests matching certain criteria (often used for drilling down into trends discovered from the other routes): use the logs route.
Below, we discuss each route in detail.
GET /api/v4.0/data/topx
This route provides API access to the same Top Metrics available in the Dashboard. Here's an example of Top Countries in the Dashboard:
Calling this route returns every metric in one response: all results for "top applications", all results for "top countries", all results for "top sources", and so on. They are combined into a single continuous list:
...where the list of $RESULTs looks something like this (incomplete) example:
We see that in the time period specified in the filters parameter, there were three IP addresses in two countries that sent requests to a single URL.
Some points to note:
The results are organized and grouped together in the list according to their label.
Labels are ordered alphabetically.
Labels can differ in their number of results.
The route returns the "top" results for each label (for example, results with the "ip" label show the IPs that sent the most blocked requests). When there are only a few results for a given label, all are retrieved. When there are many, only the "top" results are retrieved.
Actual usage
The example above is oversimplified. In actual use, the topx route:
Returns much more data per result, not just key and label (see the list of fields below)
Returns more categories than just country, ip, and url (see the discussion of the label field below)
A detailed discussion of topx follows.
Contents of each result
Each result contains the fields listed below.
The _time fields are in seconds. These are floats, but can appear at various precisions: zero decimal places, several decimal places, or scientific notation (e.g., 1.6210818451802098e-9).
avg_origin_time
float
The average amount of processing time by the origin for these requests. If no requests reached the origin, this will be null.
avg_rbz_time
float
The average amount of processing time by Reblaze for these requests.
avg_total_time
float
The average total amount of processing time for these requests.
first_asn
string
First entry in the list of ASNs
first_geo_country
string
First entry in the list of countries
first_organization
string
First entry in the list of organizations
key
string
Content varies; see discussion below.
label
string
Category of result. See discussion below.
max_origin_time
float
The longest amount of processing time by the origin among these requests. If no requests reached the origin, this will be null.
max_rbz_time
float
The longest amount of processing time by Reblaze among these requests.
max_total_time
float
The longest amount of total processing time among these requests.
min_origin_time
float
The shortest amount of processing time by the origin among these requests. If no requests reached the origin, this will be null.
min_rbz_time
float
The shortest amount of processing time by Reblaze among these requests.
min_total_time
float
The shortest amount of total processing time among these requests.
num_of_blocked_requests
integer
num_of_bot_requests
integer
num_of_challenges
integer
num_of_human_requests
integer
num_of_monitored_requests
integer
Includes all requests that triggered a "monitor" action, even if they were blocked as well.
num_of_requests
integer
sum_of_bytes_sent
integer
sum_of_request_length
integer
Organization of results: the label and key fields
The topx route returns results in a specific order:
Results are grouped together according to their label (i.e., their category).
Labels are ordered alphabetically (see full list below)
Within each label, results are ordered by num_of_blocked_requests, in descending order.
Notice that this is unlike the UI's Dashboard Top Metrics, where some types of results have other default orders.
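The ordering rules above can be reproduced on the client side, for example to regroup results after merging several responses. The sample results below are made up and carry only three of the documented fields.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample results; real results carry many more fields.
results = [
    {"label": "ip", "key": "198.51.100.7", "num_of_blocked_requests": 3},
    {"label": "country", "key": "France", "num_of_blocked_requests": 5},
    {"label": "ip", "key": "203.0.113.9", "num_of_blocked_requests": 12},
    {"label": "country", "key": "Brazil", "num_of_blocked_requests": 8},
]

# Labels ascending, then blocked requests descending within each label.
ordered = sorted(results, key=lambda r: (r["label"], -r["num_of_blocked_requests"]))

# groupby() needs its input sorted by the grouping key, which it now is.
by_label = {
    label: [r["key"] for r in group]
    for label, group in groupby(ordered, key=itemgetter("label"))
}
print(by_label)
```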
The topx route returns twelve categories of results, each with its own label. The label determines the contents of the key field.
country
country name
host
host name or IP
ip
IP address
organization
organization
origin_time
target URL
rbz_time
target URL
rbzid
rbzid cookie value
reason
reason the request was monitored or blocked
referer
referer string
total_time
target URL
url
target URL
user_agent
user agent string
GET /api/v4.0/data/stats
This route returns traffic metrics for the requested time period, broken down into shorter segments of time.
Data structures
The retrieved metrics are structured like this:
...where each $RESULTS-TIMESEGMENT-x has this structure:
Contents of each result
avg_latency
Average latency in seconds
hostname
Host
num_of_requests
Total requests received during the time period
sum_of_bandwidth
Total bytes sent and received
time_period
Beginning of time segment, as an Epoch Unix integer (e.g., 1718186400)
timeperiod_string
Beginning of time segment, as a string (e.g., "2024-06-12 10:00:00")
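The two example values above (1718186400 and "2024-06-12 10:00:00") coincide when the epoch value is interpreted in UTC. Assuming that holds in general, which this page does not state, the string form can be derived from the integer as sketched here. The function name segment_start is hypothetical.

```python
from datetime import datetime, timezone

def segment_start(time_period: int) -> str:
    """Format an epoch timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    moment = datetime.fromtimestamp(time_period, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

print(segment_start(1718186400))  # 2024-06-12 10:00:00
```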
GET /api/v4.0/data/timeline
This route returns security metrics for the requested time period, broken down into shorter segments of time.
Data structures
The retrieved metrics are structured like this:
...where each $RESULTS-TIMESEGMENT-x has this structure:
...and each $STATUS-DATA-x contains the number of responses with a specific status:
Contents of each result
array_origin_status_codes
An array: each element contains a status code and the number of responses from the origin with that code. If no requests reached the origin during the specified time period, the array will be empty.
array_status_codes
An array: each element contains a status code and the number of responses from Reblaze with that code.
num_of_blocked_requests
Requests that were blocked
num_of_challenges
Number of times Reblaze issued a bot challenge
num_of_human_requests
Requests from human (i.e., non-bot) clients
num_of_ip
Number of IPs used by clients
num_of_origin_blocked_requests
Number of requests rejected by the origin
num_of_requests
Total requests received during the time period
num_of_sessions
Number of unique sessions
sum_of_sent_bytes
Total bytes sent
time_period
Beginning of time segment, as an Epoch Unix integer (e.g., 1718186400)
timeperiod_string
Beginning of time segment, as a string (e.g., "2024-06-12 10:00:00")
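Once the time segments have been retrieved, they can be summarized with ordinary list processing. The sketch below uses made-up segments containing only three of the fields listed above, and assumes the segments are already available as a list of dictionaries. The function name blocked_ratio is hypothetical.

```python
# Made-up segments using field names from the list above.
segments = [
    {"timeperiod_string": "2024-06-12 10:00:00", "num_of_requests": 200, "num_of_blocked_requests": 50},
    {"timeperiod_string": "2024-06-12 11:00:00", "num_of_requests": 0, "num_of_blocked_requests": 0},
    {"timeperiod_string": "2024-06-12 12:00:00", "num_of_requests": 300, "num_of_blocked_requests": 30},
]

def blocked_ratio(segment: dict) -> float:
    """Share of requests blocked in one segment; 0.0 when it is empty."""
    total = segment["num_of_requests"]
    return segment["num_of_blocked_requests"] / total if total else 0.0

total_requests = sum(s["num_of_requests"] for s in segments)
total_blocked = sum(s["num_of_blocked_requests"] for s in segments)

# The segment with the highest share of blocked requests.
worst = max(segments, key=blocked_ratio)
print(total_requests, total_blocked, worst["timeperiod_string"])
```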
GET /api/v4.0/data/logs
This route returns all requests that match the filter parameters, up to the number of requests specified. The results are returned like this:
Each $REQUEST has this structure:
...where the $FIELDs are the Field names listed above in the filters discussion, and the $VALUEs are their values, if any. So a request looks like this: