API access to traffic data
The page below describes how to retrieve traffic data via the API. It contains these main sections:
The filters parameter (which specifies the traffic data that will be retrieved)
The four API routes that retrieve traffic data, and how to use them
Introduction
The Data queries API namespace provides access to traffic data, via these four routes:
GET /api/v4.0/data/topx
GET /api/v4.0/data/stats
GET /api/v4.0/data/timeline
GET /api/v4.0/data/logs
Each includes an input parameter named filters. This parameter specifies the traffic data that will be retrieved.
On the page below, we begin by discussing this parameter's usage, syntax, and construction. Then we discuss the four API routes that require it, and the different types of data they return.
The filters parameter
filters is a string that specifies one or more conditions. A database query is constructed from those conditions, and the results are returned to the user.
Usage of the filters parameter depends on the context:
When using Swagger UI, this value is supplied directly in the filters input field.
When using curl to call the Reblaze API, this value is encoded into the destination URL, preceded by "filters=". See this explanation for more information.
A short example
Here is an example of a filters specification in JSON format:
This will return requests which:
were received in a five-minute time period (from 2024-06-06 09:31:00 to 2024-06-06 09:36:00), and...
have a Tag containing the string unrecognized. (In this example, the admin wanted to retrieve requests tagged with unrecognized-host-header.)
Format
The filters parameter can be supplied as a query string, or as JSON.
Query string format
This format is used in the first Query Specification input field in the UI's Dashboard and Events Log. For example, this query string will display all requests with a 301 response code within a certain time period:
status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00
For more information on this format, see Query filter syntax and best practices.
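As a rough illustration of how a query-string filter ends up in a request URL, the Python sketch below percent-encodes the example above and appends it after "filters=". The host name is a placeholder, and using the logs route here is an arbitrary choice. Authentication and any other parameters the API may require are not shown.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

# Placeholder host (assumption): substitute the address of your own deployment.
BASE_URL = "https://example.invalid"
ROUTE = "/api/v4.0/data/logs"  # one of the four routes listed in the Introduction

query = "status=301, timestamp between 2024-06-06 09:31:00 and 2024-06-06 09:36:00"


def build_url(base_url: str, route: str, filters: str) -> str:
    """Return the route URL with the filters value percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    encoded = urlencode({"filters": filters}, quote_via=quote)
    return f"{base_url}{route}?{encoded}"


url = build_url(BASE_URL, ROUTE, query)
print(url)
```

Decoding the URL's query component should give back the original filter text unchanged, which is a quick way to check the encoding before sending a request.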
JSON format
The JSON equivalent of the query string above is:
(This is provided as an example only. A full discussion of JSON syntax is below.)
Converting from query string to JSON
The POST /api/v4.0/data/timeline/parse API route accepts query strings and returns the same query in JSON format.
JSON structure
The discussion below will focus on building the filters parameter in JSON format, for two reasons. First, text query strings are discussed elsewhere (in the links given above). Second, for complex queries, JSON is more powerful.
A JSON filters parameter is structured as follows:
The first condition: a range of dates/times
The first condition must be included, and must be a range of dates/times. It is structured as follows:
where $TIMESTAMP1 and $TIMESTAMP2 are timestamps: specifications of date and time.
For timestamps, any ISO format is supported. Nevertheless, both timestamps must include year, month, and day.
If hours, minutes, or seconds are not included in a timestamp, the time will be rendered as the beginning of the day/hour/minute, respectively. Examples: "2022-07-14" -> "2022-07-14 00:00:00", "2022-07-14 06:52" -> "2022-07-14 06:52:00", etc.
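The expansion rule above can be reproduced on the client side, for instance to preview the range a partial timestamp will cover. The sketch below is only an illustration using Python's standard library. It is not the API's own implementation, and Python's parser accepts only some ISO forms, so it says nothing about which formats the API accepts. The function name expand_timestamp is made up for this example.

```python
from datetime import datetime


def expand_timestamp(ts: str) -> str:
    """Fill missing time components with zeros (start of day/hour/minute)."""
    # fromisoformat needs a full date (year, month, day); time parts are optional
    parsed = datetime.fromisoformat(ts)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


for ts in ("2022-07-14", "2022-07-14 06:52"):
    print(ts, "->", expand_timestamp(ts))
```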
Subsequent conditions
After the first condition, additional conditions can be specified if desired. They are structured as follows:
They must meet these requirements:
Multiple conditions are combined with a logical AND. (A logical OR is not supported, as this could potentially retrieve unexpectedly large amounts of data.)
When multiple conditions are provided, all are followed by commas, except for the final one.
The "key" line is optional; see discussion below.
Here in the documentation, spaces and carriage returns are included in JSON filter examples for clarity. In usage, they are optional.
Also, the order of a condition's components (its field/op/key/value) does not matter.
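The last two points (optional whitespace, and free ordering of components) follow from how JSON objects are parsed. The Python sketch below checks this for a single condition. It assumes that a condition is a JSON object whose members are named field, op, and value. The field name "status" and the value 301 are borrowed from the query string example and may not match the actual JSON field name.

```python
import json

# Assumed shape of one condition (illustrative, not taken from the API reference).
pretty = """
{
    "field": "status",
    "op": "eq",
    "value": 301
}
"""

# Same condition: no whitespace, components in a different order.
compact = '{"value":301,"op":"eq","field":"status"}'

same = json.loads(pretty) == json.loads(compact)
print(same)
```

Both strings parse to the same object, so either form describes the same condition.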
Field names
Available fields include those inherent to HTTP requests, along with additional data added by Reblaze during processing.
Some of the Reblaze-added information consists of internal IDs for the security settings that are relevant to the request.
Some requests will not contain all possible information. When Reblaze blocks a request, processing usually stops immediately, and later stages in the traffic filtering process do not occur.
Key
For some types of data, it might not be enough to specify the field, because there could be multiple parameters in the request that match it. For example, a field name of "cookies" or "headers" does not tell the system which cookie or header to inspect.
The key field supplies this information; it is the name of the specific parameter to evaluate. If this parameter is not defined, the system will inspect all instances of the specified field (all cookies, all headers, etc.).
Some examples are in the JSON filters examples below.
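To make the role of key concrete, here is a small Python sketch of a helper that builds a condition and includes key only when one is supplied. The helper name make_condition, the cookie name "session", and the value "abc123" are all hypothetical. The object shape (members named field, op, key, value) is an assumption based on the component names used on this page.

```python
from typing import Any, Optional


def make_condition(field: str, op: str, value: Any, key: Optional[str] = None) -> dict:
    """Build one condition; the "key" component is added only when given."""
    condition = {"field": field, "op": op, "value": value}
    if key is not None:
        condition["key"] = key
    return condition


# With a key: only the cookie named "session" would be inspected.
one_cookie = make_condition("cookies", "eq", "abc123", key="session")

# Without a key: every cookie in the request would be inspected.
any_cookie = make_condition("cookies", "eq", "abc123")

print(one_cookie)
print(any_cookie)
```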
Values
Values should be specified in the appropriate data type: strings as quote-delimited strings, integers as numbers, etc. Arrays of values can be supplied.
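When conditions are generated programmatically, a JSON serializer takes care of the quoting. The Python sketch below shows an integer, and an array of strings, serialized as values. The field names "status" and "method" are illustrative guesses, and whether a given operator accepts an array is not confirmed here.

```python
import json

# Hypothetical conditions; field names and operator usage are assumptions.
by_status = {"field": "status", "op": "eq", "value": 301}               # integer, unquoted
by_method = {"field": "method", "op": "eq", "value": ["PUT", "POST"]}   # array of strings

status_json = json.dumps(by_status)
method_json = json.dumps(by_method)

print(status_json)
print(method_json)
```

Note that 301 and "301" are different JSON values: the first is a number, the second a string.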
Operators
Negative operators
Operators can be inverted by adding not. For example, not eq means "does not equal".
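If operator names are built in code, the inversion can be expressed as a small string helper. This Python sketch only manipulates the operator name as described above. The function name negate is hypothetical, and eq is the only operator name taken from this page.

```python
def negate(op: str) -> str:
    """Toggle the "not " prefix on an operator name."""
    prefix = "not "
    if op.startswith(prefix):
        return op[len(prefix):]
    return prefix + op


print(negate("eq"))
print(negate("not eq"))
```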
Inverting a condition
It's possible to invert an entire condition by adding NOT, like this:
JSON filter examples
Retrieve PUT and POST requests:
Retrieve requests containing a tag that matches "geo" or "location":
Retrieve requests where certain cookies' values match a regex:
Retrieve requests where any cookie's value matches a regex:
Retrieve requests according to a subfield (the acl_active subfield of security_config must be greater than 11).
Retrieve requests according to an array of subfields:
Retrieve requests according to a combination of conditions (there is no limit on the number of conditions):
API access to traffic data
As noted previously, there are four API routes that use the filters parameter to retrieve traffic data:
GET /api/v4.0/data/topx
GET /api/v4.0/data/stats
GET /api/v4.0/data/timeline
GET /api/v4.0/data/logs
Their typical uses are as follows.
Quickly discover the most important factors in the traffic stream (e.g., the countries sending the most blocked requests, the URLs receiving the most bot traffic, etc.): use the topx route.
Get a summary of traffic statistics (total requests, bandwidth, and latency): use the stats route.
Get a summary of security metrics (total requests, blocked requests, status codes returned, number of human clients, activity of the origin, etc.): use the timeline route.
Get complete data for all requests matching certain criteria (often used for drilling down into trends discovered from the other routes): use the logs route.
Below, we discuss each route in detail.
GET /api/v4.0/data/topx
This route provides API access to the same Top Metrics available in the Dashboard. Here's an example of Top Countries in the Dashboard:
Calling this route returns every category of metric at once: all results for "top applications", all results for "top countries", all results for "top sources", and so on. They are combined into a single continuous list:
...where the list of $RESULTs looks something like this (incomplete) example:
We see that in the time period specified in the filters parameter, there were three IP addresses in two countries that sent requests to a single URL.
Some points to note:
The results are organized and grouped together in the list according to their label.
Labels are ordered alphabetically.
Labels can differ in their number of results.
The route returns the "top" results for each label (for example, results with the "ip" label show the IPs that sent the most blocked requests). When there are only a few results for a given label, all are retrieved. When there are many, only the "top" results are retrieved.
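Because all labels arrive in one flat list, client code usually needs to split the list back into categories. The Python sketch below does this for a simplified list that mirrors the scenario described above (three IPs, two countries, one URL). The sample values are invented, the way countries are written in the key field is a guess, and real results carry more fields than label and key.

```python
from collections import defaultdict

# Invented, simplified results: only the label and key fields are shown.
results = [
    {"label": "country", "key": "France"},
    {"label": "country", "key": "Germany"},
    {"label": "ip", "key": "192.0.2.10"},
    {"label": "ip", "key": "192.0.2.11"},
    {"label": "ip", "key": "198.51.100.7"},
    {"label": "url", "key": "/login"},
]


def group_by_label(items):
    """Map each label to the list of keys reported under it."""
    groups = defaultdict(list)
    for item in items:
        groups[item["label"]].append(item["key"])
    return dict(groups)


grouped = group_by_label(results)
for label, keys in grouped.items():
    print(label, keys)
```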
Actual usage
The example above is oversimplified. In actual use, the topx route:
Returns much more data per result, not just key and label (see the list of fields below).
Returns more categories than just country, ip, and url (see the discussion of the label field below).
A detailed discussion of topx follows.
Contents of each result
Each result contains the fields listed below.
The _time fields are in seconds. These are floats, but can appear at various precisions: zero decimal places, several decimal places, or scientific notation (e.g., 1.6210818451802098e-9).
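Since the same kind of field can arrive as an integer, a decimal, or in scientific notation, it can help to normalize the values before comparing or displaying them. The Python sketch below parses all three forms and converts them to milliseconds. The field names a_time, b_time, and c_time are placeholders for real fields ending in _time.

```python
import json

# Placeholder field names; the three values show the three precisions mentioned above.
raw = '{"a_time": 3, "b_time": 0.0421, "c_time": 1.6210818451802098e-9}'
parsed = json.loads(raw)

# Convert every value to float seconds, then to milliseconds.
millis = {name: float(value) * 1000.0 for name, value in parsed.items()}

for name, ms in millis.items():
    print(f"{name}: {ms:.9f} ms")
```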
Organization of results: the label and key fields
The topx route returns results in a specific order:
Results are grouped together according to their label (i.e., their category).
Labels are ordered alphabetically (see full list below).
Within each label, results are ordered by num_of_blocked_requests, in descending order.
Notice that this is unlike the UI's Dashboard Top Metrics, where some types of results have other default orders.
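The ordering described above can be reproduced with a two-part sort key, which is useful when merging or re-sorting results on the client. In this Python sketch the sample values are invented and only three fields are shown. The function name topx_order is hypothetical.

```python
# Invented results in arbitrary order.
unordered = [
    {"label": "url", "key": "/login", "num_of_blocked_requests": 12},
    {"label": "ip", "key": "192.0.2.10", "num_of_blocked_requests": 4},
    {"label": "country", "key": "France", "num_of_blocked_requests": 7},
    {"label": "ip", "key": "198.51.100.7", "num_of_blocked_requests": 9},
]


def topx_order(items):
    """Sort by label (ascending), then by blocked requests (descending)."""
    return sorted(
        items,
        key=lambda r: (r["label"], -r["num_of_blocked_requests"]),
    )


ordered = topx_order(unordered)
for r in ordered:
    print(r["label"], r["key"], r["num_of_blocked_requests"])
```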
The topx route returns twelve categories of results, each with its own label. The label determines the contents of the key field.
GET /api/v4.0/data/stats
This route returns traffic metrics for the requested time period, broken down into shorter segments of time.
Data structures
The retrieved metrics are structured like this:
...where each $RESULTS-TIMESEGMENT-x has this structure:
Contents of each result
GET /api/v4.0/data/timeline
This route returns security metrics for the requested time period, broken down into shorter segments of time.
Data structures
The retrieved metrics are structured like this:
...where each $RESULTS-TIMESEGMENT-x has this structure:
...and each $STATUS-DATA-x contains the number of responses with a specific status:
Contents of each result
GET /api/v4.0/data/logs
This route returns all requests that match the filters parameter, up to the number of requests specified. The results are returned like this:
Each $REQUEST has this structure:
...where the $FIELDs are the Field names listed above in the filters discussion, and the $VALUEs are their values, if any. So a request looks like this: