Access Log Structure

The log line structure is as follows. Note that some fields are quoted and some are not. If more fields are added in the future, they will be appended on the right side.

Field names:

Field

Description

remote_addr

Client IP

timestamp

Timestamp

status

Response Status code sent to client

bytes_sent

Number of bytes sent in the response

method

HTTP Request method

request

The complete URL (including the query string)

proto_version

Protocol Version (1.0/1.1)

blocked

Was it blocked by Reblaze?

is_human

Was it marked as human?

block_reason

The reason the request was blocked or exceptionally passed

geoip_country_name

Country Name

geoip_country_code

Country Code

request_id

Unique ID of this request within Reblaze

captured_vector

The attack vector we captured

request_time

The time it took our system to process the request

upstream_addr

The address of the upstream server(s) Reblaze contacted

upstream_response_time

The time Reblaze waited for the upstream server(s) to return the response

domain_name

The domain name of the server group in Reblaze

host

The Host header of this request (same as domain name, or one of its aliases)

referer

The HTTP Referer header

http_user_agent

HTTP User Agent

http_cookie

The Cookie header string

request_headers

Request headers encoded in base64

organization

The full name of the organization that owns the IP address

upstream_status

The status code returned by the upstream server

uri

The request URI without query part

hostname

The proxy (Reblaze) server that processed the request

is_cloud

is_tor

is_vpn

is_anonymizer

is_proxy

rbzsessionid

Reblaze Session ID

request_length

The upload request size in bytes

sent_http_cache_control

The Cache-Control header we sent as part of the response

sent_http_expires

The Expires header we sent as part of the response

cookie_rbzid

Hash of the RBZID Cookie

sent_http_content_type

The MIME type (Content-Type response header)

browsersig

A unique signature of the visiting browser (future use)

ssl_protocol

SSL Protocol Version

ssl_cipher

Selected SSL Cipher

cache_status

Upstream Cache Status

anything_else

Placeholder for future use

Example line (split across several lines with trailing backslashes for readability; the actual log entry is a single line):

15.14.13.12 1465461803.223 200 3158 \
"POST /payment_service_api/get_all_payment_methods.json HTTP/1.1" "0" "0" \
"" "Singapore" "SG" "foobar-rbzr131343635343631383032ce9df9a6a90fa3bc" "-" \
0.729 "8.12.40.38:443" "0.418" "secure.foobar.com" "secure.foobar.com" "-" \
"REEBONZ 5.2.2 rv:3 (iPhone; iPhone OS 9.3.2; en_SG)" "-" \
"eyJob3N0Ijoic2VjdXJlLnJlZWJvbnouY29tIiwieC1uZXdyZWxpYy1pZCI6IlVnWURVRkJBQ1FzSlZGbGFCUT09IiwiY29udGVudC10eXBlIjoiYXBwbGljYXRpb25cL3gtd3d3LWZvcm0tdXJsZW5jb2RlZDsgY2hhcnNldD11dGYtOCIsImNvbm5lY3Rpb24iOiJjbG9zZSIsImNvbnRlbnQtbGVuZ3RoIjoiMTQ0IiwiYWNjZXB0LWVuY29kaW5nIjoiZ3ppcCIsInBvc3RfcmVxdWVzdF9ib2R5Ijp7ImNvdW50cnlfY29kZSI6IlNHIiwic2lnbmF0dXJlIjoiMGE2YzQwZTI3ZGRmMGYyNjcxMzM2YjhiYTljYjVmYzIiLCJkYXRldGltZSI6IjIwMTYtMDYtMDkgMTY6NDM6MjEiLCJkcnVwYWxfdWlkIjoiNTI0NTgxNiIsInBsYXRmb3JtX25hbWUiOiJNb2JpbGUiLCJidV9jb2RlIjoiMDEifSwidXNlci1hZ2VudCI6IlJFRUJPTlogNS4yLjIgcnY6MyAoaVBob25lOyBpUGhvbmUgT1MgOS4zLjI7IGVuX1NHKSJ9" \
"AS4773 MobileOne Ltd. Mobile/Internet Service Provider Singapore" "200" \
"/payment_service_api/get_all_payment_methods.json" "foobar-rbzr1" "0" "0" "0" "0" \
"0" "83d90abb12608623ea23442273249146" "466" "max-age=0, private, must-revalidate" \
"-" "-" "text/html; charset=utf-8" "-" "TLSv1.2" "ECDHE-RSA-AES128-GCM-SHA256" "-"

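The timestamp value in the example line looks like Unix epoch seconds with millisecond precision. That is an inference from the sample value, not something the field table states, so treat the following as a minimal sketch under that assumption. The helper name to_utc is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(timestamp_field):
    """Convert the raw timestamp field to a timezone-aware UTC datetime.

    Assumes the field holds Unix epoch seconds with a fractional part,
    as the sample value 1465461803.223 suggests.
    """
    return datetime.fromtimestamp(float(timestamp_field), tz=timezone.utc)


ts = to_utc("1465461803.223")
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2016-06-09 08:43:23
```

Under this reading the sample request arrived at 08:43:23 UTC, which agrees with the 16:43:21 datetime in the decoded request body if that value is in Singapore local time (UTC+8).
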
The headers field is a base64-encoded JSON string containing the header names and values.

If a request is blocked, whether by ACL or by WAF/IPS, the blocked field is set to "1" and the block_reason field describes why.
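As a rough illustration of how these two fields can be used together, the sketch below tallies block reasons over records that have already been parsed into dicts keyed by field name. The function name blocked_reasons and the reason strings in the sample data are hypothetical; real block_reason values will differ.

```python
from collections import Counter


def blocked_reasons(records):
    """Count block_reason values for records whose blocked field is "1".

    Filtering on blocked matters because block_reason may also be filled
    for requests that were exceptionally passed.
    """
    return Counter(
        rec["block_reason"] for rec in records if rec.get("blocked") == "1"
    )


# Hypothetical sample records; only the two relevant fields are shown.
records = [
    {"blocked": "0", "block_reason": ""},
    {"blocked": "1", "block_reason": "acl-example"},
    {"blocked": "1", "block_reason": "waf-example"},
    {"blocked": "1", "block_reason": "acl-example"},
]

summary = blocked_reasons(records)
print(summary.most_common())
```
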

Decoding the headers field yields the following:

{
  "host": "secure.reebonz.com",
  "x-newrelic-id": "UgYDUFBACQsJVFlaBQ==",
  "content-type": "application/x-www-form-urlencoded; charset=utf-8",
  "connection": "close",
  "content-length": "144",
  "accept-encoding": "gzip",
  "post_request_body": {
    "country_code": "SG",
    "signature": "0a6c40e27ddf0f2671336b8ba9cb5fc2",
    "datetime": "2016-06-09 16:43:21",
    "drupal_uid": "5245816",
    "platform_name": "Mobile",
    "bu_code": "01"
  },
  "user-agent": "REEBONZ 5.2.2 rv:3 (iPhone; iPhone OS 9.3.2; en_SG)"
}

Note that an entry for post_request_body is added alongside the headers.
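The decoding step can be sketched in Python with the standard base64 and json modules. The function name decode_request_headers is hypothetical, and treating an empty value or "-" as "no headers" is an assumption based on how other empty fields appear in the example line. The demo builds its own small encoded value instead of reusing the long string from the example.

```python
import base64
import json


def decode_request_headers(encoded):
    """Decode the request_headers field into a dict.

    Returns an empty dict for an empty value or "-" (assumed to mean
    that no headers were logged).
    """
    if not encoded or encoded == "-":
        return {}
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Round-trip demo with a small hand-made value.
sample = {
    "host": "secure.foobar.com",
    "post_request_body": {"country_code": "SG"},
}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

headers = decode_request_headers(encoded)
# Separate the injected body entry from the real HTTP headers.
body = headers.pop("post_request_body", None)
print(headers, body)
```

Popping post_request_body before further processing keeps the remaining dict limited to actual HTTP headers.
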

Python Parser Example:

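import re

# Verbose (re.X) pattern: whitespace and "#" comments inside it are ignored.
# The raw string keeps \S, \s and \d from being read as string escapes.
access_line_rec = re.compile(r''' # -- line by line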
    (\S+)\s         # remote_addr
    (\S+)\s         # timestamp
    (\S+)\s         # status
    (\S+)\s         # bytes_sent
    "(\S+)\s        # METHOD
    (.+)            # request
    \s(HTTP/\d\.\d)"\s     # proto_version
    "([^"]*)"\s     # blocked
    "([^"]*)"\s     # is_human
    "([^"]*)"\s     # block_reason
    "([^"]*)"\s     # geoip_country_name
    "([^"]*)"\s     # geoip_country_code
    "([^"]*)"\s     # request_id
    "([^"]*)"\s     # captured_vector
    (\S+)\s         # request_time
    "([^"]*)"\s     # upstream_addr
    "([^"]*)"\s     # upstream_response_time
    "([^"]*)"\s     # domain_name
    "([^"]*)"\s     # host
    "([^"]*)"\s     # referer
    "([^"]*)"\s     # http_user_agent
    "([^"]*)"\s     # http_cookie
    "([^"]*)"\s     # request_headers
    "([^"]*)"\s     # organization
    "([^"]*)"\s     # upstream_status
    "([^"]*)"\s     # uri
    "([^"]*)"\s     # hostname
    "([^"]*)"\s     # is_cloud
    "([^"]*)"\s     # is_tor
    "([^"]*)"\s     # is_vpn
    "([^"]*)"\s     # is_anonymizer
    "([^"]*)"\s     # is_proxy
    "([^"]*)"\s     # rbzsessionid
    "([^"]*)"\s     # request_length
    "([^"]*)"\s     # sent_http_cache_control
    "([^"]*)"\s     # sent_http_expires
    "([^"]*)"\s     # cookie_rbzid
    "([^"]*)"\s     # sent_http_content_type
    "([^"]*)"\s     # browsersig
    "([^"]*)"\s     # ssl_protocol
    "([^"]*)"\s     # ssl_cipher
    "([^"]*)"       # cache_status
    (.*)''', re.X)  # anything else

# Field names in log order; they match the field table above.
names = ("remote_addr", "timestamp", "status",
    "bytes_sent", "method", "request", "proto_version",
    "blocked", "is_human", "block_reason",
    "geoip_country_name", "geoip_country_code",
    "request_id", "captured_vector", "request_time", "upstream_addr",
    "upstream_response_time", "domain_name", "host", "referer",
    "http_user_agent", "http_cookie", "request_headers", "organization", "upstream_status",
    "uri", "hostname", "is_cloud", "is_tor", "is_vpn", "is_anonymizer",
    "is_proxy", "rbzsessionid", "request_length", "sent_http_cache_control",
    "sent_http_expires", "cookie_rbzid", "sent_http_content_type", "browsersig",
    "ssl_protocol", "ssl_cipher", "cache_status", "anything_else")

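def parse_line(line, as_dict=False):
    """Parse one access log line.

    Returns a tuple of field values, or a dict keyed by field name when
    as_dict is True. Returns None if the line does not match.
    """
    rmatch = access_line_rec.match(line)
    if rmatch is None:
        return None
    g_match = rmatch.groups()
    if as_dict:
        # TODO: check whether named groups would be faster
        return dict(zip(names, g_match))
    return g_match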
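
The to-do note in the parser mentions named groups. One way to explore that idea is to generate the pattern from the field list, so that the regex and the field names cannot drift apart. This is a sketch and not a drop-in replacement: the names FIELDS, UNQUOTED, field_pattern and parse_line_named are hypothetical, the request line is captured as one quoted field and split afterwards (so, unlike the pattern above, it does not tolerate a double quote inside the URL), and the sample line is a shortened version of the example with a stand-in headers value.

```python
import re

# Fields in log order; "request_line" is split into three fields after matching.
FIELDS = (
    "remote_addr timestamp status bytes_sent request_line blocked is_human "
    "block_reason geoip_country_name geoip_country_code request_id "
    "captured_vector request_time upstream_addr upstream_response_time "
    "domain_name host referer http_user_agent http_cookie request_headers "
    "organization upstream_status uri hostname is_cloud is_tor is_vpn "
    "is_anonymizer is_proxy rbzsessionid request_length "
    "sent_http_cache_control sent_http_expires cookie_rbzid "
    "sent_http_content_type browsersig ssl_protocol ssl_cipher cache_status"
).split()

# Fields that appear without surrounding quotes in the log line.
UNQUOTED = {"remote_addr", "timestamp", "status", "bytes_sent", "request_time"}


def field_pattern(name):
    """Return a named-group regex fragment for one field."""
    if name in UNQUOTED:
        return r"(?P<%s>\S+)" % name
    return r'"(?P<%s>[^"]*)"' % name


LINE_RE = re.compile(
    r"\s".join(field_pattern(n) for n in FIELDS) + r"(?P<anything_else>.*)"
)


def parse_line_named(line):
    """Parse a log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # "POST /path HTTP/1.1" -> method, request, proto_version
    method, _, rest = rec.pop("request_line").partition(" ")
    request, _, proto = rest.rpartition(" ")
    rec.update(method=method, request=request, proto_version=proto)
    return rec


sample = (
    '15.14.13.12 1465461803.223 200 3158 '
    '"POST /payment_service_api/get_all_payment_methods.json HTTP/1.1" "0" "0" '
    '"" "Singapore" "SG" "foobar-rbzr1-0001" "-" '
    '0.729 "8.12.40.38:443" "0.418" "secure.foobar.com" "secure.foobar.com" "-" '
    '"REEBONZ 5.2.2 rv:3 (iPhone; iPhone OS 9.3.2; en_SG)" "-" '
    '"e30=" '
    '"AS4773 MobileOne Ltd." "200" '
    '"/payment_service_api/get_all_payment_methods.json" "foobar-rbzr1" "0" "0" "0" "0" '
    '"0" "83d90abb12608623ea23442273249146" "466" "max-age=0, private, must-revalidate" '
    '"-" "-" "text/html; charset=utf-8" "-" "TLSv1.2" "ECDHE-RSA-AES128-GCM-SHA256" "-"'
)

rec = parse_line_named(sample)
print(rec["method"], rec["status"], rec["geoip_country_code"], rec["ssl_protocol"])
```

Whether this is actually faster than zipping names onto groups() would need to be measured on real log data.
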