Access log structure

The log line structure is as follows. Note that some fields are quoted and some are not. If we add more fields in the future, they will be appended on the right side.
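
A minimal sketch of how a consumer might tokenize such a line while staying tolerant of fields appended on the right. The names `TOKEN_RE` and `split_fields` are hypothetical, the sample is a shortened line, and the sketch assumes quoted values never contain a double quote themselves.

```python
import re

# A quoted field ("...") or a bare run of non-space characters.
# Assumption: quoted values never contain a double quote themselves.
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')

def split_fields(line):
    """Return the fields of one log line with surrounding quotes removed."""
    return [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in TOKEN_RE.finditer(line)
    ]

# Shortened, illustrative line: four unquoted fields, then quoted ones.
sample = (
    '15.14.13.12 1465461803.223 200 3158 '
    '"POST /payment_service_api/get_all_payment_methods.json HTTP/1.1" '
    '"0" "0" ""'
)
fields = split_fields(sample)

# Read known fields by position; anything appended later is simply ignored.
remote_addr, timestamp, status, bytes_sent, request = fields[:5]
print(status, request)
```

Reading fields by position from the left means a parser written today should keep working when new fields are appended.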

Field names:

Example line (sliced into smaller parts):

15.14.13.12 1465461803.223 200 3158 \
"POST /payment_service_api/get_all_payment_methods.json HTTP/1.1" "0" "0" \
"" "Singapore" "SG" "foobar-rbzr131343635343631383032ce9df9a6a90fa3bc" "-" \
0.729 "8.12.40.38:443" "0.418" "secure.foobar.com" "secure.foobar.com" "-" \
"REEBONZ 5.2.2 rv:3 (iPhone; iPhone OS 9.3.2; en_SG)" "-" \
"eyJob3N0Ijoic2VjdXJlLnJlZWJvbnouY29tIiwieC1uZXdyZWxpYy1pZCI6IlVnWURVRkJBQ1FzSlZGbGFCUT09IiwiY29udGVudC10eXBlIjoiYXBwbGljYXRpb25cL3gtd3d3LWZvcm0tdXJsZW5jb2RlZDsgY2hhcnNldD11dGYtOCIsImNvbm5lY3Rpb24iOiJjbG9zZSIsImNvbnRlbnQtbGVuZ3RoIjoiMTQ0IiwiYWNjZXB0LWVuY29kaW5nIjoiZ3ppcCIsInBvc3RfcmVxdWVzdF9ib2R5Ijp7ImNvdW50cnlfY29kZSI6IlNHIiwic2lnbmF0dXJlIjoiMGE2YzQwZTI3ZGRmMGYyNjcxMzM2YjhiYTljYjVmYzIiLCJkYXRldGltZSI6IjIwMTYtMDYtMDkgMTY6NDM6MjEiLCJkcnVwYWxfdWlkIjoiNTI0NTgxNiIsInBsYXRmb3JtX25hbWUiOiJNb2JpbGUiLCJidV9jb2RlIjoiMDEifSwidXNlci1hZ2VudCI6IlJFRUJPTlogNS4yLjIgcnY6MyAoaVBob25lOyBpUGhvbmUgT1MgOS4zLjI7IGVuX1NHKSJ9" \
"AS4773 MobileOne Ltd. Mobile/Internet Service Provider Singapore" "200" \
"/payment_service_api/get_all_payment_methods.json" "foobar-rbzr1" "0" "0" "0" "0" \
"0" "83d90abb12608623ea23442273249146" "466" "max-age=0, private, must-revalidate" \
"-" "-" "text/html; charset=utf-8" "-" "TLSv1.2" "ECDHE-RSA-AES128-GCM-SHA256" "-"

The headers field is a base64-encoded JSON string that contains the header names and values.

If we decide to block a request, either by ACL or by WAF/IPS, it is marked as "1" in the blocked field, and a description is given in the "block_reason" field.
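
As a rough sketch, blocked requests could be summarized from already-parsed records like this. The function name `block_summary` is hypothetical, and the `block_reason` strings in the sample are invented for illustration; actual values are not documented here.

```python
from collections import Counter

def block_summary(records):
    """Count block_reason values among records whose blocked field is "1"."""
    return Counter(
        r["block_reason"] for r in records if r.get("blocked") == "1"
    )

# Hypothetical records; the block_reason strings are made up for this demo.
records = [
    {"remote_addr": "15.14.13.12", "blocked": "0", "block_reason": ""},
    {"remote_addr": "192.0.2.10", "blocked": "1", "block_reason": "acl"},
    {"remote_addr": "192.0.2.11", "blocked": "1", "block_reason": "acl"},
]
print(block_summary(records))
```

Note that the blocked field is compared as the string "1", since quoted fields are parsed as text.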

Decoding the headers field yields the following:

{
  "host": "secure.reebonz.com",
  "x-newrelic-id": "UgYDUFBACQsJVFlaBQ==",
  "content-type": "application/x-www-form-urlencoded; charset=utf-8",
  "connection": "close",
  "content-length": "144",
  "accept-encoding": "gzip",
  "post_request_body": {
    "country_code": "SG",
    "signature": "0a6c40e27ddf0f2671336b8ba9cb5fc2",
    "datetime": "2016-06-09 16:43:21",
    "drupal_uid": "5245816",
    "platform_name": "Mobile",
    "bu_code": "01"
  },
  "user-agent": "REEBONZ 5.2.2 rv:3 (iPhone; iPhone OS 9.3.2; en_SG)"
}

Note that within the headers, we add an entry for the post_request_body.
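
The decoding step can be sketched as follows. The helper name `decode_headers` is hypothetical. Treating an empty value or "-" as "no headers" and tolerating stripped "=" padding are assumptions, not documented behavior. The demo uses a small made-up header set rather than a real log value.

```python
import base64
import json

def decode_headers(field):
    """Decode the base64 JSON headers field; return None for empty or "-"."""
    if not field or field == "-":
        return None
    # Assumption: tolerate values whose trailing "=" padding was stripped.
    padded = field + "=" * (-len(field) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))

# Round-trip demo with a small made-up header set (not a real log value).
sample = {
    "host": "secure.foobar.com",
    "post_request_body": {"country_code": "SG"},
}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

headers = decode_headers(encoded)
print(headers["host"])
print(headers["post_request_body"]["country_code"])
```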

Python Parser Example:

import re
access_line_rec = re.compile(r''' # -- line by line
    (\S+)\s         # remote_addr
    (\S+)\s         # timestamp
    (\S+)\s         # status
    (\S+)\s         # bytes_sent
    "(\S+)\s        # METHOD
    (.+)            # request
    \s(HTTP/\d\.\d)"\s     # PROTOCOL/Version
    "([^"]*)"\s     # blocked
    "([^"]*)"\s     # is_human
    "([^"]*)"\s     # block_reason
    "([^"]*)"\s     # geoip_city_country_name
    "([^"]*)"\s     # geoip_city
    "([^"]*)"\s     # request_id
    "([^"]*)"\s     # captured_vector
    (\S+)\s         # request_time
    "([^"]*)"\s     # upstream_addr
    "([^"]*)"\s     # upstream_response_time
    "([^"]*)"\s     # canonical_domain_name
    "([^"]*)"\s     # http_host
    "([^"]*)"\s     # referer
    "([^"]*)"\s     # user-agent
    "([^"]*)"\s     # cookie
    "([^"]*)"\s     # headers
    "([^"]*)"\s     # organization
    "([^"]*)"\s     # upstream_status
    "([^"]*)"\s     # uri
    "([^"]*)"\s     # hostname
    "([^"]*)"\s     # is_cloud
    "([^"]*)"\s     # is_tor
    "([^"]*)"\s     # is_vpn
    "([^"]*)"\s     # is_anonymizer
    "([^"]*)"\s     # is_proxy
    "([^"]*)"\s     # rbzsessionid
    "([^"]*)"\s     # request_length
    "([^"]*)"\s     # sent_http_cache_control
    "([^"]*)"\s     # sent_http_expires
    "([^"]*)"\s     # cookie_rbzid
    "([^"]*)"\s     # sent_http_content_type
    "([^"]*)"\s     # browsersig
    "([^"]*)"\s     # ssl_protocol
    "([^"]*)"\s     # ssl_cipher
    "([^"]*)"       # cache_status
    (.*)''', re.X)  # anything else

names = ("remote_addr","timestamp","status",
    "bytes_sent","method","request","proto_version", 
    "blocked","is_human","block_reason",
    "geoip_city_country_name","geoip_city",
    "request_id", "captured_vector", "request_time", "upstream_addr", 
    "upstream_response_time", "domain_name", "host", "referer", 
    "user_agent", "cookie", "request_headers", "organization", "upstream_status", 
    "uri", "hostname", "is_cloud", "is_tor", "is_vpn", "is_anonymizer", 
    "is_proxy", "rbzsessionid", "request_length", "sent_http_cache_control", 
    "sent_http_expires", "cookie_rbzid", "sent_http_content_type", "browsersig", 
    "ssl_protocol", "ssl_cipher", "cache_status", "anything_else")

def parse_line(line, as_dict=False):
    """Parse one access log line; return None if the line does not match."""
    rmatch = access_line_rec.match(line)
    if not rmatch:
        return None
    g_match = rmatch.groups()
    if as_dict:
        # TODO: check whether named regex groups would be faster
        return dict(zip(names, g_match))
    return g_match
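
The named-groups idea mentioned in the TODO could look roughly like the sketch below. It is an alternative under stated assumptions, not a measured improvement: `FIELDS`, `UNQUOTED`, `NAMED_RE` and `parse_line_named` are hypothetical names, and the field order is copied from the `names` tuple.

```python
import re

# Field order copied from the names tuple, minus the trailing catch-all.
FIELDS = (
    "remote_addr", "timestamp", "status", "bytes_sent",
    "method", "request", "proto_version",
    "blocked", "is_human", "block_reason",
    "geoip_city_country_name", "geoip_city",
    "request_id", "captured_vector", "request_time", "upstream_addr",
    "upstream_response_time", "domain_name", "host", "referer",
    "user_agent", "cookie", "request_headers", "organization",
    "upstream_status", "uri", "hostname", "is_cloud", "is_tor", "is_vpn",
    "is_anonymizer", "is_proxy", "rbzsessionid", "request_length",
    "sent_http_cache_control", "sent_http_expires", "cookie_rbzid",
    "sent_http_content_type", "browsersig", "ssl_protocol", "ssl_cipher",
    "cache_status",
)
# Fields that appear without surrounding quotes in the log line.
UNQUOTED = {"remote_addr", "timestamp", "status", "bytes_sent", "request_time"}

def _piece(name):
    """Return the regex fragment for one field."""
    # method, request and proto_version share a single quoted field.
    if name == "method":
        return r'"(?P<method>\S+)'
    if name == "request":
        return r'(?P<request>.+)'
    if name == "proto_version":
        return r'(?P<proto_version>HTTP/\d\.\d)"'
    if name in UNQUOTED:
        return r'(?P<%s>\S+)' % name
    return r'"(?P<%s>[^"]*)"' % name

# Fragments are separated by one whitespace character; the final group
# captures anything appended after cache_status.
NAMED_RE = re.compile(
    r'\s'.join(_piece(n) for n in FIELDS) + r'(?P<anything_else>.*)'
)

def parse_line_named(line):
    """Return a dict of field name to value, or None if the line does not match."""
    m = NAMED_RE.match(line)
    return m.groupdict() if m else None
```

With named groups the result is already a dict, so the separate `zip(names, ...)` step is not needed. Whether it is actually faster would have to be measured.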
