Skip to content

A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.

Notifications You must be signed in to change notification settings

shengxinbaixiaosheng/micro-analytics

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

µAnalytics

A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.

Schema

Principle

Analytics databases tend to grow fast and exponentially. Requesting data for one specific website from a single database thus become very slow over time. But analytics data are highly decoupled between two websites.

The idea behind µAnalytics is to shard your analytics data on a key, which is usually a website name. Each shard thus only contains a specific website data, allowing faster response times and easy horizontal scaling.

To handle requests even faster, µAnalytics automatically manages a pool of connections to multiple shards at a time.

By default, the service keeps 10 connections alive. But you can easily increase/decrease the max number of alive shards with the --connections flag when launching the app.

Install

$ go install github.com/GitbookIO/micro-analytics

Service launching

To launch the application, simply run:

$ ./micro-analytics

The command takes the following optional parameters:

Parameter Environment Variable Usage Type Default Value
--user, -u MA_USER  Username for basic auth String ""
--password, -w MA_PASSWORD  Password for basic auth String ""
--port, -p MA_PORT  Port to listen on String "7070"
--root, -r MA_ROOT Database directory String "./dbs"
--connections, -c MA_POOL_SIZE Max number of alive shards connections Number 1000
--idle-timeout, -i MA_POOL_TIMEOUT Idle timeout for DB connections in seconds Number 60
--cache-directory, -d MA_CACHE_DIR Cache directory String ".diskache"

If --user is provided, the service will automatically use basic access authentication on all requests.

The actual cache directory will be a subdirectory named after the app major version. The default will then be ./.diskache/0.

Analytics schema

All shards of the µAnalytics database share the same TABLE schema:

CREATE TABLE visits (
    time            INTEGER,
    event           TEXT,
    path            TEXT,
    ip              TEXT,
    platform        TEXT,
    refererDomain   TEXT,
    countryCode     TEXT
)

Service requests

GET requests

Common Parameters

Every query for a specific website can be executed using a time range. Every following GET request thus takes the two following optional query string parameters:

Name Type Description Default Example
start Date Start date to query a range none "2015-11-20T12:00:00.000Z"
end Date End date to query a range none "2015-11-21T12:00:00.000Z"

The dates can be passed either as:

  • ISO (RFC3339) "2015-11-20T12:00:00.000Z"
  • UTC (RFC1123) "Fri, 20 Nov 2015 12:00:00 GMT"
  • A Unix timestamp as a String "1448020800"
Common Aggregation Parameters
Name Type Description Default Example
unique Boolean Include the total number of unique visitors in response none true
Common Aggregation Response Values

Except for GET /:website, every response to a GET request will contain the two following values:

Name Type Description
total Integer Total number of visits
unique Integer Total number of unique visitors based on ip, set to 0 unless unique=true is passed as a query string parameter

GET /:website

Returns the full analytics for a website.

Response
{
    "list": [
        {
            "time": "2015-11-25T16:00:00+01:00",
            "event": "download",
            "path": "/somewhere",
            "ip": "127.0.0.1",
            "platform": "Windows",
            "refererDomain": "gitbook.com",
            "countryCode": "fr"
        },
    ...
    ]
}

GET /:website/count

Returns the count of analytics for a website. The unique query string parameter is not necessary for this request.

Response
{
    "total": 1000,
    "unique": 900
}

GET /:website/countries

Returns the number of visits per countryCode.

Response

label contains the country full name.

{
    "list": [
        {
            "id": "fr",
            "label": "France",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/platforms

Returns the number of visits per platform.

Response
{
    "list": [
        {
            "id": "Linux",
            "label": "Linux",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/domains

Returns the number of visits per refererDomain.

Response
{
    "list": [
        {
            "id": "gitbook.com",
            "label": "gitbook.com",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/events

Returns the number of visits per event.

Response
{
    "list": [
        {
            "id": "download",
            "label": "download",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/time

Returns the number of visits as a time serie. The interval in seconds can be specified as an optional query string parameter. Its default value is 86400, equivalent to one day.

Parameters
Name Type Description Default Example
interval Integer Interval of the time serie 86400 (1 day) 3600
Response

Example with interval set to 3600:

{
    "list": [
        {
            "start": "2015-11-24T12:00:00.000Z",
            "end": "2015-11-24T13:00:00.000Z",
            "total": 450,
            "unique": 390
        },
        {
            "start": "2015-11-24T13:00:00.000Z",
            "end": "2015-11-24T14:00:00.000Z",
            "total": 550,
            "unique": 510
        },
        ...
    ]
}

POST requests

POST /:website

Insert new data for the specified website.

POST Body
{
    "time": "2015-11-24T13:00:00.000Z", // optional
    "event": "download",
    "ip": "127.0.0.1",
    "path": "/README.md",
    "headers": {
        // ...
        // HTTP headers received from your visitor
    }
}

The time parameter is optional and is set to the date of your POST request by default.

Passing the HTTP headers in the POST body allows the service to extract the refererDomain and platform values. The countryCode will be deduced from the passed ip parameter using Maxmind's GeoLite2 database.

POST /:website/bulk

Insert a list of analytics for a specific website. The analytics can be sent directly in DB format, with time being a String value.

time can be passed as either:

  • ISO (RFC3339) "2015-11-20T12:00:00.000Z"
  • UTC (RFC1123) "Fri, 20 Nov 2015 12:00:00 GMT"
  • A Unix timestamp as a String "1448020800"

If the time parameter is not provided, it will be defaulted to the exact time of the server processing the POST request.

As for the POST /:website method, the analytics can also have an optional headers parameter. If the refererDomain and/or platform values are not passed in the JSON body, the headers parameter will be used to set these values automatically.

POST Body
{
    "list": [
        {
            "time": "1450098642",
            "ip": "127.0.0.1",
            "event": "download",
            "path": "/somewhere",
            "platform": "Apple Mac",
            "refererDomain": "www.gitbook.com",
            "countryCode": "fr"
        },
        {
            "time": "2015-11-20T12:00:00.000Z",
            "ip": "127.0.0.1",
            "event": "login",
            "path": "/someplace",
            "headers": {
                // ...
                // HTTP headers received from your visitor
            }
        }
    ]
}

The countryCode will be reprocessed by the service using GeoLite2 based on the ip.

POST /bulk

Insert a list of analytics for different websites. The analytics have the same format as POST /:website/bulk, with a mandatory website parameter.

POST Body
{
    "list": [
        {
            "website": "website-1",
            "time": "1450098642",
            "ip": "127.0.0.1",
            "event": "download",
            "path": "/somewhere",
            "platform": "Apple Mac",
            "refererDomain": "www.gitbook.com",
            "countryCode": "fr"
        },
        {
            "website": "website-2",
            "time": "2015-11-20T12:00:00.000Z",
            "ip": "127.0.0.1",
            "event": "login",
            "path": "/someplace",
            "headers": {
                // ...
            }
        }
    ]
}

DELETE requests

DELETE /:website

Fully delete a shard from the file system.

About

A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 98.1%
  • Shell 1.9%