Building Policy Based Access Control to AWS API Gateway Using Open Policy Agent

Yash Tewari March 18, 2021

Implementing Geofencing, Rate Limiting and Group-Based Access Control

Introduction

Now more than ever, developers are tasked with building better access controls for their APIs. By implementing rate limiting, geofencing or group-based access controls, developers can build dynamic authorization that decides who can access an API, when, from where and how often. Open Policy Agent (OPA) is becoming the de facto standard for policy-based control. So we thought to ourselves: why not connect an OPA server to AWS API Gateway and introduce Policy Based Access Control (PBAC) to the gateway?

This blog post shows how to use OPA with AWS API Gateway to implement geofencing, rate limiting and other policy-based access controls at the gateway.

In this step-by-step guide, we’ll demonstrate how we were able to enforce the following policies on incoming requests to the API gateway using Open Policy Agent:

  1. A simple user whitelist (by JWT, API key or other custom authorization header)
  2. Time of day when the request is made
  3. The location from which the request is made (also known as geofencing)
  4. User throttling – has the user exhausted their request quota
  5. Group throttling – have the members of a user group exhausted their total request quota

To make this happen, we needed to implement two new built-in functions within the OPA runtime environment and Rego language. The first function converts an IP address into a geolocation object, and the second uses distributed key-value storage to provide rate limiting. In this example, we hooked into AWS ElastiCache (Managed Redis).

Overview

[Figure: OPA and AWS API Gateway diagram]

We set up an API Gateway endpoint with a mock backend, and attached a Custom Authorizer Lambda to it. An Authorizer Lambda is just a normal AWS Lambda function that can be associated with an API Gateway endpoint. This runs simple Python code in order to call an OPA server that is running on an EC2 instance within the same subnet. It packages the request context as `input` for the Rego policy hosted on the server, and packages the response from OPA in a way that API Gateway can understand.
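The authorizer flow described above can be sketched roughly as follows. This is a minimal illustration and not the Lambda code linked from this post: the OPA address, the `handler`, `query_opa` and `build_response` names, and the choice to pass the denial message back in the authorizer `context` are all assumptions. It relies on OPA's Data API (`POST /v1/data/<path>` with an `input` document) and on the IAM policy document shape that API Gateway expects from a Lambda authorizer.

```python
import json
import urllib.request

# Hypothetical address: the post only says OPA runs on an EC2 instance
# in the same subnet. 8181 is OPA's default port; demo/allow matches
# `package demo` and the `allow` rule in the policy.
OPA_URL = "http://10.0.0.10:8181/v1/data/demo/allow"


def query_opa(opa_input, url=OPA_URL):
    """POST the input document to OPA's Data API and return `result`."""
    body = json.dumps({"input": opa_input}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=3) as resp:
        return json.load(resp).get("result", False)


def build_response(principal, effect, resource, message=""):
    """Shape the decision as the IAM policy document API Gateway expects."""
    return {
        "principalId": principal,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
        # Assumption: surface the denial message via the authorizer context.
        "context": {"message": message},
    }


def handler(event, context, query=query_opa):
    """Lambda entry point; `query` is injectable so the flow can be tested offline."""
    headers = event.get("headers") or {}
    opa_input = {
        "headers": headers,
        "requestContext": event.get("requestContext", {}),
    }
    result = query(opa_input)
    user = headers.get("user", "anonymous")
    # The policy returns `true` on success and an error message otherwise.
    if result is True:
        return build_response(user, "Allow", event["methodArn"])
    return build_response(user, "Deny", event["methodArn"], message=str(result))
```

Keeping the OPA call behind an injectable `query` argument is a design choice for this sketch only; it lets the packaging logic be exercised without a running OPA server.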

The API Gateway mock endpoint is a basic pet store example.

[Figure: API Gateway OPA authorizer]

The endpoint is configured to use an AWS Lambda function for authorization. The code used in the AWS Lambda can be found here.

New Built-ins

We have created two custom OPA built-ins that are used inside the policy code: one for converting IP addresses to geolocation data, and another that connects to a Redis shared memory server endpoint to implement rate limiting across multiple OPA instances.

Both built-in functions are open sourced and can be accessed here.

geo_from_ip(ip_address)

This function returns a detailed geolocation object for the given `ip_address` using the MaxMind GeoLite2 database. For example, if the `ip_address` is `"27.7.199.7"`, then the response object is:

{
  "City": {
    "Names": {
      "de": "Delhi",
      "en": "Delhi",
      "es": "Delhi",
      "fr": "Delhi",
      "ja": "デリー",
      "pt-BR": "Deli",
      "ru": "Дели",
      "zh-CN": "德里"
    }
  },
  "Continent": {
    "Code": "AS",
    "Names": {
      "de": "Asien",
      "en": "Asia",
      "es": "Asia",
      "fr": "Asie",
      "ja": "アジア",
      "pt-BR": "Ásia",
      "ru": "Азия",
      "zh-CN": "亚洲"
    }
  },
  "Country": {
    "IsInEuropeanUnion": false,
    "IsoCode": "IN",
    "Names": {
      "de": "Indien",
      "en": "India",
      "es": "India",
      "fr": "Inde",
      "ja": "インド",
      "pt-BR": "Índia",
      "ru": "Индия",
      "zh-CN": "印度"
    }
  },
  "Location": {
    "AccuracyRadius": 1,
    "Latitude": 28.6858,
    "Longitude": 77.231,
    "MetroCode": 0,
    "TimeZone": "Asia/Kolkata"
  },
  "Postal": {
    "Code": "110054"
  },
  "RegisteredCountry": {
    "IsInEuropeanUnion": false,
    "IsoCode": "IN",
    "Names": {
      "de": "Indien",
      "en": "India",
      "es": "India",
      "fr": "Inde",
      "ja": "インド",
      "pt-BR": "Índia",
      "ru": "Индия",
      "zh-CN": "印度"
    }
  },
  "RepresentedCountry": {
    "GeoNameID": 0,
    "IsInEuropeanUnion": false,
    "IsoCode": "",
    "Names": null,
    "Type": ""
  },
  "Subdivisions": [
    {
      "IsoCode": "DL",
      "Names": {
        "de": "Delhi",
        "en": "National Capital Territory of Delhi",
        "fr": "Delhi"
      }
    }
  ],
  "Traits": {
    "IsAnonymousProxy": false,
    "IsSatelliteProvider": false
  }
}

rate_limit(key, limit)

This function returns `true` if `limit` has been reached for the given `key`.

In this example, multiple OPA instances can connect to Redis shared memory and make calls using this built-in. The time window over which the limit applies is set on the server (in our case, hard-coded to 1 minute).

So for example, if the time duration is 1 minute, then `rate_limit(key, limit)` will return `false` for the first `limit` calls made for the given `key` from across all OPA servers, within the last 1 minute. For every call thereon, it will return `true`.
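The semantics described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions and not the built-in's implementation: the post does not say which windowing algorithm the built-in uses, so a fixed window is assumed here, and a Python dictionary stands in for the shared Redis store. The `FixedWindowLimiter` class and its `clock` parameter are hypothetical names introduced for illustration.

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for a shared counter store such as Redis."""

    def __init__(self, window_seconds=60, clock=time.time):
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so tests can control time
        self._counters = {}         # key -> (window_start, call_count)

    def rate_limit(self, key, limit):
        """Return True once `key` has made more than `limit` calls in the window."""
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window expired: start a fresh one.
            start, count = now, 0
        count += 1
        self._counters[key] = (start, count)
        return count > limit
```

With a limit of 2, the first two calls for a key return `False` and the third returns `True`; once the window has elapsed, the counter starts over. In a real deployment the counter has to live in shared storage so that all OPA instances see the same count.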

Policy code – putting it all together

The policy contains multiple checks (rules) that run against incoming requests, divided into two functional categories: `stateless_checks` and `stateful_checks`.

The idea is that `stateful_checks`, which affect the Redis shared memory state, should only be run if all `stateless_checks` are passing (evaluated to true). In any other case we’ll want to short-circuit and return a “deny” decision to the requester.

Along with context from `input`, the policy also refers to a Data Source. In this case we’re using mock data to demonstrate how different sources of information in an organization can provide context for OPA policy decisions.

Let’s go over these checks one by one:

allow_time

A simple rule to demonstrate how time-limit conditions stored in a database can be applied. The request time is picked up from the API Gateway request context.

epoch_ms = input.requestContext.requestTimeEpoch
request_time := time.clock(epoch_ms*1000000)
allow_time {
  # The starting and ending time, in 24-hour UTC format, is defined for each user.
  start_t := user_ctx.start_time
  end_t := user_ctx.end_time
  request_time[0] >= start_t
  request_time[0] < end_t
}
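To make the time arithmetic concrete: API Gateway reports `requestTimeEpoch` in milliseconds, the rule multiplies it by 1,000,000 to get the nanoseconds that `time.clock` expects, and `time.clock` returns `[hour, minute, second]` in UTC, so `request_time[0]` is the UTC hour. A rough Python equivalent of the comparison, with hypothetical function names, might look like this:

```python
from datetime import datetime, timezone


def utc_hour(epoch_ms):
    """UTC hour (0-23) of a millisecond epoch timestamp."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).hour


def allow_time(epoch_ms, start_hour, end_hour):
    """Mirror of the rule: start_hour <= request hour < end_hour."""
    return start_hour <= utc_hour(epoch_ms) < end_hour
```

For example, `1616068800000` is 2021-03-18 12:00:00 UTC, so `allow_time(1616068800000, 9, 17)` holds. Note that, as written, a window that crosses midnight (a start hour greater than the end hour) can never match.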

allow_geo

A geofencing rule that uses the IP address available in the API Gateway request context and converts it to a geolocation using our new built-in.

ip := input.requestContext.identity.sourceIp
allow_geo {
  geo := build.geo_from_ip(ip)
  # Match the IP address geolocation to the allowed geolocations
  # for this user.
  geo.Subdivisions[_].IsoCode == user_ctx.subdivisions[_]
}
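The last line of the rule succeeds when any subdivision code in the geolocation object equals any code in the user's allowed list. In Python terms this is a set intersection; the sketch below uses a hypothetical `allow_geo` function and assumes the response object shape shown earlier for `geo_from_ip`.

```python
def allow_geo(geo, allowed_subdivisions):
    """True if any subdivision ISO code of the lookup is in the allowed list."""
    subdivisions = geo.get("Subdivisions") or []
    codes = {s.get("IsoCode") for s in subdivisions}
    return bool(codes & set(allowed_subdivisions))
```

With the Delhi example above, the lookup yields the code `"DL"`, so a user whose allowed subdivisions include `"DL"` passes the check, and a lookup with no subdivisions fails it.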

stateless_checks_failed

A rule that finds whether the `stateless_checks` are passing or not.

stateless_checks = [allow_user, allow_time, allow_geo]

stateless_checks_failed {
  check := stateless_checks[_]
  not check == true
}

allow_user_rate

A rule for per-user rate limiting. The custom built-in `build.rate_limit` accepts a key that uniquely identifies the counter being limited, so we construct a key of the form `user:<user>`. The limit for this user is derived from the database.

allow_user_rate {
  not stateless_checks_failed
  key := concat("", ["user:", user])
  limit := build.rate_limit(key, user_ctx.rate)
  limit == false
}

allow_group_rate

A rule for per-group rate limiting.

allow_group_rate {
  not stateless_checks_failed
  key := concat("", ["group:", user_ctx.group])
  limit := build.rate_limit(key, group_ctx.rate)
  limit == false
}

Finally, we put it all together.

If all checks are passing, `allow` should evaluate to `true`. Otherwise, `allow` should evaluate to the error message of the first failing check.

allow {
  passed := [x | x := all_checks[_]; x == true]
  count(passed) == count(all_checks)
}

allow = message {
  failed := [x | x := all_checks[_]; x != true]
  message := failed[0]
}
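The decision logic can be paraphrased in Python as a rough model. This is not how OPA evaluates rules internally; `decide` and its argument shapes are hypothetical. Stateless checks are given as `(passed, message)` pairs, and stateful checks as `(callable, message)` pairs so that they run only when every stateless check has passed.

```python
def decide(stateless_checks, stateful_checks):
    """Return True if every check passes, else the first failing check's message."""
    failed = [message for passed, message in stateless_checks if not passed]
    if not failed:
        # Stateful checks touch the shared counters, so run them only
        # after all stateless checks have passed.
        failed = [message for check, message in stateful_checks if not check()]
    return failed[0] if failed else True
```

Because the stateless checks come first in the list, a request that fails both a stateless check and a rate limit is always reported with the stateless message, and its rate-limit counters are never touched.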

The full policy will look like this:

package demo
default allow = false

#import data.dynamo

# Set defaults for rules.
default allow_user = "You are not authorized to access this service"
default allow_time = "You cannot access this service in this time period"
default allow_geo = "You cannot access this service from your location"

default stateless_checks_failed = false

default allow_user_rate = "You have exceeded your user request quota for this service"
default allow_group_rate = "You have exceeded your group request quota for this service"

# Based on the architecture, an identity provider will convert
# tokens held by the user to a user identifier. In this demo, the
# user is directly accepted as a header.
user := input.headers.user

# Authorization context for this user is fetched from a Data Source,
# for example an AWS DynamoDB table.
user_ctx := data.datasources.internal.users[user]
group_ctx := data.datasources.internal.groups[user_ctx.group]

# Handle basic user check.
allow_user = x {
  user_ctx
  x := true
}

# Handle request time conditions.
epoch_ms = input.requestContext.requestTimeEpoch
request_time := time.clock(epoch_ms*1000000)

allow_time {
  # The starting and ending time, in 24-hour UTC format, is defined for each user.
  start_t := user_ctx.start_time
  end_t := user_ctx.end_time
    
  request_time[0] >= start_t
  request_time[0] < end_t
}

# Handle request geolocation conditions.
ip := input.requestContext.identity.sourceIp

allow_geo {
  geo := build.geo_from_ip(ip)
  
  # Match the IP address geolocation to the allowed geolocations
  # for this user.
  geo.Subdivisions[_].IsoCode == user_ctx.subdivisions[_]
}

stateless_checks = [allow_user, allow_time, allow_geo]

# The following rules affect state, specifically the Redis cache cluster
# used for rate limiting. They should only be evaluated if the previous
# 'stateless' rules have passed successfully.
stateless_checks_failed {
  check := stateless_checks[_]
  not check == true
}

# Handle user rate limiting conditions.
allow_user_rate {
  not stateless_checks_failed

  key := concat("", ["user:", user])
  limit := build.rate_limit(key, user_ctx.rate)
  
  limit == false
}

# Handle group rate limiting conditions.
allow_group_rate {
  not stateless_checks_failed
  
  key := concat("", ["group:", user_ctx.group])
  limit := build.rate_limit(key, group_ctx.rate)
  
  limit == false
}

stateful_checks = [allow_user_rate, allow_group_rate]

all_checks = array.concat(stateless_checks, stateful_checks)

# This rule verifies all required conditions, and also decides
# the message to be shown to the user based on the type of auth denial.
allow {
  passed := [x | x := all_checks[_]; x == true]
  count(passed) == count(all_checks)
}

allow = message {
  failed := [x | x := all_checks[_]; x != true]
  message := failed[0]
}

As you can see, OPA is a general-purpose policy engine. In this example we’ve used OPA and the Rego language to define fine-grained access controls on top of AWS API Gateway.

About Us

build.security is an authorization policy management platform that simplifies building RBAC/ABAC and API Authorization for your applications. Powered by OPA, our platform empowers you to create fine-grained access controls from any external data source in your technology stack, providing a single pane of glass for policy visibility and management. build.security’s control plane makes authoring, testing and managing policies a seamless process and can be deployed without a single line of code.  
