Ruby Curator for AWS ElasticSearch Service

A Ruby alternative to the Python Curator for ElasticSearch.

If you use Amazon Web Services' ElasticSearch Service, there's a good chance you will need to manage your indices so they do not overwhelm your ElasticSearch cluster. As of this writing, the AWS documentation page lists four options, and the best fit for your use case depends on which version of the ElasticSearch engine you are using. Three of the options seem to be native to AWS, and the other is an outside resource written in Python. This blog post is not intended to convince you to use one option over another; it simply provides an alternative to one of them, Curator, which is written in Python.

At Custom Ink, we primarily support Ruby on Rails apps, and we felt this was a good opportunity to build something more widely understood and supportable by our engineers. After searching online for a Ruby alternative to the Curator lambda, I was unable to find one, so we went to work on writing our own.

I used Lamby as the framework for this Lambda, which made a lot of the initial setup very easy! The first step in the process is to create an IAM user for the Lambda function.

Create an IAM user for your lambda function

You should first create an IAM user that will be used by the lambda to invalidate/delete indices. This is what the policy for this user should look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet",
                "es:ESHttpPut",
                "es:ESHttpDelete"
            ],
            "Resource": "arn:aws:es:REGION:ACCOUNT:domain/elasticsearch-cluster/*"
        }
    ]
}

If the resource in the above policy is too broad, you can narrow access further to just the endpoint that controls the indices. Our Lambda was in the VPC and only affected our one cluster, so we felt okay with granting access to more than that endpoint in our access policy.

Under the Security Credentials tab, generate access keys and then save those keys in SSM. You can save the access key ID and secret access key in the same secret or in separate secrets; just be mindful that you may be charged per API call for SSM. Below is the example I have for this function:

{
  "access_key_id": "SUPERSECRETACCESSKEYID",
  "secret_access_key": "SUPERSECRETACCESSKEY"
}
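
Since both values live in one JSON secret, it can help to check that the secret parses and contains both keys before building the client. Here is a minimal sketch of that check; parse_keys and REQUIRED_KEYS are hypothetical names for illustration and are not part of the curator code.

```ruby
require 'json'

# Hypothetical helper: the two key names match the secret shown above.
REQUIRED_KEYS = %w[access_key_id secret_access_key].freeze

# Parses the JSON secret and fails fast if a key is missing or empty.
def parse_keys(raw_json)
  keys = JSON.parse(raw_json)
  missing = REQUIRED_KEYS.reject { |k| keys[k].is_a?(String) && !keys[k].empty? }
  raise KeyError, "secret is missing: #{missing.join(', ')}" unless missing.empty?

  keys
end

secret = '{"access_key_id": "SUPERSECRETACCESSKEYID", "secret_access_key": "SUPERSECRETACCESSKEY"}'
keys = parse_keys(secret)
puts keys.keys.inspect
```

Failing early like this gives a clearer error in the Lambda logs than a signature failure from the cluster later on.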

Create your serverless function & code

Below is the core of the work: the curator.rb file, which lives in the lib directory. As you can see, it's fairly straightforward. You simply have to add the full URL and port for your ElasticSearch cluster, your region, and your account number. So that the lambda can actually sign its requests, the handler reads the access keys saved in SSM from the parameter path shown in the code. This deletes indices older than 7 days, but it is completely customizable depending on your needs.

# Add your gem requires here:
require 'aws-sdk'
require 'faraday_middleware/aws_sigv4'
require 'elasticsearch'
# to convert json from ssm
require 'json'
# for Date.parse and Date.today
require 'date'
require_relative 'curator/ssm'

def handler(event:, context:)
  full_url_and_port = 'https://my-domain.region.es.amazonaws.com:443'
  region = 'us-east-1' # e.g. us-west-1
  account = '1234567890'

  keys = JSON.parse(SSM.get_parameter('/path/to-elasticsearch-keys'))


  client = Elasticsearch::Client.new(url: full_url_and_port) do |f|
    f.request :aws_sigv4,
      service: 'es',
      region: region,
      access_key_id: keys['access_key_id'],
      secret_access_key: keys['secret_access_key']
  end
  response = client.perform_request 'GET', '_cat/indices?format=JSON'
  p response.body
  # gather the index names here
  indices = response.body.map { |a| a['index'] }
  indices.each do |index|
    # String#match returns a MatchData object, or nil if the name has no date
    date = index.match(/\d{4}-\d{2}-\d{2}/)
    if date
      date = Date.parse date.to_s
      if date < Date.today - 7
        puts client.indices.delete index: index
        p "Deleted index: #{index}"
      end
    end
  end
end
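
The date check inside the handler can also be pulled out into a small function, which makes it easy to try the retention rule locally without touching a cluster. This is only a sketch under a few assumptions: expired_indices and INDEX_DATE are hypothetical names, the index names carry a YYYY-MM-DD date as in the handler, and names that match the pattern but are not real dates are skipped rather than raising.

```ruby
require 'date'

# Same pattern the handler uses to find a date in an index name.
INDEX_DATE = /\d{4}-\d{2}-\d{2}/

# Returns the index names whose embedded date is older than the cutoff.
def expired_indices(names, today: Date.today, retention_days: 7)
  cutoff = today - retention_days
  names.select do |name|
    stamp = name[INDEX_DATE]
    next false unless stamp # no date in the name, e.g. ".kibana"

    begin
      Date.strptime(stamp, '%Y-%m-%d') < cutoff
    rescue ArgumentError
      false # matches the pattern but is not a real date, e.g. "2020-13-45"
    end
  end
end

names = ['logs-2020-01-01', 'logs-2020-01-09', '.kibana']
p expired_indices(names, today: Date.new(2020, 1, 10))
# prints ["logs-2020-01-01"]
```

Passing today in as an argument keeps the function deterministic, so the rule can be checked against fixed dates.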

In the above code, note that you can pass in the account ID and region as ENV values, along with the ES domain, if you want to make the lambda more generic or deploy it across different development environments or AWS accounts.
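
As a rough sketch of what that could look like, the function below reads the settings from the environment instead of hardcoding them. The variable names ES_URL, ES_KEYS_SSM_PATH, and RETENTION_DAYS are assumptions made up for this example, as is the curator_config helper; AWS_REGION is normally set by the Lambda runtime itself.

```ruby
# Hypothetical helper: reads curator settings from ENV (or any Hash-like object).
# ES_URL, ES_KEYS_SSM_PATH and RETENTION_DAYS are assumed names, not AWS defaults.
def curator_config(env = ENV)
  {
    url: env.fetch('ES_URL'), # required; raises KeyError if unset
    region: env.fetch('AWS_REGION', 'us-east-1'),
    ssm_path: env.fetch('ES_KEYS_SSM_PATH', '/path/to-elasticsearch-keys'),
    retention_days: Integer(env.fetch('RETENTION_DAYS', '7'))
  }
end

# A plain Hash stands in for ENV here so the sketch runs anywhere.
config = curator_config({ 'ES_URL' => 'https://my-domain.region.es.amazonaws.com:443' })
puts config[:retention_days]
```

In the template.yaml, these would be added under Environment > Variables next to STAGE_ENV.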

Inside the lib directory, a curator folder holds the Ruby code for an SSM class that fetches the SSM variables from AWS. It is shared below:

# uses AWS SDK SSM gem
require 'aws-sdk-ssm'

class SSM
  def self.get_parameter(name)
    ssm = Aws::SSM::Client.new()
    ssm_response = ssm.get_parameter({
      name: name,
      with_decryption: true,
    })
    ssm_response.parameter.value
  end
end
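
If the function ends up running more often than once a day, one way to cut down on SSM calls is to cache the value for the life of the Lambda execution environment. The wrapper below is a hypothetical sketch, not part of our curator: CachedParameters is an invented name, and a stub stands in for the SSM class so the example runs without AWS access.

```ruby
# Hypothetical wrapper: remembers each parameter after the first lookup,
# so warm invocations in the same execution environment skip the SSM call.
class CachedParameters
  def initialize(&fetcher)
    @fetcher = fetcher
    @cache = {}
  end

  def get(name)
    @cache.fetch(name) { @cache[name] = @fetcher.call(name) }
  end
end

# In the Lambda this would wrap the SSM class above, defined outside the handler:
#   PARAMS = CachedParameters.new { |name| SSM.get_parameter(name) }
# Here a stub counts how many times the fetcher is actually called.
calls = 0
params = CachedParameters.new { |name| calls += 1; "value-for-#{name}" }
params.get('/path/to-elasticsearch-keys')
params.get('/path/to-elasticsearch-keys')
puts calls
```

The trade-off is that rotated keys are not picked up until a new execution environment starts, and with a once-a-day schedule most invocations will be cold starts anyway.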

Be sure to add these two gems to your Gemfile as well:

gem 'elasticsearch', "~> 6.7" # change depending on your version
gem 'faraday_middleware-aws-sigv4'

Write your template.yaml CloudFormation

Lastly, we have a template.yaml for CloudFormation, which generates these resources. You can see it below:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Curator

Parameters:

  StageEnv:
    Type: String
    Default: development
    AllowedValues:
      - test
      - development
      - staging
      - prod

Resources:

  CuratorLambda:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: lib/curator.handler
      Runtime: ruby2.7
      Timeout: 60
      MemorySize: 512
      FunctionName: !Sub curator-${StageEnv}
      Environment:
        Variables:
          STAGE_ENV: !Ref StageEnv
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - ssm:GetParametersByPath
                - ssm:GetParameters
                - ssm:GetParameterHistory
                - ssm:GetParameter
                - kms:Decrypt
              Resource:
                - arn:aws:ssm:*:1234567890:parameter/path/to-elasticsearch-keys
                - arn:aws:kms:*:1234567890:key/key-string-12345
      Events:
        Curator:
          Type: Schedule
          Properties:
            Schedule: 'cron(0 8 ? * * *)' # UTC so 3AM EST          

Outputs:

  CuratorLambdaArn:
    Description: Lambda Function Arn
    Value: !GetAtt CuratorLambda.Arn

That's it! Now you have a serverless resource that can help you manage your indices in AWS ElasticSearch. If you have any questions, please feel free to leave a comment! I hope this is helpful to others out there who have found it hard to find a Ruby version of an ElasticSearch lambda curator. This work was done with William Spencer, the WebOps manager at Custom Ink.

by Katherine Cisneros