A Ruby alternative to the Python Curator for ElasticSearch.
If you use Amazon Web Services' ElasticSearch Service, there's a good chance you will need to manage your indices so they do not overwhelm your ElasticSearch cluster. As of this writing, the AWS documentation page lists four options, and the best fit for your use case depends on which version of the ElasticSearch engine you are running. Three of the options seem to be native to AWS; the fourth, Curator, is an outside resource written in Python. This blog post is not intended to convince you to use one option over another. It simply provides a Ruby alternative to Curator.
At Custom Ink, we primarily support Ruby on Rails apps, so we saw this as a good opportunity to build something more widely understood and supportable by our engineers. After searching online for a Ruby alternative to the Curator Lambda and finding nothing, we went to work on writing our own.
I used Lamby as the framework for this Lambda, which made a lot of the initial setup very easy! The first step in the process is to set up IAM access for the Lambda function.
You should first create an IAM user that the Lambda will use to list and delete indices. This is what the policy for this user should look like:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpDelete"
      ],
      "Resource": "arn:aws:es:REGION:ACCOUNT:domain/elasticsearch-cluster/*"
    }
  ]
}
If the above policy grants access to too broad a resource, you can narrow it further to just the endpoints that control the indices. Our Lambda was in the VPC and only affected our one cluster, so we felt okay with granting access to more than those endpoints.
Under the Security Credentials tab, generate access keys for the user and save them in SSM. You can save the access key ID and the secret access key in the same parameter or in separate parameters; just be mindful that SSM can charge per API call, depending on your parameter tier and throughput settings. Below is the example I have for this function:
{
  "access_key_id": "SUPERSECRETACCESSKEYID",
  "secret_access_key": "SUPERSECRETACCESSKEY"
}
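Since the Lambda depends on both fields being present in that JSON, it can help to validate the parameter value before building the client. The following is only a minimal sketch: `parse_elasticsearch_keys` and `REQUIRED_KEYS` are hypothetical names that are not part of the original code.

```ruby
require 'json'

# Hypothetical helper (not in the original Lambda): parses the JSON stored in
# SSM and fails fast when one of the expected fields is missing or blank.
REQUIRED_KEYS = %w[access_key_id secret_access_key].freeze

def parse_elasticsearch_keys(raw_json)
  keys = JSON.parse(raw_json)
  raise ArgumentError, 'SSM parameter must contain a JSON object' unless keys.is_a?(Hash)

  missing = REQUIRED_KEYS.select { |key| keys[key].to_s.strip.empty? }
  raise ArgumentError, "missing keys in SSM parameter: #{missing.join(', ')}" unless missing.empty?

  keys
end
```

With a helper like this, a misconfigured parameter surfaces as a clear error in the Lambda logs instead of as a signature failure from the cluster.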
Below is the core of the work: the curator.rb file, which lives in the lib directory. As you can see, it's fairly straightforward. You simply have to add the full URL and port for your ElasticSearch cluster, your region, and your account number. To sign its requests, the Lambda has to read the access keys saved in SSM, and the template shown later grants it permission to do so. This will delete indices older than 7 days, but the cutoff is completely customizable depending on your needs.
# Add your gem requires here:
require 'aws-sdk'
require 'faraday_middleware/aws_sigv4'
require 'elasticsearch'
# to convert JSON from SSM
require 'json'
# Date.parse and Date.today are used below
require 'date'
require_relative 'curator/ssm'

def handler(event:, context:)
  full_url_and_port = 'https://my-domain.region.es.amazonaws.com:443'
  region = 'us-east-1' # e.g. us-west-1
  account = '1234567890' # not referenced in the code below
  keys = JSON.parse(SSM.get_parameter('/path/to-elasticsearch-keys'))

  # sign every request with the IAM user's access keys
  client = Elasticsearch::Client.new(url: full_url_and_port) do |f|
    f.request :aws_sigv4,
              service: 'es',
              region: region,
              access_key_id: keys['access_key_id'],
              secret_access_key: keys['secret_access_key']
  end

  response = client.perform_request 'GET', '_cat/indices?format=JSON'
  p response.body

  # gather index names here
  indices = response.body.map { |a| a['index'] }
  indices.each do |index|
    # String#match returns a MatchData object, or nil when the name has no date
    date = index.match(/\d{4}-\d{2}-\d{2}/)
    next unless date

    date = Date.parse date.to_s
    if date < Date.today - 7
      puts client.indices.delete index: index
      puts "Deleted index: #{index}"
    end
  end
end
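The date check inside the loop can also be pulled out into a small pure function, which makes the retention rule easy to test without a cluster. This is a sketch under assumptions rather than the code we deployed: `expired_indices` and its `today:` and `retention_days:` parameters are hypothetical names.

```ruby
require 'date'

# Hypothetical helper: returns the index names whose embedded YYYY-MM-DD date
# is more than `retention_days` days before `today`.
def expired_indices(index_names, today: Date.today, retention_days: 7)
  cutoff = today - retention_days
  index_names.select do |name|
    match = name.match(/\d{4}-\d{2}-\d{2}/)
    next false unless match # names without a date are never selected

    begin
      Date.parse(match[0]) < cutoff
    rescue ArgumentError
      false # the pattern matched, but the digits are not a real calendar date
    end
  end
end
```

The handler's loop could then be reduced to something like `expired_indices(indices).each { |index| client.indices.delete(index: index) }`.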
Note that in the handler above, the account ID and region can be passed in as ENV values, along with the ES domain, if you want to make the Lambda more generic or deploy it across different development environments or AWS accounts.
In the curator folder under lib, we have the Ruby code that defines an SSM class to fetch the SSM parameters from AWS. It is shared below:
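As a rough illustration of that idea, the hard-coded values could be read from the environment in one place. The variable names used here (`ES_URL`, `ES_REGION`, `RETENTION_DAYS`, `KEYS_PARAMETER_PATH`) and the `CuratorConfig` struct are assumptions made for this sketch and do not appear in the original code.

```ruby
# Hypothetical configuration object; all field and variable names are illustrative.
CuratorConfig = Struct.new(:url, :region, :retention_days, :keys_path, keyword_init: true)

# Builds the configuration from a Hash-like environment (ENV by default).
# ES_URL is required; the other values fall back to the defaults used in the post.
def config_from_env(env = ENV)
  CuratorConfig.new(
    url: env.fetch('ES_URL'),
    region: env.fetch('ES_REGION', 'us-east-1'),
    retention_days: Integer(env.fetch('RETENTION_DAYS', '7'), 10),
    keys_path: env.fetch('KEYS_PARAMETER_PATH', '/path/to-elasticsearch-keys')
  )
end
```

Each of these variables would also need an entry in the function's environment configuration. Using `fetch` without a default for `ES_URL` means a missing value raises a `KeyError` at startup rather than failing later against the wrong cluster.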
# uses AWS SDK SSM gem
require 'aws-sdk-ssm'

class SSM
  # Returns the decrypted value of the named SSM parameter
  def self.get_parameter(name)
    ssm = Aws::SSM::Client.new
    ssm_response = ssm.get_parameter(
      name: name,
      with_decryption: true
    )
    ssm_response.parameter.value
  end
end
Be sure to add these two gems to your Gemfile as well:
gem 'elasticsearch', "~> 6.7" # change depending on your version
gem 'faraday_middleware-aws-sigv4'
Lastly, we have a template.yaml for CloudFormation, which generates these resources. You can see it below:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Curator
Parameters:
  StageEnv:
    Type: String
    Default: development
    AllowedValues:
      - test
      - development
      - staging
      - prod
Resources:
  CuratorLambda:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Handler: lib/curator.handler
      Runtime: ruby2.7
      Timeout: 60
      MemorySize: 512
      FunctionName: !Sub curator-${StageEnv}
      Environment:
        Variables:
          STAGE_ENV: !Ref StageEnv
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - ssm:GetParametersByPath
                - ssm:GetParameters
                - ssm:GetParameterHistory
                - ssm:GetParameter
                - kms:Decrypt
              Resource:
                - arn:aws:ssm:*:1234567890:parameter/path/to-elasticsearch-keys
                - arn:aws:kms:*:1234567890:key/key-string-12345
      Events:
        Curator:
          Type: Schedule
          Properties:
            Schedule: 'cron(0 8 ? * * *)' # UTC so 3AM EST
Outputs:
  CuratorLambdaArn:
    Description: Lambda Function Arn
    Value: !GetAtt CuratorLambda.Arn
That's it! Now we have a serverless resource that can help you manage your indices in AWS ElasticSearch. If you have any questions, please feel free to leave a comment! I hope this is helpful to others out there who have had a hard time finding a Ruby version of an ElasticSearch Lambda curator. This work was done with William Spencer, the WebOps manager at Custom Ink.