Host a Grepr data lake with the AWS S3 integration
Grepr uses AWS S3 as storage for the Grepr data lake. Although you have the option of using a Grepr-managed S3 bucket, you can deploy an S3 bucket in your account when you need more control over the data to meet security and compliance requirements. To connect to an S3 bucket in your account and enable reading and writing data, use the AWS S3 integration.
To learn more about the Grepr data lake, see The Grepr data lake.
There are two ways to set up an S3 integration:
- Use CloudFormation to create all objects. Grepr recommends using CloudFormation because it automates the setup process, ensures repeatability, and reduces the chance of misconfiguration.
- Manually create an S3 bucket and associated resources.
Both of these options are available through the Grepr UI or API.
The instructions in this document grant an organization-specific role in the Grepr account access to your S3 bucket. This role is only assumed by Grepr jobs acting on behalf of your organization, ensuring isolation between tenants. Alternatively, you can create a role in your own account and grant the required access for a Grepr principal to assume it. For help, contact support@grepr.ai.
Use CloudFormation to deploy an S3 bucket
Grepr recommends using CloudFormation because it automates the setup of S3 by:
- Creating a new S3 bucket specifically for your Grepr integration.
- Configuring all necessary permissions and policies.
- Granting access to an organization-specific role within Grepr. This organization-specific role is only assumed by Grepr jobs processing data for your organization, ensuring complete isolation from other Grepr customers.
Manually create an S3 bucket and resources
During query processing, Grepr might store transient objects with the prefix query-results/. To reduce storage costs, these transient objects should be removed periodically. To ensure these objects are automatically removed, Grepr recommends adding a lifecycle policy when you configure the S3 bucket. When you configure the policy:
- Add a filter on the prefix query-results/, making sure to include the trailing / so the filter isn’t applied to other objects.
- Select the Expire current version of objects action.
- For Days after object creation, Grepr recommends setting the value to 1 day.
See the AWS put-bucket-lifecycle documentation.
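The lifecycle settings above can also be applied programmatically. The following is a minimal sketch, assuming you use Python with boto3 and have credentials that can modify the bucket. The function names and the rule ID are hypothetical and chosen only for illustration.

```python
import json

# Trailing slash keeps the filter from matching other objects.
QUERY_RESULTS_PREFIX = "query-results/"


def build_query_results_lifecycle(days: int = 1) -> dict:
    """Build a lifecycle configuration that expires transient query results."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": "expire-grepr-query-results",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": QUERY_RESULTS_PREFIX},
                # Expire current versions this many days after creation.
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket_name: str, days: int = 1) -> None:
    """Apply the rule to the bucket.

    Note: this call replaces any lifecycle configuration already on the
    bucket, so merge existing rules first if the bucket has any.
    """
    import boto3  # imported here so the builder works without boto3 installed

    s3 = boto3.client("s3", region_name="us-east-1")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_query_results_lifecycle(days),
    )


if __name__ == "__main__":
    # Print the configuration for review before applying it.
    print(json.dumps(build_query_results_lifecycle(), indent=2))
```

Reviewing the printed configuration before calling apply_lifecycle is a reasonable precaution, because the call overwrites the bucket's existing lifecycle rules.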
To use an existing bucket or deploy S3 bucket resources using a tool other than CloudFormation, use the following steps:
- Create a bucket if one doesn’t already exist. You must create the bucket in the us-east-1 region.
- Attach the following resource policy, replacing {YOUR_BUCKET_NAME} with the bucket’s name and {YOUR_ORG_NAME} with your organization name.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::992382778380:role/customer-role-{YOUR_ORG_NAME}"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::{YOUR_BUCKET_NAME}"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::992382778380:role/customer-role-{YOUR_ORG_NAME}"
      },
      "Action": [
        "s3:DeleteObjectTagging",
        "s3:PutObject",
        "s3:GetObject",
        "s3:PutObjectTagging",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::{YOUR_BUCKET_NAME}/*"
    }
  ]
}
- Create the integration using the UI or API.
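To avoid hand-editing the placeholders in the resource policy, you can generate the policy from your bucket and organization names. The following is a minimal sketch, assuming Python with boto3. The function names are hypothetical, and the account ID and role name pattern are copied from the policy in the steps above.

```python
import json

# Account ID and role name pattern as they appear in the policy above.
PRINCIPAL_ACCOUNT_ID = "992382778380"


def build_bucket_policy(bucket_name: str, org_name: str) -> dict:
    """Fill in the resource policy for the given bucket and organization."""
    principal = {
        "AWS": f"arn:aws:iam::{PRINCIPAL_ACCOUNT_ID}:role/customer-role-{org_name}"
    }
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to every key in the bucket.
                "Effect": "Allow",
                "Principal": principal,
                "Action": [
                    "s3:DeleteObjectTagging",
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:PutObjectTagging",
                    "s3:DeleteObject",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


def attach_bucket_policy(bucket_name: str, org_name: str) -> None:
    """Attach the policy. This replaces any policy already on the bucket."""
    import boto3  # imported here so the builder works without boto3 installed

    s3 = boto3.client("s3", region_name="us-east-1")
    s3.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(build_bucket_policy(bucket_name, org_name)),
    )
```

If the bucket already has a resource policy, merge these two statements into it rather than calling attach_bucket_policy directly, because the call overwrites the existing policy.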
Limitations
The Grepr SaaS offering is available only in the AWS us-east-1 region. Your S3 bucket must also be in the us-east-1 region.
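Before creating the integration with an existing bucket, you may want to confirm the bucket's region. The following is a minimal sketch, assuming Python with boto3. The function names are hypothetical. It relies on the fact that S3 reports buckets in us-east-1 with an empty location constraint.

```python
REQUIRED_REGION = "us-east-1"


def region_from_location_constraint(location_constraint):
    """Map a GetBucketLocation result to a region name."""
    # S3 returns an empty (None) LocationConstraint for us-east-1 buckets.
    if not location_constraint:
        return "us-east-1"
    # "EU" is a legacy value that refers to eu-west-1.
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def bucket_is_in_required_region(bucket_name: str) -> bool:
    """Return True if the bucket is in the region the integration requires."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client("s3")
    response = s3.get_bucket_location(Bucket=bucket_name)
    region = region_from_location_constraint(response.get("LocationConstraint"))
    return region == REQUIRED_REGION
```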