CloudTrail

CloudTrail is supported as a source in both the UI and the API.

Creating a CloudTrail integration

Create a self-hosted S3 data warehouse to connect Grepr to a CloudTrail S3 bucket. This one-time setup allows Grepr to access the bucket.

To set up a self-hosted S3 data warehouse, follow these steps:

In the Grepr UI, navigate to the Data Warehouses section on the Integrations page.
Add a new data warehouse and select S3 Data Warehouse as the type.
Enter the CloudTrail S3 bucket details.
Set up the necessary permissions for Grepr to access the S3 bucket.
- Add this bucket policy to the S3 bucket to allow Grepr to read the logs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::992382778380:role/customer-role-{YOUR_ORG_NAME}"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::{YOUR_BUCKET_NAME}"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::992382778380:role/customer-role-{YOUR_ORG_NAME}"
            },
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::{YOUR_BUCKET_NAME}/*"
        }
    ]
}
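A CloudTrail bucket normally already has a bucket policy (the one that lets CloudTrail write logs), and S3 replaces the entire policy on each update, so the statements above are usually merged into the existing policy rather than applied alone. The following is a minimal Python sketch of that merge using only the standard library. The function names `grepr_statements` and `merge_bucket_policy` are hypothetical, and the organization and bucket names in the usage example are placeholders.

```python
from __future__ import annotations

import copy
import json

GREPR_ACCOUNT_ID = "992382778380"  # account ID taken from the policy above


def grepr_statements(org_name: str, bucket_name: str) -> list[dict]:
    """Return the two statements from the policy above with placeholders filled in."""
    role_arn = f"arn:aws:iam::{GREPR_ACCOUNT_ID}:role/customer-role-{org_name}"
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return [
        {
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": bucket_arn,
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": ["s3:GetObject"],
            "Resource": f"{bucket_arn}/*",
        },
    ]


def merge_bucket_policy(existing: dict | None, org_name: str, bucket_name: str) -> dict:
    """Append the Grepr statements to a copy of an existing policy (hypothetical helper)."""
    policy = copy.deepcopy(existing) if existing else {"Version": "2012-10-17", "Statement": []}
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    for statement in grepr_statements(org_name, bucket_name):
        if statement not in statements:  # skip statements that are already present
            statements.append(statement)
    policy["Statement"] = statements
    return policy


if __name__ == "__main__":
    # Placeholder values; substitute your own organization and bucket names.
    merged = merge_bucket_policy(None, "example-org", "example-cloudtrail-bucket")
    print(json.dumps(merged, indent=4))
```

The merged dictionary, serialized with `json.dumps`, is the document you would paste into the bucket policy editor or pass to a call such as boto3's `put_bucket_policy(Bucket=..., Policy=...)`.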

- Add this key policy to the CloudTrail KMS key to allow Grepr to decrypt the logs.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::992382778380:role/customer-role-{YOUR_ORG_NAME}"
            },
            "Action": [
                "kms:Decrypt",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        }
    ]
}
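A KMS key has exactly one key policy, so in practice the statement above is added to the key's existing policy. As a rough way to confirm the result, the Python sketch below checks whether a key policy document grants both actions to the Grepr role. It is deliberately simplistic: the helper name `grants_grepr_decrypt` is hypothetical, and the check ignores wildcard actions such as `kms:*`, `Condition` blocks, and `Deny` statements.

```python
from __future__ import annotations

REQUIRED_ACTIONS = {"kms:Decrypt", "kms:DescribeKey"}  # actions from the key policy above


def _as_list(value) -> list:
    """Policy fields may be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def grants_grepr_decrypt(key_policy: dict, role_arn: str) -> bool:
    """Return True if Allow statements for role_arn cover every required action."""
    granted: set[str] = set()
    for statement in _as_list(key_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*" is not treated as a match here
        if role_arn not in _as_list(principal.get("AWS")):
            continue
        granted.update(a for a in _as_list(statement.get("Action")) if a in REQUIRED_ACTIONS)
    return REQUIRED_ACTIONS <= granted
```

For example, calling `grants_grepr_decrypt(policy, "arn:aws:iam::992382778380:role/customer-role-{YOUR_ORG_NAME}")` with your organization name substituted returns `True` only when both `kms:Decrypt` and `kms:DescribeKey` are allowed for that role.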

Setting up the CloudTrail source

Once the self-hosted S3 data warehouse is set up, add CloudTrail as a source when creating a pipeline in the UI or via the API. Here is what the UI looks like:

Adding CloudTrail source modal

Add the ID of every AWS account to collect CloudTrail logs from. Each account ID corresponds to a subfolder in the S3 bucket set up in the previous step. Separate multiple IDs with commas.
If the AWSLogs/ CloudTrail folder is not at the root of the S3 bucket, specify the prefix to the folder that contains it here. Otherwise, leave it empty.
Select a file reading strategy. The default strategy reads the latest files, starting from when the job starts. Alternatively, choose the timestamp strategy to read files starting from a timestamp up to 2 days in the past.
Select the AWS Regions to collect CloudTrail logs from. If left empty, logs from all regions are read.
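To make the relationship between these settings and the bucket contents concrete, the Python sketch below derives the S3 key prefixes that the settings point at. It assumes the standard CloudTrail layout for a non-organization trail, `[prefix/]AWSLogs/<account-id>/CloudTrail/<region>/`, and it is not Grepr's implementation. The function names are hypothetical, and the 2-day check simply mirrors the limit stated for the timestamp strategy.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

ACCOUNT_ID = re.compile(r"\d{12}")  # AWS account IDs are 12 digits


def parse_account_ids(raw: str) -> list[str]:
    """Split a comma-separated account ID string and validate each entry."""
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    bad = [i for i in ids if not ACCOUNT_ID.fullmatch(i)]
    if bad:
        raise ValueError(f"not 12-digit AWS account IDs: {bad}")
    return ids


def cloudtrail_prefixes(account_ids: str, prefix: str = "",
                        regions: list[str] | None = None) -> list[str]:
    """List the key prefixes covered by the given accounts, prefix, and regions."""
    base = prefix.strip("/")
    base = f"{base}/" if base else ""  # empty prefix: AWSLogs/ sits at the bucket root
    prefixes = []
    for account in parse_account_ids(account_ids):
        root = f"{base}AWSLogs/{account}/CloudTrail/"
        if regions:
            prefixes.extend(f"{root}{region}/" for region in regions)
        else:
            prefixes.append(root)  # no regions selected: every region under this root
    return prefixes


def within_lookback(start: datetime, now: datetime, max_days: int = 2) -> bool:
    """Check that a start timestamp is no more than max_days before now."""
    return now - timedelta(days=max_days) <= start <= now
```

For instance, `cloudtrail_prefixes("111111111111,222222222222", "audit", ["us-east-1"])` yields one prefix per account under `audit/AWSLogs/`, which you can compare against the actual folder structure in the bucket before saving the source.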

Optimizing the Pipeline for CloudTrail

There are a few configuration areas to update to get the most out of the CloudTrail source:

In the JSON Parser pipeline step, enable the Preserve Json Content as Message option on the Json Processor tab. Because CloudTrail logs have no "message" field, this option lets similarity checks run on the raw JSON of each log instead. Here is what this option looks like in the UI:

JSON Parser step

In the Reducer pipeline step, there are two configurations to set:
- Under the Grouping Configuration section, improve how the logs are partitioned by updating the Group-by values field to:
service, @eventType, @eventCategory, @eventName, @sourceIPAddress, @recipientAccountId, @userIdentity.sessionContext.sessionIssuer.arn
- Under the Mask Configuration section, enable the default awsarn and awstoken masks to boost reduction for AWS-specific datasets.
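The effect of these two settings can be illustrated with a short Python sketch. It is an illustration only, not Grepr's implementation. It assumes that the `@` prefix refers to a field of the parsed CloudTrail event, that `service` is a tag with the hypothetical value `cloudtrail`, and that an ARN mask behaves roughly like the regular expression below. The actual `awsarn` and `awstoken` mask definitions are not shown in this document.

```python
from __future__ import annotations

import re

# Event fields from the Group-by values above ("service" is handled separately).
GROUP_BY = [
    "@eventType", "@eventCategory", "@eventName", "@sourceIPAddress",
    "@recipientAccountId", "@userIdentity.sessionContext.sessionIssuer.arn",
]

# Approximate ARN pattern (an assumption): arn:partition:service:region:account:resource
ARN = re.compile(r"arn:aws[\w-]*:[\w-]*:[\w-]*:\d{0,12}:[^\s\"',]+")


def lookup(event: dict, path: str):
    """Resolve a dotted path such as '@userIdentity.type'; None if any part is missing."""
    node = event
    for key in path.lstrip("@").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def group_key(event: dict, service: str = "cloudtrail") -> tuple:
    """Events with equal keys fall into the same partition in this illustration."""
    return (service,) + tuple(lookup(event, path) for path in GROUP_BY)


def mask_arns(text: str) -> str:
    """Replace every ARN-like substring with a fixed token."""
    return ARN.sub("<awsarn>", text)
```

Two events that differ only in a session-specific ARN, for example `arn:aws:sts::123456789012:assumed-role/Admin/session-1` and the same ARN ending in `session-2`, produce identical text after `mask_arns`, which is why masking tends to increase how many logs can be reduced together.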
