
I have an EMR cluster whose steps write and delete objects in an S3 bucket. I have been trying to create a bucket policy on that bucket that denies delete access to all principals except the EMR role and the instance profile. Below is my policy.

```json
{
    "Version": "2008-10-17",
    "Id": "ExamplePolicyId123458",
    "Statement": [
        {
            "Sid": "ExampleStmtSid12345678",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:DeleteBucket",
                "s3:DeleteObject*"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ],
            "Condition": {
                "StringNotLike": {
                    "aws:userId": [
                        "AROAI3FK4OGNWXLHB7IXM:*",
                        "AROAISVF3UYNPH33RYIZ6:*",
                        "AIPAIDBGE7J475ON6BAEU"
                    ]
                }
            }
        }
    ]
}
```

The three `aws:userId` entries are, in order, the EMR role ID, the instance profile role ID, and the instance profile ID.

From what I have read, wildcards cannot be used in the `NotPrincipal` element to match every session of a role, so I used the `aws:userId` condition key to match the sessions instead.
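To illustrate how the `StringNotLike` condition in the policy above is evaluated, here is a minimal Python sketch. It assumes IAM's wildcard semantics for `StringLike` (`*` matches any run of characters, `?` matches one character, matching is case-sensitive) and that a role session's `aws:userId` has the form `RoleId:RoleSessionName`; for EC2 instance-profile credentials the session name is the instance ID, and `i-0123456789abcdef0` below is a hypothetical one. The helper names `iam_like` and `deny_applies` are illustrative only. The Deny applies only when the caller's `aws:userId` matches none of the listed patterns.

```python
import re
from typing import Iterable


def iam_like(value: str, pattern: str) -> bool:
    """Case-sensitive StringLike match: '*' = any run of chars, '?' = one char."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def deny_applies(user_id: str, not_like: Iterable[str]) -> bool:
    """StringNotLike with several values is true only if NO value matches."""
    return not any(iam_like(user_id, p) for p in not_like)


# aws:userId patterns from the question's policy
EXEMPT = [
    "AROAI3FK4OGNWXLHB7IXM:*",  # EMR role ID (any session)
    "AROAISVF3UYNPH33RYIZ6:*",  # instance profile role ID (any session)
    "AIPAIDBGE7J475ON6BAEU",    # instance profile ID (exact match only)
]

if __name__ == "__main__":
    # Hypothetical session of the instance profile role on an EC2 node.
    print(deny_applies("AROAISVF3UYNPH33RYIZ6:i-0123456789abcdef0", EXEMPT))
    # A session of some other (hypothetical) role is still denied.
    print(deny_applies("AROAEXAMPLEOTHERROLE:someone", EXEMPT))
```

In this sketch the `AIPA…` entry has no `*`, so it exempts only a caller whose `aws:userId` is exactly that string.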

Whenever I run the EMR step without the bucket policy, the step completes successfully. But when I add the policy to the bucket and rerun the step, it fails with the following error:

```text
diagnostics: User class threw exception:
org.apache.hadoop.fs.s3a.AWSS3IOException: delete on s3://vr-dump/metadata/test:
com.amazonaws.services.s3.model.MultiObjectDeleteException: One or more objects could not be deleted
(Service: null; Status Code: 200; Error Code: null; Request ID: 9FC4797479021CEE; S3 Extended Request ID: QWit1wER1s70BJb90H/0zLu4yW5oI5M4Je5aK8STjCYkkhZNVWDAyUlS4uHW5uXYIdWo27nHTak=), S3 Extended Request ID: QWit1wER1s70BJb90H/0zLu4yW5oI5M4Je5aK8STjCYkkhZNVWDAyUlS4uHW5uXYIdWo27nHTak=: One or more objects could not be deleted (Service: null; Status Code: 200; Error Code: null; Request ID: 9FC4797479021CEE; S3 Extended Request ID: QWit1wER1s70BJb90H/0zLu4yW5oI5M4Je5aK8STjCYkkhZNVWDAyUlS4uHW5uXYIdWo27nHTak=)
```
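The `Status Code: 200` next to `MultiObjectDeleteException` is consistent with how S3's multi-object delete (`DeleteObjects`) behaves: the request as a whole can succeed at the HTTP level while individual keys fail, with the per-key reasons reported in the response body. The sketch below is a minimal illustration, assuming a response dict shaped like the one boto3's `delete_objects` returns (a `Deleted` list plus an `Errors` list of `Key`/`Code`/`Message` entries); the helper name `failed_deletes` and the sample keys are hypothetical. If the codes come back as `AccessDenied`, that points at permissions rather than Spark configuration.

```python
from typing import Any, Dict, List, Tuple


def failed_deletes(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (key, error code) pairs from a DeleteObjects-style response.

    The HTTP status can be 200 even when some keys were not deleted,
    so the per-key 'Errors' list has to be checked explicitly.
    """
    return [(err["Key"], err["Code"]) for err in response.get("Errors", [])]


if __name__ == "__main__":
    # Hypothetical response, shaped like the dict boto3's delete_objects() returns.
    sample = {
        "ResponseMetadata": {"HTTPStatusCode": 200},
        "Deleted": [],
        "Errors": [
            {"Key": "metadata/test/part-00000", "Code": "AccessDenied",
             "Message": "Access Denied"},
            {"Key": "metadata/test/_SUCCESS", "Code": "AccessDenied",
             "Message": "Access Denied"},
        ],
    }
    for key, code in failed_deletes(sample):
        print(f"{key}: {code}")
```

Against a live bucket, the same kind of dict comes from `boto3.client("s3").delete_objects(Bucket="vr-dump", Delete={"Objects": [{"Key": ...}]})`, and running `aws sts get-caller-identity` on a cluster node shows the `UserId` of the node's instance profile credentials, which is the value the policy's `aws:userId` condition is compared against.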

What is the problem here? Is this related to the EMR Spark configuration or to the bucket policy?

Manoj Acharya
  • Why do you wish to create a Bucket Policy with a Deny? The logical approach would be to provide a Role to the EMR cluster that allows it to write to the bucket. This would not require a Bucket policy, nor the denial of anything (unless you have other policies that are granting wide-ranging access). Do you have other (non-EMR) policies in place whose access you are trying to Deny? – John Rotenstein May 12 '19 at 08:39
  • @JohnRotenstein Yeah, I have multiple IAM identities with a wide range of S3 permissions, so I think it will be easier with a bucket policy. I just can't figure out whether the error is related to the policy itself or to the Spark configuration. If I remove the policy, the EMR step completes successfully. – Manoj Acharya May 12 '19 at 11:55
  • I'm confused. By default, no users/roles have access to S3. You want EMR to have access, so you can grant access via an IAM Role assigned to the cluster. That should satisfy the requirement for EMR. If you want to additionally say that "no other user/role should be able to delete the bucket or objects", then it would be better not to assign those permissions in the first place. Rather than adding a `Deny` policy, you simply should not be assigning the `Allow` permissions in the first place. But, if you have no control over such `Allow` assignments – John Rotenstein May 12 '19 at 13:02
  • I wonder if the problem is being caused by the `:*` at the end of the role. Take a look at these other questions and try the syntax they used. [AWS S3 IAM policy for role for restricting few instances to connect to S3 bucket based in instance tag or instance id](https://stackoverflow.com/a/35720454/174777) and [AWS IAM access to s3](https://stackoverflow.com/a/47424418/174777) and [S3 Cross Account Access With Role](https://stackoverflow.com/a/43649196/174777). – John Rotenstein May 12 '19 at 13:03
  • I also presume you have been reading this article: [How to Restrict Amazon S3 Bucket Access to a Specific IAM Role | AWS Security Blog](https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/) and the [Request Information That You Can Use for Policy Variables](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_variables.html#policy-vars-infotouse) documentation. – John Rotenstein May 12 '19 at 13:05

1 Answer


Assuming these role IDs are correct (they start with AROA, so they have a valid format), I believe you also need the AWS account number in the policy (for the account's root user, `aws:userId` is the account number). For example:

```json
{
    "Version": "2008-10-17",
    "Id": "ExamplePolicyId123458",
    "Statement": [
        {
            "Sid": "ExampleStmtSid12345678",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:DeleteBucket",
                "s3:DeleteObject*"
            ],
            "Resource": [
                "arn:aws:s3:::vr-dump",
                "arn:aws:s3:::vr-dump/*"
            ],
            "Condition": {
                "StringNotLike": {
                    "aws:userId": [
                        "AROAI3FK4OGNWXLHB7IXM:*",
                        "AROAISVF3UYNPH33RYIZ6:*",
                        "AIPAIDBGE7J475ON6BAEU",
                        "123456789012"
                    ]
                }
            }
        }
    ]
}
```

The `aws:userId` entries are, in order, the EMR role ID, the instance profile role ID, the instance profile ID, and your AWS account number (`123456789012` is a placeholder).
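IAM policy documents must be plain JSON, which has no comment syntax, so annotations like the ones on the ID list cannot live inside the policy itself. One way to keep them is to build the policy in code and serialize it. The sketch below does this in Python with the bucket name and IDs from the thread; `123456789012` is an assumed placeholder account number, and the variable names are illustrative.

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder: your 12-digit AWS account number
BUCKET = "vr-dump"

exempt_user_ids = [
    "AROAI3FK4OGNWXLHB7IXM:*",  # EMR role ID (any session)
    "AROAISVF3UYNPH33RYIZ6:*",  # instance profile role ID (any session)
    "AIPAIDBGE7J475ON6BAEU",    # instance profile ID
    ACCOUNT_ID,                 # aws:userId of the account's root user
]

policy = {
    "Version": "2008-10-17",
    "Id": "ExamplePolicyId123458",
    "Statement": [
        {
            "Sid": "ExampleStmtSid12345678",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:DeleteBucket", "s3:DeleteObject*"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringNotLike": {"aws:userId": exempt_user_ids}},
        }
    ],
}

# Comment-free JSON that S3 will accept as a bucket policy document.
policy_json = json.dumps(policy, indent=4)

if __name__ == "__main__":
    print(policy_json)
```

The resulting string is what would be passed as the `Policy` argument of boto3's `s3.put_bucket_policy(Bucket=BUCKET, Policy=policy_json)`, or saved to a file and applied with `aws s3api put-bucket-policy --bucket vr-dump --policy file://policy.json`.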

phill.tomlinson