Quantcast
Channel: Software Development Articles / Blogs / Perficient
Viewing all articles
Browse latest Browse all 85

SNOWPIPE WITH AWS

$
0
0

SNOWFLAKE’S SNOWPIPE

Snow pipe:

snow pipe is a one of the data loading strategies in snowflake , for continuous data loading, will create a snow pipe to load the data from any data source or storage or any cloud to snowflake tables, its an event trigger ideology whenever a file came to the source immediately it will trigger and notify to the particular external stage in snowflake and load the data to the table immediately

 

procedure of snow pipe:

Picture1

 

S3 bucket setup for snow pipe:

Create a s3 bucket in AWS and a folder in that:

Creating an IAM policyPicture2

  1. From the home dashboard, search for and select IAM.
  2. From the left-hand navigation pane, select Account settings.
  3. Under Security Token Service (STS) in the Endpoints list, find the Snowflake region where your account is located. If the STS status is inactive, move the toggle to Active.
  4. From the left-hand navigation pane, select Policies.
  5. Select Create Policy.
  6. For Policy editor, select JSON.
  7. Add a policy document that will allow Snowflake to access the S3 bucket and folder.

The following policy (in JSON format) provides Snowflake with the required permissions to load or unload data using a single bucket and folder path.

Copy and paste the text into the policy editor:

 Picture3

 

 

 

 

 

 

{

“Version”: “2012-10-17”,

“Statement”: [

{

“Effect”: “Allow”,

“Action”: [

“s3:GetObject”,

“s3:GetObjectVersion”

],

“Resource”: “arn:aws:s3:::<bucket>/<prefix>/*”

}

 

  1. Note that AWS policies support a variety of different security use cases.
  2. Select Next.
  3. Enter a Policy name (for example, snowflake_integration)
  4. Select Create policy.

Step 2: Create the IAM Role in AWS

To configure access permissions for Snowflake in the AWS Management Console, do the following:

  1. From the left-hand navigation pane in the Identity and Access Management (IAM) Dashboard, select Roles.
  2. Select Create role.
  3. Select AWS account as the trusted entity type.
  4. In the Account ID field, enter your own AWS account ID temporarily. Later, you modify the trust relationship and grant access to Snowflake.
  5. Select the Require external ID option. An external ID is used to grant access to your AWS resources (such as S3 buckets) to a third party like Snowflake.

Enter a placeholder ID such as 0000. In a later step, you will modify the trust relationship for your IAM role and specify the external ID for your storage integration.

  1. Select Next.
  2. Select the policy you created in Step 1: Configure Access Permissions for the S3 Bucket(in this topic).
  3. Select Next.
  4. Enter a name and description for the role, then select Create role.

You have now created an IAM policy for a bucket, created an IAM role, and attached the policy to the role.

  1. On the role summary page, locate and record the Role ARN value. In the next step, you will create a Snowflake integration that references this role.

Note

Snowflake caches the temporary credentials for a period that cannot exceed the 60 minute expiration time. If you revoke access from Snowflake, users might be able to list files and access data from the cloud storage location until the cache expires.

Step 3: Create a Cloud Storage Integration in Snowflake

A storage integration is a Snowflake object that stores a generated identity and access management (IAM) user for your S3 cloud storage, along with an optional set of allowed or blocked storage locations (i.e. buckets). Cloud provider administrators in your organization grant permissions on the storage locations to the generated user. This option allows users to avoid supplying credentials when creating stages or loading data.

CREATE or replace STORAGE INTEGRATION bowiya_inte

TYPE = EXTERNAL_STAGE

STORAGE_PROVIDER = ‘S3’

ENABLED = TRUE

STORAGE_AWS_ROLE_ARN = ‘arn:aws:iam::151364773749:role/manojrole’

STORAGE_ALLOWED_LOCATIONS = (‘s3://newbucket1.10/sample.csv’);

 

The following will create an integration that allows access to all buckets in the account.

Additional external stages that also use this integration can reference the allowed buckets and paths:

Step 4: Retrieve the AWS IAM User for your Snowflake Account

  1. To retrieve the ARN for the IAM user that was created automatically for your Snowflake account, use:

desc integration bowiya_inte;

Step 5: Grant the IAM User Permissions to Access Bucket Objects

The following step-by-step instructions describe how to configure IAM access permissions for Snowflake in your AWS Management Console so that you can use a S3 bucket to load and unload data:

  1. Log in to the AWS Management Console.
  2. Select IAM.
  3. From the left-hand navigation pane, select Roles.
  4. Select the role you created
  5. Select the Trust relationships tab.
  6. Select Edit trust policy.
  7. Modify the policy document with the DESC STORAGE INTEGRATION Policy document for IAM role
  8. Picture4

 

Step 6: CREATE A STAGE IN SNOWFLAKE :

 

A stage is an object where files can be stored temporarily from a local storage or cloud storage, using the stage we can load the data into tables.

 

CREATE or replace STAGE mystage

URL = ‘s3://newbucket1.10/sample.csv’

STORAGE_INTEGRATION = bowiya_inte;

 

 

 

Step 7: CREATE A SNOW PIPE IN SNOWFLAKE:

 

CREATE or replace PIPE mypipe

AUTO_INGEST = TRUE

AS

COPY INTO table1

FROM @mystage

FILE_FORMAT = (type = ‘CSV’ SKIP_HEADER = 1);

 

Step 7: CREATE A EVENT NOTIFICATION IN S3:

Event notification will notify when an object is changed or added into the bucket.

 

In s3 go to properties and create one notification event.

 Picture5

STEP 8: Get the SQS queue id from your snowflake pipe.

Picture6

 

Once the notification event is created the snow pipe will load the data whenever the file is added or changed in s3 bucket.

Picture7

 

STEP 9: MONITOR THE SNOW PIPE STATUS.

Picture8


Shaik Abdul Kalam

Shaik Abdul Kalam is an associate technical consultant at Perficient and he is based in Chennai. He has a firm understanding of technologies like IICS-CDI, Snowflake, and SQL and is keen to learn new technologies.

NOTE: Snow pipe won’t load the same file again, because the SQS queue is reading the file name read the file and the metadata was captured. If we upload the same file again then the SQS queue will not get any notification so snow pipe can’t load the same file again.

 

 

 

 

 

 

 

 

 

 


Viewing all articles
Browse latest Browse all 85

Trending Articles