New: Read Amazon Redshift continues its price-performance leadership to learn what analytic workload trends we're seeing from Amazon Redshift customers, new capabilities we have launched to improve Redshift's price-performance, and the results from the latest benchmarks.

Part 1 of this multi-post series, ETL and ELT design patterns for modern data architecture using Amazon Redshift: Part 1, discussed common customer use cases and design best practices for building ELT and ETL data processing pipelines for data lake architecture using Amazon Redshift Spectrum, Concurrency Scaling, and recent support for data lake export.

This post shows you how to get started with a step-by-step walkthrough of a few ETL and ELT design patterns of Amazon Redshift using AWS sample datasets. It uses two publicly available AWS sample datasets from the US-West-2 (Oregon) Region. Use the US-West-2 (Oregon) Region for your test run to reduce cross-region network latency and cost due to the data movement.

Prerequisites

Before getting started, make sure that you meet the following prerequisites:

- You have an AWS account in the same Region.
- You have the AdministratorAccess policy granted to your AWS account (for production, you should restrict this further).
- You have an existing Amazon S3 bucket named eltblogpost in your data lake to store unloaded data from Amazon Redshift. Because bucket names are unique across AWS accounts, replace eltblogpost with your unique bucket name as applicable in the sample code provided.
- You have the AWS CLI installed and configured to use with your AWS account.
- You have an IAM policy named redshift-elt-test-s3-policy with read and write permissions for the Amazon S3 bucket named eltblogpost.
- You have an IAM role named redshift-elt-test-role that has a trust relationship with redshift.amazonaws.com and glue.amazonaws.com, and the following IAM policies attached (for production, you should restrict these further as needed): redshift-elt-test-s3-policy and redshift-elt-test-sampledata-s3-read-policy.
- Make a note of the ARN for the redshift-elt-test-role IAM role.
- You have an existing Amazon Redshift cluster with the following parameters:
  - An associated IAM role named redshift-elt-test-role.
  - A cluster parameter group named eltblogpost-parameter-group, which you use to change the Concurrency Scaling settings.
  - Cluster workload management set to manual.
- You have SQL Workbench/J (or another tool of your choice) and can connect successfully to the cluster.
- You have an EC2 instance in the same Region with the PostgreSQL client CLI (psql) and can connect successfully to the cluster.
- You have an AWS Glue catalog database named eltblogpost as the metadata catalog for Amazon Athena and Redshift Spectrum queries.

Loading data to Amazon Redshift local storage

This post uses the Star Schema Benchmark (SSB) dataset. It is provided publicly in an S3 bucket (s3://awssampledbuswest2/ssbgz/), which any authenticated AWS user with access to Amazon S3 can access.

To load data to Amazon Redshift local storage, complete the following steps:

Connect to the cluster from SQL Workbench/J.
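The post does not reproduce the body of the redshift-elt-test-s3-policy mentioned in the prerequisites. A minimal sketch of what such a read and write policy on the eltblogpost bucket could look like follows; the statement ID and the exact set of actions are assumptions, not the policy from the original post, and you should tighten them for production:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadWriteOnEltblogpost",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::eltblogpost",
        "arn:aws:s3:::eltblogpost/*"
      ]
    }
  ]
}
```

Note that the bucket itself and the objects inside it need separate Resource ARNs: ListBucket and GetBucketLocation apply to the bucket ARN, while the object-level actions apply to the `/*` ARN.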
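Once connected to the cluster, the SSB tables can be loaded from the public sample bucket with the Redshift COPY command. A minimal sketch for one table is shown below; the table name lineorder comes from the standard SSB layout and is an assumption (the post's exact DDL is not shown in this excerpt), the table must already exist in the cluster, and the IAM role ARN placeholder must be replaced with the ARN of redshift-elt-test-role noted in the prerequisites:

```sql
-- Load one SSB table from the public sample bucket in us-west-2.
-- Assumes the lineorder table has already been created in the cluster.
COPY lineorder
FROM 's3://awssampledbuswest2/ssbgz/lineorder'
IAM_ROLE 'arn:aws:iam::<account-id>:role/redshift-elt-test-role'
GZIP
REGION 'us-west-2';
```

The GZIP option matches the compressed files under the ssbgz/ prefix, and REGION is required here if your cluster runs outside US-West-2, since the sample data lives in that Region.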