ETL (Extract, Transform, and Load), Analyze, and Visualize a Data Lake Using AWS Glue, Amazon Athena, Amazon QuickSight, and Amazon S3
Now let’s imagine that we are a retail company looking to improve data management and analyze sales data from multiple databases, other sources, and different locations. We want to combine these databases into a single repository. This unified repository gives us simplified access for analysis and further processing. This matters to us because we need to adjust our sales strategies and resources according to the results we find. To accomplish this big data task, we can use AWS Glue as the ETL and data catalog management tool, Amazon Athena as the data query tool, Amazon QuickSight as the visualization tool, and Amazon S3 as the data lake storage, as shown in the figure below.
In this article, we will upload a sample of raw data to an S3 bucket, then Extract, Transform, and Load this data using AWS Glue.
Next, we will analyze and query the data in the S3 bucket or the AWS Glue Data Catalog using Amazon Athena. We will save the results as a PDF, and they will also be saved automatically to a folder we specify in the S3 bucket.
Finally, we will visualize our query results using Amazon QuickSight, which gives decision-makers the opportunity to explore and interpret information in an interactive visual environment.
We will work with each of these tools hands-on, step by step, in this article.
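To make the flow concrete before the step-by-step walkthrough, here is a minimal sketch of the same pipeline driven from Python with boto3. Using boto3 is an assumption made for illustration, since the steps can equally be done in the AWS console. The bucket, object key, job, database, and folder names are all hypothetical placeholders, and the Glue job is assumed to exist already.

```python
from typing import Any, Dict

# All names below are hypothetical placeholders, not values from this article.
BUCKET = "my-retail-data-lake"      # hypothetical S3 bucket
RAW_KEY = "raw/sales.csv"           # hypothetical object key for the raw file
GLUE_JOB = "sales-etl-job"          # hypothetical, pre-existing Glue job
DATABASE = "sales_db"               # hypothetical Glue Data Catalog database
RESULTS_PREFIX = "athena-results/"  # hypothetical folder for query results


def athena_query_params(sql: str, database: str, bucket: str, prefix: str) -> Dict[str, Any]:
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes each query's result file under this S3 location.
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket}/{prefix}"},
    }


def run_pipeline(local_file: str, sql: str) -> str:
    """Upload raw data, start the Glue job, and submit an Athena query.

    Returns the Athena query execution ID. Needs boto3 and AWS credentials.
    """
    import boto3  # imported here so the helper above works without AWS access

    # 1. Upload the raw file to the S3 bucket.
    boto3.client("s3").upload_file(local_file, BUCKET, RAW_KEY)

    # 2. Start the Glue ETL job. The call returns immediately; a real script
    #    would wait for the run to finish before querying the output.
    boto3.client("glue").start_job_run(JobName=GLUE_JOB)

    # 3. Submit the query. Athena also runs it asynchronously.
    params = athena_query_params(sql, DATABASE, BUCKET, RESULTS_PREFIX)
    response = boto3.client("athena").start_query_execution(**params)
    return response["QueryExecutionId"]
```

The `OutputLocation` setting is what makes Athena save query results to a folder in the bucket automatically. Keeping the parameter builder separate from the AWS calls means it can be checked locally without credentials.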

Topics we will cover:
1. AWS Glue Overview
2. Amazon Athena Overview
3. Hands-on Experience
3.1. The Other Tools We’ll Use In This Article
3.2. Prerequisites
3.3. Introduction
4. Creating Amazon S3 Bucket and Sample Data Files
5. Create an IAM Service Role for AWS Glue
6. Creating an AWS Glue Crawler to Discover and Catalog the Data
7. Run the Glue Crawler
8. Review the Metadata in the Glue Data Catalog
9. Transforming your Data Using AWS Glue Studio
9.1. Create a job using Glue Studio
9.2. Adding the Data from Amazon S3
9.3. Run the data transformation job
9.4. Running an ETL job for S3 Bucket
10. Analyzing Data Using Amazon Athena
11. Clean Up
12. Conclusion
13. Next post
14. References