The answers that enterprises need from their data can be elusive. Data is now abundant, especially with the shift to cloud storage, but the tools to analyze and process it are not always easy to use, accessible, or even effective. Data has to reside somewhere, and most companies need to think about how it is stored and queried. That’s where Amazon Athena can help. Keep reading to explore how Amazon Athena supports businesses in the world of data!
What is Amazon Athena?
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the Amazon Web Services (AWS) Management Console, you can point Athena at your data stored in Amazon S3 and begin running ad-hoc SQL queries, often getting results in seconds.
Athena is serverless, so there is no infrastructure to set up or manage. It scales automatically and runs queries in parallel, so results come back quickly even with large datasets or complex queries.
How does Amazon Athena work?
Amazon Athena works directly with S3 data. It uses Presto, a distributed SQL engine, to run queries, and Apache Hive to create and alter tables and partitions.
One way to think about Athena? It’s somewhat similar to a Google search. You might know the data is out there, but sometimes it’s hard to find the data sets that you actually need. A query resembles a search in that you set the parameters that decide what comes back. The difference is that you’re using a cloud query service over your own data instead of a search engine over the web.
Amazon Athena requires no setup or infrastructure configuration. A traditional data store typically involves an ETL (Extract, Transform, Load) process that prepares the data and loads it into a database before it can be queried. With Athena, your query runs against the data where it already sits, without an ETL step. This simplifies the process: you run the query from an easy-to-use web console. You only need to point to your data in S3, define the schema, and start the query.
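The "point, define the schema, query" flow can also be driven from code. The following is a minimal sketch, not an official recipe: it builds a Hive-style DDL statement and submits SQL through an Athena client. The table, column, and bucket names are hypothetical, and the client is passed in as a parameter (in practice it would be `boto3.client("athena")`) so the functions can be tried without AWS credentials.

```python
import time


def build_create_table_ddl(table, columns, s3_location):
    """Return a Hive-style DDL statement for a CSV-backed external table.

    `columns` is a list of (name, type) pairs. All names are illustrative.
    """
    cols = ",\n  ".join(f"`{name}` {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit `sql`, poll until Athena reports a final state, and return it.

    `athena` is expected to behave like boto3.client("athena").
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

The same `run_query` helper works for both the DDL statement and ordinary `SELECT` queries, since Athena accepts both through the same API call.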
How much does Amazon Athena cost?
Amazon Athena charges $5 per terabyte of data scanned from S3. The amount scanned is rounded up to the nearest megabyte, with a minimum of 10 MB per query.
To reduce the cost, Amazon advises users to use compressed data files, have data in columnar formats, and routinely delete old result sets.
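To make the pricing rule concrete, here is a small sketch that estimates the charge for a single query from the number of bytes it scans. It uses the rate and minimum quoted above; the use of binary units (1 MB = 1024² bytes, 1 TB = 1024² MB) is an assumption of this example, so check the current AWS pricing page before relying on the figures.

```python
import math

PRICE_PER_TB_USD = 5.0      # rate quoted in this article
BYTES_PER_MB = 1024 ** 2    # assumption: binary megabytes
MB_PER_TB = 1024 ** 2       # assumption: binary terabytes
MIN_MB_PER_QUERY = 10       # minimum billed per query


def estimate_query_cost(bytes_scanned):
    """Estimate the charge in USD for one query scanning `bytes_scanned` bytes."""
    # Round up to a whole megabyte, then apply the per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), MIN_MB_PER_QUERY)
    return billed_mb / MB_PER_TB * PRICE_PER_TB_USD
```

Under these assumptions, a query that scans a full terabyte costs $5, and any query that scans less than 10 MB is billed as if it had scanned 10 MB. This is why compression and columnar formats matter: they lower `bytes_scanned` directly.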
Benefits of Amazon Athena
Ease of use
Athena uses Presto, a SQL query engine designed to run interactive analytic queries against data sources of all sizes. It supports a wide range of data formats, including Avro, Parquet, CSV, JSON, and ORC. This allows Athena to handle quick ad-hoc analysis as well as more complex requests involving nested queries, large joins, window functions, and arrays.
As mentioned above, Athena is serverless which means the user can quickly query data without having to configure or manage any infrastructure. Additionally, the customer doesn’t have to worry about failures, software updates, or scaling the servers or data warehouses as the datasets and number of users grow.
The interactive query service allows data analysts to tap into their data in Amazon S3 without creating processes to extract, transform, and load the data.
Pay per query
With Athena, users pay only for the queries they run, based on the amount of data each query scans. Users can therefore significantly reduce their charges by partitioning, compressing, and converting their data into columnar formats. Athena itself adds no storage charges for the source data, since queries run directly against S3.
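One common way to apply these cost-saving steps is a CREATE TABLE AS SELECT (CTAS) statement that rewrites an existing table as partitioned Parquet. The sketch below only builds the SQL text; the table, column, and bucket names are hypothetical, and compression settings are left at Athena's defaults rather than assumed here.

```python
def build_ctas_to_parquet(source_table, target_table, s3_location,
                          columns, partition_columns):
    """Return a CTAS statement that rewrites `source_table` as partitioned Parquet.

    Athena expects partition columns to come last in the SELECT list,
    so they are appended after the regular columns.
    """
    select_list = ", ".join(list(columns) + list(partition_columns))
    partitions = ", ".join(f"'{c}'" for c in partition_columns)
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{s3_location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )
```

Queries that filter on the partition column and select only a few columns from the new table should then scan far less data than the same queries against the original files.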
This interactive query tool is designed for fast performance with S3. It can easily perform queries in parallel, allowing users to get results within seconds.
Amazon Athena integrates with a variety of tools, including AWS Glue, Amazon QuickSight, and AWS Key Management Service (KMS). For instance, integrating Athena with Glue gives users access to the Glue Data Catalog, which helps them create a unified metadata repository across different services.
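As an illustration of the shared metadata repository, the sketch below reads table schemas from a Glue database, the same metadata Athena uses when it resolves a table name. The client is passed in (in practice `boto3.client("glue")`), and the database name you supply is your own; nothing here is specific to a particular account.

```python
def list_table_schemas(glue, database):
    """Return {table_name: [(column, type), ...]} for one Glue database.

    `glue` is expected to behave like boto3.client("glue").
    """
    schemas = {}
    kwargs = {"DatabaseName": database}
    while True:
        page = glue.get_tables(**kwargs)
        for table in page["TableList"]:
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            schemas[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
        # Glue paginates results; keep going while a NextToken is returned.
        token = page.get("NextToken")
        if not token:
            return schemas
        kwargs["NextToken"] = token
```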
What are the key differences between Athena and Redshift Spectrum?
Launched in April 2017, Amazon Redshift Spectrum is a feature within Amazon Redshift. Spectrum lets you query data stored on Amazon S3 using SQL, and run the same queries across tabular data stored in your Redshift cluster and data stored in S3, all from the Redshift SQL query editor.
Spectrum and Athena are similar in that both let users query data stored on S3. However, they differ in how they work under the hood, so choosing one over the other will produce different results in many cases. Below are some of these distinctions.
Amazon Athena vs Redshift Cost
Athena and Spectrum both charge based on the data scanned when running a query. The price is the same for both services: $5 per terabyte scanned, measured on the data as stored, so compressed files count at their compressed size.
Additional costs to take into account include storage on S3, which is generally cheaper than storing the same data in a database.
With Athena, the per-query charge covers the compute used to run the query. With Spectrum, you also need to account for the compute costs of your Redshift cluster.
While both Spectrum and Athena are serverless, they still have some differences. Athena relies on pooled resources provided by Amazon Web Services to return query results, whereas Spectrum resources are allocated according to your Redshift cluster size.
In addition, Redshift Spectrum gives you more control over performance. If you need a specific query to return faster, you can allocate additional computing resources. This is not the case with Athena, where your query receives only the resources that Amazon Web Services allocates automatically, which may vary during peak usage times.
Athena and Spectrum both use virtual tables when querying data stored on Amazon S3, with the Glue Data Catalog handling schema management. Athena is optimized to work directly with table metadata stored in the Glue Data Catalog, whereas external tables in Spectrum need to be configured for each Glue Data Catalog schema.
Data has become one of the most essential assets a company owns, and gaining insights from it is more critical now than ever. As public cloud providers offer analytics as a service, more and more people are searching for Amazon Athena tutorials. With the benefits that Athena brings, companies can get more insights without the expensive complications that come with home-built analytics tools.