Posts

Showing posts from 2019

AWS Glue python ApplyMapping / apply_mapping example

The ApplyMapping class is a type conversion and field renaming function for your data. To apply the map, you need two things: a dataframe (in Glue, a DynamicFrame) and the mapping list.
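As a rough illustration of what a mapping list looks like, here is a minimal sketch in plain Python. The tuple shape (source field, source type, target field, target type) is the one Glue's ApplyMapping takes; the field names, the `CASTS` table and the `apply_mapping_locally` helper are made up for this example and only imitate the behaviour on a list of dicts, so treat it as a sketch rather than what Glue does internally.

```python
# Each mapping is (source_field, source_type, target_field, target_type),
# the same tuple shape that Glue's ApplyMapping takes.
# The field names below are hypothetical.
mappings = [
    ("id", "string", "customer_id", "long"),
    ("name", "string", "full_name", "string"),
]

# Hypothetical local stand-in for the type casts; not part of awsglue.
CASTS = {"string": str, "long": int, "int": int, "double": float}


def apply_mapping_locally(records, mappings):
    """Rename and cast fields; fields not listed in mappings are dropped."""
    result = []
    for record in records:
        row = {}
        for source, _source_type, target, target_type in mappings:
            if record.get(source) is not None:
                row[target] = CASTS[target_type](record[source])
        result.append(row)
    return result


records = [{"id": "42", "name": "Ada", "unused": "x"}]
mapped = apply_mapping_locally(records, mappings)
print(mapped)

# In a Glue script the same list would go to the real transform, e.g.:
# mapped_dyf = ApplyMapping.apply(frame=dyf, mappings=mappings)
```

In a real job `dyf` would be a DynamicFrame read through the glueContext; the local helper is only there so the mapping list can be tried out without a Glue environment.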

The Glue code that runs on AWS Glue and on Dev Endpoint

When you develop code for Glue with the Dev Endpoint, you soon get annoyed by the fact that the code differs between Glue and the Dev Endpoint: glueContext is created in a different manner, and there's no concept of a 'job' on the Dev Endpoint, and therefore no arguments for the job, either. So Mike from The MIS Theorist asked if there was a simpler way. And sure there is!
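One way to paper over the missing job arguments might look like the sketch below. It is not necessarily the approach the post takes: `resolve_args` and `running_as_job` are hypothetical helpers, and the argument names and default values are invented for illustration. The idea is simply to read `--KEY value` pairs when they are present and fall back to defaults on the Dev Endpoint.

```python
import sys


def resolve_args(argv, defaults):
    """Read '--KEY value' pairs from argv, falling back to defaults.

    Hypothetical helper: only keys present in `defaults` are picked up.
    """
    resolved = dict(defaults)
    for i, token in enumerate(argv[:-1]):
        if token.startswith("--") and token[2:] in resolved:
            resolved[token[2:]] = argv[i + 1]
    return resolved


def running_as_job(argv):
    """Assume a Glue job run always passes --JOB_NAME; a notebook does not."""
    return "--JOB_NAME" in argv


# Hypothetical defaults used when no job arguments are supplied.
DEFAULTS = {"JOB_NAME": "dev-endpoint-session", "target_table": "staging"}

args = resolve_args(sys.argv, DEFAULTS)
print(args["JOB_NAME"], running_as_job(sys.argv))

# With awsglue available, the shared part of the script could then be:
# glue_context = GlueContext(SparkContext.getOrCreate())
# if running_as_job(sys.argv):
#     job = Job(glue_context)
#     job.init(args["JOB_NAME"], args)
```

Using `SparkContext.getOrCreate()` in the commented part lets the same line work whether or not a Spark context already exists, which is the usual situation in a notebook.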

AWS Glue, Dev Endpoint and Zeppelin Notebook

AWS Glue is quite a powerful tool. What I like about it is that it's managed: you don't need to take care of infrastructure yourself, but instead AWS hosts it for you. You can schedule scripts to run in the morning and your data will be in its right place by the time you get to work. The downside is that developing scripts for AWS Glue is cumbersome, a real pain in the butt. I first tried to code the scripts through the console, but you end up waiting a lot only to realize you had a syntax error in your code.
