AWS Glue Python ApplyMapping / apply_mapping example


The ApplyMapping class is a transform that renames fields and converts their types. To apply a mapping, you need two things:
  1. A DynamicFrame (Glue's counterpart to a Spark DataFrame)
  2. The mapping list
The mapping list is a list of tuples that describe how you want to convert your types. Each tuple has four elements: source column, source type, target column, target type. For example, if you have a frame like this:

 | old_column1   | old_column2 | old_column3 |
 | "1286"        |          29 | "foo"       |
 | "38613390539" |         386 | "bar"       |
And you apply the following mapping to it:

your_map = [
    ('old_column1', 'string', 'new_column1', 'bigint' ),
    ('old_column2', 'int',    'new_column2', 'float'  )
    ]
This renames old_column1 to new_column1 and casts its string contents to bigint. Similarly, it renames old_column2 to new_column2 and casts it from int to float. old_column3 is omitted from the result because it is not in the mapping. Rows that cannot be mapped in the way you instruct will be filtered.

 | new_column1  | new_column2  |
 |         1286 |         29.0 |
 |  38613390539 |        386.0 |
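To make the rename, cast, and drop behaviour concrete, here is a plain-Python sketch that mimics the mapping above on a list of dicts. It does not use or reproduce Glue's implementation: simulate_mapping and the CASTS table are hypothetical names made up for this illustration, and the sketch does not model what happens to rows that cannot be cast.

```python
# Plain-Python illustration only; this is NOT how Glue implements ApplyMapping.
# CASTS and simulate_mapping are hypothetical names used for this sketch.
CASTS = {"bigint": int, "int": int, "float": float, "string": str}


def simulate_mapping(rows, mappings):
    """Rename and cast the mapped fields of each row; drop everything else."""
    result = []
    for row in rows:
        new_row = {}
        for source, _source_type, target, target_type in mappings:
            # Cast the source value to the target type and store it under the new name.
            new_row[target] = CASTS[target_type](row[source])
        result.append(new_row)
    return result


rows = [
    {"old_column1": "1286", "old_column2": 29, "old_column3": "foo"},
    {"old_column1": "38613390539", "old_column2": 386, "old_column3": "bar"},
]
your_map = [
    ("old_column1", "string", "new_column1", "bigint"),
    ("old_column2", "int", "new_column2", "float"),
]

mapped = simulate_mapping(rows, your_map)
for row in mapped:
    print(row)
```

Each printed row contains only new_column1 and new_column2, which matches the result table above.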
To apply the mapping in a Glue job:

# you need to have aws glue transforms imported
from awsglue.transforms import *

# the following two calls are equivalent; df must be a Glue DynamicFrame
new_df = df.apply_mapping(mappings=your_map)
new_df = ApplyMapping.apply(frame=df, mappings=your_map)
If your columns have nested data, then use dots to refer to nested columns in your mapping. If your column names have dots in them (e.g. you have relationalized your data), then escape column names with back-ticks.

For example:


your_map = [
    ('old.nested.column1',      'string', 'new.nested.column1', 'bigint' ),
    ('`old.column.with.dots1`', 'int',    'new_column2',        'float'  )
]
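If you have many flattened columns, you may prefer to generate the escaped names rather than type the back-ticks by hand. The helper below is a small sketch of that idea. escape_dotted is a hypothetical function, not part of awsglue, and the column names, the all-string types, and the underscore renaming are assumptions chosen for the example.

```python
def escape_dotted(name):
    """Wrap a column name in back-ticks if it contains a literal dot."""
    already_escaped = name.startswith("`") and name.endswith("`")
    if "." in name and not already_escaped:
        return f"`{name}`"
    return name


# Hypothetical column names, as they might look after relationalizing nested data.
flat_columns = ["id", "address.city", "address.zip"]

# Escape the source name and replace dots with underscores in the target name.
# Every column is assumed to be a string here purely for illustration.
flat_map = [
    (escape_dotted(col), "string", col.replace(".", "_"), "string")
    for col in flat_columns
]

for entry in flat_map:
    print(entry)
```

Only use this for names where the dot is part of the column name itself. For genuinely nested fields, leave the dotted path unescaped as in the first tuple above.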

ApplyMapping returns only mapped columns. Columns that aren't in your mapping list are omitted from the result, so the list must include every field you want to keep, even those that need no renaming or conversion.
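One way to avoid dropping columns by accident is to start from the full schema and only override the columns you want to change. The sketch below assumes you already have the schema as a plain dict of column name to type. build_mappings and both dicts are hypothetical and are not part of the Glue API.

```python
def build_mappings(schema, changes):
    """Build a mapping list that keeps every column in `schema`.

    schema:  {column_name: type}
    changes: {column_name: (new_name, new_type)} for columns to rename or cast.
    Columns without an entry in `changes` are passed through unchanged.
    """
    mappings = []
    for column, col_type in schema.items():
        new_name, new_type = changes.get(column, (column, col_type))
        mappings.append((column, col_type, new_name, new_type))
    return mappings


# Hypothetical schema matching the example table earlier in this post.
schema = {"old_column1": "string", "old_column2": "int", "old_column3": "string"}
changes = {
    "old_column1": ("new_column1", "bigint"),
    "old_column2": ("new_column2", "float"),
}

full_map = build_mappings(schema, changes)
for entry in full_map:
    print(entry)
```

The resulting list carries old_column3 through as an identity mapping, so it would survive the transform instead of being omitted.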
