AWS Glue python ApplyMapping / apply_mapping example
The ApplyMapping class converts types and renames fields in your data. To apply the map, you need two things:
- A DynamicFrame (Glue's data frame type)
- The mapping list
The mapping list is a list of tuples that describe how you want to convert your types. Each tuple has the form (source column, source type, target column, target type). For example, if you have a data frame like this:
| old_column1 | old_column2 | old_column3 |
| "1286" | 29 | "foo" |
| "38613390539" | 386 | "bar" |
And you apply the following mapping to it:
your_map = [
('old_column1', 'string', 'new_column1', 'bigint' ),
('old_column2', 'int', 'new_column2', 'float' )
]
This would rename old_column1 to new_column1 and cast its string contents to bigint. Similarly, it would rename old_column2 to new_column2 and cast it from int to float. old_column3 will be omitted from the results. Rows that cannot be mapped in the way you instruct will be filtered out.
| new_column1 | new_column2 |
| 1286 | 29.0 |
| 38613390539 | 386.0 |
To apply the mapping:
# you need to have aws glue transforms imported
from awsglue.transforms import *
# df must be a Glue DynamicFrame
# the following two calls are equivalent
new_df = df.apply_mapping(mappings = your_map)
new_df = ApplyMapping.apply(frame = df, mappings = your_map)
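To make the behaviour described above concrete, here is a plain-Python model of it. It is only a sketch and not how Glue implements ApplyMapping: the `model_apply_mapping` function and the `CASTS` table are hypothetical names, the rows are plain dicts rather than a DynamicFrame, and the row-skipping follows this article's description of unmappable rows.

```python
# Plain-Python model of the mapping semantics; this is NOT the Glue implementation.
# Illustrative subset of type names, mapped to Python constructors (an assumption).
CASTS = {"string": str, "int": int, "bigint": int, "float": float}

def model_apply_mapping(rows, mappings):
    """Rename and cast mapped fields, drop unmapped ones, skip rows that fail to cast."""
    # Resolve the target casts up front so an unknown type name fails loudly.
    plan = [(source, target, CASTS[target_type])
            for source, _source_type, target, target_type in mappings]
    result = []
    for row in rows:
        try:
            new_row = {target: cast(row[source]) for source, target, cast in plan}
        except (KeyError, ValueError, TypeError):
            continue  # follows the article's note that unmappable rows are filtered
        result.append(new_row)
    return result

rows = [
    {"old_column1": "1286", "old_column2": 29, "old_column3": "foo"},
    {"old_column1": "38613390539", "old_column2": 386, "old_column3": "bar"},
]
your_map = [
    ("old_column1", "string", "new_column1", "bigint"),
    ("old_column2", "int", "new_column2", "float"),
]
print(model_apply_mapping(rows, your_map))
```

The printed rows contain only new_column1 and new_column2, matching the result table above; old_column3 is gone because it has no entry in the mapping.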
If your columns have nested data, then use dots to refer to nested columns in your mapping. If your column names have dots in them (e.g. you have relationalized your data), then escape column names with back-ticks. For example:
your_map = [
('old.nested.column1', 'string', 'new.nested.column1', 'bigint' ),
('`old.column.with.dots1`', 'int', 'new_column2', 'float' )
]
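If you build the mapping list programmatically, a small helper can add the back-ticks for you. The sketch below is plain Python; `escape_column` is a hypothetical helper, not part of the Glue API, and it should only be applied to names that literally contain dots, not to paths into nested data.

```python
def escape_column(name):
    """Wrap a literal column name in back-ticks if it contains a dot."""
    already_escaped = name.startswith("`") and name.endswith("`")
    if "." in name and not already_escaped:
        return f"`{name}`"
    return name

# Columns whose names literally contain dots (e.g. after relationalizing).
relationalized_columns = [
    ("old.column.with.dots1", "int", "new_column2", "float"),
]
your_map = [
    (escape_column(source), source_type, target, target_type)
    for source, source_type, target, target_type in relationalized_columns
]
print(your_map)
```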
ApplyMapping returns only mapped columns. Columns that aren't in your mapping list will be omitted from the result, so you need to include every field you want to keep in the mapping, even if no conversion is made.
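One way to avoid losing columns by accident is to generate a pass-through entry for every column and override only the ones you want to change. This is a minimal plain-Python sketch; `build_full_mapping` is a hypothetical helper, and it assumes you already have the column names and types as a list of (name, type) pairs.

```python
def build_full_mapping(schema, overrides):
    """Build a mapping list that keeps every column in `schema`.

    schema:    list of (column_name, column_type) pairs
    overrides: dict of column_name -> (new_name, new_type)
    Columns without an override are mapped to themselves, unchanged.
    """
    mapping = []
    for name, col_type in schema:
        new_name, new_type = overrides.get(name, (name, col_type))
        mapping.append((name, col_type, new_name, new_type))
    return mapping

schema = [
    ("old_column1", "string"),
    ("old_column2", "int"),
    ("old_column3", "string"),
]
overrides = {
    "old_column1": ("new_column1", "bigint"),
    "old_column2": ("new_column2", "float"),
}
your_map = build_full_mapping(schema, overrides)
print(your_map)
```

Here old_column3 gets an entry that maps it to itself with the same type, so it stays in the result.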