Posts

ThumbmarkJS: A free, open source device fingerprinting JavaScript library for the web

I needed a decent JavaScript fingerprinting library. I wanted something that was 'good enough': not crappy, but it didn't need to be perfect. I noticed the great FingerprintJS, but sadly, they changed their license to a paid one. Boo! What is a good alternative to FingerprintJS? There are alternatives out there, but to be honest, they all have faults. FingerprintJS is great, but they're monetizing their product in a way that I don't like. I might need hundreds of thousands of requests per month, but I can't pay thousands of dollars. It doesn't need to be perfect either, so I don't want to pay such a high premium. ImprintJS used to be a thing, but it has been archived for a few years already. ClientJS hasn't been updated for a few years either. It is promising, but I find it a little too complicated to extend, and I can't find any statistics on how good it is. BroprintJS is the new kid on the block and hats off for trying, but it's very lim

How to get source and medium programmatically with JavaScript - just like Google Analytics

To make smart marketing decisions, you need to know what the Return On Marketing Investment (ROMI) is. When you're a webshop that does immediate transactions, it's easier to set up Google Analytics to serve you. However, if you need to understand longer customer relationships, Lifetime Values, etc., you need to get your hands on raw data. The obvious idea that comes to mind is: "I just need to get the source, medium, campaign etc. data for each visitor", and you quickly realize that Google Analytics doesn't allow you to do that. You need a custom solution. I have found two online that solve the problem: this one, with FirstSession and ReturningSession cookies set, and the Lunametrics one, which is a bit more verbose but has a more extensive list of search engines. These solutions are just fine, but no one's maintaining them. They're old. They don't evolve. And they don't support paid channels other than Google search. What about Bing? Facebook? Also, bot

How to access AWS S3 with pyspark locally using AWS profiles tutorial

At Zervant, we currently use Databricks for our ETL processes, and it's quite great. However, there's been some difficulty in setting up scripts that work both locally and on the Databricks cloud. Specifically, Databricks uses their own proprietary libraries to connect to AWS S3, based on AWS Hadoop 2.7. That version does not support access using AWS profiles. Internally, we use SSO to create temporary credentials for an AWS profile that then assumes a role. Therefore, reading the ACCESS_ID and ACCESS_SECRET from the .credentials file is something we don't want to do. In order to accomplish this, we need to set a Hadoop configuration on the Spark Context, with the key fs.s3a.aws.credentials.provider and the value com.amazonaws.auth.profile.ProfileCredentialsProvider. This is done by running this line of code: sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider") Note! You need to set your environment var
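The configuration step above can be wrapped in a small helper. This is only a sketch: the names `s3a_profile_settings` and `apply_hadoop_settings` are hypothetical, and the only key and value used are the ones quoted in the excerpt. The helper works with any object that exposes a `set(key, value)` method, which is how the object returned by `sc._jsc.hadoopConfiguration()` is used in the line above.

```python
# Key and value exactly as quoted in the post.
CREDENTIALS_PROVIDER_KEY = "fs.s3a.aws.credentials.provider"
PROFILE_PROVIDER_CLASS = "com.amazonaws.auth.profile.ProfileCredentialsProvider"


def s3a_profile_settings():
    """Return the Hadoop settings that make s3a read AWS profile credentials."""
    return {CREDENTIALS_PROVIDER_KEY: PROFILE_PROVIDER_CLASS}


def apply_hadoop_settings(hadoop_conf, settings):
    """Call hadoop_conf.set(key, value) for every entry in settings."""
    for key, value in settings.items():
        hadoop_conf.set(key, value)


# Hypothetical usage with an existing SparkContext named `sc`:
# apply_hadoop_settings(sc._jsc.hadoopConfiguration(), s3a_profile_settings())
```

Keeping the settings in a plain dictionary makes it easy to apply them only when running locally and to skip them on the Databricks cloud.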

Don't use gross churn to compare SaaS businesses

Whether you're optimizing your ad spend or in talks with investors about what your SaaS business is worth, you need to calculate your customer lifetime value right. And while there are good shortcuts, using them can easily lead to a completely wrong number, resulting in really bad decisions. In my role I've come to realize that the shortcuts rarely work well enough. In this article I'll cover the following challenges in calculating churn: fluctuating / seasonal churn; churn-and-return customers; churn of different price tiers is asymmetric; churn is non-linear over time; saturated markets. And finally, the best way to calculate lifetime value that accounts for these challenges. Calculating lifetime from retention Typically lifetime value is calculated by dividing your average monthly revenue per account (ARPA) by your monthly churn. You can get your ARPA by dividing your monthly recurring revenue (MRR) by the number of paying customers you have. And so, if your ARPA is 10 € and y
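The shortcut described above (ARPA divided by monthly churn) can be written out as a short sketch. The function names and the example figures are hypothetical and are not taken from the article; this is the simple formula the post goes on to criticize, not the recommended method.

```python
def arpa(mrr, paying_customers):
    """Average monthly revenue per account: MRR / number of paying customers."""
    if paying_customers <= 0:
        raise ValueError("paying_customers must be positive")
    return mrr / paying_customers


def shortcut_lifetime_value(arpa_value, monthly_churn):
    """Shortcut lifetime value: ARPA / monthly churn (churn as a fraction, e.g. 0.04)."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in the range (0, 1]")
    return arpa_value / monthly_churn


# Hypothetical figures: 12,000 in MRR, 400 paying customers, 4 % monthly churn.
example_arpa = arpa(12_000, 400)
example_ltv = shortcut_lifetime_value(example_arpa, 0.04)
print(f"ARPA: {example_arpa:.2f}, shortcut lifetime value: {example_ltv:.2f}")
```

Because the formula assumes a single, constant churn rate, a small change in that rate moves the result a lot, which is why it is risky for comparing businesses.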

Snowflake UPSERT operation (aka MERGE)

You want to insert data into a table, but if a corresponding row already exists (by some rule, e.g. unique key) you want to update that instead of adding a new row, keeping the dataset's unique requirements intact. That's an "UPDATE AND INSERT" operation, or UPSERT. Some SQL languages have native support for it. PostgreSQL supports it natively with INSERT and ON CONFLICT DO UPDATE. MySQL also supports the operation with INSERT and ON DUPLICATE KEY UPDATE. How do you do UPSERT on Snowflake? Here's how: Snowflake UPSERT i.e. MERGE operation Snowflake's UPSERT is called MERGE and it works just as conveniently. It just has a different name. Here's the simple usage: MERGE INTO workspace.destination_table d USING workspace.source_table s ON d.id = s.id AND d.val1 = s.val1 WHEN MATCHED THEN UPDATE SET d.val2 = s.val2, d.val3 = s.val3 WHEN NOT MATCHED THEN INSERT (id, val1, val2, val3) VALUES (s.id, s.val1, s.val2, s.val3); Here the destination_table and source_table are of similar form,
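To see the same update-or-insert behaviour without a Snowflake account, here is a small local sketch using Python's built-in sqlite3 module. It is an analogue, not Snowflake syntax: SQLite has no MERGE here, so the sketch uses INSERT with ON CONFLICT DO UPDATE (which assumes SQLite 3.24 or newer). The table and column names mirror the MERGE statement above; the helper name `upsert_rows` and the sample rows are hypothetical.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert rows; where (id, val1) already exists, update val2 and val3 instead."""
    conn.executemany(
        """
        INSERT INTO destination_table (id, val1, val2, val3)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (id, val1) DO UPDATE SET
            val2 = excluded.val2,
            val3 = excluded.val3
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# The composite primary key plays the role of the ON d.id = s.id AND d.val1 = s.val1 match.
conn.execute(
    "CREATE TABLE destination_table ("
    "id INTEGER, val1 TEXT, val2 TEXT, val3 TEXT, PRIMARY KEY (id, val1))"
)

upsert_rows(conn, [(1, "a", "old", "old")])
upsert_rows(conn, [(1, "a", "new", "new"), (2, "b", "x", "y")])

result = conn.execute(
    "SELECT id, val1, val2, val3 FROM destination_table ORDER BY id"
).fetchall()
print(result)
```

After the second call the row with id 1 has been updated in place and the row with id 2 has been added, so the table holds two rows rather than three.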

AWS Glue python ApplyMapping / apply_mapping example

The ApplyMapping class is a type conversion and field renaming function for your data. To apply the map, you need two things: a dataframe and the mapping list.
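To make the idea of a mapping list concrete, here is a plain-Python stand-in that runs without the awsglue library. It is not the Glue implementation: the function `apply_mapping`, the `CASTS` table and the sample records are hypothetical. What it borrows from Glue is the shape of each mapping entry, a tuple of (source field, source type, target field, target type), and the fact that fields missing from the mapping list are dropped.

```python
# Hypothetical cast table covering a few common type names.
CASTS = {"string": str, "int": int, "long": int, "double": float}


def apply_mapping(records, mappings):
    """Rename and cast fields of dict records according to a mapping list.

    Each mapping is (source_field, source_type, target_field, target_type).
    Fields that do not appear in the mapping list are left out of the result.
    """
    mapped_records = []
    for record in records:
        mapped = {}
        for source, _source_type, target, target_type in mappings:
            value = record.get(source)
            if value is not None:
                mapped[target] = CASTS[target_type](value)
        mapped_records.append(mapped)
    return mapped_records


mappings = [
    ("id", "string", "user_id", "long"),
    ("name", "string", "user_name", "string"),
]
records = [{"id": "1", "name": "Ada", "unmapped": True}]
print(apply_mapping(records, mappings))
```

In a real Glue job the same mapping list would be passed to ApplyMapping together with the frame, rather than to a hand-written function like this one.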

The Glue code that runs on AWS Glue and on Dev Endpoint

When you develop code for Glue with the Dev Endpoint, you soon get annoyed with the fact that the code is different in Glue vs on Dev Endpoint: glueContext is created in a different manner, and there's no concept of 'job' on Dev Endpoint, and therefore no arguments for the job, either. So Mike from The MIS Theorist asked if there was a simpler way. And sure there is!
