Migrate from Amplitude to PostHog

Prior to starting a historical data migration, ensure you do the following:

  1. Create a project on our US or EU Cloud.
  2. Sign up for a paid product analytics plan on the billing page (historical imports are free, but this unlocks the necessary features).
  3. Raise an in-app support request (Target Area: Data Management) detailing where you are sending events from, how, the total volume, and the speed. For example, "we are migrating 30M events from a self-hosted instance to EU Cloud using the migration scripts at 10k events per minute."
  4. Wait for the OK from our team before starting the migration process to ensure that it completes successfully and is not rate limited.
  5. Set the historical_migration option to true when capturing events in the migration.
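For step 5, here is a minimal sketch of where the flag goes when sending through the batch API. It assumes the batch request body is a JSON object with top-level `api_key`, `historical_migration`, and `batch` keys; the helper name `build_batch_payload` and the sample event are hypothetical, and nothing is sent over the network.

```python
import json


def build_batch_payload(api_key, events):
    """Wrap already-converted PostHog events in a batch request body."""
    return {
        "api_key": api_key,
        # Assumption: the flag is set once for the whole batch, not per event.
        "historical_migration": True,
        "batch": list(events),
    }


payload = build_batch_payload(
    "<ph_project_api_key>",
    [
        {
            "event": "Purchase",
            "distinct_id": "123",
            "timestamp": "2024-01-01T00:00:00Z",
            "properties": {"$geoip_disable": True},
        }
    ],
)

# Serialized body, ready to POST to the /batch/ endpoint on your project's host.
body = json.dumps(payload)
```

With the Python SDK, the equivalent is passing `historical_migration=True` when constructing the client.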

Migrating from Amplitude is a two-step process:

  1. Export your data from Amplitude using the Amplitude Export API.

  2. Import data into PostHog using PostHog's Python SDK or batch API with the historical_migration option set to true.
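As a rough sketch of the first step, the function below reads events out of an export that has already been downloaded. It assumes the export arrives as a zip archive whose members are gzipped files containing one JSON event per line; check that against Amplitude's Export API documentation before relying on it. The function name `iter_exported_events` is hypothetical.

```python
import gzip
import io
import json
import zipfile


def iter_exported_events(archive_bytes):
    """Yield one dict per event from an exported archive held in memory.

    Assumption: a zip archive of gzipped, newline-delimited JSON files.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".gz"):
                continue  # skip directory entries or unexpected files
            text = gzip.decompress(archive.read(name)).decode("utf-8")
            for line in text.splitlines():
                line = line.strip()
                if line:
                    yield json.loads(line)
```

Each yielded dict can then be converted to PostHog's schema and sent with the `historical_migration` option enabled.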

Capture the event

According to Amplitude's Export API documentation, their event structure is:

{
    "event_time": UTC ISO 8601 formatted timestamp,
    "event_type": string,
    "device_id": string,
    "user_id": string | null,
    "event_properties": dict,
    "group_properties": dict,
    "user_properties": dict,
    // A bunch of other fields that include things like
    // device_carrier, city, etc.
    ...other_fields
}

When capturing events into PostHog, we need to convert this schema to PostHog's:

Python
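# Prefer the user ID; fall back to the device ID for anonymous events
distinct_id = user_id if user_id else device_id
event_message = {
    "properties": {
        **event_properties,
        **other_fields,
        "$set": {**user_properties},  # person properties
        "$geoip_disable": True,  # skip GeoIP lookup on the importing machine's IP
    },
    "event": event_type,
    "distinct_id": distinct_id,
    "timestamp": event_time,
}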

In short:

  • We track the event with the same name and timestamp
  • For the distinct ID, we either use the User ID if present, or the Device ID (a UUID string) if not
  • We track person (AKA user) properties using properties: {$set}
  • We track event properties using properties
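The mapping summarized above can be sketched as a single function. This is an illustration under assumptions: the function name `amplitude_to_posthog` and the `MAPPED_FIELDS` set are hypothetical, the event name is read from `event_type` as in the snippet above, and `group_properties` is dropped because the schema conversion above does not map it.

```python
# Fields handled explicitly; anything else is passed through as an event property.
MAPPED_FIELDS = {
    "event_type", "event_time", "user_id", "device_id",
    "event_properties", "user_properties", "group_properties",
}


def amplitude_to_posthog(event):
    """Convert one exported Amplitude event dict into a PostHog event dict."""
    # Use the user ID when present, otherwise the device ID.
    distinct_id = event.get("user_id") or event["device_id"]
    other_fields = {k: v for k, v in event.items() if k not in MAPPED_FIELDS}
    return {
        "event": event["event_type"],
        "distinct_id": distinct_id,
        "timestamp": event["event_time"],
        "properties": {
            **(event.get("event_properties") or {}),
            **other_fields,
            "$set": {**(event.get("user_properties") or {})},
            "$geoip_disable": True,
        },
    }
```

For example, an anonymous "Application installed" event with `user_id` set to null ends up with the device ID as its `distinct_id`.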

Aliasing device IDs to user IDs

In addition to tracking the events, we want to tie each user's events together both before and after login. For Amplitude, events before and after login look a bit like this:

Event                   User ID   Device ID
Application installed   null      551dc114-7604-430c-a42f-cf81a3059d2b
Login                   123       551dc114-7604-430c-a42f-cf81a3059d2b
Purchase                123       551dc114-7604-430c-a42f-cf81a3059d2b

We want to attribute "Application installed" to the user with ID 123, so we need to also call alias:

Python
from posthog import Posthog

posthog = Posthog(
    '<ph_project_api_key>',
    host='https://us.i.posthog.com',
    debug=True,
    historical_migration=True
)
# Link the anonymous device ID to the logged-in user ID
posthog.alias(previous_id=device_id, distinct_id=user_id)

Since you only need to do this once per user, ideally you'd store a record (e.g. a SQL table) of which users you've already aliased in PostHog, so that you don't end up sending the same alias call multiple times.
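One way to keep such a record is a small SQLite table. The sketch below is an assumption-laden illustration, not a prescribed approach: the table name `aliased` and the helpers `open_alias_log` and `alias_once` are hypothetical, and `send_alias` stands in for any callable with the same keyword arguments as `posthog.alias`. It keys the record on the device ID and user ID pair rather than the user ID alone, so that a user seen on several devices still gets each device linked once.

```python
import sqlite3


def open_alias_log(path=":memory:"):
    """Open (or create) the table that records which pairs were already aliased."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aliased ("
        " device_id TEXT NOT NULL,"
        " user_id TEXT NOT NULL,"
        " PRIMARY KEY (device_id, user_id))"
    )
    return conn


def alias_once(conn, device_id, user_id, send_alias):
    """Call send_alias only the first time a pair is seen. Returns True if sent."""
    if not user_id or not device_id:
        return False  # anonymous event: nothing to alias yet
    seen = conn.execute(
        "SELECT 1 FROM aliased WHERE device_id = ? AND user_id = ?",
        (device_id, user_id),
    ).fetchone()
    if seen:
        return False
    send_alias(previous_id=device_id, distinct_id=user_id)
    conn.execute(
        "INSERT INTO aliased (device_id, user_id) VALUES (?, ?)",
        (device_id, user_id),
    )
    conn.commit()
    return True
```

In the migration loop this would be called as `alias_once(conn, device_id, user_id, posthog.alias)` for each exported event. Passing a file path instead of `:memory:` lets the record survive restarts of the script.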
