SQL dummy data generator
8/2/2023

This is the second in a three-part series showing how we generate interesting fake data to demo Narrator. Part 1 described how to create a numbers table.

Generating time series data can be extremely useful for testing, debugging, and demoing. For example, whenever we demo Narrator's data platform we show generated data from a fake company. The data is all generated by creating user events at specific (randomish) times. Creating realistic timestamps is at the core of this process.

Unfortunately, creating realistic synthetic timestamps in SQL is pretty unintuitive. Postgres has a neat generate_series() function that can create timestamps, but it'll create them evenly (which is quite useful, but not what we want). Here we'll show how we create timestamps that follow reasonable usage patterns. We'll start with a simple case and build it up bit by bit. It should be straightforward to convert to any other warehouse.

The basic process is to pick a target start and end date, the number of timestamps needed, and use a numbers table to select the right number of rows. From there a little math will create timestamps in that interval with the properties we want.

The first step is to create timestamps evenly spread throughout our time interval. This first version creates timestamps on a per-minute basis. The timestamps are random, so there's no truly obvious pattern. Well, there is one obvious pattern: the slope is perfectly straight.

Let's try modeling usage increasing over time. Let's assume our timestamps represent user activity, say website sessions. Over time this usage is going to increase, so we should have our synthetic timestamps follow that.

An exponential function is a great one to follow, since it measures compounding growth. We'll use the form f(x) = r^x for simplicity's sake. For example, 2^x looks like this:

[Figure: exponential function with base 2]

This models the hockey-stick growth all startups want to see and is actually a fairly good representation of actual user growth. So the problem is how to generate timestamps that follow a function.

The code to do this isn't too crazy. First we're using the function 2^x - 1 to generate a y value between 0 and 1 for x between 0 and 1. We'll deal with values not between 0 and 2 in the next section. So we have random() as x and 2^x - 1 as y. The main difference from before is that we'll use y instead of x to create the actual timestamp value.
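The process described here (pick a start and end date, take the needed number of rows from a numbers table, then apply a little math to a random() value) can be sketched in Postgres roughly as follows. This is not the article's original query. The dates, the row count, the CTE names, and the column names are all illustrative assumptions, and generate_series() only stands in for the numbers table from Part 1. Because the way a curve value is turned into a date determines whether rows bunch up early or late in the window, the sketch computes the curve 2^x - 1 both directly and through its inverse so the two can be compared.

```sql
-- Illustrative sketch for Postgres; all literal values and names are assumptions.
WITH params AS (
    SELECT
        TIMESTAMP '2023-01-01' AS start_ts,
        TIMESTAMP '2023-07-01' AS end_ts
),
numbers AS (
    -- Stand-in for the numbers table from Part 1: one row per timestamp wanted.
    SELECT n FROM generate_series(1, 10000) AS g(n)
),
draws AS (
    -- random() returns a double precision value in [0, 1).
    -- Drawing it once here lets every column below reuse the same x per row.
    SELECT n, random() AS x
    FROM numbers
)
SELECT
    d.n,
    -- Uniform: x is used directly as the fraction of the interval elapsed.
    p.start_ts + (p.end_ts - p.start_ts) * d.x AS uniform_ts,
    -- Curve applied directly: y = 2^x - 1 stays in [0, 1), but y <= x,
    -- so these rows are pulled toward the start of the window.
    p.start_ts + (p.end_ts - p.start_ts) * (power(2, d.x) - 1) AS curve_ts,
    -- Inverse of the same curve (inverse transform sampling): log2(1 + x).
    -- The running count of rows up to fraction t then grows like 2^t - 1,
    -- so these rows get denser toward the end of the window.
    p.start_ts + (p.end_ts - p.start_ts) * (ln(1 + d.x) / ln(2.0)) AS growth_ts
FROM draws AS d
CROSS JOIN params AS p
ORDER BY growth_ts;
```

Counting rows per week for each of the three columns is a quick way to see the difference in shape. Wrapping any of the expressions in date_trunc('minute', ...) would give the per-minute granularity mentioned above.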