Earlier this year we were dealing with an issue in one of our data processing pipelines. Through telemetry we could see where the problem was coming from, but I did wonder if I could model the problem in a simulation too. After all, if we have a tool available to predict certain issues before we implement them, that can be very helpful for adapting the design and/or code upfront. I remembered a lecture from university many years ago where we discussed discrete event simulations. So I figured I’d give it a shot and see if I could write a simple simulator for such a producer/consumer scenario with TypeScript.

Very short intro

Discrete event simulators model sequences of discrete events. Each event has a timestamp, and can cause the modelled system to change. Events can also cause other events. A typical application of such simulators is a bank where customers arrive, wait in line for a teller to become available, take some amount of time with the teller, and then leave the bank again. Wikipedia has more to say about this, if you care.

More technical stuff

My simulator uses a min-heap / priority queue for event scheduling. This helps ensure that no matter the order in which events are introduced to the system, they’re always handled in the correct order on the timeline.
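To illustrate the idea, here is a minimal sketch of such a time-ordered queue backed by a binary min-heap. The names (`ScheduledEvent`, `EventQueue`) are illustrative, not the actual ones from the repo:

```typescript
// Illustrative sketch: a priority queue of events keyed by timestamp.
interface ScheduledEvent {
  time: number;
  execute: () => void;
}

class EventQueue {
  private heap: ScheduledEvent[] = [];

  push(event: ScheduledEvent): void {
    this.heap.push(event);
    // Sift the new event up until its parent has an earlier (or equal) time.
    let i = this.heap.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.heap[parent].time <= this.heap[i].time) break;
      [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
      i = parent;
    }
  }

  pop(): ScheduledEvent | undefined {
    if (this.heap.length === 0) return undefined;
    const top = this.heap[0];
    const last = this.heap.pop()!;
    if (this.heap.length > 0) {
      this.heap[0] = last;
      // Sift down: swap with the smaller child until heap order is restored.
      let i = 0;
      for (;;) {
        const left = 2 * i + 1;
        const right = 2 * i + 2;
        let smallest = i;
        if (left < this.heap.length && this.heap[left].time < this.heap[smallest].time) smallest = left;
        if (right < this.heap.length && this.heap[right].time < this.heap[smallest].time) smallest = right;
        if (smallest === i) break;
        [this.heap[i], this.heap[smallest]] = [this.heap[smallest], this.heap[i]];
        i = smallest;
      }
    }
    return top;
  }

  get size(): number {
    return this.heap.length;
  }
}
```

Pushing events with times 5, 1, 3 and popping them back yields 1, 3, 5 — the insertion order doesn’t matter.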

The specific simulators can have a setup phase, typically used to bootstrap the first few events into the system, and a run phase that executes until there are no more events. That is to say, it is up to each simulator to decide when to stop the simulation, either by no longer scheduling additional events or by scheduling an event that simply stops the simulator, regardless of how many events are left in the queue.

Events in the simulator are active in the sense that their execute method is called when it’s their turn. That allows code to run when an event is handled. Like I mentioned above, that code can do all sorts of funny things like scheduling more events or stopping the entire simulation.
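The run phase and the “active events” idea together can be sketched like this (hypothetical names; the actual classes in the repo differ, and a simple sort-and-shift stands in for the min-heap to keep the sketch short):

```typescript
// Illustrative sketch: events carry an execute function; the run loop drains
// them in time order until the queue is empty or something calls stop().
type SimEvent = { time: number; execute: (sim: Simulator) => void };

class Simulator {
  private queue: SimEvent[] = [];
  private stopped = false;
  currentTime = 0;

  schedule(time: number, execute: (sim: Simulator) => void): void {
    this.queue.push({ time, execute });
  }

  stop(): void {
    this.stopped = true;
  }

  run(): void {
    while (!this.stopped && this.queue.length > 0) {
      // Sort-and-shift stands in for the min-heap here, just for brevity.
      this.queue.sort((a, b) => a.time - b.time);
      const event = this.queue.shift()!;
      this.currentTime = event.time;
      event.execute(this);
    }
  }
}
```

An event that calls `sim.stop()` ends the run even if later events are still queued, which is exactly the “stop regardless of how many events are left” behaviour described above.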

Code and demo

But enough of all that. Here is the link to the sources on GitHub. And here is a link to a demo of the Simulator here on flrx39.net. You’ll see that there are different simulations available, including a producer/consumer scenario. Currently, that supports only a single producer though. Feel free to play around with it and change the configuration. For instance, if you set the mean arrival time (the mean time between arrivals of new data to handle, i.e. the mean time between items produced by the producer) to 1 and use a mean processing time (i.e. the mean time needed by consumers to process data) greater than that, you’ll see that the queue tends to grow infinitely as long as there’s only a single consumer. Yet adding another consumer helps bring the system back into balance, which is what one would expect. Similarly if, instead of introducing a second consumer, you let the single consumer consume produced items in batches of 2 (or more) – also just like one would expect.

I did play around with the parameters too, to model the issue I mentioned at the beginning, and indeed I could see that, with how our data processing system was set up, it was bound to run into the problem.

Relation between events

If you dig around in the code, you’ll find all the details of the producer/consumer scenario implementation. A few key items are highlighted here.

For instance, a consumer can be either Idle (ready to consume data, if available), Consuming, or Sleeping. If sleep is configured by setting both the maximum idle time and the sleep time, a consumer will go to sleep if it didn’t have any data to process for maxIdleTime, that is, if it has been in the Idle state for that long. When a consumer goes to sleep, a corresponding ConsumerWakeUp event is scheduled for currentTime + sleepTime. The ConsumerWakeUp event in turn schedules the next ConsumerStartSleep event, which, when executed, only starts sleeping if the consumer’s last activity and maxIdleTime allow that.

There’s also the ItemProduced event, which represents an item being produced by the producer. It uses the configured meanArrivalTime to schedule the next item production too. When an item is produced, it also checks if there’s an idle consumer around that could start handling that item right away. If that’s the case, it schedules a BatchConsumed event, representing the end of consumption of a batch (which is either a single item or multiple items, depending on the configuration).

Granted, this way of modelling is quite a bit simplified. Depending on the complexity of the system and the communication channels/mechanisms involved, a more elaborate model might be needed. For instance, separate events for when consumption of a batch starts and when it finishes might be required. To the simulator it doesn’t matter how many different event types there are. It’ll just work through them in order.

Other applications for discrete events

If you go through the simulator, you’ll also find some other applications I added while building this. Of course, the list of what can be modeled is a lot longer.

Beep simulator

See Beep simulator. No worries, it will not actually make any noise. It is there to demo the min-heap / priority queue: regardless of the order of times one enters in there, the beeps always happen in the correct order.

Counter simulator

See Counter simulator. The events in this simulator do exactly one thing: they add another event with an event time of x + increment, where x is the current event’s timestamp and increment is the configured increment, as long as the current event time is less than the configured upper bound. In other words, the simulator counts from zero to the upper bound (or slightly beyond it) in the given increments.
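That single event type fits in a few lines. A sketch (illustrative, not the repo’s actual code):

```typescript
// Illustrative sketch of the counter simulator's one event type: handling an
// event at `time` records the count and, while time < upperBound, schedules
// the next event at time + increment.
type Schedule = (time: number, action: () => void) => void;

function countingEvent(
  time: number,
  increment: number,
  upperBound: number,
  schedule: Schedule,
  out: number[],
): void {
  out.push(time);
  if (time < upperBound) {
    const next = time + increment;
    schedule(next, () => countingEvent(next, increment, upperBound, schedule, out));
  }
}
```

With increment 3 and upper bound 10, this produces 0, 3, 6, 9, 12 — the last event overshoots slightly, as noted above, because the bound is checked before scheduling the successor.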

Decay simulator

See Decay simulator. I used this a lot to verify the correctness of the whole thing. It’s all about decay (as in “radioactive decay”). Since there are tables of half-lives for various isotopes, I just selected a few of them to run through the simulator and check the results. For instance, the carbon-14 (aka radiocarbon) isotope used for radiocarbon dating has a half-life of about 5730 years. Running the simulation for this a few times, you will see that the error for the 50th percentile (aka the half-life) is typically less than 1%. Precision is typically increased by increasing the number of nuclei that are simulated. Since the number of nuclei directly affects the number of events though, the simulation time also increases, so don’t make that number too high 😉
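The underlying check can be reproduced in a few lines without the full simulator. Assuming each nucleus decays after an exponentially distributed time with mean halfLife / ln 2 (the standard model of radioactive decay), the median of the simulated decay times should approximate the half-life — this sketch is mine, not the repo’s code:

```typescript
// Illustrative check: simulate N independent nuclei and return the median
// decay time, which should be close to the configured half-life.
function simulateDecayMedian(halfLife: number, nuclei: number): number {
  const meanLifetime = halfLife / Math.LN2; // mean lifetime = half-life / ln 2
  const times: number[] = [];
  for (let i = 0; i < nuclei; i++) {
    // Exponentially distributed decay time via inverse-transform sampling.
    times.push(-meanLifetime * Math.log(1 - Math.random()));
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(nuclei / 2)]; // 50th percentile ≈ half-life
}
```

With 100,000 nuclei and carbon-14’s half-life of 5730 years, the median typically lands within about 1% of 5730, matching the error the simulator itself shows.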