
Performance Engineering
fast systems, faster business
Fast systems make customers happy, so they do more business, more often. Fast systems prevent failures, reduce support tickets and make teams productive. And fast systems operate at lower costs, much lower.

Low Latency
Realtime trading and messaging systems and APIs need low-latency responses on every interaction. We consider everything from language, libraries, algorithms and data structures to caching, disk and network I/O to ensure these systems are tuned for low latency irrespective of scale.

High Throughput
Data processing and async transactions need high throughput to scale, usually measured in rps (records per second). We achieve multi-million rps by segregating processing steps into CPU-bound and I/O-bound stages, executing and scaling them in parallel, batching all I/O, and reducing or eliminating local and distributed locks.
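The separation of CPU and I/O steps with batched I/O can be sketched roughly as below. This is a minimal illustration, not our production design: the names `cpu_stage`, `io_stage`, `run_pipeline` and the `BATCH_SIZE` value are hypothetical, and the doubling step stands in for real per-record work.

```python
import queue
import threading

BATCH_SIZE = 1000          # hypothetical batch size; tune per workload
_DONE = object()           # sentinel marking the end of the stream


def cpu_stage(records, out_q):
    """CPU-bound step: transform each record, then hand it to the I/O stage."""
    for rec in records:
        out_q.put(rec * 2)          # stand-in for real parsing or computation
    out_q.put(_DONE)


def io_stage(in_q, sink):
    """I/O-bound step: buffer records and flush them one batch per sink call."""
    batch = []
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            sink(batch)
            batch = []
    if batch:
        sink(batch)                 # flush the final partial batch


def run_pipeline(records, sink):
    """Run the CPU stage on the calling thread and the I/O stage on a second thread."""
    q = queue.Queue(maxsize=10_000)  # bounded, so a slow sink applies backpressure
    writer = threading.Thread(target=io_stage, args=(q, sink))
    writer.start()
    cpu_stage(records, q)
    writer.join()
```

Calling `run_pipeline(range(2500), write_batch)` with any callable `write_batch` results in three sink calls instead of 2,500. Note that `queue.Queue` is itself lock-based and CPython threads do not run CPU work in parallel, so the sketch shows only the stage separation and batching, not lock elimination.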

User Experience
User experience on a web or mobile app has multiple layers that affect performance, both perceived and real. We tune both: we optimize asset and media downloads and move them to the background, tune API latency, data formats and compression, cache data online and offline, and prioritize requests by whether content is above or below the fold.

Databases
The database is a critical component and often the bottleneck, either because it is not tuned or because it receives load it should not. We tune the database starting from the data model: we create the necessary indices, optimize query plans, remove unnecessary constraints, avoid clever hacks such as update on unique key error, batch updates, and so on.
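As one small illustration of batching updates, the sketch below uses Python's built-in `sqlite3` module. The `items` table, its columns and the `apply_price_updates` helper are hypothetical; the same idea (one prepared statement, one transaction, many rows) applies to other databases through their own drivers.

```python
import sqlite3


def apply_price_updates(conn, updates):
    """Apply many (price, sku) updates with one statement in one transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany("UPDATE items SET price = ? WHERE sku = ?", updates)


# Hypothetical in-memory table used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (sku TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(f"sku-{i}", 1.0) for i in range(1000)])
conn.commit()

apply_price_updates(conn, [(2.5, f"sku-{i}") for i in range(1000)])
```

Committing once per batch rather than once per row is usually where most of the gain comes from, since each commit forces its own round of durable I/O.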

Events and Streams
High-throughput event streaming needs expert tuning of the message bus, publishers and consumers. Publishers and consumers can batch messages and process them asynchronously while the read loop stays constant. We tuned Kafka to consume at 250 MBps from a single node for a high-volume log management product. Read it here.

Separate Storage from Compute
As the variety and volume of data increase, cloud/block storage becomes an integral part of high-throughput systems. Compute can be separated and provisioned as needed to process data from storage at scale. Products such as DuckDB and data formats such as Parquet and Arrow make this approach quite scalable.

faster system
1M rps on a JVM with 2 CPUs and 1GB RAM
10MBps logs processed per core
5X online store throughput increased
10ms avg api latency for a pci-dss store
perf tuning, the process

1. Define The Ask
A clear definition of the requirements (latency, throughput and cost) sets the context for the subsequent phases. Latency and throughput requirements sometimes get mixed up, even when only one of them is the real requirement. For instance, a PnL calculation for 1M entries in 10 seconds is a throughput requirement of 100,000 records per second; it does not imply a latency requirement of 10 microseconds per record.
2. Collect the metrics
Data is collected from production monitoring systems if available and sufficient. Otherwise, it is taken from a perf test environment that puts the system under anticipated load. Additional non-intrusive monitoring components are installed to collect the required data at a granular level. Config and code are reviewed to form the complete picture and correlate metrics to code.


3. Tune with Roadrunnr
Roadrunnr is a performance toolkit from 91social that analyzes metrics, JVM GC logs, query plans and more to discover bottlenecks faster. Systems are tuned at four levels (hardware and software config, non-intrusive code changes, code refactoring and design refactoring) to ensure they meet latency and throughput requirements at optimal cost.
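The PnL example from step 1 can be worked through as simple arithmetic. The sketch below assumes the work parallelizes evenly and that each worker handles one record at a time; the helper names and the 1 ms per-record latency are hypothetical, chosen only to show that a throughput target leaves per-record latency open.

```python
import math


def required_throughput(records, window_s):
    """Records per second needed to finish `records` within `window_s` seconds."""
    return records / window_s


def min_workers(throughput_rps, per_record_latency_us):
    """Little's law: records in flight = throughput x latency.

    With one record per worker at a time, that is also the worker count.
    """
    return math.ceil(throughput_rps * per_record_latency_us / 1_000_000)


rps = required_throughput(1_000_000, 10)     # throughput the requirement asks for
serial_budget_us = 1_000_000 / rps           # per-record budget only if run one at a time
workers_at_1ms = min_workers(rps, 1_000)     # workers needed if each record takes 1 ms
```

Here `rps` is 100,000, the 10 microsecond figure applies only to a strictly serial run, and a record that takes 1 ms (100 times slower) still meets the 10 second target with 100 parallel workers.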
Contact us to performance tune your systems
Client Stories

E-commerce firm sails through the holiday season taking 3X load with zero dropped orders
A San Francisco-based e-commerce firm that makes high-quality fashion affordable was looking to improve its application readiness and performance for the holiday season. Roadrunnr helped the company optimize Shopify platform spending, improve application responsiveness by 10X, and scale the platform to take up to 3X load without dropping a single order.