Performance Engineering

Faster systems = Better business 

We performance engineer software systems to accelerate your business

Web and mobile apps

We tune app user experience for both perceived and real performance: assets and media are served over a CDN and downloaded asynchronously; HTML, JS and CSS are minified, merged and compressed; data is cached on the client and the server so pages are served fast. An e-commerce client's homepage consistently rendered in under 2 seconds.
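
One concrete lever from the list above, shown as a minimal sketch assuming a Jakarta Servlet stack: a filter that attaches long-lived cache headers to fingerprinted static assets so browsers and CDNs stop re-fetching them. The filter name and URL patterns are illustrative, not the client's code.

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.annotation.WebFilter;
import jakarta.servlet.http.HttpServletResponse;

import java.io.IOException;

// Attaches long-lived cache headers to static assets so browsers and CDNs
// can serve them without hitting the origin again.
@WebFilter(urlPatterns = {"*.js", "*.css", "*.png", "*.webp"})
public class StaticAssetCacheFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse res = (HttpServletResponse) response;
        // One year, immutable: safe for fingerprinted (content-hashed) asset file names.
        res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
        chain.doFilter(request, response);
    }
}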

API servers and realtime apps

For user-facing APIs and real-time systems, every layer is considered, from programming language, libraries, algorithms and data structures to caching and disk and network I/O, to ensure fast response times at scale. A PCI-DSS credit-card solution consistently served its APIs in 10-15 ms.
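
As a small sketch of the caching side of this, here is a read-through in-process cache built on the JDK's ConcurrentHashMap. The Merchant type and db::loadMerchant loader in the usage comment are hypothetical, and a production cache would also bound its size and expire entries.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A tiny read-through cache: hot lookups are served from memory,
// misses fall through to the slower backing store once per key.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent loads the value once and caches it for later calls.
        return cache.computeIfAbsent(key, loader);
    }
}

// Usage (hypothetical types): wrap a database or remote call that dominates the API's latency.
// ReadThroughCache<String, Merchant> merchants = new ReadThroughCache<>(db::loadMerchant);
// Merchant m = merchants.get("merchant-42");   // served from memory after the first load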

Streaming data pipelines

Data pipelines over streaming and file sources are tuned to keep I/O parallel to processing so the read loop never stalls. Batching, parallel and async processing, and bulk writes deliver high throughput. For a high-volume log management product, we tuned Kafka to consume at 250 MB/s from a single node.
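
A sketch of the consumer-side batching this relies on, using Apache Kafka's standard Java client. The broker address, group id, topic name and batch sizes are illustrative placeholders, not the tuned values from that engagement.

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ThroughputConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "log-pipeline");              // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        // Throughput-oriented settings: pull larger batches per fetch/poll so the
        // read loop spends its time moving data, not making round trips.
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024 * 1024);            // wait for ~1 MB
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 100);                  // ...or 100 ms
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 8 * 1024 * 1024);
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 5000);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("logs"));                                  // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> batch = consumer.poll(Duration.ofMillis(500));
                // Hand the whole batch to worker threads; keep this loop reading.
                process(batch);
            }
        }
    }

    private static void process(ConsumerRecords<byte[], byte[]> batch) {
        // placeholder for parallel/async processing of the batch
    }
}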

Databases and warehouses

RDBMS and NoSQL databases become bottlenecks when they are not tuned or are loaded beyond capacity. We tune both the apps and the databases: correct usage, sound data models, indices, query optimization, and removal of unnecessary constraints, triggers and slow operations such as "update on unique key error".
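
For instance, the "insert, catch the unique-key error, then update" pattern can often be replaced with a single upsert. The sketch below assumes PostgreSQL syntax (MySQL would use ON DUPLICATE KEY UPDATE) and a hypothetical cart_items table; connection details are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Instead of "insert, catch the unique-key error, then update" (two round trips
// plus an exception on the hot path), let the database upsert in one statement.
public class UpsertExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/shop";   // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO cart_items (cart_id, sku, qty) VALUES (?, ?, ?) " +
                     "ON CONFLICT (cart_id, sku) DO UPDATE SET qty = EXCLUDED.qty")) {
            ps.setLong(1, 42L);
            ps.setString(2, "SKU-1001");
            ps.setInt(3, 3);
            ps.executeUpdate();
        }
    }
}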

Cloud deployments

Cloud services, though similar, behave differently across providers and are tuned with billing and cost optimization in mind. Serverless functions, cloud workers, and services for compute, data, messaging, security and observability are configured for maximum performance at optimal cost.

Snowflake

Snowflake is a cloud data warehouse whose "storage separate from compute" architecture can suffer from escalating costs when used incorrectly. We have deep technical expertise with it and tuned one client deployment to 10X the throughput while cutting the bill by 4X.

“Faster systems keep users happy, are easy on ops, economical and environmentally friendly”
To make your systems fast, connect with us

Simple & effective engineering

We favour simple architectures: stateless services, peer-to-peer over master/slave, and storage kept separate from compute.

Our performance engineering principles

Max CPU usage

CPU is the costliest resource in a data center, so we provision it right and then max out its usage. Data and requests are buffered, and tasks are split across cores. Service instances are scaled on demand to match incoming work.
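
A minimal sketch of sizing a worker pool to the available cores with the plain JDK executor API; the crunch task is a placeholder for CPU-bound work.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Size the worker pool to the cores actually available so every core has
// work, without oversubscribing CPU-bound tasks.
public class CoreSizedPool {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService workers = Executors.newFixedThreadPool(cores);

        for (int i = 0; i < 1_000; i++) {
            final int task = i;
            workers.submit(() -> crunch(task));   // CPU-bound work spread across cores
        }
        workers.shutdown();
    }

    private static void crunch(int task) {
        // placeholder for the CPU-bound work
    }
}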

I/O parallel to CPU

The CPU should never wait for data to process. Disk, network, message-queue and database I/O runs on separate threads, in parallel with processing. This ensures CPU cycles are used to the full.
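
A minimal sketch of the idea using a dedicated reader thread and a bounded queue from java.util.concurrent; readNextBatch and process are placeholders for the real I/O and CPU work.

import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One thread does the (slow) I/O, another does the CPU work; a bounded queue
// sits between them so reads overlap with processing instead of alternating.
public class PipelinedReader {
    private static final List<String> POISON = List.of();   // end-of-stream marker

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(64);

        Thread reader = new Thread(() -> {
            try {
                for (int batch = 0; batch < 100; batch++) {
                    queue.put(readNextBatch(batch));   // blocks only if the CPU side falls behind
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        List<String> batch;
        while ((batch = queue.take()) != POISON) {
            process(batch);                            // CPU work runs while the next read is in flight
        }
        reader.join();
    }

    private static List<String> readNextBatch(int n) {
        return List.of("record-" + n);                 // placeholder for disk/network/queue reads
    }

    private static void process(List<String> batch) {
        // placeholder for the CPU-bound transformation
    }
}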

Non-core work under 10%

The work done by an app or a server splits into the core business transaction and the non-core work needed to get to it, such as serde, decryption and queue/file reads. Our target is to keep non-core work under 10%.
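
A small sketch of how that split can be measured in-process; deserialize and execute are placeholders standing in for the non-core and core stages.

// Measure where the time actually goes: if serde + decrypt + reads add up to
// more than ~10% of the wall time, the non-core budget is blown.
public class WorkBudget {
    private long coreNanos;
    private long nonCoreNanos;

    public void handle(byte[] rawMessage) {
        long t0 = System.nanoTime();
        Object request = deserialize(rawMessage);   // non-core: serde, decrypt, reads
        long t1 = System.nanoTime();
        execute(request);                           // core: the business transaction
        long t2 = System.nanoTime();

        nonCoreNanos += (t1 - t0);
        coreNanos += (t2 - t1);
    }

    public double nonCoreShare() {
        long total = coreNanos + nonCoreNanos;
        return total == 0 ? 0.0 : (double) nonCoreNanos / total;
    }

    private Object deserialize(byte[] raw) { return new Object(); }   // placeholder
    private void execute(Object request) { }                          // placeholder
}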

I/O in bulk and in parallel

Disk and network I/O usually has the slowest throughput in the stack. We tune it with multiple disks and parallel network calls, and cache where possible. We prefer bulk I/O into databases, message brokers and APIs.
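
A sketch of bulk writes with plain JDBC batching; the connection details, events table and 1,000-row batch size are illustrative assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Writing rows one statement at a time pays a network round trip per row;
// batching amortizes it across hundreds or thousands of rows.
public class BulkInsert {
    public static void insertEvents(List<long[]> events) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/metrics";   // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "app", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO events (device_id, value) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            int inBatch = 0;
            for (long[] e : events) {
                ps.setLong(1, e[0]);
                ps.setLong(2, e[1]);
                ps.addBatch();
                if (++inBatch == 1_000) {             // flush every 1,000 rows
                    ps.executeBatch();
                    inBatch = 0;
                }
            }
            ps.executeBatch();                        // flush the tail
            conn.commit();
        }
    }
}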

Avoid JSON

JSON is a popular data format; unfortunately it is built for human readability and is heavy on the CPU. We prefer binary formats such as Protobuf, Avro and Parquet. Where JSON is unavoidable, payloads are parsed with SIMD parsers for up to 10X faster parsing.
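
A sketch of the binary-format side using Apache Avro's Java API; the Event schema here is a made-up example, not a client schema.

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

// The same record as compact Avro binary instead of JSON text: no field
// names on the wire, no string-to-number parsing on the way back in.
public class AvroEncodeExample {
    private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[" +
            "{\"name\":\"deviceId\",\"type\":\"long\"}," +
            "{\"name\":\"value\",\"type\":\"double\"}]}");

    public static byte[] encode(long deviceId, double value) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("deviceId", deviceId);
        record.put("value", value);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush();
        return out.toByteArray();   // a handful of bytes vs a JSON string
    }
}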

Protect the process

A process should always be protected from overload. Not mixing heavy and light workloads, bounded request queues and internal capacity checks all help distribute work across multiple instances.
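
A minimal sketch with the JDK's ThreadPoolExecutor: a bounded queue plus an explicit rejection policy acts as the internal capacity check. The queue length of 1,000 is an illustrative assumption, and handleRequest is a placeholder.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A bounded queue plus an explicit rejection policy keeps an overloaded process
// from buffering unbounded work; excess requests are pushed back to the caller
// (or a load balancer) so another instance can take them.
public class ProtectedExecutor {
    public static ThreadPoolExecutor create() {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ThreadPoolExecutor(
                cores, cores,
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1_000),                 // internal capacity check
                new ThreadPoolExecutor.AbortPolicy());           // reject, don't queue forever
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create();
        try {
            pool.submit(() -> handleRequest());
        } catch (java.util.concurrent.RejectedExecutionException overloaded) {
            // surface 429/503 to the client instead of falling over
        }
        pool.shutdown();
    }

    private static void handleRequest() {
        // placeholder for the request handler
    }
}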

Performance in numbers

1M RPS

On a JVM with
2 CPUs and 1 GB RAM

10 MB/s

Logs processed
per core

4X

Snowflake cost reduced

10ms

P95 API latency for a PCI-DSS store

Outcome-focused engineering

1. Defining the problem

The requirements for latency, throughput and cost are established first. Latency and throughput requirements often get conflated when one clearly matters more than the other. For instance, a PnL calculation over 1M entries in 10 seconds is a throughput requirement; it does not mean each record must finish in 10 microseconds.
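
To make that concrete, here is the arithmetic; the 16-way parallelism below is only an illustrative assumption:

1,000,000 entries / 10 s = 100,000 entries/s of required throughput
Read sequentially, that naively becomes 10 s / 1,000,000 = 10 µs per entry
With 16 parallel workers: 100,000 / 16 ≈ 6,250 entries/s per worker, i.e. roughly 160 µs allowed per entry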

2. A performance audit

We audit the system, covering its architecture, code, and cost and performance in production, to understand what works well and where the bottlenecks are. Recommendations are identified and an action plan is prepared to tune the system. The audit takes 2-4 weeks.

3. Tuning - tactical and strategic

Tactical fixes are applied first for immediate relief, typically within 2-4 weeks. Tuning then proceeds at four levels: hardware and software configuration, non-intrusive code changes, code refactoring, and design refactoring, so systems perform and scale for the long term.

To make your systems fast, connect with us
“From 15 seconds to 3.5 seconds, 91social tuned our online store in 2 weeks”

- theorganicworld.com

Client stories

This fashion e-commerce client boosted its topline by 15% and reduced lost business

E-commerce firm sails through the holiday season, taking 3X the load with zero dropped orders

A San Francisco-based e-commerce firm that makes high-quality fashion affordable was looking to improve its application readiness and performance for the holiday season. Roadrunnr helped the company optimize Shopify platform spending, improve application responsiveness by 10X, and scale the platform to take up to 3X the load without dropping a single order.
