Performance Engineering

Faster systems = Better business

We performance engineer software systems to accelerate your business

Web and mobile apps

The user experience of web and mobile apps is tuned for both perceived and real performance. Assets and media are served via a CDN and downloaded asynchronously. HTML, JS and CSS are minified, merged and compressed. Data is cached on the client and the server so pages are served fast. An e-commerce client's homepage always rendered in under 2s.

API servers and realtime apps

For user-facing APIs and realtime systems, every aspect is considered, from programming language, libraries, algorithms and data structures to caching and disk and network I/O, to ensure fast response times at scale. A PCI-DSS solution for credit cards consistently served API calls in 10-15 milliseconds.

Streaming data pipelines

Data pipelines for streaming and file sources are tuned to keep I/O parallel to processing so that the read loop never stops. Batching, parallel and async processing, and bulk writes are used to achieve high throughput. We tuned Kafka to consume at 250MBps from a single node for a high-volume log management product.

Databases and warehouses

RDBMS and NoSQL databases become bottlenecks when they are not tuned or are loaded beyond capacity. We tune the apps and databases through correct usage, data models, indices and query optimization, and by removing unnecessary constraints, triggers and slow operations such as "update on unique key error".
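As an illustration of the "update on unique key error" pattern, one common alternative is a single upsert statement instead of an insert, a caught key violation and a follow-up update. The sketch below is only an assumption about what such a fix could look like; it uses SQLite's `ON CONFLICT` clause, and the `stock` table and `upsert_stock` helper are hypothetical names.

```python
import sqlite3

def upsert_stock(conn, sku, qty):
    """Insert a row, or update it in the same statement if the key exists."""
    conn.execute(
        "INSERT INTO stock (sku, qty) VALUES (?, ?) "
        "ON CONFLICT(sku) DO UPDATE SET qty = excluded.qty",
        (sku, qty),
    )

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")

upsert_stock(conn, "A-1", 5)
upsert_stock(conn, "A-1", 7)  # same key: updated in place, no exception path

rows = conn.execute("SELECT sku, qty FROM stock").fetchall()
```

Other databases offer similar statements with different syntax, so the exact form depends on the engine in use.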

Cloud deployments

Cloud services, though similar, operate differently across providers and must be tuned with billing and cost optimization in mind. Serverless and cloud workers, along with services for compute, data, messaging, security and observability, are configured for maximum performance at optimal cost.

Snowflake

Snowflake is a cloud data warehouse whose unique "storage separate from compute" architecture can lead to escalating costs if not used correctly. We have deep technical expertise with it and tuned one client deployment for 10X throughput while reducing the bill by 4X.

“Faster systems keep users happy, are easy on ops, economical and environment friendly”

To make your systems fast, connect with us

Simple & effective engineering

We favour simple architectures of stateless services, peer to peer over master/slave, and keeping storage separate from compute

Our performance engineering principles

Max CPU usage

CPU is the most costly resource in a datacenter, so we provision it right and then maximize its usage. Data and requests are buffered and tasks are split across cores. Service instances are scaled on demand to match incoming work.

I/O parallel to CPU

The CPU should not wait for data to process. Data is read from and written to disk, network, message queues and databases on separate threads, in parallel to processing. This keeps CPU cycles fully used.
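A minimal sketch of this principle, assuming a single reader thread and a bounded in-memory buffer: the reader keeps fetching chunks while the main thread processes the previous ones. The names `read_chunks`, `process_stream` and the default `buffer_size` are illustrative, not taken from any specific system.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input

def read_chunks(source, out_q):
    """Reader thread: pull chunks from the source and hand them over."""
    for chunk in source:
        out_q.put(chunk)  # blocks when the buffer is full (backpressure)
    out_q.put(_DONE)

def process_stream(source, handle, buffer_size=8):
    """Read on a background thread while this thread does the processing."""
    q = queue.Queue(maxsize=buffer_size)
    reader = threading.Thread(target=read_chunks, args=(source, q), daemon=True)
    reader.start()
    results = []
    while True:
        chunk = q.get()
        if chunk is _DONE:
            break
        results.append(handle(chunk))
    reader.join()
    return results

# Stand-in source: a real one would be a file, socket or queue consumer.
lengths = process_stream(iter([b"ab", b"cde", b"f"]), len)
```

The bounded queue is a deliberate choice: if processing falls behind, the reader blocks instead of filling memory.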

Non core work under 10%

The work done by an app or a server consists of the core business transaction and the non-core work needed to get to it, such as serde, decryption, and queue or file reads. The target is to keep non-core work under 10%.
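One way to check the 10% target is to time core and non-core sections separately and compare the totals. The sketch below is an assumption about how that could be instrumented; `WorkTimer` and the `"core"` / `"non_core"` labels are hypothetical names, and a production system would more likely rely on a profiler or tracing.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

class WorkTimer:
    """Accumulates wall-clock time per kind of work."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def track(self, kind):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[kind] += time.perf_counter() - start

    def non_core_share(self):
        """Fraction of the tracked time spent on non-core work."""
        total = sum(self.seconds.values())
        return 0.0 if total == 0 else self.seconds.get("non_core", 0.0) / total

timer = WorkTimer()
with timer.track("non_core"):       # deserialization is non-core work
    payload = json.loads('{"amount": 42}')
with timer.track("core"):           # the business calculation itself
    doubled = payload["amount"] * 2
share = timer.non_core_share()      # compare against the 0.10 target
```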

I/O in bulk and in parallel

Disk and network I/O usually has the lowest throughput in the stack. It is tuned by using multiple disks and network calls in parallel. Caching is used where possible. We prefer bulk I/O into databases, message brokers and APIs.
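A small sketch of bulk writes, assuming a relational sink: rows are grouped into batches and each batch is written with one `executemany` call inside one transaction, rather than one round trip per row. The `events` table, the `bulk_insert` helper and the batch size of 1000 are illustrative choices.

```python
import sqlite3
from itertools import islice

def batched(items, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def bulk_insert(conn, rows, batch_size=1000):
    """Write rows in batches, one transaction per batch."""
    written = 0
    for batch in batched(rows, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO events (id, msg) VALUES (?, ?)", batch)
        written += len(batch)
    return written

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, msg TEXT)")

written = bulk_insert(conn, ((i, f"event-{i}") for i in range(2500)))
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The right batch size depends on the sink and the row size, so it is best found by measurement.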

Avoid JSON

JSON is a popular data format; unfortunately it is built for human readability and is quite heavy on the CPU. Binary formats such as proto, Avro and Parquet are preferred. JSON payloads, where used, are parsed with SIMD for 10X faster times.
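To give a feel for the difference, the sketch below encodes the same record as JSON text and as a fixed binary layout. Python's `struct` module is used here only as a stand-in for a schema-based binary format; the record fields and the `<qqd` layout are assumptions made for the example.

```python
import json
import struct

# Hypothetical layout: 8-byte id, 8-byte timestamp in ms, 8-byte float price.
RECORD = struct.Struct("<qqd")

def encode_json(rec):
    """Text encoding: field names and digits are spelled out on every record."""
    return json.dumps(rec).encode("utf-8")

def encode_binary(rec):
    """Binary encoding: fixed-width fields, no field names on the wire."""
    return RECORD.pack(rec["id"], rec["ts_ms"], rec["price"])

def decode_binary(buf):
    """Decode by reading fields at fixed offsets, with no text parsing."""
    rec_id, ts_ms, price = RECORD.unpack(buf)
    return {"id": rec_id, "ts_ms": ts_ms, "price": price}

rec = {"id": 123456789, "ts_ms": 1700000000000, "price": 199.99}
json_bytes = encode_json(rec)
bin_bytes = encode_binary(rec)  # always RECORD.size bytes (24 here)
```

The binary form is smaller and needs no text parsing to decode. The trade-off is that both sides must agree on the layout, which is the role a schema plays in formats like Avro.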

Protect the process

A process should always be protected from overload. Keeping heavy and light workloads separate, using request queues and performing internal capacity checks help to distribute work among multiple instances.
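A minimal sketch of an internal capacity check, assuming a bounded request queue that rejects new work when full so that a load balancer or the caller can send it to another instance. `AdmissionQueue` and its capacity are hypothetical names and values.

```python
import queue

class AdmissionQueue:
    """Bounded request queue that rejects work instead of overloading."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def submit(self, request):
        """Return True if accepted, False if the process is at capacity."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            self.rejected += 1  # caller can retry elsewhere or back off
            return False

    def take(self):
        """Hand the oldest accepted request to a worker."""
        return self._q.get_nowait()

gate = AdmissionQueue(capacity=2)
accepted = [gate.submit(r) for r in ("a", "b", "c")]  # third one is rejected
```

Rejecting early keeps latency predictable for the requests that were accepted.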

Performance in numbers

1M Rps

On a JVM with 2 CPUs and 1GB RAM

10MBps

Logs processed per core

4X

Snowflake cost reduced

10ms

P95 API latency for a PCI-DSS store

Outcome focused engineering

1. Defining the problem

The requirements for latency, throughput and cost are established first. Latency and throughput requirements are easily mixed up, even when one is clearly preferred over the other. For instance, a PnL calculation for 1M entries in 10 seconds is a throughput requirement; it does not necessarily imply a latency requirement of 10 microseconds per record.
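The arithmetic behind the PnL example can be worked through as follows. The sketch assumes records can be processed independently and in parallel; the parallelism of 64 is an illustrative number, not a figure from any real system.

```python
def required_throughput(records, window_s):
    """Records per second needed to finish within the window."""
    return records / window_s

def per_record_budget_s(records, window_s, parallelism):
    """Time each record may take when `parallelism` records run at once."""
    return window_s * parallelism / records

rate = required_throughput(1_000_000, 10)           # 100,000 records/s
serial = per_record_budget_s(1_000_000, 10, 1)      # 10 microseconds
parallel = per_record_budget_s(1_000_000, 10, 64)   # 640 microseconds
```

The throughput target is the same in both cases. Only a strictly serial design turns it into a 10 microsecond latency budget per record.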

2. A performance audit

The system is audited, including its architecture, code, and cost and performance in production, to understand what is working well and where the bottlenecks are. Recommendations are identified and an action plan is prepared to tune the system. The audit takes 2-4 weeks.

3. Tuning - tactical and strategic

Tactical fixes are applied first to provide immediate relief, typically in 2-4 weeks. Tuning is done at 4 levels: hardware and software config, non-intrusive code changes, code refactoring, and design refactoring, to ensure systems perform and scale for the long term.

To make your systems fast, connect with us

“From 15 seconds to 3.5 seconds, 91social tuned our online store in 2 weeks”

- theorganicworld.com

Client stories

This fashion e-commerce client boosted its topline by 15%, reduced lost business & grew it

E-commerce firm sails through the holiday season taking 3X load with zero dropped orders

A San Francisco-based e-commerce firm that makes high-quality fashion affordable was looking to improve its application readiness and performance for the holiday season. Roadrunnr helped the company optimize Shopify platform spending, improved its application responsiveness by 10X, and scaled the platform to take up to 3X load without dropping a single order.
