Introduction
Aside from how we calculate filter life, the questions we're asked most often concern our cloud infrastructure. Is it secure? Is it reliable? As a business, can I trust Woosh to follow information security best practices? As a customer, is my credit card number at risk of being stolen? Sometimes other engineers are just curious about how and why we made certain decisions.
Disclaimer: This article is technical, and readers will benefit from an understanding of cloud technologies, particularly AWS. Many aspects have also been simplified for brevity.
Problem
In November 2021, we had a 3D-printed folding frame prototype and a crude concept for measuring differential air pressure. We certainly did not have a viable backend for collecting vast amounts of data, managing a fleet of devices, calculating filter life, or even displaying it to the user. So, we set out to build one.
Requirements
The simplified minimum requirements for our cloud solution are as follows:
- The solution must scale horizontally and automatically as new devices and users are added. There shall be no manual provisioning of infrastructure.
- Transmission of data between the cloud, devices and users must be secure.
- The solution must be resilient to physical hardware and network outages, such as a data center power outage. There must be no single point of failure.
- Telemetry (data) must be ingested from devices and securely stored for later analysis by Woosh’s algorithms.
- The solution must be able to manipulate and synchronize settings for devices that lack a persistent internet connection.
- The solution must expose secure APIs to support the user-facing mobile application.
- The solution must expose secure APIs to support internal tooling.
- The solution must provide for separate but identical development and production environments.
Tenets
While making design decisions that satisfy the above, we adhered to the following tenets:
- We prioritize implementation speed over other factors, including cost. Where practical, we use off-the-shelf components and solutions rather than building new ones from scratch.
- We are not an infrastructure company. We design solutions that do not require manual fleet scaling or server maintenance.
- We do not strive to design perfect solutions on the first try. We make reversible decisions and learn from our mistakes.
High Level Design
After numerous brainstorming sessions, prototypes, and design reviews, we landed on a serverless design built on AWS. Its main benefits are security, automatic scaling, and rapid development.
High Level Function
We use the AWS IoT Core MQTT message broker and rules engine to ingest device telemetry, while IoT Device Shadows allow us to remotely monitor and configure our hardware. All data is stored in DynamoDB using a model optimized for fast access to time series data.
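Device Shadows are what let a device that lacks a persistent connection pick up new settings when it reconnects: the cloud writes a `desired` state, the device publishes its `reported` state, and the shadow service computes a delta containing the desired attributes that do not yet match. The following is a simplified local model of that delta computation, written only for illustration; it is not the AWS implementation, and the setting names (`alert_threshold`, `reporting.interval_s`, `battery_pct`) are hypothetical.

```python
from typing import Any


def shadow_delta(desired: dict[str, Any], reported: dict[str, Any]) -> dict[str, Any]:
    """Simplified model of a shadow delta: desired keys whose values differ from reported.

    Nested dicts are compared recursively so only the changed leaves are returned.
    Keys that appear only in `reported` are ignored, since nothing is asked of them.
    """
    delta: dict[str, Any] = {}
    for key, want in desired.items():
        have = reported.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            nested = shadow_delta(want, have)
            if nested:
                delta[key] = nested
        elif want != have:
            # Missing or different in the reported state: the device still needs it.
            delta[key] = want
    return delta


# Hypothetical settings, for illustration only.
desired = {"alert_threshold": 80, "reporting": {"interval_s": 300, "enabled": True}}
reported = {"alert_threshold": 80, "reporting": {"interval_s": 600, "enabled": True}, "battery_pct": 91}
print(shadow_delta(desired, reported))
```

A device that wakes up only needs to apply the delta, here the new reporting interval, and then report its updated state back.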
Cognito and IAM provide boilerplate user management and access control. APIs are served through API Gateway and executed as Lambda functions. Our algorithms, such as filter life, are also Lambda functions, triggered when new telemetry is ingested. We make extensive use of S3 for lower-cost data storage and Athena for deep analytics.
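The trigger mechanism for the filter-life functions isn't specified here. Assuming, purely for illustration, a DynamoDB Stream on the telemetry table with a `NEW_IMAGE` or `NEW_AND_OLD_IMAGES` view type, a handler might look roughly like the sketch below. The attribute names (`device_id`, `ts`, `pressure_pa`) and the `update_filter_life` function are hypothetical placeholders; the actual filter-life algorithm is not described in this article.

```python
from decimal import Decimal
from typing import Any


def extract_readings(event: dict[str, Any]) -> list[tuple[str, str, Decimal]]:
    """Pull (device_id, timestamp, pressure) tuples from a DynamoDB Streams event.

    Only INSERT records are used; MODIFY and REMOVE records are skipped.
    Attribute names are hypothetical.
    """
    readings: list[tuple[str, str, Decimal]] = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]  # typed attributes, e.g. {"S": "..."}
        readings.append((
            image["device_id"]["S"],
            image["ts"]["S"],
            Decimal(image["pressure_pa"]["N"]),  # DynamoDB numbers arrive as strings
        ))
    return readings


def update_filter_life(device_id: str, series: list[tuple[str, Decimal]]) -> None:
    """Hypothetical placeholder for the (unpublished) filter-life algorithm."""


def handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    """Lambda entry point: group new readings by device and hand each series off."""
    by_device: dict[str, list[tuple[str, Decimal]]] = {}
    for device_id, ts, pressure in extract_readings(event):
        by_device.setdefault(device_id, []).append((ts, pressure))
    for device_id, series in by_device.items():
        update_filter_life(device_id, sorted(series))  # time-ordered by timestamp
    return {"devices_processed": len(by_device)}
```

In a real function, boto3's `TypeDeserializer` (in `boto3.dynamodb.types`) could replace the hand-written attribute unwrapping.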
Our infrastructure is provisioned as code using the Serverless Framework. Bitbucket, our chosen source control and CI/CD platform, integrates seamlessly with AWS.
Highlights
There are many noteworthy aspects to our solution, but a few in particular stand out.
Security
Using AWS has freed us from the burden (and risk) of building our own security primitives. Cognito stores user passwords; we never store them ourselves. API Gateway enforces TLS encryption on all our endpoints. Communication between our cloud and each device is similarly encrypted, using an X.509 certificate that AWS IoT generates for each device and that we can revoke individually. All permissions in the system follow the principle of least privilege, enforced through IAM policies.
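Per-device certificates pair naturally with a single AWS IoT policy that uses the `${iot:Connection.Thing.ThingName}` policy variable, so each certificate can only act as its own thing. The sketch below builds such a policy document in Python. It is an assumption about how least privilege could be expressed, not Woosh's actual policy, and the `woosh/<thing>/telemetry` topic is hypothetical.

```python
import json

# Resolved by AWS IoT at connection time to the thing attached to the certificate.
THING = "${iot:Connection.Thing.ThingName}"


def device_policy(region: str, account_id: str) -> dict:
    """Build a least-privilege AWS IoT policy document shared by all devices.

    Every resource is scoped to the connecting thing, so no device can act
    as, or read the topics of, another device.
    """
    arn = f"arn:aws:iot:{region}:{account_id}"
    shadow = f"$aws/things/{THING}/shadow"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # A device may only connect using its own thing name as client ID.
            {"Effect": "Allow", "Action": "iot:Connect",
             "Resource": f"{arn}:client/{THING}"},
            # Publish telemetry (hypothetical topic) and shadow updates.
            {"Effect": "Allow", "Action": "iot:Publish",
             "Resource": [f"{arn}:topic/woosh/{THING}/telemetry",
                          f"{arn}:topic/{shadow}/update"]},
            # Subscribe is checked against topic *filters*, Receive against topics.
            {"Effect": "Allow", "Action": "iot:Subscribe",
             "Resource": f"{arn}:topicfilter/{shadow}/update/delta"},
            {"Effect": "Allow", "Action": "iot:Receive",
             "Resource": f"{arn}:topic/{shadow}/update/delta"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(device_policy("us-east-1", "123456789012"), indent=2))
```

Because the policy contains no wildcards, revoking a single certificate cuts off exactly one device without touching the rest of the fleet.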
Data Model
Theoretically, the amount of device data we can store is limited only by DynamoDB. Our choice of partition and sort keys distributes data evenly across DynamoDB while permitting fast access for the algorithms that compute filter life. We use on-demand capacity for scaling and point-in-time recovery for backups, and we export data from DynamoDB to other tools, such as Athena, for complex queries.
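The key schema itself isn't published here, so the following is just one common pattern for time series in DynamoDB, assumed for illustration: a partition key combining the device ID with a UTC day bucket, and an ISO 8601 timestamp as the sort key. The `pk`/`sk` attribute names and the `DEVICE#` prefix are hypothetical. Reading a time range then means querying each day's partition with a `BETWEEN` condition on the sort key.

```python
from datetime import datetime, timedelta, timezone


def telemetry_keys(device_id: str, ts: datetime) -> dict[str, str]:
    """Build hypothetical partition/sort keys for one telemetry item.

    Bucketing by device *and* UTC day spreads items across many partition key
    values and keeps any single partition from growing without bound. Because
    every sort key is a UTC ISO-8601 string, lexical order equals time order.
    Expects a timezone-aware datetime.
    """
    ts = ts.astimezone(timezone.utc)
    return {
        "pk": f"DEVICE#{device_id}#{ts:%Y-%m-%d}",
        "sk": ts.isoformat(timespec="seconds"),
    }


def partitions_for_range(device_id: str, start: datetime, end: datetime) -> list[str]:
    """List the partition keys a query for [start, end] must touch, oldest first."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys: list[str] = []
    while day <= last:
        keys.append(f"DEVICE#{device_id}#{day:%Y-%m-%d}")
        day += timedelta(days=1)
    return keys
```

For a filter-life computation over the last few days, the algorithm would issue one query per returned partition key, each bounded by the start and end sort keys.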
Scaling and Reliability
Like DynamoDB, the other managed AWS services we depend on are designed for scale. The number of connected Woosh devices, and the amount of telemetry our solution can receive, is limited only by AWS IoT service quotas. We also benefit from the automatic replication of AWS services across availability zones.
Device Management
The AWS IoT registry provides all the primitives we need to maintain a detailed index of our deployed fleet, and we can easily query important metrics such as battery level or firmware version. Relationships between users and devices are maintained in DynamoDB. Secure REST APIs transform and encapsulate this information, along with telemetry, for consumption by our mobile app.
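To illustrate a secure REST API over the user/device mapping, the sketch below is a Lambda handler for an API Gateway REST API using Lambda proxy integration behind a Cognito user pool authorizer, which exposes the verified token claims under `requestContext.authorizer.claims`. The in-memory `USER_DEVICES` dictionary is a hypothetical stand-in for the DynamoDB mapping table, and the response shape is assumed.

```python
import json
from typing import Any

# Hypothetical stand-in for the DynamoDB user/device mapping table.
USER_DEVICES: dict[str, list[str]] = {}


def _response(status: int, body: dict[str, Any]) -> dict[str, Any]:
    """Shape a Lambda proxy integration response."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def list_my_devices(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Return only the devices mapped to the authenticated caller.

    The user ID is taken from the authorizer's verified claims, never from the
    request body or path, so a user cannot ask for someone else's devices.
    """
    claims = event.get("requestContext", {}).get("authorizer", {}).get("claims", {})
    user_id = claims.get("sub")
    if not user_id:
        # Defensive: the authorizer should already have rejected this request.
        return _response(401, {"message": "Unauthorized"})
    devices = USER_DEVICES.get(user_id, [])
    return _response(200, {"devices": [{"id": d} for d in devices]})
```

Deriving identity from the token rather than from client input is what makes the API safe to expose directly to the mobile app.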
Since Inception
We launched our shiny new cloud as a private beta in 2022, then publicly in 2023 alongside the general availability of the Woosh solution.
Uptime
Our cloud infrastructure has been collecting data 100% of the time since launch. While we have experienced a few customer-facing issues, these resulted from edge cases we did not anticipate during testing or from outages at our third-party weather API providers. We maintain subscriptions to multiple weather API services and fail over quickly to preserve the mobile app experience.
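Failing over between weather providers can be sketched as an ordered list of providers tried in turn. This is an illustrative pattern only: the provider names and the forecast dictionary shape are hypothetical, and each real provider call should carry a short network timeout (for example, the `timeout` argument of `urllib.request.urlopen`) so a hung provider does not delay the failover.

```python
from typing import Any, Callable, Sequence

# A provider takes (latitude, longitude) and returns a forecast dict.
Provider = Callable[[float, float], dict[str, Any]]


class AllProvidersFailed(Exception):
    """Raised when every configured weather provider has failed."""


def fetch_weather(
    lat: float, lon: float, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, dict[str, Any]]:
    """Try each provider in priority order and return the first success.

    Returns (provider_name, forecast). Any exception from a provider,
    including a timeout, moves on to the next one.
    """
    errors: list[str] = []
    for name, provider in providers:
        try:
            return name, provider(lat, lon)
        except Exception as exc:  # deliberately broad: any failure triggers failover
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the forecast makes it easy to log how often the fallback is actually used.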
Rapid Development
Our development culture emphasizes making fast, deliberate decisions. Going serverless has enabled us to quickly add new features without breaking existing ones. When we needed to add push notifications after launch, we built a new serverless project and a CI/CD pipeline specifically for it in only two weeks.
What we’ve learned
To date, the serverless promise has certainly delivered. However, there were a couple of drawbacks we didn't adequately anticipate.
The Good
- Using AWS got us to market more quickly than if we built many of the offered primitives ourselves.
- Our solution scales automatically with new users and devices.
- Infrastructure cost is optimized: we only pay for what we use.
- The operational load for developers is low. We can focus on building.
- We’ve never had to wake anyone up in the middle of the night. Customers and engineers alike love reliability!
The Bad
- We spent a lot of time thinking about the types of DynamoDB queries we would need so that we could design the proper data model. This was time-consuming, and we were often wrong.
- Logical integrity can be difficult to maintain in DynamoDB. If we had to do it again, we would consider using a traditional relational DB for low frequency queries such as user/device mappings or settings, leaving DynamoDB for telemetry only.
- Performing business analytics on data stored in DynamoDB is hard. We received requests from the business team that were complex to fulfil and took developers away from building features. While exporting to Athena has mitigated this to an extent, using a relational database for certain types of data, as described above, would have greatly reduced friction between the business and engineering teams.
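To illustrate the relational alternative for user/device mappings, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for whichever relational database might be chosen. The schema and column names are hypothetical. Foreign keys reject a mapping to a device that does not exist, deleting a device cascades to its mappings, and a business question such as "how many users own a device on each firmware version?" becomes a single `GROUP BY` query.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical user/device schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(
        """
        CREATE TABLE users   (user_id   TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE);
        CREATE TABLE devices (device_id TEXT PRIMARY KEY, firmware TEXT NOT NULL);
        CREATE TABLE user_devices (
            user_id   TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
            device_id TEXT NOT NULL REFERENCES devices(device_id) ON DELETE CASCADE,
            PRIMARY KEY (user_id, device_id)
        );
        """
    )
    return conn


def link_device(conn: sqlite3.Connection, user_id: str, device_id: str) -> None:
    """Map a device to a user; raises sqlite3.IntegrityError if either is unknown."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO user_devices VALUES (?, ?)", (user_id, device_id))


def firmware_owner_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Business-style query: distinct owners per firmware version."""
    return conn.execute(
        """
        SELECT d.firmware, COUNT(DISTINCT ud.user_id)
        FROM devices AS d
        JOIN user_devices AS ud ON ud.device_id = d.device_id
        GROUP BY d.firmware
        ORDER BY d.firmware
        """
    ).fetchall()
```

The integrity rules live in the schema rather than in application code, which is the property that is hard to reproduce for mappings and settings in DynamoDB.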
Conclusion
Our hardware and mobile app are the only parts of Woosh that customers get to see, yet we are extremely happy with, and proud of, our cloud architecture. It's the hidden glue that holds Woosh together. It is reliable and secure, and our design choices reduce the operational load on the engineering team so we can focus on delivering great new features for our customers.