Across cloud solutions at scale, the path to solution viability, and versatile data modeling
Cloud Solutions at Scale
We deal with all aspects of "data at scale" and rely on many different layers of abstraction from cloud services. The increasing commoditization of machine-level access (e.g. low-level compute, storage, network) makes cloud vendors an attractive option. However, while the rise of higher-level abstracted and managed services has ushered in unprecedented ease of use, it often comes at a heavy cost.
We need to build our own solutions that will scale in terms of both performance and economics. This means taking a shrewd look at where it makes sense to leverage ancillary managed services, so we can focus on the core problems at hand, and where to eschew those more expensive services because they are directly tied to our user behavior and/or revenue model.
Like many successful start-ups, Gro is laser-focused on solving the core problem - embracing the move-fast/break-things approach to iterate as quickly as possible toward solution viability. As we hit our good-to-great inflection point, we need to rethink our approach: the tools and techniques that got us here are not the ones that will get us to the next level.
Part of this change is shifting to a service-oriented architecture to help logically and physically separate both areas of concern (e.g. high availability) and different scale points (i.e. not all services scale linearly). Automating the management of these services (read: not just the deployment, but the care-and-feeding as well) is crucial so that our human-operator headcount grows significantly sub-linearly with our machine and user scale.
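The automated "care-and-feeding" above can be sketched as a small reconciliation loop: probe each service's health and remediate the ones that fail, without a human in the loop. The service names, probes, and remediation (a simple restart counter) below are hypothetical illustrations, not a description of our actual stack.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Service:
    name: str
    healthy: Callable[[], bool]  # health probe for this service
    restarts: int = 0            # stand-in for real remediation state

def reconcile(services: List[Service]) -> List[str]:
    """Probe every service; remediate failures. Returns names touched."""
    remediated = []
    for svc in services:
        if not svc.healthy():
            svc.restarts += 1    # placeholder for an automated restart
            remediated.append(svc.name)
    return remediated

# Example: the ingestion service is down, the API service is up.
services = [
    Service("ingestion", healthy=lambda: False),
    Service("api", healthy=lambda: True),
]
print(reconcile(services))  # ['ingestion']
```

In practice this loop is what orchestrators like Kubernetes run continuously; the point is that remediation is declared once and applied mechanically, so operator effort does not grow with the number of services.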
Versatile Data Modeling
We need to be creative about storing and organizing data for different scenarios (ingestion, relationship building, static and dynamic data) for not just tabular and textual data, but raster/image data as well. As you can imagine, geospatial data plays a large part in what we offer today, and it will play an even larger role as we expand our focus on the global climate and the perils associated with it.
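One common way raster and tabular data meet is point sampling: mapping each (lat, lon) row in a table onto a cell of a gridded raster. The toy grid, bounds, and rainfall interpretation below are assumptions for illustration; a real pipeline would use a library such as rasterio rather than hand-rolled indexing.

```python
from typing import List, Tuple

def sample_raster(grid: List[List[float]],
                  lat_min: float, lat_max: float,
                  lon_min: float, lon_max: float,
                  points: List[Tuple[float, float]]) -> List[float]:
    """Return the grid cell value under each (lat, lon) point (north-up grid)."""
    rows, cols = len(grid), len(grid[0])
    values = []
    for lat, lon in points:
        # Map the coordinate into a row/col index; clamp to the last cell.
        r = min(int((lat_max - lat) / (lat_max - lat_min) * rows), rows - 1)
        c = min(int((lon - lon_min) / (lon_max - lon_min) * cols), cols - 1)
        values.append(grid[r][c])
    return values

# 2x2 grid covering lat 0-10, lon 0-10 (e.g. rainfall in mm).
grid = [[5.0, 7.0],
        [2.0, 9.0]]
points = [(8.0, 2.0), (1.0, 9.0)]  # (lat, lon) pairs from a table
print(sample_raster(grid, 0, 10, 0, 10, points))  # [5.0, 9.0]
```

The sampled column can then be joined back onto the tabular source, which is the shape of many of the raster-to-table transformations described above.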
Tying together diverse datasets in dramatically different formats is just the beginning - we need to create a platform that enables the easy application of NLP, computer vision, and nearly all forms of data science and machine learning techniques. Additionally, data quality - as it applies to source-data cleaning, preserving accuracy through transformations, and verifying and validating the various models - needs to be a fundamentally integrated aspect of the data pipeline architecture.