Rick Lamers, Orchest: “humans are still the weak link in a company’s security perimeter”

As data is rapidly becoming one of the most important company assets, quality protection and analysis measures are essential.

During the pandemic, most companies were mainly concerned with implementing security measures such as firewalls and Virtual Private Network solutions, yet the demand for data science tools skyrocketed too. Although data analytics can greatly enhance business operations, not every organization has the expertise needed to navigate this complex environment.

To discuss how data orchestration can help organizations gain more value out of vast amounts of data, we caught up with Rick Lamers, the CEO and Co-Founder of Orchest – a company creating tools to make data science easier.

What has your journey been like since your launch? How did the idea of Orchest originate?

My CTO and Co-Founder Yannick and I were both working as data scientists/data engineers alongside our studies in Computer Science and Mathematics at TU Delft. We were surprised by the amount of time data teams spend on non-data-related tasks that you could cluster together as “technical/infrastructure tasks”. Configuring compute resources, network access, version control, data versioning, logging, reproducibility, code/data/result sharing, and the list goes on… We felt that a lot of these tasks are reminiscent of web application development in the early days before modern PaaS and DevOps tools like Heroku and GitLab were developed. In addition, modern data processing differs significantly enough from (web) application development to warrant a specialized set of tools. After that experience, we decided to start building a platform that would abstract away technical “low-level” details from a typical data scientist/data engineer’s workflow.

Can you introduce us to what you do? What technology do you use to make data science easier?

We’ve developed an open-source data orchestration platform. That’s a fancy way of saying a piece of software that helps you develop, deploy, and monitor batch data processing code. The pipeline abstraction (a bunch of subtasks that depend on each other), scheduler, and container-based executor sit at the heart of the product. Orchest offers a graphical browser-based workbench to create these data processing pipelines and allows users to deploy many of them concurrently in compute clusters. It includes support for version control (on top of git) and has built-in production features like failure notifications and logging. In terms of technologies we use Python, Go, TypeScript/React, Buildah (for dynamic container image builds), Argo, various Jupyter projects, and Kubernetes. The tool itself is programming language agnostic, so in addition to Python, we support Julia, R, JavaScript, and Bash-based pipeline steps (the latter opening up extensibility to CLI-based pipeline steps (dbt, meltano, wget) or other language runtimes like Java/Scala/C++).
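To make the pipeline abstraction described above concrete, here is a minimal sketch in Python of "a bunch of subtasks that depend on each other" being run in dependency order. It is an illustration only and does not use Orchest's actual API: the function `run_pipeline`, the step names, and the `dependencies` mapping are all hypothetical, and the ordering relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run every step after the steps it depends on.

    steps:        hypothetical mapping of step name -> callable
    dependencies: hypothetical mapping of step name -> list of upstream step names
    Returns the execution order and each step's result.
    """
    # static_order() yields each step only after all of its upstream steps.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Pass the outputs of upstream steps as positional inputs.
        inputs = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = steps[name](*inputs)
    return order, results


# A toy three-step pipeline: load -> clean -> report.
steps = {
    "load": lambda: [3, 1, 2],
    "clean": lambda rows: sorted(rows),
    "report": lambda rows: f"{len(rows)} rows, max={rows[-1]}",
}
dependencies = {"load": [], "clean": ["load"], "report": ["clean"]}

order, results = run_pipeline(steps, dependencies)
print(order)
print(results["report"])
```

A real orchestrator adds scheduling, isolation of each step in its own container, and failure handling on top of this ordering logic; the sketch covers only the dependency-resolution idea.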

It is evident that open source is an important part of Orchest. Would you like to share more about your vision?

We believe tools can only succeed if they deeply align with the needs of their end-users. Open source is a way for our team to build a tight feedback loop with those end users. In addition, we like to say “proof is in the pudding” and let people experience a product and evaluate whether it’s for them in a non-freedom-restricting way. Orchest has an Open Core business model and that Open Core version is available on GitHub under an OSI-approved open source license. That makes it super easy for data practitioners to try out the product.

How did the recent global events affect your field of work? Were there any new challenges you had to adapt to?

Remote work became the new norm. That helped us, as we were already planning to be a remote company before the pandemic started. We like to enable smart, independent, and self-motivated people to work in a way that works best for them. It never felt right that a group of highly talented individuals was being left out of interesting jobs & companies because they weren’t willing or able to move to one of the large tech hubs. Companies like Basecamp and GitLab have shown how successful this model can be. We now have 2 years of experience working remotely and everyone on the team is loving it. We’re never going back to the office. Something that we might experiment with is access to flexible co-working spaces for people to meet up occasionally just to be able to have a healthy routine of leaving the house every once in a while.

Besides data science solutions, what other technologies do you think would greatly enhance business operations?

Blockchain and NFTs. Haha no just kidding. We see a lot of potential in the continued abstraction of low-level concerns. I think we’ll see more serverless-type projects (preferably OSS)/cloud services that boost developer productivity. As problems become more well understood (e.g. databases and ORM systems) they can be more easily abstracted away. No need to reinvent the wheel, so let’s all work on building blocks that free up precious brain cycles to focus our work on the novel/differentiated bit of the products/software we’re developing.

What are some of the worst mistakes companies make when handling large amounts of data?

A lack of organization. No documentation, bad naming, lack of versioning (both code & data), messy databases, no cleanup after failed/incomplete data loading, etc. The type of person I would hire to work on big data/modern data processing systems is one who can carefully plan and lay out projects before initiating them and who has a tendency to clean up after themselves. Write that README, clean up those magic values, get rid of the test files/tables, abstract out the copied code into a library/utility function, etc. What’s so different about data engineering vs software engineering is that there’s so much STATE. There are so many things (files, artifacts, databases, (message), code, services, credentials) to keep track of that keeping track becomes one of the dominant jobs to be done.

In your opinion, why do certain companies hesitate to implement new and innovative solutions, despite all the technological advancements available nowadays?

It’s still difficult to find talented people and those people can be expensive to hire. Having some level of understanding of how that hire will contribute at least their cost to the bottom line of the company is key in allocating budget to start initiatives that make use of more of the innovation that has become available over the past years/decades. There can also be a bit of an analysis paralysis effect where there are just too many options available so it’s hard to figure out where to start. My advice would be to start with the most pressing problems your customers have brought to you and start as small as you can. That usually sets enough things in motion to start learning and iterating. Before you know it you’re Kafka deep, but as long as you keep focussing on “how is this providing value to our customers” you’re probably in the clear.

What kind of threats do you think organizations should be prepared to tackle in the next few years? What security measures are essential in combating these threats?

From a practical perspective, the WFH trend has introduced security challenges. Oftentimes, humans are still the weak link in a company’s security perimeter. Making sure the basics are covered (2FA, disk encryption, principle of least access, up-to-date software) and that security is top of mind for everyone at the company can go a long way.

Tell us, what’s next for Orchest?

We’ve grown to 10 people over the past 6 months (coming from 4) and with the addition of an amazing bunch of product people (design, eng.) we’ll be able to develop much faster than ever before. Keep an eye on https://github.com/orchest/orchest as we continue to improve on feature breadth, robustness, usability, and scalability of Orchest as a data orchestration platform. People who are using Orchest today are already starting to wonder “what were we thinking, hand-rolling our own orchestration platform”. That’s the kind of feedback we hope to get more of as we keep investing in the quality of Orchest.
