Tristan Mayer, Castor: “locating the right data is still a daunting task for many data analysts”

The amount of data is growing exponentially, and keeping track of it all is getting harder for businesses and individuals alike.

Luckily, besides data protection and storage measures like cloud storage, secure VPNs, or antivirus software, there are solutions like Castor that help users navigate through vast amounts of information faster and more efficiently.

To find out more about how one can locate needed data quickly, our team sat down with Tristan Mayer, CEO at Castor – a catalog tool that provides automated documentation, data lineage, and social discovery.

How did the idea of Castor come to life? What has your journey been like so far?

Xavier, Castor’s COO, and I were both Data Scientists when we noticed a lot of companies had a data discovery issue. As we started working, we realized that we were spending more time trying to locate and comprehend data assets than deploying the models we had learned.

We found ourselves struggling with questions like "Where does the data come from?", "Who owns it?", and "What is the meaning of a particular field in my domain?" We realized that data discovery was a major problem plaguing most organizations, and existing solutions were either too technical or lacked user-friendliness, resulting in an adoption problem.

That's when we decided to build Castor, a data catalog that would provide a simple and effective solution to the data discovery problem. We created a prototype and showed it to data leaders, and it resonated with them. The response was so positive that Amaury and Arnaud, data leaders from Payfit and Qonto, decided to leave their companies and join the Castor project.

After this, we were ready to take the next step. We raised a seed round and began to grow our team. Since then, we have achieved tremendous growth, with almost a 10x increase in revenue and an expansion of our team from 17 to 45 people. Recently, we raised $25M in a Series A round, which has allowed us to scale up even further.

Our growth has not been limited to our team size and revenue. We have also expanded our presence globally, with our team spread across five countries, 15 different cities, and two headquarters. This includes our newest office in New York, which will allow us to better serve our clients in the US market.

You take great pride in your discovery & catalog tool. Can you tell us more about its features?

Castor's discovery and catalog tool has several standout features, but if I had to pick the top three, they would be automated documentation, data lineage, and social discovery.

Automated documentation is a critical feature for any modern data cataloging tool. People don’t like documenting data. Writing and updating documentation can be a tedious task, which is why Castor has made the process as painless as possible. Castor connects to the tools that are already in your stack, populating up to 90% of the data automatically. This means you can start using the tool immediately and find value from it in seconds without adding yet another task to your to-do list.

Data lineage is another critical feature of Castor. By providing column-level data lineage, Castor brings transparency to your entire data stack, helping you understand how different data models are connected and where upstream and downstream dependencies lie. With this level of insight, it's easy to avoid breaking production and to ensure data models are working as intended.
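
As a rough illustration of what column-level lineage makes possible, the sketch below walks a small graph of column-to-column edges to list upstream and downstream dependencies. It is a hypothetical example, not Castor's implementation: the `LINEAGE` mapping, the column names, and the `downstream` and `upstream` helpers are all invented for illustration.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
# All names are invented for illustration.
LINEAGE = {
    "raw.orders.amount": ["staging.orders.amount_eur"],
    "raw.orders.currency": ["staging.orders.amount_eur"],
    "staging.orders.amount_eur": ["marts.revenue.daily_total", "marts.finance.mrr"],
    "marts.revenue.daily_total": ["dashboards.sales.revenue_chart"],
}


def downstream(column, lineage=LINEAGE):
    """Return every column that directly or indirectly depends on `column`."""
    seen = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def upstream(column, lineage=LINEAGE):
    """Return every column that `column` directly or indirectly depends on."""
    # Invert the edges, then reuse the same breadth-first walk.
    reverse = {}
    for parent, children in lineage.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(column, reverse)
```

Calling `downstream("raw.orders.currency")` before changing that column would list every model and dashboard that could break, which is the kind of check the paragraph above describes.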

Finally, social discovery is an important functionality that sets Castor apart from other discovery tools. Through features such as query history and identifying the most popular datasets, Castor enables its clients to engage in "social discovery." This means building transparency around what people in the company are doing with the data so that other employees can learn from their best practices. These features allow users to obtain social validation that the data is trustworthy and to leverage the work of their peers. This boosts both trust and efficiency.
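
To make the idea of "most popular datasets" concrete, here is a minimal sketch that ranks tables by how many distinct people queried them. It assumes query history is already available as (user, table) pairs. The `QUERY_HISTORY` sample and the `popular_tables` helper are hypothetical and do not reflect how Castor computes popularity.

```python
from collections import defaultdict

# Hypothetical query history as (user, table) pairs, invented for illustration.
QUERY_HISTORY = [
    ("alice", "marts.revenue.daily_total"),
    ("bob", "marts.revenue.daily_total"),
    ("alice", "marts.revenue.daily_total"),
    ("carol", "staging.orders"),
    ("bob", "marts.finance.mrr"),
    ("carol", "marts.revenue.daily_total"),
]


def popular_tables(history, top=3):
    """Rank tables by the number of distinct users who queried them."""
    users_by_table = defaultdict(set)
    for user, table in history:
        users_by_table[table].add(user)
    # Sort by distinct-user count (descending), then by name for stable ties.
    ranked = sorted(users_by_table.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(table, len(users)) for table, users in ranked[:top]]
```

Counting distinct users rather than raw query volume is a design choice made here so that one heavy user does not make a table look widely trusted.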

What would you consider the main challenges data analysts face nowadays?

Data analysts are facing several challenges in their field, and the most common ones are finding, accessing, and understanding the data.

Unfortunately, locating the right data is still a daunting task for many data analysts because, in many organizations, the data warehouse is chaotic and poorly labeled. When data users cannot locate the right table or dashboard, they tend to rebuild it, leading to a vicious cycle of creating new assets. This, in turn, creates more chaos. This situation makes it difficult for data analysts to find the data and forces them to reinvent the wheel in each new project.

After locating the data, accessing it is another hurdle. The modern-day organization is trying to balance data democratization and regulatory compliance. Organizations aim to make traditional departments such as operations, finance, and marketing more autonomous with data. However, data regulations such as GDPR and CCPA have tightened, leading to a threat of hefty fines if data compliance is not observed. These regulations require data access to be firmly controlled, which makes it challenging for data stewards to grant data access to more people. This often results in data users having to wait for hours to access the necessary data.

The third major challenge that data analysts face revolves around understanding the data. One of the biggest issues is the lack of data documentation, which makes it hard to work out what certain fields mean, where the data comes from, and who the assigned owners are. This undermines the goal of self-serve analytics that everyone is striving for.

How did the pandemic affect your field of work? Were there any new challenges you had to adapt to?

The COVID-19 pandemic has had a significant impact on the way data teams work. Before the pandemic, members of data teams could easily approach each other in the office and ask questions, such as the definition of a specific column. However, with remote work becoming the new norm, this kind of ad-hoc knowledge exchange is no longer as feasible. As a result, data teams have had to find new ways of managing knowledge.

This is where tools like Castor have become increasingly important. With teams working remotely, it’s even more essential to have tools that allow various teams to be autonomous with data. Otherwise, the constant need to ping each other for information can quickly become overwhelming and inefficient.

Another challenge that data teams have faced during the pandemic is the heightened focus on data privacy and security. With more data being collected and shared online, it's crucial to ensure that the data is handled in a secure and responsible manner. Data teams have had to adapt to this by implementing stronger security measures and ensuring that their data collection and analysis processes are in line with privacy regulations.

What are some of the worst mistakes companies make when handling large amounts of data?

Companies make two critical mistakes when handling large amounts of data: compromising on data quality and neglecting to provide the right context for data analysis.

Top-notch data quality is essential for data sharing and making data-driven decisions. Sharing flawed data can lead to poor decisions, financial losses, and reputational damage. In addition, it can erode trust in the data, leading to disinterest in data altogether. Companies should not share data unless it is first-rate.

Second, providing the right context for data analysis is equally important. Data without context is dangerous and worthless. It can lead to issues including:

  1. Inability to find and use data: Without proper documentation, data may be difficult to locate, understand, and use. This can result in data being underused or even ignored altogether, which can hinder business decision-making.
  2. Data quality issues: Poor documentation can lead to data quality issues, such as incomplete or inaccurate data. This can result in errors and inconsistencies that can compromise the reliability of data-driven insights.
  3. Legal and regulatory issues: Inadequate documentation can result in companies being unable to comply with legal and regulatory requirements for data privacy and security. This can lead to fines, legal action, and reputational damage.

Overall, companies should prioritize high-quality data and provide the right context to avoid these critical mistakes.

In your opinion, why do certain companies hesitate to implement new and innovative solutions, despite all the technological advancements available nowadays?

There are a couple of reasons why some companies are hesitant to implement new and innovative solutions. The first reason is that the current economic climate leads companies to be extra cautious with their budget. Many companies are cutting down on spending in different areas. Even big tech giants like Amazon and Microsoft are laying off thousands of workers to recover some costs for their businesses. As a result, some companies are hesitant to invest in new technologies in the name of cost-cutting. However, I believe that investing in the right tools can drive value and benefit the company in the long run, even if there is an upfront cost.

The second reason is the fear of change. Some companies may be comfortable with their current processes and systems, and they may be reluctant to change things, particularly if the change requires significant investment in terms of time, money, and resources. That's why we developed Castor to be extremely intuitive to use and not require any training or upfront time investment.

It is essential to note that embracing new solutions and technologies can improve business efficiency, increase productivity, and drive innovation, leading to a competitive advantage in the marketplace. It is crucial to strike a balance between cutting costs and investing in the right tools that can provide value to the business.

What other company processes do you hope to see automated or enhanced by technology in the next few years?

One area in particular that I am excited to see further progress in is automation in the field of data documentation.

Despite the significant advances that have already been made in this area, there are still many companies that rely on manual methods, such as Excel or Google Sheets, to document their data. At Castor, we are already leveraging technology to automate a large portion of the data documentation and data lineage processes, but I believe that there is room for growth and improvement.

Looking ahead, I hope to see data documentation become fully automated, with technology handling the entire process from start to finish. This would not only reduce the workload and potential for human error but also improve data accessibility across organizations.

Another business process that I hope to see enhanced in the next few years is knowledge management. In today's fast-paced business environment, knowledge is a critical asset, and companies must find better ways to capture, share, and leverage it. One area in which I see significant potential for improvement is the alignment of metrics and KPIs across different departments within organizations.

In many companies today, different departments may calculate KPIs and metrics differently, leading to inconsistencies and contradictions in dashboards and reports. This can create confusion, inefficiencies, and even conflicts within the organization. To address this challenge, I believe that technology can play a critical role in improving knowledge management processes. This is another problem we are trying to solve at Castor.

In this age of ever-evolving technology, what do you think are the key security measures organizations and individuals should implement?

Today, I believe that ensuring strong data security measures is more critical than ever. With increasing concerns around data privacy and cybersecurity, organizations must take steps to protect their sensitive data to ensure compliance with regulations such as GDPR and CCPA. We have written an article on the topic.

One of the most important steps that organizations can take is to document their data assets and implement strong access controls. This includes identifying all data sources, classifying data based on sensitivity, and restricting access to only authorized users. Additionally, it is essential to monitor access and activity and implement multi-factor authentication to further strengthen security. Castor can help a lot when it comes to implementing access controls.
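
The steps in this paragraph (classify data by sensitivity, restrict access to authorized users, and monitor access) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Castor's access-control model: the sensitivity levels, the role names, the asset names, and the `can_access` helper are all hypothetical.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical classification of data assets by sensitivity.
CLASSIFICATION = {
    "docs.product_catalog": Sensitivity.PUBLIC,
    "marts.revenue.daily_total": Sensitivity.INTERNAL,
    "raw.customers.email": Sensitivity.CONFIDENTIAL,
}

# Hypothetical maximum sensitivity each role is cleared to read.
CLEARANCE = {
    "analyst": Sensitivity.INTERNAL,
    "data_steward": Sensitivity.CONFIDENTIAL,
}

# Every decision is recorded so that access can be monitored later.
ACCESS_LOG = []


def can_access(role, asset):
    """Allow access only if the role's clearance covers the asset's sensitivity."""
    # Unclassified assets are treated as confidential; unknown roles get
    # public-only clearance, so the default is to deny.
    level = CLASSIFICATION.get(asset, Sensitivity.CONFIDENTIAL)
    clearance = CLEARANCE.get(role, Sensitivity.PUBLIC)
    allowed = clearance >= level
    ACCESS_LOG.append((role, asset, allowed))
    return allowed
```

Defaulting unclassified assets to the most restrictive level means that gaps in documentation lead to a denied request rather than an accidental exposure.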

Another important security measure is to engage in data minimization, which involves only collecting and storing the minimum amount of data necessary for business purposes. This reduces the risk of a data breach and helps organizations comply with regulations around data retention and disposal.

Finally, encryption is also a key security measure that organizations and individuals should implement. This involves encrypting data at rest and in transit to protect it from unauthorized access, particularly in cloud environments.

Share with us, what’s next for Castor?

At Castor, we're on a mission to revolutionize the way our clients interact with their data: it's all about discovery, community, and health. Our product roadmap is jam-packed with exciting new features that are going to take things to the next level.

Moving forward, we're focused on helping business users unlock the full potential of Castor, while also supporting data governance efforts to enable autonomous decision-making. We want to empower business teams with data, and we're not slowing down anytime soon.

So, if you're ready to take your data experience to the next level, follow us to stay up-to-date on our exciting plans for the future. At Castor, we're confident that we're well-positioned to lead the charge in this exciting new era of data management.