Data is rapidly becoming one of the most valuable business assets, but not every organization is equipped to use it to its full potential.
While implementing security tools like VPNs or antivirus software has been a top concern for companies over the last couple of years, the demand for data analysis solutions has grown as well. With data volumes expanding so quickly, innovative and powerful tools are needed to navigate this complex environment.
To discuss the innovations and challenges in the data science field, we sat down with Calvin Hung, Co-founder & CEO of WASAI – a company providing solutions that accelerate big data analysis.
What has your journey been like since your launch in 2015? How did the idea of WASAI originate?
WASAI has focused on delivering acceleration technologies for high-performance data analysis in modern data centers, targeting vertical applications with massive volumes and high velocities of scientific data. Data is the foundation for all research in the scientific and industrial fields and serves as the fundamental basis for intelligence. In this era of mega data, data cleaning, information retrieval, knowledge extraction, information processing, and insight formation have become so critical that timely delivery is the central issue.

At the beginning of WASAI, we noticed that applications for different data types were becoming far more complicated as the need for data computing rose, and that data processing was running into bottlenecks such as insufficient memory bandwidth or inadequate computing power. As Taiwan is the most important IC-design hub in the Asia-Pacific region, our team naturally started with members experienced in both hardware and software design. Rather than relying on traditional computing units such as CPUs or GPUs, we developed an acceleration solution that combines optimized algorithms for mega-data computation with FPGA-based programmable digital circuit design. The result is a faster, more efficient, and lower-cost data acceleration solution for large-scale data processing needs.
Can you introduce us to your platform? What technology do you use to accelerate data?
WASAI has two accelerated solutions: the WASAI-Tachyon™ platform for big data analysis and the WASAI-Lightning™ platform for genomics analysis. The WASAI-Tachyon™ accelerated solution is built around the most popular big data analysis tools, the Apache Hadoop™ and Apache Spark™ systems.
A Field-Programmable Gate Array (FPGA) is a silicon chip that can be reconfigured in the field, letting it modify its algorithms in real time and behave like a specialized IC tuned to compute a given algorithm optimally. By leveraging the flexibility and computing power of FPGAs, our acceleration platforms can build the most suitable acceleration engine for heavy-workload applications such as smart factories and smart cities. This gives those applications a 2x to 6x end-to-end boost in data processing performance and a 30% to 200% improvement in power efficiency, reducing the total cost of ownership (TCO) of data center operations.
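As an editorial aside, a back-of-envelope model shows how speedup and power-efficiency figures in those ranges translate into runtime, compute cost, and energy for a batch workload. The numbers below are hypothetical illustrations, not WASAI benchmarks:

```python
# Back-of-envelope model (hypothetical numbers, not vendor benchmarks) of
# what an end-to-end speedup and a performance-per-watt gain mean for a job.

def accelerated(baseline_hours, hourly_cost, speedup, efficiency_gain):
    """Return (runtime_hours, compute_cost, relative_energy).

    speedup         -- end-to-end performance multiplier (e.g. 2.0 to 6.0)
    efficiency_gain -- fractional gain in performance per watt (e.g. 0.3 to 2.0)
    """
    runtime = baseline_hours / speedup
    # (1 + gain)x the performance per watt means the same work consumes
    # roughly 1 / (1 + gain) of the baseline energy.
    return runtime, hourly_cost * runtime, 1.0 / (1.0 + efficiency_gain)

# A 12-hour job at $4/node-hour with a 4x speedup and a 100% efficiency gain:
hours, cost, energy = accelerated(12.0, 4.0, 4.0, 1.0)
print(hours, cost, energy)  # 3.0 hours, $12.00, 0.5x baseline energy
```

The model ignores the accelerator's purchase price, which is why TCO arguments also weigh hardware cost against the saved machine-hours and energy.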
The need for high-performance computing in the life science field has grown explosively with the development of next-generation sequencing. WASAI provides an FPGA-based accelerated appliance for genomics analysis built on the de facto standard pipeline, the Genome Analysis Toolkit (GATK), integrated on 2U rack-mount servers with all the required software and hardware. It boosts the speed of whole-genome 30x analysis by up to 11x compared with the original platform. With high accuracy and consistency, the best cost-performance ratio, and an automation utility that makes it easy to use, whole-genome sequencing (WGS) 30x data analysis takes our users only 3 hours. The WASAI-Lightning™ not only saves time for users but also maintains an excellent cost structure without compromising accuracy or consistency. Moreover, it requires less hardware and software operation, reducing the workforce needed to maintain and run the system.
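For readers unfamiliar with GATK, the germline short-variant workflow it standardizes runs roughly the stages sketched below. File names are illustrative placeholders, and this is a generic outline of the public GATK tools, not WASAI's actual pipeline configuration:

```python
# Sketch of the standard GATK germline short-variant stages for one WGS
# sample. Paths and sample names are placeholders for illustration only.
REF, S = "ref.fasta", "sample"

STAGES = [
    ("align",      f"bwa mem {REF} {S}_R1.fq.gz {S}_R2.fq.gz | samtools sort -o {S}.bam"),
    ("mark_dups",  f"gatk MarkDuplicates -I {S}.bam -O {S}.md.bam -M {S}.dup_metrics.txt"),
    ("bqsr",       f"gatk BaseRecalibrator -R {REF} -I {S}.md.bam "
                   f"--known-sites known.vcf.gz -O {S}.recal.table"),
    ("apply_bqsr", f"gatk ApplyBQSR -R {REF} -I {S}.md.bam "
                   f"--bqsr-recal-file {S}.recal.table -O {S}.recal.bam"),
    ("call",       f"gatk HaplotypeCaller -R {REF} -I {S}.recal.bam "
                   f"-O {S}.g.vcf.gz -ERC GVCF"),
]

for name, cmd in STAGES:
    print(f"{name:>10}: {cmd}")
```

Each stage is data-parallel over regions of the genome, which is what makes the pipeline amenable to hardware acceleration.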
What are some of the most common issues surrounding big data processing?
With the exponential growth of data across applications, data center users have to find a suitable data analytics solution, on a limited budget and with limited human resources, that can still generate insights from the analysis in time. Big data analytics demands significant capital and operating investment: decision-makers have to weigh the costs of new hardware, new hires (engineers and developers), power efficiency, and heat dissipation. Even when users choose an open-source architecture, they still need considerable time to develop, set up, configure, and maintain the software and hardware. And although cloud-based solutions can reduce hardware costs and maintenance effort, the expense of computing and storage keeps rising as data volumes grow.
Finally, even if a great solution is in place for current data processing, once the data volume grows, users have to worry about scalability and flexibility for new workloads.
How did the recent global events affect your field of work? Were there any new challenges you had to adapt to?
As the COVID-19 pandemic continues to affect the world, big data applications in the life science field have become more crucial. Experts are working hard to apply their knowledge of biology and computer science to the most pressing problems of this public health crisis. For instance, there is an urgent need to develop vaccines, medicines, and healthcare products, to relieve the burden on clinicians diagnosing and treating patients, and to predict disease outcomes. Since life science discoveries require powerful computing resources to analyze patient and viral genomics data, WASAI saw the opportunity to provide high-performance, high-accuracy, lower-cost genomics solutions. We see enormous potential in life science data analysis, so we will continue to deliver new genomics analysis applications that help researchers generate valuable outcomes and make a positive impact on the world.
What are some of the worst mistakes companies make when handling large amounts of data?
Companies usually begin generating and collecting data before they know what to do with it, and they generally start analyzing their big data before they know what they are looking for. As a result, enterprises often have no plan for the analytic software and hardware needed to handle the large amounts of data being collected and analyzed. With the variety of big data analytics platforms available, many enterprises find it easy to stand up an affordable, efficient solution at first. However, as the volume of data produced and processed grows exponentially, many companies end up overwhelmed by data they can no longer cope with. Another big mistake is ignoring data scaling when investing in a new solution to manage, process, and store this overflow of data. Companies must choose a comprehensive analytics platform with the scalability to accommodate rapid growth in their data; otherwise, extra expense and engineering effort will be needed to keep up. Last but not least, failing to quantify the value of and return on investment in big data usually means there is no solid plan behind the solution built for those gigantic chunks of data (or garbage).
Besides data science solutions, what other technologies do you think would greatly enhance business operations?
I would say artificial intelligence, in many ways, though some people may consider it part of, or closely related to, data science. Another is blockchain and digital currencies, which I believe will disrupt business operations such as securities and finance in the coming decades. Gene sequencing and biotech remain some of the most important technologies we care about and are developing in, and hopefully we will make significant progress soon.
What tips would you give to companies looking to get more value out of their data?
Besides avoiding the mistakes we mentioned above, work hard to build data into the soul, the DNA, of your company. Usually that means changing how people work and handle daily routines, adopting new workflows and technologies, and building a culture around data. In doing so, you will review all of your people, workflows, suppliers, and so on, and discover which metrics you need to collect to fill in the unknowns. By breaking the details down further and further and collecting information over an extended period, companies can finally gain insight, form theories, and run experiments to improve bit by bit.
In your opinion, what kinds of threats should organizations be prepared to tackle in the next few years? What security measures are essential in combating these threats?
For companies like us that rely on open-source software, there is a growing stream of vulnerabilities across projects and technologies. The Log4j security vulnerabilities of 2021 are a real example: Log4j is widely adopted in plenty of open-source projects and in much enterprise software built on top of them. Companies like ours should maintain timely awareness of the open-source technologies adopted at every level of the organization, and keep a rapid-response team ready to monitor the situation and take active measures to eliminate such threats.
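One simple piece of such awareness can be automated. The sketch below, an illustration rather than any particular team's tooling, walks a deployment tree looking for bundled log4j-core jars by filename and flags versions older than a patched release (2.17.1 is used here as the illustrative threshold for the 2021 CVEs):

```python
import os
import re
import tempfile

# Versions below this are treated as vulnerable for illustration; the
# December 2021 Log4j CVEs were fully addressed in the 2.17.x line.
PATCHED = (2, 17, 1)
JAR_RE = re.compile(r"log4j-core-(\d+)\.(\d+)\.(\d+)\.jar$")

def find_vulnerable_log4j(root):
    """Return paths of log4j-core jars under root older than PATCHED."""
    hits = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            m = JAR_RE.match(name)
            if m and tuple(map(int, m.groups())) < PATCHED:
                hits.append(os.path.join(dirpath, name))
    return hits

# Demo against a throwaway directory containing two fake jar files:
with tempfile.TemporaryDirectory() as root:
    for jar in ("log4j-core-2.14.1.jar", "log4j-core-2.17.1.jar"):
        open(os.path.join(root, jar), "w").close()
    print(find_vulnerable_log4j(root))  # flags only log4j-core-2.14.1.jar
```

A filename scan like this misses shaded or repackaged jars, which is exactly why a human response team still has to own the inventory.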
Tell us, what’s next for WASAI?
WASAI is shifting its focus from pure data science solutions to healthcare and life-science data solutions. Genomics sequencing is one of the most disruptive biomedical technologies of the 21st century and will shape the future of humanity and society. Precision and personalized medicine depend heavily on personal genomics, artificial intelligence, and decentralized diagnosis and treatment to make the ecosystem evolve. WASAI will continue to build analytics solutions for genetic diagnosis, gene-drug development, and genomic profiling to drive the future of health!