Modern businesses rely on data, but harnessing it effectively is often a complex process. Companies must navigate numerous steps to maximize the value of data from various sources. As data volumes continue to soar, Expanso, headquartered in Seattle, aims to revolutionize data management with distributed processing. The company recently secured $7.5 million in seed funding, led by General Catalyst and Hetz Ventures, to further develop its data processing platform ‘Bacalhau’ and extend its reach to more enterprise users.
Expanso’s vision is to meet data where it resides, even when it is distributed around the globe, and to eliminate the need for extensive data transfers and centralization in cloud platforms. David Aronchick, the company’s founder and CEO, says infrastructure that caters to globally distributed workloads is long overdue, and he promises a transformation in big data processing and global compute job execution.
Traditionally, enterprises derive value from large volumes of data by transporting it across networks via intricate ETL pipelines and centralizing it in cloud data platforms. While effective for BI/AI applications, this approach consumes considerable time and money. Aronchick recognized the challenges of globally distributed workloads during his career and saw a need for a better solution.
To address this challenge, he initiated a project to run compute jobs locally, where the data is stored, and that project evolved into Expanso. Launched in February 2022, it rapidly gained momentum and led to the release of the open-source project Bacalhau. Bacalhau runs on existing or planned distributed systems and schedules computing tasks directly where the data is stored. Users only need to install a Bacalhau agent on their machines and connect to a public or private cloud network. Because Bacalhau supports formats such as Docker and WASM, existing workloads need little rewriting, which keeps workflows simple and efficient.
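The idea of scheduling tasks where the data lives can be illustrated with a small toy model. The sketch below is not Bacalhau's API; the `Node` class, the `schedule` function, and the sample node names and datasets are all hypothetical, chosen only to show that the task travels to each node holding the data and only a small result travels back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A hypothetical machine that stores some datasets locally."""
    name: str
    datasets: Dict[str, List[int]] = field(default_factory=dict)

    def run(self, dataset: str, task: Callable[[List[int]], int]) -> int:
        # The task executes next to the data; only its result leaves the node.
        return task(self.datasets[dataset])


def schedule(nodes: List[Node], dataset: str,
             task: Callable[[List[int]], int]) -> Dict[str, int]:
    """Run `task` on every node that holds `dataset` and collect the results."""
    return {n.name: n.run(dataset, task) for n in nodes if dataset in n.datasets}


# Illustrative data only: three nodes, two of which hold a "logs" dataset.
nodes = [
    Node("seattle", {"logs": [3, 1, 4]}),
    Node("frankfurt", {"logs": [1, 5, 9, 2]}),
    Node("tokyo", {"metrics": [6, 5]}),
]

results = schedule(nodes, "logs", sum)
print(results)  # {'seattle': 8, 'frankfurt': 17}
```

In this toy model the raw lists never leave their nodes; the caller receives one number per node, which is the contrast with an ETL pipeline that would first copy every list to a central location.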
With Bacalhau, teams can instantly analyze local data using lightweight Bacalhau nodes alongside their infrastructure. This reduces operational overhead, leverages idle edge computing resources, enhances security, speeds up processing, and reduces the risk of regulatory fines.
Bacalhau’s capabilities include data sanitization, processing application logs at the source, distributed ML training, processing files across distributed storage and regions, and managing distributed device fleets. Since its public demo launch, Bacalhau has executed over 2 million jobs across various use cases, and the team has collaborated with notable organizations such as the U.S. Navy, CalTech, University of Maryland, Prelinger Labs, WeatherXM, and others.
Expanso aims to expand Bacalhau’s support for additional enterprise use cases, address customer needs, and grow its user base. The Bacalhau CLI is currently downloaded more than 50,000 times per month.