The Next Generation of OSS Software Won’t Be Apache

Scott Hirleman (guest)


Jun 22, 2021

The Apache Software Foundation (ASF) has been a steward of free open-source software (FOSS) for over 15 years. The ASF has overseen many of the top FOSS projects this decade (Hadoop, Spark, Kafka, Cassandra, Mesos, Lucene, Tomcat, Zeppelin, Log4j, Parquet, ZooKeeper, TinkerPop, etc.), but its strict rules will lead startups developing OSS to keep their “babies” out of the hands of the ASF.

The ASF will likely remain the main OSS foundation/license of choice for a large organization that decides to “throw over the wall” a project it has built but isn’t interested in building a large business around (see NetBeans). For example, Apache Cassandra was developed as a prototype inside Facebook. When Facebook realized it could be great (but very much wasn’t yet), they threw it over the wall and Jonathan Ellis took it to the next level; the first JIRA ticket for Cassandra was adding support for deleting data.

As evidenced by its recent clashes with Databricks and (much more severely) DataStax, the ASF is working to establish that the OSS projects under its umbrella are its and its alone. This would be great except for one thing: big companies don’t contribute to early-stage OSS. So, unless a project is thrown over the wall, an OSS project is usually built by one startup plus a few committers.

Look at the top databases according to DB Engines. In the top 25, there are 14 which are Open Source (Cassandra, Couchbase, Elasticsearch, HBase, Hive, MariaDB, Memcached, MongoDB, MySQL, Neo4j, PostgreSQL, Redis, Solr, SQLite); Cassandra, HBase, Hive, and Solr are all Apache projects. The majority of the other OSS databases are developed exclusively or almost exclusively by one company: Couchbase by Couchbase, Elasticsearch by Elastic, MongoDB by MongoDB, and Neo4j by Neo4j. See a pattern?

So, since big companies aren’t contributing to early-stage OSS projects, the burden falls on smaller companies built around their OSS projects. Look at Confluent and the Apache Kafka committers. According to the LinkedIn profiles of the committers, over half of the 17 work for Confluent (for those not aware, Confluent spun out of LinkedIn to build a business around Kafka, which was developed in-house there). So, the early work of building the project AND the brand around the project falls to these startups.

Again, look at the names of the companies that have top OSS databases that are not Apache. Companies building a business around an OSS project controlled by Apache can’t name their company something with the project’s name in it. I’m shocked Apache even allowed Mesosphere to be that close to the Apache Mesos name (it’s a pretty clever name though). So, a startup has to build its own brand as well as the brand around the project for at least the first few years.

Despite the fact that the ASF doesn’t do any marketing (outside of a few conferences), you can see from the ASF going after DataStax and Databricks that once a project becomes popular (Cassandra is the #1 Apache project in DB Engines and Spark is growing gangbusters), the ASF steps in to yell “MINE”. So a company trying to build a business around an Apache project has to constantly walk a fine line: building a company brand, building a project brand, AND associating the two without pissing off a seemingly pedantic organization.

The foundation does little to grow any of its projects outside of providing a website domain and mailing list infrastructure and (presumably) paying for the JIRA. I have never heard of the foundation working to build a larger and more diverse group of contributors or committers (reach out and tell me if this is wrong). So you get things like the ASF saying that too many Cassandra commits come from DataStax people, when the foundation has done nothing to bring on a more diverse group of committers.
