How open source influences AI development

The Linux Foundation is stepping up its efforts to create a more open environment for developing artificial intelligence (AI) applications and to facilitate enterprise adoption of the technology through open source.

“In open source, there are certain areas where there are opportunities to benefit from collective development – and it’s not just about source code,” said Jim Zemlin, executive director of the Linux Foundation, at the KubeCon + CloudNativeCon + Open Source Summit in Hong Kong this week.

One of the most important projects that the Linux Foundation has advanced this year is the Open Platform for Enterprise AI (Opea), a framework designed to make it easier for companies to deploy and manage AI applications.

Opea, Zemlin explained, was developed as the “Kubernetes of AI” within the enterprise, providing companies with a standardized, open-source platform to build and deploy their AI models more efficiently.

“If we all start using the Opea framework, it will be much easier for you and everyone else to improve this platform and quickly get to what you want, which is the actual AI application in your company.”

Zemlin also discussed the Linux Foundation’s work on the Unified Acceleration (UXL) Foundation, an effort by semiconductor manufacturers and industry players like Google Cloud to create a common hardware abstraction layer for AI workloads.

“Nvidia’s Cuda is the de facto standard for accelerated workloads around AI, but we see an opportunity for open, abstracted APIs (application programming interfaces) that can work across multiple silicon architectures,” he said. “This will drive competition and make it easier for developers to build tools for a variety of hardware.”

AI security

On AI security, an area where open source is well suited given the transparency of open source development, Zemlin pointed to the Coalition for Content Provenance and Authenticity (C2PA), a project focused on ensuring the authenticity of digital content.

“In the world of generative AI, ensuring the authenticity of content will be paramount,” he said. “C2PA provides a digital, immutable watermarking technology that can track content from creator to publisher, allowing us to identify what is genuine and what is generated by AI.”
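
At its core, this kind of provenance relies on cryptographically signing a claim about a piece of content so that tampering can be detected downstream. The sketch below illustrates that idea only; it uses Ed25519 signatures from Python's third-party cryptography package, and the claim fields and key handling are simplified assumptions rather than the actual C2PA manifest format.

```python
# Toy illustration of content provenance signing (NOT the real C2PA manifest format).
# Assumes the third-party "cryptography" package is installed.
import hashlib
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def make_claim(asset_bytes: bytes, creator: str, tool: str) -> bytes:
    """Build a provenance claim that binds metadata to a hash of the asset."""
    claim = {
        "asset_sha256": hashlib.sha256(asset_bytes).hexdigest(),
        "creator": creator,
        "generator_tool": tool,  # e.g. a camera app or a generative-AI model
    }
    return json.dumps(claim, sort_keys=True).encode()

# The creator signs the claim with a private key; claim + signature travel with the asset.
private_key = Ed25519PrivateKey.generate()
asset = b"...image or video bytes..."
claim = make_claim(asset, creator="newsroom@example.org", tool="camera-app/1.0")
signature = private_key.sign(claim)

# A downstream consumer verifies the signature with the creator's public key and
# re-hashes the asset to confirm the claim still matches the content it describes.
public_key = private_key.public_key()
try:
    public_key.verify(signature, claim)
    matches = json.loads(claim)["asset_sha256"] == hashlib.sha256(asset).hexdigest()
    print("signature valid, asset hash matches:", matches)
except InvalidSignature:
    print("claim has been tampered with")
```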

Zemlin also highlighted the Model Openness Framework, which is designed to help companies evaluate and categorize the level of openness of different AI models, including those considered open source.

“Because there are so many moving parts in producing and deploying large language models, we developed a scoring system to help organizations understand which components are open and included in a model and which are not,” he said.

The framework defines three levels of openness for an AI model, starting with level one, the highest level of openness, where all data, instructions, and components used to build the model are openly available.

The second level is “open tooling”, where most of the tools and infrastructure are available as open source, followed by level three, where the data itself is not open but data cards describing the data sets are available.

Zemlin emphasized the importance of this risk-based, differentiated approach, as it takes into account the complexity of the AI ecosystem, where openness is not binary but rather a spectrum across different components.
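
To make the idea of such a scoring system concrete, the sketch below maps a model release to one of the three levels based on which artifacts are openly available. The component names and rules are hypothetical and do not reproduce the framework's actual checklist.

```python
# Hypothetical sketch of a Model Openness Framework-style classification.
# Component names and thresholds are illustrative, not the official checklist.
from dataclasses import dataclass

@dataclass
class ModelRelease:
    weights_open: bool          # model parameters released under an open licence
    training_code_open: bool    # training and evaluation code released
    training_data_open: bool    # raw training data released
    data_cards_available: bool  # descriptive data cards published instead of raw data

def openness_level(release: ModelRelease) -> str:
    """Map which components are open to one of three coarse levels."""
    if release.weights_open and release.training_code_open and release.training_data_open:
        return "Level 1: fully open (data, instructions and components available)"
    if release.weights_open and release.training_code_open:
        return "Level 2: open tooling (most tools and infrastructure open)"
    if release.data_cards_available:
        return "Level 3: data not open, but data cards describe the data sets"
    return "Unclassified: insufficient openness information"

print(openness_level(ModelRelease(True, True, False, True)))    # Level 2
print(openness_level(ModelRelease(False, False, False, True)))  # Level 3
```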

To support model production, he pointed to the LF AI and Data Foundation, which hosts a number of projects providing open-source implementations for every aspect of machine learning and model production in popular programming languages. These projects include Delta Lake, an open-source storage framework for data lakehouses, and Monocle, an observability framework for AI applications.
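
As an example of what such an open-source implementation looks like in practice, the sketch below does a minimal Delta Lake round trip, assuming the deltalake Python package (the delta-rs bindings) and pandas are installed; the table path is illustrative.

```python
# Minimal Delta Lake round trip using the deltalake (delta-rs) Python bindings.
# Assumes: pip install deltalake pandas
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Write a small dataframe as a Delta table (path is illustrative).
df = pd.DataFrame({"id": [1, 2, 3], "label": ["a", "b", "c"]})
write_deltalake("./events_table", df, mode="overwrite")

# Append another batch; Delta records each change as a new table version.
write_deltalake("./events_table", pd.DataFrame({"id": [4], "label": ["d"]}), mode="append")

# Read the table back and inspect its current version.
dt = DeltaTable("./events_table")
print(dt.version())    # current version number
print(dt.to_pandas())  # full table contents
```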

Zemlin also discussed the Linux Foundation’s open data efforts, such as the Overture Maps project, a collaboration between Amazon, Microsoft, TomTom and Meta to create the world’s largest shared geospatial dataset.

“You see a whole new world where data is now being sold to train large language models, and it’s only available to the people who have the most money to buy that data,” he said.

“There is an opportunity for us to have an open alternative to closed data, but that requires resources and a very targeted approach. Our Overture Maps project is a first example of how this can be achieved.”
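
Overture distributes its releases as Parquet files, so the dataset can be queried with ordinary open-source tools. The sketch below assumes a Parquet extract has already been downloaded from an Overture release to a local file (the filename is hypothetical) and queries it with DuckDB from Python.

```python
# Querying a locally downloaded Overture Maps Parquet extract with DuckDB.
# Assumes: pip install duckdb, and that "overture_places.parquet" (hypothetical
# filename) has been downloaded from an Overture Maps release beforehand.
import duckdb

con = duckdb.connect()

# Count the rows in the extract and preview a few records.
print(con.sql("SELECT count(*) FROM read_parquet('overture_places.parquet')").fetchone())
print(con.sql("SELECT * FROM read_parquet('overture_places.parquet') LIMIT 5").df())
```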
