Constellation Network and Common Crawl Foundation are revolutionizing web data accessibility and AI development through blockchain technology
SAN FRANCISCO, Oct. 24, 2024 /PRNewswire/ -- The Common Crawl Foundation, a nonprofit organization founded in 2007 and dedicated to providing the public with a copy of the Internet, and Constellation Network, a Web3 blockchain ecosystem known for its work with the U.S. Department of Defense, today announced a strategic partnership aimed at democratizing and enhancing the accessibility and utility of web crawl data by applying blockchain technology to artificial intelligence (AI) and data applications.
The collaboration begins with Common Crawl’s massive dataset, which is used in 80% of large language models (LLMs) and to date spans over 250 billion crawled web pages (19 billion in 2024 alone), or approximately 9 petabytes of archived crawl data. The partners will first explore potential opportunities to improve large language models. By leveraging Constellation’s decentralized network, Hypergraph, to add data immutability, provenance, and auditability, they will work together to deliver accountable, transparent, AI-centric solutions.
AI is predicted to become a $30 trillion industry by 2030. That growth brings opportunities to share the common datasets used to train large language models, to improve the querying and storage of cleaned data, and to monetize data, along with demand for secure solutions that make data sources more transparent. With Constellation’s unique approach to providing tools that integrate existing infrastructure with decentralized networks, and Common Crawl’s data history and growing data utility, the partners will work together to further democratize data.
Rich Skrenta, executive director of the Common Crawl Foundation, said: “By combining our comprehensive web archive with Constellation’s proven implementation of blockchain technology, researchers and developers around the world can trust what they get from Common Crawl and use it for AI training. We now have a model for authenticating large open data sets.”
Ben Jorgensen, CEO of Constellation Network, said: “The partnership between Constellation Network and Common Crawl highlights the mainstream adoption of Web3 solutions outside of the crypto echo chamber. Our future-focused mission continues.” Jorgensen continued, “Our aim is to attract more new developers by introducing features such as integrating immutability across digital workflows, thereby differentiating ourselves from previous generations of blockchain technology and taking things even further.”
The organizations will take a phased approach to implementing this effort, starting with a customizable subnet, called a metagraph, that integrates a subset of Common Crawl’s data. This subnet is currently live on a test network and will soon be deployed to Constellation’s public network, Hypergraph. More information about the live metagraph will be shared in the coming weeks, along with information on how organizations and developers can get involved.
See below for more information.
About the Common Crawl Foundation
The Common Crawl Foundation is a 501(c)(3) nonprofit organization dedicated to providing a free copy of the Internet to the public. Its web archive consists of petabytes of data collected over years of web crawling and serves as a vital resource for researchers, businesses, and developers around the world.
About Constellation Network
Constellation Network is a Web3 blockchain ecosystem that bridges the crypto economy and traditional business. Its flagship network, Hypergraph, provides a solution for fast, scalable, zero-fee transactions. Constellation’s network has been validated by its customer, the U.S. Department of Defense, since 2019.
Note: This press release contains forward-looking statements. Actual results may differ materially from expectations.
Logo – https://mma.prnewswire.com/media/2537537/ConstellationxCommonCrawl_Logo.jpg