Kangaroo LLM Begins Web Crawl for Australia's 1st Open AI Model

The Kangaroo LLM project has launched an extensive web crawling initiative to create Australia's first open-source artificial intelligence model. The project's custom web crawler, Kangaroo Bot, will begin collecting data from 754,000 Australian websites starting September 25th, 2024, to build the VegeMighty dataset. This dataset aims to capture a comprehensive corpus of Australian English content, reflecting the country's language and culture.

With over 4.2 million registered domains in Australia, this initiative represents a significant step towards developing an AI model that understands and represents Australian language and culture. The project emphasizes ethical data collection, transparency, and data sovereignty, ensuring compliance with national regulations.

Vinod Bijlani, AI Practice Leader at Hewlett Packard Enterprise (HPE) and a key partner in the Kangaroo LLM consortium, highlights the importance of this initiative for Australia's AI journey. The consortium, which includes industry leaders such as Katonic, RackCorp, NextDC, Hitachi Vantara, and HPE, views this effort as a crucial step towards establishing Australia as a leader in ethical AI development.

Website owners who wish to opt out of the Kangaroo Bot crawl can do so by adding the following to their robots.txt file:

User-agent: Kangaroo Bot
Disallow: /

The Kangaroo LLM project invites all Australians to participate in this groundbreaking journey, either by allowing their sites to be included in the dataset or by following their progress. This initiative aims to build a foundation for Australia's AI future, capturing the essence of Australian online communication and culture.
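
For site owners who want to confirm that the opt-out rule behaves as intended before publishing it, the following is a minimal sketch using Python's standard-library robots.txt parser. It shows only how that parser reads the two-line rule quoted in this article; how Kangaroo Bot itself interprets robots.txt is not described in the source, and the helper name `is_allowed` and the `example.com.au` URLs are placeholders chosen for illustration.

```python
import urllib.robotparser

# The opt-out rule exactly as quoted in the article.
OPT_OUT_RULES = """\
User-agent: Kangaroo Bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it reflects how Python's urllib.robotparser
    evaluates the rules, not necessarily how any particular crawler does.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs; substitute pages from your own site.
    print(is_allowed(OPT_OUT_RULES, "Kangaroo Bot", "https://example.com.au/"))
    print(is_allowed(OPT_OUT_RULES, "SomeOtherBot", "https://example.com.au/about"))
```

Because the rule names only one user agent, other crawlers are unaffected under this parser. A site that already has a robots.txt file would append the two lines as a separate group rather than replace the existing content.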

Source: miragenews.com
Published on 2024-09-19