
AI datasets by IIT-Bombay to simplify Indian texts, help in AI research
MUMBAI: For years, research into Indian knowledge systems was challenging because the source texts are often available in Indian languages such as Sanskrit. A data curation exercise carried out by the premier IIT-Bombay, as part of its contribution to the central government's AIKosh portal, has eased this to some extent by digitising 30 textbooks.
A dataset of around 2.18 lakh (218,000) sentences and 1.5 million words from these textbooks, which cover topics as diverse as astronomy, medicine, and mathematics and include some as much as 18 centuries old, is now available on the government portal.
AIKosh, launched in March, is a repository of datasets, models, toolkits, and more, drawn from diverse sources, that aims to support AI-based innovation and research. IIT-Bombay, one of the leading contributors to the platform, and BharatGen, a consortium of seven institutes also led by IIT-Bombay, have together contributed 37 models and datasets to the portal so far.
IIT-Bombay alone launched around 16 culturally significant datasets on the platform to contribute to the country's AI mission.
BharatGen, funded through a section 8 company formed by the Department of Science and Technology with IIT-Bombay, IIT-Kanpur, IIT-Madras, IIT-Hyderabad, IIT-Mandi, IIM-Indore, and IIIT-Hyderabad as partners, launched 21 models on the portal.
'We are not only researching Large Language Models (LLMs) and other generative models for AI that are effective and data- and compute-efficient, but also building sovereign models for India from the ground up.
We are creating datasets for training these models and fine-tuning them for downstream tasks such as conversation and question-answering, while creating benchmarking datasets towards calibrating the performance of these models,' said Prof Ganesh Ramakrishnan from IIT-Bombay, who is spearheading the project.
The team has put out not only datasets relevant to Indian knowledge systems but also others that can help in audio-visual learning, such as tutorials capturing practical skills like waste-to-toy creation or organic farming.
Others include a dataset on Sanskrit translation for contemporary prose, a dataset of math word problems in Hindi and English meant to train AI in mathematical reasoning, and culturally grounded multilingual question-answering datasets, including questions and answers from historian Dharampal's books.
Another dataset enables AI to answer questions about images using external knowledge, and yet another covers recognising text in videos with camera movement.
Most of these models are trained from scratch, not just fine-tuned, said Prof Ramakrishnan. The models also uniquely balance Indian data alongside English data, ensuring relevance to India, he said. 'We are creating benchmarks for the AI ecosystem in the country, but these can be pulled out by researchers, enterprisers, companies, or even academia and developed further,' he added.