Universal Digital Library
"Teamwork divides the task and doubles the success"; this is the essence of this project.
The idea of the Universal Digital Library (UDL) project
Initiated by Carnegie Mellon University and previously known as the Million Book Project, this project aims to convert all books into digital format, in partnership with scanning centers around the world, in order to create a Universal Digital Library (UDL) that fosters creativity and free access to human knowledge. The project also aims to provide a test bed for research on improved scanning techniques, Optical Character Recognition (OCR), intelligent indexing, machine translation, and information retrieval.
Creating partnerships
One of the key activities is to work with libraries, universities and institutions worldwide that can adopt this model of exchange and/or donate some of their collections, either in digital form or by sending them for digitization and receiving them back. This includes books, journals, theses and research reports.
Bibliotheca Alexandrina (BA) and its partners (including China, India and the USA) have been working to demonstrate the project’s feasibility by digitizing one million books within three years and publishing them as a searchable collection on the Internet. By November 2007, the 1.5 million mark had already been passed. The collection has been published and is available at www.ulib.org. All partners contribute content to ensure that the collection is extensive, diverse and multilingual. The collection was assembled by swapping the digitized books produced by the different partners. This method not only allows the partner countries to share resources and divide the work among them, but also means that each partner holds a local mirror site of the million digitized books, which guarantees fast access, reliability and availability.
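The swapping model described above can be illustrated with a minimal sketch. It assumes each partner's scans are identified by opaque book IDs; the function name, partner labels and IDs below are hypothetical and are not taken from the project's actual systems.

```python
def swap_collections(holdings: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return what each partner holds after all partners exchange their scans.

    `holdings` maps a partner label to the set of book IDs it digitized itself.
    After the swap, every partner holds a full local mirror (the union).
    """
    combined = set().union(*holdings.values())
    # Give each partner its own copy so the mirrors stay independent.
    return {partner: set(combined) for partner in holdings}


# Hypothetical example data: three partners, each scanning different books.
scanned = {
    "partner_a": {"ar-0001", "ar-0002"},
    "partner_b": {"zh-0001"},
    "partner_c": {"hi-0001", "en-0001"},
}
mirrors = swap_collections(scanned)
```

Each partner scans only its own share, yet every mirror ends up with the complete collection, which is what makes local access fast and the collection resilient to the loss of any single site.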
BA's role
BA is taking the lead in scanning and digitizing Arabic books in particular. The collection currently contains more than 170,000 digitized and processed Arabic books.
BA has also designed and implemented a database for the books, their metadata and their digitization status, and has set standards for the digitization process in order to improve the quality of the scanning, processing, and OCR phases. The complete workflow cycle for producing digital books has been automated and integrated with the Library Information System. The workflow is managed by DAF, a digitization workflow management system developed in-house.
The database was further expanded into a Digital Assets Repository (DAR) that accommodates various other types of digitized material, including slides in multiple formats, negatives, books, manuscripts, pictures and maps, audio and video.
Researching improved techniques
Research was carried out in cooperation with Arabic OCR producers in order to achieve efficient, high-quality recognition for mass OCR production. Alternatives to Sakhr's Automatic Reader that are capable of recognizing Arabic text were investigated. A modified strategy for the OCR phase of digitization is currently being developed. The new tools being investigated include VERUS from NovoDynamics, iRDS SDK from IRIS, and CiyaOCR/ICR from CiyaSoft, as well as OCR research work at the University of Buffalo, which focuses on the recognition of Arabic machine print as well as handwriting. In May 2006, BA and NovoDynamics established a research partnership in order to advance NovoDynamics’ VERUS Professional product through testing and evaluation. Another research agreement was established with Sakhr.
An encoding system for multilingual (including Arabic) image-on-text DjVu and PDF documents has been implemented and evaluated. In addition, a framework for the universal encoding of image-on-text documents has been designed. Previously, 12 OCR fonts were constructed and tested for accuracy; accuracy exceeded 90% for 11 of them. Three additional font groups are currently under construction. Moreover, an in-house digital viewer based on image-on-text technology was implemented for publishing books on the web. The viewer has since been enhanced and now includes searching, streaming (displaying one page at a time so that a book can be read over a slow Internet connection), and extra security features, such as displaying only a specific range of pages or a limited number of pages, and protecting copyright by preventing the user from copying or printing the entire book. The DAR publishing website (http://dar.bibalex.org) features the book viewer, where over 180,000 fully searchable books are now available.
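The viewer's page-at-a-time delivery and page restrictions can be sketched as follows. This is only an illustration under stated assumptions: the class names, fields and the per-session quota are hypothetical and do not describe the actual implementation of the BA viewer.

```python
from dataclasses import dataclass, field


@dataclass
class PageAccessPolicy:
    """Hypothetical per-book viewing restrictions (not BA's actual schema)."""
    first_page: int   # first page the reader may open (1-based)
    last_page: int    # last page the reader may open (inclusive)
    max_pages: int    # how many distinct pages one session may view


@dataclass
class ViewerSession:
    """Tracks which pages a reader has opened in one viewing session."""
    policy: PageAccessPolicy
    viewed: set = field(default_factory=set)

    def request_page(self, page: int) -> bool:
        """Return True if this single page may be streamed to the reader."""
        if not (self.policy.first_page <= page <= self.policy.last_page):
            return False              # outside the permitted page range
        if page in self.viewed:
            return True               # re-reading a page uses no extra quota
        if len(self.viewed) >= self.policy.max_pages:
            return False              # session page quota exhausted
        self.viewed.add(page)
        return True


# Assumed example: pages 10 to 20 are viewable, at most 2 distinct pages.
session = ViewerSession(PageAccessPolicy(first_page=10, last_page=20, max_pages=2))
results = [session.request_page(p) for p in (10, 9, 11, 12, 10)]
```

Because every request is for a single page, the server never has to send the whole book, and the same check that enables streaming is also the natural place to enforce range and quota limits.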
International collaboration
BA participated with the Million Book Project in the World Summit on the Information Society (WSIS), which took place in Tunis from 16 to 18 November 2005. Furthermore, BA held the 2nd International Conference on Universal Digital Library (ICUDL2006) from 17 to 19 November 2006; the conference was followed by the Million Book annual workshop. The main theme of the conference was “Towards building the globally owned Universal Digital Library where human knowledge is equally preserved and accessed”. The conference provided a forum for library and IT professionals to exchange views on recent developments and progress in digital library technology. An academic paper entitled "The Million Book Project at Bibliotheca Alexandrina" was presented during the conference and was also selected for publication in the Journal of Zhejiang University SCIENCE.
 
Partners & participants 
- Carnegie Mellon University, USA
- Internet Archive, USA
- Beijing University, China
- Chinese Academy of Science, China
- Fudan University, China
- Chinese Ministry of Education, China
- Nanjing University, China
- State Planning Commission of China
- Tsinghua University, China
- Zhejiang University, China
- Indian Institute of Science, Bangalore
- International Institute of Information Technology
- Indian Institute of Information Technology
- Anna University, Chennai
- Mysore University, Mysore
- University of Pune, Pune
- Goa University, Goa
- Tirumala Tirupati Devasthanams, Tirupathi
- Shanmugha Arts, Science, Technology & Research Academy, Tanjore
- Arulmigu Kalasalingam College of Engineering, Srivilliputhur
- Maharashtra Industrial Development Corporation, Mumbai
- Bibliotheca Alexandrina – ISIS

Last updated on 28 Feb 2011