Digitizing disparity in the museum. The object-based research in the Tokyo National Museum

Akira Sakai

In May 2020, UNESCO and ICOM published two reports on how the COVID-19 outbreak is affecting museums around the world. While UNESCO (2020, p. 6) highlights the serious digital divide between museums in different regions, ICOM (2020, pp. 9-13) reports that many museums enhanced their digital activities during the lockdown. According to both reports, most museums rely on previously digitized resources to meet an explosive demand for online access (UNESCO 2020, pp. 15-17; ICOM 2020, p. 10). It is therefore worthwhile for all of us to reassess how far the digitization of our collections has progressed.

The Tokyo National Museum currently houses approximately 120,000 objects and 450,000 digital images, the latter accumulated since we began full-scale digitization of our collection in 1995. Naturally, this does not mean that every object has the same number of digital images. Some ‘popular’ objects need images from new angles every year even though they already have numerous pictures, while others never see the light of day at all. Limited economic and human resources require us to prioritize objects for digitizing, which causes a large imbalance in digitizing progress among the collections.

Although museum staff have clearly felt the digitizing disparity among collections in daily operations, we had never conducted a full-scale investigation of the matter. This paper introduces research carried out from late 2019 to early 2020 to visualize the digitizing disparity at the Tokyo National Museum. The results brought us some relevant findings and a hint for a better digitizing plan for the future.

Research methods

The Tokyo National Museum uses multiple self-developed systems for its collection management. To look into the digitizing progress across our collections, we extracted data from two databases: the Collection Management Database (CMDB) and the Image Management Database (IMDB).

CMDB is an object-based database, similar in structure to a collection catalogue. In addition to physical information about each object, the database keeps records of museum activities such as acquisition data, appraisal records, exhibition and museum loan histories, and repair reports. These records are useful for understanding how much each object contributes to museum activities. CMDB itself is accessible only within the museum, by authorized users. The database plays an integral role in the operation of the museum as the fundamental information store for our collections and activities.

IMDB, on the other hand, is a database dedicated to managing digital images. Its basic information unit is the ‘entry’, a set of digital images of the same subject taken at the same time. Each entry contains data about its digital images: basic subject information, the shooting date, the name of the photographer, the staff observing the shoot, rights, the registration date, the name of the data registrar, the location of the media device and any other special notes. Once digital images are created, we upload them to IMDB. The database assigns a unique ID to each image and connects the ID to the CMDB collection records through the subject information.

For this research, we explored the digital images of our collections using CMDB and IMDB together to answer the following three questions: 1) how large is the imbalance in digitizing progress among our collections, 2) what are the main factors affecting the digitizing disparity, and 3) what can we do to improve future digitization. We began by extracting all the data from the two databases and processing them into a suitable form. In particular, we converted each factor in the data to a countable form. As a result, we were able to list our entire collections with numerical values for volume of information, frequency of contribution to museum activities and number of digital images. We mainly used general comparative analysis and a recurrent algorithm as the measuring method, but we also checked individual cases carefully for exceptions.

Visualizing the digitizing disparity

We automatically extracted 221,538 records, 281 of which contained obvious errors. After removing these, we were left with 221,257 records for the research. This number is nearly twice the number of objects in our collections given at the beginning of this paper. This is because branch numbers can be assigned to the parts of an object, and each part is then handled as a separate record in CMDB.

As the most basic approach to visualizing the disparity in digitizing progress, we compared the number of digital images per object across the entire collections. The number of digital images is the total of born-digital and scanned images, both public and private. If an object has other types of digital material, such as 3D measurement data, these resources are also counted.
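
The counting step can be illustrated with a minimal Python sketch. It is not the procedure actually run against CMDB and IMDB: the record layout, the field names (`object_id`, `image_id`) and the sample IDs are all hypothetical, since the real schemas are not described here.

```python
from collections import Counter

# Hypothetical, simplified data; the real CMDB and IMDB schemas differ.
cmdb_records = ["A-1", "A-2", "A-2-1", "B-10"]  # object or branch record IDs
imdb_images = [
    {"image_id": "IMG001", "object_id": "A-1"},
    {"image_id": "IMG002", "object_id": "A-1"},
    {"image_id": "IMG003", "object_id": "A-2-1"},
    {"image_id": "IMG004", "object_id": "Z-9"},  # no matching CMDB record
]


def count_images_per_object(record_ids, images):
    """Return {record_id: image count}, keeping records that have no images."""
    known = set(record_ids)
    counts = Counter(
        img["object_id"] for img in images if img["object_id"] in known
    )
    # Start from the object list so that undigitized records appear with 0.
    return {rid: counts.get(rid, 0) for rid in record_ids}


image_counts = count_images_per_object(cmdb_records, imdb_images)
```

Starting from the list of object records rather than from the images is the important design choice: records without any image would otherwise simply be missing from the result instead of being counted as zero.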

The result shows that 21.2% of the collections have no digital images at all, 25.9% have only one image, 36.2% have 2 to 3 images, 12.5% have 4 to 10 images, 2.8% have 11 to 30 images, 1.2% have 31 to 100 images, and just 0.2% have more than 100 images. Even allowing for the fact that the finely divided branch records lower the apparent level of digitizing completion, the fact that more than one fifth of our collections have not been digitized at all indicates a serious digitizing disparity among our collections.

For a closer look, we evaluated the number of images on 7 levels. Objects with zero or one image are easy to evaluate, because these numbers are clearly insufficient for almost all objects. However, objects with more images need careful consideration when evaluating their digitizing progress. For example, 8 images from different angles would mean a single photograph or a fairly simple white porcelain bowl is well digitized, but would not be enough for a highly decorated samurai armor or a lengthy picture scroll.
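
As a rough illustration, the 7 levels could be assigned from the image count as in the Python sketch below. The range boundaries follow the ranges reported above, and B+ is stated in this paper to mean 11 to 30 images. The label `D` and the pairing of the remaining labels with ranges are assumptions inferred from the order of the scale (F to A+), not a documented specification.

```python
from bisect import bisect_left
from collections import Counter

# Upper bound of each range: 0, 1, 2-3, 4-10, 11-30, 31-100, then everything above.
UPPER_BOUNDS = [0, 1, 3, 10, 30, 100]
# Assumed labels; only "B+ = 11 to 30 images" is confirmed by the text.
LEVELS = ["F", "D", "C", "B", "B+", "A", "A+"]


def level_for(image_count):
    """Map a number of images to one of the 7 digitizing progress levels."""
    # bisect_left puts a count equal to an upper bound into that bound's range.
    return LEVELS[bisect_left(UPPER_BOUNDS, image_count)]


def level_distribution(image_counts):
    """Return the percentage of objects at each level, rounded to 0.1."""
    tally = Counter(level_for(n) for n in image_counts)
    total = sum(tally.values())
    if total == 0:
        return {lv: 0.0 for lv in LEVELS}
    return {lv: round(100 * tally[lv] / total, 1) for lv in LEVELS}
```

For example, `level_for(8)` returns `"B"`, which matches the point made above: the same level may be sufficient for a photograph but not for a picture scroll.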

Figure 1 – Digitizing progress levels by number of images

To see the disparity with more precision, we examined the digitizing progress by object genre against respective reference points. A reference point indicates the level assumed to be sufficient for digitization given the nature of the objects in a genre; more complicated and qualitatively dense objects deserve higher reference points. With this approach, we could visually recognize the huge digitizing disparity among our collections.
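
The comparison against reference points can be sketched in Python as follows. This is an illustration under assumptions: the level labels other than those named in this paper are assumed, the `armor` genre and its reference point `A` are invented for the example, and "sufficient" is taken here to mean at or above the reference point.

```python
LEVELS = ["F", "D", "C", "B", "B+", "A", "A+"]  # assumed labels, lowest to highest
RANK = {level: i for i, level in enumerate(LEVELS)}

# Hypothetical reference points; the text only states that photography,
# paintings and calligraphy share the lowest reference point, C.
REFERENCE_POINT = {"photography": "C", "armor": "A"}


def share_sufficient(objects, reference_points):
    """objects: iterable of (genre, level) pairs.

    Returns {genre: fraction of objects at or above the genre's reference point}.
    """
    totals, sufficient = {}, {}
    for genre, level in objects:
        totals[genre] = totals.get(genre, 0) + 1
        if RANK[level] >= RANK[reference_points[genre]]:
            sufficient[genre] = sufficient.get(genre, 0) + 1
    return {g: sufficient.get(g, 0) / totals[g] for g in totals}


sample = [
    ("photography", "C"),
    ("photography", "F"),
    ("armor", "B+"),
    ("armor", "A+"),
]
shares = share_sufficient(sample, REFERENCE_POINT)
```

In this sample, half of each genre counts as sufficiently digitized, even though the armor objects have more images than the photographs, because the armor reference point is set higher.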

Figure 2 – Digitizing progress by genre

With the reference points applied, the three best-digitized genres were photography, paintings and calligraphy. It is interesting that the reference point for all three is C, the lowest among the genres, which means that digitization of objects in these genres is easy to complete because they are all 2D objects. By contrast, the worst three were Historical records, others and the Kuroda Collection. We left out the Kuroda Collection because it has special circumstances regarding rights for digitization, and put the Tokugawa Book Collection in instead. Historical records and others are genres that we are still in the process of organizing. Historical records and the Tokugawa Book Collection also have in common that both consist of a large number of books and manuscripts, which are expensive and time-consuming to digitize. It is understandable that these factors make the objects in these genres more difficult to digitize. These results suggest that the shapes and conditions of objects could be causes of the digitizing disparity.

This examination showed us the severity of the digitizing disparity among our collections. The best-digitized genre was photography, in which approximately 70% of the objects were digitized above the sufficient level. The most poorly digitized genre was Historical records, of which only about 15% were sufficiently digitized. The result also suggests that objects differ in their aptitude for digitization in terms of shapes and conditions, which could affect the digitizing progress.

Exploring influence factors

We have learnt that physical aptitude for digitization is one of the factors causing the imbalance in digitizing progress among our collections. ‘Having a lot of pages’ or being ‘hard to organize’ are internal factors of the objects. To gain a deeper understanding, we decided to explore the external factors as well; we tried to find out how we ourselves, consciously or subconsciously, choose objects to digitize at our own convenience. Examining the relationship between digitizing progress and a variety of factors, we found three possible influence factors: object designation, amount of information and frequency of use in museum activities.

Four designation ranks are currently registered in CMDB. National Treasures are the highest in rank and Important Cultural Properties are second under the current law for the protection of tangible cultural properties. Important Arts is a designation based on an antiquated law, which was primarily intended to prevent Japanese cultural properties from flowing out to foreign countries. Registered Arts is a designation for objects especially in need of preservation and utilization (Agency for Cultural Affairs, 2019, p. 3). This designation can be combined with other designations, but in CMDB it means Registered Arts that have not been given any other rank. We calculated the percentage of the 7 digitizing progress levels by designation, adding non-designated items as a group. The result showed that the designations of the objects clearly affect the digitizing progress, especially for the designations based on current law.
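
A calculation of this kind can be sketched in Python as below. The function name, the input layout and the level labels are assumptions made for illustration; records without a designation are grouped as `Non-designated`, as described above.

```python
from collections import Counter, defaultdict

LEVELS = ["F", "D", "C", "B", "B+", "A", "A+"]  # assumed labels, lowest to highest


def level_percentages_by_group(rows):
    """rows: iterable of (designation or None, level) pairs.

    Returns {designation: {level: percentage within that designation}}.
    """
    counts = defaultdict(Counter)
    for designation, level in rows:
        # Objects without any designation form their own group.
        counts[designation or "Non-designated"][level] += 1
    result = {}
    for designation, tally in counts.items():
        total = sum(tally.values())
        result[designation] = {
            lv: round(100 * tally[lv] / total, 1) for lv in LEVELS
        }
    return result


rows = [
    ("National Treasure", "A+"),
    ("National Treasure", "B"),
    (None, "F"),
    (None, "F"),
    (None, "C"),
    (None, "B+"),
]
table = level_percentages_by_group(rows)
```

Percentages are computed within each designation rather than over the whole collection, so that a small group such as National Treasures can be compared directly with the much larger non-designated group.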

Figure 3 – Designation and digitizing progress

The table shows that 31.4% of National Treasures are digitized above level B, compared with 25.5% for Important Cultural Properties, 20% for Important Arts, 7.6% for Registered Arts and 16.5% for non-designated items. The ratio of A+ for National Treasures is more than five times that of non-designated objects, and the ratio of A is triple. Though some exceptions were observed, the result reflects the trend that more highly designated objects have better digitizing progress, especially National Treasures and Important Cultural Properties. As mentioned before, these two higher-ranked designations are based on current law, so we could say that they carry more weight than the other designations today. The fact that object designations affect the digitizing progress makes us conscious of the ‘needs’ of the objects. If an object is a National Treasure, it is considered to be highly important and precious. It could therefore be in stronger demand for research or viewing, which translates into a significant need for digitization. To begin a more elaborate study of these ‘needs’, we decided to take a closer look at the content of each object record.

One of the critical roles of museums is to enrich collection information. The Tokyo National Museum is no exception: it has accumulated knowledge about each object, built up by successive museum staff. The information is now brought together in CMDB, where we can see the overall accomplishment of research for each object. Like the digitizing progress, the progress of research also differs by object. We assumed that the more attention an object gains, the more information its record would have, because there would be more research ‘needs’ for popular objects. On the basis of this assumption, we tried to use the amount of information as one of the indicators of object needs. CMDB has 10 basic fields for object information, such as country/origin, quantity, size, inscriptions, attachments, condition and provenance. We counted the number of filled fields for every object to see the relationship between the average number of filled fields and the digitizing progress level.
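
The field count can be illustrated with the following Python sketch. The field keys are hypothetical renderings of the seven fields named above (the remaining three of the 10 basic fields are not listed in this paper, so they are omitted), and a field is assumed to be ‘filled’ when it holds a non-blank value.

```python
from statistics import mean

# Hypothetical keys for the seven fields named in the text.
FIELDS = [
    "country_origin", "quantity", "size", "inscriptions",
    "attachments", "condition", "provenance",
]


def filled_field_count(record):
    """Count the fields that hold a non-blank value."""
    count = 0
    for field in FIELDS:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            count += 1
    return count


def average_fields_by_level(records):
    """Return {digitizing level: average number of filled fields}."""
    by_level = {}
    for record in records:
        by_level.setdefault(record["level"], []).append(filled_field_count(record))
    return {level: mean(values) for level, values in by_level.items()}


records = [
    {"level": "B", "size": "10 cm", "quantity": "1"},
    {"level": "B", "size": " "},  # whitespace only, treated as empty
    {"level": "A+", "provenance": "gift"},
]
averages = average_fields_by_level(records)
```

Counting filled fields measures only how much of a record is populated, not the quality or length of what was entered, which is one reason to treat it as a rough indicator.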

Figure 4 – Amount of information and digitization progress

The result was rather surprising because it reveals that the digitizing progress does not exactly reflect the amount of information. It shows that levels B and B+ are more common than A and A+ among the objects with much information. To analyze why this happened, we changed the indicator used to measure the ‘needs’. Public demand for museum objects serves several purposes, and research is just one of them. This time, we used the frequency of use of the objects in museum activities as the indicator. We examined the relationship between the digitizing progress and the frequency of use in each activity separately.

This approach visualized the interesting fact that the relationship with digitizing progress varies among museum activities. For example, while B+ is remarkably common among the objects frequently exhibited, A+ stands out for the objects used regularly in publications. This is understandable if we consider that we use digital images differently depending on the museum activity. B+ is the level of digitizing progress at which an object has 11-30 images. In most cases, 11-30 photos are enough to pick an attractive main visual for an exhibition poster. However, a complete set of detailed images would be required to produce an article for publication in a scholarly journal, and only A+ objects would be acceptable for this purpose. In that sense, the F to A+ scale used in this paper can be regarded as a ranking system from an archival perspective.

Figure 5 – Frequency of use in museum activities and digitizing progress

The result showed us that not only the frequency of use but also the purpose of use has major significance for the digitizing progress. The meaning of ‘digitized’ can differ depending on the purpose for which the object was digitized. Therefore, when we find a record showing that an object is digitized, we need to be careful about what ‘digitized’ means: whether the object has a complete set of images from all angles, or only some attractive frontal photos taken with special lighting. This implies the importance of setting a solid scale in order to have an accurate view of the digitizing progress of our collections.

Through these approaches to exploring the external influence factors, we learnt that some objects have higher needs for digitization rooted in public attention. Furthermore, the purpose of digitization also affects the digitizing progress, which implies that a solid standard for measuring the progress should be established in order to understand the digitizing disparity properly.

Conclusion and future work

The research visualized the serious digitizing disparity among our collections, which museum staff had long sensed but had never investigated in detail. Further analysis suggests that the following factors cause the disparity: 1) the aptitude of the objects for digitization in terms of shapes and conditions – the top three well-digitized genres consisted of 2D objects, which are easier to digitize, and all of the worst three have some recognizable difficulties, 2) the degree of attention the objects receive, which links to the demand for digitization – the objects with higher designations under current law have better digitizing progress, and 3) the combination of the frequency and the purpose of use in museum activities – the digitizing progress was related not only to the frequency of use but also to the purpose of digitization in museum activities.

These findings sound fairly understandable and easy to imagine. However, we need to take seriously the fact that we allowed a severe digitizing disparity to develop among our collections under the influence of such simple factors. We consider that this is primarily due to the lack of any opportunity to review the digitizing progress periodically. Again, the museum staff have felt the imbalance in the digitizing progress of our collections for a long time, but no review or investigation had been conducted. We therefore had no idea how large the imbalance was or what kind of concrete actions we could take to bridge the gap. As a result, the digitizing disparity has kept growing to this day.

To make use of this experience, we are planning to share the research results broadly within the museum and to create an opportunity to review together the digitizing progress made so far. We also recognize the need for a practical standard for scaling the digitizing progress among our collections. To set an effective standard, it is necessary to hold discussions with the staff in charge of each genre and to improve the scaling system used in this research. Our interim goal is to review the digitizing progress annually against a practical standard, and to help staff understand the concrete actions they can take to reduce the digitizing disparity among our collections.

The influence factors we found were simple and common, which means the digitizing disparity could be a threat to any museum that does not take concrete and effective measures. Public demand for digitization is increasing rapidly. We hope every museum can find the best future plan for digitizing its precious collections.

Akira Sakai

She is an Associate Fellow at the Tokyo National Museum. She received a master’s degree in library and information science from the University of North Carolina at Greensboro, where she cultivated her interest in the digital activities in the museum field. She also has experience in working as a Junior Specialist for Archival Affairs at the National Archives of Japan, which influenced her to take on digital archives as her main area of research.