Tintelligence is a convolutional neural network (CNN) that converts black-and-white photos into color. CNNs can “learn” features and object structures, making them an excellent architecture for image work, and this relatively recent advance has been instrumental in archival restoration. We train our model on the CIFAR-10 image dataset, which contains 50,000 training images and is credited to Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Accuracy results are forthcoming.) This work is significant because it can dramatically speed up the old-fashioned process of hand-coloring old photos, with applications in museums, academia, documentary-making, and medical imaging--any space where grayscale images need to be colorized.
Deprived of the vivid colors that paint modern pictures, black-and-white footage from the pre-color film era can easily seem detached from reality. Take documentary footage of World War II, for example. Only when colorized with the same shades one would see in everyday life--often possible only through an expensive, commercial restoration effort like History Channel’s “World War II in Colour”--can these pictures evoke the “realness” of the moments they depict. With significant advances in AI-driven image colorization, the powerful visual data processing capabilities of convolutional neural networks offer a chance to expedite this process and bring more historical footage to life. The benefits don’t stop there: colorization also has the potential to enhance medical imaging and to create more accessible content for people with colorblindness. The process is not without challenges, however. Using a convolutional neural network trained on a large dataset of color images paired with their grayscale counterparts, we aim to implement an automatic black-and-white colorizer that produces plausible color schemes.
Our project combines methods from several sources in search of the optimal result. (This section is not yet complete.)
As we are still putting this project together, we have encountered obstacles related to integrating systems from the related works, as well as to version control.
Within the field of black & white colorization, there has been a great deal of progress and variation across projects. While all of them use convolutional neural networks, the training approaches differ. Revathi et al. (4), in “Black and White Image Colorization Using Convolutional Neural Networks,” trained their network on the MIRFLICKR 25k dataset; the proposed method produced strong results with a low mean squared error of 0.3174. Joshi et al. (3), in “Auto-Colorization of Historical Images Using Deep Convolutional Neural Networks,” integrate a deep CNN with Inception-ResNetV2 and train their model on a dataset they created of 1,200 historical images comprising old and ancient photographs of Nepal. There is also a PyImageSearch tutorial by Rosebrock (5) that uses a deep CNN alongside OpenCV, trained on the ImageNet dataset; this method enables the automatic colorization of grayscale images that can convincingly resemble natural color photographs, much like the previously mentioned works.
Building on black & white colorization, there have also been suggestions of colorization that deviate from realistic/natural colors and apply artistic/aesthetic colorization to images. For Zhang et al. (1) in “Colorful Image Colorization”, rather than attempting to find the “correct” colorization, this team of researchers tried to make a colorizer that generated “plausible” color schemes. As for Cho et al. (2) who worked on PaletteNet, they had a deep neural network that took in an image and a specified color palette and then returned the given image newly colorized in the desired color palette. These cases show that image colorization can go beyond realistic replication to include artistic projects and aesthetic preferences.
It is important to consider the ethics behind our project. The first concern is the data source. Images in the CIFAR-10 dataset were drawn from the 80 Million Tiny Images collection, which was scraped from the internet by researchers at MIT and NYU. We are aware that many of the people who uploaded these images likely did not consent to their use in training neural networks. However, given the need for a large image dataset for training CNNs and our limited capacity to gather enough training data of our own, we believe CIFAR-10 is the best option. It is worth noting that CIFAR-10 is widely used within the image-research community, and one of its creators, Geoffrey Hinton, recently won a Nobel Prize in Physics for his contributions--an external vote of confidence in its standing. For these reasons, CIFAR-10 was selected. Ideally, future datasets will be created with the full consent of everyone involved. The second concern is how our CNN will be used. We intend for it to capture the “realness” of moments from the past. This can benefit family members wishing to better understand their ancestors, historians researching archival photos, and documentary makers appealing to modern audiences; there are also benefits for visualization in medical imaging. However, we realize that this CNN could be used for harmful purposes, such as digital deception and fraud, and there may be images people wish to remain untouched. Realistically, it is hard to distinguish good intent from bad. We will be explicit in our directions about what this CNN should be used for.
When we first started this project, our idea was very different--detecting optimal surfing conditions. Due to a lack of data and experience, we pivoted to a practical and relatively well-explored topic, grayscale-to-RGB colorization, because we wanted to focus on learning CNNs rather than venturing completely into the unknown. Our choice to go down a paved path, the one more traveled, had both pros and cons. The pro was easy access to information and assistance when we needed it: numerous tutorials exist online, as do several research papers and ready-to-use datasets (CIFAR-10). The con was that, because of this, we were not pushed to think independently, troubleshoot, and learn the “hard,” nitty-gritty way. We had an easy way out, and without proper discipline, that is a quick route to the bare minimum being done.
With regard to software, we intend to convert a Jupyter Notebook using PyTorch into a standalone app using Voila. While Gradio is a sound alternative--and one we considered--we think Voila will be most effective for achieving a tangible end result. As for datasets, we anticipate that an ImageNet dataset with its color images converted to grayscale will be adequate (as suggested by previous research). Although an analysis that homes in on creating realistic colorized images and testing their believability would be a meaningful route, it would require test participants; instead, we will analyze the error between the colorized image and the original, non-grayscale image. In the unlikely event that the ImageNet dataset proves inadequate--or we want to improve the “randomness” of the validation dataset--we anticipate that we can collect and grayscale a large number of images from the internet. For our model, we are opting to use a pre-trained convolutional neural network from PyTorch’s library as a feature extractor in order to save time and energy, then training the custom layers ourselves.
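To make the feature-extractor plan concrete, the sketch below freezes a pre-trained torchvision backbone and trains only custom layers stacked on top. The choice of ResNet-18 and the decoder layer sizes are illustrative assumptions, not final design decisions:

```python
# Minimal sketch: frozen pre-trained backbone + trainable custom decoder.
import torch
import torch.nn as nn
from torchvision import models

class Colorizer(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Keep everything up to the last residual stage as the feature extractor.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze the pre-trained weights

        # Custom decoder (ours to train): upsample the 512-channel feature map
        # back to image resolution and predict the two ab color channels.
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(32, 2, 3, padding=1), nn.Upsample(scale_factor=2),
        )

    def forward(self, gray):
        # The ResNet backbone expects 3 channels, so repeat the grayscale channel.
        x = gray.repeat(1, 3, 1, 1)
        return self.decoder(self.encoder(x))

model = Colorizer()
dummy = torch.randn(1, 1, 224, 224)  # one 224x224 grayscale image
print(model(dummy).shape)            # torch.Size([1, 2, 224, 224])
```

Freezing the encoder means only the decoder parameters are updated during training, which is the time- and energy-saving trade-off described above.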
#TO DO:
Given that these grayscale images will have an original color-photo counterpart, it is important not only to produce a colorized image from the grayscale input but also to compare the colorized image to the original. With this in mind, we anticipate using the Mean Absolute Error between each colorized pixel and the corresponding pixel in the original image. Naturally, a low Mean Absolute Error would suggest that the neural network accomplished its task, but a higher Mean Absolute Error does not necessarily suggest it was ineffective: if the network produces images that are plausible to the eye but differ significantly in actual RGB values from the original, it has at least become an effective “plausible” colorizer--as opposed to a replicator. To elaborate, consider a picture of a blue billiards table with five red balls on it, some dull and some sharp. Given the rules of the game, billiards tables typically hold a variety of ball colors--is it unreasonable for the network to colorize each ball differently, especially given that there is visible variation between them? As the UC Berkeley study cited in this paper (Zhang et al., 1) suggests, this is not a failure of the neural network, merely an attribute.
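A minimal sketch of that metric in PyTorch (this is the same quantity computed by torch.nn.L1Loss; the tensor shapes are illustrative):

```python
# Planned evaluation: per-pixel Mean Absolute Error between the colorized
# output and the original color image, both as float tensors.
import torch

def mean_absolute_error(colorized: torch.Tensor, original: torch.Tensor) -> float:
    """Average absolute per-pixel difference across all channels."""
    return torch.mean(torch.abs(colorized - original)).item()

# Example: identical images score 0; a uniform shift of 0.1 scores ~0.1.
original = torch.rand(3, 224, 224)
print(mean_absolute_error(original, original))        # 0.0
print(mean_absolute_error(original + 0.1, original))  # ~0.1
```

Because the metric averages over every pixel, a plausibly recolored object (e.g., a ball colorized yellow instead of red) raises the score without the output looking wrong, which is exactly the limitation discussed above.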
#TO DO: