Source code accompanying the paper Colorization Transformer, presented at ICLR 2021. Work by Manoj Kumar, Dirk Weissenborn, and Nal Kalchbrenner.

Paper summary

ColTran consists of three components: a colorizer, a color upsampler, and a spatial upsampler.

The colorizer is an autoregressive, self-attention-based architecture composed of conditional transformer layers. It coarsely colorizes low-resolution 64×64 grayscale images pixel by pixel. The color upsampler is a parallel, deterministic, self-attention-based network. It refines the coarse, low-resolution image into a 64×64 RGB image. The architecture of the spatial upsampler is similar to that of the color upsampler. It super-resolves the low-resolution RGB image into the final output. The colorizer also has an auxiliary parallel color prediction model that consists of a single linear layer.
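The data flow between the three components can be traced with a shape-only sketch. The functions below are hypothetical helpers, not part of the ColTran code: they do no modeling and only propagate the resolutions this README states (64×64 grayscale input, coarse 64×64 color, refined 64×64 RGB, and the 256×256 final output mentioned in the sampling section). The upsampling factor of 4 is an assumption derived from those two resolutions.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def colorizer_shape(gray: Shape) -> Shape:
    """Grayscale (H, W) in, coarse color (H, W, 3) out; resolution unchanged."""
    height, width = gray
    return (height, width, 3)


def color_upsampler_shape(coarse: Shape) -> Shape:
    """Coarse color in, refined RGB out; same spatial resolution."""
    height, width, channels = coarse
    assert channels == 3, "expected a 3-channel color image"
    return (height, width, 3)


def spatial_upsampler_shape(rgb: Shape, factor: int = 4) -> Shape:
    """RGB in, super-resolved RGB out (assumed factor: 64 -> 256)."""
    height, width, channels = rgb
    return (height * factor, width * factor, channels)


gray = (64, 64)
coarse = colorizer_shape(gray)
fine = color_upsampler_shape(coarse)
final = spatial_upsampler_shape(fine)
print(gray, "->", coarse, "->", fine, "->", final)
```

Only the last stage changes the spatial resolution; the first two stages differ in how finely the colors are resolved, which a shape trace cannot show.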

We report results after training each of these components on 4×4 TPU v2 chips. When training with fewer resources, adjust the model size and batch size accordingly. Results from training these components with fewer resources are given in the appendix.

The complete configurations used to train the models in the paper are available in the directory configs. Very small model configurations are provided in test_configs to quickly test that the model builds. For quick logging, set the flag --steps_per_summaries=100. When sampling, set config.sample.log_dir to the appropriate write directory.

Requirements

pip install -r requirements.txt

Training

Train the colorizer by running the following command:

python -m --config=coltran/configs/ --mode=train --logdir=/colorizer_ckpt_dir

To train the color and spatial upsamplers, replace configs/ with configs/ and configs/, respectively.


To evaluate, run:

python -m --config=coltran/configs/ --mode=eval_valid --logdir=/colorizer_ckpt_dir

Sampling

Single GPU sampling

Sampling high-resolution images is a three-step procedure. On a P100 GPU, the colorizer samples a batch of 20 images in 3-5 minutes, while the color and spatial upsamplers sample in the order of milliseconds.

The sampling configuration for each model is described by the config.sample ConfigDict in configs/.py.

sample.num_outputs - Total number of grayscale images to colorize.
sample.logdir - Directory that samples are written to.
sample.gen_data_dir - Path where the samples from the previous step are stored.
sample.skip_batches - The first skip_batches * batch_size images of the public ImageNet TF dataset are skipped.

Make sure that num_outputs and skip_batches are the same for all three components. The generated samples are written to $logdir/${config.sample.logdir} as TFRecords.
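As a rough illustration of how these fields chain together across the three steps, the sketch below models config.sample as plain Python dictionaries rather than the repository's actual config objects. Every concrete value is an assumption made for the example: the sample.logdir name "samples", the num_outputs value, the use of None for the colorizer's gen_data_dir (it has no previous step), and the /spatial_ckpt_dir directory are all hypothetical. Only /colorizer_ckpt_dir and /cup_ckpt_dir are taken from the commands in this README.

```python
import posixpath


def sample_output_dir(logdir, sample_cfg):
    """Where a step writes its TFRecords: $logdir/${config.sample.logdir}."""
    return posixpath.join(logdir, sample_cfg["logdir"])


# Fields that must match across all three components.
shared = {"num_outputs": 100, "skip_batches": 0, "logdir": "samples"}

# Step 1: the colorizer has no previous step (None is an assumption here).
colorizer_sample = dict(shared, gen_data_dir=None)
colorizer_out = sample_output_dir("/colorizer_ckpt_dir", colorizer_sample)

# Step 2: the color upsampler reads what the colorizer wrote.
cup_sample = dict(shared, gen_data_dir=colorizer_out)
cup_out = sample_output_dir("/cup_ckpt_dir", cup_sample)

# Step 3: the spatial upsampler reads what the color upsampler wrote.
# "/spatial_ckpt_dir" is a hypothetical checkpoint directory.
spatial_sample = dict(shared, gen_data_dir=cup_out)
spatial_out = sample_output_dir("/spatial_ckpt_dir", spatial_sample)

for name, cfg in [("colorizer", colorizer_sample),
                  ("color upsampler", cup_sample),
                  ("spatial upsampler", spatial_sample)]:
    print(name, "reads from", cfg["gen_data_dir"])
```

Deriving each gen_data_dir from the previous step's output path, instead of typing it by hand, keeps the three configurations from drifting apart.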

Colorizer

This command samples a coarsely colored, low-resolution 64×64 image.

python -m coltran.sample --config=coltran/configs/ --mode=sample_test --logdir=/colorizer_ckpt_dir

Color Upsampler

This command converts the coarse 64×64 image from the previous step to a fine 64×64 image.

Note: In the color upsampler configuration, set config.sample.gen_data_dir to /colorizer_ckpt_dir/${config.sample.logdir}.

python -m coltran.sample --config=coltran/configs/ --mode=sample_test --logdir=/cup_ckpt_dir

Spatial Upsampler

This command super-resolves the previous output into a high-resolution 256×256 output.

Note: In the spatial upsampler configuration, set config.sample.gen_data_dir to /cup_ckpt_dir/${config.sample.logdir}.

python -m coltran.sample --config=coltran/configs/ --mode=sample_test --logdir=/cup_ckpt_dir

Multi-GPU sampling

Sampling can be parallelized across batches in a multi-GPU setup using the flag config.sample.skip_batches. For example, in a setup with 2 machines and a batch size of 20, to colorize 100 grayscale images per machine, set config.sample.skip_batches on the first and second machines to 0 and 5, respectively.
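The arithmetic behind this example can be written out as a small helper. This is a sketch, not part of the repository: the function name and its arguments are hypothetical, and it assumes every machine colorizes the same number of images and that this number is a multiple of the batch size.

```python
def skip_batches_per_machine(num_machines, images_per_machine, batch_size):
    """Return the skip_batches value to set on each machine, in order.

    Machine i skips the batches already assigned to machines 0..i-1.
    """
    if images_per_machine % batch_size != 0:
        raise ValueError("images_per_machine must be a multiple of batch_size")
    batches_per_machine = images_per_machine // batch_size
    return [i * batches_per_machine for i in range(num_machines)]


# 2 machines, 100 images each, batch size 20: 100 / 20 = 5 batches per machine.
print(skip_batches_per_machine(2, 100, 20))  # [0, 5]
```

Each machine would then be run with its own skip_batches value and with num_outputs set to the per-machine image count.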

Pre-trained checkpoints

Pre-trained checkpoints trained on ImageNet are available at the following URLs:

Colorizer - Link
Color Upsampler - Link
Spatial Upsampler - Link

To sample, download them to your local directory, set the logdir flag to your local path, and run the sampling script.

Reference TensorBoards

TensorBoard summaries of our training runs are available at:

Colorizer - Link
Color Upsampler - Link
Spatial Upsampler - Link

Parsing TFRecords

The generated TFRecords can be converted to images with the following code:

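import os
import matplotlib.pyplot as plt
import tensorflow as tf

# path: directory containing the generated TFRecords.
# res: resolution of the stored samples (for example 64).
def parse_example(example_proto, res=64):
  # Each record stores a flattened res x res x 3 image as int64 values.
  features = {'image': tf.io.FixedLenFeature([res*res*3], tf.int64)}
  example = tf.io.parse_single_example(example_proto, features=features)
  image = tf.reshape(example['image'], (res, res, 3))
  return image

# listdir returns bare file names, so join them with the directory.
files = [os.path.join(path, f) for f in tf.io.gfile.listdir(path)]
gen_dataset = tf.data.TFRecordDataset(files)
gen_dataset = gen_dataset.map(lambda x: parse_example(x, res))
gen_dataset = iter(gen_dataset)
for image in gen_dataset:
  plt.imshow(image)
  plt.show()

Citation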

If you use the code or models, please cite our paper.

@inproceedings{kumar2021colorization,
  title={Colorization Transformer},
  author={Manoj Kumar and Dirk Weissenborn and Nal Kalchbrenner},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=5NA1PinlGFu}
}

