Channel: MobileRead Forums - Kindle Developer's Corner

Native geekmaster dither formula demystified?

There have been requests, and I have posted, that when I get some time, I will write a bit about how my dither code works. This thread is the beginning of such an effort. There will be more to come later...

As published in the geekmaster kindle video player thread, here is the "raw2gmv" kindle video transcoder:
PHP Code:

//====================================================
// raw2gmv 1.0a - raw to geekmaster video transcoder
// Copyright (C) 2012 by geekmaster, with MIT license:
// http://www.opensource.org/licenses/mit-license.php
//----------------------------------------------------
#include <stdio.h>  // stdin,stdout
typedef unsigned char u8; typedef unsigned int u32;
int main(void) {
    u8 o,to,tb,wb0[800*600]; u32 x,y,xi,yi,c=250,b=120; // c=contrast, b=brightness
    while (fread(wb0,800*600,1,stdin)) for (y=0;y<800;y++) { xi=y; tb=0;
        for (x=0;x<600;x++) { yi=599-x; o=x^y;
            to=(y>>2&1|o>>1&2|y<<1&4|o<<2&8|y<<4&16|o<<5&32)-(wb0[yi*800+xi]*63+b)/c>>8;
            tb=(tb>>1)|(to&128); if (7==(x&7)) { fwrite(&tb,1,1,stdout); tb=0; }
        }
    } return 0;
}


It takes raw 800x600 8bpp video piped into STDIN, rotates it to 600x800 for kindle portrait mode, and dithers it while computing the dither table on-the-fly, then outputs it to STDOUT. Such STDIO piping allows chaining multiple programs together on the command line. The output can be redirected to a file to be played later, or it can be piped directly into the video player program, or it can be piped into netcat and sent over a network.

It uses an in-line formula with no subroutines, and the per-pixel computation is branch-free (no if-else statements). Because it also needs no lookup table, it is cache-friendly.

The while statement reads a series of single 800x600 video frames (one byte per pixel).

The outer for loop steps through the 800 output rows of the current video frame (each one is a column of the 800x600 input), and the inner for loop processes the 600 individual pixels (8-bit grayscale) in that row.

Inside this outer loop we set xi=y, to convert input rows to output columns, as needed for portrait mode output.

The body of code inside the for loops swaps x and y coordinates to rotate the video into portrait mode, computes a dither table threshold for the current framebuffer pixel position, then computes a single output bit (black or white pixel). Finally, it packs the pixel bit into a byte and, once that byte is full (eight sequential pixel bits), writes it to STDOUT.

Here is the loop body (with added white space):
PHP Code:

    yi=599-x;
    o=x^y;
    to=(y>>2&1|o>>1&2|y<<1&4|o<<2&8|y<<4&16|o<<5&32)-(wb0[yi*800+xi]*63+b)/c>>8;
    tb=(tb>>1)|(to&128);
    if (7==(x&7)) {
        fwrite(&tb,1,1,stdout);
        tb=0;
    }

Notice that yi traverses from 599 to 0, depending on x, to prevent a mirror image when rotating our input video from 800x600 landscape mode to kindle 600x800 portrait mode. We are building the OUTPUT video in portrait mode, so we need to raster-scan columns from input to output in each frame buffer to build rows of output video.
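To make the coordinate swap concrete, here is a minimal sketch of the index mapping on its own. The helper name input_index and the IN_W/IN_H constants are mine, for illustration only; raw2gmv does this inline with yi*800+xi.

```c
#include <stddef.h>

enum { IN_W = 800, IN_H = 600 };  /* landscape input frame size */

/* Hypothetical helper: for output pixel (x, y) in the 600x800 portrait
   frame, return the index of the source pixel in the 800x600 input. */
size_t input_index(unsigned x, unsigned y)
{
    unsigned xi = y;               /* output row    -> input column */
    unsigned yi = (IN_H - 1) - x;  /* output column -> input row, reversed */
    return (size_t)yi * IN_W + xi;
}
```

With this mapping, the top-left output pixel (0,0) is read from the bottom-left input pixel, and the top-right output pixel (599,0) is read from input index 0.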

The next statement sets variable o (order) in correspondence with a checkerboard pattern in the framebuffer, representing the output pixel bits needed for a 50-percent grayscale.

Then we get a bit complicated, while computing the to variable. The key idea is that some variables are used in the logical expression in a way that enables or disables portions of the overall formula, much like how if-else statements could also be used to do the same thing.

The first part of the expression, in parentheses, computes the dither threshold for the current pixel position. Notice how it contains a series of values in a binary progression (&1, &2, &4, &8, &16, &32). These bits are conditionally combined into a 6-bit dither value (0-63). The condition that determines whether each of these bits remains a shifted one-bit or becomes a zero-bit is taken from either a shifted y-position or a shifted o value (x xor y), which modifies the 50-percent ordered dither mask (in o) to increase or decrease the value depending on its position in the framebuffer. This effectively reproduces the common ordered dither table that is found in most dither routines, without needing a dither table to be precomputed, and without memory lookups (memory access is very slow compared to real-time computation on modern cached processors). The best way to see how this dither formula works is to fill a table with the values computed by using it, and compare that to a traditional dither table such as in my formula-42 dither logic. The whole point of a dither table is to evenly distribute all values 0 to 63 across a square texture tile, and use that to tile the entire framebuffer. Then each input value is compared against these values, used as a threshold, to determine whether the output pixel is black or white.
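Filling a table with the formula's values is easy to try. The sketch below pulls the threshold expression into a function so one 8x8 tile can be printed; the names dither_threshold and print_dither_tile are mine, and the extra parentheses only spell out the C operator precedence the one-liner relies on.

```c
#include <stdio.h>

/* The threshold part of the raw2gmv formula as a function
   (hypothetical name), fully parenthesized for readability. */
unsigned dither_threshold(unsigned x, unsigned y)
{
    unsigned o = x ^ y;
    return ((y >> 2) & 1)  | ((o >> 1) & 2)
         | ((y << 1) & 4)  | ((o << 2) & 8)
         | ((y << 4) & 16) | ((o << 5) & 32);
}

/* Print one 8x8 tile of thresholds, one framebuffer row per line. */
void print_dither_tile(void)
{
    for (unsigned y = 0; y < 8; y++) {
        for (unsigned x = 0; x < 8; x++)
            printf("%3u", dither_threshold(x, y));
        putchar('\n');
    }
}
```

Only the low three bits of y and of x^y are used, so the tile repeats every 8 pixels in both directions, and each value from 0 to 63 appears exactly once per tile.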

wb0 is our work-buffer (in-memory version of a framebuffer such as /dev/fb0). wb0[yi*800+xi] is the current input pixel, an 8-bit value. Using scaled-integer arithmetic, we do a brightness and contrast adjustment using predefined values for c=250 and b=120 (as determined empirically for good video image quality). Essentially, we multiply the 8-bit pixel by 63, add 120 (default brightness), and divide by 250 (default contrast).
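As a quick check of the arithmetic, here is the scaling step as a stand-alone function. The name scale_pixel and its parameter list are illustrative; raw2gmv computes this inline.

```c
/* Scale an 8-bit pixel into the dither range using integer math only.
   b is the brightness offset and c the contrast divisor (120 and 250
   are the defaults in raw2gmv). The function name is hypothetical. */
unsigned scale_pixel(unsigned char pixel, unsigned b, unsigned c)
{
    return (pixel * 63u + b) / c;
}
```

With the default values, pixel 0 scales to 0 and pixel 255 scales to 64, so full white is always above the largest threshold (63) and full black is never above any threshold.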

Instead of comparing each input value against a dither threshold, we subtract the scaled input value from the dither threshold value so that the result is positive or negative depending on the threshold result. Then we shift with >>8 to bring the sign extension down into the bottom byte of the computation, making that byte either 0 or 255 (pure black or pure white).
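The subtract-and-shift trick can be compared side by side with an ordinary comparison. This is only a sketch: the function names are mine, and it assumes both inputs are small (below 256), as they are in raw2gmv. Note that the variables are unsigned, so the "negative" result is really a wraparound that sets all the high bits.

```c
typedef unsigned char u8;
typedef unsigned int  u32;

/* Branch-free form, as in raw2gmv: when scaled > threshold the unsigned
   subtraction wraps and every bit above the low byte becomes 1; >>8
   then moves those bits down, and the cast keeps the low byte. */
u8 dither_branchless(u32 threshold, u32 scaled)
{
    return (u8)((threshold - scaled) >> 8);
}

/* Equivalent comparison form, shown only for contrast. */
u8 dither_compare(u32 threshold, u32 scaled)
{
    if (scaled > threshold)
        return 255;
    return 0;
}
```

For every threshold from 0 to 63 and every scaled value from 0 to 64, both functions return the same byte.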

And last, we mask the sign bit with &128, and shift it into the output byte. And yes, I did cheat a little in that an if statement determines when to write out that output byte. However, there are no branches while computing each pixel bit. The reason for using a logic expression is that traditional if-else statements will run much slower if they confuse the CPU's branch prediction (which is very simplistic in RISC embedded processors). Some compilers may automatically convert if-else statements into more efficient code; however, that cannot be relied on (especially when using tcc while compiling on the kindle itself).
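The packing step can also be tried in isolation. In this sketch the function pack8 and its array argument are hypothetical; the shift-and-mask line is the same one used in the loop body.

```c
typedef unsigned char u8;

/* Pack eight dither results (each 0 or 255) into one output byte the
   way the loop body does: each new bit enters at bit 7 and the earlier
   bits shift right, so the first pixel ends up in bit 0. */
u8 pack8(const u8 to[8])
{
    u8 tb = 0;
    for (int i = 0; i < 8; i++)
        tb = (u8)((tb >> 1) | (to[i] & 128));
    return tb;
}
```

So a run of eight pixels where only the first one is set packs to 0x01, and a run where only the last one is set packs to 0x80.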


I hope that helps you understand this code a little bit better. Notice that this only shows the DITHER step, and does not include the extra logic in the gmplay program needed to deal with variations between different kindle models, and between main and diags modes. You may wish to compare and contrast the gmplay code against raw2gmv, or you can wait until I take time to document that a bit better too. Until then, read the "theory of operation" details by clicking the appropriate button in the first post of the geekmaster kindle video player thread.
