Hey all, as you may know from recent history, I have been capturing using the v4l2 streaming interface and displaying to the screen using a number of different methods. Here are some results (rough code sketches of all three paths are at the end of this mail)...

OpenGL display system: I used (basically) a glClear, then a glRasterPos, then a glDrawPixels to display to the screen. On the capture side I used the RGB32 (or BGR32, I'm not sure which) pixel format. I had to offset the buffer I handed to glDrawPixels by 1 byte (I'll send you the code if you don't believe me, but it's designed to work with a bttv card); after that I could draw to the screen using GL_RGBA as the pixel format. This used 50-60% processor, as reported by top.

Xvideo display system: basically, I used the standard extensions, copied the UYVY buffer from the capture stream into an XVideo shared-memory buffer, and displayed that. Processor usage: 55-65%.

XDGA display system: I set the display into a depth-24, 32-bits-per-pixel mode at 1024x768. Then I could transfer the capture buffer (with the capture device set to BGR32, as best I remember) straight to the hardware framebuffer. For that transfer I used a simple (unrolled) for loop, such as outLong[i] = inLong[i]. This used 30-40% processor.

I found out that on my machine (a PII-400), memcpy was the slower way to copy each scanline (each scanline is 4*640 = 2560 bytes, so not a trivial amount); array indexing was FASTER. Odd, huh? Any ideas? I am also not sure there isn't a faster way to transfer the memory from here to there, but I don't know how to initiate a string of DMA transfers from userspace memory to the framebuffer. Any ideas would be nice.

Chris
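
P.S. In case it helps anyone reproduce these numbers, here are rough sketches of the three paths. None of this is my exact code; buffer names and all setup are placeholders.

The GL path, assuming a current OpenGL context (say, from GLUT) and a 640x480 capture frame. The raster-position/pixel-zoom calls are the usual way to anchor glDrawPixels at the top-left; the +1 byte is the bttv alignment hack described above:

    #include <GL/gl.h>

    #define CAP_W 640
    #define CAP_H 480

    /* frame_buf is a placeholder for the mmap'ed v4l2 capture buffer. */
    void draw_frame(const unsigned char *frame_buf)
    {
        glClear(GL_COLOR_BUFFER_BIT);
        glRasterPos2f(-1.0f, 1.0f);   /* top-left corner (identity transforms) */
        glPixelZoom(1.0f, -1.0f);     /* capture rows are top-down; GL draws bottom-up */
        /* +1 byte shifts the BGR32 channels so GL_RGBA reads them the way I
         * wanted on this card; the mmap'ed capture buffer is bigger than one
         * frame, so the one-byte over-read at the end is harmless here. */
        glDrawPixels(CAP_W, CAP_H, GL_RGBA, GL_UNSIGNED_BYTE, frame_buf + 1);
        /* caller swaps buffers / glFlush()es afterwards */
    }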
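
The Xvideo path, assuming dpy, win, and gc are already set up and xv_port was found earlier (XvQueryAdaptors and friends); UYVY's FourCC is 0x59565955:

    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>
    #include <X11/extensions/Xvlib.h>

    #define FOURCC_UYVY 0x59565955
    #define CAP_W 640
    #define CAP_H 480

    void show_uyvy_frame(Display *dpy, Window win, GC gc, XvPortID xv_port,
                         const unsigned char *capture_buf)
    {
        static XvImage *image;
        static XShmSegmentInfo shminfo;

        if (!image) {   /* one-time setup; cleanup (XShmDetach/shmctl) omitted */
            image = XvShmCreateImage(dpy, xv_port, FOURCC_UYVY, NULL,
                                     CAP_W, CAP_H, &shminfo);
            shminfo.shmid = shmget(IPC_PRIVATE, image->data_size,
                                   IPC_CREAT | 0777);
            shminfo.shmaddr = image->data = shmat(shminfo.shmid, NULL, 0);
            shminfo.readOnly = False;
            XShmAttach(dpy, &shminfo);
        }
        /* UYVY is 2 bytes/pixel, so one frame is CAP_W*CAP_H*2 bytes */
        memcpy(image->data, capture_buf, CAP_W * CAP_H * 2);
        XvShmPutImage(dpy, xv_port, win, gc, image,
                      0, 0, CAP_W, CAP_H,    /* source rectangle      */
                      0, 0, CAP_W, CAP_H,    /* destination rectangle */
                      False);
        XSync(dpy, False);
    }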
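
And the DGA scanline copy the 30-40% figure came from; fb_line and cap_line are placeholders for a scanline in the mapped framebuffer and the matching scanline of the capture buffer:

    #include <stdint.h>

    #define LINE_WORDS 640   /* 640 pixels * 4 bytes/pixel = 640 32-bit words */

    /* 4-way unrolled word-at-a-time copy; on my PII-400 this beat memcpy()
     * for the same 2560-byte scanline. */
    void copy_scanline(volatile uint32_t *fb_line, const uint32_t *cap_line)
    {
        int i;
        for (i = 0; i < LINE_WORDS; i += 4) {
            fb_line[i]     = cap_line[i];
            fb_line[i + 1] = cap_line[i + 1];
            fb_line[i + 2] = cap_line[i + 2];
            fb_line[i + 3] = cap_line[i + 3];
        }
    }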