> Which exact part should be cache optimized? What about optimizations for

You need to be aware of the cache at all times. The L1 cache is about 70 times
faster than main memory, L2 about 9 times faster.

> the most difficult. If you really want to rip, use the SSE2 and MMX or
> 3dNow! instruction sets and hand-code the expensive sections. But that sort
> of coding takes a while, and all I want is an interchangeable codec system in
> place now so I can play around with it.

Think of the bigger picture. Let's take MPEG2->YUV->Filter->CU30.

You don't want to:
	copy the entire image into YUV format start->end, with the end
	    kicking the start out of the cache
	filter on a cold cache
	do cu30 from a cold cache

You want to do:
	foreach(tile)
		mpeg2 decode tile, filter tile, cu30 tile

all in L1/L2.

> If you really want performance then you need to be writing directly to or
> reading from video memory, and using the video card to do hardware blitting

Out of date. On a modern AGP card you want the card to do the fetches, clips
and overlaying. You certainly don't want to write to video RAM.

> Alan, it seems you are jumping to optimization before a system is even in
> place. Thanks for the heads up about xine, though. I will look through it
> tonight.

The danger is in designing a system that can't be made to go fast. Tiling is
a very well established graphics technique.

> About half of my senior project would have been done already if we had direct
> show in linux (Actually, that was about 99% of it).

Nice.

> system. Data goes in one end of the filter and comes out another end of the
> filter compressed or decompressed. The scaling and conversions could just be
> added stuff that looked (to an application) just like another codec (which
> scaling could be thought of as really simple compression/decompression).
>
> Chris

Agreed.
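For anyone who wants the foreach(tile) idea spelled out, here is a rough C sketch. Everything in it is made up for illustration: the three per-pixel functions are trivial stand-ins for the MPEG2 decode, the filter and the CU30 conversion, the 64x64 frame and 16x16 tile sizes are arbitrary, and a real filter would need neighbouring pixels across tile edges. It only shows the loop structure: the first version runs each stage over the whole frame, the second runs all three stages on one tile before moving to the next, so the tile is still in cache when the later stages touch it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; TILE must divide both WIDTH and HEIGHT. */
enum { WIDTH = 64, HEIGHT = 64, TILE = 16 };

/* Stand-ins for the real stages (assumptions, not real codec code). */
uint8_t decode_pixel(int x, int y) { return (uint8_t)(x * 3 + y * 5); }
uint8_t filter_pixel(uint8_t v)    { return (uint8_t)(255 - v); }
uint8_t convert_pixel(uint8_t v)   { return (uint8_t)(v >> 1); }

/* Each stage works on the rectangle [x0, x0+w) x [y0, y0+h). */
void decode_rect(uint8_t *buf, int x0, int y0, int w, int h)
{
    for (int y = y0; y < y0 + h; y++)
        for (int x = x0; x < x0 + w; x++)
            buf[y * WIDTH + x] = decode_pixel(x, y);
}

void filter_rect(uint8_t *buf, int x0, int y0, int w, int h)
{
    for (int y = y0; y < y0 + h; y++)
        for (int x = x0; x < x0 + w; x++)
            buf[y * WIDTH + x] = filter_pixel(buf[y * WIDTH + x]);
}

void convert_rect(const uint8_t *src, uint8_t *dst,
                  int x0, int y0, int w, int h)
{
    for (int y = y0; y < y0 + h; y++)
        for (int x = x0; x < x0 + w; x++)
            dst[y * WIDTH + x] = convert_pixel(src[y * WIDTH + x]);
}

/* What you don't want: every stage walks the whole frame, so on a
 * large frame each stage starts on a cold cache. */
void frame_at_a_time(uint8_t *work, uint8_t *out)
{
    decode_rect(work, 0, 0, WIDTH, HEIGHT);
    filter_rect(work, 0, 0, WIDTH, HEIGHT);
    convert_rect(work, out, 0, 0, WIDTH, HEIGHT);
}

/* What you want: run all stages on one tile while it is still hot. */
void tile_at_a_time(uint8_t *work, uint8_t *out)
{
    assert(WIDTH % TILE == 0 && HEIGHT % TILE == 0);
    for (int ty = 0; ty < HEIGHT; ty += TILE) {
        for (int tx = 0; tx < WIDTH; tx += TILE) {
            decode_rect(work, tx, ty, TILE, TILE);
            filter_rect(work, tx, ty, TILE, TILE);
            convert_rect(work, out, tx, ty, TILE, TILE);
        }
    }
}

/* Helper to check that both orderings produce the same frame. */
int frames_equal(const uint8_t *a, const uint8_t *b)
{
    return memcmp(a, b, (size_t)WIDTH * HEIGHT) == 0;
}
```

Both orderings compute the same output; only the memory access pattern differs. The toy frame here fits in cache anyway, so it will not show a speed difference. The point is that a codec interface which only ever passes whole frames between stages rules out the second loop.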