bttv / v4l2 -- status + todo list

  Hi,

I've just uploaded bttv 0.8.24.  There are also fresh patches for
videodev and the v4l2 stuff; the v4l1-compat module stopped compiling
on 2.4.9 due to some min/max macro fixes.  bttv 0.8.x also depends on
davem's pci64 patches (you might have followed the highmem-io / 64 bit
pci dma discussions on the linux-kernel list).

I still plan to get bttv 0.8.x into the 2.4.x kernel as soon as
possible.  Unfortunately I ran into some memory management problems
which delay the whole thing:  userspace pages might be allocated from
highmem, which in turn causes trouble with current kernels as there is
no support (yet) for DMA transfers to highmem pages.  My current plan
is:  (1) wait until davem's patches have made it into the standard
kernel,  (2) give bttv 0.8.x some more testing in either 2.5.x (if it
has started by then) or -ac kernels,  (3) try to get bttv 0.8.x into
2.4.x (without v4l2 for starters).

I've split some video buffer memory management code out into a
bttv-independent source file (video-mm.c in the tarball).  That code can
(with the help of davem's patches) deal with highmem dma.  It does not
(yet) handle memory which can't be reached at all with 32bit PCI DMA
transfers (i.e. memory above 4GB on intel machines).  Not sure what the
best way to deal with that is; right now the code simply fails.  This
needs to be fixed somehow, either with bounce pages or by moving the
page to another physical location.  Not sure if the latter is possible.


The other item on my todo list is fixing the v4l2 API before that can go
into 2.5.x.  Known issues here are:

 * The long-discussed but not yet done dma-to-userspace API thing.
 * Support for multiple tuners (Bill?  Anything new here?).

Anything else I've missed?

  Gerd


PS: I've tried to write a document about mmap() from the driver's point
    of view (see below), comments welcome.


----------------------------- cut here -------------------------

memory management primer
========================

This document tries to give an overview of linux memory management.
I'm concentrating on issues which are relevant for video4linux
device drivers.

I recommend having a kernel source tree handy when reading this, so
you can have a look at the actual code or follow any pointers given
in the text for details.


The mmap() system call
----------------------

See also "man 2 mmap".  The mmap() system call is used to map
something into the application's address space.  "Something" might be:

 * a regular file.  This is used for example for the executable
   itself and any shared libraries used.
 * anonymous memory (MAP_ANON).  This just allocates some memory.
   malloc(3) uses this for example.
 * device memory.  Some piece of memory provided by a device driver.
   More details on this below.
 * [ more ??? ]
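
The anonymous case is easy to try from userspace.  The following is
only a small sketch, assuming a Linux box with /proc mounted: it maps
some anonymous memory and then checks whether a given address shows up
in /proc/self/maps (the list of memory areas of the calling process).
The function names map_anonymous() and address_is_mapped() are made up
for this example, they are not part of any library.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= settings */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Map len bytes of anonymous, zero-filled memory; NULL on failure. */
void *map_anonymous(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Return 1 if addr lies inside one of the areas listed in
 * /proc/self/maps, 0 if it does not, -1 if the file can't be opened.
 */
int address_is_mapped(const void *addr)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    unsigned long start, end;
    int found = 0;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* every entry begins with "start-end" in hex */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Calling address_is_mapped() on a pointer returned by map_anonymous()
should report the area as present; release it with munmap() when done.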

You can have a look at the memory areas mapped by some process with
"cat /proc/<pid>/maps".


what happens if mmap() is called
--------------------------------

The linux kernel will create a new struct vm_area_struct, then call
the "struct file_operations->mmap()" function which is registered for
the file handle passed to the mmap() system call.  If it is a
/dev/videox file handle, your driver's mmap() function is called.

It is your driver's job to deal with that now.  You have several
options for what to do.  Which one is best depends on your hardware ...


(1) map device memory
---------------------

If your device has on-board memory which is used to store image
data, you can allow applications to access that memory directly.
The linux framebuffer drivers do that, for example:  you can ask
them for a mapping of the video memory, and applications can draw
there simply by writing to the mapping (which will access the gfx
board's memory directly).  Have a look at drivers/video/fbmem.c
for details.

It may or may not be wise to do that.  The read/write performance of
the onboard memory can differ a lot from main memory.  Modern PCI
graphics cards for example are optimized for write access and are dead
slow if you try to read the framebuffer memory.  Scrolling the
framebuffer console by doing a complete redraw is way faster than a
simple memmove(), simply because you avoid the read access to the
device memory.


(2) allocate and map kernel memory
----------------------------------

This is what most video4linux drivers do today.  The driver will
allocate a big chunk of (unswappable) kernel memory.  One way to
do that is to allocate physically contiguous memory using the bigphysmem
patch (FIXME: still true 2.4 ???).  A better way is to use vmalloc.
vmalloc does _not_ give you physically contiguous memory.  For most of
today's hardware this is no problem though.

The driver can then remap that memory into the application's address
space.  There are two ways to do that:

 * handle it at mmap() time, i.e. the driver's mmap() function has
   to call remap_page_range() for each page.
 * register a custom nopage handler by replacing the vma->vm_ops
   pointer at mmap() time.  Your nopage handler will be called
   every time the application faults a page (i.e. tries to access
   a piece of memory which is not mapped yet) and has the job to
   return the correct page for the passed virtual address to the
   VM.  There are some helper functions to remap vmalloc()ed memory
   this way in v4l2-common.c

Note that you can map the pages in any way, there is no relationship
between the kernel space addresses of vmalloc()ed memory blocks and
the userspace addresses.  For example you can remap multiple vmalloc()ed
blocks into one contiguous userspace mapping.
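
To make the "multiple blocks, one mapping" case a bit more concrete,
here is a plain userspace model of the lookup a nopage handler would
have to do: translate an offset into the combined mapping into "which
block, which page within that block".  This is only a sketch and not
kernel code.  struct sketch_block, sketch_locate_page() and the fixed
4096 byte page size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size, illustration only */

/* One separately allocated buffer, standing in for a vmalloc()ed block. */
struct sketch_block {
    size_t npages;
};

/*
 * Translate a byte offset into the combined mapping into a
 * (block index, page index within that block) pair.
 * Returns 0 on success, -1 if the offset is past the last block.
 */
int sketch_locate_page(const struct sketch_block *blocks, size_t nblocks,
                       size_t offset, size_t *block, size_t *page)
{
    size_t pgoff = offset / SKETCH_PAGE_SIZE;
    size_t i;

    assert(blocks != NULL && block != NULL && page != NULL);
    for (i = 0; i < nblocks; i++) {
        if (pgoff < blocks[i].npages) {
            *block = i;
            *page = pgoff;
            return 0;
        }
        /* not in this block, skip over its pages */
        pgoff -= blocks[i].npages;
    }
    return -1;
}
```

A real handler would then look up the page behind that (block, page)
pair and hand it to the VM; the linear walk is good enough for a
handful of blocks.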

If the driver has to process the data returned by the device in some
way (for example uncompress image data which must be transferred
compressed due to the limited bandwidth of the ISA/USB bus) there is no
way around vmalloc()ed buffers.  If you care about performance, that
is.  But I think you do ...

Note that there are two vmalloc functions:  vmalloc() and vmalloc_32().
The first one might return high memory.


(3) use userspace memory
------------------------

You can also use userspace memory.  It's very simple:  This is what
the VM does by default if you do nothing in your mmap() function.
The application will get an anonymous mapping, the same thing it
gets from calling mmap() with MAP_ANONYMOUS set.  The malloc library
uses this too to allocate memory (for big chunks).

The only difference between memory allocated using mmap(MAP_ANONYMOUS)
and memory allocated using the /dev/video file handle is that in the
second case the video4linux driver will see the mmap() call.  So it
can do sanity checks on the mmap() parameters, look at the vma data
(and save the userspace pointer returned by the syscall), ...

Handling capture buffers this way is only useful if the hardware can
scatter-gather the image data to random pages and the driver does not
need to post-process anything.  It is possible to access userspace
memory using the functions in asm/uaccess.h, but these have some
overhead because they have to catch page faults in case they hit
some bogus userspace address.  If you have to post-process captured
data, better go with the vmalloc()ed buffers described above.

Note that userspace memory (unlike kernel memory) can be swapped out
temporarily, i.e. the physical memory address of these pages might
change or pages might not be present in memory at all.  If you want
to use that memory for DMA, you have to pin down the pages first, so
the VM does not try to swap them out while I/O is in progress.  You
can do that with kiobufs (see linux/iobuf.h).

bttv 0.8.x does memory management this way.
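
One detail worth spelling out: if a driver is handed an arbitrary
userspace pointer (which need not be page aligned), the number of
pages to pin and to put into the scatter-gather list is not simply
length / page size.  A minimal sketch of that arithmetic in plain C
follows.  sketch_pages_spanned() is a made-up helper for illustration,
not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of pages touched by the byte range [addr, addr + len).
 * len and page_size must be non-zero.
 */
size_t sketch_pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t first, last;

    assert(len > 0);
    assert(page_size > 0);
    first = addr / page_size;             /* page holding the first byte */
    last = (addr + len - 1) / page_size;  /* page holding the last byte */
    return (size_t)(last - first + 1);
}
```

For example a 4096 byte buffer starting at offset 1 within a 4096
byte page touches two pages, not one.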


get bus addresses
-----------------

To program your hardware you have to get bus addresses for your video
buffer pages which you can pass to the device.  Have a look at
Documentation/DMA-Mapping.txt for a description of how to do that.

The usage of virt_to_bus() is obsolete.  It happens to work on i386
boxes, but it is _not_ portable.  Some architectures have no fixed
mapping any more.  They have a memory management unit for I/O, and
you have to allocate bus address ranges with the functions listed
in the document mentioned above to make sure the iommu knows about
the transfers you are going to perform.


[ TODO ]
--------

 * more about iobufs?
 * kmap / kunmap?  memory zones?  docs for that anywhere?