Re: v4l2 api




Gerd Knorr wrote:
> 
>  * new field handling, the current v4l2 spec is a bit confusing about
>    this.  There is now:
> 
>    enum v4l2_field {
>         V4L2_FIELD_ANY        = 0, /* driver can choose from none,
>                                       top, bottom, interlaced
>                                       depending on whatever it thinks
>                                       is appropriate ... */
>         V4L2_FIELD_NONE       = 1, /* this device has no fields ... */
>         V4L2_FIELD_TOP        = 2, /* top field only */
>         V4L2_FIELD_BOTTOM     = 3, /* bottom field only */
>         V4L2_FIELD_INTERLACED = 4, /* both fields interlaced */
>         V4L2_FIELD_SEQUENTIAL = 5, /* both fields sequential into one
>                                       buffer */
>         V4L2_FIELD_ALTERNATE  = 6, /* both fields alternating into
>                                       separate buffers */
>    };

> This is not complete. V4L2_FIELD_INTERLACED means what? top-field-first?
> bottom-field-first? I'd rather see one extra flag being added,
> V4L2_FIELD_INTERLACED_TOP_FIELD_FIRST and _BOTTOM_FIELD_FIRST, or
> whatever... For quite some devices (such as the ones supported by the
> zoran driver), the field order is programmable, and applications can use
> this.

Guess what, we discussed that. Here's an addendum.

There are two aspects, spatial and temporal order. Spatial order matters
for correctly combining the fields into a frame: one field goes "on top"
of the other, hence the talk of top and bottom fields. Temporal order
determines which field was transmitted/captured first; this is important
for motion analysis in deinterlacers and codecs.

We can add flags to indicate field order, or, as Gerd suggested, the
spec can require a particular order. So interlaced and sequential modes
must store the top field first spatially, and temporally the newest
field first for NTSC-M, the oldest field first otherwise. Applications
in need of this information will have to check the current video
standard. Alternating mode must store fields in transmitted/captured
order. To determine the spatial order, v4l2_buffer already flags the
field parity. Cropping will be restricted to frame lines mod 2, because
moving down one line swaps the fields, i.e. the temporal order.
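The temporal rule above can be sketched in a few lines of C. This is
only an illustration; the standard bit is a placeholder, not the real
V4L2 define:

```c
#include <assert.h>

/* Placeholder for the NTSC-M bit in the video standard bitmask;
 * the real API would provide this define. */
#define STD_NTSC_M 0x00000001UL

/* Rule sketched above for interlaced and sequential buffers: the
 * temporally newest field is stored first for NTSC-M, the oldest
 * field first for all other standards.  Spatially the top field
 * always comes first. */
int newest_field_first(unsigned long std)
{
        return (std & STD_NTSC_M) != 0;
}
```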

>    what about the video output stuff btw.  Does anybody use that?

Of course. There are plenty of cards with video out or VGA-to-TV
converters. I know of at least one driver using this part of the API. I
think dropping output now, only to restore it later, perhaps not before
2.7, would be much regretted.

>  * drop zoom stuff
>    use crop_* instead (which needs some review + exact specification)

>  * The crop/scaling thing needs some work, Michael is busy with that.

Here's a brief explanation of how zoom works and why I think it should
be replaced. For cropping and scaling, a source and a target rectangle
are assumed. v4l2_zoomcap defines the scaling limits, the minimum and
maximum width and height of the source rectangle. The spec states that
"maxwidth and maxheight represent the total size of the raw device
image.", i.e. what you normally get without zooming. The location of
this rectangle over active video is undefined, and since the zoom ioctls
are entirely optional, this doesn't answer the question which part of
the picture capturing will yield, or what the pixel aspect is.

v4l2_zoom defines the width and height of a "capture subrectangle". This
is the size of the source rectangle cropped out of the picture, then
scaled up to maxwidth and maxheight, then down to v4l2_pix_format width
and height. Figuratively speaking.

The x and y fields define the "centre of capture subrectangle in device
co-ordinates", and "maxwidth / 2 is the centre of the image." "Centre of
the image" is not defined, but one can assume the origin of x and y is
the centre of active video. It follows that this is also the centre of
the v4l2_pix_format source, the v4l2_zoomcap minimum-maximum rectangle.
The units of x and y and of width and height are intentionally not
required to be frame lines or pixels or samples, but only 1/maxwidth and
1/maxheight, since video hardware differs in resolution.

Centre co-ordinates are ambiguous. When width and height are even, x and
y fall between four pixels and the image can be positioned exactly
relative to the centre of active video. When the dimensions are odd,
which the API does not forbid, off-by-one errors can result. Centring on
the raw device image also prevents orthogonal cropping: for example,
there are more scan lines above the centre of active video than below
it, because of the vertical blanking.
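The odd-dimension ambiguity is easy to see with a little arithmetic (a
sketch, not part of any API): expressing positions in half-units avoids
rounding and shows that the edges of an odd-sized centred rectangle fall
between samples.

```c
#include <assert.h>

/* Left edge of a rectangle of width w centred at x, both in units of
 * 1/maxwidth, expressed in HALF-units so no rounding occurs. */
int left_edge_2x(int centre_x, int width)
{
        return 2 * centre_x - width;
}

/* The edge lies exactly on a sample boundary only when the doubled
 * coordinate is even, i.e. when the width is even. */
int edge_on_sample(int centre_x, int width)
{
        return (left_edge_2x(centre_x, width) % 2) == 0;
}
```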

Still we don't know anything about the sampling frequency or vertical
resolution and thus the pixel aspect. This is vital information to
properly scale the image for display.

Given these flaws, the fact that apparently nobody uses v4l2 zoom, and
that two different ways of cropping and scaling unnecessarily complicate
the API, removing zoom is justified. It cannot even remain optional,
because every choice the API leaves open forces applications and/or
drivers to implement both ways to remain compatible with their
respective counterparts. Let's keep it simple.

The cropping / scaling api assumes a source and target rectangle. For
video capture drivers the source is the sampled picture, target is the
captured or overlaid image. Output drivers reverse source and target,
the api is used accordingly. Basically we assume the driver can capture
within an arbitrary window. Its bounds are defined in v4l2_cropcap,
giving the co-ordinates of the top left corner and its width and height.
This is less ambiguous than co-ordinates of two opposite corners. The
origin and units of the co-ordinate system are arbitrary, possibly 13.5
MHz samples and frame lines.

We assume scaling and cropping always happens, from an arbitrary source
rectangle within the capture window up or down to the target rectangle.
The source rectangle is defined by v4l2_crop, giving the co-ordinates of
the top left corner and its width and height using the same co-ordinate
system as v4l2_cropcap. The target rectangle is given either by
v4l2_pix_format width and height or by v4l2_window x, y, width and
height.

Scaling always happens in the sense that drivers not supporting it still
scale 1:1 in both directions. When scaling is supported but not
cropping, the source rectangle is fixed at the capture window size. When
cropping is supported but not scaling, the source rectangle width and
height must equal the target size.
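The three cases can be summarized in a small validity check. This is a
sketch with made-up parameter names, not proposed API:

```c
#include <assert.h>

/* Checks a crop/target combination against driver capabilities as
 * described above: without cropping the source rectangle is fixed at
 * the capture window size; without scaling the source must equal the
 * target size; a driver supporting neither scales 1:1 from the whole
 * window. */
int config_valid(int can_crop, int can_scale,
                 int crop_w, int crop_h,   /* source rectangle */
                 int win_w, int win_h,     /* capture window   */
                 int tgt_w, int tgt_h)     /* target rectangle */
{
        if (!can_crop && (crop_w != win_w || crop_h != win_h))
                return 0;
        if (!can_scale && (crop_w != tgt_w || crop_h != tgt_h))
                return 0;
        return 1;
}
```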

struct v4l2_cropcap shall contain the default source rectangle size,
given as in v4l2_crop. This default source is supposed to be centred
over the active picture area. The spec will suggest particular values
unless the device, for example a video camera, requires a deviation. The
purpose is to align images captured with different devices. A new
addition defines the pixel aspect. The contents of v4l2_cropcap may
change with the video standard and perhaps other properties yet to be
defined.

struct v4l2_crop and the VIDIOC_G_CROP and VIDIOC_S_CROP ioctls will be
mandatory only if the driver supports cropping. When cropping is not
supported both ioctls shall return -EINVAL. The application can query
the current source rectangle or request different dimensions.
v4l2_cropcap and VIDIOC_CROPCAP must be supported by scaling drivers to
calculate the scale factor. It should be supported by all drivers to
query the pixel aspect.

When cropping, hardware may not permit arbitrary locations, sizes and
aspect ratios. VIDIOC_S_CROP must return the closest values possible.
When scaling, the hardware may not permit arbitrary scaling factors,
perhaps depending on the cropping parameters. To accommodate this, the
spec will require that the *opposite* rectangle is modified, i.e. the
driver proposes values closest to the previously requested dimensions,
or to the hardcoded default if nothing else was requested.

Suppose the application wants to capture a particular area of the
picture. It may not get square pixels because the hardware does not
support the sampling frequency or scaling. But the application may not
care if the image is squeezed as long as it gets the requested area. So
the target size is adjusted.

On the other hand an application may want a particular image size, say
for MPEG encoding. When the driver cannot scale the image to exactly the
requested target size, the application does not care if the image must
be cropped or padded. So it asks for a target size and the source
rectangle is adjusted.

To determine both source and target size the application can request a
source size, then a target size, then check if the source is still ok.
This can be repeated until acceptable cropping and scaling parameters
have been negotiated.
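One round of this negotiation could look like the following, against a
mock driver. The rounding rules here are invented purely for
illustration; real hardware constraints differ:

```c
#include <assert.h>
#include <stdlib.h>

/* Mock VIDIOC_S_CROP: this imaginary hardware requires the source
 * width to be a multiple of 16 and proposes the closest value. */
int mock_s_crop(int want_w)
{
        return ((want_w + 8) / 16) * 16;
}

/* Mock target negotiation: the imaginary hardware only scales down by
 * integer factors from the current source width; it proposes the
 * target closest to the requested one. */
int mock_s_fmt(int crop_w, int want_w)
{
        int best = crop_w;
        for (int n = 1; n <= 8; n++) {
                int w = crop_w / n;
                if (abs(w - want_w) < abs(best - want_w))
                        best = w;
        }
        return best;
}
```

An application asking for a 300-wide source and a 150-wide target would
be offered 304 and 152; it then re-checks that the source is still
acceptable and stops.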

Using a single structure containing both source and target dimensions is
not practical because the driver wouldn't know which value to adjust to
satisfy the others. Moreover the scheme fits nicely when cropping and/or
scaling are not supported.

Without cropping, the source rectangle is fixed and only the target size
will be modified. This is already required by the spec for
v4l2_pix_format: return the closest values possible. One could say the
driver modified the source, found this won't work, and thus changed it
back to the default, modifying the target accordingly. When scaling is
not supported but cropping is, the reverse applies.

About the pixel aspect: the _picture_ aspect (like 4:3) is to be derived
from the video standard. Assuming the driver samples square pixels and
the default source rectangle size is 640 x 480, we can calculate the
pixel aspect (y/x) as: 640 x 3:4 / 480 = 1/1. Other drivers sampling at,
and only at, 13.5 MHz (ITU-R Rec. 601) may by default capture a
non-square pixel image of 720 x 480. It is important to note that 720
non-square pixels cover more picture area than 640 square pixels, so the
aspect is not 720/640.

I pondered defining a "clean aperture" size, the size the image would
have when covering the same area as a square pixel image. In this case
704 x 480, giving: 704 x 3:4 / 480 = 11/10. Obviously we cannot take the
capture window or default source window dimensions as clean aperture.
The problem with this approach is that PAL/SECAM pixel aspects cannot be
expressed as clean aperture sizes using integers. So before resorting to
numerator/denominator pairs, we might as well store the pixel aspect
directly.
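The two examples above reduce to a simple computation. Helper names are
mine; the reduction to lowest terms is what a driver would store in the
numerator/denominator pair:

```c
#include <assert.h>

unsigned int gcd(unsigned int a, unsigned int b)
{
        while (b) {
                unsigned int t = a % b;
                a = b;
                b = t;
        }
        return a;
}

/* Pixel aspect (y/x) numerator from image size and picture aspect
 * pic_x:pic_y (e.g. 4:3): width * pic_y / (height * pic_x), reduced. */
unsigned int aspect_num(unsigned int w, unsigned int h,
                        unsigned int pic_x, unsigned int pic_y)
{
        unsigned int n = w * pic_y, d = h * pic_x;
        return n / gcd(n, d);
}

/* Matching denominator of the reduced fraction. */
unsigned int aspect_den(unsigned int w, unsigned int h,
                        unsigned int pic_x, unsigned int pic_y)
{
        unsigned int n = w * pic_y, d = h * pic_x;
        return d / gcd(n, d);
}
```

This reproduces the figures above: 640 x 480 square-pixel capture gives
1/1, and the 704 x 480 clean aperture gives 11/10.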

struct v4l2_cropcap {
	__u32	bounds_left;
	__u32	bounds_top;
	__u32	bounds_width;
	__u32	bounds_height;

	__u32	default_left;
	__u32	default_top;
	__u32	default_width;
	__u32	default_height;

	struct {
		__u32	numerator;
		__u32	denominator;
	}	pixel_aspect;
};

Again, bounds_* (until someone suggests a better name) is the possible
crop area. default_* is the default crop rectangle (source for capture
devices, target for output devices), centred over the active picture.
pixel_aspect is the pixel aspect of the captured image when cropping to
the default and scaling 1:1.
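For example, the struct might be filled like this. The values are
hypothetical, assuming 13.5 MHz sampling, a 720 x 480 NTSC capture
window and the 704 x 480 clean aperture from above:

```c
#include <assert.h>

typedef unsigned int __u32;

/* The struct as proposed above. */
struct v4l2_cropcap {
	__u32	bounds_left, bounds_top;
	__u32	bounds_width, bounds_height;
	__u32	default_left, default_top;
	__u32	default_width, default_height;
	struct {
		__u32	numerator;
		__u32	denominator;
	}	pixel_aspect;
};

/* Hypothetical NTSC values: 704 x 480 default crop centred in a
 * 720 x 480 capture window, pixel aspect 11/10 (y/x). */
const struct v4l2_cropcap ntsc_example = {
	0, 0, 720, 480,
	8, 0, 704, 480,
	{ 11, 10 },
};
```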

There was a capabilities field (can crop, scale up, scale down etc) but
this information is rather useless. Applications should just ask the
driver how close it can get.

struct v4l2_crop {
	__u32	left;
	__u32	top;
	__u32	width;
	__u32	height;
};

BTW we still lack a brilliant idea how to reliably associate v4l2 and
audio devices, mixer and pcm. I'm talking about video devices with audio
sampling like the bt878, not audio cables connected to the soundcard.

Michael





