Re: V4L2 - RFC: DMA to userspace API

Artur Skawina wrote:
> 
> Justin Schoeman wrote:
> >
> > It is often desirable to DMA directly into a target buffer.  For
> > example, into offscreen memory of a video card, or maybe the compression
> > buffers of a codec, etc.  The main reason I'm interested is for the
> > offscreen DMA.
> 
> Didn't think of that one. Things like this shouldn't however be
> visible to apps -- it needs to be in libv4l2 etc, or you end up
> with h/w specific apps; not unlike what happens today with bttv.

1) Some apps would still need this, especially high speed codecs.
2) This would be an integral part of libv4l2, as it would allow libv4l2
to distribute captures without copying.

> > > > 5) DMA to userspace example:
> > > >   1 - set capture format (S_FMT)
> > > >   2 - request buffers (REQBUFS with V4L2_BUF_REQ_DMA set)
> > > >   3 - allocate userspace memory for the number of granted buffers.
> 
> "allocate userspace memory" might not be the best wording when talking
> about mmaped hw buffers :^) This is what confused me.

Well, at this point the HW buffers are mapped into userspace, so
effectively my wording is still correct ;->

> > > is it possible to be granted less buffers than requested? when?
> >
> > Sometimes the hardware may only support 2 buffers, or there may not be
> > enough resources, etc.
> 
> The resource starvation is what i was thinking of. Obviously there can
> be driver limitations (though if the hw can dma to random pages then
> it should be capable of supporting N pages. OTOH what about hw that
> needs contiguous memory? A mmapped hw buffer would be fine, but normal
> userspace mem wouldn't. Would REQBUFS return success in the former case
> and failure in the latter? Would seem logical, but how is it expected
> to know?), but checking for available memory at this point, when it
> isn't yet allocated doesn't buy you much -- QUERYBUF can still fail
> later.

REQBUFS simply requests the internal structures necessary to manipulate
the capture data.  QUERYBUF is the call that actually sets up the
capture buffers, and it is there that these details need to be checked.

As to the contiguous memory problem - the card would simply not report a
DMA to userspace capability if it cannot do scattered writes, as by
definition userspace memory is not contiguous.  The special case of a
mmaped HW buffer should probably NOT be handled independently.

> > > >   4 - call QUERYBUF on each buffer, filling in dmabuf with a pointer to
> 
> > > so the only difference is that the app instead of the v4l2 layer is
> > > doing the allocation, right? The advantage?
> >
> > Yes. As above.
> 
> If the main use were to be feeding codecs/framebuffers/textures then
> i'm not sure this isn't too generic. OTOH it does appear not generic
> enough if it's supposed to be a universal solution (see above).

Finding the balance is tricky, which is why we have RFCs ;->

> > > QUERYBUF pins down the userspace buffers? what happens if that fails?
> >
> > The app receives an error code, and must handle it gracefully.
> 
> From an app POV it would seem the only "graceful" thing to do
> would be to abort -- it's not like any other scheme is likely
> to work if we're short on memory. OTOH reducing the # of buffers
> could work, but then it has to undo everything, free the "successfully"
> allocated mem and go back to (2), which, remember, already returned success.
> Doesn't seem like a nice i/f, but could work -- yes.

Well, it doesn't have to go all the way back to (2).  It could simply
not queue that specific buffer for capture.  Something like this:

if (ioctl(fd, VIDIOC_QUERYBUF, &buf) == 0)
	ioctl(fd, VIDIOC_QBUF, &buf);

That would queue all successfully mapped buffers.  Calling
STREAMON would then only capture to the mapped buffers.
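
To make the idea a bit more concrete, here is a rough sketch of the whole
loop.  It is only an illustration, not driver code: struct dma_buf_req,
the buf_op callbacks and the fake driver are made up for this mail so the
logic can run without hardware.  In a real app the two callbacks would be
the VIDIOC_QUERYBUF and VIDIOC_QBUF ioctls on a v4l2_buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a v4l2_buffer under the proposed API. */
struct dma_buf_req {
	int index;	/* buffer index granted by REQBUFS */
	void *dmabuf;	/* userspace memory the driver should DMA into */
	size_t length;	/* size of that memory in bytes */
};

/* One driver operation on a buffer: 0 on success, -1 on failure.
 * Real code would wrap ioctl(fd, VIDIOC_QUERYBUF/VIDIOC_QBUF, ...). */
typedef int (*buf_op)(void *drv, struct dma_buf_req *buf);

/* Try to map every buffer and queue only those that mapped.
 * Returns the number of buffers actually queued. */
static int queue_mapped_buffers(void *drv, buf_op query, buf_op queue,
				struct dma_buf_req *bufs, int count)
{
	int queued = 0;

	assert(bufs != NULL || count == 0);
	for (int i = 0; i < count; i++) {
		if (query(drv, &bufs[i]) != 0)
			continue;	/* could not be pinned: skip it */
		if (queue(drv, &bufs[i]) == 0)
			queued++;
	}
	return queued;
}

/* Fake driver (assumption, for illustration only): it refuses to pin
 * more than max_pinned buffers, imitating a shortage of lockable memory. */
struct fake_driver {
	int max_pinned;
	int pinned;
	int queued;
};

static int fake_query(void *drv, struct dma_buf_req *buf)
{
	struct fake_driver *d = drv;

	(void)buf;
	if (d->pinned >= d->max_pinned)
		return -1;
	d->pinned++;
	return 0;
}

static int fake_queue(void *drv, struct dma_buf_req *buf)
{
	struct fake_driver *d = drv;

	(void)buf;
	d->queued++;
	return 0;
}
```

The app can then compare the return value against the number of buffers
it asked for and decide whether it is enough to start streaming.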

> > > How is the amount of mlocked memory limited?
> >
> > Not yet handled.  Any ideas?
> 
> This and the question below are related :) I'm not sure if a
> "capture resource" includes the buffers (it probably shouldn't,
> having more than one buffer set can make sense, eg webcam+preview
> both using different output hw), but if multiple opens can mlock
> some memory each then there can't just be a n-buffers/open limit.
> Having a global limit on the total amount of memory mlocked that
> could be shared among all clients might be enough for cooperative
> clients.

This is what I currently do for bttv2 with kernel memory.  We really do
need some formal guidelines here though.
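
For what it's worth, the accounting for a shared global limit could be as
simple as the sketch below.  This is not the bttv2 code; lock_budget,
budget_charge and budget_release are names invented here, and a real
driver would need locking around them.

```c
#include <assert.h>
#include <stddef.h>

/* One budget shared by every open of the device (hypothetical). */
struct lock_budget {
	size_t limit;	/* total bytes all opens together may lock */
	size_t used;	/* bytes currently locked */
};

/* Charge bytes against the budget.  Returns 0 on success, or -1 if
 * the request would push the total over the limit. */
static int budget_charge(struct lock_budget *b, size_t bytes)
{
	if (bytes > b->limit - b->used)
		return -1;
	b->used += bytes;
	return 0;
}

/* Give bytes back, e.g. on REQBUFS with count 0 or when the device
 * is closed. */
static void budget_release(struct lock_budget *b, size_t bytes)
{
	assert(bytes <= b->used);	/* releasing more than charged is a bug */
	b->used -= bytes;
}
```

Each open would charge the budget when its buffers get locked and release
its share when it unmaps them or closes the device.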

> > > Are multiple opens
> > > of the same resource possible?
> >
> > Yes, in a way.  Multiple opens of a capture card are possible, but only
> > one open can control a capture resource on the card (the definition of a
> > capture resource is very loose, and depends on the hardware).
> >
> > > >   5 - perform streaming capture.
> > > >   6 - unmap requested buffers (REQBUFS with count set to 0).
> > >
> > > Simply closing the device would be enough too?
> >
> > That would also do, but the apps must have control of the release points
> > for capture resources.  You don't want to close the card, and
> > reconfigure from the beginning just to change the image size.
> 
> Sure, but the memory must be unlocked when the app closes the device
> or otherwise gets killed.

Yes, that goes without saying 8-)

> > > Would the new v4l2_buffer be compatible with the old one?
> >
> > Yes.  Existing userland apps could be used with the modified structure
> > without recompiling.
> 
> Thanks for the clarification
> 
> artur

Thanks,

-justin