
Using Photographs To Enhance Videos 102

Posted by timothy
from the ready-for-my-closeup dept.
seussman71 writes with a link to some very interesting research out of the University of Washington that employs "a method of using high quality photographs to enhance a video taken of the same subject. The project page gives a good overview of what they are doing and the video on the page gives some really nice examples of how their technology works. Hopefully someone can take the technology and run with it, but one thing's for sure: this could make amateur video-making look even better than it does now." And if adding mustaches would improve your opinion of the people in amateur videos, check out the unwrap-mosaics technique from Microsoft Research.
This discussion has been archived. No new comments can be posted.

  • by QuantumG (50515) * <qg@biodome.org> on Thursday August 14, 2008 @05:43PM (#24607619) Homepage Journal

    Why is UW not releasing their source code? If they intend to spin off commercial products, why are they releasing demos? Hell, even *Microsoft* is releasing demos of this stuff.. are Apple and Google the only companies that can ship product these days? (Even if it is "beta", you can at least freakin' use it.)

    No more demos. We know you're smart, now make something useful please.

    • by imsabbel (611519) on Thursday August 14, 2008 @05:51PM (#24607735)

      Simple reason:
      They _say_ that it does this automatically.

      Translation: We put some PhD student on it who spent some months optimizing the settings for the two selected scenes so we can make a nice publication and maybe get more money.

      If you just look at their steps of the workflow, the way they describe it just isn't possible (like the way they "stereoscopically" create a depth-map from a _single_ still photograph..).
      Not to mention that the first scene looks like a bad video game level after their "improvement".

      • by KalvinB (205500) on Thursday August 14, 2008 @06:10PM (#24607981) Homepage

        The easy way would be to use the already calculated depth field from the frame in the video that best matches the photo.

        • by imsabbel (611519)

          True. But still, notice how they use a reference movie that's basically a side-scroller to create this depth-map?

          Basically, stuff like this is released for _every_ SIGGRAPH. The first few times I was still blown away, but now that I am used to it (and in academia myself, so I know the paper game), I simply cannot get excited for something that will never see realisation.

      • by CaptainPatent (1087643) on Thursday August 14, 2008 @06:15PM (#24608055) Journal

        like the way they "stereoscopically" create a depth-map from a _single_ still photograph.

        No no no, read the fine print

        Stereocycloptically, not stereoscopically!

      • by mo (2873) on Thursday August 14, 2008 @06:50PM (#24608435)

        like the way they "stereoscopically" create a depth-map from a _single_ still photograph

        TFV said they were using video frames to do stereoscopic depth-mapping. Since the source footage changed perspective, they can build a depth map based on the relative shift of each object in the video, and then project the high-quality photograph on top of the derived 3D structure.

      • by pfafrich (647460)
        If you just look at their steps of the workflow, the way they discribe it just isnt possible (like the way they "stereoscopically" create a depth-map from a _single_ still photograph..).

        In the video they say they use structure from motion to create the depth map.
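For anyone wondering how that works: structure from motion and stereo both rest on the same triangulation fact, that a point's apparent shift between two camera positions is inversely proportional to its depth. A minimal sketch of the parallel-stereo case in plain Python (the focal length and baseline are made-up numbers for illustration, not values from the paper):

```python
def depth_from_disparity(disparity_px, focal_px=800.0, baseline_m=0.1):
    """Parallel-stereo relation: depth = f * B / d.

    disparity_px: horizontal shift of a feature between two frames (pixels)
    focal_px:     focal length in pixels (illustrative assumption)
    baseline_m:   camera translation between the frames (illustrative assumption)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the camera")
    return focal_px * baseline_m / disparity_px

# A feature that shifts 40 px between two frames sits 2 m away
# under these assumed camera parameters:
print(depth_from_disparity(40.0))  # -> 2.0
```

Real structure from motion solves for the camera poses too, but the per-point depth recovery reduces to this relation.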

    • Re: (Score:3, Informative)

      by Swizec (978239)
      That's because they're all renders! None of it is real.

      Pics or it didn't happen. Or in this case, apps or it happened only in photoshop/whatever.
      • Pics or it didn't happen. Or in this case, apps or it happened only in photoshop/whatever.

        Perhaps you're right. However, apps like SynthEyes are already like 90% of the way there. Their demos aren't a huge leap away from what is already on the market.

    • by Hays (409837) on Thursday August 14, 2008 @05:55PM (#24607781)

      The publication is supposed to contain enough information to recreate the results.

      Question 4 on the SIGGRAPH review form -
      "4. Could the work be reproduced by one or more skilled graduate students? Are all important algorithmic or system details discussed adequately? Are the limitations and drawbacks of the work clear?"

      If you or a company wants it bad enough, the information is there, unless the review process failed (which does happen).

      This wasn't a SIGGRAPH paper but the ability to reproduce results is nonetheless a standard prerequisite for academic publication.

      It's certainly not as convenient as releasing source code, but that's sometimes a big challenge for an academic researcher because the last thing they want is to have to support buggy, poorly documented research code for random people on the internet.

    • CREEPY! (Score:2, Interesting)

      by Ohrion (814105)
      Did anyone else notice a very creepy effect in the "enhanced" video with the bust? It made it look like the head was turning to look at you as you moved around it. *shudder*
    • by joeava (1147727)
      If this is a third party funded project, is the source code copyrighted to the sponsor?
    • um? It's right there in the article: http://grail.cs.washington.edu/projects/videoenhancement/videoEnhancement_files/VirtualStudio.zip [washington.edu] The only thing missing is the Structure from Motion code. The readme is interesting: it says that it takes about 5 minutes to process each single 800x600 frame. It still has its uses, but for masking I bet I could do that faster manually. http://grail.cs.washington.edu/projects/videoenhancement/videoEnhancement_files/README.txt [washington.edu]
  • by 4D6963 (933028) on Thursday August 14, 2008 @05:47PM (#24607683)
    The other cool part of it is that it derives a cloud of points from the video, meaning it can turn a video into a 3D model, apparently. However it seems like their program only uses it internally.
  • Patent Encumbered? (Score:4, Insightful)

    by reality-bytes (119275) on Thursday August 14, 2008 @05:51PM (#24607719) Homepage

    I always get this feeling when I see a university-styled promotional release that the *software* patents are already pending.

    I haven't the time to search just now but I'll bet there's at least one application pertaining to this method which encompasses a hell of a lot more.

  • simply amazing (Score:1, Redundant)

    by PhrostyMcByte (589271)
    Wow, that's got to be some of the coolest tech I've seen in years. I can't wait for some software to come out that uses it. Avisynth [avisynth.org] plugins, anyone?
    • by 4D6963 (933028)

      I'm afraid it's going to be used in music videos to suddenly make flowers or tentacles spread on walls, or other such stupid uses.

      By the way, considering how you can modify an object in a scene by replacing a frame of it, or adding a picture to the mix, does it mean we can make Clint Eastwood look like he's 30 again by using a picture of him when he was young on some recent footage of him, or even do entire "head transplants" on videos?

  • Fractal compression (Score:1, Informative)

    by IdeaMan (216340)

    Combine this with fractal compression [wikipedia.org] and we could store all the videos we've ever seen on one hard disk.

    • by 4D6963 (933028)
      Actually that was the first thought that occurred to me, that it could be used to store videos by storing a high resolution keyframe and then only the movement data. Then it occurred to me that that's already what our modern video compression algorithms do. You can tell when you skip ahead in a WMV video (or when it skips) that artifacts look like they belong on the object they appeared on, until the next full keyframe.
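That keyframe-plus-deltas scheme is easy to sketch with toy one-dimensional "frames": store the first frame whole and every later frame as element-wise differences from it. (Real codecs use motion vectors and transform-coded residuals; this only shows the storage principle.)

```python
def encode(frames):
    """Store the first frame whole, the rest as element-wise deltas from it."""
    key = frames[0]
    deltas = [[c - k for c, k in zip(f, key)] for f in frames[1:]]
    return key, deltas

def decode(key, deltas):
    """Rebuild every frame from the keyframe plus its delta."""
    return [key] + [[k + d for k, d in zip(key, delta)] for delta in deltas]

frames = [[10, 20, 30], [11, 20, 29], [12, 21, 28]]  # three tiny "frames"
key, deltas = encode(frames)
assert decode(key, deltas) == frames  # round-trips exactly
```

The deltas are small numbers, which is what makes them compress well, and a corrupted delta only hurts until the next keyframe, which is exactly the skip-artifact behavior the parent describes.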
  • by spoco2 (322835) on Thursday August 14, 2008 @06:02PM (#24607877)

    Really, the ability for their software to 'unwrap' a 3D object and allow you to fiddle with it as you wish is very cool.

    And not limited to a 'static' scene.

    And, really, if you're going to go to the effort of videoing a scene, then photographing the scene, then passing the video and the photos through their software, all to get better exposure and resolution...

    Um.

    Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

    • The more interesting aspect is that you can tweak those still photos, and then transfer them back. Photoshop some key frames, and you have suddenly created a video with the same manipulation. The video is just a cheap source for spatial data, which you can then texture with your photos.
      • That would greatly lower the cost of doing special effects, if you didn't have to do them frame by frame.

        • I assure you that in all but the most insane cases, doing that frame-by-frame is already not done anymore (high quality rotoscoping, on the other hand.. yipe). You model a quick mock-up in a 3D application, project your painted texture onto that, and composite that with the original footage.

          What it does do is remove that whole 'You model a quick mock-up' part in many (not all) cases. Now to see who gets the patents, how much they are to license, and who get(s) to toss it into their editing suite.

        • It would greatly lower the cost of doing special effects if you did special effects frame by frame.

          The problem with most of these technologies is that they never reach photo-realistic visual effect quality results.

          Any time you have to do frame by frame VFX you're doing it for the sole purpose of getting a more perfect result. If you need average to crappy results you won't be doing it frame by frame.

          This tech has cool potential and will be used by the VFX industry but it won't be automatic and it'll be to

    • by Endo13 (1000782)

      All to get better exposure and resolution.

      Clearly it's intended for pr0n!

    • by 4D6963 (933028)

      Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

      What if you do buy an HD camera and combine it with a 12-megapixel still camera? Besides, just an HD camera doesn't fix the issues you can fix by then adding HDR shots to the mix.

    • Re: (Score:2, Insightful)

      by jebrew (1101907)
      I'd just connect a camera to the bottom of my camcorder (they both have a spot for mounting).

      Then just have the still camera do continuous shooting @ ~1fps while you video. Match them up in this software when you're done and you're good to go...now if I could just get a hold of their software.

      • Let's just say your camera is capable of 10 MP, and that an average photo is 3 MB (JPEG). Then 3 MB x 60 (secs) x 60 (minutes) = 10,800 MB per hour. I hope your camera can keep up with the write rate (plus the fact that you're expecting it to always be in focus for the 3,600 photos you've just taken).
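The arithmetic above checks out; a quick sanity check (photo size and shooting rate are that comment's assumptions, not measured values):

```python
def stills_per_hour_mb(mb_per_photo=3, fps=1):
    """Storage consumed by continuous still shooting, in MB per hour."""
    return mb_per_photo * fps * 60 * 60

print(stills_per_hour_mb())         # -> 10800, i.e. ~10.8 GB per hour
print(stills_per_hour_mb(fps=0.5))  # -> 5400.0, halve the rate, halve the storage
```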
    • Still cameras can pretty much always take higher res pictures than video cameras no matter what price range you're looking at. I wonder if they could combine a still camera into a video camera and have it take high res still frames as the video is shot and then use this software to improve the video. It seems that'd be a cost effective way to squeeze more out of any level of camera.

    • Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

      I hate to be captain obvious here, but historical footage strikes me as the #1 reason (historical meaning everything up to yesterday). I mean you can't go back and reshoot the millions of hours of footage the world already has, but there are a lot of high resolution photos of some of the same subjects.

      Secondly, the still resolution on most point and shoot cameras is a lot higher than the video reso

    • Re: (Score:2, Insightful)

      by imess (805488)

      Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

      Hint: years old amateur/family/etc video meets modern high-res camera.

    • Re: (Score:2, Insightful)

      by dword (735428)
      if you're going to go to the effort of videoing a scene

      "you..videoing..." isn't the only application. This could be used to enhance other videos. Let's say someone else made a great video (captured some really great scenes, focused on some details) and you want to publish it but even if they focused on cool details, they're not enough. You take a few pictures and enhance their video.
      Also, this is just the start. They are currently enhancing static videos but I'm sure in the near future, if this is worke
      • by spoco2 (322835)

        Um... geeze... a little impassioned perhaps?

        All I was saying was that the Microsoft tech, which got the small billing, looked to be the more interesting and useful now compared to the other tech which got prime billing.

        You know, because Microsoft is 'evil', so doesn't get attention, but a University is wholly 'good' so gets top billing.

        And also, you're telling me there isn't a heap of research time spent on ways of doing things that are overly complex and impractical just because the researchers want to do

    • Can you buy an HDR video camera? The technology enables effects not possible with current video cameras. Plenty of people don't have the means to "just buy a better HD camera in the damn first place"
  • A better use? (Score:5, Interesting)

    by neokushan (932374) on Thursday August 14, 2008 @06:10PM (#24607983)

    All of these techniques are pretty awesome and will certainly be a boon to home video enthusiasts the world over (plus plenty of commercial places that are on a tight budget), but I've got another idea.

    You see it on TV all the time, CCTV footage of robberies and the like, couldn't this technology be used to effectively map out a 3D image of the perpetrator?
    I know it won't be perfect and most CCTV is probably too low quality to be used, but it would certainly be pretty cool (and useful) to have a vaguely accurate 3D model of the guy, giving you height, build, etc. and with the help of supplementary images, a really easy way to adjust its appearance.

    • Re: (Score:2, Informative)

      by sirkha (1015441)

      You see it on TV all the time, CCTV footage of robberies and the like, couldn't this technology be used to effectively map out a 3D image of the perpetrator? I know it won't be perfect and most CCTV is probably too low quality to be used, but it would certainly be pretty cool (and useful) to have a vaguely accurate 3D model of the guy, giving you height, build, etc. and with the help of supplementary images, a really easy way to adjust its appearance.

      Yes, like, you could adjust the appearance to look exactly like someone else! Not saying that one would or should do this, but now that they can, they probably will.

    • Re: (Score:3, Interesting)

      by Pingmaster (1049548)

      I would say mount a high-res still camera in parallel with the CCTV camera, taking, say, 1 picture every 10 seconds after the CCTV motion sensors are tripped. The stills would have quality comparable to a high-end consumer camera (i.e. 7-8 Mpixels); then use that data to enhance the video taken to aid in identifying suspects.

      That said, I don't think these 'enhanced' videos should be admissible as evidence, since the videos have been effectively tampered with and given the possibility of altering identi

      • by nametaken (610866)

        No, but if it helps me find the guy with the QuickStop drop bag in his back seat stuffed with bills and a .38 snub-nose, that would help. :)

      • by MobyDisk (75490)

        All the photos from cameras have been digitally enhanced. What the camera itself produces is not even viewable by a human. Software, either in the camera or on the PC in the case of RAW image files, converts the matrix of raw sensor values into a photo.

        While someone could certainly question the accuracy of the enhancement process, there is no good reason enhanced photos could not be admissible as evidence. It would not surprise me to find that it is very common to do simple enhancements anyway since CCT
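For the curious, the in-camera processing described above is demosaicing: the sensor records one color sample per photosite behind a Bayer filter, and software interpolates the missing channels. A toy version for a single 2x2 RGGB tile (real demosaicing interpolates across neighboring tiles; this only shows the idea):

```python
def demosaic_rggb_tile(tile):
    """tile is one 2x2 Bayer sample [[R, G], [G, B]]; produce a single
    RGB pixel by taking R and B as-is and averaging the two green
    photosites. Illustrative only; real pipelines are far smarter."""
    r = tile[0][0]
    g = (tile[0][1] + tile[1][0]) / 2
    b = tile[1][1]
    return (r, g, b)

print(demosaic_rggb_tile([[200, 120], [100, 40]]))  # -> (200, 110.0, 40)
```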

    • by jebrew (1101907)
      How about a security camera that has a 10 megapixel still camera shooting at 1 frame every 2 seconds when there's motion.

      Take the video at about qvga resolution, map on the high quality still and voila, you've got HD video on the cheap!

      • by Steveftoth (78419)

          Then your problem is storage, 'cause it's a lot of pixels to keep track of, which is one reason the CCTV cameras are so low-res: it keeps the bitrate down so they can cheaply store so much footage.

        • by neokushan (932374)

          A few years ago that would be entirely valid, but now you can pick up a 1 TB HDD for around $100; storage really isn't as big an issue these days.

    • by krayzkrok (889340)
      Bear in mind that the key factor here is the photographs. Whatever you want to "enhance" must be present in those photos as well, so a robbery on CCTV can only be enhanced in this way if simultaneous photos that include the perpetrator are available... in which case why not simply use a high-resolution still security system in the first place (easy: it costs too much to store the massive amounts of data). You also cannot enhance, say, people moving through the scene or changing / unpredictable elements in
    • How about the other way around? Hack the camera, and make yourself invisible to it.
  • by bill_kress (99356) on Thursday August 14, 2008 @06:36PM (#24608269)

    When the ability to deconstruct a video into a 3-d model & skin (the opposite of what a video card does now) is placed into an open-source API, the possibilities are going to be HUGE (and a little frightening).

    Anyone want to post a few ideas? I'll give you a few topics to kick things off:

    Change detection (Finding lost objects in a room, seeing boxes left in a government office, where's my remote)
    Change observation (plant growth, things that change too gradually for us to notice)
    Creating 3-d models from humans (extracted from old films, walking down the street)
    Weapon systems (Undetectable lasers blinding targets, Unmanned guns with perfect accuracy)
    Home interaction (Make a sign with your hand, computer changes the channel, lighting, heat, ...)
    Office monitoring (Exactly where each person is any time just by typing "Where's bill" into your PC)

    All things that could be done by any hobbyist/hacker with the right API.

    (I assume that to get real-time you could use the massively parallel abilities of a video card, making this stuff run on any hardware...)

    Also, just storing models and skins is extremely efficient--You could film a room for years in extremely high resolution and use virtually no storage (almost none except when something or someone new enters the room, then just one new high-def skin)

    Other ideas?
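The change-detection entry on that list is the one that needs no 3-D at all; a toy 2-D version just differences two frames and flags pixels that moved (a depth-map variant would difference z-values the same way):

```python
def changed_pixels(before, after, threshold=10):
    """Return (row, col) for every pixel whose grayscale value moved
    by more than threshold between two same-sized frames."""
    return [(r, c)
            for r, row in enumerate(before)
            for c, v in enumerate(row)
            if abs(after[r][c] - v) > threshold]

before = [[0, 0, 0], [0, 0, 0]]
after  = [[0, 0, 0], [0, 255, 0]]   # something appeared at row 1, col 1
print(changed_pixels(before, after))  # -> [(1, 1)]
```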

    • Surely, Carmack's got to be interested in this? If he does take to it, I suppose the next gen of hardware renderers would have to be optimised for it. You're right, if it's as good as the demo video makes out, it could be the next big thing.
    • by MobyDisk (75490)

      I'm not sure a 3D model is going to help most of these things.

      3D models are not necessary to do any of these tasks. Object recognition can be done in 2D, and it is very very hard. I will speculate that doing it in 3D is going to be even harder. Plus, using human brains as a model, we don't do it that way.

      --

      Change detection - Change detection can be done in 2D, and the person viewing the image can see where the object is anyway. No need to have the 3D model. As for object recognition, that is machine vi

      • by bill_kress (99356)

        You are exactly right. 2D recognition is extremely difficult--if not impossible. Pretty much like rendering a 2d screen without a 3d model backing it is really difficult. All your points were that doing it in 2d was hard. You were right about every one, doing it in 2d IS hard.

        The point of my post was, what if we had a library that took 2 cameras in the real world and changed them into a 3-d model with skins (which is how the article was done). How much easier does it make all these problems. Well, the

        • by MobyDisk (75490)

          I don't think you even read my reply. Furthermore, I don't think you know anything about what you are talking about.

          All your points were that doing it in 2d was hard

          I actually said the opposite. All of the things you listed can be done in 2D, except for object recognition, which cannot be done in 2D or 3D. So having 3D data does not help.

          The rest of your reply was just rambling about how more cameras = better. Doesn't address any of the fundamental issues. Meh, why am I bothering to reply.

          • by bill_kress (99356)

            Strange, I kinda felt the same way.

            Object recognition can be done in 3d much more easily than 2d, THAT is the point. Your saying that it can't doesn't change the fact that that's what the subject of the article was doing (Assuming you read it).

            With 3d, a generic identification of objects is very possible and reliable. There are no more questions about what part of the image is part of what object because you have distance information for every surface, the code that looks at the 2 pictures to draw the 3-d

            • by MobyDisk (75490)

              I think you are both wrong, and I think I now see what you are misunderstanding.

              When we think of a 3D model, we imagine that we have a series of coordinates for a head, and then some joint that connects to a neck, and some coordinates for that. Then a torso, with coordinates, etc.

              But if we extract 3D images from cameras, we have none of that. Firstly, we have a scene, not a model. We have only points and textures. So first, we don't know what is the person and what is the floor, or the wall, or the tabl

              • by bill_kress (99356)

                I think you are missing one step. Your arguments are still simply those against a 2-d image. A 3-d image is a completely different problem.

                With 2 images of the exact same scene taken from slightly different angles, you have much more information than you have with a single image.

                With the two camera angles, you have the data available to calculate the x, y and z coords of each surface you see... You are not simply viewing a flat image as you are with a 2-d camera image.

                Given the additional info, you can ac

  • With most if not all video cameras storing the video digitally, and now with all these new techniques for editing video, why would any court allow for video evidence?
  • by Atilla (64444) on Thursday August 14, 2008 @06:39PM (#24608323) Homepage

    This software, if it actually works as described, could also be used to easily fabricate video "evidence". An average viewer would not be able to tell the difference.

    Kinda scary...

    • I hear that's how they framed the Butcher of Bakersfield... Time to start RUNNING!!
      • by Fri13 (963421)

        Funny, I just watched that movie (The Running Man) yesterday.....

        Actually the "Butcher of Bakersfield" part was not manipulated; they only cut and rejoined the voice parts. This technology would be used on the later part where Captain Freedom "killed" Ben Richards..

    • by MobyDisk (75490)

      I think this would make fabricating evidence much *harder*.

      Today, if you want to add a gun into the photo you just have to make it look right from one set of lighting, and one photo, with a limited resolution. If you had multiple cameras generating a 3D model, you must now Photoshop the evidence in so that it looks right from multiple angles, PLUS the software could tell if the gun was shaped differently from different viewpoints. So your pixels must produce a perfect 3D model of the object that is consis

  • I believe IBM did something like this in the 1990s. Obviously not as slick; they didn't have as many CPU cycles.
  • automatic or not - that was a huge eye-opener if that technology is available at the grad-student level. Available in commercial/consumer products in 3-5 years.

    So much for "video evidence". So much for reality. "Your honor, I have a video of what happened and I wasn't there! ... see?"

  • This is a 3d track of the shot (which generates a point cloud of 3d points, which can then be used to generate an automatic 3d model of the scene). They then project (a method of texturing that paints a model based on points of projection.. what happens when you stand in front of a projector- you get projected onto) the still photos onto the 3d model, recreating all aspects of the texture and geometry, but instead of SD resolution, you now have gigapixel resolution built into the model.

    The reason it looks l
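The projection step described above is the pinhole camera equation applied per point: transform each 3-D point into the photo's camera frame, divide by depth, and sample the photo at the resulting pixel. A bare-bones sketch (the intrinsics are made-up numbers, and the point is assumed already in camera coordinates):

```python
def project_point(point, focal_px=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3-D point (camera coordinates, z > 0)
    to pixel coordinates in the still photo. The intrinsics are
    illustrative assumptions, not values from the paper."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# A point 2 m ahead and 0.4 m right of the axis lands 100 px right of center:
print(project_point((0.4, 0.0, 2.0)))  # -> (420.0, 240.0)
```

Projective texturing runs this for every reconstructed point (or vertex) and copies the photo's color there, which is how the high-resolution detail ends up on the model.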

    • by all204 (898409)

      The solution is NOT to fix it in post. The solution is to spend 5 minutes, think it through, and fix it while you're filming.

      Thank you. A little OT, but I do location sound for indie films, and that is the most horrible thing a director can say.... "Fix it in post". 5 minutes on set, or 3 days in editing. To me that's an easy choice, but apparently not to everyone.

      • I don't work in film, so I have a question: how much do the actors and crew get paid per hour compared to an editor?
        • by all204 (898409)
          Well... I work for a film coop as a volunteer, and we're all volunteers, so none of us get paid. :p So maybe it's different in the pro circle, I don't really know. But as far as sound goes, if you have bad location sound, you have to rebuild the scene in post, and that usually requires scheduling all the actors in that scene to come back and redub the sound. So even from a money standpoint, I think it's expensive, but also in the volunteer circle, people move and post can happen months after the actual shoot
        • It varies wildly. Very, very wildly. Sometimes editors will get paid more, sometimes actors will get paid way more. At the professional film level, they're both unionized positions, unlike effects workers (sad face).

    • by blincoln (592401)

      The solution is NOT to fix it in post. The solution is to spend 5 minutes, think it through, and fix it while you're filming.

      Obviously everyone should shoot still/video with the intention of it being perfect without postproduction, but sometimes it's impractical or impossible to go back and reshoot something when you find out it's got a problem of some kind.

      For example, I went on a drive down the west coast of the US a year ago and took a bunch of pictures. Halfway through the trip, I discovered that someti

      • Hi Lincoln,

        I understand what you mean with unexpected problems that you encounter after the shoot day, but "Fix it in post" is a term most often used on set to avoid working on issues BEFORE or DURING a shoot. For instance, instead of doing prep work before the shoot for a monitor inlay, they'll spend 4 times the money and time to do it in post. Instead of finding a smart way to break glass, they dress some wacko up in green and have him stand between the glass and the camera to break it- green is apparentl

  • 1) Tape myself with cheap whore
    2) Combine with picture of Claudia Schiffer
    3) Become popular!
  • This might be helpful to deal with copyright-protected material that gets into frame, for instance, billboards, logos on T-shirts, posters and art-work on walls. Take a single frame into a photo-editor, replace the unauthorised image with an authorised one, and this technique could potentially replace it throughout the sequence. It could equally be used to replace moving images, for instance on a TV screen, with a "blue screen" (or green), that normal video compositing software can then replace with a desi
  • Photo hunt has been doing this for years. Let me know when something new comes up. Regards, mike
  • This blows my mind almost as much as Melodyne version 2 does: http://www.celemony.com/cms/index.php?id=dna [celemony.com] Only instead of 'direct note access' it's 'direct video object access'. Or something.
  • This is getting really close to being something that every straight male porn viewer (which means every straight male on Slashdot...) would pay a lot for.

    Combine the ability to remove items from a video, like they showed with the lamppost and sign in the flower shop, with the ability to insert new things into the video, and you could turn a boring man on woman porn movie into a lesbian twin incest movie.

  • It's not exactly related, but Al Jazeera just had a piece about a pedophile who got arrested last year after Interpol "unwarped" some pictures he had put online.
    Maybe these new techniques might be used to produce that kind of useful result and not only better mom-and-pop holiday pictures..
    Old article: http://www.guardian.co.uk/uk/2007/oct/19/ukcrime.internationalcrime [guardian.co.uk]
  • These two (article) technologies IMHO are more important than Photosynth

    http://www.youtube.com/watch?v=556FvXHLtAo [youtube.com]

  • Could this be used as a form of video compression? Shoot your video at high resolution. Extract a few high resolution stills from the video, and then convert the video to low resolution. Save the low res video and the stills. When you want to play the video, use their algorithm, with the stills taking the role of the photos, to enhance the low res video.
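Back-of-the-envelope, that scheme trades a high-bitrate stream for a low-bitrate one plus a few full-resolution stills; the potential saving is easy to estimate (every number here is an illustrative assumption, not a measured rate):

```python
def hybrid_size_mb(minutes, lowres_mbps=1.0, still_mb=3.0, stills=10):
    """Low-res video stream plus a handful of high-res keyframe stills."""
    return lowres_mbps * 60 * minutes / 8 + still_mb * stills

def fullres_size_mb(minutes, hires_mbps=8.0):
    """The same clip stored at full resolution throughout."""
    return hires_mbps * 60 * minutes / 8

print(hybrid_size_mb(10))   # -> 105.0 MB (75 MB of video + 30 MB of stills)
print(fullres_size_mb(10))  # -> 600.0 MB
```

Whether the decode-side enhancement is cheap enough to run at playback speed is the open question; the README linked elsewhere in this thread mentions about 5 minutes per 800x600 frame.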

"In the face of entropy and nothingness, you kind of have to pretend it's not there if you want to keep writing good code." -- Karl Lehenbauer

Working...