Conclusions

After testing our algorithm with both noise-corrupted and real recorded videos, we can conclude that our video Shazam works well at identifying which video is being shown.

The most important result from our tests concerns additive white Gaussian noise (AWGN) and the SNR threshold below which each noise-corrupted video is no longer detected. Note that the lowest SNR threshold (for Blade Runner) is about -29 dB. Read as a power ratio, this means the algorithm still detects the video correctly when the noise is roughly 800 times more powerful than the input video signal (about 28 times larger in amplitude), which is a very large gap between signal and noise.
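The conversion from a threshold in dB to a linear ratio can be checked with a short sketch. It assumes the reported threshold is an SNR in dB; the function names are illustrative and not part of our program.

```python
def db_to_power_ratio(db):
    """Convert a value in dB to a linear power ratio."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a value in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


# Threshold reported above for Blade Runner (assumed to be in dB).
snr_db = -29.0

# Invert the SNR to express how much stronger the noise is than the signal.
noise_over_signal_power = 1 / db_to_power_ratio(snr_db)
noise_over_signal_amplitude = 1 / db_to_amplitude_ratio(snr_db)

print(f"noise/signal power ratio:     {noise_over_signal_power:.0f}")
print(f"noise/signal amplitude ratio: {noise_over_signal_amplitude:.1f}")
```

The power ratio is the relevant figure when the SNR is defined from signal and noise powers, which is the usual convention for AWGN.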

Apart from that, our program has shown that it can detect real recorded videos correctly, even in much harder conditions. As the test videos we recorded show, the algorithm still detects the video correctly when there are light reflections or reflected images on the screen, and even when the recording includes areas that are not part of the original video. This robustness makes our video Shazam considerably more useful in practice.

However, it is not perfect: it fails when the mobile phone also records other moving objects or when the camera moves a lot (for example, because the user's hands are shaking). In addition, the algorithm is quite slow at computing the output, especially when the test video was recorded with our own mobile phones, although this could probably be improved by optimizing the code.

In addition, we noticed one curious behaviour in our algorithm: every time we add noise beyond the threshold, the output video is always Reservoir Dogs. We wondered why this happens, that is, what makes Reservoir Dogs more special than the other films. Given the randomness of the noise, we would expect other videos to appear as output at least some of the time.

This is the possible explanation we came up with:

We can think of the fingerprint vectors that we generate from the videos as elements of a vector space of fingerprints. This space is spanned by the fingerprints of all the videos in the database and has at most n dimensions, where n is the number of components of our fingerprint vectors. Seen this way, the fingerprints are just a set of n-dimensional vectors in an n-dimensional Cartesian space. We also know that this Cartesian space is a Hilbert space, with inner product and norm defined as:

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i, \qquad \|x\| = \sqrt{\langle x, x \rangle}$$

We can therefore calculate the relative angles between the fingerprint vectors and see how they are distributed in the space, using the fact that the inner product in Cartesian space equals the product of the norms times the cosine of the angle between the vectors. Doing so, we obtain the following angles between our fingerprints, in degrees (FMJ = Full Metal Jacket, BR = Blade Runner, TG = The Godfather, RD = Reservoir Dogs):

FMJ^BR → 61.18
FMJ^TG → 70.21
FMJ^RD → 46.42
BR^TG → 61.88
BR^RD → 49.71
TG^RD → 62.27
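The angle computation described above can be sketched as follows. The two short vectors are made-up placeholders, not our real fingerprints, and the function name is illustrative.

```python
import math


def angle_degrees(u, v):
    """Angle between two vectors, from <u, v> = |u| |v| cos(theta)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical 3-component fingerprints, for illustration only.
fingerprint_a = [1.0, 0.0, 0.0]
fingerprint_b = [1.0, 1.0, 0.0]

print(angle_degrees(fingerprint_a, fingerprint_b))  # about 45 degrees
```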

Now we have to see how the noise corrupts the fingerprints in this vector space. What the noise does is change the direction of the fingerprint vectors randomly, so that an angle appears between the ideal fingerprint and the estimated noisy one. This can be seen in the next figure: the first image shows that, after many repetitions with different noise realizations, a cloud of fingerprints appears around the original fingerprint, and the second image shows a single noisy fingerprint.
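The effect of noise on the direction of a vector can be illustrated with a small simulation. This is a simplified sketch, not our actual pipeline: it adds white Gaussian noise directly to a made-up vector rather than to the video frames, and all names and values are assumptions.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise at the requested SNR in dB."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def angle_degrees(u, v):
    """Angle between two vectors, computed from the inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical 1000-component vector standing in for a fingerprint.
fingerprint = [rng.random() for _ in range(1000)]

for snr_db in (30.0, 10.0, 0.0, -10.0):
    noisy = add_awgn(fingerprint, snr_db, rng)
    print(f"SNR {snr_db:6.1f} dB -> angle {angle_degrees(fingerprint, noisy):5.1f} degrees")
```

In this sketch the angle between the clean and noisy vectors grows as the SNR decreases, which matches the picture of a cloud that spreads around the original fingerprint.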

So when noise is added to the video, the new fingerprint changes direction and produces a nonzero error when the least squares approximation is used. Finally, when the noise is so heavy that the projection onto an incorrect fingerprint has less error than the projection onto the correct one, the detection fails.
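A minimal sketch of this matching rule is given below. It assumes that detection projects the observed fingerprint onto each stored fingerprint and picks the one with the smallest squared residual; the database entries and function names are hypothetical and do not come from our code.

```python
def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def projection_error(observed, reference):
    """Squared error left after projecting `observed` onto `reference`."""
    scale = dot(observed, reference) / dot(reference, reference)
    residual = [o - scale * r for o, r in zip(observed, reference)]
    return dot(residual, residual)


def best_match(observed, database):
    """Name of the stored fingerprint with the smallest projection error."""
    return min(database, key=lambda name: projection_error(observed, database[name]))


# Hypothetical 3-component fingerprints, for illustration only.
database = {
    "film_a": [1.0, 0.0, 0.0],
    "film_b": [0.0, 1.0, 0.0],
    "film_c": [1.0, 1.0, 1.0],
}

# A slightly perturbed version of film_a's fingerprint.
observed = [0.9, 0.1, 0.05]
print(best_match(observed, database))  # film_a
```

Under this rule, the smallest projection error corresponds to the stored fingerprint that makes the smallest angle with the observed one (or with its opposite), which is why the angles between fingerprints matter for which film is returned.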

This is how we explain why Reservoir Dogs is the film that appears when the noise is heavy: looking at the angles calculated above, it is the one that forms the smallest angle with each of the other films (except for The Godfather, whose angle with Reservoir Dogs is almost equal to its angle with Blade Runner). Since the angle between each film and Reservoir Dogs is the smallest, Reservoir Dogs is the most likely output when noise is added, because the projection onto its fingerprint will tend to have the least squared error.

This approach also explains why The Godfather is the film that can handle the heaviest noise before the algorithm fails: it forms the largest angles with the three other films, so the noise has to be stronger to make our algorithm fail than in the case of Full Metal Jacket or Blade Runner.

Finally, we cannot explain why we always succeed in detecting Reservoir Dogs, except at extreme SNR values such as -10000 dB, where the signal is almost negligible.

Future improvements

We know that the application we’ve created is quite limited, but it is only the first version. Below, we have listed potential improvements for future versions.

Create an app for smartphones. It would be useful to have the application on one’s phone. In this way, whenever an individual wants to identify what movie they are watching, they can just record it with their phone camera.

Crop the video. As we have concluded, if the video is not correctly recorded with the mobile phone camera, our program is more likely to fail. We therefore think that we could crop the recording and keep only the window that contains the film scene of interest.

Optimize the Matlab code. Right now the program is quite slow. We need to reorganise the code and rewrite it in a more efficient way.

Enable the program to work with all scenes of a film. Currently, we have a fingerprint of a single scene of a film. The program will only work if we give an input video of that exact scene of the film. It would be more interesting if the program were able to produce the film name, given any scene of the movie. This would require us to do further study, but there are some ideas in the paper.