Saturday, July 26, 2014

Android port.

I planned from the start that this project would be multi-platform. At least three platforms are planned: Windows, Linux and Android. I have already posted screenshots and videos from Windows and Linux. Now I want to show you my current progress on Android.

When I set myself the goal of porting my work to Android, I knew nothing about OpenGL ES 2.0, which is currently used on most mobile devices. Somebody told me that porting should be easy. Liar. My current work is based on the sources of Orkin’s glN64 plugin, written 10 years ago, before OpenGL 2.0 became the standard. Now the OpenGL standard is at version 4+, and it has changed drastically. Nevertheless, it is backward compatible with the old functions and allows programmers to mix new features with the good old ones. But only on desktop. Mobile devices don’t want to support old shit, so everything “superfluous” was thrown away. With OpenGL ES, shaders are a must, data arrays are a must, and “fixed” functionality is reduced to a minimum. When I estimated the amount of work needed for the port, I felt sad. Most of the inherited OpenGL code had to be completely rewritten.

I downloaded the sources of Mupen64Plus Android Edition (AE), which is an unofficial port of Mupen64Plus to Android. I hoped to find some clues in it on how to make the port. I was lucky: Mupen64Plus AE already has a working Android port of glN64! I took it as a reference and started to rewrite my version. Some parts of the code were absolutely useless for me, but I must confess that a large part of the code was adopted with minimal changes. Nevertheless, when I finished the rewriting, I spent weeks before I got the first picture. On the desktop version! Then I spent a few months fighting with regressions. On the desktop version! Of course, I did not work on it day and night, I don’t have that much time. But a lot of my spare time was spent just to get the result I already had with my old OpenGL code. When my plugin started to work with Mupen64Plus for Linux, I finally approached Android itself. Again, I spent a lot of time just to make the code compile on Android. And I spent much, much more time before I got something rendered on the screen of my device. Maybe next time I’ll describe the technical problems I solved. I’m just at the beginning of that long road. Today I’ll just show you a few screenshots:

GUI. Video plugin selection.

My plugin settings. Not everything is working yet.

The Legend of Zelda - Majora's Mask.
Motion blur.

Mario Golf. Frame buffer emulation.

Mario Tennis. Far from perfect yet due to ES restrictions on Frame Buffer Objects.

Depth buffer emulation

In the “Frame buffer emulation. Intro” article I described what depth buffer emulation is and why it is necessary. The main goal is to have the depth buffer in RDRAM (N64 memory) filled with correct values. There are two approaches:

  • software rendering directly to the depth buffer
  • prepare the necessary data on the video card side and copy it from video memory to RDRAM.
Both approaches have pros and cons. The software rendering approach was successfully implemented in Glide64. It was the only option for the target hardware of that plugin. Thus, the merit of software rendering is that it is suitable for any video hardware. The obvious drawback is higher CPU load, but for any CPU since the Pentium 3 that is not a problem. There are less obvious shortcomings, which need to be explained:
  • Low-res result. The depth buffer has the same size as the main color buffer, which is usually 320x240. That is perfectly enough to emulate flame coronas, but when the depth buffer is used as a fog texture the result is bad. Run Beetle Adventure Racing with Glide64 to see what I’m talking about.
  • Incorrect result. Some pixels of a polygon can be discarded by alpha compare and thus change neither the color buffer nor the depth buffer. The software renderer implemented in Glide64 renders only the polygon’s depth, without any regard for its alpha. Thus, pixels which would normally be discarded by alpha compare are stored in the depth buffer. Again, it’s not a problem for corona emulation, but it is very noticeable for depth buffer based fog. Run Beetle Adventure Racing with Glide64: the fog around trees is missing, because the transparent parts of the tree polygons were not discarded during software rendering. The software renderer has to do color rendering too in order to get the alpha compare information. That is, fully functional software rendering is necessary to get a correct depth buffer, and N64 software plugins/emulators don’t run at full speed even on many modern CPUs.
  • N64 depth compare is a more complex process than the one usually used on PC. It uses not only the “greater” and “greater-or-equal” equations, but also more complex ones. N64 uses not only the ‘Z’ value of the pixel but also ‘Delta Z’: DeltaZpix = | dZdx | + | dZdy |. That is, the ‘Delta Z’ value shows how the polygon’s depth changes along the X and Y directions. An N64 depth compare mode may compare the incoming Z with the interval [stored Z – DeltaZ, stored Z + DeltaZ]. A software depth buffer obviously can’t help to emulate such a mode, because the depth compare is done on the video card side. The standard OpenGL depth compare functions also do not work that way.
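The “incorrect result” shortcoming from the list above can be illustrated with a small sketch. This is not Glide64 code: the function name, the one-row buffer, the fragment tuples and the alpha threshold are all assumptions made up for the illustration. It only shows why a depth-only renderer leaves wrong values where alpha compare would have discarded the pixel.

```python
# Hypothetical illustration, not Glide64 code: a depth-only rasterizer versus
# one that honours alpha compare. Smaller depth value = closer to the camera.

FAR = 0xFFFF  # assumed "empty" value of a 16-bit depth buffer


def write_depth(fragments, width, alpha_threshold=None):
    """Render fragments (x, depth, alpha) into a one-row depth buffer.

    If alpha_threshold is None, alpha is ignored (depth-only rendering).
    Otherwise fragments with alpha below the threshold are discarded first.
    """
    depth = [FAR] * width
    for x, z, alpha in fragments:
        if alpha_threshold is not None and alpha < alpha_threshold:
            continue  # discarded by alpha compare: depth stays untouched
        if z < depth[x]:
            depth[x] = z
    return depth


# A "tree" polygon 4 pixels wide; the two outer pixels are fully transparent.
tree = [(0, 100, 0.0), (1, 100, 1.0), (2, 100, 1.0), (3, 100, 0.0)]

depth_only = write_depth(tree, 4)
alpha_aware = write_depth(tree, 4, alpha_threshold=0.5)
print(depth_only)   # transparent pixels wrongly carry the tree's depth
print(alpha_aware)  # transparent pixels keep the far value
```

With depth-only rendering, a fog pass that reads this buffer would treat the transparent border of the tree as a near surface, which matches the missing fog around trees described above.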
    Taking into account everything said above, I decided to choose the other way and emulate the N64 depth buffer on the video card’s side. The main idea is to use the OpenGL Shading Language to emulate the N64 depth compare process. The final result should be a texture with depth information, which can be copied into RDRAM. The obvious weak point is the need to copy from video memory into PC memory. However, for modern cards and modern GL, copying a 320x240 array of data is painless. My implementation of copying the color buffer from video memory to RDRAM proved it.

    So, I began to implement the depth compare shader. Z values of N64 primitives have float type, but the depth buffer format is 16bit unsigned integer. N64 uses a pretty complex non-linear conversion of float Z into an integer depth buffer value. Video plugins usually use a pre-calculated lookup table for fast Z-to-depth_buffer conversion. I decided to continue that tradition and put the lookup table into a 512x512 texture with one 16bit component. The next question is: how to make the resulting depth buffer texture? The depth buffer is a read-write object: the current value of the buffer is compared with the incoming Z, and if the new value passes the depth test it replaces the old one. The first obvious solution was to use a Frame Buffer Object (FBO) with a texture as the render target. The idea is: create a texture with one 16bit integer component, attach it to the FBO as a color rendering target and pass it as a texture input into the depth shader. I quickly wrote a shader which reads the value from the texture, compares it with the pixel’s Z and writes the result. The code was pretty simple, no place for a mistake. Then I spent many days fighting with the weirdest glitches. The result was absolutely unpredictable. I decided that I had run into the famous “ATI sucks” situation, and that the AMD GL drivers were mocking me. I found a PC with an NVidia card to test my work. This time I got just a black screen. Ok, if nothing helps, read the documentation. A quick dig in the docs revealed the sad truth: I had wasted my time on a wrong idea. Sampling from and rendering to the same texture is prohibited. The documentation says that the result is undefined in that case. AMD and NVidia gave me different results, but both results were bad. So, I had to find another way.

    Fortunately, one of the readers of this blog, neobrain, gave me advice that worked. The idea is to use the ‘Image variables’ mechanism. An image variable is bound to a texture in video memory and gives the pixel shader read-write access to that texture. Exactly what I needed. This time I read the documentation first, so as not to step on the same rake. The documentation contains several vague points, and I got a feeling that this way would not be easy either.

    I rewrote my code to use Image Variables. Task number one was to get a depth texture suitable for copying to RDRAM. As you remember, the N64 depth image format is 16bit unsigned integer. GL supports textures with 16bit unsigned integer components. Thus, the natural solution was to use such a texture for the depth buffer data. The shader takes the pixel’s Z, converts it to N64 format using the lookup table texture, compares that value with the one from the buffer texture and writes the result if the new value is less than the current one. I implemented the copy of that texture into RDRAM and ran both Zelda games to check how it works. The result was negative: no coronas in either Zelda game. I wrote a shader program for depth buffer based fog in Beetle Adventure Racing, hoping to visualize my texture and see what was wrong. I got no fog. I wrote a test which just shows my depth texture on screen. The picture correlated well with the rendered scene. That is, my depth shader works, but the result is not close enough to what it should be. I decided to compare my result with a reference: the depth buffer rendered by Glide64. I dumped the depth buffers in both plugins. Values in my hardware-rendered texture were ~10% less than in the software-rendered reference. I checked my code once again and did not find any mistake. As a last resort I decided to change the texture format from 16bit int to 32bit float. Bingo! Coronas are working, fog is working too. The code stayed the same, only the texture format changed. I still don’t understand why the integer format does not work: the depth shader takes the value for the depth texture from the lookup table in both cases, and there is no place where that value could lose precision. Enigma.
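The per-pixel logic described above (convert Z through a lookup table, compare with the stored value, write if the new value is less) can be sketched on the CPU side. This is only a model of the data flow, not the shader from the plugin. All names are hypothetical, the Z range [0, 1] is assumed, and the table is a plain linear placeholder: the real N64 conversion is non-linear and is not reproduced here.

```python
# CPU-side sketch of the depth shader logic; all names are hypothetical.
LUT_SIZE = 512 * 512  # the post stores the table in a 512x512 texture

# Placeholder table: linear quantization to 16 bits. The real N64
# conversion is non-linear and is NOT reproduced here.
z_lut = [(i * 0xFFFF) // (LUT_SIZE - 1) for i in range(LUT_SIZE)]


def z_to_n64_depth(z):
    """Map a float Z in [0, 1] (assumed range) to a 16-bit depth value."""
    z = min(max(z, 0.0), 1.0)
    return z_lut[int(z * (LUT_SIZE - 1))]


def depth_test_and_store(depth_image, x, y, z):
    """Read-compare-write on the depth image; True if the pixel passes."""
    new_depth = z_to_n64_depth(z)
    if new_depth < depth_image[y][x]:
        depth_image[y][x] = new_depth
        return True
    return False


depth_image = [[0xFFFF] * 4 for _ in range(2)]        # cleared to "far"
print(depth_test_and_store(depth_image, 1, 0, 0.5))   # nearer than far: passes
print(depth_test_and_store(depth_image, 1, 0, 0.75))  # behind stored value: fails
print(depth_image[0][1])                              # value left by the first write
```

On the GPU the read and the write go through an image variable instead of a Python list, so ordering between overlapping fragments becomes an additional concern that this sequential model does not capture.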

    So, I got a correct hardware-rendered N64 depth buffer. That made emulation of depth buffer based effects possible. The next step is to emulate N64 depth compare on the video card side. Why? First, it is an interesting task by itself. Second, it should fix various problems with depth caused by incomplete depth compare emulation, namely:
    • Decal Surfaces. N64 has “special mode to allow the rendering of 'decal' polygons (usually with a texture on them, like a flag or logo) over a previously rendered opaque surface. Unlike normal rendering, here we only want to render the decal if it is coplanar with the existing surface.” It is emulated pretty well via glPolygonOffset, but sometimes visual glitches may appear because the plugin uses static parameters for that function. N64 uses the already mentioned DeltaZpix to test that two surfaces are coplanar.
    • Nearer vs InFront Compare. N64 has several depth compare modes. Some of them have a direct analog in OpenGL, some do not. Graphics plugins usually use one depth compare function: either “less” or “less-or-equal”. The N64 depth compare mode InFront is the same as OpenGL “less”: InFront=PixZ<MemZ. “Nearer” mode uses DeltaZ: Nearer=(PixZ-DeltaZmax)<=MemZ, where DeltaZmax=MAX(DeltaZpix,DeltaZmem). Sometimes it is well approximated by “less-or-equal”, sometimes it is not.
    • Farther and Nearer. Farther mode is similar to Nearer: Farther=(PixZ+DeltaZmax)>=MemZ. N64 can use these modes together. In that case the pixel Z must be in the interval [MemZ – DeltaZmax, MemZ + DeltaZmax] to pass the test. That mode has no analog in OpenGL. I know only one game which uses that mode: 'Extreme-G'. Currently it is emulated only by software graphics plugins.
    So, the depth compare shader must calculate not only N64 Z: DeltaZ must also be calculated per pixel and stored in the depth buffer. DeltaZ calculation in shaders is simple thanks to the dFdx and dFdy functions. The code to load and store a value in the depth texture was already written. The equations for depth compare are very simple. So, the shader program for N64 depth compare turned out short and simple.
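The compare equations listed above are small enough to write out directly. The sketch below is a CPU-side model in Python, not the shader itself, and all function names are hypothetical. DeltaZpix is computed with forward differences on a grid of Z values, which is only loosely analogous to what dFdx and dFdy provide in a shader. The Farther test is assumed to use the same DeltaZmax as Nearer, which is what the interval form implies.

```python
# Sketch of the N64 depth compare equations; all names are hypothetical.

def delta_z_pix(z, x, y):
    """DeltaZpix = |dZ/dx| + |dZ/dy|, using forward differences on a 2D grid
    of Z values (a rough stand-in for dFdx/dFdy in a shader)."""
    dzdx = z[y][x + 1] - z[y][x]
    dzdy = z[y + 1][x] - z[y][x]
    return abs(dzdx) + abs(dzdy)


def in_front(pix_z, mem_z):
    """InFront = PixZ < MemZ, the same as OpenGL "less"."""
    return pix_z < mem_z


def nearer(pix_z, mem_z, dz_pix, dz_mem):
    """Nearer = (PixZ - DeltaZmax) <= MemZ."""
    dz_max = max(dz_pix, dz_mem)
    return (pix_z - dz_max) <= mem_z


def farther(pix_z, mem_z, dz_pix, dz_mem):
    """Farther = (PixZ + DeltaZmax) >= MemZ (DeltaZmax assumed, see lead-in)."""
    dz_max = max(dz_pix, dz_mem)
    return (pix_z + dz_max) >= mem_z


def nearer_and_farther(pix_z, mem_z, dz_pix, dz_mem):
    """Both modes together: PixZ in [MemZ - DeltaZmax, MemZ + DeltaZmax]."""
    return (nearer(pix_z, mem_z, dz_pix, dz_mem)
            and farther(pix_z, mem_z, dz_pix, dz_mem))


# A sloped surface: Z grows by 4 per pixel in X and by 10 per pixel in Y.
z_grid = [[100, 104],
          [110, 114]]
dz = delta_z_pix(z_grid, 0, 0)

# A decal slightly behind the stored surface fails "less" but passes Nearer.
print(dz)
print(in_front(105, 100))
print(nearer(105, 100, dz, 8))
print(nearer_and_farther(130, 100, dz, 8))
```

The integer inputs are chosen only to keep the arithmetic exact. In the shader the same comparisons would run on the per-pixel values loaded from the depth texture.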

    I disabled OpenGL depth compare to test my new depth compare shader. The main pixel shader discards a pixel if its Z does not pass the depth shader test. The first result was very disappointing. Many pixels which should have been discarded by the depth shader poked through the covering polygons. My guess is that the precision loss during conversion of float Z to an integer depth buffer value makes the depth buffer too coarse for use at higher resolutions. I decided to store the original pixel’s Z beside the N64 Z in my depth texture and to use the original Z for the depth test. Since the pixel’s Z is a float and the depth texture components are floats, there should not be any precision loss, so I expected that this time my texture based depth buffer would work as well as the standard OpenGL one. Alas, it does not. The result is much better than with depth compare based on the original N64 depth buffer values, but still not perfect. Again, I don’t understand why. Probably I did not take into account some details related to synchronization of load-store operations for Image Variables. Nevertheless, some good results have been achieved.

    • N64 depth buffer emulation is implemented on the video card side
    • Coronas work perfectly
    • Depth buffer based fog works very well, but glitches sometimes appear
    • N64 depth compare works, but additional polishing is required

    Now it’s time for some illustrations.

    Coronas emulation

    A short video which shows why depth buffer rendering is important. Notice when the coronas appear and disappear:

    Software depth buffer rendering vs. hardware rendering.

    Software rendering. The black outline on the building’s roof is caused by the low resolution of the fog texture.
    The silhouettes of the trees on the left are obviously wrong.

    Hardware rendering. Perfect.

    Software rendering. Some black pixels again. The fog around the sign is missing because the alpha test is ignored.

    Hardware rendering. The fog around the sign is correct, but it is wrong on some polygons on the left. I have not yet found out why that happens.

    This video allows you to compare both methods side by side:
    Note: the GLideN64 video is missing some frames at the start. This problem is caused by the video capture; the gameplay itself is smooth.

    Correct emulation of N64 depth compare modes

    With a constant depth compare function, shadows are drawn on top of the characters.

    With a calculated depth compare function, shadows are correct.

    And a short video which illustrates that with shaders even the hardest depth compare modes can be emulated: