improve apu pulse channel nonlinear mixing precision #89

Conversation

@tsone (Contributor) commented Jan 24, 2025

Minor improvements to the pulse channels' non-linear mixing precision, by increasing the fractional bits in the lookup table. The stereo mix at the default (mono) setting now produces the same output as the lookup. This fixes a 1/8192 output error for some pulse 1 and 2 level combinations, and removes some redundant left bit shifting.

Here is a Python script used for testing:

import itertools

# Pulse mix lookup tables (Blargg's non-linear approximation, scaled to 8192):
# old: truncated; new: rounded, with 4 extra fractional bits (16x scale).
square_table_old = [0] + [int((8192.0*95.88)/(8128.0/i+100)) for i in range(1, 32)]
square_table_new = [0] + [round((16.0*8192.0*95.88)/(8128.0/i+100)) for i in range(1, 32)]

def mix_mono_old(out):
    # Old mono reference: direct table lookup, duplicated to both channels.
    return (square_table_old[out[0] + out[1]],) * 2

def mix_mono_new(out):
    # New mono reference: high-precision lookup floored back to the 8192 scale.
    return (square_table_new[out[0] + out[1]] >> 4,) * 2

def mix_old(out, sm):
    # Old NES_APU::Render(): non-linear mix followed by the stereo matrix sm.
    voltage = square_table_old[out[0] + out[1]]
    m = [x << 6 for x in out]  # redundant shift: cancels out in the division below
    ref = m[0] + m[1]
    if ref > 0:
        m = [(x * voltage) // ref for x in m]
    else:
        m = [voltage for _ in m]  # voltage is square_table_old[0] == 0 here
    b = ((m[0]*sm[i][0] + m[1]*sm[i][1]) >> 7 for i in range(2))
    return tuple(b)

def mix_new(out, sm):
    # New NES_APU::Render(): 4 extra fractional bits, shifted out after the stereo mix.
    ref = out[0] + out[1]
    if ref > 0:
        voltage = square_table_new[ref]
        m = [(x * voltage) // ref for x in out]
    else:
        m = [0] * 2  # square_table_new[0] == 0
    b = ((m[0]*sm[i][0] + m[1]*sm[i][1]) >> 11 for i in range(2))
    return tuple(b)

def stereo_mix(pan):
    # Pan 0..256 maps to per-channel (left, right) weights, each capped at 128.
    a, b = min(pan, 128), min(256 - pan, 128)
    return ((a, b), (b, a))

sm_mono = stereo_mix(128)  # default (mono) panning

print(f"\n# test 1: new mono vs new stereo (diffs)\n")
print("| out      | new mono     | new stereo   |")
print("| -------- | ------------ | ------------ |")
for out in itertools.combinations(range(16), 2):
    mono = mix_mono_new(out)
    new = mix_new(out, sm_mono)
    if mono != new:
        print(f"| {str(out):8} | {str(mono):12} | {str(new):12} |")

print(f"\n# test 2: old mono vs old stereo (diffs)\n")
print("| out      | old mono     | old stereo   |")
print("| -------- | ------------ | ------------ |")
for out in itertools.combinations(range(16), 2):
    mono = mix_mono_old(out)
    old = mix_old(out, sm_mono)
    if mono != old:
        print(f"| {str(out):8} | {str(mono):12} | {str(old):12} |")

print(f"\n# test 3: old stereo vs new stereo (diffs)\n")
print("| out      | old stereo   | new stereo   |")
print("| -------- | ------------ | ------------ |")
for out in itertools.combinations(range(16), 2):
    old = mix_old(out, sm_mono)
    new = mix_new(out, sm_mono)
    if old != new:
        print(f"| {str(out):8} | {str(old):12} | {str(new):12} |")

print("\n# test 4: pulse vol mix: old stereo vs new stereo (diffs)\n")
print("| mix | out      | old stereo   = sum  | new stereo   = sum  | new mono     |")
print("| --- | -------- | ------------------- | ------------------- | ------------ |")
for pan in (0, 64, 128, 192, 256):
    sm = stereo_mix(pan)
    for out in itertools.combinations(range(16), 2):
        old = mix_old(out, sm)
        new = mix_new(out, sm)
        mono = mix_mono_new(out)
        if old != new:
            print(f"| {pan:3} | {str(out):8} | {str(old):12} = {sum(old):4} | {str(new):12} = {sum(new):4} | {str(mono):12} |")

@bbbradsmith (Owner) commented Jan 25, 2025

The nonlinear mixing formula is not a precise formula; it's just the closest fit Blargg could find at the time to what they measured. I tried myself to measure several systems and improve it, but with the amount of variation and other inherent noise, Blargg's seemed "good enough", and it felt difficult to pursue anything that could be considered definitively better.

Anyway, the point is that increasing the precision of this computation does not increase the accuracy of emulation at all. The formula itself is just an approximation, and the precision is already well beyond the errors of approximation and other un-emulated factors.

So... in theory, it is still okay to improve the precision of the code, though I feel it is probably not worth the effort required since it's not an emulation accuracy improvement (and all of this code will be obsoleted by NSFPlay 3). However, if you wish to proceed, I have questions about the code changes:

  1. m[0] = voltage has been replaced with m[0] = 0. Why? Unless I am mistaken, square_table[0] is not 0.

  2. >> 7+4 should just be >> 11. 16.0*8192.0 should just be 131072.0 or maybe double(1<<17) would be clearer.

  3. Can you comment on the reduction of headroom this produces? I haven't taken the time to follow the whole chain of computation to figure out the change myself. What is the new headroom? Are we certain that setting the APU and individual Square channel gains to high volumes does not produce clipping or overflow at a lower volume than previously?

This third question is the one that requires the most effort to verify.

@tsone (Contributor, Author) commented Jan 27, 2025

Thank you for your reply and time.

Sorry for not being explicit. This PR is a minor fix; the error fixed is not a big deal. Perhaps confusingly, there are two types of mixing involved in this PR: 1. the APU non-linear mixing, and 2. stereo mixing; both are done in the same NES_APU::Render() function.

My fix specifically targets a minor error in the default mono mix of the pulse channels' stereo output (i.e. the mono mix the actual hardware produces). Prior to the fix, the default pan/volume mono setting produced slightly different output from the square_table, while it should have been the same. The error was very minor: 1 in integer terms, and only in some cases (i.e. 1/8192 of the output of the NES_APU class).

To correct this error I had to change NES_APU::Render(), which performs the APU non-linear mixing for the pulse channels, and change the square_table.

To illustrate the prior error, here's a table of the differences between the old mix through the stereo path (column marked 'old stereo' in the table) and the old lookup (marked 'old mono') -- i.e. the errors. Note: the table contains only the erroneous cases, not the correct ones; it is generated with the Python snippet in my previous comment and is therefore not a functional test of this pull request's C++ code. The column marked 'out' is the contents of the out input to NES_APU::Render() as a (pulse1, pulse2) tuple. The 'old mono' and 'old stereo' columns are (left, right) tuples in the 8192 scale of the NES_APU class output. (The output then passes through the rest of the audio pipeline: full APU output mix, DC removal, lowpass, etc.)

test 2: old mono vs old stereo (diffs)

out       old mono      old stereo
(1, 5) (539, 539) (538, 538)
(1, 6) (622, 622) (621, 621)
(1, 7) (703, 703) (702, 702)
(1, 10) (936, 936) (935, 935)
(1, 11) (1010, 1010) (1009, 1009)
(1, 12) (1083, 1083) (1082, 1082)
(1, 13) (1154, 1154) (1153, 1153)
(1, 14) (1223, 1223) (1222, 1222)
(1, 15) (1291, 1291) (1290, 1290)
(2, 4) (539, 539) (538, 538)
(2, 5) (622, 622) (621, 621)
(2, 6) (703, 703) (702, 702)
(2, 9) (936, 936) (935, 935)
(2, 10) (1010, 1010) (1009, 1009)
(2, 11) (1083, 1083) (1082, 1082)
(2, 12) (1154, 1154) (1153, 1153)
(2, 13) (1223, 1223) (1222, 1222)
(2, 14) (1291, 1291) (1290, 1290)
(2, 15) (1358, 1358) (1357, 1357)
(3, 4) (622, 622) (621, 621)
(3, 5) (703, 703) (702, 702)
(3, 8) (936, 936) (935, 935)
(3, 9) (1010, 1010) (1009, 1009)
(3, 10) (1083, 1083) (1082, 1082)
(3, 11) (1154, 1154) (1153, 1153)
(3, 12) (1223, 1223) (1222, 1222)
(3, 13) (1291, 1291) (1290, 1290)
(3, 14) (1358, 1358) (1357, 1357)
(3, 15) (1424, 1424) (1423, 1423)
(4, 7) (936, 936) (935, 935)
(4, 8) (1010, 1010) (1009, 1009)
(4, 9) (1083, 1083) (1082, 1082)
(4, 10) (1154, 1154) (1153, 1153)
(4, 11) (1223, 1223) (1222, 1222)
(4, 12) (1291, 1291) (1290, 1290)
(4, 13) (1358, 1358) (1357, 1357)
(4, 14) (1424, 1424) (1423, 1423)
(4, 15) (1488, 1488) (1487, 1487)
(5, 6) (936, 936) (935, 935)
(5, 7) (1010, 1010) (1009, 1009)
(5, 8) (1083, 1083) (1082, 1082)
(5, 9) (1154, 1154) (1153, 1153)
(5, 10) (1223, 1223) (1222, 1222)
(5, 11) (1291, 1291) (1290, 1290)
(5, 12) (1358, 1358) (1357, 1357)
(5, 13) (1424, 1424) (1423, 1423)
(5, 14) (1488, 1488) (1487, 1487)
(5, 15) (1551, 1551) (1550, 1550)
(6, 7) (1083, 1083) (1082, 1082)
(6, 8) (1154, 1154) (1153, 1153)
(6, 9) (1223, 1223) (1222, 1222)
(6, 10) (1291, 1291) (1290, 1290)
(6, 11) (1358, 1358) (1357, 1357)
(6, 12) (1424, 1424) (1423, 1423)
(6, 13) (1488, 1488) (1487, 1487)
(6, 14) (1551, 1551) (1550, 1550)
(6, 15) (1612, 1612) (1611, 1611)
(7, 8) (1223, 1223) (1222, 1222)
(7, 9) (1291, 1291) (1290, 1290)
(7, 10) (1358, 1358) (1357, 1357)
(7, 11) (1424, 1424) (1423, 1423)
(7, 12) (1488, 1488) (1487, 1487)
(7, 13) (1551, 1551) (1550, 1550)
(7, 14) (1612, 1612) (1611, 1611)
(7, 15) (1673, 1673) (1672, 1672)
(8, 9) (1358, 1358) (1357, 1357)
(8, 10) (1424, 1424) (1423, 1423)
(8, 11) (1488, 1488) (1487, 1487)
(8, 12) (1551, 1551) (1550, 1550)
(8, 13) (1612, 1612) (1611, 1611)
(8, 14) (1673, 1673) (1672, 1672)
(8, 15) (1732, 1732) (1731, 1731)
(9, 10) (1488, 1488) (1487, 1487)
(9, 11) (1551, 1551) (1550, 1550)
(9, 12) (1612, 1612) (1611, 1611)
(9, 13) (1673, 1673) (1672, 1672)
(9, 14) (1732, 1732) (1731, 1731)
(9, 15) (1790, 1790) (1789, 1789)
(10, 11) (1612, 1612) (1611, 1611)
(10, 12) (1673, 1673) (1672, 1672)
(10, 13) (1732, 1732) (1731, 1731)
(10, 14) (1790, 1790) (1789, 1789)
(10, 15) (1847, 1847) (1846, 1846)
(11, 12) (1732, 1732) (1731, 1731)
(11, 13) (1790, 1790) (1789, 1789)
(11, 14) (1847, 1847) (1846, 1846)
(11, 15) (1903, 1903) (1902, 1902)
(12, 13) (1847, 1847) (1846, 1846)
(12, 14) (1903, 1903) (1902, 1902)
(12, 15) (1958, 1958) (1957, 1957)
(13, 14) (1958, 1958) (1957, 1957)
(13, 15) (2012, 2012) (2011, 2011)
(14, 15) (2065, 2065) (2064, 2064)

My fix corrects the above differences (error).

  1. m[0] = voltage has been replaced with m[0] = 0. Why? Unless I am mistaken, square_table[0] is not 0.

You are mistaken. See the line square_table[0] = 0;.

  2. >> 7+4 should just be >> 11. 16.0*8192.0 should just be 131072.0 or maybe double(1<<17) would be clearer.

Thanks, I'd certainly rather go with (1<<17). But I'd leave out the double cast: per C++ implicit conversion rules, that cast is redundant.

  3. Can you comment on the reduction of headroom this produces? I haven't taken the time to follow the whole chain of computation to figure out the change myself. What is the new headroom? Are we certain that setting the APU and individual Square channel gains to high volumes does not produce clipping or overflow at a lower volume than previously?

The additional 4 bits of precision and the round() in the square_table lookup generation were found through experimentation with the Python snippet in my previous comment. Central to the snippet is the mix_old() function, which reimplements the old NES_APU::Render() function. It is then tested over the known range of inputs (integer range 0..15 for both pulse channels) to verify that the new version (the mix_new() function) fixes the issues.

@bbbradsmith (Owner) commented Jan 27, 2025

Just to reiterate, this is not an emulation accuracy issue, it's just improving the precision of a formula that is already much more inaccurate than its precision. The difference will not be more accurate emulation, nor will it be audible. So, if you really want to proceed, I will continue, but to me it doesn't seem worth the time for either of us, doubly so because the code will be made obsolete by NSFPlay 3.

  1. m[0] = out[0] << 6 is removed. This was in the original inherited code and never changed, but I assume the intent was better precision on the division by ref. It seems that you have verified that this makes no difference?

  2. m[0] = 0 this is valid, thank you for correcting me. Could you add a comment // square_table[0] = 0; just above that line to make it explicitly clear why this is valid?

  3. square_table gets rounding and a 16*, the right shift gets a +4; this is the part where I want to know about headroom changes. As you mentioned, this is applying both the nonlinear curve and the stereo mix at the same time, so my headroom question is about both:

    When you turn the main volume, the APU mix volume, and the volume for both channels up to their maximum (i.e. max stereo mix), fully panned left or right, what was the previous headroom and what is the current headroom in the calculation? The highest possible value (b[0] and b[1] before shifting) in this chain of computation was increased by 4 bits; what was it before, and what is it now?

  4. You generated a table of results, but given the introduction of rounding I would expect 50% of them to be off by one. I don't think it's really useful to try to look at ~128 cases of off-by-one manually. Are there any interesting cases that you would like to point out? I don't really want to scrutinize all of them to see if there is something special here.

    If the PR was just adding rounding to the square_table generator, I would have no problem merging that by itself. That change in particular seems uncontroversial and uninteresting. The other parts are what I have questions about.

@tsone (Contributor, Author) commented Jan 29, 2025

Just to reiterate, this is not an emulation accuracy issue, it's just improving the precision of a formula that is already much more inaccurate than its precision. The difference will not be more accurate emulation, nor will it be audible. So, if you really want to proceed, I will continue, but to me it doesn't seem worth the time for either of us, doubly so because the code will be made obsolete by NSFPlay 3.

Yes, that's a reasonable perspective. As background, I'm emulating the mixing implementations of various emulators (mesen, fceumm, nestopia) in my music creation tool. Being able to quickly switch between different emulators' mixing helps in song mastering. This was motivated mainly by the vast differences in emulator expansion audio mixing. nsfplay is one emulator I looked at. I also use nsf2wav for functional testing. During my research I noticed the minor issue which this PR solves.

  1. m[0] = out[0] << 6 is removed. This was in the original inherited code and never changed, but I assume the intent was better precision on the division by ref. It seems that you have verified that this makes no difference?

Yes. The lines were redundant.

  3. square_table gets rounding and a 16*, the right shift gets a +4; this is the part where I want to know about headroom changes. As you mentioned, this is applying both the nonlinear curve and the stereo mix at the same time, so my headroom question is about both:
    When you turn the main volume, the APU mix volume, and the volume for both channels up to their maximum (i.e. max stereo mix), fully panned left or right, what was the previous headroom and what is the current headroom in the calculation? The highest possible value (b[0] and b[1] before shifting) in this chain of computation was increased by 4 bits; what was it before, and what is it now?

I'll explain it in detail. The new square_table is unsigned 17-bit and has max integer value 34697. out is unsigned 4-bit with max value 15. sm (the stereo matrix) is unsigned 9-bit with max value 256. All computations are in unsigned integers.

First, assume the computations don't reach the 32-bit range, so that carries accumulating from addition and subtraction don't need to be accounted for. Then only the multiplications and divisions are relevant:

  1. m[0] = out[0] * voltage / ref: The temporary multiplication product is 4+17=21 bits. After the division, m will be 17 bits (same as voltage). Fits in 32 bits.
  2. b[0] = m[0]*sm[0][0] + m[1]*sm[0][1]: The multiplication is 9+17=26 bits. Fits in 32 bits.
  3. b[0] >>= 4+7: The contents of b are viewed as fixed-point format UQ22.4; division with floor() is performed with >> 4, yielding a UQ22.0 temporary. Then the stereo mix division by 128 is performed with >> 7 (also rounding with floor()). The shifts can be combined into >> 11 because both use floor() rounding.

32 bits is not reached, so all is good. And, as assumed, accounting for the accumulating carries was irrelevant.
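
These bounds can also be brute-forced with a small extension of the earlier Python snippet (an illustrative sketch; it reuses square_table_new and assumes stereo weights up to 256, per the analysis above):

# Brute-force the intermediates of the new mix over the full input range.
max_product = max_sum = 0
for p1 in range(16):
    for p2 in range(16):
        ref = p1 + p2
        if ref == 0:
            continue
        voltage = square_table_new[ref]
        m = [(x * voltage) // ref for x in (p1, p2)]
        for w in (0, 128, 256):  # extremes of the assumed stereo weights
            max_product = max(max_product, m[0] * w, m[1] * w)
        max_sum = max(max_sum, (m[0] + m[1]) * 256)
assert max_sum < (1 << 31)  # comfortably inside 32 bits
print(max_product.bit_length(), max_sum.bit_length())  # around 24 bits each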

  4. You generated a table of results, but given the introduction of rounding I would expect 50% of them to be off by one. I don't think it's really useful to try to look at ~128 cases of off-by-one manually. Are there any interesting cases that you would like to point out? I don't really want to scrutinize all of them to see if there is something special here.

Sorry, but there is no off-by-one. The new round() lookup improves accuracy over the previous INT32 cast (a floor()). This is because square_table contains precalculated integer approximations of the floating-point pulse channel component of the APU non-linear mix equation, and an effective floor() is applied at the line b[0] >>= 11.

I will explain using an example decimal value of 1.997. Decimal is more intuitive than binary, but the principle is the same. Think of this example as if Render() only read the lookup and performed the >> 4 at the end, i.e. there were no stereo mixing etc. The two-decimal-digit approximation of 1.997 with rounding is 2.00, and with flooring it is 1.99. The 2.00 from rounding is closer to the original 1.997. The square_table contains approximations like this (and always has).

The line b[0] >>= 11 effectively performs a floor(), throwing away the least significant bits of the fixed-point value in b. Continuing the decimal example of 1.997, we had 2.00 and 1.99 as the results of rounding and flooring to two decimal digits, respectively. Think of these results as stored in the square_table lookup. Now, as we look a value up from the table and throw away (floor()) the last decimal digit of each (at b[0] >>= 11), the rounded lookup value 2.00 yields 2.0 and the floored 1.99 yields 1.9. The 2.0 from the rounded lookup is closer to the original value of 1.997, and so the approximation is more precise.
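
The decimal example can be checked directly in Python (a small illustrative sketch, not part of the PR):

# Approximate 1.997 to two decimal digits, then drop the last kept digit
# (an effective floor, mirroring what b[0] >>= 11 does to the fixed-point value).
x = 1.997
rounded = round(x * 100) / 100  # 2.00: what the new (rounded) table stores
floored = int(x * 100) / 100    # 1.99: what the old (truncated) table stored
print(int(rounded * 10) / 10)   # 2.0
print(int(floored * 10) / 10)   # 1.9: rounding first keeps the result closer to 1.997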

@tsone force-pushed the improve_apu_pulse_channel_nonlinear_mixing branch from a9f4c13 to 245f03a on January 29, 2025, 03:42
@tsone (Contributor, Author) commented Jan 29, 2025

BTW, considering performance: unsigned integers in some cases produce faster code than signed. So it might be beneficial in the future to change Render() to use UINT32.

@bbbradsmith (Owner) commented:

1. Thank you for the confirmation.

3. Thank you for showing that the resulting range of output is not compromised.

4. I don't really understand your response here. What is it supposed to mean that integer and floating-point are in bold? I assure you that I would know their meaning when written plainly. I am glad that you want to help with this project, but everything you wrote here reads in a tone that is very strange to me, and I don't know what useful purpose it could have. I will return to this at the end.

I'll try to explain my question again. You sent above a table with ~100 results in it. As far as I could tell, every result in this table "old mono" differs by one from "old stereo". However, as I am trying not to belabour, NSFPlay 2 is on life support, and I don't want to spend the time carefully looking at 100 rows which all might say the exact same thing, or even counting how many there are. My request was that you summarize what information that table was meant to convey so that I can more easily understand the difference you're trying to explain to me. Were there any rows in that table that were not "old mono" = "old stereo" + 1? That was my question.

Rereading your original post, I see that I made a mistake in that I assumed the two columns were actually the result with the old table compared to the new table. I see now that you are comparing the existing mix code ("old stereo") with the square table itself ("old mono")? What I guess you were saying is the current stereo mix has lost half a bit of precision due to its truncation, which wasn't actually about the rounding of the table itself. It was unclear to me on first reading that this is what you were trying to express. It was unintuitive to me that you were comparing the existing mix to a hypothetical "old mono" mix, so I had just assumed it was a point about the new mix.

My expectation was that you were showing me a table of values from the old mix versus your new mix. This is, after all, what I think is most important to evaluate. What difference does the change ultimately produce?

BTW Considering performance, unsigned integers produce in some cases faster code than signed. So it might be beneficial in the future to change Render() to use UINT32.

I don't think this would be reasonable. Even though the NES APU output is only positive, that is not at all true of most of the rest of the pipeline, all the way down to the end result which must be signed output, so this seems like quite a large change that would touch the entire rendering code across the whole program.

Secondly, "some cases" is extremely specific, and switching them is not any kind of guaranteed speedup. Offhand, I think dealing with the needed bias for an unsigned audio pipeline would probably be a net slow-down here. The canonical speedup example that I know of regarding unsigned integers is in loop counters, and we don't do this with audio.

Is there an example in this code which you'd like to justify with a generated assembly example? If you've found something specific with a practical performance benefit, I'd be happy to look at it and learn from it, but I don't see how this suggestion could be considered from a rule-of-thumb perspective.

Summary

You've demonstrated that your changes are sound. I would be happy to accept this PR, though there are two more things I'd like to see.

  1. Combine 16.0 * 8192.0 into one value. Either 131072.0 or double(1<<17) or (1<<17) would be acceptable if you insist on the type conversion being implicit.

  2. It would be very helpful to evaluate the actual practical result. A table would be good, but please make it easier to read. Don't use 256 rows, don't give redundant stereo pairs. Maybe just make a 16x16 grid. What I want to know is: "What is the relative precision increase for each value in the table?" To that end, I think just showing the new result and the relative +/- difference from the old one would suffice, so we could see how big the change is compared to the magnitude of each value. Also, what's the average precision gain across all the values? Something like this, perhaps:

|     | ... | 7       | 8        | ... |
| --- | --- | ------- | -------- | --- |
| ... | ... | ...     | ...      | ... |
| 5   | ... | 1009 +3 | 1082 +10 | ... |
| 6   | ... | 1082 -4 | 1153 -2  | ... |
| ... | ... | ...     | ...      | ... |
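
(As an illustrative aside: the earlier Python snippet could generate such a grid along these lines, reusing its mix_old(), mix_new() and sm_mono definitions. A sketch, not part of the thread's code:)

# 16x16 grid: new left-channel output and its signed difference from the old mix.
diffs = []
for p1 in range(16):
    row = []
    for p2 in range(16):
        old = mix_old((p1, p2), sm_mono)[0]
        new = mix_new((p1, p2), sm_mono)[0]
        diffs.append(new - old)
        row.append(f"{new}{new - old:+d}")
    print(" ".join(f"{c:>8}" for c in row))
print(f"average difference: {sum(diffs)/len(diffs):+.3f}")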

As a secondary matter... I really am very happy that you want to contribute to NSFPlay, and I don't wish to alienate you. I'd rather you stick around if you're interested in helping. This is why I did not simply close the pull request after explaining that this code is in end-of-life maintenance, and that increasing precision to this formula does not reflect emulation accuracy.

My questions were an attempt to relieve my burden of work in accepting this PR. I need to know that the issues I would consider were considered (e.g. calculation overhead), I need to verify that there is a reason for changes that were made without comment about them, I need to know that you've built and tested this code and listened to the result. In this respect, I have failed at relieving this burden, but open source software is as much a social problem as it is a technical one, and if you're willing to do the work I want to honour that by scrutinizing it and then accepting it.

Finally, please don't condescend to me like you just did. That response really crossed a line, and the burden of trying to find an appropriate response is a lot more time wasting than I would like. I don't know too much about you, as your github profile is fairly private, but if you'd like some context my profile has links to a lot of information about my career. I don't need an explanation of what a floating point is, and I don't know what led you down the path of thinking that was the root nature of my question, but if you find yourself on this point again in the future, please consider whether there's a more polite way to verify whether the issue is understanding of a basic common concept like this.

@tsone (Contributor, Author) commented Feb 4, 2025

Were there any rows in that table that were not "old mono" = "old stereo" + 1?

No.

I see now that you are comparing the existing mix code ("old stereo") with the square table itself ("old mono")?

Yes. square_table contains the "ideal" (APU non-linear mix equation) output of the hardware, which is mono. So the table should be the reference for the Render() mono mix output.

What I guess you were saying is the current stereo mix has lost half a bit of precision due to its truncation, which wasn't actually about the rounding of the table itself.

Yes, some precision is lost in the truncation, but the main thing is the added 4 bits in the square_table values. The stereo mixing in Render() requires this extra precision for even the default mono output to match the square_table values.

My expectation was that you were showing me a table of values from the old mix versus your new mix. This is, after all, what I think is most important to evaluate. What difference does the change ultimately produce?

The error was around 1/8192 measured from the output of the Render() function, which in integer terms is 1. So it's imperceptible.

Is there an example in this code which you'd like to justify with a generated assembly example?

I don't have one. It's probably not reasonable for just NES audio, although for some more complex audio emulation it may be worth it.

  1. Combine 16.0 * 8192.0 into one value. Either 131072.0 or double(1<<17) or (1<<17) would be acceptable if you insist on the type conversion being implicit.

The conversion from an integer literal to floating-point is implicit (assuming C++17).

  2. It would be very helpful to evaluate the actual practical result. A table would be good, but please make it easier to read. Don't use 256 rows, don't give redundant stereo pairs. Maybe just make a 16x16 grid. What I want to know is: "What is the relative precision increase for each value in the table?" To that end, I think just showing the new result and the relative +/- difference from the old one would suffice, so we could see how big the change is compared to the magnitude of each value. ...

My change is trivial and I find this unacceptable. I won't do it. Sorry.

As a secondary matter... I really am very happy that you want to contribute to NSFPlay, and I don't wish to alienate you. I'd rather you stick around if you're interested in helping. This is why I did not simply close the pull request after explaining that this code is in end-of-life maintenance, and that increasing precision to this formula does not reflect emulation accuracy.

Thanks, and you're welcome. If the code is close to end of life, then this shouldn't be of much relevance. I've given enough proof, in my opinion. The changes are trivial and can be verified by reading.

Finally, please don't condescend to me like you just did. That response really crossed a line, and the burden of trying to find an appropriate response is a lot more time wasting than I would like. I don't know too much about you, as your github profile is fairly private, but if you'd like some context my profile has links to a lot of information about my career. I don't need an explanation of what a floating point is, and I don't know what led you down the path of thinking that was the root nature of my question, but if you find yourself on this point again in the future, please consider whether there's a more polite way to verify whether the issue is understanding of a basic common concept like this.

Sorry to rub you the wrong way. It wasn't my intention.

PS. Have you considered functional tests? Perhaps have a set of reference .flac audio files in Git LFS to test against, generated from previously validated build output. PRs would then be tested by GitHub Actions against these to catch differences in output. I've got some Python code using two techniques to test whether audio files match. For some tests, like waveform generation, one would disable some parts of the pipeline, e.g. the lowpass filter or stereo mixing. A potential issue is that audio comparison is a bit computationally expensive, so it's possible free GitHub hosts might take a long time to run the tests. But I don't know.
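
Such a comparison could look something like this sketch (the file names are hypothetical, and the third-party numpy and soundfile packages are assumed for FLAC decoding):

# Compare a rendered file against a reference, sample by sample.
import numpy as np
import soundfile as sf  # assumed dependency for reading .flac

ref, ref_rate = sf.read("reference.flac", dtype="int16")
out, out_rate = sf.read("candidate.flac", dtype="int16")
assert ref_rate == out_rate and ref.shape == out.shape
max_diff = int(np.abs(ref.astype(np.int32) - out.astype(np.int32)).max())
print(f"max sample difference: {max_diff}")  # 0 means bit-exact output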

@bbbradsmith (Owner) commented Feb 4, 2025

My expectation was that you were showing me a table of values from the old mix versus your new mix. This is, after all, what I think is most important to evaluate. What difference does the change ultimately produce?

The error was around 1/8192 measured from the output of the Render() function, which in integer terms is 1. So it's imperceptible.

So, wait, are you saying the result of the whole change never makes more than a difference of +1 in any of the outputs?

The conversion from an integer literal to floating-point is implicit (assuming C++17).

NSFPlay 2 is not C++17; it is currently using MSVC v141's dialect of C++, though the majority of the code is more than 20 years old.

However, the suggestion was my personal preference for the explicit version, not a lack of understanding of C++ promotion. Honestly, I do not care much between the three options, just that it should not be two things multiplied together.

  2. It would be very helpful to evaluate the actual practical result. A table would be good, but please make it easier to read.

My change is trivial and I find this unacceptable. I won't do it. Sorry.

Well, the question I'm asking is: what is the impact of this change on the output? This is something you haven't given a clear answer for. Your change adds 4 extra bits of precision to square_table, but what you said above suggests there was only a gain of 0.5 bits of precision in the output?

If you want to make a functional change to end-of-life code that has been stable for 13 years, I want the functional impact to be measured.

You already wanted to make a table to show me some impact of the code in your OP, so I assumed it would not be very much trouble to modify your Python script to show what I was suggesting instead. I'm surprised at the refusal, but okay, this is not the only way. I would just like you to answer the question as best you can in whatever way you think is reasonable.

The changes are trivial and can be verified by reading.

If your answer is "verify it yourself", then I'm sorry, and I wish this had not persisted past my first response explaining that I believe this change is not worth the risk or time investment. Obviously it is much more trivial to you than it is to me, but if it's not worth your time either, then please let's not proceed any further with this.

PS. Have you considered functional tests? Perhaps have a set of reference .flac audio files in Git LFS to test against, generated from previously validated build output. PRs would then be tested by GitHub Actions against these to catch differences in output. I've got some Python code using two techniques to test whether audio files match. For some tests, like waveform generation, one would disable some parts of the pipeline, e.g. the lowpass filter or stereo mixing. A potential issue is that audio comparison is a bit computationally expensive, so it's possible free GitHub hosts might take a long time to run the tests. But I don't know.

For NSFPlay 3 I do want to do some sort of unit testing, but I won't work on deciding what's a worthwhile test until I get much further along on it. In general I probably don't want to do much audio verification like that, but choosing tests is always a difficult matter of finding a tradeoff between the cost of maintaining the test, and the cost of the expected code errors it should check against.

@bbbradsmith (Owner) commented:

I apologize for allowing it to go this far, but since the code change does not produce a significant functional difference in the output, it should remain as-is.

@bbbradsmith closed this Feb 4, 2025
@tsone (Contributor, Author) commented Feb 5, 2025

Please don't close this PR. Please reconsider.

I apologize for allowing it to go this far, but since the code change does not produce a significant functional difference in the output, it should remain as-is.

The change in this PR is valid and functional. It is simple enough to be verified by reading, and for additional demonstration and proof I provided explanations, data, and a Python snippet in the PR message. So I have taken on some of the burden of proof, but that doesn't seem to be enough...? How much do you need?

I additionally made the changes you requested. I did what you wanted. Although the improvement from this change is minor, it's still an improvement.

PS. Now that you're aware of this issue: naturally, NES_DMC::InitializeTNDTable() and NES_DMC::Render() may have similar issues.

@bbbradsmith (Owner) commented Feb 5, 2025

Again, I'm sorry. I misunderstood the nature of this PR from the start. I should have realized right away and rejected it, and I apologize for letting it carry on like this.

To clarify, the closing is ultimately not about work you did or did not do. The code itself looks OK, but once I understood the magnitude of the change to the output, I realized it is not a PR I can accept.

I can accept PRs that improve the function of the program for a user, but I categorically don't wish to take PRs that only cosmetically alter the code. This code is more than 20 years old, and it is in its end-of-life phase. There are an endless number of cosmetic issues with the code, and they are simply not worth the cost to address. It is more important to have the code remain stable and the way it is, than introduce a lateral change to its history.

@tsone (Contributor, Author) commented Feb 5, 2025

This code is more than 20 years old, and it is in its end-of-life phase. There are an endless number of cosmetic issues with the code, and they are simply not worth the cost to address. It is more important to have the code remain stable and the way it is, than introduce a lateral change to its history.

I see... I respect your conclusion here, but I don't like how my effort was wasted. To avoid this mistake happening again, make this information official and explicit: that you're not looking for active involvement in the old nsfplay code.

I have two suggestions for presenting this information, from my point of view (I may not fully understand the situation, but I hope this is helpful):

  1. Disclaimer option: Add to the readme on the master branch, in very loud letters, something like the following: "Notice: The new nsfplay v3 is under development and the master branch (nsfplay v2) is approaching end of life. We are not looking for active involvement in the master branch (nsfplay v2). Thank you."
  2. Archive option: Create a new GitHub repo, call it maybe nsfplay3, clone this old nsfplay repo there (maintaining commit history), and make the new repo's main branch the nsfplay v3 development branch. Then change the old nsfplay repo readme to direct to the new nsfplay3 repo, and archive the old nsfplay repo. (BTW, this is what Mesen did. Their old repo: https://github.com/SourMesen/Mesen and new repo: https://github.com/SourMesen/Mesen2)

I think the Archive option is better if you want to completely quit maintaining the old nsfplay codebase. The Disclaimer option is more open-ended.

@bbbradsmith (Owner) commented Feb 5, 2025

This is not an archive. I did just accept a PR from you. I am just not willing to accept PRs of this one's kind, and allowing it to proceed was my mistake, for which I apologize. I misunderstood the information you presented. There are no planned future releases for NSFPlay 2, but if there were some unexpected fix or functionality upgrade that warranted one, I would do another release.

@tsone (Contributor, Author) commented Feb 5, 2025

Apology accepted and thanks for your time.
