Category Archives: Prophet

Prophet 4.3 and chess4j 5.1 released

I’m happy to finally announce the next minor release of both of my chess programs! You can grab Windows and Linux binaries from the respective GitHub repos:

Prophet – https://github.com/jswaff/prophet

chess4j – https://github.com/jswaff/chess4j

If you need a Mac build for Prophet, please look here. (Thank you Darius!)

The focus of this release was to continue to improve the evaluation. Prophet 4.3 and chess4j 5.1 have the exact same change log:

  • Passed pawn by rank (was a single value)
  • Non-linear mobility (was a single value)
  • Knight outposts
  • Trapped bishop penalty

These changes are worth about +50 ELO in Prophet (which I expect will bring it very close to the 2500 mark on the CCRL Blitz List). I attempted a “supported rook” term, meaning the rook on an open file is connected to another rook, but surprisingly it actually cost a few ELO. Seems that should work though, so I’ve left the code in place but have it commented out.

I had planned on doing some pawn and basic endgame work in this line, and perhaps I still will, but right now I feel the time is right to begin work on neural networks. I’m pausing development for a while to study the literature. Hopefully by spring I’ll be ready to begin the implementation.

Knight Outposts

Prophet and chess4j now understand knight outposts. An outpost, as implemented in Prophet, is a square that cannot be attacked by an enemy pawn. Putting a knight on an outpost can be a strong advantage, particularly if that knight is supported by a friendly pawn.

In the following diagram, the knight on D4 is on an outpost square, but the knight on E4 is not since it may be run off by the F7 pawn at some point.

The bonus (or penalty) given for an outpost varies by square. An additional bonus is given if the outpost is supported, such as the knight on the D4 square above. The “supported” bonus also varies by square. This is possibly overkill, but with an auto-tuner, I reasoned the more knobs and dials it has to minimize error the better. (Or at least, it can’t hurt as long as we guard against over-fitting.)
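
To make the definition concrete, here is a minimal Python sketch of the outpost and "supported" tests described above. It is not Prophet's actual code: the function names, the square-name representation, and the example position are assumptions made for illustration. It also simplifies things by ignoring blocked pawns and pawns that change files by capturing, and it leaves out the per-square bonus tables.

```python
FILES = "abcdefgh"


def parse_square(name):
    """Convert a square name such as 'd4' to 0-based (file, rank)."""
    return FILES.index(name[0]), int(name[1]) - 1


def is_outpost(knight_sq, enemy_pawns, white=True):
    """True if no enemy pawn can attack the knight's square by advancing.

    An enemy pawn is a threat if it sits on an adjacent file and is still
    in front of the knight, because it can then advance and attack.
    """
    kf, kr = parse_square(knight_sq)
    for pawn in enemy_pawns:
        pf, pr = parse_square(pawn)
        on_adjacent_file = abs(pf - kf) == 1
        in_front = pr > kr if white else pr < kr
        if on_adjacent_file and in_front:
            return False
    return True


def is_supported(knight_sq, friendly_pawns, white=True):
    """True if a friendly pawn currently defends the knight's square."""
    kf, kr = parse_square(knight_sq)
    pawn_rank = kr - 1 if white else kr + 1
    return any(
        abs(pf - kf) == 1 and pr == pawn_rank
        for pf, pr in map(parse_square, friendly_pawns)
    )


# Hypothetical position: White knights on d4 and e4, a White pawn on c3,
# Black pawns on a7, f7, g7 and h7.
black_pawns = ["a7", "f7", "g7", "h7"]
white_pawns = ["c3"]

print(is_outpost("d4", black_pawns))    # no Black pawn on the c- or e-file
print(is_outpost("e4", black_pawns))    # the f7 pawn can still reach f5
print(is_supported("d4", white_pawns))  # the c3 pawn defends d4
```

In an evaluation function, the two booleans would select entries from the tuned per-square tables; those values are not shown here.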

As expected this feature isn’t a huge gain in terms of ELO, but it did net a few points. It also puts the latest development version at +50 ELO over Prophet 4.2, which was my goal before doing a new release. Before doing a release I’m going to test a couple more terms, both expected to be minor gains at most, but after that I’m going to switch gears so it’d be good to clear them from the board. Those terms are “trapped bishop” and “supported rook on open file.”

The Prophet and The Gibbon

Graham Banks recently ran a blitz tournament (40 moves/16 minutes) he titled ‘The Prophet and The Gibbon’ between 16 engines, including Prophet 4.2.

Final Standings

21.5 – Prophet 4.2 64-bit
21.0 – SoFCheck 0.9.1-beta 64-bit
18.0 – Gibbon 2.69a 64-bit
18.0 – Isa 2.0.83 64-bit
17.0 – Queen 4.03
16.5 – Fornax 3.0 64-bit
16.5 – Barbarossa 0.6.0 64-bit
14.5 – Horizon 4.4
13.0 – Jazz 840 64-bit
13.0 – Sage 3.53
13.0 – EveAnn 1.72
13.0 – CeeChess 1.3.2 64-bit
12.5 – Napoleon 1.8 64-bit
12.0 – StockNemo 3.0.0.2 64-bit
11.0 – FireFly 2.7.2 64-bit
9.5 – Ares GB 1.1 64-bit

Woo hoo!


The complete tournament pgn (zipped) can be downloaded here:
http://kirill-kryukov.com/chess/discuss … p?id=51143

Passed pawns and Non-linear Mobility

Since I released Prophet 4.2 I’ve made a couple of additional evaluation changes:

  1. The passed pawn bonus has been made more granular. Where it used to be a single bonus for any passed pawn, it now varies depending on the pawn’s rank. 40,000 bullet games say that change was worth about 14 ELO.
  2. Bishop and queen mobility has been made non-linear. This change was inspired by Erik Madsen’s MadChess blog – https://www.madchess.net/2014/12/16/madchess-2-0-beta-build-29-piece-mobility/ . The idea is to encourage piece development. I had originally plugged Erik’s values in verbatim, but they didn’t mesh well with existing weights and testing showed it weakened the program. After running the auto-tuner, this change brought in an additional 22 ELO.
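
Both changes amount to replacing a single weight with a small lookup table. The sketch below shows the shape of that idea in Python. Every number in it is a made-up placeholder, not a value from Prophet, chess4j, or MadChess, and the diminishing-returns curve for mobility is just one plausible non-linear shape.

```python
# Hypothetical centipawn weights; real values would come from the tuner.
# Index = rank counted from the pawn's own side (0 = first rank).
PASSED_PAWN_BY_RANK = [0, 5, 10, 20, 40, 70, 120, 0]

# Index = number of squares the bishop can move to (0..13).
BISHOP_MOBILITY = [-25, -12, -2, 6, 12, 17, 21, 24, 26, 28, 29, 30, 31, 32]

# The older scheme, for comparison: one number per term (also hypothetical).
OLD_PASSED_PAWN = 30
OLD_MOBILITY_PER_SQUARE = 4


def passed_pawn_bonus(rank):
    """Bonus for a passed pawn on the given 0-based relative rank."""
    return PASSED_PAWN_BY_RANK[rank]


def bishop_mobility_bonus(squares):
    """Non-linear bonus: the first few squares matter most."""
    return BISHOP_MOBILITY[squares]


def linear_mobility_bonus(squares):
    """The old linear term: every extra square is worth the same."""
    return OLD_MOBILITY_PER_SQUARE * squares


for n in (0, 1, 2, 12, 13):
    print(n, linear_mobility_bonus(n), bishop_mobility_bonus(n))
```

With a table like this, a bishop with no moves is penalized heavily and freeing it is rewarded quickly, while the thirteenth available square adds almost nothing.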

In my first attempt at running the auto-tuner, I just started with the previously tuned weights, plus Erik’s values for bishop and queen mobility, but the tuner couldn’t seem to find any improvements. The error bounced around a little, going up and down, and not making any progress. I eventually decided to do a complete reset. I set the piece values to the traditional 1/3/3/5/9 values, and everything else to 0. Then I re-tuned and validated with some bullet games. The learning curve:

Fresh off the heels of these improvements, Prophet played in an informal online engine blitz tourney today. Unfortunately it was a pretty rough outing, placing just 16/20 with 2.5 points out of 9. It was a very strong field though. Even the 10th place finisher is nearly 3000 ELO on CCRL’s 40/2 list.

:Tourney Players: Round 9 of 9 
:
:     Name              Rating Score Perfrm Upset  Results 
:     ----------------- ------ ----- ------ ------ ------- 
:  1 +LczTinker         [2971]  6.5  [2937] [   0] +07w =05w =06b =03b +02w =04b =09w +11w +12b 
:  2 +NightmareX        [2909]  6.5  [2939] [   0] +12w =06w +09b =04w -01b =05w +07b +08w +11b 
:  3 +ChessSystemTalX   [2900]  6.5  [2898] [  35] +10w +09w =08b =01w =06b +07w =05b =04w +13b 
:  4 +RubiChess         [2875]  6.5  [2947] [  77] +13w +11w =05b =02b +08w =01w =06b =03b +09w 
:  5 +ArasanX           [2859]  6.0  [2836] [ 110] +14w =01b =04w =08b +12w =02b =03w =09b +16w 
:  6 +WaspX             [2830]  6.0  [2679] [ 181] +15w =02b =01w +17b =03w =08b =04w +13w =10b 
:  7 +TheBaron          [2569]  5.5  [2457] [   3] -01b +14w +16w =11b +17w -03b -02w +18b +15b 
:  8 +Goldbar           [2861]  5.0  [2533] [  19] +16w +17b =03w =05w -04b =06w =13b -02b +18w
:  9 +Marvin            [2752]  5.0  [2683] [ 162] +18w -03b -02w +16b +11w +12b =01b =05w -04b 
: 10 +Nalwald           [2500]  5.0  [2325] [ 165] -03b +18w =15b -12b -13w +17b +16w +14b =06w 
: 11 +atomGoldbar       [2575]  4.5  [2480] [   0] +20w -04b +13w =07w -09b +16b +14w -01b -02w 
: 12 +WaDuuttie         [2567]  4.5  [2411] [   0] -02b +15w =14b +10w -05b -09w +18b +17w -01w 
: 13 +rpiArminius       [2272]  4.0  [2425] [ 522] -04b +20w -11b =14w +10b +15w =08w -06b -03w 
: 14 +atomFloyd         [2242]  4.0  [2267] [ 177] -05b -07b =12w =13b +15w +18w -11b -10w +17b 
: 15 +Skiull            [1966]  3.0  [2170] [ 410] -06b -12b =10w +18w -14b -13b +17w =16b -07w 
: 16 -Prophet           [2253]  2.5  [2325] [ 351] -08b +19w -07b -09w +18b -11w -10b =15w -05b
: 17 -Skipper           [1662]  2.0  [2219] [1120] +19b -08w +18b -06w -07b -10w -15b -12b -14w 
: 18 +atomSargon        [1840]  0.0  [1974] [   0] -09b -10b -17w -15b -16w -14b -12w -07w -08b 
: 19 +atomNightmare     [forf]  0.0  [1557] [   0] -17w -16b 
: 20 +POS               [forf]  0.0  [2023] [   0] -11b -13b 
:
:     Average Rating    2474.2 

Next up: I’m going to continue with the mobility theme a little longer by testing rook mobility, then knight outposts, trapped bishops, and connected rooks on open files. I don’t expect any of those will be big points by themselves but cumulatively they might be worth a bit.

Prophet 4.2 and chess4j 5.0 are released

I’m happy to announce updates to both chess engines. Prophet 4.2 is approximately 50 elo stronger than 4.1, and 150 elo stronger than 4.0. (I missed a release announcement or two while this development blog was offline.)

The most significant change, and the reason the chess4j major version number has been incremented, is that chess4j now includes an auto-tuner! The tuner uses logistic regression with gradient descent to optimize evaluation terms. I’ll write more detail about that in a separate post. Those optimized weights have been added into Prophet, so it benefits from that work too.

Tapered evaluation has also been fully implemented, which added a few elo. I say “fully” because the king evaluation was already tapered, but now both programs evaluate the whole position with a middle game and an endgame score, and weight them based on the material on the board. Some concept of mobility has been added as well: a simple count of available squares for both bishops and queens.
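
For readers new to tapered evaluation, here is a minimal Python sketch of blending a middle game score and an endgame score by the material left on the board. The phase weights (1 for a minor piece, 2 for a rook, 4 for a queen, 24 in total) are a commonly used convention and an assumption here. The post does not say which weights Prophet or chess4j use, and the function names are hypothetical.

```python
# Assumed phase weights per piece type; pawns and kings do not count.
PHASE_WEIGHTS = {"N": 1, "B": 1, "R": 2, "Q": 4}

# Full board: 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4 = 24.
MAX_PHASE = 24


def game_phase(piece_counts):
    """Return 24 for a full board, falling to 0 as pieces are traded off.

    piece_counts maps a piece letter to the number of such pieces on the
    board, counting both sides together.
    """
    phase = sum(
        PHASE_WEIGHTS[piece] * count
        for piece, count in piece_counts.items()
        if piece in PHASE_WEIGHTS
    )
    return min(phase, MAX_PHASE)  # promotions could push it past the maximum


def tapered_score(mg_score, eg_score, phase):
    """Blend the two scores: all middle game at 24, all endgame at 0."""
    return (mg_score * phase + eg_score * (MAX_PHASE - phase)) // MAX_PHASE


opening = {"P": 16, "N": 4, "B": 4, "R": 4, "Q": 2, "K": 2}
rook_ending = {"P": 8, "R": 2, "K": 2}

print(tapered_score(100, 20, game_phase(opening)))
print(tapered_score(100, 20, game_phase(rook_ending)))
```

The point of the blend is that a term such as king safety can carry a large middle game weight and a small endgame weight, and the score slides smoothly between the two as material comes off.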

Here is how Prophet 4.2 stacks up against its current sparring partners in 1+0.5 games:

Rank  Name               Elo  Games  Score  Draws
1     tantabus-2.0.0      82  46250    63%    22%
2     arasan-13.4         67  46250    60%    21%
3     barbarossa-0.6.0    58  46250    59%    20%
4     qapla-0.1.1         31  46250    55%    24%
5     prophet-4.2         24  40000    53%    24%
6     loki-3.5            23  46250    54%    23%
7     myrddin-0.88        -2  46250    50%    24%
8     prophet-4.1        -36  40000    44%    23%
9     tjchess-1.3        -83  46250    37%    21%
10    jazz-840          -134  46250    30%    19%
11    prophet-4.0       -141  10000    30%    18%

PVS – take 2

Some time back I tried implementing a Principal Variation Search, but as I wrote about in my post PVS – Another Fast Fail, the results were not good. At the time I concluded that if PVS is not a win, it must mean that the cost of the re-searches is outweighing the nodes saved by doing zero-width searches. For that to be the case, it must mean that too often the first move is not the best move, which points to move ordering.

Since then move ordering has certainly improved, as documented in this post on Move Ordering. So, I decided to give PVS another try. In my first attempt, it appeared to be another loss. Then, I decided to not do PVS at the root node, and now it appears to be a very small win.
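
To show why move ordering decides whether PVS pays off, here is a small Python sketch of a fail-soft negamax with PVS running on a toy game tree. It is a sketch under stated assumptions, not Prophet's search: the tree format (nested lists with integer leaves), the `stats` counters, and the `pvs_at_root` flag are all invented for the illustration.

```python
INF = 10**9


def pvs(node, alpha, beta, stats, ply=0, pvs_at_root=True):
    """Fail-soft negamax with principal variation search on a toy tree.

    A node is either an int (leaf score from the point of view of the side
    to move) or a list of child nodes, tried in the order given.
    """
    if isinstance(node, int):
        stats["leaves"] += 1
        return node

    best = -INF
    for i, child in enumerate(node):
        full_window = i == 0 or (ply == 0 and not pvs_at_root)
        if full_window:
            score = -pvs(child, -beta, -alpha, stats, ply + 1, pvs_at_root)
        else:
            # Zero-width search: only asks "does this move beat alpha?"
            score = -pvs(child, -alpha - 1, -alpha, stats, ply + 1, pvs_at_root)
            if alpha < score < beta:
                # It does, so the first move was not the best one here.
                # Pay for a re-search with the full window.
                stats["re_searches"] += 1
                score = -pvs(child, -beta, -alpha, stats, ply + 1, pvs_at_root)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


def search(tree, pvs_at_root=True):
    """Search from the root and return (score, counters)."""
    stats = {"leaves": 0, "re_searches": 0}
    score = pvs(tree, -INF, INF, stats, pvs_at_root=pvs_at_root)
    return score, stats


well_ordered = [[6, 9], [3, 5], [1, 2]]   # best move is searched first
badly_ordered = [[3, 5], [6, 9], [1, 2]]  # best move is searched second

print(search(well_ordered))
print(search(badly_ordered))
print(search(badly_ordered, pvs_at_root=False))
```

Both orderings return the same score, but the badly ordered tree triggers a re-search and visits 7 leaves instead of 4. With `pvs_at_root=False`, every root move gets a full window, so a wrong guess at the root never costs a re-search.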

A win is a win, so I’m merging the changes in, but I think there is more to do here. My suspicion is that, as move ordering improves, the benefits of PVS will increase. The most obvious way to improve move ordering is to add a depth preferred hash table (the current strategy is a very naive “always replace”).

It seems like PVS at the root should work as well, if the program can reliably predict the best move often enough. I know a lot of programs put extra effort into ordering the moves at the root. I remember reading that Bob Hyatt’s Crafty does a quiescence search at the root. So, this is on the backlog as well, and once complete I will revisit the idea of PVS at the root.

For now, it is on to the next thing – Late Move Reductions. I’m hopeful that will yield a significant ELO increase, perhaps finally putting P4 on par with P3.

Small Improvement to “bad captures”

In my recent post on move ordering, I identified a potential area of improvement to the criteria for deciding if a capture is “good” or “bad.” As I wrote in that post, a capture is good if:

  1. It is a promotion (technically even non-capturing promotions are included)
  2. The value of the captured piece is greater than the value of the capturing piece
  3. The Static Exchange Evaluator (SEE) score is non-negative.

The issue is with knights and bishops. They are roughly the same value (which one is more valuable really depends on the position), but in Prophet the bishop has a slightly higher value. A knight has a material value equal to 3 pawns, but a bishop has 3.2 pawns. The consequence of that is that a simple Bishop x Knight capture would be categorized as “bad” and not tried until all non-captures have been tried.

I don’t have the link handy but I read an older post on talkchess.com where Tord Romstad, the author of Glaurung (precursor to Stockfish), mentioned that he used different piece values for the purposes of move ordering than he did in the evaluation. He said he used 1, 3, 3, 5 and 10. That means Bishop x Knight captures, as well as Knight x Bishop captures, would both be categorized as “good.” Also, by giving the queen a value of 10, it means that giving two rooks for a queen would be considered equal by the SEE, whereas giving a queen + pawn for two rooks would have a negative score.
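
The arithmetic is easy to check with a few lines of Python. In this sketch, `exchange_balance` is a hypothetical helper that just adds up material gained and given. A real SEE walks the capture sequence on a square, so treat this as the simplest possible stand-in. The evaluation values cover only the knight and bishop (3 and 3.2 pawns, as stated above), since the other evaluation values aren't given in this post.

```python
# Evaluation values in centipawns, from the post: knight = 3 pawns,
# bishop = 3.2 pawns. Other pieces are omitted because they are not given.
EVAL_VALUES = {"N": 300, "B": 320}

# Move-ordering values attributed to Tord Romstad: 1, 3, 3, 5 and 10.
ORDERING_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 10}


def exchange_balance(gained, given, values):
    """Material won minus material lost, using the supplied value table."""
    return sum(values[p] for p in gained) - sum(values[p] for p in given)


# Bishop takes a defended knight and is recaptured.
print(exchange_balance(["N"], ["B"], EVAL_VALUES))      # negative: "bad"
print(exchange_balance(["N"], ["B"], ORDERING_VALUES))  # zero: "good"

# Two rooks for a queen, then queen + pawn for two rooks.
print(exchange_balance(["Q"], ["R", "R"], ORDERING_VALUES))
print(exchange_balance(["R", "R"], ["Q", "P"], ORDERING_VALUES))
```

Because a “good” capture only needs a non-negative score, moving the Bishop x Knight result from slightly negative to zero is enough to move it ahead of the non-captures.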

Sure enough, that simple change seems to be worth about 6 ELO.

Pruning “bad” captures in quiescence

As suspected the change to move ordering to separate good captures from bad captures has already paid off. Moving bad captures to the bottom of the move order list made it trivial to skip bad captures in the quiescence search altogether. This is an idea I first read about in a discussion on r.g.c.c. between Bob Hyatt, Feng-Hsiung Hsu and others here https://groups.google.com/g/rec.games.chess.computer/c/H6XjY2L13eQ . Hyatt claimed a small improvement, though Hsu was skeptical. Time has proven Hyatt correct though; I believe this is something most strong programs do. In any event, it seems to be worth about 10 ELO for Prophet4 so it’s a keeper.

Move Ordering

Improving move ordering has been on the radar for a while now. I started to suspect that move ordering needed some work when my initial attempts at a PVS search and aspiration windows both failed. I reasoned that, if move ordering is subpar, re-searches would occur too often, causing an overall increase in node counts.

To know, you have to measure, so I added data to the logfiles and wrote a Python script to aggregate (1) time to depth, (2) effective branching factor, and (3) the % of nodes in which we get a fail high on the 1st move and first four moves.
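
As a rough idea of what such a script computes, here is a small Python sketch of two of the metrics. It is not the actual script. The logfile format isn't shown here, so the functions take already-parsed lists. The effective branching factor is taken as the geometric mean of nodes(d) / nodes(d - 1), which is one common definition and an assumption on my part.

```python
def effective_branching_factor(nodes_by_depth):
    """Geometric mean of nodes(d) / nodes(d - 1) over successive depths.

    nodes_by_depth holds the node count of each iteration, shallowest first.
    """
    if len(nodes_by_depth) < 2:
        raise ValueError("need node counts for at least two depths")
    steps = len(nodes_by_depth) - 1
    return (nodes_by_depth[-1] / nodes_by_depth[0]) ** (1.0 / steps)


def fail_high_percentages(fail_high_move_indexes):
    """Return (% of fail highs on the 1st move, % within the first four).

    Each entry is the 0-based position in the move list of the move that
    caused a fail high at some node.
    """
    total = len(fail_high_move_indexes)
    if total == 0:
        return 0.0, 0.0
    first = sum(1 for i in fail_high_move_indexes if i == 0)
    first_four = sum(1 for i in fail_high_move_indexes if i < 4)
    return 100.0 * first / total, 100.0 * first_four / total


# Made-up sample data, only to show the call shapes.
print(effective_branching_factor([10, 40, 160, 640]))
print(fail_high_percentages([0, 0, 1, 0, 5, 3, 0, 2]))
```

Better move ordering should push both percentages up and the branching factor down.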

Once I was able to measure, I changed the move ordering scheme FROM:

PV move -> Hash move -> All captures in MVV/LVA order -> Killer 1 -> Killer 2 -> noncaptures in the order they are generated

TO:

PV move -> Hash Move -> “Good captures” in MVV/LVA order -> Killer 1 -> Killer 2 -> noncaptures -> bad captures in SEE order.

A “good capture” is one that is a promotion, a capture in which the value of the captured piece is at least that of the capturing piece, or one with a non-negative SEE value.
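
The new scheme can be expressed as a sort key. The Python sketch below is an illustration, not Prophet's move ordering code: the `Move` fields, the move names, and the choice to try the least-bad SEE score first among bad captures are assumptions, and the SEE scores are supplied by hand rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    name: str
    victim: int = 0        # value of the captured piece, 0 for a non-capture
    attacker: int = 0      # value of the capturing piece
    see: int = 0           # static exchange score, assumed precomputed
    promotion: bool = False


def is_good_capture(m):
    """Promotion, victim worth at least the attacker, or non-negative SEE."""
    return m.promotion or (m.victim > 0 and (m.victim >= m.attacker or m.see >= 0))


def order_moves(moves, pv_move=None, hash_move=None, killers=()):
    """Sort moves: PV, hash, good captures, killers, quiets, bad captures."""

    def key(indexed):
        idx, m = indexed
        if m.name == pv_move:
            return (0, 0, 0, idx)
        if m.name == hash_move:
            return (1, 0, 0, idx)
        if is_good_capture(m):
            # MVV/LVA: most valuable victim first, then least valuable attacker.
            return (2, -m.victim, m.attacker, idx)
        if m.name in killers:
            return (3, killers.index(m.name), 0, idx)
        if m.victim == 0:
            return (4, 0, 0, idx)  # non-captures keep their generated order
        return (5, -m.see, 0, idx)  # bad captures, least bad first

    return [m for _, m in sorted(enumerate(moves), key=key)]


moves = [
    Move("Nf3"),
    Move("QxP", victim=1, attacker=10, see=-9),
    Move("PxR", victim=5, attacker=1, see=5),
    Move("NxB", victim=3, attacker=3, see=0),
    Move("Bd3"),
    Move("O-O"),
    Move("RxN", victim=3, attacker=5, see=-2),
    Move("e4"),
]
ordered = order_moves(moves, pv_move="e4", hash_move="O-O", killers=("Bd3",))
print([m.name for m in ordered])
```

Keeping the bad captures in their own bucket at the end is also what makes it easy to skip them entirely in a quiescence search.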

Incidentally, while researching this I came across a post I myself had made many years ago: http://talkchess.com/forum3/viewtopic.php?f=7&t=15198&hilit=defer+losing+captures&sid=5543f43760ce939ed16deacd59710e06

The change doesn’t seem to add more than just a couple of ELO directly. However, in non-tactical test suites all the measured metrics were improved: time to solution is down, effective branching factor is lower, the percentage of fail highs on the first move is improved, and the percentage of fail highs in the first four moves is dramatically improved. Additionally, the number of nodes required to reach a solution is cut by about a third, with only a 7% or so decrease in speed (nodes per second). I believe this sets the stage to take another stab at PVS and aspiration windows. First though, I’m going to take a stab at pruning bad captures from the quiescence search.

One potential (probable) area of improvement is that knights and bishops have slightly different values (bishops being the more valuable). For the purposes of determining if a capture is “good” when classifying the captures, they should probably be considered equal.

New Testing Rig

Prophet4 finally has a proper testing rig! A few weeks ago, I purchased a Dell Alienware system — an 8 core (16 logical) AMD Ryzen 7 5800 with 32 GB of RAM and an AMD Radeon RX600XT graphics card.

This replaces the single core laptop I have been using. This is pretty exciting as it will allow P4 testing to go at 8x the speed it did before. Just to break the machine in, I ran the first ever gauntlet with P4. Here are the results:

Rank  Name            Elo  Games  Score  Draws
1     plisk 0.2.7d    103  29785    66%    19%
2     tjchess 1.3      81  29787    63%    22%
3     jazz 840         52  29786    58%    21%
4     myrddin 0.87     44  29786    57%    21%
5     Horizon 4.4      15  29785    52%    19%
6     jumbo 0.4.17     -5  29783    49%    21%
7     p3-20181124      -8  29786    48%    25%
8     p4-20210407     -86  17211    36%    32%
9     prophet2_ja     -94  16000    35%    18%
10    tcb 0052       -182  29785    23%    15%

This closely matches the results I obtained a few years ago when I announced Prophet3 20180811 is released. One notable exception is TCB, which seems to have done much worse on this machine, for whatever reason.

So, it seems P4 is already on par with P2, and is within 78 elo or so from P3. This is encouraging, as the P4 rewrite is still not complete. I’m feeling pretty confident that when it is, it will be at least as strong as P3, and then the real work of improving can begin. And, with the fast feedback from a proper testing rig, I don’t think that should be all that difficult. My goal is to achieve +100 elo over P3 before doing a release.

Note: the versions of the chess engines used in this gauntlet are quite old by now; many of them likely have updates that could be significantly stronger. During the rewrite process, I’ve tried to keep everything “as is”; the goal is to compare P4 to P3, not to other engines. Once the rewrite is complete and P4 grows in strength, new testing engines will be cycled in as the current set is cycled out.