Category Archives: chess4j

Category for posts pertaining to the chess4j project.

Passed pawns and Non-linear Mobility

Since I released Prophet 4.2 I’ve made a couple of additional evaluation changes:

  1. The passed pawn bonus has been made more granular. Where it used to be a simple bonus for a passed pawn, now it varies depending on the pawn’s rank. 40,000 bullet games says that change was worth about 14 ELO.
  2. Bishop and queen mobility has been made non-linear. This change was inspired by Erik Madsen’s MadChess blog – https://www.madchess.net/2014/12/16/madchess-2-0-beta-build-29-piece-mobility/ . The idea is to encourage piece development. I had originally plugged Erik’s values in verbatim, but they didn’t mesh well with existing weights and testing showed it weakened the program. After running the auto-tuner, this change brought in an additional 22 ELO.

In my first attempt at running the auto-tuner, I just started with the previously tuned weights, plus Erik’s values for bishop and queen mobility, but the tuner couldn’t seem to find any improvements. The error bounced around a little, going up and down, and not making any progress. I eventually decided to do a complete reset. I set the piece values to the traditional 1/3/3/5/9 values, and everything else to 0. Then I re-tuned and validated with some bullet games. The learning curve:

Fresh off the heals of these improvements, Prophet played in an informal online engine blitz tourney today. Unfortunately it was a pretty rough outing, placing just 16/20 with 2.5 points out of 9. It was a very strong field though. Even the 10th place finisher is nearly 3000 ELO on CCRL’s 40/2 list.

:Tourney Players: Round 9 of 9 
:
:     Name              Rating Score Perfrm Upset  Results 
:     ----------------- ------ ----- ------ ------ ------- 
:  1 +LczTinker         [2971]  6.5  [2937] [   0] +07w =05w =06b =03b +02w =04b =09w +11w +12b 
:  2 +NightmareX        [2909]  6.5  [2939] [   0] +12w =06w +09b =04w -01b =05w +07b +08w +11b 
:  3 +ChessSystemTalX   [2900]  6.5  [2898] [  35] +10w +09w =08b =01w =06b +07w =05b =04w +13b 
:  4 +RubiChess         [2875]  6.5  [2947] [  77] +13w +11w =05b =02b +08w =01w =06b =03b +09w 
:  5 +ArasanX           [2859]  6.0  [2836] [ 110] +14w =01b =04w =08b +12w =02b =03w =09b +16w 
:  6 +WaspX             [2830]  6.0  [2679] [ 181] +15w =02b =01w +17b =03w =08b =04w +13w =10b 
:  7 +TheBaron          [2569]  5.5  [2457] [   3] -01b +14w +16w =11b +17w -03b -02w +18b +15b 
:  8 +Goldbar           [2861]  5.0  [2533] [  19] +16w +17b =03w =05w -04b =06w =13b -02b +18w
:  9 +Marvin            [2752]  5.0  [2683] [ 162] +18w -03b -02w +16b +11w +12b =01b =05w -04b 
: 10 +Nalwald           [2500]  5.0  [2325] [ 165] -03b +18w =15b -12b -13w +17b +16w +14b =06w 
: 11 +atomGoldbar       [2575]  4.5  [2480] [   0] +20w -04b +13w =07w -09b +16b +14w -01b -02w 
: 12 +WaDuuttie         [2567]  4.5  [2411] [   0] -02b +15w =14b +10w -05b -09w +18b +17w -01w 
: 13 +rpiArminius       [2272]  4.0  [2425] [ 522] -04b +20w -11b =14w +10b +15w =08w -06b -03w 
: 14 +atomFloyd         [2242]  4.0  [2267] [ 177] -05b -07b =12w =13b +15w +18w -11b -10w +17b 
: 15 +Skiull            [1966]  3.0  [2170] [ 410] -06b -12b =10w +18w -14b -13b +17w =16b -07w 
: 16 -Prophet           [2253]  2.5  [2325] [ 351] -08b +19w -07b -09w +18b -11w -10b =15w -05b
: 17 -Skipper           [1662]  2.0  [2219] [1120] +19b -08w +18b -06w -07b -10w -15b -12b -14w 
: 18 +atomSargon        [1840]  0.0  [1974] [   0] -09b -10b -17w -15b -16w -14b -12w -07w -08b 
: 19 +atomNightmare     [forf]  0.0  [1557] [   0] -17w -16b 
: 20 +POS               [forf]  0.0  [2023] [   0] -11b -13b 
:
:     Average Rating    2474.2 

Next up- I’m going to continue with the mobility theme a little longer by testing rook mobility, then knight outposts, trapped bishops, and connected rooks on open files. I don’t expect any of those will be big points by themselves but cumulatively they might be worth a bit.

Prophet 4.2 and chess4j 5.0 are released

I’m happy to announce updates to both chess engines. Prophet 4.2 is approximately 50 elo stronger than 4.1, and 150 elo stronger than 4.0. (I missed a release announcement or two while this development blog was offline.) The most significant change, and the reason the chess4j major version number has been incremented, is that chess4j now includes an auto tuner! The tuner uses logistic regression with gradient descent to optimize evaluation terms. I’ll write more detail about that in a separate post. Those optimized weights have been added into Prophet, so it benefits from that work as well. Tapered evaluation has been fully implemented as well which added a few elo. I say “fully” because the king evaluation was already tapered, but now both programs fully evaluate the position with a middle game and an endgame score, and weight them based on material on the board. Some concept of mobility has been added as well – a simple count of available squares for both bishops and queens.

Here is how Prophet 4.2 stacks up against its current sparring partners in 1+0.5 games:

RankNameEloGamesScoreDraws
1tantabus-2.0.0824625063%22%
2arasan-13.4674625060%21%
3barbarossa-0.6.0584625059%20%
4qapla-0.1.1314625055%24%
5prophet-4.2244000053%24%
6loki-3.5234625054%23%
7myrddin-0.88-24625050%24%
8prophet-4.1-364000044%23%
9tjchess-1.3-834625037%21%
10jazz-840-1344625030%19%
11prophet-4.0-1411000030%18%

PVS – take 2

Some time back I tried implementing a Principal Variation Search , but as I wrote about in my post PVS – Another Fast Fail , the results were not good. At the time I concluded that if PVS is not a win, it must mean that the cost of the researches is outweighing the nodes saved by doing zero width searches. For that to be the case, it must mean that too often the first move is not the best move, which points to move ordering.

Since then move ordering has certainly improved, as documented in this post on Move Ordering . So, I decided to give PVS another try. In my first attempt, it appeared to be another loss. Then, I decided to not do PVS at the root node, and now it appears to be a very small win.

A win is a win, so I’m merging the changes in, but I think there is more to do here. My suspicion is that, as move ordering improves, the benefits of PVS will increase. The most obvious way to improve move ordering is to add a depth preferred hash table (the current strategy is a very naive “always replace”).

It seems like PVS at the root should work as well, if the program can reliably predict the best move often enough. I know a lot of programs put extra effort into ordering the moves at the root. I remember reading that Bob Hyatt’s Crafty does a quiescence search at the root. So, this is on the backlog as well, and once complete I will revisit the idea of PVS at the root.

For now, it is on to the next thing – Late Move Reductions. I’m hopeful that will yield a significant ELO increase, perhaps finally putting P4 on par with P3.

Small Improvement to “bad captures”

In my recent post on move ordering, I identified a potential area of improvement to the criteria for deciding if a capture is “good” or “bad.” As I wrote in that post, a capture is good if:

  1. It is a promotion (technically even non-capturing promotions are included)
  2. The value of the captured piece is greater than the value of the capturing piece
  3. The Static Exchange Evaluator (SEE) score is non-negative.

The issue is with knights and bishops. They are roughly the same value (which one is more valuable really depends on the position), but in Prophet the bishop has a slightly higher value. A knight has a material value equal to 3 pawns, but a bishop has 3.2 pawns. The consequence of that is that a simple Bishop x Knight capture would be categorized as “bad” and not tried until all non-captures have been tried.

I don’t have the link handy but I read an older post on talkchess.com where Tord Romstad, the author of Glaurang (pre-cursor to Stockfish), mentioned that he used different piece values for the purposes of move ordering than he did in the evaluation. He said he used 1, 3, 3, 5 and 10. That means Bishop x Knight captures, as well as Knight x Bishop captures would both be categorized as “good.” Also, by giving the queen a value of 10, it means that giving two rooks for a queen would be considered equal by the SEE, where giving a queen + pawn for two rooks would have a negative score.

Sure enough, that simple change seems to be worth about 6 ELO.

Pruning “bad” captures in quiescence

As suspected the change to move ordering to separate good captures from bad captures has already paid off. Moving bad captures to the bottom of the move order list made it trivial to skip bad captures in the quiescence search altogether. This is an idea I first read about in a discussion on r.g.c.c. between Bob Hyatt, Feng-Hsiung Hsu and others here https://groups.google.com/g/rec.games.chess.computer/c/H6XjY2L13eQ . Hyatt claimed a small improvement, though Hsu was skeptical. Time has proven Hyatt correct though; I believe this is something most strong programs do. In any event, it seems to be worth about 10 ELO for Prophet4 so it’s a keeper.

Move Ordering

Improving move ordering has been on the radar of a while now. I started to suspect that move ordering needed some work when my initial attempts at a PVS search and aspiration windows both failed. I reasoned that, if move ordering is subpar, researches would occur too often causing an overall increase in node counts.

To know, you have to measure, so I added data to the logfiles and wrote a Python script to aggregate (1) time to depth, (2) effective branching factor, and (3) the % of nodes in which we get a fail high on the 1st move and first four moves.

Once I was able to measure, I changed the move ordering scheme FROM:

PV move -> Hash move -> All captures in MVV/LVA order -> Killer 1 -> Killer 2 -> noncaptures in the order they are generated

TO:

PV move -> Hash Move -> “Good captures” in MVV/LVA order -> Killer 1 -> Killer 2 -> noncaptures -> bad captures in SEE order.

A “good capture” is one that is a promotion, a capture in which the value of the captured piece is at least that of the capturing piece, or one with a non-negative SEE value.

Incidentally, while researching this I came across a post I myself had made many years ago: http://talkchess.com/forum3/viewtopic.php?f=7&t=15198&hilit=defer+losing+captures&sid=5543f43760ce939ed16deacd59710e06

The change doesn’t seem to add more than just a couple of ELO directly. However, in non-tactical test suites all the measured metrics were improved: time to solution is down, effective branching factor is lower, the percentage of fail highs on the first move is improved, and the percentage of fail highs in the first four moves is dramatically improved. Additionally, the number of nodes required to reach a solution is cut by about a third, with only a 7% or so decrease in speed (nodes per second). I believe this sets the stage to take another stab at PVS and aspiration windows. First though, I’m going to take a stab at pruning bad captures from the quiescence search.

One potential (probable) area of improvement is that knights and bishops have slightly different values (bishops being the more valuable). For the purposes of determining if a capture is “good” when classifying the captures, they should probably be considered equal.

Extending Lines

“Tree shaping” is an important component of a strong chess program. There are various ways to shape a search tree. One is by extending lines that seem interesting in some way; particularly those that cannot be fully resolved within the search horizon (see https://en.wikipedia.org/wiki/Horizon_effect). Other methods of shaping the search tree include reducing or even pruning lines that seem less promising. The quiescence search is also a form of tree shaping, as the search becomes more selective in which nodes it expands – typically just captures or captures + checks and check evasions.

One of the simplest, and perhaps the most effective extension is the “check extension.” Whenever a move is made, if the move gives check to the enemy, the search depth is increased by one. That is, instead of this:

apply_move(pos, move)
val = -search(pos, depth-1, -beta, -alpha)

Do this:

apply_move(pos, move)
bool gives_check = in_check(pos)
int ext = gives_check ? 1 : 0
val = -search(pos, depth-1+ext, -beta, -alpha)

Disclaimer: that is probably a little too naive, as it doesn’t really guarantee the line will terminate. There are surely positions that would cause the extension as written above to explode the search tree. But, on average, it’s a win.

I implemented the check extension in Prophet4 and indeed it seems to be a big win.

WinsLossesDrawsPctEloError
33118340258.1%56.616.8
Prophet4 with check extension vs without, 5+0.25

In an effort to add some sort of terminating criteria to it, I also tried limiting the check extension to capturing moves, and while still a win it didn’t work out as well.

WinsLossesDrawsPctEloError
32063119450550.4%2.85.0
P4 with capture only check ext vs without, 5+0.25

I also tried another form of extensions entirely – promoting moves. However, it had no measurable effect at all and may have even been a small loss, so I abandoned it.

After settling on the “naive check extension” I measured P4’s standing against P3.

WinsLossesDrawsPctEloError
31392975834.6%-110.612.2
Prophet4 vs Prophet3, 10+0.5

The last time I measured, P4 was -164.29 vs P3, so it’s definitely gained some ground.

Next on the list is to take a careful look at move ordering before moving onto another form of tree shaping – reductions. I expect Late Move Reductions (LMR) in particular to give another large jump. But, the algorithm is very sensitive to good move ordering, so it’s worth taking some time to study that first.

Validations – and surprises

I’ve been pretty busy as of late, but in an effort to get some momentum going I picked some items that have been on my to do list that would take far more processor time than programming time – validating some evaluation terms that have been part of Prophet for many years now. The goal, really, was to validate what I was sure was true- that these evaluation terms do help; if they are removed, surely performance would drop as well. As you’ll see, it didn’t quite work out that way.

Here’s a list of evaluation terms that were tested, with results below.

  • Knight tropism – the idea that keeping your knight as close as possible to the enemy’s king is generally a good thing.
  • Rooks on open files, or even half open files.
  • Passed pawns – pawns with no enemy pawn in front or on an adjacent file should be rewarded, since the likelihood of promotion increases dramatically.
  • Isolated pawns – pawns with no friendly pawns on an adjacent file should be penalized, since they are weakened considerably without the supporting pawns.
  • Doubled pawns – pawns that occupy the same file as another friendly pawn are a (small) positional weakness.
  • Major pieces (rooks, queens) on the 7th rank with the enemy king on the back rank are usually very deadly, especially when “connected.”

Knight Tropism

This term works by penalizing a knight by 2 x distance_from_enemy_king centipawns. Distance from the enemy king is the max of difference in ranks and difference in files.

I actually started out running this one with chess4j, the Java engine, before deciding to use Prophet4 going forward. As explained in a previous post, the overhead of restarting the JVM between matches is just too high, and not restarting the engine between games isn’t great either, so doing these fast tests with a lightweight executable that can be quickly restarted seems preferable. At any rate, the outcome was nearly the same so I’ve combined the results into one table.

Also, since these are validations of existing terms, player A was the “control” player, and player B the “without” player. The hypothesis is that “Player A is better than Player B.”

WinsLossesDrawsPctEloError
63125982767450.8%5.73.8
23432122276651.5%10.66.3
59425789826950.4%2.73.7
Control vs “without knight tropism”, 5+0.25

As you can see, the knight tropism is worth a few ELO. Not a game wrecker by any means, but it does have a small positive effect. We’ll check this one off the list.

Rooks on Open Files

Rooks on open files or even half open files are known to be a strategic advantage. It allows the player to move their rooks around easily, projecting strength and penetrating into the opponent position.

Rooks on files with no other pieces are given a 25 centipawn bonus. Rooks on files with just enemy pieces are given a 15 centipawn bonus.

WinsLossesDrawsPctEloError
57341171754.7%33.212.6
Control vs “No rook on open files”, 5+0.25

This term is obviously doing the job. Check this one off the list.

Passed Pawns

Passed pawns become promoted pawns. Passed pawns are awarded 20 centipawns.

WinsLossesDrawsPctEloError
21431930265651.6%11.06.5
Control vs “no passed pawn bonus”, 5+0.25

Another one validated. That’s not to say it’s tuned correctly, but at least we can say it does help.

Isolated Pawns

Isolated pawns don’t have any friendly pawns on adjacent ranks to give them any support. They are a weakness. In Prophet and chess4j, isolated pawns are penalized 20 centipawns.

WinsLossesDrawsPctEloError
50264174146.3%-25.712.2
13851532184748.5%-10.77.7
Control vs “no isolated pawn penalty”, 5+0.25

That is NOT a good result! I was in such disbelief after the first test that I ran a second test, and though the result doesn’t seem quite as bad, it’s still bad. As it stands, the isolated pawn penalty is hurting. I haven’t disabled it, because the major focus right now is rewriting the engine before improving it, and I want to be able to compare apples to apples after the rewrite. However, I have put an item on the backlog to study this in more detail. The heuristic should work. Either the implementation isn’t quite right, or it’s too expensive, or the weights aren’t right. I’ll have to get to the bottom of this.

Doubled Pawns

Doubled pawns also known to be a positional weakness. They are penalized 10 centipawns (note this penalty gets “awarded” to each pawn).

WinsLossesDrawsPctEloError
59372295047.2%-19.810.9
8581149151645.9%-28.88.7
Control vs “no doubled pawn penalty”, 5+0.25

Another surprising and disappointing result! Investigating this has also been added to the post-rewrite backlog.

Majors on 7th

This evaluation term awards rooks and queens on the 7th rank when the enemy king is on the back rank. If so, 50 centipawns are awarded. Additionally, if connected to another major piece on the 7th rank, an additional 80 centipawns are awarded – the idea being this is likely a deadly / mating attack.

I can’t remember how long this term has been around, but it’s been a looooong time. Sadly, the results aren’t so good.

WinsLossesDrawsPctEloError
10191137154248.4%-11.18.5
31783509448348.5%-10.35.0
Control vs “without majors on 7th”, 5+0.25

Clearly, not a good outcome! I wondered what would happen if I just disabled the “connected” part but left the award for having a major piece on the 7th when the enemy king is on the 8th?

WinsLossesDrawsPctEloError
57056024827149.2%-5.53.7
56915973833649.3%-4.93.7
Control vs “without CONNECTED term”, 5+0.25

Disabling the “connected” bit may help a little, but still, the heuristic is hurting overall, not helping.

Conclusion

Out of the six evaluation terms to be tested / validated, three of them were found to be helpful and three are actually harmful. The “majors on 7th”, doubled pawn, and isolated pawn heuristics have all been added to the post-rewrite backlog for further study.

The moral of the story here is, do NOT assume anything. Always test! This has been my philosophy for a good while now, but these eval terms were all added at a time that I wasn’t as rigorous in my testing as I am now. (And, frankly, I don’t think many in the computer chess community were before 10-15 years ago.)

If there is any upshot, it’s that there is a guaranteed strength improvement to be had, even if it’s just removing the terms altogether. But, since I know they really should work, I’ve opted to leave them for now. Also, when the Prophet4 rewrite is complete, I really want to be able to run some benchmarks against Prophet3 and be able to make comparisons. The rewritten codebase should “stand on its own.” Improving the evaluator now would cast some doubt on those comparisons.

PVS – Another Fast Fail

After the failure with aspiration windows I decided to implement PVS . The idea here is that, with proper move ordering the first move is usually the best. That being the case, it makes sense to search all of the moves after the first with a minimal alpha/beta window, the goal being to prove or disprove the first move was best rather than to get a usable score. If the first move is in fact best, you’ve saved some time. If not, you have to research with the “regular” alpha/beta window.

WinsLossesDrawsPctEloError
35148759845.3%-33.013.7
53367079646.6%-23.811.8
With PVS vs without, 5+0.25

This result is a little surprising and depressing. I’m not going to give up on PVS, but clearly the time isn’t quite right for it. I’ll have to revisit it later. This leads me to believe I should take a step back and look at move ordering. It seems to me that if PVS is a loss, that could mean the “first move is best” assertion is wrong more than we’d like, meaning move ordering is not where it should be. There are some tasks on my board related to move ordering, such as deferring losing captures until after non-captures and implementing a static exchange evaluator to aid in ordering the quiescence moves. I’ll leave the branch open and revisit this when move ordering has improved.

Aspiration Windows – a Fast Fail

Earlier this week I implemented aspiration windows to chess4j. It’s a very simple concept. In the iterative deepening loop, the depth 1 iteration is processed as usual, with an alpha bound of “negative infinity” and a beta bound of “infinity.” On subsequent iterations, the alpha bound is initialized to the previous score minus some margin (1/3 pawn in this case), and the beta bound is initialized to the previous score plus some margin (again, 1/3 pawn). If all goes well, and the score is within those bounds, the search should finish faster as there are more cutoffs. However, if the real score is not within that search window, then a research must be done.

My intention was to start with the 1/3 pawn windows, falling back to a “wide open” search if it fails. Once testing confirmed that was a win, I could play with the margins or incrementally expanding the window. Unfortunately, this was a pretty fast fail.

My first run was terminated after 2002 games, and the second after 2965 games.

WinsLossesDrawsPctEloError
54968976346.5%-24.512.0
8861026105347.6%-16.410.0
With aspiration windows vs. without, 5+0.25

I did expect this to be a win so I’m a little puzzled by the result, but that’s why we test. Perhaps when the search matures a little more and can hit deeper depths the scores between iterations will stabilize ? I’ll just queue this up to revisit later on.

On the plus side, SPRT saved a lot of time by terminating a 20,000 game match after 2-3k games. That’s a big time saver.

On to PVS.