Introduction

This is the home page for pgn-extract, a command-line program for searching, manipulating and formatting chess games recorded in Portable Game Notation (PGN), or something close to it. It can handle files containing millions of games, and it also recognises Chess960 encodings.

A full description of pgn-extract's functionality is available and included with the sources.

Here you can find the C source code and Windows binaries for the current version. pgn-extract compiles and runs under Windows, Linux and macOS. This program is made available under the terms of the GNU General Public License (Version 3).

Getting-started video for Windows users

For Windows users who are really only interested in getting the binary working, there is a short introductory video.

Overview

The program is designed to make it easy to extract and format selected games from a PGN-format data file, based on a wide variety of criteria. The criteria include:

  • textual move sequences;

  • the position reached after a sequence of moves;

  • information in the tag fields;

  • fuzzy board position;

  • and material balance in the ending.
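
To make the idea of selecting games by tag-field information concrete, here is a minimal Python sketch. It is not taken from pgn-extract's C source and does not reproduce its matching rules: the function names `parse_tags` and `matches_criteria` are hypothetical, and plain substring matching is an assumption made for the illustration.

```python
import re

# A PGN tag pair looks like: [White "Fischer, Robert J."]
TAG_PAIR = re.compile(r'^\[(\w+)\s+"((?:[^"\\]|\\.)*)"\]\s*$')


def parse_tags(game_text):
    """Collect the tag pairs at the top of one game's text into a dict."""
    tags = {}
    for line in game_text.splitlines():
        stripped = line.strip()
        match = TAG_PAIR.match(stripped)
        if match:
            tags[match.group(1)] = match.group(2)
        elif stripped:
            break  # first non-blank, non-tag line: the movetext has begun
    return tags


def matches_criteria(tags, criteria):
    """True when every named tag is present and contains the wanted text.

    Substring matching is an assumption of this sketch, not pgn-extract's rule.
    """
    return all(wanted in tags.get(name, "") for name, wanted in criteria.items())


game = (
    '[Event "Example"]\n'
    '[White "Fischer, Robert J."]\n'
    '[Black "Spassky, Boris V."]\n'
    '[Result "1-0"]\n'
    "\n"
    "1. e4 e5 1-0\n"
)
tags = parse_tags(game)
print(matches_criteria(tags, {"White": "Fischer", "Result": "1-0"}))
```

Criteria such as positions, fuzzy board matches or material balance need a full move generator and are well beyond a sketch of this size.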

Over more than 30 years of ongoing development, it has also gained many features for controlling what is output, e.g., different algebraic formats, EPD, omitting move numbers and restricting game length.

The program includes a semantic analyser that reports errors in game scores, and it can also detect duplicate games in its input files.
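
One simple way to think about duplicate detection is to fingerprint each game's main line and look for repeats. The Python sketch below does this under stated assumptions: the movetext has already had comments and variations removed, move numbers and results are ignored, and two games count as duplicates only if their move tokens are identical. The names `fingerprint` and `find_duplicates` are hypothetical, and pgn-extract's own method may well differ.

```python
import hashlib
import re

MOVE_NUMBER = re.compile(r"\b\d+\.+")          # "1." or "1..."
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}       # game termination markers


def fingerprint(movetext):
    """Hash the main-line move tokens, ignoring move numbers and the result."""
    without_numbers = MOVE_NUMBER.sub(" ", movetext)
    tokens = [tok for tok in without_numbers.split() if tok not in RESULTS]
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def find_duplicates(movetexts):
    """Return (first_index, duplicate_index) pairs for repeated games."""
    first_seen = {}
    duplicates = []
    for index, movetext in enumerate(movetexts):
        key = fingerprint(movetext)
        if key in first_seen:
            duplicates.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return duplicates


games = [
    "1. e4 e5 2. Nf3 Nc6 1-0",
    "1.e4 e5 2.Nf3 Nc6 *",       # same moves, different layout and result
    "1. d4 d5 1/2-1/2",
]
print(find_duplicates(games))
```

Only one fingerprint per distinct game is kept, so memory grows with the number of distinct games rather than with the total size of the input.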

The range of input move formats accepted is fairly wide. The output is normally in English Standard Algebraic Notation (SAN) but this can be varied to long-algebraic or UCI, for instance.

Extracted games may be written out either including or excluding comments, NAGs, variations, move numbers, tags and/or results. Games may be given ECO classifications derived from the accompanying file eco.pgn, or a customised version provided by the user.
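
As an illustration of what excluding comments, NAGs and variations means for a game's movetext, here is a short Python sketch. It is an assumption-laden illustration rather than pgn-extract's implementation: the function name `strip_annotations` is hypothetical, and it handles only brace comments, parenthesised variations and numeric `$` NAGs, not rest-of-line `;` comments.

```python
import re


def strip_annotations(movetext):
    """Remove {comments}, (variations, possibly nested) and $NAGs from movetext."""
    kept = []
    depth = 0            # nesting depth of ( ... ) variations
    in_comment = False   # inside a { ... } comment (these do not nest in PGN)
    for ch in movetext:
        if in_comment:
            if ch == "}":
                in_comment = False
        elif ch == "{":
            in_comment = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif depth == 0:
            kept.append(ch)   # keep only main-line characters
    text = re.sub(r"\$\d+", " ", "".join(kept))  # numeric annotation glyphs, e.g. $1
    return " ".join(text.split())                 # collapse runs of whitespace


annotated = "1. e4 $1 {best by test} e5 (1... c5 {Sicilian} 2. Nf3 (2. c3)) 2. Nf3 Nc6 1-0"
print(strip_annotations(annotated))  # prints: 1. e4 e5 2. Nf3 Nc6 1-0
```

Comments are checked before variations so that a parenthesis inside a comment does not disturb the variation depth.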

The program is designed to be relatively memory-friendly, so it does not retain a game's moves in memory once it has been processed. This also makes it suitable for bulk processing of very large collections of games: it can efficiently process files containing several million games.
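
The general pattern of handling one game at a time, and then letting it go, can be sketched in Python with a generator. This is only an illustration of the streaming idea and not pgn-extract's C code: the name `iter_games` is hypothetical, and the sketch assumes that a line starting with `[` after some movetext always begins the next game's tag section, which would not hold for a multi-line comment with a line that starts with `[`.

```python
import io


def iter_games(stream):
    """Yield one game's text at a time from a PGN text stream."""
    lines = []
    seen_moves = False  # True once the current game has left its tag section
    for line in stream:
        starts_tag = line.startswith("[")
        if starts_tag and seen_moves:
            # A tag line after movetext marks the start of the next game.
            yield "".join(lines)
            lines, seen_moves = [], False
        if line.strip() and not starts_tag:
            seen_moves = True
        lines.append(line)
    if any(item.strip() for item in lines):
        yield "".join(lines)  # the final game in the stream


sample = io.StringIO(
    '[Event "A"]\n[Result "1-0"]\n\n1. e4 e5 1-0\n\n'
    '[Event "B"]\n[Result "0-1"]\n\n1. d4 d5 0-1\n'
)
# Only the game currently being examined is held in memory.
print(sum(1 for _ in iter_games(sample)))  # prints: 2
```

With a real file, passing the open file object in place of `sample` keeps memory use proportional to the largest single game rather than to the size of the file.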

Use the --help argument to get the full list of arguments.

Most recent changes

These are the main changes in this version:

  • bug fix for repetition detection (--repetition);

  • find games in which the winner is the lower-rated or the higher-rated player (--lowerratedwinner and --higherratedwinner);

  • extend -v to match move sequences anywhere in a game, not just at the start (--vanywhere);

  • find games played at odds (--odds).

Available Files

You can take a copy of the full source and documentation as either pgn-extract-25-01.tgz or pgn-extract-25-01.zip. A Windows 64-bit binary is also available.

| Name | Description | Size | Date |
|---|---|---|---|
| pgn-extract-25-01.tgz | GZipped tar file of the complete source of the latest version of the program. Includes usage documentation, Makefile for compilation and eco.pgn file for ECO classification. | 498K bytes | 08 Jan 2025 |
| pgn-extract-25-01.zip | Zipped file of the complete source of the latest version of the program. Includes usage documentation, Makefile for compilation and eco.pgn file for ECO classification. | 641K bytes | 08 Jan 2025 |
| pgn-extract.exe | Windows 64-bit binary of the latest version of the program. | 2.4M bytes | 08 Jan 2025 |
| eco.zip | Zipped version of eco.pgn. | 32K bytes | |
| eco.pgn | File of openings with ECO classifications, in PGN format. This file is already included in the source archives. | 254K bytes | |
| COPYING | GNU General Public License (version 3). | 35K bytes | |

Blog post about data mining with pgn-extract

In October 2018 I wrote a blog post about using pgn-extract to mine a PGN database. As an example, it looks at the effect of having a bishop pair versus a knight pair.

Answers on Chess StackExchange using pgn-extract

I am active on Chess StackExchange as kentdjb and aim to respond to pgn-extract related questions, although emailing me is my preferred way for potential issues with the program to be raised.

From time to time, I have provided answers to questions that involve the use of pgn-extract for analysis tasks.

Feedback

Feedback and suggestions for further features are always welcome, although I can't always promise to undertake significant development work.