Notification while reading files #16

Open
mwo-dk opened this issue Aug 28, 2015 · 0 comments

mwo-dk commented Aug 28, 2015

Hi,

Thanks again. Great library. As you know, I like to read and do "stuff" with large files. I would like to read a file and optionally get progress reports. What would be really nice is if the file reader could notify the caller, e.g. via an event, as each individual game finishes parsing.

I tried this with the million-game database (about 2.2M games) that I mentioned in the large-file request. I did something along the lines of:

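using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
// Namespaces of the PGN library (assumed to be ilf.pgn, as in pgn.net).
using ilf.pgn;
using ilf.pgn.Data;

// Compact record of one game: player ids plus the result.
[Serializable]
public struct PGNGame
{
    public int White { get; set; }
    public int Black { get; set; }
    public GameResult Result { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        int gamesRead = 0;
        int maxPlayerId = 0;
        int maxGameId = 0;
        ConcurrentDictionary<int, string> playerBase =
            new ConcurrentDictionary<int, string>();
        ConcurrentDictionary<int, PGNGame> gameBase =
            new ConcurrentDictionary<int, PGNGame>();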

        var reader = new PgnReader();

        var start = DateTime.Now;

        var parsedGames = new BlockingCollection<Game>();
        var queue = new BlockingCollection<List<string>>();
        Task.Run(() =>
        {
            foreach (var gameData in queue.GetConsumingEnumerable())
            {
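                // Join with newlines: the lines were trimmed, so plain
                // concatenation would fuse tokens across line breaks.
                var data = string.Join("\n", gameData);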
                var games = reader.ReadFromString(data);
                foreach (var game in games.Games)
                    parsedGames.Add(game);
                if (parsedGames.Count > 100000)
                    Thread.Sleep(500);
            }
        });
        Task.Run(() =>
        {
            foreach (var parsedGame in parsedGames.GetConsumingEnumerable())
            {
                Task.Run(() =>
                {
                    var white = parsedGame.WhitePlayer;
                    var black = parsedGame.BlackPlayer;

                    int whiteId, blackId;
                    // The check-then-add must be atomic; otherwise two tasks can
                    // register the same player under different ids.
                    lock (playerBase)
                    {
                        if (!playerBase.Values.Any(name => name == white))
                        {
                            playerBase[Interlocked.Increment(ref maxPlayerId)] = white;
                        }
                        if (!playerBase.Values.Any(name => name == black))
                        {
                            playerBase[Interlocked.Increment(ref maxPlayerId)] = black;
                        }

                        whiteId = playerBase.First(kvp => kvp.Value == white).Key;
                        blackId = playerBase.First(kvp => kvp.Value == black).Key;
                    }
                    gameBase[Interlocked.Increment(ref maxGameId)] = new PGNGame
                    {
                        White = whiteId,
                        Black = blackId,
                        Result = parsedGame.Result
                    };
                    if (maxGameId % 100 == 0)
                    {
                        var now = DateTime.Now;
                        var secs = (now - start).TotalSeconds;
                        var speed = maxGameId / secs;
                        var estimated = 2200000 / speed;
                        Console.WriteLine("Games read: {0}", gamesRead);
                        Console.WriteLine("Queue length: {0}/{1}", queue.Count, parsedGames.Count);
                        Console.WriteLine("#Players: {0}. #Games: {1}. Speed: {2}. Estimated duration: {3}.",
                            maxPlayerId, maxGameId, speed, estimated);
                    }
                });

            }
        });
        using (var streamReader = new StreamReader("millionbase-2.22.pgn"))
        {
            var pgn = new List<string>();
            while (!streamReader.EndOfStream)
            {
                var line = streamReader.ReadLine().Trim();
                var isNewGame = line.StartsWith("[Event");
                if (isNewGame && pgn.Any())
                {
                    queue.Add(pgn);
                    Interlocked.Increment(ref gamesRead);
                    pgn = new List<string>();
                    if (queue.Count > 10000)
                        Thread.Sleep(500);
                }
                pgn.Add(line);
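            }
            // Queue the final game: no later "[Event" line exists to flush it.
            if (pgn.Any())
            {
                queue.Add(pgn);
                Interlocked.Increment(ref gamesRead);
            }
        }
        // Tell the parsing task that no more input is coming.
        queue.CompleteAdding();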
    }
}
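
A side note on the Thread.Sleep throttling in the listing above: BlockingCollection<T> can be created with a bounded capacity, in which case Add() blocks while the queue is full, so the producer is slowed down without polling Count. The following is only a minimal sketch of that idea, not a drop-in replacement for the program above. The class name BoundedPipelineSketch and the parameters capacity and work are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedPipelineSketch
{
    // Pushes every item through a bounded queue to a single consumer task and
    // returns what the consumer produced. `capacity` and `work` are
    // illustrative parameters, not part of any library.
    public static List<TOut> Run<TIn, TOut>(
        IEnumerable<TIn> source, int capacity, Func<TIn, TOut> work)
    {
        var results = new List<TOut>();

        // With a bounded capacity, Add() blocks while `capacity` items are
        // pending, so the producer is throttled without Thread.Sleep.
        using (var queue = new BlockingCollection<TIn>(capacity))
        {
            var consumer = Task.Run(() =>
            {
                // Ends once CompleteAdding() was called and the queue is empty.
                foreach (var item in queue.GetConsumingEnumerable())
                    results.Add(work(item));
            });

            try
            {
                foreach (var item in source)
                    queue.Add(item);
            }
            finally
            {
                // Always release the consumer, even if the producer throws.
                queue.CompleteAdding();
            }

            // Wait so the caller does not return before all items are processed.
            consumer.Wait();
        }
        return results;
    }
}
```

For example, BoundedPipelineSketch.Run(lines, 10000, parse) would keep at most 10000 items pending, where lines and parse stand for whatever input sequence and per-item function are in use. The sketch does not handle a consumer that throws while the producer is blocked in Add().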

With my program above I can sustain 300-450 games per second. As you can see, I use a producer-consumer scheme and "throttle" to keep the blocking queues manageable. But I cheat... The line:

var isNewGame = line.StartsWith("[Event");

is there only because I know that every game in that PGN file is decently formatted. As we all know, that is not guaranteed by the rather loose PGN input format.
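
To illustrate what a slightly less fragile boundary check could look like, here is a rough sketch that does not depend on the first tag being "[Event". It assumes a game consists of tag lines followed by movetext, and treats any tag line that appears after movetext as the start of the next game. The name PgnSplitterSketch is made up, and this is still a heuristic: a comment or annotation line inside the movetext that happens to start with "[" would be misread as a new game, and a game with tags but no movetext would be merged into the next one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PgnSplitterSketch
{
    // Yields the raw text of one game at a time. Heuristic: a tag line ("[...")
    // that appears after movetext has been seen starts the next game.
    public static IEnumerable<string> SplitGames(TextReader input)
    {
        var current = new List<string>();
        bool seenMovetext = false;
        string raw;
        while ((raw = input.ReadLine()) != null)
        {
            var line = raw.Trim();
            bool isTag = line.StartsWith("[");
            if (isTag && seenMovetext)
            {
                // The previous game is complete; hand it out and start over.
                yield return string.Join("\n", current);
                current.Clear();
                seenMovetext = false;
            }
            else if (!isTag && line.Length > 0)
            {
                // Any non-empty, non-tag line counts as movetext.
                seenMovetext = true;
            }
            current.Add(line);
        }

        // Flush the last game, which has no following tag line to trigger it.
        if (current.Exists(l => l.Length > 0))
            yield return string.Join("\n", current);
    }
}
```

Each yielded string could then be handed to reader.ReadFromString, as in my program, one game at a time.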

Is this a feasible thing to get?

Very much appreciated anyways,

Thanks for the good work,
Michael
