All of this ConnectiCon panel data sharing by Rym is inspiring me to stop shit talking.
I have a complete SQL dump of the PAX Tabletop library, going back several years. I grabbed it in hopes of doing some crazy data science experiments, maybe even as ambitious as crawling BoardGameGeek to pull down meta data for each title in the PAX library such as rating, length, weight, mechanics, etc.
What sort of questions do people have that could be answered with such a dataset? Give me something to work for! Off the bat, I'm curious to see how a few things trend from PAX to PAX: number of checkouts, checkouts per person, etc. If I can pull that off w/o falling back into shit-talking, I want to run some metrics that show how the tastes of PAX attendees have changed over time, and how they differ between Prime/East.
I'm not going to attempt comparing popularity of specific games, b/c it is relative to the composition of the library. I can't remember how fine the data is there. I know it shows all games ever in the library, and whether they still exist or not, but I don't remember if I have the transaction data to see when they were added and when they were culled, so I couldn't tell you how many copies of what were present at a given PAX.
Comments
What games were "completely" checked out at any point during the con, and how many times this occurred. (Skipping games where there was only one actual copy). Obviously you couldn't do that with the data you have, but it would be interesting.
Average length of the checkout broken down by game. (That would be something that would get attention).
What are the data fields?
Raw Popularity
Most checked out. Least checked out. Percentage of the convention spent checked out. Compare this over each PAX to watch popularity change over time. Also, not just individual games, but categories of games. For example: Worker placement went down/up in popularity the past year. Maybe themes as well. Aggregate data of all zombie games.
True Playtimes
How long were games actually checked out for. What was the true playtime? Which games were checked out, but returned quickly and not actually played (overrated)? Which games were checked out rarely, but played for a long time when they were (underrated)?
Player Profiles
People who checked out game X also checked out game Y. What percent of people were hardcore, medium core, and casual? What percent of people spent all day in tabletop checking out tons of games, and what percent only passed through?
Convention Traffic
When was the busiest time for tabletop? (We already know its after Expo Hall closes). When is the quiet time? Do games change in popularity based on time of day and mood?
Differences of PAXes
Is there a difference between Prime, East, and AUS? Do tastes differ by geography? What about different years of PAX? You can probably detect things like games getting replaced. Did Apples2Apples die when CAH came out? Were there one-hit wonder games that were big at one PAX and then disappeared?
Purchasing Decisions
Which games does the library not have enough copies of? Which does it have too many copies of? If you were going to stock a game store, what should inventory be?
This is all great stuff! Gives me some goals to work towards. The idea here is for me to learn some new skills, or dust off and improve old ones.
I've already learned a bunch of stuff. I hadn't flexed some of these mental muscles in over 5 years, so it's already working. I did way more work than necessary just to get a crack at the data (I set up a Raspberry Pi with Debian Linux, a web server, PHP, and Postgres). Managed to screw it up just enough along the way that having to roll up my sleeves and fix the configuration taught me a bit. I plan to use other dusty skills to pull the queries together, do the visualizations, etc.