MN High School Tennis
This website is privately maintained. As such, we have no affiliation with the Minnesota State High School League (MSHSL), the Minnesota Tennis Coaches Association, or any other professional tennis body or commercial business. All rankings on the site are created by us.
As you can see, the site has no advertising or user fees, and we intend to keep it that way. We generate no revenue at all from these rankings and fund all server, data and programming expenses ourselves.
Over the past 3 years, we have evaluated many different algorithms to determine which would be best for ranking tennis matches. Some methods use only wins and losses (RPI, Colley, Elo, TrueSkill, Dominance Matrices), while others use scoring information (Massey, Bayesian Inference). Generally, using scoring data in addition to wins and losses results in greater accuracy.
For our site, accuracy is defined as the percentage of team or individual matches in which the outcome predicted by our rankings matches what actually happened during the season. Our team and individual ranking accuracies are posted beneath each rankings chart and are consistently between 90% and 95%. The remaining matches, where the predictions and the actual results disagree (violations), are a combination of true upsets and situations where the teams' rankings are so close to one another that the outcome is essentially a toss-up. Roughly half of these toss-up matches can be expected to result in violations.
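As a rough illustration of how an accuracy figure like this can be computed, the sketch below counts a match as correctly predicted when the side with the higher rating won, and lists the remaining matches as violations. The `(winner, loser)` tuple layout and the rule that a match between equally rated sides counts as a miss are assumptions for illustration, not the site's actual R code.

```python
from typing import Dict, List, Tuple

# Each match is recorded as (winner, loser); names are hypothetical.
Match = Tuple[str, str]


def prediction_accuracy(matches: List[Match], ratings: Dict[str, float]) -> float:
    """Fraction of matches in which the higher-rated side actually won."""
    if not matches:
        return 0.0
    correct = sum(1 for winner, loser in matches if ratings[winner] > ratings[loser])
    return correct / len(matches)


def violations(matches: List[Match], ratings: Dict[str, float]) -> List[Match]:
    """Matches won by the lower-rated (or equally rated) side."""
    return [(w, l) for w, l in matches if ratings[w] <= ratings[l]]
```

For example, with ratings A = 90, B = 75, C = 60 and results A over B, B over C, C over A and A over C, three of the four matches are predicted correctly (75%), and the single violation is C's upset of A.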
The Team Rankings use scoring information to determine dominance (Massey, Stan). Every single game won or lost in each match counts. So win as many games as possible and lose as few games as possible. And win as many matches as possible. And strength of schedule counts as well.
The Player Rankings use wins and losses only, and it basically comes down to how well you play similarly ranked players. Beating better players will increase your skill level Mu. Playing often and playing similarly ranked players will cause your Sigma to decrease. Remember that the Leaderboard ranking is calculated as:
LB (Skill Rating*) = Mu - 2*Sigma
So beating good players (increasing your Mu) and playing similarly ranked players often (decreasing your Sigma) will help you achieve your highest ranking.
The biggest problem with ranking the Minnesota High School tennis seasons is the large number of matches where highly ranked teams are playing significantly weaker teams, resulting in blowouts. These matches are no fun for either team and there is little to no information contained in a match where one team blows out another. That is why blowouts between mismatched teams are weighted significantly lower for the Massey method.
* Skill Rating is simply the Leaderboard (LB) ratings rescaled to a range of 0-100.
All of the rankings are computed using advanced mathematical algorithms that are freely available on the internet. The methods themselves are unbiased and treat all teams equally with respect to their season play. If there is any "bias" present, it would affect all teams equally.
We wanted to provide an advanced, objective ranking system for the entire Minnesota high school tennis community. We rate every team and every player using the match data from the current tennis season.
We are not trying to recreate the UTR or USTA rankings for the top players, as these are separate ratings based on tournament match play, with a significantly greater number of matches against opponents closer to each player's ability level. Our rankings are based only on match play that occurs in conjunction with the MSHSL tennis season. However, our team and player rankings are fairly accurate for the matches played during the high school tennis seasons. Whether or not our player rankings agree with UTR or USTA rankings depends largely on the quality of match play during the high school tennis season (see "How accurate are the rankings?") and on how many matches are played (some teams play 3x as many matches as other teams).
This site is not intended to be used by college coaches for evaluating the top players. The level of competition varies significantly among the top players, mainly due to suboptimal match scheduling practices. However, the TrueSkill algorithm does a pretty good job of ranking the top players appropriately, especially during competition at the state tournament, when the top players are more evenly matched.
The hardware and software used are as follows:
All calculations are performed on an Apple iMac Pro (2017) using 8 cores.
The base software we are using is R (The R Project for Statistical Computing), an open source, functional programming language that is well suited for statistical analysis, and RStudio, which is an integrated development environment (IDE) specifically designed for R. Most of our code is custom written, although some of it has been written elsewhere in the R community.
The TrueSkill rankings are computed using Julia, a flexible dynamic language well suited to scientific and numerical computing. Using R, the TrueSkill rankings typically take about 3 hours to process over 13,000 matches for approximately 2,400 players (a typical girls' tennis season). Using Julia, this task is reduced to approximately 30 seconds.
The rankings software currently consists of 21 separate programs which are run consecutively each morning to download and clean the data, calculate the various rankings and statistics, format the results and upload the results to the web server. Much of this is automated and the only human interaction that is necessary is to manually correct scores, team names and player names that have been entered incorrectly on the TennisReporting.com website.
Note that the ratings and rankings that are listed on the website have all been generated by the computer. We do not manually adjust any ratings or rankings at all.
The Minnesota State High School League publishes the Competitive Sections list on their website prior to each season. We use that list to download match data from TennisReporting.com.
Note that some schools on the list may choose not to field a team in a particular season; those teams will not show up in the rankings because they have no match data on TennisReporting.com. A small number of teams are "coops", made up of players from multiple smaller schools. These coops are listed under the "host" school for that coop team.
All match data is downloaded from TennisReporting.com daily during each tennis season. After completion of the state tournament, we will wait until all of the results have been entered and will then perform a final download and calculate final rankings for the entire season.
Occasionally, as we improve our algorithms, we may recalculate prior season rankings so that all season rankings are based on the same algorithms.
The fundamental unit of play in our rankings is the singles or doubles match.
For Team ratings, the total games won for each winner and each loser of any particular match are summed to create the overall score. For a typical match score of 6-4, 6-1, our rankings calculate a combined score of 12-5. The score differential, defined as the winning player's score minus the losing player's score, is then calculated and used, although somewhat differently, for both the Massey and Stan rankings. In the Massey method, the score differential itself is used. For the Stan ratings, the square root of the score differential is used.
Note that it is entirely possible for the losing player to win more games than the winning player, such as with a match score of 7-5, 1-6, 6-4, for an overall score of 14-15. This results in a negative score differential that slightly favors the losing player. However, such cases are uncommon, usually arise only from 3-set matches, and amount to essentially a tie in the computer's calculations.
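The score arithmetic described above can be sketched as follows. Set scores are taken from the match winner's point of view. How negative differentials are handled before taking the square root for the Stan ratings is not stated here, so the sign-preserving square root below is an assumption, as are all function names.

```python
import math
from typing import List, Tuple

# Each set is (games won by the match winner, games won by the match loser).
SetScore = Tuple[int, int]


def combined_score(sets: List[SetScore]) -> Tuple[int, int]:
    """Total games for the match winner and the match loser across all sets."""
    return sum(w for w, _ in sets), sum(l for _, l in sets)


def score_differential(sets: List[SetScore]) -> int:
    """Winner's total games minus loser's total games; may be negative."""
    won, lost = combined_score(sets)
    return won - lost


def signed_sqrt(x: float) -> float:
    """Square root that keeps the sign (assumed handling of negative differentials)."""
    return math.copysign(math.sqrt(abs(x)), x)
```

With these definitions, 6-4, 6-1 gives a combined score of 12-5 and a differential of 7 (square root about 2.65), while 7-5, 1-6, 6-4 gives 14-15, a differential of -1 and a signed square root of -1.0.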
For Player (Singles and Doubles) ratings, the result of the match (win/loss) is used to modify player metrics used to describe each player's skill level (μ) and uncertainty (σ) in that skill level using Microsoft's TrueSkill algorithm.
Note that matches between two teams are traditionally scored by how many individual matches each team wins (based on 4 singles and 3 doubles matches). However, this method of scoring is not used by the computer algorithm, as it includes significantly less data than using the scores (games won and lost) for each match and fails to distinguish close matches from blowouts.
We have a separate program that is devoted entirely to score correction. Here's what it can do:
In some cases, the computer cannot correct an invalid score and those are set aside for manual correction. Many times we can look at all of the set scores in a match and quickly determine what was meant to be entered (e.g. a 61-0 score is most likely a 6-1 set). For those score errors where we cannot determine the original intent, we convert it to a closer valid score or simply leave it as is if deemed to be an insignificant error.
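To make the "61-0 is most likely 6-1" idea concrete, here is a minimal sketch of a set-score check with one simple repair rule. It assumes standard sets only (6-0 through 6-4, 7-5 or 7-6). It does not handle match tiebreaks (for example 10-8), and it would reject the legitimately incomplete final set of a retired match, so a real checker would need to treat those separately. The function names are hypothetical, and the site's actual correction program is not shown here.

```python
import re
from typing import Optional, Tuple


def is_valid_set(a: int, b: int) -> bool:
    """True for a standard completed set: 6-0..6-4, 7-5 or 7-6, in either order."""
    hi, lo = max(a, b), min(a, b)
    return (hi == 6 and lo <= 4) or (hi == 7 and lo in (5, 6))


def guess_set_score(text: str) -> Optional[Tuple[int, int]]:
    """Parse a set score such as '6-1'; try to repair fused digits such as '61-0'.

    Returns None when no valid reading is found, so the score can be set aside
    for manual correction.
    """
    m = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if not m:
        return None
    left, right = m.group(1), m.group(2)
    if is_valid_set(int(left), int(right)):
        return int(left), int(right)
    # A two-digit field may contain both games of the set, e.g. '61' -> 6-1.
    for field in (left, right):
        if len(field) == 2:
            x, y = int(field[0]), int(field[1])
            if is_valid_set(x, y):
                return x, y
    return None
```

For example, `guess_set_score("61-0")` returns `(6, 1)`, while `"6-5"` has no valid reading under these rules and returns `None`.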
Retirements ARE included in the rankings because they correspond to a match that was actually played, but not finished, and contain valid performance information.
Defaults ARE NOT included in the rankings since they correspond to matches that were never played and contain no information about the strength of either player.
Our software looks for exhibition matches and deletes them. Exhibition matches are usually designated as flights 5 and greater for singles and flights 4 and greater for doubles. The software also checks to make sure that these matches are not part of a singles or doubles tournament, which frequently results in higher flight numbers. All tournament matches are included in the rankings.
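A minimal sketch of the exhibition rule described above is shown below. The `MatchRecord` fields are hypothetical, and how the software actually recognizes tournament matches is not described here, so it is modeled as a simple flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchRecord:
    kind: str            # "singles" or "doubles" (hypothetical field)
    flight: int          # flight number as entered, e.g. 1 for #1 singles
    is_tournament: bool  # True for singles/doubles tournament matches


def is_exhibition(m: MatchRecord) -> bool:
    """Dual-meet matches at flight 5+ (singles) or 4+ (doubles) are exhibitions."""
    if m.is_tournament:
        return False  # tournament draws legitimately use higher flight numbers
    limit = 5 if m.kind == "singles" else 4
    return m.flight >= limit


def drop_exhibitions(matches: List[MatchRecord]) -> List[MatchRecord]:
    """Keep only the matches that count toward the rankings."""
    return [m for m in matches if not is_exhibition(m)]
```

Checking the tournament flag first means that a high flight number alone is never enough to discard a tournament match.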
Some schools along the MN border play matches with Wisconsin, Iowa and South Dakota high schools. These matches are deleted and not used in the rankings. This is because the computer algorithms need to calculate rankings for each school present in the overall match data. If we were to include out-of-state schools, we would have to calculate their rankings as well, which would require us to know their entire out-of-state schedule and rank all of their out-of-state opponents, and so forth. This is a common situation with most algorithms, and the workaround is to simply delete those matches. As a result, border schools may have fewer matches contributing to their rankings.
TennisReporting.com has a system in place whereby all matches need to be approved by both coaches before being finalized. This does create some delay in finalizing matches, although the intent is to reduce errors in the match data. At any one time, usually between 7% and 9% of all matches are "unapproved".
We still use these unapproved matches in the rankings, but isolate them first and then check them to make sure they include the minimum information necessary to use for the team rankings. However, there is still some manual effort required to add missing team names to a small fraction of these "unapproved" matches.
For our team ratings, we combine the results of 2 different polls to help increase our overall accuracy. For our purposes, accuracy is defined as the percentage of season matches correctly predicted (by win-loss only) by the ranking in question. Both the Massey and Stan ratings yield comparable accuracy, although you can still see differences in their respective rankings.
The Power 10 score is a separate metric that is based on the sum of the TrueSkill ratings for the top 4 singles players and the top 6 doubles players, similar to the Power 6 that is calculated for the college teams on the UTR website. This is described further in a separate question.
The Massey Rating model was published in 1997 by Dr. Kenneth Massey, in his honors thesis at Bluefield College. His technique was subsequently used as one of six computer polls in the BCS (Bowl Championship Series) selection system from 2004 through 2013.
The principle of the Massey Rating model is fairly simple. It is based on the assumption that the difference in ratings between 2 teams should be proportional to the difference in their scores if a game was played between the two teams. The derivation of the model is fairly straightforward using linear algebra and comes down to solving an (n x n) system of linear equations (n = number of teams) for n unknowns (the ratings vector). The method uses score differentials between the winners and the losers.
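For readers who want to see the linear algebra, here is a minimal sketch of the textbook Massey least-squares formulation as it is commonly described: each match contributes the equation r[winner] - r[loser] = score differential, the normal equations form the n x n Massey matrix, and because that matrix is singular, one equation is replaced by "ratings sum to zero". This is an illustration in Python with a hand-written solver (names are hypothetical), not the site's R implementation, and it assumes every team is connected to every other team through the schedule.

```python
from typing import Dict, List, Tuple

Game = Tuple[str, str, float]  # (winner, loser, score differential)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def massey(games: List[Game]) -> Dict[str, float]:
    """Least-squares ratings where r[winner] - r[loser] ~ score differential."""
    teams = sorted({t for w, l, _ in games for t in (w, l)})
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]  # Massey matrix (X^T X)
    p = [0.0] * n                      # summed score differentials (X^T y)
    for w, l, diff in games:
        i, j = idx[w], idx[l]
        M[i][i] += 1.0
        M[j][j] += 1.0
        M[i][j] -= 1.0
        M[j][i] -= 1.0
        p[i] += diff
        p[j] -= diff
    # M has rank n - 1, so replace the last equation with "ratings sum to 0".
    M[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))
```

With A beating B by 7 games, B beating C by 3 and A beating C by 10, the results are perfectly consistent, and the ratings come out as A = 17/3, B = -4/3, C = -13/3: exactly 7 and 3 apart, summing to zero.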
For those interested, here are a couple of links to descriptions of his method. Many more descriptions are available on the internet, including many variations that have been developed over the past 2 decades.
We first implement the basic, widely published Massey method, using score differentials to describe the difference in ability between the two players.
Using these base ratings, we then assign a weighting to each match for the entire season to date. The amount of weighting varies from 1.0 (for matches that occur between opponents who are ranked adjacent to one another - for example Teams #10 and #11) to almost 0.0 (this would be assigned to a match between the #1 team and the last place team). The amount of weighting varies linearly between these two extremes. The result is that the larger the difference in ranks between 2 teams, the less it will count for each team's rating/ranking. Conversely, the closer 2 teams are in rank, the more the match will count toward their final rating/ranking.
This makes intuitive sense, but there is also another reason to weight tennis matches: blowouts contain little information about the difference in true strength between the two teams. In tennis, the most lopsided possible score is 6-0, 6-0. The combined score of 12-0 does not represent the true difference in ability between the two players. If these two players were allowed to play longer, the winner would win significantly more games, whereas the loser would still be expected to win zero or, at most, a few games. So these matches are downplayed in the ratings/rankings by applying much lower weightings based on rank. However, if a lower-ranked team beats a higher-ranked team, the result is counted at full weighting (1.0).
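The weighting scheme above can be sketched as a weighted version of the Massey system. The exact formula is an assumption inferred from the description: with rank 1 as the best team and n teams, a match between ranks that are d apart gets weight 1 - (d - 1)/(n - 1), which is 1.0 for adjacent ranks and 1/(n - 1) (close to 0) for #1 against last place, while any upset counts at 1.0. The ranks are taken from the base ratings. All names are hypothetical, and this is not the site's own code.

```python
from typing import Dict, List, Tuple

Game = Tuple[str, str, float]  # (winner, loser, score differential)


def rank_weight(rank_winner: int, rank_loser: int, n_teams: int) -> float:
    """Linear weight from 1.0 (adjacent ranks) down to 1/(n-1) (#1 vs last).

    Rank 1 is the best team. Upsets (winner ranked below loser) count fully.
    """
    if rank_winner > rank_loser or n_teams < 2:
        return 1.0
    gap = max(rank_loser - rank_winner, 1)
    return 1.0 - (gap - 1) / (n_teams - 1)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_massey(games: List[Game], base: Dict[str, float]) -> Dict[str, float]:
    """Re-solve the Massey system with each match scaled by its rank weight."""
    teams = sorted(base)
    idx = {t: i for i, t in enumerate(teams)}
    order = sorted(teams, key=lambda t: base[t], reverse=True)
    rank = {t: k + 1 for k, t in enumerate(order)}
    n = len(teams)
    M = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for w, l, diff in games:
        wt = rank_weight(rank[w], rank[l], n)
        i, j = idx[w], idx[l]
        M[i][i] += wt
        M[j][j] += wt
        M[i][j] -= wt
        M[j][i] -= wt
        p[i] += wt * diff
        p[j] -= wt * diff
    M[-1] = [1.0] * n  # ratings still sum to zero
    p[-1] = 0.0
    return dict(zip(teams, solve(M, p)))
```

As an example, suppose A beats B by 2 games, B beats C by 2, and A blows out C by 12. The unweighted fit gives roughly A = 4.67, B = 0, C = -4.67. With those base ranks, the A-C blowout spans two places among three teams and gets weight 0.5, so the weighted fit moves A and C to +4 and -4.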
Stan is a software platform for performing Bayesian inference. It automates the process by handling the advanced statistical computations in the background, letting users focus their attention on building and testing Bayesian models for various processes.
Stan was named for Stanisław Marcin Ulam, a Polish scientist in the fields of mathematics and nuclear physics who was a pioneer of the Monte Carlo method of computation (used in Stan) and was involved in the Manhattan Project, among other things.
Stan was developed in the early 2010s and currently has thousands of users. There is a fairly steep learning curve, but the site provides ample documentation and case studies. The website for Stan is located here.
For those interested, there is a video of Dr. Andrew Gelman, professor of statistics and political science at Columbia University, discussing modeling the English Premier League (EPL) soccer season with Stan. The techniques and code he presents are very similar to the analysis we are using for our tennis ratings. The video can be found here, and the EPL ratings are discussed in the first 27 minutes.
A recent article online also discusses using Stan to calculate ratings for professional tennis players on the ATP based on match information from the 2019 tennis season. That article can be found here.
The Average Opponent Rating (AOR) is an easy calculation to perform, since the ratings for all players/teams are known. The AOR is simply the arithmetic average of the ratings of all opponents played in the season to date.
This is an important number to consider in the rankings, since a win-loss record means nothing unless you know the strengths of the opponents played. For our team rankings, the two most important numbers listed are the Total Games Win/Loss records and the Average Opponent Ratings.
For Player Rankings, the AOR is the average of all of the player's opponents' Skill Ratings.
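A minimal sketch of the AOR calculation is shown below. The text does not say whether an opponent faced twice is counted twice, so counting each match once (and therefore repeat opponents more than once) is an assumption, as are the function name and data layout.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def average_opponent_rating(
    name: str, matches: List[Tuple[str, str]], ratings: Dict[str, float]
) -> Optional[float]:
    """Mean rating of every opponent faced, counting an opponent once per match."""
    opponents = [b if a == name else a for a, b in matches if name in (a, b)]
    if not opponents:
        return None  # no matches played yet
    return mean(ratings[o] for o in opponents)
```

If A played B twice and C once, with B rated 60 and C rated 90, A's AOR is (60 + 90 + 60) / 3 = 70.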
Player skill metrics (Mu, Sigma) are calculated using Microsoft's TrueSkill algorithm and are displayed under "Player Rankings". These ratings are then used to create a Leaderboard rating (Mu - 2*Sigma), which is then mapped onto a scale ranging from 0 to 100 (the "Skill Rating"). The top 4 singles players' ratings and the top 6 doubles players' ratings for each team are added together to form the Power 10. Theoretically, the Power 10 could approach 1000 (10 players, each with a rating of 100), but in practice the best teams are in the 800s.
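The Power 10 sum itself is simple. The sketch below assumes each team's singles and doubles Skill Ratings are already available as lists of numbers, one per player. How the team-level doubles ratings are assigned to individual players for this purpose is not spelled out here, and using whatever players are available when a team has fewer than 4 or 6 is also an assumption.

```python
from typing import List


def power10(singles: List[float], doubles: List[float]) -> float:
    """Sum of the top 4 singles and top 6 doubles Skill Ratings (0-100 scale)."""
    top_singles = sorted(singles, reverse=True)[:4]
    top_doubles = sorted(doubles, reverse=True)[:6]
    return sum(top_singles) + sum(top_doubles)
```

For example, singles ratings of 90, 85, 80, 70 and 60 contribute 325 (the 60 is dropped), and doubles ratings of 75, 74, 73, 72, 71, 70 and 50 contribute 435, for a Power 10 of 760.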
This is similar to the Power 6 for college teams that is calculated on the UTR website. The Power 6 represents the sum of the UTRs for the top 6 players on each college team, since colleges typically play 6 singles matches and 3 doubles matches with the same 6 players.
The season rankings are based on all matches played throughout the season to date. While upsets do occur in the rankings, it is the entire season of play that determines the rankings rather than individual matches.
We use a variation of Microsoft's TrueSkill for player rankings. TrueSkill was developed by Microsoft Research in 2005 for matchmaking on its Xbox Live platform. The theory behind TrueSkill has been published and is available here.
One shortcoming of the original TrueSkill algorithm was that the ratings were sensitive to the order in which the matches were played. This was addressed by an enhanced algorithm called "TrueSkill Through Time", published in 2008, which is the implementation we use on our website. The published paper is available here.
Several additional enhancements were added in 2018, although many of the finer details are unknown and there are no coded versions available in the R community. The published paper can be seen here.
TrueSkill assigns a normal probability distribution to each player, described by its mean (mu, μ) and standard deviation (sigma, σ). The μ corresponds to the skill level of the player, while σ corresponds to the uncertainty in that skill level. Both μ and σ are updated for each player after each match. Stronger players have higher μ and weaker players have lower μ. The algorithm used to update μ and σ is fairly complex, but it produces very reasonable skill levels for the players.
In the chart below, you can see that Natalia has a higher skill level μ, but there is a fairly large uncertainty in this number because of her large σ. On the other hand, there is a much higher confidence in Eric's skill level μ since his σ is much smaller.
The current values of μ and σ for each player are provided in the player rankings table.
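To give a feel for how a single result moves μ and σ, here is a sketch of the basic online TrueSkill update for one two-player match with no draws, following the standard equations from the original TrueSkill paper. This is not the TrueSkill Through Time smoothing used on this site, which also revisits earlier matches. The performance noise `beta = 3.0` is a hypothetical value, chosen as half of the site's new-player σ of 6 by analogy with Microsoft's default of β = σ0/2, and the optional `tau` dynamics term is likewise illustrative.

```python
import math
from typing import Tuple

Rating = Tuple[float, float]  # (mu, sigma)


def pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def trueskill_update(winner: Rating, loser: Rating, beta: float = 3.0,
                     tau: float = 0.0) -> Tuple[Rating, Rating]:
    """One win/loss update (no draws) for two players' (mu, sigma)."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    var_w = s_w ** 2 + tau ** 2  # optional dynamics: skills may drift a little
    var_l = s_l ** 2 + tau ** 2
    c = math.sqrt(2 * beta ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)  # mean shift: large for upsets, small for expected wins
    w = v * (v + t)      # variance reduction factor, between 0 and 1
    new_w = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l
```

For two new players starting at (0, 6), the winner moves to roughly μ = +3.03 and the loser to μ = -3.03, and both σ values shrink from 6 to about 5.18. An upset (a lower-μ player beating a higher-μ player) moves μ more than an expected win does.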
For leaderboards (such as our Player Skill Ratings), Microsoft recommends using the following formula to rank players' skill levels:
Player Rating = μ - 3σ
This corresponds to a value 3 standard deviations below the player's mean rating μ, which results in a very conservative player rating. However, we have found that this tends to penalize players who have not played as many matches as other players (because their sigmas are higher), so we modify the formula as follows:
MNHSTennis.org Player Rating = μ - 2σ
This is applied to all players equally. Under TrueSkill's normal model, there is roughly a 97.7% probability that a player's true skill is at least as high as this rating. The player ratings are then rescaled from their original values to a range of 0-100, similar to the team ratings, and average opponent skill levels are also calculated.
Note that the leaderboard ranking may differ from the ranking based on μ alone, but remember that players with high σ have greater uncertainty in their μ values than players with lower σ. With continued match play, all sigmas should decrease over time, unless there are inconsistencies in play due to injury, etc. Sigma may also take a while to decrease if the player continues to play far inferior opponents (for example, a top player with repeated blowout matches against much weaker opponents).
In the example above, Eric's leaderboard rating is 26.07, compared to Natalia's rating of 15.09, despite Natalia having a higher mean skill level μ of 33.00 compared with Eric's mean skill level of 29.82.
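The leaderboard formula, the 0-100 rescaling and the probability behind the "2σ" choice can be sketched as follows. The text only says that ratings are "rescaled to a range of 0-100", so the min-max mapping (lowest player to 0, highest to 100) is an assumption, and the player names and values are made up.

```python
import math
from typing import Dict


def leaderboard(mu: float, sigma: float, k: float = 2.0) -> float:
    """Conservative rating: mean skill minus k standard deviations."""
    return mu - k * sigma


def rescale_0_100(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescaling (assumed): lowest rating -> 0, highest -> 100."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 100.0 for name in values}
    return {name: 100.0 * (v - lo) / (hi - lo) for name, v in values.items()}


def prob_skill_at_least(k: float) -> float:
    """P(true skill >= mu - k*sigma) under a normal belief N(mu, sigma^2)."""
    return 0.5 * (1 + math.erf(k / math.sqrt(2)))
```

With players at (μ, σ) = (20, 1.5), (12, 1.0) and (5, 6.0), the leaderboard values are 17, 10 and -7, which rescale to 100, about 70.8 and 0. `prob_skill_at_least(2)` is about 0.977, and with k = 3 (Microsoft's formula) it rises to about 0.9987.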
An excellent description of the algorithm is found here.
The original publication of the method by Microsoft can be found here.
TrueSkill evaluates each match played during the season to date and modifies each player's μ (mu, skill level) and σ (sigma, uncertainty in μ) based on the outcome of the match. TrueSkill uses only wins and losses and does not use actual scoring data. An extra step modifies these parameters to account for the order in which the matches were played. Consequently, the μ and σ values for each player are independent of the match order.
In order to increase the accuracy of early-season rankings, we import each returning player's μ (mu) and σ (sigma) from the end of the prior season as their initial skill level for the beginning of the new season. Players who are new in the current season are assigned a default skill level μ of 0 and an uncertainty σ of 6. By comparison, the top players have μ values of 15-20 and σ values of approximately 1.0 or lower.
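The season carry-over amounts to a lookup with a default. In the sketch below, returning players are matched by name purely for illustration; how players are actually matched across seasons is not described and would need more care in practice. The function and constant names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Rating = Tuple[float, float]  # (mu, sigma)
NEW_PLAYER_PRIOR: Rating = (0.0, 6.0)  # default mu and sigma for new players


def initial_ratings(roster: Iterable[str],
                    last_season: Dict[str, Rating]) -> Dict[str, Rating]:
    """Returning players start from last season's final (mu, sigma); others get the default."""
    return {player: last_season.get(player, NEW_PLAYER_PRIOR) for player in roster}
```

A returning player who ended last season at (17.5, 1.1) therefore starts this season there, while a newcomer starts at (0, 6) with much higher uncertainty.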
Doubles teams are also ranked using Microsoft's TrueSkill algorithm. However, there is a choice to be made as to whether or not each doubles player receives his/her own rating, or if the doubles team as a whole receives one rating, which is assigned to both players. We choose the latter for several reasons.
The game of doubles is strategy-oriented, and the success of a team depends not only on the skill levels of each player, but also on how evenly matched the partners are and how well they perform as a team rather than as two separate singles players. Players who have played many matches together generally have higher team skill levels, since they have learned to work together. However, doubles teams with a mismatch in player skill levels may perform poorly, since the opposing team always has the opportunity to hit the ball exclusively to the weaker player. And doubles partners who have not played together much in the past may perform at a lower level because they are not in sync on their doubles strategies.
TrueSkill is capable of combining singles and doubles play into a single player rating, which is what we did in the past. However, we found that the singles and doubles ratings were more accurate when we separated the singles and doubles matches and calculated separate ratings, which is why we now report separate singles and doubles rankings. That still left the choice of whether to rate each doubles player individually or to rate each doubles team exclusively. In the last couple of seasons, we found a few instances where a doubles team had played together for almost the entire season, but the players were split up for one match. One player won their match and the other lost theirs, and as a result the ratings of the two players diverged sharply, even though they had played almost all of their matches together. The sum of the two players' ratings also decreased, not because they had lost a match together, but because of the influence of playing with different partners for that one match.
The solution to this problem is to rate each doubles TEAM as its own entity, so that its rating is based solely on the matches the partners played together. That makes sense, since you would expect different combinations of doubles partners to yield different results against the same opponents. Therefore, we rate each unique team and provide a ranking of doubles teams, not individuals. Note that this may result in a player showing up several times in the doubles rankings, each time with a different doubles partner.
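One simple way to treat each doubles pairing as its own entity is to key ratings by the unordered pair of players, so that "Ann/Bea" and "Bea/Ann" refer to the same team while "Ann/Cy" is a different one. The sketch below uses a `frozenset` as that key and counts matches per team. It is an illustration with hypothetical names, not the site's code.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple


def team_key(p1: str, p2: str) -> FrozenSet[str]:
    """Identify a doubles team by its two players, regardless of entry order."""
    return frozenset((p1, p2))


def count_team_matches(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Number of matches played by each unique doubles pairing."""
    return Counter(team_key(a, b) for a, b in pairs)
```

A TrueSkill (μ, σ) for each team could then be stored in a dictionary under the same key, so a player who appears with two different partners has two separate doubles entries.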
Teams that play together often also accumulate more matches, so the uncertainty in their rating, σ (sigma), will be lower, and their leaderboard ranking will tend to be higher.
Microsoft provides a table that shows the minimum number of matches per player needed to identify the player's skill level:
Microsoft also states that the actual number of games needed may be up to 3 times higher depending on multiple factors, including availability of well-matched opponents and variations in performance per game.
When evaluating the player rankings early on in the season, it is important to keep these facts in mind: