The Irises of the Iris Data Set

Mention the "iris flower data set" to data scientists, and many will know what you're talking about. It's a set of 150 flowers, 50 from each of three iris species, with four measurements per flower (sepal length, sepal width, petal length, and petal width). It's often used as a sample data set for teaching machine learning models and concepts.

The R statistical package interface with a scatter plot of iris measurements
Data table and visualization of iris data set in R
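
For readers who have not opened the data set themselves, here is a minimal Python sketch of how its rows are laid out and how they can be summarized by species. The three rows are an excerpt (the first flower listed for each species in commonly distributed copies, such as the one bundled with R), and the names FEATURES, SAMPLE_ROWS, and mean_by_species are made up for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Column layout used by most copies of the data set (all values in cm).
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")

# Three-row excerpt, one flower per species. The full set has 50 per species.
SAMPLE_ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
]


def mean_by_species(rows):
    """Return {species: {feature: mean value}} for the given rows."""
    grouped = defaultdict(list)
    for *measurements, species in rows:
        grouped[species].append(measurements)
    return {
        species: {
            name: mean(column)
            for name, column in zip(FEATURES, zip(*values))
        }
        for species, values in grouped.items()
    }


summary = mean_by_species(SAMPLE_ROWS)
for species, stats in summary.items():
    print(species, stats)
```

With only one flower per species, each "mean" is simply that flower's measurement. Passing all 150 rows to the same function would give the per-species averages that scatter plots like the one above make visible.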

But most data scientists probably won't know the history of the data set or even what the irises look like.

I've done some research into the set and have found some interesting stories about its history, but for this article, I just want to introduce irises and the three species mentioned in the data set.

The Origins of the Data Set

In the 1930s, a botanist named Edgar Anderson took measurements of three species of irises, mostly on the Gaspé Peninsula of Québec, Canada, and from collections at Harvard and the Missouri Botanical Garden. These species were Iris versicolor, Iris virginica, and Iris setosa var. canadensis (now called Iris hookeri).

Sepia image of young man in glasses and suit and tie
R.A. Fisher (Wikipedia)

The famed statistician R.A. Fisher, who developed foundational statistical methods including the analysis of variance and the F-distribution that bears his initial, and who helped put Student's t-distribution (first published by William Sealy Gosset) into its modern form, used Anderson's set of iris measurements in his 1936 article "The use of multiple measurements in taxonomic problems." The set became known as "Fisher's Iris data set."

The Flowers

Iris versicolor

Iris versicolor (https://www.inaturalist.org/photos/207803529)

Iris versicolor is usually known as the "Northern Blue Flag," and its Latin, or Linnaean, name versicolor means "many colors." It's found throughout the Upper Midwest and Northeast of the United States as well as in the Canadian provinces that border these regions. It is the provincial flower of Québec, and it is sometimes cited as the state flower of Tennessee, although Tennessee's designation names the iris without specifying a species. Like most irises, Iris versicolor is poisonous and can cause severe intestinal distress. Despite its toxicity, some people put the iris in cash registers for financial good luck.

Iris virginica

Iris virginica (https://www.inaturalist.org/photos/135126643)

This iris is versicolor's southern counterpart and is known as the "Southern Blue Flag." It is found in the Midwest and Southern regions of the United States. Indigenous peoples have used the plant as a folk remedy, and the Seminole of Florida used it to treat shock from an alligator bite. Edgar Anderson identified this iris as a species separate from Iris versicolor.



Iris hookeri

Iris hookeri (https://www.inaturalist.org/photos/43964738)

This iris is listed as setosa in the data set, but the flower Anderson surveyed is now called Iris hookeri, named after the 19th-century botanist William Jackson Hooker. It is found along beaches in the northeastern United States and eastern Canada. Although this iris lacks the variety of colors that the other two display, Anderson states that it "makes much more of a showing in the landscape, for its flowers are raised well above the foliage...."

Thanks to Mara McHaffie, Kathleen Houlahan Chayer, and Étienne Lacroix-Carignan for their iris photos!
