The scikit-learn project started as scikits.learn, a Google Summer of Code project by French data scientistDavid Cournapeau. The name of the project stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed third-party extension to SciPy.[5]
The original codebase was later rewritten by other developers. In 2010, contributors Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort and Vincent Michel, from the French Institute for Research in Computer Science and Automation in Saclay, France, took leadership of the project and released the first public version of the library on February 1, 2010.[6] In November 2012, scikit-learn as well as scikit-image were described as two of the "well-maintained and popular" scikits libraries[update].[7] In 2019, it was noted that scikit-learn is one of the most popular machine learning libraries on GitHub.[8]
Implementation
scikit-learn is largely written in Python, and uses NumPy extensively for high-performance linear algebra and array operations. Furthermore, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR. In such cases, extending these methods with Python may not be possible.
scikit-learn integrates well with many other Python libraries, such as Matplotlib and plotly for plotting, NumPy for array vectorization, Pandas dataframes, SciPy, and many more.
Version history
scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later that year, Matthieu Brucher joined the project and started to use it as a part of his thesis work. In 2010, INRIA, the French Institute for Research in Computer Science and Automation, got involved and the first public release (v0.1 beta) was published in late January 2010.