Feature scaling

Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.

Motivation

Because the ranges of raw feature values can vary widely, the objective functions of some machine learning algorithms do not work properly without normalization. For example, many classifiers calculate the distance between two points as the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by that feature. The ranges of all features should therefore be normalized so that each feature contributes approximately proportionately to the final distance.
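The effect on Euclidean distance can be illustrated with a minimal sketch. The feature names, sample points, and assumed feature ranges below are hypothetical and chosen only for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical features: (annual income in dollars, age in years).
a = (50_000.0, 25.0)
b = (51_000.0, 65.0)  # similar income, very different age
c = (58_000.0, 26.0)  # different income, similar age

# On raw values the income axis dominates, so b looks closer to a than c does.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Assumed (hypothetical) minimum and maximum of each feature.
lo = (20_000.0, 18.0)
hi = (120_000.0, 90.0)

def rescale(p):
    """Map each feature to [0, 1] using the assumed ranges above."""
    return tuple((v - l) / (h - l) for v, l, h in zip(p, lo, hi))

# After rescaling, both features contribute, and c is now the nearer point.
scaled_ab = euclidean(rescale(a), rescale(b))
scaled_ac = euclidean(rescale(a), rescale(c))
```

On the raw values the 40-year age gap between a and b barely registers next to the income difference; after rescaling, the ordering of the two distances reverses.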

Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.[1]

Feature scaling is also important when regularization is part of the loss function, so that the coefficients of all features are penalized on a comparable scale.

Empirically, feature scaling can improve the convergence speed of stochastic gradient descent. In support vector machines,[2] it can reduce the time to find support vectors. Feature scaling is also often used in applications involving distances and similarities between data points, such as clustering and similarity search. As an example, the K-means clustering algorithm is sensitive to feature scales.

Methods

Rescaling (min-max normalization)

Also known as min-max scaling or min-max normalization, rescaling is the simplest method: it rescales the range of each feature to [0, 1] or [−1, 1]. The choice of target range depends on the nature of the data. The general formula for min-max scaling to [0, 1] is:[3]

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where $x$ is an original value and $x'$ is the normalized value. For example, suppose that we have data on students' weights, and the weights span [160 pounds, 200 pounds]. To rescale this data, we first subtract 160 from each student's weight and then divide the result by 40 (the difference between the maximum and minimum weights).
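The weight example can be sketched in a few lines of Python. The function name `min_max_scale`, the sample weights, and the handling of a constant feature are assumptions made for this illustration.

```python
def min_max_scale(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature makes the formula undefined (division by zero).
        # Returning zeros is one possible convention, assumed here.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical student weights in pounds, spanning [160, 200].
weights = [160, 170, 180, 200]
scaled = min_max_scale(weights)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Each weight has 160 subtracted and is then divided by 40, so the lightest student maps to 0 and the heaviest to 1.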

To rescale to an arbitrary range [a, b], the formula becomes:

$$x' = a + \frac{(x - \min(x))(b - a)}{\max(x) - \min(x)}$$

where $a$ and $b$ are the minimum and maximum of the target range.
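A minimal sketch of rescaling to an arbitrary target range [a, b] follows. The function name `rescale_to_range` and the sample data are hypothetical; the computation is a + (x − min)(b − a) / (max − min).

```python
def rescale_to_range(values, a, b):
    """Rescale values linearly so that min maps to a and max maps to b."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]

# Hypothetical student weights in pounds, mapped to [-1, 1].
weights = [160, 170, 180, 200]
scaled = rescale_to_range(weights, -1.0, 1.0)
print(scaled)  # [-1.0, -0.5, 0.0, 1.0]
```

Setting a = 0 and b = 1 recovers ordinary min-max scaling to [0, 1].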

Mean normalization

$$x' = \frac{x - \bar{x}}{\max(x) - \min(x)}$$

where $x$ is an original value, $x'$ is the normalized value, and $\bar{x} = \operatorname{average}(x)$ is the mean of that feature vector. Another form of mean normalization divides by the standard deviation instead of the range; this form is also called standardization.
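As a rough sketch, mean normalization subtracts the feature mean and divides by the feature range (maximum minus minimum). The function name `mean_normalize` and the sample data are assumptions for illustration.

```python
from statistics import fmean

def mean_normalize(values):
    """Center values on their mean and divide by the range (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    mean = fmean(values)
    return [(x - mean) / (hi - lo) for x in values]

# Hypothetical student weights in pounds; their mean is 177.5.
weights = [160, 170, 180, 200]
normalized = mean_normalize(weights)
print(normalized)  # [-0.4375, -0.1875, 0.0625, 0.5625]
```

The result has mean zero, and the difference between its largest and smallest values is exactly 1.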

Standardization (Z-score Normalization)

Figure: The effect of z-score normalization on k-means clustering. Four Gaussian clusters of points are generated and then squashed along the y-axis, and a clustering is computed. Without normalization, the clusters are arranged along the x-axis, since it is the axis with most of the variation. After normalization, the clusters are recovered as expected.

In machine learning, data come in many forms, such as audio signals or the pixel values of images, and may have many dimensions. Feature standardization gives the values of each feature zero mean (by subtracting the mean in the numerator) and unit variance. This method is widely used for normalization in many machine learning algorithms (e.g., support vector machines, logistic regression, and artificial neural networks).[4][5] The general procedure is to compute the mean and standard deviation of each feature, subtract the mean from each value of that feature, and then divide the mean-centered values by the feature's standard deviation.

$$x' = \frac{x - \bar{x}}{\sigma}$$

where $x$ is the original feature vector, $\bar{x} = \operatorname{average}(x)$ is the mean of that feature vector, and $\sigma$ is its standard deviation.
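The procedure can be sketched for a single feature as follows. The function name `standardize` and the sample data are hypothetical. The text does not say whether the population or the sample standard deviation is intended; this sketch assumes the population form (`statistics.pstdev`).

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mean) / std for x in values]

# Hypothetical feature values with mean 5 and population standard deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(values)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The standardized values have zero mean and unit variance. Using the sample standard deviation (`statistics.stdev`) instead would change the scores slightly for small samples.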

Robust Scaling

Robust scaling, also known as standardization using the median and interquartile range (IQR), is designed to be robust to outliers. It scales features using the median and IQR as reference points instead of the mean and standard deviation:

$$x' = \frac{x - Q_2(x)}{Q_3(x) - Q_1(x)}$$

where $Q_1(x)$, $Q_2(x)$, and $Q_3(x)$ are the three quartiles (25th, 50th, and 75th percentiles) of the feature.
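A minimal sketch of robust scaling is shown below. The function name `robust_scale` and the sample data are hypothetical. Quartile definitions differ between libraries; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`.

```python
from statistics import quantiles

def robust_scale(values):
    """Scale values as (x - median) / (Q3 - Q1)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero")
    return [(x - q2) / iqr for x in values]

# Hypothetical feature with one extreme outlier.
values = [1.0, 2.0, 3.0, 4.0, 1000.0]
scaled = robust_scale(values)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 498.5]
```

Here the quartiles are 2, 3, and 4, so the outlier does not affect how the other four points are scaled. Min-max scaling of the same data would compress those four points into a tiny interval near 0.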

Unit vector normalization

Unit vector normalization regards each individual data point as a vector and divides it by its vector norm to obtain $x' = x / \|x\|$. Any vector norm can be used, but the most common ones are the L1 norm and the L2 norm.

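For example, if $x = (v_1, v_2, v_3)$, then its Lp-normalized version is:

$$\left( \frac{v_1}{(|v_1|^p + |v_2|^p + |v_3|^p)^{1/p}}, \frac{v_2}{(|v_1|^p + |v_2|^p + |v_3|^p)^{1/p}}, \frac{v_3}{(|v_1|^p + |v_2|^p + |v_3|^p)^{1/p}} \right)$$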
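Lp normalization of a single data point can be sketched as follows. The function name `lp_normalize` and the example vector are assumptions for illustration.

```python
def lp_normalize(vector, p=2):
    """Divide a vector by its Lp norm, (sum of |v|^p) ** (1/p)."""
    norm = sum(abs(v) ** p for v in vector) ** (1.0 / p)
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in vector]

x = [3.0, 4.0]                # hypothetical data point
l2 = lp_normalize(x, p=2)     # L2 norm of x is 5
l1 = lp_normalize(x, p=1)     # L1 norm of x is 7
```

After L2 normalization the vector has Euclidean length 1; after L1 normalization the absolute values of its components sum to 1.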

References

  1. Ioffe, Sergey; Szegedy, Christian (2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". arXiv:1502.03167 [cs.LG].
  2. Juszczak, P.; Tax, D. M. J.; Duin, R. P. W. (2002). "Feature scaling in support vector data descriptions". Proc. 8th Annu. Conf. Adv. School Comput. Imaging: 25–30. CiteSeerX 10.1.1.100.2524.
  3. "Min Max normalization". ml-concepts.com. Archived from the original on 2023-04-05. Retrieved 2022-12-14.
  4. Grus, Joel (2015). Data Science from Scratch. Sebastopol, CA: O'Reilly. pp. 99, 100. ISBN 978-1-491-90142-7.
  5. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. ISBN 978-0-387-84884-6.
