AlphaGo Zero

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in Nature in October 2017 introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version.[1] By playing games against itself, AlphaGo Zero: surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0; reached the level of AlphaGo Master in 21 days; and exceeded all previous versions in 40 days.[2]

Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills, as expert data is "often expensive, unreliable, or simply unavailable."[3] Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge".[4] Furthermore, AlphaGo Zero performed better than standard deep reinforcement learning models (such as Deep Q-Network implementations[5]) due to its integration of Monte Carlo tree search. David Silver, one of the first authors of DeepMind's papers published in Nature on AlphaGo, said that it is possible to have generalized AI algorithms by removing the need to learn from humans.[6]
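To illustrate how tree search and the network fit together: in the AlphaGo Zero paper, each search simulation descends the tree by picking the move that maximizes a PUCT score. That score is the mean action value Q(s, a) plus an exploration bonus proportional to the policy head's prior P(s, a). The following is a minimal Python sketch of that selection step only. The class `Edge`, the function `select_action`, and the default `c_puct = 1.5` are hypothetical choices for illustration, not DeepMind's code or tuned values.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one (state, action) pair (hypothetical structure)."""
    prior: float            # P(s, a): probability from the policy head
    visits: int = 0         # N(s, a): how often the search took this action
    value_sum: float = 0.0  # W(s, a): sum of values backed up through it

    @property
    def q(self) -> float:
        # Mean action value Q(s, a) = W / N; unvisited edges count as 0 here
        # (a simplifying assumption; implementations differ on this choice).
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict[int, Edge], c_puct: float = 1.5) -> int:
    """Return the action maximising Q(s, a) + U(s, a), where
    U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)).
    Q values are assumed to be from the view of the player to move at s.
    c_puct = 1.5 is an illustrative default, not a published setting.
    """
    sqrt_total = math.sqrt(sum(e.visits for e in edges.values()))

    def puct_score(edge: Edge) -> float:
        u = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return edge.q + u

    return max(edges, key=lambda action: puct_score(edges[action]))


# Action 1 has a good average value, but action 0 has a larger prior and no
# visits yet, so its exploration bonus wins this round.
edges = {0: Edge(prior=0.6), 1: Edge(prior=0.3, visits=10, value_sum=8.0), 2: Edge(prior=0.1)}
print(select_action(edges))  # prints 0
```

Because the bonus shrinks as an action's visit count grows, repeated simulations shift attention from moves the network merely likes toward moves whose backed-up values hold up under search.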

DeepMind later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and shogi in addition to Go.[7] In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top shogi program (Elmo).[8][9]

Architecture

The network in AlphaGo Zero is a ResNet with two heads.[1]: Appendix: Methods 

  • The stem of the network takes as input a 17×19×19 tensor representation of the Go board.
    • 8 channels are the positions of the current player's stones from the last eight time steps (1 if there is a stone, 0 otherwise; if a time step falls before the beginning of the game, the channel is 0 in all positions).
    • 8 channels are the positions of the other player's stones from the last eight time steps.
    • 1 channel is all 1 if black is to move, and 0 otherwise.
  • The body is a ResNet with either 20 or 40 residual blocks and 256 channels.
  • There are two heads, a policy head and a value head.
    • The policy head outputs a logit array of size 19 × 19 + 1 = 362, representing the logit of playing at each of the 361 points, plus the logit of passing.
    • The value head outputs a number in the range [-1, +1], representing the expected score for the current player: -1 means the current player loses, and +1 means the current player wins.
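A minimal NumPy sketch of how the input planes and policy output described above might be laid out follows. Several details are assumptions for illustration: the board encoding (+1 black, -1 white, 0 empty), the plane ordering (current player's eight planes, then the opponent's eight, then the colour plane), the row-major mapping from policy index to board point, and the names `encode_position` and `policy_index_to_move`.

```python
import numpy as np

BOARD_SIZE = 19
HISTORY = 8                      # time steps kept per player
PASS_INDEX = BOARD_SIZE ** 2     # policy index 361 stands for "pass"


def encode_position(history: list, to_move: int) -> np.ndarray:
    """Build a 17x19x19 input tensor from recent board positions.

    history: 19x19 integer arrays, oldest first, most recent last, using the
             hypothetical convention +1 = black stone, -1 = white, 0 = empty.
    to_move: +1 if black is to move, -1 if white is to move.
    """
    planes = np.zeros((2 * HISTORY + 1, BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    # Newest position first; time steps before the game began are absent,
    # so their planes simply stay all-zero.
    recent = history[::-1][:HISTORY]
    for t, board in enumerate(recent):
        planes[t] = (board == to_move)             # current player's stones
        planes[HISTORY + t] = (board == -to_move)  # opponent's stones
    planes[2 * HISTORY] = 1.0 if to_move == 1 else 0.0  # colour plane
    return planes


def policy_index_to_move(index: int):
    """Map a policy-head index 0..361 to (row, col), or "pass" for 361."""
    if index == PASS_INDEX:
        return "pass"
    return divmod(index, BOARD_SIZE)  # row-major order (an assumption)


# Usage: a single position with one black stone, black to move.
board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=int)
board[3, 3] = 1
x = encode_position([board], to_move=1)
print(x.shape)                    # prints (17, 19, 19)
print(policy_index_to_move(361))  # prints pass
```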

Training

AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero perceived only the stones on the board, rather than also being given human-programmed features to help it recognize rare, unusual board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.[10] In the first three days, AlphaGo Zero played 4.9 million games against itself in quick succession.[11] It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to reach the same level.[12]
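In the Nature paper, each self-play position yields two training targets: the move probabilities pi produced by the tree search (proportional to visit counts) and the final game result z. The network's outputs (p, v) are fitted to these with the loss l = (z - v)^2 - pi^T log p + c * ||theta||^2. A hedged NumPy sketch of that per-position loss follows. The function names and the default `c = 1e-4` are assumptions, and a real implementation would compute this in batches inside a framework such as TensorFlow, with automatic differentiation.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log of the softmax of a logit vector."""
    shifted = logits - logits.max()  # subtract max to avoid overflow in exp
    return shifted - np.log(np.exp(shifted).sum())


def alphago_zero_loss(policy_logits, value, search_policy, outcome, weights, c=1e-4):
    """Per-position loss l = (z - v)^2 - pi . log p + c * ||theta||^2.

    policy_logits: (362,) raw policy-head output (361 points + pass)
    value:         v, the value-head output in [-1, 1]
    search_policy: (362,) target pi, the normalised search visit counts
    outcome:       z, the final game result (+1 win, -1 loss) for the player to move
    weights:       list of parameter arrays for the L2 penalty
    c:             L2 coefficient (the default here is an assumption)
    """
    value_loss = (outcome - value) ** 2                                # squared error on v
    policy_loss = -np.dot(search_policy, log_softmax(policy_logits))   # cross-entropy vs pi
    l2_penalty = c * sum(float(np.sum(w ** 2)) for w in weights)       # weight decay
    return float(value_loss + policy_loss + l2_penalty)


# Usage: an untrained network (uniform policy, v = 0) on a position that was won.
logits = np.zeros(362)
pi = np.full(362, 1 / 362)
print(alphago_zero_loss(logits, value=0.0, search_policy=pi, outcome=1, weights=[]))
```

The value term teaches the network to predict who wins, and the policy term pulls its raw move probabilities toward the sharper distribution the search found, which is what lets the network "anticipate its own moves".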

Training required about 3×10^23 floating-point operations (FLOPs), roughly ten times as much as AlphaZero.[13]

For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run.[14] DeepMind submitted its initial findings in a paper to Nature in April 2017, which was then published in October 2017.[1]

Hardware cost

The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million.[15]

Applications

According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding (see AlphaFold) or accurately simulating chemical reactions.[16] AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car.[17] DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings.[18][19]

Reception

AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in "both their ability to do it—and their ability to train the system in 40 days, on four TPUs".[10] The Guardian called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it an impressive feat and an "outstanding engineering accomplishment", respectively.[17] Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".[20]

Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".[11]

In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn’t perfect, and I believe that’s why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he would have to wait and see, but added that it would affect young Go players. Mok Jin-seok, who directs the South Korean national Go team, said the Go world had already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and that he hoped new ideas would come out of AlphaGo Zero. Mok added that general trends in the Go world were now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I’ve become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It’s now between computers." Mok had reportedly already begun analyzing AlphaGo Zero's playing style with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said.[21]

Chinese Go professional Ke Jie commented on the new program's accomplishments: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."[22]

Comparison with predecessors

Configuration and strength[23]
Version | Playing hardware[24] | Elo rating | Matches
AlphaGo Fan | 176 GPUs,[2] distributed | 3,144[1] | 5:0 against Fan Hui
AlphaGo Lee | 48 TPUs,[2] distributed | 3,739[1] | 4:1 against Lee Sedol
AlphaGo Master | 4 TPUs,[2] single machine | 4,858[1] | 60:0 against professional players; Future of Go Summit
AlphaGo Zero (40 days) | 4 TPUs,[2] single machine | 5,185[1] | 100:0 against AlphaGo Lee; 89:11 against AlphaGo Master
AlphaZero (34 hours) | 4 TPUs, single machine[8] | 4,430 (est.)[8] | 60:40 against a 3-day AlphaGo Zero
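Elo differences translate into expected scores. Under the standard base-10 Elo model, a player rated R_A is expected to score 1 / (1 + 10^((R_B - R_A) / 400)) against a player rated R_B. The sketch below applies that formula to ratings from the table as a rough consistency check against the reported match results. It assumes DeepMind's ratings follow this convention; their internal rating fits may use a different scale, so the numbers are indicative only. The function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (standard base-10 Elo model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings copied from the table above.
ratings = {
    "AlphaGo Lee": 3739,
    "AlphaGo Master": 4858,
    "AlphaGo Zero (40 days)": 5185,
}
zero = ratings["AlphaGo Zero (40 days)"]
vs_master = elo_expected_score(zero, ratings["AlphaGo Master"])
vs_lee = elo_expected_score(zero, ratings["AlphaGo Lee"])
print(f"predicted vs Master: {vs_master:.3f}  (observed 89/100 = 0.890)")
print(f"predicted vs Lee:    {vs_lee:.4f}  (observed 100/100 = 1.000)")
```

With a 327-point gap, the formula predicts that AlphaGo Zero scores about 0.87 against AlphaGo Master, close to the observed 89:11. The 1,446-point gap to AlphaGo Lee predicts a near-certain win, consistent with 100:0.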

AlphaZero

On 5 December 2017, the DeepMind team released a preprint on arXiv introducing AlphaZero, a program that generalized AlphaGo Zero's approach and achieved a superhuman level of play in chess, shogi, and Go within 24 hours, defeating the world-champion programs Stockfish and Elmo and the 3-day version of AlphaGo Zero, respectively.[8]

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:[8]

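  • AZ uses hard-coded search hyperparameters shared across all games, whereas AGZ tuned its search hyperparameters by Bayesian optimization.
  • The neural network is updated continually, rather than being replaced only when a newly trained network beats the current best one in evaluation games, as in AGZ.
  • Chess (unlike Go) can end in a draw; therefore AZ estimates the expected game outcome, taking the possibility of a draw into account, rather than only the probability of winning.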
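Two of these differences can be sketched in a few lines of Python. The first function shows how game results might become value-head targets when draws are allowed: a draw yields 0 for every position. The second shows the gating step that AGZ used and AZ removed; according to the AGZ paper, a candidate network had to win more than 55% of 400 evaluation games to replace the current best network. Function names and the perspective convention are hypothetical.

```python
def value_targets(result: int, first_player_to_move: list) -> list:
    """Value-head target z for every position of one finished game.

    result: +1 if the first player won, -1 if the second player won, 0 for a draw.
            AGZ's setting assumes result is never 0; AZ also allows draws.
    first_player_to_move: for each position, True if the first player was to move.
    Targets are from the view of the player to move (an assumed convention).
    """
    return [float(result * (1 if mover else -1)) for mover in first_player_to_move]


def agz_promotes_candidate(candidate_wins: int, games: int = 400, threshold: float = 0.55) -> bool:
    """AGZ-style gating: a newly trained network replaces the current best one
    only if it wins more than 55% of the evaluation games. AZ drops this step
    and always generates self-play games with the latest network."""
    return candidate_wins / games > threshold


# Usage: a drawn chess game gives a neutral target for every position.
print(value_targets(0, [True, False, True]))
print(agz_promotes_candidate(230))  # 230/400 = 57.5% clears the 55% bar
```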

An open-source program, Leela Zero, based on the ideas from the AlphaGo papers, is available. It runs on GPUs instead of the TPUs that recent versions of AlphaGo rely on.

References

  1. ^ a b c d e f g Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Fan, Hui; Sifre, Laurent; Driessche, George van den; Graepel, Thore; Hassabis, Demis (19 October 2017). "Mastering the game of Go without human knowledge" (PDF). Nature. 550 (7676): 354–359. Bibcode:2017Natur.550..354S. doi:10.1038/nature24270. ISSN 0028-0836. PMID 29052630. S2CID 205261034. Archived (PDF) from the original on 18 July 2018. Retrieved 2 September 2019.
  2. ^ a b c d e Hassabis, Demis; Silver, David (18 October 2017). "AlphaGo Zero: Learning from scratch". DeepMind official website. Archived from the original on 19 October 2017. Retrieved 19 October 2017.
  3. ^ "Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone". Yahoo! Finance. 19 October 2017. Archived from the original on 19 October 2017. Retrieved 19 October 2017.
  4. ^ Knapton, Sarah (18 October 2017). "AlphaGo Zero: Google DeepMind supercomputer learns 3,000 years of human knowledge in 40 days". The Telegraph. Archived from the original on 19 October 2017. Retrieved 19 October 2017.
  5. ^ mnj12 (7 July 2021), mnj12/chessDeepLearning, retrieved 7 July 2021
  6. ^ "DeepMind AlphaGo Zero learns on its own without meatbag intervention". ZDNet. 19 October 2017. Archived from the original on 20 October 2017. Retrieved 20 October 2017.
  7. ^ https://www.idi.ntnu.no/emner/it3105/materials/neural/silver-2017b.pdf
  8. ^ a b c d e Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
  9. ^ Knapton, Sarah; Watson, Leon (6 December 2017). "Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours". The Telegraph. Archived from the original on 2 December 2020. Retrieved 5 April 2018.
  10. ^ a b Greenemeier, Larry. "AI versus AI: Self-Taught AlphaGo Zero Vanquishes Its Predecessor". Scientific American. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  11. ^ a b "Computer Learns To Play Go At Superhuman Levels 'Without Human Knowledge'". NPR. 18 October 2017. Archived from the original on 20 October 2017. Retrieved 20 October 2017.
  12. ^ "Google's New AlphaGo Breakthrough Could Take Algorithms Where No Humans Have Gone". Fortune. 19 October 2017. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  13. ^ "Data on Notable AI Models". Epoch AI. 19 June 2024. Retrieved 29 November 2024.
  14. ^ "This computer program can beat humans at Go—with no human instruction". Science | AAAS. 18 October 2017. Archived from the original on 2 February 2022. Retrieved 20 October 2017.
  15. ^ Gibney, Elizabeth (18 October 2017). "Self-taught AI is best yet at strategy game Go". Nature News. doi:10.1038/nature.2017.22858. Archived from the original on 1 May 2020. Retrieved 10 May 2020.
  16. ^ "The latest AI can work things out without being taught". The Economist. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  17. ^ a b Sample, Ian (18 October 2017). "'It's able to create knowledge itself': Google unveils AI that learns on its own". The Guardian. Archived from the original on 19 October 2017. Retrieved 20 October 2017.
  18. ^ "'It's able to create knowledge itself': Google unveils AI that learns on its own". The Guardian. 18 October 2017. Archived from the original on 19 October 2017. Retrieved 26 December 2017.
  19. ^ Knapton, Sarah (18 October 2017). "AlphaGo Zero: Google DeepMind supercomputer learns 3,000 years of human knowledge in 40 days". The Telegraph. Archived from the original on 15 December 2017. Retrieved 26 December 2017.
  20. ^ "How Google's new AI can teach itself to beat you at the most complex games". Australian Broadcasting Corporation. 19 October 2017. Archived from the original on 20 October 2017. Retrieved 20 October 2017.
  21. ^ "Go Players Excited About 'More Humanlike' AlphaGo Zero". Korea Bizwire. 19 October 2017. Archived from the original on 21 October 2017. Retrieved 21 October 2017.
  22. ^ "New version of AlphaGo can master Weiqi without human help". China News Service. 19 October 2017. Archived from the original on 19 October 2017. Retrieved 21 October 2017.
  23. ^ "【柯洁战败解密】AlphaGo Master最新架构和算法,谷歌云与TPU拆解" (in Chinese). Sohu. 24 May 2017. Archived from the original on 17 September 2017. Retrieved 1 June 2017.
  24. ^ Hardware used during training may be substantially more powerful
