최근 대학원 스터디에서 시계열분석에 대한 주제로 발표를 하게 되었다.

워낙 통계를 싫어하는 터라 부정적인 입장을 갖고 바라보곤 했는데...

일단 발표를 해야하니 공부를 했다.

처음에는 무슨말인지 하나도 모르겠었는데...

보다보니 또 무슨말인지 이해가 되는...ㅋㅋㅋㅋㅋ

시계열분석 책

구글에서 여타 다른 책들을 많이 찾아봤지만, 전공이 수학이라 그런지

수학적인 수식과 설명이 잘 되어있지 않은 책은 별로 눈에 잘 가지 않는다...

그나마 이 책이 눈에 잘 들어오는데(저자 이름이 더 눈에 띈다...더글라스 C. 몽고메리...그래서 선택했을수도...)

 

 

 

 

첫번째 책 이외에 주로 내가 보는 통계학 책이다.(이곳에 이렇게 올려도 저작권 침해는 안받겠지....?)

(가냘픈 대학원생을 용서해주세요 대가님들과 출판업계분들...)

암튼 책소개하려는건 아니고...(이미 다 소개했지만)

앞으로 올리는 글에 이 책의 내용들이 포함될 소지가 있어 사전에 미리 알리기 위해 직접 책을 스샷찍어서 올린다.

(부디 많이 많이 팔려 저 말고 출판업계 돈 많이 벌고 저자들도 돈 많이벌게 해주세요...저말고...저말고...)

 

암튼 기초통계는 대략 생략하고

시계열분석의 코드와 개념 위주로 글을 쓸 예정이니....

잘 정리해놓고 기억안날때 마다 자주 들어와서 좀 보고 복기해라...쓰니야...

 

Source

 

 

'수학 > Time Series Analysis' 카테고리의 다른 글

Chapter1 : 1.2 Some Time Series Data  (0) 2018.01.23

얼마 전, 후배에게 연락이 와서 이 책을 소개해줬다.

이 책을 읽기 전과 후 자신의 세상을 보는 눈이 달라졌다고 하는데...

21세기 자본주의를 살아가는 우리에게 도움이 될 것 같다.

후기는 읽고나서...

'독서' 카테고리의 다른 글

부는 어디에서오는가-에릭 바인하커  (0) 2018.01.23

 

 

 

 

웹스크래핑

In [92]:
from selenium import webdriver
from bs4 import BeautifulSoup
import re
def cleanText(readData):
 
    #텍스트에 포함되어 있는 특수 문자 제거
 
    text = re.sub('[-=+,#/\?:^$.@*\"※~&%ㆍ!』\\‘|\(\)\[\]\<\>`\'…》]', '', readData)
 
    return text
driver = webdriver.Chrome('C:/Users/User/Downloads/chromedriver')
#웹 자원 로드를 위해 3초 대기
driver.implicitly_wait(3)
#url접근
keyword = "주식"
URL = 'https://search.naver.com/search.naver?where=news&sm=tab_jum&query='
driver.get('https://search.naver.com/search.naver?where=news&sm=tab_jum&query='+keyword)
#url=driver.find_element_by_css_selector('a._sp_each_title').text
title = driver.find_element_by_css_selector('a._sp_each_title').text
print(title)
print(URL)
 
KB금융, 푸르덴셜생명 2.3조에 인수…주식매매계약 체결(종합2보)
https://search.naver.com/search.naver?where=news&sm=tab_jum&query=
In [93]:
#print(driver)
import urllib.request
selected_tag_a = driver.find_element_by_tag_name('a')
selected_link = driver.find_elements_by_partial_link_text('')
soup = BeautifulSoup(driver.page_source, 'lxml')
resp = urllib.request.urlopen(URL)
soup1 = BeautifulSoup(resp,from_encoding=resp.info().get_param('charset'), features='html.parser')
items = soup.select('a._sp_each_title')
d1 = soup.select('dt')
url = soup.select('a._sp_each_url')
source_time = soup.select('dd.txt_inline')
def rr(data):
    result = re.sub("<.+?>", "", str(data))
    return result

#OMG = re.sub("<.+?>", "", str(items))
OMG = rr(url)
OMG2 = rr(source_time)
print(OMG2)
print(OMG)
test1 = soup1.find_all('a', herf=True)
#import pandas as pd
#data = {'name' : OMG, 'time_souce' : OMG2}
#pd.DataFrame(data)
print(test1)
print(resp)
 
[연합뉴스언론사 선정  1일 전  네이버뉴스   보내기  , 서울경제언론사 선정  9시간 전  네이버뉴스   보내기  , 중앙일보언론사 선정  3시간 전  네이버뉴스   보내기  , 조선일보언론사 선정  2일 전  네이버뉴스   보내기  , 파이낸셜뉴스언론사 선정  1일 전  네이버뉴스   보내기  , 아이뉴스24언론사 선정  5시간 전  네이버뉴스   보내기  , KBS  1일 전  네이버뉴스   보내기  , MBC  1일 전  네이버뉴스   보내기  , EBN  1일 전   보내기  , 뉴스핌  1일 전   보내기  ]
[네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, 네이버뉴스, KB금융 '푸르덴셜생명보험' 주식매매계약 체결, 대신證 "국내외 주식 1억이상 거래하고, 축하금 받자"]
[]
<http.client.HTTPResponse object at 0x00000254F3886CC0>
In [133]:
import urllib.request
from bs4 import BeautifulSoup

url = "https://search.naver.com/search.naver?where=news&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D"
req = urllib.request.Request(url)
sourcecode = urllib.request.urlopen(url).read()
soup = BeautifulSoup(sourcecode, "html.parser")
#aa = soup.find("div", class_ = "news mynews section _prs_nws").find_all("li")
#print(aa[3].find("a")["href"])
aaa = [0]
#for href in soup.find("div", class_ = "news mynews section _prs_nws").find_all("li"):
	#print(href.get('href'))
    
for href in soup.find_all("a"):
    print(href.get('href'))
 
#lnb
#content
https://www.naver.com
#
https://help.naver.com/support/alias/search/word/word_16.naver
#
#
https://nid.naver.com/nidlogin.login?url=https%3A%2F%2Fsearch.naver.com%2Fsearch.naver%3Fwhere%3Dnews%26sm%3Dtab_jum%26query%3D%25EC%25A3%25BC%25EC%258B%259D
https://help.naver.com/support/alias/search/word/word_16.naver
https://help.naver.com/support/alias/search/word/word_21.naver
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
#
?where=nexearch&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
?where=news&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
https://dict.naver.com/search.nhn?dicQuery=%EC%A3%BC%EC%8B%9D&query=%EC%A3%BC%EC%8B%9D&target=dic&query_utf=&isOnlyViewEE=
?where=article&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
?where=realtime&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
?where=image&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
https://book.naver.com/search/search.nhn?query=%EC%A3%BC%EC%8B%9D
?where=post&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
#
?where=kin&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
?where=kdic&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
?where=webkr&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
?where=video&sm=tab_jum&query=%EC%A3%BC%EC%8B%9D
https://search.shopping.naver.com/search/all.nhn?where=all&frm=NVSCTAB&query=%EC%A3%BC%EC%8B%9D
https://map.naver.com/v5/search/%EC%A3%BC%EC%8B%9D
https://m.post.naver.com/search/post.nhn?keyword=%EC%A3%BC%EC%8B%9D
https://vibe.naver.com/search?query=%EC%A3%BC%EC%8B%9D
https://academic.naver.com/search.naver?field=0&query=%EC%A3%BC%EC%8B%9D
https://audioclip.naver.com/search/all?keyword=%EC%A3%BC%EC%8B%9D
#
https://help.naver.com/support/alias/search/integration/integration_1.naver
https://help.naver.com/support/alias/search/integration/integration_2.naver
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
https://help.naver.com/support/alias/search/integration/news2.naver
#order_cat
#order_abc
#
#
https://help.naver.com/support/alias/search/integration/integration_4.naver
#
#
#
#
#
#
#
http://yna.kr/AKR20200410041052002?did=1195m
http://yna.kr/AKR20200410041052002?did=1195m
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=001&aid=0011539571
#
http://www.newsis.com/view/?id=NISX20200410_0000990286&cID=10404&pID=10400
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=003&aid=0009809215
http://ilyo.co.kr/?ac=article_view&entry_id=366823
https://www.vop.co.kr/A00001481268.html
#
https://www.sedaily.com/NewsView/1Z1FM2MBWX
https://www.sedaily.com/NewsView/1Z1FM2MBWX
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=011&aid=0003723145
#
http://www.newspim.com/news/view/20200410001105
https://news.joins.com/article/olink/23346890
https://news.joins.com/article/olink/23346890
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=025&aid=0002992034
#
https://news.chosun.com/site/data/html_dir/2020/04/09/2020040901990.html?utm_source=naver&utm_medium=original&utm_campaign=news
https://news.chosun.com/site/data/html_dir/2020/04/09/2020040901990.html?utm_source=naver&utm_medium=original&utm_campaign=news
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=023&aid=0003522288
#
http://yna.kr/AKR20200409046200003?did=1195m
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=001&aid=0011534925
https://biz.chosun.com/site/data/html_dir/2020/04/09/2020040902083.html?utm_source=naver&utm_medium=original&utm_campaign=biz
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=366&aid=0000507239
http://www.dt.co.kr/contents.html?article_no=2020040902109932781006&ref=naver
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=029&aid=0002592977
http://news.khan.co.kr/kh_news/khan_art_view.html?artid=202004091544001&code=920101
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=032&aid=0003002790
#
http://www.fnnews.com/news/202004101440328115
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=014&aid=0004406408
#
http://www.cnbnews.com/news/article.html?no=443020
http://www.cnews.co.kr/uhtml/read.jsp?idxno=202004101515039580888
http://www.inews24.com/view/1257226
http://www.inews24.com/view/1257226
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=031&aid=0000533897
#
http://news.kbs.co.kr/news/view.do?ncd=4422243&ref=A
http://news.kbs.co.kr/news/view.do?ncd=4422243&ref=A
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=056&aid=0010817646
#
https://www.seoul.co.kr/news/newsView.php?id=20200410500109&wlog_tag3=naver
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=081&aid=0003081491
http://www.viva100.com/main/view.php?key=20200410010004020
http://yna.kr/AKR20200410063600002?did=1195m
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=004&oid=001&aid=0011538924
http://www.kukinews.com/news/article.html?no=775370
#
https://imnews.imbc.com/news/2020/econo/article/5718850_32647.html
https://imnews.imbc.com/news/2020/econo/article/5718850_32647.html
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=214&aid=0001030053
#
https://www.sedaily.com/NewsView/1Z1F6BKVD7
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=011&aid=0003723041
http://www.ebn.co.kr/news/view/1423890/?sc=naver
http://www.ebn.co.kr/news/view/1423890/?sc=naver
#
https://view.asiae.co.kr/article/2020041014365687672
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=277&aid=0004659543
http://www.enewstoday.co.kr/news/articleView.html?idxno=1380591
http://www.busan.com/view/busan/view.php?code=2020041016065137342
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=082&aid=0000995401
http://www.weeklytoday.com/news/articleView.html?idxno=168815
#
http://www.newspim.com/news/view/20200410000854
http://www.newspim.com/news/view/20200410000854
#
http://www.shinailbo.co.kr/news/articleView.html?idxno=1268846
http://news.mt.co.kr/mtview.php?no=2020041013231790945
https://news.naver.com/main/read.nhn?mode=LSD&mid=sec&sid1=101&oid=008&aid=0004392630
http://www.ebn.co.kr/news/view/1423906/?sc=naver
http://www.newsworks.co.kr/news/articleView.html?idxno=447290
#
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=11
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=21
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=31
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=41
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=51
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=61
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=71
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=81
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=91
?&where=news&query=%EC%A3%BC%EC%8B%9D&sm=tab_pge&sort=0&photo=0&field=0&reporter_article=&pd=0&ds=&de=&docid=&nso=so:r,p:all,a:all&mynews=0&cluster_rank=33&start=11
https://help.naver.com/support/alias/news/news_21.naver
#
#
#
#
#
#
#
#
#
#
#
#
#
https://help.naver.com/support/alias/search/word/word_1.naver
?where=news&query=%EC%B2%AD%EB%8B%B4%EB%8F%99%EC%A3%BC%EC%8B%9D%EB%B6%80%EC%9E%90%EC%9D%B4%ED%9D%AC%EC%A7%84&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EC%9E%84%EC%A7%80%EC%9B%90&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EA%B5%AD%EB%82%B4%EC%A3%BC%EC%8B%9D%ED%98%95%ED%8E%80%EB%93%9C&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EC%BD%94%EC%8A%A4%ED%94%BC%EC%95%BC%EA%B0%84%EC%84%A0%EB%AC%BC&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EB%AF%B8%EA%B5%AD%EC%A3%BC%EC%8B%9D%ED%88%AC%EC%9E%90&ie=utf8&sm=tab_she&qdt=0
?where=news&query=2019%EC%A3%BC%EC%8B%9D&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EC%9E%90%EC%82%B0%EA%B4%80%EB%A6%AC&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EC%A3%BC%EC%8B%9D%EC%A0%95%EB%B3%B4&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EB%8B%AC%EB%9F%AC%EC%A0%84%EB%A7%9D&ie=utf8&sm=tab_she&qdt=0
?where=news&query=%EC%9D%B4%EB%8F%99%ED%8F%89%EA%B7%A0%EC%84%A0&ie=utf8&sm=tab_she&qdt=0
#
#
https://help.naver.com/support/alias/search/word/word_2.naver
#
#
#
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%98%A4%ED%9B%84+4%EC%8B%9C+%ED%88%AC%ED%91%9C%EC%9C%A8+23.46%25
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%98%A4%ED%9B%84+3%EC%8B%9C+%EB%88%84%EC%A0%81%ED%88%AC%ED%91%9C%EC%9C%A8+21.95%25
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%BD%94%EB%A1%9C%EB%82%9819+%EA%B2%80%EC%82%AC
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%B4%9D%EC%84%A0+%EC%82%AC%EC%A0%84%ED%88%AC%ED%91%9C
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%BD%94%EB%A1%9C%EB%82%9819+%EB%B0%B1%EC%8B%A0
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%A1%B0%EC%9A%A9%ED%95%9C+%EC%A0%84%ED%8C%8C
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%82%AC%EC%A0%84%ED%88%AC%ED%91%9C+%EB%A7%88%EC%A7%80%EB%A7%89%EB%82%A0
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EA%B5%AD%EB%AF%BC+100%EB%AA%85%EB%8B%B9+1%EB%AA%85%EA%BC%B4
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EC%82%AC%EC%A0%84%ED%88%AC%ED%91%9C+%EC%97%B4%EA%B8%B0
?where=nexearch&sm=tab_htk.nws&ie=utf8&query=%EB%A7%88%EC%A7%80%EB%A7%89+%EC%A3%BC%EB%A7%90
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EC%9D%8C%EC%95%85%EC%A4%91%EC%8B%AC+%EC%9E%84%EC%98%81%EC%9B%85
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EA%B9%80%EC%B0%BD%EC%98%A5+%EC%87%BC+%EA%B9%80%ED%98%B8%EC%A4%91
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=2%EA%B5%B0+%EC%84%A0%EC%88%98+%EB%B0%9C%EC%97%B4%EB%A1%9C+%ED%9B%88%EB%A0%A8+%EC%A4%91%EB%8B%A8
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%ED%95%98%EC%9D%B4%EC%97%90%EB%82%98+%EA%B9%80%ED%98%9C%EC%88%98
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%ED%95%98%EC%9D%B4%EC%97%90%EB%82%98+%EC%A2%85%EC%98%81+%EC%86%8C%EA%B0%90
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EC%9C%A0%EB%B3%84%EB%82%98+%EB%AC%B8%EC%85%B0%ED%94%84+%EC%97%90%EB%A6%AD
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EC%B6%9C%EC%97%B0+%EC%97%86%EC%9D%B4+1%EC%9C%84
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EB%B0%98%EC%9D%98%EB%B0%98+%EB%AA%85%EC%84%B8%EB%B9%88
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EB%8B%B9%EB%82%98%EA%B7%80+%EA%B7%80+%EB%B0%95%EC%84%B1%EA%B4%91
?where=nexearch&sm=tab_htk.ent&ie=utf8&query=%EC%9C%84%EB%84%88+Remember
https://help.naver.com/support/alias/search/word/word_3.naver
http://newssearch.naver.com/search.naver?where=rss&query=%EC%A3%BC%EC%8B%9D&field=0&nx_search_query=&nx_and_query=&nx_sub_query=&nx_search_hlquery=&is_dts=0
#
#
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
https://help.naver.com/support/alias/search/word/word_16.naver
#
#
https://nid.naver.com/nidlogin.login?url=https%3A%2F%2Fsearch.naver.com%2Fsearch.naver%3Fwhere%3Dnews%26sm%3Dtab_jum%26query%3D%25EC%25A3%25BC%25EC%258B%259D
https://help.naver.com/support/alias/search/word/word_16.naver
https://help.naver.com/support/alias/search/word/word_21.naver
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
https://help.naver.com/support/alias/search/word/word_17.naver
https://help.naver.com/support/alias/search/word/word_18.naver
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
javascript:;
#
https://searchad.naver.com/
https://news.naver.com/main/ombudsman/searchAlliance.nhn
https://help.naver.com/support/alias/search/footer/news.naver
https://help.naver.com/support/alias/report/unsound.naver
https://www.navercorp.com/

 

 

 

2강 Data Representation Learning

 

Step 1 Load MNIST dataset

In [28]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
import keras
from keras.datasets import mnist
from sklearn import preprocessing
import numpy as np

(train_xs, train_ys), (test_xs, test_ys) = mnist.load_data()
dim_x = train_xs.shape[1] * train_xs.shape[2]
dim_y = 10

train_xs = train_xs.reshape(train_xs.shape[0], dim_x).astype(np.float32)

scaler = preprocessing.MinMaxScaler().fit(train_xs)
train_xs = scaler.transform(train_xs)
print(train_xs.shape)
print(train_ys.shape)
 
 
 
(60000, 784)
(60000,)
 

Step 2 Data Sampling

In [27]:
ridx = np.random.randint(train_xs.shape[0], size=10000)
np_train_xs = train_xs[ridx, :]
np_train_ys = train_ys[ridx]
print(np_train_xs.shape)
print(np_train_ys.shape)
 
(10000, 784)
(10000,)
 

Step 3 Import t-SNE & seaborn

In [12]:
%matplotlib inline
import sklearn
from sklearn.manifold import TSNE
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('darkgrid')
sns.set_palette('muted')
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})
 

Step 4 Define Scatterplot method. Run t-SNE.

MNIST 데이터를 t-SNE를 이용하여 차원 축소(784차원 -> 2차원) 및 시각화 하기

  • t-SNE의 이론적 배경 소개
    • Perplexity 의미 : 가까운 점은 그 Label에 상관 없이 같은 군집이라고 가정하는 척도
    • Perplexity가 작으면 : 군집간 거리가 멀어짐
    • Perplexity가 크면 : 군집간 거리가 가까워 짐 #### Step1 Display the result
In [26]:
def draw_scatter(x, n_class, colors):
    sns.palplot(sns.color_palette())
    palette = np.array(sns.color_palette())
    
    f = plt.figure(figsize=(14,14))
    ax = plt.subplot(aspect='equal')
    sc = ax.scatter(x[:,0], x[:,1], lw=0, s=540, c=palette[colors.astype(np.int)], alpha=0.2)
    plt.xlim(-25, 25)
    plt.ylim(-25, 25)
    ax.axis('off')
    ax.axis('tight')
    plt.show()
    
    
tsne_train_xs = TSNE(random_state=42).fit_transform(np_train_xs)
draw_scatter(tsne_train_xs, dim_y, np_train_ys)
 
 
 

t-SNE의 Perplexity 실습하기

In [30]:
ridx = np.random.randint(train_xs.shape[0], size = 1000) #data 크기를 줄임
np_train_xs = train_xs[ridx, :]
np_train_ys = train_ys[ridx]

sns.palplot(sns.color_palette()) # 숫자 0~9를 Color로 표시하여 보여줌
palette = np.array(sns.color_palette())
# 화면 구성은 3x3으로 보여줌
fig, axs = plt.subplots(nrows=3, ncols=3, figsize=(15,15))
for ax, perplexity in zip(axs.flat, [2,5,10,20,30,50,75,100,150]):
    tsne_out = TSNE(n_components = 2, perplexity = perplexity).fit_transform(np_train_xs)
    title = 'Perpelexity = {}'.format(perplexity)
    ax.set_title(title)
    ax.scatter(tsne_out[:,0], tsne_out[:,1], lw=0, s=25, c=palette[np_train_ys.astype(np.int)], alpha=0.3)
    ax.axis('tight')

plt.show()
 
 
 
  • Perplexity가 작을 수록 분포의 면적이 넓어짐
  • Perplexity가 낮은 값에서부터 조금씩 증가할 수 록 군집 내 거리를 가깝고 군집간 거리가 멀어지는 것을 알 수 있음.
  • Perplexity가 30이 넘어가면서 전체 데이터가 차지하는 공간이 작아지면서 군집간 거리도 좁아지는 것을 알 수 있음.
  • Perplexity가 100이 넘으면 오히려 군집간 식별력이 떨어질 수 있음(본 문제의 경우)

+ Recent posts