Posts in the 'Python' category

Python

Let's learn about join() for Python strings! 2021.06.15
Train set/Validation set/Test set 2021.01.15
Let's learn about the tf.lookup module 2020.10.11
Let's learn about the Python library 're' (1) 2020.10.11

Let's learn about join() for Python strings!

Hugh_Shannon 2021. 6. 15. 09:45


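Sometimes you have a list of strings and need to combine them into a single string.

You could build it by hand as shown below, but this does not scale: every extra element in the list needs another explicit + in the expression.

Input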

fruit_list = ["apple", "orange", "grape"]
fruit_list[0] + " " + fruit_list[1] + " " + fruit_list[2]

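Output

'apple orange grape'

A.join(B) concatenates all the strings in the list B into a single string, inserting the string A between consecutive elements as a separator.

Input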

fruit_list = ["apple", "orange", "grape"]
" ".join(fruit_list)

Output

'apple orange grape'

Changing the separator gives other variations, as follows.

# Input
>>> fruit_list = ["apple", "orange", "grape"]
>>> "".join(fruit_list)
# Output
'appleorangegrape'

# Input
>>> fruit_list = ["apple", "orange", "grape"]
>>> "-".join(fruit_list)
# Output
'apple-orange-grape'
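One thing to keep in mind is that join() only accepts strings, so a list containing numbers raises a TypeError unless each element is converted first. The following is a minimal sketch of that conversion; the helper name join_values and its default separator are made up for this illustration and are not part of Python.

```python
def join_values(values, sep=", "):
    """Join arbitrary values into one string, converting each to str first.

    Hypothetical helper for illustration only.
    """
    return sep.join(str(v) for v in values)


fruit_list = ["apple", "orange", "grape"]
joined = "-".join(fruit_list)

# split() with the same separator recovers the original list,
# as long as no element contains the separator itself.
restored = joined.split("-")

print(join_values([1, 2, 3]))
print(join_values(fruit_list, sep=" "))
print(restored == fruit_list)
```

Using a generator expression inside join() keeps the conversion and the joining in one step, without building an intermediate list.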


Train set/Validation set/Test set

Hugh_Shannon 2021. 1. 15. 07:31


Sources

3months.tistory.com/118

Why a validation set is used in Machine Learning

The validation set is one of the basic concepts in machine learning and statistics. In practice, however, it is one of the tedious parts and is sometimes overlooked. Why not just train on the training set and then test..

ganghee-lee.tistory.com/38

The difference between Train / Test / Validation sets

In deep learning, a dataset is needed to train and evaluate a neural network model. The dataset is usually divided into the following three parts according to their role: 1. Train set 2. Validation set 3. Test set. Each of these is used to train the model and..

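In machine learning, a dataset is needed to build a model and to train and evaluate it.

This dataset is divided into the following three parts according to their role.

1. Train set : the data set used to train the model

2. Validation set : the data set used to check a trained model before the final evaluation, for example when tuning hyperparameters or choosing between candidate models

3. Test set : the data set used to evaluate the performance of the model once training and validation are complete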
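The three-way division above can be sketched in plain Python as follows. This is only an illustration under assumptions of my own: the function name split_dataset, the 60/20/20 ratios, and the fixed seed are not taken from the sources, and real projects often use a library utility instead.

```python
import random


def split_dataset(samples, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Shuffle a copy of samples and cut it into train / validation / test lists.

    Hypothetical helper: the ratios and the seed are illustrative assumptions.
    """
    if train_ratio <= 0 or val_ratio < 0 or train_ratio + val_ratio >= 1:
        raise ValueError("ratios must leave room for a non-empty test set")

    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffle so each part is a random sample

    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)

    train = shuffled[:n_train]                # used to fit the model
    val = shuffled[n_train:n_train + n_val]   # used to check the model while tuning
    test = shuffled[n_train + n_val:]         # held back for the final evaluation
    return train, val, test


train_set, val_set, test_set = split_dataset(range(10))
print(len(train_set), len(val_set), len(test_set))
```

The important property is that the three parts do not overlap, so the test set stays unseen until the final evaluation.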


Let's learn about the tf.lookup module

Hugh_Shannon 2020. 10. 11. 23:12


source : www.tensorflow.org/api_docs/python/tf/lookup

Module: tf.lookup | TensorFlow Core v2.3.0

Public API for tf.lookup namespace.

www.tensorflow.org

tf.lookup contains the following classes.

class KeyValueTensorInitializer
class StaticHashTable
class StaticVocabularyTable
class TextFileIndex
class TextFileInitializer

First, let's look at class KeyValueTensorInitializer.

The documentation describes it as a table initializer given keys and values tensors...

tf.lookup.KeyValueTensorInitializer(
	keys, values, key_dtype = None, value_dtype = None, name = None
)

keys : The tensor for the keys.
values : The tensor for the values.
key_dtype : The keys data type. Used when keys is a python array, to specify the data type of the keys explicitly.
value_dtype : The values data type. Used when values is a python array, to specify the data type of the values explicitly.
name : A name for the operation (optional). It can be omitted; if given, it seems to set the name of the operation...

Let's build an example and take a look.

keys = ["연필", "지우개", "볼펜"]
values = [1,2,3]
table_init = tf.lookup.KeyValueTensorInitializer(keys, values)

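However, if you print the initializer created this way, all you get is the object's address.

To see the contents of the table, it has to be passed to another call first (~~let's come back to the reason later...~~)

I originally ended up here while looking for a way to turn a text file into a list in tensorflow... Anyway, to print the values that were entered, the following commands are needed.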

num_oov_buckets = 2
table = tf.lookup.StaticVocabularyTable(table_init, num_oov_buckets)
keys1 = tf.constant(keys)
table.lookup(keys1)

<tf.Tensor: shape=(3,), dtype=int64, numpy=array([1,2,3])>

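Of course, to get the output above I also had to do the following beforehand.

import tensorflow as tf
values = [1,2,3]
# Cast to int64 before passing values to KeyValueTensorInitializer
values = tf.dtypes.cast(values, tf.int64)

It seems that tf.lookup.StaticVocabularyTable only works when the values are int64.

So what does the class tf.lookup.StaticVocabularyTable actually do...

Once again, I looked it up at the following source.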

SOURCE : www.tensorflow.org/api_docs/python/tf/lookup/StaticVocabularyTable

tf.lookup.StaticVocabularyTable | TensorFlow Core v2.3.0

String to Id table wrapper that assigns out-of-vocabulary keys to buckets.

www.tensorflow.org

String to Id table wrapper that assigns out-of-vocabulary keys to buckets.

is what it says...

Hmm, I will have to look into this further when I get the chance...
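As a rough way to picture that one-line description, here is a plain-Python sketch of a string-to-id table that sends unknown keys to extra buckets placed after the vocabulary. It is not TensorFlow code: the class name VocabularyTableSketch is made up, and the CRC32 hash is an assumption of mine that differs from whatever TensorFlow uses internally, so the bucket chosen for a given word will not match the real table.

```python
import zlib


class VocabularyTableSketch:
    """Plain-Python illustration of a string-to-id table with OOV buckets.

    Hypothetical class, not part of TensorFlow. The CRC32 hash is an
    assumption chosen only because it is deterministic and in the stdlib.
    """

    def __init__(self, keys, values, num_oov_buckets):
        if num_oov_buckets <= 0:
            raise ValueError("num_oov_buckets must be positive")
        self._table = dict(zip(keys, values))
        self._vocab_size = len(self._table)
        self._num_oov_buckets = num_oov_buckets

    def lookup(self, keys):
        ids = []
        for key in keys:
            if key in self._table:
                # Known key: return the id stored in the table.
                ids.append(self._table[key])
            else:
                # Out-of-vocabulary key: hash it into one of the extra buckets,
                # numbered from vocab_size upward.
                bucket = zlib.crc32(key.encode("utf-8")) % self._num_oov_buckets
                ids.append(self._vocab_size + bucket)
        return ids


keys = ["연필", "지우개", "볼펜"]
table = VocabularyTableSketch(keys, [1, 2, 3], num_oov_buckets=2)
print(table.lookup(keys))      # known keys return their stored values
print(table.lookup(["공책"]))  # an unknown key lands in bucket id 3 or 4
```

In this sketch the out-of-vocabulary ids start at the vocabulary size (3), so they can collide with a stored value of 3; numbering the known ids from 0 would avoid that overlap.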


Let's learn about the Python library 're' (1)

Hugh_Shannon 2020. 10. 11. 23:05


source : docs.python.org/3/library/re.html#

re — Regular expression operations — Python 3.9.0 documentation

This module provides regular expression matching operations similar to those found in Perl. Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). However, Unicode strings and 8-bit strings cannot be mixed:

docs.python.org

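I am doing text mining, and since there is more and more text preprocessing to do, I went looking for a library and came across 're'.

To understand its functions and elements properly, let's go through the API documentation and summarize it.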

Regular Expression Operations

Regular Expression Syntax

The original text has this line...

A regular expression (or RE) specifies a set of strings......

So 're' appears to be short for Regular Expression...
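To make "specifies a set of strings" concrete, here is a small sketch using re.fullmatch() to test whether a string belongs to the set a pattern describes. The pattern gr[ae]y and the helper name in_language are just examples picked for this illustration.

```python
import re

# This pattern specifies exactly two strings: "gray" and "grey".
pattern = re.compile(r"gr[ae]y")


def in_language(text):
    """Return True if the whole of text is one of the strings the pattern specifies."""
    return pattern.fullmatch(text) is not None


for candidate in ["gray", "grey", "groy", "greyish"]:
    print(candidate, in_language(candidate))
```

fullmatch() is used instead of search() so that a string like "greyish", which merely contains a match, is not counted as a member of the set.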


Space of my mind(내 머릿속의 공간)
