GitHub Repository: ethen8181/machine-learning
Path: blob/master/recsys/__pycache__/movielens.cpython-35.pyc


(Binary file: compiled Python bytecode, not displayable as text. The readable strings embedded in it are listed below.)

Source file: /Users/ethen/machine-learning/recsys/movielens.py
Imports: os, numpy (np), pandas (pd), call from subprocess

create_rating_mat(file_dir)
    Docstring: create movielens rating matrix
    String constants: u.data, curl, -O, http://files.grouplens.org/datasets/movielens/, .zip, unzip, user_id, item_id, rating, timestamp
    Names referenced: os.path.join, os.path.isdir, call, pd.read_csv (sep, names), unique, shape, np.zeros, itertuples (index)
    Local variables: file_dir, file_path, names, df, n_users, n_items, ratings, row

create_train_test(file_dir)
    Docstring: split into training and test sets, remove 10 ratings from each user and assign them to the test set
    Names referenced: create_rating_mat, np.zeros, shape, copy, range, np.random.choice (size, replace), np.flatnonzero, np.all, AssertionError
    Local variables: file_dir, ratings, test, train, user, test_index

Module-level block
    Guarded by __name__ and the string __main__
    String constant: ml-100k
    Names referenced: create_train_test, file_dir, train, test, head
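
The create_train_test docstring describes holding out 10 ratings per user for the test set. The following is a minimal sketch of that idea only, not the author's implementation: it uses plain Python lists instead of the numpy arrays the module references, and the function name split_train_test, the n_test and seed parameters, and the list-of-lists layout are all assumptions made for illustration.

```python
import random


def split_train_test(ratings, n_test=10, seed=0):
    """Hold out n_test nonzero ratings per user (row) into a test matrix.

    Hypothetical helper: `ratings` is a list of rows, one per user,
    with 0 meaning "not rated".
    """
    rng = random.Random(seed)
    train = [list(row) for row in ratings]              # copy of the input
    test = [[0.0] * len(row) for row in ratings]        # all zeros to start
    for user, row in enumerate(ratings):
        rated = [item for item, r in enumerate(row) if r != 0]
        # Sample without replacement; raises ValueError when a user
        # has fewer than n_test ratings.
        for item in rng.sample(rated, n_test):
            test[user][item] = row[item]
            train[user][item] = 0.0
    return train, test
```

Every rating ends up in exactly one of the two matrices, so the element-wise product of train and test is zero everywhere, which matches the kind of check the AssertionError reference in the bytecode suggests.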