西西河

Topic: [Digest] YouTube architecture -- 西电鲁丁


This article is too good to keep to myself, so I am posting it here for everyone's reference.

http://highscalability.com/youtube-architecture

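The article describes the YouTube site's software platform and its main technical architecture, focusing on how it stores a massive volume of video and serves more than 100 million video views per day.

Summary:

1. YouTube's software platform is entirely open source: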

# Apache

# Python

# Linux (SuSe)

# MySQL

# psyco, a dynamic python->C compiler

# lighttpd for video instead of Apache

2. Most popular content is moved to a CDN (content delivery network):

- CDNs replicate content in multiple places. There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.

- CDN machines mostly serve out of memory because the content is so popular there's little thrashing of content into and out of memory.

# Less popular content (1-20 views per day) uses YouTube servers in various colo sites.

- Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point.
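To see why a cache helps popular content but not the long tail, here is a minimal sketch that replays two made-up request streams through a small LRU cache. The function name `lru_hit_rate`, the catalogue sizes, and the cache capacity are all assumptions chosen for illustration; none of them come from the article or from YouTube.

```python
from collections import OrderedDict
import random


def lru_hit_rate(requests, capacity):
    """Replay a request stream through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical workloads (illustrative numbers only):
# 10,000 requests spread over 100 hot videos vs. 100,000 rarely watched ones.
popular = [rng.randrange(100) for _ in range(10_000)]
long_tail = [rng.randrange(100_000) for _ in range(10_000)]

hot_rate = lru_hit_rate(popular, capacity=100)
tail_rate = lru_hit_rate(long_tail, capacity=100)

print(f"popular content hit rate:   {hot_rate:.3f}")
print(f"long-tail content hit rate: {tail_rate:.3f}")
```

With the same cache size, the hot set fits entirely in memory and almost every request is a hit, while long-tail requests almost never find their video cached. That is the sense in which buying more cache for rarely viewed videos may not pay off.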

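3. Thumbnails (4 thumbnails for each video so there are a lot more thumbnails than videos) were originally stored on a Linux ext3 file system. After hitting the limit on the number of files in a directory, they decided to move to Google's BigTable (I recall 邓兄 here on the forum wrote an introduction to it; I don't remember whether that series was ever finished).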
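For context on the directory limit: a common file-system workaround (not what YouTube did, since according to the article they moved to BigTable) is to spread files over hashed subdirectories so that no single directory grows too large. A minimal sketch follows; the function `thumbnail_path`, the `/thumbs` root, and the two-level layout are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(root, video_id, index):
    """Map a thumbnail to root/ab/cd/<video_id>_<index>.jpg.

    The two directory levels come from the first four hex digits of a hash
    of the video ID, giving 256 * 256 buckets. This layout is an assumption
    for illustration only.
    """
    digest = hashlib.sha256(video_id.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / f"{video_id}_{index}.jpg"


# All four thumbnails of one (made-up) video land in the same bucket.
for i in range(4):
    print(thumbnail_path("/thumbs", "abc123", i))
```

Hashing the ID rather than using its leading characters keeps the buckets evenly filled even when IDs share prefixes. It still leaves a very large number of small files on disk, which is one reason a store like BigTable can be attractive.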

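4. A MySQL database stores the metadata, including tags, descriptions, and so on. (In fact, most ECM (enterprise content management) systems also manage metadata in a database, with the content itself stored either in the file system or in the database.) Can now scale database almost arbitrarily.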
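The "metadata in the database, content elsewhere" split can be sketched roughly as follows. Python's built-in `sqlite3` stands in for MySQL so the example runs on its own; the table names, column names, and file paths are invented for illustration and are not YouTube's schema.

```python
import sqlite3


def open_store():
    """Create an in-memory metadata store (sqlite3 standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE video (
            id          TEXT PRIMARY KEY,
            title       TEXT NOT NULL,
            description TEXT,
            file_path   TEXT NOT NULL   -- the video bytes live outside the database
        );
        CREATE TABLE video_tag (
            video_id TEXT NOT NULL REFERENCES video(id),
            tag      TEXT NOT NULL,
            PRIMARY KEY (video_id, tag)
        );
    """)
    return conn


def add_video(conn, video_id, title, description, file_path, tags):
    """Insert one video's metadata and its tags in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO video VALUES (?, ?, ?, ?)",
            (video_id, title, description, file_path),
        )
        conn.executemany(
            "INSERT INTO video_tag VALUES (?, ?)",
            [(video_id, tag) for tag in tags],
        )


def find_by_tag(conn, tag):
    """Return (id, file_path) pairs for every video carrying the given tag."""
    rows = conn.execute(
        "SELECT v.id, v.file_path FROM video v "
        "JOIN video_tag t ON t.video_id = v.id "
        "WHERE t.tag = ? ORDER BY v.id",
        (tag,),
    )
    return rows.fetchall()


conn = open_store()
add_video(conn, "v1", "Cat video", "A cat.", "/videos/v1.flv", ["cat", "funny"])
add_video(conn, "v2", "Dog video", "A dog.", "/videos/v2.flv", ["dog", "funny"])
print(find_by_tag(conn, "funny"))
```

The database only ever returns a pointer (`file_path` here) to the content, so the metadata tier and the video-serving tier can be scaled separately.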

Surprisingly, YouTube's technical team consists of only 9 people:

2 sysadmins, 2 scalability software architects

2 feature developers, 2 network engineers, 1 DBA

That should put a lot of websites to shame.
