Apple ProRes White PaperOctober 2012Contents3 Introduction4 Apple ProRes Family Overview6 Properties of Digital ImagesFrame Size (Full-Width Versus Partial-Width)Chroma SamplingSample Bit Depth9 Properties of Apple ProRes CodecsData RateQualityPerformanceApple ProRes 4444 Alpha Channel Support 18 AppendixTarget Data Rates20 GlossaryIntroductionApple ProRes is one of the most popular and commonly used codecs in professional post-production. The Apple ProRes family of video codecs has made it both possible and affordable to edit full-frame, 10-bit, 4:2:2 high-definition (HD), 2K, 4K, and 5K video sources with multistream performance in Final Cut Pro X. This white paper provides in-depth information about all five members of the Apple ProRes family, including technical specifications and performance metrics.All members of the Apple ProRes family provide an unparalleled combination ofmultistream, real-time editing performance coupled with impressive image quality at reduced storage rates. Additionally, all five codecs preserve standard-definition (SD), HD, 2K, 4K, and 5K frame sizes at full resolution.As a variable bit rate (VBR) codec technology, Apple ProRes uses fewer bits on simple frames that would not benefit from encoding at a higher data rate. All Apple ProRes codecs are frame-independent (or “intra-frame”) codecs, meaning that each frame is encoded and decoded independently of any other frame. This technique provides the greatest editing performance and flexibility.• Apple ProRes 4444: The Apple ProRes 4444 codec preserves motion image sequences originating in either 4:4:4 RGB or 4:4:4 Y’C B C R color spaces. With a remarkably low data rate (as compared to uncompressed 4:4:4 HD), Apple ProRes 4444 supports 12-bit pixel depth with an optional, mathematically lossless alpha channel for true 4:4:4:4 support. Apple ProRes 4444 preserves visual quality at the same high level as Apple ProRes 422 (HQ), but for 4:4:4 image sources, which can carry the highest possible color detail.Apple ProRes 4444 is ideal for next-generation video and the most pristine computer graphics sources for cinema-quality compositing. For projects that previously required the Animation codec, Apple ProRes 4444 is a modern replacement with real-timeplayback support in Final Cut Pro. Because color detail is preserved, Apple ProRes 4444 is ideal for color grading with 4:4:4 image sources.• Apple ProRes 422 (HQ): Apple ProRes 422 (HQ) offers visually lossless preservation of the highest-quality professional HD video that a (single-link) HD-SDI signal can carry. For this reason, Apple ProRes 422 (HQ) has enjoyed widespread adoption across the video post-production industry. This codec supports full-width, 4:2:2 video sources at 10-bit pixel depths, while retaining its visually lossless characteristic through many generations of decoding and reencoding.Apple ProRes 422 (HQ) can be used both as an intermediate codec to accelerate workflows for complex, compressed video sources and as an affordable,high-performance alternative to uncompressed 4:2:2 video.Apple ProRes Family Overview• Apple ProRes 422: Apple ProRes 422 offers nearly all the benefits of its big brother, Apple ProRes 422 (HQ), but at a significantly lower data rate. 
It provides visually lossless coding performance for the same full-width, 10-bit, 4:2:2 sequences as Apple ProRes 422 (HQ) with even better multistream real-time editing performance.• Apple ProRes 422 (LT): Like Apple ProRes 422 (HQ) and Apple ProRes 422, theApple ProRes 422 (LT) codec supports full-width, 10-bit video sequences, but at a target data rate even lower than that of its siblings. Apple ProRes 422 (LT) weighs inat 100 Mbps or less, depending on the particular video format. Apple ProRes 422 (LT) balances incredible image quality with small file sizes, and is perfect for digital broadcast environments where storage capacity and bandwidth are often at a premium.Apple ProRes 422 (LT) is ideal for live multicamera and on-location productions where large amounts of footage are acquired to disk.• Apple ProRes 422 (Proxy): The lowest-bandwidth member of the Apple ProRes family is Apple ProRes 422 (Proxy). This codec maintains HD data rates below 36 Mbps, yet like its higher-rate Apple ProRes 422 siblings, it supports full-frame, 10-bit, 4:2:2 video. Apple ProRes 422 (Proxy) is intended for draft-mode or preview uses where low data rates are required, yet full-resolution video is desired. It is the ideal format for use in offline editing workflows on notebook computers or other systems where storage is at a premium and you have a large quantity of source media.Today’s HD workflows demand offline video formats that support native frame size and aspect ratio. Apple ProRes 422 (Proxy) supports full 1920 x 1080 and 1280 x 720 resolutions, enabling full HD resolution while editing, and accurate representation of Final Cut Pro motion effects from the creative stages through finishing.Properties of Digital Images The technical properties of digital images correspond to different aspects of imagequality. For example, high-resolution HD images can carry more detail than their lower-resolution SD counterparts. 10-bit images can carry finer gradations of color,thereby avoiding the banding artifacts that can occur in 8-bit images.The role of a codec is to preserve image quality as much as possible at a particular reduced data rate, while delivering the fastest encoding and decoding speed.The Apple ProRes family supports the three key properties of digital images that contribute to image quality—frame size, chroma sampling, and sample bit depth—while offering industry-leading performance and quality at each supported data rate. In order to appreciate the benefits of the Apple ProRes family as a whole andto choose which family members to use in various post-production workflows, it is important to understand these three properties.Frame Size (Full-Width Versus Partial-Width)Many video camcorders encode and store video frames at less than the full HD widths of 1920 pixels or 1280 pixels, for 1080-line or 720-line HD formats, respectively. When such formats are displayed, they are upsampled horizontally to full HD widths, but they cannot carry the amount of detail possible with full-width HD formats.All Apple ProRes family members can encode full-width HD video sources (sometimes called “full-raster” video sources) to preserve the maximum possible detail an HD signal can carry. Apple ProRes codecs can also encode partial-width HD sources if desired, thereby averting potential quality and performance degradation that results from upscaling partial-width formats prior to encoding.Chroma SamplingColor images require three channels of information. 
In computer graphics, a pixel’s color is typically defined by R, G, and B values. In traditional digital video, a pixel is represented by Y’, C B , and C R values, where Y’ is the “luma” or grayscale value and C B and C R contain the “chroma” or color-difference information. Because the eye isless sensitive to fine chroma detail, it is possible to average together and encode fewer C B and C R samples with little visible quality loss for casual viewing. This technique, known as chroma subsampling , has been used widely to reduce the data rate of video signals. However, excessive chroma subsampling can degrade quality for compositing, color correction, and other image-processing operations. The Apple ProRes family handles today’s popular chroma formats as follows:• 4:4:4 is the highest-quality format for preserving chroma detail. In 4:4:4 image sources, there is no subsampling, or averaging, of chroma information. There are three unique samples, either Y’, C B , and C R or R, G, and B, for every pixel location. Apple ProRes 4444 fully supports 4:4:4 image sources, from either RGB or Y’C B C R color spaces. The fourth “4” means that Apple ProRes 4444 can also carry a unique alpha-channel sample for every pixel location. Apple ProRes 4444 is intended to support 4:4:4:4 RGB+Alpha sources exported from computer graphics applications such as Motion, as well as 4:4:4 video sources from high-end devices such as dual-link HDCAM-SR.• 4:2:2 is considered a high-quality, professional video format in which the chroma values of Y’C B C R images are averaged together such that there is one C B and one C R sample, or one “C B /C R chroma pair,” for each Y’ (luma) sample. This minimal chroma subsampling has traditionally been considered adequate for high-quality compositing and color correction, although better results can be achieved using 4:4:4 sources. 4:2:2 sources are generated by many popular higher-end video camcorder formats, including DVCPRO HD, AVC-Intra/100, and XDCAM HD422/50. All Apple ProRes 422 family members fully support the chroma resolution inherent in 4:2:2 video formats.• 4:2:0 and 4:1:1 have the least chroma resolution of the formats mentioned here, with just one C B /C R chroma pair for every four luma samples. These formats are used in a variety of consumer and professional video camcorders. Depending on the quality of a camera’s imaging system, 4:2:0 and 4:1:1 formats can provide excellent viewing quality. However, in compositing workflows it can be difficult to avoid visible artifacts around the edges of a composited element. HD 4:2:0 formats include HDV, XDCAM HD, and AVC-Intra/50. 4:1:1 is used in DV. All Apple ProRes 422 formats can support 4:2:0 or 4:1:1 sources if the chroma is upsampled to 4:2:2 prior to encoding.4:2:24:4:44:2:0 (interstitial siting)4:1:1••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••Image pixel •Chroma sampleSample Bit DepthThe number of bits used to represent each Y’, C B, or C R (or R, G, or B) image sample determines the number of possible colors that can be represented at each pixel location. Sample bit depth also determines the smoothness of subtle color shading that can be represented across an image gradient, such as a sunset sky, without visible quantization or “banding” artifacts.Traditionally, digital images have been limited to 8-bit samples. 
In recent years the number of professional devices and acquisition techniques supporting 10-bit and even 12-bit image samples has increased. 10-bit imagery is now often found in 4:2:2 video sources with professional digital (SDI, HD-SDI or even HDMI) outputs. 4:2:2 video sources rarely exceed 10 bits, but a growing number of 4:4:4 image sources claim12-bit resolution, though with sensor-derived images the least significant one or two bits may have more noise than signal. 4:4:4 sources include high-end film scanners and film-like digital cameras and can include high-end computer graphics.Apple ProRes 4444 supports image sources up to 12 bits and preserves alpha sample depths up to 16 bits. All Apple ProRes 422 codecs support up to 10-bit image sources, though the best 10-bit quality will be obtained with the higher bit rate family members—Apple ProRes 422 and Apple ProRes 422 (HQ). (Note: Like Apple ProRes 4444, all Apple ProRes 422 codecs can in fact accept image samples even greater than 10 bits, although such high bit depths are rarely found among 4:2:2 or 4:2:0 video sources.)Every image or video codec can be characterized by how well it behaves in three critical dimensions: compression, quality, and complexity. Compression means data reduction, or how many bits are required compared to the original image. For image sequences or video streams, compression means data rate, expressed in bits/sec for transmission or bytes/hour for storage. Quality describes how closely a compressed image resembles the original. “Fidelity” would therefore be a more accurate term, but “quality” is the term widely used. Complexity relates to how many arithmetic operations must be computed to compress or decompress an image frame orsequence. For software codec implementations, the lower the complexity, the greater the number of video streams that can be decoded simultaneously in real time, resulting in higher performance within post-production applications.Every image or video codec design must make a tradeoff among these threeproperties. Because codecs used within professional camcorders or for professional video editing must maintain high visual quality, the tradeoff amounts to one of data rate versus performance. For example, AVCHD camcorders can produce H.264 video streams with excellent image quality at low data rates. However, the complexity of the H.264 codec is very high, resulting in lower performance for real-time video editing with multiple video streams and effects. In comparison, Apple ProRes features excellent image quality as well as low complexity, which results in better performance for real-time video editing.The following sections describe how the various Apple ProRes codecs behave and compare to one another in terms of these three important codec properties: data rate, quality, and performance .Data RateThe Apple ProRes family spans a broad range of data rates to support a variety of workflow and application purposes. This section describes how Apple ProRes data rates compare to each other and to uncompressed video. The section also illustrates how frame size and frame rate affect Apple ProRes data rates. Finally, the text includes information on the variable bit rate (VBR) nature of the Apple ProRes codec family.Properties of Apple ProRes CodecsThe bar chart below shows how the data rate of each Apple ProRes format compares to uncompressed, full-width (1920 x 1080), 4:4:4 12-bit and 4:2:2 10-bit imagesequences at 29.97 frames/sec. 
The chart shows that even the two highest-quality Apple ProRes formats—Apple ProRes 4444 and Apple ProRes 422 (HQ)—offer significantly lower data rates than their uncompressed counterparts.7501,5002,2503,000Uncompressed 12-bit 4:4:4ProRes 4444[no alpha]Uncompressed 10-bit 4:2:2ProRes 422 (HQ)ProRes 422ProRes 422 (LT)ProRes 422 (Proxy)Data Rates - Uncompressed and Apple ProRes at 1920 x 1080, 29.97 fpsM b /s The data rates shown in the bar chart above are for “full-width” (1920 x 1080) HD frames at 29.97 frames/sec. The Apple ProRes family also supports the 720p HD format at its full width (1280 x 720). In addition to full-width HD formats, Apple ProRes codecs support three different “partial-width” HD video formats used as the recording resolutions in a number of popular HD camcorders: 1280 x 1080, 1440 x 1080, and 960 x 720.The data rate of an Apple ProRes format is determined primarily by three key factors: Apple ProRes codec type, encoded frame size, and frame rate. The chart below shows some examples of how varying any one of these factors changes an Apple ProRes format’s data rate. A table of data rates for a number of Apple ProRes formats supported for real-time editing in Final Cut Pro can be found in the appendix.23.976 fps29.97 fps03875113150Data Rates - Apple ProRes 422 (LT) versus Apple ProRes 422Mb/sApple ProRes is a variable bit rate (VBR) video codec. This means that the number of bits used to encode each frame within a stream is not constant, but varies from one frame to the next. For a given video frame size and a given Apple ProRescodec type, the Apple ProRes encoder aims to achieve a “target” number of bits per frame. Multiplying this number by the frames per second of the video format being encoded results in the target data rate for a specific Apple ProRes format.Although Apple ProRes is a VBR codec, the variability is usually small. The actual data rate is usually close to the target data rate. For a given Apple ProRes format, there is also a maximum number of bits per frame that is never exceeded. This maximum is approximately 10 percent more than the target number of bits per frame. The graph below plots the actual number of bits used per frame in an example Apple ProRes video sequence.Compressed Frame Sizes - Apple ProRes 422200040006000800010000Frame0400000800000F r a m e S i z e (B y t e s )MaxTargetSequence depicted is ASC/DCI Standard Evaluation Material (StEM) Mini-Movie at 1920 x 1080.Note that for this particular sequence of over 10,000 frames, only one frame uses the maximum number of bits and most frames are clustered within a few percent of the target. However, many frames use significantly fewer bits than the target. This is because Apple ProRes encoders only add bits to a frame if doing so will produce a better match to the original image. Beyond a certain point, simple image frames, such as an all-black frame with a few words of text, incur no quality benefit if more bits are added. Apple ProRes encoders do not waste bits on any frame if adding more will not improve the fidelity.QualityWhile the ability to produce high-quality output is a key attribute of image and video codecs, it is quality preservation—or fidelity—that is the actual goal of a codec. Imagery often goes through many stages of processing prior to Apple ProRes encoding, and these stages may add visible flaws, or “artifacts,” to the images. 
If an image sequence has visible artifacts to begin with, Apple ProRes will perfectly preserve these artifacts, which can make viewers mistakenly think such flaws are caused by the Apple ProRes codec itself. The goal of every Apple ProRes family member is to perfectly preserve the quality of the original image source, be it good or bad.The quality-preserving capability of the various Apple ProRes codecs can beexpressed in both quantitative and qualitative terms. In the field of image and video compression, the most widely used quantitative measure of image fidelity is peak signal-to-noise ratio (PSNR). PSNR is a measure of how closely a compressed image (after being decompressed) matches the original image handed to the encoder. The higher the PSNR value, the more closely the encoded image matches the original. The graph below plots the PSNR value for each image frame in a test sequence for three different codecs: Apple ProRes 422 (HQ), Avid DNxHD, and Panasonic D5.PSNR Comparison - Apple ProRes, DNxHD, and D5200040006000800010000Frame4050607080L u m a P S N R (d B )Measured using ASC/DCI Standard Evaluation Material (StEM) Mini-Movie at 1920 x 1080.The next graph shows the same sequence plotted for each Apple ProRes 422 codec. As the graph shows, there is a difference in PSNR between one family member and the next. These differences correspond to the comparative data rates of the Apple ProRes 422 codecs. PSNR for Apple ProRes 422 (HQ) is 15-20 dB higher than that for Apple ProRes 422 (Proxy), but the Apple ProRes 422 (HQ) stream has nearly five times the data rate of the Apple ProRes 422 (Proxy) stream. The benefit of higher fidelity comes at the cost of larger file sizes, so it is important to select the0200040006000800010000Frame20406080L u m a P S N R (d B )Measured using ASC/DCI Standard Evaluation Material (StEM) Mini-Movie at 1920 x 1080.In addition to indicating visual fidelity, the difference in PSNR values also denotes headroom. For example, if you were to view the original sequence used in the graph above, and then view the Apple ProRes 422 (HQ) and Apple ProRes 422 encoded versions of the same stream, all three would look visually identical. However, thehigher PSNR value for Apple ProRes 422 (HQ) indicates greater quality headroom. This increased headroom means that an image sequence can be decoded and reencoded over multiple generations and still look visually identical to the original, as shown in the graph below:203040506013579Multigeneration PSNRL u m a P S N R (d B )GenerationsBecause PSNR is not a perfect measure of compressed image fidelity—there is no particular PSNR number that can absolutely guarantee that a compressed image will have no visible difference from the original—it is useful to have some qualitative description of expected image quality for each Apple ProRes codec type. Note that in the table below, the qualitative description for Apple ProRes 4444 (without an alpha channel) is identical to that for Apple ProRes 422 (HQ). This is because Apple ProRes 4444, though its target bit rate is 50 percent higher than that of Apple ProRes 422 (HQ), uses extra bits to encode the greater number of chroma samples in 4:4:4 at the same high quality headroom ensured by Apple ProRes 422 (HQ) for 4:2:2 sources.Apple ProRes Codec Visible differences (1st gen.) Quality headroomProRes 4444 Virtually never Very high, excellent for multi-gen. finishing ProRes 422 (HQ) Virtually never Very high, excellent for multi-gen. 
finishing ProRes 422 Very rare High, very good for most multi-gen. workflows ProRes 422 (LT) RareGood for some multi-gen. workflowsProRes 422 (Proxy)Subtle for high-detail imagesOK, intended for first-gen. viewing and editingPerformanceThe Apple ProRes family of codecs is designed for speed, and high speed of both encoding and decoding is essential to avoid workflow bottlenecks.Fast decoding is especially critical for multistream, real-time editing in Final Cut Pro. The Apple ProRes codec family performs exceptionally well in this regard. For each Apple ProRes codec type, the following chart shows the number of full-width, 1920 x 1080 HD streams that can be edited simultaneously in real time on aMacBook Pro computer. In practice of course, you may not often need to edit five, six, or more streams simultaneously, but this chart gives an idea of how much processing time will be available for real-time titling, effects, and so on, when just one, two, or three streams are being used.ProRes 422 (Proxy)*ProRes 422 (LT)*ProRes 422ProRes 422 (HQ)481216MacBook Pro – Final Cut Pro Multicam Streams at 1920 x 1080, 23.976 fpsNumber of simultaneous streams*The Final Cut Pro X multicam feature allows you to view up to a maximum of 16 angles simultaneously while switching or cutting angles in real time.Testing conducted by Apple in October 2012 using OS X Mountain Lion v10.8.2 and shipping 2.6GHz quad-core Intel Core i7-based 15-inch MacBook Pro with Retina display units configured with 512GB of flash storage and 8GB of RAM. Testing conducted with a prerelease version of Final Cut Pro X using 7-minute, 30-second 1920 x 1080p24 ASC-DCI Standard Evaluation Material multicam clips for each content type. MacBook Pro continuously monitors system thermal and power conditions, and may adjust processor speed as neededto maintain optimal system operation. Performance may vary depending on system configuration and content.ProRes 4444(w/o alpha)Today’s Mac notebook and desktop machines rely on multicore processing, so the speed of a fast editing decoder must scale up—meaning that decoding time per frame should decrease—as the number of processing cores increases. Many industry codec implementations “hit the wall” and do not realize further performance gains as more processors are added, but Apple ProRes codecs continue to get faster as more cores are added, as the following chart shows.1242101051520Multiprocessor Scaling – Apple ProRes 422 (HQ) at 1920 x 1080Decoding time (ms per frame; shorter is faster)N u m b e r o f p r o c e s s o r c o r e sTesting conducted by Apple in October 2012 using OS X Mountain Lion v10.8.2, a prerelease version ofFinal Cut Pro X (10.0.6), and shipping Mac Pro 3.06GHz 12-core Intel Xeon units configured with 12GB of RAM.Performance may vary depending on system configuration, content, and performance measurement tool use.6Apple ProRes decoders are designed to work especially well as high-quality,high-performance editing codecs for Final Cut Pro. Not only are they fast for decoding video at full frame size and quality, but they are even faster at decoding frames at “half-size” frame (1/2 height and 1/2 width). Especially for high-resolution formats like HD and 2K, half-size images provide plenty of onscreen detail for making editing decisions.The chart below shows that half-size decoding is substantially faster than already-fast full-size decoding, especially for the higher-quality Apple ProRes codecs. 
The faster decoding speed means more CPU time is available for decoding more streams or more real-time effects.ProRes 422 (Proxy)ProRes 422 (LT)ProRes 422ProRes 422 (HQ)ProRes 4444(w/o alpha)246Reduced Resolution Decoding Speed at 1920 x 1080Decoding time (ms per frame; shorter is faster)Testing conducted by Apple in October 2012 using OS X Mountain Lion v10.8.2 and shipping 2.6GHzquad-core Intel Core i7-based 15-inch MacBook Pro with Retina display units configured with 512GB of flash storage and 16GB of RAM. MacBook Pro continuously monitors system thermal and power conditions, and may adjust processor speed as needed to maintain optimal system operation. Performance may vary depending on system configuration, content, and performance measurement tool use.Although fast decoding speed is the primary factor in real-time editing performance, fast encoding speed is also important for key steps in post-production workflows. Like Apple ProRes decoders, the Apple ProRes family of encoders have all been built as efficient software implementations, and fast encoding is achieved through efficient use of multicore processors. Fast encoding speed is essential for some steps and important in virtually all others.For real-time capture and Apple ProRes encoding of baseband video signals (either analog or digital SD or HD signal sources), Apple ProRes software encoders must be fast enough to keep up with the incoming real-time video frames. An appropriate video capture card must be used for this purpose, but otherwise no specialized encoding hardware is required to achieve real-time capture of baseband video to Apple ProRes formats.For file-based transcoding of video files that have been encoded with other(non-Apple ProRes) video codecs, transcoding to Apple ProRes entails both decoding of the starting technique and reencoding to Apple ProRes. The minimum total transcoding time will therefore be the sum of the time required to decode the file and the time required to reencode it to Apple ProRes. For certain video codec formats known to be highly complex and therefore relatively slow to decode, such as JPEG-2000 and the REDCODE® RAW (R3D) native codec format, the overall transcoding time will be dominated by the decoding time. Still, fast Apple ProRes encoding helps make the total transcoding time faster.Fast encoding and decoding also benefits rendering and exporting. Rendering effects, as part of a creative process or the final step before output, is basically a decodeof the source media and a reencode to the chosen final output format. During the rendering process, all of the decoding, blending, and compositing steps must be precomputed before encoding to the compressed format defined in your Final Cut Pro project. Although you can choose any Apple ProRes codec as a rendering format—from Apple ProRes 422 (LT) to Apple ProRes 4444—and change it at any time during post-production, Final Cut Pro X defaults to rendering in Apple ProRes 422.When rendering to Apple ProRes, the total rendering time is determined by the speed of both the decoding and encoding steps, which can be significantly quicker compared to other, more complex and slower codecs. The speed advantage of Apple ProRes is also beneficial when exporting a file at the end of a project. 
If you need to deliver to the web, DVD, or Blu-ray disc, you can speed up the export process by choosing to edit in Apple ProRes instead of other professional formats, including uncompressed.

Apple ProRes 4444 Alpha Channel Support
In addition to supporting Y’CBCR or RGB 4:4:4 pixel data, the Apple ProRes 4444 codec type supports an optional alpha channel. The sampling nomenclature for such Y’CBCRA or RGBA images is 4:4:4:4, to indicate that for each pixel location, there is an alpha (or A) value in addition to the three Y’CBCR or RGB values. An alpha value specifies the proportion of its associated RGB or Y’CBCR pixel that should be blended with the pixel at the corresponding location of a background image, creating the effect of varying transparency for use in compositing workflows. Unlike Y’CBCR or RGB pixel values, alpha values do not represent samples of a real-world image, or even samples of a computer-generated image, both of which are intended for human viewing.

Alpha values are essentially numeric data that specify how to blend, or composite, a foreground image into a background image. For this reason, Apple ProRes 4444 encodes alpha values exactly rather than approximately. This kind of exact encoding is called “lossless” (or sometimes “mathematically lossless”) compression. It uses different encoding techniques from those used by the Apple ProRes codec family for RGB or Y’CBCR pixel values, where approximate encoding is acceptable as long as differences from the original are not visible to the viewer and do not affect processing. The Apple ProRes 4444 codec losslessly encodes alpha channel values of any bit depth up to and including 16 bits.

In summary, the Apple ProRes 4444 codec can be considered “visually lossless” for encoding the Y’CBCR or RGB pixel values intended for viewing, but “mathematically lossless” for encoding the alpha values that specify compositing. As a result, the degree of quality or fidelity is never a question for Apple ProRes 4444 alpha channels because the decoded data always matches the original perfectly.

With any kind of lossless compression, the data rate varies according to the amount of image detail being encoded. This is true of Apple ProRes 4444 lossless alpha channel compression as well. However, in practice alpha channels typically contain just the information related to object outlines, so the optional alpha channel typically adds just a few percent to the overall Apple ProRes 4444 data rate. For this reason, the presence of an alpha channel in an Apple ProRes 4444 stream typically reduces decoding and encoding performance by only about 10 percent or less.
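The blend described above is the standard “over” compositing operation. As a rough illustration (not part of the Apple ProRes specification or any Apple SDK), the following Python/NumPy sketch composites a foreground frame over a background frame using a per-pixel alpha channel; the array shapes, the 0.0-1.0 value range, and the random placeholder images are assumptions made for the example.

import numpy as np

def composite_over(fg, bg, alpha):
    """Blend a foreground image over a background image.

    fg, bg : float arrays of shape (height, width, 3), values in [0.0, 1.0]
    alpha  : float array of shape (height, width); 1.0 = fully opaque foreground

    Each output pixel is alpha * foreground + (1 - alpha) * background,
    i.e. the alpha value gives the proportion of the foreground pixel that
    is blended with the corresponding background pixel.
    """
    a = alpha[..., np.newaxis]            # broadcast alpha across the RGB channels
    return a * fg + (1.0 - a) * bg

# Hypothetical example: a 1080p graphic over a background frame.
h, w = 1080, 1920
foreground = np.random.rand(h, w, 3)      # stand-in for decoded 4:4:4 RGB pixels
background = np.random.rand(h, w, 3)
alpha = np.zeros((h, w))
alpha[400:680, 600:1320] = 1.0            # an opaque rectangle; everything else transparent
frame = composite_over(foreground, background, alpha)

Because Apple ProRes 4444 stores the alpha channel losslessly, a decoder returns exactly the alpha values that were encoded, so a composite computed this way looks the same before and after an encode/decode cycle.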
Chinese Journal of Computers, Vol. 36, 2013, Article Online No. 10

Big Data Security and Privacy Protection
FENG Deng-Guo, ZHANG Min, LI Hao
(Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)

This work was supported by the National Natural Science Foundation of China (No. 91118006). FENG Deng-Guo, born in 1965, Ph.D., research fellow; his research interests include information security and cryptography, trusted computing, and information assurance. ZHANG Min, born in 1975, Ph.D., associate research fellow; her research interests include data privacy protection and trusted computing. LI Hao, born in 1983, Ph.D., assistant research fellow; his research interests include data privacy protection and trusted computing.

Abstract: Big data has become a hot topic in academic and industrial research and is regarded as a revolution that will transform how we live, work and think. However, big data faces many security risks during collection, storage and use: privacy leakage caused by big data analysis creates serious trouble for users, while deceptive or fake information within big data can lead to incorrect or invalid analysis results. This paper analyzes the technical challenges of achieving big data security and privacy protection and surveys the key technologies that can be used to address them, together with their latest progress. The analysis also shows that big data, while introducing new security problems, is an effective means of solving information security problems and brings new opportunities to the development of the information security field.

Key words: big data; big data security; privacy protection
CLC classification number: TP309

1 Introduction
Today, the development of social informatization and networking has led to explosive growth in the amount of data generated.
The development and tendency of Big Data

Abstract: "Big data" has become the most popular IT term after the "Internet of Things" and "cloud computing". By tracing the source, development, status quo and tendency of big data, we can understand every aspect of it. Big data is one of the most important technologies in the world today, and every country is pursuing its own way of developing it.

Key words: big data; IT; technology

1 The source of big data
Although the famous futurist Toffler proposed the concept of "big data" as early as 1980, for a long time it did not receive enough attention, because the IT industry and the use of information resources were still at an early stage[1].

2 The development of big data
It was not until the financial crisis of 2008 that IBM, a multinational IT corporation, proposed the concept of the "Smart City" and vigorously promoted the Internet of Things and cloud computing; information data then grew massively, and the need for big data technology became urgent. Under these conditions, several American data-processing companies focused on developing large-scale concurrent processing systems, big data technology became available sooner, and the Hadoop massive-data concurrent processing system received wide attention. Since 2010, IT giants have launched their own products in the big data area, and large companies such as EMC, HP, IBM and Microsoft have acquired other manufacturers related to big data in order to achieve technical integration[1]. This shows how important the big data strategy has become. The development of big data owes much to large IT companies such as Google, Amazon, China Mobile and Alibaba, which needed better ways to store and analyze data. There is also demand from health systems, geospatial remote sensing and digital media[2].

3 The status quo of big data
America currently leads in big data technology and market application. In March 2012 the US federal government announced a "Big Data Research and Development" plan involving six federal departments and agencies (the National Science Foundation, the National Institutes of Health, the Department of Energy, the Department of Defense, the Defense Advanced Research Projects Agency and the Geological Survey) in order to improve the ability to extract information and insight from big data[1]. This can speed up discovery in science and engineering, and it is a major move to push research institutions toward innovation. The federal government's decision to treat big data development as a strategic priority has had a big impact on other countries.

At present, many large European institutions are still at an early stage of using big data and seriously lack big data technology; most improvements and technology come from America, so Europe faces a challenge in keeping pace with the development of big data. However, the financial services industry, and especially investment banking in London, is one of the earliest adopters in Europe; its experiments and technology are as good as those of the giant American institutions, and its investment in big data has remained strong. In January 2013 the British government announced that 1.89 million pounds would be invested in big data and energy-saving computation for earth observation and health care[3].

The Japanese government has also taken up the challenge of a big data strategy. In July 2013, Japan's communications ministry proposed a comprehensive strategy called "Energy ICT of Japan" focused on big data applications. In June 2013, the Abe cabinet formally announced a new IT strategy, the "Declaration of Creating the Most Advanced IT Nation", which comprehensively set out a national IT strategy for 2013 to 2020 centered on opening public data and developing big data[4].

Big data has also drawn the attention of the Chinese government. The State Council's "Guiding Opinions on Promoting the Healthy and Orderly Development of the Internet of Things" calls for accelerating core technologies including sensor networks, intelligent terminals, big data processing, intelligent analysis and service integration. In December 2012 the National Development and Reform Commission added data analysis software to its special guide, and at the beginning of 2013 the Ministry of Science and Technology announced that big data research would be one of the most important topics of the "973 Program"[1]. The program calls for research on the expression, measurement and semantic understanding of multi-source heterogeneous data; on modeling theory and computational models; on hardware and software system architectures driven by energy-optimal distributed storage and processing; and on the relationships among complexity, computability and processing efficiency[1]. All of this provides a theoretical basis for establishing a scientific system for big data.

4 The tendency of big data
4.1 Seeing the future with big data
At the beginning of 2008, by mining and analyzing user-behavior data, Alibaba found that the overall number of sellers was declining and that procurement orders from Europe and America were also sliding. It accurately predicted the downturn in world economic trade half a year in advance and so avoided the worst of the financial crisis[2]. Reference [3] cites an example in which a cholera outbreak was predicted a year in advance by mining and analyzing data on storms, droughts and other natural disasters[3].

4.2 Great changes and business opportunities
As the value of big data gains recognition, the giants of every industry are spending more money on it, bringing great changes and business opportunities[4]. In the hardware industry, big data poses challenges for management, storage and real-time analysis; it will have an important impact on the chip and storage industries, and new industries will be created because of it[4]. In software and services, the urgent demand for fast data processing will bring a boom to the data mining and business intelligence industries. The hidden value of big data can create many new companies, products, technologies and projects[2].

4.3 Development directions of big data
Big data storage was initially based on relational databases. Thanks to their rigorous design, friendly query language and efficient online transaction processing, relational databases dominated the market for a long time. However, their strict schema design, the functionality they give up to guarantee consistency, and their poor scalability are exposed as problems in big data analysis. As a result, NoSQL data storage models, and Bigtable proposed by Google, have come into fashion[5]. Big data analysis based on the MapReduce framework proposed by Google is used to handle large-scale concurrent batch processing; storing unstructured data in a file system loses no functionality while gaining scalability (a minimal MapReduce-style sketch follows the references below). Later, big data analysis platforms appeared, such as HAVEn from HP and FusionInsight from Huawei. Beyond doubt this situation will continue, and new technologies and approaches such as next-generation data warehouses and Hadoop distributions will keep emerging[6].

Conclusion
This paper has analyzed the development and tendency of big data. Big data is still at an early stage and many problems remain to be solved, but its commercial and market value point the direction of development for the information age.

References
[1] Li Chunwei. Development Report of China's E-Commerce Enterprises. Beijing, 2013, pp. 268-270.
[2] Li Fen, Zhu Zhixiang, Liu Shenghui. The development status and the problems of large data. Journal of Xi'an University of Posts and Telecommunications, vol. 18, pp. 102-103, Sep. 2013.
[3] Kira Radinsky, Eric Horvitz. Mining the Web to Predict Future Events. In: Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM 2013). New York: Association for Computing Machinery, 2013, pp. 255-264.
[4] Chapman A, Allen M D, Blaustein B. It's About the Data: Provenance as a Tool for Assessing Data Fitness. In: Proc. of the 4th USENIX Workshop on the Theory and Practice of Provenance. Berkeley, CA: USENIX Association, 2012: 8.
[5] Li Ruiqin, Zheng Jianguo. Big data research: status quo, problems and tendency. Network Application, Shanghai, 1994, pp. 107-108.
[6] Meng Xiaofeng, Wang Huiju, Du Xiaoyong. Big data analysis: competition and survival of RDBMS and MapReduce. Journal of Software, 2012, 23(1): 32-45.
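As a rough illustration of the MapReduce model mentioned in section 4.3, here is a minimal single-process Python sketch that counts word frequencies with explicit map, shuffle and reduce phases. It is a toy, not Hadoop or Google's implementation; the function names and the in-memory shuffle are assumptions made for the example.

from collections import defaultdict

def map_phase(document):
    # Emit a (key, value) pair for every word in one input record.
    return [(word, 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)

documents = ["big data needs new storage models",
             "mapreduce handles large scale batch processing of big data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
reduced = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(reduced["big"], reduced["data"])   # -> 2 2

In a real deployment the map and reduce calls run in parallel on many machines and the shuffle moves intermediate data across the network, which is what makes the model suitable for the large-scale concurrent batch processing described above.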
WhitepaperDeep Learning Performance Scale-outRevision: 1.2Issue Date: 3/16/2020AbstractIn this whitepaper we evaluated the training performance of Scale-out implementation with the latest software stack and compared it with the results obtained in our previous paper [0]. Using TensorFlow as the primary framework and tensorflow benchmark models, the performance was compared in terms of throughput images/sec on ImageNet dataset, at a single node and multi-node level. We tested some of the more popular neural networks architectures for this comparison and demonstrated the scalability capacity of Dell EMC PowerEdge Servers powered by NVIDIA V100 Tensor Core GPUs.RevisionsDate Description3/16/2020Initial release AcknowledgementsThis paper was produced by the following:NameVilmara Sanchez Dell EMC - Software EngineerBhavesh Patel Dell EMC - Distinguished EngineerJosh Anderson Dell EMC - System Engineer (contributor)We would like to acknowledge:❖Technical Support Team - Mellanox Technologies❖Uber Horovod GitHub Team❖Nvidia Support teamTable of ContentsMotivation (4)Test Methodology (4)PowerEdge C4140-M Details (6)Performance Results - Short Tests for Parameter Tuning (8)Performance Results - Long Tests Accuracy Convergence (16)Conclusion (17)Server Features (18)Citation (19)References (19)Appendix: Reproducibility (20)MotivationWith the recent advances in the field of Machine Learning and especially Deep Learning, it’s becoming more and more important to figure out the right set of tools that will meet some of the performance characteristics for these workloads. Since Deep Learning is compute intensive, the use of accelerators like GPU become the norm, but GPUs are premium components and often it comes down to what is the performance difference between a system with and without GPU. In that sense Dell EMC is constantly looking to support the business goals of customers by building highly scalable and reliable infrastructure for Machine Learning/Deep Learning workloads and exploring new solutions for large scale distributed training to optimize the return on investment (ROI) and Total Cost of Ownership (TCO).Test MethodologyWe have classified TF benchmark tests in two categories: short and long tests. During the development of the short tests, we experimented with several configurations to determine the one that yielded the highest throughput in terms of images/second, then we selected that configuration to run the long tests to reach certain accuracy targets.Short TestsThe tests consisted of 10 warmup steps and then another 100 steps which were averaged to get the actual throughput. The benchmarks were run with 1 NVIDIA GPU to establish a baseline number of images/sec and then increasing the number of GPUs to 4 and 8. These tests allow us to experiment with the parameter tuning of the models in distributed mode.Long TestsThe tests were run using 90 epochs as the standard for ResNet50. This criterion was used to determine the total training time on C4140-M servers in distributed mode with the best parameter tuning found in the short tests and using the maximum number of GPUs supported by the system. 
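As an aside on how a throughput figure in images/sec can be derived from the short-test methodology above (10 warmup steps followed by 100 measured steps), the Python sketch below averages per-step times after discarding the warmup. The step timings and batch size in the example are placeholders, not measurements from these tests.

def average_throughput(step_times_sec, batch_size, warmup_steps=10):
    """Return images/sec averaged over the steps that follow the warmup period.

    step_times_sec : list of per-step durations in seconds (warmup included)
    batch_size     : images processed per step on this worker
    """
    measured = step_times_sec[warmup_steps:]            # drop warmup steps
    images_per_step = [batch_size / t for t in measured]
    return sum(images_per_step) / len(images_per_step)

# Hypothetical example: 110 recorded steps at batch size 256 on one worker.
steps = [0.30] * 10 + [0.25] * 100                      # placeholder timings, in seconds
print(round(average_throughput(steps, batch_size=256))) # ~1024 images/sec per worker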
In the section below, we describe the setup used, and Table 1gives an overall view on the test configuration.Testing SetupCommercial Application Computer Vision - Image classificationBenchmarks code▪TensorFlow Benchmarks scriptsTopology ▪ Single Node and Multi Node over InfiniBandServer ▪ PowerEdge C4140-M (4xV100-16GB-SXM2)Frameworks▪ TensorFlow with Horovod library for Distributed Mode[1] Models ▪ Convolutional neural networks: Inception-v4, vgg19,vgg16, Inception-v3, ResNet-50 and GoogLeNetBatch size ▪ 128-256GPU’s▪ 1-8Performance Metrics▪ Throughput images/second▪ Training to convergence at 76.2% TOP-1 ACCURACY Dataset ▪ ILSVRC2012 - ImageNetEnvironment ▪ DockerSoftware StackThe Table shows the software stack configuration used to build the environment to run the tests shown in paper [0] and the current testsSoftware Stack Previous Tests Current TestsTest Date February 2019 January 2020OS Ubuntu 16.04.4 LTS Ubuntu 18.04.3 LTSKernel GNU/Linux 4.4.0-128-genericx86_64GNU/Linux 4.15.0-69-genericx86_64nvidia driver 396.26 440.33.01CUDA 9.1.85 10.0 cuDNN 7.1.3 7.6.5 NCCL 2.2.15 2.5.6 TensorFlow 1.10 1.14 Horovod 0.15.2 0.19.0 Python 2.7 2.7 Open MPI 3.0.1 4.0.0 Mellanox OFED 4.3-1 4.7-3 GPUDirect RDMA 1.0-7 1.0-8Single Node - Docker Container TensorFlow/tensorflow:nightly-gpu-py3nvidia/cuda:10.0-devel-ubuntu18.04Multi Node -Docker Container built from nvidia/cuda:9.1-devel-ubuntu16.04nvidia/cuda:10.0-devel-ubuntu18.04Benchmark scripts tf_cnn_benchmarks tf_cnn_benchmarksDistributed SetupThe tests were run in a docker environment. Error! Reference source not found. 1 below shows the different logical layers involved in the software stack configuration. Each server is connected to the InfiniBand switch; has installed on the Host the Mellanox OFED for Ubuntu, the Docker CE, and the GPUDirect RDMA API; and the container image that was built with Horovod and Mellanox OFED among other supporting libraries. To build the extended container image, we used the Horovod docker file and modified it by adding the installation for Mellanox OFED drivers [2]. It was built from nvidia/cuda:10.0-devel-ubuntu18.04Figure 1: Servers Logical Design. Source: Image adapted fromhttps:///docs/DOC-2971Error! Reference source not found.2 below shows how PowerEdge C4140-M is connected via InifniBand fabric for multi-node testing.Figure 2: Using Mellanox CX5 InfiniBand adapter to connect PowerEdge C4140 in multi-nodeconfigurationPowerEdge C4140-M DetailsThe Dell EMC PowerEdge C4140, an accelerator-optimized, high density 1U rack server, is used as the compute node unit in this solution. The PowerEdge C4140 can support four NVIDIA V100 Tensor Core GPUs, both the V100-SXM2 (with high speed NVIDIA NVLink interconnect) as well as the V100-PCIe models.Figure 3: PowerEdge C4140 ServerThe Dell EMC PowerEdge C4140 supports NVIDIA V100 with NVLink in topology ‘M’ with a high bandwidth host to GPU communication is one of the most advantageous topologies for deep learning. Most of the competitive systems, supporting either a 4-way, 8-way or 16-way NVIDIAVolta SXM, use PCIe bridges and this limits the total available bandwidth between CPU to GPU. 
See Table 2 with the Host-GPU Complex PCIe Bandwidth Summary.ConfigurationLink Interface b/n CPU-GPU complex Total BandwidthNotesM4x16 Gen3128GB/sEach GPU has individual x16 Gen3 to Host CPUTable 2: Host-GPU Complex PCIe Bandwidth SummarySXM1SXM2SXM4SXM3CPU2CPU1x16x16x16x16UPI Configuration MX16 IO SlotX16 IO SlotFigure 4: C4140 Configuration-MNVLinkPCIePerformance Results - Short Tests for Parameter Tuning Below are the results for the short tests using TF 1.14. In this section we tested all the models in multi node mode and compared the results obtained with TF 1.10 in 2019. Throughput CNN Models TF 1.10 vs TF 1.14The Figure 5 several CNN models comparing results with TF 1.10 vs TF 1.14. In Figure 6 we notice that the performance gain is about 1.08X (or 8%) between the two releases.Figure 5: Multi Node PowerEdge C4140-M – Several CNN Models TF 1.10 vs TF 1.14Figure 6 Multi Node PowerEdge C4140-M – Several CNN Models TF 1.10 vs TF 1.14 (Speedup factor)Performance Gain with XLASince there was not much performance gain with the basic configuration, we decided to explore the limits of GPU performance using other parameters. We looked at XLA (Accelerated Linear Algebra) [3], by adding the flag –xla=true at the script level. By default, the TensorFlow graph executor “executes” the operations with individual kernels, one kernel for the multiplication, one kernel for the addition and one for the reduction. With XLA, these operations are “fused” in just one kernel; keeping the intermediate and final results in the GPU, reducing memory operations, and therefore improving performance.See below Figure 7 for results and Figure 8 for speedup factors across the models. The models inception-v4, inception-v3 and ResNet-50 showed much better performance using XLA, with speedup factors from 1.35X up to 1.43X. Since the ResNet-50 model is most widely used, we used it to continue the rest of the tests.Figure 7: Multi Node PowerEdge C4140-M. Several CNN Models TF 1.10 vs TF 1.14 + XLAFigure 8: Multi Node PowerEdge C4140-M. Several CNN Models TF 1.10 vs TF 1.14 + XLA (Speedupfactor)ResNet-50’s Performance with TF 1.14 + XLAIn this section, we evaluated the performance of ResNet-50 model trained with TF 1.14 and TF 1.14 with XLA enabled. The tests were run with 1 GPU, 4 GPUs, and 8 GPUs and the results were compared with those obtained for version TF 1.10 from our previous paper [0]. Also, we explored the performance using batch size of 128 and 256. See Figure 9 and Figure 10.Figure 9: Multi Node PowerEdge C4140-M. ResNet-50 BS 128 TF 1.10 vs TF 1.14 vs TF 1.14 + XLAAs we saw in the previous section ResNet-50 with batch size 128 with 8 GPUs had a performance gain of ~3% with TF 1.10 vs TF 1.14, and ~35% of performance gain with TF 1.10 vs TF 1.14 with XLA enabled, see Figure 9. On the other hand, ResNet-50 with batch size 256 with 8 GPUs had a performance gain of ~2% with TF 1.10 vs TF 1.14, and ~46% of performance gain with TF 1.10 vs TF 1.14 with XLA enabled, see Figure 10. Due to the higher performance of ResNet-50 with batch size 256, we have selected it to further optimize performance.Figure 10: Multi Node PowerEdge C4140-M. ResNet-50 BS 256 TF 1.10 vs TF 1.14 vs TF 1.14 + XLAResNet-50 with TF 1.14 + XLA + GPUDirect RDMAAnother feature explored in our previous paper was GPUDirect RDMA which provides a direct P2P (Peer-to-Peer) data path between GPU memory using a Mellanox HCA device between the nodes. 
In this test, we enabled it by adding the NCCL flag – x NCCL_NET_GDR_LEVEL=3 at the script level (this variable replaced the variable NCCL_IB_CUDA_SUPPORT in NCCL v 2.4.0). NCCL_NET_GDR_LEVEL variable allows you to control when to use GPUDirect RDMA between a NIC and a GPU. Example level 3 indicates to use GPUDirect RDMA when GPU and NIC are on the same PCI root complex [4].Figure 11: Multi Node PowerEdge C4140-M. ResNet-50 with TF 1.14 + XLA + GPUDirect RDMA Figure 11shows the results of ResNet-50 with TF 1.14 w/XLA enabled, with and without GPUDirect RDMA. We did not observe much performance gains using GPUDirect RDMA across nodes i.e. the performance remained the same and hence we did not explore it further in our testing. This is not to say that GPUDirect RDMA does not help when using scale-out, all we are saying is we did not see the performance gains; hence we are not exploring it further in this paper.ResNet-50’s configuration for better performanceFigure 12 summarizes the results of different configurations explored for the short tests and based on the tests we found that the best performance was achieved using the combination below:ResNet-50 + BS 256 + TF 1.14 + XLA enabledFigure 12: Multi Node PowerEdge C4140-M - ResNet-50’s Configuration for Best PerformanceResNet-50’s Scale-outPowerEdge C4140 using Nvidia 4x NVLink architecture scales relatively well when using Uber Horovod distributed training library and Mellanox InfiniBand as the high-speed link between nodes. It scales ~3.9x times within the node and ~6.9x using scale-out for ResNet-50 with batch size 256. See Figure 13.Figure 13: Multi Node PowerEdge C4140-M - ResNet-50’s Scale-out vs Scale-upFigure 14: Multi Node PowerEdge C4140-M vs CompetitorThe above benchmarks shown in Figure 14 are done on 2 servers C4140 x4 V100 GPUs, each connected by Mellanox ConnectX-5 network adapter with 100Gbit/s over IPoIB. The Dell EMC distributed mode with Horovod achieved 85% of scaling efficiency for ResNet-50 batch size 256 compared with the ideal performance; on the other hand, it achieved 95% of scaling efficiency versus a test run by TF team on 2018 with a VM (virtual machine) instance on GCP(Google cloud) with 8x V100 GPUs and batch size=364 [5].Performance Results - Long Tests Accuracy Convergence Our final tests were to determine the total training time for accuracy convergence with the latest tensorflow version.In this section we decided to include all the batch sizes we tested in our previous paper and compared it with ResNet-50 using batch size 256.Figure 15shows the total training time achieved when running ResNet-50 with different batch sizes and both versions of tensorflow (TF 1.10 vs TF 1.14 with XLA enabled).On average using TF 1.14 +XLA was ~1.3X faster than our previous tests.Figure 15: Multi Node PowerEdge C4140-M - ResNet-50’s Long Training for Accuracy Conv.Conclusion•The performance with TF 1.14 among the models was just slightly superior (~1%-8%) versus TF 1.10. On the other hand, TF 1.14 with XLA boosted the performance up to~46% among the models ResNet-50, Inception-v3 and Inception-v4.•In the case of ResNet-50 model, its performance improved up to ~ 3% with TF 1.14, and up to ~46% with TF 1.14 and XLA enabled. 
ResNet-50 batch size 256 scaled better(1.46X) versus ResNet-50 BS 128 (1.35X).•The configuration with the highest throughput (img/sec) was ResNet-50 batch size 256 trained with distributed Horovod + TF 1.14 + XLA enabled.•Dell EMC PowerEdge C4140 using Nvidia 4x NVLink architecture scales relatively well when using Uber Horovod distributed training library and Mellanox InfiniBand as the high-speed link between nodes. It scale-out ~3.9X times within the node and ~6.9X across nodes for ResNet-50 BS 256.•On average, the training time for the long tests to reach accuracy convergence were ~ 1.3 X faster using distributed Horovod + TF 1.14 + XLA enabled.•There is a lot of performance improvement being added continuously either at the GPU level, library level or framework level. We are continuously looking at how we can improve our performance results by experimenting with different hyper parameters.•TensorFlow in multi-GPU/multi-node with Horovod Distributed and XLA support improve model performance and reduce the training time, allowing customers to do more with no additional hardware investment.Server FeaturesCitation@article {sergeev2018horovod,Author = {Alexander Sergeev and Mike Del Balso},Journal = {arXiv preprint arXiv: 1802.05799},Title = {Horovod: fast and easy distributed deep learning in {TensorFlow}}, Year = {2018}}References•[0] https:///manuals/all-products/esuprt_solutions_int/esuprt_solutions_int_solutions_resources/servers-solution-resources_white-papers52_en-us.pdf•[1] Horovod GitHub, “Horovod Distributed Deep Learning Training Framework” [Online].Available: https:///horovod/horovod•[2] Mellanox Community, “How to Create a Docker Container with RDMA Accelerated Applications Over 100Gb InfiniBand Network” [Online]. Available:https:///docs/DOC-2971•[3] TensorFlow, “XLA: Optimizing Compiler for Machine Learning” [Online] Available: https:///xla•[4] NCCL 2.5, “NCCL Environment Variables” [Online]. Available:https:///deeplearning/sdk/nccl-developer-guide/docs/env.html#nccl-ib-cuda-support•[5] TensorFlow, “Pushing the limits of GPU performance with XLA” [Online]. Available: https:///tensorflow/pushing-the-limits-of-gpu-performance-with-xla-53559db8e473Appendix: ReproducibilityThe section below walks through the setting requirements for the distributed Dell EMC system and execution of the benchmarks. 
Do this for both servers:•Update Kernel on Linux•Install Kernel Headers on Linux•Install Mellanox OFED at local host•Setup Password less SSH•Configure the IP over InfiniBand (IPoIB)•Install CUDA with NVIDIA driver•install CUDA Toolkit•Download and install GPUDirect RDMA at the localhost•Check GPUDirect kernel module is properly loaded•Install Docker CE and nvidia runtime•Build - Horovod in Docker with MLNX OFED support•Check the configuration status on each server (nvidia-smi topo -m && ifconfig && ibstat && ibv_devinfo -v && ofed_info -s)•Pull the benchmark directory into the localhost•Mount the NFS drive with the ImageNet dataRun the system as:On Secondary node (run this first):$ sudo docker run --gpus all -it --network=host -v /root/.ssh:/root/.ssh --cap-add=IPC_LOCK -v /home/dell/imagenet_tfrecords/:/data/ -v/home/dell/benchmarks/:/benchmarks -v /etc/localtime:/etc/localtime:ro --privileged horovod:latest-mlnxofed_gpudirect-tf1.14_cuda10.0 bash -c "/usr/sbin/sshd -p 50000; sleep infinity"On Primary node:$ sudo docker run --gpus all -it --network=host -v /root/.ssh:/root/.ssh --cap-add=IPC_LOCK -v /home/dell/imagenet_tfrecords/:/data/ -v/home/dell/benchmarks/:/benchmarks -v /etc/localtime:/etc/localtime:ro --privileged horovod:latest-mlnxofed_gpudirect-tf1.14_cuda10.0•Running the benchmark in single node mode with 4 GPUs:$ python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --device=gpu --data_format=NCHW --optimizer=sgd --distortions=false --use_fp16=True --local_parameter_device=gpu --variable_update=replicated --all_reduce_spec=nccl --data_dir=/data/train --data_name=imagenet --model=ResNet-50 --batch_size=256 --num_gpus=4 --xla=true•Running the benchmark in multi node mode with 8 GPUs:$ mpirun -np 8 -H 192.168.11.1:4,192.168.11.2:4 --allow-run-as-root -xNCCL_NET_GDR_LEVEL=3 -x NCCL_DEBUG_SUBSYS=NET -x NCCL_IB_DISABLE=0 -mcabtl_tcp_if_include ib0 -x NCCL_SOCKET_IFNAME=ib0 -x NCCL_DEBUG=INFO -xHOROVOD_MPI_THREADS_DISABLE=1 --bind-to none --map-by slot --mca plm_rsh_args "-p 50000" python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --device=gpu --data_format=NCHW --optimizer=sgd --distortions=false --use_fp16=True --local_parameter_device=gpu --variable_update=horovod --horovod_device=gpu --datasets_num_private_threads=4 --data_dir=/data/train --data_name=imagenet --display_every=10 --model=ResNet-50 --batch_size=256 --xla=True。
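To show what the Horovod-distributed, XLA-enabled training described in this paper boils down to, here is a minimal TensorFlow 1.x sketch with a toy model standing in for ResNet-50. The random input data, layer sizes and learning rate are placeholders; the Horovod calls (hvd.init, DistributedOptimizer, BroadcastGlobalVariablesHook) and the XLA JIT option follow the standard TF 1.14 / Horovod 0.19 APIs, but this is an illustrative sketch rather than the tf_cnn_benchmarks script itself.

import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()                                              # one process per GPU

# Pin each Horovod process to a single local GPU and enable XLA JIT fusion.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

# Toy model standing in for ResNet-50: small random images and labels.
features = tf.random.uniform([256, 32, 32, 3])          # per-GPU batch size 256
labels = tf.random.uniform([256], maxval=1000, dtype=tf.int32)
logits = tf.layers.dense(tf.layers.flatten(features), 1000)
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# Scale the learning rate by the number of workers and wrap the optimizer so
# gradients are averaged across GPUs/nodes (NCCL allreduce under the hood).
global_step = tf.train.get_or_create_global_step()
opt = hvd.DistributedOptimizer(tf.train.GradientDescentOptimizer(0.01 * hvd.size()))
train_op = opt.minimize(loss, global_step=global_step)

hooks = [hvd.BroadcastGlobalVariablesHook(0),           # sync initial weights from rank 0
         tf.train.StopAtStepHook(last_step=110)]        # e.g. 10 warmup + 100 measured steps

with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
    while not sess.should_stop():
        sess.run(train_op)

Launched under mpirun with one process per GPU, as in the multi-node command above, hvd.size() reflects the total number of GPUs across nodes, which is how the same script scales from 4 GPUs in one PowerEdge C4140 to 8 GPUs across two servers.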
HOW TO WRITE A WHITE PAPER
All set to write a white paper? If you are still preparing and searching for ways to compile a company's deep knowledge of its industry and present it to your users as a white paper, this article may help you with white paper writing.
White papers are useful for highlighting the expertise of a company. They are undoubtedly one of the most valuable tools of marketing. But what is the right way to make a white paper? You can take some cues from business reports, since white papers are quite similar to them. To write a flawless one, you must know what a white paper is, why people need one, and what goes into writing it. If you are eager to know all these things, read on.
What is a white paper?
A white paper is a detailed guide to, or report on, a particular topic and the problems it involves. It is written to inform the audience and help them understand the topic, its problems, and the solutions. That is the layman's definition; a more structured one from the world of marketing would go like this: a white paper is a piece of content resembling an eBook. The opinions and facts included in white papers are supported by research and statistics derived from reliable sources, and they include visuals as well, such as tables, images, graphs, and charts. A white paper is generally longer, more detailed, and more technical than an eBook.
Backstory of the white paper
The term "white paper" traces its origins to government-issued documents in England. One widely cited example is the Churchill White Paper of 1922, which was commissioned by Winston Churchill. In business, especially in the B2B, consulting, and financial sectors, white papers are used to communicate a company's philosophy on a certain topic and to make the case for the merits of its product. White papers read more like editorial than other forms of content: their deep foundation of research gives them an authoritative tone, which is why they promote thought leadership.
What is a white paper used for?
Businesses prepare white papers with two major intentions:
* To record expertise
* To market themselves
White papers are aimed at an audience outside the business, which is why they act as a medium for attracting new readers to the company as potential consumers by providing them with industry knowledge.
A white paper is not a sales pitch
Every writer must know that a white paper can never be a sales pitch. The two are written in entirely different ways. Yes, both ultimately serve to sell the company, but white papers don't market in the typical way: they showcase the expertise of the business through valuable recommendations and internal know-how. Unlike a sales pitch, they don't bid for the business.
Examples:
* White paper: Online Advertising: Making Marketing Cheap yet Effective
* Sales pitch: How XYZ Marketing Is Saving Users' Budget on Advertising
Types of white papers
There are various types of white papers a company can publish:
* Backgrounder: Offers a detailed explanation of the benefits of a company's service, product, or methodology.
* Problem-solution: Walks the reader through the solution to an issue that is prevalent in the industry.
If you are a student and want to know about other types of white papers, you can get online assignment help; experts on such sites can tell you about them.
How to select a white paper topic
It is tough to choose the right kind of topic.
Here are a few factors you can consider while choosing a topic:
* Audience: Keep your target audience in mind and try to ascertain their preferences.
* Expertise: The topic should let you highlight the expertise of your business.
* Problem-based: Address a specific problem related to your business or industry.
* Solution-focused: It should propose or recommend a solution to the problem.
How to prepare for white paper writing
Research
A white paper topic has to be researched comprehensively. You can dig up information from industry resources, online references, and internal documents. You don't have to follow strict citation rules, but do cite data that you didn't know prior to your research, and keep in mind that readers' confidence tends to grow with a solid list of sources. All your sources should come from authoritative websites; to make a valuable document, you will have to cite credible origins. You can also get help with custom research paper writing.
Read white papers
It is always a good idea to read others' work before writing your own. By doing so, you can pick up ideas on how to make your content more engaging, and you can decide what you don't want to include in your own piece. So find white papers written on your topic on the web, go through them, and determine the knowledge gaps they leave. Use your reading as an opportunity to improve on existing content.
Use organizational tools
Isn't it great to keep track of all the sources, content, and ideas involved in writing a white paper? An organizational tool such as a mind map lets you do this easily; such tools allow the writer to connect multiple pieces of information into one visual overview. There is no need to spend extra money on an organizer; you can use free tools like FreeMind, which are simple to use.
How to format a white paper
The white paper format is generally a standard document. The order of content is often similar to other business documents, but with one major difference:
* Many business communications, such as business proposals or technical reports, place conclusions at the beginning of the document. A white paper places the summary or conclusion at the end.
* This order reflects the audience's preferences for how they receive information.
* The content and research inform the audience; in a white paper, these are the two factors that build their understanding of the problem.
* The last section offers the constructive moment, where the audience receives solutions backed by the evidence presented in the document.
* The reader's journey through a business report and through a white paper is different, and the placement of the major findings follows suit.
* If you are unsure of these distinctions or lack business writing skills, you can turn to online report writers.
Whatever the journey, your document must be easy to read and understand, with informative headings and easy navigation overall. Here are the basics for drafting each section:
Title
It must be appealing, as users are far more likely to click on a catchy title than a plain, boring one. It is crucial, then, to think of a good title: say clearly what readers will get from your content, and entice them with it. Don't use the phrase "white paper" in the title unless required; audiences have been observed to seek authoritative indicators, so understand the preferences of your potential readers.
Abstract
This is the section most users turn to first.
It gives a brief overview of the major points of the white paper. Here, readers confirm that they have found a relevant document that serves their needs; after reading the abstract, they should feel they are in the right place.
Problem statement
The problem statement explains the problem your white paper is addressing. The main issue has to be defined and put into context so that readers understand it.
Background
In this section, you offer the background information readers need to grasp both the issue and the solution. The contents may be detailed or high-level, broad or technical, depending on the reader and the problem. If original research was done for the white paper, you must also describe the methods.
Solution
Here comes the moment the reader has been waiting for. On the basis of the preceding material, you present the solution, arguing for it and developing it with the aggregated evidence and the expertise of the author and their company.
Conclusion
You already know what the conclusion is for: summarize the major findings of your white paper and present recommendations based on the solutions provided.
References
All sources used to develop the white paper should be aggregated and cited here. References add validity to your white paper, and this is the section where readers find material for further research. Follow a citation format such as MLA or APA, as appropriate for your industry.
What not to do while preparing a white paper
Knowing everything about white papers is good if you are a beginner, but before you begin writing one, there are certain mistakes you should know about so that you can avoid them. Some common mistakes are listed here; each has the potential to turn your effort and dedication into a wasted venture. Let us quickly go through them.
Making it a sales pitch
Yes, white papers are used in marketing campaigns where businesses present their products and services, but that does not mean a white paper should look like a sales pitch. If you turn a white paper into a sales pitch, you are getting it all wrong, and you will instantly turn your readers off. The audience comes to white papers in search of informative pieces written without bias; the paper should help them by providing information, not try to persuade them to buy a service or product. A sales pitch is a different thing altogether; save it for product brochures and other content.
Lack of in-depth research
You have already read that white papers must be intensely researched documents. Carrying out valuable, authentic research takes a lot: apt research often needs collective effort, it may at times fall beyond the budget of a marketing team, and a few statistics pulled from the first page of a simple Google search will not do any good. Lengthy research is time-consuming; searching through scholarly works and collecting relevant statistics will surely take time, but the results will be worth the effort. If you want your white paper to have its intended effect, establish your content as an authoritative source, one your readers can trust and want to return to. In any case, don't compromise on research.
Poor design
The design of a white paper is a broad topic; though we cannot elaborate much here, it is worth a brief mention. All writers, newbies or pros, must focus on the written content of the white paper, because it matters a lot.
But if you neglect design, even perfect content will go unnoticed. Failing to put effort into a strong design can hurt your white paper badly; a good design, on the other hand, makes your salient points stand out, and your readers will easily understand what is presented in the paper. Visuals such as videos, charts, images, and graphs that support your argument are important; you can find plenty of examples online of how people make efficient use of visuals in design.
Keeping it vague
White papers do their marketing indirectly: they serve to offer facts and information without directly persuading anyone, yet their essence remains attracting potential customers. To draw people toward a brand, a white paper must not be boring, so don't make it vague. Try telling a story. Problem-solution papers, backgrounders, and research findings all have a story; make your readers aware of it through your narrative, because they will not make it through the whole piece if there is nothing to keep them connected. Set up a problem and discuss its solution, and weave in some success stories; it is a proven formula for engaging content.
Keeping it abstract
Because most white papers involve sharing research findings, it is easy to leave those findings in the realm of theory without explaining how to apply them on a practical level. This is mostly true of backgrounders but can happen with problem-solution white papers as well. Don't stop at presenting the research; describe how the findings can be used in the real world.
Takeaway
Preparing a perfect white paper is not a two-minute job. Time, skill, and effort are the key ingredients you need to create a valuable and appealing document. When you are creating something that conveys a company's essence, knowledge, contribution to society, and role in the industry, you must focus on all four pillars of a white paper: research, format, design, and content. One stellar white paper can increase business opportunities to a great extent.
The Scientific Data Submission Workflow
In scientific research, the collection, processing, and analysis of data are critically important. However, as data volumes grow, managing and sharing data effectively becomes especially important. To this end, many academic publishers and data repositories offer data submission services, enabling researchers to share and exchange data more conveniently while following established norms and standards. This article describes the scientific data submission workflow in detail.
1. Choosing a suitable data repository
First, researchers need to select a suitable data repository for their submission. Repositories come in many types, including general-purpose and domain-specific repositories. General-purpose repositories such as figshare and Dryad support data storage and sharing across disciplines, while domain-specific repositories manage and store data for a particular field, such as GenBank in biology and the Global Biodiversity Information Facility in ecology.
When choosing a repository, researchers should consider several factors: 1. the repository's reputation and influence, so that the data gain broad recognition and use; 2. its disciplinary coverage, so that the data can be used by researchers in the relevant field; 3. its data management norms and standards, which safeguard data quality and readability; 4. its fees and openness, which determine how openly the data can be shared.
2. Preparing the data manuscript
After selecting a suitable repository, researchers need to prepare the data manuscript. The data manuscript should include the following: 1. Data description: a brief introduction to the data, including its source, collection method, and processing steps; 2. Data format: the format and encoding of the data, so that other researchers can read and use it correctly; 3. Data files: the data organized in an appropriate format, with their completeness and accuracy verified; 4. Data metadata: metadata for the dataset, including creation date, modification date, creator, and copyright information (a minimal example is sketched after this list); 5. Data use license: the license agreement and terms of use, so that other researchers can use the data legally.
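As an illustration only (the field names below are hypothetical; a real submission must follow the metadata schema required by the repository you selected), a minimal metadata record might look like this:

```python
# A toy metadata record for a data submission. Field names are illustrative;
# real submissions must follow the chosen repository's metadata schema.
import json

metadata = {
    "title": "Example dataset title",
    "description": "Brief account of how the data were collected and processed",
    "creators": ["Author A", "Author B"],
    "created": "2024-01-15",
    "modified": "2024-03-02",
    "format": "CSV (UTF-8)",
    "license": "CC BY 4.0",
}

print(json.dumps(metadata, indent=2))
```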
When preparing the data manuscript, researchers should pay attention to the following: 1. follow the chosen repository's data management norms and standards to ensure data readability and quality; 2. describe and annotate the data thoroughly so that other researchers can understand its meaning and intended use.
Big data is a term used to describe the large volumes of structured and unstructured data, beyond the capacity of traditional data processing applications, that inundate a business on a day-to-day basis. Here are some key points to consider when discussing big data in an English essay:
1. Definition and scope: Begin by defining big data and explaining its scope. Big data is characterized by the three Vs: Volume (the sheer amount of data), Velocity (the speed at which data is generated and processed), and Variety (the range of data types and sources).
2. Origin and growth: Discuss the origins of big data and how it has grown exponentially with the advent of the internet, social media, and the proliferation of digital devices.
3. Data sources: Identify various sources of big data, such as social media platforms, online transactions, sensors, and machine-to-machine communication.
4. Technological advancements: Explain how technological advancements, such as cloud computing, Hadoop, and NoSQL databases, have facilitated the storage, processing, and analysis of big data.
5. Applications: Describe the applications of big data in different industries, including healthcare, finance, retail, and government. For example, big data can be used for personalized medicine, fraud detection, customer behavior analysis, and smart city management.
6. Data analytics: Discuss the importance of data analytics in making sense of big data, highlighting the role of predictive analytics, machine learning, and artificial intelligence in extracting insights from large datasets.
7. Challenges: Address the challenges associated with big data, such as data privacy, security, data quality, and the need for skilled professionals to manage and analyze the data.
8. Ethical considerations: Discuss the ethical implications of big data, including issues related to surveillance, discrimination, and consent.
9. Future prospects: Contemplate the future of big data, including the potential for real-time analytics, the Internet of Things (IoT), and the integration of big data with other emerging technologies like blockchain.
10. Conclusion: Summarize the main points and emphasize the transformative impact of big data on society, the economy, and decision-making processes.
When writing your essay, provide concrete examples and case studies to illustrate your points, use a clear and logical structure, and cite credible sources to support your arguments.
Keeping Pace with the Evolution of Multi-Service Applications
Overview
Applications have evolved from business support programs into the core business of the enterprise. The era in which all users connected to a single, massive, centralized site to access information is gone for good. Applications have become business-critical and have extended their reach to global markets. They also adopt next-generation technologies such as VoIP and XML and, with embedded web services that support multi-tier execution, have become fully stateful, which raises their complexity.
For enterprises deploying such multi-service applications, ensuring application availability and manageability is the key to staying in this business. Downtime and heavy management overhead not only cause lost revenue but can also lead to significant delays in delivering products and services, damaging credibility with customers. As the sites hosting multi-service applications grow in both number and complexity, the end-customer experience must evolve with them. In this way, the decline in business volume and revenue caused by customer dissatisfaction can be reversed.
Zona Research reports that poor web performance causes more than US$25 billion in economic losses every year. For financial firms, downtime leads to lost revenue and delivery delays, and can bring severe penalties for violating government regulations. The threat of DNS attacks, together with the lack of centralized management for multi-service applications, can affect an enterprise's productivity and profits.
This white paper describes how F5's BIG-IP® Global Traffic Manager addresses the challenges of multi-service application architectures to achieve maximum flexibility, reduce infrastructure costs and management overhead, and improve user performance for a better end-user experience.
The Challenge
Applications are no longer monolithic systems running on large servers and delivering content to users. They have evolved into collections of distinct, cooperating, and complex services. These services are typically stateful, interdependent, and distributed across geographically dispersed sites. An enterprise that can only manage individual services cannot deliver multi-service applications. Some of the challenges enterprises face in delivering multi-service applications include:
▪ Lack of visibility into application state – In a multi-service application environment, enterprises lack the tools to track the state of each service and the relationships between services, so they cannot accurately determine the state of these applications. When a service fails, some enterprises perform failover manually, which requires tracking the interrelated services and shutting them down one by one. Besides the high cost of downtime, this approach requires extensive manual work and is especially error-prone when an enterprise has a large number of applications and services.
An Academic English Article on Big Data
Big Data: Challenges and Opportunities in the Digital Age.
Introduction.
In the contemporary digital era, the advent of big data has revolutionized various aspects of human society. Big data refers to vast and complex datasets generated at an unprecedented rate from diverse sources, including social media platforms, sensor networks, and scientific research. While big data holds immense potential for transformative insights, it also poses significant challenges and opportunities that require thoughtful consideration. This article aims to elucidate the key challenges and opportunities associated with big data, providing a comprehensive overview of its impact and future implications.
Challenges of Big Data.
1. Data Volume and Variety: Big data datasets are characterized by their enormous size and heterogeneity. Dealing with such immense volumes and diverse types of data requires specialized infrastructure, computational capabilities, and data management techniques.
2. Data Velocity: The continuous influx of data from various sources necessitates real-time analysis and decision-making. The rapid pace at which data is generated poses challenges for data processing, storage, and efficient access.
3. Data Veracity: The credibility and accuracy of big data can be a concern due to the potential for noise, biases, and inconsistencies in data sources. Ensuring data quality and reliability is crucial for meaningful analysis and decision-making.
4. Data Privacy and Security: The vast amounts of data collected and processed raise concerns about privacy and security. Sensitive data must be protected from unauthorized access, misuse, or breaches. Balancing data utility with privacy considerations is a key challenge.
5. Skills Gap: The analysis and interpretation of big data require specialized skills and expertise in data science, statistics, and machine learning. There is a growing need for skilled professionals who can effectively harness big data for valuable insights.
Opportunities of Big Data.
1. Improved Decision-Making: Big data analytics enables organizations to make informed decisions based on comprehensive data-driven insights. Data analysis can reveal patterns, trends, and correlations that would be difficult to identify manually.
2. Personalized Experiences: Big data allows companies to tailor products, services, and marketing strategies to individual customer needs. By understanding customer preferences and behaviors through data analysis, businesses can provide personalized experiences that enhance satisfaction and loyalty.
3. Scientific Discovery and Innovation: Big data enables advancements in various scientific fields, including medicine, genomics, and climate modeling. The vast datasets facilitate the identification of complex relationships, patterns, and anomalies that can lead to breakthroughs and new discoveries.
4. Economic Growth and Productivity: Big data-driven insights can improve operational efficiency, optimize supply chains, and create new economic opportunities. By leveraging data to streamline processes, reduce costs, and identify growth areas, businesses can enhance their competitiveness and contribute to economic development.
5. Societal Benefits: Big data has the potential to address societal challenges such as crime prevention, disease control, and disaster management. Data analysis can empower governments and organizations to make evidence-based decisions that benefit society.
Conclusion.
Big data presents both challenges and opportunities in the digital age.
The challenges of data volume, velocity, veracity, privacy, and the skills gap must be addressed to harness the full potential of big data. However, the opportunities for improved decision-making, personalized experiences, scientific discoveries, economic growth, and societal benefits are significant. By investing in infrastructure, developing expertise, and establishing robust data governance frameworks, organizations and individuals can effectively navigate the challenges and realize the transformative power of big data. As the digital landscape continues to evolve, big data will undoubtedly play an increasingly important role in shaping the future of human society and technological advancement.
An English Essay on Big Data
Big data is everywhere in today's world. From social media to business analytics, big data is being used to analyze and understand trends, patterns, and behaviors. It has become an essential tool for companies and organizations to make informed decisions and predictions.
The sheer volume of data being generated every day is mind-boggling. With the advent of the internet and connected devices, we are constantly producing and consuming data at an unprecedented rate. This flood of data presents both opportunities and challenges for businesses and individuals alike.
One of the key benefits of big data is its ability to uncover hidden insights and correlations that would otherwise go unnoticed. By analyzing large datasets, businesses can gain valuable insights into customer behavior, market trends, and operational efficiency. This can lead to more targeted marketing campaigns, better customer service, and improved decision-making.
However, the use of big data also raises important ethical and privacy concerns. As companies collect and analyze vast amounts of personal data, there is a risk of misuse and abuse. It is crucial for businesses to prioritize data security and privacy to maintain the trust of their customers and the general public.
In conclusion, big data has revolutionized the way we understand and interact with the world. Its potential for driving innovation and improving decision-making is immense, but it also comes with significant responsibilities. As we continue to navigate the era of big data, it is essential to strike a balance between harnessing its power and protecting the rights and privacy of individuals.
A 500-Word English Essay on Big Data
Big data refers to the massive amount of structured and unstructured data that is generated from various sources such as social media, sensors, and transactions. It has become a buzzword in recent years as it has the potential to revolutionize industries and bring about significant changes in our daily lives.
One of the key benefits of big data is its ability to provide valuable insights and predictions. By analyzing large datasets, businesses can gain a deeper understanding of customer behavior, market trends, and operational efficiency. For example, online retailers can use big data to analyze customer browsing and purchase history to personalize product recommendations. This not only improves customer satisfaction but also increases sales.
Big data also plays a crucial role in healthcare. By analyzing patient records, medical professionals can identify patterns and trends that can help in early disease detection and prevention. This can lead to better treatment outcomes and potentially save lives. Additionally, big data can be used to track the spread of infectious diseases and develop effective strategies for containment.
Another area where big data is making a significant impact is transportation. By analyzing data from sensors, cameras, and GPS devices, cities can optimize traffic flow, reduce congestion, and improve public transportation systems. For instance, predictive analytics can be used to anticipate traffic jams and reroute vehicles accordingly, saving commuters valuable time and reducing carbon emissions.
Furthermore, big data has transformed the field of finance. Banks and financial institutions can use data analytics to detect fraudulent activities and assess creditworthiness. This not only protects customers from identity theft and financial fraud but also helps businesses make informed lending decisions.
An English Essay on the Advantages and Disadvantages of Big Data
Advantages of big data:
* Improved decision making: Big data provides organizations with the ability to analyze vast amounts of data to identify patterns, trends, and outliers. This enables them to make data-driven decisions that are more likely to lead to positive outcomes.
* Enhanced customer service: Big data allows companies to gain a deep understanding of their customers' preferences, behaviors, and needs. This information can be used to provide personalized and targeted customer experiences, leading to increased satisfaction and loyalty.
* Operational efficiency: Organizations can use big data to optimize their operations, reduce costs, and improve productivity. By analyzing data on processes, resource utilization, and performance, they can identify areas for improvement and implement changes accordingly.
* Innovation and product development: Big data empowers companies to gain insights into new markets, explore new product opportunities, and develop innovative solutions that meet the evolving needs of customers.
* Competitive advantage: Access to and effective utilization of big data can provide organizations with a competitive edge in their respective industries. By leveraging data to make better decisions, improve customer experiences, and drive innovation, they can gain market share and stay ahead of the competition.
Disadvantages of big data:
* Data privacy and security concerns: The collection and storage of vast amounts of personal and sensitive data raise concerns about privacy and security. Companies need to implement robust measures to protect data from unauthorized access, theft, or misuse.
* Data complexity and management challenges: Big data often involves complex and unstructured data, which can be challenging to manage and analyze. Organizations need specialized tools and expertise to process, store, and extract meaningful insights from big data.
* Cost and resource requirements: Implementing and maintaining big data infrastructure can be expensive. Organizations need to invest in hardware, software, and highly skilled professionals to effectively harness the value of big data.
* Ethical and societal implications: The use of big data has ethical and societal implications that need to be considered. For example, data-driven decisions can have unintended consequences or lead to biases if not used responsibly.
* Skill and knowledge gap: The field of big data is rapidly evolving, requiring specialized skills and knowledge to effectively utilize and interpret data. Organizations may face challenges finding and retaining qualified professionals with the necessary expertise.
An English Essay on Big Data Predictive Analytics
With the rapid development of technology, big data has become an important tool for predicting and analyzing various aspects of our lives. In this essay, I will discuss its impact on prediction and analysis in several fields.
Firstly, big data has revolutionized the way we predict and analyze trends in various fields. With the help of big data, businesses can now analyze customer behavior and predict future trends in the market. This has helped businesses make informed decisions and stay ahead of the competition. Similarly, big data is also being used in the healthcare industry to predict and analyze the spread of diseases, which has proven to be crucial in controlling and preventing outbreaks.
Furthermore, big data has also played a significant role in predicting and analyzing weather patterns. By collecting and analyzing large amounts of data from various sources, meteorologists can now make more accurate predictions about the weather, which has proven to be crucial in preparing for natural disasters and extreme weather events.
In addition, big data has also revolutionized the way we predict and analyze human behavior. With the help of big data, researchers can now analyze large amounts of data to predict and understand human behavior in various situations. This has proven to be crucial in understanding and addressing social issues such as crime and poverty.
Moreover, big data has also played a significant role in predicting and analyzing the impact of human activities on the environment. By collecting and analyzing large amounts of data, environmentalists can now predict and analyze that impact, which has proven to be crucial in developing strategies to protect and preserve the environment.
In conclusion, big data has revolutionized the way we predict and analyze various aspects of our lives. With the help of big data, businesses can make informed decisions, researchers can understand human behavior, meteorologists can make more accurate weather predictions, and environmentalists can develop strategies to protect the environment. As technology continues to advance, it is clear that big data will continue to play a crucial role in predicting and analyzing various aspects of our lives.
A Graduation Project Topic on Python Data Visualization
In this article, we explore a graduation project topic on Python data visualization. Through data visualization, large volumes of complex data can be presented in an intuitive, easy-to-understand way, helping people better understand and analyze the data. Below is one such topic.
Topic: Design and Implementation of a Python-Based Flight Data Visualization and Analysis System
Abstract: This graduation project aims to design and implement a Python-based flight data visualization and analysis system. Flight data are very large and complex, and data visualization makes them easier to understand and analyze. The system will collect and process flight data and then use Python data visualization libraries to present them. This will help airlines, airport authorities, and passengers better understand and analyze flight data.
1. Introduction
a. Research background and significance
In the air transport industry, flight data are critically important. Airlines and airport authorities need information such as on-time rates, delays, and passenger flows to improve operational efficiency and the passenger experience. However, flight data are usually very large and complex, and traditional data analysis methods often cannot process and interpret them effectively. Data visualization has therefore become an important tool for helping people better understand and analyze flight data.
b. Research objectives
The main objective of this graduation project is to design and implement a Python-based flight data visualization and analysis system. The system will collect flight data and use Python data visualization libraries to present and analyze them.
2. System design and implementation
a. Data collection and processing
Early in the system design, we need to determine which data sources to collect flight data from and design the corresponding data processing pipeline. Common data sources include airline databases, real-time airport feeds, and flight tracking websites. The collected data may be unstructured or semi-structured, so they need to be cleaned and transformed for later analysis and visualization.
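As a rough illustration of this step (the file name and column names are hypothetical placeholders, since the actual sources have not been fixed yet), a first cleaning pass in pandas might look like this:

```python
# Minimal cleaning pass over raw flight records.
# The CSV file and column names are hypothetical placeholders.
import pandas as pd

raw = pd.read_csv("flights_raw.csv")

clean = raw.dropna(subset=["flight_no", "scheduled_departure"]).copy()
clean = clean.drop_duplicates(subset=["flight_no", "scheduled_departure"])
clean["scheduled_departure"] = pd.to_datetime(clean["scheduled_departure"])
clean["delay_minutes"] = pd.to_numeric(clean["delay_minutes"], errors="coerce")

clean.to_csv("flights_clean.csv", index=False)  # feeds the visualization step
```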
b. Data visualization and analysis
The system will use Python data visualization libraries such as Matplotlib, Seaborn, and Plotly to visualize and analyze the flight data. Depending on the requirements, different chart types can be used, such as line charts, bar charts, and scatter plots. Interactive visualization tools, such as the interactive charts provided by Plotly, can also be used so that users can explore the data interactively.
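As a minimal sketch of one chart the system might produce (the CSV file and the airline / delay_minutes columns are hypothetical placeholders carried over from the cleaning step above), a bar chart of average delay per airline could be generated like this:

```python
# Average departure delay per airline from the cleaned records.
# File name and column names are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

flights = pd.read_csv("flights_clean.csv")
avg_delay = (flights.groupby("airline")["delay_minutes"]
                    .mean()
                    .sort_values(ascending=False))

avg_delay.plot(kind="bar", figsize=(8, 4), title="Average delay by airline")
plt.ylabel("Delay (minutes)")
plt.tight_layout()
plt.savefig("avg_delay_by_airline.png")  # or plt.show() for interactive use
```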
May 11, 2009
William Anthony Mannino
A More Accurate Return-on-Investment Diagnosis: Why Healthcare Organizations Are Adopting an ROI Culture
Healthcare organizations and the physicians they support are vital to the well-being of the human race, yet, when it comes to the balance sheet, many of these entities are facing real emergencies of their own. Costs for medicine, equipment and indigent care are increasing. At the same time, reimbursements are going down.
After cutting expenses to the bare bones, where else can these organizations look to seek out savings? Many technological solutions come with the promise of increasing efficiencies and reducing costs. But how can organizations really be sure where to invest their limited funds? Sometimes it takes a specialist to identify the best possible cure. That's why healthcare organizations are turning to an independent third party to assess their individual situations – and identify, quantitatively, which software and process improvements bring the greatest return on investment (ROI).
The idea of a full ROI assessment is fairly typical when organizations look at clinical systems, which is also one of the first places organizations look to cut costs. But examining ROI across the board can provide healthcare organizations with a viable, accurate game plan – and clearly identify which technology investments will either increase revenue or significantly reduce expenses. Instead of basing decisions on hearsay, the results of similar healthcare operations or another CEO's opinion, these organizations can plan their projects based on scientific analysis of their individual situations, conducted by an unbiased healthcare expert.
By adopting an ROI culture, healthcare organizations can strategically tie software solutions to their individual business goals, and experience true results. An ROI assessment can also prevent unnecessary expenditures, or those that simply can't deliver the result level necessary to justify the expense. In some cases, it uncovers an area of waste and inefficiency that was never acknowledged before.
Dissecting the Benefits of Workforce Management Systems
Consider, for example, workforce management systems. Although clinical management systems are typically the focal point for any medical organization, the big savings – and the greatest amount of waste – can often be found in the labor pool. In most organizations, the largest internal cost is people – easily topping millions of dollars each year. Even a one percent to five percent reduction of total payroll costs – without layoffs – would deliver significant financial benefits. The key is having the right people in the right place at the right cost.
In most situations, an assessment reveals one or two significant areas of bleeding. For example, an organization may be overusing agency personnel, or have a large occurrence of unearned overtime. It may be paying an unnecessary shift differential that exceeds the expected cost. In other cases, employees may be "buddy punching," handing off badges for automated sign-in by one employee while the others are parking the car or getting a cup of coffee. Those five or 10 minutes add up over a year, often to staggering totals. Many times, employees – knowingly or unknowingly – work the system, seeking out an extra 15 minutes here or there. Again, this undetected incremental time adds up to tens of thousands of dollars. Then, there's the incidence of plain old human error, which, in a typical organization, averages close to five percent.
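To get a feel for the scale of this kind of leakage, here is a back-of-the-envelope calculation. All of the inputs are hypothetical placeholders, not figures from any actual assessment; a real engagement would use the organization's own payroll data.

```python
# Rough annual cost of a few "lost" minutes per shift across an hourly workforce.
# Every input below is a hypothetical placeholder.
employees = 2000          # hourly staff on the payroll
minutes_per_shift = 10    # unearned or mis-recorded time per employee per shift
hourly_cost = 30.00       # average fully loaded hourly labor cost, USD
shifts_per_year = 250

annual_leakage = employees * (minutes_per_shift / 60) * hourly_cost * shifts_per_year
print(f"Estimated annual payroll leakage: ${annual_leakage:,.0f}")
```

Even small per-shift amounts compound quickly at this scale, which is why the assessments described below look at payroll first.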
During an ROI assessment, an experienced team goes into a healthcare organization and first takes a long, hard look at payroll. This figure is broken down by percentage of time spent on overtime, agencies and shift differential – as well as productive versus nonproductive time and total payroll cost for an eight-hour period. This figure is compared to benchmarks, culled from an extensive database of comparative data. From here, the team creates a savings forecast using logical variances to pinpoint the projected results that would occur if the organization implemented the new technology and processes, and began using these in an optimal way. For example, an integrated, self-service staff scheduling system could reduce information technology and agency costs, while ensuring planned staffing meets actual patient demand. Dashboard analytics tools could more effectively identify staffing trends, and provide a consistent relational view of labor and its relation to revenue. Any technology implementation would also require behavior change throughout the organization – and a plan of action, tied to the organization's long-term goals.
The second part of the ROI assessment – the forecast – underscores the importance of using an independent third party for this type of evaluation. Chances are, the actual software company, hoping to make a sale, would present the highest savings or returns logically possible. On the other hand, independent, software-agnostic third parties, such as ACS, calculate the ROI, then take between two percent and 20 percent of that calculation to present as the projected return. That means organizations – and their boards – can know, beyond a shadow of a doubt, what they are getting back for every dollar they are spending.
The third part of the assessment involves measuring actual results. These actuals are typically measured three months and nine months after go-live, and include a cost average of holiday and non-holiday payrolls. Because of the conservative projections presented, most organizations realize greater-than-anticipated results.
As an example, at a recent ACS engagement for a 2,000-employee hospital in the South, ROI for a new workforce management system was projected at one percent of annual payroll. The actual savings were two percent, or $1.5M, based on the first six months of measurement collected after project implementation.
In another engagement, a 40,000-employee healthcare system in the Northeast went ahead with its workforce management system after an ROI forecast of one percent of annual payroll. The actual savings, after only six months of utilizing the new technology, are trending to 1.5 percent, or $12 million. In addition, this ACS client was able to write off $200,000 of vacation liability due to the increased tracking of used personal time off (PTO) – an area the assessment identified as one of concern.
Just as important, a more efficient workforce management system worked to increase the level of patient care.
Checking the Pulse of Enterprise Resource Management Solutions
Another area of potential high returns involves enterprise resource management (ERM). In this type of ROI assessment, the team evaluates the healthcare organization's current procurement and inventory control processes, discount structures and price of goods versus specific benchmarks.
In essence, they look at how the organization currently manages its assets and resources, identify areas for improvement and project the hard benefits that could be realized by implementing a more robust ERM system.
Sometimes these assessments show the benefits of greater supplier integration, or uncover areas where the organization is simply paying too much for specific items. Other times, the greatest savings come from employing more self-service procurement technology enterprise-wide to decentralize the inventory management process. Many healthcare organizations benefit from employing handheld technology for a more accurate, real-time supply record, which not only prevents outages, but also reduces the incidence of unnecessary orders.
Adopting an ROI Culture for Ongoing Fiscal Health
As healthcare organizations continue to face ongoing economic challenges, accurate ROI projections have gone from a "nice to have" to standard operating procedure in organizations of all sizes, nationwide. The goal is to make the ROI mindset a part of the organizational culture, with ongoing programs that achieve and reinforce the idea of knowing, for every dollar spent in effort, materials, time or people, exactly what the organization is getting back. If that return is one dollar or less, then that area requires evaluation – and ultimately, change.
ROI-focused organizations share similar behaviors beyond the need for projected returns. They have written policies that are always adhered to, to ensure they get the most out of their technology investment. Workflow and procedures go hand in hand with technological returns. At the same time, processes are in place to monitor productivity, as well as time and attendance, and overtime. The use of biometrics and other fraud-prevention tools is either current or under consideration. Just as important, ROI success is shared throughout the organization, to positively reinforce the behavioral change.
For the past 20 years, ACS has partnered with healthcare organizations to enable them to achieve their goals. In today's economy, our experience with ROI is more vital than ever. We have a unique, comprehensive, proven methodology and a staff of former healthcare executives who understand the industry and the intricate challenges it faces. In just four weeks, these experts can provide healthcare organizations with a clear understanding of where their points of pain really are, and which technological systems can bring the rapid relief they need. Healthcare organizations are experts at giving patient care. Now it's time for them to ensure they're getting the most from their technological investments. An ROI assessment, and the adoption of an ongoing ROI culture, can breathe new life into a dollar-stretched organization – and get it on the road to recovery.
You can learn more about us at .
An English Essay Introducing the Impact of Big Data
Title: The Impact of Big Data: Revolutionizing the Future
In the digital age, the advent of big data has ushered in a new era of innovation and transformation across various sectors. Big data, characterized by its vast volume, high velocity, and diverse variety, has profoundly influenced numerous aspects of our lives, ranging from business and healthcare to education and governance. This essay delves into the multifaceted impact of big data, exploring its implications and opportunities for the future.
One of the most significant effects of big data lies in its capacity to revolutionize decision-making processes. Through advanced analytics and predictive modeling, big data enables organizations to derive actionable insights from massive datasets in real time. By leveraging these insights, businesses can make informed strategic decisions, optimize operations, and gain a competitive edge in the market. For instance, retailers can analyze customer purchasing patterns to personalize marketing campaigns, leading to higher customer engagement and increased sales revenue.
Furthermore, big data plays a crucial role in driving innovation and fostering technological advancements. The wealth of data generated from various sources, including social media, sensors, and online transactions, serves as a valuable resource for research and development. Machine learning and artificial intelligence algorithms can analyze this data to uncover hidden patterns, facilitate product innovation, and fuel the creation of disruptive technologies. For example, in the healthcare sector, big data analytics is instrumental in drug discovery, disease diagnosis, and personalized medicine, leading to improved patient outcomes and enhanced healthcare delivery.
Moreover, big data has transformative implications for society as a whole, particularly in the realm of governance and public policy. Government agencies can harness big data analytics to enhance decision-making processes, improve service delivery, and address societal challenges more effectively. By analyzing data related to transportation, urban planning, and public health, policymakers can develop evidence-based policies and interventions that better meet the needs of citizens. Additionally, big data enables greater transparency and accountability in governance by providing insights into government operations and expenditures, thereby fostering public trust and participation in democratic processes.
However, alongside its myriad benefits, big data also raises important ethical, privacy, and security concerns. The collection and analysis of vast amounts of personal data raise questions about individual privacy rights and data protection. Moreover, the potential for data breaches and cyberattacks poses significant risks to data security and confidentiality. As such, it is imperative for organizations and policymakers to implement robust data governance frameworks, security measures, and ethical guidelines to safeguard against misuse and abuse of data.
In conclusion, big data represents a transformative force that is reshaping the way we live, work, and interact with the world around us. From enabling data-driven decision-making and fostering innovation to enhancing governance and public policy, the impact of big data is profound and far-reaching. However, realizing the full potential of big data requires a concerted effort to address ethical, privacy, and security concerns, ensuring that its benefits are equitably distributed and responsibly managed.
As we continue to harness the power of big data, we have the opportunity to unlock new possibilities and create a more prosperous and sustainable future for generations to come.
5 Reasons to Choose Oracle Big Data Appliance
According to IDG Enterprise, 78 percent of enterprises agree that big data analysis will fundamentally transform the way they do business over the next one to three years. The challenge is to extract value from data, both at speed and at scale. All too often, complexity stands between you and true big data value. So, isn't it time you considered running your business on Oracle Big Data Appliance?
Japan's Nara Institute of Science and Technology (NAIST) wanted to centralize its campus-wide, structured and unstructured research data from Hadoop, and chose Oracle Big Data Appliance to integrate and process massive volumes of data. Thanks to Oracle Big Data Appliance, NAIST gained powerful, integrated information processing and security capabilities for its advanced research platform.
"Oracle Big Data Appliance gives us unprecedented insight into game, business, and market health. The possibilities are endless with this highly extensible solution that enables us to gather, analyze, and use data in ways that were simply not possible before."
Craig Fryar, Head of Wargaming Business Intelligence, Wargaming
"We strive to be the world's pre-eminent connected car and broader internet of things integrator, and Oracle is a critical partner on our journey. Oracle inherently delivers an end-to-end stack in the cloud, from Infrastructure as a Service to Platform as a Service and beyond."
Romil Bahl, CEO, Lochbridge
Using Oracle Big Data Cloud Service and Oracle Database Cloud Service, Lochbridge was able to assimilate and analyze massive amounts of data to help its clients monetize that data and bring IoT innovations to market more quickly.
For more reasons to choose Oracle Big Data Appliance, visit the Oracle Big Data Appliance page and download our white paper: The Surprising Economics of Engineered Systems for Big Data.
Big Data — It's Not Just for Google Any More
The Software and Compelling Economics of Big Data Computing
EXECUTIVE SUMMARY
Big Data holds out the promise of providing businesses with differentiated competitive insights. But those insights, protected by a healthy level of technical complexity, require large-scale data consumption and an innovative approach to the analysis of large public and private data sets. Currently, production Big Data implementations routinely process petabytes and even exabytes of data on a daily basis. Indeed, many of the technologies and implementation strategies for Big Data originated with web companies struggling with unprecedented volumes of data. At the same time, the economics of cloud computing — predicated on declines in the cost of data storage, bandwidth and computing time — point inevitably to the promise of Big Data rapidly becoming a reality for even comparatively small companies.
Table of Contents
Executive Summary
Introduction
Data Science and the Big Data Opportunity
The Competitive Edge of Commodity Hardware: In the Cloud or in the Enterprise
Architecting for Both Sides of the Data Equation
Hadoop Architecture Checklist
Summary
INTRODUCTION
Few emerging technologies embody both the potential and the complexity of the digital landscape as clearly as Big Data. Data resides in data sets around the globe in amounts that were all but inconceivable even ten years ago. The explosion of data has resulted from numerous sources, such as the proliferation of consumers using mobile digital data creation devices — from digital cameras to smart phones — and the ubiquitous internet-connected digital monitoring and recording devices used in the public and private sectors and throughout global industry. Moreover, this new data, by virtue of ubiquity, mobility, and traceability, is often rich in metadata that can include its creator's location, identity, purpose, and the data's time and medium of creation.
DATA SCIENCE AND THE BIG DATA OPPORTUNITY
The availability of this new universe of data and the emergence of analytic technologies and methodologies capable of combing through it has had wide impact. One consequence has been that, in science and medicine, the use of empiricism, rather than experimentation, is a growing trend, enabled by large data sets that provide statistical insights. Similarly, successful machine language translation is statistics-based, benefitting from the availability of large data sets.
Another consequence of Big Data has been the growing demand for software skill sets in areas that both enable, and benefit from, Big Data, such as:
> Machine Learning
> Statistics
> Networking
In a classic case of "necessity is the mother of invention", large web companies sought to capitalize on the valuable information they were accumulating by computing at web scale. As a result, web companies have contributed various technologies to the emerging "Big Data" software stack. We'll look more closely at the role web companies have played in Big Data technology later in this paper.
THE COMPETITIVE EDGE OF COMMODITY HARDWARE: IN THE CLOUD OR IN THE ENTERPRISE
The technologies emerging from web companies would not alone be enough to sustain real growth in Big Data were it not for accompanying economic trends. Chief among these is the decline in the cost of bandwidth to a level at or about $150 per TB transferred and falling. To put that in perspective, streaming a movie in 1998 at Netflix's current encoding bitrate would have cost $270 per movie.
Today their cost is $.05.[1]
Scenario: Retrieve and catalog all of the images on the internet
How does the scenario play out in the world of Big Data? Consider a hypothetical application designed to retrieve and catalog all of the images on the internet. As part of the process, the application will convert each image to a 100x100 thumbnail, compute an MD5 hash on each image, extract EXIF data and store it when available, and then push the generated catalog to a central location.
Let's estimate the cost per billion images to create an image catalog of all images on the net. For the purposes of this exercise, we'll assume that the technologies used include:
> Nutch (an Apache open-source search engine based on Hadoop)
> Hadoop
> Amazon EC2
> Amazon Elastic Block Store (a web service for storage with better performance characteristics than S3)
We'll use the following cost estimate methodology. We'll randomly select approximately 70,000 domain names, crawl the home page of each, and extract all images from the crawled pages. Next, we'll randomly select 100,000 images from the gathered set. Then, we will retrieve and process the selected images as described above. From the costs determined in this exercise, we can extrapolate the cost per billion.
Crunching the numbers[2]
First, let's look at numbers associated with computing and bandwidth capacity. Low-end EC2 instances can successfully retrieve and process about 100,000 images per hour. Once processed, the resulting image data within the catalog averages approximately 2 KB compressed. Further, we find that it takes about 10,000 hours of computing time to retrieve 1 billion images, with a storage requirement of about 2 TB of data per billion.
We therefore find that the server cost per billion images is $850, because low-end EC2 instances sell for $.085 per hour.
We also find that the storage cost is $0, because the data will not be stored by Amazon but will be pushed to a site owned by the developer.
Next, bandwidth costs inbound are $0, since inbound bandwidth on Amazon is currently free. However, we can assume $750 per billion images in the future at current bandwidth prices. Bandwidth costs outbound are about $300 per billion, noting the cost of $.15 per GB with 1 billion images yielding 2 TB of data.
The resulting total cost, then, is about $1,200 per billion images, rising to as much as $2,000 per billion if Amazon starts charging for inbound bandwidth.
Some additional considerations are that data acquisition at a rate of 1 billion images per month requires about fifteen servers. Low-end EC2 instances can push about 30 Mbps of bandwidth, which, multiplied by fifteen servers, puts us at about 450 Mbps of bandwidth consumed to retrieve 1 billion images per month.
Cost in context[3]
What if we had performed this same exercise in 2000? To begin, there was no cloud-computing option, so we would have had to actually purchase and provision fifteen rack-mounted servers at a cost of about $75,000. Costs associated with a data center infrastructure to support the servers, such as racks, routers and switches, would have been, conservatively, $250,000. The cost of 450 Mbps of bandwidth per month would have run about $300,000.
In short, the costs would have been not only $325,000 in up-front capital investment, but also some $300,000 per month in ongoing bandwidth costs.
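The arithmetic behind these estimates can be captured in a few lines. The sketch below simply re-derives the figures quoted above (2010-era EC2 and bandwidth prices plus the rough year-2000 numbers); none of the prices are current.

```python
# Rough cost model for cataloging 1 billion images, using the figures above.
IMAGES = 1_000_000_000
IMAGES_PER_INSTANCE_HOUR = 100_000      # throughput of a low-end EC2 instance
EC2_PRICE_PER_HOUR = 0.085              # USD, 2010-era low-end instance price
CATALOG_BYTES_PER_IMAGE = 2_000         # ~2 KB of catalog data per image
OUTBOUND_PRICE_PER_GB = 0.15            # USD per GB transferred out

instance_hours = IMAGES / IMAGES_PER_INSTANCE_HOUR        # 10,000 hours
compute_cost = instance_hours * EC2_PRICE_PER_HOUR        # ~$850
outbound_gb = IMAGES * CATALOG_BYTES_PER_IMAGE / 1e9      # ~2,000 GB (2 TB)
bandwidth_cost = outbound_gb * OUTBOUND_PRICE_PER_GB      # ~$300
print(f"cloud cost per billion images: ~${compute_cost + bandwidth_cost:,.0f}")
# (the paper rounds this to about $1,200 per billion)

# The same job provisioned on-premise circa 2000: 15 servers plus a month of
# 450 Mbps bandwidth at 2000-era prices.
capex_2000 = 15 * 5_000 + 250_000       # servers + data-center build-out
bandwidth_2000 = 300_000                # per month
print(f"year-2000 cost: ~${capex_2000:,.0f} up front plus ~${bandwidth_2000:,.0f}/month")
```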
This puts the cost at about $500,000 just to retrieve 1 billion images processed.
ARCHITECTING FOR BOTH SIDES OF THE DATA EQUATION
Although the cost of working with Big Data declines continuously, working with Big Data still requires us to solve at least two complex software challenges. First, we must find an effective way to harness existing computing power so that it can process Big Data-scale data sets. Second, we must find a way to serve the data that differs fundamentally from current, standard database serving approaches.
Big Data Processing
As with many of the tools and methodologies enabling Big Data, the solutions to the processing software challenge emerged from web-based companies. In 2004, Google released a paper entitled MapReduce: Simplified Data Processing on Large Clusters[4], arguing that the MapReduce programming pattern can be used to solve a surprising number of data processing problems, and that such a pattern can be used as a way to dynamically scale the execution of MapReduce programs across an arbitrarily large cluster of machines.
An open source, Java-based implementation of MapReduce known as Hadoop, released by Doug Cutting, developer of Lucene and Nutch, among other things, was adopted as an Apache Foundation project and has become a widely used platform and runtime environment for the deployment of Big Data applications.
Hadoop provides a scalable runtime environment that handles most of the complexities associated with doing large-scale data processing jobs across large clusters. Hadoop includes both a distributed file system and a distributed runtime for map/reduce execution. Hadoop deploys code, partitions and organizes data, splits jobs into multiple tasks, schedules those tasks taking into account data locality and network topology, and handles failures and their associated restarts. Developers who implement the appropriate map/reduce interfaces in their code receive the benefit of exploiting the compute and I/O capacity of large clusters transparently to their own implementation. In a sense, Hadoop gives them horizontal scalability for free.
Hadoop has become a platform on which developers have layered other kinds of interfaces. Solutions exist, for instance, that accept SQL input and dynamically generate map/reduce jobs that run on Hadoop to extract data stored within Hadoop's distributed file system. Generalized workflow engines have been built on Hadoop. These kinds of systems are built to simplify the ability of end-users to take advantage of the scalable architecture and Big Data capacity of a platform like Hadoop without requiring them to master the MapReduce pattern on which Hadoop is based.
Although Hadoop is an outstanding infrastructure for Big Data processing, it is not a suitable platform for data serving. Hadoop isn't designed to respond to fine-grained network requests arriving from end-users who expect low-latency responses. Serving for Big Data, therefore, requires a different solution set.
Big Data Serving
Just as Big Data calls for new approaches to computing, it also requires new approaches to data serving. Today, single web sites routinely serve user communities larger than the entire population of whole continents. Low-latency responses are a bellwether of web site quality, since users don't like, and will generally not tolerate, long wait times when connecting to web sites. Businesses therefore require highly available, fast-responding sites in order to compete, but no single server offers sufficient capacity for web-scale computing.
As a result, web site operators have embraced a scale-out architecture. Scale-out architectures generally involve the use of highly distributed software architectures in conjunction with the deployment of fleets of commodity servers.
The fundamental challenge in Big Data serving is to facilitate low-latency responses from enormous data repositories. Because traditional databases are often unable to scale to the levels required by the largest web sites, a number of alternative approaches have emerged. One approach often used in conjunction with traditional databases is to use a distributed, in-memory cache alongside an RDBMS. These distributed caches (e.g. memcached[5]) can cache query results and are designed to significantly reduce database overhead by reducing the number of queries. Facebook has contributed significantly to projects such as memcached.[6]
Beyond merely improving existing open-source products, some web companies have open-sourced entirely original software packages. Facebook has released a product called Cassandra through the Apache Foundation. Cassandra is described as "a distributed storage system for managing very large amounts of structured data spread out across many commodity servers."[7] Cassandra provides a low-latency persistent database that scales across many machines.
One aspect of achieving low-latency responses in highly distributed systems is to minimize the coordinated transactional overhead of updates and maximize read performance. In that vein, Cassandra implements a persistence model of "eventual consistency". This capitalizes on a key characteristic of many Big Data applications, which is that they do not need immediate consistency across all server instances.
SUMMARY
As the amount of data being stored around the globe continues to rise and the cost of technologies that enable the extraction of meaningful patterns from that data continues to fall, more and more companies can benefit from the power of Big Data. The current generation of Big Data technologies — as exemplified by Hadoop — represents a relatively early incursion into the Big Data universe. But few would argue that, even today, the benefits of Big Data may constitute a game-changing competitive advantage for any business that is prepared to tap into it.
[1] http:///the_business_of_online_vi/2010/01/bandwidth-pricing-trends-cost-to-stream-a-movie-today-five-cents-cost-in-1998-270.html. Comcast 25 mbps of home bandwidth would have cost the consumer $25k per month in 1998.
[2] Results of tests run on Amazon's EC2 platform by Advanced Micro Devices, Inc., using Amazon's current pricing (found at /ec2/#pricing). At the time of the testing, inbound bandwidth at Amazon was free of charge.
[3] Bandwidth costs were extrapolated from various data regarding bandwidth prices a decade ago (e.g. /news/2000/0626carrier.html). The server costs were a conservative estimate assuming rack-mounted server costs of $5k per server.
[4] /external_content/untrusted_dlcp//en/us/papers/mapreduce-osdi04.pdf
[5] /
[6] /note.php?note_id=39391378919
[7] /projects/ladis2009/papers/lakshman-ladis2009.pdf