Caltech Market Data Guide
Last Updated: 02/14/2017 - Added information about new and exciting on campus resources!
Due to frequent inquiries regarding research ready market data, we have put together a quick guide to answer common questions.
On-Campus Resources (Current Students Only)
New! Center for Research in Security Prices Dataset (CRSP) - This dataset is now available through Caltech Libraries. This dataset is provided by the University of Chicago Booth School of Business and is relatively comprehensive, providing more than just equities data. It is not exactly "user friendly", and data typically requires a lot of post-processing effort before it can be made research ready. It doesn't have the ease of use of QuantQuote datasets, but it is available for free to ALL Caltech students.
Caltech Historical Stock Database (CHSD) – This database contains minute resolution data for over 1000 US equities and the most active ETFs. Price information is split and dividend adjusted, making it research ready with little or no additional effort. The database is stored on the Caltech Quant Group computing cluster and is available for local access to all current members of the Caltech community. Update: This dataset is no longer maintained; we recommend that Caltech students use the on-campus CRSP resource instead.
Group license to the QuantQuote Cloud – This database contains tick resolution data for the entire US equities market dating back to 1998, along with split and dividend adjustments. The Caltech username is quantclub, and full dataset specifications can be found here. To get access, please contact us for the password; access is typically limited to group members engaged in active research.
Bloomberg Terminal - Available via the Caltech Investment Office, this resource is not managed by the Caltech Quant Group.
Free Historical Data
For alumni and non-Caltech users, there is a wide selection of stock market data available for free. Below is a comprehensive list compiled by group members.
Daily Resolution Data
1) Yahoo! Finance – Daily resolution data with split/dividend adjustments can be downloaded from here. The download procedure can be automated using this tool. Note that Yahoo's database quite frequently contains errors and does not include data for delisted symbols.
2) QuantQuote Free Data – QuantQuote offers free daily resolution data for the S&P 500 at this web page under the Free Data tab. The data accounts for symbol changes, splits, and dividends, and is largely free of the errors found in the Yahoo data. Note that only 500 symbols are available, unlike Yahoo, which provides all listed symbols.
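To make the split/dividend adjustments mentioned above concrete, here is a minimal Python sketch of one common back-adjustment scheme. It is not the method used by Yahoo or QuantQuote (neither is described in this guide). The function name `back_adjust`, the input layout, and the assumption that dividends are quoted per post-split share are all illustrative.

```python
def back_adjust(closes, splits=None, dividends=None):
    """Return split- and dividend-adjusted closing prices.

    closes:    list of (date_string, raw_close), sorted oldest first
    splits:    {ex_date: ratio}, e.g. 2.0 for a 2-for-1 split
    dividends: {ex_date: cash_per_share}, assumed quoted in post-split shares
    """
    splits = splits or {}
    dividends = dividends or {}
    adjusted = [None] * len(closes)
    factor = 1.0  # multiplier applied to the row currently being visited

    # Walk from newest to oldest so each event only scales prices
    # that occurred before its ex-date.
    for i in range(len(closes) - 1, -1, -1):
        day, close = closes[i]
        adjusted[i] = (day, close * factor)
        if i == 0:
            break
        ratio = splits.get(day, 1.0)
        cash = dividends.get(day, 0.0)
        # Previous close expressed in post-split terms for the dividend ratio.
        prev_close = closes[i - 1][1] / ratio
        factor *= (1.0 - cash / prev_close) / ratio
    return adjusted


# Hypothetical prices: a 2-for-1 split on 01-04, then a $1 dividend on 01-05.
SAMPLE_CLOSES = [("2017-01-03", 100.0), ("2017-01-04", 50.0), ("2017-01-05", 49.0)]
SAMPLE_ADJUSTED = back_adjust(
    SAMPLE_CLOSES,
    splits={"2017-01-04": 2.0},
    dividends={"2017-01-05": 1.0},
)
```

In this made-up series the raw price appears to fall from 100 to 49, but after adjustment all three closes are approximately equal, which is what a research-ready series should show when nothing but corporate actions happened.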
Commercial Historical Data
Higher resolution and more complete datasets are generally not available for free. Below is a list of vendors which have passed our quality screening (in total, we screened over a dozen vendors). To qualify, a vendor must aggregate data from all US national and regional exchanges, as only complete datasets are suitable for research use. This requirement is especially important because many vendors get their data from only a couple of sources and are missing important information such as dark pool trades.
1) QuantQuote – Based on our own experience and recommendations from Caltech Quant Group alumni now working in industry, QuantQuote is the vendor we most highly recommend. This is the dataset we use for our own research alongside CHSD. Included split/dividend adjustments, symbol change tracking, and excellent software make this the easiest research dataset to use. As an added bonus, pricing is the lowest of the vendors listed in this section, so QuantQuote provides the best value (although we wouldn't exactly call the prices cheap). Provides data in minute, second, and tick resolution.
One key differentiator is that QuantQuote is primarily an HFT software company and not purely a data company, so they have unique insight into how a quant dataset should be structured and delivered. As far as we know, QuantQuote is the only vendor which correctly removes out of sequence trades when generating their minute and second resolution datasets.
EDIT (11/20/2015): An alumnus has recently emailed to let us know that QuantQuote provides a 10% educational discount using coupon code EDU2014 (still working even though it is 2015 now). The restriction is that the order must be placed with a university email address.
2) TickData – Also well regarded by our alumni in industry, TickData has a large selection of data from overseas exchanges. It too keeps track of symbol changes and provides data for corporate events. However, the software interface is Windows based and unsuitable for Linux development (EDIT: We have since learned that TickData now has a Linux client, although we haven't tested it to see how well it works). The data quality is comparable to QuantQuote, but the structure of the corporate events and split/dividend data is not as intuitive or easy to use. Finally, TickData is by far the most expensive of all the vendors we reviewed, even with the 10% discount from coupon code ACA10SM. TickData offers minute and tick resolution data.
3) NYSE TAQ – This is the official data for NYSE listed symbols provided by NYSE. Like the other vendors in this list, the data includes all regional exchanges, not just NYSE and NYSEARCA. Split and dividend information is provided, but symbol change information is not. The disadvantage is that no NASDAQ listed symbols are provided; they need to be purchased separately from NASTRAQ. Unless you specifically want NYSE securities, we recommend the QuantQuote and TickData datasets, since they are cheaper than a combination of NYSE TAQ and NASTRAQ and offer a number of value added features. Provides tick data only.
EDIT: We have heard from an alumnus that NYSE TAQ has errors; specifically, some exchanges will report incorrect trade prices for some symbols on some days. These can be off from the correct value by as much as an order of magnitude! We did not check against TickData, but checks against the QuantQuote dataset show that QuantQuote removes these errors.
4) NASTRAQ – This is NASDAQ’s equivalent to TAQ; the description above for TAQ applies equally well to NASTRAQ. Provides tick data only.
5) Compustat – Compustat provides ONLY daily resolution data going back to the 1960s, including delisted symbols (which Yahoo! Finance does not include). Compustat also contains an exhaustive database of corporate fundamentals data. Compustat is not geared towards individuals; instead it targets corporations and institutions, and data access is provided via subscription. Minimum monthly charges are generally several thousand dollars.
6) CRSP – Run by the University of Chicago, CRSP contains comprehensive stock data in ONLY daily resolution going back to the 1920s, including delisted symbols. Like Compustat, CRSP is geared towards institutions and does not sell data or offer subscriptions suitable for individuals.
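The point above about out of sequence trades can be illustrated with a short Python sketch. This is only an assumed, simplified version of bar construction, not QuantQuote's actual procedure. The `out_of_sequence` flag, the field names, and the sample trades are hypothetical; real tick data would carry this information in vendor-specific condition codes.

```python
from datetime import datetime


def minute_bars(trades):
    """Aggregate trades into one-minute OHLCV bars keyed by minute start.

    trades: iterable of dicts with 'time' (datetime), 'price', 'size',
            and 'out_of_sequence' (bool, hypothetical flag).
    """
    bars = {}
    for trade in sorted(trades, key=lambda t: t["time"]):
        if trade["out_of_sequence"]:
            # A late report: its timestamp does not reflect when it traded,
            # so letting it set the high/low/close would distort the bar.
            continue
        minute = trade["time"].replace(second=0, microsecond=0)
        price, size = trade["price"], trade["size"]
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars


# Hypothetical trades; the third one is a late report at a stale price.
SAMPLE_TRADES = [
    {"time": datetime(2016, 5, 2, 9, 30, 1), "price": 10.00, "size": 100, "out_of_sequence": False},
    {"time": datetime(2016, 5, 2, 9, 30, 30), "price": 10.05, "size": 200, "out_of_sequence": False},
    {"time": datetime(2016, 5, 2, 9, 30, 59), "price": 9.50, "size": 100, "out_of_sequence": True},
    {"time": datetime(2016, 5, 2, 9, 31, 2), "price": 10.10, "size": 50, "out_of_sequence": False},
]
SAMPLE_BARS = minute_bars(SAMPLE_TRADES)
```

If the flagged trade were kept, the 09:30 bar would show a low and close of 9.50 even though no trade actually occurred at that price at that time, which is the kind of artifact that can produce phantom signals in a backtest.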
Vendors to AVOID
The vendors listed below are ones members and alumni have had poor experiences with. Issues include uncertain provenance of data, incomplete market coverage, frequent data errors, and missing features.
1) Kibot – This vendor is notorious for its poor data quality. Numerous group members have reported very poor experience with Kibot. Problems include missing data or just completely incorrect data (errors like negative prices and incorrect dividend adjustment formulae have been reported). We strongly recommend this vendor be avoided.
2) Pi Trading – The missing and incorrect data problems reported with Kibot have also been widely seen in the Pi Trading data. Pi Trading data is priced very low, but no care has been taken to ensure correct split/dividend adjustment or to account for symbol changes. We do not consider this data suitable for any serious research, as the data quality is very poor.
3) Kinetick – Kinetick has very limited history. For the history that it does have, members have reported numerous problems similar to those found in Kibot and Pi Trading, so we consider the data quality to be poor.
4) TradingPhysics – Members who have reviewed TradingPhysics report that the data is expensive (particularly for large orders) and not intuitive or easy to use. Ordering of large datasets is also challenging. Finally, we have received reports of poor after sales support. Does not account for corporate events.
5) CGQ – Difficult to navigate ordering system and extremely overpriced compared to other vendors. Does not account for corporate events.
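Problems of the kind reported above (negative prices, plainly incorrect values) can often be caught with a few automated sanity checks before a dataset is used. The Python sketch below is one possible screen, not a procedure the group is known to use. The function name, the bar layout, and the `max_jump` threshold of 5x are assumptions chosen for illustration.

```python
def find_bar_problems(bars, max_jump=5.0):
    """Flag obviously bad OHLC bars.

    bars: list of dicts with 'open', 'high', 'low', 'close', oldest first.
    Returns a list of (index, description) tuples.
    """
    problems = []
    last_good_close = None
    for i, bar in enumerate(bars):
        o, h, l, c = bar["open"], bar["high"], bar["low"], bar["close"]
        if min(o, h, l, c) <= 0:
            problems.append((i, "non-positive price"))
        elif h < max(o, c) or l > min(o, c):
            problems.append((i, "inconsistent high/low"))
        elif (last_good_close is not None
              and not (1.0 / max_jump <= c / last_good_close <= max_jump)):
            problems.append((i, "suspicious jump"))
        else:
            # Only clean bars become the reference for the jump check,
            # so one bad print does not also flag the bar that follows it.
            last_good_close = c
    return problems


# Hypothetical bars containing one example of each problem.
SAMPLE_BARS = [
    {"open": 10.0, "high": 10.2, "low": 9.9, "close": 10.1},
    {"open": 10.1, "high": 10.3, "low": 10.0, "close": -10.2},
    {"open": 10.2, "high": 10.1, "low": 10.0, "close": 10.05},
    {"open": 100.5, "high": 101.0, "low": 100.0, "close": 100.6},
    {"open": 10.1, "high": 10.2, "low": 10.0, "close": 10.15},
]
SAMPLE_PROBLEMS = find_bar_problems(SAMPLE_BARS)
```

A screen like this only finds gross errors. Note also that a genuine, persistent price level change (for example an unadjusted split) would keep being flagged as a jump, which is usually a useful signal that adjustments are missing.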
Free Live "Feeds"
We put "Feeds" in quotation marks because a real feed which is free doesn't really exist. The free live stock data feeds are simply websites which provide, at no cost, a streaming live feed of decent quality, where we define quality as data completeness and speed of updating.
investing.com - We recommend investing.com because, as far as we can tell, they derive data from the consolidated tape, and their website has one of the highest update frequencies. It also ranks highly for data completeness, with many futures, indexes, FX, and options chain data included.
Yahoo Finance - Yahoo is another solid provider, with good coverage for stocks and ETFs, but it is a bit lacking in futures and other more arcane instruments. The downside is that the refresh rate is not very good, and the data is derived from BATS only. However, at these resolutions/time intervals, this latter point probably doesn't make a big difference. Yahoo also has an API that can be tapped into, although it is rate limited and will block your IP if you make too many requests in rapid succession.
Google Finance - Like Yahoo, Google has good geographic coverage (most international exchanges), but our feeling is that Google doesn't take Google Finance very seriously, so errors are more common. As with Yahoo, some key futures exchanges and less common instruments are missing. However, it does provide better charting capabilities.
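Since rate limited sources such as the Yahoo API mentioned above may block clients that send requests in rapid succession, it helps to space requests out on the client side. The following Python sketch shows one simple way to do that. The `Throttle` class, the `poll` helper, and the `fetch` callable are hypothetical names; no real Yahoo endpoint or published rate limit is assumed, and a suitable interval would have to be found by experiment.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def poll(fetch, symbols, throttle):
    """Call fetch(symbol) for each symbol, pausing between requests."""
    results = {}
    for symbol in symbols:
        throttle.wait()
        results[symbol] = fetch(symbol)
    return results
```

In use, `fetch` would be whatever function retrieves a quote for one symbol, for example `poll(my_fetch, ["SPY", "QQQ"], Throttle(2.0))`, where `my_fetch` is supplied by the caller.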
Commercial Live Feeds
We define professional products as those geared towards quant trading, while non-professional products are meant for display/charting software.
Professional feeds will aggregate data from all markets including regional exchanges to build a consolidated book. Instead of querying an API at pre-defined intervals, professional feeds pass all data to the client as it is sent out by the originating exchanges. These solutions are generally high bandwidth, high update frequency, and lowest latency.
QuantQuote TickView – TickView by QuantQuote is a very solid product that is available both in an exchange colocation environment and via the internet. TickView automatically performs feed consolidation (it consolidates data from over a dozen exchanges). The application is natively Linux and, for non-colocated clients, uses advanced compression to save bandwidth. It is also Windows compatible (at least for US equities; CME & OPRA are Linux only).
EDIT: We have heard from alumni that QuantQuote now only offers TickView software with colocation; the over the internet product is discontinued.
Nanex Nxcore – A service very similar to TickView, except that Nxcore is Windows based. For Linux environments we recommend TickView, while Nxcore is better on Windows. Pricing tends to be higher than TickView for a similar amount of bandwidth usage. Latency was found to be slightly higher than TickView.
Thomson Reuters Elektron – A very pricey service that does not implement socket compression as well as TickView and Nxcore. Can send you data from all exchanges separately (not NBBO consolidated) but requires lots of bandwidth and pricey dedicated fiber lines. For the fiber connection, Thomson Reuters partners with Savvis to provide connectivity. Also provides feeds for exchange colocated systems (but doesn’t offer the rack space itself, just the cross connects within the datacenter). Overall, a good service, but QuantQuote and Nanex provide more cost effective alternatives.
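The feed consolidation described above (combining per-exchange quotes into a single national best bid and offer) can be sketched in a few lines of Python. This is a toy model for illustration only and does not represent how TickView, Nxcore, or Elektron are implemented. The class name, the method signature, and the exchange labels are assumptions.

```python
class NbboBook:
    """Track the latest top-of-book quote per exchange and derive the NBBO."""

    def __init__(self):
        self._quotes = {}  # exchange -> (bid, bid_size, ask, ask_size)

    def update(self, exchange, bid, bid_size, ask, ask_size):
        """Record a new quote from one exchange and return the current NBBO."""
        self._quotes[exchange] = (bid, bid_size, ask, ask_size)
        return self.nbbo()

    def nbbo(self):
        quotes = self._quotes.values()
        best_bid = max(q[0] for q in quotes)   # highest price any venue will pay
        best_ask = min(q[2] for q in quotes)   # lowest price any venue will sell at
        return {
            "bid": best_bid,
            # Sizes are summed across all venues quoting at the best price.
            "bid_size": sum(q[1] for q in quotes if q[0] == best_bid),
            "ask": best_ask,
            "ask_size": sum(q[3] for q in quotes if q[2] == best_ask),
        }


# Hypothetical quote updates from three venues.
book = NbboBook()
book.update("NYSE", 10.00, 300, 10.02, 200)
book.update("BATS", 10.01, 100, 10.02, 500)
SAMPLE_NBBO = book.update("ARCA", 10.01, 200, 10.03, 100)
```

A feed that redistributes data from a single venue would only ever see one of these rows, which is why its best bid and offer can differ from the consolidated view.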
Non-professional feeds do not give a true sense of market activity, as they generally redistribute data from only one or two exchanges, such as BATS or the NYSE consolidated tape. Furthermore, the data they send out is usually snapshots with larger update intervals and longer latency. Clients generally query the data via an API on request, instead of having every tick sent. Generally, they are designed for interfacing with brokerage/charting programs like TradeStation and are not meant for programmatic algorithmic trading. As our focus is programmatic trading, we did not review any of the vendors listed below; they are included only for completeness. We welcome feedback from members and alums about these services.
Our consensus is that solutions such as QuantQuote TickView are not much more expensive than the non-professional solutions listed here, and we recommend using a professional feed for optimal quantitative trading results.
Although it is beyond the scope of this article, many exchanges allow customers to be colocated on site and access data feeds directly without going through 3rd party market data vendors. For port access, connectivity, and rack space, the minimum charges for these types of services can run to tens of thousands of dollars per month, per exchange. However, they are the ultimate low latency market data solutions.
Active symbol lists - Available from the Nasdaq public FTP; see this post.
Survivorship bias free historical lists - Available for very low cost from QuantQuote as a standalone offering. Also available with a Compustat subscription.
S&P Index Constituent Changes - Announced here; manual compiling is required.
Current Index Constituents - The trick is to find the constituents of ETFs which track the index; a Russell 3000 example is here.
CUSIP lookup - Provided for free by Quantum Online.
Split announcements - Provided for free by Yahoo! Finance.
Economic Calendar - Provided for free by Yahoo! Finance.
Economic Indicators - Provided for free by Oanda.
Stock Screener - Provided for free by Finviz.
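For the index constituent items in the list above, the manual compiling step amounts to combining a current constituent list with a log of announced changes to recover who was in the index on a past date. The Python sketch below shows one way to do this, assuming the change log is complete. The function name, the tuple layout, and the ticker symbols are all made up for illustration.

```python
from datetime import date


def constituents_on(as_of, current, changes):
    """Reconstruct index membership as of a past date.

    current: set of today's constituents
    changes: list of (effective_date, added_symbol, removed_symbol);
             either symbol may be None for a one-sided change
    """
    members = set(current)
    # Undo changes from newest to oldest until we reach the target date.
    for effective, added, removed in sorted(changes, key=lambda c: c[0], reverse=True):
        if effective <= as_of:
            break
        if added:
            members.discard(added)
        if removed:
            members.add(removed)
    return members


# Hypothetical index with two replacements during 2016.
CURRENT = {"AAA", "BBB", "DDD"}
CHANGES = [
    (date(2016, 3, 1), "BBB", "CCC"),   # BBB replaced CCC
    (date(2016, 9, 1), "DDD", "EEE"),   # DDD replaced EEE
]
MID_2016 = constituents_on(date(2016, 6, 1), CURRENT, CHANGES)
```

Backtesting against the reconstructed membership rather than today's list is what avoids survivorship bias: symbols that were later removed still appear on the dates when they were actually in the index.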
If there are additional sources of data that can be added to this list, or additional feedback on any of the sources provided above, please contact us.