Bringing the Yahoo Finance stream to the shell


A little while ago I posted a primitive way to get at the Yahoo Finance streaming data. As you can guess, that was just the beginning. To raise the bar, I tried to parse the received data and bring it to the shell. To get this done I needed several tools.

  • curl – to send the HTTP request and receive the response
  • transform – a primitive tool to do streaming operations within one line
  • spidermonkey shell – a JavaScript shell which can parse and reformat the data

The complete logic lives in the JavaScript. So let’s start with the curl command line:

curl -s -o - -N 'http://streamerapi.finance.yahoo.com/streamer/1.0?s=JAVA,MSFT&k=l10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.yfs_gencb'

Let’s see what we have here. First we call the Yahoo streaming API and ask for the current price (l10) of the Sun (JAVA) and Microsoft (MSFT) stocks. The callback part cannot be changed; if you change it, the whole request will not succeed. It is also important to send the output to STDOUT (-o -) and to switch off curl’s output buffering (-N), so that we can pipe the data to the next application as it arrives.
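If you want to vary the symbols without retyping the whole query string, the URL can be assembled programmatically. The following is only a sketch in JavaScript: the helper name buildStreamUrl and its parameters are made up for illustration, while the host, path and callback names are copied from the curl command above. It deliberately takes a single field code, because that is the only form shown here.

```javascript
// Hypothetical helper: assembles the streamer URL used in the curl call above.
// symbols is an array of ticker symbols, field is a single field code
// such as "l10". The callback names must stay exactly as they are.
function buildStreamUrl(symbols, field) {
    var base = "http://streamerapi.finance.yahoo.com/streamer/1.0";
    return base +
        "?s=" + symbols.join(",") +
        "&k=" + field +
        "&callback=parent.yfs_u1f" +
        "&mktmcb=parent.yfs_mktmcb" +
        "&gencallback=parent.yfs_gencb";
}
```

Called as buildStreamUrl(["JAVA", "MSFT"], "l10"), it reproduces the URL from the curl command above.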

The second part of the work is just to call the transform application (further explanation here).

The third part is to pipe the output of the transform process into the javascript shell. I started the shell with the following command:

js -f script.js

The script script.js looks like this:

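// Called by the stream for every price update.
// tmp is keyed by symbol, e.g. tmp.MSFT.l10.
yfs_u1f = function(tmp) {
    // An update does not necessarily contain both symbols,
    // so each access gets its own try/catch.
    try {
        print("msft: " + tmp.MSFT.l10);
    } catch (ex) {}
    try {
        print("java: " + tmp.JAVA.l10);
    } catch (ex) {}
};

yfs_mktmcb = function(tmp) {
    /* ignore timestamp */
};

// The stream calls parent.yfs_u1f(...) and parent.yfs_mktmcb(...),
// so expose the callbacks on an object named parent.
var parent = this;
parent.yfs_u1f = yfs_u1f;
parent.yfs_mktmcb = yfs_mktmcb;

// Read STDIN line by line; readline() returns null at end of input.
var t;
while ((t = readline()) !== null) {
    // Only evaluate the lines that carry a callback invocation.
    if (t.substr(0, 3) == "try") {
        eval(t);
    }
}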

First we have to implement the callback functions which will be called from the HTTP response. Then we construct an object called parent and map these functions into it. Now we have a working construct to receive the data and work with it in our shell. What we need now is a little while loop that continuously reads from STDIN and waits for new data. By the way, accessing the tmp variable in the callback function seems somewhat complicated to me. I’m sure there is an easier way to access it, but I have no clue how. If you have an idea how to do it better, please post it in the comments.
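One possible way to avoid naming each symbol in the callback is to loop over the keys of the object that is passed in. This is a sketch, not a tested replacement: it assumes the payload is a plain object keyed by symbol (as the tmp.MSFT.l10 access in the script suggests), and formatQuotes is a hypothetical helper name.

```javascript
// Hypothetical helper: returns one "symbol: value" line per symbol in the
// payload that carries the requested field. Assumes a payload shaped like
// {MSFT: {l10: "17.83"}}.
function formatQuotes(payload, field) {
    var lines = [];
    for (var symbol in payload) {
        // Skip anything inherited from the prototype chain.
        if (!Object.prototype.hasOwnProperty.call(payload, symbol)) {
            continue;
        }
        var quote = payload[symbol];
        if (quote && quote[field] !== undefined) {
            lines.push(symbol.toLowerCase() + ": " + quote[field]);
        }
    }
    return lines;
}

// In the SpiderMonkey shell the callback could then be written as:
//   yfs_u1f = function(tmp) {
//       formatQuotes(tmp, "l10").forEach(function(line) { print(line); });
//   };
```

The helper only builds strings and leaves the printing to the caller, so the same function can be tried outside the SpiderMonkey shell as well.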

The complete bash statement would look like this:

curl -s -o - -N 'http://streamerapi.finance.yahoo.com/streamer/1.0?s=JAVA,MSFT&k=l10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.yfs_gencb' | /tmp/transform | js -f script.js

If you run this you should get this output:

msft: 17.83
java: 4.47
msft: 17.84
msft: 17.86
msft: 17.81
java: 4.46

Now you can use whatever tools you want to work with that data. For me this will be piped directly into my postgres db for further processing.
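As a sketch of that last step, each price could be turned into an SQL statement that a client such as psql reads from STDIN. The table realtime(symbol,date,price) is borrowed from the statement the author shows in the comments below; the helper name toInsertStatement is an assumption for illustration.

```javascript
// Hypothetical helper: turns one quote into an INSERT statement.
// Returns null when the price is not a usable number.
function toInsertStatement(symbol, price) {
    var value = parseFloat(price);
    if (!isFinite(value)) {
        return null;
    }
    // Double any single quote so the symbol stays a valid SQL string literal.
    var safeSymbol = String(symbol).replace(/'/g, "''");
    return "insert into realtime(symbol,date,price) values('" +
        safeSymbol + "',current_timestamp," + value + ");";
}

// Possible use inside the callback of script.js:
//   var sql = toInsertStatement("MSFT", tmp.MSFT.l10);
//   if (sql !== null) { print(sql); }
```

Returning null for unusable prices keeps half-filled updates out of the database instead of inserting NaN.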


  1. #1 by Richard on March 13, 2009 - 14:25

    Hi Jens,

    Can you tell me if your code still works? I am trying to get at Yahoo Finance from a Java app and seem to be running into a brick wall. As a test I created a simple app that opens a connection to the HTTP location you use, expecting to get some sort of reply. Sadly the connection seems to hang (much as it does if I use that URL in IE8).

    Any ideas?

  2. #2 by jens on March 15, 2009 - 01:58

    @Richard
    Hi,
    I’m currently working on an easier version to get this running without so many steps. But the URL works absolutely fine for me.
    You mentioned that you tried the URL in a browser -> that will not work, because the response of this request is just a bunch of JSON objects, so the browser has no clue what to do with it.
    On the other hand, if you are using this from a Java HTTP client you should receive a response stream. So far I have never tried it in Java (when I have some time next week, I will try it myself).

    One problem I had in all implementations: in general most stream applications use buffers. The response data which comes from that stream is pretty small, so filling one buffer can take quite a while. If you can, try to reduce every buffer you have.
    I hope this helps with your implementation.

    Regards, jens

  3. #3 by Charlie on March 23, 2009 - 02:02

    I had the same problem and sorted it out by using Apache HttpClient and its getResponseBodyAsStream to pump the buffer into my parser:

    // Create an instance of HttpClient.
    HttpClient client = new HttpClient();

    // Create a method instance.
    GetMethod method = new GetMethod(url);

    // Provide a custom retry handler if necessary.
    method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler(3, false));

    try {
        // Execute the method.
        int statusCode = client.executeMethod(method);

        if (statusCode != HttpStatus.SC_OK) {
            System.err.println("Method failed: " + method.getStatusLine());
        }

        // Read the response body as a stream, in small chunks.
        InputStream is = method.getResponseBodyAsStream();
        InputStreamReader reader = new InputStreamReader(is);

        JsonParser parser = new JsonParser();
        char[] cbuf = new char[50];
        int len = -1;
        while ((len = reader.read(cbuf)) != -1) {
            parser.parse(cbuf, len);
        }

    } catch (Exception e) {
        // errors are silently ignored here
    }

  4. #4 by Prosenjit Kundu on May 8, 2009 - 20:49

    hi jens,
    Great work man!!! Thanks for this, it works like a charm. A small problem I am facing: I can get stream data when I use stocks (as you specified, JAVA and MSFT), but when I change to an index like the DOW (^DJI) it won’t work. What will be the URL for an index?

    thanks.

  5. #5 by Prosenjit Kundu on May 8, 2009 - 20:51

    @ jens
    Please answer my question.

  6. #6 by jens on May 11, 2009 - 14:53

    @ Prosenjit Kundu
    hey,

    I tried it with ^DJI and the URL works fine. Here is what I entered:
    curl -s -o - -N 'http://streamerapi.finance.yahoo.com/streamer/1.0?s=^DJI&k=l10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.yfs_gencb'

    But there is a catch here. The JavaScript used to extract the information has the symbol hardcoded, so it needs to be adapted.

    I have an improved script, which does this in a more generic way:
    yfs_u1f = function(tmp) {
        try {
            // Extract the symbol name from the object's source text.
            var source = tmp.toSource();
            var name = source.slice(source.indexOf("'") + 1, source.lastIndexOf("'"));
            // Use the midpoint of ask (a00) and bid (b00) as the price.
            var price = (Number(tmp[name].a00) + Number(tmp[name].b00)) / 2;
            if (!isNaN(price)) {
                print("insert into realtime(symbol,date,price) values('" + name + "',current_timestamp," + price + ");");
            }
        } catch (ex) {}
    };

    -> But even in this version, the requested values are hardcoded (in my version ask and bid). My attempts to bring this into one generic C application have failed so far, because the Yahoo JSON code isn’t a hundred percent compatible with the JSON standard.

  7. #7 by KrisBelucci on June 2, 2009 - 10:18

    Hi, good post. I have been wondering about this issue,so thanks for posting.

  8. #8 by Eric on June 18, 2009 - 20:37

    Hi, I tried to run the command shown but I get this error:

    js: uncaught JavaScript runtime exception: ReferenceError: "readline" is not defined.

    I’m not familiar with java so I was wondering if there is another function I can use instead of readline.

    BTW: I’m running Mac OSX

    Thanks

  9. #9 by steve on July 12, 2009 - 17:14

    One question guys, does this script work on Sundays?

    Here is the error message when I tried on Sunday

    D:\curl>curl -s -o - -N http://streamerapi.finance.yahoo.com/streamer/1.0?s=JAVA
    ,MSFT&k=l10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.
    yfs_gencb
    'k' is not recognized as an internal or external command,
    operable program or batch file.
    'callback' is not recognized as an internal or external command,
    operable program or batch file.
    'mktmcb' is not recognized as an internal or external command,
    operable program or batch file.
    'gencallback' is not recognized as an internal or external command,
    operable program or batch file.

  10. #10 by steve on July 12, 2009 - 23:34

    I solved it by putting the URL in double quotes ("), my bad.

  11. #11 by steve on July 18, 2009 - 18:01

    Hi,

    I am on windows 2000, doing something like
    curl URL|test.exe

    Apparently, I am not getting anything from curl because the stream is continuous. However, I can get something if I do
    curl "http://streamerapi.finance.yahoo.com/streamer/1.0?s=JAVA,MSFT&k=l10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.yfs_gencb"|test.exe

    Any help will be great.

    #include "stdafx.h"
    #include
    #include
    #include
    #include
    #include

    using namespace std;

    int main(int argc, char* argv[])
    {
        ofstream myfile;

        string s;
        string buf;
        do {
            myfile.open("example.txt");
            cin >> buf;
            s.append(buf);
            myfile << buf;
            myfile.close();
        } while (!cin.eof());

        return 0;
    }

  12. #12 by steve on July 18, 2009 - 18:20

    Regarding my previous post, it’s the other way round...
    It works for a normal response like
    http://www.google.com
    but does not work for
    curl "http://streamerapi.finance.yahoo.com/streamer/1.0?s=JAVA,MSFT&k=l10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.yfs_gencb"|test.exe

    Sorry about that, a bit tired…

  13. #13 by jens on July 20, 2009 - 16:38

    Hi,

    in general this looks perfectly fine to me. But a few things troubled me with my implementation (mainly buffering). In short, you have three buffers to look at.
    The first one is easy, curl: curl provides a parameter which disables the buffering (--no-buffer).
    The second buffer is the pipe: here I can only speak for the Linux implementation of pipes. You can’t control pipe buffering. All pipes are controlled by the kernel and there is no way to influence that behavior.
    The third buffer is the stdin buffer of your stream reader. In C you can influence that buffer size, so I think this should be possible in C++ too.

    For me, dealing with the first and third buffers solved the problem. The kernel never did anything unexpected that prevented this from functioning (again: Linux kernel).

    Finally, the reason why google.com always works is easy. The page at google.com has a relatively fixed size, and after downloading it the stream is closed. After the close all buffers are flushed and you see all the data in your application. On the other hand, if the stream is not closed, the flush can take some time. Without explicit flushing (as explained above) the stream flushed very irregularly (sometimes only after several minutes) on my machine. So as you can see, without minimizing buffers it can take quite a while.

    I hope this helps. (If not, just post again; maybe I can get a simple C++ application running on a Windows machine to get more insight.)

  14. #14 by Bryan on February 6, 2010 - 01:15

    Would it be possible to access the stock info for a specified time of the day? Yahoo uses streamerapi for their Historical charts (which includes the current price for every minute of the day).

    http://finance.yahoo.com/charts?s=IBM#chart1:symbol=ibm;range=1d;charttype=line;crosshair=on;ohlcvalues=0;logscale=on;source=undefined

    I’m not able to get anything returned from trying the URIs you mentioned, so I can’t explore my idea, but do you think it might work if {"open" : "1265380231", "close" : "1265403631"} (in the URI of course) were the same timestamp?

    So theoretically I should be able to access the Bid price from 9:43 am for IBM from the historical data.

  15. #15 by Ulf on April 21, 2010 - 14:06

    Hello Jens,

    thank you for your hint using streamerapi.

    I am not sure whether it helps, but I found some C code for buffering with libcurl. Here we go:

    //Declaration bla bla
    #include
    #include "/usr/include/curl/curl.h"
    #include "/usr/include/curl/easy.h"
    using namespace std;

    //Hello I am a function pointer!
    // libcurl write callback: appends each received chunk to *buffer and
    // returns the number of bytes handled.
    size_t writer(char *data, size_t size, size_t nmemb, string *buffer){
        /* See: http://curl.haxx.se/libcurl/c/getinmemory.html */
        size_t result = 0;
        if(buffer != NULL) {
            buffer->append(data, size * nmemb);
            result = size * nmemb;
        }
        return result;
    }

    //writer is used here
    string download_data_to_buffer(string strUrl){
        /* (A) Variable Declaration */
        CURL *curl; /* That is the connection, see: curl_easy_init */
        string buffer; /* See: CURLOPT_WRITEDATA */
        curl = curl_easy_init(); /* (B) Initialise web query */
        /* (C) Set Options of the web query
         * See also: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html */
        if (curl){
            curl_easy_setopt(curl, CURLOPT_URL, strUrl.c_str() );
            curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writer); /* Function Pointer "writer" manages the required buffer size */
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer ); /* Data Pointer &buffer stores downloaded web content */
            curl_easy_perform(curl); /* (D) Fetch the data */
            curl_easy_cleanup(curl); /* (E) Close the connection */
        }
        return buffer;
    }

    compile with

    g++ -c -I/usr/include/curl piipapo.c
    g++ -o testtest piipapo.o -L/usr/lib/libcurl -lcurl

    I am not sure if it helps you.

