
Idea: Sharing data between extensions / dlls


85 replies to this topic

#1 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 18 April 2011 - 01:36 PM

Programmers sometimes need to move data between different, unrelated extensions. An example would be an extension for file reading/writing and one for networking: both extensions would likely have some sort of buffer data structure, but if you want to read a file and send it over a connection, how do you get the data from the file extension into the networking extension? You will probably need to copy it across via GML somehow. 39dll solves this particular problem by having both file access and networking functions, but in my opinion that is not a good solution, unless your ultimate goal is to create the "everything extension". I prefer modularity, which is why I don't plan to include file access functions in my own Faucet Networking extension. In my opinion, the better solution is to solve the problem of data exchange between extensions.

One option would be to create a "Shared Buffers" dll that is linked by all extensions which want to exchange data in this way. Windows will only load the dll into memory once for the game process, so it can be used as a central point to manage and exchange shared data buffers. If both the file and the networking extension in the example above used this dll, the user could simply read a file into a shared buffer and pass that to the networking extension for sending. A more complex implementation could offer interfaces to exchange data from "internal" buffers of extensions which use different implementations, or other types of data exchange, like data streams.

So far, this is only an idea that still needs to be fleshed out, but I want to put it out there to find out what you think. One problem with the shared dll approach is that this dll would have to be in the game directory when the game runs, and some people probably won't like that. Additionally, the dll would have to be adopted by several extensions to be really useful. A major problem is ensuring the stability of the Buffers dll's interface, so that you don't end up with different extensions using different, incompatible versions of it.

But before any work goes into creating this, please tell me your opinion. Do you think this would be useful to you as an extension/dll user? Would you, as an extension/dll creator, support this interface? Discuss! :P

#2 Medusar

Medusar

    GMC Member

  • GMC Member
  • 1228 posts
  • Version:GM:Studio

Posted 18 April 2011 - 02:14 PM

I can't really think of many occasions when you really need to transfer data between separate DLLs... Your networking example is one but as you mentioned that's already been accounted for. Apart from that, most DLLs for GM allow complex calculations to be done in compiled code or they provide an interface to an external API. You'd hardly ever even want to share data between unrelated DLLs as they tend to use their own structures. So lots of data would not be compatible.
The developer could decide to expose an API for his DLL but I'd prefer any contributors to PM me so that the extra functionality would end up in the same code base. Either way this would not involve a buffer DLL.

Q: Why do programmers always get Christmas and Halloween mixed up?
A: Because DEC 25 = OCT 31

#3 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 18 April 2011 - 02:54 PM

I had exactly the same problem while writing my own networking dll (Http Dll 2), which is why I added buffers. All my other DLLs simply convert the data to a very long hexadecimal string, which can be safely passed through GML. This is clearly not very efficient.

I would prefer letting GM load the Shared Buffers dll, as a normal extension. Then the GEX could have a function 'shared_buffer_handle()' that returns a pointer to a controller class inside the shared buffers DLL. Other DLLs could then store this pointer during initialization, and use the pointer to access the buffers later. Since the DLL is only loaded once (by GM), there can only be one version of the DLL at any time. All other DLLs can use any version of the shared buffers DLL as long as the interface of the DLL never changes. This is possible with function pointers. Virtual functions would work too in principle, but since different compilers use different vtable formats, this breaks down if one DLL was compiled in VC++ and another was compiled in GCC.

class Buffer; // opaque pointer
class SharedBufferInterface {
    
    private:
    Buffer* (*createbuffer)();
    void (*destroybuffer)(Buffer*);
    // ...
    
    public:
    inline Buffer* CreateBuffer() { return (*createbuffer)(); }
    inline void DestroyBuffer(Buffer* buffer) { (*destroybuffer)(buffer); }
    // ...
    
};
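For illustration, here is a minimal, self-contained sketch of what the exporting side of such a design could look like. The names and layout below are assumptions made up for the example, not the actual shared_buffers.dll code; the point is that only plain C function pointers cross the DLL boundary, so the layout stays compiler-independent.

// Hypothetical sketch of the exporting DLL; only plain function pointers are exposed.
#include <cstdint>
#include <vector>

struct Buffer { std::vector<char> data; };       // opaque to client DLLs

struct SharedBufferVTable {                      // entries may be appended, never reordered
    Buffer* (*create)();
    void    (*destroy)(Buffer*);
    void    (*write_data)(Buffer*, const char*, unsigned int);
};

static Buffer* CreateImpl()           { return new Buffer(); }
static void    DestroyImpl(Buffer* b) { delete b; }            // freed in this DLL's own heap
static void    WriteDataImpl(Buffer* b, const char* p, unsigned int n) {
    b->data.insert(b->data.end(), p, p + n);
}

static SharedBufferVTable g_vtable = { &CreateImpl, &DestroyImpl, &WriteDataImpl };

// Exported to GM; a 32-bit pointer fits exactly in a double.
extern "C" __declspec(dllexport) double shared_buffer_handle() {
    return (double)(uintptr_t)&g_vtable;
}

A client DLL would store the value it receives from shared_buffer_handle() and only ever call through these pointers, so all allocation and freeing stays inside the exporting DLL.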

I would probably use this DLL in my own DLLs if we create one. I have buffer code that is very similar to 39dll's buffers; I will create a simple proof of concept to test the idea.

@Medusar: It would be really useful for serialization: converting a data structure to a string or loading a data structure from a string (like ds_map_write/ds_map_read). This is useful to save/load the game. Many DLLs use data structures; it would be great if we could save them all to a single file without having to use temp files. This would also make it a lot easier to send the contents of a data structure to another client in a multiplayer game.

#4 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 18 April 2011 - 05:16 PM

You make an interesting point about the vtable - the way it looks, you need a C interface instead of a C++ one in order to make GCC and MSVC work together. And using two separate versions for the two compilers would defeat the entire point, of course.

Since the DLL is only loaded once (by GM), there can only be one version of the DLL at any time.

I didn't test this, but from my understanding Windows only loads a dll once per process even if it is requested multiple times.

#5 Tha_14

Tha_14

    GMC Member

  • New Member
  • 174 posts
  • Version:GM8.1

Posted 18 April 2011 - 05:31 PM

OOOHHHHH,Big Replies!!!
I need a lot of time to read that.


#6 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 18 April 2011 - 05:56 PM

Another thought: it's possible to implement this without an extra dll and offer a statically linked library instead. The idea is to use handles which contain a pointer to the buffer descriptor instead of a generated integer, as is usually done. As long as everything runs in 32-bit mode you should be able to store the pointer's integer value in a GM real without a problem, but it feels slightly less naughty to encode it to a GM string instead.
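A rough sketch of what that round trip could look like on the DLL side (helper names are made up for the example, and a 32-bit process is assumed, as discussed):

// Illustrative only: a 32-bit pointer stored in a GM real, and the string
// alternative that cannot contain a 0-byte.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

double ptr_to_real(void* p)  { return (double)(uintptr_t)p; }   // fits in the 53-bit mantissa
void*  real_to_ptr(double d) { return (void*)(uintptr_t)d; }

const char* ptr_to_string(void* p) {
    static char buf[11];                                        // "0x" + 8 hex digits + NUL
    std::sprintf(buf, "0x%08X", (unsigned)(uintptr_t)p);
    return buf;                                                 // valid until the next call
}
void* string_to_ptr(const char* s) {
    return (void*)(uintptr_t)std::strtoul(s, 0, 16);            // accepts the "0x" prefix
}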

#7 icuurd12b42

icuurd12b42

    Self Formed Sentient

  • GMC Elder
  • 17079 posts
  • Version:GM:Studio

Posted 18 April 2011 - 06:06 PM

You can pass along the shared pointer via a function call in GM:

1) define an external call to sharedmem.dll called CreateMem
2) define an external call in dll1 called Dll1SetSharedMem
3) define an external call in dll2 called Dll2SetSharedMem


var mem; mem = CreateMem();
Dll1SetSharedMem(mem);
Dll2SetSharedMem(mem);


In sharedmem.dll, CreateMem:
// allocate 1024 bytes of fixed, zero-initialised global memory
return (double)(DWORD) GlobalAllocPtr(GMEM_FIXED | GMEM_ZEROINIT, 1024);

In dll1 and dll2:
void* sharedmem = NULL;

void Dll1SetSharedMem(double mem) // same for Dll2SetSharedMem in dll2
{
    sharedmem = (void*)(DWORD) mem;
}

Edited by icuurd12b42, 18 April 2011 - 06:07 PM.


#8 paul23

paul23

    GMC Member

  • Global Moderators
  • 4084 posts
  • Version:GM:Studio

Posted 18 April 2011 - 09:38 PM

uhm wait, don't cast pointers like that. - There's no way this is guaranteed to work: with 64-bit computers pointers can be larger than the mantissa of doubles. - And I'm unsure how this works out (I think this depends on the compiler whether they're mem-copied or "copied-by-value"), and as someone raised earlier in this topic there are a lot of other "potential" problems when using pointers like this.


Now I am not in favour of the whole idea, instead I would try to redesign and make a "tree-structure" of dependency: GM (root) handles 1 or 2 dlls, and then those dlls handle sub dlls (and the communication between sub-dlls). Dlls don't know about their "roots" or other dlls except for their "children".
If you let memory be shared between dlls and generally do stuff like that, in bigger projects you'll always run into the problem: "who owns what", which can slow development down a lot!

#9 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 18 April 2011 - 10:28 PM

uhm wait, don't cast pointers like that. - There's no way this is guaranteed to work: with 64-bit computers pointers can be larger than the mantissa of doubles.

That's why I said "As long as everything runs in 32-bit mode", which is currently the case for Game Maker related things.

- And I'm unsure how this works out (I think this depends on the compiler whether they're mem-copied or "copied-by-value"), and as someone raised earlier in this topic there are a lot of other "potential" problems when using pointers like this.

Again, for 32-bit pointers this shouldn't be a problem if done this way, assuming Game Maker's real type always corresponds to double. It doesn't matter if they are stored in FPU registers now and then, because those have an even larger mantissa than the normal double representation. Copying the pointer to a string instead (in some encoding that avoids the 0-byte of course) is a bit more straightforward in terms of correctness though.

Now I am not in favour of the whole idea, instead I would try to redesign and make a "tree-structure" of dependency: GM (root) handles 1 or 2 dlls, and then those dlls handle sub dlls (and the communication between sub-dlls). Dlls don't know about their "roots" or other dlls except for their "children".
If you let memory be shared between dlls and generally do stuff like that, in bigger projects you'll always run into the problem: "who owns what", which can slow development down a lot!

The "Who owns what" question has to be solved once - when designing the dll. After that everyone just has to follow the rules.
I don't see where you are going with your idea. Arranging dlls in a tree structure implies that they are dependent on each other, but what I would like to get is a standard way to exchange information between independent extensions. For example, imagine an SHA256 dll that creates a digest over a buffer. It would be good if that function could be developed to just work on any buffer, without all extension programmers having to add this new dll as a "child" dll.

#10 paul23

paul23

    GMC Member

  • Global Moderators
  • 4084 posts
  • Version:GM:Studio

Posted 19 April 2011 - 12:03 AM


uhm wait, don't cast pointers like that. - There's no way this is guaranteed to work: with 64-bit computers pointers can be larger than the mantissa of doubles.

That's why I said "As long as everything runs in 32-bit mode", which is currently the case for Game Maker related things.

- And I'm unsure how this works out (I think this depends on the compiler whether they're mem-copied or "copied-by-value"), and as someone raised earlier in this topic there are a lot of other "potential" problems when using pointers like this.

Again, for 32-bit pointers this shouldn't be a problem if done this way, assuming Game Maker's real type always corresponds to double. It doesn't matter if they are stored in FPU registers now and then, because those have an even larger mantissa than the normal double representation. Copying the pointer to a string instead (in some encoding that avoids the 0-byte of course) is a bit more straightforward in terms of correctness though.

Now I am not in favour of the whole idea, instead I would try to redesign and make a "tree-structure" of dependency: GM (root) handles 1 or 2 dlls, and then those dlls handle sub dlls (and the communication between sub-dlls). Dlls don't know about their "roots" or other dlls except for their "children".
If you let memory be shared between dlls and generally do stuff like that, in bigger projects you'll always run into the problem: "who owns what", which can slow development down a lot!

The "Who owns what" question has to be solved once - when designing the dll. After that everyone just has to follow the rules.
I don't see where you are going with your idea. Arranging dlls in a tree structure implies that they are dependent on each other, but what I would like to get is a standard way to exchange information between independent extensions. For example, imagine an SHA256 dll that creates a digest over a buffer. It would be good if that function could be developed to just work on any buffer, without all extension programmers having to add this new dll as a "child" dll.


I'm saying that there is no need for a "standard" way: GM acts as root and should handle these translations.

Also consider the loading:
if A is a dll and B needs to use memory allocated by A, how can B be sure that A is not unloaded and the memory freed?

#11 icuurd12b42

icuurd12b42

    Self Formed Sentient

  • GMC Elder
  • 17079 posts
  • Version:GM:Studio

Posted 19 April 2011 - 02:01 AM

@paul... about the casts... the problem occurs when you don't cast down progressively, like char t = (char)MyDouble; you have to cast progressively to the next smaller type: char t = (char)(int)(DWORD)MyDouble;

As for who owns what, in my example, GM owns the memory but SharedMem.dll manages it. You have to decide what is located at the address and who can do what to it.

The best scenario is to have a garbage collection system... The simplest is to globally allocate a large chunk that exists from the start to the end of the application and can be used for data passing - AKA the GMEM_FIXED flag used here.

Remember, data sharing is the goal of this concept. Each dll could copy the passed data into its own space. You just read and write to the buffer as a means to pass data along.


Here I have an elaborate in-head design for shared resources for GMTOOLS (sharing resources and functions)

http://cid-fba0b7e57.../Public/GMTools

see tools overview

Edited by icuurd12b42, 19 April 2011 - 02:04 AM.


#12 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 19 April 2011 - 09:42 AM

I just finished my proof of concept:
http://gm.maartenbae...red_buffers.zip
I've tested it in GCC and VC++, both versions are compatible.

The main DLL (shared_buffers.dll) exports buffer functions to GM, so games can access the buffers too:
shared_buffers_get_handle()
buffer_create()
buffer_destroy(id)
buffer_exists(id)
buffer_to_string(id)
buffer_get_pos(id)
buffer_get_length(id)
buffer_at_end(id)
buffer_get_error(id)
buffer_clear_error(id)
buffer_clear(id)
buffer_set_pos(id,pos)
buffer_read_from_file(id,filename)
buffer_write_to_file(id,filename)
buffer_read_int8(id)
buffer_read_uint8(id)
buffer_read_int16(id)
buffer_read_uint16(id)
buffer_read_int32(id)
buffer_read_uint32(id)
buffer_read_int64(id)
buffer_read_uint64(id)
buffer_read_float32(id)
buffer_read_float64(id)
buffer_write_int8(id,value)
buffer_write_uint8(id,value)
buffer_write_int16(id,value)
buffer_write_uint16(id,value)
buffer_write_int32(id,value)
buffer_write_uint32(id,value)
buffer_write_int64(id,value)
buffer_write_uint64(id,value)
buffer_write_float32(id,value)
buffer_write_float64(id,value)
buffer_read_string(id)
buffer_write_string(id,string)
buffer_read_data(id,len)
buffer_write_data(id,string)
buffer_read_hex(id,len)
buffer_write_hex(id,string)
buffer_write_buffer(id,id2)
buffer_write_buffer_part(id,id2,pos,len)
Please tell me if I forgot something important. The functions that can be called by the other DLLs are almost the same, but they use pointers instead of ids:
Buffer* Create();
void Destroy(Buffer* buffer);
unsigned int GetID(Buffer* buffer);
Buffer* Find(unsigned int id);
const char* ToString(Buffer* buffer);
char* GetData(Buffer* buffer);
unsigned int GetPos(Buffer* buffer);
unsigned int GetLength(Buffer* buffer);
bool IsAtEnd(Buffer* buffer);
bool GetError(Buffer* buffer);
void ClearError(Buffer* buffer);
void Clear(Buffer* buffer);
void SetPos(Buffer* buffer, unsigned int newpos);
void SetLength(Buffer* buffer, unsigned int newlength);
bool ReadFromFile(Buffer* buffer, const char* filename);
bool WriteToFile(Buffer* buffer, const char* filename);
int8_t ReadInt8(Buffer* buffer);
uint8_t ReadUint8(Buffer* buffer);
int16_t ReadInt16(Buffer* buffer);
uint16_t ReadUint16(Buffer* buffer);
int32_t ReadInt32(Buffer* buffer);
uint32_t ReadUint32(Buffer* buffer);
int64_t ReadInt64(Buffer* buffer);
uint64_t ReadUint64(Buffer* buffer);
float ReadFloat32(Buffer* buffer);
double ReadFloat64(Buffer* buffer);
void WriteInt8(Buffer* buffer, int8_t value);
void WriteUint8(Buffer* buffer, uint8_t value);
void WriteInt16(Buffer* buffer, int16_t value);
void WriteUint16(Buffer* buffer, uint16_t value);
void WriteInt32(Buffer* buffer, int32_t value);
void WriteUint32(Buffer* buffer, uint32_t value);
void WriteInt64(Buffer* buffer, int64_t value);
void WriteUint64(Buffer* buffer, uint64_t value);
void WriteFloat32(Buffer* buffer, float value);
void WriteFloat64(Buffer* buffer, double value);
const char* ReadString(Buffer* buffer);
void WriteString(Buffer* buffer, const char* string);
void ReadData(Buffer* buffer, char* ptr, unsigned int bytes);
void WriteData(Buffer* buffer, const char* ptr, unsigned int bytes);
void ReadHex(Buffer* buffer, char* ptr, unsigned int bytes);
void WriteHex(Buffer* buffer, const char* ptr, unsigned int bytes);
void WriteBuffer(Buffer* buffer, Buffer* source);
void WriteBufferPart(Buffer* buffer, Buffer* source, unsigned int pos, unsigned int bytes);
The functions GetID and Find can be used to convert pointers to ids and vice versa.

The function shared_buffers_get_handle() is used to pass the pointer to the other DLL:
shared_buffers_init();
test_init();

// initialize shared buffers interface for test.dll
test_init_shared_buffers(shared_buffers_get_handle());
Now the other DLL can use the pointer to manipulate buffers:
#include "gm.h"
#include "SharedBuffers.h"

SharedBufferInterface *sbi = NULL;

gmexport double test_init_shared_buffers(double handle) {
	sbi = (SharedBufferInterface*)(gm_double_to_uint(handle));
	return 1;
}

gmexport double test_writetobuffer(double id) {
	if(sbi == NULL) return 0;
	Buffer *b = sbi->Find(gm_double_to_uint(id));
	if(b == NULL) return 0;
	sbi->WriteInt32(b, 12345);
	sbi->WriteString(b, "Hello world");
	sbi->WriteFloat32(b, 42.42f);
	return 1;
}

Memory management is done by shared_buffers.dll; other DLLs can't allocate or free the buffer memory themselves. They should call the Resize function instead.

Using it in other DLLs is very simple, just include SharedBuffers.h. That's it.

This system has the added benefit of modularity: If some user doesn't need the buffer functionality, that user can choose not to add shared_buffers.dll to his/her game. The parts of the other DLL that don't need buffers will still work.

We could also use shared memory to pass the pointer of the struct, or even to store the entire struct, but I think it's easier to simply pass the pointer through GM. If we use shared memory we would still need functions to tell DLLs to load the shared memory at the right time, because it's possible some DLLs are loaded before shared_buffers.dll is loaded.

What do you think? Would you use something like this in your DLLs? What should be changed to make it better?

Edited by Maarten Baert, 19 April 2011 - 10:53 AM.


#13 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 19 April 2011 - 10:32 AM

@paul... about the casts... the problem occurs when you don't cast down progressively, like char t = (char)MyDouble; you have to cast progressively to the next smaller type: char t = (char)(int)(DWORD)MyDouble;

That cast has undefined behaviour if the double value is out of range for the DWORD type. And by "undefined behaviour" I don't mean that you could get a wrong result, but that it can cause your program to crash. And that's not theoretical; it happened to me while developing Faucet Networking. Always make sure that the source value fits in the target type when you cast floating point values to anything else - then you don't need this strange cascade of casts either.

To avoid this problem you can use boost::numeric_cast, which throws an exception if the source is out of range, or you can use my own clipped_cast, which results in the highest/lowest legal value for the target if the source is out of range. In your example, you could use it like "char t = clipped_cast<char>(MyDouble)".
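For reference, a rough sketch in the spirit of clipped_cast (not the actual implementation; intended for integer targets):

#include <limits>

template<typename Target>
Target clipped_cast_sketch(double value) {
    if (value != value) return 0;                                   // NaN: pick a defined fallback
    if (value <= (double)std::numeric_limits<Target>::min())
        return std::numeric_limits<Target>::min();
    if (value >= (double)std::numeric_limits<Target>::max())
        return std::numeric_limits<Target>::max();
    return (Target)value;                                           // conversion is now in range
}

// char t = clipped_cast_sketch<char>(MyDouble);   // never undefined behaviour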

I found this very informative in this matter: http://www.boost.org...efinitions.html

Here I have an elaborate in-head design for shared resources for GMTOOLS (sharing resources and functions)

That looks very comprehensive, but it's far beyond the scope of what I want. In my opinion, finding a good way to share buffers is difficult enough for a single project :)

Here are two different ways to define the ownership/responsibility and the rules. There are more possibilities of course, but here is something to think about:

Model 1:
Buffers are owned by the extensions that created them, and can be shared for read-only access by other extensions. The owning extension defines for how long a buffer is valid. If extensions are allowed to keep references to the buffer, they must assume that the buffer can change and become invalid at any time between function calls (it is possible to provide a safe way for checking this). Otherwise the extension has to create a copy to work on.

This is probably the simplest model, and it allows efficient access to buffers of other extensions without (usually) requiring a copy. In order to allow sharing buffers with more sophisticated data structures without copying, a buffer can be modeled as a list of memory blocks instead of a single one. There could be a convenience function for people who don't want to deal with this, which will always give you a buffer as a single block of memory by making a copy if it consists of multiple regions.

This model can be implemented without an extra dll.

Model 2:
Buffers are owned by a Shared Buffers dll, and can be read and written by all extensions which have a reference. That means buffers would have to follow a generic implementation provided in the dll, or at least a generic interface which can be implemented by the extensions if they really need something more sophisticated. Also, all extensions would have to assume that buffers can be changed between function calls, unless e.g. a read-only wrapper for buffers is provided. On the upside, you could provide a single set of GM functions for reading/writing bytes, floats etc., which is more difficult if buffers are managed by the extensions.

This is more complex, but it allows a unified Buffer implementation for most purposes.

Actually, thinking about this some more, this can also be achieved without an extra dll.

Edit: I was still writing this when Maarten posted :)

Edited by Medo42, 19 April 2011 - 10:34 AM.


#14 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 19 April 2011 - 11:03 AM

Great stuff Maarten. Here are a few thoughts:
The finished thing should have a very permissive license so that it can be used with any extension. Something like MIT or (even simpler but equivalent) ISC would be preferable for me.

Now for some technical comments :)

shared_buffers_get_handle()
I don't really like relying on the size of int as you do here... I know we do it anyway in the end, but I'd still change that to uintptr_t :)
And I'd prefer an automatic way to share this information, so that the user doesn't have to care about it. I guess an extension could do it as part of an automatic initialization script though and hide it this way.

buffer_get_error(id)
buffer_clear_error(id)

I don't think it's very useful to handle buffer overruns like this. If someone writes code that reads past the end of a buffer, it is not likely that he will explicitly check for errors either. The only real improvement would be with exceptions, but GM doesn't support anything like that of course. As it is, I'd prefer the behaviour to be undefined but safe. In my own Buffers implementation I just read until the end and then stop there, the rest of the returned data is undefined (unless you try to read a string - in that case a reasonable behaviour is still possible and I just return the first part of the string)

buffer_read_from_file(id,filename)
buffer_write_to_file(id,filename)

IMO, that should go into its own dll/extension - the very purpose of this project is to make things more modular :)

More criticism later :)
Can you upload this to github or some other collaboration platform?

Edited by Medo42, 19 April 2011 - 12:42 PM.


#15 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 19 April 2011 - 12:43 PM

Great stuff Maarten. Here are a few thoughts:
The finished thing should have a very permissive license so that it can be used with any extension. Something like MIT or (even simpler but equivalent) ISC would be preferable for me.

Okay, I'm using ISC now.

shared_buffers_get_handle()
I don't really like relying on the size of int as you do here... I know we do it anyway in the end, but I'd still change that to uintptr_t :)

Changed unsigned int to uintptr_t :).

And I'd prefer an automatic way to share this information, so that the user doesn't have to care about it. I guess an extension could do it as part of an automatic initialization script though and hide it this way.

I still have to try this, I'm not 100% sure extensions can call other extensions during initialization. It might not work correctly if the extensions are loaded in the wrong order.

buffer_get_error(id)
buffer_clear_error(id)

I don't really like the idea that a buffer can have an error. If someone writes code that reads past the end of a buffer, it is not likely that he will explicitly check for errors either. The only real improvement would be with exceptions, but GM doesn't support anything like that of course. As it is, I'd prefer the behaviour to be undefined but safe. In my own Buffers implementation I just read until the end and then stop there, the rest of the returned data is undefined (unless you try to read a string - in that case a reasonable behaviour is still possible and I just return the first part of the string)

They are optional, you can simply ignore the errors if you want. If you read past the end of the buffer, you will simply get the default value for that type (which is either 0, 0.0, an empty string, or a block of null bytes depending on the function you're using). I added them because some users might want to make sure the data they've just read is actual data. This could be important in some situations.

buffer_read_from_file(id,filename)
buffer_write_to_file(id,filename)

IMO, that should go into its own dll/extension - the very purpose of this project is to make things more modular :)

I know, but they're so simple I thought it would be silly not to add them. If I comment them out the DLL becomes 0.5KB smaller, which is hardly noticeable. We can still write a separate extension with more complex file functions.

Can you upload this to github or some other collaboration platform?

I have very little experience with revision control software, could you upload it somewhere?

I just made some changes to the code. The previous version would write everything to the end of the buffer, the new version writes it at the current position and resizes the buffer if needed. This is similar to writing to files. Now it's also possible to write to the middle of the buffer. I also added buffer_set_length (for GM).
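For readers following along, a minimal sketch of that behaviour (illustrative only, not the shared_buffers.dll source):

#include <cstring>
#include <vector>

struct SimpleBuffer {
    std::vector<char> data;
    unsigned int pos;
    SimpleBuffer() : pos(0) {}
};

void write_bytes(SimpleBuffer& b, const char* src, unsigned int len) {
    if (len == 0) return;
    if (b.pos + len > b.data.size())
        b.data.resize(b.pos + len);              // grow only when the write runs past the end
    std::memcpy(&b.data[b.pos], src, len);       // otherwise overwrite in place, like a file
    b.pos += len;
}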

The link is still the same:
http://gm.maartenbae...red_buffers.zip
I'm writing the documentation now.

#16 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 19 April 2011 - 01:07 PM

They are optional, you can simply ignore the errors if you want. If you read past the end of the buffer, you will simply get the default value for that type (which is either 0, 0.0, an empty string, or a block of null bytes depending on the function you're using). I added them because some users might want to make sure the data they've just read is actual data. This could be important in some situations.

In this situation you should check *first* whether there is actually enough data left in the buffer.

I just made some changes to the code. The previous version would write everything to the end of the buffer, the new version writes it at the current position and resizes the buffer if needed. This is similar to writing to files. Now it's also possible to write to the middle of the buffer. I also added buffer_set_length (for GM).

These are both things I don't see much practical use for (unless you can come up with a good example). You could always add these functions later if it turns out they are needed, but you can't remove them again once they are included in a release.

#17 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 19 April 2011 - 03:42 PM

In this situation you should check *first* whether there is actually enough data left in the buffer.

Yes, but that's not always possible. If you're reading null-terminated strings from the buffer, you don't know the size in advance - unless you read the string byte by byte, which is cumbersome.

These are both things I don't see much practical use for (unless you can come up with a good example). You could always add these functions later if it turns out they are needed, but you can't remove them again once they are included in a release.

You can also use buffers as large arrays, which is useful to save memory. A 1024x1024 ds_grid uses 24*1024*1024 bytes = 24MB, but if you're only storing bytes this could be done with a 1MB buffer. A 3D particle system DLL could also use buffers to store the position of the particles, so it can be passed to a 3D graphics DLL directly without the overhead of rewriting the entire buffer every time something changes.

I think it can also be useful for some file formats. Some file formats store 'pointers' to other parts of the file in the file itself, which is cumbersome if you can't 'go back' to set the pointers to parts of the file that have been written later.

I think it would be great to have a simple 'memory block' data structure in GM, which gives you the same freedom as C++. I don't see why you'd want to limit it to sequential writing. Random access writing doesn't make the DLL significantly more complicated, slower or harder to use. You can still do sequential reading or writing, you just have to set the position to 0 if you want to start reading from the start again.

I've finished the help file, it's included in the ZIP:
http://gm.maartenbae...red_buffers.zip

Edited by Maarten Baert, 19 April 2011 - 03:45 PM.


#18 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 20 April 2011 - 01:46 PM

You are moving very fast with the implementation, which is nice, but I hope you don't mind if I try out some ideas and change what you have done so far around a bit. Then you'll finally be able to criticise my stuff instead of me just nagging about yours :D. I hope to be constructive with this though, so don't take it as anything against you - I just want to make sure this is in the best possible shape before people start using it.

The array/file-like usage could be useful in some cases, it is just not what I had in mind originally.

What I want to do is try to find a minimal interface for buffers and separate the existing functions into an implementation of that interface (e.g. the functions for reading/writing blocks of data) and "helper" functions which only use that interface (e.g. reading/writing specific data types).

The second point would be to make this all work without a dll, using a very small statically linked library instead. I have a plan for how that would work already, and it would get rid of the initialization problem entirely. Maybe it can even be done with just a .hpp file and no library at all.

And when that works, the GM buffer manipulation functions can be put into their own dll/extension that just uses the buffer sharing code like any other extension. This might seem a little bit pointless, but I'd like to try it because splitting things up into independent components is very often a good idea. If nothing else, it helps avoid the argument about whether there should be file access functions included in the dll, because it's now in an exchangeable component, even if no alternative is ever written.

Edited by Medo42, 20 April 2011 - 01:47 PM.


#19 sabriath

sabriath

    12013

  • GMC Member
  • 3197 posts

Posted 20 April 2011 - 02:18 PM

Call me stupid...but why not just simply do this:

//in gm

var temp;

temp = "012345678";

external_call(get_pointer, someid, temp);
external_call(give_pointer, someotherid, temp);

I did a proof-of-concept that GM will actually pass strings byref to their data (not the variable itself, but the data it contains is retained). This allowed me to create hooks into classes by creating enough space for a string to hold an entire class and then calling a 'create' function to move it there. I then made it so that the class could hold pointer data to DLL allocated space, and since it's a pointer, GM wouldn't mess up the actual data for it.

You could use this same technique to pass a pointer reference between DLL extensions by creating a "get_pointer" function on one and a "give_pointer" on another. Since the DLL would allocate the space (and the OS would link it to the running application's handle), it doesn't change its location between extensions.

An example:

gmexportd get_pointer(double id, buffer** pt)
{
  // pt is the string's data; store the buffer's address in it
  *pt = mybuffers[static_cast<int>(id)];
  return 0;
}


gmexportd give_pointer(double id, buffer** pt)
{
  // read the address back out of the string's data
  otherbuffers[static_cast<int>(id)] = *pt;
  return 0;
}



However, I am not sure how this will help programmers by "modularizing" a standard way of communicating between extensions. It might be great for 2 extensions to combine minds and come up with something together, but trying to come up with a general standard?


#20 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 20 April 2011 - 04:00 PM

You are moving very fast with the implementation, which is nice, but I hope you don't mind if I try out some ideas and change what you have done so far around a bit. Then you'll finally be able to criticise my stuff instead of me just nagging about yours :D. I hope to be constructive with this though, so don't take it as anything against you - I just want to make sure this is in the best possible shape before people start using it.

Sure, no problem :).

The array/file-like usage could be useful in some cases, it is just not what I had in mind originally.

What I want to do is try to find a minimal interface for buffers and separate the existing functions into an implementation of that interface (e.g. the functions for reading/writing blocks of data) and "helper" functions which only use that interface (e.g. reading/writing specific data types).

I think that's the main difference between my idea and yours: You're trying to solve one problem: "DLLs need a way to transfer data to each other", but I'm trying to solve a second problem at the same time: "Many DLLs need buffers, it would be much simpler if all DLLs could use the same buffers instead of their own implementation".

The reason I think we have to solve both problems at once is that most DLLs won't be compatible, unless they were specifically designed to communicate with each other. If you create a platform to transfer raw data without any formatting, it will be used by only a few DLLs. But if you create a shared buffers system that also allows GML users to read or write data, all DLLs that need buffers can use it. And there are lots of DLLs that could use buffers:
- data structure DLLs can use it to serialize/unserialize data structures
- file IO Dlls can use it to save/load files
- socket DLLs can use it to send/receive data
- HTTP request DLLs can use it to store the data that was downloaded from the server
- cryptography DLLs can use it to encrypt/decrypt binary data or calculate hashes (MD5/SHA1/...)
- compression DLLs can use it to compress/decompress data
- ...
The possibilities are endless, and all those DLLs would instantly be compatible with each other - simply because they're using the same buffers. You could serialize ANY data structure, use ANY compression DLL to compress that data, then use ANY cryptography DLL to encrypt it, and finally use ANY network DLL to send the data to another computer. This is currently not possible without using temp files or converting the data to a hex string every time.

I agree with what you said about a minimal interface. Most DLLs won't need functions like buffer_write_int32; you can easily do that yourself in C++ if you have buffer_write_data, which takes any number of bytes. I will try to rewrite it to make the interface a bit simpler.
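As a small illustration of that point, a typed helper can be built purely on the raw WriteData primitive from the interface listed earlier (the wrapper name here is made up):

#include <cstdint>
#include <cstring>

class Buffer;                                                          // opaque, as in the shared interface
void WriteData(Buffer* buffer, const char* ptr, unsigned int bytes);   // provided by the shared interface

void WriteInt32Helper(Buffer* buffer, int32_t value) {
    char bytes[sizeof(value)];
    std::memcpy(bytes, &value, sizeof(value));      // host byte order; swap here if a protocol needs big-endian
    WriteData(buffer, bytes, sizeof(value));
}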

The second point would be to make this all work without a dll, using a very small statically linked library instead. I have a plan for how that would work already, and it would get rid of the initialization problem entirely. Maybe it can even be done with just a .hpp file and no library at all.

I could be wrong, but I don't think that will work that easily. If DLLs have their own runtime libraries (e.g. because they use different compilers), they will also have their own heap. So you can't just allocate memory in DLL_A and resize the same buffer in DLL_B - unless you use functions like GlobalAlloc, I think (but I'm not sure). MSDN says GlobalAlloc is slower than the default malloc; I will have to test this to see if the difference is relevant (if it works in the first place).

By the way, statically linked libraries are not compatible with other compilers, so you would have to create a separate library for every compiler. It would be easier to simply add one .cpp file and one .h file IMHO.

I think we will need the helper functions anyway, so why not keep it simple and just add the functions to the helper functions DLL?

And when that works, the GM buffer manipulation functions can be put into their own dll/extension that just uses the buffer sharing code like any other extension. This might seem a little bit pointless, but I'd like to try it because splitting things up into independent components is very often a good idea. If nothing else, it helps avoid the argument about whether there should be file access functions included in the dll, because it's now in an exchangeable component, even if no alternative is ever written.

Good point, but I think the buffers would be almost useless without the helper functions.

Call me stupid...but why not just simply do this:
[...]
I did a proof-of-concept that GM will actually pass strings byref to their data (not the variable itself, but the data it contains is retained). This allowed me to create hooks into classes by creating enough space for a string to hold an entire class and then calling a 'create' function to move it there. I then made it so that the class could hold pointer data to DLL allocated space, and since it's a pointer, GM wouldn't mess up the actual data for it.

I'd prefer not to rely on this, because it could change in the future (if YYG decides to create a new runner for GM9, for example).

However, I am not sure how this will help programmers by "modularizing" a standard way of communicating between extensions. It might be great for 2 extensions to combine minds and come up with something together, but trying to come up with a general standard?

See above :).

Edited by Maarten Baert, 20 April 2011 - 04:02 PM.


#21 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 20 April 2011 - 05:36 PM

I think that's the main difference between my idea and yours: You're trying to solve one problem: "DLLs need a way to transfer data to each other", but I'm trying to solve a second problem at the same time: "Many DLLs need buffers, it would be much simpler if all DLLs could use the same buffers instead of their own implementation".

And I agree - there should definitely be a default implementation that everyone can use. I just want to give people who really want to do it the option of defining their own implementation, which might be backed by an internal data structure. For example, in Faucet Networking you can supply a socket instead of a buffer in all of the read/write operations - writing will append data to the send buffer, and reading will read from the buffer of received data. This was added after actually using the extension for a bit, because it makes especially receiving small amounts of data at a time much less verbose, and I do not want to revert to the old behavior of always passing a buffer to send, and returning one with the received data. However, if we go with your plan of only one buffer implementation, I would either have to do exactly that, or keep my own write/read functions beside the ones of the default buffer dll (which would be confusing), or create new functions "socket_sendbuffer(socket)" and "socket_receivebuffer(socket)". The last option would actually be acceptable, but still a bit less elegant.

The reason I think we have to solve both problems at once is that most DLLs won't be compatible, unless they were specifically designed to communicate with each other. If you create a platform to transfer raw data without any formatting, it will be used by only a few DLLs. But if you create a shared buffers system that also allows GML users to read or write data, all DLLs that need buffers can use it.


So your point is one of gaining widespread acceptance - which is obviously important here. But as I said above, my point is to allow custom implementations, not to exclude the default one, so it should still be easy to just use the default implementation.

I could be wrong, but I don't think that will work that easily. If DLLs have their own runtime libraries (e.g. because they use different compilers), they will also have their own heap. So you can't just allocate memory in DLL_A and resize the same buffer in DLL_B


But it won't be necessary to do so, because the buffers will be managed by the dll which creates them. I'll try to come up with a proof of concept.

By the way, statically linked libraries are not compatible with other compilers, so you would have to create a separate library for every compiler. It would be easier to simply add one .cpp file and one .h file IMHO.


As far as I can make out, static libraries generated by gcc (for C code, not C++) can be linked by Visual Studio. I didn't try this though, and just adding the .cpp/.h files to your project is easy enough as an alternative.

Edited by Medo42, 20 April 2011 - 08:05 PM.


#22 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 20 April 2011 - 08:06 PM

Sorry, I can't concentrate well at the moment for some reason, so I won't manage changing your code to work dll-less today, and I'm busy tomorrow and over the weekend. The idea is this though: Every extension / dll has its own SharedBufferInterface instance. Buffer references passed over GM consist of a string-encoded (e.g. turned into a hex string) struct that looks like this:
struct BufferReference {
	SharedBufferInterface *owner;
	uint32_t handle;
};
To use a buffer reference that is handed in, you reconstruct the struct (heh) and delegate all function calls to the included owner pointer. That way, the buffers are always managed in the heap space of the extension where they were created.
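A hypothetical pair of helpers (names made up, the struct repeated so the snippet stands alone) for pushing such a reference through a GML string as hex, again assuming a 32-bit process:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

class SharedBufferInterface;                       // as in the struct above

struct BufferReference {
    SharedBufferInterface* owner;
    uint32_t handle;
};

std::string encode_reference(const BufferReference& ref) {
    char buf[20];
    std::sprintf(buf, "%08X:%08X", (unsigned)(uintptr_t)ref.owner, (unsigned)ref.handle);
    return buf;                                    // e.g. "0043F2A0:00000007"
}

BufferReference decode_reference(const char* str) {
    BufferReference ref;
    ref.owner  = (SharedBufferInterface*)(uintptr_t)std::strtoul(str, 0, 16);   // stops at the ':'
    ref.handle = (uint32_t)std::strtoul(str + 9, 0, 16);                        // skip 8 digits + ':'
    return ref;
}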

One other problem which I just noticed is that the buffer writing and reading in Faucet Networking needs to be aware of endianness, which can be set on a per-buffer basis. I don't see a way of replicating that functionality without again resorting to providing my own read/write functions, unless it is included in the generic implementation.

Edited by Medo42, 20 April 2011 - 08:23 PM.


#23 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 21 April 2011 - 12:43 PM

And I agree - there should definitely be a default implementation that everyone can use. I just want to give people who really want to do it the option of defining their own implementation, which might be backed by an internal data structure. For example, in Faucet Networking you can supply a socket instead of a buffer in all of the read/write operations - writing will append data to the send buffer, and reading will read from the buffer of received data. This was added after actually using the extension for a bit, because it makes especially receiving small amounts of data at a time much less verbose, and I do not want to revert to the old behavior of always passing a buffer to send, and returning one with the received data. However, if we go with your plan of only one buffer implementation, I would either have to do exactly that, or keep my own write/read functions beside the ones of the default buffer dll (which would be confusing), or create new functions "socket_sendbuffer(socket)" and "socket_receivebuffer(socket)". The last option would actually be acceptable, but still a bit less elegant.

I would either use the last option, or create functions like socket_read_float32 (instead of buffer_read_float32).

So your point is one of gaining widespread acceptance - which is obviously important here. But as I said above, my point is to allow custom implementations, not to exclude the default one, so it should still be easy to just use the default implementation.

Ah, I understand what you mean now. You want to be able to create buffers that aren't actually memory buffers (it could be files, sockets, cryptographic streams, whatever), but still behave like buffers to all other DLLs. Like polymorphism. That's a great idea, I really like it :). But obviously you don't want to use an interface with tons of functions (like buffer_write_int32 etc.) that aren't really needed, because those will just make it harder to write your own implementation.

As far as I can make out, static libraries generated by gcc (for C code, not C++) can be linked by Visual Studio. I didn't try this though, and just adding the .cpp/.h files to your project is easy enough as an alternative.

I thought it was impossible without some conversion program, but I could be wrong.

Sorry, I can't concentrate well at the moment for some reason, so I won't manage changing your code to work dll-less today, and I'm busy tomorrow and over the weekend. The idea is this though: Every extension / dll has its own SharedBufferInterface instance. Buffer references passed over GM consist of a string-encoded (e.g. turned into a hex string) struct that looks like this:

struct BufferReference {
	SharedBufferInterface *owner;
	uint32_t handle;
};
To use a buffer reference that is handed in, you reconstruct the struct (heh) and delegate all function calls to the included owner pointer. That way, the buffers are always managed in the heap space of the extension where they were created.


One other problem which I just noticed is that the buffer writing and reading in Faucet Networking needs to be aware of endianness, which can be set on a per-buffer basis. I don't see a way of replicating that functionality without again resorting to providing my own read/write functions, unless it is included in the generic implementation.

Good point, I hadn't thought about endianness. Why exactly do you need it? I thought all Windows DLLs were little-endian anyway. You could add a function 'GetEndianness' to the interface, but I don't really see why it is needed. Endianness is only a problem if the program that generates the data does not use the same format as the program that interprets the data. But if a program can interpret the data, I assume it already knows what the data is, so it should also know whether it's little-endian or big-endian. You will have to come up with a fixed format anyway if you want to exchange data, so the endianness should be part of the definition of that format.

I think it's a good idea to add an extra argument to functions like buffer_write_int32 and buffer_read_int32 to set the endianness, so users can read or write in the correct format. But I don't think we should set the endianness on a per-buffer basis, it's the responsibility of the reader and writer to use the correct format.


Now, about the interface. I'm trying to come up with a minimal set of functions we will need for buffers. But there are actually two kinds of buffers we could use:
- stream-like buffers, which can only read from the start and write to the end (like sockets), which I will call streams.
- memory-like buffers, which can read and write at any position (like my buffers, or files), which I will call buffers.
You could use the buffers as if they are streams (writing writes to the 'writing position' of the buffer, reading reads at the 'reading position'). So you could create a set of functions for reading and writing that work for streams AND buffers, and a second set of functions that only work for buffers.

Streams:
- read any type of data (the interface has just one read function, but the helper functions can add more)
- write any type of data (same here).
- a function that returns how many bytes of data are left to be read

Buffers:
- get read position
- get write position
- get length
- set read position
- set write position
- set length

The interface should allow anyone to create their own implementation of streams (e.g. sockets) OR buffers (e.g. files). However, we will still need the GML functions to manipulate the buffers and stream, and also the helper functions, so we will still need a buffer dll. Well, it's not strictly needed anymore, but buffers are rather useless if you can't manipulate them :).
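To make that concrete, a sketch of what the minimal split might look like (names assumed, nothing final); plain function pointers keep it compiler-independent, as discussed earlier in the thread:

struct StreamVTable {
    unsigned int (*bytes_left)(void* impl);                         // how much is still readable
    void (*read) (void* impl, char* dst, unsigned int bytes);       // every typed read goes through this
    void (*write)(void* impl, const char* src, unsigned int bytes);
};

struct BufferVTable {
    StreamVTable stream;                                            // a buffer is also a stream
    unsigned int (*get_read_pos) (void* impl);
    unsigned int (*get_write_pos)(void* impl);
    unsigned int (*get_length)   (void* impl);
    void (*set_read_pos) (void* impl, unsigned int pos);
    void (*set_write_pos)(void* impl, unsigned int pos);
    void (*set_length)   (void* impl, unsigned int len);
};

// A socket DLL would only fill in a StreamVTable; a memory-buffer or file DLL would
// fill in a full BufferVTable. Typed helpers (write_int32, ...) need only StreamVTable.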

There's one thing that worries me though. GM users are used to ids, but now we're forcing them to use real pointers instead. That means they could easily make their game crash if they accidentally pass the wrong variable to a function like buffer_write_int32, or if they accidentally destroy the same buffer twice. I'm not sure that's a good idea. I think we should use the shared dll to keep a list of all buffers and streams, so we can simply pass ids to GM instead of pointers.

#24 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 28 April 2011 - 11:13 AM

Ok, some more discussion then before I go ahead with the implementation, because you raise a good point once more. If we keep on like this I think we can work out most of the problems :)

Good point, I hadn't thought about endianness. Why exactly do you need it? I thought all Windows DLLs were little-endian anyway. You could add a function 'GetEndianness' to the interface, but I don't really see why it is needed. Endianness is only a problem if the program that generates the data does not use the same format as the program that interprets the data. But if a program can interpret the data, I assume it already knows what the data is, so it should also know whether it's little-endian or big-endian. You will have to come up with a fixed format anyway if you want to exchange data, so the endianness should be part of the definition of that format.

I think it's a good idea to add an extra argument to functions like buffer_write_int32 and buffer_read_int32 to set the endianness, so users can read or write in the correct format. But I don't think we should set the endianness on a per-buffer basis, it's the responsibility of the reader and writer to use the correct format.

Maybe I expressed that a bit ambiguously - Of course I know whether the data is little- or big-endian, but I don't necessarily have any influence on it. Talking from the standpoint of my networking extension again, many networking protocols are defined with big-endian byte order, while others (e.g. most things based on 39dll) are little-endian. So in order to communicate in any particular protocol you have to set the correct endianness in the buffer you are reading / writing. You could make that a parameter of the reading/writing functions, but since the data from/to any source / destination will usually use the same convention throughout, setting it as a parameter of the buffer saves a lot of redundant typing - and you can still switch it around for some fields if you really need to, which is more verbose than using a parameter but only necessary in very few cases. This applies equally to data formats in files.

Now, about the interface. I'm trying to come up with a minimal set of functions we will need for buffers. But there are actually two kinds of buffers we could use:
- stream-like buffers, which can only read from the start and write to the end (like sockets), which I will call streams.
- memory-like buffers, which can read and write at any position (like my buffers, or files), which I will call buffers.
You could use the buffers as if they are streams (writing writes to the 'writing position' of the buffer, reading reads at the 'reading position'). So you could create a set of functions for reading and writing that work for streams AND buffers, and a second set of functions that only work for buffers.

The buffers I currently use are actually a mix of the two, where you can write only to the end but read anywhere. That would not be possible for all types of streams though, and I want to avoid making things more complicated. I never actually used the option to set the read position in Gang Garrison 2, so it probably isn't a very common case. It could be removed from the actual receive buffer of the socket to make the socket fit your definition of a stream - if someone does need it, he can copy the received data out to a separate buffer first.

There's one thing that worries me though. GM users are used to ids, but now we're forcing them to use real pointers instead. That means they could easily make their game crash if they accidentally pass the wrong variable to a function like buffer_write_int32, or if they accidentally destroy the same buffer twice. I'm not sure that's a good idea. I think we should use the shared dll to keep a list of all buffers and streams, so we can simply pass ids to GM instead of pointers.

The idea is that users should never look at or change handles, and only call the functions with some kind of handle that was passed out before... but of course, you cannot assume that, and it would be good if it didn't crash the entire game when this rule is disregarded. We could add a "signature" to the beginning of a valid pointer, some magic number that can be used to check that this is actually a buffer handle and not some random string. It would still be possible to craft a bad handle to crash the game, but it would be very unlikely to have it happen by accident. We could also add a magic number to the beginning of a buffers implementation in memory to make malicious crafting of handles more difficult, but that's probably overkill.
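An illustrative sketch of that signature check (the constant and layout are made up):

#include <cstdint>
#include <cstring>

static const uint32_t BUFFER_MAGIC = 0x42554646;      // arbitrary tag ("BUFF")

struct BufferHeader {
    uint32_t magic;                                   // BUFFER_MAGIC while alive, wiped on destroy
    // ... the actual buffer fields would follow
};

bool looks_like_buffer(const void* decoded_ptr) {
    if (decoded_ptr == 0) return false;
    uint32_t magic;
    std::memcpy(&magic, decoded_ptr, sizeof(magic));  // still reads through the pointer, so a
                                                      // completely bogus address can still fault
    return magic == BUFFER_MAGIC;
}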

I have to admit that while the solution without a central dll is interesting and in a way quite elegant, it does create some practical problems. On the other hand it solves the problem of requiring the user to initialize all extensions with a pointer to the buffers implementation. If we can get rid of that somehow in the central dll solution and also provide a useful error message like "This extension requires the buffers dll / extension, please install it", I'll be happy with that one.

You mentioned that it might not be possible in the extension initialization, but if that is the case there are still convoluted ways to get the result we want. For example, this is how an extension init code could look like:
if(variable_global_exists("__sharedBuffersInitialized")) {
    __myExtensionInit(shared_buffers_get_handle());
} else {
    if(!variable_global_exists("__sharedBuffersInitCallbacks")) {
        global.__sharedBuffersInitCallbacks = ds_list_create();
    }
    ds_list_add(global.__sharedBuffersInitCallbacks, __myExtensionInit);
}
Then the shared buffers dll / extension could be initialized like this:
global.__sharedBuffersInitialized = true;
if(variable_global_exists("__sharedBuffersInitCallbacks")) {
    var i;
    for(i=0; i<ds_list_size(global.__sharedBuffersInitCallbacks); i+=1) {
        script_execute(ds_list_find_value(global.__sharedBuffersInitCallbacks, i), shared_buffers_get_handle());
    }
    ds_list_destroy(global.__sharedBuffersInitCallbacks);
}
I'm not sure if extension GML scripts can be called that way. If not, one could work around that by using a string with the script name instead.

Hmm, maybe I should have checked first how extensions are initialized and if this is even necessary :)
  • 0

#25 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 28 April 2011 - 09:27 PM

I set up a git repository at https://github.com/M...shared-buffers/ and added a first change: When you write an integer, the value you pass in is now rounded, and the conversion is now done by a template function.
  • 0

#26 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 30 April 2011 - 09:26 PM

Maybe I expressed that a bit ambiguously - Of course I know whether the data is little- or big-endian, but I don't necessarily have any influence on it. Talking from the standpoint of my networking extension again, many networking protocols are defined with big-endian byte order, while others (e.g. most things based on 39dll) are little-endian. So in order to communicate in any particular protocol you have to set the correct endianness in the buffer you are reading / writing. You could make that a parameter of the reading/writing functions, but since the data from/to any source / destination will usually use the same convention throughout, setting it as a parameter of the buffer saves a lot of redundant typing - and you can still switch it around for some fields if you really need to, which is more verbose than using a parameter but only necessary in very few cases. This applies equally to data formats in files.

Yes, you're right - adding that argument every time is just annoying. But we can't really use the buffers to store the endianness, since the new buffer interface has only one write function: WriteData. It doesn't know the exact type, so it can't do the conversion properly (obviously you don't need the conversion for strings, and it's a lot more complicated if you try to write entire structs at once). But we can still make a global helper function to set the endianness, so the other helper functions can use this setting when reading or writing. I don't really like global state either, but it's easier than adding that extra argument every time.
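
Something along these lines, maybe - a rough sketch of such a helper on top of a WriteData-style raw write (the flag and helper names are made up, and the real WriteData is just stubbed here):

#include <stdint.h>

// Stand-in for the type-agnostic write of the shared interface.
void WriteData(uint32_t id, const uint8_t* data, uint32_t size) { /* ... */ }

// Global setting used only by the typed helpers, not by the buffers themselves.
static bool g_bigEndian = false;

void setBigEndian(bool enable) { g_bigEndian = enable; }

void writeUint32(uint32_t id, uint32_t value) {
	uint8_t bytes[4];
	if (g_bigEndian) {
		bytes[0] = (uint8_t)(value >> 24); bytes[1] = (uint8_t)(value >> 16);
		bytes[2] = (uint8_t)(value >> 8);  bytes[3] = (uint8_t)(value);
	} else {
		bytes[0] = (uint8_t)(value);       bytes[1] = (uint8_t)(value >> 8);
		bytes[2] = (uint8_t)(value >> 16); bytes[3] = (uint8_t)(value >> 24);
	}
	WriteData(id, bytes, 4); // the shared interface never needs to know the type
}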

The idea is that users should never look at or change handles, and only call the functions with some kind of handle that was passed out before... but of course, you cannot assume that, and it would be good if it didn't crash the entire game when this rule is disregarded. We could add a "signature" to the beginning of a valid pointer, some magic number that can be used to check that this is actually a buffer handle and not some random string. It would still be possible to craft a bad handle to crash the game, but it would be very unlikely to have it happen by accident. We could also add a magic number to the beginning of a buffers implementation in memory to make malicious crafting of handles more difficult, but that's probably overkill.

That doesn't really solve the destroy-twice problem: the handle is essentially valid, the buffer just happens to be gone. Additionally, I think users will easily get confused by the different types of buffers and streams, so they could accidentally try to destroy one type (e.g. a socket) using the function that was meant to destroy another type (e.g. a memory buffer). That would lead to very weird crashes. Since GM users aren't familiar with debuggers and such, they really have no way to track this type of bug down, so they would quickly give up.

I have to admit that while the solution without a central dll is interesting and in a way quite elegant, it does create some practical problems. On the other hand it solves the problem of requiring the user to initialize all extensions with a pointer to the buffers implementation. If we can get rid of that somehow in the central dll solution and also provide a useful error message like "This extension requires the buffers dll / extension, please install it", I'll be happy with that one.

We could solve the initialization problem simply by delaying the initialization until the first function that actually needs buffers tries to use them. That would be even better, as it allows us to create DLLs that are compatible with Shared Buffers, but don't require it. If you would try to call such a function without the DLL, the DLL could either show an error message or simply do nothing - that's up to the DLL developer.

You mentioned that it might not be possible in the extension initialization, but if that is the case there are still convoluted ways to get the result we want. For example, this is what an extension's init code could look like:

if(variable_global_exists("__sharedBuffersInitialized")) {
    __myExtensionInit(shared_buffers_get_handle());
} else {
    if(!variable_global_exists("__sharedBuffersInitCallbacks")) {
        global.__sharedBuffersInitCallbacks = ds_list_create();
    }
    ds_list_add(global.__sharedBuffersInitCallbacks, __myExtensionInit);
}
Then the shared buffers dll / extension could be initialized like this:
global.__sharedBuffersInitialized = true;
if(variable_global_exists("__sharedBuffersInitCallbacks")) {
    var i;
    for(i=0; i<ds_list_size(global.__sharedBuffersInitCallbacks); i+=1) {
        script_execute(ds_list_find_value(global.__sharedBuffersInitCallbacks, i), shared_buffers_get_handle());
    }
    ds_list_destroy(global.__sharedBuffersInitCallbacks);
}
I'm not sure if extension GML scripts can be called that way. If not, one could work around that by using a string with the script name instead.

Convoluted indeed :P. But if it works, it would be great. And if we can't call GEX functions like that, we could still use execute_string (finally, a good reason to use that function).

I set up a git repository at https://github.com/M...shared-buffers/ and added a first change: When you write an integer, the value you pass in is now rounded, and the conversion is now done by a template function.

Thanks, I'm downloading Git now. I didn't know about std::numeric_limits, that's really useful. Your cast function is very similar to the gm_cast<X> function I used in ExtremePhysics; I just rewrote it to use std::numeric_limits.
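
For reference, a rounding/clamping conversion along those lines could look roughly like this (a sketch, not the code from the repository):

#include <cmath>
#include <limits>
#include <stdint.h>

// Convert a GM real (double) to an integer type: round first, then clamp to
// the representable range so out-of-range values cannot silently overflow.
template<typename IntType>
IntType clampedCast(double value) {
	if (value != value) return 0; // NaN guard
	double rounded = std::floor(value + 0.5);
	if (rounded <= static_cast<double>(std::numeric_limits<IntType>::min()))
		return std::numeric_limits<IntType>::min();
	if (rounded >= static_cast<double>(std::numeric_limits<IntType>::max()))
		return std::numeric_limits<IntType>::max();
	return static_cast<IntType>(rounded);
}

For example, clampedCast<uint8_t>(260.7) would give 255, and clampedCast<int16_t>(-1e10) would give -32768.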

During the last few days I've written another proof-of-concept:
http://gm.maartenbae...eam_buffers.zip
It uses separate streams and buffers now. I've included memory buffers in the main DLL for now, but I could move them to a separate DLL (I wasn't sure which was better). The new version allows anyone to create their own implementation of either streams or buffers (all buffers are also streams). I've rewritten the help file too.

The DLL uses a global id table, so you don't have to pass pointers around. You can use the same id for the buffer/stream and your own data structure. For example, if your DLL creates sockets, you can use the id that was generated for the buffer/stream for your socket, so you won't need a function like 'socket_get_stream_id' or similar - the id is the same.

There's also a function 'GetInterface' which you can use to get the pointer to the interface of a specific buffer/stream (not the main interface, just the interface of that type). This address isn't useful by itself, but you can use it to check whether some stream or buffer has the correct type. That way you don't have to keep your own table of ids: you can just use the shared buffer dll to convert the id into a pointer, and then check the interface address to make sure it's the correct type. You don't have to use it like that, but I thought it could be useful (I used this for the memory buffer implementation).

I haven't added endianness yet, and it still uses my old cast functions. I will change that tomorrow (it's 23:25 now where I live :)).
  • 0

#27 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 20 May 2011 - 09:16 PM

Sorry for not being more active. This project is still on my mind, but at the moment I am tied up with work and writing a thesis. I will try to make some time for this soon though.

Let me just clear up one point from that last post. I hope to answer some of the others in code soon :)


The idea is that users should never look at or change handles, and only call the functions with some kind of handle that was passed out before... but of course, you cannot assume that, and it would be good if it didn't crash the entire game when this rule is disregarded. We could add a "signature" to the beginning of a valid pointer, some magic number that can be used to check that this is actually a buffer handle and not some random string. It would still be possible to craft a bad handle to crash the game, but it would be very unlikely to have it happen by accident. We could also add a magic number to the beginning of a buffers implementation in memory to make malicious crafting of handles more difficult, but that's probably overkill.

That doesn't really solve the destroy-twice problem: the handle is essentially valid, the buffer just happens to be gone. Additionally, I think users will easily get confused by the different types of buffers and streams, so they could accidentally try to destroy one type (e.g. a socket) using the function that was meant to destroy another type (e.g. a memory buffer). That would lead to very weird crashes. Since GM users aren't familiar to debuggers and such, they have really no way to track this type of bug down, so they would quickly give up.

Check the implementation detail that I gave again, specifically the struct for the buffer handle:
struct BufferReference {
        SharedBufferInterface *owner;
        uint32_t handle;
};
The pointer does not point to a buffer, but rather to an instance of the function pointer interface. We already share the exact same pointer between extensions in your implementation. The only difference is that there can now be several instances of this, each managing its own buffers. This way all of the problems you mention can be avoided.
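
To make the indirection concrete: a consumer never touches a buffer object directly, it only calls through the interface stored in the reference. Very roughly (the signatures here are guessed for illustration, they are not the actual proposal):

#include <stdint.h>

struct SharedBufferInterface {
	// Each instance manages its own buffers; the functions get the instance
	// back as their first argument so they can resolve the handle in its table.
	uint8_t (*exists)(SharedBufferInterface* self, uint32_t handle);
	uint32_t (*read)(SharedBufferInterface* self, uint32_t handle, uint8_t* dest, uint32_t size);
};

struct BufferReference {
	SharedBufferInterface* owner;
	uint32_t handle;
};

uint32_t readFromReference(const BufferReference& ref, uint8_t* dest, uint32_t size) {
	// A stale or foreign handle is rejected by its owner instead of crashing.
	if (ref.owner == 0 || !ref.owner->exists(ref.owner, ref.handle))
		return 0;
	return ref.owner->read(ref.owner, ref.handle, dest, size);
}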
  • 0

#28 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 21 May 2011 - 11:57 PM

Alright, I went over your code and started picking it apart a bit. Maybe I'm just taking your stuff apart and putting it back together in a slightly different way, but if so it at least helps me understand things a bit better :).

I'd still like to have a well-defined "core" whose API is the only place where we actually have to care about compiler compatibility, since this is where all the extensions will call into. Now, I'm still not quite clear on what you can and can't expect to be compatible, but let me start with the proposed API (developed mainly from your interface structs) and then discuss it further:
#include <stdint.h>

typedef struct {
	uint32_t (__stdcall *read)(void*, uint8_t*, uint32_t);
	void (__stdcall *write)(void*, const uint8_t*, uint32_t);
	uint32_t (__stdcall *getBytesLeft)(void*);
	uint8_t (__stdcall *destroy)(void*);
} SharedStreamInterface;

typedef struct {
	uint32_t (__stdcall *read)(void*, uint8_t*, uint32_t);
	void (__stdcall *write)(void*, const uint8_t*, uint32_t);
	uint32_t (__stdcall *getBytesLeft)(void*);
	uint8_t (__stdcall *destroy)(void*);

	uint32_t (__stdcall *getreadpos)(void*);
	uint32_t (__stdcall *getwritepos)(void*);
	uint32_t (__stdcall *getlength)(void*);
	void (__stdcall *setreadpos)(void*, uint32_t);
	void (__stdcall *setwritepos)(void*, uint32_t);
	void (__stdcall *setlength)(void*, uint32_t);
} SharedBufferInterface;

extern "C" {
	// Functions applicable for both buffers and streams
	uint32_t __stdcall readData(uint32_t id, uint8_t* data, uint32_t size);
	void __stdcall writeData(uint32_t id, const uint8_t* data, uint32_t size);
	uint32_t __stdcall getBytesLeft(uint32_t id);
	void __stdcall destroyStreamOrBuffer(uint32_t id);
	uint8_t __stdcall streamOrBufferExists(uint32_t id);

	// Functions only applicable for buffers
	uint32_t __stdcall getReadPos(uint32_t id);
	uint32_t __stdcall getWritePos(uint32_t id);
	uint32_t __stdcall getLength(uint32_t id);
	void __stdcall setReadPos(uint32_t id, uint32_t pos);
	void __stdcall setWritePos(uint32_t id, uint32_t pos);
	void __stdcall setLength(uint32_t id, uint32_t length);
	uint8_t __stdcall bufferExists(uint32_t id);

	// Adding new buffers/streams
	uint32_t __stdcall addStream(SharedStreamInterface *interface, void *stream);
	uint32_t __stdcall addBuffer(SharedBufferInterface *interface, void *buffer);
}

Most things should be straightforward. Technicalities: I'm not sure stdcall is actually more compatible, so that could possibly be removed. The code is relying on mingw and MSVC to generate the same memory layout for the struct definitions, but since they only contain fields of the same type (function pointers) it will probably work. After all, you relied on the same thing too :)

One difference from your model is that this interface allows a uniform "destroy" function that can be used on any type of stream or buffer. On destruction, the library will first call the destroy()-function provided in the interface to allow the buffer to clean up. If that function returns true, the buffer is removed from the list. Specific implementations can choose to prevent being destroyed by returning false instead, e.g. if the buffer is actually tied to some internal data structure of a different resource. In order to *actually* destroy those buffers, an implementation would have to make sure that true is returned when it really wants to clean up - a bit complicated, but I think it's manageable.
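
In other words, the library's side of destroy would behave roughly like this (illustrative sketch only, with calling conventions and the other interface members left out):

#include <stdint.h>
#include <map>

struct SharedStreamInterface {
	uint8_t (*destroy)(void*); // returns true if the entry may be removed
	// read / write / getBytesLeft omitted for brevity
};

struct Registered {
	SharedStreamInterface* iface;
	void* impl;
};

static std::map<uint32_t, Registered> g_registry;

void destroyStreamOrBuffer(uint32_t id) {
	std::map<uint32_t, Registered>::iterator it = g_registry.find(id);
	if (it == g_registry.end())
		return; // unknown id: ignore rather than crash
	// Ask the implementation to clean up; it may refuse if the buffer is tied
	// to some other resource and has to stay alive.
	if (it->second.iface->destroy(it->second.impl))
		g_registry.erase(it); // only unregister if the callback agreed
}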

By offering the functions directly instead of returning pointers to buffer structs it is impossible for a user of this api to store a pointer that could become invalid at any time.

In my opinion, this api is the part which has to be most carefully designed, because any change to it would break compatibility with basically everything. We can put some classes on top of it to make it nicer to use, and we should definitely have your memory buffer implementation to go with it, but that is all outside this "core" part and can be written in normal C++ and compiled together with the extension that uses it.

The function names could use some tweaking though. Please criticise every aspect of it that you can, now is the best time for it. I can add an implementation tomorrow.
  • 0

#29 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 23 May 2011 - 11:23 PM

The promised implementation is actually done, but untested and in need of some extra polish, so I will put it up a bit later. I also wrote a small helper wrapper for the addStream/addBuffer functions in the api posted above, which allows you to write buffer and stream implementations as plain classes, as long as they are derived from one of the interface classes AbstractStream or AbstractBuffer.

One thing I am still not quite content with is that we have to rely on the interface pointers that are passed in to remain valid. We could store a copy of the interface struct instead of the pointer to get around this, but that would waste a bit of space. The pointer validity problem takes an especially difficult turn when considering what happens during shutdown, because there may be situations where one DLL has already been unloaded while the other is being destroyed. It might not be as difficult as I'm making it out to be, though.

A question about your buffer implementation: The grow/shrink strategy seems sound, but do you have a specific reason for rolling your own instead of relying on std::vector?
  • 0

#30 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 25 May 2011 - 11:30 PM

You can download what I have so far over here: https://github.com/M...-shared-buffers

The destroy function is a bit of a sore spot in this whole thing, since it really shouldn't directly belong to the buffer implementation - it should be implemented by the entity that owns the buffer instead. I might pull it out of the shb_StreamInterface and put it as a parameter into the shareBuffer/shareStream functions instead, or alternatively just remove it from the AbstractStream class and have shareStream take it as an extra parameter.

I added a new function writeOther() to the stream interface that takes the id of another buffer and a size argument, and then reads the requested amount of data from the other stream/buffer directly to its own memory. Transferring data between memory buffers would require an extra copy without a function like this, but I am not totally happy with how this worked out either and would appreciate a better solution.

I'm quite convinced by now that the best approach to the architecture question is to provide the shared buffers library implementation as an extension and have it loaded exactly once. Distributing a pointer to the interface to the other extensions / dlls doesn't really require the complicated solution we discussed earlier, it can be accomplished by having the Shared Buffers extension provide a function like getSharedBufferInterfacePointer that is called by everyone else :) (Reading back I notice that this was your original idea, d'oh! Oh well, I actually tested that it works this way now.)

Please don't be annoyed that I rewrote so much of what you had already done. This project is a bit of a learning exercise for me (I'm not very experienced with C++), and in many cases I had to get my hands dirty myself in order to understand all the issues involved.

Edit: I replaced the buffer implementation (which was adapted from my networking lib) with a modified version of yours now. Also, I removed writeOther again in favor of a function to directly access the memory of a buffer. This makes working with buffers a bit more efficient for tasks like creating a checksum over a whole buffer or appending a buffer to another one, and I think it is fairly safe to assume that most buffer implementations would use a single block of memory. However, we might make this an optional operation and let buffer implementations return NULL there if they don't support this - in that case, the caller would have to use the other access mechanism and create a copy first.

Edited by Medo42, 26 May 2011 - 11:06 AM.

  • 0

#31 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 27 May 2011 - 10:40 PM

Just another bump to say that I rewrote almost everything, and it is finally starting to come together for me. You can still find the code on Github, and until there is either a reply here or I am done with a major milestone, I won't post another reply.
  • 0

#32 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 02 June 2011 - 07:41 PM

Uploaded the latest version to github: https://github.com/M...-shared-buffers

General layout:
The core folder contains the code that actually manages buffer sharing. It is used through a C-style API of stdcall-functions and structs, defined in public_core_api.hpp. Only the Shared Buffers extension will actually contain the compiled code from this folder, everyone else only needs public_core_api.hpp. The pointer to the core API struct is provided by the Shared Buffers extension in a GM-visible function that has yet to be written. Every extension that wants to work with shared buffers needs to get this pointer and store it.

Most people will not need to work with this crude C API though, and can use the classes built on top of it instead. Those can be found in the main source folder. If you only want to use buffers shared by other extensions, you only need to include BufferProxy.hpp. There, you have two classes StreamProxy and BufferProxy, which can be instantiated with the core API pointer and a handle. After instantiation, they behave like the buffers/streams whose handle they hold, or (if the buffer or stream does not actually exist) like an empty stream/buffer.

To create your own stream/buffer implementation, you can include AbstractBuffer.hpp and derive a class from either AbstractStream or AbstractBuffer and implement the required virtual methods. For AbstractBuffer, several methods already have default implementations that work in terms of the methods you have to implement yourself. After this is done, you can instantiate your buffer and share it through a BufferManager (see below). DefaultMemBuffer.hpp contains a buffer that is defined just like that, and which should be good enough for most needs.
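
For instance, a trivial stream implementation might end up looking something like this (the base class is only hinted at here, and the method names and signatures are guessed from the C API posted earlier, not copied from the headers):

#include <stdint.h>
#include <deque>

// Illustrative only: a FIFO stream backed by std::deque.
class QueueStream /* : public AbstractStream */ {
public:
	uint32_t read(uint8_t* dest, uint32_t size) {
		uint32_t copied = 0;
		while (copied < size && !data.empty()) {
			dest[copied++] = data.front();
			data.pop_front();
		}
		return copied; // number of bytes actually delivered
	}
	void write(const uint8_t* src, uint32_t size) {
		data.insert(data.end(), src, src + size);
	}
	uint32_t getBytesLeft() {
		return static_cast<uint32_t>(data.size());
	}
private:
	std::deque<uint8_t> data;
};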

In order to actually share a buffer, there needs to be an object that owns it and can react to destroy requests made through the shared interface. This is what a BufferManager does. One option is to use the DefaultBufferManager class provided in the library. This implementation can share buffers handed in as pointers, and takes ownership of them, i.e. you should only destroy them through the shared interface or through the manager itself once you shared them. You can also mark buffers as "indestructible" so they can only be destroyed through the manager itself, not through the shared interface. If this is not good enough for you and you want something fancier, e.g. shared ownership with smart pointers, or having existing classes in your project manage their own buffers, you can derive a class from AbstractBufferManager and override its destroyCallback method, which is called whenever someone requests through the shared interface that a buffer or stream shared through this manager should be destroyed.
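
As a sketch of that second option - a class that owns its buffers and simply refuses destruction through the shared interface (the base class and the exact destroyCallback signature are assumptions here, not taken from the headers):

#include <stdint.h>

class MySocket /* : public AbstractBufferManager */ {
public:
	// Called when someone asks the shared interface to destroy one of the
	// buffers this manager shared. Returning false keeps the handle alive,
	// because the buffers belong to the socket and die together with it.
	bool destroyCallback(uint32_t /*bufferId*/) {
		return false;
	}
};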

The gm folder contains the Game Maker interface of the Shared Buffers extension.

That's all for the layout and overview :). The current version is still missing the endianness support and is completely untested so far, but it has all the features of Maarten's implementation now and offers some improvements over it (imo). I added some extra primitives (peeking for streams, direct memory access for buffers) for increased efficiency and flexibility, and I don't think they much reduce the generality of the interfaces. I removed the int64 functions because using them would give some unintuitive results, due to the fact that not all int64 values have a double/real representation. And files can now be read into any buffer, not just the default memory buffers.

Maarten: I hope you didn't take offense at my decision to rewrite everything. In retrospect, I didn't understand some issues you were considering and after I took them into account I ended up with something much closer to your code than I had thought I would. However, I do think the result is an improvement in some aspects, and it would be really helpful if you could hand back some criticism and suggestions.

Endianness will probably be integrated into the core, as an attribute associated with streams, maybe as part of a scheme to allow adding more attributes later without breaking compatibility. The core would not interpret these attributes in any way, it would only provide a mechanism for associating them with the buffers. It would be possible with reasonable efficiency to tack this on outside the core, by keeping a map from handles to attributes and regularly scanning for handles that are no longer valid, but that would be a bit kludgy.

Edited by Medo42, 02 June 2011 - 07:53 PM.

  • 0

#33 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 04 June 2011 - 04:53 PM

Sorry, I should have checked this topic more often.

Sounds complicated :P. I haven't looked at the code yet, I will do so when I have more time.

I think there's something wrong with the repository, some trees are 'invalid'?

A question about your buffer implementation: The grow/shrink strategy seems sound, but do you have a specific reason for rolling your own instead of relying on std::vector?

std::vector is designed to work with any type of data, not just chars. But since buffers only store chars, it's possible to make some assumptions which make the code more efficient (well, I think). std::vector is overly complicated in this case since we don't need iterators and everything.

Also, I remove[d] writeOther again in favor of a function to directly access the memory of a buffer. This makes working with buffers a bit more efficient for tasks like creating a checksum over a whole buffer or appending a buffer to another one, and I think it is fairly safe to assume that most buffer implementations would use a single block of memory. However, we might make this an optional operation and let buffer implementations return NULL there if they don't support this - in that case, the caller would have to use the other access mechanism and create a copy first.

I don't think it's a good idea to even assume the buffer is stored in memory! The buffer could also be a 1GB file that doesn't even fit in memory (just an example). If we add this, it should definitely be optional.

Then again, why would anyone create a buffer implementation that stores the data in memory? MemoryBuffer already does exactly that, why would anyone create a new implementation that's identical to MemoryBuffer?

Just some more thoughts:

- I don't think __stdcall makes it more portable, AFAIK the keyword '__stdcall' is non-standard (does it even work in GCC?). Anyway, only the function pointers need the __stdcall, the interface functions can use any calling convention since they are recompiled for every DLL anyway, right?

- I don't really like the way destroying works. If the function that creates the buffer/stream is defined by an external DLL, then that DLL should also define a function to destroy it. You can't predict how the buffer will be stored, it might even be impossible to destroy the buffer correctly with this interface. For example, if the buffer is stored in a std::list, which itself is stored in another class, the buffer can't possibly destroy itself unless it also has a reference to its owner. But that's exactly the kind of reference you normally want to avoid because it makes the code less modular (if A references B and B references A, then A and B can only be used together and you can't create a new class C which uses A without either using B too or drastically changing A).

- I hadn't thought about the shut-down problem, that could be difficult to fix.

I can't access the core folder, so I'm not sure how your interface works. Can you fix that?
  • 0

#34 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 04 June 2011 - 06:17 PM

I think there's something wrong with the repository, some trees are 'invalid'?

Oh man, that again :(
It seems that EGit, the Eclipse Git plugin, has a bug that leaves empty directories behind when you move a dir somewhere else. I fixed it now, sorry for the confusion. The thing is, I restructured everything a bit today and made the library (at least, the part which is used by other dlls) header-only, so it can be used a bit more easily. In the process the gm and core directories were moved to other positions under src.

I don't think it's a good idea to even assume the buffer is stored in memory! The buffer could also be a 1GB file that doesn't even fit in memory (just an example). If we add this, it should definitely be optional.

Then again, why would anyone create a buffer implementation that stores the data in memory? MemoryBuffer already does exactly that, why would anyone create a new implementation that's identical to MemoryBuffer?

The new "getFragment()" mechanism is much more flexible, it no longer assumes that the buffer is in a single block. It still does assume that the buffer is in memory though, but that could actually be fixed with a small change of the contract of getFragment(). Right now the returned memory region has to remain valid until any setter or the write method is called on the buffer, but if it only had to stay valid until the next call to getFragment() (ot the other functions mentioned), you could satisfy the fragment requests with a small temporary buffer that you read from the file and write back later. getFragment could further be changed to only allow reading from the returned memory region, not writing. That could make it again a bit more flexible for implementations, but would make copying data from a stream to a buffer require two copies again.

Even if you cannot make your structure fit the buffer contract, you can still share it as a stream - and you actually have to do this if you want to allow reading/writing big files anyway, because of the 4GB limit of size_t / uint32_t which is used as buffer length and position type at the moment.

As for why it would be useful at all to have different memory buffer implementations: People could have special-purpose data structures they want to allow access to. For example, in Faucet Networking I have a Sendbuffer class that allows efficiently removing data from the front (because it has been sent), but it still keeps the data in relatively large chunks of memory (64kb). I might want to allow accessing that sendbuffer as a shared buffer.

- I don't think __stdcall makes it more portable, AFAIK the keyword '__stdcall' is non-standard (does it even work in GCC?). Anyway, only the function pointers need the __stdcall, the interface functions can use any calling convention since they are recompiled for every DLL anyway, right?

Yes, it is non-standard. It does work in gcc though, and iirc it is the calling convention of the Windows API, so any Windows-based compiler would have to support it in some way to be worth anything.

- I don't really like the way destroying works. If the function that creates the buffer/stream is defined by an external DLL, then that DLL should also define a function to destroy it. You can't predict how the buffer will be stored, it might even be impossible to destroy the buffer correctly with this interface. For example, if the buffer is stored in a std::list, which itself is stored in another class, the buffer can't possibly destroy itself unless it also has a reference to its owner. But that's exactly the kind of reference you normally want to avoid because it makes the code less modular (if A references B and B references A, then A and B can only be used together and you can't create a new class C which uses A without either using B too or drastically changing A).

A common destroy function is useful to GM programs because it allows them to work with any kind of buffer/stream. After instantiating a buffer, you should not be required to know what type it has, as long as you only work with the basic interface. This is the same in object-oriented programming. You can delete an object even if you only know it by its superclass interface. If that was never required, there would not be a need for virtual destructors.

As for feasibility, please look at the code first. Yes, the shared buffers library can have no idea how the buffer will be stored, or even if it makes sense to allow destroying it. That's why you have to provide a destroy callback when sharing a buffer, which takes care of these issues. The only thing the sharing library does is call that callback, and delete the handle and associated information if the callback does destroy the buffer, or otherwise thinks it's a good idea to un-share the buffer (indicated by returning true).

With the way it is handled in the current code, this separates the concerns of the buffer (responsible for doing buffer things) and the buffer's owner/manager (responsible for providing the destroy callback). The buffer only depends on the AbstractBuffer interface, which is not even tied to the sharing mechanism at all, it is just a generic abstract base class which defines the basic methods you have to implement.

A buffer manager implementation doesn't have to depend on any specific buffer type either, see the DefaultBufferManager which can act as owner for any AbstractBuffer. The name BufferManager might imply that you always have a single one and share all buffers through that, but the concept is more generally that of the buffer's owner. For example, if I did decide to let my Socket class provide a shared send buffer and receive buffer, I could make Socket implement AbstractBufferManager, and every socket object would share exactly those two buffers. Implementing AbstractBufferManager is very lightweight, it only requires implementing the callback method (and that can be as simple as "return false;") and only adds a single data member to your class.

About the shutdown cleanup, it's not that bad actually. You only have to make sure that you clean up the things which require cross-dll calls in the special Finalization function provided for GM extensions, because at that point all the global objects and memory locations are still valid. As long as you don't access foreign pointers in the destructors of your global objects or the PROCESS_DETACH case in your DllMain, everything should be fine.

Edited by Medo42, 04 June 2011 - 06:19 PM.

  • 0

#35 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 04 June 2011 - 07:01 PM

The reason I don't like the destroy thing is because it breaks the 'ownership idiom', which I use extensively (and I think most C++ programmers do, you simply can't avoid it). It basically means every object has an owner which is responsible for destroying the object. If A is the owner of B, and A is destroyed, A will destroy B because B can't exist without an owner. Only A can destroy B, because A is the owner of B. Other objects can use B too, but can't destroy it.

There are three basic ownership patterns I use often:
- single owner: A creates Z, A destroys Z
- transfer of ownership: A creates Z, but A can transfer the ownership to B, and now B can destroy Z (but A can't anymore).
- shared ownership: Z is owned by multiple objects and it is automatically destroyed when all references to Z are gone. This is what Java does for all objects. It's easy to implement in C++ (e.g. boost::shared_ptr), but it's not very practical in GML because you would have to release the handles manually.

Your system currently doesn't fit with any of these patterns - the buffer isn't owned by anyone, and everyone can destroy it (or at least try to). But I think the third system would be ideal (if we can implement it). In a system with shared ownership anyone can release it, but it won't be destroyed until everyone has released it (that's really easy with reference counting, and also very fast).
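
A reference-counted release along those lines could be sketched like this (purely illustrative, not part of either implementation):

#include <stdint.h>
#include <map>

struct Buffer; // whatever the owning dll actually stores

static void reallyDestroy(Buffer* /*buffer*/) { /* owner-specific cleanup */ }

struct Shared {
	Buffer* buffer;
	uint32_t refs;
};

static std::map<uint32_t, Shared> g_shared;

// Every user of a handle takes a reference...
void addRef(uint32_t id) {
	std::map<uint32_t, Shared>::iterator it = g_shared.find(id);
	if (it != g_shared.end()) it->second.refs++;
}

// ...and releases it when done; the buffer only goes away once nobody holds
// a reference anymore, so the order of release no longer matters.
void release(uint32_t id) {
	std::map<uint32_t, Shared>::iterator it = g_shared.find(id);
	if (it == g_shared.end()) return;
	if (--it->second.refs == 0) {
		reallyDestroy(it->second.buffer);
		g_shared.erase(it);
	}
}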

More thoughts:

- HandlePool stores a set of used ids, and when a new stream/buffer is created it uses the first id that is not in use. But this is very slow (assuming a set is O(log(n)), this is O(n*log(n))), and also confusing. If you destroy a buffer with id 2, you would expect buffer_exists(2) would return false. But if a second buffer with the same id is created it will return true, even though the new id refers to a completely different buffer. In fact it could even be a stream now (if I understand correctly).

My implementation just started at 1 and counted up, I think that method is better. I don't think anyone will ever create and destroy more than 4 billion streams/buffers.

- Why is BufferProxy derived from AbstractBuffer? Sure, they have the same functions, but the entire point of using BufferProxy instead of AbstractBuffer is to make it portable. Virtual inheritance isn't portable, yet BufferProxy also uses virtual inheritance. That makes no sense. You can't safely call virtual functions of AbstractBuffer unless you know the dynamic type was declared by the same DLL as the calling DLL. IMO BufferProxy should be an ordinary class with normal (inline) functions, otherwise this won't work.

- BufferFragment is basically an optimization, so it would be great if it was optional. I don't think it's easy to implement for all types of buffers.

- I'm a bit worried about the performance of your version. You're doing an id lookup for every single function call, which will take some time. Suppose a DLL writes a 1MB file to a buffer byte per byte, for some reason. This will generate a lot of overhead. It would be a lot faster if BufferProxy would simply do the id lookup in its constructor, and then use the pointer whenever something has to be done. I know it's less safe since it's possible to use a buffer/stream that has already been destroyed, but that's the responsibility of the DLL developer. Anyone who is capable of writing DLLs should also be capable of that. If there's a good reason not to rely on the pointer, the developer can just store the id and create the BufferProxy whenever it is needed (although storing buffer/stream id's is a bad idea anyway IMO, a stream or buffer could be destroyed at any time).

About the shutdown cleanup, it's not that bad actually. You only have to make sure that you clean up the things which require cross-dll calls in the special Finalization function provided for GM extensions, because at that point all the global objects and memory locations are still valid.

Have you tested it? It sounds logical, but are you sure GM doesn't just finalize and release one extension at a time?

Then again, what about abnormal shutdowns? What happens if an error occurs?

I don't have time to look at all the code at the moment, let alone write code (I have to study for my exams now). But from what I've seen it looks pretty good to me :).
  • 0

#36 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 05 June 2011 - 01:05 AM

The reason I don't like the destroy thing is because it breaks the 'ownership idiom', which I use extensively (and I think most C++ programmers do, you simply can't avoid it). It basically means every object has an owner which is responsible for destroying the object. If A is the owner of B, and A is destroyed, A will destroy B because B can't exist without an owner. Only A can destroy B, because A is the owner of B. Other objects can use B too, but can't destroy it.

There are three basic ownership patterns I use often:
- single owner: A creates Z, A destroys Z
- transfer of ownership: A creates Z, but A can transfer the ownership to B, and now B can destroy Z (but A can't anymore).
- shared ownership: Z is owned by multiple objects and it is automatically destroyed when all references to Z are gone. This is what Java does for all objects. It's easy to implement in C++ (e.g. boost::shared_ptr), but it's not very practical in GML because you would have to release the handles manually.

Your system currently doesn't fit with any of these patterns - the buffer isn't owned by anyone, and everyone can destroy it (or at least try it). But I think the third system would be ideal (if we can implement it). In a system with shared ownership anyone can release it, but it won't be destroyed until everyone has released it (that's really easy with reference counting, and also very fast).

I'm not completely sure if you are talking about ownership of the C++ buffer object, or of the handle. I assume from your remark about shared ownership that you are referring to the question of who owns the handle, but let me just address both :)

First, the C++ object. It is the responsibility of the buffer manager this object is shared through to establish the rule of ownership. All three models you list can be used here. In a single owner scenario, the manager creates and destroys the object itself. This would be a sensible model for the case I suggested above, where my Socket class itself acts as a buffer manager and shares two buffers. An example of the second model is the DefaultBufferManager - it explicitly takes ownership of the buffers it shares, but does not create them. The third can be achieved by writing a buffer manager much like the default one, but which stores shared pointers instead of plain ones.

The second question is, who should call the shared destroy function? First off, calling destroy should not be equated to actually destroying the buffer object, but rather to destroying the handle. The owner of the actual buffer object is free to continue using it after the handle is destroyed. So the question is, who owns the handle? At the moment, defining this is up to the code which shares the buffer and receives the handle from the library. All models of ownership are supported, but none is enforced. This is actually very much analogous to C++ pointers - in theory, everyone can call delete on them, but people who write functions make it clear if and when ownership is transferred, by documentation and conventions. If you call a function create_buffer, you will probably assume that you take ownership of the handle that is returned. If the function is called get_internal_buffer, not so much.

Now I said that all models are supported, but if someone calls destroy the handle may be gone, so what about shared ownership and releasing handles? That again is the same as in C++. Pointers don't support shared ownership by themselves either, but just as with pointers you can build a sharing mechanism on top of the current API. And it might be a good idea to add them to the main extension, though I'm not completely convinced, because you basically can't build a shared handle mechanism in GM which gives you the same safety and convenience as a boost::shared_ptr. For example, someone could accidentally release the same handle twice, which is harmless if you don't use reference counting. This can create hard to find bugs, because the effect might only become apparent much later, when the code holding the last reference suddenly has no buffer anymore. I think that in most cases, it will actually be easier to work with single ownership in GML code, and if the buffer is an integral part of a different resource, shared ownership does not even make sense.

So what about the ability of the manager to prevent deleting the handle? This is only a safeguard for when people do mix up the ownership question. It is meant for cases like get_internal_buffer, where the ownership of the handle stays with the owner of the buffer and is not transferred to the caller. In this case, different parts of the code might be using the same handle, relying on the assumption that the buffer will be valid as long as the owner (e.g. a socket) exists. If one of those parts accidentally calls destroy on this handle, I think it is a good thing that the destruction can be prevented by the manager, which knows that the destruction makes no sense. Think about it as making a private destructor that can only be used by friend classes - except that this here is a runtime mechanism of course.

My god this is becoming a wall of text. Let's wrap it up with a summary of my position: It is not the responsibility of the sharing library to select a model of ownership for the handles. The people using the library should be able to decide on the one that best fits their needs.

- HandlePool stores a set of used ids, and when a new stream/buffer is created it uses the first id that is not in use. But this is very slow (assuming a set is O(log(n)), this is O(n*log(n))), and also confusing. If you destroy a buffer with id 2, you would expect buffer_exists(2) would return false. But if a second buffer with the same id is created it will return true, even though the new id refers to a completely different buffer. In fact it could even be a stream now (if I understand correctly).

My implementation just started at 1 and counted up, I think that method is better. I don't think anyone will ever create and destroy more than 4 billion streams/buffers.

HandlePool uses the exact same mechanism - it hands out handles starting from 1 and counting up. It will NOT start searching for a new free handle from 0 every time allocate is called, but instead try the last issued handle plus one. The std::set is only included to correctly handle the case where the handles do wrap around. Unlikely? Maybe, but definitely not impossible. If some multiplayer game creates and destroys 50 buffers per step, at 30 steps per second, and he leaves it running for a whole month, he'll get into integer overflow territory - and it's very likely that there are some global buffers with low IDs around.

Until this actually happens, HandlePool::allocate() works with O(log(n)) complexity, and I do not think this will create a performance issue of any relevance. And of course, the misunderstanding in how handles are allocated makes your second point invalid as well. Handles will only be re-used when all possible handles have been used once. The only way to do better would be to hand out the least-recently destroyed handles first, but that would require at least log256((2^32)!) bytes - over 5GB - for storing the destruction order, unless you compress :)
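
For clarity, the allocation scheme amounts to something like this (simplified sketch, not the code from the repository):

#include <stdint.h>
#include <set>

class HandlePool {
public:
	HandlePool() : next(1) {}

	// Hands out handles counting up from 1; the set is only consulted to skip
	// handles that are still alive after the counter has wrapped around.
	uint32_t allocate() {
		while (next == 0 || used.count(next) != 0)
			++next; // unsigned overflow wraps back to 0, which is skipped
		uint32_t handle = next++;
		used.insert(handle);
		return handle;
	}

	void release(uint32_t handle) {
		used.erase(handle);
	}

private:
	uint32_t next;
	std::set<uint32_t> used;
};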

- Why is BufferProxy derived from AbstractBuffer? Sure, they have the same functions, but the entire point of using BufferProxy instead of AbstractBuffer is to make it portable.

No. The entire point of using BufferProxy is to provide a nice way to use the portable, binary-compatible C interface of the core library. There is no actual need to make it derive from AbstractBuffer, but it is convenient since it allows you to use a BufferProxy the same way as an "actual" buffer object (e.g. with the buffer/stream copy functions in AbstractBuffer.hpp).

Virtual inheritance isn't portable, yet BufferProxy also uses virtual inheritance. That makes no sense. You can't safely call virtual functions of AbstractBuffer unless you know the dynamic type was declared by the same DLL as the calling DLL. IMO BufferProxy should be an ordinary class with normal (inline) functions, otherwise this won't work.

Inheritance is not portable at all, if you go by binary compatibility. Once you leave POD territory, the compiler is e.g. allowed to insert padding or management information at the start of the object. It might work between current versions of mingw and msvc, but I chose to be safe and stick completely to structs, functions and function pointers for the binary compatible interface.

Open up src/shared_buffers/core/public_core_api.hpp. This is the interface that has to be binary compatible. Everything inside the core is only provided as part of the shared buffers DLL. Everything in src/shared_buffers is compiled together with the code that is using it (obviously, since it is headers-only). Yes, AbstractStream pointers are passed into the sharing library as void pointers, and later they are retrieved again, cast back to AbstractStream and have virtual methods called on them - but this happens in the functions that were passed in together with the pointer, and thus in code from the same DLL that provided the pointer in the first place.

For example, if you use BufferProxy to call the read method on an AbstractBuffer shared through the DefaultBufferManager from a different DLL, it first calls the read function in the shb_Buffer struct it receives from the core API. This is actually the read function defined in AbstractBufferManager.ipp as compiled by the same dll that provided the implementation pointer. The function then casts the implementation pointer back to an AbstractStream pointer and calls read on that.
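
Put as code, that chain looks roughly like this (heavily simplified; the real shb_ structs and AbstractBufferManager.ipp differ in detail):

#include <stdint.h>

// --- compiled into the DLL that owns the buffer ---
class AbstractStream { // local C++ interface of that DLL
public:
	virtual ~AbstractStream() {}
	virtual uint32_t read(uint8_t* dest, uint32_t size) = 0;
};

// Trampoline: the void* crosses the DLL boundary, but only code from the
// owning DLL ever casts it back to the C++ type.
uint32_t readTrampoline(void* impl, uint8_t* dest, uint32_t size) {
	return static_cast<AbstractStream*>(impl)->read(dest, size);
}

// --- compiled into the consuming DLL ---
struct SharedStream { // plain-C view handed out via the core
	uint32_t (*read)(void*, uint8_t*, uint32_t);
	void* impl;
};

uint32_t proxyRead(const SharedStream& s, uint8_t* dest, uint32_t size) {
	// The virtual call happens inside readTrampoline, i.e. in the DLL that
	// declared the dynamic type, so no vtable layout crosses the boundary.
	return s.read(s.impl, dest, size);
}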

- BufferFragment is basically an optimization, so it would be great if it was optional. I don't think it's easy to implement for all types of buffers.

I would not like to make it optional, because then you would have to create fallback code even when you know you are working with buffers. With the changes proposed in my previous post, it is trivial to provide an implementation with reasonable performance. You can implement it for everything which fits the description "collection of addressable bytes", using read() and a small temporary buffer. It would take up 5 lines, including curly braces. Memory buffers should not do it like that of course.

- I'm a bit worried about the performance of your version. You're doing an id lookup for every single function call, which will take some time. Suppose a DLL writes a 1MB file to a buffer byte per byte, for some reason. This will generate a lot of overhead. It would be a lot faster if BufferProxy would simply do the IP lookup in its constructor, and then use the pointer whenever something has to be done. I know it's less safe since it's possible to use a buffer/stream that has already been destroyed, but that's the responsibility of the DLL developer. Anyone who is capable of writing DLLs should also be capable of that. If there's a good reason not to rely on the pointer, the developer can just store the id and create the BufferProxy whenever it is needed (although storing buffer/stream id's is a bad idea anyway IMO, a stream or buffer could be destroyed at any time).

I have actually considered that when writing the class, and decided against it precisely because of the safety reasons. I'm not sure if the performance hit of the ID lookup would really be noticeable except in extreme cases like the one you suggested, and in that case I'd place the fault for bad performance with the dll.

Before we decide on performance over a safe API, I'd like to see a benchmark of the difference. A map lookup should really just consist of a small number of integer comparisons and subsequent pointer-following, so it might not actually be a big factor. And even if it is, we could try some other variations first (e.g. using an unordered_map).


About the shutdown cleanup, it's not that bad actually. You only have to make sure that you clean up the things which require cross-dll calls in the special Finalization function provided for GM extensions, because at that point all the global objects and memory locations are still valid.

Have you tested it? It sounds logical, but are you sure GM doesn't just finalize and release one extension at a time?

Then again, what about abnormal shutdowns? What happens if an error occurs?

I have. About abnormal shutdowns - are the destructors even called in that case? If not, no problem. If yes, you need to make sure that your global objects don't do anything stupid in their destructors.

Wow. Believe it or not, it took me over three hours to write this. I wish I'd put that much enthusiasm into writing my thesis. Good night for now :)
  • 0

#37 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 05 June 2011 - 01:53 PM

The entire system sounds very complicated. I don't really care about the complexity of the internals, but if we want other DLLs to use this it should be as simple as possible for them. Nobody will use it if it takes hours just to understand what you're supposed to do. We should really keep that in mind.

[...]
My god this is becoming a wall of text. Let's wrap it up with a summary of my position: It is not the responsibility of the sharing library to select a model of ownership for the handles. The people using the library should be able to decide on the one that best fits their needs.

And that's exactly why I don't like the global destroy function. It only works for some types of buffers, and this is confusing for GM users. But if you force all DLLs to provide their own destroy functions, there is no confusion. If the buffer can be destroyed, there is a function. If there is no function, the buffer can't be destroyed.

Let's put it the other way around: what's the advantage of a global destroy function?

HandlePool uses the exact same mechanism - it hands out handles starting from 1 and counting up. It will NOT start searching for a new free handle from 0 every time allocate is called, but instead try the last issued handle plus one. The std::set is only included to correctly handle the case where the handles do wrap around. Unlikely? Maybe, but definitely not impossible. If some multiplayer game creates and destroys 50 buffers per step, at 30 steps per second, and he leaves it running for a whole month, he'll get into integer overflow territory - and it's very likely that there are some global buffers with low IDs around.

Well, as long as it doesn't hurt performance too much I don't really care. I used the same system in older versions of ExtremePhysics, but I dropped it because it was just way too slow. Granted, I didn't use std::set but my own hash table, which might be slower. But ExtremePhysics created and destroyed hundreds of contacts every step, and the same is not likely for buffers.

No. The entire point of using BufferProxy is to provide a nice way to use the portable, binary-compatible C interface of the core library. There is no actual need to make it derive from AbstractBuffer, but it is convenient since it allows you to use a BufferProxy the same way as an "actual" buffer object (e.g. with the buffer/stream copy functions in AbstractBuffer.hpp).

But it is confusing. Sure, you can cast a BufferProxy to AbstractBuffer and call it, it will work. But this suggests that it will also work for other AbstractBuffers, which is definitely not true. That's why I want to keep them separate, to avoid confusion.

There's a second reason to keep them separate: to make the interface simpler. Most DLLs will probably use buffers without implementing their own buffers. They will never use AbstractBuffer, only BufferProxy. Yet now they have to include a class and implementation code they are never going to use. If you just remove the 'virtual' keyword and make BufferProxy a normal class, this wouldn't be needed. This wouldn't change anything except it won't be possible anymore to cast BufferProxy to AbstractBuffer. But I can't think of a good reason to do that cast anyway.

I have actually considered that when writing the class, and decided against it precisely because of the safety reasons. I'm not sure if the performance hit of the ID lookup would really be noticable except in extreme cases like the one you suggested, and in that case I'd place the fault for bad performance with the dll.

So you suggest the DLL should create its own local memory buffer implementation, so it can write the data to that buffer and in the end copy all the data to the actual destination buffer? Wasn't this unnecessary copy exactly what you were trying to avoid with BufferFragment?

Before we decide on performance over as safe API, I'd like to see a benchmark of the difference. A map lookup should really just consist of a small number of integer comparisons and subsequent pointer-following, so it might not actually be a big factor. And even if it is, we could try some other variations first (e.g. using an unordered_map).

I just tried it, 1M id lookups takes 10ms with 9 ids in the map (tested in VC++). This would mean you definitely shouldn't write a large file to a buffer one byte at a time.

I have. About abnormal shutdowns - are the destructors even called in that case?

Yes, Windows calls them anyway. I'm not talking about crashes here, just simple GM error messages. We don't want the DLLs to crash after an error message, that would be very annoying.

Can't we use some kind of 'soft link' to BufferProxies that is broken automatically when the buffer is destroyed? If we can do that we can simply destroy the buffers in any order we want. This would also avoid the safety problems with pointers in BufferProxy (if the buffer doesn't exist anymore the pointer will be set to NULL), so it could be a lot faster than id's too.

What I mean is something like this:
- shared_buffers.dll keeps a list of all buffers with their id. For every buffer it also keeps a list of pointers to BufferProxies that refer to the buffer. I will call this BufferInfo.
- When a BufferProxy is created, it is added to the list of the buffer (in shared_buffers.dll).
- When a BufferProxy is destroyed, it is removed from the list of the buffer (in shared_buffers.dll).
- When a buffer is destroyed by the DLL that defines that buffer, it will call an interface function which will destroy the BufferInfo.
- When a BufferInfo is destroyed, all BufferProxy objects are 'unlinked', i.e. their pointer to the buffer is set to NULL. If the BufferInfo is destroyed while the actual buffer still exists, the actual buffer is 'unlinked' (i.e. its pointer to BufferInfo is set to NULL).

So there's a soft link between
- The buffer implementation and BufferInfo
- BufferInfo and BufferProxy.
Now we can unload the DLLs in any order we want.
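
A minimal sketch of the bookkeeping that proposal implies (simplified; the real BufferProxy class looks different, and the names here are just for illustration):

#include <set>
#include <cstddef>

struct BufferInfo;

// Lives in the consuming DLLs; holds a soft link that the hub can break.
class BufferProxy {
public:
	BufferProxy() : info(NULL) {}
	BufferInfo* info; // set to NULL as soon as the BufferInfo disappears
};

// Lives in shared_buffers.dll: the hub both soft links go through.
struct BufferInfo {
	void* buffer;                   // the real buffer, or NULL once destroyed
	std::set<BufferProxy*> proxies; // every proxy currently pointing here
};

// Called by shared_buffers.dll when the owning DLL destroys its buffer:
// break every soft link so proxies degrade gracefully instead of dangling.
void destroyBufferInfo(BufferInfo* bi) {
	for (std::set<BufferProxy*>::iterator it = bi->proxies.begin(); it != bi->proxies.end(); ++it) {
		(*it)->info = NULL;
	}
	delete bi;
}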
  • 0

#38 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 05 June 2011 - 04:36 PM

The entire system sounds very complicated. I don't really care about the complexity of the internals, but if we want other DLLs to use this it should be as simple as possible for them. Nobody will use it if it takes hours just to understand what you're supposed to do. We should really keep that in mind.

I don't think the system is at all complicated to use. There is some complexity under the covers, and the documentation obviously needs to explain how you are supposed to get a pointer to the core interface, but how complicated is it beyond that? If all you want to do is use shared streams or buffers, you simply include BufferProxy.hpp and use the proxy classes to work with shared buffers as if they were local objects. I don't see how it could be much easier. Most people will only ever need to learn about those things.
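To make that concrete, a consuming DLL might look roughly like the sketch below. The constructor form and the read/getLength calls mirror examples quoted elsewhere in this thread, but the exact signatures are assumptions and may differ from the real BufferProxy.hpp:
#include "BufferProxy.hpp" // header name as mentioned above
#include <cstddef>

// 'core' is assumed to be the pointer to the core C interface, obtained once at load time.
void send_shared_buffer(void* core, unsigned bufferId) {
    BufferProxy buffer(core, bufferId);      // wrap the shared buffer id in a local proxy
    char chunk[4096];
    std::size_t remaining = buffer.getLength();
    while (remaining > 0) {
        std::size_t n = remaining < sizeof(chunk) ? remaining : sizeof(chunk);
        buffer.read(chunk, n);               // copy the next piece out of the shared buffer
        // ... hand 'chunk' to the networking / file-writing code here ...
        remaining -= n;
    }
}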

A small number of users will want to create their own buffer or stream implementations, and they do that by implementing an abstract base class. Again, I can't really think of a way to make this easier or more natural. The Buffer base class even provides some methods as default implementations if you don't want to write them yourself. The only additional thing you need to do is define a global instance of DefaultBufferManager and pass your created buffers to that for sharing.

And if you really want to make your own BufferManager, you can do that by implementing an abstract base class which requires a single method to be implemented.

Yes, it needs some documentation that explains those things, but I do not consider this difficult to use.

And that's exactly why I don't like the global destroy function. It only works for some types of buffers, and this is confusing for GM users. But if you force all DLLs to provide their own destroy functions, there is no confusion. If the buffer can be destroyed, there is a function. If there is no function, the buffer can't be destroyed.

Let's put it the other way around: what's the advantage of a global destroy function?

One advantage is that it allows you to transfer ownership of the handle to code that does not know the specific implementation. This is e.g. useful if you want to implement a generic handle sharing mechanism, because otherwise you have to pass your shared_handle a callback to the destroy function to call if the reference counter hits zero - effectively duplicating the mechanism that is now in the core. Another advantage is that you don't need to remember the names of different implementation-specific destroy functions which all basically do the same thing. Those are not critical advantages, but I rather like them and still don't see the big drawback. There should be little room for confusion if the function which returns the handle to GML documents, by its name and description, that the returned handle is owned by another resource, not by the GML code.


No. The entire point of using BufferProxy is to provide a nice way to use the portable, binary-compatible C interface of the core library. There is no actual need to make it derive from AbstractBuffer, but it is convenient since it allows you to use a BufferProxy the same way as an "actual" buffer object (e.g. with the buffer/stream copy functions in AbstractBuffer.hpp).

But it is confusing. Sure, you can cast a BufferProxy to AbstractBuffer and call it, and it will work. But this suggests that it will also work for other AbstractBuffers, which is definitely not true. That's why I want to keep them separate, to avoid confusion.

You lost me here. Yes, you can cast BufferProxy to AbstractBuffer and it will behave like any other AbstractBuffer. But you can't.... uh, cast any other AbstractBuffer to AbstractBuffer? You obviously can. Or do you mean that BufferProxy can't be used everywhere that an AbstractBuffer can? But that's not the case either, you can even share an AbstractBuffer again through a BufferManager (though it would not make much sense of course - but it would work.) So what do you mean by "it" in "it will work for other AbstractBuffers"?

The way it is implemented now, BufferProxy and StreamProxy follow the Proxy pattern, and they suggest as much by their name. I again fail to see the source of your confusion.

There's a second reason to keep them separate: to make the interface simpler. Most DLLs will probably use buffers without implementing their own. They will never use AbstractBuffer, only BufferProxy. Yet now they have to include a class and implementation code they are never going to use. If you just remove the 'virtual' keyword and make BufferProxy a normal class, this wouldn't be needed. It wouldn't change anything except that it would no longer be possible to cast BufferProxy to AbstractBuffer. But I can't think of a good reason to do that cast anyway.

Yes, AbstractBuffer is not needed if you only use BufferProxy - but what is the downside? Are you worried about performance of the virtual calls? Those are never actually generated by the compiler unless you pass BufferProxies around by pointer or reference. If you insist, the default implementations could be moved into a different class, but I don't see the drawback in them at all. Slightly higher compile times? You're not going to use all the code that is present in a boost header library either, so why is it an issue here?

On the other hand, there is a good reason to allow treating a BufferProxy like an AbstractBuffer, and that is to allow writing functions that don't have to care whether they work on "local" buffer objects or shared ones.


I have actually considered that when writing the class, and decided against it precisely because of the safety reasons. I'm not sure if the performance hit of the ID lookup would really be noticeable except in extreme cases like the one you suggested, and in that case I'd place the fault for bad performance with the DLL.

So you suggest the DLL should create its own local memory buffer implementation, so it can write the data to that buffer and in the end copy all the data to the actual destination buffer? Wasn't this unnecessary copy exactly what you were trying to avoid with BufferFragment?

No, unless the DLL writes those bytes linearly - in which case, why doesn't it use the BufferFragment mechanism? If it really does something more unusual like reversing the order of an input sequence that can't be read backward - it's not difficult to reduce the number of calls to write() significantly by using a small fixed-size array on the stack as a buffer. If the DLL really needs to efficiently insert single bytes to scattered locations in the file with a throughput of several megabytes per second, then yes, it should not perform this operation directly on a shared buffer. Even without the ID lookup the performance would probably be better if it copied the whole buffer to something local, e.g. a DefaultMemBuffer with direct access to the internal array. And guess what, it can simply use the copyBuffer function for that, because BufferProxy is derived from AbstractBuffer! Isn't that handy?
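If it helps, here is a sketch of the "small fixed-size array on the stack" idea: a tiny helper that coalesces single-byte writes into chunked write() calls, so the per-call overhead (including any id lookup) is paid once per chunk instead of once per byte. The write(const char*, size_t) signature is an assumption:
#include <cstddef>

// Works with any buffer-like type that offers write(const char*, size_t),
// e.g. a BufferProxy or a local buffer object.
template<typename Buffer>
class ChunkedWriter {
public:
    explicit ChunkedWriter(Buffer& target) : target_(target), used_(0) {}
    ~ChunkedWriter() { flush(); }

    void put(char byte) {
        if (used_ == sizeof(chunk_)) flush();
        chunk_[used_++] = byte;
    }

    void flush() {
        if (used_ > 0) { target_.write(chunk_, used_); used_ = 0; }
    }

private:
    Buffer& target_;
    char chunk_[4096];   // small fixed-size buffer, cheap enough to keep on the stack
    std::size_t used_;
};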

I just tried it: 1M id lookups take 10ms with 9 ids in the map (tested in VC++). This would mean you definitely shouldn't write a large file to a buffer one byte at a time.

What would be much more interesting: What is the difference when you take the whole access into account, not the isolated ID lookup? That is, how much is the actual speed difference when using a BufferProxy? There is probably no need to test this though, because I quite like the suggestion below :)

Can't we use some kind of 'soft link' to BufferProxies that is broken automatically when the buffer is destroyed? If we can do that, we can simply destroy the buffers in any order we want. This would also avoid the safety problems with pointers in BufferProxy (if the buffer doesn't exist anymore, the pointer will be set to NULL), so it could be a lot faster than ids too.

A good idea, I will look into that. The mechanism will have to rely on its users to always release soft links which are no longer in use, but this will not be difficult to ensure for BufferProxy, and if someone really wants to directly use the blank C interface he had better know what he's doing anyway.

Edit: There is actually a really good argument against getFragment: It creates alias problems. If you have an algorithm that uses two buffers, but it gets passed the same one twice (e.g. copyBuffer, to copy from one location in the buffer to another), the semantics of getFragment can become very confusing. If you get a fragment from buffer1 and call buffer2.write(fragment.getStart(), fragment.getLength()), this is suddenly invalid if buffer1 and buffer2 are actually the same, which means the current implementation of copyBuffer is broken. So I'm definitely reconsidering getFragment now.

Edited by Medo42, 05 June 2011 - 05:29 PM.

  • 0

#39 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 05 June 2011 - 06:06 PM

One advantage is that it allows you to transfer ownership of the handle to code that does not know the specific implementation. This is e.g. useful if you want to implement a generic handle sharing mechanism, because otherwise you have to pass your shared_handle a callback to the destroy function to call if the reference counter hits zero - effectively duplicating the mechanism that is now in the core. Another advantage is that you don't need to remember the names of different implementation-specific destroy functions which all basically do the same thing. Those are not critical advantages, but I rather like them and still don't see the big drawback. There should be little room for confusion if the function which returns the handle to GML documents, by its name and description, that the returned handle is owned by another resource, not by the GML code.

I just think it's confusing to GM users that some buffers can be destroyed and some buffers can't. I'm sure some users will try to destroy buffers that can't be destroyed, or never destroy memory buffers because they aren't used to destroying them. And I don't like the destroy callback function, because callback functions always complicate things. Now other DLLs can call the destroy function which will call the destroy callback function. So hypothetically this means that calling BufferProxy::read on some buffer defined by an external DLL could result in my own destroy callback function being called. Every time I call a function of the API I would have to take this into account, or 'Bad Things™' could happen :). That's why I want to avoid destroy callback functions. I know this scenario is unlikely, but it's perfectly possible.

You lost me here. Yes, you can cast BufferProxy to AbstractBuffer and it will behave like any other AbstractBuffer. But you can't.... uh, cast any other AbstractBuffer to AbstractBuffer? You obviously can. Or do you mean that BufferProxy can't be used everywhere that an AbstractBuffer can? But that's not the case either, you can even share an AbstractBuffer again through a BufferManager (though it would not make much sense of course - but it would work.) So what do you mean by "it" in "it will work for other AbstractBuffers"?

The way it is implemented now, BufferProxy and StreamProxy follow the Proxy pattern, and they suggest as much by their name. I again fail to see the source of your confusion.

I didn't realize AbstractBuffer had functions that weren't overridden by BufferProxy, so it seemed pointless to me. But with those functions it makes sense.

No, unless the DLL writes those bytes linearly - in which case, why doesn't it use the BufferFragment mechanism?

... I should have thought about that :P. But it's not a big issue since DefaultMemBuffer is 'local' anyway. You can always use it directly and copy the contents later.

Edited by Maarten Baert, 05 June 2011 - 06:08 PM.

  • 0
Posted Image

#40 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 05 June 2011 - 06:30 PM

And I don't like the destroy callback function, because callback functions always complicate things. Now other DLLs can call the destroy function which will call the destroy callback function. So hypothetically this means that calling BufferProxy::read on some buffer defined by an external DLL could result in my own destroy callback function being called. Every time I call a function of the API I would have to take this into account, or 'Bad Things™' could happen . That's why I want to avoid destroy callback functions. I know this scenario is unlikely, but it's perfectly possible.

This is once more a good point. I'll have to think about that a bit :)


You lost me here. Yes, you can cast BufferProxy to AbstractBuffer and it will behave like any other AbstractBuffer. But you can't.... uh, cast any other AbstractBuffer to AbstractBuffer? You obviously can. Or do you mean that BufferProxy can't be used everywhere that an AbstractBuffer can? But that's not the case either, you can even share an AbstractBuffer again through a BufferManager (though it would not make much sense of course - but it would work.) So what do you mean by "it" in "it will work for other AbstractBuffers"?

The way it is implemented now, BufferProxy and StreamProxy follow the Proxy pattern, and they suggest as much by their name. I again fail to see the source of your confusion.

I didn't realize AbstractBuffer had functions that weren't overridden by BufferProxy, so it seemed pointless to me. But with those functions it makes sense.

Sorry for the confusion, that sentence should read "you can even share a BufferProxy again through a BufferManager".

To confuse you further though, the existence of those convenience methods in AbstractBuffer is not the main reason why that inheritance association is a good idea, though it is related. Just look at the copyBuffer function defined in AbstractBuffer.hpp - as it is, it can work with any AbstractBuffer, including BufferProxy. If BufferProxy wasn't an AbstractBuffer, you'd have to write new functions to work with any combination of Proxy and non-proxy buffers/streams, and they would all have the same code except for the types.
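As a sketch of why that pays off (the AbstractBuffer below is a reduced stand-in with assumed method names, not the real header): one copy routine covers local buffers and proxies alike, because it only talks to the abstract interface.
#include <cstddef>

// Reduced stand-in interface for the sketch; the real AbstractBuffer has more to it.
struct AbstractBuffer {
    virtual ~AbstractBuffer() {}
    virtual std::size_t getLength() = 0;
    virtual void read(char* out, std::size_t len) = 0;
    virtual void write(const char* data, std::size_t len) = 0;
};

// Copies src into dest through a small temporary chunk. Because both parameters are
// AbstractBuffer&, the same code works for a local buffer, a BufferProxy, or any
// future implementation - no per-combination overloads needed.
inline void copyBufferSketch(AbstractBuffer& src, AbstractBuffer& dest) {
    char chunk[4096];
    std::size_t remaining = src.getLength();
    while (remaining > 0) {
        std::size_t n = remaining < sizeof(chunk) ? remaining : sizeof(chunk);
        src.read(chunk, n);
        dest.write(chunk, n);
        remaining -= n;
    }
}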

Just in case you missed the edit above: There is actually a really good argument against getFragment: It creates alias problems. If you have an algorithm that uses two buffers, but it gets passed the same one twice (e.g. copyBuffer, to copy from one location in the buffer to another), the semantics of getFragment can become very confusing. If you get a fragment from buffer1 and call buffer2.write(fragment.getStart(), fragment.getLength()), this is suddenly invalid if buffer1 and buffer2 are actually the same, which means the current implementation of copyBuffer is broken. Another piece in the puzzle is that using a small temporary buffer and copying with read() and write() is probably not twice as slow as only copying once, because the small buffer will probably stay in L1 cache the whole time. So, now I think removing getFragment again might actually be a good idea, since it is more difficult to use than expected and might not actually improve performance all that much.
  • 0

#41 Maarten Baert

Maarten Baert

    GMC Member

  • GMC Member
  • 750 posts
  • Version:GM8.1

Posted 05 June 2011 - 06:53 PM

To confuse you further though, the existence of those convenience methods in AbstractBuffer is not the main reason why that inheritance association is a good idea, though it is related. Just look at the copyBuffer function defined in AbstractBuffer.hpp - as it is, it can work with any AbstractBuffer, including BufferProxy. If BufferProxy wasn't an AbstractBuffer, you'd have to write new functions to work with any combination of Proxy and non-proxy buffers/streams, and they would all have the same code except for the types.

I would have done it the other way around - make copyBuffer use BufferProxy instead. But your solution is obviously more efficient if one or both buffers are local.

About endianness, I still don't think setting it on a per-buffer (or per-stream) basis is a good idea. The buffer or stream has no idea of the meaning of the data. It could be uint32_t, double, or something completely else. The buffer/stream can't possibly do this conversion correctly for all possible data types. Even if it could, what would it do if you want to copy data from a little-endian to a big-endian buffer?

I think we should add extra arguments to the GM buffer functions, or add a global setting associated with a new GM function to set the endianness. I assume other DLL developers can easily change the endianness themselves while reading or writing. It's not hard at all:
#include <stdint.h> // uint8_t, uint32_t, int32_t
#include <string.h> // memcpy

template<typename A>
inline A reverse_endianness(A a) {
    uint8_t destination[sizeof(A)];
    uint8_t *source = (uint8_t*)(&a);
    for(unsigned int i = 0; i<sizeof(A); ++i) {
        destination[sizeof(A)-1-i] = source[i];
    }
    A result;
    memcpy(&result, destination, sizeof(A)); // memcpy avoids the aliasing/alignment issues of *(A*)(destination)
    return result;
}
You can add a few specializations for common types like uint32_t to make it more efficient of course:
template<>
inline uint32_t reverse_endianness<uint32_t>(uint32_t a) {
    return (a>>24)|(a<<24)|((a&0xff00)<<8)|((a&0xff0000)>>8);
}
template<>
inline int32_t reverse_endianness<int32_t>(int32_t a) {
    return reverse_endianness<uint32_t>(a); // bitshifts aren't safe for signed types
}

Edited by Maarten Baert, 05 June 2011 - 06:53 PM.

  • 0
Posted Image

#42 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 05 June 2011 - 07:19 PM

About endianness, I still don't think setting it on a per-buffer (or per-stream) basis is a good idea. The buffer or stream has no idea of the meaning of the data. It could be uint32_t, double, or something completely else. The buffer/stream can't possibly do this conversion correctly for all possible data types. Even if it could, what would it do if you want to copy data from a little-endian to a big-endian buffer?

The endianness information would only be some bit of meta information managed with the buffer. It would be up to the functions like read_uint32() etc. to use this information (or more generically, read_X :)). Once data is written to a buffer, it is just a collection of bytes, so copies will not magically change the endianness of the data.
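A sketch of what that could look like (names are illustrative; the swap is the same one as in the reverse_endianness code above, and the host is assumed to be little-endian since GM runs on x86 Windows):
#include <stdint.h>
#include <string.h>

// Per-buffer meta information only - the stored bytes themselves are never converted.
struct BufferMeta {
    bool little_endian;
};

inline uint32_t read_uint32(const BufferMeta& meta, const uint8_t* bytes) {
    uint32_t value;
    memcpy(&value, bytes, sizeof(value));      // raw bytes exactly as stored in the buffer
    if (!meta.little_endian) {                 // buffer is big-endian, host assumed little-endian
        value = (value >> 24) | (value << 24) |
                ((value & 0xff00u) << 8) | ((value & 0xff0000u) >> 8);
    }
    return value;
}
Copying between buffers would just move bytes regardless of the flags; only the typed read and write helpers consult them.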
  • 0

#43 SyncViews

SyncViews

    GMC Member

  • GMC Member
  • 392 posts

Posted 17 August 2011 - 04:09 PM

In a previous thread of mine I was pointed here. Not sure what happened with the previous stuff mentioned in here, but I think simplicity is the key: something lightweight that can just refer to data in a common way, and that leaves it completely up to extensions what they do with this data (which could be a C++ object, a block of data read from a file, etc.). I wrote some test code and converted some of my personal DLLs (one dealing with binary data buffers and reading/writing their contents, and another that wraps around C++ stream objects; the two now have some crossover, with the ability to read/write buffers using the stream extension) and came up with this, which seems to work quite nicely.

#ifndef MEM_HANDLES_H
#define MEM_HANDLES_H
#define gm_null_id 0
#ifdef __cplusplus
extern "C"{
#endif
    /**Get a new id for a pointer.
     * 
     * If the pointer p is null, the null id 0 will always be returned.
     */
    unsigned gm_new_id(void *p);
    /**Returns 1 if the id is valid, else 0. The null id 0 is always valid.*/
    int gm_id_is_valid(unsigned id);
    /**Gets the number of currently valid ids, excluding the null id. The main
     * intent is to provide an easy end-of-game check to see if items were not
     * deleted.
     */
    unsigned gm_id_count(void);
    /**Gets the pointer for the id.*/
    void *gm_id_ptr(unsigned id);
    /**Deletes the id. This does NOT delete the memory the id
     * references. Deleting the null id 0 has no effect.
     */
    void gm_del_id(unsigned id);
#ifdef __cplusplus
}//extern "C"
#endif

#endif
This DLL wouldn't be used by the GML coder directly (except perhaps a version of gm_id_count and gm_id_is_valid to aid in debugging), but rather within the extensions themselves, using gm_new_id/gm_del_id in their creation/destroy functions and gm_id_ptr anywhere they need the pointer value.

The intention is that this DLL is shipped with the game and the extensions load and use it. It's important to note that the intention is not to embed it in each extension's GEX; that results in multiple copies and thus multiple instances of the DLL, although if that's really wanted for some reason I guess it could use Win32 named shared memory (with a name based around the PID, e.g. "<PID>-GM-ID-Global") so all the instances use the same global memory region without too much effort...
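To illustrate the adoption path, here is a sketch of how a buffer extension might wrap these calls. buffer_create/buffer_size/buffer_destroy are made-up example exports, and "mem_handles.h" is just a guessed name for the header above:
#include <vector>
#include "mem_handles.h" // the declarations shown above

typedef std::vector<unsigned char> Buffer;

// GM extension functions take and return doubles; only the id is ever visible to GML.
extern "C" __declspec(dllexport) double buffer_create() {
    return gm_new_id(new Buffer());
}

extern "C" __declspec(dllexport) double buffer_size(double id) {
    Buffer* b = (Buffer*)gm_id_ptr((unsigned)id);
    return b ? (double)b->size() : -1;
}

extern "C" __declspec(dllexport) double buffer_destroy(double id) {
    delete (Buffer*)gm_id_ptr((unsigned)id); // free the object itself...
    gm_del_id((unsigned)id);                 // ...then release the id
    return 0;
}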
  • 0

#44 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 17 August 2011 - 06:30 PM

Not sure what happened with the previous stuff mentioned in here

There is an almost finished implementation which you can find on Github: https://github.com/M...-shared-buffers
It still has a few open problem points that need to be addressed before we release this, because once it's out it will be very difficult to change anything. However, the code is in pretty good shape. The project has stalled for quite a while, but I do intend to pick it back up soon.

Your solution is very simple, and that's its main strong point. However, I see a few problems with it. For one thing, you don't offer a function to find out how long a memory buffer is, so you would need to rely on passing this information some other way, which seems error prone. Also, you assume that every buffer will be a contiguous region of memory. I could actually accept this since it is enough for most purposes, but, for example, I am using a special queue data structure for the send buffer in my networking library, and would prefer if I could share this buffer directly anyway.

Since your implementation doesn't have read and write position pointers, you can't create generic buffer read/write commands (read_byte(buffer) etc.), so you would still have a separate version of them for every extension that uses its own buffers.

Maybe Maarten and I did design this a bit more generically than necessary. It tries to solve more problems than your solution, but it is also more difficult to understand. However, a small tutorial is probably enough to explain how to use our library. I'll write one over the next few days since it's needed anyway. Maybe I can still win you over to this solution after all :)
  • 0

#45 SyncViews

SyncViews

    GMC Member

  • GMC Member
  • 392 posts

Posted 17 August 2011 - 08:28 PM

Well, the main points behind my idea are the following:
  • It works for all pointers as a standard solution to reference objects in GM. 90% of these, I suspect, are objects only used by one extension directly, so anything other than the pointer value is unneeded overhead.
  • Keeps a simple concept for a null pointer. My current idea is a null id value, but this could also be made into a "gm_id_isnull" function (likely the best way if people think changing the value of an existing id is useful to solve valid->null and null->valid change issues).
  • Can be extended. As long as people follow the documented APIs and those APIs are not changed, new ones can be added, e.g. to attach some kind of type id to each pointer to help catch bad casts.
  • Is simple to use, and offers easy migration from existing internal solutions.
  • Shouldn't get broken by possible future changes to the GM runner, or a possible move to 64-bit.
Everything else can be built on that, as I did already with my personal buffer DLL. There are the internal buffer object ids, which can only be passed back to my DLL (since their contents are completely undefined), and some methods like "buffer_get_data" to retrieve an id that refers to the buffer's internal data and is used to exchange buffers with other extensions (like my C++ stream wrapper extension).
The best bit is it's really simple. To convert my stream extension, I made one change to my create functions, one change to my delete function, and then did a find and replace on my cpp; job done. Most other extensions will have a similar adoption plan.

So for example if a socket extension gives me a data buffer id and a byte length, I might go (my stream extension isn’t quite this complete yet, only providing fstream and stringstream capabilities so far, and of course stringstream does indeed copy the data, but this is my aim):
socketdata = socket_read_packet(socket);
socketdata_length = socket_read_length(socket);
//id of data buffer, the length of it, and type
//STREAM_REF_ONLY = don’t copy the data ever. If writing, writing beyond length = error
stream = stream_create_memory_existing(socketdata, socketdata_length, stream_ref_only | stream_in | stream_binary);
//I implemented the “default” binary strings as 16bit length prefix followed by length bytes of utf8
//There are options if your binary string isn't in this form
name = stream_bin_read_string(stream);
address = stream_bin_read_string(stream);
age = stream_bin_read_uint16(stream);
success = stream_good(stream);
stream_delete(stream);
socket_del_read_buffer(socketdata);

Edited by SyncViews, 17 August 2011 - 08:29 PM.

  • 0

#46 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 17 August 2011 - 09:38 PM

Maybe I misunderstood you: I thought the pointer returned by gm_id_ptr was always a pointer to the raw data, but if you intend to share C++ objects like e.g. streams this way, your library is actually more like a generic handle manager.

Seeing that, your dll might not be all that different from ours. At its core, it is mainly a small library that assigns IDs to buffers or streams. However, instead of sharing pointers to just anything, we share C++ objects which need to implement a defined set of functions for accessing data. At least, that's what it looks like when you use it. Under the hood all calls go through a C api for cross-compiler compatibility, but the C++ part of the library neatly hides this from you.

The problem with sharing just any kind of C++ object as you propose is that, without a common convention, you will not be able to get extensions to work together unless they were specifically designed for it. The mention of a function like buffer_get_data suggests that you plan a convention for buffers on top of your handle sharing code; in this case, you can view our DLL as a combination of handle sharing and a buffer convention.

Edited by Medo42, 17 August 2011 - 09:39 PM.

  • 0

#47 SyncViews

SyncViews

    GMC Member

  • GMC Member
  • 392 posts

Posted 17 August 2011 - 11:28 PM

Maybe I misunderstood you: I thought the pointer returned by gm_id_ptr was always a pointer to the raw data, but if you intend to share C++ objects like e.g. streams this way, your library is actually more like a generic handle manager.

Yes, that was my intention: to provide a generic handle manager, with the capability to represent data that may be shared. That could be really simple stuff like arrays, stuff defined by the user such as binary file or network data, or stuff defined by the extension if the writer decided to publish the format of some objects and/or provide an additional interface intended for extensions (e.g. the C++ class declarations, although that leaves the compiler issue; a C API or COM-like APIs would be portable though).

The problem with sharing just any kind of C++ object as you propose is that, without a common convention, you will not be able to get extensions to work together unless they were specifically designed for it.

Well yes, but a major part of my idea is a standard way to solve the handle issue in a simple, low-overhead and generic way, and to not force extensions to adopt two standards (one for private objects and one for data buffers) when interacting with GM.

Seeing that, your dll might not be all that different from ours. At its core, it is mainly a small library that assigns IDs to buffers or streams....

Well, from a lot of what your library contains, I see them more as two different levels. Mine is like the pointer/reference/handle, and yours is more like std::ios. So you could actually build what you're looking at on top of my idea, as the convention for those types of objects.

Although I'm still not totally convinced why you need such a heavyweight object: your buffer interface seems more like a std::stringstream to me than a byte array, and it seems to miss what in my mind is the most important function of all, the "just give me the pointer to the data", which is useful for integrating with legacy code or places where you just want to do something with the whole buffer, e.g.:
myfstream.write((const char*)shbbuffer->bufferInterface->getData(shbbuffer->implementation), shbbuffer->bufferInterface->getLength(shbbuffer->implementation));

Also, don't you think a COM interface would be nicer than this whole "shbbuffer->someinterface->somemethod(shbbuffer->implementation, ...)" business (unless I misunderstood the intent of the code, but that's what it looked like)? Or at least hide the dynamic dispatch inside the DLL and provide a "shb_somemethod(shbbuffer, ...)" as the public API.
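For what it's worth, the flat wrapper idea could look like the sketch below. The struct layout is only inferred from the snippet above and is certainly not the real header; the point is just that the public function hides the double indirection from client code:
#include <stddef.h>

// Reduced stand-in types, inferred from the shbbuffer snippet above.
struct ShbBufferInterface {
    size_t (*getLength)(void* implementation);
    size_t (*read)(void* implementation, char* out, size_t len);
};

struct ShbBuffer {
    ShbBufferInterface* bufferInterface;
    void* implementation;
};

// Flat public API: callers never touch the function table directly.
extern "C" inline size_t shb_getLength(ShbBuffer* buffer) {
    return buffer->bufferInterface->getLength(buffer->implementation);
}

extern "C" inline size_t shb_read(ShbBuffer* buffer, char* out, size_t len) {
    return buffer->bufferInterface->read(buffer->implementation, out, len);
}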
  • 0

#48 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 18 August 2011 - 12:24 AM

Well, from a lot of what your library contains, I see them more as two different levels. Mine is like the pointer/reference/handle, and yours is more like std::ios. So you could actually build what you're looking at on top of my idea, as the convention for those types of objects.

I thought about that, and it might make sense to separate the two issues. This can always be done later though, since my DLL could be modified to use your handle manager without changing the public interface.

Although I'm still not totally convinced why you need such a heavyweight object: your buffer interface seems more like a std::stringstream to me than a byte array, and it seems to miss what in my mind is the most important function of all, the "just give me the pointer to the data", which is useful for integrating with legacy code or places where you just want to do something with the whole buffer

There was some back and forth about having a getData function. Maarten opposed it since it prevents you from e.g. sharing files as buffers (similar to memory-mapped I/O), or sharing more complicated data structures like queues as buffers, since they don't consist of a single contiguous memory region. I tried implementing a compromise solution but it turned out to be far too error prone.

The current solution uses a small stack-allocated buffer for operations that move large amounts of data, e.g. copying from one buffer to another. Due to the fact that the data has to pass through the CPU cache anyway, this might not actually be much slower than shifting the data directly (not actually tested yet), but it's a bit more unwieldy of course. On the other hand, if you want to support the more generic streams in your function as well, you need a solution like that anyway. I'm not totally convinced the current way is the best, but I did spend some time thinking about the pros and cons.

Also, don't you think a COM interface would be nicer than this whole "shbbuffer->someinterface->somemethod(shbbuffer->implementation, ...)" business (unless I misunderstood the intent of the code, but that's what it looked like)? Or at least hide the dynamic dispatch inside the DLL and provide a "shb_somemethod(shbbuffer, ...)" as the public API.


You are looking at the "core" C api. As I said above, it is required for compiler compatibility (and for that reason it needs to be the definitive API), but you normally don't work with it directly. Instead, you create a BufferProxy or StreamProxy object and work with that. Let's pretend buffers had a getData method; then your example from above could be written like this:
BufferProxy buffer(coreApi, bufferId);
myfstream.write((const char*)buffer.getData(), buffer.getLength());

Edited by Medo42, 18 August 2011 - 12:25 AM.

  • 0

#49 SyncViews

SyncViews

    GMC Member

  • GMC Member
  • 392 posts

Posted 18 August 2011 - 10:18 AM

I think a getData function is important, even if you really, really (for some reason that is beyond me) want to pretend that a buffer and a stream are the same thing beyond "they both represent data". Not providing it just means lots of programmers will do a read(everything) solution to get the data, which is suboptimal in every case (since a buffer could just return a pointer without an allocation and copy, whereas streams would of course need to do the read to get the data). Especially since numerous existing APIs require a data buffer, not some kind of stream object.


Personally I really think following what has been done elsewhere and saying buffer != stream is the best solution, and then providing a standard in-memory stream object that can be made to operate on buffers if desired.

Buffers:
  • You know the size of them
  • They are random access
  • All data is considered as being there right now
  • May be read only
  • Often don't have the means to resize
  • Resize is often expensive if they can
  • Memory is generally owned by whoever created the buffer
  • Very low overhead
  • Since the data is there, it can generally be assumed a read/write operation will not fail.
Streams:
  • You may or may not know the size of them
  • May or may not be seekable, and even if they are, random access may be slow
  • Reading and writing data is often slow
  • Often perform some form of buffering to make the effective reads and writes less slow (since telling, say, a HDD to read 4 bytes 10k times is far slower than one 40KB read)
  • Reads may even block for long periods (e.g. pipe or socket stream)
  • Often the programmer needs the tools to get around this (only reading data that is available right now, using asynchronous operations, etc.)
  • Many different implementations for different types of stream
  • Memory is often owned by the stream (an exception would be some in-memory streams that provide stream interfaces to buffers and other containers, where the buffer/container continues to own the data; such in-memory streams are also the only type of stream that is generally random-access with fast reads/writes).
  • Streams may fail at any time (overflowed the network/pipe buffer, got disconnected, etc.) and the programmer must be aware of this whenever they are using stream objects.

Of course sometimes it makes sense to get a buffer from a stream or a stream interface from a buffer, such as when the data size is very small and a buffer is just easier to work with, or when the buffer contains some complex data where stream read/write operations are easier to use (but in many other cases, the buffer might be nothing more than an array, and an array is always simplest as, well, an array).

Edited by SyncViews, 18 August 2011 - 10:22 AM.

  • 0

#50 Medo42

Medo42

    GMC Member

  • GMC Member
  • 317 posts

Posted 18 August 2011 - 02:02 PM

I think a getData function is important, even if you really, really (for some reason that is beyond me) want to pretend that a buffer and a stream are the same thing beyond "they both represent data". Not providing it just means lots of programmers will do a read(everything) solution to get the data, which is suboptimal in every case (since a buffer could just return a pointer without an allocation and copy, whereas streams would of course need to do the read to get the data). Especially since numerous existing APIs require a data buffer, not some kind of stream object.

I don't want to pretend that a stream is a buffer, but since a stream is more generic you can use every buffer as a stream. If you write an algorithm to work with streams, e.g. calculate the MD5 sum of all currently available data, it will automatically work with buffers as well. If you want to make use of a getData function, your implementation will only work with buffers, so if you still want to support streams as well you need two separate implementations, one for buffers and one for streams. Of course, many functions don't need to support streams. For those, using getData would save some work.

On the other hand, being able to directly access the memory pointer can cause subtle problems. For example, a very simple way to copy a whole buffer which I actually implemented at one point is this: destBuffer.write(srcBuffer.getData(), srcBuffer.getLength());
This looks rather innocent until you consider what happens if the source and destination are actually the same buffer: if the buffer needs to be resized during the write operation, the source data pointer might suddenly become invalid because the resize moved the buffer content somewhere else. If buffers never hand out the internal data pointer, this kind of problem can't happen. You might instead try to avoid this potential source of confusion by specifying that the write function doesn't implicitly resize the buffer, or by removing it completely, but that would make other parts of the code more complicated.
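A sketch of the hazard (std::vector stands in for a resizable buffer implementation here; this is not the library's code):
#include <vector>

void appendBufferToItself(std::vector<char>& buf) {
    // Broken pattern, shown only as a comment: a raw pointer obtained before the
    // write can dangle if the write forces a reallocation of the same buffer.
    //   const char* data = &buf[0];
    //   buf.insert(buf.end(), data, data + buf.size());

    // Safe: go through a temporary copy, so no pointer into 'buf' outlives the resize.
    std::vector<char> temp(buf.begin(), buf.end());
    buf.insert(buf.end(), temp.begin(), temp.end());
}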

Many APIs that take data pointers allow you to provide the data in several buffers with subsequent calls, so you can use a small temp buffer on the stack and loop until you have processed as much data as you need. For that matter, simply dumping the entire data content into one big buffer is perfectly fine for many applications, and unless speed is essential (which it often isn't) I'd even prefer this over reading the data in small buffers, since it is easier to understand and less error prone. Admittedly, getData would still be easier though.

Apart from ease of use, the extra copying is also a performance issue of course - but far less so than you might think, because when using a small temporary buffer, the data can stay in the CPU's L1 cache. I've actually benchmarked some different approaches now, by copying one buffer to another in various ways. The size of the buffer was 100MB, and it was copied 100 times in a GM loop. The first approach used getData to prevent any extra copying and took 5.8 seconds (1.73GB/s). This is basically the speed of copying the data with a single memcpy call. The second one used only the read and write functions and a small buffer on the stack to move the data through. With a buffer size of 4KB, this approach took 6.1 seconds (1.63GB/s), which is still 94% of the speed with getData. On the other hand, dumping everything into a big buffer first and writing from there took 18.6 seconds (0.54GB/s, or 31% of the getData speed). This is obviously not so great, but for many applications it would still be sufficient.

I'd discuss your suggestions for buffer/stream definitions, but I really need to get working again. I'll be back later.
  • 0



